Start with Python and Pandas – Part 1
Hello friends! I recently started with Python again, so I'll share a basic intro to Python. I'll be using Jupyter Notebook for this tutorial. You'll need a basic understanding of programming to follow the flow. The syntax is a bit similar to R, so if you know R it will be easy to pick up.
Import Pandas
import pandas as pd
Check the default working directory
%pwd
Change working directory
%cd "C:\Users\kedia niket\Documents\Python"
# %cd followed by the folder you want to switch to
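%pwd and %cd are IPython magic commands, so they only work inside Jupyter or IPython. In a plain Python script, the standard-library os module does the same job. This is a minimal sketch; the temporary folder is only a stand-in so the snippet runs on any machine, and you would put your own path in its place.

```python
import os
import tempfile

# Same as %pwd: the current working directory as a string
start_dir = os.getcwd()
print(start_dir)

# Same idea as %cd: switch to another folder.
# A temporary folder is used here so the snippet runs anywhere; on Windows
# use a raw string for your own path, e.g. r"C:\Users\kedia niket\Documents\Python"
target_dir = tempfile.gettempdir()
os.chdir(target_dir)
print(os.getcwd())

# Switch back to where we started
os.chdir(start_dir)
```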
Read a CSV file
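input = pd.read_csv('titanic.csv')
# titanic.csv must be in the working directory; note that the name input hides Python's built-in input() function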
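If you don't have titanic.csv at hand, you can still try read_csv: it accepts any file-like object as well as a path. The sketch below reads a few made-up rows that reuse some of the Titanic column names (the values are illustrative only). The result is called df here instead of input, so Python's built-in input() stays usable.

```python
import io

import pandas as pd

# A few made-up rows using some of the Titanic column names,
# so the example runs without titanic.csv on disk
csv_text = """PassengerId,Survived,Pclass,Age,Fare
1,0,3,22.0,7.25
2,1,1,38.0,71.50
3,1,3,,8.05
"""

# read_csv takes a path or a file-like object; the empty Age value becomes NaN
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```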
Check the type of any object
type(input)
# returns pandas.core.frame.DataFrame, so the data was loaded as a DataFrame
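A single column taken out of a DataFrame has its own type, a Series. A short sketch on a tiny made-up DataFrame shows both types, plus isinstance(), which is the usual way to test a type inside code rather than comparing type() output by eye.

```python
import io

import pandas as pd

# Tiny made-up data for illustration
df = pd.read_csv(io.StringIO("Age,Fare\n22,7.25\n38,71.5\n"))

print(type(df))          # <class 'pandas.core.frame.DataFrame'>
print(type(df["Age"]))   # <class 'pandas.core.series.Series'>

# isinstance returns True/False, handy in if-statements
print(isinstance(df, pd.DataFrame))       # True
print(isinstance(df["Age"], pd.Series))   # True
```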
Get summary statistics for the numerical columns
input.describe()
Result:
      | PassengerId | Survived | Pclass | Age   | SibSp | Parch | Fare
count | 891.0       | 891.0    | 891.0  | 714.0 | 891.0 | 891.0 | 891.0
mean  | 446.0       | 0.4      | 2.3    | 29.7  | 0.5   | 0.4   | 32.2
std   | 257.4       | 0.5      | 0.8    | 14.5  | 1.1   | 0.8   | 49.7
min   | 1.0         | 0.0      | 1.0    | 0.4   | 0.0   | 0.0   | 0.0
25%   | 223.5       | 0.0      | 2.0    | 20.1  | 0.0   | 0.0   | 7.9
50%   | 446.0       | 0.0      | 3.0    | 28.0  | 0.0   | 0.0   | 14.5
75%   | 668.5       | 1.0      | 3.0    | 38.0  | 1.0   | 0.0   | 31.0
max   | 891.0       | 1.0      | 3.0    | 80.0  | 8.0   | 6.0   | 512.3
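Notice that count for Age is 714 while the other columns show 891: describe() counts only non-missing values. A small sketch with made-up data shows the same effect, and how isna().sum() reports the number of missing values per column directly.

```python
import io

import pandas as pd

# Made-up data: the second row has no Age
csv_text = "Age,Fare\n22,7.25\n,8.05\n38,71.5\n"
df = pd.read_csv(io.StringIO(csv_text))

stats = df.describe()
print(stats)

# count skips NaN: Age has 2 non-missing values, Fare has 3
print(stats.loc["count"])

# Number of missing values in each column
print(df.isna().sum())
```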
Get the DataFrame size
input.shape
# will return (# rows, # columns)
input.shape[0]
# will return # rows
input.shape[1]
# will return # columns
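Because shape is an ordinary (rows, columns) tuple, it can be unpacked into two variables in one line, and len() on a DataFrame gives the row count as well. A quick sketch on made-up data:

```python
import io

import pandas as pd

# Made-up data: 4 rows, 3 columns
df = pd.read_csv(io.StringIO("Age,Fare,Pclass\n22,7.25,3\n38,71.5,1\n26,7.9,3\n35,53.1,1\n"))

# Unpack the (rows, columns) tuple directly
n_rows, n_cols = df.shape
print(n_rows, n_cols)   # 4 3

# len() of a DataFrame is its number of rows
print(len(df))          # 4
```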
Get the top n rows and bottom n rows of the data
input.head()
# get top 5 rows
input.head(10)
# get top n rows (here n = 10)
input.tail()
# get bottom 5 rows
input.tail(10)
# get bottom n rows (here n = 10)
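head() and tail() return new DataFrames, so you can store and reuse them. The sketch below, on a made-up 8-row DataFrame, checks the default of 5 rows and shows that head(n) selects the same rows as positional slicing with iloc[:n].

```python
import pandas as pd

# A small made-up DataFrame with 8 rows
df = pd.DataFrame({"PassengerId": range(1, 9), "Age": [22, 38, 26, 35, 40, 54, 2, 27]})

print(len(df.head()))    # 5, the default

top3 = df.head(3)        # first 3 rows
bottom2 = df.tail(2)     # last 2 rows

# head(n) picks the same rows as iloc[:n]
print(top3.equals(df.iloc[:3]))          # True
print(bottom2["PassengerId"].tolist())   # [7, 8]
```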
Get column names
input.columns
# returns the column labels as a pandas Index
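The Index returned by columns converts easily to a plain Python list, supports the in operator for checking whether a column exists, and is complemented by dtypes, which lists each column with its data type. A short sketch on made-up data:

```python
import pandas as pd

# Made-up data with a numeric and a text column
df = pd.DataFrame({"PassengerId": [1, 2], "Age": [22.0, 38.0], "Sex": ["male", "female"]})

# Convert the Index of column labels to a list
print(list(df.columns))      # ['PassengerId', 'Age', 'Sex']

# Check that a column exists before using it
print("Age" in df.columns)   # True

# Each column name with its data type
print(df.dtypes)
```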
Keep visiting Analytics Tuts for more tutorials.
Thanks for reading! Leave your suggestions and questions in the comments.