Data Analysis

pandas basic 01

jaehwi0823 2019. 10. 10. 20:04

1. pandas basic elements

index = myData.index
columns = myData.columns
data = myData.values
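
A minimal sketch of the three elements on a tiny frame. The sample data and the names V1 and V2 are made up for illustration. Recent pandas documentation recommends to_numpy() over .values, so both are shown.

```python
import pandas as pd

# Hypothetical sample data, only for illustration
myData = pd.DataFrame(
    {"V1": [10, 20, 30], "V2": [1.5, 2.5, 3.5]},
    index=["a", "b", "c"],
)

index = myData.index        # row labels (an Index object)
columns = myData.columns    # column labels (also an Index object)
data = myData.values        # underlying data as a NumPy array
data2 = myData.to_numpy()   # same array via the newer accessor
```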

2. Data types

# check all data types
myData.dtypes

# count columns per dtype
# (get_dtype_counts() was removed in pandas 1.0)
myData.dtypes.value_counts()
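
On current pandas versions, one way to count columns per dtype is to chain value_counts() onto dtypes. A small sketch with made-up columns:

```python
import pandas as pd

# Hypothetical frame: two integer columns, one float, one string
myData = pd.DataFrame(
    {"a": [1, 2], "b": [1.0, 2.0], "c": ["x", "y"], "d": [3, 4]}
)

# dtypes is a Series (column name -> dtype), so value_counts() works on it
dtype_counts = myData.dtypes.value_counts()
print(dtype_counts)
```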

3. Handling a Series

Select a column

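# select one column as a Series
myData['column_name']
# attribute access also works, but only when the name is a valid
# Python identifier and does not clash with a DataFrame attribute
myData.column_name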

To treat the selected Series as a DataFrame:

mySeries.to_frame()
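
A short sketch of the Series / DataFrame difference, assuming a made-up frame. Selecting with a list of names (double brackets) is another way to get a one-column DataFrame.

```python
import pandas as pd

# Hypothetical sample data
myData = pd.DataFrame({"column_name": [1, 2, 3], "other": [4, 5, 6]})

mySeries = myData["column_name"]      # Series
sameSeries = myData.column_name       # same Series via attribute access

asFrame = mySeries.to_frame()         # one-column DataFrame
alsoFrame = myData[["column_name"]]   # list selection gives a DataFrame directly
```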

Check counts and frequencies

# total number of elements (nulls included)
mySeries.size
mySeries.shape
len(mySeries)

# non-null values only
mySeries.count()
mySeries.notnull().sum()

# counts per distinct value (nulls are excluded by default)
mySeries.value_counts()
# relative frequencies instead of counts
mySeries.value_counts(normalize=True)
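
The difference between the total and non-null counts is easiest to see on a Series with one missing value. A sketch with made-up data:

```python
import pandas as pd

# Hypothetical Series with one missing value
mySeries = pd.Series(["a", "b", "a", None, "a"])

total = mySeries.size                  # 5, nulls included
shape = mySeries.shape                 # (5,)
non_null = mySeries.count()            # 4, nulls excluded

counts = mySeries.value_counts()                 # a: 3, b: 1
shares = mySeries.value_counts(normalize=True)   # a: 0.75, b: 0.25
```

Passing dropna=False to value_counts() would count the missing value as its own item.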

Statistics

# summary
mySeries.describe()

# quantiles (here the 10th, 20th, 30th, 50th, 80th and 90th percentiles)
mySeries.quantile([.1, .2, .3, .5, .8, .9])
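
A sketch of what these two calls return, using a made-up numeric Series. With the default linear interpolation, quantile() with a list returns a Series indexed by the requested quantiles.

```python
import pandas as pd

# Hypothetical numeric Series
mySeries = pd.Series([1, 2, 3, 4, 5])

summary = mySeries.describe()   # count, mean, std, min, 25%, 50%, 75%, max
pct = mySeries.quantile([0.1, 0.5, 0.9])

median = pct[0.5]               # 3.0
```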

Handle nulls

# check for nulls
mySeries.notnull().all()   # True only if there are no nulls
mySeries.isnull().sum()    # number of nulls
mySeries.hasnans           # True if at least one null exists

# fill nulls with a value (returns a new Series)
mySeries.fillna(0)

# or drop them (also returns a new Series)
mySeries.dropna()
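
Neither fillna() nor dropna() changes the original Series unless the result is assigned back. A small sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one NaN
mySeries = pd.Series([1.0, np.nan, 3.0])

no_nulls = mySeries.notnull().all()   # False
n_nulls = mySeries.isnull().sum()     # 1

filled = mySeries.fillna(0)           # NaN replaced by 0
dropped = mySeries.dropna()           # NaN row removed, index labels kept

# the original still contains the NaN
still_has_nan = mySeries.hasnans
```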

Change dtype

# raises an error if the Series still contains NaN; fill or drop nulls first
mySeries.astype(int)
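
One catch with astype(int): it fails when the Series contains NaN. A sketch of two possible workarounds, assuming a made-up float Series. The nullable "Int64" dtype is offered as an option, not as what the original post used.

```python
import numpy as np
import pandas as pd

# Hypothetical float Series with a missing value
mySeries = pd.Series([1.0, 2.0, np.nan])

try:
    mySeries.astype(int)
    cast_failed = False
except ValueError:
    # NaN cannot be represented in a plain integer dtype
    cast_failed = True

# Option 1: fill the nulls first, then cast
as_int = mySeries.fillna(0).astype(int)

# Option 2: nullable integer dtype keeps the missing value as <NA>
nullable = mySeries.astype("Int64")
```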

4. Index

Set index

myData.set_index('column')

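import pandas as pd
# set the index while reading the file
myData = pd.read_csv('./data/d.csv', index_col='index_column')
# read_csv has no drop parameter; to keep the column as well, use set_index
myData = pd.read_csv('./data/d.csv').set_index('index_column', drop=False)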

Reset the index (move it back into a regular column)

myData.reset_index()
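
A self-contained sketch of setting and resetting the index. An in-memory CSV string stands in for './data/d.csv', and the column names are made up. Note that drop is a parameter of set_index(), not of read_csv().

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "index_column,V1\nx,1\ny,2\n"

# index set while reading; the column leaves the regular columns
myData = pd.read_csv(io.StringIO(csv_text), index_col="index_column")

# index set afterwards, keeping the column as well
kept = pd.read_csv(io.StringIO(csv_text)).set_index("index_column", drop=False)

# index moved back into a column, replaced by 0, 1, ...
restored = myData.reset_index()
```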

Rename index and column labels

newData = myData.rename(index={'old_idx':'new_idx'},
                        columns={'old_col':'new_col'})
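
A sketch of rename() on a made-up frame. Labels missing from the mapping are left as they are, and the original frame is not modified.

```python
import pandas as pd

# Hypothetical frame with the labels used above
myData = pd.DataFrame({"old_col": [1, 2]}, index=["old_idx", "other"])

newData = myData.rename(index={"old_idx": "new_idx"},
                        columns={"old_col": "new_col"})
```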

5. Column insert / delete

# insert a new column right after 'myCol' (modifies myData in place)
idx = myData.columns.get_loc('myCol')
myData.insert(loc=idx+1,
              column='newCol',
              value=myData.V1 - myData.V2)

# delete a column (drop returns a new DataFrame)
myData = myData.drop('myCol', axis=1)
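
A runnable sketch of both steps on a made-up frame, assuming the new column name is the string 'newCol'. insert() changes the frame in place, while drop() returns a new one.

```python
import pandas as pd

# Hypothetical frame with the column names used above
myData = pd.DataFrame({"myCol": [1, 2], "V1": [10, 20], "V2": [3, 4]})

# position of 'myCol' among the columns
idx = myData.columns.get_loc("myCol")

# place the new column directly after 'myCol' (in place)
myData.insert(loc=idx + 1,
              column="newCol",
              value=myData.V1 - myData.V2)
columns_after_insert = list(myData.columns)

# remove 'myCol'; the result has to be assigned back
myData = myData.drop("myCol", axis=1)
```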