  • pandas basic 02
    Data Analysis 2019. 10. 10. 20:08

    01. Column Selection

    There are a few ways to select columns in a DataFrame.

    # select by indexing (pass a list of column names)
    mydf[['col1', 'col2', 'col3']]
    
    # select by dtype
    mydf.select_dtypes(include=['int'])
    
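    # select by filter (matches against column names)
    mydf.filter(like='flag')   # names containing 'flag'
    mydf.filter(regex=r'\d')   # names containing a digit (raw string avoids an invalid escape)
    mydf.filter(like='pre_')   # names containing 'pre_'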
    
    # ex1) select object columns that contain NaN
    obj_df = mydf.select_dtypes(['object'])
    null_cols = obj_df.isnull().any()
    obj_df.loc[:, null_cols]
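
The selection methods above can be tried on a small frame. This is only a sketch: the data and column names (`id`, `pre_score`, `flag_a`, `flag_b`, `item1`) are made up for illustration, and the NaN check runs over all columns rather than only object columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are invented for this example.
mydf = pd.DataFrame({
    "id": [1, 2, 3],
    "pre_score": [0.5, np.nan, 0.7],
    "flag_a": ["y", None, "n"],
    "flag_b": ["n", "n", "y"],
    "item1": ["a", "b", "c"],
})

# Numeric columns only (integers and floats)
num_cols = mydf.select_dtypes(include=["number"]).columns.tolist()

# Columns whose name contains the substring "flag"
flag_cols = mydf.filter(like="flag").columns.tolist()

# Columns whose name contains a digit
digit_cols = mydf.filter(regex=r"\d").columns.tolist()

# Columns with at least one missing value
null_cols = mydf.columns[mydf.isnull().any()].tolist()

print(num_cols, flag_cols, digit_cols, null_cols)
```

Note that `filter` looks at the column labels, not at the values stored in the columns.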

    It is useful to group related columns into lists and then reorder the DataFrame by concatenating those lists.

    col_A = ['col_A1', 'col_A2', 'col_A3']
    col_B = ['col_B1', 'col_B2']
    col_C = ['col_C1', 'col_C2', 'col_C3', 'col_C4']
    
    mydf = mydf[col_A + col_B + col_C]
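
A small runnable version of the reordering idea, with a guard against forgetting a column. The group names (`id_cols`, `value_cols`) and the data are hypothetical.

```python
import pandas as pd

# Hypothetical toy data with columns in an inconvenient order.
mydf = pd.DataFrame({
    "price": [10, 20],
    "user_id": [1, 2],
    "qty": [3, 4],
    "user_name": ["kim", "lee"],
})

# Group related columns (the grouping is an assumption for this example).
id_cols = ["user_id", "user_name"]
value_cols = ["price", "qty"]
ordered = id_cols + value_cols

# Indexing with a list silently drops columns that are not listed,
# so check that nothing was left out before reordering.
missing = set(mydf.columns) - set(ordered)
if missing:
    raise ValueError(f"columns not assigned to a group: {sorted(missing)}")

mydf = mydf[ordered]
print(mydf.columns.tolist())
```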

    02. Basic Calculation

    These methods apply a basic calculation to every column at once.

    # NaN is ignored (skipped) by default
    mydf.count()
    mydf.min()
    mydf.max()
    mydf.sum()
    mydf.cumsum()
    
    # take NaN into account: the result is NaN if a column contains NaN
    mydf.sum(skipna=False)
    
    # ex1) check nulls
    mydf.isnull().sum()        # number of nulls per column
    mydf.isnull().sum().sum()  # total number of nulls in the DataFrame
    
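    # basic calculation
    mydf + 2019          # works only when every column is numeric
    mydf == value        # element-wise comparison with a scalar
    mydf1 == mydf2       # element-wise; NaN == NaN is False, so NaN cells never match
    mydf1.equals(mydf2)  # single True/False; NaN in the same position counts as equal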
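
The difference between `==` and `equals` when NaN is present, along with the effect of `skipna`, can be checked with a short sketch. The two-column frame here is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
mydf1 = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})
mydf2 = mydf1.copy()

# Element-wise comparison: the NaN cell compares as False even against itself.
elementwise = mydf1 == mydf2
all_equal_elementwise = bool(elementwise.all().all())

# equals() treats NaN in the same location as equal.
same = mydf1.equals(mydf2)

# sum() skips NaN by default; skipna=False lets NaN propagate.
total_skipna = mydf1["a"].sum()
total_strict = mydf1["a"].sum(skipna=False)

# Total number of missing cells.
n_missing = int(mydf1.isnull().sum().sum())

print(all_equal_elementwise, same, total_skipna, total_strict, n_missing)
```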

    03. Memory Save

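    # check memory usage per column (deep=True also counts the strings in object columns)
    mydf.memory_usage(deep=True)
    
    # type change (requires `import numpy as np`; int8 holds only -128 to 127)
    mydf['col1'] = mydf['col1'].astype(np.int8)
    
    # to Categorical (worthwhile when a column has few unique values)
    mydf.select_dtypes(include=['object']).nunique()
    mydf['col2'] = mydf['col2'].astype('category')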
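
A minimal sketch of the memory-saving steps, assuming a frame with one small-integer column and one low-cardinality string column (both invented here). The range check before the downcast is an addition of this example: values outside the int8 range would otherwise be silently wrapped or raise an error, depending on the version.

```python
import numpy as np
import pandas as pd

# Hypothetical data: col1 fits in int8, col2 has only two distinct strings.
mydf = pd.DataFrame({
    "col1": np.arange(1000) % 100,
    "col2": ["seoul", "busan"] * 500,
})

before = mydf.memory_usage(deep=True).sum()

# Downcast only if every value fits into the int8 range.
info = np.iinfo(np.int8)
if mydf["col1"].between(info.min, info.max).all():
    mydf["col1"] = mydf["col1"].astype(np.int8)

# Store repeated strings as integer codes plus a small lookup table.
mydf["col2"] = mydf["col2"].astype("category")

after = mydf.memory_usage(deep=True).sum()
print(before, after)
```

Converting to `category` tends to help only when the number of unique values is small relative to the number of rows, which is why checking `nunique()` first is reasonable.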

    04. Largest & Smallest

    top1000 = mydf.nlargest(1000, 'col1')
    target = top1000.nsmallest(10, 'col2')
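
Chaining the two calls answers questions of the form "among the rows with the largest `col1`, which have the smallest `col2`?". A small sketch with made-up numbers, using 3 and 2 in place of 1000 and 10:

```python
import pandas as pd

# Hypothetical data.
mydf = pd.DataFrame({
    "col1": [50, 90, 70, 80, 60],
    "col2": [5, 4, 1, 3, 2],
})

# The three rows with the largest col1, sorted in descending order.
top3 = mydf.nlargest(3, "col1")

# Among those, the two rows with the smallest col2, sorted in ascending order.
target = top3.nsmallest(2, "col2")

print(target)
```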
