Pandas (pt.2)

Fancy Indexing (Masking) in Python

  • Index a multidimensional array (or a DataFrame) with a boolean array: only the positions where the mask is True are kept
  • Useful for extracting just the data samples that an analysis needs
  • This method comes up constantly in data analysis
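A minimal sketch of masking, assuming NumPy and pandas are installed. The array, the DataFrame, and the column names are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration.
arr = np.array([[1, 2], [3, 4], [5, 6]])
row_mask = np.array([True, False, True])   # one boolean per row
selected_rows = arr[row_mask]              # keeps rows 0 and 2

df = pd.DataFrame({"name": ["a", "b", "c"], "score": [55, 80, 92]})
high = df[df["score"] >= 80]               # the comparison builds a boolean Series
```

The comparison df["score"] >= 80 is itself the mask, so there is usually no need to build the boolean array by hand.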

Combining two tables

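  • Use merge or concat to combine tables, for example before building a pivot table
  • merge: join tables on a key column (the equivalent of a SQL JOIN). Main options:
  1. how = 'inner' / 'outer': keep only the keys found in both tables, or keep all keys from both tables
  2. left_on / right_on: name the key columns when they differ between the two tables
  3. how = 'left' / 'right': keep every row of the left (or right) table
  • concat: simply stack two tables along an axis (axis = 0: append rows / axis = 1: append columns side by side)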
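The join options above can be sketched as follows. The two tables and their column names (customer_id, cust_id, amount) are made up for illustration; only pd.merge and pd.concat come from pandas itself.

```python
import pandas as pd

# Hypothetical tables; the column names are made up for illustration.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
orders = pd.DataFrame({"cust_id": [1, 1, 3, 4], "amount": [10, 20, 30, 40]})

# The key columns have different names, so use left_on / right_on.
keys = dict(left_on="customer_id", right_on="cust_id")
inner = pd.merge(customers, orders, how="inner", **keys)  # matching keys only
left = pd.merge(customers, orders, how="left", **keys)    # every customer kept
outer = pd.merge(customers, orders, how="outer", **keys)  # every key from both

# concat: stack rows (axis=0) or place tables side by side (axis=1).
stacked = pd.concat([customers, customers], axis=0, ignore_index=True)
side_by_side = pd.concat([customers, orders], axis=1)
```

Note that concat with axis=1 lines rows up by index label, not by key, so it does not replace merge when the tables need to be matched on a column.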

Editing Indexes/Columns

  • Editing indexes: .reset_index(drop = True or False, inplace = True or False); drop = True discards the old index instead of keeping it as a column
  • Deleting columns: .drop('column name', axis = 1) or del data_name['column name']
  • Renaming columns: .rename(columns = {'original column name': 'new column name'}, inplace = True or False)

If you do not want to assign the result back to a variable, pass inplace = True, which modifies the DataFrame directly.
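A short sketch of these calls, using a hypothetical DataFrame with a throwaway tmp column. It shows both styles: reassigning the returned DataFrame, and modifying in place.

```python
import pandas as pd

# Hypothetical data; "tmp" stands in for a column we want to remove.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [55, 80, 92], "tmp": [0, 0, 0]})

filtered = df[df["score"] >= 80]            # index labels are now 1 and 2
filtered = filtered.reset_index(drop=True)  # renumbered 0, 1; old index discarded

no_tmp = df.drop("tmp", axis=1)             # returns a new DataFrame; df is unchanged
renamed = no_tmp.rename(columns={"score": "points"})

df2 = df.copy()
del df2["tmp"]                              # removes the column in place
df2.rename(columns={"score": "points"}, inplace=True)  # modifies df2, returns None
```

With inplace=True the method returns None, so writing df2 = df2.rename(..., inplace=True) would replace the DataFrame with None.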

Data Sampling and Analysis

  • Basic indexing, slicing, and conditional sampling (masking/fancy indexing) together cover most of the row and column selection that a data analysis needs
  • Once the right subset is selected, you can extract meaningful information from it
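As a rough illustration of sampling followed by analysis, the sketch below selects a subset and then summarizes it. The city and sales columns are hypothetical.

```python
import pandas as pd

# Hypothetical sales data, used only for illustration.
df = pd.DataFrame(
    {"city": ["Seoul", "Busan", "Seoul", "Daegu"], "sales": [100, 40, 60, 80]}
)

sales_only = df["sales"]             # basic indexing: a single column
first_two = df.iloc[:2]              # slicing by position: rows 0 and 1
seoul = df[df["city"] == "Seoul"]    # conditional sampling with a mask

seoul_mean = seoul["sales"].mean()   # average over the sampled rows only
```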

More advanced analysis with Pandas

  • Use the apply function with a lambda function
  • Define a function with def and pass it to apply to run it on a column
  • Broadcasting and data masking
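The three ideas above can be sketched together as follows. The DataFrame, the to_band helper, and the pass/fail thresholds are all hypothetical choices made for this example.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [55, 80, 92]})

# apply with a lambda: the lambda is called once per value in the column.
df["grade"] = df["score"].apply(lambda s: "pass" if s >= 60 else "fail")

# The same idea with a named function (hypothetical helper).
def to_band(score):
    """Map a numeric score to a coarse band."""
    if score >= 90:
        return "high"
    if score >= 60:
        return "mid"
    return "low"

df["band"] = df["score"].apply(to_band)

# Broadcasting: the scalar 5 is added to every element of the column.
df["bonus_score"] = df["score"] + 5

# Masking combined with assignment: overwrite only the rows where the mask is True.
df.loc[df["score"] < 60, "bonus_score"] = 0
```

For simple arithmetic, broadcasting is generally preferable to apply, since it avoids a Python-level function call for every row.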