vijananish/pandas-python

PANDAS

INTRODUCTION

pandas is a powerful, flexible, easy-to-use open source data analysis and manipulation tool.

DATA FOR PANDAS

The data pandas works with is mostly tabular, from a database, or in JSON form. With pandas one can:

  1. Explore
  2. Clean
  3. Process

READ, WRITE DATA

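pandas supports many formats, e.g. Excel, CSV, SQL, JSON.

  1. For reading, the functions have the prefix "read_" (read_csv, read_excel, read_sql, read_json).
  2. For writing, the methods have the prefix "to_" (to_csv, to_excel, to_sql, to_json).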
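
The read_/to_ naming can be sketched as a round trip. This is a minimal example that uses in-memory text instead of files so it runs anywhere; the column names and values are made up for illustration.

```python
import io

import pandas as pd

# Made-up sample data, not from any real dataset.
df = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})

# Writing: DataFrame methods start with to_ (to_csv, to_json, ...).
csv_text = df.to_csv(index=False)
json_text = df.to_json(orient="records")

# Reading: top-level pandas functions start with read_ (read_csv, read_json, ...).
from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)
```

With real files, a path such as a hypothetical "data.csv" would take the place of the StringIO objects.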

CREATE PLOT

To plot the data (scatter, line, pie, etc.) one can use the power of matplotlib, seaborn, or plotly.

USES

  1. Reshape
  2. Create new column
  3. Calculate summary statistics
  4. Combine multiple tables
  5. Select subset
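
The uses listed above can be sketched on two small made-up tables. The table names, columns, and values here are assumptions chosen only for illustration, and each use has more than one possible method (for example, melt or stack for reshaping).

```python
import pandas as pd

# Made-up tables for illustration.
sales = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi"],
    "year": [2020, 2021, 2020],
    "units": [10, 15, 7],
    "price": [2.0, 2.0, 3.0],
})
regions = pd.DataFrame({"city": ["Pune", "Delhi"], "region": ["West", "North"]})

# Create a new column from existing ones.
sales["revenue"] = sales["units"] * sales["price"]

# Calculate a summary: total revenue per city.
summary = sales.groupby("city")["revenue"].sum()

# Combine multiple tables on a shared key column.
merged = sales.merge(regions, on="city", how="left")

# Select a subset of rows (year 2020) and columns.
subset = merged.loc[merged["year"] == 2020, ["city", "region", "revenue"]]

# Reshape: one row per city, one column per year.
wide = sales.pivot(index="city", columns="year", values="units")

print(summary, subset, wide, sep="\n\n")
```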

LOC VS ILOC

  1. loc: one specifies the labels (names) of the rows and columns. It also accepts label slices, lists of labels, and boolean masks.
  2. iloc: one specifies the integer positions of the rows and columns.
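
The difference can be sketched on a small DataFrame with string row labels. The data is made up for illustration.

```python
import pandas as pd

# Made-up data; the row labels are strings so labels and positions differ.
df = pd.DataFrame(
    {"age": [25, 32, 47], "city": ["Pune", "Delhi", "Goa"]},
    index=["x", "y", "z"],
)

by_label = df.loc["y", "city"]   # row label "y", column label "city"
by_position = df.iloc[1, 1]      # second row, second column

# loc slices include the end label; iloc slices exclude the end position.
label_slice = df.loc["x":"y", "age"]
position_slice = df.iloc[0:2, 0]

# loc also accepts a boolean mask for the rows.
over_30 = df.loc[df["age"] > 30, "city"]

print(by_label, by_position)
print(over_30)
```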

DATA STRUCTURE

  1. Series (1-D)
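  2. DataFrame (2-D)
  3. Panel (3-D): deprecated in pandas 0.20 and removed in 0.25. A DataFrame with a MultiIndex is used for higher-dimensional data instead.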
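
A minimal sketch of the two structures still in use, with made-up values: a Series is a labelled 1-D array, and a DataFrame is a 2-D table whose columns are Series.

```python
import pandas as pd

# 1-D: a Series with an index and a name.
s = pd.Series([10, 20, 30], index=["a", "b", "c"], name="value")

# 2-D: a DataFrame built from Series that share the same index.
df = pd.DataFrame({"value": s, "double": s * 2})

# Selecting a single column gives back a Series.
col = df["double"]

print(s.ndim, df.ndim, df.shape)
```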

DROP ROWS/COLUMNS

After rows are dropped, the index values do not adjust automatically, so the index keeps gaps. Use reset_index to renumber it. By default reset_index creates a new column holding the old index values; pass drop=True to avoid creating that column. To drop rows with missing values ("na") in a particular column only, use the subset parameter of dropna.
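
The behaviour described above can be sketched as follows. The DataFrame and its column names are made up for illustration.

```python
import pandas as pd

# Made-up data with one missing value in column "b".
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10.0, None, 30.0, 40.0]})

# Dropping a row leaves a gap: the index is now 1, 2, 3.
dropped = df.drop(index=[0])

# reset_index() renumbers but keeps the old index as a column named "index".
with_old_index = dropped.reset_index()

# drop=True renumbers without adding that column.
renumbered = dropped.reset_index(drop=True)

# Drop rows only where column "b" is missing.
no_na_b = df.dropna(subset=["b"])

# Drop a column.
no_col = df.drop(columns=["b"])

print(with_old_index.columns.tolist(), renumbered.columns.tolist())
```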

KEYWORDS

  1. inplace=True : the method modifies the original DataFrame and returns None instead of a new DataFrame. For example, drop_duplicates(inplace=True) removes all duplicates from the original DataFrame.
  2. to_ : prefix of the functions that convert a column to another type, e.g. pd.to_numeric, pd.to_datetime.
  3. corr() : computes the pairwise correlation between columns.
  4. isna() and isnull() : both find the missing values in a pandas DataFrame and do exactly the same thing; isnull() is just an alias of isna(), as shown in the pandas source code. Missing values denote entries that are null or have no actual value.
  5. df.replace() : replaces a particular value with another.
  6. df.pivot_table() : the levels in the pivot table are stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.
  7. level parameter in .sum() : optional, default None. Specifies which level (in a hierarchical MultiIndex) to sum along. It was deprecated in pandas 1.3 and removed in 2.0; use groupby(level=...).sum() instead.
  8. pd.cut() : use cut to segment and sort data values into bins. It is also useful for going from a continuous variable to a categorical variable; for example, cut can convert ages to groups of age ranges. It supports binning into a given number of equal-width bins or into a pre-specified array of bin edges.
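
Most of the keywords above can be sketched on one small DataFrame. The column names, values, bin edges, and labels are assumptions made up for illustration.

```python
import pandas as pd

# Made-up data with one missing age.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B"],
    "year": [2020, 2021, 2020, 2021],
    "age": [15, 34, 70, None],
    "score": [1.0, 2.0, 3.0, 4.0],
})

# isna() and isnull() give the same boolean mask.
missing = df["age"].isna()

# replace: swap one value for another.
replaced = df["team"].replace("A", "Alpha")

# pivot_table: rows per team, columns per year.
table = df.pivot_table(index="team", columns="year", values="score", aggfunc="sum")

# Sum along one level of a MultiIndex (replacement for sum(level=...)).
indexed = df.set_index(["team", "year"])
per_team = indexed["score"].groupby(level="team").sum()

# cut: continuous ages to labelled age ranges (bin edges are made up).
age_group = pd.cut(df["age"], bins=[0, 18, 65, 120], labels=["child", "adult", "senior"])

# corr: pairwise correlation between numeric columns.
corr = df[["age", "score"]].corr()

# to_ prefix: convert a column of strings to numbers.
numbers = pd.to_numeric(pd.Series(["1", "2"]))

# inplace=True: the original is changed and the call returns None.
dups = pd.DataFrame({"x": [1, 1, 2]})
result = dups.drop_duplicates(inplace=True)

print(table, per_team, age_group, sep="\n\n")
```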
