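pandas is a library that provides data structures and algorithms that are useful for data analysis and data science. Its core data structures are Series, DataFrame, and Panel, which are labeled 1D, 2D, and 3D arrays, respectively. Note that Panel was deprecated and has been removed from recent versions of pandas; a DataFrame with a MultiIndex is the usual replacement.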
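A minimal sketch of the two structures still available in current pandas. The sensor labels and values are made up for illustration.

```python
import pandas as pd

# A Series is a 1D labeled array: a sequence of values plus an index.
levels = pd.Series([1.2, 1.5, 1.1], index=["s1", "s2", "s3"], name="level")

# A DataFrame is a 2D labeled table; each column is itself a Series.
df = pd.DataFrame({
    "sensor": ["s1", "s2", "s3"],
    "level": [1.2, 1.5, 1.1],
})

print(levels["s2"])        # label-based access on a Series → 1.5
print(type(df["level"]))   # selecting one column yields a Series
print(df.shape)            # (rows, columns)
```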
DataFrame is especially useful: it defines methods such as pivot_table and query, and has many facilities for dealing with missing data.
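The sketch below shows query, pivot_table, and a few of the missing-data helpers on a small, hypothetical table of sensor readings in which one measurement is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical readings; the level for sensor "a" on day 2 is missing (NaN).
df = pd.DataFrame({
    "sensor": ["a", "a", "b", "b"],
    "day": [1, 2, 1, 2],
    "level": [1.0, np.nan, 3.0, 5.0],
})

# query filters rows using a boolean expression written as a string.
# Comparisons with NaN are False, so the missing reading is excluded.
high = df.query("level > 2")

# pivot_table reshapes long data to wide: one row per sensor, one column
# per day (aggregating with the mean by default). Missing cells stay NaN.
wide = df.pivot_table(index="sensor", columns="day", values="level")

# Missing-data helpers: count, fill, or drop NaN values.
n_missing = df["level"].isna().sum()
filled = df.fillna({"level": df["level"].mean()})  # mean ignores NaN
dropped = df.dropna()

print(wide)
```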
For analysis purposes, pandas also has plotting features, built on matplotlib, that are easy to use.
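A minimal plotting sketch, assuming matplotlib is installed. The column names and values are hypothetical, and the non-interactive "Agg" backend is selected only so that the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, not a window
import pandas as pd

df = pd.DataFrame({
    "day": [1, 2, 3, 4],
    "sensor_a": [1.0, 1.4, 1.3, 1.8],
    "sensor_b": [2.1, 2.0, 2.4, 2.2],
})

# DataFrame.plot draws one line per column against the index (here: day)
# and returns a matplotlib Axes object that can be customized further.
ax = df.set_index("day").plot(title="Water level per sensor")
ax.set_ylabel("level")
ax.figure.savefig("levels.png")  # hypothetical output file name
```

In a notebook the figure is usually shown inline, so the backend selection and the savefig call are not needed there.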
agt_analysis.ipynb: a notebook illustrating the analysis and visualization of water levels as measured by various sensors.
agt_data: three CSV files used in agt_analysis.ipynb.
data_generation.ipynb: notebook that generates simulated gene expression data using numpy and pandas.
pandas_intro.ipynb: illustrates various aspects of using pandas, such as importing data, using Series and DataFrame, cleaning and formatting data, dealing with missing data, adding and removing columns, and various algorithms and visualizations.
data: some data sets used in pandas_intro.ipynb.
patients.ipynb: running example used in the Python slides.
patient_data.ipynb: extended version of the running example used in the Python slides.
pipes.ipynb: consolidating data processing using pipes.
screenshots: screenshots made for the slides.
generate_csv_files.py: script to generate CSV files in different formats.
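A minimal sketch of pipe-based processing, assuming that "pipes" refers to pandas' DataFrame.pipe method; the notebook itself may structure its pipeline differently. The helper functions drop_missing and add_column, the column names, and the scale factor are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical processing steps: each takes a DataFrame and returns a new one.
def drop_missing(df: pd.DataFrame) -> pd.DataFrame:
    return df.dropna()

def add_column(df: pd.DataFrame, name: str, factor: float) -> pd.DataFrame:
    # assign returns a copy, so the input frame is left unchanged
    return df.assign(**{name: df["level"] * factor})

raw = pd.DataFrame({"sensor": ["a", "b", "c"], "level": [1.0, np.nan, 2.5]})

# pipe chains the steps left to right instead of nesting the calls as
# add_column(drop_missing(raw), name="level_cm", factor=100.0)
result = (
    raw
    .pipe(drop_missing)
    .pipe(add_column, name="level_cm", factor=100.0)
)
print(result)
```

Because every step is a plain function with the same DataFrame-in, DataFrame-out shape, steps can be reordered, reused, or tested individually.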