# Introduction to Pandas

## Introduction
* Pandas is a Python package built on top of NumPy
* Pandas provides an efficient implementation of a DataFrame
* DataFrames are essentially two-dimensional arrays with attached row and column labels, often with heterogeneous column types and/or missing data.
* Pandas implements a number of powerful data operations familiar to users of both database frameworks and spreadsheet programs
* Documentation: http://pandas.pydata.org/pandas-docs/stable/
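A minimal sketch of the points above, with made-up column names and values: a DataFrame built from a dict holds columns of different types, carries row and column labels, and marks missing values as NaN.

```python
import numpy as np
import pandas as pd

# Three columns with different types (heterogeneous data) and one missing value
df = pd.DataFrame(
    {
        "name": ["alpha", "beta", "gamma"],
        "value": [1.5, np.nan, 3.0],  # NaN marks missing data
        "flag": [True, False, True],
    },
    index=["r1", "r2", "r3"],  # row labels
)

print(df.dtypes)          # one dtype per column
print(df.index.tolist())  # row labels
print(df.isna().sum())    # number of missing values per column

# The values are still available as a (here: object-typed) NumPy array
values = df.to_numpy()
print(values.shape)
```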

In [3]:
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
plt.style.use("default")


In [4]:
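# 50 rows x 5 columns of standard-normal random numbers, columns named a-e
rng = np.random.default_rng(seed=0)  # fixed seed for reproducible results
data = pd.DataFrame(rng.standard_normal((50, 5)), columns=['a', 'b', 'c', 'd', 'e'])
# Make sure the directory exists
os.makedirs('data', exist_ok=True)
# European-style csv: ';' as separator and ',' as decimal mark, no index column
data.to_csv('data/test_csv.csv', sep=';', decimal=',', index=False)
# writing .xlsx files needs an Excel engine such as openpyxl
data.to_excel('data/test_excel.xlsx')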

## Read csv-data

http://pandas.pydata.org/pandas-docs/stable/io.html#

In [None]:
data_csv = pd.read_csv('data/test_csv.csv')
data_csv.head()

In [None]:
# the file uses ';' as separator and ',' as decimal mark, so read_csv needs both;
# usecols=(0, 1, 2, 3, 4) keeps only the first five columns

data_csv = pd.read_csv('data/test_csv.csv', sep=';', decimal=',', usecols=(0, 1, 2, 3, 4))
data_csv.head()
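To see why both `sep` and `decimal` matter, the following sketch round-trips a tiny frame (values made up) through an in-memory string instead of a file. Giving only the separator leaves numbers like `0,5` as text; adding `decimal=","` restores the original floats.

```python
import io

import pandas as pd

# Small frame with known values
df = pd.DataFrame({"a": [0.5, -1.25], "b": [2.0, 3.75]})

# Write it with ';' as separator and ',' as decimal mark (to_csv returns a str)
text = df.to_csv(sep=";", decimal=",", index=False)
print(text)

# Only the separator given: "0,5" is not a valid float, so the columns stay text
sep_only = pd.read_csv(io.StringIO(text), sep=";")
print(sep_only.dtypes)

# Separator and decimal mark given: the original numbers come back as floats
parsed = pd.read_csv(io.StringIO(text), sep=";", decimal=",")
print(parsed.dtypes)
```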

## Basic Operations on Data Frames

In [None]:
# shape of the data frame as (number of rows, number of columns)

data_csv.shape

In [None]:
# filter rows: keep only rows where column a < 0.8 or column c > 0

data_csv = data_csv.query('a < 0.8 | c > 0')
data_csv.head()
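The query string can also be written as a boolean mask. A small sketch on random data: inside `query` the `|` operator has lower precedence than the comparisons, but in plain Python `|` binds more tightly than `<` and `>`, so the mask needs parentheses around each comparison.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)), columns=["a", "b", "c"])

# Filter with a query string
via_query = df.query("a < 0.8 | c > 0")

# Same filter as a boolean mask; parentheses are required around each comparison
mask = (df["a"] < 0.8) | (df["c"] > 0)
via_mask = df[mask]

print(via_query.equals(via_mask))
```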

In [None]:
# filter columns: keep only columns a, b and c

data_csv = data_csv.filter(items=['a', 'b', 'c'])
print(data_csv.shape)
data_csv.head()
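Besides `items`, `DataFrame.filter` can also match column labels by substring (`like`) or regular expression (`regex`). The sketch below uses hypothetical column names; it also shows that `filter(items=...)` silently skips labels that do not exist, whereas plain `[]` selection raises a `KeyError`.

```python
import pandas as pd

# Hypothetical column names, one row of dummy values
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=["temp_in", "temp_out", "humidity", "pressure"],
)

by_items = df.filter(items=["humidity", "pressure", "missing"])  # 'missing' is ignored
by_like = df.filter(like="temp")      # substring match on the labels
by_regex = df.filter(regex="^(h|p)")  # regular expression on the labels
print(by_items.columns.tolist(), by_like.columns.tolist(), by_regex.columns.tolist())

# Plain selection with [] is strict about missing labels
try:
    df[["humidity", "missing"]]
except KeyError as err:
    print("KeyError:", err)
```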

In [None]:
# sort the rows by the values in column c (ascending)

data_csv = data_csv.sort_values(by=['c'])
data_csv.head()
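`sort_values` returns a new, sorted frame (hence the reassignment above) and also supports descending order and several sort keys. A short sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2, 1], "c": [0.5, 0.2, 0.9, 0.1]})

# Ascending by c
by_c = df.sort_values(by=["c"])

# Descending by c
by_c_desc = df.sort_values(by="c", ascending=False)

# Several keys: first by a (ascending), ties broken by c (descending)
by_a_then_c = df.sort_values(by=["a", "c"], ascending=[True, False])

print(by_c, by_c_desc, by_a_then_c, sep="\n\n")
```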

In [None]:
# use column a as the index (row labels), then sort the rows by that index

data_csv = data_csv.set_index(keys=['a'])
data_csv = data_csv.sort_index()
data_csv.head()
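Once a column is the index, rows can be selected by label with `.loc` or by integer position with `.iloc`. A minimal sketch with made-up values, mirroring the cell above (index `a`, sorted with `sort_index()`):

```python
import pandas as pd

df = (
    pd.DataFrame({"a": [0.3, -1.2, 0.7], "b": [10, 20, 30], "c": [1.5, 2.5, 3.5]})
    .set_index("a")
    .sort_index()
)
print(df)

# Label-based: rows by index value, columns by name
single = df.loc[0.3, "b"]
label_slice = df.loc[-1.2:0.3, ["b"]]  # label slices include both ends

# Position-based, like NumPy: the end of a slice is excluded
by_position = df.iloc[0, 1]  # first row, second column
first_two = df.iloc[:2]      # first two rows

print(single, label_slice, by_position, first_two, sep="\n\n")
```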

## Write csv-data

In [None]:
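# save the processed data frame (including its index a) to a new csv file,
# using the same separator and decimal mark as the input file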

data_csv.to_csv('data/test_neue_csv.csv', sep=';', decimal=',')
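Because `to_csv` writes the index as the first column, reading the file back needs `index_col` to turn that column into the index again; otherwise it comes back as an ordinary column. A sketch of the round trip, using an in-memory buffer and made-up values instead of the file:

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [-0.5, 0.25], "b": [1.5, 2.0], "c": [3.0, 4.75]}).set_index("a")

buffer = io.StringIO()
df.to_csv(buffer, sep=";", decimal=",")  # index 'a' is written as the first column

buffer.seek(0)
restored = pd.read_csv(buffer, sep=";", decimal=",", index_col="a")
print(restored)
```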

## Statistical Operations

In [None]:
data_csv.sum()

In [None]:
data_csv.mean()

In [None]:
data_csv.median()

In [None]:
data_csv.std()

In [None]:
data_csv['b'].describe()
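These reductions work column by column by default (`axis=0`); `axis=1` computes one value per row instead. Note that pandas' `std` uses the sample standard deviation (`ddof=1`) by default, whereas NumPy's `np.std` defaults to `ddof=0`. A small sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"b": [1.0, 2.0, 3.0], "c": [10.0, 20.0, 60.0]})

col_sums = df.sum()        # one value per column (axis=0 is the default)
row_sums = df.sum(axis=1)  # one value per row
means = df.mean()
medians = df.median()
sample_std = df["b"].std()           # ddof=1 (sample standard deviation)
population_std = df["b"].std(ddof=0) # ddof=0 (population standard deviation)
summary = df["b"].describe()         # count, mean, std, min, quartiles, max

print(col_sums, row_sums, means, medians, sample_std, population_std, summary, sep="\n\n")
```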

## Plotting functions

In [None]:
# line plot with one line per column, drawn against the index
data_csv.plot()

In [None]:
plt.figure()
data_csv['b'].plot()
plt.show()
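The pandas plotting methods draw with matplotlib and accept `ax=` to draw into an existing Axes, which makes it easy to combine plots in one figure. The sketch below assumes a non-interactive setting (hence the `Agg` backend) and uses random data; it saves the figure to an in-memory buffer, where a file name such as `"b_overview.png"` (hypothetical) would work as well.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, e.g. for scripts without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((50, 2)), columns=["b", "c"])

# Line plot and histogram of column b side by side
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))
df["b"].plot(ax=ax_line, title="b (line)")
df["b"].plot(kind="hist", bins=10, ax=ax_hist, title="b (histogram)")
ax_line.set_xlabel("row")
fig.tight_layout()

buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```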

## Read excel-data

In [None]:
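# to_excel wrote the index as the first column; index_col=0 turns it back into the index
# (reading .xlsx files needs an Excel engine such as openpyxl)
data_excel = pd.read_excel('data/test_excel.xlsx', index_col=0)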
data_excel.head()