# Pandas

After NumPy, [Pandas](https://pandas.pydata.org/) is the most frequently used tool in the data science pipeline. Pandas lets us work quickly with tabular data (e.g. CSV files, Excel spreadsheets) and time-series data. We can slice, dice, group, transform and plot the data in many ways.

The name is derived from the term "panel data", an economics term for data over multiple time periods for the same individuals, such as stock portfolio prices. Wes McKinney created Pandas while working for a Wall Street firm to analyze stocks. Nevertheless, the same tool is essential for data in healthcare, manufacturing, and many other industries.

In [1]:
import pandas as pd  #pip install pandas

import matplotlib.pyplot as plt   #pip install matplotlib

%matplotlib inline

## Reading into a DataFrame

Pandas uses something called a [**DataFrame**](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html). It's a 2D table with labeled rows and columns that looks like a spreadsheet (but is more powerful). Often your data is in the form of a CSV file. Pandas has a nice function, `read_csv`, that lets you read from a CSV and create a DataFrame.

In [2]:
!wget https://raw.githubusercontent.com/tonyreina/python_for_poets/main/pandas_play_dataset.csv

df_demo = pd.read_csv('pandas_play_dataset.csv')

In [3]:
display(df_demo)

Unnamed: 0,Date,Experiment,Initial Count,Final Count,RFP,Concentration,Success
0,3/21/2010,mPlum + sonic,3,12.0,1000.35,0.035987,True
1,3/21/2010,GFP+,2,21.0,235.123,0.17863,False
2,1/1/2008,GFP CRISPR,1,34.0,76.345,11.445347,True
3,3/21/2010,sgRNA A816,1,78.0,23.981,3.252575,True
4,3/21/2011,Sleeping beauty,1,56.0,274.5,2.074074,True
5,7/4/2010,mPlum + KI U6,3,23.0,10003.5,0.009197,False
6,3/21/2010,Dox 10 + GFP + SWIFT,1,3.0,22897.345,1.300786,False
7,11/25/1998,Secret sauce 816,2,9.0,8.45,1.065089,True
8,8/16/2014,RMS 2304b,2,10.0,11.816,1.69262,False
9,3/21/2010,Texas Red + KO,2,98.0,816.816,0.239956,False
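
`read_csv` is not limited to files on disk. As a minimal sketch (the three-row CSV text and the name `df_small` are made up for illustration), the same call can parse an in-memory string, and `parse_dates` asks Pandas to convert the `Date` column to real datetimes instead of leaving it as text:

```python
import io

import pandas as pd

# Hypothetical three-row CSV, kept in memory instead of on disk.
csv_text = """Date,Experiment,Final Count
3/21/2010,mPlum + sonic,12
1/1/2008,GFP CRISPR,34
7/4/2010,mPlum + KI U6,23
"""

# read_csv accepts any file-like object, so StringIO stands in for a file.
df_small = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(df_small.dtypes)  # Date should show a datetime dtype rather than object
print(df_small.shape)   # (rows, columns)
```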


## Heads or tails

You can easily get the first or last few rows of a DataFrame.

In [4]:
df_demo.head(4)

Unnamed: 0,Date,Experiment,Initial Count,Final Count,RFP,Concentration,Success
0,3/21/2010,mPlum + sonic,3,12.0,1000.35,0.035987,True
1,3/21/2010,GFP+,2,21.0,235.123,0.17863,False
2,1/1/2008,GFP CRISPR,1,34.0,76.345,11.445347,True
3,3/21/2010,sgRNA A816,1,78.0,23.981,3.252575,True


In [5]:
df_demo.tail(6)

Unnamed: 0,Date,Experiment,Initial Count,Final Count,RFP,Concentration,Success
45,10/14/2004,mPlum + KI U6,2,23.0,8888.0,0.005176,False
46,7/4/2010,mPlum + sonic,2,42.0,1212.0,0.069307,True
47,12/6/2010,GFP+,1,18.0,1212.1234,0.01485,False
48,3/21/2010,GFP CRISPR,1,28.0,9034.3,0.003099,False
49,2/3/2001,sgRNA A816,1,2.0,30010.5,6.7e-05,False
50,8/16/2010,Sleeping beauty,5,7.0,68692.035,1.5673,True


## Middles?

How can you get some rows from the middle? Note that we can **chain** two functions together (head and tail): take the first 30 rows, then the last 5 of those.

In [6]:
df_demo.head(30).tail(5)

Unnamed: 0,Date,Experiment,Initial Count,Final Count,RFP,Concentration,Success
25,3/21/2010,Sleeping beauty,2,6.0,24.0,0.5,False
26,8/16/2007,mPlum + KI U6,2,10.0,3.0,3.666667,False
27,3/21/2010,Dox 10 + GFP + SWIFT,3,10.0,18.0,1.666667,False
28,4/12/2021,Secret sauce 816,1,55.0,8731.11,0.006299,True
29,3/21/2010,sgRNA A816,2,3.0,6.0,1.2,True
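
The chained call selects rows 25 through 29. A small sketch on a made-up 51-row frame (`df_rows` is hypothetical) suggests that a positional slice gives the same result in one step:

```python
import pandas as pd

# Hypothetical frame with 51 rows, the same row count as the demo data.
df_rows = pd.DataFrame({"value": range(51)})

chained = df_rows.head(30).tail(5)  # first 30 rows, then the last 5 of those
sliced = df_rows.iloc[25:30]        # positions 25..29 (the end is exclusive)

print(chained.equals(sliced))
```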


## Describe the basic stats of the data

In [7]:
df_demo.describe()

Unnamed: 0,Initial Count,Final Count,RFP,Concentration
count,51.0,51.0,51.0,51.0
mean,1.843137,22.729216,9132.559318,1.076406
std,0.945993,23.192422,17077.30193,1.933509
min,1.0,2.0,1.0,6.7e-05
25%,1.0,7.0,25.35,0.011429
50%,2.0,12.0,1212.1234,0.225683
75%,2.0,28.0,8961.15,1.607817
max,5.0,98.0,68692.035,11.445347


In [8]:
df_demo['Concentration'].describe()

count    51.000000
mean      1.076406
std       1.933509
min       0.000067
25%       0.011429
50%       0.225683
75%       1.607817
max      11.445347
Name: Concentration, dtype: float64

## Stats for individual columns

In [9]:
df_demo['Initial Count'].mean()

1.8431372549019607

In [10]:
df_demo['Final Count'].median()

12.0

In [11]:
df_demo['RFP'].max()

68692.035

In [12]:
df_demo[['RFP', 'Initial Count', 'Concentration']].min()

RFP              1.000000
Initial Count    1.000000
Concentration    0.000067
dtype: float64
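
Several statistics can also be requested in one call. The sketch below uses a tiny made-up frame (`df_stats`) and `DataFrame.agg` with a list of function names; the result has one row per statistic and one column per input column:

```python
import pandas as pd

# Hypothetical four-row frame for illustration.
df_stats = pd.DataFrame(
    {
        "Initial Count": [3, 2, 1, 1],
        "Final Count": [12.0, 21.0, 34.0, 78.0],
    }
)

# One row per statistic, one column per original column.
summary = df_stats.agg(["mean", "median", "max"])
print(summary)
```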

## Math between columns

In [13]:
df_demo[['RFP', 'Initial Count', 'Concentration']].std() - df_demo[['RFP', 'Initial Count', 'Concentration']].min()

RFP              17076.301930
Initial Count       -0.054007
Concentration        1.933442
dtype: float64

In [14]:
df_demo[['RFP', 'Initial Count', 'Concentration']]**2

Unnamed: 0,RFP,Initial Count,Concentration
0,1000700.0,9,0.001295093
1,55282.83,4,0.03190864
2,5828.559,1,130.996
3,575.0884,1,10.57924
4,75350.25,1,4.301783
5,100070000.0,9,8.458078e-05
6,524288400.0,1,1.692045
7,71.4025,4,1.134414
8,139.6179,4,2.864963
9,667188.4,4,0.05757894


## A better way to carve out data

In [15]:
df_demo.iloc[12:18, :]

Unnamed: 0,Date,Experiment,Initial Count,Final Count,RFP,Concentration,Success
12,6/6/2013,GFP CRISPR,1,4.0,1.0,0.045,True
13,3/21/2010,sgRNA A816,2,3.0,6.0,1.2,True
14,12/21/2016,Sleeping beauty,3,3.0,17327.4,0.000519,False
15,10/14/2004,mPlum + KI U6,2,23.0,8888.0,0.005176,True
16,3/21/2010,mPlum + sonic,2,42.0,1212.0,0.069307,True
17,9/9/2010,GFP+,1,18.0,1212.1234,0.01485,True


In [16]:
df_demo.loc[12:18, ['Date', 'Concentration', 'Success']]

Unnamed: 0,Date,Concentration,Success
12,6/6/2013,0.045,True
13,3/21/2010,1.2,True
14,12/21/2016,0.000519,False
15,10/14/2004,0.005176,True
16,3/21/2010,0.069307,True
17,9/9/2010,0.01485,True
18,3/21/2010,0.003099,False
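
Notice that `iloc[12:18]` returned rows 12 to 17 while `loc[12:18]` returned rows 12 to 18. `iloc` slices by position and excludes the end, like a Python list; `loc` slices by label and includes the end. The sketch below uses a made-up frame (`df_idx`) whose labels differ from its positions to make the distinction visible:

```python
import pandas as pd

# Hypothetical frame whose index labels (100..104) differ from positions (0..4).
df_idx = pd.DataFrame(
    {"value": [10, 20, 30, 40, 50]},
    index=[100, 101, 102, 103, 104],
)

by_position = df_idx.iloc[1:3]   # positions 1 and 2; the end is excluded
by_label = df_idx.loc[101:103]   # labels 101, 102 and 103; the end is included

print(by_position)
print(by_label)
```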


## Fancy functions -- Apply, Map

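**Never, never, never** use **for** loops over rows when a built-in will do! Vectorized NumPy/Pandas operations are often orders of magnitude faster than a Python loop, and the built-in *apply* and *map* functions are more concise and less error-prone.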

In [17]:
import numpy as np    # NumPy superiority!

In [18]:
df_demo['Concentration'].apply(np.sqrt)

0     0.189703
1     0.422646
2     3.383097
3     1.803490
4     1.440165
5     0.095900
6     1.140520
7     1.032031
8     1.301007
9     0.489853
10    0.098988
11    2.150581
12    0.212132
13    1.095445
14    0.022791
15    0.071941
16    0.263262
17    0.121860
18    0.055671
19    0.008164
20    1.341831
21    0.561767
22    0.475061
23    0.080805
24    0.153157
25    0.707107
26    1.914854
27    1.290994
28    0.079368
29    1.095445
30    1.000000
31    0.583095
32    0.263262
33    0.211069
34    0.055671
35    0.008164
36    1.341679
37    0.561767
38    0.648768
39    0.114275
40    0.153157
41    1.951922
42    1.694107
43    1.283874
44    2.049390
45    0.071941
46    0.263262
47    0.121860
48    0.055671
49    0.008164
50    1.251919
Name: Concentration, dtype: float64
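
For comparison, here is a sketch of the same square root computed three ways on a small made-up Series (`conc`). All three give the same numbers; on large data the vectorized form is usually the fastest, although no timings are measured here:

```python
import numpy as np
import pandas as pd

# Hypothetical concentrations for illustration.
conc = pd.Series([0.04, 0.16, 1.0, 4.0], name="Concentration")

# 1. Explicit Python loop: works, but is the style to avoid.
looped = pd.Series([np.sqrt(x) for x in conc], name="Concentration")

# 2. apply: hands the function to Pandas.
applied = conc.apply(np.sqrt)

# 3. Vectorized: NumPy operates on the whole Series at once.
vectorized = np.sqrt(conc)

print(np.allclose(looped, applied) and np.allclose(applied, vectorized))
```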

In [19]:
df_demo.apply(lambda col: np.cos(col) if col.name in ['Initial Count', 'Concentration'] else col)  # apply passes one column at a time by default

Unnamed: 0,Date,Experiment,Initial Count,Final Count,RFP,Concentration,Success
0,3/21/2010,mPlum + sonic,-0.989992,12.0,1000.35,0.999353,True
1,3/21/2010,GFP+,-0.416147,21.0,235.123,0.984088,False
2,1/1/2008,GFP CRISPR,0.540302,34.0,76.345,0.434761,True
3,3/21/2010,sgRNA A816,0.540302,78.0,23.981,-0.993848,True
4,3/21/2011,Sleeping beauty,0.540302,56.0,274.5,-0.482299,True
5,7/4/2010,mPlum + KI U6,-0.989992,23.0,10003.5,0.999958,False
6,3/21/2010,Dox 10 + GFP + SWIFT,0.540302,3.0,22897.345,0.266741,False
7,11/25/1998,Secret sauce 816,-0.416147,9.0,8.45,0.484427,True
8,8/16/2014,RMS 2304b,-0.416147,10.0,11.816,-0.121523,False
9,3/21/2010,Texas Red + KO,-0.416147,98.0,816.816,0.971348,False


In [20]:
def my_function1(z):
    
    if z.name == 'Initial Count':
        return z+5
    elif z.name == 'Concentration':
        return z*2 + 13.8
    else:
        return z

df_demo.apply(my_function1)

Unnamed: 0,Date,Experiment,Initial Count,Final Count,RFP,Concentration,Success
0,3/21/2010,mPlum + sonic,8,12.0,1000.35,13.871975,True
1,3/21/2010,GFP+,7,21.0,235.123,14.15726,False
2,1/1/2008,GFP CRISPR,6,34.0,76.345,36.690694,True
3,3/21/2010,sgRNA A816,6,78.0,23.981,20.30515,True
4,3/21/2011,Sleeping beauty,6,56.0,274.5,17.948148,True
5,7/4/2010,mPlum + KI U6,8,23.0,10003.5,13.818394,False
6,3/21/2010,Dox 10 + GFP + SWIFT,6,3.0,22897.345,16.401572,False
7,11/25/1998,Secret sauce 816,7,9.0,8.45,15.930178,True
8,8/16/2014,RMS 2304b,7,10.0,11.816,17.18524,False
9,3/21/2010,Texas Red + KO,7,98.0,816.816,14.279912,False
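
By default `DataFrame.apply` hands the function one column at a time, which is why the examples above test `.name` against column names. Passing `axis=1` hands it one row at a time instead. A minimal sketch, assuming a made-up two-row frame (`df_exp`) and a hypothetical helper `make_label`:

```python
import pandas as pd

# Hypothetical two-row frame for illustration.
df_exp = pd.DataFrame(
    {
        "Experiment": ["mPlum + sonic", "GFP+"],
        "Initial Count": [3, 2],
    }
)


def make_label(row):
    # With axis=1, `row` is a Series indexed by column name.
    return f"{row['Experiment']} ({row['Initial Count']})"


labels = df_exp.apply(make_label, axis=1)
print(labels.tolist())
```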


## Conditional applies

In [21]:
df_demo.loc[:, 'RFP'].apply(lambda value: value > 1000)  # on a Series, apply passes one value at a time

0      True
1     False
2     False
3     False
4     False
5      True
6      True
7     False
8     False
9     False
10     True
11    False
12    False
13    False
14     True
15     True
16     True
17     True
18     True
19     True
20     True
21    False
22    False
23     True
24     True
25    False
26    False
27    False
28     True
29    False
30     True
31     True
32     True
33     True
34     True
35     True
36     True
37    False
38    False
39     True
40     True
41    False
42    False
43    False
44     True
45     True
46     True
47     True
48     True
49     True
50     True
Name: RFP, dtype: bool
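
A simple threshold like this does not need `apply` at all. As a sketch on a made-up Series (`rfp`), comparing the whole column at once gives the same booleans, and `np.where` turns them into labels (the labels `high` and `low` are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical RFP readings for illustration.
rfp = pd.Series([1000.35, 235.123, 10003.5], name="RFP")

via_apply = rfp.apply(lambda value: value > 1000)  # one value at a time
via_compare = rfp > 1000                           # whole column at once

# np.where picks the first label where the condition holds, else the second.
labels = np.where(rfp > 1000, "high", "low")

print(via_compare.tolist())
print(labels.tolist())
```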

## Map

`map` replaces each value in a single column (a Series) with a new label, here looked up in a dictionary.

In [22]:
df_demo['comment'] = df_demo['Success'].map({True: 'Good job!', False: 'Sorry' })
df_demo

Unnamed: 0,Date,Experiment,Initial Count,Final Count,RFP,Concentration,Success,comment
0,3/21/2010,mPlum + sonic,3,12.0,1000.35,0.035987,True,Good job!
1,3/21/2010,GFP+,2,21.0,235.123,0.17863,False,Sorry
2,1/1/2008,GFP CRISPR,1,34.0,76.345,11.445347,True,Good job!
3,3/21/2010,sgRNA A816,1,78.0,23.981,3.252575,True,Good job!
4,3/21/2011,Sleeping beauty,1,56.0,274.5,2.074074,True,Good job!
5,7/4/2010,mPlum + KI U6,3,23.0,10003.5,0.009197,False,Sorry
6,3/21/2010,Dox 10 + GFP + SWIFT,1,3.0,22897.345,1.300786,False,Sorry
7,11/25/1998,Secret sauce 816,2,9.0,8.45,1.065089,True,Good job!
8,8/16/2014,RMS 2304b,2,10.0,11.816,1.69262,False,Sorry
9,3/21/2010,Texas Red + KO,2,98.0,816.816,0.239956,False,Sorry
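
One thing to watch for: a value with no entry in the dictionary becomes `NaN`. The sketch below uses a made-up Series (`status`) with an unmapped value and fills the gap with `fillna`; the fallback text is illustrative:

```python
import pandas as pd

# Hypothetical status column; "retry" has no entry in the mapping below.
status = pd.Series(["pass", "fail", "retry"])

comment = status.map({"pass": "Good job!", "fail": "Sorry"})
print(comment.isna().tolist())  # the unmapped value is missing

# Replace the missing label with a fallback.
filled = comment.fillna("No comment")
print(filled.tolist())
```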


## Pivot Tables

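Pivot tables reshape a data table for a new use: rows are grouped by one or more index columns, and the remaining values are aggregated within each group (using the mean by default).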

In [23]:
df_demo.head(7)

Unnamed: 0,Date,Experiment,Initial Count,Final Count,RFP,Concentration,Success,comment
0,3/21/2010,mPlum + sonic,3,12.0,1000.35,0.035987,True,Good job!
1,3/21/2010,GFP+,2,21.0,235.123,0.17863,False,Sorry
2,1/1/2008,GFP CRISPR,1,34.0,76.345,11.445347,True,Good job!
3,3/21/2010,sgRNA A816,1,78.0,23.981,3.252575,True,Good job!
4,3/21/2011,Sleeping beauty,1,56.0,274.5,2.074074,True,Good job!
5,7/4/2010,mPlum + KI U6,3,23.0,10003.5,0.009197,False,Sorry
6,3/21/2010,Dox 10 + GFP + SWIFT,1,3.0,22897.345,1.300786,False,Sorry


In [24]:
pd.pivot_table(df_demo.head(7),index=["Initial Count"])  # Mean values

Unnamed: 0_level_0,Concentration,Final Count,RFP,Success
Initial Count,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1,4.518195,42.75,5818.04275,0.75
2,0.17863,21.0,235.123,0.0
3,0.022592,17.5,5501.925,0.5


In [25]:
pd.pivot_table(df_demo.head(7),index=["Initial Count", 'Success'])  # Mean values by default

Unnamed: 0_level_0,Unnamed: 1_level_0,Concentration,Final Count,RFP
Initial Count,Success,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1,False,1.300786,3.0,22897.345
1,True,5.590665,56.0,124.942
2,False,0.17863,21.0,235.123
3,False,0.009197,23.0,10003.5
3,True,0.035987,12.0,1000.35


In [26]:
pd.pivot_table(df_demo.head(7),index=["Initial Count", 'Success'], aggfunc='count')

Unnamed: 0_level_0,Unnamed: 1_level_0,Concentration,Date,Experiment,Final Count,RFP,comment
Initial Count,Success,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
1,False,1,1,1,1,1,1
1,True,3,3,3,3,3,3
2,False,1,1,1,1,1,1
3,False,1,1,1,1,1,1
3,True,1,1,1,1,1,1


In [27]:
pd.pivot_table(df_demo.head(7),index=["Initial Count", 'Success'], aggfunc='max')

Unnamed: 0_level_0,Unnamed: 1_level_0,Concentration,Date,Experiment,Final Count,RFP,comment
Initial Count,Success,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
1,False,1.300786,3/21/2010,Dox 10 + GFP + SWIFT,3.0,22897.345,Sorry
1,True,11.445347,3/21/2011,sgRNA A816,78.0,274.5,Good job!
2,False,0.17863,3/21/2010,GFP+,21.0,235.123,Sorry
3,False,0.009197,7/4/2010,mPlum + KI U6,23.0,10003.5,Sorry
3,True,0.035987,3/21/2010,mPlum + sonic,12.0,1000.35,Good job!


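Note that the max is taken independently for each column within each group, so the values in one output row can come from different input rows. It is not simply the row with the largest value.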

In [28]:
df_demo.head(7)

Unnamed: 0,Date,Experiment,Initial Count,Final Count,RFP,Concentration,Success,comment
0,3/21/2010,mPlum + sonic,3,12.0,1000.35,0.035987,True,Good job!
1,3/21/2010,GFP+,2,21.0,235.123,0.17863,False,Sorry
2,1/1/2008,GFP CRISPR,1,34.0,76.345,11.445347,True,Good job!
3,3/21/2010,sgRNA A816,1,78.0,23.981,3.252575,True,Good job!
4,3/21/2011,Sleeping beauty,1,56.0,274.5,2.074074,True,Good job!
5,7/4/2010,mPlum + KI U6,3,23.0,10003.5,0.009197,False,Sorry
6,3/21/2010,Dox 10 + GFP + SWIFT,1,3.0,22897.345,1.300786,False,Sorry
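
`pivot_table` can also spread a second grouping variable across the columns. A minimal sketch, assuming a made-up frame (`df_piv`) with a hypothetical `Outcome` column; `values` and `aggfunc` are spelled out so that only the intended column is averaged:

```python
import pandas as pd

# Hypothetical five-row frame for illustration.
df_piv = pd.DataFrame(
    {
        "Initial Count": [1, 1, 2, 3, 3],
        "Outcome": ["pass", "fail", "fail", "pass", "fail"],
        "Final Count": [34.0, 3.0, 21.0, 12.0, 23.0],
    }
)

# Rows: Initial Count. Columns: Outcome. Cells: mean Final Count.
table = pd.pivot_table(
    df_piv,
    index="Initial Count",
    columns="Outcome",
    values="Final Count",
    aggfunc="mean",
)
print(table)  # combinations with no data show up as NaN
```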


## Creating a new dataframe from scratch

DataFrames can be created directly from dictionaries. Each key becomes a column name and each list becomes that column's values.

In [29]:
df = pd.DataFrame.from_dict(
    {
        'Name': ['Tony', 'Emily', 'John', 'Shannon'],
        'Age': [50, 25, 35, 64],
        'Birth City': ['Baltimore', 'Paris', 'Houston', 'San Diego'],
        'Gender': ['M', 'F', 'M', 'F']
    }
)
df

Unnamed: 0,Name,Age,Birth City,Gender
0,Tony,50,Baltimore,M
1,Emily,25,Paris,F
2,John,35,Houston,M
3,Shannon,64,San Diego,F
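
Data often arrives the other way around, as one dictionary per row. As a sketch (the names `records`, `df_records` and `df_columns` are illustrative), the plain `pd.DataFrame` constructor accepts such a list of records and produces the same table as the column-oriented dictionary:

```python
import pandas as pd

# One dictionary per row ("records" orientation).
records = [
    {"Name": "Tony", "Age": 50},
    {"Name": "Emily", "Age": 25},
]
df_records = pd.DataFrame(records)

# One list per column, as in the cell above.
df_columns = pd.DataFrame.from_dict({"Name": ["Tony", "Emily"], "Age": [50, 25]})

print(df_records.equals(df_columns))
```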


## No for loop necessary (or desired)

In [30]:
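conditions = [
    (df['Age'] < 20),
    (df['Age'] >= 20) & (df['Age'] < 40),
    (df['Age'] >= 40) & (df['Age'] < 60),   # upper bound is 60 so that age 59 is included
    (df['Age'] >= 60)
]
values = ['<20 years old', '20-39 years old', '40-59 years old', '60+ years old']
# A string default keeps the fallback value consistent with the string labels
df['Age Group'] = np.select(conditions, values, default='unknown')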

display(df)

Unnamed: 0,Name,Age,Birth City,Gender,Age Group
0,Tony,50,Baltimore,M,40-59 years old
1,Emily,25,Paris,F,20-39 years old
2,John,35,Houston,M,20-39 years old
3,Shannon,64,San Diego,F,60+ years old
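
For contiguous numeric ranges like these, `pd.cut` is an alternative to listing every condition by hand. This is a sketch rather than a replacement for the cell above; the Series `ages` is made up and includes 59 to check the bin boundary:

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including 59 to exercise the 40-59 boundary.
ages = pd.Series([50, 25, 35, 64, 59], name="Age")

bins = [0, 20, 40, 60, np.inf]
labels = ['<20 years old', '20-39 years old', '40-59 years old', '60+ years old']

# right=False makes each bin include its left edge and exclude its right edge,
# i.e. [0, 20), [20, 40), [40, 60), [60, inf).
age_group = pd.cut(ages, bins=bins, labels=labels, right=False)

print(age_group.astype(str).tolist())
```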


## Apply, ApplyMap, Map


| Function | Applies To                                            |
|----------|-------------------------------------------------------|
| apply    | whole rows or columns of a DataFrame                  |
| applymap | each element of a DataFrame                           |
| map      | each element of a Series                              |


### Use cases

**map** is meant for mapping values from one domain to another very quickly 

`df['A'].map({1:'a', 2:'b', 3:'c'})`

**applymap** is good for elementwise transformations across multiple rows/columns (since pandas 2.1 it is deprecated in favor of `DataFrame.map`, which does the same thing) 

`df[['A', 'B', 'C']].applymap(str.strip)`

**apply** is for applying any function that cannot be vectorised 

`df['sentences'].apply(my_function)`
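
The three calls can be tried side by side. The sketch below uses a made-up frame (`df_abc`) and a hypothetical helper `shout`; because `applymap` is deprecated from pandas 2.1 onward, it picks `DataFrame.map` when that method exists and falls back to `applymap` otherwise:

```python
import pandas as pd

# Hypothetical frame with untidy strings.
df_abc = pd.DataFrame(
    {
        "A": [1, 2, 3],
        "B": [" x ", "y ", " z"],
        "C": [" p", "q ", " r "],
    }
)

# map: translate the values of one Series through a dictionary.
letters = df_abc["A"].map({1: "a", 2: "b", 3: "c"})

# Elementwise over a DataFrame: DataFrame.map on pandas >= 2.1, else applymap.
text_cols = df_abc[["B", "C"]]
elementwise = text_cols.map if hasattr(pd.DataFrame, "map") else text_cols.applymap
stripped = elementwise(str.strip)


def shout(text):
    # Hypothetical custom logic applied to each value.
    return text.strip().upper() + "!"


# apply: run an arbitrary Python function on each value of a Series.
shouted = df_abc["B"].apply(shout)

print(letters.tolist(), stripped["B"].tolist(), shouted.tolist())
```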

## Groups

In [31]:
df_demo.groupby('Experiment').count()

Unnamed: 0_level_0,Date,Initial Count,Final Count,RFP,Concentration,Success,comment
Experiment,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Dox 10 + GFP + SWIFT,3,3,3,3,3,3,3
GFP CRISPR,7,7,7,7,7,7,7
GFP+,7,7,7,7,7,7,7
RMS 2304b,1,1,1,1,1,1,1
Secret sauce 816,3,3,3,3,3,3,3
Sleeping beauty,8,8,8,8,8,8,8
Texas Red + KO,1,1,1,1,1,1,1
mPlum + KI U6,6,6,6,6,6,6,6
mPlum + sonic,7,7,7,7,7,7,7
sgRNA A816,8,8,8,8,8,8,8
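
Every column shows the same number here because no values are missing. In general `count()` reports the non-missing values per column, while `size()` reports the rows per group. A sketch on a made-up frame (`df_g`) with one missing reading shows the difference:

```python
import pandas as pd

# Hypothetical frame; the GFP CRISPR row has a missing RFP reading.
df_g = pd.DataFrame(
    {
        "Experiment": ["GFP+", "GFP CRISPR", "GFP+", "RMS 2304b", "GFP+"],
        "RFP": [1.0, None, 3.0, 4.0, 5.0],
    }
)

counts = df_g.groupby("Experiment").count()  # non-missing values per column
sizes = df_g.groupby("Experiment").size()    # rows per group

# value_counts gives the same per-group row totals for a single column.
totals = df_g["Experiment"].value_counts()

print(counts)
print(sizes)
```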


In [32]:
df_demo.groupby(['Experiment', 'Initial Count']).count()

Unnamed: 0_level_0,Unnamed: 1_level_0,Date,Final Count,RFP,Concentration,Success,comment
Experiment,Initial Count,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Dox 10 + GFP + SWIFT,1,1,1,1,1,1,1
Dox 10 + GFP + SWIFT,3,2,2,2,2,2,2
GFP CRISPR,1,5,5,5,5,5,5
GFP CRISPR,2,2,2,2,2,2,2
GFP+,1,4,4,4,4,4,4
GFP+,2,2,2,2,2,2,2
GFP+,3,1,1,1,1,1,1
RMS 2304b,2,1,1,1,1,1,1
Secret sauce 816,1,1,1,1,1,1,1
Secret sauce 816,2,2,2,2,2,2,2


In [33]:
df_demo.groupby(['Experiment', 'Initial Count']).mean(numeric_only=True)  # skip text columns such as Date and comment

Unnamed: 0_level_0,Unnamed: 1_level_0,Final Count,RFP,Concentration,Success
Experiment,Initial Count,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Dox 10 + GFP + SWIFT,1,3.0,22897.345,1.300786,0.0
Dox 10 + GFP + SWIFT,3,9.945,18.0,1.6575,0.5
GFP CRISPR,1,24.4,5436.049,2.299929,0.4
GFP CRISPR,2,12.0,2450.448,0.009794,1.0
GFP+,1,20.25,616.9237,1.220096,0.5
GFP+,2,16.5,179.2865,0.299765,0.5
GFP+,3,18.0,1212.1234,0.04455,1.0
RMS 2304b,2,10.0,11.816,1.69262,0.0
Secret sauce 816,1,55.0,8731.11,0.006299,1.0
Secret sauce 816,2,32.15,4369.78,2.632544,1.0


In [34]:
df_demo.groupby(['Experiment', 'Initial Count'])[['Final Count', 'RFP', 'Concentration', 'Success']].agg(['mean', 'std', 'min']).sort_values(by='Initial Count', ascending=True)  # aggregate numeric/boolean columns only

Unnamed: 0_level_0,Unnamed: 1_level_0,Final Count,Final Count,Final Count,RFP,RFP,RFP,Concentration,Concentration,Concentration,Success,Success,Success
Unnamed: 0_level_1,Unnamed: 1_level_1,mean,std,min,mean,std,min,mean,std,min,mean,std,min
Experiment,Initial Count,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2
Dox 10 + GFP + SWIFT,1,3.0,,3.0,22897.345,,22897.345,1.300786,,1.300786,0.0,,False
mPlum + sonic,1,12.666667,8.082904,8.0,765.31,1281.648316,25.35,0.213654,0.176544,0.009799,0.333333,0.57735,False
sgRNA A816,1,21.0,38.0,2.0,22513.87025,14993.2595,23.981,0.813194,1.626254,6.7e-05,0.25,0.5,False
Secret sauce 816,1,55.0,,55.0,8731.11,,8731.11,0.006299,,0.006299,1.0,,True
Sleeping beauty,1,23.0,28.583212,6.0,22996.845,39573.39358,24.0,2.561392,1.089969,1.800102,0.666667,0.57735,False
GFP+,1,20.25,12.120919,8.0,616.9237,687.368758,8.0,1.220096,2.272111,0.01485,0.5,0.57735,False
GFP CRISPR,1,24.4,11.696153,4.0,5436.049,4927.180111,1.0,2.299929,5.112476,0.003099,0.4,0.547723,False
GFP+,2,16.5,6.363961,12.0,179.2865,78.964736,123.45,0.299765,0.171311,0.17863,0.5,0.707107,False
mPlum + sonic,2,42.0,0.0,42.0,1212.0,0.0,1212.0,0.069307,0.0,0.069307,1.0,0.0,True
mPlum + KI U6,2,17.8,7.120393,10.0,5376.06,4809.496654,3.0,1.377404,1.754337,0.005176,0.4,0.547723,False


In [35]:
df_demo.groupby(['Experiment', 'Initial Count'])[['Final Count', 'RFP', 'Concentration', 'Success']].agg(['mean', 'std', 'min']).sort_values(by='Experiment', ascending=False)  # aggregate numeric/boolean columns only

Unnamed: 0_level_0,Unnamed: 1_level_0,Final Count,Final Count,Final Count,RFP,RFP,RFP,Concentration,Concentration,Concentration,Success,Success,Success
Unnamed: 0_level_1,Unnamed: 1_level_1,mean,std,min,mean,std,min,mean,std,min,mean,std,min
Experiment,Initial Count,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2
sgRNA A816,2,41.0,43.87862,3.0,3370.845,3885.388333,6.0,0.611729,0.679277,0.023457,0.5,0.57735,False
sgRNA A816,1,21.0,38.0,2.0,22513.87025,14993.2595,23.981,0.813194,1.626254,6.7e-05,0.25,0.5,False
mPlum + sonic,3,12.0,,12.0,1000.35,,1000.35,0.035987,,0.035987,1.0,,True
mPlum + sonic,2,42.0,0.0,42.0,1212.0,0.0,1212.0,0.069307,0.0,0.069307,1.0,0.0,True
mPlum + sonic,1,12.666667,8.082904,8.0,765.31,1281.648316,25.35,0.213654,0.176544,0.009799,0.333333,0.57735,False
mPlum + KI U6,3,23.0,,23.0,10003.5,,10003.5,0.009197,,0.009197,0.0,,False
mPlum + KI U6,2,17.8,7.120393,10.0,5376.06,4809.496654,3.0,1.377404,1.754337,0.005176,0.4,0.547723,False
Texas Red + KO,2,98.0,,98.0,816.816,,816.816,0.239956,,0.239956,0.0,,False
Sleeping beauty,3,3.0,0.0,3.0,17327.4,0.0,17327.4,0.50026,0.70674,0.000519,0.0,0.0,False
Sleeping beauty,5,7.0,0.0,7.0,68692.035,0.0,68692.035,1.683905,0.164904,1.5673,1.0,0.0,True
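
The `NaN` entries in the `std` columns above belong to groups with a single row, where a sample standard deviation is undefined. The sketch below reproduces that on a made-up frame (`df_n`) and uses named aggregation, which gives flat, readable column names; the output names such as `mean_final` are illustrative:

```python
import pandas as pd

# Hypothetical frame: group "A" has two rows, group "B" has one.
df_n = pd.DataFrame(
    {
        "Experiment": ["A", "A", "B"],
        "Final Count": [10.0, 20.0, 7.0],
    }
)

# Named aggregation: new_column=(source_column, function).
summary = (
    df_n.groupby("Experiment")
    .agg(
        mean_final=("Final Count", "mean"),
        std_final=("Final Count", "std"),
        n_rows=("Final Count", "count"),
    )
    .sort_values(by="mean_final", ascending=False)
)

print(summary)  # std_final is NaN for the single-row group
```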


In [36]:
plt.figure()
df_demo['Final Count'].hist()
plt.title('My great title is this')
plt.ylabel('Bacon')
plt.xlabel(r'My cool equation is $ \frac{\partial \rho}{\partial t} + \nabla \cdot + \frac{x^5}{2y^2}$'
          , fontsize=10, color='blue')

Text(0.5, 0, 'My cool equation is $ \\frac{\\partial \\rho}{\\partial t} + \\nabla \\cdot + \\frac{x^5}{2y^2}$')
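
When no `bins` argument is given, `hist()` splits the data into 10 equal-width bins. To see the numbers behind the bars, the same binning can be sketched with `np.histogram`; the Series below holds only the ten `Final Count` values displayed earlier, not the full dataset:

```python
import numpy as np
import pandas as pd

# The ten Final Count values shown in the first display of df_demo.
final_count = pd.Series([12.0, 21.0, 34.0, 78.0, 56.0, 23.0, 3.0, 9.0, 10.0, 98.0])

# 10 equal-width bins between the minimum and maximum value.
counts, edges = np.histogram(final_count, bins=10)

print(counts)  # number of values falling in each bin
print(edges)   # 11 bin edges, from the minimum to the maximum
```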

In [37]:
plt.figure()
df_demo['Final Count'].hist(bins=23)
plt.grid(False)
plt.gca().spines[['right', 'top']].set_visible(False)  # Get rid of right/top lines
plt.ylim([-3, 14])

(-3.0, 14.0)

In [38]:
df_demo.hist(column=['Initial Count', 'Final Count', 'RFP'])

array([[<AxesSubplot:title={'center':'Initial Count'}>,
        <AxesSubplot:title={'center':'Final Count'}>],
       [<AxesSubplot:title={'center':'RFP'}>, <AxesSubplot:>]],
      dtype=object)

In [39]:
df_demo.hist(column=['Initial Count', 'Final Count', 'RFP'], sharey=True)

array([[<AxesSubplot:title={'center':'Initial Count'}>,
        <AxesSubplot:title={'center':'Final Count'}>],
       [<AxesSubplot:title={'center':'RFP'}>, <AxesSubplot:>]],
      dtype=object)

In [40]:
df_demo

Unnamed: 0,Date,Experiment,Initial Count,Final Count,RFP,Concentration,Success,comment
0,3/21/2010,mPlum + sonic,3,12.0,1000.35,0.035987,True,Good job!
1,3/21/2010,GFP+,2,21.0,235.123,0.17863,False,Sorry
2,1/1/2008,GFP CRISPR,1,34.0,76.345,11.445347,True,Good job!
3,3/21/2010,sgRNA A816,1,78.0,23.981,3.252575,True,Good job!
4,3/21/2011,Sleeping beauty,1,56.0,274.5,2.074074,True,Good job!
5,7/4/2010,mPlum + KI U6,3,23.0,10003.5,0.009197,False,Sorry
6,3/21/2010,Dox 10 + GFP + SWIFT,1,3.0,22897.345,1.300786,False,Sorry
7,11/25/1998,Secret sauce 816,2,9.0,8.45,1.065089,True,Good job!
8,8/16/2014,RMS 2304b,2,10.0,11.816,1.69262,False,Sorry
9,3/21/2010,Texas Red + KO,2,98.0,816.816,0.239956,False,Sorry
