# Learn Python

### Python Packages - Pandas

##### Source: https://pandas.pydata.org/
Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.

Pandas is well suited for many different kinds of data:

- Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
- Ordered and unordered (not necessarily fixed-frequency) time series data.
- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
- Any other form of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure

The two primary data structures of pandas, Series (1-dimensional) and DataFrame (2-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. For R users, DataFrame provides everything that R’s data.frame provides and much more. pandas is built on top of NumPy and is intended to integrate well within a scientific computing environment with many other 3rd party libraries.

Here are just a few of the things that pandas does well:

- Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data
- Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects
- Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. automatically align the data for you in computations
- Powerful, flexible group by functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
- Make it easy to convert ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
- Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
- Intuitive merging and joining data sets
- Flexible reshaping and pivoting of data sets
- Hierarchical labeling of axes (possible to have multiple labels per tick)
- Robust IO tools for loading data from flat files (CSV and delimited), Excel files, databases, and saving / loading data from the ultrafast HDF5 format
- Time series-specific functionality: date range generation and frequency conversion, moving window statistics, date shifting and lagging.

### Importing Pandas Package

In [2]:
import numpy as np
import pandas as pd

### Series in Pandas

In [6]:
alphabets = ['a','b','c','d','e','f']
alphabets

['a', 'b', 'c', 'd', 'e', 'f']

In [7]:
numbers = [1,2,3,4,5,6]
numbers

[1, 2, 3, 4, 5, 6]

In [8]:
alphabetsArray = np.array(alphabets)
alphabetsArray

array(['a', 'b', 'c', 'd', 'e', 'f'], dtype='<U1')

In [9]:
numbersArray = np.array(numbers)
numbersArray

array([1, 2, 3, 4, 5, 6])

In [10]:
simpleDictionary = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5, 'f':6}
simpleDictionary

{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6}

In [11]:
seriesFromList = pd.Series(data=alphabets)
seriesFromList

0    a
1    b
2    c
3    d
4    e
5    f
dtype: object

In [12]:
seriesFromArray = pd.Series(data=numbersArray)
seriesFromArray

0    1
1    2
2    3
3    4
4    5
5    6
dtype: int32

In [13]:
# To create custom labels
seriesFromArray = pd.Series(data=numbersArray, index=alphabets)
seriesFromArray

a    1
b    2
c    3
d    4
e    5
f    6
dtype: int32

In [14]:
# When the arguments are passed in the order (data, index), the parameter names can be omitted.
seriesFromArray = pd.Series(numbersArray, alphabets)
seriesFromArray

a    1
b    2
c    3
d    4
e    5
f    6
dtype: int32

In [15]:
# Creating a Series from Dictionary
seriesFromDict = pd.Series(simpleDictionary)
seriesFromDict

a    1
b    2
c    3
d    4
e    5
f    6
dtype: int64

### DataFrames in Pandas

In [16]:
sports = pd.Series(data=[1,2,3,4,5], index=['Golf', 'Football', 'Baseball', 'Basketball', 'Volleyball'])
sports

Golf          1
Football      2
Baseball      3
Basketball    4
Volleyball    5
dtype: int64

In [17]:
sports['Golf']

1

In [19]:
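# Select by position with iloc; sports[2] relies on a deprecated positional fallback for label-indexed Series
sports.iloc[2]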

3

In [20]:
sportsILove = pd.Series(data=[1,2,3], index=['Volleyball', 'Cricket', 'Baseball'])
sportsILove

Volleyball    1
Cricket       2
Baseball      3
dtype: int64

In [21]:
sportsILove + sports

Baseball      6.0
Basketball    NaN
Cricket       NaN
Football      NaN
Golf          NaN
Volleyball    6.0
dtype: float64
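
The addition above aligns the two Series on their index labels, so a label present in only one operand yields NaN and the result is promoted to float64. If a missing label should count as zero instead, `Series.add` with `fill_value` is one option. A minimal sketch that rebuilds both Series so it runs on its own:

```python
import pandas as pd

sports = pd.Series([1, 2, 3, 4, 5],
                   index=['Golf', 'Football', 'Baseball', 'Basketball', 'Volleyball'])
sportsILove = pd.Series([1, 2, 3], index=['Volleyball', 'Cricket', 'Baseball'])

# Plain + aligns on labels; labels found on only one side become NaN.
aligned = sportsILove + sports

# add() with fill_value=0 substitutes 0 for a label missing from one side.
filled = sportsILove.add(sports, fill_value=0)

print(aligned)
print(filled)
```

Whether zero is a sensible stand-in depends on the data; for counts it usually is, for measurements it may not be.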

In [35]:
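# randn lives in np.random; the values shown come from one unseeded run and will differ on each execution
dataframe = pd.DataFrame(np.random.randn(3,10), index=['A', 'B', 'C'], columns='A B C D E F G H I J'.split())
dataframe

,A,B,C,D,E,F,G,H,I,J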
A,0.450218,0.627437,0.751337,0.140395,-0.926872,-0.18242,-0.491125,0.134373,-0.268371,-0.131676
B,1.018552,1.230558,-1.181103,-0.45993,-0.7908,1.223722,-0.059368,1.448989,-0.477581,0.026
C,-1.348696,1.302536,-0.362612,-1.485156,-0.592461,-2.304908,-0.031817,0.112488,0.288078,1.498108


In [36]:
dataframe['B']

A    0.627437
B    1.230558
C    1.302536
Name: B, dtype: float64

In [38]:
dataframe[['D', 'A', 'C']]

,D,A,C
A,0.140395,0.450218,0.751337
B,-0.45993,1.018552,-1.181103
C,-1.485156,-1.348696,-0.362612


In [39]:
dataframe['K'] = dataframe['A'] + dataframe['B']
dataframe

,A,B,C,D,E,F,G,H,I,J,K
A,0.450218,0.627437,0.751337,0.140395,-0.926872,-0.18242,-0.491125,0.134373,-0.268371,-0.131676,1.077655
B,1.018552,1.230558,-1.181103,-0.45993,-0.7908,1.223722,-0.059368,1.448989,-0.477581,0.026,2.249111
C,-1.348696,1.302536,-0.362612,-1.485156,-0.592461,-2.304908,-0.031817,0.112488,0.288078,1.498108,-0.046161


In [41]:
dataframe.drop('K', axis=1)

,A,B,C,D,E,F,G,H,I,J
A,0.450218,0.627437,0.751337,0.140395,-0.926872,-0.18242,-0.491125,0.134373,-0.268371,-0.131676
B,1.018552,1.230558,-1.181103,-0.45993,-0.7908,1.223722,-0.059368,1.448989,-0.477581,0.026
C,-1.348696,1.302536,-0.362612,-1.485156,-0.592461,-2.304908,-0.031817,0.112488,0.288078,1.498108


In [42]:
dataframe

,A,B,C,D,E,F,G,H,I,J,K
A,0.450218,0.627437,0.751337,0.140395,-0.926872,-0.18242,-0.491125,0.134373,-0.268371,-0.131676,1.077655
B,1.018552,1.230558,-1.181103,-0.45993,-0.7908,1.223722,-0.059368,1.448989,-0.477581,0.026,2.249111
C,-1.348696,1.302536,-0.362612,-1.485156,-0.592461,-2.304908,-0.031817,0.112488,0.288078,1.498108,-0.046161


In [43]:
# drop() returns a new DataFrame and leaves the original unchanged. Pass inplace=True (or reassign the result) to modify dataframe itself.
dataframe.drop('K', axis=1, inplace=True)
dataframe

,A,B,C,D,E,F,G,H,I,J
A,0.450218,0.627437,0.751337,0.140395,-0.926872,-0.18242,-0.491125,0.134373,-0.268371,-0.131676
B,1.018552,1.230558,-1.181103,-0.45993,-0.7908,1.223722,-0.059368,1.448989,-0.477581,0.026
C,-1.348696,1.302536,-0.362612,-1.485156,-0.592461,-2.304908,-0.031817,0.112488,0.288078,1.498108
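
The three cells above show that `drop` without `inplace=True` leaves the original untouched. Reassigning the returned frame is an equivalent alternative; a small sketch with a made-up frame (`frame` and `without_k` are illustrative names):

```python
import pandas as pd

frame = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'K': [4, 6]})

# drop() returns a new DataFrame; `frame` itself still has column K.
without_k = frame.drop(columns='K')

print(list(frame.columns))      # original column list
print(list(without_k.columns))  # column list of the returned frame
```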


### Selecting Techniques

In [46]:
# Locating the data by Labels
dataframe.loc['A']

A    0.450218
B    0.627437
C    0.751337
D    0.140395
E   -0.926872
F   -0.182420
G   -0.491125
H    0.134373
I   -0.268371
J   -0.131676
Name: A, dtype: float64

In [47]:
# Select a row by integer position instead of by label
dataframe.iloc[0]

A    0.450218
B    0.627437
C    0.751337
D    0.140395
E   -0.926872
F   -0.182420
G   -0.491125
H    0.134373
I   -0.268371
J   -0.131676
Name: A, dtype: float64

In [48]:
# To get specific values, mention the row and column labels
dataframe.loc['A', 'J']

-0.13167562628701832

In [50]:
dataframe.loc[['A','B'],['I','J']]

,I,J
A,-0.268371,-0.131676
B,-0.477581,0.026
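
To make the difference between label-based `loc` and position-based `iloc` concrete, here is a short sketch on a made-up frame. One detail worth noting is that label slices include the end label while positional slices exclude the end position:

```python
import pandas as pd

frame = pd.DataFrame({'I': [10, 20, 30], 'J': [1, 2, 3]}, index=['A', 'B', 'C'])

by_label = frame.loc['B', 'J']     # row label 'B', column label 'J'
by_position = frame.iloc[1, 1]     # second row, second column

# 'A':'B' includes row 'B'; 0:2 stops before position 2.
label_slice = frame.loc['A':'B']
position_slice = frame.iloc[0:2]

print(by_label, by_position)
print(label_slice.equals(position_slice))
```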


In [56]:
dataframe > 0.5

,A,B,C,D,E,F,G,H,I,J
A,False,True,True,False,False,False,False,False,False,False
B,True,True,False,False,False,True,False,True,False,False
C,False,True,False,False,False,False,False,False,False,True


In [57]:
dataframe[dataframe > 0.5]

,A,B,C,D,E,F,G,H,I,J
A,,0.627437,0.751337,,,,,,,
B,1.018552,1.230558,,,,1.223722,,1.448989,,
C,,1.302536,,,,,,,,1.498108


In [59]:
# Filter rows and pick a column in a single .loc call rather than chaining two [] lookups
dataframe.loc[dataframe['A'] > 0.4, 'B']

A    0.627437
B    1.230558
Name: B, dtype: float64

In [60]:
dataframe

,A,B,C,D,E,F,G,H,I,J
A,0.450218,0.627437,0.751337,0.140395,-0.926872,-0.18242,-0.491125,0.134373,-0.268371,-0.131676
B,1.018552,1.230558,-1.181103,-0.45993,-0.7908,1.223722,-0.059368,1.448989,-0.477581,0.026
C,-1.348696,1.302536,-0.362612,-1.485156,-0.592461,-2.304908,-0.031817,0.112488,0.288078,1.498108


In [61]:
dataframe[(dataframe['A'] > 0.4) & (dataframe['B'] > 1)]

,A,B,C,D,E,F,G,H,I,J
B,1.018552,1.230558,-1.181103,-0.45993,-0.7908,1.223722,-0.059368,1.448989,-0.477581,0.026
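
Conditions are combined with the element-wise operators `&` (and), `|` (or) and `~` (not), and each comparison needs its own parentheses because these operators bind more tightly than `>`. A sketch on a small made-up frame with rounded values:

```python
import pandas as pd

frame = pd.DataFrame({'A': [0.45, 1.02, -1.35], 'B': [0.63, 1.23, 1.30]},
                     index=['A', 'B', 'C'])

both = frame[(frame['A'] > 0.4) & (frame['B'] > 1)]     # rows meeting both conditions
either = frame[(frame['A'] > 0.4) | (frame['B'] > 1)]   # rows meeting at least one
negated = frame[~(frame['A'] > 0.4)]                    # rows failing the condition

print(list(both.index), list(either.index), list(negated.index))
```

Python's `and` / `or` keywords do not work here, since they try to reduce a whole Series to a single truth value.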


### Playing with Indices

In [62]:
dataframe

,A,B,C,D,E,F,G,H,I,J
A,0.450218,0.627437,0.751337,0.140395,-0.926872,-0.18242,-0.491125,0.134373,-0.268371,-0.131676
B,1.018552,1.230558,-1.181103,-0.45993,-0.7908,1.223722,-0.059368,1.448989,-0.477581,0.026
C,-1.348696,1.302536,-0.362612,-1.485156,-0.592461,-2.304908,-0.031817,0.112488,0.288078,1.498108


In [64]:
# Replace the index with the default integer index; the old labels move to an 'index' column. Returns a new DataFrame.
dataframe.reset_index()

,index,A,B,C,D,E,F,G,H,I,J
0,A,0.450218,0.627437,0.751337,0.140395,-0.926872,-0.18242,-0.491125,0.134373,-0.268371,-0.131676
1,B,1.018552,1.230558,-1.181103,-0.45993,-0.7908,1.223722,-0.059368,1.448989,-0.477581,0.026
2,C,-1.348696,1.302536,-0.362612,-1.485156,-0.592461,-2.304908,-0.031817,0.112488,0.288078,1.498108


In [65]:
newIndex = 'Alpha Beta Gamma'.split()

In [68]:
dataframe['newIndex'] = newIndex
dataframe.set_index('newIndex', inplace=True)
dataframe

newIndex,A,B,C,D,E,F,G,H,I,J
Alpha,0.450218,0.627437,0.751337,0.140395,-0.926872,-0.18242,-0.491125,0.134373,-0.268371,-0.131676
Beta,1.018552,1.230558,-1.181103,-0.45993,-0.7908,1.223722,-0.059368,1.448989,-0.477581,0.026
Gamma,-1.348696,1.302536,-0.362612,-1.485156,-0.592461,-2.304908,-0.031817,0.112488,0.288078,1.498108
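
The index operations above can be summarized in a standalone sketch. The frame and column names are made up; the point is that `reset_index` and `set_index` both return new frames unless `inplace=True` is passed:

```python
import pandas as pd

frame = pd.DataFrame({'x': [1, 2, 3]}, index=['A', 'B', 'C'])

# reset_index() moves the old labels into a column named 'index'.
flat = frame.reset_index()

# drop=True discards the old labels instead of keeping them as a column.
dropped = frame.reset_index(drop=True)

# set_index() promotes an existing column to the index.
frame['name'] = ['Alpha', 'Beta', 'Gamma']
renamed = frame.set_index('name')

print(list(flat.columns))
print(list(dropped.index))
print(list(renamed.index))
```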


### Handling Missing Data

In [75]:
tempDF = dataframe[dataframe > -1]
tempDF

newIndex,A,B,C,D,E,F,G,H,I,J
Alpha,0.450218,0.627437,0.751337,0.140395,-0.926872,-0.18242,-0.491125,0.134373,-0.268371,-0.131676
Beta,1.018552,1.230558,,-0.45993,-0.7908,1.223722,-0.059368,1.448989,-0.477581,0.026
Gamma,,1.302536,-0.362612,,-0.592461,,-0.031817,0.112488,0.288078,1.498108


In [76]:
# Drop rows that contain any NaN value
tempDF.dropna()

newIndex,A,B,C,D,E,F,G,H,I,J
Alpha,0.450218,0.627437,0.751337,0.140395,-0.926872,-0.18242,-0.491125,0.134373,-0.268371,-0.131676


In [77]:
# Drop columns that contain any NaN value
tempDF.dropna(axis=1)

newIndex,B,E,G,H,I,J
Alpha,0.627437,-0.926872,-0.491125,0.134373,-0.268371,-0.131676
Beta,1.230558,-0.7908,-0.059368,1.448989,-0.477581,0.026
Gamma,1.302536,-0.592461,-0.031817,0.112488,0.288078,1.498108


In [78]:
# thresh=2 keeps only columns with at least 2 non-NaN values; every column here qualifies
tempDF.dropna(axis=1, thresh=2)

newIndex,A,B,C,D,E,F,G,H,I,J
Alpha,0.450218,0.627437,0.751337,0.140395,-0.926872,-0.18242,-0.491125,0.134373,-0.268371,-0.131676
Beta,1.018552,1.230558,,-0.45993,-0.7908,1.223722,-0.059368,1.448989,-0.477581,0.026
Gamma,,1.302536,-0.362612,,-0.592461,,-0.031817,0.112488,0.288078,1.498108


In [79]:
# thresh=1 keeps any column with at least 1 non-NaN value
tempDF.dropna(axis=1, thresh=1)

newIndex,A,B,C,D,E,F,G,H,I,J
Alpha,0.450218,0.627437,0.751337,0.140395,-0.926872,-0.18242,-0.491125,0.134373,-0.268371,-0.131676
Beta,1.018552,1.230558,,-0.45993,-0.7908,1.223722,-0.059368,1.448989,-0.477581,0.026
Gamma,,1.302536,-0.362612,,-0.592461,,-0.031817,0.112488,0.288078,1.498108
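
Because every column of `tempDF` has at least two non-NaN values, the `thresh` calls above drop nothing. The effect is easier to see on a frame where the columns differ in how many values are missing; a sketch with made-up column names:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    'full': [1.0, 2.0, 3.0],            # 3 non-missing values
    'one_gap': [1.0, np.nan, 3.0],      # 2 non-missing values
    'two_gaps': [np.nan, np.nan, 3.0],  # 1 non-missing value
})

strict = frame.dropna(axis=1)                  # keep columns with no NaN at all
at_least_two = frame.dropna(axis=1, thresh=2)  # keep columns with >= 2 non-NaN values

print(list(strict.columns))
print(list(at_least_two.columns))
```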


In [81]:
# Replace NaN values with 0 (returns a new DataFrame; tempDF itself is unchanged)
tempDF.fillna(value=0)

newIndex,A,B,C,D,E,F,G,H,I,J
Alpha,0.450218,0.627437,0.751337,0.140395,-0.926872,-0.18242,-0.491125,0.134373,-0.268371,-0.131676
Beta,1.018552,1.230558,0.0,-0.45993,-0.7908,1.223722,-0.059368,1.448989,-0.477581,0.026
Gamma,0.0,1.302536,-0.362612,0.0,-0.592461,0.0,-0.031817,0.112488,0.288078,1.498108


In [82]:
tempDF

newIndex,A,B,C,D,E,F,G,H,I,J
Alpha,0.450218,0.627437,0.751337,0.140395,-0.926872,-0.18242,-0.491125,0.134373,-0.268371,-0.131676
Beta,1.018552,1.230558,,-0.45993,-0.7908,1.223722,-0.059368,1.448989,-0.477581,0.026
Gamma,,1.302536,-0.362612,,-0.592461,,-0.031817,0.112488,0.288078,1.498108


In [85]:
# Replace the NaN values in column A with that column's mean.
# Assigning the result back avoids calling inplace=True on a column selection, which may not update tempDF.
tempDF['A'] = tempDF['A'].fillna(value=tempDF['A'].mean())
tempDF

newIndex,A,B,C,D,E,F,G,H,I,J
Alpha,0.450218,0.627437,0.751337,0.140395,-0.926872,-0.18242,-0.491125,0.134373,-0.268371,-0.131676
Beta,1.018552,1.230558,,-0.45993,-0.7908,1.223722,-0.059368,1.448989,-0.477581,0.026
Gamma,0.734385,1.302536,-0.362612,,-0.592461,,-0.031817,0.112488,0.288078,1.498108
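
The same idea extends to all columns at once: `DataFrame.mean()` returns one mean per column, and `fillna` accepts that Series and matches it to columns by label. A minimal sketch on a made-up frame:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'A': [1.0, 3.0, np.nan], 'C': [2.0, np.nan, 6.0]})

# frame.mean() skips NaN by default, so A -> 2.0 and C -> 4.0.
filled = frame.fillna(frame.mean())

print(filled)
```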


### Group By

In [98]:
dataDict = {'CustID': [1001,1002,1003,1002,1001], 
            'CustName': ['Vidya', 'Jeevan', 'Lokesh', 'Dheeraj', 'Sai'], 
            'ProfitInLakhs': [10, 30, 40, 50, 60]}

In [99]:
dataDict

{'CustID': [1001, 1002, 1003, 1002, 1001],
 'CustName': ['Vidya', 'Jeevan', 'Lokesh', 'Dheeraj', 'Sai'],
 'ProfitInLakhs': [10, 30, 40, 50, 60]}

In [100]:
dataDF = pd.DataFrame(data=dataDict)
dataDF

,CustID,CustName,ProfitInLakhs
0,1001,Vidya,10
1,1002,Jeevan,30
2,1003,Lokesh,40
3,1002,Dheeraj,50
4,1001,Sai,60


In [101]:
CustID_group = dataDF.groupby('CustID')

In [102]:
# numeric_only=True skips the text column CustName, which cannot be averaged
CustID_group.mean(numeric_only=True)

CustID,ProfitInLakhs
1001,35
1002,40
1003,40


In [104]:
CustID_group.describe().transpose()

,CustID,1001,1002,1003
ProfitInLakhs,count,2.0,2.0,1.0
ProfitInLakhs,mean,35.0,40.0,40.0
ProfitInLakhs,std,35.355339,14.142136,
ProfitInLakhs,min,10.0,30.0,40.0
ProfitInLakhs,25%,22.5,35.0,40.0
ProfitInLakhs,50%,35.0,40.0,40.0
ProfitInLakhs,75%,47.5,45.0,40.0
ProfitInLakhs,max,60.0,50.0,40.0
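
When only a few statistics are needed, selecting the column and calling `agg` with a list of function names is a lighter alternative to `describe`. A sketch that rebuilds `dataDF` so it runs independently (`summary` is an illustrative name):

```python
import pandas as pd

dataDF = pd.DataFrame({'CustID': [1001, 1002, 1003, 1002, 1001],
                       'CustName': ['Vidya', 'Jeevan', 'Lokesh', 'Dheeraj', 'Sai'],
                       'ProfitInLakhs': [10, 30, 40, 50, 60]})

# Select the numeric column first, then request several aggregates at once.
summary = dataDF.groupby('CustID')['ProfitInLakhs'].agg(['sum', 'mean', 'count'])

print(summary)
```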

### Concatenating and Merging

In [144]:
dict1 = {'StudentID': [101,102,103,104], 
        'Score': [1223,1212,33445,45456],
        'Dept': ['CAT0', 'CAT1', 'CAT2', 'CAT3'],
        'Pass': ['yes', 'no', 'yes', 'no']}
dict2 = {'StudentID': [101, 103, 104, 105],
        'Score': [1223,3456,6767,7878],
        'SportsDept': ['CAT4', 'CAT5', 'CAT6', 'CAT7'],
        'Qualified': ['yes', 'no', 'no', 'no']}

df1 = pd.DataFrame(data = dict1, index='0 1 2 3'.split())
df2 = pd.DataFrame(data = dict2, index='4 5 6 7'.split())

In [145]:
pd.concat(objs=[df1,df2], sort=True)

,Dept,Pass,Qualified,Score,SportsDept,StudentID
0,CAT0,yes,,1223,,101
1,CAT1,no,,1212,,102
2,CAT2,yes,,33445,,103
3,CAT3,no,,45456,,104
4,,,yes,1223,CAT4,101
5,,,no,3456,CAT5,103
6,,,no,6767,CAT6,104
7,,,no,7878,CAT7,105


In [146]:
pd.merge(df1,df2, how='outer', on='StudentID')

,StudentID,Score_x,Dept,Pass,Score_y,SportsDept,Qualified
0,101,1223.0,CAT0,yes,1223.0,CAT4,yes
1,102,1212.0,CAT1,no,,,
2,103,33445.0,CAT2,yes,3456.0,CAT5,no
3,104,45456.0,CAT3,no,6767.0,CAT6,no
4,105,,,,7878.0,CAT7,no
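
In the outer merge above, `Score` appears in both frames but is not the join key, so pandas keeps both copies with the default `_x` / `_y` suffixes. To see which side each row came from, `indicator=True` adds a `_merge` column. A sketch with trimmed-down versions of the two frames (`left` and `right` are illustrative names):

```python
import pandas as pd

left = pd.DataFrame({'StudentID': [101, 102, 103, 104],
                     'Dept': ['CAT0', 'CAT1', 'CAT2', 'CAT3']})
right = pd.DataFrame({'StudentID': [101, 103, 104, 105],
                      'SportsDept': ['CAT4', 'CAT5', 'CAT6', 'CAT7']})

# inner keeps only IDs present in both frames.
inner = pd.merge(left, right, how='inner', on='StudentID')

# outer keeps every ID; _merge records left_only, right_only, or both.
outer = pd.merge(left, right, how='outer', on='StudentID', indicator=True)

print(inner)
print(outer)
```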


### Other Operations

In [147]:
df1.head(1)

,StudentID,Score,Dept,Pass
0,101,1223,CAT0,yes


In [148]:
df1['StudentID'].unique()

array([101, 102, 103, 104], dtype=int64)

In [149]:
# To find the number of unique values
df1['StudentID'].nunique()

4

In [150]:
# To find the counts of categorical values
df1['Dept'].value_counts()

CAT1    1
CAT0    1
CAT3    1
CAT2    1
Name: Dept, dtype: int64

In [151]:
df1[df1['Pass']!='no']

,StudentID,Score,Dept,Pass
0,101,1223,CAT0,yes
2,103,33445,CAT2,yes


### Applying Functions on DataFrame Columns

In [152]:
def moreScore(a):
    # Return 1 for scores above 1500, otherwise 0
    return 1 if a > 1500 else 0

In [153]:
df1['Score'].apply(moreScore)

0    0
1    0
2    1
3    1
Name: Score, dtype: int64
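
For a simple threshold like the one in `moreScore`, a vectorized comparison produces the same 0/1 flags without calling a Python function once per element. A sketch using the same scores (`scores` and `flags` are illustrative names):

```python
import pandas as pd

scores = pd.Series([1223, 1212, 33445, 45456])

# The comparison yields booleans; astype(int) maps False/True to 0/1.
flags = (scores > 1500).astype(int)

print(flags.tolist())
```

`apply` remains the more general tool when the per-element logic cannot be expressed with built-in Series operations.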

In [154]:
df1['Dept'].apply(len)

0    4
1    4
2    4
3    4
Name: Dept, dtype: int64

In [155]:
df1['Score'].sum()

81336

In [156]:
df1['Score'].mean()

20334.0

### Deleting a column in a DataFrame

In [157]:
del df1['Pass']

In [158]:
df1

,StudentID,Score,Dept
0,101,1223,CAT0
1,102,1212,CAT1
2,103,33445,CAT2
3,104,45456,CAT3


### To know the column and index names

In [159]:
df1.columns

Index(['StudentID', 'Score', 'Dept'], dtype='object')

In [160]:
df1.index

Index(['0', '1', '2', '3'], dtype='object')

### Sorting

In [161]:
df1.sort_values(by='Score')

,StudentID,Score,Dept
1,102,1212,CAT1
0,101,1223,CAT0
2,103,33445,CAT2
3,104,45456,CAT3
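
`sort_values` sorts in ascending order by default and returns a new frame. Descending order and multi-column sorts use the `ascending` parameter; a sketch on a made-up frame:

```python
import pandas as pd

frame = pd.DataFrame({'Dept': ['CAT1', 'CAT0', 'CAT1', 'CAT0'],
                      'Score': [1212, 1223, 33445, 45456]})

# Highest score first.
descending = frame.sort_values(by='Score', ascending=False)

# Dept ascending, then Score descending within each Dept.
by_two_keys = frame.sort_values(by=['Dept', 'Score'], ascending=[True, False])

print(descending['Score'].tolist())
print(by_two_keys['Score'].tolist())
```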


### Checking null values in dataframe

In [162]:
df1.isnull()

,StudentID,Score,Dept
0,False,False,False
1,False,False,False
2,False,False,False
3,False,False,False
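
On larger frames the full boolean table is hard to read, so it is common to reduce it to a count per column. A sketch on a made-up frame that does contain missing values:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'Score': [1223, np.nan, 33445],
                      'Dept': ['CAT0', 'CAT1', None]})

# Summing booleans counts the True entries, giving missing values per column.
missing_per_column = frame.isnull().sum()

# Two any() calls collapse the table to a single True/False.
any_missing = frame.isnull().any().any()

print(missing_per_column)
print(any_missing)
```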


### Reading data from external sources

In [163]:
trainDF = pd.read_csv('pandas-train.csv')

In [165]:
trainDF.head(5)

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [168]:
trainDF.to_csv('pandas-train-df.csv', index=False)
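
The `index=False` argument matters on a round trip: with the default `index=True`, the row labels are written as an unnamed first column, which `read_csv` then loads as a column called `Unnamed: 0`. The sketch below shows both cases using an in-memory buffer instead of a file, so it does not depend on `pandas-train.csv` (the frame contents are made up):

```python
import io
import pandas as pd

frame = pd.DataFrame({'Income': [54, 30], 'HouseholdSize': [3, 2]})

# Write without the row labels, then read the text back.
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# Write with the default index=True for comparison.
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
reread = pd.read_csv(with_index)

print(list(restored.columns))
print(list(reread.columns))
```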

In [169]:
consumerDF = pd.read_excel('pandas-consumer.xlsx', sheet_name='Data1')
consumerDF.head(5)

,Income,HouseholdSize,AmountCharged
0,54,3,4016
1,30,2,3159
2,32,4,5100
3,50,5,4742
4,31,2,1864


In [171]:
consumerDF.to_excel('ConsumerData.xlsx', sheet_name='DataFrame', index=False)