<a href="https://colab.research.google.com/github/vasanthL/Coding_notes/blob/main/pandas_notes/pandas__tutorial_notes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python Pandas tutorial

In [None]:
# Contains code samples and learning tutorials on python pandas

In [None]:
# Tutorial point link for pandas
# https://www.tutorialspoint.com/python_pandas/python_pandas_quick_guide.htm

In [None]:

# Definition: pandas is an open-source Python library that provides high-performance data manipulation
# and analysis tools built on its powerful data structures.



In [None]:
'''
Key Features of Pandas
    1. Fast and efficient DataFrame object with default and customized indexing.
    2. Tools for loading data into in-memory data objects from different file formats.
    3. Data alignment and integrated handling of missing data.
    4. Reshaping and pivoting of data sets.
    5. Label-based slicing, indexing and subsetting of large data sets.
    6. Columns from a data structure can be deleted or inserted.
    7. Group by data for aggregation and transformations.
    8. High performance merging and joining of data.
    9. Time Series functionality.

'''

In [None]:
# PIP command to install pandas
#    pip install pandas

# Package managers of respective Linux distributions are used to install one or more packages in SciPy stack.
# For Ubuntu Users
# sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose

## Introduction to Data Structures

In [None]:
'''
    Pandas deals with the following three data structures −
    1. Series
    2. DataFrame
    3. Panel (deprecated in pandas 0.20 and removed in 0.25; described here for older versions)

These data structures are built on top of NumPy arrays, which makes them fast.
'''

In [None]:
# Dimension & Description
# The best way to think of these data structures is that the higher dimensional data structure is a container of its lower dimensional data structure. 
# For example, DataFrame is a container of Series, Panel is a container of DataFrame.



## <span style="color:#34568b"> Data structure types</span>

| <span style="color:#88b04b">Data Structure</span> | <span style="color:#88b04b">Dimensions</span> | <span style="color:#88b04b">Description</span> |
|:------------:|:----------:|:------------:|
| Series | 1 | 1D labeled homogeneous array, size-immutable. |
| DataFrame | 2 | General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns. |
| Panel | 3 | General 3D labeled, size-mutable array. |


> Building and handling arrays of two or more dimensions is tedious: the burden is on the user to keep track of the orientation of the data set when writing functions. Pandas data structures reduce that mental effort.

> For example, with tabular data (DataFrame) it is more semantically helpful to think of the index (the rows) and the columns rather than axis 0 and axis 1.
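To make the index/columns idea concrete, here is a minimal sketch. The `scores` table, its labels and its numbers are invented for illustration; only `pd.DataFrame`, `.loc` and `.to_numpy()` are real pandas calls.

```python
import pandas as pd

# Hypothetical table: rows are students, columns are subjects
scores = pd.DataFrame(
    {"math": [90, 80], "physics": [70, 60]},
    index=["alice", "bob"],
)

# Label-based lookup: name the row and the column you want
by_label = scores.loc["bob", "physics"]

# Position-based lookup on the raw array: you must remember
# that axis 0 is the rows and axis 1 is the columns
by_position = scores.to_numpy()[1, 1]

print(by_label, by_position)
```

Both lookups return the same cell; the label-based form simply states the intent in terms of the data rather than in terms of axis numbers.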


# Mutability
> All pandas data structures are value-mutable (their contents can be changed), and all except Series are size-mutable. Series is size-immutable.

> Note − DataFrame is widely used and is one of the most important data structures. Panel is used much less.

-------------------------------------------------------------------------------------------------------

## Series
Series is a one-dimensional array like structure with homogeneous data. For example, the following series is a collection of integers 10, 23, 56, …

| 10 |	23	 |56 | 17 |


#### Key Points
- Homogeneous data
- Size Immutable
- Values of Data Mutable

## DataFrame

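DataFrame is a two-dimensional array with heterogeneous data. For example, consider a table with the columns Name, Age, Gender and Rating. The data types of the four columns are as follows −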


| Column |  Type  |
|------|------|
|Name|	String|
|Age|	Integer|
|Gender|	String|
|Rating|	Float|

#### Key Points
- Heterogeneous data
- Size Mutable
- Data Mutable

## Panel

Panel is a three-dimensional data structure with heterogeneous data. A panel is hard to draw, but it can be pictured as a container of DataFrames.

#### Key Points
- Heterogeneous data
- Size Mutable
- Data Mutable

## <span style="color:#34568b; text-align:right">Python Pandas - Series</span> 

In [None]:
# A pandas Series can be created using the following constructor −

# pandas.Series( data, index, dtype, copy)

The parameters of the constructor are as follows −

| Sr.No | Parameter & Description |
|:-----:|:-----------------------:|
| 1 | data - data takes various forms such as ndarray, list, or constants. |
| 2 | index - Index values must be hashable and the same length as data. Defaults to np.arange(n) if no index is passed. |
| 3 | dtype - Data type of the values. If None, the data type is inferred. |
| 4 | copy - Copy the input data. Default False. |


A series can be created using various inputs like −

- Array
- Dict
- Scalar value or constant

In [None]:
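## Create an Empty Series
# The most basic Series is an empty one.

# Example
# Import the pandas library under the alias pd
import pandas as pd
# Pass dtype explicitly: a bare pd.Series() triggers a warning on some pandas
# versions, and its default dtype differs between versions
s = pd.Series(dtype='float64')
print(s)

Series([], dtype: float64)
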


In [None]:
# Create a Series from ndarray
# If data is an ndarray, any index passed must have the same
# length. If no index is passed, the default index is
# range(n), where n is the array length,
# i.e., [0, 1, 2, ..., len(array) - 1].

import pandas as pd 
import numpy as np

data = np.array([1,2,4,4,5])
print('data ->',data,sep="\n")
print('---------------------------------------------------')

#print series from numpy ndarray
series = pd.Series(data)
print('series ->',series,sep="\n")
print('---------------------------------------------------')

#with custom index specified
series = pd.Series(data, index=['a', 'b', 'r', 'd','e'])
print('series with custom index->',series,sep="\n")

print('---------------------------------------------------')

# a custom index that is longer or shorter than the data raises a ValueError

# seriesWithIndexLengthGreat = pd.Series(data, index = ['a','b', 'c', 'd', 'e','f'])
# print('series with custom index length greater ->', seriesWithIndexLengthGreat, sep="\n")
# seriesWithIndexLengthLess = pd.Series(data, index = ['a','b','d','e'])
# print('series with custom index length lesser ->', seriesWithIndexLengthLess, sep="\n")

# with custom data type 
seriesWithDType = pd.Series(data, dtype=bool)
print('seriesWithDType ->',seriesWithDType,sep="\n")

data ->
[1 2 4 4 5]
---------------------------------------------------
series ->
0    1
1    2
2    4
3    4
4    5
dtype: int32
---------------------------------------------------
series with custom index->
a    1
b    2
r    4
d    4
e    5
dtype: int32
---------------------------------------------------
seriesWithDType ->
0    True
1    True
2    True
3    True
4    True
dtype: bool


In [None]:
'''Create a Series from dict
A dict can be passed as input. If no index is specified, the dictionary keys are used as the index (in insertion order on current pandas versions; very old versions sorted them). If an index is passed, the values corresponding to the labels in the index are pulled out of the dict.'''

import pandas as pd
import numpy as np
data = {'a':0.2, 'b':0.9, 'c':0.4}
series_dict = pd.Series(data)
print('Series from dict -> ',series_dict,sep='\n')

#with changed index
data2 = {'a':0.2, 'b':0.9, 'c':0.4}
series_dict_index = pd.Series(data2, index=['c', 'b', 'd'])
print('Series from dict with custom index',series_dict_index,sep='\n')
# Observe − Index order is persisted and the missing element is filled with NaN (Not a Number).

Series from dict -> 
a    0.2
b    0.9
c    0.4
dtype: float64
Series from dict with custom index
c    0.4
b    0.9
d    NaN
dtype: float64


In [None]:
# Create a Series from Scalar
# If data is a scalar value, an index must be provided. The value will be repeated to match the length of index

import pandas as pd

# the index need not be sorted and can mix label types (str, int, float)
series_scalar =  pd.Series(5,index=['a',1,2.0,3])
print('Series with scalar',series_scalar,sep='\n')

Series with scalar
a      5
1      5
2.0    5
3      5
dtype: int64


In [None]:
'''
   Accessing Data from Series with Position
Data in a Series can be accessed much like data in an ndarray.

Example 1
Retrieve the first element. Counting starts from zero, so the first element is stored at position zero, and so on.
Because the Series below has integer labels (1, 2, 4, 5), a plain s[1] would be read as the label 1, not the position 1.
Use .iloc for positions and .loc for labels to make the intent explicit.
'''

import pandas as pd
# the Series dtype can be set with the dtype argument
series_pos = pd.Series(5, index=[1,2,4,5], dtype=float)
print('series ->', series_pos,sep='\n')
print('------------------------------')

# first element by position; it carries the label 1
print('series value at pos 1', series_pos.iloc[0],sep='\n')

# first three elements by position
print('Series Range',series_pos.iloc[:3],sep='\n')

# several elements selected by label
print('Series random values',series_pos.loc[[1,4,5]],sep='\n')

# several elements selected by string label
series_indeces = pd.Series(4,index=['a','b','c','d'])
print('Series with custom indices',series_indeces[['a','d']],sep='\n')

series ->
1    5.0
2    5.0
4    5.0
5    5.0
dtype: float64
------------------------------
series value at pos 1
5.0
Series Range
1    5.0
2    5.0
4    5.0
dtype: float64
Series random values
1    5.0
4    5.0
5    5.0
dtype: float64
Series with custom indices
a    4
d    4
dtype: int64
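The difference between positions and labels is easier to see when the values differ. The following is a small sketch with made-up data (the Series `s` is not part of the notes above); it assumes only the standard `.iloc` and `.loc` indexers.

```python
import pandas as pd

# Integer labels that deliberately do not start at zero
s = pd.Series([10.0, 20.0, 30.0, 40.0], index=[1, 2, 4, 5])

first_by_position = s.iloc[0]      # first element, whatever its label is
value_at_label_4 = s.loc[4]        # element labelled 4 (third by position)
picked_by_position = s.iloc[[0, 3]]  # first and last elements
picked_by_label = s.loc[[1, 5]]      # elements labelled 1 and 5

print(first_by_position, value_at_label_4)
print(picked_by_position)
print(picked_by_label)
```

Here `s.iloc[[0, 3]]` and `s.loc[[1, 5]]` happen to select the same two elements, but they get there by different routes: one counts positions, the other matches labels.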


## Python Pandas - DataFrame
> A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

### Features of DataFrame
- Columns can be of different types
- Size-mutable
- Labeled axes (rows and columns)
- Arithmetic operations can be performed on rows and columns

> Let us assume that we are creating a data frame with student’s data.



In [None]:
# pandas.DataFrame
# A pandas DataFrame can be created using the following constructor −
# pandas.DataFrame(data,index,columns, dtype,copy)

| Sr No | Parameter & Description |
|-------|-------------------------|
| 1 | data - data takes various forms such as ndarray, Series, map, lists, dict, constants, or another DataFrame. |
| 2 | index - Row labels for the resulting frame. Optional; defaults to np.arange(n) if no index is passed. |
| 3 | columns - Column labels for the resulting frame. Optional; defaults to np.arange(n) if no column labels are passed. |
| 4 | dtype - Data type of each column. |
| 5 | copy - Whether to copy the input data. |
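As a quick illustration of these parameters, the sketch below passes `data`, `index`, `columns` and `dtype` explicitly and then shows the default labels you get when they are left out. The labels `r1`, `r2`, `a`, `b` are arbitrary example names.

```python
import numpy as np
import pandas as pd

# All four documented parameters spelled out
frame = pd.DataFrame(
    data=np.array([[1, 2], [3, 4]]),
    index=["r1", "r2"],      # row labels
    columns=["a", "b"],      # column labels
    dtype=float,             # force every column to float
)

# Without index/columns, pandas numbers both axes 0, 1, 2, ...
default_frame = pd.DataFrame([[1, 2], [3, 4]])

print(frame)
print(default_frame)
```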

### Create DataFrame
A pandas DataFrame can be created using various inputs like −

- Lists
- dict
- Series
- Numpy ndarrays
- Another DataFrame

In the subsequent sections of this chapter, we will see how to create a DataFrame using these inputs.


In [None]:
# Create an empty DataFrame
import pandas as pd
df= pd.DataFrame()
print(df) # empty dataframe with 0 rows and columns 

Empty DataFrame
Columns: []
Index: []


In [None]:
# DataFrame from lists
# (named `values` rather than `list` so the built-in list type is not shadowed)
values = [22,33,44,55,66]
df = pd.DataFrame(values)
print('data frame from list',df,sep='\n')

data frame from list
    0
0  22
1  33
2  44
3  55
4  66


In [None]:
# import the DataFrame class directly (aliased as df in the cells that follow)
from pandas import DataFrame as df

# DataFrame from dict of ndarray / list
# (named `students` rather than `dict` so the built-in dict type is not shadowed)
students = {'Name':['vasanth','umesh','bhai'],'age':[22,21,21],'rollno':[179,174,173]}
df_dict = df(students)
print('data frame from dict',df_dict,sep='\n')
print('-------------------------------------')

# creating an indexed data frame
df_dict_index =  df(students,index=['rank3','rank2','rank1'])
print('data frame with index from dict',df_dict_index,sep='\n')

data frame from dict
      Name  age  rollno
0  vasanth   22     179
1    umesh   21     174
2     bhai   21     173
-------------------------------------
data frame with index from dict
          Name  age  rollno
rank3  vasanth   22     179
rank2    umesh   21     174
rank1     bhai   21     173


In [None]:
# data frame from a list of dict
from pandas import Series as ss
ls_dict = [{'a':2,'b':3,'c':8},{'a':2,'b':3,'c':8},{'a':2,'b':3,'c':8}]
df_ls_dict = df(ls_dict) # in a DataFrame the dict keys become column labels
print('data frame from list of dict', df_ls_dict, sep='\n')
print("---------------------------------------------------")
# whereas in a Series they become index labels
series_ls_dict = ss(ls_dict[0])
print('series from dict',series_ls_dict,sep='\n')

data frame from list of dict
   a  b  c
0  2  3  8
1  2  3  8
2  2  3  8
---------------------------------------------------
series from dict
a    2
b    3
c    8
dtype: int64


In [None]:
# remove key 'c' from the first dict so that the first row is missing a value
del ls_dict[0]['c']

df_ls_dict_naN = df(ls_dict)
print('data frames with a missing row element', df_ls_dict_naN,sep='\n')

data frames with a missing row element
   a  b    c
0  2  3  NaN
1  2  3  8.0
2  2  3  8.0


In [None]:
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]

# Both requested column labels match dictionary keys, so their values are picked up
df_ls_dict_index= df(data, index=[1,2],columns=['a','b'])
print('data frame from list of dict with two col',df_ls_dict_index,sep='\n')

# 'a2' is not a dictionary key, so that column is filled with NaN
df_ls_dict_index_custom_col = df(data,index=[1,2],columns=['a2','b'])
print('data frame from list of dict with one diff col', df_ls_dict_index_custom_col,sep='\n')

# Note − the second DataFrame is created with a column label ('a2') that is not a dictionary key, so NaN is filled in for it.
# The first DataFrame uses column labels that match the dictionary keys, so no NaN appears.

data frame from list of dict with two col
   a   b
1  1   2
2  5  10
data frame from list of dict with one diff col
   a2   b
1 NaN   2
2 NaN  10



## Create a DataFrame from Dict of Series
> Dictionary of Series can be passed to form a DataFrame. The resultant index is the union of all the series indexes passed.

In [None]:

dict_series = {'one':ss([1,2,3],index=['a','b','c']), 'two':ss([1,4],index=['a','b'])}
df_dict_series = df(dict_series)
print('data frame from dict of series',df_dict_series,sep='\n')

'''
    Observe that series two has no label 'c', so in the result the value for label c in column two is NaN (which also turns that column into floats).

Let us now understand column selection, addition, and deletion through examples.
'''

data frame from dict of series
   one  two
a    1  1.0
b    2  4.0
c    3  NaN


In [None]:
# Column Selection
# We will understand this by selecting a column from the DataFrame.

data = {'one':ss([1,2,3]),'two':ss([4,5,6])}
df_cols = df(data)
print('data frame ',df_cols,'data frame first col ',df_cols['one'],sep='\n')

data frame 
   one  two
0    1    4
1    2    5
2    3    6
data frame first col 
0    1
1    2
2    3
Name: one, dtype: int64


In [None]:
# Column Addition
# We will understand this by adding a new column to an existing data frame.

# import the aliases used below so that this cell runs on its own
from pandas import DataFrame as df, Series as ss

# helper that prints a separator line; later cells reuse it
def breakline():
    print('---------------------------------------------------------------')

dict_series = {'one':ss([1.0,2.8,3.3]),'two':ss(['a','b','c'])}
df_dict_series = df(dict_series)
print('data frame from dict of series', df_dict_series,sep='\n')
breakline()

# add a new column from a Series
df_dict_series['three']= ss([2.3,4.5,0.9])
print('data frame with new col added',df_dict_series,sep='\n')
breakline()

# add new columns computed from existing columns
df_dict_series['four'] = df_dict_series['one']+df_dict_series['three']
print('data frame with new cols from existing cols',df_dict_series,sep='\n')
breakline()

df_dict_series['five'] =  df_dict_series['one']*df_dict_series['three']
print('data frame with new cols from existing cols',df_dict_series,sep='\n')
breakline()

In [None]:
# Column Deletion
# Columns can be deleted or popped; let us take an example to understand how.

dict_ss_del = {'one':ss([1,2,3]),'two':ss(['a','b','c']),'three':ss([2.0,3.4,4.5])}
df_ss_del = df(dict_ss_del)
print('data frame series delete',df_ss_del,sep='\n')
breakline()

# using the del statement
del df_ss_del['one']
print('data frame with col one removed',df_ss_del,sep='\n')
breakline()

# using the pop method (it removes the column and returns it)
df_ss_del.pop('three')
print('data frame with col three removed',df_ss_del,sep='\n')
breakline()

data frame series delete
   one two  three
0    1   a    2.0
1    2   b    3.4
2    3   c    4.5
---------------------------------------------------------------
data frame with col one removed
  two  three
0   a    2.0
1   b    3.4
2   c    4.5
---------------------------------------------------------------
data frame with col three removed
  two
0   a
1   b
2   c
---------------------------------------------------------------
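Both `del` and `pop` change the DataFrame in place. When the original should stay intact, `drop(columns=...)` returns a new DataFrame instead. The sketch below is a minimal illustration with a made-up frame named `frame`:

```python
import pandas as pd

frame = pd.DataFrame({
    "one": [1, 2, 3],
    "two": ["a", "b", "c"],
    "three": [2.0, 3.4, 4.5],
})

# pop() removes the column in place and hands it back as a Series
popped = frame.pop("three")

# drop() leaves `frame` untouched and returns a new DataFrame
without_one = frame.drop(columns=["one"])

print(popped)
print(frame.columns.tolist())
print(without_one.columns.tolist())
```

After this runs, `frame` still has the columns `one` and `two`, while `without_one` has only `two`.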


## Row Selection, Addition, and Deletion
> We will now understand row selection, addition and deletion through examples. Let us begin with the concept of selection.

### Selection by Label
> Rows can be selected by passing a row label to the `loc` indexer.

In [None]:
#bk for break line
bk = breakline
 
dict_rows_select = {'one':ss([1,2,3],index=['a','b','c']),'two':ss([1,2,3,4],index=['a','b','c','d'])}
df_dict_row_select = df(dict_rows_select)
print('data frame from dict with row selection',df_dict_row_select,sep='\n')
bk()

# using the loc indexer (selection by label)
print('data frame row one',df_dict_row_select.loc['c'],sep='\n')
bk()
# The result is a Series whose labels are the column names of the DataFrame; the name of the Series is the row label that was selected.


'''
Selection by integer location
Rows can be selected by passing an integer position to the iloc indexer.'''

# using the iloc indexer (integer position)
print('data frame row one iloc',df_dict_row_select.iloc[0],sep='\n')
bk()



data frame from dict with row selection
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
---------------------------------------------------------------
data frame row one
one    3.0
two    3.0
Name: c, dtype: float64
---------------------------------------------------------------
data frame row one iloc
one    1.0
two    1.0
Name: a, dtype: float64
---------------------------------------------------------------


In [None]:
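# Slice Rows
# Multiple rows can be selected with the ':' slice operator.
# A plain integer slice such as [1:3] works by position, so it returns the second and third rows.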

dict_multi_rows = {'one':ss([1,2,3],index=[1,2,3]),'two':ss([1.0,2.0,3.0,4.0],index=[1,2,3,4]),'three':ss(['1','2','3'],index=[1,2,3])}
df_multi_rows = df(dict_multi_rows)

print('data frame with multiple rows',df_multi_rows,sep='\n')
bk()

print('data frame with selected rows',df_multi_rows[1:3],sep='\n')
bk()

data frame with multiple rows
   one  two three
1  1.0  1.0     1
2  2.0  2.0     2
3  3.0  3.0     3
4  NaN  4.0   NaN
---------------------------------------------------------------
data frame with selected rows
   one  two three
2  2.0  2.0     2
3  3.0  3.0     3
---------------------------------------------------------------
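The slice above is positional even though the row labels are integers. A short sketch (with an invented single-column frame) contrasts positional slicing through `.iloc` with label slicing through `.loc`, which includes the end label:

```python
import pandas as pd

frame = pd.DataFrame({"value": [10, 20, 30, 40]}, index=[1, 2, 3, 4])

by_position = frame.iloc[1:3]   # positions 1 and 2 -> labels 2 and 3
by_label = frame.loc[1:3]       # labels 1 through 3, end label included

print(by_position)
print(by_label)
```

Spelling out `.iloc` or `.loc` avoids having to remember which rule a bare `frame[1:3]` follows.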


In [None]:
# Addition of Rows
# Add new rows to a DataFrame by concatenating it with another one.
# pd.concat is used here because DataFrame.append was removed in pandas 2.0.
import pandas as pd

df_ls_add1 = df([[1,2],[4,5],[7,8]],columns = ['a','b'],index=[0,1,2],dtype='float')
df_ls_add2 = df([[22,33],[44,55],[66,77]],columns = ['a','b'],index=[3,4,5],dtype='float')

# concatenate, then reindex to labels 1..6: label 0 is dropped and the new label 6 is filled with NaN
df_ls_add1 = pd.concat([df_ls_add1, df_ls_add2]).reindex([1,2,3,4,5,6])
print('data frame from list with add',df_ls_add1,sep='\n')
bk()


data frame from list with add
      a     b
1   4.0   5.0
2   7.0   8.0
3  22.0  33.0
4  44.0  55.0
5  66.0  77.0
6   NaN   NaN
---------------------------------------------------------------
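On recent pandas versions `DataFrame.append` is no longer available (it was removed in 2.0), and `pd.concat` covers the same use case. The sketch below uses two small made-up frames and shows `ignore_index=True`, which renumbers the rows instead of keeping the original labels:

```python
import pandas as pd

first = pd.DataFrame([[1, 2], [4, 5]], columns=["a", "b"])
second = pd.DataFrame([[22, 33]], columns=["a", "b"])

# Keep the original labels: 0, 1 from `first` and 0 from `second`
kept = pd.concat([first, second])

# Renumber the rows 0..n-1 after stacking
renumbered = pd.concat([first, second], ignore_index=True)

print(kept)
print(renumbered)
```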


## Deletion of Rows
> Use an index label to delete or drop rows from a DataFrame. If the label is duplicated, all rows carrying it are dropped.

> When two frames that share labels are appended, the result contains duplicate labels; dropping such a label removes every row that carries it.



In [None]:
import pandas as pd

df1 = df([[1,2],[3,4]], index=[11,'a'])
# each row needs its own list, separated by a comma: [[5,6],[7,8]]
df2 = df([[5,6],[7,8]], index = ['b',22])

# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
df1 = pd.concat([df1, df2])

print('data frame with append',df1,sep='\n')

In [None]:
df1 = df([[1,2],[3,4]], index=[11,'a'], columns = ['a','b'])
df2 = df([[5,6],[7,8]], index = ['b',22], columns = ['a','b'])

# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
df1 = pd.concat([df1, df2])

print('data frame with append',df1,sep='\n')
bk()

# All four labels here are distinct, so nothing is dropped. Had the two frames shared a label, df1.drop(label) would remove every row carrying it.


data frame with append
    a  b
11  1  2
a   3  4
b   5  6
22  7  8
---------------------------------------------------------------
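The duplicate-label case described in this section can be sketched as follows. The frames `top` and `bottom` are illustrative; both use the default labels 0 and 1, so stacking them produces duplicates.

```python
import pandas as pd

top = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
bottom = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])

# Both frames carry the labels 0 and 1, so the result has each label twice
stacked = pd.concat([top, bottom])

# Dropping label 0 removes every row that carries it
remaining = stacked.drop(0)

print(stacked)
print(remaining)
```

Two rows are dropped here, one from each original frame, because both carried the label 0.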


## Python Pandas - Panel
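> A panel is a 3D container of data. The term Panel data is derived from econometrics and is partially responsible for the name pandas − pan(el)-da(ta)-s. Panel was deprecated in pandas 0.20 and removed in 0.25, so this section applies to older versions only.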

> The names for the 3 axes are intended to give some semantic meaning to describing operations involving panel data. They are −

- items − axis 0, each item corresponds to a DataFrame contained inside.

- major_axis − axis 1, it is the index (rows) of each of the DataFrames.

- minor_axis − axis 2, it is the columns of each of the DataFrames.

## pandas.Panel()
> A Panel can be created using the following constructor −

```pandas.Panel(data, items, major_axis, minor_axis, dtype, copy) ```

The parameters of the constructor are as follows −

| Parameter | Description |
|-----------|-------------|
|data| Data takes various forms such as ndarray, Series, map, lists, dict, constants, or another DataFrame |
|items| axis=0 |
|major_axis| axis=1 |
|minor_axis| axis=2 |
|dtype| Data type of each column |
|copy| Copy data. Default False |


In [None]:
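# Create Panel
# A Panel can be created in several ways, for example −

# From ndarrays
# From dict of DataFrames
# Note: pandas.Panel only works on pandas versions before 0.25; on newer versions the call below fails.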

import pandas
import numpy as np
data = np.random.rand(2,4,5)
print(data)
panel = pandas.Panel(data)
print('panel from 3 axis ndarray', panel, sep='\n')


[[[0.87015369 0.30327016 0.60892351 0.07105187 0.81906598]
  [0.13891841 0.57972525 0.20178706 0.13886965 0.34882085]
  [0.11482248 0.0906774  0.33119132 0.66832262 0.84413523]
  [0.60697486 0.86967492 0.34240953 0.08220014 0.11657713]]

 [[0.04430067 0.27413261 0.65057802 0.77813511 0.2634327 ]
  [0.59046382 0.4127382  0.93102042 0.26799935 0.64196773]
  [0.60242456 0.48114038 0.19423234 0.2226016  0.52412405]
  [0.30439317 0.85611811 0.75740782 0.84032953 0.06580974]]]


  panel = pandas.Panel(data)


TypeError: Panel() takes no arguments

In [None]:
import pandas
# pandas.panel (lowercase) is not a pandas attribute; the class was spelled pandas.Panel
print(pandas.panel())

AttributeError: module 'pandas' has no attribute 'panel'
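Since `Panel` is unavailable on current pandas, 3D data has to be held some other way. One possible substitute, shown here only as a sketch and not as the approach these notes originally used, is a dict of DataFrames stacked into a single DataFrame with a two-level row index. The keys `item0`/`item1` and the level names `item`/`row` are made up for the example.

```python
import numpy as np
import pandas as pd

# 2 items x 4 rows x 5 columns, the same shape as the random array above
data = np.arange(40).reshape(2, 4, 5)

# One DataFrame per item (hypothetical key names)
frames = {f"item{i}": pd.DataFrame(data[i]) for i in range(data.shape[0])}

# Stack them into one DataFrame indexed by (item, row)
stacked = pd.concat(frames, names=["item", "row"])

print(stacked.shape)           # (8, 5)
print(stacked.loc["item1"])    # the second 4 x 5 slice
```

Selecting `stacked.loc["item1"]` plays the role that picking one item from a Panel used to play.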

## Series Basic Functionality


|Sr.No.| Attribute or Method & Description |
|------|-------------------------------------|
|1| axes - Returns a list of the row axis labels. |
|2| dtype - Returns the dtype of the object. |
|3| empty - Returns True if the Series is empty. |
|4| ndim - Returns the number of dimensions of the underlying data; by definition 1. |
|5| size - Returns the number of elements in the underlying data. |
|6| values - Returns the Series as an ndarray. |
|7| head() - Returns the first n rows. |
|8| tail() - Returns the last n rows. |

In [None]:
# import numpy and pandas under their usual aliases np and pd

def breakline():
    print('---------------------------------------------------------------')

import numpy as np
import pandas as pd
data = pd.Series(np.random.randn(4)) # Series of 4 random floats drawn from a standard normal distribution
print('series from random nd array',data,sep='\n')
breakline()

print('data axes values ->',data.axes,sep='\n')
breakline()
# The result above is a compact description of the labels 0 to 3, i.e., [0,1,2,3].


print('data value is empty', data.empty)

series from random nd array
0   -0.637833
1    1.432949
2   -0.218300
3    0.544254
dtype: float64
---------------------------------------------------------------
data axes values ->
[RangeIndex(start=0, stop=4, step=1)]
---------------------------------------------------------------
data value is empty False


In [None]:
# ndim returns the number of dimensions of the object.
# By definition, a Series is a 1D data structure, so it returns 1.

series_ndim= pd.Series(np.random.randn(4))
print('series ndim value = ',series_ndim.ndim)
breakline()

series_size = pd.Series(np.random.randn(3))
print('Series size = ',series_size.size)
breakline()

# Returns the actual data in the series as an array.
series_value = pd.Series(np.random.randn(4))
print('Series value = ',series_value.values)
breakline()


'''
    To view a small sample of a Series or the DataFrame object, use the head() and the tail() methods.

 head() returns the first n rows(observe the index values). The default number of elements to display is five, but you may pass a custom number.

 tail() returns the last n rows(observe the index values). The default number of elements to display is five, but you may pass a custom number.

'''

series_head_fn = pd.Series(np.random.randn(6))

print('Series head function ', series_head_fn.head(2),sep='\n')
breakline()
# tail() is called on the same six-element Series, so it shows the labels 4 and 5
print('Series tail function ', series_head_fn.tail(2),sep='\n')
breakline()

series ndim value =  1
---------------------------------------------------------------
Series size =  3
---------------------------------------------------------------
Series value =  [-0.00878432  0.93202479  0.37129011  1.40621339]
---------------------------------------------------------------
Series head function 
0    0.229041
1   -1.028952
dtype: float64
---------------------------------------------------------------
Series tail function 
4   -0.668416
5   -0.988496
dtype: float64
---------------------------------------------------------------


## DataFrame Basic Functionality
> The following table lists the important attributes and methods that make up the basic functionality of a DataFrame.


|Sr.No.| Attribute or Method & Description |
|------|-------------------------------------|
|1| T - Transposes rows and columns. |
|2| axes - Returns a list with the row axis labels and the column axis labels. |
|3| dtypes - Returns the dtype of each column. |
|4| empty - Returns True if the DataFrame is empty. |
|5| ndim - Returns the number of dimensions; by definition 2. |
|6| shape - Returns a tuple (rows, columns) giving the dimensionality of the DataFrame. |
|7| size - Returns the number of elements in the DataFrame. |
|8| values - Returns the data as an ndarray. |
|9| head() - Returns the first n rows. |
|10| tail() - Returns the last n rows. |

In [None]:
from pandas import DataFrame as df

# create a dictionary
dict_series = {'Name':pd.Series(['vasi','rio','axel']),
              'age':pd.Series([21,20,21]),
              'rating':pd.Series([4.3,4.5,4.1])}
df_dict= df(dict_series)
print('data frame from dictionary series ', df_dict,sep='\n')
breakline()

# T (Transpose)
# Returns the transpose of the DataFrame. The rows and columns will interchange.
print('data frame transposed',df_dict.T,sep='\n')
breakline()

# axes
# Returns the list of row axis labels and column axis labels.
print('data frame axes',df_dict.axes,sep='\n')
breakline()

# sep must be '\n' (a newline), not the letter 'n'
print('data frame axes converted to list',list(df_dict.axes),sep='\n')
breakline()

#dtypes
#Returns the data type of each column.
print('data frame data type',df_dict.dtypes,sep='\n')
breakline()

# empty
# Returns the Boolean value saying whether the Object is empty or not; True indicates that the object is empty.
print('data frame is empty = ',df_dict.empty)
breakline()

# ndim
# Returns the number of dimensions of the object. By definition, DataFrame is a 2D object.
print('data frame ndim(dimension) = ',df_dict.ndim)
breakline()

# shape
# Returns a tuple representing the dimensionality of the DataFrame. Tuple (a,b), where a represents the number of rows and b represents the number of columns.
print('data frame shape = ',df_dict.shape)
breakline()

#size
#Returns the number of elements in the DataFrame.
print('data frame size = ',df_dict.size)
breakline()

data frame from dictionary series 
   Name  age  rating
0  vasi   21     4.3
1   rio   20     4.5
2  axel   21     4.1
---------------------------------------------------------------
data frame transposed
           0    1     2
Name    vasi  rio  axel
age       21   20    21
rating   4.3  4.5   4.1
---------------------------------------------------------------
dat frame axes
[RangeIndex(start=0, stop=3, step=1), Index(['Name', 'age', 'rating'], dtype='object')]
---------------------------------------------------------------
data frame axes converted to listn[RangeIndex(start=0, stop=3, step=1), Index(['Name', 'age', 'rating'], dtype='object')]
---------------------------------------------------------------
data frame data type
Name       object
age         int64
rating    float64
dtype: object
---------------------------------------------------------------
data frame is empty =  False
---------------------------------------------------------------
data frame ndim(dimension) =  2
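---------------------------------------------------------------
data frame shape =  (3, 3)
---------------------------------------------------------------
data frame size =  9
---------------------------------------------------------------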

In [None]:
# values
# Returns the actual data in the DataFrame as an ndarray.
print('data frame values',df_dict.values,sep='\n')
breakline()

# Head & Tail
# To view a small sample of a DataFrame object, use the head() and tail() methods. head() returns the first n rows (observe the index values). The default number of elements to display is five, but you may pass a custom number.
print('data frame head values',df_dict.head(2),sep='\n')
breakline()

print('data frame tail values',df_dict.tail(2),sep='\n')
breakline()

data frame values
[['vasi' 21 4.3]
 ['rio' 20 4.5]
 ['axel' 21 4.1]]
---------------------------------------------------------------
data frame head values
   Name  age  rating
0  vasi   21     4.3
1   rio   20     4.5
---------------------------------------------------------------
data frame tail values
   Name  age  rating
1   rio   20     4.5
2  axel   21     4.1
---------------------------------------------------------------


## Python Pandas - Descriptive Statistics

>A large number of methods compute descriptive statistics and related operations on a DataFrame. Most of these are aggregations like sum() and mean(), but some of them, like cumsum(), produce an object of the same size. Generally speaking, these methods take an axis argument, just like ndarray.{sum, std, ...}, but the axis can be specified by name or by integer.

`DataFrame − “index” (axis=0, default), “columns” (axis=1)`
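A small sketch of the two points above, using an invented all-numeric frame called `numbers`: an aggregation such as `sum()` collapses an axis, `cumsum()` keeps the original shape, and the axis can be given by name.

```python
import pandas as pd

numbers = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

column_totals = numbers.sum(axis="index")     # same as axis=0: one value per column
row_totals = numbers.sum(axis="columns")      # same as axis=1: one value per row
running_totals = numbers.cumsum()             # same shape as the input

print(column_totals)
print(row_totals)
print(running_totals)
```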

>Let us create a DataFrame and use this object throughout this chapter for all the operations.

In [None]:
df_stat = df_dict
print('Data frame from dict', df_stat,sep='\n')
breakline()

# sum()
# Returns the sum of the values for the requested axis. By default, axis is index (axis=0).
print('Data frame sum along axis 0',df_stat.sum(0),sep='\n')
breakline()

# numeric_only=True skips the string column Name, which cannot be added to numbers
print('Data frame sum along axis 1',df_stat.sum(axis=1, numeric_only=True),sep='\n')
breakline()

Data frame from dict
   Name  age  rating
0  vasi   21     4.3
1   rio   20     4.5
2  axel   21     4.1
---------------------------------------------------------------
Data frame sum along axis 0
Name      vasirioaxel
age                62
rating           12.9
dtype: object
---------------------------------------------------------------
Data frame sum along axis 1
0    25.3
1    24.5
2    25.1
dtype: float64
---------------------------------------------------------------



In [None]:
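# mean()
# Returns the average value of each numeric column
# (numeric_only=True skips the string column Name, which recent pandas versions no longer drop silently)
print('data frame mean value',df_stat.mean(numeric_only=True),sep='\n')
breakline()

# std()
# Returns the Bessel-corrected (sample) standard deviation of the numeric columns.
print('data frame standard deviation value',df_stat.std(numeric_only=True),sep='\n')
breakline()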

data frame mean value
age       20.666667
rating     4.300000
dtype: float64
---------------------------------------------------------------
data frame standard deviation value
age       0.57735
rating    0.20000
dtype: float64
---------------------------------------------------------------
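
>pandas' std() applies Bessel's correction by default (ddof=1, dividing by N - 1), while NumPy's np.std defaults to ddof=0. The sketch below, which reuses the age values from the frame above in a hypothetical Series named `ages`, shows how the two relate.

```python
import math

import numpy as np
import pandas as pd

ages = pd.Series([21, 20, 21])  # the 'age' values from the example frame

sample_std = ages.std()            # pandas default: ddof=1 (divide by N - 1)
population_std = ages.std(ddof=0)  # divide by N instead

# NumPy defaults to ddof=0, so ddof=1 must be requested to match pandas.
assert math.isclose(sample_std, np.std(ages.to_numpy(), ddof=1))
assert math.isclose(population_std, np.std(ages.to_numpy()))

print(round(sample_std, 5))      # 0.57735
print(round(population_std, 5))  # 0.4714
```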


## Functions & Description
>Let us now understand the functions under Descriptive Statistics in Python Pandas.

>The following table lists the important functions −

|Sr.No.|Function|Description|
|------|--------|-----------|
|1|count()|Number of non-null observations|
|2|sum()|Sum of values|
|3|mean()|Mean of values|
|4|median()|Median of values|
|5|mode()|Mode of values|
|6|std()|Standard deviation of the values|
|7|min()|Minimum value|
|8|max()|Maximum value|
|9|abs()|Absolute value|
|10|prod()|Product of values|
|11|cumsum()|Cumulative sum|
|12|cumprod()|Cumulative product|
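
>A short sketch of several functions from the table, applied to numeric columns only. The frame `nums` is a hypothetical stand-in that mirrors the numeric columns of the chapter's example; agg() with a list of function names is just one convenient way to call several aggregations at once.

```python
import pandas as pd

# Numeric columns only; values mirror the chapter's example frame.
nums = pd.DataFrame({"age": [21, 20, 21], "rating": [4.3, 4.5, 4.1]})

# Aggregations return one value per column.
summary = nums.agg(["count", "sum", "mean", "median", "min", "max", "prod"])
print(summary)

# abs(), cumsum() and cumprod() return an object of the same shape as the input.
print(nums["age"].cumsum().tolist())      # [21, 41, 62]
print(nums["age"].cumprod().tolist())     # [21, 420, 8820]
print((nums - 21).abs()["age"].tolist())  # [0, 1, 0]
```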

>Note − Since a DataFrame is a heterogeneous data structure, generic operations don’t work with all functions.

>Functions like sum() and cumsum() work with both numeric and character (or string) data elements without any error. Although character aggregations are rarely used in practice, these functions do not throw an exception.

>Functions like abs() and cumprod() throw an exception when the DataFrame contains character or string data, because such operations cannot be performed.
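
>The behaviour described in these notes can be checked with a small sketch. The frame `mixed` and the helper `fails` are hypothetical names introduced for this illustration. The 'Name' column is created with dtype=object explicitly, on the assumption that this matches the chapter's frame, so the result does not depend on the default string dtype of the installed pandas version.

```python
import pandas as pd

# Mixed frame mirroring the chapter's example (assumption: 'Name' is object dtype).
mixed = pd.DataFrame({
    "Name": pd.Series(["vasi", "rio", "axel"], dtype=object),
    "age": [21, 20, 21],
})

# sum() and cumsum() fall back to string concatenation on the object column.
print(mixed.sum()["Name"])              # vasirioaxel
print(mixed.cumsum()["Name"].tolist())  # ['vasi', 'vasirio', 'vasirioaxel']


def fails(operation):
    """Return the exception class name if the call raises, else None."""
    try:
        operation()
    except Exception as exc:  # typically a TypeError for string data
        return type(exc).__name__
    return None


# abs() and cumprod() are undefined for strings, so both calls raise.
print(fails(mixed.abs))
print(fails(mixed.cumprod))

# Selecting the numeric columns first avoids the exception.
numeric = mixed.select_dtypes(include="number")
print(numeric.cumprod()["age"].tolist())  # [21, 420, 8820]
```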

In [None]:
print(df_stat)

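print('count fn ',df_stat.count(),sep='\n') # number of non-null values in each column
breakline()

print('sum fn ',df_stat.sum(),sep='\n') # sum of the values in each column (strings are concatenated)
breakline()

print('mode fn ',df_stat.mode(),sep='\n') # most frequent value(s) per column; shorter columns are padded with NaN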
breakline()

print('max fn ',df_stat.max(),sep='\n')
breakline()

# abs() raises a TypeError on the string column 'Name', so apply it to the numeric columns only
print('abs fn ',df_stat[['age', 'rating']].abs(),sep='\n')
breakline()

   Name  age  rating
0  vasi   21     4.3
1   rio   20     4.5
2  axel   21     4.1
count fn 
Name      3
age       3
rating    3
dtype: int64
---------------------------------------------------------------
sum fn 
Name      vasirioaxel
age                62
rating           12.9
dtype: object
---------------------------------------------------------------
mode fn 
   Name   age  rating
0  axel  21.0     4.1
1   rio   NaN     4.3
2  vasi   NaN     4.5
---------------------------------------------------------------
max fn 
Name      vasi
age         21
rating     4.5
dtype: object
---------------------------------------------------------------
abs fn 
   age  rating
0   21     4.3
1   20     4.5
2   21     4.1
---------------------------------------------------------------


In [None]:
# Summarizing Data
# The describe() function computes a summary of statistics pertaining to the DataFrame columns.
print(df_stat)
breakline()

# describe function
print('data frame describe function',df_stat.describe(),sep='\n')
breakline()


'''
    This function reports the count, mean, std, min, max and quartiles (25%, 50%, 75%) of each column. By default it excludes character columns and summarizes only the numeric ones. 'include' is the argument that selects which columns are summarized. It takes a list of dtypes; when it is omitted, only numeric columns are used.

object − Summarizes string columns
number − Summarizes numeric columns
all − Summarizes all columns together (pass it as the string 'all', not as a list)
Now, use the following statements in the program and check the output −
'''

# describe function with object
print('data frame describe function with object',df_stat.describe(include=['object']),sep='\n')
breakline()

# describe function with include='all'
print('data frame describe function with all object',df_stat.describe(include='all'),sep='\n')
breakline()

   Name  age  rating
0  vasi   21     4.3
1   rio   20     4.5
2  axel   21     4.1
---------------------------------------------------------------
data frame describe function
             age  rating
count   3.000000     3.0
mean   20.666667     4.3
std     0.577350     0.2
min    20.000000     4.1
25%    20.500000     4.2
50%    21.000000     4.3
75%    21.000000     4.4
max    21.000000     4.5
---------------------------------------------------------------
data frame describe function with object
        Name
count      3
unique     3
top     axel
freq       1
---------------------------------------------------------------
data frame describe function with all object
        Name        age  rating
count      3   3.000000     3.0
unique     3        NaN     NaN
top     axel        NaN     NaN
freq       1        NaN     NaN
mean     NaN  20.666667     4.3
std      NaN   0.577350     0.2
min      NaN  20.000000     4.1
25%      NaN  20.500000     4.2
50%      NaN  21.000000     4.3