
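DataFrame: a DataFrame is a two-dimensional table with labeled rows and columns. Each column of a DataFrame is a Series object, and each row is made up of the values at the same index label across those Series. A DataFrame can be constructed from built-in Python data structures, for example a dict that maps column names to lists of values.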

In [0]:
# import pandas library as pd
import pandas as pd

Let's create a simple DataFrame from a dict of lists, where each key becomes a column label.

In [3]:
# Create DataFrame 
demo_df = pd.DataFrame({
    'country':['Russia','India','Japan','China'],
    'population':[17.04,143.5,9.5,203.52],
    'square' : [2724902,17155191,12653456,78956542]
})
# print the basic DataFrame
print(demo_df)

  country  population    square
0  Russia       17.04   2724902
1   India      143.50  17155191
2   Japan        9.50  12653456
3   China      203.52  78956542
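The dict of lists above is only one way to build a DataFrame. As a short sketch, pd.DataFrame also accepts a list of dicts (one dict per row) or a list of lists together with a columns argument. The names rows, df_from_rows and df_from_lists are hypothetical, and the two-row data is a cut-down copy of the table above.

```python
import pandas as pd

# A list of dicts: each dict is one row, and its keys become column labels
rows = [
    {'country': 'Russia', 'population': 17.04},
    {'country': 'India', 'population': 143.5},
]
df_from_rows = pd.DataFrame(rows)

# A list of lists: column labels are supplied separately via `columns`
df_from_lists = pd.DataFrame(
    [['Russia', 17.04], ['India', 143.5]],
    columns=['country', 'population'],
)

# Both constructions should produce the same table
print(df_from_rows.equals(df_from_lists))
```

The dict-of-lists form is convenient when data is already organized by column; the list-of-dicts form fits data that arrives one record at a time.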


To confirm that each column is a Series object, let's print each column separately (the Name and dtype lines in the output are part of how a Series is displayed):

In [4]:
print(demo_df['country'])
print(demo_df['population'])
print(demo_df['square'])

0    Russia
1     India
2     Japan
3     China
Name: country, dtype: object
0     17.04
1    143.50
2      9.50
3    203.52
Name: population, dtype: float64
0     2724902
1    17155191
2    12653456
3    78956542
Name: square, dtype: int64
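The printout shows the Series layout, but a more direct check is to ask Python for the type. This minimal sketch rebuilds demo_df and uses type() and isinstance(); it also shows that selecting with a list of labels (double brackets) returns a DataFrame instead of a Series.

```python
import pandas as pd

demo_df = pd.DataFrame({
    'country': ['Russia', 'India', 'Japan', 'China'],
    'population': [17.04, 143.5, 9.5, 203.52],
    'square': [2724902, 17155191, 12653456, 78956542],
})

# Selecting a single column label with [] returns a pandas Series
for name in demo_df.columns:
    col = demo_df[name]
    print(name, type(col).__name__, col.dtype)

# Selecting with a list of labels returns a DataFrame, even for one column
print(type(demo_df[['country']]).__name__)
```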


A DataFrame has two indexes: the column index and the row index. If you do not provide a row index explicitly, pandas creates a RangeIndex from 0 to N-1, where N is the number of rows in the DataFrame.

In [5]:
# print the column index
print(demo_df.columns)
# print the row index
print(demo_df.index)

Index(['country', 'population', 'square'], dtype='object')
RangeIndex(start=0, stop=4, step=1)
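The claim about the default index can be checked with a small sketch: compare the row index against range(N), and relate both indexes to .shape, which reports (number of rows, number of columns). The variable n_rows is only a helper for illustration.

```python
import pandas as pd

demo_df = pd.DataFrame({
    'country': ['Russia', 'India', 'Japan', 'China'],
    'population': [17.04, 143.5, 9.5, 203.52],
})

# The default row index is a RangeIndex covering 0 .. N-1
n_rows = len(demo_df)
print(isinstance(demo_df.index, pd.RangeIndex))
print(list(demo_df.index) == list(range(n_rows)))

# The two indexes together determine the shape of the table
print(demo_df.shape == (len(demo_df.index), len(demo_df.columns)))
```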


There are several ways to set the row index explicitly: for example, you can pass index= when creating the DataFrame, or assign a new index to the .index attribute of an existing DataFrame "on the fly".

In [6]:
# Assign index when creating a DataFrame
demo_df2 = pd.DataFrame({
    'country' : ['Kazakhstan','Russia','Belarus','Ukarine'],
    'currency' : ['KZT','RUB','BYN','UAH'],
    'population' : [17.04,143.5,9.5,45.5]
}, index = ['KZ','RU','BY','UA'])
print(demo_df2)

# Assign index on the fly
demo_df2.index = ['KZ','RU','BY','UA']
demo_df2.index.name = 'Country Code'
print(demo_df2)

       country currency  population
KZ  Kazakhstan      KZT       17.04
RU      Russia      RUB      143.50
BY     Belarus      BYN        9.50
UA     Ukarine      UAH       45.50
                 country currency  population
Country Code                                 
KZ            Kazakhstan      KZT       17.04
RU                Russia      RUB      143.50
BY               Belarus      BYN        9.50
UA               Ukarine      UAH       45.50
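A third option, not used in this notebook, is to promote an existing column to the row index with DataFrame.set_index(). In this sketch the 'code' column and the name indexed are hypothetical additions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'code': ['KZ', 'RU', 'BY', 'UA'],
    'country': ['Kazakhstan', 'Russia', 'Belarus', 'Ukraine'],
    'currency': ['KZT', 'RUB', 'BYN', 'UAH'],
})

# set_index() turns the 'code' column into the row index and returns
# a new DataFrame; df itself still has 'code' as an ordinary column
indexed = df.set_index('code')
print(indexed)

# The old column name becomes the name of the new index
print(indexed.index.name)
```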


Each column Series also carries the same row index as the DataFrame:

In [7]:
print(demo_df2['country'])
print(demo_df2['currency'])
print(demo_df2['population'])

Country Code
KZ    Kazakhstan
RU        Russia
BY       Belarus
UA       Ukarine
Name: country, dtype: object
Country Code
KZ    KZT
RU    RUB
BY    BYN
UA    UAH
Name: currency, dtype: object
Country Code
KZ     17.04
RU    143.50
BY      9.50
UA     45.50
Name: population, dtype: float64
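As a minimal sketch of that shared index, the column's index can be compared with the DataFrame's index directly, and a single value can then be looked up in the Series by the same row label. The data mirrors demo_df2, with the country name spelled "Ukraine".

```python
import pandas as pd

demo_df2 = pd.DataFrame({
    'country': ['Kazakhstan', 'Russia', 'Belarus', 'Ukraine'],
    'population': [17.04, 143.5, 9.5, 45.5],
}, index=['KZ', 'RU', 'BY', 'UA'])

col = demo_df2['population']

# The column Series carries the same row labels as the DataFrame
print(col.index.equals(demo_df2.index))

# Because of the shared index, a value can be looked up by row label
print(col['RU'])  # 143.5
```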


Rows can be accessed through the index in several ways: .loc selects by index label, and .iloc selects by integer position.

In [8]:
# using .loc
print(demo_df2.loc['RU'])
print(" ") #used for new line in output
# using .iloc
print(demo_df2.iloc[1])

country       Russia
currency         RUB
population     143.5
Name: RU, dtype: object
 
country       Russia
currency         RUB
population     143.5
Name: RU, dtype: object
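With string labels such as 'RU' the two accessors are easy to tell apart. The difference matters most when the labels are themselves integers that do not match the positions, as in this small sketch with hypothetical labels 10, 20 and 30.

```python
import pandas as pd

# Integer labels that do not start at 0 make the label/position difference visible
df = pd.DataFrame({'country': ['Kazakhstan', 'Russia', 'Belarus']},
                  index=[10, 20, 30])

print(df.loc[20, 'country'])  # label 20 -> Russia
print(df.iloc[0, 0])          # position 0 -> Kazakhstan
print(df.iloc[-1, 0])         # negative positions count from the end -> Belarus
```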


Particular rows and columns can be selected together by passing a row selector and a column selector to .loc:

In [9]:
# select the currency column for the KZ and RU rows
print(demo_df2.loc[['KZ','RU'],'currency'])

Country Code
KZ    KZT
RU    RUB
Name: currency, dtype: object
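The type of the result depends on what is passed as the column selector. In this sketch (the names one_col, many_cols and one_cell are illustrative), a single column label gives a Series, a list of column labels gives a DataFrame, and two scalar labels give a single value.

```python
import pandas as pd

demo_df2 = pd.DataFrame({
    'country': ['Kazakhstan', 'Russia', 'Belarus', 'Ukraine'],
    'currency': ['KZT', 'RUB', 'BYN', 'UAH'],
    'population': [17.04, 143.5, 9.5, 45.5],
}, index=['KZ', 'RU', 'BY', 'UA'])

# A single column label returns a Series
one_col = demo_df2.loc[['KZ', 'RU'], 'currency']
# A list of column labels returns a DataFrame
many_cols = demo_df2.loc[['KZ', 'RU'], ['currency', 'population']]
# One row label and one column label return a single value
one_cell = demo_df2.loc['RU', 'currency']

print(type(one_col).__name__, type(many_cols).__name__, one_cell)  # Series DataFrame RUB
```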


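.loc takes two arguments: a row selector and a column selector. Each can be a single label, a list of labels, or a slice; unlike ordinary Python slicing, a label slice in .loc includes both endpoints.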

In [17]:
# print the rows from Russia to Ukraine
print(demo_df2.loc['RU':'UA',:])
# print the rows from Russia to Ukraine and the columns from currency to population
print(demo_df2.loc['RU':'UA','currency':'population'])

              country currency  population
Country Code                              
RU             Russia      RUB       143.5
BY            Belarus      BYN         9.5
UA            Ukarine      UAH        45.5
             currency  population
Country Code                     
RU                RUB       143.5
BY                BYN         9.5
UA                UAH        45.5
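The inclusive label slice contrasts with position-based slicing in .iloc, which excludes the stop position just like Python lists. A minimal sketch on the same kind of data:

```python
import pandas as pd

demo_df2 = pd.DataFrame({
    'country': ['Kazakhstan', 'Russia', 'Belarus', 'Ukraine'],
    'currency': ['KZT', 'RUB', 'BYN', 'UAH'],
}, index=['KZ', 'RU', 'BY', 'UA'])

# Label slice with .loc: both endpoints are included
by_label = demo_df2.loc['RU':'BY']
# Position slice with .iloc: the stop position is excluded
by_position = demo_df2.iloc[1:2]

print(list(by_label.index))     # ['RU', 'BY']
print(list(by_position.index))  # ['RU']
```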


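Filtering is performed with Boolean arrays (masks): a comparison such as demo_df2.population > 10 produces a True/False value for every row, and indexing the DataFrame with it keeps only the rows where the value is True.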

In [19]:
print(demo_df2.loc[demo_df2.population > 10, ['country', 'currency']])

                 country currency
Country Code                     
KZ            Kazakhstan      KZT
RU                Russia      RUB
UA               Ukarine      UAH
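Masks can also be inspected on their own and combined. This sketch shows the mask as a list, joins two conditions with & (| means "or" and ~ means "not"; each condition needs its own parentheses), and uses Series.isin() to build a mask from a set of allowed values.

```python
import pandas as pd

demo_df2 = pd.DataFrame({
    'country': ['Kazakhstan', 'Russia', 'Belarus', 'Ukraine'],
    'currency': ['KZT', 'RUB', 'BYN', 'UAH'],
    'population': [17.04, 143.5, 9.5, 45.5],
}, index=['KZ', 'RU', 'BY', 'UA'])

# The comparison itself is a Boolean Series aligned to the row index
mask = demo_df2['population'] > 10
print(mask.tolist())  # [True, True, False, True]

# Combine conditions with &; parentheses are required around each comparison
both = (demo_df2['population'] > 10) & (demo_df2['population'] < 100)
print(demo_df2.loc[both, ['country', 'currency']])

# isin() builds a mask from a list of allowed values
print(demo_df2[demo_df2['currency'].isin(['RUB', 'UAH'])].index.tolist())  # ['RU', 'UA']
```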


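To move the row index back into a regular column and restore the default RangeIndex, use .reset_index(). Note that it returns a new DataFrame and leaves the original unchanged, which is why printing demo_df below still shows the same table. (Since demo_df already has the default RangeIndex, assigning the result would only add the old index as an extra 'index' column.)

In [22]:
# reset_index() returns a new DataFrame; demo_df itself is not modified
demo_df.reset_index()
print(demo_df)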

  country  population    square
0  Russia       17.04   2724902
1   India      143.50  17155191
2   Japan        9.50  12653456
3   China      203.52  78956542
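reset_index() is more useful on a DataFrame with a custom index such as demo_df2. This sketch assigns the result, and also shows drop=True, which discards the old index instead of turning it into a column. The names with_column and dropped are illustrative.

```python
import pandas as pd

demo_df2 = pd.DataFrame({
    'country': ['Kazakhstan', 'Russia', 'Belarus', 'Ukraine'],
    'population': [17.04, 143.5, 9.5, 45.5],
}, index=pd.Index(['KZ', 'RU', 'BY', 'UA'], name='Country Code'))

# Assign the result: reset_index() does not modify demo_df2 in place
with_column = demo_df2.reset_index()
print(with_column.columns.tolist())  # ['Country Code', 'country', 'population']

# drop=True throws the old index away instead of keeping it as a column
dropped = demo_df2.reset_index(drop=True)
print(dropped.columns.tolist())      # ['country', 'population']
print(dropped.index.tolist())        # [0, 1, 2, 3]
```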


Let's add a new column to the DataFrame: a density column computed from the population and square columns.

In [41]:
# This example uses demo_df instead of demo_df2
demo_df['density'] = demo_df['population']/demo_df['square']*1000000
print(demo_df)

  country  population    square   density
0  Russia       17.04   2724902  6.253436
1   India      143.50  17155191  8.364815
2   Japan        9.50  12653456  0.750783
3   China      203.52  78956542  2.577620
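Assigning with [] modifies demo_df directly. As an alternative sketch, DataFrame.assign() returns a new DataFrame with the extra column and leaves the original untouched; the lambda receives the DataFrame being built. The name with_density is illustrative.

```python
import pandas as pd

demo_df = pd.DataFrame({
    'country': ['Russia', 'India', 'Japan', 'China'],
    'population': [17.04, 143.5, 9.5, 203.52],
    'square': [2724902, 17155191, 12653456, 78956542],
})

# assign() returns a new DataFrame with the extra column; demo_df is unchanged
with_density = demo_df.assign(
    density=lambda d: d['population'] / d['square'] * 1_000_000
)

print(with_density.columns.tolist())  # ['country', 'population', 'square', 'density']
print('density' in demo_df.columns)   # False
```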


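Let's delete the density column. Note that .drop() returns a new DataFrame, so unless the result is assigned (or inplace=True is passed) demo_df keeps the column, as the first printout below shows; del, by contrast, removes the column in place.

In [44]:
# drop() returns a copy without 'density'; since it is not assigned, demo_df is unchanged
demo_df.drop(['density'],axis='columns')
print(demo_df)

# del removes the column from demo_df in place
del demo_df['density']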
print(demo_df)

  country  population    square   density
0  Russia       17.04   2724902  6.253436
1   India      143.50  17155191  8.364815
2   Japan        9.50  12653456  0.750783
3   China      203.52  78956542  2.577620
  country  population    square
0  Russia       17.04   2724902
1   India      143.50  17155191
2   Japan        9.50  12653456
3   China      203.52  78956542
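To make drop() take effect, keep its return value. This sketch uses the columns= keyword, which is equivalent to passing axis='columns', and also shows DataFrame.pop(), which removes a column in place and returns it as a Series. The two density values are copied from the output above.

```python
import pandas as pd

demo_df = pd.DataFrame({
    'country': ['Russia', 'India'],
    'population': [17.04, 143.5],
    'density': [6.253436, 8.364815],
})

# Keep the result of drop(); demo_df itself still has 'density' afterwards
without_density = demo_df.drop(columns=['density'])
print(without_density.columns.tolist())  # ['country', 'population']
print(demo_df.columns.tolist())          # ['country', 'population', 'density']

# pop() removes the column from demo_df in place and returns it
density = demo_df.pop('density')
print(demo_df.columns.tolist())          # ['country', 'population']
```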


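Let's rename a column with .rename(), passing a dict that maps old column names to new ones. It returns a new DataFrame, so the result is assigned back to demo_df.

In [46]:
demo_df = demo_df.rename(columns={'population':'pop_in_millions'})
print(demo_df)

  country  pop_in_millions    square
0  Russia            17.04   2724902
1   India           143.50  17155191
2   Japan             9.50  12653456
3   China           203.52  78956542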
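rename() is flexible. As a short sketch, it also accepts a function that is applied to every column label, row labels can be renamed through index=, and assigning a complete list to .columns replaces all labels at once (the list length must match the number of columns). The names upper and relabeled and the labels 'first'/'second' are illustrative.

```python
import pandas as pd

demo_df = pd.DataFrame({
    'country': ['Russia', 'India'],
    'population': [17.04, 143.5],
})

# A function passed as columns= is applied to every column label
upper = demo_df.rename(columns=str.upper)
print(upper.columns.tolist())      # ['COUNTRY', 'POPULATION']

# Row labels can be renamed the same way via index=
relabeled = demo_df.rename(index={0: 'first', 1: 'second'})
print(relabeled.index.tolist())    # ['first', 'second']

# Assigning a full list replaces every column label in place
demo_df.columns = ['name', 'pop_in_millions']
print(demo_df.columns.tolist())    # ['name', 'pop_in_millions']
```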
