<center><u><H1>Pandas: Data Structures and Properties</H1></u></center>

### The pandas library has two main data structures:

1. Series

2. DataFrame

<u><H2>Series:</H2></u>

In [1]:
import pandas as pd
import numpy as np

In [2]:
pd.Series(np.random.randn(5))

0    0.607927
1    0.369156
2   -0.960044
3   -0.165455
4    0.766895
dtype: float64

### The index of a Series can be customized by passing index values (letters or numbers):

In [3]:
pd.Series(np.random.randn(5), index=[1,2,3,4,5])

1   -1.957152
2   -1.181519
3   -0.596994
4    0.048147
5    0.051511
dtype: float64
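
The cell above uses numeric labels only. As a small sketch of the "letters" case, the example below builds a Series with letter labels and reads values back by label and by position. The values and the labels `'a'` to `'e'` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative values; the labels 'a'..'e' are chosen only for this example
s = pd.Series(np.arange(5) * 10, index=['a', 'b', 'c', 'd', 'e'])

# Label-based access uses the custom index
value_c = s['c']        # 20

# Position-based access is still available through iloc
first = s.iloc[0]       # 0

# The index keeps the labels in the order they were given
labels = list(s.index)  # ['a', 'b', 'c', 'd', 'e']
```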

### We can also create a Series from a Python dict (the keys become the index):

In [4]:
d = {'A': 1, 'B': 13, 'C': 'zyx'}
pd.Series(d)

A      1
B     13
C    zyx
dtype: object
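
The `object` dtype above comes from mixing integers and a string in one Series. A minimal sketch of the contrast, assuming a second dict (`numeric`, invented for this example) that holds integers only:

```python
import pandas as pd

# Mixed value types: pandas falls back to the generic object dtype
mixed = pd.Series({'A': 1, 'B': 13, 'C': 'zyx'})

# Only integers: pandas can use a numeric dtype
numeric = pd.Series({'A': 1, 'B': 13, 'C': 7})

mixed_dtype = mixed.dtype        # object
numeric_dtype = numeric.dtype    # an integer dtype
keys = list(numeric.index)       # dict keys become the index: ['A', 'B', 'C']
```

Keeping a column to a single type usually makes arithmetic and comparisons on it behave as expected.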

<u><H2>DataFrame:</H2></u>

<b>A DataFrame is a 2D data structure whose columns can hold different data types and whose rows are labelled by an index. It can be built from the following inputs:</b>
1. NumPy arrays
2. Lists
3. Dicts
4. Series
5. 2D NumPy arrays

In [5]:
# using a dict of Series
d = {'column_1': pd.Series([10,12,13]),
    'column_2': pd.Series(['abc',17,'amc'])}
df = pd.DataFrame(d)
df

   column_1  column_2
0        10       abc
1        12        17
2        13       amc


In [6]:
# using a dict of lists
d = {'column_1': [21,12,3],
    'column_2': ['ab','A','tyz'],
    'column_3': [1,3,6]}
df = pd.DataFrame(d)
df

   column_1  column_2  column_3
0        21        ab         1
1        12         A         3
2         3       tyz         6
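
The list of inputs also mentions 2D NumPy arrays, which the cells above do not show. A minimal sketch, assuming an array `arr` and column names made up for this example:

```python
import numpy as np
import pandas as pd

# Each inner list is one row; the array has 3 rows and 2 columns
arr = np.array([[21, 1],
                [12, 3],
                [3, 6]])

# Column names are passed separately; the row index defaults to 0, 1, 2
df_arr = pd.DataFrame(arr, columns=['column_1', 'column_3'])

shape = df_arr.shape                 # (3, 2)
middle = df_arr.loc[1, 'column_1']   # 12
```

A single NumPy array has one dtype, so this route suits purely numeric data; a dict of lists or Series is more natural when the columns have different types.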


### Selection and Indexing:

In [7]:
# selecting a column
df['column_3']

0    1
1    3
2    6
Name: column_3, dtype: int64

In [8]:
# selecting more than one column
df[['column_1','column_3']]

   column_1  column_3
0        21         1
1        12         3
2         3         6


### <u>loc and iloc:</u>
#### - loc selects rows by their index labels.
#### - iloc selects rows by their integer position (so it only takes integers).

In [9]:
#selecting rows
df.loc[1]

column_1    12
column_2     A
column_3     3
Name: 1, dtype: object

In [10]:
df.iloc[0]

column_1    21
column_2    ab
column_3     1
Name: 0, dtype: object

In [11]:
df.set_index('column_2', inplace=True)
df

          column_1  column_3
column_2
ab              21         1
A               12         3
tyz              3         6


In [12]:
df.iloc[0]

column_1    21
column_3     1
Name: ab, dtype: int64

In [13]:
df.loc['ab']

column_1    21
column_3     1
Name: ab, dtype: int64
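
The difference between `loc` and `iloc` is easiest to see when the index labels are integers that do not match the row positions. The sketch below uses a small DataFrame (`df_demo`, invented for this example) whose labels are in reverse order:

```python
import pandas as pd

# Index labels 2, 1, 0 are deliberately different from positions 0, 1, 2
df_demo = pd.DataFrame({'value': ['x', 'y', 'z']}, index=[2, 1, 0])

# loc looks up the row whose LABEL is 0, which is the last row
by_label = df_demo.loc[0, 'value']     # 'z'

# iloc looks up the row at POSITION 0, which is the first row
by_position = df_demo.iloc[0, 0]       # 'x'
```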

In [14]:
type(df['column_1'])

pandas.core.series.Series

### Inserting new column

In [15]:
df['new'] = df['column_1'] + df['column_3']
df

          column_1  column_3  new
column_2
ab              21         1   22
A               12         3   15
tyz              3         6    9


### Deleting a column

In [16]:
df.drop('new', axis=1, inplace=True)  # inplace=True modifies df directly instead of returning a new DataFrame
df

          column_1  column_3
column_2
ab              21         1
A               12         3
tyz              3         6
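
Without `inplace=True`, `drop` leaves the original untouched and returns a new DataFrame. A short sketch of that behaviour, using a throwaway `df_demo` built only for this example:

```python
import pandas as pd

df_demo = pd.DataFrame({'column_1': [21, 12, 3],
                        'column_3': [1, 3, 6]})
df_demo['new'] = df_demo['column_1'] + df_demo['column_3']

# drop returns a copy without the column; df_demo itself is unchanged
trimmed = df_demo.drop(columns='new')

original_columns = list(df_demo.columns)   # ['column_1', 'column_3', 'new']
trimmed_columns = list(trimmed.columns)    # ['column_1', 'column_3']
```

Here `columns='new'` is an equivalent spelling of `'new', axis=1`.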


### Return a column and remove it from the DataFrame (pop)

In [17]:
df['new'] = df['column_1'] + df['column_3']
new = df.pop('new')

In [18]:
df

          column_1  column_3
column_2
ab              21         1
A               12         3
tyz              3         6


In [19]:
new

column_2
ab     22
A      15
tyz     9
Name: new, dtype: int64

### Selecting a subset of the dataframe with rows and columns

In [20]:
df.loc['A','column_1']

12

In [21]:
#reset index
df.reset_index(inplace=True)
df

   column_2  column_1  column_3
0        ab        21         1
1         A        12         3
2       tyz         3         6


In [22]:
df.loc[[0,2],['column_2','column_3']]

   column_2  column_3
0        ab         1
2       tyz         6


### Selection by condition:

In [23]:
df[df['column_1']>10][['column_2','column_3']]

   column_2  column_3
0        ab         1
1         A         3


In [24]:
df[(df['column_1']>10) & (df['column_3'] > 1)]

   column_2  column_1  column_3
1         A        12         3
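
Conditions can also be combined with `|` (or) and negated with `~` (not), and a boolean mask can be passed to `loc` together with a column list. The sketch below rebuilds the same three rows in a separate `df_demo` so it can run on its own:

```python
import pandas as pd

df_demo = pd.DataFrame({'column_2': ['ab', 'A', 'tyz'],
                        'column_1': [21, 12, 3],
                        'column_3': [1, 3, 6]})

# OR: rows where either condition holds (each condition needs parentheses)
either = df_demo[(df_demo['column_1'] > 20) | (df_demo['column_3'] > 5)]

# NOT: rows where the condition does not hold
not_large = df_demo[~(df_demo['column_1'] > 10)]

# Row mask and column list in a single loc call
picked = df_demo.loc[df_demo['column_1'] > 10, ['column_2', 'column_3']]

either_rows = list(either.index)        # [0, 2]
not_large_rows = list(not_large.index)  # [2]
picked_names = list(picked['column_2']) # ['ab', 'A']
```

The single `loc` call selects the same rows and columns as chaining `df[mask][columns]`, without creating an intermediate DataFrame.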


### Index properties:

In [25]:
#Array of index values
df.index.values

array([0, 1, 2], dtype=int64)

In [26]:
# Using the string split method to build a list of items
a = 'C t 5'.split()

In [27]:
# Inserting a new column from a list of values
df['column_4'] = a
df

   column_2  column_1  column_3  column_4
0        ab        21         1         C
1         A        12         3         t
2       tyz         3         6         5


In [28]:
# Inserting a new row; values follow the column order (column_2, column_1, column_3, column_4)
df.loc[3] = [0.6, 43, 'xy', 'v']
df

   column_2  column_1  column_3  column_4
0        ab        21         1         C
1         A        12         3         t
2       tyz         3         6         5
3       0.6        43        xy         v
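
Assigning a list to `df.loc[3]` relies on the values being in the same order as the columns. One alternative, sketched here with a small `df_demo` and a `new_row` invented for the example, is to name the columns explicitly and append with `pd.concat`:

```python
import pandas as pd

df_demo = pd.DataFrame({'column_2': ['ab', 'A'],
                        'column_1': [21, 12]})

# The new row is a one-row DataFrame, so each value is tied to a column name
new_row = pd.DataFrame([{'column_2': 'xy', 'column_1': 43}])

# ignore_index=True renumbers the rows 0, 1, 2
extended = pd.concat([df_demo, new_row], ignore_index=True)

n_rows = len(extended)                    # 3
last_value = extended.loc[2, 'column_1']  # 43
```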


## References:

https://pandas.pydata.org/pandas-docs/stable/dsintro.html

https://pandas.pydata.org/pandas-docs/stable/basics.html