In [1]:
import pandas as pd

# What is pd.read_csv()

- `read_csv` is a function in the Pandas library used to read data from a CSV file and create a DataFrame.
- It can read CSV files from local file systems or URLs.
- It assumes the CSV file has a header row with column names by default, but this can be overridden.
- It has many parameters to customize how the CSV file is read, such as the delimiter, encoding, and column names.
- To use `read_csv`, import Pandas first with `import pandas as pd`.
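
The parameters mentioned above can be sketched as follows. This is a minimal illustration, not the weather file used in this notebook: the semicolon-separated text and the name `sample` are assumptions made for the example, and `io.StringIO` stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV text with no header row and ';' as the delimiter.
# read_csv accepts file paths, URLs, and file-like objects such as StringIO.
csv_text = "1/1/2017;32;6;Rain\n1/2/2017;35;7;Sunny\n"

sample = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",        # delimiter is a semicolon instead of the default comma
    header=None,    # tell pandas there is no header row in the data
    names=["day", "temperature", "windspeed", "event"],  # supply column names
)
print(sample)
```

When reading from a real file, the `encoding` parameter (for example `encoding="utf-8"`) can be passed in the same way.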

In [2]:
df = pd.read_csv("./weather_data.csv")
df.head()

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain


-------------------------------

# shape

- The `shape` attribute returns the dimensions of a DataFrame as a tuple of (number of rows, number of columns). It is a quick way to check the size of a DataFrame.

- `shape` is an attribute, not a method, so it is accessed without parentheses:

In [3]:
df.shape

(6, 4)

- The code `rows, columns = df.shape` unpacks the tuple returned by `shape` into two variables named `rows` and `columns`.
- The first value in the tuple is the number of rows in the DataFrame and the second is the number of columns.

In [4]:
rows, columns = df.shape

In [5]:
rows

6

In [6]:
columns

4
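
As a small sketch of related ways to measure a DataFrame, the example below compares `shape` with `len()` and `size`. The DataFrame `small` and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical 3x2 DataFrame used only for this illustration
small = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_rows, n_cols = small.shape  # read as an attribute, no parentheses
print(n_rows, n_cols)         # number of rows, number of columns
print(len(small))             # len() gives the number of rows only
print(small.size)             # size gives rows * columns (total cells)
```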

------

# How to create DataFrame using Python Dictionary?

- The code below creates a DataFrame from a dictionary.
- The `pd.DataFrame` constructor builds the DataFrame, using each dictionary key as a column name.

In [7]:
# In this dictionary, each key is a column name and each value is the list of values for that column

weather_data = {
    'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    'temperature': [32, 35, 28, 24, 32, 31],
    'windspeed': [6, 7, 2, 7, 4, 2],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny']
}

df = pd.DataFrame(weather_data)
df

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


In [8]:
df.shape

(6, 4)
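
The dictionary above is organised by column. As a hedged alternative sketch, the same constructor also accepts a list of dictionaries where each dictionary is one row. The names `rows_as_dicts`, `by_row`, and `by_column` are chosen for this example only.

```python
import pandas as pd

# Row-oriented input: one dictionary per row, keys are column names
rows_as_dicts = [
    {"day": "1/1/2017", "temperature": 32, "windspeed": 6, "event": "Rain"},
    {"day": "1/2/2017", "temperature": 35, "windspeed": 7, "event": "Sunny"},
]
by_row = pd.DataFrame(rows_as_dicts)

# Column-oriented input holding the same two rows
by_column = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017"],
    "temperature": [32, 35],
    "windspeed": [6, 7],
    "event": ["Rain", "Sunny"],
})

# Both layouts should describe the same table
print(by_row.equals(by_column))
```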

-----------------

# head()

- In the Pandas library, the `head()` function is used to view the first few rows of a DataFrame. By default, it shows the first 5 rows of the DataFrame.

In [9]:
df.head()

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain


- But, if you want to view a different number of rows, you can pass that number as an argument to the `head()` function.

In [10]:
df.head(2)

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny


-----------------

# tail()

- The `tail()` function is used to view the last few rows of a DataFrame. By default, it shows the last 5 rows.

In [11]:
df.tail()

,day,temperature,windspeed,event
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


- But, if you want to view a different number of rows, you can pass that number as an argument to the `tail()` function.

In [12]:
df.tail(3)

,day,temperature,windspeed,event
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


-------------

# Indexing and Slicing

In [13]:
df[2:5]

,day,temperature,windspeed,event
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain


In [14]:
df[:] # or df

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny
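
Slicing a DataFrame with integers, as in `df[2:5]`, selects rows by position, with the end position excluded. The sketch below compares it with the explicit `iloc` and `loc` accessors on a small made-up DataFrame named `temps` that has the default integer index.

```python
import pandas as pd

# Hypothetical single-column DataFrame with the default 0..5 index
temps = pd.DataFrame({"temperature": [32, 35, 28, 24, 32, 31]})

by_slice = temps[2:5]      # rows at positions 2, 3 and 4
by_iloc = temps.iloc[2:5]  # same rows, explicit position-based indexing
by_loc = temps.loc[2:4]    # label-based indexing: the end label 4 is included

print(by_slice)
```

Note the difference in end points: `iloc[2:5]` stops before position 5, while `loc[2:4]` includes the row labelled 4.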


---------

# How to print columns?

In [15]:
df.columns

Index(['day', 'temperature', 'windspeed', 'event'], dtype='object')

#### We can also print individual column data

In [16]:
df.day

0    1/1/2017
1    1/2/2017
2    1/3/2017
3    1/4/2017
4    1/5/2017
5    1/6/2017
Name: day, dtype: object

In [17]:
df.temperature

0    32
1    35
2    28
3    24
4    32
5    31
Name: temperature, dtype: int64

In [18]:
type(df.event)

pandas.core.series.Series

In [19]:
df[['event', 'day']]

,event,day
0,Rain,1/1/2017
1,Sunny,1/2/2017
2,Snow,1/3/2017
3,Snow,1/4/2017
4,Rain,1/5/2017
5,Sunny,1/6/2017
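
The cells above use both attribute access (`df.day`) and bracket access (`df[['event', 'day']]`). The sketch below, on a made-up DataFrame named `frame`, shows what each bracket form returns and a case where attribute access cannot be used. The column name `wind speed` is an assumption for the example.

```python
import pandas as pd

# Hypothetical DataFrame; "wind speed" contains a space on purpose
frame = pd.DataFrame({"event": ["Rain", "Sunny"], "wind speed": [6, 7]})

as_series = frame["event"]   # a single column name returns a Series
as_frame = frame[["event"]]  # a list of column names returns a DataFrame
wind = frame["wind speed"]   # names with spaces need brackets, not frame.wind speed

print(type(as_series).__name__, type(as_frame).__name__)
```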


-----------------------

# Some operations on DataFrame

In [20]:
# To find maximum temperature

df['temperature'].max()

35

In [21]:
# To find average temperature

df['temperature'].mean()

30.333333333333332

In [22]:
# To find minimum temperature

df['temperature'].min()

24

In [23]:
# To find the standard deviation of temperature

df['temperature'].std()

3.8297084310253524

In [24]:
# To print summary statistics for the numeric columns of the dataset, use the describe function
df.describe()

,temperature,windspeed
count,6.0,6.0
mean,30.333333,4.666667
std,3.829708,2.33809
min,24.0,2.0
25%,28.75,2.5
50%,31.5,5.0
75%,32.0,6.75
max,35.0,7.0
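
When only a few of these statistics are needed, they can be requested together with `agg`. This is a minimal sketch using the temperature values from this notebook in a standalone Series; the names `temperature` and `summary` are chosen for the example.

```python
import pandas as pd

# Temperature values copied from the weather data above
temperature = pd.Series([32, 35, 28, 24, 32, 31], name="temperature")

# agg with a list of function names returns one value per name
summary = temperature.agg(["min", "max", "mean", "std"])
print(summary)
```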


# How to conditionally select data from a DataFrame?

In [25]:
# Display the rows where the temperature was greater than or equal to 32
df[df.temperature>=32]

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
4,1/5/2017,32,4,Rain


In [26]:
# print the row where the temperature is maximum
df[df.temperature==df.temperature.max()]

,day,temperature,windspeed,event
1,1/2/2017,35,7,Sunny


In [27]:
# print the event column where the temperature is max
df['event'][df['temperature']==df['temperature'].max()]

1    Sunny
Name: event, dtype: object
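
Conditions can also be combined, and a row condition can be paired with a column label through `loc`. The sketch below is one possible way to do this, using a reduced copy of the weather data; the names `weather`, `warm_rain`, and `hottest_event` are chosen for the example.

```python
import pandas as pd

# Reduced copy of the weather data from this notebook
weather = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

# Combine conditions with & (and) or | (or); wrap each comparison in
# parentheses because & binds more tightly than >= and ==
warm_rain = weather[(weather["temperature"] >= 32) & (weather["event"] == "Rain")]

# loc takes the row condition and the column label in a single step
hottest_event = weather.loc[
    weather["temperature"] == weather["temperature"].max(), "event"
]

print(warm_rain)
print(hottest_event)
```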

# set_index()

In [28]:
df.index

RangeIndex(start=0, stop=6, step=1)

In [29]:
# To use 'day' (or any other column) as the index, call set_index, which returns a new DataFrame:
# df.set_index('day')

# To modify the original DataFrame in place instead:
df.set_index('day', inplace=True)
df.head()

day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/2/2017,35,7,Sunny
1/3/2017,28,2,Snow
1/4/2017,24,7,Snow
1/5/2017,32,4,Rain


In [30]:
df.loc['1/3/2017']

temperature      28
windspeed         2
event          Snow
Name: 1/3/2017, dtype: object

In [31]:
df.reset_index(inplace=True)

In [32]:
df

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny
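
The same round trip can be written without `inplace=True` by keeping the returned DataFrame. The sketch below uses a shortened copy of the data; the names `weather`, `indexed`, `row`, and `restored` are chosen for the example.

```python
import pandas as pd

# Shortened copy of the weather data from this notebook
weather = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017"],
    "temperature": [32, 35, 28],
})

indexed = weather.set_index("day")  # returns a new DataFrame; weather is unchanged
row = indexed.loc["1/3/2017"]       # look up a row by its 'day' label
restored = indexed.reset_index()    # moves 'day' back into a regular column

print(row)
print(restored)
```

Assigning the result to a new name keeps the original DataFrame available, which can make a sequence of notebook cells easier to re-run.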
