# Pandas Tutorial 3: Ways to Create a DataFrame

In this tutorial, we will explore several methods for creating a DataFrame, the core structure for working with tabular data in Pandas. Building on the basics covered earlier, creating a DataFrame is the first step in most data manipulation and analysis tasks.

**Topics covered:**
- Creating a DataFrame
- Using `read_csv()`
- Using `read_excel()`
- Creating a DataFrame from a Python Dictionary 
- Using a list of tuples or dictionaries with `DataFrame()`
- Other methods to create a DataFrame

In [1]:
import pandas as pd

## Creating a DataFrame from a CSV File

* ### `read_csv()` Method


The `read_csv()` method in Pandas is a common way to load data from a CSV file into a DataFrame. It parses the data into rows and columns and, by default, uses the first row as the column headers.

**Key Pointers**:
- Handles large datasets efficiently; the `chunksize` parameter reads a file in pieces instead of all at once.
- Allows customization through various parameters (e.g., delimiter, handling missing data, selecting specific columns).
- Supports reading CSVs from both file paths and URLs.
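
The parameters mentioned above can be sketched as follows. This is a minimal illustration, not part of the original notebook: the semicolon-delimited text, the `"not measured"` marker, and the name `df_custom` are assumptions made up for the example, and an in-memory `io.StringIO` buffer stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited data standing in for a CSV file on disk.
# "not measured" is an invented marker for a missing reading.
raw = io.StringIO(
    "day;temperature;windspeed;event\n"
    "1/1/2017;32;6;Rain\n"
    "1/2/2017;not measured;7;Sunny\n"
    "1/3/2017;28;2;Snow\n"
)

df_custom = pd.read_csv(
    raw,
    sep=";",                                  # non-default delimiter
    na_values=["not measured"],               # treat this marker as missing
    usecols=["day", "temperature", "event"],  # load only these columns
)
print(df_custom)
```

Passing a file path or URL instead of the buffer works the same way; only the first argument changes.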

In [2]:
# Read from CSV file into a DataFrame 
df = pd.read_csv("weather_data.csv")
df

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


## Creating a DataFrame from an Excel File

* ### `read_excel()` Method

The `read_excel()` method loads data from an Excel file into a DataFrame. It supports reading data from one or more sheets in a workbook.

**Key Features**:
- Specify the sheet by name or index.
- Offers parameters for column selection, skipping rows, and handling missing data.
- Works with both `.xls` and `.xlsx` formats.

This method is commonly used for structured data stored in Excel files.
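
As a rough sketch of the options listed above, the helper below wraps `read_excel()` with a sheet selector, a column subset, and a row skip. The function name `load_weather_sheet`, its defaults, and the column names are hypothetical choices for illustration. Reading `.xlsx` files also assumes an Excel engine such as `openpyxl` is installed.

```python
import pandas as pd


def load_weather_sheet(path, sheet_name=0, skiprows=None):
    """Read selected columns from one sheet of an Excel workbook.

    Hypothetical helper: the column names assume a sheet laid out like
    the weather data used in this tutorial.
    """
    return pd.read_excel(
        path,
        sheet_name=sheet_name,                    # sheet name (str) or zero-based position (int)
        usecols=["day", "temperature", "event"],  # keep only these columns
        skiprows=skiprows,                        # rows to skip at the top, if any
    )


# Example call (requires the workbook to exist):
# df = load_weather_sheet("weather_data.xlsx", sheet_name="Sheet1")
```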

In [4]:
# Read the sheet named "Sheet1" from an Excel file into a DataFrame
df = pd.read_excel("weather_data.xlsx", sheet_name="Sheet1")
df

,day,temperature,windspeed,event
0,2017-01-01,32,6,Rain
1,2017-01-02,35,7,Sunny
2,2017-01-03,28,2,Snow


## Creating a DataFrame from a Dictionary

In Pandas, you can create a DataFrame from a Python dictionary, where keys represent column names and values are lists representing the data.

**Key Pointers**:
- Keys become column headers.
- Values (lists) form the column data.
- All lists must have the same length; otherwise Pandas raises a `ValueError`.

This method is straightforward for converting structured dictionary data into a DataFrame.
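
To see why the equal-length rule matters, here is a small sketch of what happens when one list is shorter than the others. The dictionary `mismatched` and the variable `error_message` are made up for this illustration.

```python
import pandas as pd

# Hypothetical data: 'temperature' has one value fewer than 'day'.
mismatched = {
    "day": ["1/1/2017", "1/2/2017", "1/3/2017"],
    "temperature": [32, 35],
}

error_message = None
try:
    pd.DataFrame(mismatched)
except ValueError as exc:
    # Pandas refuses to build a DataFrame from columns of unequal length.
    error_message = str(exc)
    print(f"Could not build DataFrame: {error_message}")
```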

In [5]:
weather_data = {
    'day': ['1/1/2017', '1/2/2017', '1/3/2017'],
    'temperature': [32, 35, 28],
    'windspeed': [6, 7, 2],
    'event': ['Rain', 'Sunny', 'Snow']
}
df = pd.DataFrame(weather_data)
df

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow


## Creating a DataFrame from a List of Tuples

In Pandas, you can create a DataFrame from a list of tuples, where each tuple represents a row of data, and column names are specified using the `columns` parameter.

**Key Pointers**:
- Each tuple is treated as a row, with elements corresponding to columns.
- Use the `columns` argument to define column names.
- Useful for data already structured row-wise in tuples, such as from databases.

This method is ideal for converting tuple-based data into a DataFrame format.
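
The role of the `columns` argument is easiest to see by comparing the result with and without it. In this short sketch, the names `rows`, `df_default`, and `df_named` are illustrative only.

```python
import pandas as pd

rows = [
    ("1/1/2017", 32, 6, "Rain"),
    ("1/2/2017", 35, 7, "Sunny"),
    ("1/3/2017", 28, 2, "Snow"),
]

# Without `columns`, Pandas labels the columns with integers 0, 1, 2, 3.
df_default = pd.DataFrame(rows)
print(list(df_default.columns))

# With `columns`, each tuple position gets a descriptive name.
df_named = pd.DataFrame(rows, columns=["day", "temperature", "windspeed", "event"])
print(list(df_named.columns))
```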

In [6]:
weather_data = [
    ('1/1/2017',32,6, 'Rain'),
    ('1/2/2017', 35,7, 'Sunny'),
    ('1/3/2017', 28,2, 'Snow')
]
df = pd.DataFrame(data=weather_data, columns=['day', 'temperature', 'windspeed', 'event'])
df

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow


## Creating a DataFrame from a List of Dictionaries

You can create a DataFrame from a list of dictionaries in Pandas, where each dictionary represents a row, and the keys become column names.

**Key Pointers**:
- Each dictionary is treated as a row.
- Keys automatically become column headers.
- Missing keys in some dictionaries result in `NaN` values.

This method is useful for converting data from APIs or JSON into a tabular format.
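
The `NaN` behavior for missing keys can be sketched like this. The list `records` is invented for the example, with `'windspeed'` deliberately left out of the second dictionary.

```python
import pandas as pd

# Hypothetical records: the second dictionary has no 'windspeed' key.
records = [
    {"day": "1/1/2017", "temperature": 32, "windspeed": 6, "event": "Rain"},
    {"day": "1/2/2017", "temperature": 35, "event": "Sunny"},
    {"day": "1/3/2017", "temperature": 28, "windspeed": 2, "event": "Snow"},
]

df_partial = pd.DataFrame(records)

# The missing value shows up as NaN, and the column becomes float
# because NaN cannot be stored in a plain integer column.
print(df_partial)
print(df_partial["windspeed"].dtype)
```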

In [7]:
weather_data = [
    {'day': '1/1/2017', 'temperature': 32, 'windspeed': 6, 'event': 'Rain'},
    {'day': '1/2/2017', 'temperature': 35, 'windspeed': 7, 'event': 'Sunny'},
    {'day': '1/3/2017', 'temperature': 28, 'windspeed': 2, 'event': 'Snow'},
]
df = pd.DataFrame(weather_data)
df

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
