# Introduction to Pandas DataFrames

![Panda](panda.png)

[Pandas](https://pandas.pydata.org/) is an open-source Python library designed primarily for data manipulation and analysis. To quote from NVIDIA's website:

> Pandas is the most popular software library for data manipulation and data analysis for the Python programming language. 
> ([www.nvidia.com](https://www.nvidia.com/en-us/glossary/pandas-python/))

Here is a (non-exhaustive) list of key functionality provided by Pandas:


1. **Data Structures**
    1. Series: One-dimensional labeled array capable of holding data of any type.
    2. DataFrame: Two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).
2. **Data Manipulation**
    1. Data Selection and Indexing: Access data via labels, integer positions, or boolean masks (`.loc`, `.iloc`, `.at`, `.iat`).
    2. Filtering: Filter data based on conditions or queries.
    3. Sorting: Sort data by labels or values.
    4. Handling Missing Data: Identify, fill, or drop missing values (`isnull`, `dropna`, `fillna`).
3. **Data Cleaning**
    1. Dropping Duplicates: Remove duplicate rows (`drop_duplicates`).
    2. Replacing Values: Replace specific values in the DataFrame.
    3. String Operations: Perform operations on string data, like splitting, replacing, and pattern matching (`str.split`, `str.replace`).
4. **Aggregation and Grouping**
    1. Group By: Split data into groups based on criteria, and perform aggregate functions like sum, mean, or custom operations.
    2. Pivot Tables: Create a pivot table to summarize data.
5. **Merging and Joining**
    1. Concatenation: Combine multiple DataFrames along a particular axis.
    2. Merging: Merge DataFrames similar to SQL joins (`merge`, `join`).
6. **Time Series**
    1. Datetime Conversion: Convert date and time data to a datetime object.
    2. Resampling: Aggregate data over a time period.
    3. Time-based Indexing: Access and manipulate time-series data easily with date indexing.
7. **Statistical and Mathematical Operations**
    1. Descriptive Statistics: Compute summary statistics for DataFrame columns.
    2. Correlation/Covariance: Calculate the pairwise correlation or covariance between columns.
    3. Cumulative Operations: Perform cumulative operations on data.
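
To make a few of the items above concrete, here is a minimal sketch that touches selection, missing data, filtering, grouping, and merging. The `sales` and `regions` tables, their column names, and their values are made up for illustration; only the Pandas calls themselves are real.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this sketch.
sales = pd.DataFrame(
    {
        "city": ["Oslo", "Bergen", "Oslo", "Bergen"],
        "units": [10, 4, np.nan, 7],
        "price": [2.5, 3.0, 2.5, 3.0],
    }
)

# Selection and indexing: label-based (.loc) and position-based (.iloc).
first_city = sales.loc[0, "city"]
last_price = sales.iloc[-1, 2]

# Handling missing data: replace the missing unit count with 0.
filled = sales.fillna({"units": 0})

# Filtering with a boolean mask, then sorting by value.
big_orders = filled[filled["units"] > 5].sort_values("units", ascending=False)

# Group by: total units sold per city.
units_per_city = filled.groupby("city")["units"].sum()

# Merging: attach a region to each row, similar to a SQL left join.
regions = pd.DataFrame({"city": ["Oslo", "Bergen"], "region": ["East", "West"]})
merged = filled.merge(regions, on="city", how="left")

print(big_orders)
print(units_per_city)
print(merged)
```

Each step returns a new object rather than modifying `sales` in place, which makes it easy to inspect intermediate results.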

At the heart of Pandas lies the DataFrame, a two-dimensional labeled data structure with columns of potentially different types, similar to a table in a relational database or an Excel spreadsheet. Understanding DataFrames is crucial for anyone looking to perform data analysis in Python.

# What is a DataFrame?

A DataFrame is a table-like structure in Pandas that consists of rows and columns, where each column can hold a different data type (e.g., integers, floats, strings). You can think of it as a collection of Series objects that share the same row index, where each Series is a single column of data. Because each column holds values of a single type, DataFrames provide an efficient way to store and manipulate large datasets in memory.
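
The sketch below illustrates this "collection of Series" view. The column names and values are hypothetical; the point is only that each column has its own dtype and that selecting one column yields a Series with the same row index as the DataFrame.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame(
    {
        "name": ["Ada", "Grace", "Linus"],
        "age": [36, 45, 28],
        "height_m": [1.65, 1.70, 1.80],
    }
)

# Each column carries its own data type.
print(df.dtypes)

# Selecting a single column returns a Series.
ages = df["age"]
print(type(ages).__name__)

# The Series shares the DataFrame's row index.
print(ages.index.equals(df.index))

# (number of rows, number of columns)
print(df.shape)
```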

# Creating a DataFrame

There are several ways to create a DataFrame in Pandas. Some of the most common are:


1. From a Dictionary
2. From a List of Lists
3. From a CSV File


Below we take a look at the first two approaches.
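
For completeness, the third approach can be sketched with `pd.read_csv`. To keep the example self-contained, it reads from an in-memory string via `io.StringIO` instead of a file on disk; the CSV content is made up, and in practice you would usually pass a file path.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = """name,age
Ada,36
Grace,45
"""

# read_csv accepts a file path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
```

The first line of the CSV is used as the column header by default.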

# Creating a DataFrame from a Dictionary