# What kind of data does pandas handle?

I want to start using `pandas`.

In [None]:
import pandas as pd

To start working with pandas, import the package. The community-agreed alias for pandas is `pd`, so the pandas documentation assumes throughout that pandas has been imported as `pd`.

## pandas data table representation

![images%20%281%29.png](attachment:images%20%281%29.png)

I want to store passenger data of the Titanic. For a number of passengers, I know the name (characters), age (integers) and sex (male/female) data.

In [None]:
df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

In [None]:
df

I’m just interested in working with the data in the column `Age`.

In [None]:
df["Age"]

When selecting a single column of a pandas `DataFrame`, the result is a pandas `Series`. To select the column, use the column label in between square brackets `[]`.

You can create a `Series` from scratch as well:

In [None]:
ages = pd.Series([22, 35, 58], name="Age")

In [None]:
ages
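As a small sketch of what distinguishes a `Series` from a `DataFrame`, the example below inspects the same `ages` object. A `Series` carries row labels (the index) and an optional name, but no column labels. The variable `frame` is introduced here only for illustration.

```python
import pandas as pd

ages = pd.Series([22, 35, 58], name="Age")

# A Series has row labels (the index) and an optional name, but no columns.
print(ages.name)         # Age
print(list(ages.index))  # [0, 1, 2]

# Converting the Series to a DataFrame turns its name into the column label.
frame = ages.to_frame()
print(list(frame.columns))  # ['Age']
```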

## Do something with a DataFrame or Series

### I want to know the maximum Age of the passengers

We can do this on the `DataFrame` by selecting the `Age` column and applying `max()`:

In [None]:
df["Age"].max()

Or directly on the `Series`:

In [None]:
ages.max()

I’m interested in some basic statistics of the numerical data in my data table.

In [None]:
df.describe()
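To make the result easier to read, here is a minimal sketch that rebuilds the three-passenger table and looks at individual entries of the summary. By default `describe()` summarizes only the numerical columns, so `Name` and `Sex` are left out. The variable `summary` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

# describe() returns a DataFrame: one row per statistic, one column per
# numerical column of the original table.
summary = df.describe()

print(list(summary.columns))      # ['Age']
print(summary.loc["max", "Age"])  # 58.0
print(summary.loc["50%", "Age"])  # 35.0 (the median)
```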

# How do I read and write tabular data?

![images.png](attachment:images.png)

I want to analyze the Titanic passenger data, available as a `CSV` file.

In [None]:
titanic = pd.read_csv("data/titanic.csv")

pandas provides the `read_csv()` function to read data stored as a CSV file into a pandas `DataFrame`. pandas supports many file formats and data sources out of the box (CSV, Excel, SQL, JSON, Parquet, …), each with a reader function that has the prefix `read_*`.
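If the file `data/titanic.csv` is not at hand, `read_csv()` can be tried on an in-memory text buffer instead. The sketch below assumes a tiny three-row CSV written inline; `csv_text` and `passengers` are illustrative names, not part of the Titanic data set.

```python
import io

import pandas as pd

# A small CSV written inline. Names contain commas, so they are quoted.
csv_text = '''Name,Age,Sex
"Braund, Mr. Owen Harris",22,male
"Allen, Mr. William Henry",35,male
"Bonnell, Miss. Elizabeth",58,female
'''

# read_csv() accepts a file path or any file-like object.
passengers = pd.read_csv(io.StringIO(csv_text))

print(passengers.shape)            # (3, 3)
print(passengers.loc[0, "Name"])   # Braund, Mr. Owen Harris
```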

Always check the data after reading it in. When displaying a `DataFrame`, the first and last 5 rows are shown by default:

In [None]:
titanic

I want to see the first 8 rows of a pandas DataFrame.

In [None]:
titanic.head(8)
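A brief sketch of how the row count works, using a made-up ten-row table (`numbers` is an illustrative name): `head()` without an argument returns 5 rows, and the companion method `tail()` returns rows from the end.

```python
import pandas as pd

numbers = pd.DataFrame({"n": range(10)})

# head(n) returns the first n rows; the default is 5.
print(numbers.head(3)["n"].tolist())  # [0, 1, 2]
print(len(numbers.head()))            # 5

# tail(n) returns the last n rows.
print(numbers.tail(2)["n"].tolist())  # [8, 9]
```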

To check how pandas interpreted the data type of each column, request the `dtypes` attribute:

In [None]:
titanic.dtypes
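The following sketch shows the kind of result to expect on a small hand-built table. The `Fare` values are made up for illustration. The exact dtype reported for text columns depends on the pandas version, so the example only checks the numerical columns.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Age": [22, 35, 58],            # whole numbers
        "Fare": [7.25, 53.10, 26.55],   # illustrative values, not real data
        "Sex": ["male", "male", "female"],
    }
)

# dtypes is an attribute (no parentheses) holding one dtype per column.
print(df.dtypes)

# Helper functions in pd.api.types test for a dtype family.
print(pd.api.types.is_integer_dtype(df["Age"]))  # True
print(pd.api.types.is_float_dtype(df["Fare"]))   # True
```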

My colleague requested the Titanic data as a spreadsheet.

In [None]:
titanic.to_excel("titanic.xlsx", sheet_name="passengers", index=False)

The equivalent read function `read_excel()` reloads the data into a `DataFrame`:

In [None]:
titanic = pd.read_excel("titanic.xlsx", sheet_name="passengers")

In [None]:
titanic.head()
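The same write-then-read round trip can be sketched without an Excel engine installed by swapping in CSV and an in-memory buffer. This is a substitute format chosen for illustration, not the Excel workflow itself. It also shows what `index=False` changes: without it, the row labels are written as an extra unnamed column.

```python
import io

import pandas as pd

df = pd.DataFrame({"Age": [22, 35, 58], "Sex": ["male", "male", "female"]})

# Write without the row labels, then read the text back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
print(list(restored.columns))  # ['Age', 'Sex']

# With the default index=True, the row labels become an extra column.
buffer_with_index = io.StringIO()
df.to_csv(buffer_with_index)
buffer_with_index.seek(0)
with_index = pd.read_csv(buffer_with_index)
print(list(with_index.columns))  # ['Unnamed: 0', 'Age', 'Sex']
```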

I’m interested in a technical summary of a `DataFrame`.

In [None]:
titanic.info()
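`info()` prints its report rather than returning it. The report lists the index range, each column with its non-null count and dtype, and the memory usage. As a minimal sketch on a made-up table with one missing age, the report can be captured as text through the `buf` parameter; `report` is an illustrative name.

```python
import io

import pandas as pd

df = pd.DataFrame(
    {
        "Age": [22.0, None, 58.0],  # one missing value
        "Sex": ["male", "male", "female"],
    }
)

# Redirect the printed report into a text buffer.
buffer = io.StringIO()
df.info(buf=buffer)
report = buffer.getvalue()

# The non-null count for Age is 2, which reveals the missing value.
print(report)
```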

In [None]:
titanic = pd.read_csv("data/titanic.csv")

# How do I select a subset of a `DataFrame`?

## How do I select specific columns from a `DataFrame`?

I’m interested in the age of the Titanic passengers.

In [None]:
ages = titanic["Age"]

Each column in a `DataFrame` is a `Series`. As a single column is selected, the returned object is a pandas `Series`. We can verify this by checking the type of the output:

In [None]:
type(titanic["Age"])

And have a look at the `shape` of the output:

In [None]:
titanic["Age"].shape

I’m interested in the age and sex of the Titanic passengers.

In [None]:
age_sex = titanic[["Age", "Sex"]]

In [None]:
age_sex.head()

The returned data type is a pandas `DataFrame`:

In [None]:
type(titanic[["Age", "Sex"]])

In [None]:
titanic[["Age", "Sex"]].shape
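The difference between single and double square brackets can be summarized in a short sketch on a three-row table. A column label in single brackets returns a one-dimensional `Series`; a list of labels in double brackets returns a two-dimensional `DataFrame`, even when the list holds only one label. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Braund", "Allen", "Bonnell"],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

single = df["Age"]            # one label -> Series
double = df[["Age", "Sex"]]   # list of labels -> DataFrame
one_col = df[["Age"]]         # list with one label -> still a DataFrame

print(type(single).__name__, single.shape)    # Series (3,)
print(type(double).__name__, double.shape)    # DataFrame (3, 2)
print(type(one_col).__name__, one_col.shape)  # DataFrame (3, 1)
```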