# Creation and Viewing

`pandas` provides two main data types for representing **sequence** and **tabular** datasets.

- `Series` is used for **sequence** (one-dimensional) data, similar to a 1-D NumPy array
- `DataFrame` is used for **tabular** (two-dimensional) data, similar to a 2-D NumPy array
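
To make the one-dimensional versus two-dimensional distinction concrete, here is a minimal sketch that wraps NumPy arrays in each type and inspects `ndim` and `shape`. The variable names `seq` and `table` are made up for this illustration.

```python
import numpy as np
import pandas as pd

# A Series wraps one-dimensional data and labels each element.
seq = pd.Series(np.array([10, 20, 30]))

# A DataFrame wraps two-dimensional data and labels both rows and columns.
table = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))

print(seq.ndim, seq.shape)      # 1 (3,)
print(table.ndim, table.shape)  # 2 (2, 3)
```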

In [1]:
import pandas as pd
import numpy as np

# Creation

First, we need to know how to create instances of these two data types.

## Series

Here are four ways to create a `pd.Series`:

1. from a Python list
2. from a NumPy array
3. from a dict (the keys become the index)
4. from a single scalar value (int, str, etc.), which is repeated for every index label

In [2]:
pd.Series([1, 2, 3, 4, 5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [3]:
pd.Series(np.arange(1, 6), index=list("abcde"))

a    1
b    2
c    3
d    4
e    5
dtype: int32

In [4]:
dic = {"a": 100, "b": 50, "c": 120}
pd.Series(dic)

a    100
b     50
c    120
dtype: int64

In [5]:
pd.Series("hi", index=list("12345"))

1    hi
2    hi
3    hi
4    hi
5    hi
dtype: object
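
The dict and scalar cases above can also be combined with an explicit `index`. The sketch below, using made-up names `prices`, `picked` and `filled`, shows that with a dict the `index` argument selects and reorders entries by label, and that a label missing from the dict becomes `NaN`.

```python
import pandas as pd

prices = {"a": 100, "b": 50, "c": 120}

# With a dict, index= selects and reorders by label.
# The label "z" is not in the dict, so its value is NaN.
picked = pd.Series(prices, index=["c", "a", "z"])
print(picked)

# With a scalar, the value is repeated once per index label.
filled = pd.Series(0.5, index=["x", "y"])
print(filled)
```

Because `NaN` is a floating-point value, `picked` is stored as `float64` even though the dict holds integers.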

## DataFrame

There are several ways to initialize a `DataFrame`. Along with the data, you can optionally pass the `index` (row labels) and `columns` (column labels) arguments.

`DataFrame` accepts many kinds of input, including:

- A dict of 1-D arrays, lists, or Series (**creates the data column-wise**)
- A list of dicts (**creates the data row-wise**)
- 2-D arrays
- A list of namedtuples
- Another DataFrame
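
The cells in this section cover the dict, list-of-dicts, and 2-D array inputs. As a rough sketch of the remaining two, the example below builds a DataFrame from a list of namedtuples and then from another DataFrame. `Point`, `points` and `only_x` are hypothetical names used only here.

```python
from collections import namedtuple

import pandas as pd

# Point is a hypothetical record type for this example.
Point = namedtuple("Point", ["x", "y"])

# A list of namedtuples: each tuple becomes a row and the
# field names become the column labels.
points = pd.DataFrame([Point(0, 0), Point(1, 3), Point(2, 5)])
print(points)

# Another DataFrame: the constructor can select a subset of columns.
# copy=True requests a result that is independent of the original.
only_x = pd.DataFrame(points, columns=["x"], copy=True)
only_x.loc[0, "x"] = 99
print(points.loc[0, "x"])  # still 0
```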


In [57]:
pd.DataFrame({
    "col_1": [1, 2, 3, 4, 5],
    "col_2": np.arange(1, 6),
    "col_3": pd.Series(np.arange(1, 7), index=list("abc123")),
}, index=list("abcde"))

   col_1  col_2  col_3
a      1      1    1.0
b      2      2    2.0
c      3      3    3.0
d      4      4    NaN
e      5      5    NaN
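
In the output above, `col_3` was passed as a Series with its own index (`a, b, c, 1, 2, 3`), so it was aligned to the DataFrame's index by label rather than by position. A minimal sketch of the same alignment done explicitly with `Series.reindex` (the name `aligned` is made up for this example):

```python
import numpy as np
import pandas as pd

col_3 = pd.Series(np.arange(1, 7), index=list("abc123"))

# reindex does explicitly what the DataFrame constructor did implicitly:
# keep labels a, b, c; drop labels 1, 2, 3; add d and e as missing values.
aligned = col_3.reindex(list("abcde"))
print(aligned)

# The values become float64 because NaN cannot be stored
# in a plain NumPy integer array.
print(aligned.dtype)
```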


In [58]:
pd.DataFrame(
    [
        {"a": 1, "b": 2},
        {"b": 10, "c": 5},
        {"a": 55, "b": 489, "c": 32, "d": 590},
    ],
    index=["first", "second", "third"],
    columns=list("ab")
)

           a    b
first    1.0    2
second   NaN   10
third   55.0  489
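
The `columns` argument in the cell above acts as a filter on the dict keys. As a small sketch with made-up names (`rows`, `full`, `subset`): without `columns`, the result contains every key that appears in any of the dicts, and keys missing from a given row become `NaN`.

```python
import pandas as pd

rows = [
    {"a": 1, "b": 2},
    {"b": 10, "c": 5},
]

# Without columns=, every key seen in any dict becomes a column.
full = pd.DataFrame(rows)
print(full)

# With columns=, only the listed keys are kept, in the given order.
subset = pd.DataFrame(rows, columns=["c", "b"])
print(subset)
```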


In [59]:
dates = pd.date_range("20200101", periods=2)
pd.DataFrame(
    np.arange(10).reshape(2, 5),
    # [[0,1,2,3,4], [5,6,7,8,9]]
    index=dates,
    columns=list("abcde"))

            a  b  c  d  e
2020-01-01  0  1  2  3  4
2020-01-02  5  6  7  8  9


In [60]:
# Multiindexed DataFrame
pd.DataFrame({
     ("a", "b"): {("A", "B"): 1, ("A", "C"): 2},
     ("a", "a"): {("A", "C"): 3, ("A", "B"): 4},
     ("a", "c"): {("A", "B"): 5, ("A", "C"): 6},
     ("b", "a"): {("A", "C"): 7, ("A", "B"): 8},
     ("b", "b"): {("A", "D"): 9, ("A", "B"): 10},
})

       a              b
       b    a    c    a     b
A B  1.0  4.0  5.0  8.0  10.0
  C  2.0  3.0  6.0  7.0   NaN
  D  NaN  NaN  NaN  NaN   9.0


# Viewing

The names of the methods and attributes below describe what they do:

- `head(i)`: return the first `i` rows of the DataFrame (5 by default)
- `tail(i)`: return the last `i` rows of the DataFrame (5 by default)
- `df.index`: the row labels of the DataFrame, as an `Index` object
- `df.columns`: the column labels of the DataFrame, as an `Index` object
- `to_numpy()`: return the data of the DataFrame as a NumPy array
- `sort_index()`: return a copy of the DataFrame sorted by its index
- `sort_values(by=column_name)`: return a copy of the DataFrame sorted by the values in the given column

In [87]:
df = pd.DataFrame({
    "col_1": pd.Series([1,2,3,4], index=list("abcd")),
    "col_2": np.arange(10, 0, -1),
    "col_3": np.random.default_rng(42).integers(0,1000,10)
}, index=list("abcdefghij"))

In [88]:
df.head(2)

   col_1  col_2  col_3
a    1.0     10     89
b    2.0      9    773


In [89]:
df.tail(3)

   col_1  col_2  col_3
h    NaN      3    697
i    NaN      2    201
j    NaN      1     94


In [101]:
print(df.index)
print(df.columns)

Index(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'], dtype='object')
Index(['col_1', 'col_2', 'col_3'], dtype='object')


In [91]:
df.to_numpy()

array([[  1.,  10.,  89.],
       [  2.,   9., 773.],
       [  3.,   8., 654.],
       [  4.,   7., 438.],
       [ nan,   6., 433.],
       [ nan,   5., 858.],
       [ nan,   4.,  85.],
       [ nan,   3., 697.],
       [ nan,   2., 201.],
       [ nan,   1.,  94.]])
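
Every value in the array above is a float, even though `col_2` and `col_3` hold integers: a NumPy array has a single dtype, so `to_numpy()` picks one that can represent all columns. A minimal sketch of that behaviour, using a made-up DataFrame named `mixed`:

```python
import numpy as np
import pandas as pd

mixed = pd.DataFrame({
    "ints": [1, 2, 3],
    "floats": [0.5, np.nan, 2.5],
})

# Each column keeps its own dtype inside the DataFrame.
print(mixed.dtypes)

# The converted array uses one dtype that can hold both columns.
arr = mixed.to_numpy()
print(arr.dtype)  # float64

# Selecting only the integer column first keeps an integer dtype.
print(mixed[["ints"]].to_numpy().dtype)
```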

In [95]:
df.sort_index()

   col_1  col_2  col_3
a    1.0     10     89
b    2.0      9    773
c    3.0      8    654
d    4.0      7    438
e    NaN      6    433
f    NaN      5    858
g    NaN      4     85
h    NaN      3    697
i    NaN      2    201
j    NaN      1     94


In [96]:
df.sort_values("col_3")

   col_1  col_2  col_3
g    NaN      4     85
a    1.0     10     89
j    NaN      1     94
i    NaN      2    201
e    NaN      6    433
d    4.0      7    438
c    3.0      8    654
h    NaN      3    697
b    2.0      9    773
f    NaN      5    858
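
Both sorting methods return a new DataFrame and leave the original untouched. The sketch below checks this on a small made-up DataFrame named `scores`, and also shows the `ascending=False` option for sorting in descending order.

```python
import pandas as pd

scores = pd.DataFrame(
    {"score": [95, 70, 82]},
    index=["c", "a", "b"],
)

by_label = scores.sort_index()                           # rows ordered by label
by_score = scores.sort_values("score", ascending=False)  # highest score first

print(by_label.index.tolist())  # ['a', 'b', 'c']
print(by_score.index.tolist())  # ['c', 'b', 'a']

# The original keeps its row order because both calls returned copies.
print(scores.index.tolist())    # ['c', 'a', 'b']
```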


# Reference

- https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html#object-creation
- https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html#intro-to-data-structures