(**You can also open this notebook in Google Colab**)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/xiangshiyin/data-programming-with-python/blob/main/2023-fall/2023-09-19/notebook/code_demo.ipynb)

# Python basics - additional topics

## Class and Objects in Python

In general:
- A `class` is a blueprint for declaring and creating objects.
- An `object` is an instance of a class: it holds its own attribute values and can call the methods defined in the class.
- A class defines a set of `attributes` (a.k.a. properties, the data) and `methods` (functions defined inside the class) that every object of that class will have.

### Create a class in Python

In [None]:
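class table:
    def __init__(self, l, w, h):
        # __init__ runs when a new object is created; `self` is that new object
        self.l = l  # length
        self.w = w  # width
        self.h = h  # height
        self.has_a_flat_top = True  # same default value for every table
    
    def hold_weight(self, weight):
        # the f prefix is needed so that {weight} is replaced by its value
        print(f'Holding a weight of {weight} kg')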

### Create an object out of a class
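
A minimal sketch of creating objects: calling the class like a function runs `__init__` and returns a new object. The class is repeated here so the cell runs on its own, and the dimensions (`120`, `60`, `75`, and so on) are made-up values for illustration.

```python
class table:
    # same blueprint as above, repeated so this cell is self-contained
    def __init__(self, l, w, h):
        self.l = l
        self.w = w
        self.h = h
        self.has_a_flat_top = True

    def hold_weight(self, weight):
        print(f'Holding a weight of {weight} kg')

# each call creates a separate object with its own attribute values
table_1 = table(l=120, w=60, h=75)
table_2 = table(l=80, w=80, h=70)

print(type(table_1).__name__)  # table
print(table_1.l, table_2.l)    # 120 80
```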

### Access the attributes and methods of an object
Given an object `table_1` of class `table`, you can access its `attributes` and call its `methods` with dot notation:
```python
table_1.l
table_1.w
table_1.h
table_1.has_a_flat_top
table_1.hold_weight(weight=10)
```

### Class inheritance
In Python, `class inheritance` is a mechanism by which a new class can be created from an existing class, inheriting its attributes and methods. The new class is called a `subclass` or `derived class`, while the existing class is called the `superclass` or `base class`.

To create a subclass in Python, define a new class that inherits from the superclass using the syntax `class Subclass(Superclass):`. In the example below, `Dog` and `Cat` inherit `__init__` and `speak` from `Animal`.

```mermaid
flowchart TD
    animal-->dog
    animal-->cat
```

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        pass

class Dog(Animal):
    pass

class Cat(Animal):
    pass

```
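
The subclasses above inherit everything from `Animal` unchanged. A slightly fuller sketch, with made-up names, shows the two things subclasses usually do: override a method (`speak`) and extend `__init__` while reusing the parent's version through `super()`.

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        # generic fallback; subclasses are expected to override it
        return '...'

class Dog(Animal):
    # inherits __init__ from Animal, overrides speak
    def speak(self):
        return f'{self.name} says Woof'

class Cat(Animal):
    def __init__(self, name, indoor=True):
        super().__init__(name)  # let Animal set self.name
        self.indoor = indoor    # extra attribute that only Cat has

    def speak(self):
        return f'{self.name} says Meow'

pets = [Dog('Rex'), Cat('Tom'), Animal('Generic')]
for pet in pets:
    print(pet.speak())  # the method that runs depends on the object's class

print(isinstance(pets[0], Animal))  # True: a Dog is also an Animal
```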

## Files and `I/O`
* Major tool/function: `open(file, mode='r')` (https://docs.python.org/3/library/functions.html#open)
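* The default mode is `'r'` (open for reading text, a synonym of `'rt'`). The mode characters can be combined (e.g. `'rb'`, `'a+'`):

    | Character | Meaning                                                         |
    |-----------|-----------------------------------------------------------------|
    | 'r'       | open for reading (default)                                      |
    | 'w'       | open for writing, truncating the file first                     |
    | 'x'       | open for exclusive creation, failing if the file already exists |
    | 'a'       | open for writing, appending to the end of the file if it exists |
    | 'b'       | binary mode                                                     |
    | 't'       | text mode (default)                                             |
    | '+'       | open for updating (reading and writing)                         |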
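
A minimal sketch of the difference between text and binary mode, assuming you can write to a temporary directory (the file name `demo.bin` is arbitrary): in binary mode you write and read `bytes` rather than `str`.

```python
import os
import tempfile

# a throwaway directory, so the example does not touch your own files
path = os.path.join(tempfile.mkdtemp(), 'demo.bin')

data = bytes([0, 1, 2, 255])  # raw bytes, not text

# 'wb' = write + binary: write() expects bytes
with open(path, 'wb') as fw:
    fw.write(data)

# 'rb' = read + binary: read() returns bytes, with no text decoding
with open(path, 'rb') as fr:
    loaded = fr.read()

print(loaded)        # b'\x00\x01\x02\xff'
print(type(loaded))  # <class 'bytes'>
```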

**read**

In [None]:
## Read from a file
filename = 'test-read.txt'
fr = open(filename, 'r')  # create a file handle
lines = fr.readlines()    # read all lines into a list of strings (each keeps its trailing '\n')
fr.close()                # release the file handle

In [None]:
## A `with` block closes the file handle automatically, even if an error occurs

with open('test-read.txt','r') as fr:
    for line in fr.readlines():
        print(line, end='')  # each line already ends with '\n'

In [None]:
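## Iterating over the file handle reads one line at a time (memory-friendly for large files)
with open('test-read.txt','r') as fr:
    for line in fr:
        print(line, end='')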

**write**

In [None]:
## open file in 'w' mode
fw = open('test-write-1.txt','w')
fw.write('this is a test')
fw.close()

In [None]:
with open('test-write-1.txt','r') as fr:
    for line in fr:
        print(line)

In [None]:
## Write new content to a file
with open('test-write.txt','w') as fw:
    for i in range(6,11):
        fw.write(f'this is line {i}\n')

In [None]:
## Writing again with 'w' does NOT append: the file is truncated first, so lines 6-10 are lost
with open('test-write.txt','w') as fw:
    for i in range(4,7):
        fw.write(f'this is line {i}\n')

**append** - the correct way

In [None]:
## Append to the end of an existing file with mode 'a'
with open('test-write.txt','a') as fa:
    fa.write('this is a new line\n')
    for i in range(7,10):
        fa.write(f'this is line {i}\n')
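
To make the difference between `'w'` and `'a'` concrete, here is a small self-contained sketch that works on a throwaway file (the name `demo.txt` is arbitrary) and reads the result back:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# 'w' creates the file (or truncates an existing one) and writes from the start
with open(path, 'w') as fw:
    for i in range(1, 4):
        fw.write(f'this is line {i}\n')

# opening with 'w' again wipes lines 1-3 before writing
with open(path, 'w') as fw:
    fw.write('this is line 4\n')

# 'a' keeps the existing content and writes at the end
with open(path, 'a') as fa:
    fa.write('this is line 5\n')

with open(path) as fr:
    lines = [line.rstrip('\n') for line in fr]

print(lines)  # ['this is line 4', 'this is line 5']
```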

## Library import in depth
### A simple Python package
Assume we have a package with the following file layout
```md
└── sample_package
    ├── sample.py
    └── subpackage
        └── subsample.py
```
The content of `sample.py`:
```python
x = 123
y = 234

def hello():
    print('Hello World')
```

The content of `subsample.py`
```python
xx = 1
yy = 2
```

### Things can get more complicated
![](../pics/library_tree.png)

***You could***
* `import` the whole library, by `import a`
* `import` a module (Python script), by `import a.aa`
* `import` an object (variable, function, class, etc.) from a module, by `from a.aa import aaa` (note that `import a.aa.aaa` only works when `aaa` is itself a module, not a variable, function, or class)


**However**, after a plain `import` statement your program must refer to the object by its full dotted name (e.g. `a.aa.aaa`). **Sometimes this is quite inconvenient**, because the dotted name can get long when the library has a deeply nested file structure.

**There are two ways** to solve the problem:
* `from a import aa` (use the `from` statement to reference the complicated folder relationships)
* `import a.aa as aa` (create an alias)
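
The hypothetical library `a` is not available to run, so this sketch uses the standard library as a stand-in: the package `os` contains the module `os.path`, which defines the function `join`. Every import style binds a name to the same object; only the name you type afterwards changes.

```python
import os.path            # must be used as os.path.join afterwards
from os import path       # binds the submodule to the short name `path`
import os.path as osp     # binds the same submodule to an alias of your choice
from os.path import join  # binds one object (a function) directly

# all four spellings reach the very same function object
print(os.path.join is path.join is osp.join is join)  # True
print(join('data', 'file.csv'))  # separator depends on the operating system
```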

In [None]:
%%sh

tree sample_package

In [None]:
from sample_package.sample import hello
hello()

In [None]:
from sample_package.subpackage.subsample import xx

In [None]:
xx

# Numpy recap

## import `numpy`

In [None]:
import numpy as np

In [None]:
# import numpy

## Create numpy arrays

In [None]:
# create a numpy array out of a list
aList = [1,2,3,4]
aNumpyArray = np.array(aList)

In [None]:
aNumpyArray

In [None]:
type(aNumpyArray)

In [None]:
# check the dimension of the numpy array
aNumpyArray.ndim

In [None]:
# the shape of a numpy array
aNumpyArray.shape

In [None]:
len(aNumpyArray)

In [None]:
# get the length (L2 / Euclidean norm) of a vector: sqrt(3**2 + 4**2) = 5
bList = [3,4]
bNumpyArray = np.array(bList)
np.linalg.norm(bNumpyArray)

In [None]:
# a 2D example
aNumpyArray = np.array([2, 0, 0, 2]).reshape(2,2)

In [None]:
aNumpyArray

In [None]:
aNumpyArray.ndim

In [None]:
aNumpyArray.shape

In [None]:
## get the inverse of the 2x2 matrix
np.linalg.inv(aNumpyArray)
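
Two quick checks on the inverse, as a sketch: multiplying a matrix by its inverse should give the identity (up to floating-point error), and a singular matrix, whose rows are linearly dependent, has no inverse, so NumPy raises `np.linalg.LinAlgError`.

```python
import numpy as np

A = np.array([2, 0, 0, 2]).reshape(2, 2)  # the same 2x2 matrix as above
A_inv = np.linalg.inv(A)

# A @ A_inv should be the identity; allclose tolerates tiny rounding errors
print(np.allclose(A @ A_inv, np.eye(2)))  # True

# second row = 2 * first row, so S is singular
S = np.array([[1, 2], [2, 4]])
singular = False
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as err:
    singular = True
    print('Cannot invert S:', err)
```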

## Operations on numpy arrays

In [None]:
aNumpyArray.T

In [None]:
a = np.array(aList).reshape(2,2)
b = np.eye(2)

In [None]:
a

In [None]:
b

In [None]:
a.dot(b)
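
A common pitfall, sketched below: `a.dot(b)` and the `@` operator compute the matrix product, while `*` multiplies the two arrays element by element.

```python
import numpy as np

a = np.array([1, 2, 3, 4]).reshape(2, 2)
b = np.eye(2)  # 2x2 identity matrix (floats)

# matrix product: a.dot(b) and a @ b give the same result for 2D arrays
print(np.array_equal(a.dot(b), a @ b))  # True
print(a @ b)  # equals a, because b is the identity

# element-wise product: b is zero off the diagonal, so those entries become 0
print(a * b)
```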

## Generate random numbers with `numpy`

In [None]:
np.random.rand(3)

In [None]:
np.random.randn(2,2)

In [None]:
np.random.randint(low=0, high=10, size=100)
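
The calls above give different numbers every time the cell runs. As a sketch of how to make them reproducible, NumPy (1.17 and later) provides a seeded `Generator` via `np.random.default_rng`; the seed `42` is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

u = rng.random(3)                           # like np.random.rand(3): uniform on [0, 1)
z = rng.standard_normal((2, 2))             # like np.random.randn(2, 2): standard normal
k = rng.integers(low=0, high=10, size=100)  # like np.random.randint: high is exclusive

# a new generator with the same seed reproduces the same draws
u_again = np.random.default_rng(seed=42).random(3)
print(np.array_equal(u, u_again))  # True
```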

## Example

### `axis` in numpy array

### `np.sum()` - [[*official doc*](https://numpy.org/doc/stable/reference/generated/numpy.sum.html#numpy.sum), [*how does `axis` work in numpy*](https://stackoverflow.com/questions/22320534/how-does-the-axis-parameter-from-numpy-work)]

#### For 1D array (or vector)

In [None]:
vector = np.array([1,2,3])
np.sum(vector)

In [None]:
vector.sum()

#### For 2D array (or matrix)
- `axis = 0` is equivalent to $\sum_{i}{A_{ij}}$
- `axis = 1` is equivalent to $\sum_{j}{A_{ij}}$
- `axis = None` (default) is equivalent to $\sum_{i,j}{A_{ij}}$
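
The formulas can be checked by writing the sums as explicit loops. In this sketch the axis you pass is the index that gets summed over, and therefore the dimension that disappears from the result.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
n_rows, n_cols = A.shape

# axis=0: for each column j, add A[i, j] over all rows i
col_sums = [int(sum(A[i, j] for i in range(n_rows))) for j in range(n_cols)]
# axis=1: for each row i, add A[i, j] over all columns j
row_sums = [int(sum(A[i, j] for j in range(n_cols))) for i in range(n_rows)]

print(col_sums, A.sum(axis=0).tolist())  # [5, 7, 9] [5, 7, 9]
print(row_sums, A.sum(axis=1).tolist())  # [6, 15] [6, 15]
```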

In [None]:
matrix = np.array([
    [1,2,3],
    [4,5,6]
])

In [None]:
matrix.sum(axis=0)

In [None]:
matrix.sum(axis=1)

In [None]:
matrix.sum()

### The `length` - the $L_2$ norm
#### For 1D vector

In [None]:
vector = np.array([3,4])
np.linalg.norm(vector)

#### For 2D array
- `axis = 0` is equivalent to $\sqrt{\sum_{i}|A_{ij}|^2}$
- `axis = 1` is equivalent to $\sqrt{\sum_{j}|A_{ij}|^2}$
- `axis = None` (default) is equivalent to $\sqrt{\sum_{i,j}|A_{ij}|^2}$
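
As a sketch, the three formulas above can be written out with element-wise squaring, `sum` and `np.sqrt`, and compared with `np.linalg.norm`. For a 2D array with `axis=None`, `np.linalg.norm` returns the Frobenius norm, which is exactly the square root of the sum of all squared entries.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]], dtype=float)

squares = np.abs(A) ** 2
manual_axis0 = np.sqrt(squares.sum(axis=0))  # one value per column j
manual_axis1 = np.sqrt(squares.sum(axis=1))  # one value per row i
manual_all = np.sqrt(squares.sum())          # one value for the whole array

print(np.allclose(manual_axis0, np.linalg.norm(A, axis=0)))  # True
print(np.allclose(manual_axis1, np.linalg.norm(A, axis=1)))  # True
print(np.isclose(manual_all, np.linalg.norm(A)))             # True
```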

In [None]:
# Calculating the norm of a 2D array (matrix)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Calculate the L2 (Euclidean) norm along different axes
l2_norm_axis_0 = np.linalg.norm(matrix, axis=0)  # Calculate along columns (axis=0)
l2_norm_axis_1 = np.linalg.norm(matrix, axis=1)  # Calculate along rows (axis=1)
l2_norm_axis_none = np.linalg.norm(matrix) # Calculate on the flattened array

print("L2 Norm along columns (axis=0):", l2_norm_axis_0)
print("L2 Norm along rows (axis=1):", l2_norm_axis_1)
print("L2 Norm of the flattened array (axis=None):", l2_norm_axis_none)


#### For 3D array
- `axis = 0` is equivalent to $\sqrt{\sum_{i}|A_{ijk}|^2}$
- `axis = 1` is equivalent to $\sqrt{\sum_{j}|A_{ijk}|^2}$
- `axis = 2` is equivalent to $\sqrt{\sum_{k}|A_{ijk}|^2}$
- `axis = None` (default) is equivalent to $\sqrt{\sum_{i,j,k}|A_{ijk}|^2}$

In [None]:
import numpy as np

# Create a 3D array (3x4x2)
array_3d = np.array([
    [
        [1, 2],
        [3, 4],
        [5, 6],
        [7, 8]
    ],
    [
        [9, 10],
        [11, 12],
        [13, 14],
        [15, 16]
    ],
    [
        [17, 18],
        [19, 20],
        [21, 22],
        [23, 24]
    ]
])

print(f"The shape of the 3D array: {array_3d.shape}")

# Calculate the L2 norm along different axes
l2_norm_axis_0 = np.linalg.norm(array_3d, axis=0)  # Calculate along the first dimension (axis=0)
l2_norm_axis_1 = np.linalg.norm(array_3d, axis=1)  # Calculate along the second dimension (axis=1)
l2_norm_axis_2 = np.linalg.norm(array_3d, axis=2)  # Calculate along the third dimension (axis=2)

print("L2 Norm along the first dimension (axis=0):")
print(l2_norm_axis_0)

print("\nL2 Norm along the second dimension (axis=1):")
print(l2_norm_axis_1)

print("\nL2 Norm along the third dimension (axis=2):")
print(l2_norm_axis_2)





### Nearest neighbor search

Euclidean distance between 2 points $(x_1,y_1,z_1)$ and $(x_2,y_2,z_2)$ is:
$$\sqrt{(x_2-x_1)^2+(y_2-y_1)^2+(z_2-z_1)^2}$$
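
The formula generalizes to any number of coordinates. A small helper (the name `euclidean_distance` is ours, chosen for illustration) makes that explicit; Python 3.8+ also ships `math.dist`, which computes the same quantity.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points given as equal-length sequences."""
    if len(p) != len(q):
        raise ValueError('both points need the same number of coordinates')
    # square each coordinate difference, add them up, take the square root
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))

print(euclidean_distance([4, 5, 3], [3, 4, 4]))  # sqrt(1 + 1 + 1), about 1.732
print(math.dist([4, 5, 3], [3, 4, 4]))           # same value from the standard library
```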

In [None]:
### Pure iterative Python ###
points = [[9,2,8],[4,7,2],[3,4,4],[5,6,9],[5,0,7],[8,2,7],[0,3,2],[7,3,0],[6,1,1],[2,9,6]]
target = [4,5,3]

shortest_distance = float('inf')  # any real distance will be smaller
nearest_neighbor = []
x0, y0, z0 = target
for point in points:
    x, y, z = point
    # Euclidean distance between this point and the target
    d = ((x-x0)**2 + (y-y0)**2 + (z-z0)**2) ** 0.5
    # keep the point if it is at least as close as the best one found so far
    if d <= shortest_distance:
        shortest_distance = d
        nearest_neighbor = [x, y, z]

print(f'The shortest distance to the target is {shortest_distance}')
print(f'The nearest neighbor is {nearest_neighbor}')

In [None]:
# # # Equivalent NumPy vectorization # # #
import numpy as np
points = np.array([[9,2,8],[4,7,2],[3,4,4],[5,6,9],[5,0,7],[8,2,7],[0,3,2],[7,3,0],[6,1,1],[2,9,6]])
target = np.array([4,5,3]).reshape(1,3)  # shape (1, 3) broadcasts against points of shape (10, 3)
distances = np.linalg.norm(points-target,axis=1)  # compute all 10 Euclidean distances at once
minIdx = np.argmin(distances)  # index of the smallest distance
print(f'The shortest distance to the target is {distances[minIdx]}')
print(f'The nearest neighbor is {points[minIdx]}')

# Pandas

* `pandas` is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.
* It is included in the installation of the Anaconda distribution
* When working with tabular data, such as data stored in spreadsheets or databases, pandas is the right tool for you. pandas will help you to explore, clean and process your data. In pandas, a data table is called a `DataFrame`.

<img align="center" src="../pics/dataframe-structure.png" style="height:300px;">


## Import the core libraries

In [None]:
import pandas as pd

import numpy as np
import matplotlib.pyplot as plt

## Important data structures - `Series` and `DataFrame`

### `Series`
A `Series` is a one-dimensional, `array-like` object containing a sequence of values of a single type (similar to NumPy dtypes) and an associated array of data labels, called its `index`.
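
Two properties of the index, sketched with made-up fruit prices: a `Series` built from a dict uses the keys as index labels, and arithmetic between two `Series` lines values up by label rather than by position, filling labels that exist on only one side with `NaN`.

```python
import pandas as pd

# dict keys become the index labels
prices = pd.Series({'apple': 1.5, 'banana': 0.5, 'cherry': 3.0})
print(prices.index.tolist())  # ['apple', 'banana', 'cherry']

# made-up quantities; note the different order and the missing 'banana'
qty = pd.Series({'cherry': 2, 'apple': 10})

# values are matched by label, not by position
total = prices * qty
print(total)
```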

In [None]:
x = pd.Series([1,2,3,4])
x

In [None]:
# the array part
x.array

In [None]:
# the index part
x.index

In [None]:
y = pd.Series([1,3,5,7,9],index=['a','b','c','d','e'])
y

In [None]:
type(y)

In [None]:
y['a']

In [None]:
# a Series is mutable: values can be changed in place
y['c'] = 11

In [None]:
y

**Just like 1D numpy arrays ...**

In [None]:
y.ndim

In [None]:
y.shape

A `Series` can also be converted to a dictionary, with the index labels as keys:

In [None]:
y.to_dict()

### `DataFrame`

#### Create `dataframe` from raw data

In [None]:
import pandas as pd

In [None]:
# create df from a dictionary
x = {
    'A':[1,2,'a',4],
    'B':np.arange(5,9),
    'C':['abc','def','ghi','jkl']
}

df1 = pd.DataFrame(x)

In [None]:
df1

In [None]:
# create df from a list
y = [
    ['a','b','c'],
    ['d','e','f']
]

df2 = pd.DataFrame(y, columns=['col1','col2','col3'])
df2

In [None]:
# create df with fancier settings
z = {
    'A': 1.,
    'B': pd.Timestamp('20130102'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(["test", "train", "test", "train"]),
    'F': 'foo'
}
df3 = pd.DataFrame(z) 

In [None]:
df3

#### Create `dataframe` from text file

In [None]:
df = pd.read_csv('../data/imf-gdp-per-capita-2015.csv')
df

In [None]:
df = pd.read_csv('../data/imf-gdp-per-capita-2015.csv',sep=',',header=0)

In [None]:
df.head(3)

**Just like in `numpy arrays`**

In [None]:
df.shape

In [None]:
df.ndim

**Something more ...**

In [None]:
df.dtypes

In [None]:
df.columns

In [None]:
list(df.columns)

In [None]:
df.info()

**A little reformatting: `thousands=','` parses numbers such as `1,234` as numeric values instead of strings**

In [None]:
df = pd.read_csv('../data/imf-gdp-per-capita-2015.csv',sep=',',header=0, thousands=',')
df.head(3)

In [None]:
df.info()

#### Create `dataframe` from excel spreadsheet

In [None]:
# pd.read_excel() # press shift + tab

In [None]:
## import from excel spreadsheet (need to have package `openpyxl` pre-installed)
df2 = pd.read_excel(io='../data/excel-test-file.xlsx', sheet_name='tab1', header=0)

df2.head(5)

In [None]:
df3 = pd.read_excel(io='../data/excel-test-file.xlsx',sheet_name='tab2',header=0)
df3.head(3)

## View `dataframe`

In [None]:
# create a dataframe from a numpy array, with columns labeled
df = pd.DataFrame(np.random.randn(6,4), columns = ['Ann', "Bob", "Charly", "Don"])
df

**df.head()**

In [None]:
df.head(2)

In [None]:
df.head()

**df.tail()**

In [None]:
df.tail(2)

In [None]:
df.tail()

**`dataframe` attributes**

In [None]:
type(df)

In [None]:
list(df.columns)

In [None]:
list(df.index)

In [None]:
df.ndim

In [None]:
df.shape

In [None]:
len(df)

In [None]:
df.dtypes

In [None]:
df.values # convert df to numpy array

In [None]:
df.values.shape

In [None]:
# you can also do
df.to_numpy()

In [None]:
df2.info()

**df.describe()**

In [None]:
df.describe() # generate descriptive stats on the data

**df.transpose()**

In [None]:
df

In [None]:
# transpose a dataframe

df.transpose()
# type(df.transpose())

In [None]:
df.T # you can also do it this way

**sort `dataframe`**

In [None]:
df

In [None]:
# sort_index(), by labels (index or column)
# df
df.sort_index(axis=0, ascending=False)

In [None]:
df.sort_index(axis=1, ascending=False)

In [None]:
# sort_values(), by values
df

In [None]:
df.sort_values(by='Ann', ascending=True)
# df.sort_values(by=['Ann','Bob'], ascending=True)

## Select `dataframe`

Pandas documentation on selecting and indexing data in a `dataframe`:
* https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing
* https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#advanced

### The different ways

| Type                  | Notes                                       |
|-----------------------|---------------------------------------------|
| `df[column]`          | Select by column labels                     |
| `df.loc[rows]`        | Select by row labels                        |
| `df.loc[:, cols]`     | Select by column labels                     |
| `df.loc[rows, cols]`  | Select by row and column labels             |
| `df.iloc[rows]`       | Select by row positional indices            |
| `df.iloc[:, cols]`    | Select by column positional indices         |
| `df.iloc[rows, cols]` | Select by row and column positional indices |
| `df.at[row, col]`     | Select an element by row and column labels  |
| `df.iat[row, col]`    | Select an element by row and column positions |
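
The table is easiest to see on a small frame whose row labels differ from the row positions. This sketch uses made-up labels `'w'` to `'z'`; note that `.loc` slices include the end label while `.iloc` slices exclude the end position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  index=['w', 'x', 'y', 'z'],
                  columns=['A', 'B', 'C'])

print(df.loc['x', 'B'])               # label-based: row 'x', column 'B'
print(df.iloc[1, 1])                  # position-based: 2nd row, 2nd column (same cell)
print(df.at['x', 'B'], df.iat[1, 1])  # fast single-cell access, same cell again

# slicing: .loc includes the end label, .iloc excludes the end position
print(df.loc['w':'y'].shape)  # (3, 3)
print(df.iloc[0:2].shape)     # (2, 3)
```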

In [None]:
df

### Single column vs. multiple columns

In [None]:
df['Ann']

In [None]:
type(df)

In [None]:
type(df['Ann'])

In [None]:
df.Ann

In [None]:
type(df.Ann)

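Selecting multiple columns with a list yields a new `dataframe` that contains only those columns. Note that this is a copy, not a view: changing it does not change the original `df`. Call `.copy()` explicitly if you plan to modify the result, to avoid pandas warnings.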

In [None]:
df[['Ann','Bob']]

In [None]:
type(df[['Ann','Bob']])

In [None]:
df[['Ann']]

### Select by labels
* Use the `.loc` indexer of a `dataframe` to select data by labels. The typical format is
```python
df.loc[row_indexer, column_indexer]
```
* More details can be found here: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing


In [None]:
df

In [None]:
df.index

In [None]:
# by row label
df.loc[0]

In [None]:
# by row and column label
df.loc[[0,1],['Ann','Bob']]

In [None]:
df.loc[0,['Ann','Bob']] # get a Series

In [None]:
df.loc[0:2,['Ann','Bob']] # label slices include both ends, so the row with index label 2 is included

In [None]:
# by column label only
df.loc[:,['Ann']] # note that you'll get a dataframe instead of a Series

In [None]:
# what if I just want to get the value of a particular cell?
df.loc[2,'Ann']

In [None]:
# you can also do
df.at[2,'Ann']

### Select by Position

* Use the `.iloc` indexer of a `dataframe` to select data by integer position. The typical format is
```python
df.iloc[row_position_indexer, column_position_indexer]
```
* More details can be found here: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing

In [None]:
df

In [None]:
# select by row position
df.iloc[0]

In [None]:
# select by row position range
df.iloc[0:2] # positional slices exclude the end, so only rows 0 and 1 are returned (unlike df.loc)

In [None]:
# you can also do
df.iloc[0:2,]

In [None]:
df.iloc[0:2,:]

In [None]:
# select by column position range
df.iloc[:,0:2]

In [None]:
# select by row and column position range
df.iloc[0:2,0:2]

In [None]:
# what if I just want to get the value of a particular cell?
df.iloc[0,0]

In [None]:
# you can also do
df.iat[0,0]

### Select by conditions

In [None]:
df

In [None]:
df[df.Ann>=0]

In [None]:
df.loc[df.Ann>=0,['Ann','Bob']]

In [None]:
df.loc[(df.Ann>=-0.5)&(df.Ann<=1.4),['Ann','Bob']]

## Set/change values - "mutable"

In [None]:
df

In [None]:
# add a new column
df['E'] = 5
df

In [None]:
df['F'] = np.arange(6)
df

In [None]:
# set values by labels (row label 2, column label 'E')
df.loc[2,'E'] = 3
# df.at[2,'E'] = 3
df

In [None]:
# set values by position
df.iloc[0,5] = -1
df

In [None]:
# set values by condition
df.loc[df.Ann>0,'E'] = 4
df

### `Reindex`
Create a new object with the values rearranged to align with the new index; labels that were not in the original index get missing values (`NaN`).

#### On `series`

In [None]:
x = pd.Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])

In [None]:
x

In [None]:
y = x.reindex(["a", "b", "c", "d", "e"])
y
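
Labels that did not exist in the original `Series` (`'e'` above) are filled with `NaN`. A sketch of the `fill_value` parameter, which supplies a different default, and of the fact that `reindex` leaves the original object untouched:

```python
import pandas as pd

x = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# fill_value replaces the default NaN for labels that x does not have
y = x.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
print(y)

# reindex returns a new Series; x keeps its original order
print(x.index.tolist())  # ['d', 'b', 'a', 'c']
```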

#### On `dataframe`

In [None]:
df = pd.DataFrame(
    np.arange(9).reshape(3,3),
    index=['a', 'c', 'd'],
    columns=['Ohio', 'Texas', 'California']
)

df

In [None]:
df2 = df.reindex(index=['a', 'b', 'c', 'd'])
df2

In [None]:
df3 = df.reindex(columns=['Texas', 'Utah', 'California'])
df3

## Missing values

`pandas` primarily uses the value `np.nan` to represent missing data. By default, it is skipped in computations such as `sum()` and `mean()`. See the [Missing Data section](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#missing-data) from `pandas` official documentation for more details.
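
A minimal sketch of how missing values are treated in computations: pandas reductions skip `NaN` by default (`skipna=True`), whereas plain NumPy functions propagate it unless you use the `nan`-aware variants such as `np.nanmean`.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# pandas reductions ignore the NaN
print(s.sum(), s.mean(), s.count())  # 4.0 2.0 2
# ask pandas to propagate NaN instead
print(s.mean(skipna=False))  # nan

# NumPy propagates NaN unless you use the nan-aware variant
print(np.mean(s.to_numpy()), np.nanmean(s.to_numpy()))  # nan 2.0
```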

Reindexing allows you to change/add/delete the index on a specified axis. This returns a copy of the data.

In [None]:
dates = pd.date_range(start='2020-08-25', end='2020-10-01', freq='7D')
dates

In [None]:
df1 = df.reindex(index=dates[:6],columns=list(df.columns)+['G'])
df1

In [None]:
# fill in values at some locations
df1.loc['2020-08-25':'2020-09-08','G'] = 1
df1

In [None]:
# to get the boolean mask where values are nan
df1.isna()

In [None]:
# you can also do
pd.isna(df1)

In [None]:
# drop any rows that have missing values
df2 = df1.copy()
df2.dropna(how='any')

In [None]:
df2 # unchanged: dropna() returns a new dataframe unless inplace=True is passed

In [None]:
# fill missing values
df1.fillna(value=-999)

## Operations on `dataframe`

**Stats**

In [None]:
df.describe()

In [None]:
df

In [None]:
# df.mean()
list(df.mean())

In [None]:
df.mean()

In [None]:
df.mean().values

In [None]:
df.mean(axis=0)

In [None]:
df.mean(axis=1)

**Histogram**

In [None]:
df

In [None]:
df['histcol'] = np.random.randint(0,3,size=3)
df

In [None]:
df.histcol.value_counts()

In [None]:
df.histcol.nunique()

In [None]:
df.histcol.unique()

In [None]:
# df.histcol.hist()
df.histcol.hist(density=True)

**Apply functions/logics to the data**

In [None]:
df

In [None]:
df.apply(np.cumsum) # apply the function to each column (cumulative sum down each column)

In [None]:
df.apply(lambda x: -x) # apply the function on all columns

In [None]:
df.California.map(lambda x: x+1) # apply the function element-wise to a single column

## `dataframe` and table operations

In [None]:
df = pd.DataFrame(np.random.randn(10, 4), columns=['a','b','c','d'])
df

**Concat**

In [None]:
pieces = [df[:3], df[7:]]
print("pieces:\n", pieces)
print("put back together:\n")
# pd.concat(pieces, axis=1)
pd.concat(pieces, axis=0)

**Joins**

More details at https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
![](joins.jpg)

In [None]:
tb1 = pd.DataFrame({'key': ['foo', 'boo', 'foo'], 'lval': [1, 2, 3]})
tb2 = pd.DataFrame({'key': ['foo', 'coo'], 'rval': [5, 6]})

In [None]:
tb1

In [None]:
tb2

In [None]:
pd.merge(tb1, tb2, on='key', how='inner')

In [None]:
pd.merge(tb1, tb2, on='key', how='left')

In [None]:
pd.merge(tb1, tb2, on='key', how='right')

In [None]:
pd.merge(tb1, tb2, on='key', how='outer')
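
To see which table each output row came from, `pd.merge` accepts `indicator=True`, which adds a `_merge` column with the values `both`, `left_only` or `right_only`. A sketch on the same two tables, plus the row count for each `how`:

```python
import pandas as pd

tb1 = pd.DataFrame({'key': ['foo', 'boo', 'foo'], 'lval': [1, 2, 3]})
tb2 = pd.DataFrame({'key': ['foo', 'coo'], 'rval': [5, 6]})

# the _merge column records where each row came from
out = pd.merge(tb1, tb2, on='key', how='outer', indicator=True)
print(out)

# 'foo' appears twice in tb1, so every join that keeps it yields two 'foo' rows
counts = {how: len(pd.merge(tb1, tb2, on='key', how=how))
          for how in ['inner', 'left', 'right', 'outer']}
print(counts)  # {'inner': 2, 'left': 3, 'right': 3, 'outer': 4}
```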

**Grouping**

By `group by` we are referring to a process involving one or more of the following steps:

* Splitting the data into groups based on some criteria
* Applying a function to each group independently
* Combining the results into a data structure

See the Grouping section from the `pandas` official documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html
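
The three steps can be spelled out by hand and compared with the one-line `groupby`. This sketch uses a small made-up frame; iterating over a `GroupBy` object yields `(key, sub-frame)` pairs, which is the split step.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo'],
                   'C': [1.0, 2.0, 3.0, 4.0, 8.0]})

# split: iterate over (key, sub-frame) pairs
# apply: take the mean of C inside each sub-frame
# combine: collect the per-group results in a dict
manual = {key: float(group['C'].mean()) for key, group in df.groupby('A')}

# the same three steps in one line
builtin = df.groupby('A')['C'].mean()

print(manual)             # {'bar': 3.0, 'foo': 4.0}
print(builtin.to_dict())  # same values
```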

In [None]:
df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C' : np.random.randn(8),
                   'D' : np.random.randn(8)})

df

In [None]:
df.groupby('A')['C'].mean().reset_index() # simple stats grouped by 1 column

In [None]:
df.groupby(['A','B']).sum().reset_index() # simple stats grouped by multiple columns

In [None]:
df.groupby(['A','B']).mean().reset_index() # simple stats grouped by multiple columns

In [None]:
# df.groupby(['A','B'])['C'].apply(lambda x: np.sum(x**2)).reset_index() # customized aggregation
df.groupby(['A','B'])['C'].apply(lambda x: np.sum(x)).reset_index() # customized aggregation

## Write/Export `dataframe` to files

**CSV file**

In [None]:
df

In [None]:
df.to_csv('../data/to-csv-test.csv',sep=',',header=True,index=False)
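
A round-trip sketch, writing to a temporary file (the name `roundtrip.csv` is arbitrary) and reading it back. With `index=False`, the row labels are not written, so the reloaded frame has no extra unnamed column and compares equal to the original.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b'], 'value': [1.5, 2.5]})
path = os.path.join(tempfile.mkdtemp(), 'roundtrip.csv')

# index=False leaves out the row labels 0, 1, ...
df.to_csv(path, sep=',', header=True, index=False)
loaded = pd.read_csv(path)

print(loaded.equals(df))  # True
```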

**Excel spreadsheet**

In [None]:
df.to_excel('../data/to-excel-test.xlsx',sheet_name='tab1',header=True,index=False) # writing .xlsx requires the openpyxl package