# Python CheatSheet for Data Science | By Frank Yue Ying

### Last updated on 2022-06-26

Topics: pandas, NumPy, os, Jupyter Notebook, Visualization

---

# Directory
| Task | Note | Package | External Link |
| :-- | :-- | :-- | :-- |
| [Iterate directory and scan files](#iteratedirectory) | | os | - |
| [pandas date conversion](#pandasdate) | | pandas | - |
| [pandas groupby](#pandasgroupby) | | pandas | - |
| [pandas print entire dataframe](#pandasprint) | | pandas | - |
| [Merge two Jupyter notebooks](#mergenotebook) | | nbformat | - |
| Use Python to update Google Sheet through API | Create a service credential on Google API first | *multiple* | [GoogleSheet_API.ipynb](./Learning/Other/GoogleSheet_API.ipynb#) |
| [Load .env file as environment variables into Python](#loadenv) | Create a .env file first | os, dotenv | - |
| [pandas lambda function](#lambdafunction) | | pandas | - |
| [Setting Seaborn parameters](#seabornpara) | Make plots look fancy | seaborn | - |

### Read through directory and scan files
<a id="iteratedirectory"></a>

In [None]:
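import os
import pandas as pd

# Folder that holds the raw Excel files
data_dir = r"C:\Users\yingy\Desktop\Discoverability\Raw_Data"
for f in os.listdir(data_dir):
    # Note: df is overwritten on every iteration, so process or store it inside the loop
    df = pd.read_excel(os.path.join(data_dir, f))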
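`os.listdir` returns every entry in the folder, including subfolders and non-Excel files, which `pd.read_excel` would fail on. A minimal sketch of filtering by extension before reading, using only the standard library; the helper name `list_files` and its default extensions are assumptions made for illustration.

```python
import os

def list_files(directory, extensions=(".xlsx", ".xls")):
    """Return sorted full paths of regular files in `directory` with a matching extension."""
    paths = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip subfolders and compare the extension case-insensitively
            if entry.is_file() and entry.name.lower().endswith(extensions):
                paths.append(entry.path)
    return sorted(paths)
```

Each returned path can then be passed to `pd.read_excel` in place of `os.path.join(Dir, f)`.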

### Convert a dataframe column of date strings (e.g. 2020-06-10) into week (24) and year (2020) numbers
<a id="pandasdate"></a>

In [None]:
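import pandas as pd
# Unparseable strings become NaT instead of raising an error
raw_dt['date'] = pd.to_datetime(raw_dt['date'], errors='coerce')
# Series.dt.week was removed in pandas 2.0; isocalendar() gives the ISO week number
raw_dt['week'] = raw_dt['date'].dt.isocalendar().week
raw_dt['year'] = raw_dt['date'].dt.year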
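One thing to watch: ISO week numbers belong to the ISO year, which can differ from the calendar year in the first and last days of a year. The sketch below uses made-up sample dates (not data from this notebook) to show the difference; the `iso_year` column name is an assumption added for illustration.

```python
import pandas as pd

# Sample data for illustration only
raw_dt = pd.DataFrame({"date": ["2020-06-10", "2021-01-01", "not a date"]})
raw_dt["date"] = pd.to_datetime(raw_dt["date"], errors="coerce")

iso = raw_dt["date"].dt.isocalendar()    # DataFrame with year, week, day columns
raw_dt["week"] = iso["week"]             # ISO week number
raw_dt["iso_year"] = iso["year"]         # year that the ISO week belongs to
raw_dt["year"] = raw_dt["date"].dt.year  # calendar year

print(raw_dt)
```

Here 2021-01-01 falls in ISO week 53 of ISO year 2020, so pairing `week` with the calendar `year` would mislabel it. Grouping by `iso_year` and `week` avoids that.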

### Pandas Groupby
<a id="pandasgroupby"></a>

In [None]:
# Groupby user id & week, show value as sum of amount column
dt1 = dt_2020.groupby(["user_id","week"]).agg({'amount':'sum'})
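The result above is indexed by a `(user_id, week)` MultiIndex. If flat columns are easier to work with downstream, a possible variant is `as_index=False` combined with named aggregation. The sample data and the `total_amount` column name below are assumptions for illustration.

```python
import pandas as pd

# Sample data for illustration only
dt_2020 = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "week": [24, 24, 25, 24],
    "amount": [10.0, 5.0, 2.0, 7.0],
})

# as_index=False keeps user_id and week as ordinary columns;
# named aggregation lets the output column get its own name
weekly = dt_2020.groupby(["user_id", "week"], as_index=False).agg(
    total_amount=("amount", "sum")
)
print(weekly)
```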

### Print entire dataframe
<a id="pandasprint"></a>

In [None]:
with pd.option_context('display.max_rows', None, 'display.max_columns', None):  # more options can be specified also
    print(dt1)

### Merge Jupyter Notebook Files
<a id="mergenotebook"></a>

In [None]:
import nbformat
# Reading the notebooks
first_notebook = nbformat.read('Airtable_Modelv1_Attributes_Frank.ipynb', as_version=4)
second_notebook = nbformat.read('Age of checking accounts in days.ipynb', as_version=4)
# Creating a new notebook
final_notebook = nbformat.v4.new_notebook(metadata=first_notebook.metadata)
# Concatenating the notebooks
final_notebook.cells = first_notebook.cells + second_notebook.cells
# Saving the new notebook 
nbformat.write(final_notebook, 'final_notebook.ipynb')

### Load .env file as environment variables into Python
<a id="loadenv"></a>

In [None]:
import os
from dotenv import load_dotenv

In [None]:
# load environment file and variables
load_dotenv("airtable.env")
AIRTABLE_TOKEN = os.getenv("Airtable_token")
AIRTABLE_BASE_ID = os.getenv("Airtable_base_id")
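`os.getenv` returns `None` when a variable is not set, so a misspelled key in the .env file only surfaces later as a confusing API error. A minimal sketch of failing early instead; `require_env` is a hypothetical helper, and the value set below is a dummy stand-in for what `load_dotenv` would load.

```python
import os

def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value

# Stand-in for load_dotenv("airtable.env"); this is a dummy value, not a real token
os.environ["Airtable_token"] = "dummy-token"
AIRTABLE_TOKEN = require_env("Airtable_token")
```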

### Pandas Lambda Function
<a id="lambdafunction"></a>

In [None]:
def to_sentiment(star_rating):
    if star_rating in {1, 2}: # negative
        return -1 
    if star_rating == 3:      # neutral
        return 0
    if star_rating in {4, 5}: # positive
        return 1

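# Transform star_rating into the sentiment (to_sentiment returns None for ratings outside 1-5).
# Passing to_sentiment directly to .apply() would be equivalent to the lambda.
df_transformed['sentiment'] = df_transformed['star_rating'].apply(
    lambda star_rating: to_sentiment(star_rating=star_rating))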
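For a fixed lookup like this one, `Series.map` with a dictionary is a possible alternative to `apply` with a function. The sketch below uses sample ratings for illustration; `SENTIMENT_BY_RATING` is a name introduced here, not part of the original notebook.

```python
import pandas as pd

# Same rule as to_sentiment, expressed as a lookup table
SENTIMENT_BY_RATING = {1: -1, 2: -1, 3: 0, 4: 1, 5: 1}

# Sample data for illustration only
df_transformed = pd.DataFrame({"star_rating": [1, 3, 5, 2, 4]})
df_transformed["sentiment"] = df_transformed["star_rating"].map(SENTIMENT_BY_RATING)
print(df_transformed)
```

Ratings that are missing from the dictionary come out as `NaN`, which makes unexpected values easy to spot with `isna()`.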

### Setting Seaborn parameters
<a id="seabornpara"></a>

In [None]:
import seaborn as sns
# Pass the style to sns.set() so it is applied together with the rc overrides.
# (Assigning to sns.set_style would overwrite the function instead of calling it.)
sns.set(style="whitegrid",
        rc={"font.style": "normal",
            "axes.facecolor": "white",
            "grid.color": ".8",
            "grid.linestyle": "-",
            "figure.facecolor": "white",
            "figure.titlesize": 20,
            "text.color": "black",
            "xtick.color": "black",
            "ytick.color": "black",
            "axes.labelcolor": "black",
            "axes.grid": True,
            "axes.labelsize": 10,
            "xtick.labelsize": 10,
            "font.size": 10,
            "ytick.labelsize": 10})