# Cool Python DS Libraries - Part 1 (20 points)

In this notebook, we will do some basic tasks to ensure that your environment works and to introduce some of the most common tools and techniques that we will use in this and future assignments.

In [2]:
# These two lines ensure that all modules are reloaded every time a Python cell is executed.
# This allows us to modify some other Python file and immediately see the results
# instead of restarting the kernel and running every cell. 
%load_ext autoreload
%autoreload 2

In [3]:
# Import some of the relevant libraries here and use them elsewhere.
# Note that you cannot use arbitrary Python packages.
# During grading, TAs will replace your pyproject.toml with a different one.
# So if you use any custom libraries, they will not be available at grading time.

import random
import time

from rich.console import Console
from rich.jupyter import print
from rich.live import Live
from rich.panel import Panel
from rich.table import Table



## Task 2a. Markdown Script (2 points)

Markdown is a lightweight markup language, similar in spirit to HTML, that is widely used in data science. Most of the tools that you will use (such as Jupyter Notebook) accept Markdown-based formatting, so a good data scientist needs to be able to read and write Markdown.

The syntax is very simple, and there are many tutorials online. Here is a [popular one](https://www.markdownguide.org/basic-syntax/). Once you learn the basics, write a Markdown version that reproduces the image below.

There are a number of online Markdown editors that render your text immediately as you type. I recommend starting with one that is not WYSIWYG so that you get a feel for writing Markdown by hand. A popular example is [this site](https://markdownlivepreview.com/). Later on, you can use more powerful tools (such as those within VS Code) that allow you to write Markdown visually.

Write the markdown code for this task in the cell below the image. This will be manually evaluated by the TAs.



![Markdown](resources/t2a_markdown_rendering.png) 

# This is the top level header.
---
## This is the second level header.
---
This text contains **bold text.** It also has *italicized text*. It even has both ***bold and italic.***

It even has a blockquote.
> This is an example of blockquote.
>
> This blockquote has multiple paragraphs.

This also has ordered lists.
1. One
2. Two
3. Three
4. Four

It also has unordered lists.

- First
- Second
- Third
- Fourth

It also has simple `code` blocks using backticks.

    import sys
    print(sys.version)

It also has some links [www.google.com](https://www.google.com).

## Task 2b: Generating colorful output in Python and Notebooks using the rich library

Markdown provides one way of generating output. Charts and visualizations provide another. Sometimes, it is necessary to generate eye-catching output directly from Python. There are many ways to do this. In this assignment, we will be using the [rich](https://github.com/Textualize/rich) library.

To see the capabilities, run the following command in the terminal.

> rye run python -m rich



### Task 2b1: Basic formatting, colors and Emojis (2 points)

Generate the following figure using the rich library. Note that my editor uses dark mode. There is no need to recreate the dark background; just the formatted text is sufficient.

The [Emoji Cheat Sheet](https://www.webfx.com/tools/emoji-cheat-sheet/) is a good resource for finding Emoji shortcodes. It has a nifty feature that copies a shortcode when you click on it.

![Markdown](resources/t2b1_rich_formatting.png) 

In [4]:
"Give your code for t2b1 here."
from rich import print
print("Rich can do formatting: [bold]bold[/bold], [italic]italic[/italic]")
print("Rich can show different colors: [red]RED[/red], [green]GREEN[/green], [blue]BLUE[/blue] and many more")
print("You can even combine them to show a text in [bold red]bold red[/bold red]")
print("You can print even Emojis! 😀 😉 🤫 😇")

### Task 2b2: Tables with Rich (2 points)

Generate the following table using the rich library. Note that my editor uses dark mode. There is no need to recreate the dark background; just the formatting of the table is sufficient.

For the table, note that the title row has a different color (bold and magenta). The columns have colors cyan, magenta and green respectively. Note that you can set colors for columns directly instead of specifying it for each cell.

![Tables](resources/t2b2_rich_tables.png) 

In [5]:
"Give your code for t2b2 here."
from rich.table import Table

# The header row is bold magenta; each column gets its own color.
table = Table(title="Table with Rich", header_style="bold magenta")
table.add_column("Name", style="cyan")
table.add_column("Age", style="magenta")
table.add_column("City", style="green")

table.add_row("Alice", "25", "New York")
table.add_row("Bob", "30", "San Francisco")
table.add_row("Charlie", "35", "London")

table

### Task 2b3: Panels with Rich (2 points)

Generate the following Panel using the rich library. Note that my editor uses dark mode. There is no need to recreate the dark background; just the formatting is sufficient.

There are a couple of subtleties to note. The panel does not scale to the entire screen (hint: check the expand parameter). It also has a title, a subtitle, emojis, and coloring.


![Panels](resources/t2b3_rich_panel.png) 

In [6]:
"Give your code for t2b3 here."
from rich.panel import Panel

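text = "Hello, [red]World![/red]. I can do cool stuff in Python 😀"
# expand=False keeps the panel only as wide as its content.
panel = Panel(text, title="Data Mining", subtitle="Rocks", expand=False)
# Caution: panel.box is the shared rich.box.ROUNDED instance, so this also
# changes the right border of anything drawn with ROUNDED afterwards.
panel.box.mid_right = "|"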

print(panel)

### Task 2b4: Live Display with Rich (2 points)

The next subtask is based on the Live display component of rich. This is an extremely powerful tool. We will use it for a simple task: a live display of stock prices.

A sample video of the expected output can be found [here](https://www.dropbox.com/scl/fi/sl6vf2tigg6orku4pliby/t2live_display.mov?rlkey=gvip9rwip2wr74hhgta0gcl17&dl=0).

Intuitively, generate a random price for each stock and update them live. For simplicity, you can compute both the stock price and change as uniform random variables (see random.uniform function). In other words, there is no need to store the historical stock price and compute accurate delta percentage difference.

Repeat this process for 10 seconds where the table is updated 20 times with a time delay of 0.5 seconds each. 

Hint: You should use the update function so that the entire live display is refreshed. 
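
The simplification described above (independent uniform draws for price and change) can be sketched with `random.uniform` alone. The price range, the change bound, and the helper name `random_quote` are arbitrary choices made for this illustration; they are not part of the assignment.

```python
import random


def random_quote(low=100.0, high=500.0, max_change=5.0):
    """Return an independent (price, percent_change) pair of uniform draws."""
    price = random.uniform(low, high)                  # uniform between low and high
    change = random.uniform(-max_change, max_change)   # no price history needed
    return price, change


price, change = random_quote()
print(f"${price:.2f}  {change:+.2f}%")
```

Because the two values are drawn independently, the displayed change is not derived from successive prices, which is exactly the shortcut the task permits.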

In [17]:
prev_stock = None


def t2b4_generate_table() -> Table:
    """Generate a table with random stock data."""
    global prev_stock
    table = Table(title="Stock Prices")
    table.add_column("Stock", style="cyan")
    table.add_column("Price", style="magenta")
    table.add_column("Change", style="green")

    # Draw a fresh uniform random price for each stock.
    stocks = {
        "AAPL": random.uniform(100, 120),
        "GOOGL": random.uniform(200, 210),
        "MSFT": random.uniform(400, 430),
        "AMZN": random.uniform(300, 340),
    }

    # Percentage change relative to the previous table; zero on the first call.
    if prev_stock is None:
        change = {name: 0.0 for name in stocks}
    else:
        change = {
            name: 100 * (price - prev_stock[name]) / prev_stock[name]
            for name, price in stocks.items()
        }
    prev_stock = stocks  # a new dict is built on every call, so no copy is needed

    for name, price in stocks.items():
        table.add_row(name, f"${price:.2f}", f"{change[name]:+.2f}%")
    return table


with Live(t2b4_generate_table(), refresh_per_second=2) as live:
    for _ in range(20):
        time.sleep(0.5)
        live.update(t2b4_generate_table())




## Configuration Formats for Data Science

The data science ecosystem is vast. There are thousands of packages written in many languages, so it is imperative that common formats exist for these diverse packages to communicate with each other. Intuitively, there are two things to communicate: configs and data.

We will discuss common data file formats in the next assignment. In this assignment we will focus on ways to communicate configs. There are three common file formats: 

1. JSON: JavaScript Object Notation. A basic tutorial is [here](https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Objects/JSON).
2. TOML: Tom's Obvious Minimal Language. A basic tutorial is [here](https://realpython.com/python311-tomllib/).
3. YAML: YAML Ain't Markup Language. A basic tutorial is [here](https://spacelift.io/blog/yaml).

We will discuss JSON in more detail in the next assignment. In this assignment, we will focus on TOML and YAML.

## Task 2c. Fun with TOML (5 points)

TOML is a powerful format that is becoming widely used, especially in the Python ecosystem. Many important libraries use TOML for their configuration. For example, the pyproject.toml that we used for customizing the setup is a TOML file. An indication of its importance is that Python provides a module for working with TOML as part of its standard library. The importance of TOML is only going to increase from here on, so it is essential that a budding data scientist is fluent in it.

You can use the link above to learn about TOML; it is a reasonably good introduction. Usually, you will only need to parse a TOML file, so the standard-library module tomllib only reads TOML and does not provide a way to dump a TOML string.

I have found that a good way to learn a markup language is to write a translator. So we will build a translator that takes a Python dictionary as input and outputs TOML.

TOML supports a lot of data types. For this assignment, let us limit ourselves to integers and strings as scalars. We will support lists and dictionaries as complex entities, where the values are integers, strings, or further lists and dictionaries (i.e., nested data structures).
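
To make the scope above concrete, here is one possible sketch of such a translator, not a reference solution. It assumes that every key is a valid bare TOML key (letters, digits, `_`, `-`) and that strings contain ordinary text. Nested dictionaries become `[table]` sections, while lists (including dictionaries inside lists) are written inline. The function names are hypothetical, and the exact layout the provided tests expect may differ.

```python
import json


def format_value(value):
    """Render an int, str, list, or dict as an inline TOML value."""
    if isinstance(value, bool):
        raise TypeError("booleans are outside the scope of this sketch")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Assumption: JSON-style quoting is acceptable for ordinary text.
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, list):
        return "[" + ", ".join(format_value(item) for item in value) + "]"
    if isinstance(value, dict):
        pairs = ", ".join(f"{k} = {format_value(v)}" for k, v in value.items())
        return "{" + pairs + "}"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def toml_lines(data, prefix=""):
    """Emit key/value lines first, then one [table] section per nested dict."""
    scalars, tables = [], []
    for key, value in data.items():
        if isinstance(value, dict):
            tables.append((key, value))
        else:
            scalars.append(f"{key} = {format_value(value)}")
    lines = scalars
    for key, value in tables:
        name = f"{prefix}{key}"
        lines += ["", f"[{name}]"] + toml_lines(value, prefix=name + ".")
    return lines


def sketch_dict_to_toml(data):
    return "\n".join(toml_lines(data)).strip() + "\n"


example = {
    "title": "demo",
    "ports": [8000, 8001],
    "owner": {"name": "Tom", "address": {"zip": 12345}},
}
print(sketch_dict_to_toml(example))
```

Plain key/value pairs are written before any table header because, in TOML, every pair that follows a `[table]` line belongs to that table. On Python 3.11 or later, parsing the output with `tomllib.loads` and comparing it with the input dictionary is a quick round-trip check.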

Please update the function `t2c_python_dict_to_toml_string` in `t2_tasks.py` with your code. We provide two simple dictionaries to test your code -- `t2c_simple_dict` and `t2c_complex_dict`. 

You can test your code by running rye with the pytest marker `t2c`

> rye test -- -m t2c

It goes without saying that you cannot use an existing TOML library and call its dumps function. During grading, the TAs will use a custom pyproject.toml that does not include such libraries :)

Similarly, the TAs' grading code will contain additional private test cases that are not much more complex than the two given here.

## Task 2d. Fun with YAML (5 points)

YAML is another popular format. Most deep learning packages allow customization via YAML. It is (slightly) less powerful than TOML but is still widely used in ML because it predates TOML. So it is imperative for data scientists to know YAML.

You can use the link above to learn about YAML. As before, we will build a translator that takes a Python dictionary as input and outputs YAML.

YAML supports a lot of data types. For this assignment, let us limit ourselves to integers and strings as scalars. We will support lists and dictionaries as complex entities, where the values are integers, strings, or further lists and dictionaries (i.e., nested data structures).
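
The same scope can be sketched for YAML using block style with two-space indentation. This is one possible approach under stated assumptions, not a reference solution. Keys are assumed to be plain strings that need no quoting, and every string value is written in double quotes so that text such as `"123"` or `"yes"` is not read back as another type. The function names are hypothetical, and the layout the provided tests expect may differ.

```python
import json


def yaml_inline(value):
    """Render a scalar or an empty container on a single line."""
    if isinstance(value, bool):
        raise TypeError("booleans are outside the scope of this sketch")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Assumption: JSON-style double quoting is acceptable for ordinary text.
        return json.dumps(value, ensure_ascii=False)
    if value == {}:
        return "{}"
    if value == []:
        return "[]"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def yaml_lines(value, indent=0):
    """Emit block-style lines for a dict or list, recursing into containers."""
    pad = "  " * indent
    if isinstance(value, dict):
        entries = [(f"{key}:", item) for key, item in value.items()]
    else:
        entries = [("-", item) for item in value]
    lines = []
    for head, item in entries:
        if isinstance(item, (dict, list)) and item:
            # Non-empty container: put it on the following, deeper-indented lines.
            lines.append(f"{pad}{head}")
            lines.extend(yaml_lines(item, indent + 1))
        else:
            lines.append(f"{pad}{head} {yaml_inline(item)}")
    return lines


def sketch_dict_to_yaml(data):
    return "\n".join(yaml_lines(data)) + "\n"


example = {
    "name": "demo",
    "ports": [8000, 8001],
    "owner": {"name": "Tom", "zip": 12345},
}
print(sketch_dict_to_yaml(example))
```

Unlike TOML, nesting in YAML is expressed purely through indentation, so the recursion only needs to track the current depth.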

Please update the function `t2d_python_dict_to_yaml_string` in `t2_tasks.py` with your code. We provide two simple dictionaries to test your code -- `t2_simple_dict` and `t2_complex_dict`. 

You can test your code by running rye with the pytest marker `t2d`

> rye test -- -m t2d

It goes without saying that you cannot use an existing YAML library and call its dumps function. During grading, the TAs will use a custom pyproject.toml that does not include such libraries :)

Similarly, the TAs' grading code will contain additional private test cases that are not much more complex than the two given here.