In [None]:
import seaborn as sns
import pandas as pd

# Line plot

First, we load the GameStop stock price data directly from the CSV file.  

In [None]:
gme = pd.read_csv('gamestop.csv')
gme

Notice that the `Date` column is currently read as plain text (strings), **not as a true datetime type**.

In [None]:
gme.dtypes

**When we plot it, Seaborn will just treat the dates as generic labels.  
This often makes the x-axis messy and unreadable for time series data.**

In [None]:
sns.lineplot(data=gme, x='Date', y='Open')

The plot works, but the x-axis labels are cluttered because the `Date` column still holds text, not actual time values.

Now we reload the data, this time telling pandas to:  
- **parse the `Date` column as datetime objects**, and  
- **use `Date` as the DataFrame index**.
  
This way, Python knows we are working with time series data:

In [None]:
gme = pd.read_csv('gamestop.csv', 
                  index_col="Date", 
                  parse_dates=True)
gme

In [None]:
gme.index
# gme.index.dtype # You can also try this

The output shows the index type, which should be `DatetimeIndex` if the dates were parsed correctly.

Now make the plot again:

In [None]:
sns.lineplot(data=gme, x='Date', y='Open')

Now the x-axis shows years in order (2004, 2008, … 2024).  
This makes the time series trend much clearer and is the standard way to work with stock data.
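
If the data is already loaded with `Date` as text, reloading is not the only option. The sketch below converts the column with `pd.to_datetime` and then sets it as the index. The three rows are made-up values standing in for `gamestop.csv`, not real prices.

```python
import io
import pandas as pd

# Made-up sample rows standing in for gamestop.csv (not real prices)
csv_text = """Date,Open
2021-01-04,19.0
2021-01-05,17.3
2021-01-06,17.3
"""

raw = pd.read_csv(io.StringIO(csv_text))
# Without parse_dates, Date is read as text rather than datetime
is_datetime_before = pd.api.types.is_datetime64_any_dtype(raw['Date'])

# Convert the text column to datetime, then use it as the index
converted = raw.assign(Date=pd.to_datetime(raw['Date'])).set_index('Date')
is_datetime_index = isinstance(converted.index, pd.DatetimeIndex)

print(is_datetime_before, is_datetime_index)
```

The result should behave like the DataFrame loaded with `index_col="Date"` and `parse_dates=True`, so either route works for plotting.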

# Scatter plot

In [None]:
dugong = pd.read_csv('dugong.csv')
dugong

In [None]:
sns.scatterplot(data=dugong, x='Age', y='Length')

# Plot with groupby

**The idea behind plotting with groupby**: first summarize the data by groups, then visualize the aggregated results for comparison.

**Our goal: count how many movies appear in the dataset for each year, and then visualize the results.**

Here we group by `Year`, then plot the per-year counts.

In [None]:
movies = pd.read_csv('top_movies_2017.csv')
movies

In [None]:
# Count how many movies per year
counts = movies.groupby('Year').Title.count()
# counts = movies.groupby('Year').Title.nunique()  # Alternative: count unique titles only

counts

Now `counts` shows the number of movies for each year.  
Next, we plot these counts to visualize trends over time.

In [None]:
sns.lineplot(data=counts)

↑ Looks like before 1960 only 1-2 movies per year make the list, but after 2000 the number of blockbusters just exploded!
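
To make the difference between `.count()` and `.nunique()` concrete, here is a small sketch on made-up data (the titles and years are invented, not taken from `top_movies_2017.csv`). One title is deliberately repeated within a year so the two results differ.

```python
import pandas as pd

# Made-up rows for illustration; 'Movie B' is deliberately listed twice in 2001
toy = pd.DataFrame({
    'Title': ['Movie A', 'Movie B', 'Movie B', 'Movie C'],
    'Year':  [2000, 2001, 2001, 2001],
})

# Rows with a non-missing Title in each year
per_year = toy.groupby('Year').Title.count()
# Distinct titles in each year
unique_per_year = toy.groupby('Year').Title.nunique()

print(per_year.to_dict())         # {2000: 1, 2001: 3}
print(unique_per_year.to_dict())  # {2000: 1, 2001: 2}
```

If every title appears only once per year, the two methods give the same counts.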

**Comparing raw gross vs adjusted gross**

When we first look at the **raw gross revenue** of movies over time,  
it seems like movies are making more and more money every year.  
But is that really true, or is inflation playing a role?

In [None]:
avg_gross = movies.groupby('Year').Gross.mean()
sns.scatterplot(data=avg_gross)

The raw numbers show a clear upward trend, but this might be misleading,  
because dollars in 1939 are not worth the same as dollars in 2025.

To account for inflation, we use the **adjusted gross revenue**.  
This allows us to compare movie earnings across decades on the same scale.

In [None]:
avg_gross = movies.groupby('Year')['Gross (Adjusted)'].mean()
sns.scatterplot(data=avg_gross)

After adjusting for inflation, the upward trend disappears.  
Now we see that old classics like *Gone with the Wind* remain among the highest earners,  
and recent movies are not necessarily "bigger" once we correct for the value of money.  

**Key lesson:** Always check whether you need to adjust for inflation when comparing money across time.
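
The dataset already ships with a `Gross (Adjusted)` column, and the notebook does not say how it was computed. As a rough sketch of one common approach, you can scale each year's dollars by the ratio of a price index in a target year to the index in the original year. Every number below, including the `price_index` values, is made up for illustration.

```python
import pandas as pd

# All numbers below are made up for illustration; they are not real
# box-office figures or real price-index values.
toy = pd.DataFrame({
    'Year':  [1950, 2000],
    'Gross': [10.0, 40.0],  # nominal revenue, in millions
})
price_index = pd.Series({1950: 25.0, 2000: 100.0})  # hypothetical price level per year
target_year = 2000

# Scale each year's dollars to the target year's price level
factor = price_index.loc[target_year] / toy['Year'].map(price_index)
toy['Gross (Adjusted)'] = toy['Gross'] * factor

print(toy)
```

In this toy example the 1950 movie earned a quarter of the nominal dollars, yet both movies come out equal once prices are put on the same scale.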

# Plotting more than two variables

In [None]:
penguins = pd.read_csv('palmer_penguins.csv')
penguins.head()

In [None]:
sns.scatterplot(data=penguins, x='Culmen Length (mm)', y='Culmen Depth (mm)', hue='Flipper Length (mm)')

In [None]:
sns.scatterplot(data=penguins, x='Culmen Length (mm)', y='Culmen Depth (mm)', hue='Species')

In [None]:
sns.scatterplot(data=penguins, x='Culmen Length (mm)', y='Culmen Depth (mm)', hue='Species', size='Body Mass (g)')

In [None]:
sns.scatterplot(data=penguins, x='Culmen Length (mm)', y='Culmen Depth (mm)', hue='Flipper Length (mm)', style='Sex')

In [None]:
sns.relplot(
    data=penguins, x='Culmen Length (mm)', y='Culmen Depth (mm)',
    col='Sex', row='Species', hue='Flipper Length (mm)',
    kind='scatter'
)

### Summary: Adding more variables to scatter plots
- **Numerical**: use color (hue) or marker size
- **Categorical**: use color (hue), marker style, or facet (rows/cols, best for 2–3 categories)
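
Before faceting, it can help to check how many panels the grid will have, since each extra category adds a full row or column. The sketch below counts categories with `nunique()`. The rows are made up for illustration and only reuse the `Species` and `Sex` column names from the penguins data.

```python
import pandas as pd

# Made-up rows for illustration, reusing column names from the penguins data
toy = pd.DataFrame({
    'Species': ['Adelie', 'Adelie', 'Gentoo', 'Chinstrap'],
    'Sex':     ['MALE', 'FEMALE', 'FEMALE', 'MALE'],
})

n_rows = toy['Species'].nunique()  # one facet row per species
n_cols = toy['Sex'].nunique()      # one facet column per sex
n_panels = n_rows * n_cols

print(n_rows, n_cols, n_panels)    # 3 2 6
```

A grid of this size is still readable. With many more categories, color or marker style is usually the better choice.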