# Assignment 3

# Visualization Technique

## A narrative description of each visualization type in dashboard / A discussion of how these visualizations complement each other and when each should be used
**The visualization techniques I used in the dashboard are:**

**Grouped Bar Chart**: A grouped bar chart places the bars for each category side by side, which enables a clear year-by-year comparison between categories.
In the "Annual global corporate investment in AI (by Type)" grouped bar chart, the x-axis represents the year, the y-axis shows investment amounts in billions of dollars, and color distinguishes the investment types.

**Line Chart**: A line chart shows how a variable changes over time. It is useful for highlighting long-term trends and temporal patterns. In the "Annual private investment in AI (by Region)" line chart, the x-axis displays years, the y-axis shows the investment amount in billions of dollars, and each region is drawn as its own line.

**Scatter Chart**: A scatter chart illustrates the relationship between two continuous variables. With additional encodings such as color or size, it can reveal multidimensional patterns. In the "Datapoints used to train AI (by Domain)" scatter chart, the x-axis shows the publication date of each system, while the y-axis represents the number of training datapoints, plotted on a logarithmic scale because the values span a very wide range (from hundreds to trillions). Color is used to distinguish between domains.
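
To illustrate why a logarithmic axis helps with such a range, the sketch below compares where a value would sit on a linear axis and on a log axis. The endpoints (100 and 10^12) are taken from the "hundreds to trillions" description above; the function names are illustrative and are not part of the dashboard code.

```python
import math

def linear_position(value, low, high):
    """Fraction of the axis height at which `value` sits on a linear scale."""
    return (value - low) / (high - low)

def log_position(value, low, high):
    """Fraction of the axis height at which `value` sits on a log10 scale."""
    span = math.log10(high) - math.log10(low)
    return (math.log10(value) - math.log10(low)) / span

# Assumed axis range: "hundreds to trillions"
low, high = 1e2, 1e12

for value in (1e3, 1e6, 1e9):
    print(
        f"{value:.0e}: linear={linear_position(value, low, high):.6%}, "
        f"log={log_position(value, low, high):.0%}"
    )
```

On the linear scale, a system trained on a million datapoints would sit within a tiny fraction of a percent of the bottom of the axis, while on the log scale it sits 40% of the way up, so small and large systems remain visible in the same chart.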

**Pie Chart**: Pie charts are effective for communicating part-to-whole relationships. The "Market Share (by Stage)" pie charts display how global market share is distributed across countries or regions at different stages of the AI hardware supply chain: Design, Fabrication, and Assembly.

**The interactive controls I used in the dashboard are:**

The year range slider in the grouped bar and line charts allows exploration of investment patterns over time.

The domain selector in the scatter plot lets users focus on specific AI sectors, reducing visual clutter and highlighting targeted insights.

The stage selector in the pie chart section supports quick comparisons between supply chain phases (Design, Fabrication, Assembly).

**The grouped bar chart, line chart, scatter plot, and pie charts each offer distinct strengths for exploring different dimensions of the data. Together, they provide a comprehensive, multi-angle view of AI investment and development trends. Interactivity plays a key role in enhancing the user experience.**

# Visualization Library

**The dashboard framework and libraries you're using, and why they're suitable for this visualization / A discussion of the general approach and limitations of this framework.**

I used **Plotly** to create the core visualizations (bar chart, line chart, pie chart, scatter plot) because of its rich built-in interactivity and visual appeal. It is open source and has very detailed instructions on its website: https://plotly.com/python/basic-charts/. It integrates with Jupyter, and I did not need to install it. However, Plotly alone cannot manage the dashboard layout or UI widgets such as sliders, dropdowns, or tabs.

To address this limitation of Plotly, I used **Panel** to integrate interactive controls and organize the layout. Panel allowed me to bind widgets to Plotly visualizations using @pn.depends, enabling real-time updates and a more user-friendly experience. I followed the instructions on its website: https://panel.holoviz.org/. I did not need to install it in Jupyter either.

I also tried other libraries, such as hvPlot. It has to be installed before use:
![hvPlot needs to be installed first](hvplot_error.png)
I did not find a pie chart in hvPlot, which I wanted for displaying part-to-whole relationships.

Although Plotly and Panel are both powerful tools, they were not originally designed as a fully integrated pair. I had to bind Panel widgets to Plotly visualizations manually using @pn.depends, which is a bit complicated and lacks flexibility.


# Demonstration

My **Data source**: https://ourworldindata.org/artificial-intelligence

AI technology has become much more powerful over the past few decades and has found applications in many different domains. Many development projects have attracted different types of investment, which has increased dramatically in recent years.

Regions differ greatly in how much they focus on AI. I am interested in these differences between regions, such as investment amounts and market share.

Given how rapidly AI has developed in the past (for example, the number of datapoints used to train AI systems has increased greatly), we might expect AI technology to become much more powerful in the coming decades, now that the resources dedicated to its development have increased so substantially.


To explore these trends and relationships, I took the following steps:

**I downloaded the datasets as CSV files.**

**I read and cleaned the raw data.**

**I created the dashboard.**

In [45]:
# import warnings

import warnings
warnings.filterwarnings("ignore")


In [46]:
# import all libraries

import pandas as pd
import plotly.express as px
import panel as pn

pn.extension("plotly")

## Annual global corporate investment in artificial intelligence(by Type)

I downloaded data on annual global corporate investment in AI, broken down by investment type: merger/acquisition, minority stake, private investment, and public offering. Note that this data underestimates total global AI investment.

A merger is a corporate strategy involving two companies joining together to form a new company. An acquisition is a corporate strategy involving one company buying another company.

Private investment is defined as investment in AI companies of more than $1.5 million (in current US dollars).

A public offering is the sale of equity shares or other financial instruments to the public in order to raise capital.

A minority stake is an ownership interest of less than 50% of the total shares of a company.

In [48]:
# read and clean data

df_bar = pd.read_csv("ai_investment_by_type.csv")

df_bar = df_bar.drop(columns=["Code"])
df_bar = df_bar.rename(columns={
    'Entity': 'Investment Type',
    'Global corporate investment in AI': 'Investment Amount'
    })
df_bar['Investment Amount (Billion $)'] = df_bar['Investment Amount'] / 1e9

df_bar.head()

|   | Investment Type | Year | Investment Amount | Investment Amount (Billion $) |
|---|---|---|---|---|
| 0 | Merger/acquisition | 2013 | 6886002147 | 6.886002 |
| 1 | Merger/acquisition | 2014 | 7657430075 | 7.65743 |
| 2 | Merger/acquisition | 2015 | 10117774371 | 10.117774 |
| 3 | Merger/acquisition | 2016 | 14733563855 | 14.733564 |
| 4 | Merger/acquisition | 2017 | 27282717703 | 27.282718 |


In [80]:
# add year_slider

min_year = df_bar['Year'].min()
max_year = df_bar['Year'].max()

year_slider = pn.widgets.IntRangeSlider(
    name="Year Range",
    start=min_year,
    end=max_year,
    value=(min_year, max_year),
    step=1
)


In [100]:
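# redraw the bar chart whenever the slider value changes
@pn.depends(year_slider.param.value)
def plot_bar(year_range):
    # keep only the rows inside the selected (inclusive) year range
    filtered_df_bar = df_bar[(df_bar['Year'] >= year_range[0]) & (df_bar['Year'] <= year_range[1])]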
    
    fig = px.bar(
        filtered_df_bar,
        x="Year",
        y="Investment Amount (Billion $)",
        color="Investment Type",
        barmode="group", 
        title="AI Investment by Type (Grouped Bar Chart)"
    )
    
    fig.update_layout(
        xaxis_title="Year",
        yaxis_title="Investment Amount (Billion $)",
        template="plotly_white"
    )
    
    return fig

In [105]:
# Panel Dashboard 

dashboard = pn.Column(
    "# AI Investment by Type ",
    year_slider,
    plot_bar
)

dashboard

## Annual private investment in artificial intelligence(by Region)

I downloaded data on private AI investment in different regions, including China, Europe, and the United States. Note that this data underestimates AI investment.

In [106]:
# read and clean data

df_line = pd.read_csv("ai_investment_by_region.csv").drop(columns=["Code"])
df_line = df_line.rename(columns={
    "Entity": "Region",
    "Global total private investment in AI": "Investment Amount"
})
df_line["Investment Amount (Billion $)"] = df_line["Investment Amount"] / 1e9

df_line.head()

|   | Region | Year | Investment Amount | Investment Amount (Billion $) |
|---|---|---|---|---|
| 0 | China | 2013 | 717196188 | 0.717196 |
| 1 | China | 2014 | 771392286 | 0.771392 |
| 2 | China | 2015 | 2385249620 | 2.38525 |
| 3 | China | 2016 | 5102962786 | 5.102963 |
| 4 | China | 2017 | 7314146469 | 7.314146 |


In [107]:
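# redraw the line chart whenever the slider value changes
@pn.depends(year_slider.param.value)
def plot_line(year_range):
    # keep only the rows inside the selected (inclusive) year range
    df_line_filtered = df_line[(df_line["Year"] >= year_range[0]) & (df_line["Year"] <= year_range[1])]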
    
    fig = px.line(
        df_line_filtered,
        x="Year",
        y="Investment Amount (Billion $)",
        color="Region",
        markers=True,
        title="Private AI Investment by Region Over Time"
    )
    
    fig.update_layout(
        xaxis_title="Year",
        yaxis_title="Investment (Billion $)",
        template="plotly_white"
    )
    
    return fig
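
Both plot_bar and plot_line apply the same year-range filter. One way to avoid repeating it is a small shared helper, sketched below. The name filter_by_year and the toy data are hypothetical and only illustrate the idea; the dashboard code above does not use this helper.

```python
import pandas as pd

def filter_by_year(df, year_range, year_col="Year"):
    """Return the rows whose year falls inside the inclusive (start, end) range."""
    start, end = year_range
    # Series.between is inclusive on both ends by default
    return df[df[year_col].between(start, end)]

# Toy data for illustration only (not the real dataset)
toy = pd.DataFrame({
    "Year": [2013, 2014, 2015, 2016],
    "Value": [1.0, 2.0, 3.0, 4.0],
})

subset = filter_by_year(toy, (2014, 2015))
print(subset)
```

With such a helper, each plotting function would only need to call filter_by_year with its own DataFrame and the slider value.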

In [108]:
# Panel dashboard

dashboard = pn.Column(
    "# AI Investment by Region",
    year_slider,
    plot_line
)

dashboard

## Datapoints used to train artificial intelligence systems(by Domains)

Training data size refers to the volume of data used to train an artificial intelligence (AI) model. It represents the number of examples that the model learns from during its training process, and it is a fundamental measure of the scope of the data used in the model's learning phase. I downloaded data on the datapoints used to train AI systems in different domains, such as language, vision, and biology.

In [109]:
# read and clean data
df_scatter = pd.read_csv("ai_training.csv")

df_scatter = df_scatter.drop(columns=["Entity", "Code"])
df_scatter = df_scatter.rename(columns={
    "Day": "Date",
    "Training dataset size": "Datapoints"
})

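# convert types first so that the comparison below works on numbers
df_scatter["Date"] = pd.to_datetime(df_scatter["Date"])
df_scatter["Datapoints"] = pd.to_numeric(df_scatter["Datapoints"])

# keep positive values only: a log-scale axis cannot show zero or negative values
df_scatter = df_scatter[df_scatter["Datapoints"] > 0].copy()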

df_scatter.head()


|   | Date | Datapoints | Domain |
|---|---|---|---|
| 0 | 2018-08-30 | 2000000.0 | Language |
| 1 | 2017-12-05 | 929000.0 | Language |
| 3 | 2018-03-22 | 103000000.0 | Language |
| 5 | 2010-03-01 | 60000.0 | Vision |
| 7 | 1960-06-30 | 100.0 | Vision |
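
The cleaning step keeps only positive values, which matters because the scatter chart uses a logarithmic y-axis and zero or negative values cannot be placed on it. Below is a minimal sketch of that idea on made-up values. It assumes that some entries might be non-numeric, which may not be true of the real file, and uses errors="coerce" to turn them into missing values.

```python
import pandas as pd

# Made-up raw values for illustration; the real file may already be clean
raw = pd.DataFrame({"Datapoints": ["2000000", "not reported", "0", "60000"]})

# Non-numeric entries become NaN instead of raising an error
raw["Datapoints"] = pd.to_numeric(raw["Datapoints"], errors="coerce")

# NaN > 0 evaluates to False, so missing values and zeros are both dropped
clean = raw[raw["Datapoints"] > 0].copy()
print(clean)
```

The gaps in the index of the output above (0, 1, 3, 5, 7) come from the same kind of filter: removed rows keep their original index labels out of the result.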


In [110]:
domains = sorted(df_scatter["Domain"].dropna().unique().tolist())

In [111]:
# domain_selector
domain_selector = pn.widgets.MultiChoice(
    name="Select Domains",
    options=domains,
    value=domains
)

In [112]:
# redraw the scatter chart whenever the selected domains change
@pn.depends(domain_selector.param.value)
def plot_scatter(selected_domains):
    df_scatter_filtered = df_scatter[df_scatter["Domain"].isin(selected_domains)].copy()

    fig = px.scatter(
        df_scatter_filtered,
        x="Date",
        y="Datapoints",
        color="Domain",
        title="Datapoints used to train AI systems ",
        log_y=True,
        opacity=0.7
    )

    fig.update_layout(
        xaxis_title="Publication Date",
        yaxis_title="Training Datapoints",
        template="plotly_white"
    )

    fig.update_yaxes(tickformat=".0s")  # format y-axis ticks with SI prefixes: 1k / 1M / 1G / 1T

    return fig

In [113]:
dashboard = pn.Column(
    "# AI Training ",
    domain_selector,
    plot_scatter
)

dashboard

## Market share for logic chip production, by manufacturing stage, 2021

I downloaded market share data for different regions, broken down by manufacturing stage (Design / Fabrication / Assembly, testing and packaging).

In [114]:
df_pie = pd.read_csv("market_share.csv")

df_long = df_pie.melt(
    id_vars=["Entity"],
    value_vars=["Design", "Fabrication", "Assembly, testing and packaging"],
    var_name="Stage",
    value_name="Share"
)

df_long = df_long.dropna(subset=["Share"])

df_long.head()

|   | Entity | Stage | Share |
|---|---|---|---|
| 0 | China | Design | 9.0 |
| 2 | Japan | Design | 6.0 |
| 4 | Others | Design | 9.0 |
| 6 | South Korea | Design | 6.0 |
| 7 | Taiwan | Design | 9.0 |
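
The reshaping step above turns one column per stage into one row per (entity, stage) pair, which is the layout the pie chart filter expects. The sketch below shows the same melt call on toy data and adds a quick check that the shares within each stage add up to 100, which a part-to-whole chart relies on. The region names and values are invented for illustration.

```python
import pandas as pd

# Toy wide-format data (invented values, not the real dataset)
wide = pd.DataFrame({
    "Entity": ["Region A", "Region B"],
    "Design": [60.0, 40.0],
    "Fabrication": [25.0, 75.0],
})

# One row per (Entity, Stage) pair
long = wide.melt(
    id_vars=["Entity"],
    value_vars=["Design", "Fabrication"],
    var_name="Stage",
    value_name="Share",
)

# Sanity check: shares within each stage should sum to 100
totals = long.groupby("Stage")["Share"].sum()

print(long)
print(totals)
```

If a stage's total is far from 100 in the real data, the pie chart would still draw, but its slices would represent shares of the listed entities only rather than of the whole market.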


In [115]:
stage_selector = pn.widgets.Select(
    name="Select Stage",
    options=sorted(df_long["Stage"].unique().tolist()),
    value="Design"
)

In [116]:
@pn.depends(stage_selector.param.value)
def plot_pie(stage):
    df_long_filtered = df_long[df_long["Stage"] == stage]
    
    fig = px.pie(
        df_long_filtered,
        names="Entity",
        values="Share",
        title=f"{stage} Market Share by Entity"
    )
    fig.update_traces(textinfo="percent+label")
    fig.update_layout(template="plotly_white")
    
    return fig

In [117]:
dashboard = pn.Column(
    "# Market Share by Stage",
    stage_selector,
    plot_pie
)

dashboard

## Multiple visualizations are visible on a single screen

In [119]:
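dashboard = pn.Column(
    "# Key Insights on Artificial Intelligence",
    year_slider,      # drives the bar and line charts
    domain_selector,  # drives the scatter chart
    stage_selector,   # drives the pie chart
    pn.Row(plot_bar, plot_line),
    pn.Row(plot_scatter, plot_pie)
)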

dashboard