# **Summary**

**Dataset Description**

I am using the Coffee Quality Database from the Coffee Quality Institute GitHub Repository. This dataset contains detailed cupping reviews of Arabica coffees from around the world, including:


*   Country of Origin
*   Altitude
*   Processing method
*   Cupping scores (Aroma, Acidity, Body, Aftertaste, etc.)


It contains over 1,300 Arabica samples and supports questions about how coffee quality varies by region and production method.

**Main Goal / Research Question**

Which countries produce the highest-quality coffee, and how do altitude and processing method influence cupping scores such as aroma, acidity, and total rating?
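
A first pass at the ranking part of this question could look like the sketch below. It assumes the column names used later in this notebook (`Country.of.Origin`, `Total.Cup.Points`). The `rank_countries` helper and its `min_samples` threshold are illustrative choices of mine, not something the dataset prescribes.

```python
import pandas as pd


def rank_countries(df: pd.DataFrame, min_samples: int = 5) -> pd.DataFrame:
    """Rank countries by median total cup score (hypothetical helper)."""
    summary = (
        df.dropna(subset=["Country.of.Origin", "Total.Cup.Points"])
        .groupby("Country.of.Origin")["Total.Cup.Points"]
        .agg(samples="count", median_score="median")
        .reset_index()
    )
    # A country with only one or two samples can top the table by chance,
    # so sparsely sampled countries are left out (threshold is an assumption).
    summary = summary[summary["samples"] >= min_samples]
    return summary.sort_values("median_score", ascending=False).reset_index(drop=True)
```

With the DataFrame loaded in the cells below, `rank_countries(df).head(10)` would list the ten countries with the highest median score. The median is used rather than the mean so that a few unusually low or high samples do not dominate the ranking.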

The Dash app is designed for coffee buyers, enthusiasts, and researchers who want to explore the data visually and interactively.

**Limitations**


*   Some values (e.g., altitude, processing method) are inconsistent or missing
*   Altitude is recorded as free text, including ranges, and needs to be cleaned and converted to a single numeric value
*   The current visualizations are standalone Plotly figures and cannot yet be filtered in real time
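
One way to approach the altitude cleanup is sketched below. It assumes the column mixes plain numbers, ranges such as `1200-1500`, thousands separators, and occasional values in feet; I have not verified which of these formats actually occur in the data. The `parse_altitude_m` helper is hypothetical.

```python
import math
import re

FEET_TO_METERS = 0.3048


def parse_altitude_m(raw):
    """Return an altitude in meters, or NaN when no number is found."""
    if raw is None or (isinstance(raw, float) and math.isnan(raw)):
        return math.nan
    # Lowercase for unit matching and drop thousands separators ("1,800" -> "1800").
    text = str(raw).lower().replace(",", "")
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    if not numbers:
        return math.nan
    # Treat the first two numbers as a range and use its midpoint;
    # a single number is returned unchanged.
    bounds = numbers[:2]
    value = sum(bounds) / len(bounds)
    # Assumption: values in feet are labeled with "ft" or "feet".
    if "ft" in text or "feet" in text:
        value *= FEET_TO_METERS
    return value
```

It could be applied with `df['Altitude'] = df['Altitude'].map(parse_altitude_m)`. Using the midpoint of a range is a design choice; the lower bound or the mean of all listed numbers would be equally defensible.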




**Plan Moving Forward**

*   Clean the dataset further to normalize numeric and categorical fields
*   Build Dash filters for:
    *   Country
    *   Altitude Range
    *   Processing Method
    *   Cupping scores (aroma, acidity, body, etc.)
*   Develop interactive visualizations that respond to these filters
*   Improve the design for perceptual effectiveness (color, encoding, labeling)
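
The filter plan above could be wired up roughly as follows. This is a minimal sketch, not the final app: it assumes Dash 2 or later, a numeric `Altitude` column, and the column names used elsewhere in this notebook. The component ids, the `filter_samples` and `build_app` helpers, and the 0 to 3000 m slider range are all placeholders I made up for illustration.

```python
import pandas as pd


def filter_samples(df, countries=None, methods=None, altitude_range=None):
    """Return rows matching the selections; None or an empty list means no filter."""
    mask = pd.Series(True, index=df.index)
    if countries:
        mask &= df["Country.of.Origin"].isin(countries)
    if methods:
        mask &= df["Processing.Method"].isin(methods)
    if altitude_range:
        low, high = altitude_range
        # Rows with a missing altitude are excluded while this filter is active.
        mask &= df["Altitude"].between(low, high)
    return df[mask]


def build_app(df):
    """Build a Dash app with three filters driving one box plot."""
    # Imported here so filter_samples can be used and tested without Dash installed.
    from dash import Dash, Input, Output, dcc, html
    import plotly.express as px

    def options(column):
        values = sorted(df[column].dropna().unique())
        return [{"label": v, "value": v} for v in values]

    app = Dash(__name__)
    app.layout = html.Div([
        dcc.Dropdown(id="country-filter", options=options("Country.of.Origin"),
                     multi=True, placeholder="Country"),
        dcc.Dropdown(id="method-filter", options=options("Processing.Method"),
                     multi=True, placeholder="Processing method"),
        # Slider bounds are an assumption; adjust to the cleaned altitude range.
        dcc.RangeSlider(id="altitude-filter", min=0, max=3000, step=100,
                        value=[0, 3000]),
        dcc.Graph(id="score-box"),
    ])

    @app.callback(
        Output("score-box", "figure"),
        Input("country-filter", "value"),
        Input("method-filter", "value"),
        Input("altitude-filter", "value"),
    )
    def update_box(countries, methods, altitude_range):
        subset = filter_samples(df, countries, methods, altitude_range)
        return px.box(subset, x="Processing.Method", y="Total.Cup.Points",
                      title="Score Distribution by Processing Method")

    return app
```

Keeping the filtering logic in a plain function separate from the callback makes it easy to check on a small DataFrame before any layout work. The app itself would be started with something like `build_app(df).run(debug=True)`.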







**Functioning Plotly Visualizations**

In [None]:
import pandas as pd
import plotly.express as px
from IPython.display import display


url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSsmAo0sTU5LeSQz6QaCOAMEvqSb9m4iA6lf1b_i1Iwi5QtD-OnkMkDh_NT2KLLstHQ_uKHgS_VHIVH/pub?output=csv'
df = pd.read_csv(url)


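# Keep the first run of digits in each altitude string (e.g. "1200-1500" -> 1200).
# expand=False returns a Series, so the result can be assigned to a single column.
df['Altitude'] = df['Altitude'].astype(str).str.extract(r'(\d+)', expand=False).astype(float)
df_filtered = df[['Country.of.Origin', 'Altitude']].dropna()

# Drop implausible altitudes (10,000 or above), which are most likely data-entry errors
df_filtered = df_filtered[df_filtered['Altitude'] < 10000]

# Median altitude per country
median_altitude = df_filtered.groupby('Country.of.Origin')['Altitude'].median().reset_index()

display(median_altitude)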


# Bar chart of median altitude per country
fig_altitude = px.bar(median_altitude, x="Country.of.Origin", y="Altitude",
              title="Median Altitude of Coffee Production by Country",
              labels={"Altitude": "Median Altitude (m)", "Country.of.Origin": "Country"},
              template="plotly_white",
              color_discrete_sequence=["#636EFA"])

fig_altitude.show()

|   | Country.of.Origin | Altitude |
|---|---|---|
| 0 | Brazil | 977.5 |
| 1 | Burundi | 1790.0 |
| 2 | China | 1540.0 |
| 3 | Colombia | 1600.0 |
| 4 | Costa Rica | 1300.0 |
| 5 | Cote d?Ivoire | 200.0 |
| 6 | Ecuador | 420.0 |
| 7 | El Salvador | 1330.0 |
| 8 | Ethiopia | 1750.0 |
| 9 | Guatemala | 4000.0 |


In [None]:
# Box plot of total cup points by processing method, ignoring rows with missing values
df_box = df[['Processing.Method', 'Total.Cup.Points']].dropna()

fig = px.box(
    df_box,
    x='Processing.Method', y='Total.Cup.Points',
    color='Processing.Method',
    title='Score Distribution by Processing Method')
fig.show()

In [None]:
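# Box plot of total cup points by country, ignoring rows with missing values
df_country = df[['Country.of.Origin', 'Total.Cup.Points']].dropna()
fig_country = px.box(df_country,
              x='Country.of.Origin',
              y='Total.Cup.Points',
              title="Distribution of Coffee Quality Scores by Country",
              color='Country.of.Origin')

# Draw outlier points as crosses
fig_country.update_traces(marker=dict(symbol='cross'))
fig_country.show()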