<a href="https://colab.research.google.com/github/wcj365/python-dataviz/blob/master/plotly/computer_languages_ranking.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Comparing Programming Language Popularity - Using Plotly
We extract tag data from Stack Overflow (stackoverflow.com) for 2018 and 2019.
The data rank the tags (programming languages, libraries, tools) by their aggregate counts.
We compare the two years to see how the rankings change over time.

In [0]:
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px

In [43]:
# Column specification for the fixed-width format (fwf)
ColSpecs = [(0,32),(32,41)] 

URL_2018 = "https://raw.githubusercontent.com/wcj365/compare-computer-languages/master/data/2018-results.txt"
URL_2019 = "https://raw.githubusercontent.com/wcj365/compare-computer-languages/master/data/2019-results.txt"

# The column names come from the first line of each file (the pandas default);
# skiprows=[1] skips the second line.
df18 = pd.read_fwf(URL_2018, colspecs=ColSpecs, skiprows=[1])
df19 = pd.read_fwf(URL_2019, colspecs=ColSpecs, skiprows=[1])

df = pd.merge(df18, df19, on="Tag")
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5090 entries, 0 to 5089
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Tag       5086 non-null   object
 1   Rank2018  5090 non-null   int64 
 2   Rank2019  5090 non-null   int64 
dtypes: int64(2), object(1)
memory usage: 159.1+ KB
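
To see how the column specification maps onto fixed-width text, here is a minimal sketch on an inline sample. The sample is built by hand for illustration: its layout (a header line, a separator line, then data) is only an assumption about what the real files look like, and the names `lines`, `sample`, and `demo` are hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample: tag names occupy characters 0-31, ranks characters 32-40.
# The separator on the second line is an assumption about the real files.
lines = [
    f"{'Tag':<32}{'Rank2018':<9}",
    f"{'-' * 31:<32}{'-' * 8:<9}",
    f"{'javascript':<32}{1:<9}",
    f"{'python':<32}{2:<9}",
]
sample = "\n".join(lines) + "\n"

# Same column spec as above; the header is read from the first line,
# and the second line (index 1) is skipped.
demo = pd.read_fwf(io.StringIO(sample), colspecs=[(0, 32), (32, 41)], skiprows=[1])
print(demo)
```

Each `(start, end)` pair in `colspecs` is a half-open character range, so `(0, 32)` covers the tag name and `(32, 41)` covers the rank.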


In [44]:
# This row is an outlier that distorts the plot
df[df["Tag"] == "excel-vba"]

Unnamed: 0,Tag,Rank2018,Rank2019
50,excel-vba,51,1650


In [45]:
# Drop it; .copy() avoids a SettingWithCopyWarning when columns are added later.
df = df[df["Tag"] != "excel-vba"].copy()
df[df["Tag"] == "excel-vba"]

Unnamed: 0,Tag,Rank2018,Rank2019
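
Instead of spotting such a row by eye, one option is to sort by the size of the rank movement. This is a rough sketch, not part of the original analysis: the frame `toy` copies four rows from the outputs in this notebook, and the `Shift` column name is made up for illustration.

```python
import pandas as pd

# Four rows copied from the notebook outputs above.
toy = pd.DataFrame({
    "Tag": ["javascript", "python", "excel-vba", "java"],
    "Rank2018": [1, 2, 51, 3],
    "Rank2019": [2, 1, 1650, 3],
})

# Absolute rank movement between the two years.
shift = (toy["Rank2019"] - toy["Rank2018"]).abs()

# Show the two tags that moved the most; extreme movers surface at the top.
biggest = toy.assign(Shift=shift).nlargest(2, "Shift")
print(biggest)
```

Whether a large mover is a true outlier or a real trend still needs a judgment call; the sort only surfaces the candidates.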


In [46]:
# Sort the dataset by 2018 rank
df = df.sort_values(by="Rank2018")
df.head(10)

Unnamed: 0,Tag,Rank2018,Rank2019
0,javascript,1,2
1,python,2,1
2,java,3,3
3,android,4,5
4,c#,5,4
5,php,6,6
6,html,7,7
7,angular,8,11
8,jquery,9,16
9,css,10,13


In [47]:
# Scatter plot of the first 50 rows, i.e. the top-ranked tags of 2018
px.scatter(data_frame=df[:50], x='Rank2018', y='Rank2019', hover_data=['Tag', 'Rank2018', 'Rank2019'])

### Color-coding the three categories of rank change
- No change in ranking (Rank 2019 = Rank 2018)
- Rank improves (Rank 2019 < Rank 2018)
- Rank worsens (Rank 2019 > Rank 2018)

In [48]:
# A positive value means the tag moved up the ranking (smaller rank number) in 2019
df["Change"] = df["Rank2018"] - df["Rank2019"]
df.head()

Unnamed: 0,Tag,Rank2018,Rank2019,Change
0,javascript,1,2,-1
1,python,2,1,1
2,java,3,3,0
3,android,4,5,-1
4,c#,5,4,1


In [49]:
def get_change(x):
    if x == 0:
        return "Same"
    elif x < 0:
        return "Worse"
    else:
        return "Better"

df["Change"] = df["Change"].apply(get_change)

df.head()

Unnamed: 0,Tag,Rank2018,Rank2019,Change
0,javascript,1,2,Worse
1,python,2,1,Better
2,java,3,3,Same
3,android,4,5,Worse
4,c#,5,4,Better
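
The same labeling can also be done without a row-by-row `apply`. The sketch below uses `numpy.select` on a small frame built from three rows shown above; it is an alternative to the notebook's `get_change`, and the names `toy` and `diff` are hypothetical.

```python
import numpy as np
import pandas as pd

# Three rows copied from the notebook output above.
toy = pd.DataFrame({
    "Tag": ["javascript", "python", "java"],
    "Rank2018": [1, 2, 3],
    "Rank2019": [2, 1, 3],
})

# Positive difference: the tag moved to a smaller (better) rank number in 2019.
diff = toy["Rank2018"] - toy["Rank2019"]

# Evaluate both conditions on the whole column at once; anything else is "Same".
toy["Change"] = np.select([diff > 0, diff < 0], ["Better", "Worse"], default="Same")
print(toy)
```

On a few thousand rows the speed difference hardly matters, so this is mostly a matter of style.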


In [65]:
MAX_RANK = 100

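# Plot the first MAX_RANK rows (sorted by 2018 rank), colored by direction of change
fig = px.scatter(data_frame=df[:MAX_RANK],
                 x='Rank2018',
                 y='Rank2019',
                 color="Change",
                 text="Tag",
                 hover_data=['Tag', 'Rank2018', 'Rank2019'])

fig.update_traces(marker=dict(size=40))

# Dotted diagonal marking "no change" (Rank2019 == Rank2018)
fig.add_shape(
    type="line",
    x0=0,
    y0=0,
    x1=MAX_RANK,
    y1=MAX_RANK,
    line=dict(
        color="green",
        width=4,
        dash="dot",
    ),
)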
fig.show()

## Observations
- Good news for Python programmers: Python, Python 3.7, and Pandas all improved in rank, meaning they became more popular.
- Python ranked #1 in 2019, up from #2 in 2018.
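
To check observations like these against the table instead of reading them off the plot, a filter on the tag name can help. This is a minimal sketch: the helper `python_related` is hypothetical, and in the toy frame only the `javascript` and `python` rows come from the output above, while the row with a missing tag stands in for the few null tags reported by `df.info()`.

```python
import pandas as pd

def python_related(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the rows whose Tag mentions python or pandas."""
    # na=False treats missing tags as non-matches instead of propagating NaN.
    mask = frame["Tag"].str.contains("python|pandas", case=False, na=False)
    return frame[mask]

# Toy input; the third row is a made-up placeholder with a missing tag.
toy = pd.DataFrame({
    "Tag": ["javascript", "python", None],
    "Rank2018": [1, 2, 3],
    "Rank2019": [2, 1, 3],
    "Change": ["Worse", "Better", "Same"],
})

print(python_related(toy))
```

Calling `python_related(df)` on the full frame would list every matching tag together with its `Change` label.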
