## Comparing Programming Language Popularity - Using Altair Visualization Library
We extract data from stackoverflow.com for the years 2018 and 2019.
The data give the rank of each tag (programming languages, libraries, tools) by its aggregate tag count.
We compare the two years to see how the rankings changed over time.

In [62]:
import pandas as pd
import altair as alt
alt.renderers.enable('default')

RendererRegistry.enable('default')

In [41]:
# Column specification for the fixed-width format (fwf)
ColSpecs = [(0,32),(32,41)] 

# header=0 takes the column names from the first line; skiprows=[1] drops the second line
df18 = pd.read_fwf("./data/2018-results.txt", colspecs=ColSpecs, header=0, skiprows=[1])
df19 = pd.read_fwf("./data/2019-results.txt", colspecs=ColSpecs, header=0, skiprows=[1])

df = pd.merge(df18, df19, on="Tag")
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5090 entries, 0 to 5089
Data columns (total 3 columns):
Tag         5086 non-null object
Rank2018    5090 non-null int64
Rank2019    5090 non-null int64
dtypes: int64(2), object(1)
memory usage: 159.1+ KB
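
The `df.info()` output reports 5086 non-null `Tag` values out of 5090 rows. One possible explanation, not confirmed against the data files, is that tags literally named `null` or `nan` are parsed as missing values by pandas, and that `pd.merge` then matches missing keys against each other. The sketch below reproduces that behavior on a made-up in-memory sample (the `make_fwf` helper, the dashed separator line, and the sample rows are assumptions for illustration) and shows how `keep_default_na=False` keeps such tags as strings.

```python
import io

import pandas as pd

COLSPECS = [(0, 32), (32, 41)]  # same column layout as the notebook


def make_fwf(rows, rank_col):
    """Build a small fixed-width text sample (hypothetical helper)."""
    lines = [f"{'Tag':<32}{rank_col:>9}", "-" * 41]  # header + assumed separator line
    lines += [f"{tag:<32}{rank:>9}" for tag, rank in rows]
    return "\n".join(lines) + "\n"


sample18 = make_fwf([("javascript", 1), ("python", 2), ("null", 3), ("nan", 4)], "Rank2018")
sample19 = make_fwf([("python", 1), ("javascript", 2), ("null", 3), ("nan", 4)], "Rank2019")


def load(text, **kwargs):
    # skiprows=[1] drops the separator line; the header is inferred from line 0
    return pd.read_fwf(io.StringIO(text), colspecs=COLSPECS, skiprows=[1], **kwargs)


# Default parsing: "null" and "nan" become NaN, and merge matches NaN with NaN
merged_default = pd.merge(load(sample18), load(sample19), on="Tag")

# keep_default_na=False: the same tags are kept as ordinary strings
merged_literal = pd.merge(
    load(sample18, keep_default_na=False),
    load(sample19, keep_default_na=False),
    on="Tag",
)

print(merged_default)
print(merged_literal)
```

If this is what happens in the real files, reading them with `keep_default_na=False` would keep those tags and avoid the duplicated rows with a missing `Tag`.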


In [42]:
# This row is an outlier that distorts the plot
df[df["Tag"] == "excel-vba"]  

Unnamed: 0,Tag,Rank2018,Rank2019
50,excel-vba,51,1650


In [43]:
# We drop it.
df = df[df["Tag"] != "excel-vba"]  
df[df["Tag"] == "excel-vba"]

Unnamed: 0,Tag,Rank2018,Rank2019
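
Rather than spotting outliers by eye, one option is to compute the rank change per tag and sort by its absolute value. This is a minimal sketch, not the notebook's method; the `Change` column and the `ranks` frame are names introduced here, and the four rows are copied from the outputs shown in this notebook.

```python
import pandas as pd

# A few rows taken from the notebook's own output
ranks = pd.DataFrame({
    "Tag": ["javascript", "python", "excel-vba", "jquery"],
    "Rank2018": [1, 2, 51, 9],
    "Rank2019": [2, 1, 1650, 16],
})

# Positive Change means the tag dropped in the ranking
ranks["Change"] = ranks["Rank2019"] - ranks["Rank2018"]

# Order rows by the size of the move, largest first
order = ranks["Change"].abs().sort_values(ascending=False).index
biggest = ranks.loc[order]

print(biggest.head(3))
```

On these rows, `excel-vba` stands out with a change far larger than any other tag, which matches the row dropped above.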


In [59]:
# Sort the dataset by 2018 rank (reassign instead of inplace to avoid modifying a filtered view)
df = df.sort_values(by='Rank2018', ascending=True)
df.head(10)

Unnamed: 0,Tag,Rank2018,Rank2019
0,javascript,1,2
1,python,2,1
2,java,3,3
3,android,4,5
4,c#,5,4
5,php,6,6
6,html,7,7
7,angular,8,11
8,jquery,9,16
9,css,10,13


### We try to differentiate the three categories with different colors
- No change in ranking (Rank 2019 = Rank 2018)
- Rank improves (Rank 2019 < Rank 2018)
- Rank worsens (Rank 2019 > Rank 2018)
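
The three categories listed above can also be stored as a column, which makes them easy to count or filter. The following is a sketch under assumptions: the `Change` column and its labels (`improved`, `worsened`, `unchanged`) are names chosen here, and the ten rows are the top-10 output shown earlier.

```python
import numpy as np
import pandas as pd

# Top 10 rows from the notebook output
top = pd.DataFrame({
    "Tag": ["javascript", "python", "java", "android", "c#",
            "php", "html", "angular", "jquery", "css"],
    "Rank2018": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Rank2019": [2, 1, 3, 5, 4, 6, 7, 11, 16, 13],
})

# A smaller rank number is better, so Rank2019 < Rank2018 is an improvement
conditions = [
    top["Rank2019"] < top["Rank2018"],
    top["Rank2019"] > top["Rank2018"],
]
top["Change"] = np.select(conditions, ["improved", "worsened"], default="unchanged")

print(top["Change"].value_counts())
```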

In [63]:
source = df.head(25)    # compare top 25 of Year 2018

# No change in rank: light gray; these circles lie on the 45-degree diagonal line

source1 = source[source['Rank2018'] == source['Rank2019']]
point1 = alt.Chart(source1).mark_circle(size=2500, color='lightgray').encode(
    x = 'Rank2018',
    y = 'Rank2019',
    tooltip=["Tag", "Rank2018", "Rank2019"]
    
)

# Rank worsens: red; these circles lie above the 45-degree diagonal line

source2 = source[source['Rank2018'] < source['Rank2019']]
point2 = alt.Chart(source2).mark_circle(size=2500, color='red').encode(
    x = 'Rank2018',
    y = 'Rank2019',
    tooltip=["Tag", "Rank2018", "Rank2019"]     # make it interactive with tooltip for mouse over
)

# Rank improves: green; these circles lie below the 45-degree diagonal line

source3 = source[source['Rank2018'] > source['Rank2019']]
point3 = alt.Chart(source3).mark_circle(size=2500, color='green').encode(
    x = 'Rank2018',
    y = 'Rank2019',
    tooltip=["Tag", "Rank2018", "Rank2019"]
)

# Display the name of the tag in the center of the circle

text=alt.Chart(source).mark_text(align='center', size=16, baseline='middle').encode(
    x='Rank2018',
    y='Rank2019',
    text='Tag'
).properties(
    width=700,
    height=700
)


alt.layer(point1, point2, point3, text)




## Observations
- Good news for Python programmers: the ranks of Python, Python 3.7, and Pandas improve, meaning they became more popular.
- Python is ranked #1 in 2019, up from #2 in 2018.
