# Author: Vlad Stejeroiu (984963)
# CSC337: Coursework 1

## Task 1: Design 3
### A chart for comparing specific types of places by how long they have existed.
### The idea is to draw a box plot with two possible outcomes (special or not) for each type of place.
### It relates directly to the distribution of structure types around the world.

<b>Aim:</b> To gain information about specific types of places and buildings in relation to their age.

<b>Visual Design Type:</b> Box plot.

<b>Name of Tool:</b> Altair.

<b>Visual Mappings:</b>
+ The colour of each box shows whether the type of place was chosen as a special place or not.
+ The x-axis shows the types of places we chose.
+ The y-axis shows the age of those places.
+ The colours are easy to distinguish and the chart is easy to understand, but the shape of the lines and other details could be smoother.

<b>Data Preparation:</b>
+ Kept only the essential columns from the dataset.
+ Removed rows with a null or `unknown` feature type from these columns.
+ Chose the types of places we will investigate and flagged them as special.

<b>Improvements:</b> 
+ More information or icons could be added, but for now the chart looked too crowded.
+ Colours can be changed to look more appealing.
+ Animations can be added to make the chart interactive and easier to view.
+ Hyperlinks to the data source were not added to tooltips, as they seemed irrelevant to the task.
+ The data could be made cleaner and easier to use, e.g. by removing duplicate categories created when a trailing comma after a word produces a new value.
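
The last improvement could be sketched roughly as follows. This is a minimal example in plain Python, not part of the original notebook: the helper name `normalise_feature_type` and the sample labels are hypothetical, and it only handles the trailing-comma case described above.

```python
def normalise_feature_type(value):
    """Return a cleaned label with surrounding whitespace and trailing commas removed."""
    if not isinstance(value, str):
        # leave missing values (e.g. None or NaN) untouched
        return value
    return value.strip().rstrip(',').strip()


# hypothetical labels, including the duplicated comma-suffixed form
raw_labels = ['tomb', 'tomb,', ' church, ', 'pyramid', None]
cleaned = [normalise_feature_type(label) for label in raw_labels]
print(cleaned)
```

Applied to the `featureType` column with `Series.map` before the special places are flagged, a helper like this would likely make the comma-suffixed entries in the `isin` list unnecessary.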

<b>Image:</b> Can be found in its designated folder, in PNG format. 

In [1]:
import pandas as pd
import altair as alt
import datetime as dt
from vega_datasets import data

Debug options

In [2]:
# font to use for chart labels
__CHART_FONT__ = 'Circular'

# DEBUG: disable maximum row prevention (cripples chart performance)
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

Load the Pleiades locations data set and sequentially derive the required information

In [3]:
# load the Pleiades locations data set into a pandas DataFrame (first 3000 rows only)
source = pd.read_csv('../data/pleiades-locations.csv').head(3000)

# keep only the necessary columns
d_boxplot = source[['minDate', 'maxDate', 'featureType']]

# remove rows with no feature type (returns a new DataFrame instead of modifying a slice in place)
d_boxplot = d_boxplot.dropna(subset=['featureType'])

# remove rows of type 'unknown' because they add no value; copy so later column assignments are safe
d_boxplot = d_boxplot[d_boxplot['featureType'] != 'unknown'].copy()

# calculate the age of each place: years elapsed since its earliest attested date
# (assumes minDate is a signed year, negative for BCE)
d_boxplot['age'] = dt.datetime.now().year - d_boxplot['minDate']

# flag the feature types considered special (comma-suffixed entries cover duplicated labels in the data)
d_boxplot['SpecialPlace'] = d_boxplot['featureType'].isin(['monument', 'sanctuary', 'tomb', 
                                                           'amphitheatre', 'church', 'pyramid', 
                                                           'mosque', 'cemetery',
                                                           'monument,', 'sanctuary,', 'tomb,', 
                                                           'amphitheatre,', 'church,', 'pyramid,', 
                                                           'mosque,', 'cemetery,'])

# replace the boolean flag with 'Yes'/'No' labels for the chart legend
d_boxplot['SpecialPlace'] = d_boxplot['SpecialPlace'].map({True: 'Yes', False: 'No'})

In [7]:
alt.Chart(
    data=d_boxplot,
    height=250,
    width=3000,
    padding=20,
    title=alt.TitleParams(
        text='Oldest locations compared to their type',
        fontSize=22,
        font=__CHART_FONT__
    )
).mark_boxplot(
    size=50,
    extent=0.5,
    outliers=False,
    median=alt.MarkConfig(
        stroke='black'
    )
).encode(
    x=alt.X(
        'featureType:N',
        axis=alt.Axis(
            title="Type of location",
            titleFontSize=18,
            titleFont=__CHART_FONT__,
            labelFontSize=14,
            labelFont=__CHART_FONT__,
            labelAngle=45
        ),
    ),
    y=alt.Y(
        'age:Q',
        axis=alt.Axis(
            title='Age (years)',
            titleFontSize=18,
            titleFont=__CHART_FONT__,
            labelFontSize=14,
            labelFont=__CHART_FONT__,
            tickCount=20
        )
    ),
    color=alt.Color(
        # colour boxes orange/cyan according to the SpecialPlace flag
        'SpecialPlace:N',
        scale=alt.Scale(
            range=['#FF5722', '#00BCD4']
        ),
        legend=alt.Legend(
            titleFontSize=18,
            titleFont=__CHART_FONT__,
            labelFontSize=16,
            labelFont=__CHART_FONT__
        )
    )
)
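
The `extent=0.5` setting above means each whisker reaches only as far as the most extreme data point within 0.5 × IQR of the box. As a rough cross-check of what the chart draws, the sketch below computes those summary values in plain Python for a single group. The function name `box_summary` and the sample ages are hypothetical, and the quartile convention of `statistics.quantiles` may differ slightly from the one Vega-Lite applies, so the numbers should be treated as approximate.

```python
import statistics


def box_summary(values, extent=0.5):
    """Summarise one group the way a Tukey-style box plot does (needs at least two values)."""
    data = sorted(values)
    # quartiles with linear interpolation between data points
    q1, median, q3 = statistics.quantiles(data, n=4, method='inclusive')
    iqr = q3 - q1
    # fences: the furthest a whisker is allowed to reach
    lower_fence = q1 - extent * iqr
    upper_fence = q3 + extent * iqr
    # whiskers stop at the most extreme data points inside the fences
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        'q1': q1,
        'median': median,
        'q3': q3,
        'whisker_low': min(inside, default=q1),
        'whisker_high': max(inside, default=q3),
        'n_outliers': len(data) - len(inside),
    }


# hypothetical ages (in years) for a single feature type
sample_ages = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(box_summary(sample_ages))
```

With `outliers=False`, as in the chart above, points beyond the whiskers (such as the value 100 in this sample) are not drawn.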