# Interactive Data Visualization
##### (C) 2023-2025 Timothy James Becker: [revision 1.0](),  [GPLv3 license](https://www.gnu.org/licenses/gpl-3.0.html) 

## <u>Time Visualization</u>

Time can be a surprisingly complex data type to work with, which is why it gets a dedicated treatment here: we need to understand the potential issues and how to correct them in order to visualize our results ethically. In this section we will look at preprocessing steps for time-series analysis using the built-in python3 [datetime](https://docs.python.org/3/library/datetime.html#) types, which support robust analysis (up to and including aggregation). For the visual component we will use the [d3 treatment of time](https://d3js.org/d3-time-format), which provides a well-established basis for working with time. Interestingly, local time depends entirely on spatial location, so everything discussed in [10_Spatial_Visualization.ipynb] is important to keep in mind. For example, a single moment in time is read as a different local clock time depending on the location (latitude, longitude) of the observer; regions that share the same local time form a time zone. We will discuss how to handle this spatial component in the Time Zones section below.

<img src="https://upload.wikimedia.org/wikipedia/commons/8/88/World_Time_Zones_Map.png" alt="time zones" width="1000px">


#### <u>Coordinated Universal Time</u>

Coordinated Universal Time or [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) is the basis for dealing with time (synchronizing time reference points). UTC is the time kept at a single reference point (roughly the prime meridian at 0° longitude), and local times are expressed as an offset (+ or -) from it that depends, roughly, on longitude. It does not include provisions for local time adjustments such as [Daylight Saving Time](https://en.wikipedia.org/wiki/Daylight_saving_time). Each day has 86400 seconds (60 seconds * 60 minutes * 24 hours), with the exception of days that contain a leap second, which is used to keep UTC close to Earth rotation time, or Universal Time [UT1](https://en.wikipedia.org/wiki/Universal_Time). Using the chart shown above you can see that CT lies within band UTC -4 while daylight saving time is in effect (UTC -5 otherwise) and Japan is in UTC +9.
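
As a small illustration of offsets from UTC, the sketch below uses python3 datetime to express one UTC instant on two fixed-offset clocks. The example date and the variable names are our own choices, and fixed offsets like these do not follow daylight saving changes.

```python
from datetime import datetime, timezone, timedelta

# One arbitrary instant, expressed in UTC (the date is only an example)
utc_noon = datetime(2025, 11, 7, 12, 0, tzinfo=timezone.utc)

# Fixed offsets matching the two bands mentioned above
utc_minus_4 = timezone(timedelta(hours=-4))
utc_plus_9 = timezone(timedelta(hours=9))

# astimezone() re-expresses the same instant on a different local clock
local_minus_4 = utc_noon.astimezone(utc_minus_4)
local_plus_9 = utc_noon.astimezone(utc_plus_9)

print(local_minus_4.isoformat())  # 2025-11-07T08:00:00-04:00
print(local_plus_9.isoformat())   # 2025-11-07T21:00:00+09:00

# All three objects describe the same moment, so they compare equal
print(utc_noon == local_minus_4 == local_plus_9)  # True

# Seconds in an ordinary day (one without a leap second)
print(60 * 60 * 24)  # 86400
```

The clock reading changes with the offset, but the underlying instant does not, which is why UTC works well as a common reference.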

#### <u>Parsing Dates</u>

Time needs to be in something like UTC in order for us to analyze or visualize it. We will start by reading in common date notations as strings and then converting them into a UTC-aware time type. Some commonly used date formats in the USA are:

Month/Day/Year

1/24/23

or

01/24/23

or

01/24/2023

See how the different forms of a standard Month/Day/Year need to be better specified? Basically, we want to make sure we have a certain number of digits for Day, Month, Year which will make processing our dates easier and faster:

MM/DD/YY (provides some understanding of how many digits for each, here there are 2)
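
To see why the number of digits matters, here is a minimal sketch that reads all three variants shown above. The helper name `parse_us_date` and the list `US_DATE_FORMATS` are hypothetical names introduced for illustration; the four-digit year format is tried first because a two-digit year is ambiguous (python3 maps 00-68 to 2000-2068 and 69-99 to 1969-1999).

```python
from datetime import datetime

# Candidate formats, tried in order: four-digit year first, then two-digit year
US_DATE_FORMATS = ["%m/%d/%Y", "%m/%d/%y"]

def parse_us_date(text):
    """Return a datetime for a Month/Day/Year string, or raise ValueError."""
    for fmt in US_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"unrecognized date format: {text!r}")

parsed = [parse_us_date(s) for s in ["1/24/23", "01/24/23", "01/24/2023"]]
print(parsed[0] == parsed[1] == parsed[2])  # True
print(parsed[0].strftime("%m/%d/%Y"))       # 01/24/2023
```

Guessing formats like this is slower and more error-prone than knowing the format up front, which is the motivation for pinning down one explicit pattern per dataset.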

Let's use the python3 datetime library to work on some historical stock data, AAPL_Historical_Data.csv, which can be found in the data section of this OER repository.


In [5]:
#loading in time-based stock data into python3 data analysis system using datetime
with open('data/AAPL_Historical_Data.csv','r') as f:
    raw = [row.replace('\n','').split(',') for row in f.readlines()]
header,raw = raw[0],raw[1:]
header,raw[0:10]

(['Date', 'Close/Last', 'Volume', 'Open', 'High', 'Low'],
 [['11/07/2025', '$268.47', '48227370', '$269.795', '$272.29', '$266.77'],
  ['11/06/2025', '$269.77', '51204050', '$267.89', '$273.40', '$267.89'],
  ['11/05/2025', '$270.14', '43683070', '$268.61', '$271.70', '$266.93'],
  ['11/04/2025', '$270.04', '49274850', '$268.325', '$271.486', '$267.615'],
  ['11/03/2025', '$269.05', '50194580', '$270.42', '$270.85', '$266.25'],
  ['10/31/2025', '$270.37', '86167120', '$276.99', '$277.32', '$269.16'],
  ['10/30/2025', '$271.40', '69886530', '$271.99', '$274.14', '$268.48'],
  ['10/29/2025', '$269.70', '51086740', '$269.275', '$271.41', '$267.11'],
  ['10/28/2025', '$269.00', '41534760', '$268.985', '$269.89', '$268.15'],
  ['10/27/2025', '$268.81', '44888150', '$264.88', '$269.12', '$264.6501']])

We will work with the first two columns for now which have the date and the closing price. We will make use of the datetime python library which has a nice string to time parser [strptime](https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime). Notice that our data above follows a MM/DD/YYYY date pattern which will translate into our [parser syntax](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes).

How can we determine how each part of the date is structured in the input data? We can either inspect it by eye in a CSV viewer (carefully) and risk making mistakes, or we can use a more data-oriented and less error-prone approach (professional analysis).

In [9]:
# look at all the distinct values for month, day and year using sets in python

months = sorted(set([row[0].split('/')[0] for row in raw]))
days   = sorted(set([row[0].split('/')[1] for row in raw]))
years  = sorted(set([row[0].split('/')[2] for row in raw]))
print(months)
print(days)
print(years)

['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']
['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31']
['2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022', '2023', '2024', '2025']


In [52]:
#continue using python for datetime analysis
from datetime import datetime as dt
from datetime import timezone as tz
from datetime import timedelta as td
import numpy as np

date = raw[0][0]
d = dt.strptime(date,'%m/%d/%Y') #%m is zero-padded month, %d is zero-padded day, %Y is four-digit year
d

datetime.datetime(2025, 11, 7, 0, 0)

Notice that when a time is not given, it defaults to the very start of the date provided: 12:00am (or 00:00 in 24-hour time). Just as we can parse dates, we can also print them using [strftime](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior):

In [14]:
d.strftime('%d-%m-%y') #now we output in day-month-year format (popular in Europe)

'07-11-25'

Adding time components is similar where we have hours, minutes, seconds:

In [18]:
date = '12-25-26 06:20:02' #this is in month-day-year hours:minutes:seconds format
d = dt.strptime(date,'%m-%d-%y %H:%M:%S')
d

datetime.datetime(2026, 12, 25, 6, 20, 2)

Using this knowledge we can complete some analysis that would be hard to do otherwise: find the average closing price of the stock per month (there are a variable number of days per month, so you can't just take 30 days at a time).

In [27]:
#read in the data as datetime using strptime() and read closing price as float
data = [[dt.strptime(row[0],'%m/%d/%Y'),float(row[1].replace('$',''))]for row in raw]
data[0:10]

[[datetime.datetime(2025, 11, 7, 0, 0), 268.47],
 [datetime.datetime(2025, 11, 6, 0, 0), 269.77],
 [datetime.datetime(2025, 11, 5, 0, 0), 270.14],
 [datetime.datetime(2025, 11, 4, 0, 0), 270.04],
 [datetime.datetime(2025, 11, 3, 0, 0), 269.05],
 [datetime.datetime(2025, 10, 31, 0, 0), 270.37],
 [datetime.datetime(2025, 10, 30, 0, 0), 271.4],
 [datetime.datetime(2025, 10, 29, 0, 0), 269.7],
 [datetime.datetime(2025, 10, 28, 0, 0), 269.0],
 [datetime.datetime(2025, 10, 27, 0, 0), 268.81]]

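Note that datetime objects in python can be compared and sorted chronologically, much like integers: an earlier time is always less than a later one. Internally a datetime stores calendar fields (year, month, day, hour, and so on) rather than a single count, but it can be converted to seconds since a UTC reference point (the Unix epoch) when a plain number is needed.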
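
A short sketch of that ordering behavior, using two dates that appear in this dataset plus two epoch dates of our own choosing:

```python
from datetime import datetime, timezone

earlier = datetime(2015, 11, 10)
later = datetime(2025, 11, 7)

# Comparison operators follow chronological order
print(earlier < later)      # True
print(max(earlier, later))  # 2025-11-07 00:00:00

# An aware datetime can be converted to seconds since the Unix epoch
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
one_day_in = datetime(1970, 1, 2, tzinfo=timezone.utc)
print(epoch.timestamp())       # 0.0
print(one_day_in.timestamp())  # 86400.0
```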

In [28]:
#python can easily sort timeseries
data = sorted(data, key=lambda x: x[0]) #this will sort based on earliest time to latest (past to present)
data [0:10]

[[datetime.datetime(2015, 11, 10, 0, 0), 29.1925],
 [datetime.datetime(2015, 11, 11, 0, 0), 29.0275],
 [datetime.datetime(2015, 11, 12, 0, 0), 28.93],
 [datetime.datetime(2015, 11, 13, 0, 0), 28.085],
 [datetime.datetime(2015, 11, 16, 0, 0), 28.5437],
 [datetime.datetime(2015, 11, 17, 0, 0), 28.4225],
 [datetime.datetime(2015, 11, 18, 0, 0), 29.3225],
 [datetime.datetime(2015, 11, 19, 0, 0), 29.695],
 [datetime.datetime(2015, 11, 20, 0, 0), 29.825],
 [datetime.datetime(2015, 11, 23, 0, 0), 29.4375]]

Now we can gather all the data for each month (across all years):

In [29]:
#python datetime objects expose their parts as attributes using dot notation
d = data[0][0]
d.day,d.month,d.year

(10, 11, 2015)

In [31]:
#find which month, averaged across all years, has the highest closing price

months = {} #dictionary mapping month number -> list of closing prices
for d,p in data:
    if d.month in months: months[d.month] += [p]
    else:                 months[d.month]  = [p]
        
for m in months: months[m] = round(float(np.mean(months[m])),2) #float() keeps the printed values plain on newer numpy
months #October is the highest average month of closing stock for Apple. Any ideas why?

{11: 110.51,
 12: 109.77,
 1: 107.43,
 2: 109.51,
 3: 105.04,
 4: 105.87,
 5: 106.99,
 6: 108.56,
 7: 118.78,
 8: 122.01,
 9: 124.35,
 10: 125.96}
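
The result above pools each calendar month across all years. If a separate average for every individual month is wanted instead, one option is to key the dictionary on a (year, month) pair. The sketch below is a variation of our own rather than part of the original analysis; it uses only the ten rows printed earlier in this section and the standard library in place of numpy.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# The ten (date, close) rows printed earlier in this section
rows = [
    ("11/07/2025", 268.47), ("11/06/2025", 269.77), ("11/05/2025", 270.14),
    ("11/04/2025", 270.04), ("11/03/2025", 269.05), ("10/31/2025", 270.37),
    ("10/30/2025", 271.40), ("10/29/2025", 269.70), ("10/28/2025", 269.00),
    ("10/27/2025", 268.81),
]

# Group closing prices by (year, month) so each calendar month stays separate
by_year_month = defaultdict(list)
for text, close in rows:
    d = datetime.strptime(text, "%m/%d/%Y")
    by_year_month[(d.year, d.month)].append(close)

monthly_mean = {key: round(mean(vals), 2) for key, vals in sorted(by_year_month.items())}
print(monthly_mean)  # {(2025, 10): 269.86, (2025, 11): 269.49}
```

Because the grouping key comes from the parsed date, months with different numbers of trading days are handled without any special cases.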

#### <u>Time Zones</u>

We can also add timezone information to our datetime objects so that we can solve more advanced temporal problems. Let's parse the data again, but this time we will set the timezone as well.

In [63]:
#timezone python object
tzone  = tz(td(hours=-4)) #fixed UTC-4 offset (Eastern Daylight Time); Eastern Standard Time is UTC-5
tzone

datetime.timezone(datetime.timedelta(days=-1, seconds=72000))

In [70]:
#using the timezone aware object to make your data analysis timezone aware
data = [[dt.strptime(row[0],'%m/%d/%Y').replace(tzinfo=tzone),float(row[1].replace('$',''))]for row in raw]
data[0:10]

[[datetime.datetime(2025, 11, 7, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000))),
  268.47],
 [datetime.datetime(2025, 11, 6, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000))),
  269.77],
 [datetime.datetime(2025, 11, 5, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000))),
  270.14],
 [datetime.datetime(2025, 11, 4, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000))),
  270.04],
 [datetime.datetime(2025, 11, 3, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000))),
  269.05],
 [datetime.datetime(2025, 10, 31, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000))),
  270.37],
 [datetime.datetime(2025, 10, 30, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000))),
  271.4],
 [datetime.datetime(2025, 10, 29, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000))),
  269.7],
 [datetime.datetime(2025, 10, 2
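
To make the effect of the attached timezone concrete, the sketch below converts one of these timezone-aware dates to UTC. It reuses the same fixed UTC-4 offset; the variable names are our own.

```python
from datetime import datetime, timezone, timedelta

tzone = timezone(timedelta(hours=-4))  # same fixed offset as above

# Parse a naive date, then label it as local midnight at UTC-4
local_midnight = datetime.strptime("11/07/2025", "%m/%d/%Y").replace(tzinfo=tzone)

# Express the same instant on the UTC clock
as_utc = local_midnight.astimezone(timezone.utc)
print(as_utc.isoformat())          # 2025-11-07T04:00:00+00:00
print(local_midnight.utcoffset())  # -1 day, 20:00:00
```

Note that `replace(tzinfo=...)` only attaches a label and leaves the clock reading alone, while `astimezone()` converts the reading. A fixed offset also does not follow daylight saving changes; python 3.9 and later ship a `zoneinfo` module with named zones that do.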

#### <u>Time Delta</u>

A core operation in any temporal analysis is measuring the distance between two times, or more simply the time delta (delta conventionally denotes a difference or change).


In [77]:
#python has timezone-aware time differences built in!

data[0][0]-data[-1][0] #newest row minus oldest row: the data spans roughly 10 years

datetime.timedelta(days=3650)
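
A timedelta supports ordinary arithmetic as well. The sketch below uses the first and last dates of this dataset; the remaining values are chosen only for illustration.

```python
from datetime import datetime, timedelta

first = datetime(2015, 11, 10)
last = datetime(2025, 11, 7)

span = last - first                # subtracting two datetimes gives a timedelta
print(span.days)                   # 3650
print(span.total_seconds())        # 315360000.0
print(span // timedelta(weeks=1))  # 521 whole weeks
print(first + timedelta(days=30))  # 2015-12-10 00:00:00
```

Adding a timedelta to a datetime steps across month boundaries correctly, which avoids hand-written calendar logic.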

#### <u>Using D3 time formatting to manipulate time</u>

We will also look at using [d3 time formatting](https://d3js.org/d3-time-format) to parse the dates in CSV files. In general, when the data is messy and has to be fixed, you should use something like python; once the data has been cleaned, it can be fed into a d3 web app. Here we switch over to the HTML/JS d3 stack to show how d3 can directly manipulate date/time data.


In [79]:
%%html
<div id="dd1"></div>

<script type="module"> 
    import * as d3 from "https://cdn.skypack.dev/d3@7"; 
    const date  = "11/07/2025";
    const parse = d3.utcParse("%m/%d/%Y");
    alert(parse(date));
</script>

Try the example above and experiment with parsing different date formats (there are exercises for this cell at the end of this section).
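
The d3 format specifiers are modeled on the same C-style `strptime`/`strftime` directives that python uses, so a pattern such as `%m/%d/%Y` carries over unchanged. For comparison, here is a python sketch of roughly what the UTC parse above does; this is our own analogy, not d3's implementation.

```python
from datetime import datetime, timezone

date = "11/07/2025"

# strptime returns a naive datetime; attaching UTC mirrors what a UTC parser assumes
parsed = datetime.strptime(date, "%m/%d/%Y").replace(tzinfo=timezone.utc)
print(parsed.isoformat())  # 2025-11-07T00:00:00+00:00

# JavaScript dates are stored as milliseconds since the Unix epoch
print(int(parsed.timestamp() * 1000))  # 1762473600000
```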

Let’s build a new d3-based line graph that will consume the AAPL stock data one year at a time, knowing that we actually have 10 years of data. If we manipulate our data appropriately, we can provide numerous visual insights into yearly stock performance. First, we will start with a fresh [d3 web app](https://github.com/timothyjamesbecker/Interactive_Data_Visualization/tree/main/d3_template_webapp) which you can download or copy to get started. You should then download the [AAPL_Historical_Data.csv]() from the data folder and put it inside your web app folder so that the d3 library can see it using a relative path. We will write a basic CSV data loader and use our d3 time parser to convert all the dates into [d3 time](https://d3js.org/d3-time) objects. Once we are there, we can process them by year to gather multiple years of data, or write a more complex framework that would allow each year to be selected for visualization.

<img src="figures/webstorm_aapl_time.png" alt="webstorm aapl time" width="800px">


If we didn't have the ability to manipulate time data, we would simply visualize the data over the course of all ten years using a line plot:

<img src="figures/aapl_10_years.png" alt="aapl ten years" width="800px">

But what if we want to compare growth from year to year? That is a more difficult task, and one that requires organizing our data by its time: how do we arrange the time series so that one year can be compared to another? One approach is to pull out the year of each data point and use it as a key into a separate data structure (a dictionary in python, an object in JavaScript) that partitions the data by year. If we simply chop the data by year, the very different price levels make each year's growth hard to see, so we also normalize and even standardize each year to get a better visual result of year-to-year performance.
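
Before turning to d3, the partition-and-standardize idea can be sketched in python. The rows below are hypothetical values invented only to exercise the code, and `to_reference_year` and `REFERENCE_YEAR` are names of our own; mapping February 29 to February 28 is just one possible convention.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev

# Hypothetical (date, close) rows, invented only to exercise the code
rows = [
    (datetime(2023, 1, 3), 10.0), (datetime(2023, 6, 1), 12.0), (datetime(2023, 12, 29), 14.0),
    (datetime(2024, 1, 2), 100.0), (datetime(2024, 2, 29), 90.0), (datetime(2024, 12, 31), 80.0),
]

REFERENCE_YEAR = 2025  # every year is redrawn on this shared calendar

def to_reference_year(d):
    """Move a date onto the reference year; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=REFERENCE_YEAR)
    except ValueError:
        return d.replace(year=REFERENCE_YEAR, day=28)

def get_yearly(rows):
    """Partition rows by year, then standardize each year's closing prices."""
    by_year = defaultdict(list)
    for d, close in rows:
        by_year[d.year].append((to_reference_year(d), close))
    standardized = {}
    for year, points in by_year.items():
        closes = [c for _, c in points]
        m, s = mean(closes), stdev(closes)  # sample standard deviation
        standardized[year] = [(d, (c - m) / s) for d, c in points]
    return standardized

yearly = get_yearly(rows)
print([round(z, 2) for _, z in yearly[2023]])  # [-1.0, 0.0, 1.0]
print([round(z, 2) for _, z in yearly[2024]])  # [1.0, 0.0, -1.0]
```

Although the two made-up years sit at price levels ten times apart, after standardizing they share one scale, and the sign of the trend (rising in 2023, falling in 2024) is easy to read.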


In [None]:
//partition the rows by year, remap every date onto one reference year (2025), then standardize each year
function get_yearly(data){
    let new_data = {};
    for(let i=0; i<data.length; i++){
        let k = data[i].date.getUTCFullYear();
        //the dates were parsed with d3.utcParse, so read month/day back in UTC; Date months are zero-based, hence the -1
        let d = new Date(2025,Number(d3.utcFormat('%m')(data[i].date))-1,Number(d3.utcFormat('%d')(data[i].date)));
        if(k in new_data){ new_data[k].push({'date':d,'close':data[i].close,'volume':data[i].volume});}
        else { new_data[k] = [{'date':d,'close':data[i].close,'volume':data[i].volume}]; }
    }
    for(let k in new_data){
        let m = d3.mean(new_data[k],(d)=>{ return d.close; })      //mean
        let s = d3.deviation(new_data[k],(d)=>{ return d.close; }) //sample standard deviation
        for(let i=0; i<new_data[k].length; i++){
            new_data[k][i].close = (new_data[k][i].close-m)/s;
        }
    }
    return new_data;
}

d3.dsv(",", "AAPL_Historical_Data.csv", (d) => {
    return {
        date:    d3.utcParse("%m/%d/%Y")(d.Date),
        close:  +d['Close/Last'].replace("$",""),
        volume: +d.Volume
    };
}).then((data)=>{
    //draw out d3 code here 
}).catch((err)=>{ console.error(err); }); //loading or parsing errors arrive here, not in then()

In [None]:
//this is the draw code that goes in the cell placement above

let new_data = get_yearly(data);
let close = [0,0]; //running [min,max] of the standardized close over all years (standardized values straddle 0)
for(let k in new_data){
    let mx = d3.extent(new_data[k],(d)=>{return d.close;}); //[min,max] for this year
    if(mx[0]<=close[0]){close[0]=mx[0];}
    if(mx[1]>=close[1]){close[1]=mx[1];}
}
console.log(close)
const height = 800, width = 1200;
const svg = d3.select("#div1")
    .append("svg").attr("width", width+120).attr("height", height+40) //room for the 80px left offset plus right padding
    .append("g").attr("transform","translate("+80+","+0+")");

//all years share one reference year (2025) on the x axis
const x = d3.scaleTime()
    .domain([new Date(2025,0,1), new Date(2025,11,31)]).range([0, width]);
svg.append("g")
    .attr("transform", "translate(0,"+height+")")
    .call(d3.axisBottom(x).ticks(d3.timeMonth.every(1)) // One tick for each month
        .tickFormat(d3.timeFormat("%B")));
const y = d3.scaleLinear()
    .domain(close).range([height, 0]);
svg.append("g").call(d3.axisLeft(y));

const colors = {2016:'#9e0142',2017:'#d53e4f',2018:'#f46d43',2019:'#fdae61',2020:'#fee08b',
                2021:'#e6f598',2022:'#abdda4',2023:'#66c2a5',2024:'#3288bd',2025:'#5e4fa2'};
for(let k in new_data) {
    if(!(k in colors)){ continue; } //skip years without an assigned color (the partial first year, 2015)
    svg.append("path")
        .datum(new_data[k])
        .attr("fill", "none").attr("stroke", colors[k]).attr("stroke-width", 2)
        .attr("d", d3.line().curve(d3.curveBundle.beta(0.0)) //lower beta towards 0 to see its smoothing effects
            .x(function (d) {return x(d.date)})
            .y(function (d) {return y(d.close)})
        );
}

First, we define a function to gather the data for each of the ten years. We use an object with the true year as the key and then extract the month and day values from each date, subtracting 1 from the month number because the JavaScript Date object expects a zero-based month index (0 is January). Once we have built a dictionary of all the yearly datasets, we normalize each one to its mean, which centers all the years at zero. After that each year is divided by its standard deviation, which scales it to its own spread and makes the years vastly more comparable to one another. Finally, we apply some curvature to the resulting lines so that we can control how much of each year’s trend we see.

<img src="figures/aapl_d3_std.png" alt="aapl d3 std" width="800px">

When we change the beta value from 1.0 toward 0, each line is straightened toward its overall direction, so we can see how the stock performed over the year. The results show that only two of the ten years had a negatively trending stock: 2019 and 2024.
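
As we understand the bundle curve, beta blends every point with the straight line that joins the first and last points before the smooth curve is drawn. The python sketch below reproduces only that blending step on made-up values; it is not d3's implementation, and `straighten` is a hypothetical helper name.

```python
def straighten(values, beta):
    """Blend each value with the straight line joining the first and last values."""
    n = len(values)  # needs at least two values
    first, last = values[0], values[-1]
    out = []
    for i, v in enumerate(values):
        t = i / (n - 1)                       # position along the line, from 0 to 1
        on_line = first + t * (last - first)  # point on the first-to-last line
        out.append(beta * v + (1 - beta) * on_line)
    return out

closes = [0.0, 3.0, 1.0, 4.0, 2.0]  # made-up values
print(straighten(closes, 1.0))  # [0.0, 3.0, 1.0, 4.0, 2.0]
print(straighten(closes, 0.0))  # [0.0, 0.5, 1.0, 1.5, 2.0]
print(straighten(closes, 0.5))  # [0.0, 1.75, 1.0, 2.75, 2.0]
```

At beta = 0 only the first and last values matter, so the slope of the line reflects the net change over the year, which is why the trend direction becomes easy to read.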

<img src="figures/aapl_d3_std_beta.png" alt="aapl d3 std beta" width="800px">


# Exercises

#### [1] Get some practice with the python3 datetime parser as shown in the beginning of this section. Try to parse the following examples: 

"March 27th, 2008"

"1/23/15 23:21:01"

"02/12/97"

"1:22pm, Tuesday, November 11th, 2025"

#### [2] Starting with the AAPL dataset above, try experimenting to visualize another column variable (volume, opening cost, etc.)
#### [3] Using a different stock data from the github repo (such as NVDA_Historical_Data.csv) complete the same visualization and look for any bad performance (negative line) years.
#### [4] Redo the color choices shown here so that they use a d3-color scale. If you manage that, you can then easily add a [d3 legend](https://observablehq.com/@d3/color-legend) to the plot to show which years are low-performance.
