# California Fires - Project

## Client Interaction
The client for this project was Mr. Monjur ul Hasan, who had adequate knowledge of both the domain and data visualization.
The client's request was in the management domain, but for confidentiality reasons the visuals shown here use California forest fire data instead.

Although the two domains are very different, the visuals could be transferred from one to the other because both datasets share the same structure (a time-series event dataset).

The client wanted a solution to a visualization problem: "Given a dataset consisting of four columns, namely, Name, Start Date, End Date, and Intensity, I want a visual that is easy to look at and yet conveys the information it has to. The graph should be able to show the events with duration and intensity together. The existing visuals are too big to look at, so I need something simple."

By existing visuals, the client was referring to the well-known Gantt chart. After several rounds of discussion between the client and me, the client was most satisfied with the Timeline Balloons Animated variations. That visual required the least explanation, and the client was able to decode the information quickly.

## Visualization Problem

The purpose of the notebook is to analyze the dataset for this project, namely California Forest Fires, and to satisfy the client's requirements, i.e., to show all four variables mentioned above in a visualization that is easy to decode at a glance.

The notebook starts with a Gantt chart, which conveys most of the information in the traditional way. It then proceeds to single-axis Timeline Balloons in colored and animated variations, and finally shows a minimal experimental version of the Timeline Balloons.

## Introduction 

#### Sources

`Calfire.csv` - https://gist.github.com/lazarogamio/d64e0d04b1ce1f2a3bd08db7526fa632

`Chart` - http://bl.ocks.org/dk8996/5538271

`Chart` - https://github.com/denisemauldin/d3-timeline

The `Calfire.csv` file was scraped by Lazaro from the California forest fires website. The chart sources above provided an abstract template to build upon. The timeline chart required heavier modification of the core library functionality than the Gantt chart did, so the modified versions of the code are compatible with this project while the original versions are not.

## Domain Explanation

The California Forest Fires are incidents recorded by the California Department of Forestry and Fire Protection. Each incident is an event with a start and an end in time, so the data qualifies as a time-series event dataset.

The dataset covers events from 2000 to 2017. The records become more accurate year over year: the data for 2000 has many corrupt or missing values in some columns compared with later years. Possible reasons include technological advancement or better enforcement of reporting rules.

The events were recorded by the specific units allocated to monitor each area. These units were responsible for maintaining a record consisting of the name of the fire, its start and end dates, the acres of land burned, and the cause of the fire.



## Data Definition

The raw California fires dataset contains many columns that are not used in the visuals. Column-wise data definition of the raw data:

* Id - Unique id given to the fire
* Unit - The unit that was responsible for the fire monitoring
* Name - Unique Name given to the fire
* Start - The start date of the fire in the format DD-MM-YYYY
* End - The end date of the fire in the format DD-MM-YYYY
* Agency - The name of the agency that took care of the report and cause monitoring.
* Acres - Acres of land burned
* Cause - The cause of the fire.



## Data Cleaning and Processing

The data was split into multiple CSV files for better visualization, one file per year in which the fires started.

Also, since the Acres column holds the number of acres of land burned, it was normalized to a scale of 0 to 1 in steps of 0.1, i.e., rounded to at most one decimal place.
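The exact normalization formula is not stated here, so the following is only a minimal sketch assuming plain min-max scaling followed by rounding to one decimal place. The function name `normalize_intensity` and the sample acre values are hypothetical.

```python
def normalize_intensity(acres):
    """Min-max scale acre counts to [0, 1], rounded to one decimal place.

    Assumption: min-max scaling; the project may have used another rule.
    """
    lowest, highest = min(acres), max(acres)
    if highest == lowest:
        # All fires burned the same area, so there is no spread to scale
        return [0.0 for _ in acres]
    return [round((a - lowest) / (highest - lowest), 1) for a in acres]


# Made-up acre values, not taken from Calfire.csv
sample_acres = [10, 250, 1000, 5000]
intensities = normalize_intensity(sample_acres)
print(intensities)
```

Rounding to one decimal place is what produces the steps of 0.1 on the 0 to 1 scale.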

The start and end dates were also converted into Unix timestamps for easier axis integration. While these aren't used by the Gantt chart, they are used in the timeline charts.
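As an illustration of this conversion, here is a small sketch that turns a DD-MM-YYYY date into a Unix timestamp in milliseconds. It assumes the dates are interpreted as midnight UTC, which the source does not specify; `to_unix_ms` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_unix_ms(date_text, date_format="%d-%m-%Y"):
    """Convert a DD-MM-YYYY string to Unix time in milliseconds.

    Assumption: the date is treated as midnight UTC.
    """
    parsed = datetime.strptime(date_text, date_format)
    parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


start_ms = to_unix_ms("01-01-2017")
print(start_ms)
```

Millisecond timestamps map directly onto a continuous time axis, which is why they are convenient for the timeline charts.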

All of the CSV files are sorted in ascending order by the start date of the fire.
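The per-year split and the ascending sort could be sketched roughly as follows. This is an in-memory illustration only; the function name `split_and_sort_by_year` and the sample rows are hypothetical, and the column name `Start` follows the data definition above.

```python
from collections import defaultdict
from datetime import datetime

DATE_FORMAT = "%d-%m-%Y"  # DD-MM-YYYY, as in the data definition


def split_and_sort_by_year(rows):
    """Group rows by the year the fire started, sorted by start date."""
    by_year = defaultdict(list)
    for row in rows:
        start = datetime.strptime(row["Start"], DATE_FORMAT)
        by_year[start.year].append(row)
    for year_rows in by_year.values():
        # Sort on the parsed date, not the raw string, so months order correctly
        year_rows.sort(key=lambda r: datetime.strptime(r["Start"], DATE_FORMAT))
    return dict(by_year)


# Made-up rows, not taken from Calfire.csv
sample_rows = [
    {"Name": "Fire A", "Start": "14-06-2016", "End": "20-06-2016"},
    {"Name": "Fire B", "Start": "02-08-2017", "End": "05-08-2017"},
    {"Name": "Fire C", "Start": "30-07-2017", "End": "01-08-2017"},
]
groups = split_and_sort_by_year(sample_rows)
print(sorted(groups))
```

Each group could then be written to its own file, for example with `csv.DictWriter` from the standard library.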




## Munzner's What
**What** : Dataset type is `Tables`

**Dataset Columns**: 

Name -> Categorical -> Contains name of fire

Start Date -> Ordered (Quantitative -> Sequential) -> Contains start date of the fire in format (DD-MM-YYYY).

Starting Time -> Ordered (Quantitative -> Sequential) -> Contains start date of the fire in unix time format (milliseconds).

End Date -> Ordered (Quantitative -> Sequential) -> Contains end date of the fire in format (DD-MM-YYYY).

Ending Time -> Ordered (Quantitative -> Sequential) -> Contains end date of the fire in unix time format (milliseconds).

Intensity -> Ordered(Ordinal) ->  Contains numerical values on a range of 0 to 1.

**Note**: Munzner's What analysis remains the same throughout the graphs and hence is only mentioned in this section.

## Gantt Chart - Model 1
## Munzner's Theory

What is the dependency of intensity on the duration of the event?


**Why**: {`Locate`, `Attributes -> Many -> Dependency` }

In terms of action, the location is unknown but the target is known, hence the action is `Locate`. In terms of our data, the target has many attributes, hence we are trying to find a dependency, i.e., start/end dependencies between names with respect to intensity.

**Task**: emphasize temporal overlaps, start/end dependencies between items

**How**: ` Arrange -> Separate`, `Map - Ordered -> Color -> Saturation, Ordered -> Size -> length`

**Idiom Structure Analysis**:

**Mark**: line

  * length: duration

**Channels**:

  * Vertical Position (Name)
  * Horizontal Position (Start and End)
  * Color Hue (Intensity)
  * Size (length) (Start and End)


**Idioms**: `Gantt Chart` - 1 Categorical Attribute (Name), 2 Quant attributes (Start, End)  

### Graph 1 Explanation:

**Methodology**:

According to Munzner, an ordinal attribute encoded with color should use saturation, but hue was used in this graph. The reason is that hue did a better job than saturation at showing the fire intensity transition from 0 to 1.

This made it easier to find out whether a dependency actually exists; in this case, no dependency was found between the intensity and the duration of the event.
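The notebook judges this dependency visually. As a rough numeric cross-check, one could compute a correlation coefficient between event duration and intensity. The sketch below uses a hand-written Pearson correlation; the helper name `pearson` and the sample values are made up for illustration and are not results from `Calfire.csv`.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; report 0.0 here
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical durations (days) and normalized intensities
durations_days = [2, 5, 9, 14, 30]
intensities = [0.4, 0.0, 1.0, 0.1, 0.3]
r = pearson(durations_days, intensities)
print(f"correlation between duration and intensity: {r:.2f}")
```

A value near 0 would be consistent with the visual impression of no dependency, while values near 1 or -1 would suggest a linear relationship.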

Although the chart makes thorough use of space, a Gantt chart grows vertically with the number of names on the vertical axis. This raises the question: if the number of names in a given year were doubled, would we still be able to decode the same information quickly?

**Visualization**: 

The Gantt chart does a pretty good job of showing whether the intensity depends on the duration of the event. Scrolling through the years, it can be seen that most high-intensity events occur in June and August, but there doesn't seem to be a relationship between the intensity and the duration of the event, i.e., intensity is not tied to the length of the event.



All in all, the visual was fairly successful in the current scenario, as it helped us arrive at an answer.

### Graph 1 Links:

Local links point to files in the local assignments folder and hence cannot be viewed elsewhere.

Demo links, on the other hand, are hosted on the MUN account and can be viewed freely.

The chart defaults to `Timeline Balloons Colored - Model 1`


**Local**: <a href="./calfire-box.html"> Link </a> 

**Demo**: <a href="http://www.cs.mun.ca/~ufshaik/Project/calfire-box.html"> Link </a>

Please select `Gantt chart - Model 1` from the first select tool.


## Timeline Balloons Colored - Model 1

What is the intensity dependency on the duration of the event?

**Why**: {`Locate`, `Attributes -> Many -> Dependency` }

In terms of action, the location is unknown but the target is known, hence the action is `Locate`. In terms of our data, the target has many attributes, hence we are trying to find a dependency, i.e., start/end dependencies between names with respect to intensity.

**Task**: emphasize temporal overlaps, start/end dependencies between items

**How**: ` Arrange -> Separate`, `Map - Ordered -> Color -> Saturation, Ordered -> Size -> length / Area`

**Idiom Structure Analysis**:

**Mark**: line + point

 * line length: duration
 * point: Start of event 

**Channels**:

  * Tilted vertical dashed line connecting marks
  * Line Horizontal Position (Start and End)
  * Line Color - Saturation (Name)
  * Point Horizontal Position (Start)
  * Point Color - Saturation (Intensity) 
  * Point Area Size (Intensity) 


**Idioms**: `Timeline chart colored` - 1 Categorical Attribute (Name), 3 Quant attributes (Start, End, Intensity)  


**Methodology**:

According to Munzner, an ordinal attribute encoded with color should use saturation, but hue was used in this graph. The reason is that hue did a better job than saturation at showing the fire intensity transition from 0 to 1. Conversely, saturation was used instead of hue for the categorical attribute (Name), so that possible overlaps in duration can be detected.

The line along the x axis represents the duration of the event, while the balloons mark the start day of the event. As the intensity increases, the hue shifts and the size of the circle increases.
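To make the intensity encoding concrete, here is a small sketch of how an intensity in the range 0 to 1 could be mapped to a balloon radius and a hue. The radius limits, the yellow-to-red hue range, and the name `balloon_style` are all assumptions for illustration; the actual chart code may use different values.

```python
import math


def balloon_style(intensity, min_radius=3.0, max_radius=20.0):
    """Map an intensity in [0, 1] to (radius, css_color).

    Assumptions: area grows linearly with intensity, and hue runs
    from 60 (yellow) at intensity 0 to 0 (red) at intensity 1.
    """
    t = min(max(intensity, 0.0), 1.0)  # clamp to the valid range
    min_area = math.pi * min_radius ** 2
    max_area = math.pi * max_radius ** 2
    # Interpolate the area, then derive the radius, so that the
    # area channel (not the radius) is proportional to intensity
    area = min_area + t * (max_area - min_area)
    radius = math.sqrt(area / math.pi)
    hue = round(60 * (1 - t))
    return radius, f"hsl({hue}, 100%, 50%)"


for value in (0.0, 0.5, 1.0):
    print(value, balloon_style(value))
```

Interpolating the area rather than the radius keeps the size channel consistent with the "Point Area Size (Intensity)" channel listed above.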

A user can scroll until they reach the last date along the x axis. If multiple events start on the same day, they are stacked on top of each other and are distinguishable only by intensity. A user can also drag the balloons along the y axis for comparison purposes.

The timeline chart expands along the x axis to fit as much data as is provided, yet remains concise in conveying information. The line attaching the marks provides a visual cue to the start date without having to interact with the balloon.

**Visualization**:

The same question is asked of the timeline balloons chart to see whether a user is able to arrive at the same conclusion.

Comparing through the years, the timeline hides one of the main things a Gantt chart shows inherently, which is the duration on the x axis. In the timeline chart, the duration line of one event gets overlapped by those of other events and hence cannot be perceived accurately for a given event.

Although this limitation clearly exists, the red balloons pop out and make it easier to locate the targets, and hovering gives more information on the start and end dates. In other words, some information stays concealed from the user until an interaction takes place.



The visual was fairly successful in the current scenario and generated the same answer as the Gantt chart with a lot less clutter.

### Graph 2 Links:

Local links point to files in the local assignments folder and hence cannot be viewed elsewhere.

Demo links, on the other hand, are hosted on the MUN account and can be viewed freely.

**Local**: <a href="./calfire-box.html"> Link </a> 

**Demo**: <a href="http://www.cs.mun.ca/~ufshaik/Project/calfire-box.html"> Link </a>

Please select or search for `Timeline Ballons Unstacked - Model 1` from the first select tool.


## Timeline Balloons Animated - Model 2

At what point did the intensity actually increase?

**Why**: {`Locate`, `Attributes -> One -> Extremes` }

In terms of action, the location is unknown but the target is known, hence the action is `Locate`. In terms of our data, the target has a single attribute, hence we are trying to find the extremes in its distribution.

**Task**: emphasize temporal overlaps, start/end dependencies between items

**How**: ` Arrange -> Separate`, `Map - Ordered -> Color -> Saturation, Ordered -> Size -> length / Area`

**Idiom Structure Analysis**:

**Mark**: line + point

 * line length: duration
 * point: Start of event 

**Channels**:

  * Tilted vertical dashed line connecting marks
  * Line Horizontal Position (Start and End)
  * Line Color - Saturation (Name)
  * Point Horizontal Position (Start)
  * Point Color - Hue (Intensity) (Animation) (but should be Saturation)
  * Point Area Size (Intensity) 


**Idioms**: `Timeline chart Animated` - 1 Categorical Attribute (Name), 3 Quant attributes (Start, End, Intensity)  

**Methodology**:

New channels are added to the chart in comparison to the previous one.
The point color, which previously just represented the intensity in hue, is now animated with a blink animation. As the intensity increases, the size increases and the blink gets faster; as the intensity decreases, the size shrinks and the blink slows down.
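One simple way to realize "higher intensity blinks faster" is to interpolate the blink period linearly between a slow and a fast limit. The sketch below assumes limits of 2000 ms and 300 ms, which are illustrative values and not taken from the project; `blink_period_ms` is a hypothetical helper name.

```python
def blink_period_ms(intensity, slowest_ms=2000, fastest_ms=300):
    """Return the duration of one blink cycle for an intensity in [0, 1].

    Assumption: linear interpolation, where intensity 0 blinks slowest
    and intensity 1 blinks fastest.
    """
    t = min(max(intensity, 0.0), 1.0)  # clamp to the valid range
    return slowest_ms - t * (slowest_ms - fastest_ms)


for value in (0.0, 0.5, 1.0):
    print(value, blink_period_ms(value))
```

The returned period could then be used as the duration of a repeating opacity animation on each balloon.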

According to Munzner, an animation with many states should be considered in small multiples instead of on a larger visual scale, which is the basis of the animation and the scroll behaviour. Blinks quite naturally represent intensity and hence are easier for humans to perceive than color.


**Visualization**:

The animated behaviour helped us evaluate whether a user is able to easily detect an intensity spike. The client liked this model very much, and it needed little explanation, as the visuals were easy to decode.

It is easier to scroll through and look for unusual blinking patterns in the graph.

Hence the graph was successful at locating the intensity spikes.

### Graph 3 Links:

Local links point to files in the local assignments folder and hence cannot be viewed elsewhere.

Demo links, on the other hand, are hosted on the MUN account and can be viewed freely.

**Local**: <a href="./calfire-box.html"> Link </a> 

**Demo**: <a href="http://www.cs.mun.ca/~ufshaik/Project/calfire-box.html"> Link </a>

Please select or search for `Timeline Ballons Unstacked - Model 2` from the first select tool.

## Balloons Animated - Model 3

Can I compare trends through the years using increased intensity points?

**Why**: {`Compare`, `Trends`} 

In terms of action, the user should be able to compare trends, which can be done by toggling the years. The underlying question is how successful this visualization is at helping the user compare trends faster.

**Task**: emphasize temporal overlaps with respect to intensity

**How**: ` Arrange -> Separate`, `Map - Ordered -> Color -> Saturation, Ordered -> Size -> Area`

**Idiom Structure Analysis**:

**Mark**: point

 * point: Start of event 

**Channels**:

  * Point Horizontal Position (Start)
  * Point Color - Hue (Intensity) (but should be Saturation)
  * Point Area Size (Intensity) 


**Idioms**: `Balloons Animated` - 1 Categorical Attribute (Name), 3 Quant attributes (Start, End, Intensity)  

**Methodology**:

The duration lines along the x axis and the tilted line connecting the marks were removed in favor of event start times with animation.

For the remaining channels, the same methodology points apply as in the previous chart.


**Visualization**:

Since the points made it easier for the user to detect intensity spikes in the previous visualization, this graph was implemented with points only.

Looking at trends across the years for when the intensity spikes actually happened becomes a lot easier. However, if the user chooses all years, the graph gets cluttered and the visuals become hard to read, which also violates Munzner's visual principle.

The graph is successful at comparing the trends year by year, but fails to do so when all the years are clustered on the same graph.

### Graph 4 Links:

Local links point to files in the local assignments folder and hence cannot be viewed elsewhere.

Demo links, on the other hand, are hosted on the MUN account and can be viewed freely.

**Local**: <a href="./calfire-box.html"> Link </a> 

**Demo**: <a href="http://www.cs.mun.ca/~ufshaik/Project/calfire-box.html"> Link </a>

Please select or search for `Timeline Ballons Unstacked - Model 3` from the first select tool.

## Conclusion

The timeline balloon graphs proved satisfactory to the client, but opinions may differ from person to person.

They do not follow one of Munzner's main concepts, which is making use of the available space: only the x axis is used, not the y axis.

Even the animated variation, although very helpful for smaller datasets, would prove useless with a larger dataset because too many animation states would be present in the same multiple.

Perhaps a better visual is possible by plotting the start and end dates on the x and y axes and using intensity to color the point, or by placing duration on the x axis and intensity on the y axis to create a Gantt-like visual.

Still, violating two of Munzner's principles, namely applying hue instead of saturation to Intensity and applying saturation instead of hue to the x-axis duration lines, proved to be useful. Constructing multiple marks on the x axis also yielded fruitful results.

As for suggestions on Munzner's methodology, it is sometimes hard to decide the why after the idioms have been constructed. Also, the way she uses ordinal and quantitative attributes is confusing, which often leads to misunderstanding unless one refers to the idioms' key/attribute pairs for clarification.

