# Interactive Data Visualization
##### (C) 2023-2025 Timothy James Becker: [revision 1.0](),  [GPLv3 license](https://www.gnu.org/licenses/gpl-3.0.html) 


## <u>Data Transforms</u>

#### <u>Data Type/Domain/Range</u>

There are many types of data in a programming language like Java or Python, but for the purposes of analysis and visualization we group them by how they behave. Let us look at our data types one by one and also explore the domain and range of each type, which will be a necessary part of mapping data into the pixel space.


Any column of data has a type, such as the integer counts shown below for the $X_2$ column, and that type naturally has a corresponding domain. For example, if the column represents counts of something, the domain is $[0,\infty)$. The range, on the other hand, is simply the set of values that we want to map to in the pixel or color space on our screen.

<img src="figures/visualization_domain_range.png" alt="visualization_domain_range" width="700px">

If we don't have outside knowledge of the natural domain of each column, we can use the observations and take the minimal and maximal values as reasonable endpoints. Here these are 1 and 10, and we map them to the width of the pixel space, which in our example starts at 0 and goes to 600 (for the SVG).

Our mapping is therefore a function from domain to range: $[1,10] \rightarrow [0,600]$
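
As a minimal sketch of taking the observed endpoints, the helper below scans a column once and returns its minimum and maximum. The name `observedDomain` and the sample values are illustrative assumptions, not part of the course files.

```javascript
// Hypothetical helper: return [min, max] of a numeric column.
function observedDomain(xs) {
    let lo = Infinity;
    let hi = -Infinity;
    for (const x of xs) {
        if (x < lo) { lo = x; }
        if (x > hi) { hi = x; }
    }
    return [lo, hi];
}

// Illustrative values only
const column = [1, 2, 1, 3, 4, 10, 6];
console.log(observedDomain(column)); // [1, 10]
```

d3 offers `d3.extent` for the same purpose, so a hand-written helper like this is mainly useful for seeing what happens underneath.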

When values are numbers, such as integers, floating point values, or counts (non-negative values), we can parse them using the unary `+` operator in the d3 data loader shown in [04_Loading_Data.ipynb]. When we have items like names or other strings, we call them categorical; an example is the name field in the cars.csv example file. When we have strings with a natural ordering in their domain, such as "hot", "warm", "cold", we call them ordinal.
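
To make the `+` conversion concrete, here is a small sketch. The row is hypothetical: the field names follow the cars example, but the values are placeholders and not taken from cars.csv.

```javascript
// Hypothetical row, as a CSV loader would deliver it: every field is a string.
const row = { name: "example car", mpg: "18.5", hp: "130" };

// Unary + converts numeric strings to numbers; the name stays a string.
const parsed = { name: row.name, mpg: +row.mpg, hp: +row.hp };

console.log(typeof parsed.mpg, parsed.mpg + 1); // number 19.5
console.log(Number.isNaN(+"n/a"));              // true: non-numeric text becomes NaN
```

Because non-numeric text turns into `NaN` rather than raising an error, it can be worth checking parsed columns before computing a minimum and maximum from them.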

#### <u>Scaling and Transforms</u>
Since our mapping involves endpoints, it may seem that linear functions are the only ones available, but this is not the case, since a linear function can easily be manipulated (transformed). We will start by formulating a linear transformation from our domain of $[1,10]$ to the range $[0,600]$, keeping in mind that 1 needs to become 0 and 10 needs to become 600.

If we first map our domain to 0 to 1, we can then apply the scale and offset of the range. To map any observed domain to 0 to 1, we can simply use the maximal and minimal values of the column $X_i$:

$\forall x \in X_i : f(x) = \frac{\left(x-min\{X_i\}\right)}{\left(max\{X_i\}-min\{X_i\}\right)}$

Let's test to see if this works for the value 1, which should become 0:

$f(1)=\frac{\left(1-min\{X_i\}\right)}{\left(max\{X_i\}-min\{X_i\}\right)} = \frac{\left(1-1\right)}{\left(10-1\right)}=\frac{0}{9}=0$

And finally we will test the other endpoint which is 10:

$f(10)=\frac{\left(10-1\right)}{\left(10-1\right)}=\frac{9}{9}=1$

Now the output of this 0 to 1 function can be stretched to the destination range, where $p_{min}$ is 0 and $p_{max}$ is 600 in our example:

$g(x) = f(x)\left(p_{max}-p_{min}\right)+p_{min}$

We check this again: the value 1 should now become 0 and 10 should become 600:

$g(1) = \left(\frac{1-1}{10-1}\right)600+0 = (0)600=0$

$g(10) = \left(\frac{10-1}{10-1}\right)600+0 = (1)600=600$

And if we check the midpoint 5.5, a linear function should give the middle of the range, 300:

$g(5.5) = \left(\frac{5.5-1}{10-1}\right)600+0 = (0.5)600=300$


We can write a JavaScript function to see if this works:

In [1]:
%%javascript
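// Linearly map the values in xs from their observed [min,max] onto [0,w].
function scale(xs,w){
    let _min = xs[0];
    let _max = xs[0];
    let _scale = 1; // fallback divisor when every value is identical
    for(let x of xs){ // find the observed domain endpoints
        if(x<_min){ _min = x; }
        if(x>_max){ _max = x; }
    }
    if(_max-_min>0){ _scale = _max-_min; }
    // normalize to [0,1], then stretch to [0,w]; returns a new array
    return xs.map(x => ((x-_min)/_scale)*w);
}

let width = 600;
let x_i   = [1,2,1,3,4,2,3,2,1,2,2,6,5,8,9,10,1,2,3,2,1,2,1,2,1,2,1];
alert(scale(x_i,width));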

<IPython.core.display.Javascript object>
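
The function above fixes the range at $[0,w]$. As a sketch of the full equation with $p_{min}$ and $p_{max}$, and of the earlier remark that a linear map can be transformed into other kinds of scale, one possible generalization is shown below. The name `makeScale` and its parameters are assumptions made for illustration; this is not how d3 implements its scales.

```javascript
// Hypothetical generalization of scale(): map [dMin,dMax] onto [pMin,pMax].
// The optional transform t is applied to the value and to both domain
// endpoints before the linear step, which turns the linear map into
// (for example) a log scale.
function makeScale(dMin, dMax, pMin, pMax, t = (x) => x) {
    const span = t(dMax) - t(dMin);
    return function (x) {
        const unit = span === 0 ? 0 : (t(x) - t(dMin)) / span; // 0 to 1
        return unit * (pMax - pMin) + pMin;                     // scale, then offset
    };
}

const linear = makeScale(1, 10, 0, 600);
console.log(linear(1), linear(5.5), linear(10)); // 0 300 600

// With a base-10 log transform, 10 sits halfway between 1 and 100.
const logScale = makeScale(1, 100, 0, 600, Math.log10);
console.log(logScale(10));
```

Returning a function rather than a transformed array means the same scale can be reused for individual values, which is also the shape that d3 scales take.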

#### <u>D3 Builtin Scales</u>
Thankfully, d3 includes this same [linear scaling function](https://d3js.org/d3-scale/linear) along with many [more](https://d3js.org/d3-scale), which we can experiment with here in this notebook before trying to visualize anything:

In [3]:
%%html
<div id="dd1"></div>
<script type="module"> 
    import * as d3 from "https://cdn.skypack.dev/d3@7"; 
    let mpg_scale = d3.scaleLinear([1,10],[0,600]);
    let div = d3.select('#dd1').text(mpg_scale(5.5));
</script>

#### <u>D3 Scale Example</u>

Now we will start from the basic webfolder template located [here](https://github.com/timothyjamesbecker/Interactive_Data_Visualization/tree/main/d3_template_webapp) and then modify the main.js file as shown below:

#### <u>D3 Drawing With the Scale</u>

Now we will add some additional code so that we can make a scatter plot of mpg and hp using linear scales.
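
The sketch below is not the template's main.js; it only illustrates, in plain JavaScript, how two linear scales turn (mpg, hp) pairs into circle positions. The rows, the SVG size, and the helper name `linearScale` are all assumptions, and the values are placeholders rather than data from cars.csv.

```javascript
// Hypothetical rows; these values are placeholders, not taken from cars.csv.
const cars = [
    { mpg: 10, hp: 200 },
    { mpg: 20, hp: 150 },
    { mpg: 30, hp: 100 }
];
const svgWidth = 600, svgHeight = 400; // assumed SVG size

// Linear map from [d0,d1] to [r0,r1], using the scaling formula from earlier.
function linearScale(d0, d1, r0, r1) {
    return (x) => ((x - d0) / (d1 - d0)) * (r1 - r0) + r0;
}

const xScale = linearScale(10, 30, 0, svgWidth);
// SVG y grows downward, so the range is reversed to put large hp at the top.
const yScale = linearScale(100, 200, svgHeight, 0);

const circles = cars.map((c) => ({ cx: xScale(c.mpg), cy: yScale(c.hp) }));
console.log(circles);
```

With d3, the same idea would typically use `d3.scaleLinear` for both axes and pass the scaled values to the `cx` and `cy` attributes of each circle.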

#### <u>Data Type Transform</u>

But what about the other types, such as categorical or ordinal data? The transform can be very simple: we can use an enumeration, or we can create our own map literal and use that.

In [4]:
%%javascript

let M = {"hot":4,"warm":3,"cool":2,"cold":1,"freezing":0};
let xs = ["hot","hot","warm","cold","freezing"];
let dt = [];
for(let x of xs){ dt.push(M[x]); }
alert(dt);

<IPython.core.display.Javascript object>
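
The cell above uses a hand-written map literal. As a sketch of the enumeration alternative, the codes can be generated instead of typed out. The helper name `enumerate` and the category strings in the second example are illustrative assumptions.

```javascript
// Ordinal: the order of this array defines the codes (index = rank).
const levels = ["freezing", "cold", "cool", "warm", "hot"];
const rank = new Map(levels.map((name, i) => [name, i]));

const temps = ["hot", "hot", "warm", "cold", "freezing"];
console.log(temps.map((t) => rank.get(t))); // [4, 4, 3, 1, 0]

// Categorical: no natural order, so number the categories as they first appear.
function enumerate(values) {
    const codes = new Map();
    for (const v of values) {
        if (!codes.has(v)) { codes.set(v, codes.size); }
    }
    return values.map((v) => codes.get(v));
}
console.log(enumerate(["ford", "audi", "ford", "bmw"])); // [0, 1, 0, 2]
```

For ordinal data the numeric codes carry the ordering, so they can be passed on to a linear scale. For categorical data the codes are only labels, and distances between them have no meaning.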

#### Exercises
#### [1] Try another type of scale from the [d3-scale](https://d3js.org/d3-scale) page such as log.  
#### [2] Look back at the cars data set and pick a new variable and then modify the main.js file to make a transform to pixel space.

#### [3] Try to draw different color circles.

#### [4] How can you modify the scale to keep the circles from being cut off? Hint: use the equations above to make some room (margins).