Our data set records five attributes for each pollen grain:

1. RIDGE along the x axis of the pollen grain
2. NUB along the y axis of the pollen grain
3. CRACK along the z axis of the pollen grain
4. Pollen Grain Weight
5. Pollen Grain Density

I hypothesize that weight and density are positively correlated, especially when the other three traits are small: density is mass / volume, so if volume is minimized and mass is maximized, density should be greater.
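
The reasoning behind this hypothesis can be written as a short derivation. It is only a sketch, and it assumes (as a simplification that the data set itself does not document) that RIDGE, NUB, and CRACK behave like side lengths $x$, $y$, $z$, so that the volume $V$ is proportional to their product. Here $m$ stands for WEIGHT and $\rho$ for DENSITY.

```latex
\rho = \frac{m}{V}, \qquad V \propto x\,y\,z
\quad\Longrightarrow\quad
\rho \propto \frac{m}{x\,y\,z}
```

Under that assumption, $\rho$ rises with $m$ when the dimensions are held fixed, and falls as any one of $x$, $y$, $z$ grows when $m$ is held fixed.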

Below, the code generates a scatter plot with WEIGHT on the x axis and DENSITY on the y axis. I treat WEIGHT as the independent variable and DENSITY as the dependent variable because, under my hypothesis, DENSITY should depend on WEIGHT. If the hypothesis holds, the 2D plot of these two attributes should show a positive correlation.

In [1]:
from string import Template
from IPython.display import HTML
import json
import pandas as pd
import numpy as np

html_template = Template('''
<head>
    <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
    <script src="https://d3js.org/d3.v3.min.js"></script>
    <style type="text/css"> $css_text </style>
</head>
<body>
<div id="fillerBody" class="svgContainer"></div>

<script type="text/javascript"> $js_text </script>
</body>
''')

css_text = '''
.svgContainer {
    width: 1080px;
    height: 720px;
}
'''

js_text = '''
// set the dimensions and margins of the graph
var margin = {top: 20, right: 40, bottom: 100, left: 50},
    width = 800 - margin.left - margin.right,
    height = 500 - margin.top - margin.bottom;

// set the ranges
var weights = [];
var densities = [];
data.forEach(e => {
    weights.push(e.WEIGHT);
    densities.push(e.DENSITY);
})

var x = d3.scale.linear().range([0, width]).domain([Math.min(...weights), Math.max(...weights)]);
var y = d3.scale.linear().range([height, 0]).domain([Math.min(...densities), Math.max(...densities)]);

//define line for WEIGHT vs DENSITY
var line = d3.svg.line()
    .x(function(d) { return x(d); })
    .y(function(d) { return y(d); });

var svg = d3.select("#fillerBody").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
    .append("g")
    .attr("transform",
          "translate(" + margin.left + "," + margin.top + ")");
// Add dots
svg.append('g')
    .selectAll("dot")
    .data(data)
    .enter()
    .append("circle")
    .attr("cx", function (d) { return x(d.WEIGHT); } )
    .attr("cy", function (d) { return y(d.DENSITY); } )
    .attr("r", 5)
    .style("fill", "#9B870C")
    

// Add the X Axis
svg.append("g")
    .attr("transform", "translate(0," + height + ")")
    .call(d3.svg.axis().scale(x).orient("bottom"));
svg.append("text")             
    .attr("transform",
    "translate(" + (width/2) + " ," + 
    (height + margin.top + 20) + ")")
    .style("text-anchor", "middle")
    .text("Pollen Grain Weight");

// Add the Y Axis
svg.append("g")
      .call(d3.svg.axis().scale(y).orient("left"));
// text label for the y axis
svg.append("text")
    .attr("transform", "rotate(-90)")
    .attr("y", 0 - margin.left)
    .attr("x",0 - (height / 2))
    .attr("dy", "1em")
    .style("text-anchor", "middle")
    .text("Pollen Grain Density");  
'''

js_text_template = Template('''
var data = $data;
''' + js_text)

data = pd.read_csv('/groups/5754_f2019/HW5_data/pollen.csv')
js_text = js_text_template.substitute({'data': json.dumps(data.to_dict(orient='records'))})
HTML(html_template.substitute({'css_text': css_text, 'js_text': js_text}))

The cluster seems to tilt slightly upwards (a positive relationship). However, it is very hard to see because the points in the scatter plot are tightly bunched and overlap heavily. There are a few approaches we can take to reduce this overplotting:

1. We can lower the opacity (alpha) of each dot, so that regions where many dots overlap appear more opaque than sparse regions.
2. We can reduce the size of the dots so that fewer of them cluster or overlap.
3. We can change the visual representation of the dots in some way, such as adding a stroke or changing the shape of the data points.

I'll be looking at approach 2, as it is the simplest to implement.
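
For comparison, approach 1 would need only a small change to the circle-drawing part of the script. The following is a minimal sketch, not the code used in this notebook: it assumes the same `svg`, `x`, `y`, and `data` JavaScript variables as the cell above, and `dots_template`, `dots_js`, `radius`, and `opacity` are hypothetical names introduced here. `fill-opacity` is a standard SVG property, so translucent dots that overlap add up to darker regions.

```python
from string import Template

# Hypothetical helper (not part of the original notebook): builds the
# "Add dots" section of the D3 script with a configurable radius and opacity.
dots_template = Template('''
svg.append('g')
    .selectAll("dot")
    .data(data)
    .enter()
    .append("circle")
    .attr("cx", function (d) { return x(d.$x_col); } )
    .attr("cy", function (d) { return y(d.$y_col); } )
    .attr("r", $radius)
    .style("fill", "#9B870C")
    .style("fill-opacity", $opacity);
''')

def dots_js(x_col, y_col, radius=5, opacity=0.2):
    """Return the JavaScript that draws one translucent circle per row."""
    return dots_template.substitute(
        x_col=x_col, y_col=y_col, radius=radius, opacity=opacity)

print(dots_js('WEIGHT', 'DENSITY'))
```

The returned string could replace the "Add dots" lines inside `js_text`; the default opacity of 0.2 is an arbitrary starting value that would need tuning against the real data.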

In [2]:
js_text = '''
// set the dimensions and margins of the graph
var margin = {top: 20, right: 40, bottom: 100, left: 50},
    width = 800 - margin.left - margin.right,
    height = 500 - margin.top - margin.bottom;
// set the ranges
var weights = [];
var densities = [];
data.forEach(e => {
    weights.push(e.WEIGHT);
    densities.push(e.DENSITY);
})

var x = d3.scale.linear().range([0, width]).domain([Math.min(...weights), Math.max(...weights)]);
var y = d3.scale.linear().range([height, 0]).domain([Math.min(...densities), Math.max(...densities)]);

//define line for WEIGHT vs DENSITY
var line = d3.svg.line()
    .x(function(d) { return x(d); })
    .y(function(d) { return y(d); });

var svg = d3.select("#fillerBody").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
    .append("g")
    .attr("transform",
          "translate(" + margin.left + "," + margin.top + ")");
// Add dots
svg.append('g')
    .selectAll("dot")
    .data(data)
    .enter()
    .append("circle")
    .attr("cx", function (d) { return x(d.WEIGHT); } )
    .attr("cy", function (d) { return y(d.DENSITY); } )
    .attr("r", 2)
    .style("fill", "#9B870C")
    

// Add the X Axis
svg.append("g")
    .attr("transform", "translate(0," + height + ")")
    .call(d3.svg.axis().scale(x).orient("bottom"));
svg.append("text")             
    .attr("transform",
    "translate(" + (width/2) + " ," + 
    (height + margin.top + 20) + ")")
    .style("text-anchor", "middle")
    .text("Pollen Grain Weight");

// Add the Y Axis
svg.append("g")
      .call(d3.svg.axis().scale(y).orient("left"));
// text label for the y axis
svg.append("text")
    .attr("transform", "rotate(-90)")
    .attr("y", 0 - margin.left)
    .attr("x",0 - (height / 2))
    .attr("dy", "1em")
    .style("text-anchor", "middle")
    .text("Pollen Grain Density");  
'''

js_text_template = Template('''
var data = $data;
''' + js_text)

js_text = js_text_template.substitute({'data': json.dumps(data.to_dict(orient='records'))})
HTML(html_template.substitute({'css_text': css_text, 'js_text': js_text}))

With smaller dots the plot is far less crowded, and we can tentatively confirm the upward trend. The fit is loose, though, so the strength of the relationship is hard to judge. The scatter could come from the other three variables: if they change, the density value changes too. I suspect each of them has a negative relationship with density, because shrinking the dimensions of a pollen grain (a decrease in volume) should raise its density. I'll check this by plotting each of the other three against density using the same technique.

In [3]:
js_text = '''
// set the dimensions and margins of the graph
var margin = {top: 20, right: 40, bottom: 100, left: 50},
    width = 800 - margin.left - margin.right,
    height = 500 - margin.top - margin.bottom;
// set the ranges
var ridges = [];
var densities = [];
data.forEach(e => {
    ridges.push(e.RIDGE);
    densities.push(e.DENSITY);
})

var x = d3.scale.linear().range([0, width]).domain([Math.min(...ridges), Math.max(...ridges)]);
var y = d3.scale.linear().range([height, 0]).domain([Math.min(...densities), Math.max(...densities)]);

//define line for RIDGE vs DENSITY
var line = d3.svg.line()
    .x(function(d) { return x(d); })
    .y(function(d) { return y(d); });

var svg = d3.select("#fillerBody").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
    .append("g")
    .attr("transform",
          "translate(" + margin.left + "," + margin.top + ")");
// Add dots
svg.append('g')
    .selectAll("dot")
    .data(data)
    .enter()
    .append("circle")
    .attr("cx", function (d) { return x(d.RIDGE); } )
    .attr("cy", function (d) { return y(d.DENSITY); } )
    .attr("r", 2)
    .style("fill", "#9B870C")
    

// Add the X Axis
svg.append("g")
    .attr("transform", "translate(0," + height + ")")
    .call(d3.svg.axis().scale(x).orient("bottom"));
svg.append("text")             
    .attr("transform",
    "translate(" + (width/2) + " ," + 
    (height + margin.top + 20) + ")")
    .style("text-anchor", "middle")
    .text("Pollen Grain Ridge (X direction)");

// Add the Y Axis
svg.append("g")
      .call(d3.svg.axis().scale(y).orient("left"));
// text label for the y axis
svg.append("text")
    .attr("transform", "rotate(-90)")
    .attr("y", 0 - margin.left)
    .attr("x",0 - (height / 2))
    .attr("dy", "1em")
    .style("text-anchor", "middle")
    .text("Pollen Grain Density");  
'''

js_text_template = Template('''
var data = $data;
''' + js_text)

js_text = js_text_template.substitute({'data': json.dumps(data.to_dict(orient='records'))})
HTML(html_template.substitute({'css_text': css_text, 'js_text': js_text}))

For the ridges, a negative relationship is pretty clear. We can try out the next two as well.

In [4]:
js_text = '''
// set the dimensions and margins of the graph
var margin = {top: 20, right: 40, bottom: 100, left: 50},
    width = 800 - margin.left - margin.right,
    height = 500 - margin.top - margin.bottom;
// set the ranges
var nubs = [];
var densities = [];
data.forEach(e => {
    nubs.push(e.NUB);
    densities.push(e.DENSITY);
})

var x = d3.scale.linear().range([0, width]).domain([Math.min(...nubs), Math.max(...nubs)]);
var y = d3.scale.linear().range([height, 0]).domain([Math.min(...densities), Math.max(...densities)]);

//define line for NUB vs DENSITY
var line = d3.svg.line()
    .x(function(d) { return x(d); })
    .y(function(d) { return y(d); });

var svg = d3.select("#fillerBody").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
    .append("g")
    .attr("transform",
          "translate(" + margin.left + "," + margin.top + ")");
// Add dots
svg.append('g')
    .selectAll("dot")
    .data(data)
    .enter()
    .append("circle")
    .attr("cx", function (d) { return x(d.NUB); } )
    .attr("cy", function (d) { return y(d.DENSITY); } )
    .attr("r", 2)
    .style("fill", "#9B870C")
    

// Add the X Axis
svg.append("g")
    .attr("transform", "translate(0," + height + ")")
    .call(d3.svg.axis().scale(x).orient("bottom"));
svg.append("text")             
    .attr("transform",
    "translate(" + (width/2) + " ," + 
    (height + margin.top + 20) + ")")
    .style("text-anchor", "middle")
    .text("Pollen Grain Nub (Y direction)");

// Add the Y Axis
svg.append("g")
      .call(d3.svg.axis().scale(y).orient("left"));
// text label for the y axis
svg.append("text")
    .attr("transform", "rotate(-90)")
    .attr("y", 0 - margin.left)
    .attr("x",0 - (height / 2))
    .attr("dy", "1em")
    .style("text-anchor", "middle")
    .text("Pollen Grain Density");  
'''

js_text_template = Template('''
var data = $data;
''' + js_text)

js_text = js_text_template.substitute({'data': json.dumps(data.to_dict(orient='records'))})
HTML(html_template.substitute({'css_text': css_text, 'js_text': js_text}))

The nub does not appear to follow the same pattern as the ridge. These variables are therefore probably not what I assumed, namely simple Euclidean dimensions whose volume helps determine density. Finally, we can check the crack measurement.

In [5]:
js_text = '''
// set the dimensions and margins of the graph
var margin = {top: 20, right: 40, bottom: 100, left: 50},
    width = 800 - margin.left - margin.right,
    height = 500 - margin.top - margin.bottom;
// set the ranges
var cracks = [];
var densities = [];
data.forEach(e => {
    cracks.push(e.CRACK);
    densities.push(e.DENSITY);
})

var x = d3.scale.linear().range([0, width]).domain([Math.min(...cracks), Math.max(...cracks)]);
var y = d3.scale.linear().range([height, 0]).domain([Math.min(...densities), Math.max(...densities)]);

//define line for CRACK vs DENSITY
var line = d3.svg.line()
    .x(function(d) { return x(d); })
    .y(function(d) { return y(d); });

var svg = d3.select("#fillerBody").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
    .append("g")
    .attr("transform",
          "translate(" + margin.left + "," + margin.top + ")");
// Add dots
svg.append('g')
    .selectAll("dot")
    .data(data)
    .enter()
    .append("circle")
    .attr("cx", function (d) { return x(d.CRACK); } )
    .attr("cy", function (d) { return y(d.DENSITY); } )
    .attr("r", 2)
    .style("fill", "#9B870C")
    

// Add the X Axis
svg.append("g")
    .attr("transform", "translate(0," + height + ")")
    .call(d3.svg.axis().scale(x).orient("bottom"));
svg.append("text")             
    .attr("transform",
    "translate(" + (width/2) + " ," + 
    (height + margin.top + 20) + ")")
    .style("text-anchor", "middle")
    .text("Pollen Grain Crack (Z direction)");

// Add the Y Axis
svg.append("g")
      .call(d3.svg.axis().scale(y).orient("left"));
// text label for the y axis
svg.append("text")
    .attr("transform", "rotate(-90)")
    .attr("y", 0 - margin.left)
    .attr("x",0 - (height / 2))
    .attr("dy", "1em")
    .style("text-anchor", "middle")
    .text("Pollen Grain Density");  
'''

js_text_template = Template('''
var data = $data;
''' + js_text)

js_text = js_text_template.substitute({'data': json.dumps(data.to_dict(orient='records'))})
HTML(html_template.substitute({'css_text': css_text, 'js_text': js_text}))

Thus, we see that the "crack" of the pollen grains also has a negative relationship with density. None of these relationships is particularly tight, and it is extremely hard to read this many data points in a single 2D scatter plot. I'll look into a few other visualizations that may be more informative.
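
The visual impressions from the scatter plots could also be checked numerically. A minimal sketch, assuming the DataFrame has a numeric `DENSITY` column as in this notebook: `density_correlations` is a hypothetical helper, and the `toy` frame holds made-up values chosen only to exercise the function, not pollen measurements.

```python
import pandas as pd

def density_correlations(df):
    """Pearson correlation of every other numeric column with DENSITY."""
    corr = df.select_dtypes(include='number').corr()['DENSITY']
    # Drop the trivial self-correlation and sort from most negative upwards.
    return corr.drop('DENSITY').sort_values()

# Made-up illustration values (NOT taken from pollen.csv).
toy = pd.DataFrame({
    'RIDGE':   [1.0, 2.0, 3.0, 4.0],
    'WEIGHT':  [2.0, 1.0, 4.0, 3.0],
    'DENSITY': [4.0, 3.0, 2.0, 1.0],
})
result = density_correlations(toy)
print(result)
```

In the notebook this would be called as `density_correlations(data)`. A coefficient near zero would agree with a loose cloud of points, while values near -1 or +1 would indicate a tight linear relationship.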

The next step in the analysis is to keep comparing relationships between different attributes of the data. A strong method for analyzing multivariate data is a parallel coordinates visualization, which shows trends across several axes at once: lines that cross between two adjacent axes suggest a negative relationship, while lines that run roughly parallel suggest a positive one. I'll create a parallel coordinates visualization below.

In [6]:
js_text = '''
// set the dimensions and margins of the graph
var margin = {top: 20, right: 40, bottom: 100, left: 50},
    width = 800 - margin.left - margin.right,
    height = 500 - margin.top - margin.bottom;

var svg = d3.select("#fillerBody").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
    .append("g")
    .attr("transform",
          "translate(" + margin.left + "," + margin.top + ")");

var dimensions = d3.keys(data[0]).filter(d => { return d != "OBSERVATION_NUMBER"; });

// one linear scale per dimension, keyed by column name
var y = {};
dimensions.forEach(function(name) {
    y[name] = d3.scale.linear()
        .domain( d3.extent(data, function(d) { return +d[name]; }))
        .range([height, 0]);
});

var x = d3.scale.ordinal()
    .domain(dimensions)
    .rangePoints([0, width]);
    
//Takes one row of data and returns the SVG path string that connects its value on each axis
function path(d) {
    return d3.svg.line()(dimensions.map(function(p) {
        return [x(p), y[p](d[p])];
    }))
}
// Draw the lines
svg
    .selectAll("myPath")
    .data(data)
    .enter().append("path")
    .attr("d",  path)
    .style("fill", "none")
    .style("stroke", "#69b3a2")
    .style("opacity", 0.5);
    
// Draw the axis:
  svg.selectAll("myAxes")
    .data(dimensions).enter()
    .append("g")
    // I translate this element to its right position on the x axis
    .attr("transform", function(d) { return "translate(" + x(d) + ")"; })
    // And I build the axis with the call function
    .each(function(d) { d3.select(this).call(d3.svg.axis().scale(y[d]).orient("left")); })
    // Add axis title
    .append("text")
      .style("text-anchor", "middle")
      .attr("y", -9)
      .text(function(d) { return d; })
      .style("fill", "black")
'''

js_text_template = Template('''
var data = $data;
''' + js_text)

js_text = js_text_template.substitute({'data': json.dumps(data.to_dict(orient='records'))})
HTML(html_template.substitute({'css_text': css_text, 'js_text': js_text}))

Unfortunately, while parallel coordinates can often be helpful with multivariate data, this plot suffers from the same issue as our 2D scatter plot: there is simply too much data in one frame. This could be remedied by letting users filter the data set through interaction, or by narrowing down the data set beforehand. In our case, implementing interaction may be a challenge, especially with D3 version 3, whose API is more awkward to work with than v4 and v5, yet which seems to be the version that works most reliably in Jupyter Notebook.
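
Narrowing down the data set can be done on the Python side before the records are handed to D3. A minimal sketch, assuming a random subsample is an acceptable way to thin the lines: `thin_rows`, its default of 500 rows, and the `toy` frame are hypothetical and for illustration only.

```python
import pandas as pd

def thin_rows(df, n=500, seed=0):
    """Return at most n randomly chosen rows; the seed keeps the draw repeatable."""
    if len(df) <= n:
        return df
    return df.sample(n=n, random_state=seed)

# Made-up frame standing in for the pollen data.
toy = pd.DataFrame({'WEIGHT': range(1000), 'DENSITY': range(1000)})
subset = thin_rows(toy, n=100)
print(len(subset))
```

In the notebook, `thin_rows(data).to_dict(orient='records')` could be passed to `json.dumps` in place of the full table. Random sampling keeps the overall shape of the distribution but can hide rare outliers, so the sample size is a trade-off.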

Now, we'll look at whether each attribute follows a roughly normal distribution, to check that this data is worth investigating for trends. This should be a much easier task with histograms, which cope well with a large volume of data because we look only at frequencies rather than at spatial trends in individual attribute values.
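
A histogram with n equal-width bins needs n + 1 bin edges, including both the minimum and the maximum. The sketch below illustrates that bookkeeping in NumPy rather than D3; `equal_width_bins` is a hypothetical helper and the input values are made up.

```python
import numpy as np

def equal_width_bins(values, n_bins=16):
    """Return (edges, counts) for n_bins equal-width bins spanning min..max."""
    values = np.asarray(values, dtype=float)
    # n_bins bins require n_bins + 1 edges, with both endpoints included.
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    counts, _ = np.histogram(values, bins=edges)
    return edges, counts

# Made-up values, only to show the edge and count bookkeeping.
edges, counts = equal_width_bins([0.0, 1.0, 2.0, 3.0, 4.0], n_bins=4)
print(edges)
print(counts)
```

Computing the edges from an index, as `np.linspace` does, avoids the floating-point drift that can build up when a bin width is added repeatedly in a loop.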

In [7]:
js_text = '''
// set the dimensions and margins of the graph
var margin = {top: 20, right: 40, bottom: 100, left: 50},
    width = 800 - margin.left - margin.right,
    height = 500 - margin.top - margin.bottom;
    
var svg = d3.select("#fillerBody").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
    .append("g")
    .attr("transform",
          "translate(" + margin.left + "," + margin.top + ")");

var nubs = [];
data.forEach(e => {
    nubs.push(e.NUB);
})

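//Create thresholds array for xVals
//We want 16 equal-width bins, which needs 17 edges (minimum and maximum included)
var nubMin = Math.min(...nubs);
var nubMax = Math.max(...nubs);
var nubRange = nubMax - nubMin;

var binSize = nubRange / 16;
var thresholds = [];
for (var i = 0; i < 16; i++) {
    thresholds.push(nubMin + i * binSize);
}
// close the last bin exactly at the maximum
thresholds.push(nubMax);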
var thresholdMin = Math.min(...thresholds);
var thresholdMax = Math.max(...thresholds);
var thresholdRange = thresholdMax - thresholdMin;

// set the parameters for the histogram
var hist = d3.layout.histogram()
            .range([nubMin, nubMax])
            .value(function(d) { return d.NUB; })
            .bins(thresholds);
            
var bins = hist(data);


var lengths = [];
var xVals = [];
var yVals = [];
bins.forEach(d => {
    lengths.push(d.length);
    xVals.push(d.x);
    yVals.push(d.y);
})

var maxYVal = Math.max(...yVals);

var minXVal = Math.min(...xVals);
var maxXVal = Math.max(...xVals);
var xRange = maxXVal - minXVal;



var x = d3.scale.linear().range([0, width]).domain([0, thresholdRange]);
// the axis spans the same bin edges as the bars
var xAxis = d3.scale.linear().range([0, width]).domain([thresholdMin, thresholdMax]);
var yAxis = d3.scale.linear().range([height, 0]).domain([0, maxYVal]);
var y = d3.scale.linear().range([0, height]).domain([0, Math.max(...lengths)]);


svg.selectAll("rect")
    .data(bins)
    .enter()
    .append("rect")
    .attr("transform", function(d) { return "translate(" + x(d.x - minXVal)  + "," + (height - y(d.length)) + ")"; })
    .attr("width", function(d) { return x(d.dx) ; })
    .attr("height", function(d) { return y(d.length); })
    .style("fill", "#9B870C")

// Add the X Axis
svg.append("g")
    .attr("transform", "translate(0," + height + ")")
    .call(d3.svg.axis().scale(xAxis).orient("bottom"));
    
svg.append("text")             
    .attr("transform",
        "translate(" + (width/2) + " ," + 
        (height + margin.top + 20) + ")")
    .style("text-anchor", "middle")
    .text("Pollen Grain Nub Measurement");

// Add the Y Axis
svg.append("g")
      .call(d3.svg.axis().scale(yAxis).orient("left"));
// text label for the y axis
svg.append("text")
    .attr("transform", "rotate(-90)")
    .attr("y", 0 - margin.left)
    .attr("x",0 - (height / 2))
    .attr("dy", "1em")
    .style("text-anchor", "middle")
    .text("Frequency");  

'''

js_text_template = Template('''
var data = $data;
''' + js_text)

js_text = js_text_template.substitute({'data': json.dumps(data.to_dict(orient='records'))})
HTML(html_template.substitute({'css_text': css_text, 'js_text': js_text}))

In [8]:
js_text = '''
// set the dimensions and margins of the graph
var margin = {top: 20, right: 40, bottom: 100, left: 50},
    width = 800 - margin.left - margin.right,
    height = 500 - margin.top - margin.bottom;
    
var svg = d3.select("#fillerBody").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
    .append("g")
    .attr("transform",
          "translate(" + margin.left + "," + margin.top + ")");

var cracks = [];
data.forEach(e => {
    cracks.push(e.CRACK);
})

cracks.sort(function (a,b) {
    return a - b;
});

//Create thresholds array for xVals
//We want 16 equal-width bins, which needs 17 edges (minimum and maximum included)
var crackMin = Math.min(...cracks);
var crackMax = Math.max(...cracks);
var crackRange = crackMax - crackMin;

var binSize = crackRange / 16;
var thresholds = [];
for (var i = 0; i < 16; i++) {
    thresholds.push(crackMin + i * binSize);
}
// close the last bin exactly at the maximum
thresholds.push(crackMax);
var thresholdMin = Math.min(...thresholds);
var thresholdMax = Math.max(...thresholds);
var thresholdRange = thresholdMax - thresholdMin;

// set the parameters for the histogram
var hist = d3.layout.histogram()
            .range([crackMin, crackMax])
            .value(function(d) { return d.CRACK; })
            .bins(thresholds);
            
var bins = hist(data);

var lengths = [];
var xVals = [];
var yVals = [];
bins.forEach(d => {
    lengths.push(d.length);
    xVals.push(d.x);
    yVals.push(d.y);
})

var maxYVal = Math.max(...yVals);

var minXVal = Math.min(...xVals);
var maxXVal = Math.max(...xVals);
var xRange = maxXVal - minXVal;



var x = d3.scale.linear().range([0, width]).domain([0, thresholdRange]);
// the axis spans the same bin edges as the bars
var xAxis = d3.scale.linear().range([0, width]).domain([thresholdMin, thresholdMax]);
var yAxis = d3.scale.linear().range([height, 0]).domain([0, maxYVal]);
var y = d3.scale.linear().range([0, height]).domain([0, Math.max(...lengths)]);


svg.selectAll("rect")
    .data(bins)
    .enter()
    .append("rect")
    .attr("transform", function(d) { return "translate(" + x(d.x - minXVal)  + "," + (height - y(d.length)) + ")"; })
    .attr("width", function(d) { return x(d.dx) ; })
    .attr("height", function(d) { return y(d.length); })
    .style("fill", "#9B870C")

// Add the X Axis
svg.append("g")
    .attr("transform", "translate(0," + height + ")")
    .call(d3.svg.axis().scale(xAxis).orient("bottom"));
    
svg.append("text")             
    .attr("transform",
        "translate(" + (width/2) + " ," + 
        (height + margin.top + 20) + ")")
    .style("text-anchor", "middle")
    .text("Pollen Grain Crack Measurement");

// Add the Y Axis
svg.append("g")
      .call(d3.svg.axis().scale(yAxis).orient("left"));
// text label for the y axis
svg.append("text")
    .attr("transform", "rotate(-90)")
    .attr("y", 0 - margin.left)
    .attr("x",0 - (height / 2))
    .attr("dy", "1em")
    .style("text-anchor", "middle")
    .text("Frequency");  

'''

js_text_template = Template('''
var data = $data;
''' + js_text)

js_text = js_text_template.substitute({'data': json.dumps(data.to_dict(orient='records'))})
HTML(html_template.substitute({'css_text': css_text, 'js_text': js_text}))

In [9]:
js_text = '''
// set the dimensions and margins of the graph
var margin = {top: 20, right: 40, bottom: 100, left: 50},
    width = 800 - margin.left - margin.right,
    height = 500 - margin.top - margin.bottom;
    
var svg = d3.select("#fillerBody").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
    .append("g")
    .attr("transform",
          "translate(" + margin.left + "," + margin.top + ")");

var weights = [];
data.forEach(e => {
    weights.push(e.WEIGHT);
})

//Create thresholds array for xVals
//We want 16 equal-width bins, which needs 17 edges (minimum and maximum included)
var weightMin = Math.min(...weights);
var weightMax = Math.max(...weights);
var weightRange = weightMax - weightMin;

var binSize = weightRange / 16;
var thresholds = [];
for (var i = 0; i < 16; i++) {
    thresholds.push(weightMin + i * binSize);
}
// close the last bin exactly at the maximum
thresholds.push(weightMax);
var thresholdMin = Math.min(...thresholds);
var thresholdMax = Math.max(...thresholds);
var thresholdRange = thresholdMax - thresholdMin;

// set the parameters for the histogram
var hist = d3.layout.histogram()
            .range([weightMin, weightMax])
            .value(function(d) { return d.WEIGHT; })
            .bins(thresholds);
            
var bins = hist(data);


var lengths = [];
var xVals = [];
var yVals = [];
bins.forEach(d => {
    lengths.push(d.length);
    xVals.push(d.x);
    yVals.push(d.y);
})

var maxYVal = Math.max(...yVals);

var minXVal = Math.min(...xVals);
var maxXVal = Math.max(...xVals);
var xRange = maxXVal - minXVal;



var x = d3.scale.linear().range([0, width]).domain([0, thresholdRange]);
// the axis spans the same bin edges as the bars
var xAxis = d3.scale.linear().range([0, width]).domain([thresholdMin, thresholdMax]);
var yAxis = d3.scale.linear().range([height, 0]).domain([0, maxYVal]);
var y = d3.scale.linear().range([0, height]).domain([0, Math.max(...lengths)]);


svg.selectAll("rect")
    .data(bins)
    .enter()
    .append("rect")
    .attr("transform", function(d) { return "translate(" + x(d.x - minXVal)  + "," + (height - y(d.length)) + ")"; })
    .attr("width", function(d) { return x(d.dx) ; })
    .attr("height", function(d) { return y(d.length); })
    .style("fill", "#9B870C")

// Add the X Axis
svg.append("g")
    .attr("transform", "translate(0," + height + ")")
    .call(d3.svg.axis().scale(xAxis).orient("bottom"));
    
svg.append("text")             
    .attr("transform",
        "translate(" + (width/2) + " ," + 
        (height + margin.top + 20) + ")")
    .style("text-anchor", "middle")
    .text("Pollen Grain Weight Measurement");

// Add the Y Axis
svg.append("g")
      .call(d3.svg.axis().scale(yAxis).orient("left"));
// text label for the y axis
svg.append("text")
    .attr("transform", "rotate(-90)")
    .attr("y", 0 - margin.left)
    .attr("x",0 - (height / 2))
    .attr("dy", "1em")
    .style("text-anchor", "middle")
    .text("Frequency");  

'''

js_text_template = Template('''
var data = $data;
''' + js_text)

js_text = js_text_template.substitute({'data': json.dumps(data.to_dict(orient='records'))})
HTML(html_template.substitute({'css_text': css_text, 'js_text': js_text}))

In [10]:
js_text = '''
// set the dimensions and margins of the graph
var margin = {top: 20, right: 40, bottom: 100, left: 50},
    width = 800 - margin.left - margin.right,
    height = 500 - margin.top - margin.bottom;
    
var svg = d3.select("#fillerBody").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
    .append("g")
    .attr("transform",
          "translate(" + margin.left + "," + margin.top + ")");

var densitys = [];
data.forEach(e => {
    densitys.push(e.DENSITY);
})

//Create thresholds array for xVals
//We want 16 equal-width bins, which needs 17 edges (minimum and maximum included)
var densityMin = Math.min(...densitys);
var densityMax = Math.max(...densitys);
var densityRange = densityMax - densityMin;

var binSize = densityRange / 16;
var thresholds = [];
for (var i = 0; i < 16; i++) {
    thresholds.push(densityMin + i * binSize);
}
// close the last bin exactly at the maximum
thresholds.push(densityMax);
var thresholdMin = Math.min(...thresholds);
var thresholdMax = Math.max(...thresholds);
var thresholdRange = thresholdMax - thresholdMin;

// set the parameters for the histogram
var hist = d3.layout.histogram()
            .range([densityMin, densityMax])
            .value(function(d) { return d.DENSITY; })
            .bins(thresholds);
            
var bins = hist(data);


var lengths = [];
var xVals = [];
var yVals = [];
bins.forEach(d => {
    lengths.push(d.length);
    xVals.push(d.x);
    yVals.push(d.y);
})

var maxYVal = Math.max(...yVals);

var minXVal = Math.min(...xVals);
var maxXVal = Math.max(...xVals);
var xRange = maxXVal - minXVal;



var x = d3.scale.linear().range([0, width]).domain([0, thresholdRange]);
// the axis spans the same bin edges as the bars
var xAxis = d3.scale.linear().range([0, width]).domain([thresholdMin, thresholdMax]);
var yAxis = d3.scale.linear().range([height, 0]).domain([0, maxYVal]);
var y = d3.scale.linear().range([0, height]).domain([0, Math.max(...lengths)]);


svg.selectAll("rect")
    .data(bins)
    .enter()
    .append("rect")
    .attr("transform", function(d) { return "translate(" + x(d.x - minXVal)  + "," + (height - y(d.length)) + ")"; })
    .attr("width", function(d) { return x(d.dx) ; })
    .attr("height", function(d) { return y(d.length); })
    .style("fill", "#9B870C")

// Add the X Axis
svg.append("g")
    .attr("transform", "translate(0," + height + ")")
    .call(d3.svg.axis().scale(xAxis).orient("bottom"));
    
svg.append("text")             
    .attr("transform",
        "translate(" + (width/2) + " ," + 
        (height + margin.top + 20) + ")")
    .style("text-anchor", "middle")
    .text("Pollen Grain Density Measurement");

// Add the Y Axis
svg.append("g")
      .call(d3.svg.axis().scale(yAxis).orient("left"));
// text label for the y axis
svg.append("text")
    .attr("transform", "rotate(-90)")
    .attr("y", 0 - margin.left)
    .attr("x",0 - (height / 2))
    .attr("dy", "1em")
    .style("text-anchor", "middle")
    .text("Frequency");  

'''

js_text_template = Template('''
var data = $data;
''' + js_text)

js_text = js_text_template.substitute({'data': json.dumps(data.to_dict(orient='records'))})
HTML(html_template.substitute({'css_text': css_text, 'js_text': js_text}))

In [11]:
js_text = '''
// set the dimensions and margins of the graph
var margin = {top: 20, right: 40, bottom: 100, left: 50},
    width = 800 - margin.left - margin.right,
    height = 500 - margin.top - margin.bottom;
    
var svg = d3.select("#fillerBody").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
    .append("g")
    .attr("transform",
          "translate(" + margin.left + "," + margin.top + ")");

var ridges = [];
data.forEach(e => {
    ridges.push(e.RIDGE);
})

//Create thresholds array for xVals
//We want 16 equal-width bins, which needs 17 edges (minimum and maximum included)
var ridgeMin = Math.min(...ridges);
var ridgeMax = Math.max(...ridges);
var ridgeRange = ridgeMax - ridgeMin;
var binSize = ridgeRange / 16;
var thresholds = [];
for (var i = 0; i < 16; i++) {
    thresholds.push(ridgeMin + i * binSize);
}
// close the last bin exactly at the maximum
thresholds.push(ridgeMax);
var thresholdMin = Math.min(...thresholds);
var thresholdMax = Math.max(...thresholds);
var thresholdRange = thresholdMax - thresholdMin;

// set the parameters for the histogram
var hist = d3.layout.histogram()
            .range([ridgeMin, ridgeMax])
            .value(function(d) { return d.RIDGE; })
            .bins(thresholds);
            
var bins = hist(data);

var lengths = [];
var xVals = [];
var yVals = [];
bins.forEach(d => {
    lengths.push(d.length);
    xVals.push(d.x);
    yVals.push(d.y);
})

var maxYVal = Math.max(...yVals);

var minXVal = Math.min(...xVals);
var maxXVal = Math.max(...xVals);
var xRange = maxXVal - minXVal;



var x = d3.scale.linear().range([0, width]).domain([0, thresholdRange]);
// the axis spans the same bin edges as the bars
var xAxis = d3.scale.linear().range([0, width]).domain([thresholdMin, thresholdMax]);
var yAxis = d3.scale.linear().range([height, 0]).domain([0, maxYVal]);
var y = d3.scale.linear().range([0, height]).domain([0, Math.max(...lengths)]);


svg.selectAll("rect")
    .data(bins)
    .enter()
    .append("rect")
    .attr("transform", function(d) { return "translate(" + x(d.x - minXVal)  + "," + (height - y(d.length)) + ")"; })
    .attr("width", function(d) { return x(d.dx) ; })
    .attr("height", function(d) { return y(d.length); })
    .style("fill", "#9B870C")

// Add the X Axis
svg.append("g")
    .attr("transform", "translate(0," + height + ")")
    .call(d3.svg.axis().scale(xAxis).orient("bottom"));
    
svg.append("text")             
    .attr("transform",
        "translate(" + (width/2) + " ," + 
        (height + margin.top + 20) + ")")
    .style("text-anchor", "middle")
    .text("Pollen Grain Ridge Measurement");

// Add the Y Axis
svg.append("g")
      .call(d3.svg.axis().scale(yAxis).orient("left"));
// text label for the y axis
svg.append("text")
    .attr("transform", "rotate(-90)")
    .attr("y", 0 - margin.left)
    .attr("x",0 - (height / 2))
    .attr("dy", "1em")
    .style("text-anchor", "middle")
    .text("Frequency");  

'''

js_text_template = Template('''
var data = $data;
''' + js_text)

js_text = js_text_template.substitute({'data': json.dumps(data.to_dict(orient='records'))})
HTML(html_template.substitute({'css_text': css_text, 'js_text': js_text}))


All of these histograms follow a roughly normal curve. Thus, we can go on to analyze the data for relations among these characteristics, to see whether any of them have strong negative or positive correlations. Our final approach, coming up next, is a 3D scatter plot. This is a strong method for analyzing multivariate data, as we can plot three attributes together and see how they affect one another. Further investigation could go into some kind of 3D parallel coordinates diagram, where planar values of multiple coordinates are represented through lines and points, but that is probably best left for another time.
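
The "roughly normal" reading of the histograms could be backed by summary statistics. A minimal sketch, assuming sample skewness and excess kurtosis are an acceptable rough check (both are close to 0 for normally distributed data): `shape_summary` is a hypothetical helper and the `toy` columns are made up.

```python
import pandas as pd

def shape_summary(df):
    """Sample skewness and excess kurtosis for each numeric column."""
    numeric = df.select_dtypes(include='number')
    return pd.DataFrame({
        'skew': numeric.skew(),
        'excess_kurtosis': numeric.kurt(),
    })

# Made-up columns: one symmetric, one with a long right tail.
toy = pd.DataFrame({
    'SYMMETRIC':    [1.0, 2.0, 3.0, 4.0, 5.0],
    'RIGHT_TAILED': [1.0, 1.0, 1.0, 2.0, 10.0],
})
summary = shape_summary(toy)
print(summary)
```

In the notebook this would be `shape_summary(data)`. These two numbers are a coarse screen rather than a formal normality test.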