Session Objectives

  • Creating Scatterplots
  • Understanding Domain and Range
  • Learning to borrow and customize other people's code: copy an example from the web and adapt it.

Prerequisite: Introduction to D3 - Make a Chart.

To download the materials for the exercises, click here. The download is d3_part2_loadingData.


In the previous session, we made a simple chart to show how you can use D3 to bind data to page elements. This should have given you a good look at how D3 actually creates elements and binds data to them. Most of the time, though, you will be working in a different situation: you will have a large, complicated dataset, you might not know how many elements it contains, and you might be loading it from an external source. After all, D3 is all about the data, right? Here is a quick look at what we made last time; view the source to check out our code.

Bar Chart with Interactive Hover

The fundamentals we learned in the last session will help us make sense of what is going on today, so let's continue to expand our knowledge of D3: create a scatterplot, load some data, and, most importantly, begin to borrow, read, build from, and modify other people's code. With attribution where due, of course!


Creating Scatterplots

Scatterplots can be an effective way to show data, reveal change, and compare datasets. In creating a scatterplot, we are in essence doing the same thing as with the bar chart, but instead of placing rectangles in our SVG workspace, we add circles, position each one at an (x, y) location, and give them properties such as color, size, and label by setting their attributes.

Let's take a look at a new dataset:

City         # of Rats   # of Coffee Shops
Brookline    40          50
Boston       90          120
Cambridge    30          90
Chelsea      10          10
Somerville   60          40

And from it, create a scatterplot!

Dynamic Scatterplot from CSV with Scaling

Now, presumably rats and coffee shops have no relationship, but let's plot them to see how it works. The following example loads our dataset from a CSV and then plots SVG circles at the coordinates provided. Fundamentally, it follows the same rules as the bar chart.

View the source code for the scatterplot above. You will see it's pretty simple and very similar to the bar chart; however, instead of drawing rectangle elements at the location of each data value, we draw circles. With the dataset hardcoded as an array of [rats, coffee] pairs, the code looks as follows.

//Width and height
var w = 180;
var h = 180;

//Each pair is [number of rats, number of coffee shops]
var dataset = [
	[40, 50], [90, 120], [30, 90], [60, 40], [10, 10]
];

//Create SVG element
var svg = d3.select("body")
	.append("svg")
	.attr("width", w)
	.attr("height", h)
	.attr("style", "outline: thin solid black;");

//Draw one circle per data pair
svg.selectAll("circle")
	.data(dataset)
	.enter()
	.append("circle")
	.attr("cx", function(d) {
		return d[0];  //x position: number of rats
	})
	.attr("cy", function(d) {
		return d[1];  //y position: number of coffee shops
	})
	.attr("r", 5);

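This code builds our scatterplot in the following manner. Note how it gets the x and y values: each data item is a two-element array, so d[0] (the number of rats) becomes cx and d[1] (the number of coffee shops) becomes cy.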
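To see what those accessor functions hand back to D3, here is a minimal plain-JavaScript sketch that runs the same logic without a browser or D3. The helper name circleAttributes is hypothetical; D3 itself calls your accessor functions once per datum while setting attributes, rather than building an array like this.

```javascript
// Same [rats, coffee] pairs as in the D3 example above.
const dataset = [[40, 50], [90, 120], [30, 90], [60, 40], [10, 10]];

// Hypothetical helper: mimics the .attr("cx", ...) and .attr("cy", ...)
// accessors by computing the attribute values each circle would receive.
function circleAttributes(data) {
  return data.map(function (d) {
    return {
      cx: d[0], // first value in the pair: number of rats
      cy: d[1], // second value in the pair: number of coffee shops
      r: 5      // every circle gets the same radius
    };
  });
}

const circles = circleAttributes(dataset);
console.log(circles[1]); // { cx: 90, cy: 120, r: 5 }
```

Because the pairs are used directly as pixel coordinates, any value larger than 180 would land outside the SVG, which is the problem scales solve later in this tutorial.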

Scatterplot Layout

This code can be modified, in a similar fashion to our bar chart, to read in a CSV.

var ratData = [];

//Load the CSV; the row function converts numeric fields from strings to numbers
d3.csv("/_assets/data/coffee_rodents.csv", function(d) {
	return {
		city : d.city,
		rats : +d.rats,
		coffee : +d.coffee
	};
}, function(error, rows) {
	if (error) throw error;  //stop if the file could not be loaded
	ratData = rows;
	console.log(ratData);
	createVisualization();
});

function createVisualization(){
	//Width and height
	var w = 180;
	var h = 180;

	//Create SVG element
	var svg = d3.select("body")
		.append("svg")
		.attr("width", w)
		.attr("height", h)
		.attr("style", "outline: thin solid black;");

	//Draw one circle per row of the CSV
	svg.selectAll("circle")
		.data(ratData)
		.enter()
		.append("circle")
		.attr("cx", function(d) {
			return d.rats;
		})
		.attr("cy", function(d) {
			return d.coffee;
		})
		.attr("r", 5);
}
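The row function passed to d3.csv is where each row's strings become numbers: a CSV file is plain text, so every field arrives as a string, and the unary + converts it. The sketch below reproduces that step on an inline string. The CSV contents and the parseRows helper are assumptions for illustration (only the column names city, rats, and coffee come from the accessor above), and the helper splits naively on commas, so unlike d3.csv it does not handle quoted fields.

```javascript
// Assumed file contents, using the column names read by the accessor above.
const csvText = [
  "city,rats,coffee",
  "Brookline,40,50",
  "Boston,90,120",
  "Cambridge,30,90",
  "Chelsea,10,10",
  "Somerville,60,40"
].join("\n");

// Same row accessor as in the d3.csv call: convert numeric fields with +.
function rowAccessor(d) {
  return { city: d.city, rats: +d.rats, coffee: +d.coffee };
}

// Hypothetical, naive parser: splits on newlines and commas only
// (no quoted fields), builds one object per row keyed by header name,
// then passes each object through the accessor.
function parseRows(text, accessor) {
  const lines = text.trim().split("\n");
  const headers = lines[0].split(",");
  return lines.slice(1).map(function (line) {
    const values = line.split(",");
    const raw = {};
    headers.forEach(function (h, i) { raw[h] = values[i]; });
    return accessor(raw);
  });
}

const ratData = parseRows(csvText, rowAccessor);
console.log(ratData[0]); // { city: 'Brookline', rats: 40, coffee: 50 }
```

Without the + conversions, rats and coffee would stay strings, and later steps such as finding a maximum could compare them alphabetically instead of numerically.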

Here is our CSV-loaded scatterplot:

Scatterplot with CSV Data Loading

Scales, Ranges, and Domains

So, what happens if we add a value to our scatterplot that falls outside our 180px x 180px layout? This is where D3 scales come in. Because counts of coffee shops or rats do not map one-to-one onto pixels, you can use D3 scales to set the ratio of data values to pixels. Scott Murray, of Aligned Left, has a fantastic tutorial on this on his site. We will work through this for our example.

In D3, scales are functions that map from an input domain to an output range [1]. In plain English, d3.scale.linear() creates a function that converts each value in our dataset into a pixel value, such as a position or a height. For example, if we set a range from 0 to 100, the bottom of the domain maps to 0 pixels and the top of the domain maps to 100 pixels. You will hear the terms domain and range: the domain is the span of input values (our data), and the range is the span of output values (pixels); we provide both when we set up the scale. The point is that values in any dataset are unlikely to correspond exactly to pixel measurements, and scales provide a way to map those data values to new values that work with your visualization.
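The arithmetic behind a linear scale is a straight-line interpolation, which the plain-JavaScript sketch below spells out. The linearScale helper is hypothetical and illustrates the math only; D3's own linear scales add features such as clamp(), nice(), and invert(). Like a default D3 linear scale, this sketch does not clamp, so inputs outside the domain land outside the range.

```javascript
// Hypothetical helper illustrating what a linear scale computes:
// a straight-line mapping from a domain [d0, d1] to a range [r0, r1].
function linearScale(domain, range) {
  const d0 = domain[0], d1 = domain[1];
  const r0 = range[0], r1 = range[1];
  return function (value) {
    const t = (value - d0) / (d1 - d0); // 0 at the start of the domain, 1 at the end
    return r0 + t * (r1 - r0);
  };
}

const xScale = linearScale([0, 250], [0, 180]);
console.log(xScale(0));   // 0
console.log(xScale(125)); // 90
console.log(xScale(250)); // 180
```

With the domain [0, 250] used in this tutorial, an out-of-domain value such as 500 would map to 360 pixels, well past the edge of a 180px SVG, which is why the domain needs to cover the data.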

Let's say we want to add the following data value.

City      # of Rats   # of Coffee Shops
Medford   190         240

This clearly falls outside our layout on both the x and y axes, so we have to set up our scales.

To set up the scales for our chart, let's use the following. d3.scale.linear() returns a function that converts a data value into a pixel value. Because our data goes from 10 to 240, let's use 0 to 250 as our domain, and then map it to a range that fits the layout (180px x 180px). Store the x scale in a variable named xScale, and do the same for the y axis using yScale.

var xScale = d3.scale.linear()
    .domain([0, 250])
    .range([0, 180]);

var yScale = d3.scale.linear()
    .domain([0, 250])
    .range([0, 180]);

Next, adjust your cx and cy attribute functions so that each data value is passed through its scale function.

svg.selectAll("circle")
	.data(ratData)
	.enter()
	.append("circle")
	.attr("cx", function(d) {
		return xScale(d.rats);
	})
	.attr("cy", function(d) {
		return yScale(d.coffee);
	})
	.attr("r", 5);

Check out our working example. Our data value for Medford is now scaled to our visualization. View source to see the code.

Scatterplot with Scale Applied

There is one more step we want to take. Right now, our maximum values are hardcoded into our scales. We want the domains to change automatically if we edit our CSV, so use the maximum function (d3.max) to set the top value of each domain to the largest value in the data.

var xScale = d3.scale.linear()
	.domain([0, d3.max(ratData, function(d) {
		return d.rats;
	})])
	.range([0, w]);

var yScale = d3.scale.linear()
	.domain([0, d3.max(ratData, function(d) {
		return d.coffee;
	})])
	.range([0, h]);
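d3.max(array, accessor) calls the accessor on each element and returns the largest result. The sketch below mirrors that with a hypothetical maxBy helper applied to the rat data including Medford, so you can see which values end up as the top of each domain. It is a simplification: D3's version also ignores undefined values, and this sketch skips anything that is not a valid number.

```javascript
// Rat data after adding Medford (values from the tables above).
const ratData = [
  { city: "Brookline", rats: 40, coffee: 50 },
  { city: "Boston", rats: 90, coffee: 120 },
  { city: "Cambridge", rats: 30, coffee: 90 },
  { city: "Chelsea", rats: 10, coffee: 10 },
  { city: "Somerville", rats: 60, coffee: 40 },
  { city: "Medford", rats: 190, coffee: 240 }
];

// Hypothetical helper in the spirit of d3.max(array, accessor):
// apply the accessor to every element and keep the largest number,
// ignoring values that are not numbers (e.g. NaN from a bad CSV field).
function maxBy(array, accessor) {
  const values = array.map(accessor).filter(function (v) {
    return typeof v === "number" && !Number.isNaN(v);
  });
  return values.length ? Math.max.apply(null, values) : undefined;
}

const xDomain = [0, maxBy(ratData, function (d) { return d.rats; })];
const yDomain = [0, maxBy(ratData, function (d) { return d.coffee; })];
console.log(xDomain, yDomain); // [ 0, 190 ] [ 0, 240 ]
```

Because the x and y maximums differ, each axis now gets its own domain instead of the shared 0 to 250 used earlier.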

Now, to test this out, add another row to our CSV dataset. Do something big, like the following.

City        # of Rats   # of Coffee Shops
Watertown   350         500

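Save and refresh your scatterplot. Our new data is shown, scaled to fit our 180px x 180px frame. Note: the maximum item gets cut off, because the scales map the largest values exactly to the edges of the SVG (w and h), so half of that circle falls outside the frame. We'll address this shortly.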

Basic Axis on Scatterplot

You can also use scales for visual properties other than position. For example, you could scale the radius of each circle by a data value. Read more on scales in Scott Murray's excellent Scales Tutorial.

Adding Axes

You might want to add axes to your visualization. D3 has a built-in axis generator, d3.svg.axis(). To use it, create an axis, store it in a variable, and then call it on a selection in your visualization. Within your createVisualization() function, use the following to create an x axis.

var xAxis = d3.svg.axis()
	.scale(xScale)
	.orient("bottom")
	.ticks(5);

Then append a group (g) element to your SVG and call the axis on it.

svg.append("g")
    .attr("class", "axis")  //Assign "axis" class
    .call(xAxis);
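The .ticks(5) call is a hint rather than an exact count: D3 picks "nice" round step sizes (1, 2, or 5 times a power of ten) that give roughly the requested number of ticks. The sketch below illustrates that idea. niceTicks is a hypothetical helper modeled loosely on the approach D3 v3 uses; the exact rules may differ between D3 versions, so treat it as an illustration rather than D3's code.

```javascript
// Hypothetical helper: choose a "nice" step (1, 2, or 5 times a power of ten)
// so that the domain [start, stop] is split into roughly `count` intervals.
function niceTicks(start, stop, count) {
  const span = stop - start;
  let step = Math.pow(10, Math.floor(Math.log10(span / count)));
  const err = (count / span) * step; // ratio of the candidate step to the ideal step (span / count)
  if (err <= 0.15) step *= 10;
  else if (err <= 0.35) step *= 5;
  else if (err <= 0.75) step *= 2;

  // Emit every multiple of the step that falls inside the domain.
  const ticks = [];
  const first = Math.ceil(start / step);
  const last = Math.floor(stop / step);
  for (let i = first; i <= last; i++) ticks.push(i * step);
  return ticks;
}

console.log(niceTicks(0, 250, 5)); // [ 0, 50, 100, 150, 200, 250 ]
console.log(niceTicks(0, 190, 5)); // [ 0, 50, 100, 150 ]
```

Asking for 5 ticks on a 0 to 250 domain yields 6 tick marks, because a step of 50 is the nearest round number to the ideal step of 250 / 5.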

When you refresh your page, you will see that we have to adjust things slightly, because the axis falls on the very edge of our 180px x 180px box. In this sample, I'm using our original dataset, and you can see that by default D3 draws the axis at the top of the layout: the g element has no transform, so the axis sits at the SVG's origin (its top-left corner).

Dynamic Scatterplot with Scale

Part of the axis gets cut off by the edge of our layout (180px x 180px). One solution is to add a little padding inside the edges of our SVG so that labels near the edge can still be seen. We don't need much. After the width and height variables, add a padding variable; let's use 30px.

var w = 180;
var h = 180;
var padding = 30;

Then use the padding in each scale's range, which leaves a small buffer between the plotted data and the edges of our SVG. Note that we flip the range on the y axis: SVG y coordinates increase downward, so mapping the largest value to padding (near the top) puts high values at the top of the scatterplot.

var xScale = d3.scale.linear()
	.domain([0, d3.max(ratData, function(d) {
		return d.rats;
	})])
	.range([padding, w - padding]);

var yScale = d3.scale.linear()
	.domain([0, d3.max(ratData, function(d) {
		return d.coffee;
	})])
	.range([h - padding, padding]);
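To check that the flipped y range behaves as described, the plain-JavaScript sketch below maps a few coffee-shop counts through the same straight-line formula a linear scale uses. mapLinear is a hypothetical helper, and the domain maximum of 240 assumes the dataset including Medford but not Watertown.

```javascript
// Values from the example: a 180px SVG with 30px of padding.
const h = 180;
const padding = 30;

// Hypothetical helper: the same straight-line mapping a linear scale performs.
function mapLinear(value, domain, range) {
  const t = (value - domain[0]) / (domain[1] - domain[0]);
  return range[0] + t * (range[1] - range[0]);
}

// Flipped y range: the domain's minimum maps to the bottom (h - padding)
// and its maximum to the top (padding), because SVG y grows downward.
const yRange = [h - padding, padding];
const yDomain = [0, 240]; // assumed max coffee value (Medford's 240)

console.log(mapLinear(0, yDomain, yRange));   // 150 (near the bottom)
console.log(mapLinear(240, yDomain, yRange)); // 30 (near the top)
console.log(mapLinear(120, yDomain, yRange)); // 90 (halfway)
```

The largest value now lands 30px below the top edge instead of on it, which leaves room for the whole circle and for axis labels.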

Here is an updated look at our chart; it's starting to come together.

Dynamic Scatterplot with Scale

Styling the Axes

Because we have given the axes a class in our code (class="axis"), we can style them with CSS. The following rules will get us started; place them between the style tags in the head of your document.

.axis {
	font: 10px sans-serif;
}

.axis path,
.axis line {
	fill: none;
	stroke: #000;
	shape-rendering: crispEdges;
}

Note that the axes update their numbers automatically, because they are generated from the scales. To see this, add the high numbers for Medford and Watertown back into your CSV.

Dynamic Scatterplot with Scale

Add a Simple Hover

To see the values behind each point in the scatterplot, let's add a simple hover. In your code, above the createVisualization() function, create a tooltip object using the following.

var tooltip = d3.select("body")
	.append("div")
	.style("position", "absolute")
	.style("font-family", "sans-serif")
	.style("font-size", "10px")
	.style("z-index", "10")
	.style("visibility", "hidden");

Then attach mouse event handlers to the circles in your drawing code. Easy!

svg.selectAll("circle")
	.data(ratData)
	.enter()
	.append("circle")
	.attr("cx", function(d) {
		return xScale(d.rats);
	})
	.attr("cy", function(d) {
		return yScale(d.coffee);
	})
	.attr("r", 5)
	.on("mouseover", function(d) {
		//Show the tooltip with the city name and its two values
		tooltip.style("visibility", "visible")
			.text(d.city + ": " + d.rats + ", " + d.coffee);
	})
	.on("mousemove", function() {
		//In D3 v3, the current mouse event is available as d3.event
		tooltip.style("top", (d3.event.pageY - 10) + "px")
			.style("left", (d3.event.pageX + 10) + "px");
	})
	.on("mouseout", function() {
		tooltip.style("visibility", "hidden");
	});

Your chart is starting to look fancy. Think of other ways you could improve it!

Dynamic Scatterplot with Scale

You have loaded data, made both a scatterplot AND a bar chart, and in the process become very familiar with many D3 fundamentals! In the next part, we will change gears: we will steal some existing code and make it our own.


Borrowing Code

You don't have to reinvent the wheel... let's do some customization! A ton of work has been done with D3 already, and many people open-source the code they use to create D3 visualizations. Once you have a foundation, the easiest way to get started, and then dig in deeper, is to grab other people's code and modify it to suit your needs. You can even change it to suit your data!

The following are great places to go for examples and all associated code.

Reading Code and Customization

With the bar chart we created, you've been given some fundamentals and are equipped to read D3 code and begin to understand what it is doing. Let's put this into action by stealing and customizing something relatively simple. To customize, we can follow some simple guidelines:

  • Locate a D3 example that is exciting and will work with your data.
    • Find an example that makes sense for your dataset. If you are showing categorical data, such as groups of book titles and authors, choose a layout example; if you are showing numerical and ordinal data, such as census data, choose an example designed for that kind of data.
    • It is easiest to work with an example that is already using data in a format similar to yours, for example, if your data is in CSV, try to find an example that uses tabular data.
  • Copy and paste the code from your example to new documents on your development server.
  • Massage the example to hold your data.
    • Locate lines where data is read into the document.
    • Locate lines where data is classified into groups.
    • Locate lines where data is bound to an SVG page element (i.e., the height of a bar or the width of a line).
  • Get the example working with your new dataset.
  • Change the layout and appearance of the visualization. Items to consider:
    • Change the visualization size and dimensions to fit the desired document and layout.
    • Adjust the colors to match the scheme of your webpage and account for changes in the data.
    • Match the fonts and font family to the rest of your document.

1. View Data and Locate Example

Our dataset for this task looks like the following. It is an example census dataset for Boston in 1990, 2000, and 2010.


Age Demographics of Boston, 1990-2010
Age            1990      2000      2010
Under 18       128,185   139,460   126,275
18-34          260,256   246,353   275,425
35-64          194,969   227,831   244,597
65 and Above   80,496    76,163    75,726

I found a nice example that will work for this: a pie chart with all of its code on bl.ocks. It is relatively simple, which makes it a good place to start. It uses the D3 pie layout (d3.layout.pie). For more reading on the pie layout, check the D3 docs.

2. Copy and Paste Code into Blank Documents

Navigate to the bl.ocks page that has the pie chart. There are two items you need to copy. First, copy and paste the index.html file from the bl.ocks page into a blank document. Second, copy and paste the dataset that is bound to the D3 chart; in this case, it is a TSV (tab-separated values) file.

In your web folder, create a new folder called 'pie-chart', and save the index.html code as index.html and the dataset as data.tsv.

Navigate to your website. If all was copied correctly, you should have a working D3 page and it should look like the example.

3. Customization

Next, we are going to customize this to work with the data we want it to.

Consider the Design

First, let's consider the design of our data and our visualization. Right now, we have radio buttons for apples and oranges from the example. Our census data has three census years, 1990, 2000, and 2010. Let's modify these radio buttons, changing the text and values from apples and oranges to our census years. In the code, locate the form tag. Modify it to look like below, adding a button for 2010.

<form>
	<label><input type="radio" name="dataset" value="Cen_1990" checked> 1990</label>
	<label><input type="radio" name="dataset" value="Cen_2000"> 2000</label>
	<label><input type="radio" name="dataset" value="Cen_2010"> 2010</label>
</form>
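The value attributes matter because of how the buttons are used: assuming the example's change handler follows the usual pattern of using the checked button's value as the name of the field to chart (the timeout code later selects input[value="oranges"] in the same spirit), each value must match a CSV column header exactly. The plain-JavaScript sketch below shows that kind of d[key] lookup in isolation; valuesFor and the sample rows are hypothetical, with numbers taken from the Boston table.

```javascript
// Assumed rows, shaped like boston-data.csv after the numbers are converted.
const data = [
  { Age: "Under 18", Cen_1990: 128185, Cen_2000: 139460, Cen_2010: 126275 },
  { Age: "18-34", Cen_1990: 260256, Cen_2000: 246353, Cen_2010: 275425 }
];

// Hypothetical helper: the same d[key] lookup a change handler would do
// with the checked radio button's value.
function valuesFor(rows, key) {
  return rows.map(function (d) { return d[key]; });
}

console.log(valuesFor(data, "Cen_2000")); // [ 139460, 246353 ]
console.log(valuesFor(data, "apples"));   // [ undefined, undefined ]
```

A leftover value such as "apples" silently produces undefined for every row, which is a common reason a customized chart draws nothing.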

Now that we have our basic design set up, we can look at the data.

Take a Look at the Data

In the materials for the week, open 'boston-data.csv'. This is our dataset, and you'll see the data above in comma-separated format. As long as the data looks good and workable, meaning there are no extraneous headers or information that might confuse us when working with D3, we should be good.

Age Demographics of Boston, 1990-2010

a. Adjust the Data Read Function

In your code, locate the following line.

d3.tsv("data.tsv", type, function(error, data) {

This is the line that loads our dataset into the document. We have to do two things here. One, we are using a CSV, not a TSV, so change the D3 method to d3.csv. Two, point it at the Boston dataset: place boston-data.csv on your web server in an easy-to-find location, and change "data.tsv" to the path to boston-data.csv. Make sure the file is on your web server and readable.

The data load line will look like the following, using the path to your data file.

d3.csv("path/to/boston-data.csv", type, function(error, data) {

b. Locate Lines of Code referring to Attributes

The easiest way to get something working is to minimize changes at first and simply find and replace references to attributes. For example, the field names in the example are apples and oranges, but our field names are Cen_1990, Cen_2000, and Cen_2010. When we load the dataset, D3 turns it into an array of JavaScript objects, one per row, with each column name as a property. To chart the right numbers, we need to find all references to the apples and oranges fields and change them to our fields.

Start with the pie layout. Locate the following, and change d.apples to d.Cen_1990, the name of the first census column in our data.

var pie = d3.layout.pie()
    .value(function(d) { return d.apples; }) // Change this to d.Cen_1990
    .sort(null);
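A pie layout turns a list of values into start and end angles, with each slice's angle proportional to its share of the total. The sketch below computes those angles for the 1990 column with a hypothetical pieAngles helper, keeping the input order as .sort(null) does in the example. D3's pie layout returns additional fields (for example, the original data element), so treat this as an illustration of the arithmetic only.

```javascript
// 1990 counts by age group, from the Boston table above.
const ages = ["Under 18", "18-34", "35-64", "65 and Above"];
const cen1990 = [128185, 260256, 194969, 80496];

// Hypothetical helper: compute each slice's start and end angle in radians,
// proportional to its value, keeping the input order (like .sort(null)).
function pieAngles(values) {
  const total = values.reduce(function (sum, v) { return sum + v; }, 0);
  let angle = 0;
  return values.map(function (v) {
    const slice = { value: v, startAngle: angle, endAngle: angle + (v / total) * 2 * Math.PI };
    angle = slice.endAngle; // the next slice starts where this one ends
    return slice;
  });
}

const slices = pieAngles(cen1990);
slices.forEach(function (s, i) {
  const share = ((s.endAngle - s.startAngle) / (2 * Math.PI)) * 100;
  console.log(ages[i] + ": " + share.toFixed(1) + "%");
});
```

Changing which column feeds the values (for example, switching to the 2000 counts) changes only the angles; the order of the slices stays the same because nothing is sorted.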

Next, there is actually a transition in this example. We will explain transitions in the next session, but we still need to modify the code here. Locate the timeout function and change oranges to Cen_2000, the second census column in our dataset. This is a simple transition that, after a short delay (2000 milliseconds), automatically switches the pie chart to the second field.

var timeout = setTimeout(function() {
	d3.select("input[value=\"oranges\"]").property("checked", true).each(change); // Change "oranges" to "Cen_2000"
}, 2000);

We have one more block of code to adjust to fix our example to work with our data. As we saw above, we have a function that forces strings read in by D3 to be numbers. We need to adjust this to read our fields, not the example fields. Modify the type function as you see below, adding another value for Census 2010.

// Function to force data type as number not string
function type(d) {
  d.apples = +d.apples; // Change both occurrences to d.Cen_1990
  d.oranges = +d.oranges; // Change both occurrences to d.Cen_2000
  // Add another line here that sets d.Cen_2010 the same way
  return d;
}
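Here is one way the finished type function might look for our three census columns, with a quick check on a sample row. The sample row (including the Age column name) is an assumption standing in for what d3.csv would pass in. Note that the unary + only understands plain digits: if your copy of boston-data.csv stores numbers with thousands separators (such as a quoted "128,185"), + produces NaN, and you would need to strip the commas first.

```javascript
// Force the census columns to numbers (CSV fields are read as strings).
function type(d) {
  d.Cen_1990 = +d.Cen_1990;
  d.Cen_2000 = +d.Cen_2000;
  d.Cen_2010 = +d.Cen_2010;
  return d;
}

// Assumed sample row, shaped like one row of boston-data.csv after parsing.
const row = type({ Age: "Under 18", Cen_1990: "128185", Cen_2000: "139460", Cen_2010: "126275" });
console.log(typeof row.Cen_2010, row.Cen_2010); // number 126275

// The comma is not a valid numeric character, so this conversion fails.
console.log(+"128,185"); // NaN
```

If a slice is missing from your chart, logging a row after type() runs is a quick way to spot NaN values.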

Save and refresh your document. You should see the following. Our data is bound to our pie chart.

Same Pie Chart: Now with Our Data!

With our data bound to the visualization, we can now explore other customizations, or see how we can expand on this visualization to make it richer, such as by adding more data or other components.

c. Adjust Size and Layout

One of the easiest ways to learn more advanced techniques is to change existing code and see what it does. Now that our data is bound to our visualization, we can adjust some of the other code in the example and see what happens. For example, what if we decide we don't want a donut chart, but rather a traditional pie chart? We can adjust the radius of the inner circle. Locate the arc variable (defined with d3.svg.arc()) and change its inner radius to 0, so that each slice extends all the way to the center. Save and refresh.

No More Donut: Pie Chart

d. Change Styling - Fonts and Colors

Next, you might want to change fonts and colors to match the design of your webpage. There are a handful of ways to do this, but the easiest way to change colors is to use one of the color scales that D3 has built right in. In our example, the color scale is defined on the following line. Change the scale to category20c.

var color = d3.scale.category20(); // Change this to category20c 
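category20 and category20c are ordinal scales: each new category they see is assigned the next color in a fixed list, and the same category always gets the same color afterwards. The sketch below illustrates that behavior with a hypothetical ordinalColors helper and a made-up three-color palette (not D3's actual palettes). Like D3 v3 ordinal scales with an implicit domain, it cycles back to the first color once the palette runs out.

```javascript
// Hypothetical helper: an ordinal color scale with an implicit domain.
function ordinalColors(palette) {
  const seen = new Map(); // category -> index, in order of first appearance
  return function (category) {
    if (!seen.has(category)) seen.set(category, seen.size);
    return palette[seen.get(category) % palette.length];
  };
}

const color = ordinalColors(["red", "green", "blue"]); // made-up palette
console.log(color("Under 18"));     // red
console.log(color("18-34"));        // green
console.log(color("Under 18"));     // red (same category, same color)
console.log(color("35-64"));        // blue
console.log(color("65 and Above")); // red (palette wraps around)
```

This is why swapping category20 for category20c recolors every slice without any other code changes: only the palette differs, while the category-to-position assignment stays the same.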

Save and refresh.

D3 also works well with some very nice color schemes from ColorBrewer, by Cynthia Brewer. To incorporate these, see the D3 docs on ColorBrewer.

We don't have much in the way of fonts to change here, since there are no labels on the visualization. When you do have labels, you can restyle their fonts quite nicely simply by adjusting the CSS.

e. Other Improvements

We have successfully customized our example to bind our data, then adjusted some items, such as color and pie radius. Feel free to make other changes and see what they do to the visualization.

One important item might be to add a legend. Right now, we have a nice visualization, but there is no way to tell what the colors mean. A great resource for building legends using only HTML and CSS is Mapbox's TileMill documentation on legends. Challenge: can you get the legend into the visualization and change the layout with the CSS and HTML?


Additional Considerations

We made pretty easy work of customizing an example found online to show our data, but is this the best solution? It might be hard to compare the census years. Perhaps we want to use small multiples...

Or perhaps we want to use a simple scatter plot.

Or perhaps we want to use a stream graph.

It is totally up to you, happy coding.


Steal Code! Make it your own! (Provide attribution of course.)

This was a fairly simple example, but it shows how you can load data and how you can steal other people's code to create a visualization. Refer to the list of references below and the links throughout the tutorial for further reading. Next week, we are going to dig even deeper, look at a different type of visualization, and then add additional user interaction, transitions, and animation.



References and Bibliography

1 - D3 Quantitative Scales - https://github.com/mbostock/d3/wiki/Quantitative-Scales

2 - Scales (Aligned Left) - Scott Murray - http://alignedleft.com/tutorials/d3/scales

3 - D3 CSV - https://github.com/mbostock/d3/wiki/CSV

4 - Reading in Data - LearnJSData - http://learnjsdata.com/read_data.html

5 - Axes - Interactive Data Visualization for the Web (Chapter 8) - Scott Murray - http://chimera.labs.oreilly.com/books/1230000000345/ch08.html

6 - Ordinal Scales - Mike Bostock - https://github.com/mbostock/d3/wiki/Ordinal-Scales

7 - Pie Layout - Mike Bostock - https://github.com/mbostock/d3/wiki/Pie-Layout

8 - Categorical Colors - Mike Bostock - https://github.com/mbostock/d3/wiki/Ordinal-Scales#categorical-colors


Go to main DUSPVIZ tutorials page