If there's any one concept I've taken away from the 12 hours I've spent learning from mapping and data visualization designer Michal Migurski, it's that maps themselves are data visualized. The format is so taken for granted as a way to render information that I think most of us tend to view maps as landscapes in themselves, rather than representations. I know I tend to mistake the map for the territory.
During our third day at GAFFTA's data visualization and mapping class, Migurski talked about cartography, specifically all the different ways geography gets sliced and diced. The Spherical Mercator has become ubiquitous on the Web, so much so that it's easy to start believing that the world just looks this way. But when you view the crazy variety of projections creative cartographers have come up with to slice a sphere, you're reminded that the ways our world is visualized are shot through with choices and biases about how the planet looks. As one classmate observed, in China the United States is not in the middle of the map.
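The Spherical Mercator itself boils down to a couple of lines of math. Here's a minimal Python sketch of the forward projection (the radius is the standard EPSG:3857 value; the sample coordinates are just an illustration, not from the class):

```python
import math

def spherical_mercator(lon_deg, lat_deg):
    """Project longitude/latitude (degrees) to Spherical Mercator meters."""
    R = 6378137.0  # Earth radius used by the Spherical Mercator standard (EPSG:3857)
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

# y stretches toward infinity near the poles -- the bias baked into the
# projection that makes Greenland look as big as Africa on web maps.
oakland = spherical_mercator(-122.27, 37.80)
```

The choice of where to clip the poles, and which meridian sits in the center, are exactly the kinds of decisions a classmate's China observation points at.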
Migurski also talked about scale. He said the scale you choose for a map depends on the information you want to present. The point may sound obvious, but again because of the Web (i.e. Google Maps), I take for granted that a map zooms, that I can change the scale to show different levels of detail. This is helpful when I'm working out driving directions.
But what if I wanted to see maps of watersheds? Or disputed territories? Or mountain ranges beneath the ocean? The large-scale public domain data sets at Natural Earth provide the necessary wide-angle view. If you want to navigate Burning Man, on the other hand, you'll need these hyper-detailed user-generated maps of Black Rock City made using OpenStreetMap.
Migurski introduced us to a great mapping tool called TileMill that nicely streamlines the process of building your own map/data mashups. TileMill uses a CSS-style syntax called Carto for managing the presentation of data on your map. One of the data sets we layered over part of Oakland's Lake Merritt was a collection of aerial photographs taken by the folks at GonzoEarth, who use ultra-sophisticated techniques like strapping cameras to helium balloons with rubber bands to get satellite-style imagery without leaving the stratosphere.
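To give a flavor of what Carto looks like: it reads like CSS, with rules attached to map layers instead of HTML elements. This is a hypothetical fragment I've written from memory of the syntax, not from the class (the layer name and filter values are invented):

```css
/* Style a point layer of aerial-photo locations */
#lake-merritt-photos {
  marker-fill: #e25822;
  marker-width: 6;
  marker-allow-overlap: true;

  /* Filters nest like CSS rules: bigger markers once you zoom in */
  [zoom >= 15] {
    marker-width: 12;
  }
}
```

The zoom filter is what ties the styling back to Migurski's point about scale: the same data can be presented differently at different zoom levels.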
(In the course of this discussion, one classmate mentioned he had duct-taped a GoPro HD camera to a $65 radio control airplane he bought on Amazon to create a DIY surveillance drone. The footage is great.)
On day four, the class turned its attention from maps to other data visualization genres. Migurski ran through some of the marquee projects he and his cohort at Stamen Design have put together over the years. A few years ago, the MTV Video Music Awards served as a kind of watershed for Twitter, which sent some of its most creative people to help bring viewers "inside" the venue via their tweets. Turns out Stamen was the shop that designed the visualizations.
Migurski showed us how D3 uses jQuery-style selectors to bind data to entire sections of HTML documents rather than individual elements, which made building a basic bar graph, well, basic. The next few examples got a little dense for me as we learned how to apply the same techniques to SVG elements to create pie charts. I haven't worked with SVG before, but the gist I took away was that for pie charts, D3 counts data increments around the circle like a clock and then sets the angles needed to draw the slices of the pie. We also learned how to use D3 to put blobs of varying sizes on maps. None of the results were particularly sexy, but I can see how once you master the basics, it's just a matter of a few tweaks between a bar chart and a streamgraph.
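The clock-like angle arithmetic is simple enough to sketch outside of D3. This Python sketch mimics what D3's pie layout computes (it's my own reconstruction of the idea, not D3 code):

```python
import math

def pie_angles(values):
    """Turn a list of values into (start, end) angles in radians,
    accumulated around the circle the way D3's pie layout does:
    each slice's share of the total becomes its share of 2*pi."""
    total = sum(values)
    angles = []
    start = 0.0
    for v in values:
        end = start + (v / total) * 2 * math.pi
        angles.append((start, end))
        start = end  # the next slice picks up where this one ended
    return angles

slices = pie_angles([1, 1, 2])  # the last value is half the total, so half the circle
```

Once you have the angle pairs, drawing the SVG arcs is just geometry, which is the part D3's arc generators handle for you.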
While the stylistic trend in data visualizations tends toward the gee whiz, my favorite slide of the night was the in-house visualization Stamen uses to track its projects. It's a fairly standard graph, except for two small additions: an angry bird and a green pig. Instead of "hitting their numbers," the team at Stamen has a much more concrete goal to aim for: If the bird misses the pig, they start losing money. Just a goofy little thing, to be sure. But it's remarkable how such a modest perceptual tweak can change the way you look at things.
The theme of day two at GAFFTA's data visualization and mapping class was where data comes from.* To get under way, Michal Migurski went meta about data, explaining that data doesn't simply hover in the ether, a value-neutral substance waiting to be collected in our nets. Rather, data and data sets represent choices made by the people collecting the data that reflect their point of view, embodied in what and how they choose to measure.
He quoted Australian information designer Mitchell Whitelaw, who blogs at The Teeming Void:
Accepting data shaped by someone else’s choices is a tacit acceptance of their view of the world, their notion of what is interesting or important or valid. … If we are going to work intelligently with data we must remember that it can always be constructed some other way.
This seems like an especially important point for journalists just starting to delve into the world of data, which at first glance can seem like an ideal vision of impartiality, a realm of pure factuality that transcends "he said-she said."
Next, Migurski dove into the history of large-scale data collection. Charles Darwin's cousin and quintessential Victorian gentleman scientist Sir Francis Galton ** helped launch the field of modern meteorology by orchestrating (presumably through telegrams and letters sent weeks in advance) the simultaneous collection of weather data across Europe. From that effort Galton produced possibly the first specimen of the modern weather map. (He also apparently originated the practice of publishing weather maps in newspapers.)
Migurski said when you think about the logistics involved in coordinating a continent-wide data collection effort in the days when the telegraph and the steam train were the most advanced technologies you had to work with, you start to appreciate how much we take for granted the wealth of data available today. After all, what's more mundane than a weather map? Migurski pointed to the Tagging of Pacific Predators program, in which Northern California marine biologists tagged great white sharks and built a Web interface that let anyone track their location in near-real time.
"Collecting data used to be a major major network problem," Migurski said. "Now you can just strap transmitters to animals."
Other troves accessible to data sleuths include the firehose of public social media data unleashed mainly by Twitter. We talked briefly about how to filter Twitter's Streaming API to cull tweets from specific locations, timeframes and keyword searches. Click here for a real-time JSON-structured stream from the Twitter firehose, but be warned: Migurski said if you let it run for too long, your machine will crash. Migurski showed us how he used a slice of geotagged Twitter data to plot tweets on a map of San Francisco that referenced various burrito-related keywords.
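I don't have Migurski's burrito-map code, but the filtering step is easy to sketch. This assumes each line of the stream is one JSON-encoded tweet with a `text` field and an optional `coordinates` field holding a GeoJSON-style `[longitude, latitude]` pair (the bounding box and sample tweets are invented):

```python
import json

SF_BBOX = (-122.52, 37.70, -122.35, 37.83)  # (west, south, east, north), roughly San Francisco

def matching_tweets(lines, keywords, bbox=SF_BBOX):
    """Yield parsed tweets that mention a keyword and fall inside the box."""
    west, south, east, north = bbox
    for line in lines:
        tweet = json.loads(line)
        coords = tweet.get("coordinates")
        if not coords:
            continue  # most tweets aren't geotagged; skip them
        lon, lat = coords["coordinates"]
        if not (west <= lon <= east and south <= lat <= north):
            continue
        text = tweet["text"].lower()
        if any(k in text for k in keywords):
            yield tweet

# Two hand-made sample "tweets": one in San Francisco, one in Los Angeles
sample = [
    '{"text": "Best carnitas burrito in the Mission", "coordinates": {"coordinates": [-122.42, 37.76]}}',
    '{"text": "burrito craving", "coordinates": {"coordinates": [-118.24, 34.05]}}',
]
hits = list(matching_tweets(sample, ["burrito", "carnitas"]))
```

From there, plotting is just a matter of handing the surviving coordinates to a map layer.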
Speaking of JSON, Migurski introduced the concept to us noobs who had heard the term but couldn't quite work out what it meant or why it mattered. We learned that geoJSON offers the most powerful way to structure your data for mapping purposes, since many mapping applications have the standard baked in. (Twitter's streaming API and JSON have moved to the top of the "must tinker with" list.)
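A GeoJSON feature is just nested key-value pairs, which makes it easy to build by hand. Here's a minimal example in Python (the property values are invented for illustration):

```python
import json

feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        # GeoJSON coordinate order is [longitude, latitude]
        "coordinates": [-122.2585, 37.8044],
    },
    "properties": {"name": "Lake Merritt"},
}

# Features are usually shipped around in a FeatureCollection wrapper
collection = {"type": "FeatureCollection", "features": [feature]}
geojson_text = json.dumps(collection)
```

Because tools like TileMill understand this structure natively, a file of features like the one above can be dropped straight onto a map as a layer.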
Compared to the first class, day two definitely journeyed deeper into the geek jungle of script libraries and structured data. But I'm hanging on and eager to take a crack at a first project. The more I learn, the more apparent the benefits of mastering these tools become.