GAFFTA visualizing and mapping data class, day 2

By Marcus Wohlsen | Feb. 14, 2012, 5:18 p.m.

The theme of day two at GAFFTA's data visualization and mapping class was where data comes from.* To get under way, Michal Migurski went meta about data, explaining that data doesn't simply hover in the ether, a value-neutral substance waiting to be collected in our nets. Rather, data and data sets represent choices made by the people collecting the data that reflect their point of view, embodied in what and how they choose to measure.

He quoted Australian information designer Mitchell Whitelaw, who blogs at The Teeming Void:

Accepting data shaped by someone else’s choices is a tacit acceptance of their view of the world, their notion of what is interesting or important or valid. … If we are going to work intelligently with data we must remember that it can always be constructed some other way.

This seems like an especially important point for journalists just starting to delve into the world of data, which at first glance can look like an ideal of impartiality, a realm of pure factuality that transcends "he said-she said."

Next, Migurski dove into the history of large-scale data collection. Sir Francis Galton,** Charles Darwin's cousin and a quintessential Victorian gentleman scientist, helped launch the field of modern meteorology by orchestrating (presumably through telegrams and letters sent weeks in advance) the simultaneous collection of weather data across Europe. From that effort Galton produced possibly the first specimen of the modern weather map. (He also apparently originated the practice of publishing weather maps in newspapers.)

Sir Francis Galton's map of weather data collected across Europe the morning of Dec. 30, 1861. (via The Morning News)

Migurski said that when you think about the logistics of coordinating a continent-wide data collection effort in the days when the telegraph and the steam train were the most advanced technologies available, you start to appreciate how much we take for granted the wealth of data available today. After all, what's more mundane than a weather map? As a modern contrast, Migurski pointed to the Tagging of Pacific Predators program, in which Northern California marine biologists tagged great white sharks and built a Web interface that let anyone track their location in near real time.

"Collecting data used to be a major, major network problem," Migurski said. "Now you can just strap transmitters to animals."

Other troves accessible to data sleuths include the firehose of public social media data unleashed mainly by Twitter. We talked briefly about how to filter Twitter's Streaming API to cull tweets by location, time frame and keyword. A real-time, JSON-structured stream from the Twitter firehose is available, but be warned: Migurski said that if you let it run for too long, your machine will crash. Migurski showed us how he used a slice of geotagged Twitter data to plot tweets that referenced various burrito-related keywords on a map of San Francisco.
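For fellow noobs, here is a minimal Python sketch of the kind of filtering we discussed: keeping only geotagged tweets that fall inside a bounding box and mention a keyword. It is not Migurski's code and it does not connect to Twitter. The sample lines, the bounding-box numbers, the keyword list and the helper names (`in_bbox`, `filter_tweets`) are all my own assumptions, and the tweet layout (a `text` field plus a `coordinates` point stored as longitude, latitude) is assumed as well.

```python
import json

# Rough bounding box around San Francisco (assumed values):
# west, south, east, north
SF_BBOX = (-122.52, 37.70, -122.35, 37.83)

# Hypothetical keyword list for illustration
KEYWORDS = ("burrito", "taqueria")


def in_bbox(lon, lat, bbox):
    """Return True if the point lies inside the bounding box."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def filter_tweets(lines, bbox=SF_BBOX, keywords=KEYWORDS):
    """Yield tweets geotagged inside bbox whose text mentions a keyword."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines that may appear in a stream
        tweet = json.loads(line)
        point = tweet.get("coordinates")
        if not point:
            continue  # tweet carries no location
        lon, lat = point["coordinates"]
        text = tweet.get("text", "").lower()
        if in_bbox(lon, lat, bbox) and any(k in text for k in keywords):
            yield tweet


# Made-up sample data standing in for a stream of JSON lines
SAMPLE_STREAM = [
    '{"text": "Best burrito in the Mission", "coordinates": {"type": "Point", "coordinates": [-122.4194, 37.7599]}}',
    '{"text": "Burrito time in Oakland", "coordinates": {"type": "Point", "coordinates": [-122.2712, 37.8044]}}',
    '{"text": "Foggy morning", "coordinates": {"type": "Point", "coordinates": [-122.45, 37.77]}}',
    '',
    '{"text": "burrito cravings", "coordinates": null}',
]

matches = list(filter_tweets(SAMPLE_STREAM))
for tweet in matches:
    print(tweet["text"])
```

Because `filter_tweets` is a generator, it handles one line at a time instead of holding the whole stream in memory, which seems like a sensible habit given the warning about crashing machines.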

Speaking of JSON, Migurski introduced the concept to us noobs who had heard the term but couldn't quite work out what it meant or why it mattered. We learned that GeoJSON offers the most powerful way to structure your data for mapping purposes, since many mapping applications have the standard baked in. (Twitter's Streaming API and JSON have moved to the top of the "must tinker with" list.)
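To make the idea concrete for myself, here is a small Python sketch that wraps a couple of points in a GeoJSON `FeatureCollection`. The coordinates, the `text` property and the `to_feature` helper are made up for illustration. The one detail that is easy to get wrong is that GeoJSON lists longitude before latitude.

```python
import json


def to_feature(lon, lat, properties):
    """Build a GeoJSON Feature for a single point."""
    return {
        "type": "Feature",
        # GeoJSON orders coordinates as [longitude, latitude]
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


# Hypothetical sample points: (longitude, latitude, properties)
points = [
    (-122.4194, 37.7599, {"text": "Best burrito in the Mission"}),
    (-122.4477, 37.7694, {"text": "Burrito break near the park"}),
]

collection = {
    "type": "FeatureCollection",
    "features": [to_feature(lon, lat, props) for lon, lat, props in points],
}

# Serialize to a JSON string that a mapping tool could load
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```

The resulting text is ordinary JSON, so the same string can be saved to a file or handed to any mapping application that reads GeoJSON.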

Migurski introduced us to two other powerful visualization tools, both courtesy of former Stamen intern and current Square engineer Mike Bostock. The first, Polymaps, offers an open-source, vector-based collection of map tiles that render and move with exquisite smoothness. The other, mentioned only in passing, is D3, which I'm possibly even more excited to discover. D3 is a lightweight JavaScript framework that offers myriad ways to visualize data. Bostock links to some great examples of D3 in the wild. Just yesterday, I dug into the guts of this great New York Times interactive detailing President Obama's proposed 2013 budget and found D3 lurking beneath.

Compared to the first class, day two definitely journeyed deeper into the geek jungle of script libraries and structured data. But I'm hanging on and eager to take a crack at a first project. The more I learn, the more apparent the benefits of mastering these tools become.

* I'm making an executive decision that for this blog, I generally will treat the word "data" as a singular noun, like "group" or "family."

** I hadn't heard of Galton before this class and have since learned he held some appalling, racist views as a leading Victorian-era proponent of eugenics. I still think the map is cool but wanted to be clear I'm not offering the guy some kind of wholesale endorsement.
