Tool of the Day: Google Refine

When it comes to working with and presenting data, Google reigns supreme. We’ve covered Google’s Chart Wizard, Google’s Public Data Explorer, and even ways to run a news website using Google Docs (with WordPress). Another of Google’s powerful data tools, Google Refine, lets users work with “messy” data sets and transform them into something amazing. Check out Part 1 of the Google Refine screencast.

Unlike Google’s general web-based data services, Google Refine is a standalone desktop application. Formerly known as Freebase Gridworks, the Google Refine tool has been used by the Chicago, and most famously by ProPublica for their “Dollars for Docs” investigation series from October 2010. Once you download and install Google Refine, you interact with it through your web browser. You can create a new project from scratch, or you can import data sets from files stored on your computer. Once your data is imported, the real power of the tool comes through.

You can use facets and filters to create subsets of data, and you can reformat strings of data that match your search patterns. For example, if the terms “as soon as possible” and “ASAP” both appear in the same data set, you can reformat them so they match each other. For more complicated queries, you can use the Google Refine Expression Language (GREL), which supports regular expressions, to isolate substrings of data into separate columns.
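To make the idea concrete, here is a rough sketch in plain Python rather than GREL. It is not how Google Refine works internally; it only illustrates the same kind of cleanup: collapsing variant spellings into one form, counting the resulting values the way a facet would, and pulling a substring into its own column with a regular expression. The sample rows, the CANONICAL mapping, the ZIP code pattern, and the function names are all assumptions made up for this example.

```python
import re
from collections import Counter

# Hypothetical mapping from variant spellings to one canonical form.
CANONICAL = {
    "as soon as possible": "ASAP",
    "asap": "ASAP",
}

def normalize(value):
    """Return the canonical form of a cell, or the trimmed cell if no variant matches."""
    trimmed = value.strip()
    return CANONICAL.get(trimmed.lower(), trimmed)

# Assumed pattern for this example: a five-digit code at the end of the string.
ZIP_PATTERN = re.compile(r"(\d{5})\s*$")

def extract_zip(address):
    """Isolate the trailing five-digit substring, or return an empty string."""
    match = ZIP_PATTERN.search(address)
    return match.group(1) if match else ""

# Made-up sample data standing in for a messy imported data set.
rows = [
    {"priority": "as soon as possible", "address": "12 Main St, Springfield 62701"},
    {"priority": " ASAP ", "address": "9 Elm Ave, Chicago 60601"},
    {"priority": "next week", "address": "Unknown"},
]

# Build cleaned rows with a normalized column and a new "zip" column.
cleaned = [
    {
        "priority": normalize(row["priority"]),
        "address": row["address"],
        "zip": extract_zip(row["address"]),
    }
    for row in rows
]

# A facet-like summary: how many rows share each distinct value.
priority_facet = Counter(row["priority"] for row in cleaned)
```

After this runs, both variant spellings end up under a single “ASAP” entry in priority_facet, which is roughly what you would expect to see in a text facet once the values have been reconciled.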

Once you’re done formatting your data, Google Refine lets you export your work in a number of formats, including an Excel spreadsheet, an HTML table, or JSON data, which you can change to match a wiki-style format. Google Refine also lets you hook into open web services, such as Google’s Language Detection Service or the open map service Nominatim.
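As a loose illustration of what exporting the same rows in two formats can look like, the Python sketch below writes a small data set as JSON and as a MediaWiki-style table. This is not Google Refine’s exporter; the sample rows and the to_json and to_wiki_table helpers are hypothetical names invented for the example.

```python
import json

# Made-up cleaned rows standing in for a finished project.
rows = [
    {"name": "Alice", "city": "Chicago"},
    {"name": "Bob", "city": "New York"},
]

def to_json(rows):
    """Serialize the rows as a JSON object with a single "rows" list."""
    return json.dumps({"rows": rows}, indent=2)

def to_wiki_table(rows):
    """Render the rows as a MediaWiki-style table, one line per markup element."""
    headers = list(rows[0].keys())
    lines = ['{| class="wikitable"']
    # Header cells on one line are separated by "!!".
    lines.append("! " + " !! ".join(headers))
    for row in rows:
        lines.append("|-")  # row separator
        # Data cells on one line are separated by "||".
        lines.append("| " + " || ".join(str(row[h]) for h in headers))
    lines.append("|}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(to_json(rows))
    print(to_wiki_table(rows))
```

The point of the sketch is that both outputs come from the same rows; only the template wrapped around each row changes.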

Google Refine is a free download and is available for Windows, Mac, and Linux.