Speaking truth with numbers


Data visualization has made journalism more objective

By Srinivasan Ramani

“A picture is worth a thousand words” is a well-worn cliché. In journalism, though, it is no longer just a cliché: the increasing use of data visualization to tell stories has revolutionized the field, making journalism more objective and more interpretative, and bringing authenticity to storytelling.

The workflow in data journalism has three distinct processes. The first is data sourcing and preparation. Data can be sourced directly, from public documents or surveys, or indirectly, through methods such as “web scraping” and the creation of data sets from digital resources. Web scraping requires a lot of refining and cleaning up of data from sources such as PDF documents, HTML pages and text files. There are several free tools available for this job. At an advanced level, a working knowledge of the Python programming language and of the libraries that aid in HTML scraping is useful.
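
As a rough illustration of the scraping and cleaning step described above, the sketch below pulls the rows out of an HTML table using only Python's standard library. The sample page, its column names and its values are invented placeholders, and the names TableParser and scrape_table are hypothetical; real pages are usually messier and often call for a dedicated scraping library.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # row being built, or None when outside a <tr>
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Cleaning step: collapse line breaks and repeated spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Placeholder page: the districts and values are made up for illustration.
SAMPLE_HTML = """
<table>
  <tr><th>District</th><th>Value</th></tr>
  <tr><td> District A </td><td>10</td></tr>
  <tr><td>District
      B</td><td>20</td></tr>
</table>
"""


def scrape_table(html):
    """Return the table in `html` as a list of rows of cleaned strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = scrape_table(SAMPLE_HTML)
header, records = rows[0], rows[1:]
for record in records:
    print(dict(zip(header, record)))
```

The whitespace clean-up inside handle_endtag stands in for the refining that scraped data typically needs before it can be loaded into a spreadsheet.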



Some document caches from which data is to be created are so large that it is difficult to parse them or prepare useful tables out of them without a much larger team than newspapers typically have. Simon Rogers, formerly The Guardian’s data editor, writes in his book on data journalism, Facts are Sacred, how The Guardian used techniques such as crowdsourcing to obtain the big data behind stories like the MP expenses scandal in the United Kingdom. Mr. Rogers rightly points out in his book that data journalism is “80% perspiration, 10% great idea and 10% output”, a statement that rings true and puts emphasis on the first process of data preparation.


The next step in data journalism is analyzing the data and looking for patterns, rules and exceptions in order to tell a coherent story. For non-coders (most data journalists fall into this category), this typically involves a lot of work with spreadsheets, pivot tables, simple statistical analyses and so on. Analyzing data for journalistic purposes does not require one to be a trained statistician, but one needs to be at least familiar with simple statistical concepts (for example, that correlation does not amount to causation). If one requires a crash course in basic econometrics, D.N. Gujarati’s book of the same name, Basic Econometrics, is a good place to start.
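
For readers who do code, the pivoting and simple statistics mentioned above can be sketched in a few lines of Python. This is only an illustrative sketch: the records are invented placeholders, and pivot_sum and pearson are hypothetical helper names, not functions from any particular tool.

```python
from collections import defaultdict
from math import sqrt

# Placeholder records (region, year, value); the numbers are made up.
records = [
    ("North", 2010, 1.0),
    ("North", 2010, 2.0),
    ("North", 2011, 4.0),
    ("South", 2010, 3.0),
]


def pivot_sum(rows):
    """Sum values into a region -> year -> total table, like a pivot table."""
    table = defaultdict(dict)
    for region, year, value in rows:
        table[region][year] = table[region].get(year, 0.0) + value
    return dict(table)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


pivot = pivot_sum(records)
print(pivot)

# A coefficient close to 1 or -1 signals a strong linear relationship,
# but says nothing about whether one variable causes the other.
r = pearson([1, 2, 3, 4], [2, 4, 6, 8])
print(round(r, 6))
```

A spreadsheet does the same work through its pivot-table and correlation functions; the point of the sketch is only to show what those features compute.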

The third part of data journalism is data visualization, the most exciting feature and one that has galvanized the digital medium in particular. The journalist needs an intuitive feel for how to present a data graphic that explains the story effectively. Various software tools, such as Fusion Tables, chart wrappers and the D3 (Data-Driven Documents) JavaScript library, are freely available, but to gain familiarity with graphic design, statistician and political scientist Edward Tufte’s books The Visual Display of Quantitative Information and Envisioning Information are very useful guides.


This post was originally published on The Hindu and is reproduced here with permission.


Main Image: FoxBusiness