Tuesday, October 22, 2013
Data Journalism for a Database Age

At a time when access to public information and data is easier to obtain than ever, schools are finding ways to teach a skill that all journalists -- not just investigative reporters -- should know.

By Kara Hackett

For decades, journalism schools followed one step behind the industry. But the evolution of technology in the ’90s that disrupted traditional newsrooms gave some schools the chance to get ahead.

“We saw the opportunity to not just keep up with the news industry, but to have a leadership role in that,” said Christopher Callahan, dean of the Walter Cronkite School of Journalism at Arizona State University.

Arizona State was one of the first schools to adopt what journalism-funding foundations like Knight call a “teaching hospital” method of instruction. It’s an initiative to bring the best professional reporters into the classroom and educate the next generation with hands-on experience innovating for the digital age.

One of the first professional reporters to join Arizona State’s staff in 1996 was Steve Doig, a veteran editor and investigative reporter at the Miami Herald.

Doig was the first-ever full-time professor hired for computer-assisted reporting, also known as data journalism. It’s a reporting process that uses spreadsheet programs to generate statistics from public records and data sets.

Although data journalism has been taught at schools for decades, it’s becoming a more critical component of curriculums to help students process the influx of information available in digital databases and file effective public records requests.

“There is more data available today than ever before, more ways to analyze data than ever before and more ways to tell data stories than ever before,” Callahan said. “All of these areas are growing.”


Doig has been on the front lines of journalism’s digital evolution since he was composing his own simple programs in the ’60s as a student at Dartmouth College and later toying with an Atari 800 in his spare time at the Miami Herald.

It wasn’t long before he convinced the Herald to buy an IBM PC equipped with VisiCalc, the earliest spreadsheet program, to help him manage the legislative roll call and state house budgets.

“Over time at the Herald, I went from being a reporter using a computer for something other than a writing instrument to it finally becoming my specialty,” Doig said.

Doig’s work earned him a Pulitzer Prize in 1993 for his article “What Went Wrong,” investigating the construction of Dade County, Fla., houses destroyed in Hurricane Andrew.

After the roof of Doig’s fairly new house blew off in the storm, he began questioning contractors’ building methods, so he collected data about the damage to houses, the strength of winds those houses experienced and when the houses were constructed. When he overlaid these data sets on a map, he noticed that the newer a house was, the more vulnerable it was to high winds.
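The comparison Doig describes, damage set against construction date, can be sketched in a few lines of Python. The records below are invented purely for illustration and are not the Herald's data, and the field names `year_built` and `destroyed` are assumptions. The sketch groups houses by construction decade and computes the share destroyed in each group; a fuller analysis would also account for the wind speed each house experienced.

```python
from collections import defaultdict

# Hypothetical records for illustration only; not data from the Herald's investigation.
houses = [
    {"year_built": 1962, "destroyed": False},
    {"year_built": 1968, "destroyed": False},
    {"year_built": 1974, "destroyed": True},
    {"year_built": 1979, "destroyed": False},
    {"year_built": 1985, "destroyed": True},
    {"year_built": 1988, "destroyed": True},
]

def destruction_rate_by_decade(records):
    """Return {decade: share of houses destroyed} for the given records."""
    totals = defaultdict(int)
    destroyed = defaultdict(int)
    for rec in records:
        decade = (rec["year_built"] // 10) * 10  # 1974 -> 1970
        totals[decade] += 1
        if rec["destroyed"]:
            destroyed[decade] += 1
    return {d: destroyed[d] / totals[d] for d in sorted(totals)}

rates = destruction_rate_by_decade(houses)
for decade, rate in rates.items():
    print(f"{decade}s: {rate:.0%} destroyed")
```

A rate that climbs from one decade to the next is only a starting point for questions, which is how Doig used his own findings.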

His findings prompted him to question contractors, who admitted that a housing boom in the area had led builders to overlook certain safety precautions and had overworked building inspectors to the point of neglect.

“The key thing data journalism does is allow us to tackle issues that otherwise would be impossible to talk about in any authoritative way,” Doig said. “It’s going beyond the anecdotes and adding evidence to the story.”

Now that more information is available in digital databases, he said, every journalism student should learn to generate statistics they can weave into their stories for maximum impact and added context.


One advantage of a good database is that it helps reporters pinpoint outliers in data sets and stay one step ahead of the latest news, Doig said.
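One simple way to pinpoint outliers, offered here as a sketch and not as Doig's own method, is the interquartile-range rule: flag any value that falls more than 1.5 times the interquartile range below the first quartile or above the third. The monthly counts below are hypothetical, and the function name `find_outliers` is an assumption made for illustration.

```python
from statistics import quantiles

# Hypothetical monthly burglary counts; invented for illustration.
monthly_burglaries = {
    "Jan": 12, "Feb": 15, "Mar": 14, "Apr": 13, "May": 16, "Jun": 15,
    "Jul": 14, "Aug": 41, "Sep": 13, "Oct": 15, "Nov": 14, "Dec": 12,
}

def find_outliers(counts, k=1.5):
    """Return (label, value) pairs lying outside the IQR fences."""
    q1, _, q3 = quantiles(counts.values(), n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(label, v) for label, v in counts.items() if v < low or v > high]

outliers = find_outliers(monthly_burglaries)
print(outliers)
```

A flagged month is a lead to report out, not a conclusion: the spike could reflect a crime wave, a change in record keeping or a data-entry error.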

Another advantage of well-planned data journalism is that it allows reporters to write fast-turnaround, in-depth articles as news breaks.

When Mark Wilson was a student reporter at Texas State University in the spring of 2011, the perfect opportunity to breathe life into his research and statistical analysis came three hours before the deadline of his campus newspaper’s final production that year.

After nearly three months of collecting and categorizing five years of police crime records from the San Marcos police station, Wilson heard on police scanners that the cops had finally caught a pair of thieves breaking into apartments all over the city.

In a matter of hours, he pulled together an article that highlighted the breaking news element and put his statistics into larger context for the student body. It taught him the importance of pairing the right anecdotes with his statistics to make them more powerful.

“You can’t just throw out a pile of statistics, and say, ‘Here you go,’” Wilson said. “It’s our job as reporters to make it matter. We have to find a hook and make people want to read it.”

Wilson learned data journalism as part of an independent study course with Texas State journalism professor Kym Fox (who is also a co-adviser to the Texas State SPJ chapter).

Fox said Texas State doesn’t offer any courses primarily focused on freedom of information laws, so she helps students file public records requests and use the information they receive in an independent study with the Texas Light of Day Project.

Three Dallas Morning News reporters with support from the Freedom of Information Foundation of Texas launched the Light of Day Project in 2004 to encourage college students in investigative work.

The project coordinates massive open records requests throughout the state by having students submit Texas Public Information Act requests to their schools and local government agencies based on the same statewide topic.

Then when students get the information they requested, they clean and process their own data sets before compiling them into a larger database where they can be studied and published for use in investigative stories at the state, local and university levels.
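The clean-then-compile step might look something like the following Python sketch. Everything specific here is an assumption for illustration: the agency names, the field names, the two date formats and the sample rows are hypothetical and do not describe the Light of Day Project's actual files. The idea is to normalize each submission into one shared layout and to set aside rows that cannot be used until someone follows up with the agency.

```python
from datetime import datetime

# Hypothetical submissions; agencies, fields and formats are illustrative assumptions.
submissions = {
    "Agency A": [
        {"date": "03/14/2010", "offense": " Burglary "},
        {"date": "03/15/2010", "offense": "THEFT"},
    ],
    "Agency B": [
        {"date": "2010-03-14", "offense": "burglary"},
        {"date": "", "offense": "Theft"},  # missing date: set aside for follow-up
    ],
}

DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")

def parse_date(text):
    """Try each known format; return an ISO date string, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def combine(subs):
    """Merge all agencies' rows into (clean_rows, rejected_rows)."""
    clean, rejected = [], []
    for agency, rows in subs.items():
        for row in rows:
            record = {
                "agency": agency,
                "date": parse_date(row["date"]),
                "offense": row["offense"].strip().lower(),
            }
            usable = record["date"] is not None and record["offense"] != ""
            (clean if usable else rejected).append(record)
    return clean, rejected

clean, rejected = combine(submissions)
print(len(clean), "usable rows;", len(rejected), "need follow-up")
```

Keeping the rejected rows, instead of silently dropping them, leaves a record of what the combined database does not cover.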

“Once students get into it, they feel the power of it and they really like it,” Fox said.

The topic was crime for the 2011 spring semester when Wilson participated in the Light of Day Project at Texas State.

He was assigned five years of police crime records from the San Marcos police station. It took the station two weeks to round up the information he requested. Then it took him about two months to decode it and comb through it.

Now as a reporter at a San Antonio Express-News publication called Conexión, Wilson files public records requests and analyzes data frequently. He said it’s worth the effort to find statistics because they reveal trends that public relations staff and government agencies might not willingly share or even know exist.

“If you can pull a document, you’ll probably get more than anyone will ever tell you,” Wilson said.

Another student in his independent study uncovered surprising information about crime on Texas State’s campus that taught her the importance of records requests and the risk of relying on statistics posted online.


When Jordan Gass-Pooré was part of the Light of Day Project with Wilson in spring 2011, she learned that information on institution websites is not always correct, and correct information is not easy to obtain.

She was responsible for working with the campus public relations staff and the university’s police chief to gather five years of crime statistics at dorms and apartments on Texas State’s campus.

But she kept getting emails and phone calls from university representatives asking for more time to collect the information. It took her two and a half months to get what she needed.

“I remember waiting around so long,” Wilson said.

Since acquiring the information took longer than she expected, her campus newspaper, The University Star, had to postpone her article about burglary at campus dorms and apartments until June 2011.

But while she was collecting the crime and theft records, Gass-Pooré noticed a major discrepancy between data on the school’s website and data she obtained through her records request, one she was never able to get campus authorities to confirm.

Online reports showed that Texas State’s newest, most expensive dorm on campus had the lowest theft rate. But when Gass-Pooré got the crime records, she saw that the new dorm actually had the highest theft rate of all the dorms.

“It made me wonder, if there were discrepancies in these records, what else could there be discrepancies in?” Gass-Pooré asked.
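A check like the one Gass-Pooré made by hand can be sketched in Python by comparing the published figures against the figures from a records request, item by item. The hall names and counts below are invented for illustration and are not Texas State's numbers; `find_discrepancies` is a hypothetical helper.

```python
# Hypothetical figures for illustration; hall names and counts are invented.
published = {"Hall A": 4, "Hall B": 9, "Hall C": 2}
from_records = {"Hall A": 4, "Hall B": 10, "Hall C": 17}

def find_discrepancies(site, records):
    """Return {name: (site_value, records_value)} wherever the two sources differ.

    A name missing from one source shows up with None on that side.
    """
    diffs = {}
    for name in sorted(set(site) | set(records)):
        a, b = site.get(name), records.get(name)
        if a != b:
            diffs[name] = (a, b)
    return diffs

for name, (online, requested) in find_discrepancies(published, from_records).items():
    print(f"{name}: website says {online}, records say {requested}")
```

The comparison only shows that two sources disagree. Finding out which one is right, and why, still takes reporting.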

Critics of data journalism argue that reporters and agencies can skew statistics to show what they want. But Doig tells them statistics are just like any other source at a reporter’s disposal. They can be accurate or inaccurate, used or abused. That’s why reporters need to verify their statistics by double-checking the methods of whoever gathered the information and processed the numbers.

He said all statistics are limited to some extent, but most government agencies have fine print and metadata that explains their limitations.

“You have to be aware of those limitations and acknowledge them when you’re using the data in your story,” Doig said. “If you become aware that just because numbers come out of a computer doesn’t mean they’re accurate, then you become more cautious of the data you have.”

Although the digital age made precision reporting possible, Light of Day Project Director Dan Malone fears it may have also induced an overreliance on the records that schools and government agencies provide on their websites.

“I hear people say journalists are too skeptical, too quick to challenge authority, and I think it’s just the opposite,” Malone said. “I think sometimes we’re not skeptical enough.”

When he was a student at the University of Texas, he scoured public offices at the courthouse and city hall, chatting with clerks about what records they kept.

He asked to see the forms people filled out when they filed for information or filed a complaint.

“I would see what information the government collected and that would help me know what to look for,” Malone said.

Even though he acknowledges that very little information is on paper these days, he tells students to study government websites and see what information the government collects, how it is collected and how it is stored. He also tells them to look for holes and unanswered questions, then sharpen their records requests to fill the gaps.

“The stuff on government websites is what government wants us to know, and that may or may not be what we need to know,” Malone said. “If journalists don’t ask those questions about what we need to know, then who will?”

That’s the thought behind Open Missouri, a website started by professor David Herzog to help reporters learn what information state agencies have that they aren’t readily posting online and give them the tools to write records requests for specific databases.

“Journalists feel frustrated by (asking for public records) because they say, ‘How do I know what to ask for if I don’t know what you have?’” Herzog said. “One of the big things I wanted to do with Open Missouri was to create a place where people could go and learn about what databases were held offline by government agencies.”

After working as an investigative reporter for five years at the Providence Journal in Rhode Island, Herzog became an associate professor of print and digital news at the Missouri School of Journalism and an academic adviser to the National Institute for Computer-Assisted Reporting. He created Open Missouri as part of his 2010-11 Reynolds Journalism Institute fellowship.

When the website launched in March 2011, Herzog posted 135 formerly offline databases on Open Missouri. Today there are 250 databases, he said.

His students help collect information about the agencies, and those in his data journalism class use the website to make sunshine requests.


One of Herzog’s graduate students, Jon McClure, developed an interest in data journalism during an assistantship at the National Institute for Computer-Assisted Reporting, where he learned to clean and process data sets with the support of a data journalism community.

Coming into the assistantship, McClure had basic training in Microsoft Excel, but he picked up higher-level technical skills by working on data projects.

He said reporters who want to advance their knowledge of data journalism should start by finding a project they can use for data reporting and then seek advice from experts in the field.

“I think there is a timidity to learning some technical skills, and that said, I think there is a significant investment you have to make to learn some of those skills,” McClure said. “We say nobody really learns the skills unless you have a project to apply them to.”

One of McClure’s most recent projects was a two-part analysis of statistics about race and policing in Columbia, Mo. He published the study in spring 2013 as a public safety beat reporter for the Columbia Missourian.

When his editor asked him to make a simple map of the city’s traffic stops, he began investigating why the stops were more highly concentrated in specific areas around the city.

Following a 1999 national poll that found that more than half of the population believed the police racially profiled citizens, Missouri’s attorney general began publishing an annual report in 2000 measuring how fairly races were represented in traffic stops around the state.
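One common way to express how a group is represented in traffic stops is a ratio: the group's share of stops divided by its share of the population, where 1.0 means the two shares match. The Python sketch below illustrates that arithmetic only. It is an assumption about the general kind of measure involved, not a reproduction of the attorney general's methodology, and the numbers are hypothetical.

```python
def disparity_index(group_stops, total_stops, group_pop, total_pop):
    """Share of stops divided by share of population; 1.0 means proportional."""
    if total_stops <= 0 or total_pop <= 0 or group_pop <= 0:
        raise ValueError("totals and group population must be positive")
    stop_share = group_stops / total_stops
    pop_share = group_pop / total_pop
    return stop_share / pop_share

# Hypothetical numbers: a group with 12.5% of the population and 25% of the stops.
index = disparity_index(group_stops=250, total_stops=1000,
                        group_pop=125, total_pop=1000)
print(f"Disparity index: {index:.2f}")
```

A ratio like this says nothing about where stops happen, which is the question McClure went on to ask.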

The annual report consistently indicated that blacks in cities such as Columbia were unfairly targeted by police with an influx of traffic stops near their homes.

McClure said the city as a whole wasn’t sure what to do with the attorney general’s reports, so he cleaned the data and prepared statistics using some of the software he had learned about in Herzog’s class.

But answering his question about the geography of the traffic stops required more data-specific geospatial techniques, so he gleaned ideas from other reporters solving similar problems. He studied peer-reviewed research papers and bounced ideas off data journalists, such as Ben Poston, assistant data editor at the Los Angeles Times.

McClure’s research eventually led him to software called GeoDa created by Luc Anselin, director of the GeoDa Center for Geospatial Analysis and Computation at Arizona State University.

Using Anselin’s software and techniques, McClure found that, contrary to the attorney general’s report, a geographic or neighborhood-based bias is a more significant factor than a race-based bias against drivers in Columbia. He hopes his findings shed new light on the city’s decade-long discussion.
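The article does not say which techniques McClure applied. As one illustration of the kind of question geospatial software such as GeoDa can answer, the Python sketch below computes global Moran's I, a standard measure of whether similar values cluster in neighboring areas. The neighborhoods, stop counts and adjacency list are hypothetical, and the sketch uses simple binary weights (1 if two areas touch, 0 otherwise).

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary adjacency weights.

    values: {area: number}; neighbors: {area: set of adjacent areas}.
    Results above 0 suggest similar values cluster together;
    results below 0 suggest neighboring areas tend to differ.
    """
    n = len(values)
    mean = sum(values.values()) / n
    dev = {area: v - mean for area, v in values.items()}
    variance_sum = sum(d * d for d in dev.values())
    weight_total = sum(len(adj) for adj in neighbors.values())
    if variance_sum == 0 or weight_total == 0:
        raise ValueError("need varying values and at least one neighbor pair")
    # Sum of products of deviations over every ordered pair of neighbors.
    cross = sum(dev[a] * dev[b] for a, adj in neighbors.items() for b in adj)
    return (n / weight_total) * (cross / variance_sum)

# Hypothetical stop counts for four neighborhoods laid out in a row.
stops = {"N1": 10, "N2": 9, "N3": 1, "N4": 2}
adjacent = {"N1": {"N2"}, "N2": {"N1", "N3"}, "N3": {"N2", "N4"}, "N4": {"N3"}}

score = morans_i(stops, adjacent)
print(f"Moran's I: {score:.3f}")
```

A real analysis would also test whether the score could have arisen by chance, a step that dedicated software handles and this sketch omits.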

“Once you know where the problem lies, you can work to better fix it,” McClure said.


Although generating complicated data visualizations like McClure’s takes hours of research and analysis, Doig said reporters can do basic data journalism if they can add, subtract and multiply.
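Two calculations that need only that kind of arithmetic, plus division, are percent change and a per-capita rate. The Python sketch below shows both; the function names and the figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def percent_change(old, new):
    """Percent change from old to new; old must not be zero."""
    if old == 0:
        raise ValueError("percent change is undefined when the old value is 0")
    return (new - old) / old * 100

def rate_per(count, population, per=1000):
    """Occurrences per `per` people, so places of different sizes can be compared."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * per / population

# Hypothetical figures: 200 reports last year, 250 this year.
print(f"Change: {percent_change(200, 250):.1f}%")
# Hypothetical figures: 45 thefts on a campus of 30,000 students.
print(f"Rate: {rate_per(45, 30000):.1f} per 1,000 students")
```

Rates matter because raw counts mislead: a large campus can report more thefts than a small one and still be the safer of the two.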

He teaches students at Arizona State to use familiar programs, such as Microsoft Excel, to mine, filter and summarize data.

Ultimately, he said, two of the greatest misconceptions about data journalism are that it’s impossibly difficult and it’s only for so-called investigative reporters. “Data is being gathered across all beats,” Doig said. “If you are blind to that data, you’re going to miss a lot of stories.”

Kara Hackett was SPJ's 2013 Pulliam/Kilgore Freedom of Information intern. She now works at the Fort Wayne (Ind.) Journal Gazette. Interact on Twitter: @karahackett. Send letters to

Copyright © 1996-2017 Society of Professional Journalists. All Rights Reserved.
