Preserving Pierre Trudeau's Memory
Spring 2001


Media Magazine

Publisher
Nick Russell


Editor
David McKie

Books Editor
Gillian Steward

Legal Advisor
Peter Jacobsen
(Paterson McDougall)

Magazine Designer
Ric Kadubiec


Editorial Board
Chris Cobb
Wendy McLellan
Sean Moore
Catherine Ford
J.T. Grossmith
Linda Goyette
John Gushue
Carolyn Ryan

Advertising Sales
John Dickins
(613) 526-8061
Fax: (613) 521-3904
E-mail: caj@igs.net

Administrative Director
John Dickins
(613) 526-8061
Fax: (613) 521-3904
E-mail: caj@igs.net



  






Computer-assisted reporting
By Fred Vallance-Jones 

Get to know your data
It can be tedious work, so be prepared  
 
 
One of the great moments in computer-assisted reporting happens when you tear open that brown envelope and find inside the disks or CD containing the government database you fought for months to receive. You finally have the information in your hands, the mother lode that will pay you handsomely in stories. Unfortunately, the elation often peters out quickly when you realize you haven't a clue what it all means. It's time to get to know your data.

The first step, of course, is to load the file (or files) into your favourite database program. If you are lucky, the data will come in a form that can be easily imported into your software. Government agencies can often be persuaded to give you data in a file format such as Microsoft Excel or comma-delimited text, both of which are easy to import into common desktop database applications.
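As a rough sketch of what importing comma-delimited text involves, here is a short Python example that loads a file into a SQLite table. SQLite stands in for the desktop database program, and the sample data, the table name "inspections" and its column names are all made up for illustration; your agency's file will look different.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for the agency's comma-delimited file.
SAMPLE_CSV = """inspection_id,city,fine
1,Hamilton,250
2,Burlington,1200
3,Hamilton,400
"""


def load_csv(conn, text):
    """Create an 'inspections' table (assumed name) and load every CSV row."""
    conn.execute(
        "CREATE TABLE inspections (inspection_id INTEGER, city TEXT, fine REAL)"
    )
    reader = csv.DictReader(io.StringIO(text))
    rows = [
        (int(r["inspection_id"]), r["city"], float(r["fine"])) for r in reader
    ]
    conn.executemany("INSERT INTO inspections VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is saved
loaded = load_csv(conn, SAMPLE_CSV)
row_count = conn.execute("SELECT COUNT(*) FROM inspections").fetchone()[0]
print(loaded, row_count)
```

Comparing the number of rows read with the number of rows in the table is a cheap first check that nothing was dropped on the way in.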

 


Agencies may still give you data in a format that needs to be converted by a commercial data service house. That is less common than it once was: the need to update systems for the year 2000 resulted in a massive switch from antiquated mainframes to PC-based systems, and many old-fashioned formats were put out to pasture.

In any case, once you have the data in your computer, you can start looking at figures using your database software, be that Microsoft Access or FoxPro, Corel Paradox or some other desktop system.

You probably were given some information by the agency about the form the data are in and codes used in the database. If not, you should seek that out now.

One thing you need to know quickly is how the data are structured. Is the information in a single table, also called a flat-file database? Or are the data in multiple tables related by one or more key fields, also called a relational database?

Flat files are inherently simpler to use. Relational databases are more complicated and are frequently used where recurring information is stored in separate tables. The database is called “relational” because the tables are related to one another through shared key fields: a value in the main table is matched against the same value in a second table. Often these secondary tables provide plain-English translations for codes in the main table.
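To make the idea of a lookup table concrete, here is a minimal Python sketch using SQLite. The two tables, their names ("violations" and "violation_types"), the key field "type_code" and the sample rows are all invented for the example; a real database will have its own layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Main table: each record carries only a code (hypothetical layout).
    CREATE TABLE violations (violation_id INTEGER, type_code TEXT);
    -- Lookup table: translates each code into plain English.
    CREATE TABLE violation_types (type_code TEXT PRIMARY KEY, description TEXT);

    INSERT INTO violations VALUES (1, 'A1'), (2, 'B2'), (3, 'A1');
    INSERT INTO violation_types VALUES
        ('A1', 'Unsafe food storage'),
        ('B2', 'No hand-washing sink');
    """
)

# Join the two tables on the shared key field so each record shows
# its description instead of the bare code.
rows = conn.execute(
    """
    SELECT v.violation_id, t.description
    FROM violations AS v
    JOIN violation_types AS t ON t.type_code = v.type_code
    ORDER BY v.violation_id
    """
).fetchall()

for violation_id, description in rows:
    print(violation_id, description)
```

The description is stored once in the lookup table no matter how many records use the code, which is why agencies structure recurring information this way.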

Once you are clear on the structure, you can take a close look at the data to see what is in each of the fields or columns in the table that contain the information. It is important to understand what the designer of the database was tracking and how the task was accomplished. At this stage you may again find you have questions that can only be answered by contacting the keepers of the data. A call like that can help avoid serious mistakes later.

Once you understand the structure of the database and what all the fields are supposed to mean, you can begin "interviewing" your data. That is, you want to develop a broad sense of what the database can tell you.

What can be added up and averaged? What can be sorted from biggest to smallest? What occurs the most and the least? These are the basic questions you must ask. If you are planning to link the database to another unrelated one, you need to figure out how you will do that as well.
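Those basic questions translate fairly directly into queries. The sketch below, in Python with SQLite, assumes a made-up "contracts" table with "vendor" and "amount" columns; the names and figures are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contracts (vendor TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?)",
    [
        ("Acme Paving", 12000.0),
        ("Bayside Supply", 3000.0),
        ("Acme Paving", 9000.0),
        ("Corktown Cleaners", 6000.0),
    ],
)

# What can be added up and averaged?
total, average = conn.execute(
    "SELECT SUM(amount), AVG(amount) FROM contracts"
).fetchone()

# What is the biggest? Sort from largest to smallest and take the top row.
largest = conn.execute(
    "SELECT vendor, amount FROM contracts ORDER BY amount DESC LIMIT 1"
).fetchone()

# What occurs the most and the least? Count the records for each vendor.
by_count = conn.execute(
    "SELECT vendor, COUNT(*) AS n FROM contracts "
    "GROUP BY vendor ORDER BY n DESC, vendor"
).fetchall()

print(total, average, largest, by_count)
```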

Another step you must take at this early stage is to clean up any dirty data. It is vitally important that you go through fields to ensure that, for instance, all of the occurrences of a city's name are spelled correctly.
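One way to approach the city-name problem, sketched here in Python, is to count every distinct spelling first and then apply corrections you have confirmed by eye. The sample values and the corrections table are hypothetical; the point is that the misspellings show up as rare variants in the count.

```python
from collections import Counter

# Hypothetical values from a "city" field, as they might arrive.
cities = ["Hamilton", "hamilton ", "Hamiltom", "Burlington", "HAMILTON", "Burlington"]


def tidy(value):
    """Collapse stray spaces and standardize capitalization."""
    return " ".join(value.split()).title()


tidied = [tidy(c) for c in cities]

# Count each distinct spelling; odd variants stand out as small counts.
counts = Counter(tidied)
print(counts.most_common())

# Corrections decided by a person after reviewing the counts (assumed here).
CORRECTIONS = {"Hamiltom": "Hamilton"}
cleaned = [CORRECTIONS.get(c, c) for c in tidied]
print(Counter(cleaned).most_common())
```

Keeping the corrections in an explicit table, rather than editing values one at a time, leaves a record of every change you made to the agency's data.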

You also need to do some basic addition to see whether the sums and counts in your data agree with totals publicly available in sources such as agency annual reports. If they do not, you have to go back to your agency and ask some questions.
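A minimal Python sketch of that check follows. The rows and the "published" figures are invented for the example; in practice the published numbers would come from the agency's annual report. Decimal is used instead of ordinary floating-point numbers so that dollars and cents add up exactly.

```python
from decimal import Decimal

# Hypothetical rows from the database: (department, amount as text).
rows = [("Parks", "1200.50"), ("Roads", "8300.25"), ("Transit", "4100.00")]

# Hypothetical figures taken from the agency's annual report.
PUBLISHED_TOTAL = Decimal("13700.75")
PUBLISHED_COUNT = 3

computed_total = sum(Decimal(amount) for _, amount in rows)
computed_count = len(rows)

difference = computed_total - PUBLISHED_TOTAL
count_matches = computed_count == PUBLISHED_COUNT

if difference != 0 or not count_matches:
    # A gap is a question for the agency, not proof of an error.
    print("Totals differ by", difference, "- count matches:", count_matches)
else:
    print("Database agrees with the published figures.")
```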

You should also be looking for numbers that seem out of whack with the rest of the file’s contents. An example of that would be numbers in the millions situated in a column where most numbers are in the thousands. At the very least, you must ensure these aren't mistakes.
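Here is one simple way to flag such numbers, sketched in Python. It compares each value with the column's median and flags anything at least 100 times larger or smaller. The ratio of 100 and the sample figures are assumptions chosen for illustration, and the approach assumes the values are positive.

```python
from statistics import median

# Hypothetical column: mostly thousands, with one value in the millions.
amounts = [4200, 3900, 5100, 4800, 4500000, 4400]


def flag_outliers(values, ratio=100):
    """Return values more than `ratio` times above or below the median.

    Assumes positive values; `ratio` is an arbitrary threshold to tune.
    """
    mid = median(values)
    return [v for v in values if v > mid * ratio or v < mid / ratio]


suspects = flag_outliers(amounts)
print(suspects)
```

A flagged value is not necessarily wrong. It may be a data-entry slip, or it may be the story, so check it against the source before deciding.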

Only when you have finished this stage of interviewing your data can you plan the detailed queries that you will write to extract the information that will form the basis for your stories.

This column is by necessity lacking in technical explanations, so a good place to go for answers is Brant Houston's excellent "Computer-Assisted Reporting: A Practical Guide." You can order it on the Investigative Reporters and Editors web site at www.ire.org.


Fred Vallance-Jones is a municipal affairs reporter at The Hamilton Spectator and chair of the CAJ's computer-assisted reporting network. You can contact him by e-mail at fvjones@idirect.com.