Computer-assisted reporting
By Fred Vallance-Jones
Get to know your data
It can be tedious work, so be prepared
One of the great moments in computer-assisted reporting happens
when you tear open that brown envelope and inside you find the
disks or CD containing the government database you fought for
months to receive. You finally have information in your hands:
the mother lode that will pay you handsomely in stories.
Unfortunately, the elation often peters out quickly when you
realize you haven't a clue what it all means. It's time to get
to know your data.

The first step, of course, is to load the file (or files) into
your favourite database program. If you are lucky, the data will
come in a form that can be easily imported into your software.
Government agencies can often be persuaded to give you data in a
file format such as Microsoft Excel or comma-delimited text, both
of which are easy to import into common desktop database
applications.

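For readers who want to see what a comma-delimited import looks like outside a desktop database program, here is a minimal sketch in Python. The column names and the three sample rows are invented for illustration; a real agency file will have its own layout.

```python
import csv
import io

# Hypothetical sample standing in for an agency's comma-delimited export.
SAMPLE = """inspection_id,city,fine
1,Hamilton,2500
2,Burlington,1200
3,Hamilton,900
"""


def load_rows(handle):
    """Read comma-delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


# io.StringIO lets the sample text behave like an open file.
rows = load_rows(io.StringIO(SAMPLE))
print(len(rows), "rows; columns:", list(rows[0].keys()))
```

With a real file, you would pass `open("yourfile.csv", newline="")` instead of the `io.StringIO` stand-in. Note that every value arrives as text, so numbers need converting before you do any arithmetic.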
It's still possible that an agency will give you data in a format
that needs to be converted by a commercial data service house,
but that is less common than it once was. The need to update
systems for the year 2000 resulted in a massive switch from
antiquated mainframes to PC-based computers, and many
old-fashioned formats were put out to pasture.

In any case, once you have the data in your computer, you can
start looking at figures using your database software, be that
Microsoft Access or FoxPro, Corel Paradox or some other desktop
system.

You were probably given some information by the agency about the
form the data are in and the codes used in the database. If not,
you should seek that out now.

One thing you need to know quickly is how the data are
structured. Is the information in a single table, also called a
flat-file database? Or are the data in multiple tables related by
one or more key fields, also called a relational database?

Flat files are inherently simpler to use. Relational databases
are more complicated and are frequently used where recurring
information is stored in separate tables. The database is called
“relational” because information in one table is related, through
a key field, to information in the main table that holds most of
the data. The related tables often provide plain-English
translations for codes in the main table.

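To make the idea of a lookup table concrete, here is a small sketch using Python's built-in sqlite3 module. The table names, the codes and their descriptions are all hypothetical; the point is only to show a main table whose code field is translated by joining it to a second table on a shared key.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE violation_codes (code TEXT PRIMARY KEY, description TEXT);
    CREATE TABLE inspections (id INTEGER PRIMARY KEY, city TEXT, code TEXT);
    INSERT INTO violation_codes VALUES
        ('A1', 'Improper food storage'),
        ('B2', 'No hand-washing station');
    INSERT INTO inspections VALUES
        (1, 'Hamilton', 'A1'),
        (2, 'Burlington', 'B2'),
        (3, 'Hamilton', 'A1');
""")

# The key field "code" relates each inspection to its description.
joined = conn.execute("""
    SELECT i.id, i.city, v.description
    FROM inspections AS i
    JOIN violation_codes AS v ON v.code = i.code
    ORDER BY i.id
""").fetchall()

for row in joined:
    print(row)
```

Desktop programs such as Access build the same kind of join through a visual query designer, but the logic underneath is the same: match the key field in one table to the key field in the other.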
Once you are clear on the structure, you can take a close look at
the data to see what is in each of the fields or columns in the
table that contain the information. It is important to understand
what the designer of the database was tracking and how the task
was accomplished. At this stage you may again find you have
questions that can only be answered by contacting the keepers of
the data. A call like that can help avoid serious mistakes later.

Once you understand the structure of the database and what all
the fields are supposed to mean, you can begin "interviewing"
your data. That is, you want to develop a broad sense of what the
database can tell you.

What can be added up and averaged? What can be sorted from
biggest to smallest? What occurs the most and the least? These
are the basic questions you must ask. If you are planning to link
the database to another, unrelated one, you need to figure out
how you will do that as well.

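Those basic interview questions map onto a handful of simple operations. The sketch below runs them in Python on a few invented records; the field names and figures are assumptions made up for the example, not drawn from any real database.

```python
from collections import Counter
from statistics import mean

# Hypothetical records; a real table would be loaded from the agency's file.
records = [
    {"city": "Hamilton", "fine": 2500},
    {"city": "Burlington", "fine": 1200},
    {"city": "Hamilton", "fine": 900},
    {"city": "Oakville", "fine": 400},
]

fines = [r["fine"] for r in records]

# What can be added up and averaged?
total = sum(fines)
average = mean(fines)

# What can be sorted from biggest to smallest?
largest_first = sorted(records, key=lambda r: r["fine"], reverse=True)

# What occurs the most?
city_counts = Counter(r["city"] for r in records)
most_common_city, most_common_count = city_counts.most_common(1)[0]

print("Total:", total, "Average:", average)
print("Biggest fine:", largest_first[0])
print("Most frequent city:", most_common_city, most_common_count)
```

In a database program the same questions become a totals query, a sort and a group-and-count query. The tool matters less than asking each question of every field where it makes sense.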
Another step you must take at this early stage is to clean up any
dirty data. It is vitally important that you go through fields to
ensure that, for instance, all of the occurrences of a city's
name are spelled correctly.

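One way to approach the city-name problem is sketched below in Python: count every distinct spelling, tidy up spacing and capitalization automatically, then fix genuine misspellings with a correction table you build by hand after reviewing the list. The sample values and the correction table are hypothetical. The `title()` call is a rough assumption about capitalization that will mangle names containing apostrophes, so check its results.

```python
from collections import Counter

# Hypothetical values as they might appear in a dirty "city" field.
raw_cities = [
    "Hamilton", "hamilton ", "HAMILTON", "Hamiltion",
    "Burlington", "Burlington",
]


def normalize(name):
    """Collapse stray whitespace and standardize capitalization."""
    return " ".join(name.split()).title()


# Hypothetical correction table, built by hand after reviewing the spellings.
CORRECTIONS = {"Hamiltion": "Hamilton"}


def clean(name):
    """Normalize a name, then apply any hand-made spelling correction."""
    tidy = normalize(name)
    return CORRECTIONS.get(tidy, tidy)


before = Counter(raw_cities)
after = Counter(clean(c) for c in raw_cities)

print("Distinct spellings before:", len(before))
print("Counts after cleaning:", dict(after))
```

Counting the distinct spellings first is the important habit. It shows you what is actually in the field before you change anything, and it is wise to keep the original values in a separate column so corrections can be checked later.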
You also need to do some basic addition to see if sums and counts
in your data compare with totals publicly available in sources
such as agency annual reports. If they do not, you have to go
back to your agency and ask some questions.

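The check itself is simple arithmetic, sketched here in Python. The function name, the rows and the published total are all invented for illustration; in practice the published figure would come from a source such as the agency's annual report.

```python
def compare_to_published(rows, field, published_total, tolerance=0):
    """Sum one field and compare it with a published figure.

    Returns the computed sum, the difference (computed minus published),
    and whether the two agree within the given tolerance.
    """
    computed = sum(row[field] for row in rows)
    difference = computed - published_total
    return computed, difference, abs(difference) <= tolerance


# Hypothetical data and a hypothetical figure from an annual report.
rows = [{"fine": 2500}, {"fine": 1200}, {"fine": 900}]
PUBLISHED_TOTAL = 5000

computed, difference, matches = compare_to_published(
    rows, "fine", PUBLISHED_TOTAL
)
print("Computed:", computed, "Difference:", difference, "Match:", matches)
```

A mismatch does not necessarily mean the data are wrong. You may have been given a different time period or a subset of records, which is exactly the kind of thing to ask the agency about.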
You should also be looking for numbers that seem out of whack
with the rest of the file’s contents. An example of that would be
numbers in the millions situated in a column where most numbers
are in the thousands. At the very least, you must ensure these
aren't mistakes.

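A crude way to surface such numbers is to compare each value with the column's median, as in the Python sketch below. The threshold of 100 times the median is an arbitrary assumption chosen to catch millions among thousands, and the sample values are invented. It is a screening aid, not a statistical test.

```python
from statistics import median


def flag_outliers(values, factor=100):
    """Return values more than `factor` times the column's median."""
    typical = median(values)
    if typical <= 0:
        # The ratio test is meaningless without a positive median.
        return []
    return [v for v in values if v > factor * typical]


# Hypothetical column: mostly thousands, with one entry in the millions.
amounts = [1200, 2500, 900, 3100, 4_000_000, 1800]

suspects = flag_outliers(amounts)
print("Values to check with the agency:", suspects)
```

A flagged value may turn out to be a data-entry error or a genuine story. Either way, it deserves a phone call before it goes into print.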
Only when you have finished this stage of interviewing your data
can you plan the detailed queries that you will write to extract
the information that will form the basis for your stories.

This column is by necessity lacking in technical explanations, so
a good place to go for answers is Brant Houston's excellent
"Computer-Assisted Reporting: A Practical Guide." You can order
it on the Investigative Reporters and Editors web site at
www.ire.org.

Fred Vallance-Jones is a municipal affairs reporter at The
Hamilton Spectator and chair of the CAJ's computer-assisted
reporting network. You can contact him by e-mail at
fvjones@idirect.com