JournalismNet
By Julian Sher
Searching By Format
By default, Google and all the other search engines search web pages for the words
you want. And most people assume that’s all you can search on the
web.
In fact, the web is also home to many other kinds of documents:
slide shows, spreadsheets, reports and word-processing files. And there
is a little-known feature of Google Advanced that allows you to
find these treasures. (You’ll find Google Advanced by clicking
on the “Advanced Search” link on the main Google page or by going
to www.google.com/advanced_search.html.)
The third line in the Google Advanced box is called File Format. By clicking on the small
arrow on the right-hand side, you can tell Google to narrow your
search to files in formats other than web pages, such as .ppt,
.pdf, .doc and .xls files.
You should never do only a format search.
Obviously, good information about many subjects can be found on
normal web pages, so you should always do a general search for
your topic as well. But by selecting certain specialized format
searches, Google is helping you hunt through the invisible web:
the millions of documents, slide shows and attachments that
do not appear at first glance on web pages.
Let’s take a look
at each of these formats.
SLIDE SHOWS
PPT files are PowerPoint
slide shows. Many companies, governments and organizations put
slide shows on the web for their members or the public. You can
find these shows and display the images, graphics and information
on your own computer. For example, if you wanted to see if the
American military had produced a slide show on the anthrax vaccine:
- go to Google Advanced and put in anthrax as your keyword
- under Format, select .ppt
- under Domain, type in .mil (This restricts the search to the military sites of the US Armed Forces. For more on Searching by Domain, see the JournalismNet column in Media magazine, Fall 1999, vol. 6, or www.journalismnet.com/tips/domain.htm)
If you click on any of the results, you can watch the slide show,
complete with graphics, and save it on your computer, provided, of
course, that you have PowerPoint installed. If you don’t have PowerPoint
installed on your machine, you can at least see the text and images
of the slide show in a web page format. You can do this by clicking
on the words “VIEW AS HTML” right next to the results. You won’t
get the effect of the show, but you will get the content.
SPREADSHEETS
Companies, academics and individuals store financial and other
data in tables or spreadsheets created by Microsoft Excel. These
are in .XLS format.
If you are looking for statistics, growth
rates, comparisons, or anything else that is best presented
in a table, chances are someone has compiled
it and stored it on the web in Excel format. For example, let’s
say you want to compare cigarette taxes around the world:
- go to Google Advanced and put in as your keywords cigarette taxes world
- under Format, select .xls
When you click
on the result, your Excel program launches and gives you the table. (Again,
if you don’t have Excel installed on your machine, you can at
least see the text in a web page format by clicking on the words
“VIEW AS HTML” right next to the results. You won’t get the complete
effect of the table, but it is usually quite comprehensible.)
Tables do not have to be strictly financial or economic. People use Excel
to track any kind of historical or statistical trend.
Here is
another example. Research on the global arms trade should be done
using the usual general search techniques. But also go to Google Advanced
and perform an .XLS format search like this:
- put in as your keywords global arms trade
- under Format, select .xls
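Both spreadsheet searches in this section can also be typed as queries. The short Python sketch below appends filetype:xls to each set of keywords and prints a Google results-page URL for each one. It assumes, as an untested equivalence, that filetype:xls returns the same results as choosing .xls under Format.

```python
from urllib.parse import urlencode

# Keywords taken from the two spreadsheet examples in this section
searches = ["cigarette taxes world", "global arms trade"]

# Assumption: appending filetype:xls matches selecting .xls under Format
urls = [
    "https://www.google.com/search?" + urlencode({"q": keywords + " filetype:xls"})
    for keywords in searches
]

for url in urls:
    print(url)
```

Adding another topic is just a matter of adding another string to the searches list.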
REPORTS
The third format we’ll look at is the .pdf Acrobat Reader document.
When companies or governments want to put out official documents
and preserve their look, they publish them
in .PDF format. PDF stands for Portable Document Format, and
it allows you, the reader, to see the document exactly as the
person who created it did, with all the illustrations, boxes, forms and page breaks.
For example, many government tax
departments put PDF files on the web to make sure everyone
fills out the same form. Many application forms, manuals and official
reports exist on the web in .PDF format. (You must have Adobe
Acrobat Reader to view .pdf documents. Adobe’s reader is free
and can be downloaded at www.adobe.com.)
Let’s say you were investigating
the controversy over the use of depleted uranium in Kosovo. Go
to Google Advanced and make the following request:
- put in as your keywords depleted uranium kosovo
- under Format, select .pdf
You’ll get several results, including a 74-page report from
a UN mission.
WORD DOCUMENTS
Many people also use Microsoft Word
to write reports, essays, studies and pretty much everything else
they type on their computers except email. The documents are
stored on the computer as .DOC files, and some web sites
simply post their members’ writings or reports as .doc files. This
is not as useful a search as the other formats, but it can still
turn up some gems.
Universities in particular post a lot of .doc
files on their web sites, since so many professors and students
do their work in Word and it is easier to share information that
way.
Armed with this knowledge, you can do some sophisticated searching. Let’s
say you need research on the 1999 Russian election that helped bring
Vladimir Putin to power. You assume Harvard University has done
some work on this. Go to Google Advanced:
- put in as your keywords russia election 1999
- under Format, select .doc
- under Domain, type in .harvard.edu
This restricts the search to the Harvard
web site. You will get several research papers on the topic. If
you click on the links, you will open a Microsoft Word document.
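The Harvard search combines all three Advanced fields, so it makes a good test of the typed form. The Python sketch below writes it as a single query and encodes it into a results-page URL. Two assumptions here are ours, not the column’s: that filetype:doc matches the .doc Format choice, and that site:harvard.edu (without the leading dot) does the same job as typing .harvard.edu in the Domain field.

```python
from urllib.parse import urlencode

# Keywords, format and domain from the Harvard example above.
# Assumption: site:harvard.edu behaves like ".harvard.edu" in the Domain field.
query = "russia election 1999 filetype:doc site:harvard.edu"
url = "https://www.google.com/search?" + urlencode({"q": query})
print(url)
```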
To sum up, we have seen how to search for official reports and
documents (.pdf and .doc files), tables and spreadsheets (.xls
files) and slide shows (.ppt files).
But be careful — you should
always combine these restricted searches with a general search
for all kinds of web pages to make sure you don’t miss some valuable
sites.
Julian
Sher, the creator and webmaster of JournalismNet (www.journalismnet.com),
does Internet training in newsrooms around the world. He can be
reached by email at jsher@journalism.com.
This article and many other columns from Media magazine are available
online with hot links on the JournalismNet Tips page at
www.journalismnet.com/tips