Out from the Shadows
Winter 2002

Media Magazine

Publisher
Nick Russell


Editor
David McKie

Books Editor
Gillian Steward

Legal Advisor
Peter Jacobsen
(Paterson McDougall)

Magazine Designer
Ric Kadubiec


Editorial Board
Chris Cobb
Wendy McLellan
Sean Moore
Catherine Ford
J.T. Grossmith
Linda Goyette
John Gushue
Carolyn Ryan

Advertising Sales
John Dickins
(613) 526-8061
Fax: (613) 521-3904
E-mail: caj@igs.net

Administrative Director
John Dickins
(613) 526-8061
Fax: (613) 521-3904
E-mail: caj@igs.net

JournalismNet
By Julian Sher

Using Google to narrow your searches

A visit to its advanced search page will provide the help you need


When you do any kind of search in Google or any other search engine, the computer searches for the words you ask for ANYWHERE ON THE PAGE.

Google is smart enough to return web pages where the word you asked for is prominent — in the title or first lines. But this is not a guarantee: your word could turn up at the very bottom of a web page.

This means that many of your results are irrelevant — they talk about another topic entirely and might only mention your keyword at the very end of a long web page.

For example, go to Google and put in a simple search for AIDS in Africa. You'll turn up over 750,000 web pages. Some have the words AIDS and AFRICA high up, but many other pages talk about other diseases such as malaria and complain that AIDS is getting too much attention; still other pages talk about AIDS in Asia and only mention Africa by comparison.

But Google returns all these pages because the words AIDS and AFRICA appear in these pages, even if the concept or focus of these web pages has nothing to do with your subject.

The solution

Google offers a solution on its advanced search page. The fourth line from the top has a drop-down menu called OCCURRENCES. Click on the arrow on its right-hand side to change where on the page your search words must appear.

By default, it is set to ANYWHERE ON THE PAGE. But you can change that to IN THE TITLE, IN THE URL or one of the other options in the menu.

The most useful location is TITLE. This allows you to search for any keywords or exact phrases that appear in the title of the page.

One thing you can be fairly sure of: if the words you are looking for appear in the title of the page, the whole page is devoted to your subject.

For example, make a request in Google Advanced for the words AIDS and Africa in the title of a web page. Your results drop from 750,000 pages to fewer than 6,000.

And the big advantage here is that you can be certain that all the web pages in your list of results focus on the concept of AIDS in Africa, and do not just mention those two words in passing.
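The same title restriction can also be typed straight into Google's ordinary search box with the allintitle: operator. Here is a minimal Python sketch of that idea, assuming the usual www.google.com/search?q= address format; the helper name title_search_url is made up for illustration.

```python
from urllib.parse import urlencode


def title_search_url(*keywords: str) -> str:
    """Build a Google search address that requires every keyword in the page title."""
    # allintitle: asks Google to look for all the following words in the page title.
    query = "allintitle:" + " ".join(keywords)
    # urlencode escapes the colon and turns spaces into plus signs.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(title_search_url("AIDS", "Africa"))
# https://www.google.com/search?q=allintitle%3AAIDS+Africa
```

Pasting the printed address into a browser should give roughly the same kind of results as choosing IN THE TITLE on the advanced search page.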

Limits of a title search

Still, there are two limits to a title search of which you should be aware.

First, you might sometimes get pages where the words do not appear in the headline or the apparent page title; conversely, Google might sometimes fail to retrieve a page even though the words do appear in its headline.

Why?

By title, Google is actually searching for the title the webmasters give their pages. These titles are written in hidden HTML code, which you never see, but the computer does. Usually, the title the webmaster gives corresponds to the title or headline you see on the web page, but not always. That means sometimes in your results you will get pages where the words you requested don't appear to be in the headline.

For example, the UN has a good news page on Kosovo at www.un.org/peace/kosovo/pages/kosovo1.shtml. But the hidden source coding shows the webmaster forgot to put in a title, so the page is UNTITLED as far as Google is concerned. Google would not retrieve this page if you asked for pages with the words UN and Kosovo in the title. (The geeks among you can peek at the source code by going to View/Source on the Explorer toolbar or View/Page Source in Netscape.)
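For readers who would rather not dig through the source by hand, the hidden title can be pulled out with a few lines of Python. This is only a sketch using the standard library's html.parser module; the two sample pages are invented for illustration and the names TitleFinder and page_title are hypothetical.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collect the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text can arrive in several pieces, so accumulate it.
        if self._in_title:
            self.title += data


def page_title(html: str) -> str:
    """Return the hidden HTML title, or UNTITLED if the webmaster left it out."""
    finder = TitleFinder()
    finder.feed(html)
    finder.close()
    return finder.title.strip() or "UNTITLED"


# Invented sample pages: the hidden title differs from the visible headline.
MISMATCHED = """<html><head><title>Welcome to our site</title></head>
<body><h1>AIDS in Africa: the latest figures</h1></body></html>"""
NO_TITLE = "<html><head></head><body><h1>Kosovo news</h1></body></html>"

print(page_title(MISMATCHED))  # Welcome to our site
print(page_title(NO_TITLE))    # UNTITLED
```

A title search sees only what page_title returns here, not the headline a reader sees on the screen.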

The second limit is that you might miss pages that are entirely devoted to your topic but don't have the keywords in the title. For example, an excellent page on women, poverty and AIDS has as its title: "Threat More Devastating than Disease."

It would not come up in a title search about women, poverty and AIDS because those words don't appear in the title.

This means that doing a title search is a good way to begin your search but you should never stop there. Always do a general search to look for pages that mention your keywords anywhere on the page to make sure you have found all the relevant pages.
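The trade-off between the two kinds of search can be seen on a toy collection of pages. The sketch below is an assumption-laden illustration, not how Google works internally: the three pages and their body text are invented (only the title "Threat More Devastating than Disease" comes from the example above), and matching is reduced to checking whether every keyword appears as a whole word.

```python
import re

# Invented sample pages for illustration only.
PAGES = [
    {"title": "Women, poverty and AIDS: an overview",
     "body": "A summary of recent research."},
    {"title": "Threat More Devastating than Disease",
     "body": "How poverty leaves women more exposed to AIDS."},
    {"title": "Malaria deserves more attention",
     "body": "Critics say AIDS gets the funding while malaria kills "
             "women and children living in poverty."},
]


def words(text: str) -> set[str]:
    """Split text into a set of lowercase words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def title_search(pages, keywords):
    """Titles of pages whose title contains every keyword."""
    wanted = {k.lower() for k in keywords}
    return [p["title"] for p in pages if wanted <= words(p["title"])]


def anywhere_search(pages, keywords):
    """Titles of pages that contain every keyword in the title or the body."""
    wanted = {k.lower() for k in keywords}
    return [p["title"] for p in pages
            if wanted <= words(p["title"] + " " + p["body"])]


keywords = ["women", "poverty", "AIDS"]
print(title_search(PAGES, keywords))
# ['Women, poverty and AIDS: an overview']
print(len(anywhere_search(PAGES, keywords)))
# 3
```

The title search returns one clean hit but misses the relevant second page; the general search finds it, at the cost of also returning the malaria page.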

Searching by URLs

One way to get around the limits of a title search is to change the OCCURRENCES selection in Google Advanced to IN THE URL. The URL is the web address of a page.

Each site's web editor decides what the URLs of its sub-pages will be. Usually, for simplicity, the editor names the pages with simple keywords that identify what each page is about. It's similar to what you do on your computer when you save a document. If you save a draft of an article you are writing about nuclear safety as nuclear.doc, it's easier to find and retrieve it later.

For example, the Washington Post has sub-sections that work like this:

www.washingtonpost.com
www.washingtonpost.com/world
www.washingtonpost.com/world/mideast
www.washingtonpost.com/world/mideast/editorials
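To see the keywords an editor has built into an address, the URL can be split at its slashes. A small Python sketch using the standard urllib.parse module; the function name url_keywords is hypothetical.

```python
from urllib.parse import urlparse


def url_keywords(url: str) -> list[str]:
    """Split a web address into its host name and path sections."""
    # urlparse only recognises the host name when a scheme is present.
    if "://" not in url:
        url = "http://" + url
    parts = urlparse(url)
    return [parts.netloc] + [section for section in parts.path.split("/") if section]


print(url_keywords("www.washingtonpost.com/world/mideast/editorials"))
# ['www.washingtonpost.com', 'world', 'mideast', 'editorials']
```

Each item in the list is a word that a URL search could match.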

A URL search can be a quick way to find entire sections of a web site that address your topic. If you want to know what the Guardian newspaper in London has said about Kosovo, go to Google Advanced and request this URL search: guardian kosovo

Similarly, a hunt for news about AIDS in China from CNN can be accomplished in Google Advanced with a URL search for: cnn aids china

Or if you want to know what the group Human Rights Watch (at www.hrw.org) has to say about Mexico, try this URL search in Google Advanced: hrw mexico
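These URL searches can also be typed into Google's ordinary search box by putting the allinurl: operator in front of the keywords. A brief Python sketch that writes out the three searches above in that form; the helper name url_query is made up for illustration.

```python
def url_query(*keywords: str) -> str:
    """Return the text to type into Google's search box for a URL search."""
    # allinurl: asks Google to look for all the following words in the web address.
    return "allinurl:" + " ".join(keywords)


for search in (("guardian", "kosovo"), ("cnn", "aids", "china"), ("hrw", "mexico")):
    print(url_query(*search))
# allinurl:guardian kosovo
# allinurl:cnn aids china
# allinurl:hrw mexico
```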

Searches by URL and title are not perfect. But they can help you weed out irrelevant results when you do a general search for the words anywhere on the page.


Julian Sher, the creator and webmaster of JournalismNet (www.journalismnet.com), does Internet training in newsrooms around the world. He can be reached by email at jsher@journalism.com. This article and many other columns from Media magazine are available online with hot links on the JournalismNet Tips page at www.journalismnet.com/tips