JournalismNet
By Julian Sher
Using Google
to narrow your searches
A visit
to its advanced search page will provide the help you need
When you do any kind of search in Google or any other search engine,
the computer searches for the words you ask for ANYWHERE ON THE
PAGE.
Google is
smart enough to return web pages where the word you asked for
is prominent in the title or first lines. But this is not a
guarantee: your word could turn up at the very bottom of a web
page.
This means
that many of your results are irrelevant: they talk
about another topic entirely and might only mention your keyword
at the very end of a long web page.
For example,
go to Google and put in a simple search for AIDS in Africa. You'll
turn up over 750,000 web pages. Some have the words AIDS and AFRICA
high up, but many other pages talk about other diseases such as
malaria and complain that AIDS is getting too much attention;
still other pages talk about AIDS in Asia and only mention Africa
by comparison.
But Google
returns all these pages because the words AIDS and AFRICA appear
in these pages, even if the concept or focus of these web pages
has nothing to do with your subject.
The solution
Google offers
a
solution on its advanced search page. There is a drop-down
menu on the fourth line from the top called OCCURRENCES. You can
click on the arrow on the right-hand side and change where on the
page you want your search word to appear.
By default,
it is set to ANYWHERE ON THE PAGE. But you can change that to
IN THE TITLE, IN THE URL or one of the other options in the menu.
The most useful
location is TITLE. This allows you to search for any keywords
or exact phrases that appear in the title of the page.
One thing
of which you can be sure: if the words you are looking
for appear in the title of the page, the entire web page is devoted
to your subject.
For example,
make a request in Google Advanced for the words AIDS and Africa
in the title of a web page. Your results drop from over 750,000
pages to fewer than 6,000.
And the big
advantage here is that you can be certain that all the web pages
in your list of results focus on the concept of AIDS in Africa,
and do not just mention those two words in passing.
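For readers who prefer typing a query to using the drop-down menu,
here is a minimal Python sketch that builds the same kind of
search as a web address. It assumes Google's typed operators
allintitle: and allinurl: behave like the IN THE TITLE and IN THE
URL menu choices; those operators are not described in this
column, and the function name google_query_url is made up for
illustration.

```python
from urllib.parse import urlencode

# Assumption: these typed operators mirror the OCCURRENCES menu choices.
OPERATORS = {
    "anywhere": "",            # default: words anywhere on the page
    "title": "allintitle:",    # every word must be in the page title
    "url": "allinurl:",        # every word must be in the web address
}


def google_query_url(words, where="anywhere"):
    """Build a Google search address for the given words and location."""
    query = OPERATORS[where] + " ".join(words)
    # urlencode escapes spaces and punctuation so the address is valid.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(google_query_url(["AIDS", "Africa"]))            # general search
    print(google_query_url(["AIDS", "Africa"], "title"))   # title search
```

Pasting the second address into a browser should run the title
search from the example above.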
Limits
of a title search
Still, there
are two limits to a title search of which you should be aware.
First, you
might sometimes get pages where the words do not appear in the
headline or the apparent page title; conversely, you might sometimes
miss a page where the words do appear in the headline but Google
does not retrieve it.
Why?
By title,
Google actually means the title the webmasters give their pages.
These titles are written in the page's HTML code, which you do not
normally see, but the computer does. Usually, the title the webmaster
gives corresponds to the title or headline you see on the web
page, but not always. That means sometimes in your results you
will get pages where the words you requested don't appear to be
in the headline.
For example,
the UN has a good news page on Kosovo at
www.un.org/peace/kosovo/pages/kosovo1.shtml.
But the hidden source coding shows the webmaster forgot to put
in a title, so the page is UNTITLED as far as Google is concerned.
Google would not retrieve this page if you asked for pages
with the words UN and Kosovo in the title. (The geeks among you
can peek at the source code by going to View/Source on the
Explorer toolbar and View/Page Source in Netscape.)
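To see what a search engine reads as the title, the short Python
sketch below pulls the hidden title out of a page's source code.
The two sample pages are invented for illustration (they are not
the real UN page), and the names TitleParser and page_title are
hypothetical.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def page_title(html):
    """Return the hidden title of a page, or None if it has none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    title = "".join(parser.title_parts).strip()
    return title or None


# Invented sample pages: same visible headline, but only one has a title.
WITH_TITLE = ("<html><head><title>UN News on Kosovo</title></head>"
              "<body><h1>UN News on Kosovo</h1></body></html>")
WITHOUT_TITLE = ("<html><head></head>"
                 "<body><h1>UN News on Kosovo</h1></body></html>")

if __name__ == "__main__":
    print(page_title(WITH_TITLE))
    print(page_title(WITHOUT_TITLE))
```

Both sample pages show the same headline to a reader, but only
the first gives a title search anything to match.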
The second
limit is that you might miss pages that are entirely devoted to
your topic but don't have the keywords in the title. For example,
an excellent page on women, poverty and AIDS has as its title:
"Threat More Devastating than Disease."
It would not
come up in a title search about women, poverty and AIDS because
those words don't appear in the title.
This means
that doing a title search is a good way to begin your search but
you should never stop there. Always do a general search to look
for pages that mention your keywords anywhere on the page to make
sure you have found all the relevant pages.
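The two-step routine described above can be sketched in Python on
a tiny made-up collection of pages. Only the title "Threat More
Devastating than Disease" comes from this column; the other
titles and all of the body text are invented for illustration,
and real search engines match words in more sophisticated ways.

```python
import re

# Hypothetical sample pages (body text invented for illustration).
PAGES = [
    {"title": "Threat More Devastating than Disease",
     "body": "How poverty leaves women more exposed to AIDS."},
    {"title": "Women, Poverty and AIDS",
     "body": "An overview of the issue."},
    {"title": "Malaria Funding",
     "body": "Critics say AIDS gets too much attention."},
]


def words_in(text):
    """Return the set of lower-case words in a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def has_all(keywords, text):
    """True if every keyword appears as a word in the text."""
    return {k.lower() for k in keywords} <= words_in(text)


def two_pass_search(keywords, pages):
    """First pass: title only. Second pass: anywhere on the page."""
    in_title = [p for p in pages if has_all(keywords, p["title"])]
    anywhere = [p for p in pages
                if p not in in_title
                and has_all(keywords, p["title"] + " " + p["body"])]
    return in_title, anywhere


if __name__ == "__main__":
    first, second = two_pass_search(["women", "poverty", "AIDS"], PAGES)
    print("Title search:", [p["title"] for p in first])
    print("Found only by the general search:", [p["title"] for p in second])
```

The page with the unhelpful title turns up only in the second
pass, which is why the general search should always follow the
title search.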
Searching
by URLs
One way to
get around the limits of a title search is to change the OCCURRENCES
selection in Google Advanced to IN THE URL. The URL
is the web address of a page.
The web
editor of each site decides what the URL will be for
its sub-pages. And usually, for simplicity, the editor names the
pages with simple keywords that allow him or her to identify what
each page is about. It's similar to what you do on your computer
when you save a document. If you are writing an article about
nuclear safety and you label the draft nuclear.doc, it's easier
for you to find and retrieve it later.
For example,
the Washington Post has sub-sections that work like this:
www.washingtonpost.com
www.washingtonpost.com/world
www.washingtonpost.com/world/mideast
www.washingtonpost.com/world/mideast/editorials
A URL search
can be a quick way to find entire sections of a web site that
address your topic. If you want to know what the Guardian
newspaper in London has said about Kosovo, go to Google Advanced
and request this URL search: guardian kosovo
Similarly,
a hunt for news about AIDS in China from CNN can be accomplished
in Google Advanced with a URL search for: cnn aids china
Or if you
want to know what the group Human Rights Watch (at www.hrw.org)
has to say about Mexico, try this URL search
in Google Advanced: hrw mexico
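To show the idea behind a URL search, here is a small Python
sketch that checks which web addresses contain all of your
keywords, using the Washington Post addresses listed above. It is
only an approximation: it assumes a URL is split into words at
dots and slashes, and Google's own matching may differ. The
function names are made up for illustration.

```python
import re

# The Washington Post section addresses quoted earlier in this column.
SECTION_URLS = [
    "www.washingtonpost.com",
    "www.washingtonpost.com/world",
    "www.washingtonpost.com/world/mideast",
    "www.washingtonpost.com/world/mideast/editorials",
]


def url_keywords(url):
    """Split a web address into lower-case words at dots, slashes, etc."""
    return set(re.findall(r"[a-z0-9]+", url.lower()))


def url_search(keywords, urls):
    """Return the addresses that contain every keyword as a whole word."""
    wanted = {k.lower() for k in keywords}
    return [u for u in urls if wanted <= url_keywords(u)]


if __name__ == "__main__":
    for url in url_search(["washingtonpost", "mideast"], SECTION_URLS):
        print(url)
```

Only the two Middle East addresses contain both keywords, so only
those are printed. That is how a URL search narrows a large site
down to the section you want.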
Searches by
URL and title are not perfect. But they can help you weed out
irrelevant results when you do a general search for the words
anywhere on the page.
Julian
Sher, the creator and webmaster of JournalismNet (www.journalismnet.com),
does Internet training in newsrooms around the world. He can be
reached by email at jsher@journalism.com.
This article and many other columns from Media magazine are available
online with hot links on the JournalismNet Tips page at
www.journalismnet.com/tips