Thursday, March 09, 2006

How well do search engines index the OA repositories?

Frank McCown, Xiaoming Liu, Michael L. Nelson, and Mohammad
Zubair (2006), Search Engine Coverage of the OAI-PMH
Corpus, IEEE Internet Computing, March/April 2006.

Abstract: The major search engines are competing to index as much
of the Web as possible. Having indexed much of the surface Web,
search engines are now using a variety of approaches to index the
deep Web. At the same time, institutional repositories and digital
libraries are adopting the Open Archives Initiative Protocol for
Metadata Harvesting (OAI-PMH) to expose their holdings, some of
which are indexed by search engines and some of which are not. To
determine how much of the current OAI-PMH corpus search engines
index, we harvested nearly 10M records from 776 OAI-PMH repositories.
From these records we extracted 3.3M unique resource identifiers
and then conducted searches on samples from this collection. Of this
OAI-PMH corpus, Yahoo indexed 65%, followed by Google (44%) and MSN
(7%). Twenty-one percent of the resources were not indexed by any
of the three search engines.
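
For readers unfamiliar with how an OAI-PMH harvest works, here is a minimal Python sketch of the first two steps the abstract describes: pulling records from a repository and extracting unique resource identifiers. It is not the authors' code. It assumes repositories expose Dublin Core (oai_dc) records and that a "resource identifier" is any dc:identifier value that looks like a URL; the function names parse_page and harvest are hypothetical, and any base URL you pass in is your own choice.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def parse_page(xml_text):
    """Return (set of resource URLs, resumption token or None) for one ListRecords page."""
    root = ET.fromstring(xml_text)
    urls = set()
    for elem in root.iter(DC_NS + "identifier"):
        value = (elem.text or "").strip()
        # Assumption: only URL-shaped identifiers count as web resources.
        if value.startswith(("http://", "https://")):
            urls.add(value)
    token_elem = root.find(".//" + OAI_NS + "resumptionToken")
    token = (token_elem.text or "").strip() if token_elem is not None else ""
    # An absent or empty token means the list is complete.
    return urls, (token or None)


def harvest(base_url):
    """Collect unique resource URLs from one repository, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    seen = set()
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            urls, token = parse_page(response.read())
        seen |= urls
        if token is None:
            return seen
        # A resumed request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Running harvest over a list of repository base URLs and taking the union of the results would give a pool of identifiers to sample from. The search-engine lookups themselves are left out, since they depend on whichever search API is available.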


