How well do search engines index the OA repositories?
Frank McCown, Xiaoming Liu, Michael L. Nelson, Mohammad
Zubair (2006) Search Engine Coverage of the OAI-PMH
Corpus, IEEE Internet Computing, March/April 2006.
http://library.lanl.gov/cgi-bin/getfile?LA-UR-05-9158.pdf
Abstract: The major search engines are competing to index as much
of the Web as possible. Having indexed much of the surface Web,
search engines are now using a variety of approaches to index the
deep Web. At the same time, institutional repositories and digital
libraries are adopting the Open Archives Initiative Protocol for
Metadata Harvesting (OAI-PMH) to expose their holdings, some of
which are indexed by search engines and some of which are not. To
determine how much of the current OAI-PMH corpus search engines
index, we harvested nearly 10M records from 776 OAI-PMH repositories.
From these records we extracted 3.3M unique resource identifiers
and then conducted searches on samples from this collection. Of this
OAI-PMH corpus, Yahoo indexed 65%, followed by Google (44%) and MSN
(7%). Twenty-one percent of the resources were not indexed by any
of the three search engines.
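The method in the abstract (harvest records, extract unique resource identifiers, then test whether search engines index them) can be sketched roughly as follows. This is only an illustration, not the authors' code: it assumes the records are exposed as unqualified Dublin Core (oai_dc) and that the resource identifiers of interest are the dc:identifier values that look like URLs. The sample response, the example.org URLs, and the `is_indexed` callback (a stand-in for an actual search engine query) are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace used inside oai_dc metadata.
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical, hand-written ListRecords response (not real repository data).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>http://example.org/papers/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>http://example.org/papers/2.pdf</dc:identifier>
          <dc:identifier>ISBN 0-000-00000-0</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:3</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>http://example.org/papers/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def extract_resource_urls(xml_text):
    """Return the set of unique URL-like dc:identifier values in a response."""
    root = ET.fromstring(xml_text)
    urls = set()
    for elem in root.iter(DC_NS + "identifier"):
        value = (elem.text or "").strip()
        # Assumption: only http(s) identifiers point at indexable resources.
        if value.startswith(("http://", "https://")):
            urls.add(value)
    return urls


def coverage(urls, is_indexed):
    """Fraction of urls for which the hypothetical is_indexed(url) is true."""
    urls = list(urls)
    if not urls:
        return 0.0
    hits = sum(1 for url in urls if is_indexed(url))
    return hits / len(urls)


if __name__ == "__main__":
    found = extract_resource_urls(SAMPLE_RESPONSE)
    # Stand-in check; a real study would query a search engine here.
    print(sorted(found), coverage(found, lambda url: url.endswith("1.pdf")))
```

In a real harvest the XML would come from a repository's ListRecords responses rather than an embedded string, and coverage would be measured on a sample of the extracted identifiers, as the abstract describes.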