Thursday, March 09, 2006

How well do search engines index the OA repositories?

Frank McCown, Xiaoming Liu, Michael L. Nelson, Mohammad
Zubair (2006) Search Engine Coverage of the OAI-PMH
Corpus, IEEE Internet Computing, March/April 2006.
http://library.lanl.gov/cgi-bin/getfile?LA-UR-05-9158.pdf

Abstract: The major search engines are competing to index as much
of the Web as possible. Having indexed much of the surface Web,
search engines are now using a variety of approaches to index the
deep Web. At the same time, institutional repositories and digital
libraries are adopting the Open Archives Initiative Protocol for
Metadata Harvesting (OAI-PMH) to expose their holdings, some of
which are indexed by search engines and some of which are not. To
determine how much of the current OAI-PMH corpus search engines
index, we harvested nearly 10M records from 776 OAI-PMH repositories.
From these records we extracted 3.3M unique resource identifiers
and then conducted searches on samples from this collection. Of this
OAI-PMH corpus, Yahoo indexed 65%, followed by Google (44%) and MSN
(7%). Twenty-one percent of the resources were not indexed by any
of the three search engines.
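The abstract describes harvesting records over OAI-PMH and then extracting unique resource identifiers. The paper's own harvester is not described here, so what follows is only a minimal Python sketch of that harvesting step under stated assumptions. It issues OAI-PMH `ListRecords` requests with `metadataPrefix=oai_dc`, follows `resumptionToken` paging, and treats `http(s)` values of Dublin Core `dc:identifier` as the resource URLs. That last choice is an assumption made for illustration, not the paper's stated rule. The `fetch` callable is injected, so any HTTP client (or a test stub) can be used.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces from OAI-PMH 2.0 and the oai_dc / Dublin Core schemas
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; follow-up pages carry only the resumptionToken."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)

def parse_page(xml_text):
    """Return (resource URLs, next resumption token or None) for one ListRecords page."""
    root = ET.fromstring(xml_text)
    urls = []
    path = ".//oai:record/oai:metadata/oai_dc:dc/dc:identifier"
    for ident in root.iterfind(path, NS):
        value = (ident.text or "").strip()
        # Assumption: only http(s) identifiers count as web-resolvable resources
        if value.startswith(("http://", "https://")):
            urls.append(value)
    token_el = root.find(".//oai:resumptionToken", NS)
    token = (token_el.text or "").strip() if token_el is not None else ""
    # An empty or missing resumptionToken marks the last page
    return urls, token or None

def harvest(base_url, fetch):
    """Follow resumption tokens until exhausted; fetch(url) must return XML text."""
    unique = set()  # de-duplicate identifiers across records and pages
    token = None
    while True:
        urls, token = parse_page(fetch(list_records_url(base_url, token)))
        unique.update(urls)
        if not token:
            return unique
```

For a live run, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read().decode("utf-8")`, pointed at a repository's OAI-PMH base URL. Collecting into a set mirrors the abstract's step of reducing roughly 10M records to a smaller set of unique identifiers.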

Wednesday, March 08, 2006

unAPI and Live Clipboard

This is a copy of an email to the gcs-pcs list, following up on Alf's copy/paste example and on Live Clipboard, its live demo, and its screencast.

I am not sure how well I have grasped Live Clipboard, but watching their demos (see the screencast) helps a lot, so the demos may be helpful to others too.

The last three examples go beyond copying microcontent: they copy the URLs of RSS feeds, and the pasting site loads those feeds dynamically. Put another way, the data stays "live" in the site where it is pasted.

I think this connects with our discussion of unAPI here. It seems we could use Live Clipboard to copy a URI plus an unAPI base URL, and the data would then also be "live" in the pasting site in two senses: (a) the pasting site can decide which formats to use, and (b) the pasting site can decide to use a different unAPI base URL.

The question, I guess, is how to plug URI+unAPI into Live Clipboard the way hCalendar or RSS feeds are plugged in. I am not sure whether there will be an API or whether we will have to read the source code; there is a lot to learn here. My initial impression, though, is that they are complementary technologies, much like the relationship between RSS and Live Clipboard.
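To make the two senses of "live" concrete, here is a minimal Python sketch of what a pasting site might do with a copied URI plus unAPI base URL. It assumes the request pattern of the unAPI draft: the bare base URL lists formats, `?id=URI` lists formats for that object, and `?id=URI&format=NAME` returns the object. The clipboard payload shape (`{"uri": ..., "unapi": ...}`) and the `on_paste` handler are hypothetical; Live Clipboard defines no such unAPI binding that I know of.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def unapi_url(base_url, uri=None, fmt=None):
    """Build an unAPI request: no params, id only, or id plus format."""
    params = {}
    if uri is not None:
        params["id"] = uri
        if fmt is not None:
            params["format"] = fmt
    return base_url + ("?" + urlencode(params) if params else "")

def pick_format(formats_xml, preferred):
    """Return the first preferred format name the server advertises, or None."""
    root = ET.fromstring(formats_xml)
    offered = [f.get("name") for f in root.findall("format")]
    for name in preferred:
        if name in offered:
            return name
    return None

def on_paste(payload, preferred, fetch, override_base=None):
    """Hypothetical paste handler for a {"uri", "unapi"} clipboard payload."""
    # (b) the pasting site may swap in a different unAPI server
    base = override_base or payload["unapi"]
    # (a) the pasting site chooses which advertised format to retrieve
    fmt = pick_format(fetch(unapi_url(base, payload["uri"])), preferred)
    return unapi_url(base, payload["uri"], fmt) if fmt else None
```

Injecting `fetch` keeps the sketch network-free. A real client should note that an unAPI server answers `?id=URI` with HTTP 300 (Multiple Choices), which some HTTP libraries report as an error rather than returning the body.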

Tuesday, March 07, 2006

Canary Database is unAPI compliant

[Screenshot: Canary Database]

It also includes a self-export page:

[Screenshot: Canary Database self-export page]
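What "unAPI compliant" means for a client is mostly discovery: under the unAPI convention, a page advertises its server with `<link rel="unapi-server" ...>` and marks each object with `<abbr class="unapi-id" title="ID">`. The Canary Database's actual markup is not shown here, so this Python sketch only illustrates that convention using the standard-library HTML parser. The example URL in the test is hypothetical.

```python
from html.parser import HTMLParser

class UnapiScanner(HTMLParser):
    """Collect the unAPI server link and unapi-id identifiers from an HTML page."""

    def __init__(self):
        super().__init__()
        self.server = None
        self.ids = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # rel and class may hold several space-separated tokens
        if tag == "link" and "unapi-server" in (a.get("rel") or "").split():
            self.server = a.get("href")
        elif tag == "abbr" and "unapi-id" in (a.get("class") or "").split():
            if a.get("title"):
                self.ids.append(a["title"])

def scan(html_text):
    """Return (unAPI server URL or None, list of advertised identifiers)."""
    scanner = UnapiScanner()
    scanner.feed(html_text)
    scanner.close()
    return scanner.server, scanner.ids
```

Self-closing `<link ... />` tags are still seen because `HTMLParser` routes them through `handle_starttag` by default. The returned server URL and identifiers can then feed ordinary unAPI requests.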