Saturday, February 25, 2006

Make a case for URI microformat

The URI microformat, as suggested in the unAPI specification by Dan Chud, could be very interesting for library applications. It takes advantage of OpenURL and existing link resolver solutions.

The URI microformat defines a convention for embedding URI metadata in an HTML page.

<span class="unapi-uri" title="info:pmid/12345678">PMID 12345678</span>

If the URI microformat is adopted and widely used, say by major online bookstores or on faculty and researchers' homepages, a microformat-aware application (be it a Greasemonkey script or a web service) can grab the identifier and point it at your local OpenURL resolver, so you immediately get the copy from your local library.
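
As a rough illustration of that idea, here is a minimal Python sketch that pulls identifiers out of "unapi-uri" spans and turns each one into a resolver link. The resolver address is hypothetical, and passing the identifier as rft_id in an OpenURL 1.0 query is my assumption about how a local resolver would want it; a real Greasemonkey script would do the same thing against the live page.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Hypothetical resolver address; substitute your library's OpenURL resolver.
RESOLVER_BASE = "https://resolver.example.org/openurl"


class UnapiUriParser(HTMLParser):
    """Collect the title attribute of every element whose class includes 'unapi-uri'."""

    def __init__(self):
        super().__init__()
        self.uris = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "unapi-uri" in classes and attrs.get("title"):
            self.uris.append(attrs["title"])


def extract_uris(html):
    """Return the identifiers found in unapi-uri elements, in page order."""
    parser = UnapiUriParser()
    parser.feed(html)
    return parser.uris


def resolver_link(uri, base=RESOLVER_BASE):
    """Build a resolver link; rft_id as the identifier key is an assumption."""
    query = urlencode({"url_ver": "Z39.88-2004", "rft_id": uri})
    return base + "?" + query


page = '<p><span class="unapi-uri" title="info:pmid/12345678">PMID 12345678</span></p>'
for uri in extract_uris(page):
    print(resolver_link(uri))
```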

In this sense the URI microformat is very similar to COinS, but it is much simpler and cleaner. Anyone can understand and use it, and its applications can go beyond the traditional research library. For example, in a public library you could use Amazon as the catalog and immediately check whether a book is available in the local collection.

Maybe this rosy picture is too optimistic, but I think this is something really valuable.

Friday, February 24, 2006

Thinking about identifier and unAPI

This topic has been brought up again on the gcs-pcs list; the question is whether to use unapi&id=xxx or unapi&uri=xxx. So many smart people have spent so much time on the topic of identifiers that I won't pretend I know what I am talking about ;-) but here are some personal thoughts.

First, I think whether an identifier is persistent, horizontally (over time) or vertically (across different applications), is really an economic issue. There are three categories: precious, normal, and free.

Sometimes the "thing" is so precious that people take extra care, as with DOI, ISBN, handle, PURL, or info URI; another example is the W3C's tech reports, which always reside in the same place. All of these need central control and special care ($$$).

In the second category we do care, but it isn't worth the extra effort. Good examples are tag URIs, blog permanent links, and various unregistered URIs; people use them every day and they work.

The third category is the most common: ordinary URLs. We put something there just because it is resolvable at a certain time. There is no guarantee that it will be available tomorrow, and we all live well with this.

My point is that all these identifiers have good reason to exist, and which one to choose is essentially an economic decision. We cannot predict which one will fly; the market will tell us.

Now back to unAPI. I think URI is essential because it is a cornerstone of the Web; the whole RDF thing is based on URI, and we cannot easily discard it. A second, perhaps weaker, argument is that using a URI will make people think twice before putting something there, which helps persistence and re-use.

The beauty of URI can perhaps be demonstrated by the following example: in the blog world people use a "Permanent Link", so we can easily plug unAPI into blogspace by doing:



This is really cool because we suddenly have access to rich metadata for all web information. Perhaps people will argue that "" is not permanent -- again, this is an economic issue, and it is perhaps more persistent than a handle ;-)

The last thing is what copy/paste means in unAPI. One camp says we are copying unapi/?uri; another camp says we are copying the uri.

Although unapi/?uri is perhaps feasible initially, I think the final goal is to be able to copy/paste the uri. I guess this was Dan's original vision: there are really two parts to unAPI, a microformat to specify a URI and a mechanism for accessing it. I seriously think the first part is very important and independent.
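
To make the two camps concrete, here is a small Python sketch of what each one would be copying. The service address and the function names are hypothetical, and the "uri" parameter name simply follows the unapi&uri=xxx form under discussion, not any settled spec. The round trip shows that the bare uri can always be recovered from the unapi/?uri form, which is part of why I think the uri itself is the thing worth copying.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def unapi_request(unapi_base, uri):
    """Wrap a bare uri in an unapi/?uri=... request (parameter name assumed)."""
    return unapi_base + "?" + urlencode({"uri": uri})


def uri_from_request(request_url):
    """Recover the bare uri from an unapi/?uri=... request."""
    return parse_qs(urlsplit(request_url).query)["uri"][0]


# Hypothetical unAPI service address, for illustration only.
wrapped = unapi_request("http://example.org/unapi/", "info:pmid/12345678")
print(wrapped)                    # what the first camp copies
print(uri_from_request(wrapped))  # what the second camp copies
```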

Now about the second part: it is really building a way of specifying the possible services for a URI, and the response formats of these services. This excites me a lot. We all know the REST model; however, the REST model only specifies the request, it doesn't say anything about the response. If unAPI is really successful, it actually adds another aspect to REST. So if I am not mistaken, I see huge potential for integrating the library with the web.

unapi_link script to add unapi links

Inspired by #code4lib, I am getting interested in Greasemonkey. I started by reading "Dive into Greasemonkey" and studying Dan Chud's COINS-PMH code. Here is a little script that adds unapi links to an unapi-compatible page; it puts the links right next to the unapi span.

To try it out:

  1. Install Greasemonkey, restart Firefox, come back here, etc.
  2. Install unapi_link.user.js: right-click (ctrl-click on Mac)
  3. Visit an unapi-compatible page, such as quaedam

Notice that unapi links will appear. I agree it's difficult to find the tiny link ;-) so, to make the point, I also put a screen capture here.

Tuesday, February 21, 2006

Cornell web library

We heard the news before, but recently William Arms has published two papers about Cornell's web library work: transfer, storage, and access of the whole Internet Archive collection (tens of billions of pages with 600+ TB of data, still counting ;-).

The project is ambitious and they only started real testing in January, so the results are preliminary. Nevertheless it is closely related to the aDORe work because of its immense scalability problem. It is very interesting to see their design choices and arguments, such as the transfer rate of the data, pre-ingest, one big SQL server, and one big machine to do everything.

Sunday, February 19, 2006

switch esc to alt key for emacs in macOS.

I was trying to make emacs work the way I am comfortable with on Linux. One issue is using the "alt" key instead of the "esc" key as the "meta" character.

This is done by running "defaults write swap_alt_meta -boolean true" in a shell.

replace ^M (\r) with \n in emacs

While copying text from Firefox into an emacs buffer, I sometimes get the annoying "^M". This seems solvable with: "M-x replace-string ^q^m RET ^q^j".
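
For text that never makes it into emacs, the same cleanup can be sketched in Python. This is just an illustration with a function name I made up; unlike the replace-string command, which turns every ^M into a newline, it treats a \r\n pair as a single line break so that Windows-style text does not end up double spaced.

```python
def normalize_newlines(text):
    """Turn \\r\\n pairs and stray \\r characters into plain \\n."""
    # Handle the two-character pair first so it becomes one newline, not two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


sample = "first line\r\nsecond line\rthird line\n"
print(repr(normalize_newlines(sample)))
```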