Tuesday, November 07, 2006

convert between 13 digits and 10 digits xisbn

self note.
(a) from 10 digit to 13 digits

If the 10-digit ISBN has the form ABCDEFGHIJ, the 13-digit ISBN is 978ABCDEFGHI(?), where the last digit (the checksum) is calculated as:

(10-((9*1)+(7*3)+(8*1)+(A*3)+(B*1)+(C*3)+(D*1)...+(I*3)) modulo 10) modulo 10
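
A minimal Python sketch of the rule above. The function name isbn10_to_isbn13 is just an illustration, and stripping hyphens and spaces is my own assumption about the input format:

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a 10-digit ISBN to its 978-prefixed 13-digit form."""
    # Assumption: hyphens and spaces are only separators and can be dropped.
    digits = isbn10.replace("-", "").replace(" ", "")
    if len(digits) != 10 or not digits[:9].isdigit():
        raise ValueError(f"not a 10-digit ISBN: {isbn10!r}")
    # Drop the old check digit (J) and prepend the 978 prefix.
    body = "978" + digits[:9]
    # Weights alternate 1, 3, 1, 3, ... across the 12 digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(body))
    check = (10 - total % 10) % 10
    return body + str(check)


print(isbn10_to_isbn13("0131103628"))  # 9780131103627
```

The outer "modulo 10" matters only when the weighted sum is already a multiple of 10, where it turns a checksum of 10 into 0.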

(b) from 13 digits to 10 digits
If the first three digits are not "978", the ISBN cannot be converted to 10 digits.

If the first three digits are "978", take the next 9 digits, ABCDEFGHI, and calculate the checksum for the 10-digit ISBN:
(11-((A*10)+(B*9)+ .... +(I*2) modulo 11)) modulo 11

A checksum of 10 is written as X.
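
The reverse direction can be sketched the same way. Again the function name is illustrative, and returning None for a non-978 prefix is just one possible way to signal "cannot be converted":

```python
from typing import Optional


def isbn13_to_isbn10(isbn13: str) -> Optional[str]:
    """Convert a 978-prefixed 13-digit ISBN to 10 digits, or return None."""
    # Assumption: hyphens and spaces are only separators and can be dropped.
    digits = isbn13.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        raise ValueError(f"not a 13-digit ISBN: {isbn13!r}")
    if not digits.startswith("978"):
        return None  # only the 978 prefix maps back to a 10-digit ISBN
    body = digits[3:12]  # the nine digits ABCDEFGHI
    # Weights run 10, 9, ..., 2 over the nine digits.
    total = sum(int(d) * w for d, w in zip(body, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return body + ("X" if check == 10 else str(check))


print(isbn13_to_isbn10("9780131103627"))  # 0131103628
```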

reference:
[1] http://wiki.tcl.tk/12638
[2] http://www.isbn.org/standards/home/isbn/transition.asp

Sunday, October 22, 2006

googlemap api and craigslist

It's the right season to find an apartment again ;-) This time I am mainly using craigslist to look for an apartment. Fortunately, most rentals have a google map link, but after a while it becomes really difficult to keep track of a list of interesting ones.

For a while I have wanted to try the googlemap API, and this was the right time to do so. It turns out the API is really handy. I needed to register a googlemap API account, and after that I just needed to follow some javascript code examples on the googlemap homepage. In less than 2 hours I could conveniently compare interesting apartments through the nice google map interface.

I still need to manually populate craigslist information into my small application -- it would be great if craigslist had a built-in function to create a "my bookshelf" kind of service.


googlemap

Sunday, October 15, 2006

oaiarc new release

oaiarc-1.0 has just been released. This release includes the following changes:

--internationalization support (with English, Spanish, French built-in)
--strict XHTML compliance
--UTF-8 support
--removal of OAI 1.x support
--various bug fixes

Juan Corrales from Universidad Rey Juan Carlos de Madrid contributed
most of the improvements in this release.

Friday, October 13, 2006

pathways paper

The results of the last two years of Pathways research have been published:

An Interoperable Fabric for Scholarly Value Chains - D-Lib, October 2006 (link will be active early next week)

Pathways: Augmenting interoperability across scholarly repositories - International Journal of Digital Libraries

dump a web page and its links by wget

self-note:

Every few weeks I need to read the wget manpage to find the parameters to cache a single webpage and all its links, so it is perhaps better to write them down here and save 5 minutes:

wget -r -H -l1 -k -P $targetdir --exclude-domains ${comma-separated domain names} --user=xxx --password=xxx $url
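
For reference, here is the same command assembled from Python with one comment per flag. This is only a sketch: the helper name build_wget_command and the example.org values are made up for illustration, and the command is only printed, not run.

```python
import shlex


def build_wget_command(url, target_dir, exclude_domains=(), user=None, password=None):
    """Build the wget argument list for caching one page plus its links."""
    cmd = [
        "wget",
        "-r",   # recursive retrieval
        "-H",   # allow the recursion to span other hosts
        "-l1",  # but only follow links one level deep
        "-k",   # convert links in saved pages for local viewing
        "-P", target_dir,  # directory prefix: where files are saved
    ]
    if exclude_domains:
        # wget expects a single comma-separated list of domains
        cmd.append("--exclude-domains=" + ",".join(exclude_domains))
    if user is not None:
        cmd.append("--user=" + user)
    if password is not None:
        cmd.append("--password=" + password)
    cmd.append(url)
    return cmd


cmd = build_wget_command("http://example.org/page.html", "cache",
                         ["ads.example.com", "cdn.example.net"])
print(shlex.join(cmd))  # shell-quoted form; pass cmd to subprocess.run to execute
```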

Wednesday, September 06, 2006

Poincaré conjecture and Shing-Tung Yau

It was sad to read the recent New Yorker story about the Poincaré conjecture and Shing-Tung Yau. Yau won the Fields Medal and has contributed greatly to mathematics; his opinion (in general) is very well respected and cited in China.

I believe this article must have exaggerated some information, as evidenced by recent clarifications from some mathematicians. Still, it is really difficult to know when to stop, even for someone as brilliant as Yau. So this is really a sad and disappointing story.

Sunday, June 25, 2006

notes in Fedora User Conference 2006

I took some notes at the Fedora user conference 2006. Disclaimer: I am not a Fedora user, the conference doesn't have proceedings, and the presentations are not online -- so read with caution.

The conference was nice but somewhat difficult to summarize because many presentations were project-oriented. So I will try to summarize them from several perspectives: (a) core functionalities (b) interesting projects (c) some observations and thoughts.

(a) core functionalities

RDF triple store: A very interesting part of Fedora is its use of RDF and a triple store, as I have always been interested in the scalability of RDF triple stores. Dean B. Krafft of Cornell gave some interesting numbers about NSDL 2.0's use of Fedora. NSDL 2.0 handles some ~2M digital objects, with 70 RDF tuples for each object; overall the Kowari triple store has 163 million triples.

Content model: The ongoing development of Fedora Core includes a "content model", such as a structure/ontology for theses, articles, etc., and dynamic dissemination associated with the "content model". The question is how to reach agreement on a common ontology.

Dissemination/behaviour: This seems like a pretty active area. It is about how, given a digital object, you can associate services with it. The DLF Aquifer project has an interesting concept of "AssetActions", which defines an XML schema for associating behaviors with an object; the initial experiment is based on Fedora.


(b) projects

This is perhaps the most interesting part of the conference. There are several ambitious national projects using Fedora as a core service, including NSDL, Germany's eSciDoc, Australia's ARROW and DART, and some work by Harris Group. These projects share a theme of building workflow systems for scholarly communication on a large scale. Although the information can be overwhelming in some cases, it's a really good head start for taking a further look at these initiatives.

(c) thoughts

  • RDF store or RDBMS?
An old question, but still relevant and interesting.

  • What are the essential functionalities of a repository?
Is it OKI, JSR170, or the Fedora core API?

  • A nice graph describing a repository:
Repository->services->applications->interfaces
---------------------------------------
Moving from left to right, changes become more likely; the repository is expected to be stable.

  • New scholarly communication system
I think we definitely need some models and research in this area, such as this and this

Liveclipboard copy beyond microformats -- unapi 1.0 released

Dan Chudnov recently released unAPI version 1. Here I particularly want to compare unAPI with microformats in the liveclipboard implementation.

The MS liveclipboard demo is based on microformats. Essentially, the demo uses some smart javascript to copy XHTML content between different web pages. In principle, the technology can be used to copy any XML fragment, or even binary files if you consider base64 encoding.

Microformats are well suited for this purpose because their XHTML fragments are well-formed XML. However, many applications/domains are not covered by microformats: (a) not all content can be described by XHTML, such as the tremendous amount of XML standards/content on the web, or the many kinds of non-HTML content on the web. (b) even if some content can be described by XHTML, it may never reach the radar of microformats.org, or there may never be enough incentive to standardize it.

So here I think unAPI can fill a gap, because it allows copying any content, far beyond the scope of microformats.org. In the initial implementations of unAPI we saw mods, dc, json, pubmed, rdf, and text, and we can expect more diverse formats in the future. All these formats may eventually become the payload of liveclipboard, and the applications are up to your imagination.

On the other hand, the unAPI approach is more complex than microformats. With microformats you can simply mark up an XHTML page and it's all done. With unAPI one has to mark up an XHTML page and implement a simple API. But this small tradeoff can be worthwhile if we want to take full advantage of liveclipboard.

Wednesday, June 14, 2006

Stuff I like in JCDL 2006

some interesting readings in JCDL:

"Also By The Same Author: AKTiveAuthor, A Citation Graph Approach To Name Disambiguation. " Duncan M. McRae-Spencer, Nigel R. Shadbolt

This paper describes how to use authors' self-citations (for good or bad, which is another topic) for name disambiguation. For example, if a paper has an author "X. Liu" and it cites a paper with an author "X. Liu", it's quite likely that the two "X. Liu" are the same person. By taking advantage of this social context, the paper gets startling precision/recall in name disambiguation.


"Bibliometric Impact Measures Leveraging Topic Analysis", Gideon Mann, David Mimno and Andrew McCallum.

Andrew McCallum wrote the popular bow/rainbow text classification package. This is the first time I have seen a paper of his in JCDL. The paper is based on a new clustering method, TNG, which can label clusters by phrases instead of individual words, e.g. "text classification" can be used to label a cluster, instead of "text" and "classification". This is extremely powerful for labeling clusters. After that the paper proposes several impact measures for topics, particularly the life cycle of how a subject/topic emerges, develops, and influences other topics. Pretty interesting reading and solid work.


"Building a Research Library for the History of the Web", William Y. Arms, Selcuk Aya, Pavel Dmitriev, Blazej J. Kot, Ruth Mitchell, Lucia Walle

I blogged about this work before. The project tries to mirror and mine the whole Internet Archive. Anything dealing with that level of scalability is worth checking out.


"Metadata aggregation and automated digital libraries: A Retrospective on the NSDL experience", Carl Lagoze, Tim Cornwell, Naomi Dushay, Dean Ecktrom, Dean Krafft

Some remarkable lessons about OAI-PMH in a very distributed system. It also won the best paper award. A loosely distributed system is always a hard problem, especially if you are targeting transaction-level quality, whether for distributed search or harvesting.

"EcoPod: A Mobile Tool for Community Based Biodiversity Collection Building" YuanYuan Yu, Jeannie A. Stamberger, Aswath Manoharan, Andreas Paepcke

A PDA-based application for recording observations of biological species. It is not complex or abstract; it focuses on a simple task and solves it well. Maybe that's how research should be done in many DL projects: do one thing and do it well.


"An architecture for the aggregation and analysis of scholarly usage data." Johan Bollen and Herbert Van de Sompel

It describes using OAI-PMH to harvest usage data, which is embedded in OpenURL ContextObjects. There are also interesting results from mining this usage data. I think the choice of OAI-PMH/OpenURL is very appropriate here.

Sunday, May 21, 2006

unapi scripts, and a wiki page

I updated several things regarding unapi:

-- fixed a bug in the unapi userscript

-- updated a proxy demonstration to unapi revision 3.

-- I started a wiki page at http://unapi.stikipad.com/. Although it is a personal opinion, I think it is a good idea to document the commonly used formats in unapi, largely motivated by a similar microformats list. Feel free to update the wiki page if you have comments/ideas.

Friday, May 19, 2006

The unapi Greasemonkey scripts are updated to revision 3. The URLs are the same, so you may follow the previous instructions at: unapilink,
citeseer and pubmed, and amazon.

Thursday, May 18, 2006

DIDLTools package online


The DIDL Tools package has just been released at http://african.lanl.gov/aDORe/projects/DIDLTools/. As posted on the release page:

"The aDORe DIDLTools is a Java toolkit for the construction, validation, serialization and de-serialization of the MPEG-21 DID data model. DID, the MPEG-21 Digital Item Declaration, provides an abstract model for the representation of digital items, whereas DIDL, the MPEG-21 Digital Item Declaration Language specifies how to serialize the model in XML. The API provided by the aDORe DIDLTools allows for the construction of customized DIDL XML documents, as well as provides flexible and extensible serialization methods."

Some good thinking went into allowing any metadata in the model. Have a look and ask questions at http://lists.lib.ugent.be/mailman/listinfo/didwriter-dev

Tuesday, May 16, 2006

slides of AISTI Mini-Conference

The AISTI Mini-Conference has its slides online. I was at the conference and there were several pretty good presentations.

Douglas Fils gave a presentation about the Chronos project, titled Services Architecture and Semantic Practices Using the CHRONOS Cyber-Infrastructure Effort. The project combines some cool web2.0 technology to build the Chronos environment.

Alex Szalay presented Science in an Exponential World, which connects the library world with data curation. I very much like his remarks that data must be close to the analysis tools and that we need journals for data. His work is also associated with the Microsoft 2020 Science initiative.

Wednesday, May 10, 2006

arxiv and trackback

I recently listened to Paul Ginsparg's interview by CNI, which touches on the history and ongoing development of arXiv.

This is the first time I heard that arXiv has a trackback feature, see http://arxiv.org/help/trackback. As always, arXiv does everything differently (no kidding, I am a regular arXiv user), and it seems there was a hot debate about its policy a while ago in the physics blogosphere.

I then quickly checked hubmed and WPopac, two blog-style library services; it's no surprise that both support trackback.

However, arXiv is unusual, because its interface and functions have been stable for quite some years, which shows what quality and good design are. Now it has trackback, which shows quite nicely how new technologies can be integrated with an established library service.

Saturday, May 06, 2006

WebCite

Just read about WebCite on Lorcan Dempsey's blog.

WebCite is an archiving system for webreferences (cited webpages and websites). It works like tinyurl: by submitting a URL you get a new URL back. E.g. I submitted "http://www.yahoo.com" on May 6th, 2006, and webcite gave me back a new URL, http://www.webcitation.org/5Fgu0Xf5x, where the yahoo page as of May 6th, 2006 is archived.

So this is actually a combination of archive.org and tinyurl.com, and may look extremely simple. However, as I worked on persistent URL and web archiving issues previously, I can testify that simple is not easy.

Persistent URL issues have been studied by Steve Lawrence (citeseer, now google) in (Persistence of Web References in Scientific Research. IEEE Computer, 34(2):26--31, 2001), by Thomas A. Phelps and Robert Wilensky in (Robust Hyperlinks and Locations), and in Frank McCown and Michael L. Nelson's "The Availability and Persistence of Web References in D-Lib Magazine".

In particular, Thomas Phelps has proposed that most web pages can be uniquely identified by 5 keywords, so you should reference the web page by the five keywords, and issue the 5 keywords to a web search engine to find the web page (in case it has been relocated).

Web archiving is best addressed by archive.org for now, in a rather typical web crawling way, and it is still an active research area. And now we have an on-demand web archiving system in webcitation.org.

I don't know how webcitation.org scales, or how it protects the data, and it certainly doesn't solve all web archiving issues (we may only realize a web page should be preserved long after the right time is gone), but still, it's really simple and neat, and it targets two problems (persistent URLs and web archiving) so nicely.

Monday, May 01, 2006

learn tricks in java jar file

I have used java jar files for some years but didn't know about several interesting mechanisms inside a jar file. Now I have had a chance to read the jar specification and found several very handy features:

-- specify Class-Path in MANIFEST.MF

-- create index files (INDEX.LIST) for all packages.

-- use the META-INF/services directory to associate an implementation with its definition (e.g. class to interface)

These all come in handy for easy distribution of a package.

Wednesday, March 22, 2006

Programmable Web and protocol

Some common web API design patterns are discussed by Raymond and Leigh. Following my previous blog post, I am thinking there are different requirements in designing a service and a protocol. When writing a service, one has control of all resources and has the luxury of making everything clean and right, including all kinds of error processing, so there is no ambiguity. When working on a protocol, perhaps it's ok to be minimally correct, because one doesn't have control of all resources.

So perhaps this may help explain the difference between the programmable web and the Atom publishing protocol, and why Atom dropped detailed status codes between 0.2 and 0.4.

Producer and Consumer, the economy of unAPI

While riding the bus to Los Alamos, I pondered a problem that has annoyed me for a while in the unAPI design and is still being debated: what's simple, and not too simple, in unAPI?

I think I got something and am eager to write it down. Again, this is an economic issue. There are two parties in most web applications: producers and consumers. When producers potentially far outnumber consumers, you want to make the producer side as simple as possible and leave most complexity to the consumer, because the consumer benefits more ($$$) from this cycle and is therefore willing to bear the complexity.

Maybe this is too abstract, but it may make sense if we think about several examples. (a) On the web, there are so many nasty HTML pages, but we all happily live with them, because the consumers (browsers, search engines, etc.) take the extra effort of making sense of these pages. (b) In the RSS/Atom world, there are several standards and many more nasty pages, but news aggregators happily support them all.

So maybe I have the answer now. In unAPI we want to make the producer side as simple as possible. We don't want to burden producers with heavy error processing or schema validation just so that we can claim they are wrong here and there. No! It must be extremely easy, and nasty errors are tolerated (as with HTML or RSS). Content is king here; it's up to the consumer to make order from chaos.

Sunday, March 12, 2006

unAPI proxy: power of URI

There has been a lot of debate about the nature of the "identifier" in unAPI. To demonstrate the power of URIs, I put together an unAPI proxy, http://dp9.cs.odu.edu/unapi/js/uproxy.html, which modifies Alf's copy/paste example to demonstrate that a URI indeed helps. In this demo, the javascript extracts responses from both Opa and Hubmed: multiple views of the same resource. This is possible because both servers support the info:pmid URI.

proxy

Thursday, March 09, 2006

How well do search engines index the OA repositories?

Frank McCown, Xiaoming Liu, Michael L. Nelson, Mohammad
Zubair (2006) Search Engine Coverage of the OAI-PMH
Corpus, IEEE Internet Computing, March/April 2006.
http://library.lanl.gov/cgi-bin/getfile?LA-UR-05-9158.pdf

Abstract: The major search engines are competing to index as much
of the Web as possible. Having indexed much of the surface Web,
search engines are now using a variety of approaches to index the
deep Web. At the same time, institutional repositories and digital
libraries are adopting the Open Archives Initiative Protocol for
Metadata Harvesting (OAI-PMH) to expose their holdings, some of
which are indexed by search engines and some of which are not. To
determine how much of the current OAI-PMH corpus search engines
index, we harvested nearly 10M records from 776 OAI-PMH repositories.
From these records we extracted 3.3M unique resource identifiers
and then conducted searches on samples from this collection. Of this
OAI-PMH corpus, Yahoo indexed 65%, followed by Google (44%) and MSN
(7%). Twenty-one percent of the resources were not indexed by any
of the three search engines.

Wednesday, March 08, 2006

unapi and live Clipboard

This is a copy of an email to the gcs-pcs list, following Alf's copy/paste example, and live clipboard, its live demo, and screencast.

I am not sure how well I understand live Clipboard, but looking at their demos really helps a lot, see the screencam, so perhaps it is helpful to others too.

The last three examples are more advanced than microcontent copy. They are about copying RSS feed URLs, and on the pasted site these RSS feeds are dynamically loaded. Put another way, the data is "live" on the pasted site.

Well, I think this is associated with our discussion of unAPI here. It seems like we can use Clipboard to copy URI+unAPI baseURL, and the data is also "live" on the pasted site in two senses: (a) the pasted site can decide which formats to use (b) the pasted site can decide to use another unAPI baseURL.

I guess the question is how to plug URI+unAPI into live clipboard, like hcalendar or an RSS feed. I am not sure if there is going to be an API or whether one has to read the source code; lots to learn here. But my initial impression is that they are complementary technologies, similar to the relationship between RSS and Clipboard.

Tuesday, March 07, 2006

Canary Database unapi compliant

canary

It also includes a self-export page:

canaryexport

Saturday, March 04, 2006

Technosophia unapi support

Technosophia by Michael J. Giarlo is also unapi compliant, see his post, and a screenshot below:
technosophia

Friday, March 03, 2006

keep list of unapi servers (and laundry list ;-)

Any serious validation should use Ed Summers's excellent validator, although I may still update the information on this page (2006-03-08)

While playing with the unapi_link script, I noticed some unapi sites are not 100% unapi compliant, and they actually reveal some interesting questions about unapi itself. Until edsu has a test tool running, I need to keep a record and, to repeat Alan Kent's philosophy, "Nothing like airing dirty laundry to get people to clean up their act! :-)". I will try to keep the list up-to-date. A list of common pitfalls is also included below.

Notice I am doing manual testing, so I apologize for any fault on my side.


1) quaedam
pass

2)http://rsinger.library.gatech.edu/unapi/sru.php
a) a trailing "?" in unapi link. (notice the spec said no trailing ?)
b) request to a dissemination doesn't return right value

3) http://staff.washington.edu/leftwing
pass

4) wikid
Client-side XSLT rendering is required. I am not sure how to handle this, because we cannot expect every client to have XSLT rendering ability; I would like to say an unapi server must explicitly support an HTML page.

4) opa
4.1) doesn't return json in the formats request.
4.2) no site formats list. I think unapi should clarify the relationship between the site formats list and the individual formats list. The wording of OAI-PMH is more precise: "If this argument (identifier) is omitted, then the response includes all metadata formats supported by this repository." In this case I think an empty list is correct. In particular, I think unapi can be cleaner by following how OAI-PMH deals with ListMetadataFormats.

5) unapi_link
needs to handle relative paths in the unapi link.

6)canarydatabase
pass
Notice it also has a page explicitly exposing unAPI links.
canarydatabase export

7) evergreen
it doesn't work with unapi_link, still looking for the reason.

Common pitfalls

1) use href instead of xhref in UNAPI link
2) The trailing "?" is explicitly forbidden in unAPI spec.

Please leave a comment or send me an email if you have any questions.

Thursday, March 02, 2006

UTF-8 character ” (RIGHT DOUBLE QUOTATION MARK, Unicode U+201D, UTF-8 e2809d)

It took me a while of staring at the screen to figure out this problem.

In one web page I was testing, the RIGHT DOUBLE QUOTATION MARK character (unicode 0x201d) was used to quote attributes, instead of the plain quote (unicode 0x22). The problem is that these characters look terribly similar, and in a debugging tool you usually don't notice the difference.

Anyhow, it took me an hour to figure out the problem, so perhaps it is better to record the steps here.


1) use "od -ax" to find the UTF-8 value of this character: e2809d.


2) get the unicode value 0x201d by following the formula in [http://en.wikipedia.org/wiki/UTF-8].


3) search for the unicode value either in the "Character Map" of redhat, or on the web [http://www.fileformat.info/search/search.htm]
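
The three steps can also be reproduced in a few lines of Python, which is a handy cross-check. This is a sketch of my own rather than part of the original debugging session; it decodes the bytes e2 80 9d both with the built-in codec and by hand, using the three-byte UTF-8 layout 1110xxxx 10yyyyyy 10zzzzzz:

```python
import unicodedata

raw = bytes.fromhex("e2809d")  # the bytes reported by od

# Steps 2 and 3 in one go: decode, then look up the character name.
char = raw.decode("utf-8")
code_point = ord(char)
print(f"U+{code_point:04X}", unicodedata.name(char))  # U+201D RIGHT DOUBLE QUOTATION MARK

# Step 2 by hand: keep 4 payload bits of the first byte and 6 of each following byte.
b0, b1, b2 = raw
manual = ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
print(hex(manual))  # 0x201d
```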

Tuesday, February 28, 2006

pubmed and citeseer also unapi-enabled by opa

It's hard to keep up with Dan's speed of adding more unapi-enabled sites [http://onebiglibrary.net/project/opa/opa-0.2-release-with-json-wrapper]. I am thinking about all these OAI-PMH repositories -- perhaps I can make the nearly obsolete DP9 unapi-compliant, which will be easier than writing several hundred Greasemonkey scripts.

Anyhow, I have added citeseer and pubmed; these are very similar to the previous amazon script. Citeseer has some tricks with its identifiers (e.g. http://citeseer.ist.psu.edu/brin98anatomy.html and http://citeseer.ist.psu.edu/285516 refer to the same article), but it seems to be working now.

See a screenshot for pubmed below:


and citeseer



To try it out for citeseer:

1. Install greasemonkey, restart firefox, come back here, etc
2. Install citeseer_opa.user.js
3. Install unapi_link.user.js
4. Visit a citeseer page, such as
http://citeseer.ist.psu.edu/285516

Notice the sequence of operation is important.



To try it out for pubmed:

1. Install greasemonkey, restart firefox, come back here, etc
2. Install pubmed_opa.user.js
3. Install unapi_link.user.js
4. Visit a pubmed page, such as
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12345678&query_hl=1&itool=pubmed_docsum

Notice the sequence of operation is important.

Sunday, February 26, 2006

unapi-enabled amazon.com with OPA proxy.

I put together an unapi-enabled amazon page. It uses Dan's OPA-0.1 server as a proxy [http://onebiglibrary.net/story/opa-release-0.1].

Again I am following Dan's previous Amazonmixer script [http://curtis.med.yale.edu/dchud/log/project/rogue/amazon-coinspmh-enabled].

See a screenshot below:


To try it out:

1. Install greasemonkey, restart firefox, come back here, etc
2. Install amazon_opa.user.js
3. Install unapi_link.user.js
4. Visit an amazon book page, such as http://www.amazon.com/gp/product/0131103628/104-2542103-6763135

Notice the sequence of operation is important.

So I did nothing new but demonstrate that the system can work ;-) There is a small problem with the list of metadata formats supported by a site, since OPA 0.1 returns an empty list for the site. In this version I use the listformats response from the first available URI. I think we still need a little clarification of the difference between listformats for a site and for an identifier.

Saturday, February 25, 2006

Make a case for URI microformat

The URI microformat, as suggested in the unapi specification by Dan Chud, will be very interesting for library applications. It actually takes advantage of OpenURL and existing link resolver solutions.

The URI microformat defines a convention for embedding URI metadata in an HTML page.

<span class="unapi-uri" title="info:pmid/12345678">PMID 12345678</span>
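
To make the convention concrete, here is a rough Python sketch of how a microformat-aware application might collect the identifiers from a page. The class name UnapiUriParser and the sample page string are made up for illustration; only the span class "unapi-uri" and the title attribute come from the example above.

```python
from html.parser import HTMLParser


class UnapiUriParser(HTMLParser):
    """Collect the title of every <span class="unapi-uri"> in a page."""

    def __init__(self):
        super().__init__()
        self.uris = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # class may hold several space-separated names
        classes = (attr_map.get("class") or "").split()
        if tag == "span" and "unapi-uri" in classes and attr_map.get("title"):
            self.uris.append(attr_map["title"])


page = '<p><span class="unapi-uri" title="info:pmid/12345678">PMID 12345678</span></p>'
parser = UnapiUriParser()
parser.feed(page)
print(parser.uris)  # ['info:pmid/12345678']
```

Each collected URI could then be handed to a local OpenURL resolver or to an unAPI server.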


If the URI microformat is adopted and widely used, say by major online bookstores or on faculty/researcher homepages, a microformat-aware application (be it a greasemonkey script or a web service) can grab the identifier and point to your local OpenURL resolver, so you immediately get the copy from your local library.

In this sense, the URI microformat is very similar to COINS, but it's much simpler and cleaner; anyone can understand and use it, and its applications can go beyond the traditional research library. E.g. in a public library, you can use amazon as the catalog and immediately check whether a book is available in the local collection.

Maybe this rosy picture is too optimistic, but I think this is something really valuable.

Friday, February 24, 2006

Thinking about identifier and unAPI

This topic was brought up again on the gcs-pcs list; the question is whether to use unapi&id=xxx or unapi&uri=xxx. So many smart people have spent quite some time on the topic of identifiers that I won't pretend I know what I am talking about ;-) but here are just some personal thoughts.

First, I think whether an identifier is persistent, horizontally (over time) or vertically (across different applications), is really an economic issue. There are really three categories: precious, normal, and free.

Sometimes the "thing" is so precious that people take extra care, as with DOI, ISBN, handle, PURL, or info URI; another example is that the w3c's tech reports always reside in the same place. All of these need central control, and special care ($$$) is taken.

In the second category, we do care but it isn't worth the extra effort. Good examples are the tag URI, the Permanent Link of a blog, or various unregistered URIs; people use them every day and it works.

The third category is the most common URLs. We put them there just because they are resolvable at a certain time. There is no guarantee that they will still be available tomorrow, and we all live well with this.

My point is that all these identifiers have good reason to exist, and which one to choose is essentially a matter of economics. We cannot predict which one will fly; the market will tell us.

Now back to unAPI. I think the URI is essential because it's a cornerstone of the Web; the whole RDF thing is based on URIs, and we just cannot easily discard it. A second, perhaps weaker, argument is that using a URI will make people think twice before putting something there, and therefore helps persistence and re-use.

The beauty of URIs can perhaps be demonstrated by the following example: in the blog world people use "Permanent Links", and we can easily plug unapi into the blogspace by doing:

unapi?uri=http://www.inkdroid.org/journal/2006/02/24/hit-sh/&format=dc

unapi?uri=http://www.inkdroid.org/journal/2006/02/24/hit-sh/&format=html
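
A small Python sketch of building such request URLs. The base URL http://example.org/unapi and the function name are hypothetical, and the parameter names uri and format simply follow the two examples above (the id-versus-uri question is still open). One practical detail: when a URI is passed as a query parameter, it should be percent-encoded.

```python
from urllib.parse import urlencode


def unapi_request_url(base_url, uri, fmt=None):
    """Build an unAPI-style request URL for a URI and an optional format."""
    params = {"uri": uri}
    if fmt is not None:
        params["format"] = fmt
    # urlencode percent-encodes the embedded URI so it survives as one parameter
    return base_url + "?" + urlencode(params)


permalink = "http://www.inkdroid.org/journal/2006/02/24/hit-sh/"
print(unapi_request_url("http://example.org/unapi", permalink, "dc"))
# http://example.org/unapi?uri=http%3A%2F%2Fwww.inkdroid.org%2Fjournal%2F2006%2F02%2F24%2Fhit-sh%2F&format=dc
```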

This is really cool because we suddenly have access to rich metadata for all web information. Perhaps people will argue that "http://www.inkdroid.org/journal/2006/02/24/hit-sh/" is not permanent -- again, this is an economic issue, and it is perhaps more persistent than a handle ;-)

The last thing is about what copy/paste means in unAPI. One camp says we are copying unapi/?uri, and the other camp says we are copying the uri.

Although initially unapi/?uri is perhaps feasible, I think the final goal is to be able to copy/paste the uri. I guess this is perhaps Dan's original vision: there are really two parts in unapi, a microformat to specify the URI, and a mechanism for accessing it. I seriously think the first part is very important and independent.


Now about the second part: it's really building a way of specifying the possible services for a URI, and the response formats of these services. This excites me a lot. We all know the REST model; however, the REST model only specifies the request, and it doesn't say anything about the response. If unAPI is really successful, it actually adds another aspect to REST. So if I am not mistaken, I see a huge potential for integrating the library with the web.

unapi_link script to add unapi links

Inspired by #code4lib, I am getting interested in Greasemonkey. I started by reading "Dive into Greasemonkey" and studying Dan Chud's COINS-PMH code. Here is a little script for adding unapi links to unapi-compatible pages; it adds links right next to the unapi span.

To try it out:

  1. Install greasemonkey, restart firefox, come back here, etc
  2. Install unapi_link.user.js right-click (ctrl-click on mac)
  3. Visit a unapi compatible page, such as quaedam
Notice the unapi links will appear, but I agree it's difficult to find the tiny links ;-) To make the point, I also put a screen capture here.
quaedam

Tuesday, February 21, 2006

Cornell web library

We heard the news before, but recently William Arms published two papers about Cornell's work on a web library -- transfer, storage, and access of the whole Internet Archive collection (tens of billions of pages with 600+TB of data, still counting ;-).

The project is ambitious and they only started doing real testing in January, so the results are preliminary. Nevertheless this is very relevant to the aDORe work because of its immense scalability problem. It is very interesting to see their design choices and arguments, such as the transfer rate of the data, pre-ingest, one big SQL server, and one big machine to do everything.

http://www.dlib.org/dlib/february06/arms/02arms.html

http://www.infosci.cornell.edu/SIN/WebLib/papers/Arms2006a.doc

Sunday, February 19, 2006

switch esc to alt key for emacs in macOS.

I was trying to make emacs work the way I am comfortable with in linux. One issue is to use the "alt" key instead of the "esc" key as the "meta" key.

This is done by running "defaults write com.apple.x11 swap_alt_meta -boolean true" in a shell.

replace ^M (\r) with \n in emacs

When copying firefox text to an emacs buffer, I sometimes get the annoying "^M". This seems solvable by: "M-x replace-string ^q^m RET ^q^j",