Wednesday, June 14, 2006

Stuff I like in JCDL 2006

Some interesting readings from JCDL:

"Also By The Same Author: AKTiveAuthor, A Citation Graph Approach To Name Disambiguation. " Duncan M. McRae-Spencer, Nigel R. Shadbolt

This paper describes how to use authors' self-citations (whether self-citation is good or bad is another topic) to implement name disambiguation. For example, if a paper has an author "X. Liu" and it cites a paper that also has an author "X. Liu", it's quite likely that the two are the same person. By taking advantage of this social context, the paper gets startling precision and recall in name disambiguation.
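To make the self-citation idea concrete, here is a minimal sketch of the heuristic as I understand it, not the paper's actual algorithm. The data layout (a dict of papers with "authors" and "cites"), the function names, and the "first initial plus surname" key are all my own assumptions for illustration, and names are assumed to be written in "First Last" order.

```python
def normalize(name):
    """Reduce 'X. Liu', 'X.Liu' or 'Xiaoming Liu' to the key 'x liu'.

    Assumption: names are written 'First Last'; 'Liu, X.' is not handled.
    """
    parts = name.replace(".", " ").split()
    return f"{parts[0][0]} {parts[-1]}".lower()


def self_citation_links(papers):
    """Find (name_key, citing_id, cited_id) triples where the citing and
    cited papers share an author name key.

    papers: dict mapping paper id -> {"authors": [str], "cites": [paper id]}
    (a hypothetical layout chosen for this sketch).
    """
    links = set()
    for pid, paper in papers.items():
        citing_keys = {normalize(a) for a in paper["authors"]}
        for cited in paper["cites"]:
            if cited not in papers:
                continue  # cited paper is outside our collection
            cited_keys = {normalize(a) for a in papers[cited]["authors"]}
            for key in citing_keys & cited_keys:
                links.add((key, pid, cited))
    return links


papers = {
    "p1": {"authors": ["X. Liu", "A. Smith"], "cites": ["p2"]},
    "p2": {"authors": ["X.Liu"], "cites": []},
    "p3": {"authors": ["X. Liu"], "cites": []},  # no citation link to p1/p2
}
links = self_citation_links(papers)
```

Here p1 and p2 get linked as probably the same "X. Liu", while p3 is left alone because nothing in the citation graph connects it to the other two.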

"Bibliometric Impact Measures Leveraging Topic Analysis", Gideon Mann, David Mimno and Andrew McCallum.

Andrew McCallum wrote the popular bow/rainbow text classification package. This is the first time I have seen a paper of his at JCDL. The paper builds on a new clustering method, TNG, which can label clusters with phrases instead of individual words, e.g. a cluster can be labeled "text classification" instead of "text" and "classification". This is extremely powerful for labeling clusters. The paper then proposes several topic-level impact measures, in particular the life cycle of how a subject/topic emerges, develops, and influences other topics. Pretty interesting reading and solid work.

"Building a Research Library for the History of the Web", William Y. Arms, Selcuk Aya, Pavel Dmitriev, Blazej J. Kot, Ruth Mitchell, Lucia Walle

I blogged about this work before. The project tries to mirror and mine the whole Internet Archive. Anything dealing with that level of scale is worth checking out.

"Metadata aggregation and automated digital libraries: A Retrospective on the NSDL experience", Carl Lagoze, Tim Cornwell, Naomi Dushay, Dean Ecktrom, Dean Krafft

Some remarkable lessons from running OAI-PMH in a very distributed system. It also won the best paper award. A loosely distributed system is always a hard problem, especially if you are targeting transaction-level quality, whether for distributed search or for harvesting.

"EcoPod: A Mobile Tool for Community Based Biodiversity Collection Building" YuanYuan Yu, Jeannie A. Stamberger, Aswath Manoharan, Andreas Paepcke

A PDA-based application for recording biological species observations. It is not complex or abstract; it focuses on a simple task and solves it well. Maybe that's how research should be done in many DL projects: do one thing and do it well.

"An architecture for the aggregation and analysis of scholarly usage data." Johan Bollen and Herbert Van de Sompel

It describes using OAI-PMH to harvest usage data, which is embedded in OpenURL ContextObjects. There are also interesting results from mining this usage data. I think the choice of OAI-PMH and OpenURL is very appropriate here.
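For readers who have not seen OAI-PMH, here is a rough sketch of what a harvesting request looks like. The verb and argument names (ListRecords, metadataPrefix, from, resumptionToken) come from the OAI-PMH protocol itself; the base URL and the "ctxo" metadata prefix below are placeholders I made up for illustration, not something taken from the paper.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when continuing
    a partial list, it is sent alone with the verb.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # harvest only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository URL and metadata prefix, for illustration only.
url = list_records_url("http://example.org/oai", "ctxo", from_date="2006-06-01")
```

A harvester would fetch this URL, parse the returned XML records, and keep requesting with the returned resumptionToken until the list is complete.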