Sunday, June 25, 2006

Notes from the Fedora User Conference 2006

I took some notes at the Fedora User Conference 2006. Disclaimer: I am not a Fedora user, the conference has no proceedings, and the presentations are not online -- so read with caution.

The conference was nice but somewhat difficult to summarize because many presentations were project-oriented. So I will try to summarize them from several perspectives: (a) core functionalities, (b) interesting projects, (c) some observations and thoughts.

(a) core functionalities

RDF triple store: A very interesting part of Fedora is its use of RDF and a triple store, as I have always been interested in the scalability of RDF triple stores. Dean B. Krafft of Cornell gave some interesting numbers about NSDL 2.0's use of Fedora: NSDL 2.0 handles some ~2M digital objects, with about 70 RDF triples per object, for roughly 163 million triples overall in the Kowari triple store.
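To make the data model concrete, here is a toy in-memory triple store in Python -- a sketch only, nowhere near Kowari's scale, and all identifiers below are made up:

```python
# A toy triple store illustrating the data model Kowari scales up:
# every fact is a (subject, predicate, object) triple, and queries
# are pattern matches where None acts as a wildcard, much like a
# variable in an iTQL or SPARQL query.
from typing import Optional


class TripleStore:
    def __init__(self):
        self.triples = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None):
        # Return every triple matching the pattern; None matches anything.
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]


store = TripleStore()
store.add("obj:1", "dc:title", "A sample resource")
store.add("obj:1", "rel:memberOf", "collection:math")
store.add("obj:2", "dc:title", "Another resource")

# Everything known about obj:1 -- analogous to the ~70 triples
# NSDL keeps per digital object.
about_obj1 = store.match(s="obj:1")
```

A real store adds indexes over each triple position so these pattern matches stay fast at hundreds of millions of triples; the data model, though, is exactly this simple.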

Content model: The ongoing development of Fedora Core includes "content models", such as structures/ontologies for theses, articles, etc., and dynamic dissemination associated with a content model. The open question is how to reach agreement on a common ontology.

Dissemination/behaviour: This seems like a pretty active area; it is about how, given a digital object, you can associate services with that object. The DLF Aquifer project has an interesting concept called "AssetActions", which defines an XML schema for associating behaviors with an object; the initial experiment is based on Fedora.

(b) projects

This is perhaps the most interesting part of the conference: there are several ambitious national projects using Fedora as a core service, including NSDL, Germany's eSciDoc, Australia's ARROW and DART, and some work by the Harris Group. These projects share a theme of building large-scale workflow systems for scholarly communication. Although the information can be overwhelming in places, the conference is a good head start for taking a closer look at these initiatives.

(c) thoughts

  • RDF store or RDBMS?
An old question, but still relevant and interesting.

  • What are the essential functionalities of a repository?
Is it OKI, JSR 170, or the Fedora core API?

  • A nice graph describing repositories was shown:
moving from left to right, changes become more likely, while the repository itself is expected to remain stable.

  • New scholarly communication system
I think we definitely need more models and research in this area, such as this and this.

Live Clipboard copy beyond microformats -- unAPI 1.0 released

Dan Chudnov recently released unAPI version 1. Here I particularly want to compare unAPI with microformats for a Live Clipboard implementation.

The MS Live Clipboard demo is based on microformats; essentially, the demo uses some smart JavaScript to copy XHTML fragments between different web pages. In principle, the technique can be used to copy any XML fragment, or even binary files if you consider base64 encoding.
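A quick Python sketch of that base64 idea -- wrapping arbitrary bytes in a text-safe XML fragment so they survive a text-only clipboard; the element name here is my own invention, not part of any spec:

```python
# Binary data cannot travel through a text clipboard as-is, but its
# base64 encoding is plain ASCII, so it can be embedded in an XML
# fragment. The <attachment> element is hypothetical.
import base64

payload = b"\x89PNG\r\n\x1a\n"  # e.g. the first bytes of a PNG file
encoded = base64.b64encode(payload).decode("ascii")
fragment = f'<attachment encoding="base64">{encoded}</attachment>'

# The receiving page reverses the encoding to recover the bytes.
decoded = base64.b64decode(encoded)
```

The cost is a roughly 33% size overhead, which seems acceptable for clipboard-sized payloads.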

Microformats are well suited for this purpose because their XHTML fragments are well-formed XML. However, many applications/domains are not covered by microformats: (a) not all content can be described in XHTML, such as the tremendous amount of XML standards/content on the web, or the many kinds of non-HTML content; (b) even when content can be described in XHTML, it may never reach the radar of the microformats community, or never have enough incentive behind it to become a standard.

So here I think unAPI can fill a gap, because it allows copying of any content, far beyond the scope of microformats: from the initial implementations of unAPI we have already seen MODS, DC, JSON, PubMed, RDF, and plain text, and we can expect more diverse formats in the future. All these formats may eventually become Live Clipboard payloads, and the applications are up to your imagination.

On the other hand, the unAPI approach is more complex than microformats: with microformats you can simply mark up an XHTML page and you are done, while with unAPI one has to mark up an XHTML page and also implement a simple API. Still, this small tradeoff can be worthwhile if we want to take full advantage of Live Clipboard.
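To show how small that API really is, here is a sketch of unAPI's three request types in Python, as I read the version 1 spec: no arguments lists all formats, an id alone lists the formats for that object, and id plus format returns the record itself. The records and formats below are invented for illustration, and a real server would of course answer over HTTP with the proper status codes and content types rather than return tuples:

```python
# A minimal dispatcher for the three unAPI request shapes:
#   ?                 -> all formats the server can emit
#   ?id=X             -> formats available for object X
#   ?id=X&format=Y    -> object X rendered in format Y
# RECORDS stands in for a real repository backend.
RECORDS = {
    "doc:1": {
        "oai_dc": "<dc><title>Sample record</title></dc>",
        "json": '{"title": "Sample record"}',
    },
}


def unapi(obj_id=None, fmt=None):
    if obj_id is None:
        # No arguments: advertise every format the server knows.
        formats = sorted({f for rec in RECORDS.values() for f in rec})
        return ("formats", formats)
    if fmt is None:
        # id only: formats available for this particular object.
        return ("formats", sorted(RECORDS[obj_id]))
    # id + format: the object itself in the requested format.
    return ("record", RECORDS[obj_id][fmt])
```

That is essentially the whole server-side contract, which is why I call the extra effort over microformats a little tradeoff.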