Tuesday, February 21, 2006

Cornell web library

We heard the news before, but William Arms recently has two publications on Cornell's web library work: the transfer, storage, and access of the whole Internet Archive collection (tens of billions of pages and 600+ TB of data, still counting ;-).

The project is ambitious, and they only started real testing in January, so the results are preliminary. Nevertheless, it is closely related to the aDORe work because of the immense scalability problem involved. It is very interesting to see their design choices and the arguments behind them, such as the data transfer rate, pre-ingest, one big SQL server, and one big machine to do everything.



