Archive for the ‘Batch Ingest’ Category

Off the shelf and into the [digital] fray?

Tuesday, September 22nd, 2009

(Reflecting on the title of this entry and whether “fray” was appropriate, perhaps I should qualify it as a “friendly” skirmish, contest, or quarrel…)

Ed, George, and I spent the morning at BCR in a meeting about Shelf2Life, a project encompassing the digitization of public domain texts, the distribution of e-versions of those texts to various online vendors, and the printing of these works on demand at the point of sale. Several of our members are participating in the initiative, while others have expressed general interest.

We’ve begun to explore whether there is consortial interest, as well as how we might offer long-term digital archiving of the e-texts, should our members want it. Local access to and distribution of the e-texts are tied up in contracts and business models, but at the very least, we should be able to provide tiers of secure preservation services for the e-texts, along with auto-loading workflows not unlike the OA ProQuest ETDs auto-load service we are developing. There are a lot of stakeholders involved in the Shelf2Life project, not to mention a heck of a lot of ISBNs – for the e-text, for the hardcover, for the softcover, for the online edition, etc. There are also multiple sources of metadata, multiple formats, and many questions still to be answered regarding server and system security from the perspective of a profit-oriented vendor. The meeting – and its attendees – stayed upbeat and positive while we hammered out what we all thought we needed to learn more about…

There is some homework, certainly more meetings, and some continued testing ahead…but, all in all, it appears to be an interesting project. Now I just have to lobby for the members’ royalties from the sale of these works to be dedicated to the ongoing storage and preservation costs of these texts…Another “fray” to be sure!

MARCOut to MARCIn(gest)

Thursday, September 20th, 2007

We’ve been working with DPL to batch ingest more than 120,000 archival digital images (TIFFs) from Western History and Genealogy (WHG) into the ADR, along with the associated MARC files extracted from DPL’s CARL ILS…

Keith has written a batch ingest utility that transforms raw MARC into MARC XML and then generates “sidecars” of crosswalked metadata (MODS, DC, etc.), which are ingested into Fedora along with each TIFF and then indexed in Fez. The utility also reports orphaned records and files, along with any malformed data. Next up is extending it to other schemas and formats…
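A minimal sketch of the pairing and sidecar steps, in Python with only the standard library. It is not Keith's utility: it assumes the raw MARC has already been converted to MARC XML (a library such as pymarc can handle that step), that each record's 001 control number matches a TIFF filename stem, and that a record with no 001 counts as "malformed". It crosswalks just three fields to DC; the field map and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
DC_NS = "http://purl.org/dc/elements/1.1/"
NS = {"marc": MARC_NS}
ET.register_namespace("dc", DC_NS)

# Hypothetical, deliberately tiny crosswalk: (MARC tag, subfield, DC element).
DC_MAP = [("245", "a", "title"), ("100", "a", "creator"), ("260", "c", "date")]


def control_number(record):
    """Return the 001 control number, assumed here to match a TIFF filename stem."""
    field = record.find("marc:controlfield[@tag='001']", NS)
    return field.text.strip() if field is not None and field.text else None


def to_dc_sidecar(record):
    """Crosswalk one MARC XML record into a minimal DC sidecar element."""
    sidecar = ET.Element("metadata")
    for tag, code, element in DC_MAP:
        path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
        for subfield in record.findall(path, NS):
            if subfield.text:
                ET.SubElement(sidecar, f"{{{DC_NS}}}{element}").text = subfield.text.strip()
    return sidecar


def pair_records_and_images(collection_xml, tiff_names):
    """Pair records with TIFFs by control number; report orphans and malformed records."""
    records, malformed = {}, 0
    for record in ET.fromstring(collection_xml).findall("marc:record", NS):
        key = control_number(record)
        if key is None:
            malformed += 1  # no usable 001, so it can never be paired with an image
        else:
            records[key] = record
    images = {name.rsplit(".", 1)[0]: name for name in tiff_names}
    matched = records.keys() & images.keys()
    return {
        "pairs": {key: (images[key], to_dc_sidecar(records[key])) for key in matched},
        "orphaned_records": sorted(records.keys() - images.keys()),
        "orphaned_files": sorted(images[key] for key in images.keys() - records.keys()),
        "malformed": malformed,
    }


# Usage with placeholder identifiers (not real DPL records).
sample = f"""<collection xmlns="{MARC_NS}">
  <record>
    <controlfield tag="001">rec001</controlfield>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Sample title</subfield></datafield>
  </record>
  <record><controlfield tag="001">rec002</controlfield></record>
  <record><datafield tag="245" ind1="0" ind2="0"><subfield code="a">No 001</subfield></datafield></record>
</collection>"""
report = pair_records_and_images(sample, ["rec001.tif", "rec003.tif"])
print(report["orphaned_records"], report["orphaned_files"], report["malformed"])
# prints: ['rec002'] ['rec003.tif'] 1
```

Keying on a single control number keeps the orphan check to two set differences; a real collection may need a different join key if filenames and 001 values diverge.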

Challenges to date:

  • How Fez handles displaying and editing multiple repeating fields when not all of them include attributes (e.g., <title> vs. <title type="alternative">)
  • Estimating real-time processing speeds (i.e., how long it takes, per image, to get from CD at DPL to a published object in the ADR)
  • How to handle the LC MARC-to-MODS crosswalk’s lack of a mapping for the local call number field (099)
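One possible workaround for the 099 gap is to post-process the crosswalk output and copy the local call number into the MODS record locally. This sketch assumes the call number belongs in MODS <location><shelfLocator> (a <classification> element would be a defensible alternative) and that the $a subfields of 099 should be joined with spaces; both are local decisions, not part of the LC mapping, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
MODS_NS = "http://www.loc.gov/mods/v3"
NS = {"marc": MARC_NS, "mods": MODS_NS}
ET.register_namespace("mods", MODS_NS)


def local_call_number(marc_record):
    """Join the $a subfields of the local 099 field into one call number string."""
    field = marc_record.find("marc:datafield[@tag='099']", NS)
    if field is None:
        return None
    parts = [sf.text.strip() for sf in field.findall("marc:subfield[@code='a']", NS) if sf.text]
    return " ".join(parts) or None


def add_local_call_number(mods_root, marc_record):
    """Append the 099 value to crosswalk output as <location><shelfLocator> (assumed target)."""
    call_number = local_call_number(marc_record)
    if call_number is None:
        return mods_root  # nothing to add; leave the crosswalk output untouched
    location = ET.SubElement(mods_root, f"{{{MODS_NS}}}location")
    ET.SubElement(location, f"{{{MODS_NS}}}shelfLocator").text = call_number
    return mods_root


# Usage with a placeholder call number.
marc = ET.fromstring(f"""<record xmlns="{MARC_NS}">
  <datafield tag="099" ind1=" " ind2=" ">
    <subfield code="a">XX 123</subfield><subfield code="a">A45</subfield>
  </datafield>
</record>""")
mods = add_local_call_number(ET.Element(f"{{{MODS_NS}}}mods"), marc)
print(mods.find("mods:location/mods:shelfLocator", NS).text)  # XX 123 A45
```

Doing this as a separate pass leaves the LC crosswalk itself unmodified, so the local rule can change without touching the standard mapping.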

We’ve got more than 1,500 objects ingested, indexed, and access-controlled in our production environment at the moment…a little over 1% of the collection…but it’s a start!
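The progress figure, and the open question about per-image speed, come down to simple arithmetic. A minimal sketch, assuming a single sequential pipeline running at a constant per-image rate measured from one timed batch; the 100-image, 900-second batch below is a made-up figure for illustration, not a measurement from our ingest.

```python
from datetime import timedelta


def percent_complete(done: int, total: int) -> float:
    """Share of the collection already ingested, as a percentage."""
    return 100.0 * done / total


def estimate_remaining(done: int, total: int, batch_size: int, batch_seconds: float) -> timedelta:
    """Extrapolate remaining wall-clock time from one timed batch (assumes a constant rate)."""
    seconds_per_image = batch_seconds / batch_size
    return timedelta(seconds=(total - done) * seconds_per_image)


print(percent_complete(1_500, 120_000))  # 1.25
# Hypothetical timing: 100 images in 900 seconds (9 s per image).
print(estimate_remaining(1_500, 120_000, batch_size=100, batch_seconds=900))  # 12 days, 8:15:00
```

Because the counts are "more than 1,500" of "more than 120,000", treat both results as approximations; the estimate also ignores steps outside the pipeline, such as reading CDs at DPL.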