MARC XML to METS

From ADR TechWiki

Running MARCXML->METS Tool

MetsBuilder is a Java tool used by ADR to create METS metadata records for the digital repository. ADR-compliant metadata files are run through MetsBuilder, which crosswalks each MARCXML record to MODS and DC and writes out a set of files with the correct METS metadata. Each METS record includes the original MARCXML datastream as well as the corresponding MODS and DC records.

This code is still in use but has not been tested extensively. It runs fine, but it may well require code tweaks, and the source will need updating whenever the METS profile changes.


What does it do?

Given a METS template file and a MARCXML file, this tool will create corresponding METS files based on the METS wrapper given.

How does it do it?

The Java code is very simple. It takes a MARCXML file as input along with a METS template and creates the necessary output XML files that would be ready to be ingested into the ADR using the Batch Tool.

A MARCXML file has the form:

<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
   <record>
      ...
   </record>
   <record>
      ...
   </record>
   <record>
      ...
   </record>
   ...
</collection>

You can see a full working example in the file "Transition Communities\DPL\DPL METS Builder\WHGRECS1.norm.xml". The file must contain ONE collection node and ONE or MORE record nodes. If you export the file from a standard MARCXML builder, you should have no problems with getting output to look like this.
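The structural requirement above (one collection node, one or more record nodes) can be checked before running the tool. The following is a minimal sketch only: `MarcXmlCheck` and `countRecords` are hypothetical names and are not part of MetsBuilder. It assumes the input uses the MARC21 slim namespace shown in the sample.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical pre-flight check; not part of the MetsBuilder source.
class MarcXmlCheck {
    static final String MARC_NS = "http://www.loc.gov/MARC21/slim";

    // Returns the number of <record> nodes, or throws if the input is not
    // a single MARC21 slim <collection> containing at least one record.
    static int countRecords(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to compare namespace URIs
            doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        // A well-formed document has exactly one root, so checking the root
        // element covers the "ONE collection node" requirement.
        Element root = doc.getDocumentElement();
        if (!MARC_NS.equals(root.getNamespaceURI())
                || !"collection".equals(root.getLocalName())) {
            throw new IllegalArgumentException("root must be a MARC21 slim <collection>");
        }
        int count = root.getElementsByTagNameNS(MARC_NS, "record").getLength();
        if (count == 0) {
            throw new IllegalArgumentException("<collection> contains no <record> nodes");
        }
        return count;
    }
}
```

Calling `MarcXmlCheck.countRecords(...)` with the contents of an exported file either returns the record count or fails with a message saying which requirement was not met.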

The basic algorithm runs as follows:

   for every record R
      read MARCXML
      create MARCXML->DC
      create MARCXML->MODS
      create and save the METS wrapper given DC, MODS and PCO identifier

The MARC->MODS crosswalk has been tweaked to DU's needs and this exemplar lives in "Transition Communities\DPL\DPL METS Builder\MARC21slim2MODS-ADR.xsl". Similarly, the MARC->DC crosswalk lives in "Transition Communities\DPL\DPL METS Builder\MARC21slim2OAIDC-ADR.xsl".
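The per-record crosswalk step of the algorithm above can be sketched with the standard `javax.xml.transform` API. This is an illustration under assumptions, not the MetsBuilder source: `RecordCrosswalk` and `crosswalkEachRecord` are hypothetical names, the stylesheet is passed in as a string, and the METS wrapper step is left out because it depends on the template.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch of the per-record loop; not the actual MetsBuilder code.
class RecordCrosswalk {
    static final String MARC_NS = "http://www.loc.gov/MARC21/slim";

    // Applies one XSLT stylesheet to every <record> in a MARCXML collection
    // and returns one serialized result per record, in document order.
    static List<String> crosswalkEachRecord(String marcXml, String xslt) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            Document collection =
                    builder.parse(new InputSource(new StringReader(marcXml)));

            // Compile the stylesheet once and reuse it for every record.
            Templates templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xslt)));

            NodeList records = collection.getElementsByTagNameNS(MARC_NS, "record");
            List<String> results = new ArrayList<>();
            for (int i = 0; i < records.getLength(); i++) {
                // Copy the record into its own document so the stylesheet
                // sees exactly one record as the root element.
                Document single = builder.newDocument();
                single.appendChild(single.importNode(records.item(i), true));

                StringWriter out = new StringWriter();
                templates.newTransformer()
                        .transform(new DOMSource(single), new StreamResult(out));
                results.add(out.toString());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("crosswalk failed", e);
        }
    }
}
```

Under these assumptions, the method would be called twice per input file, once with the contents of the MODS stylesheet and once with the contents of the DC stylesheet, and the two result lists would then feed the METS wrapper step.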

The code is rather easy to read, so you can find more specific information by looking at that.


Does this work?

Yes. We ran it in November. HOWEVER, there are a few things to note. The PCO is encoded as ZZRXXXXXXXX in the MARCXML but the FILE is XXXXXXXX.tif. Something will need to be adjusted to make this work more smoothly, as the sample objects we ingested in November were copied and renamed to the file name that was being expected. This can be tweaked by dropping the ZZR in the fileSec.

Something like changing line 112 from

   this.insertFileSec(metsDocument, objID + ".tif", "image/tiff");

to

   // Strip only the leading "ZZR" prefix so the name matches XXXXXXXX.tif
   this.insertFileSec(metsDocument, objID.replaceFirst("^ZZR", "")
                      + ".tif", "image/tiff");

This is currently not implemented because it introduces an inconsistency in the METS, where the filename is no longer exactly the same as the OBJID. Before such a change is made, the decision should be agreed upon and verified with Chet for compatibility with the Batch Tool (which it should be).
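To make the proposed renaming concrete, here is a small standalone sketch. `FileNames` and `tiffNameFor` are hypothetical names, and the sketch assumes the prefix to drop is always the literal ZZR at the start of the OBJID, as in the ZZRXXXXXXXX pattern described above.

```java
// Hypothetical helper illustrating the proposed OBJID-to-filename mapping;
// not part of the MetsBuilder source.
class FileNames {
    // Drops a leading "ZZR" (assumed prefix) and appends the .tif extension.
    // The regex is anchored with ^ so a "ZZR" later in the ID is left alone.
    static String tiffNameFor(String objID) {
        return objID.replaceFirst("^ZZR", "") + ".tif";
    }
}
```

For example, `FileNames.tiffNameFor("ZZR12345678")` gives `12345678.tif`, while an OBJID without the prefix only gains the `.tif` extension.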


How do I run it?

I have been running it through my Eclipse shell. You can certainly run it wherever you want, but Eclipse worked for me because it was where I was actually developing the code.

I moved the files I needed into the Eclipse visible workspace directory and went to town. If you pull the files from "Transition Communities\DPL\DPL METS Builder\" everything should be there.

You can run it directly from the shell using the play button. No arguments are required.


What else should I know?

You can watch the progress of the application by watching the files show up as they are processed. It runs pretty fast, so pay attention! :)

Where's the code?

svn://bes.coalliance.org:3690/adrdev/experimental/java/MetsBuilder/Eclipse_Prototype

--Keith Maull

--Revised 09/07/30 05:42
