Skip to Main Content

Yale Archival Management Systems Committee: Excel to EAD

What's this all about?

Why Excel (XML)?

  • We can retain formatting this way (and also retain inline EAD tags, like persname, corpname, and the like, using different font colors).
  • We can make a hierarchy as deep or as shallow as required
  • We can convert any EAD container list into a table for easy editing, sorting, etc.

Why EAD?

  • That way, this process isn't specific to ArchivesSpace, AtoM, etc. 
  • We can do other transformations outside of ArchivesSpace, like auto-number folders, sort aggregates by title (ignoring initial articles), etc.  That said, we're looking toward ways to be able to do these same sorts of transformations directly in ArchivesSpace with the API!

Excel to EAD and EAD to ArchivesSpace

  1. Open up a copy of the Yale EAD template Excel file.  If you don't have a copy, you can download one here: https://github.com/fordmadox/Excel-to-DSC/raw/master/excel-to-dsc_Yale-Template.xlsx
  2. Access the "Hide Guide" here - which explains the format of the template.
  3. Unhide any columns that you might want to use that are pre-hidden (aside from those columns that the "Hide Guide" specifically states should not be used at this stage, such as the instance type column, barcode column, etc.).
  4. Rename your file with collection-level information that's meaningful to you (e.g. "beinecke.astronomy.xslx")
  5. Fill out the Excel spreadsheet with your archival description (refer to the "Hide Guide" and/or colleagues for advice).
  6. When you're done, save the XSLX file as a XML Spreadsheet 2003 (*.xml) file. NOTE: if on a Mac or using a more recent version of Excel, this might also be named Excel 2004 XML Spreadsheet (*.xml)
  7. Open this new XML file with oXygen XML Editor. If you don't have a copy of the aspace-plus-excel-at-yale-2021-03-30.xpr oXygen project, you can download a copy here: https://drive.google.com/file/d/19Ncbx58_atwDcRV-kaH2l1GXrcOis8VL/view?usp=sharing  (as of 2021/03/30, please use this new oXygen project file)
  8. With the aspace-to-yfad-plus-excel.xpr project file open in oXygen, convert your XML file to EAD by using transformation scenario 3 Excel-to-EAD (select the checkbox and then click the red arrow in the Transformation Scenarios section to run the transformation).
  9. If you'd like to preview the EAD as a PDF file, you can convert it to PDF with the 3 Excel/EAD-to-PDF-preview transformation scenario after you have run the 3 Excel-to-EAD transformation.
  10. If you're going to upload this file to ArchivesSpace, then you'll need to convert it with the 4 prep-EAD-for-ASpace transformation.
  11. If you're going to merge this file after importing it into ArchivesSpace, you should edit the collection title. With the XML file open in oXygen, do a search for "Collection Title" and then change that text string to something like "MERGE_$callNumber", where $callNumber is the call number for the collection that you're working with.
  12. Before uploading any EAD file to ArchivesSpace, you must validate that file against a local set of tests that should alert you to any issues that might cause ArchivesSpace to reject the file.  To do this, go to Document -> Validate -> Configure Validation Scenario(s) -> select the "ArchivesSpace-PreIngest-validation" option, and then hit the "Apply Associated" button. You can also select the down arrow that's next to the "red check" Validate icon --> Configure Validation Sceaniro(s) -> etc.  Please do not skip this step ㋡
  13. To ingest into ArchivesSpace, go to Create --> Background Jobs.  There, select "Import Data" for "Job Type" and choose "EAD" from the Import Type dropdown list, and then add your file to the queue. At this point, your filename should have "-prepped4aspace.xml," "-foldered_per_collection," or "-foldered_per_box" at the end of it.  To get things started, just click the Queue Job button after you've pointed ArchivesSpace to the location of your EAD file.
  14. To merge the record with an existing finding aid, locate the existing finding aid in ArchivesSpace, and then use the "Merge" function. In the typeahead box, you can locate your newly-imported finding aid by typing in "MERGE-$callNumber" as explained step 11.
  15. Once you've reviewed everything, make sure that the collection, components, and notes are published as required.

ArchivesSpace to EAD to Excel (and back again)

Staff may wish to occasionally download EAD from ArchivesSpace, manipulate data in Excel, and re-import the file into ArchivesSpace. 

These instructions provide steps to do so.

  1. Open the desired resource record in ArchivesSpace
  2. Click on the “Export” drop-down button
  3. Select and Click “Download EAD” (keep the EAD3 Schema option checked)
  4. Open the downloaded EAD file in oXygen XML Editor. If you don't have a copy of the aspace-plus-excel-at-yale-2020-11-15.xpr oXygen project, see step 7 of "Excel to EAD and EAD to ArchivesSpace" to download a copy
  5. Run the 2 EAD3-to-Excel transformation
  6. In your Downloads folder, open the most recent oXygen file in Excel. This file should read something like: “[call number]_UTC_ead_bpg-excel.xml”
    1. Note: You can also open this file by dragging and dropping into an open blank Excel spreadsheet
  7. Manipulate the data, as needed, in Excel
  8. Once you are finished manipulating the data in the Excel spreadsheet save the file. Ensure that you're saving it as an XML Spreadsheet 2003 (*.xml)
  9. From there, follow steps 7-15 in the EAD and Excel instructions.  For Beinecke staff, please check with Alicia Detelich before re-importing any files following this method. There are a few other potential changes that will need to take place in order not to reintroduce messy data back into ArchivesSpace.

Caveats:

  1. If you intend to re-import your data into ArchivesSpace, be aware that the system may create duplicate top containers. It's best to always test your import in ArchivesSpace TEST and confirm that you're able to reconcile the data and the top conainers.
  2. Staff should keep in mind that there is no column in the spreadsheet that corresponds to ArchivesSpace's "general note" field. As of February 2019, any "general notes" in ArchivesSpace will be retained in the EAD to Excel transformation, but those notes will be moved to the scope and contents column instead.
  3. The EAD to Excel transformation does *not* support every type of EAD encoding possible.  EAD elements/attributes that will be dropped during the transformation process include:  table, ref, extref, ptr, daogrp, bibliography, fileplan, index, note, @role (in origination elements), @script, @calendar, @certainty, etc. All of these elements and attributes are retained at the collection level, but right now they are not mapped anywhere within the container list section that is output into the Excel spreadsheet. The text from each element should still survive, but not the extra markup.

Auto-numbering folders in oXygen

​For a video overview of this process, see https://vimeo.com/205969952 

  1. In Column V, "Container 1 Value," you must indicate the box number in each row (the transformation process will *not* assume that missing cells should be filled in)
  2. In Column X, "Container 2 Value", of the Excel template, you'll need to indicate the number of folders at each level of description by including the pound sign, #, directly in front of a number.  
    • #1 = 1 folder --> folder 1
    • #5 = 5 folders --> folder 2-6
    • #0 = i.e. the folder number should repeat the value of its preceding sibling --> folder 6, in this case, since the preceding value ended on 6. Note that you should never start a new box (or a new collection) with a #0 folder range, since that would mean that a single folder was located in two boxes (or that a collection should start with folder 0, which isn't common practice, just yet).
    • etc.
  3. Once you have all of your box numbers (e.g. 1) and folder ranges (e.g. #1) added to your Excel spreadsheet, you will proceed to transform the Excel file to EAD as described in steps 1-9 in the "EAD and Excel" section.
  4. Now that your file is in EAD, you can auto-number the folders by running either the "5 COLLECTION-method_convert-hashes-to-foldernumbers" transformation (i.e. the Beinecke method), which will result in distinct folder numbers throughout an entire collection (numbered in box number order, e.g. 1, 2, 3, 3a, 4, 5, etc.), regardless of the order of the boxes listed in the spreadsheet (e.g. 1, 5, 2, 4, 3, 3a), OR you can select the "5 BOX-method_convert-hashes-to-foldernumbers" transformation, which will result in folder numbers that start over at "folder 1" with each new box number.
  5. If you're auto-numbering folders using the "5 COLLECTION-method_convert-hashes-to-foldernumbers" method, and if you need the next folder number to start at a number other than the value "1" (e.g. you're adding a new series or accession to a previously-described collection), then you'll need to edit the transformation scenario before you run it again. To do this, you should select the transformation scenario, and then:
    1. Right-click on "5 COLLECTION-method_convert-hashes-to-foldernumbers"
    2. Select "Edit"
    3. Click the "Parameters (0)" button near the bottom of the dialog window.
    4. Select the "folder-number-start" parameter, which will by default have a value of 1.
    5. Click "Edit"
    6. Change the value from 1 to the next folder number (e.g. 99).
    7. Click "Ok", and then "Ok" again.
    8. Note that the Parameters button will now display as "Parameters (1)", indicating that the transformation that you'll run in just a moment will pass along this value.
    9. Click "Ok" once more to rid the screen of dialog boxes
    10. Now, run your transformation scenario, which should already be selected, as you normally would (e.g. by clicking the right, red arrow).
    11. Note that the first folder number in your resulting EAD file will now start with whatever parameter you selected (e.g. 99).
    12. Before closing oXygen, you should edit your transformation scenario once again, following the steps above; but after you click on the "Parameters (1)" button this time, you should click the "folder-number-start" parameter; and this time click the "Unset" button (thereby returning the default value to 1); and then finally click "Ok" twice more to rid the screen of the dialog boxes.