Scaling Up a Collaborative Consortial Institutional Repository

Bulking Up, Part 2: Bulk Upload in Hyku Commons


This post is the second of a two-part look at bulk upload in Hyku.  The first examined the background of bulk operations and why they are difficult to do well. This post focuses specifically on the application of a bulk import solution in Hyku Commons.  The need for bulk upload in this project is similar to the needs identified by the large Hyku user community.  We too need an “easy in and easy out” data solution for our repository users.  (Photo above by Pexels on Pixabay)

In PALNI’s 2018 white paper, we identified several valued repository attributes, and have since adopted them as the shared vision of the Hyku for Consortia project.  One of these values speaks directly to the need for bulk upload solutions: “The collaborative institutional repository should be a system which is interoperable and allows free-flow of data. Easy import and export of metadata and objects are possible.” The use cases for bulk ingest are numerous.  Migrations from another platform, repurposing data from external sources such as finding aids, and user preference rank high on the list of reasons to import works and their metadata in bulk rather than piecemeal.

To further illustrate the need for bulk importing, you will find a list of workflow examples in the Hyku for Consortia project documentation.  These examples provide hypothetical consortial profiles and repository scenarios based on real-life stories contributed by our Product Management Team.  From these scenarios:

  • Scenario 1, “Midwest Library Consortium”: Tenant-only Editor is in the archives department and has a digitized archives collection to add to the repository. He creates a new collection and uses one of the pre-populated admin set choices. He then bulk uploads the content and saves it but does not publish it.
  • Scenario 3, “Wealthy Alumni College”: Tenant-only Editor begins uploading student works in bulk into the repository with draft metadata, licenses, and embargoes. 
  • Scenario 5, “Sunnydale Community College”: Student staff member is made Tenant Editor. She uploads minutes in batches with a spreadsheet of basic metadata. The collection is not yet published.

These scenarios helped us to envision the ways that Hyku might be used for various IR users and content, and to define our collaborative workflows and user roles.  Although we did not realize it at the time, they also highlighted how essential bulk import is to this work. In three out of five of these examples, we envisioned works being uploaded in bulk by Tenant Editors, who might be archivists or librarians, grad school staff members, or even trusted students.  These users have metadata in an existing external format, and rekeying hundreds or thousands of metadata values would be a waste of their time.

Shifting away from the hypothetical to the actual, now that we are using Hyku Commons for real-world pilot repositories, the need for bulk import functionality is even more apparent.  For example, one of our partner institutions moved content from Digital Commons to CONTENTdm as a stop-gap when they lost access to Digital Commons due to cost. Now they want to move that content into Hyku.

Using the Bridge2Hyku project’s CDM Bridge tool, export was a breeze.  We were able to extract all the files and metadata from CONTENTdm in a form that Hyku would understand.  But how to get the described works into Hyku?  The native Hyku batch import did not provide a solution, since it applied identical metadata to each item.  The records we wanted to bulk upload have complete, individual descriptions. We soon learned that this kind of bulk import was a much more complicated task, and reached out to Notch8 to find a solution.

With Notch8’s help, we investigated HyBridge (the import counterpart to CDM Bridge’s export), Cdm_Migrator, and Bulkrax as potential bulk import solutions for Hyku Commons.  We selected Bulkrax for our project because it seemed to work best for our multi-tenant environment and was easiest to configure within our setup.

According to the Samvera Labs webpage, “Bulkrax is a batteries included importer for Samvera applications. It currently includes support for OAI-PMH (DC and Qualified DC), XML, Bagit, and CSV out of the box. It is also designed to be extensible, allowing you to easily add new importers into your application or to include them with other gems. Bulkrax provides a full admin interface including creating, editing, scheduling and reviewing imports.”

Check out this poster from Samvera Connect 2019 for more information about Bulkrax.

Bulkrax poster by Kevin Kochanski, used with permission

After working with Notch8 to install and update Bulkrax in Hyku Commons, we viewed developer-supplied walkthrough videos (like this one) and wiki documentation to get a better understanding of how to use the CSV importer.  It is now possible to bulk upload to Hyku Commons with Bulkrax by importing a zipped folder containing a folder of files and a properly formatted CSV file.  The CSV contains a row for each object’s descriptive metadata.  Additionally, the first four fields are administrative fields, which govern how the importer handles the files.

Administrative Fields

  • item – Lists the name and extension of the item being imported, such as file.jpg. 
  • source_identifier – Establishes a persistent identifier for the object being imported. 
  • model – Identifies the worktype the work will be created as. 
  • collection – Determines what collection(s) the work will be added to. 
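
To make the shape of such a package concrete, here is a minimal Python sketch of how one might assemble it before handing it to the importer.  Only the four administrative column names come from the list above.  Everything else is an assumption made for illustration: the descriptive columns (title, creator), the CSV file name, the "files" folder name, and the sample values are all hypothetical, and the layout your Bulkrax installation expects may differ, so check its wiki documentation before relying on this.

```python
import csv
import io
import zipfile

# The first four columns are the administrative fields described above.
ADMIN_FIELDS = ["item", "source_identifier", "model", "collection"]
# Hypothetical descriptive columns; use whatever your worktype actually defines.
DESCRIPTIVE_FIELDS = ["title", "creator"]


def build_csv(rows):
    """Return CSV text with the administrative columns first, one row per work."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=ADMIN_FIELDS + DESCRIPTIVE_FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


def check_rows(rows, available_files):
    """Return a list of problems found before packaging (empty if none found)."""
    problems = []
    seen_identifiers = set()
    for number, row in enumerate(rows, start=1):
        for field in ADMIN_FIELDS:
            if not row.get(field):
                problems.append(f"row {number}: missing {field}")
        identifier = row.get("source_identifier")
        if identifier:
            if identifier in seen_identifiers:
                problems.append(f"row {number}: duplicate source_identifier {identifier}")
            seen_identifiers.add(identifier)
        item = row.get("item")
        if item and item not in available_files:
            problems.append(f"row {number}: file {item} not found")
    return problems


def build_package(rows, files, csv_name="metadata.csv", files_dir="files"):
    """Return the bytes of a zip holding the CSV plus a folder of files.

    `files` maps each file name to its content as bytes. The CSV name and
    folder name are assumptions for this sketch, not Bulkrax requirements.
    """
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as bundle:
        bundle.writestr(csv_name, build_csv(rows))
        for name, content in files.items():
            bundle.writestr(f"{files_dir}/{name}", content)
    return archive.getvalue()


if __name__ == "__main__":
    # Sample values are placeholders, loosely modeled on the minutes scenario.
    rows = [
        {"item": "minutes-2019-01.pdf", "source_identifier": "minutes-0001",
         "model": "GenericWork", "collection": "Board Minutes",
         "title": "Minutes, January 2019", "creator": "Board of Trustees"},
    ]
    files = {"minutes-2019-01.pdf": b"%PDF- placeholder bytes"}
    print(check_rows(rows, files))
    package = build_package(rows, files)
    print(zipfile.ZipFile(io.BytesIO(package)).namelist())
```

Running a check like check_rows before zipping is a design choice rather than anything Bulkrax requires: it catches a missing administrative value, a repeated source_identifier, or a file name with no matching file before the import is attempted.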

One of our challenges is the lack of step-by-step documentation for these processes.  Bulk upload is complex and a tad finicky, so a very detailed guide would be helpful.  Another is the need for separate parsers for custom worktypes, and for a developer to create them.  For bulk upload to work with our OER worktype, for example, separate work had to be done to add the parser and to allow the relationships between items mentioned in the last post.  Lastly, we reported a few oddities we encountered along the way, and they were added to the Bulkrax project board so that they can receive feedback from the community.

In considering bulk capabilities for our project, the next step is to look towards bulk export with Bulkrax. This functionality currently exists in a limited capacity, but it is in further development for wider usability.  In keeping with the “easy in, easy out” theme, there are many use cases in which we would want the ability to export metadata as well as files from the Hyku Commons tenants. Stay tuned for additional developments on this process!
