Scaling Up a Collaborative Consortial Institutional Repository

Author: Amanda Hurford


This post is the second of a two-part look at bulk upload in Hyku.  The first examined the background of bulk operations and why they are difficult to do well. This post focuses specifically on applying a bulk import solution in Hyku Commons.  Our need for bulk upload is similar to the needs identified by the larger Hyku user community: we too need an “easy in and easy out” data solution for our repository users.  (Photo above by Pexels on Pixabay)

In PALNI’s 2018 white paper, we identified several valued repository attributes, and have since adopted them as the shared vision of the Hyku for Consortia project.  One of these values speaks directly to the need for bulk upload solutions: “The collaborative institutional repository should be a system which is interoperable and allows free-flow of data. Easy import and export of metadata and objects are possible.” The use cases for bulk ingest are numerous.  Migrations from another platform, repurposing data from external sources such as finding aids, and user preference rank high on the list of reasons to import works and their metadata in bulk rather than piecemeal.

To further illustrate the need for bulk importing, you will find a list of workflow examples in the Hyku for Consortia project documentation.  These examples provide hypothetical consortial profiles and repository scenarios based on real-life stories contributed by our Product Management Team.  From these scenarios:

  • Scenario 1, “Midwest Library Consortium”: Tenant-only Editor is in the archives department and has a digitized archives collection to add to the repository. He creates a new collection and uses one of the pre-populated admin set choices. He then bulk uploads the content and saves it but does not publish it.
  • Scenario 3, “Wealthy Alumni College”: Tenant-only Editor begins uploading student works in bulk into the repository with draft metadata, licenses, and embargoes. 
  • Scenario 5, “Sunnydale Community College”: Student staff member is made Tenant Editor. She uploads minutes in batches with a spreadsheet of basic metadata. The collection is not yet published.

These scenarios helped us to envision all the ways that Hyku might be used for various IR users and content, and to define our collaborative workflows and user roles.  Also, without us realizing it at the time, they very much highlighted how essential bulk import is to this work. In three out of five of these examples, we envisioned works being uploaded in bulk by Tenant Editors, who might be an archivist/librarian, grad school staff member, or even a trusted student.  These users have metadata in an existing external format, and rekeying hundreds or thousands of metadata values would be a waste of their time.

Shifting from the hypothetical to the actual, now that we are using Hyku Commons for real-world pilot repositories, the need for bulk import functionality is even more apparent.  For example, one of our partner institutions moved content from Digital Commons to CONTENTdm as a stop-gap when they lost access to Digital Commons due to cost. Now they want to move that content into Hyku.

Using the Bridge2Hyku project’s CDM Bridge tool, export was a breeze.  We were able to extract all the files and metadata from CONTENTdm in a way that Hyku would understand.  But how to get the described works into Hyku?  The native Hyku batch import did not provide a solution, since it applied identical metadata to each item.  The records we wanted to bulk upload have complete, individual descriptions. We soon learned that this kind of desired bulk import was a much more complicated task, and reached out to Notch8 to find a solution.  

With Notch8’s help, we investigated HyBridge (the import counterpart to CDM Bridge’s export), Cdm_Migrator, and Bulkrax as potential bulk import solutions for Hyku Commons.  We selected Bulkrax for our project because it seemed to work best for our multi-tenant environment and was easiest to configure within our setup.

According to the Samvera Labs webpage, “Bulkrax is a batteries included importer for Samvera applications. It currently includes support for OAI-PMH (DC and Qualified DC), XML, Bagit, and CSV out of the box. It is also designed to be extensible, allowing you to easily add new importers into your application or to include them with other gems. Bulkrax provides a full admin interface including creating, editing, scheduling and reviewing imports.”

Check out this poster from Samvera Connect 2019 for more information about Bulkrax.

Bulkrax poster by Kevin Kochanski, used with permission

After working with Notch8 to install Bulkrax in Hyku Commons and bring it up to date, we viewed developer-supplied walkthrough videos (like this one) and wiki documentation to get a better understanding of how to use the CSV importer.  It is now possible to bulk upload to Hyku Commons with Bulkrax by importing a zipped folder containing a folder of files and a properly formulated CSV file.  The CSV contains a row for each object’s descriptive metadata.  Additionally, the first four fields are administrative fields, which govern how the importer imports the files.

Administrative Fields

  • item – Lists the name and extension of the item being imported, such as file.jpg. 
  • source_identifier – Establishes a persistent identifier for the object being imported. 
  • model – Identifies the worktype the work will be created as. 
  • collection – Determines what collection(s) the work will be added to. 
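
As a rough illustration of the package described above, here is a minimal Python sketch that writes a CSV with the four administrative fields first and zips it together with a folder of files. The CSV name (metadata.csv), the folder name (files/), the descriptive columns (title, creator), and the sample values are assumptions made for the example, not names confirmed by Bulkrax; check the Bulkrax wiki for what your version expects.

```python
import csv
import io
import zipfile
from pathlib import Path

# The four administrative columns named in this post, in the order listed.
ADMIN_FIELDS = ["item", "source_identifier", "model", "collection"]


def build_import_package(records, source_dir, package_path,
                         descriptive_fields=("title", "creator")):
    """Zip a CSV of metadata together with the files it describes.

    Assumed layout (hypothetical, not confirmed by Bulkrax):
    'metadata.csv' at the top level and the files under 'files/'.
    """
    source_dir = Path(source_dir)
    package_path = Path(package_path)
    header = ADMIN_FIELDS + list(descriptive_fields)

    # Build the CSV in memory, one row per object.
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=header)
    writer.writeheader()
    for record in records:
        writer.writerow({field: record.get(field, "") for field in header})

    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("metadata.csv", buffer.getvalue())
        # Copy each file named in the 'item' column into the files/ folder.
        for record in records:
            archive.write(source_dir / record["item"],
                          arcname=f"files/{record['item']}")
    return package_path
```

A call might look like `build_import_package(records, "exported_files", "import.zip")`, where each record is a dictionary such as `{"item": "file.jpg", "source_identifier": "demo-001", "model": "Image", "collection": "Demo Collection", "title": "A photo"}`. Keeping the administrative fields in a fixed list makes it harder to forget one when the descriptive columns change between collections.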

One of our challenges is the lack of step-by-step documentation for these processes.  Bulk import is complex and a tad finicky, so a very detailed guide would be helpful.  Another is the need for separate parsers, and the intervention of a developer to create them, for custom worktypes.  For our bulk upload to work for our OER worktype, for example, separate work had to be done to add the parser and to allow the relationships between items mentioned in the last post.  Lastly, we ran into a few oddities along the way; we reported them, and they were added to the Bulkrax project board so that they can receive feedback from the community.
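
Because the process is finicky, a quick check of the CSV before zipping can catch simple mistakes early. The Python sketch below is only an illustration: the function name and the specific checks (all four administrative columns present, no empty or repeated source_identifier, every item present in the files folder) are our own assumptions about what is worth verifying, not rules taken from Bulkrax.

```python
import csv
from pathlib import Path

# Administrative columns described in this post.
ADMIN_FIELDS = ("item", "source_identifier", "model", "collection")


def check_import_csv(csv_path, files_dir):
    """Return a list of readable problems; an empty list means none were found."""
    files_dir = Path(files_dir)
    problems = []
    seen_identifiers = set()

    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        missing = [name for name in ADMIN_FIELDS
                   if name not in (reader.fieldnames or [])]
        if missing:
            return ["missing column(s): " + ", ".join(missing)]

        for number, row in enumerate(reader, start=1):
            item = (row["item"] or "").strip()
            identifier = (row["source_identifier"] or "").strip()

            # Assumption: identifiers should be present and unique per import.
            if not identifier:
                problems.append(f"record {number}: empty source_identifier")
            elif identifier in seen_identifiers:
                problems.append(
                    f"record {number}: duplicate source_identifier {identifier}")
            seen_identifiers.add(identifier)

            # Assumption: every named item should exist in the files folder.
            if not item:
                problems.append(f"record {number}: empty item")
            elif not (files_dir / item).is_file():
                problems.append(f"record {number}: file not found: {item}")

    return problems
```

Running `check_import_csv("metadata.csv", "files")` and reading the returned list before building the zip is cheaper than discovering a missing file partway through an import.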

In considering bulk capabilities for our project, the next step is to look towards bulk export with Bulkrax. This functionality currently exists in a limited capacity, and it is being developed further for wider usability.  In keeping with the “easy in, easy out” theme, there are many use cases in which we’d want the ability to export metadata as well as files from the Hyku Commons tenants. Stay tuned for additional developments on this process!


Within our IMLS grant project, we have been working hard with our product management team and developer Notch8 to define and develop consortially-focused improvements to Hyku. For example, last month we shared some logistics for building collaborative workflows, working towards a master dashboard to control multi-tenant user permissions.

At the same time as these development activities are taking place, this project has also focused on the practical aspect of making the existing version of Hyku usable for our consortial partners to pilot as a working institutional repository.  Our work has thus branched into two separate areas: Development and Production.

The Production arm of our work focuses on readying the existing Hyku Commons product for real-world pilot use starting this summer.  As a result of user testing from both the PALNI and PALCI sides, we’ve been submitting tickets for small bug fixes and minor improvements, which are now being handled in parallel with the development of features outlined in the IMLS grant.  Notch8 has devoted a lot of resources to our project in both arenas, and we’ve established a great working relationship and clear communication of needs from both sides.

To date we’ve created two clear and separate working instances of Hyku for Development and Production.  First, the Development instance serves two purposes:

  1. A sandbox for PALCI and PALNI institutions to preview and test a Hyku tenant
  2. A staging area for Notch8 to preliminarily roll out updates, bug fixes, and new features

Second, the Production instance is where work is deployed once tested in the Development environment and also where pilot repositories will be built. It will be publicly available as a working repository soon. 

We’ll share the Hyku Commons product (i.e., our Production instance) when it has pilot content that is ready to be viewed.  For now, check out the demo video below for a brief look at our Development environment. PALCI and PALNI libraries can request a test repository in the Development instance using this form.

Five minute Hyku Commons demo


At the end of 2019, we posted an introduction to our Hyku project, Scaling Up a Collaborative Consortial Institutional Repository (made possible with support from IMLS).  Now we are sharing some of the high-level goals and phases for the project, as well as a status update.  Stay tuned for progress on these activities!

Goals:

1. Contribute an affordable open source IR tool to consortial communities

2. Develop a model for collaboration and shared infrastructure that is easily adoptable

3. Further grow the Hyku community

Activities:

Phase 1: Specification – In progress

  • Needs assessment for use cases, workflows, and functionality
  • Specification gathering for ETD and OER worktypes and workflows 
  • Collaboration with external advisors for feedback
  • Exploration of consortium scale DOI services 
  • Distillation of specifications for development planning

Phase 2: Development – to start in April 2020

  • Building collaborative workflows 
  • Theming and branding development
  • Multi-tenant viewable works and searching
  • Enhance data exports for improved discovery
  • ETD and OER worktype implementation and versioning
  • Integration with external Hyku development efforts 

Phase 3: Pilot and Communication

  • Early development testing
  • Pilot phase
  • Project reporting, documentation, and training 
  • Build out sustainability/governance/business models 
  • Outreach & communications 
  • Contribute code and development efforts back to Hyku/Samvera community

Update:

Working with our Product Management Team’s use case scenarios, we are currently defining improvements we want to make to Hyku to support consortial workflows. So far, we’ve identified the need for more levels of user permissions than standard Hyku offers, and for tools to assign users to roles across more than one tenant. These will allow consortia to collaborate on things like collection development or metadata creation, if they so desire.

Our next big area for exploration is the need for easy look-and-feel customization and for non-repository features in each tenant (things like widgets for displaying featured items, or social media feeds).  Soon we’ll move on to development of the first two areas, user permissions and cross-tenant roles, while we continue to flesh out our needs for an ETD worktype and DOI services, among other features. Check back in on this space for more on these in the future.