Scaling Up a Collaborative Consortial Institutional Repository

Author: Amanda Hurford


(Feature photo by Olya Kobruseva from Pexels)

We thought we’d kick off the new year with a project update. 

At the end of 2020, as we looked to the end of “Phase 2” of our work (improving features for multi-tenant administration), we took some time to review our project goals with Notch8.  In this review, we laid out the deliverables promised in our IMLS grant and determined a deliberately scoped path to complete these goals by Spring 2021.  We both took stock of our progress to date and identified areas where our goals needed further definition for the sake of efficient progress.  This was done with hopeful anticipation of an additional round of grant funding for Phase 3, in which we plan to remove identified barriers to adopting Hyku, both inside and outside the consortial community.  

As a refresher, here’s an overview of our grant goals and deliverables:

Collaborative Workflow Support

Development is currently underway for collaborative workflow support.  Updated scoping for this work now includes:

  • An admin can create a new group in a tenant
  • An admin can then assign roles to that group 
  • Users who are added to a group will receive the permissions from that group. 
  • The user management tab will display each user’s groups as well as any individual permissions granted to that user
  • Tenant level roles on the User Matrix will be created 
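
The scoped behavior above can be illustrated with a minimal sketch. This is not Hyku's implementation (Hyku is a Ruby on Rails application); the class names, method names, and role names below are hypothetical and only model the idea that a user's effective permissions are the union of individually granted roles and the roles of every group the user belongs to.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """A named group inside one tenant (hypothetical model, not Hyku's)."""
    name: str
    roles: set[str] = field(default_factory=set)
    members: set[str] = field(default_factory=set)


@dataclass
class Tenant:
    """One repository tenant holding groups and individual role grants."""
    name: str
    groups: dict[str, Group] = field(default_factory=dict)
    individual_roles: dict[str, set[str]] = field(default_factory=dict)

    def create_group(self, group_name: str) -> Group:
        # An admin creates a new group in the tenant.
        group = Group(group_name)
        self.groups[group_name] = group
        return group

    def groups_for(self, user: str) -> list[str]:
        # Group names to list for a user on a user management screen.
        return sorted(g.name for g in self.groups.values() if user in g.members)

    def roles_for(self, user: str) -> set[str]:
        # Effective roles: individual grants plus roles from every group.
        roles = set(self.individual_roles.get(user, set()))
        for group in self.groups.values():
            if user in group.members:
                roles |= group.roles
        return roles


# Usage with made-up names: create a group, assign a role, add a user.
tenant = Tenant("example-tenant")
editors = tenant.create_group("Archives Editors")
editors.roles.add("tenant_editor")
editors.members.add("pat")
tenant.individual_roles["pat"] = {"depositor"}
```

In this sketch, removing a user from a group removes the group's roles without touching individual grants, which is the main reason to keep the two sources separate.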

In a future project phase, we hope to add the Multi-tenant Manager and Multi-tenant Editor roles.  We also want to create a groups and permissions area on the consortia admin page that will provide a single workflow for adding group permissions across multiple tenants.

Worktypes

Worktype development is also underway.  The OER worktype is complete (specs here), specifications for the ETD worktype are complete, and the “shell” worktype (a copy of the generic worktype) has already been created.  Work that is underway and nearly complete includes:

  • Metadata customization for the ETD worktype
  • Fields will be configured according to the ETD worktype specifications
  • Once the fields are configured, Bulkrax mappings will be set up for importing and exporting the ETD worktype.

We’d love to further explore easy creation of worktypes in the future, as well as greater flexibility for controlled vocabularies.

Themed Templates

We’ve done a lot of work gathering specifications and mocking up wireframes representing the themes (IR, cultural heritage, and neutral) we’d like to implement as part of Hyku.  Scoped work in this area for the remainder of this project phase is as follows:

  • A Theme tab will be added under the Appearance page. 
  • On this tab, a user will be able to select a home page theme, a search results page theme, and a work display page theme. This will allow for greater flexibility for repository managers and extend the core offering to a wider range of use cases. 
  • The theme pages will respond to the colors, logos, and feature flippers set in the app.
  • The following pages will be built as themes (referencing preliminary mockups):
    • 3 home page options (Cultural Repository, Institutional Repository, Neutral)
    • Search pages with Gallery, Masonry, and Slideshow layouts
    • Image-based and text-based show pages

This is an exciting new development path, and we can’t wait to see how it turns out!  In the future we may make some changes to how the template elements function, and possibly add options to make the theming as flexible and customizable as possible.

For the remaining deliverables (DOI minting, cross tenant searching, and multi-tenant shared works), we’ll continue to gather requirements from our user communities and explore work being completed in complementary projects. As always, we look to integrate our work with the larger Hyku Roadmap, contributing our improvements back to the Hyku base code and avoiding duplicative development efforts whenever possible.  

We’ll continue to post updates on our project here, and please feel free to contact us with any questions.


This post is the second of a two-part look at bulk upload in Hyku.  The first examined the background of bulk operations and why they are difficult to do well. This post focuses specifically on the application of a bulk import solution in Hyku Commons.  The need for bulk upload in this project is similar to the needs identified by the larger Hyku user community.  We too need an “easy in and easy out” data solution for our repository users.  (Photo above by Pexels on Pixabay)

In PALNI’s 2018 white paper, we identified several valued repository attributes, and have since adopted them as the shared vision of the Hyku for Consortia project.  One of these values speaks directly to the need for bulk upload solutions: “The collaborative institutional repository should be a system which is interoperable and allows free-flow of data. Easy import and export of metadata and objects are possible.” The use cases for bulk ingest are numerous.  Migrations from another platform, repurposing data from external sources such as finding aids, and user preference rank high on the list of reasons to import works and their metadata in bulk rather than piecemeal.

To further illustrate the need for bulk importing, you will find a list of workflow examples in the Hyku for Consortia project documentation.  These examples provide hypothetical consortial profiles and repository scenarios based on real-life stories contributed by our Product Management Team.  From these scenarios:

  • Scenario 1, “Midwest Library Consortium”: Tenant-only Editor is in the archives department and has a digitized archives collection to add to the repository. He creates a new collection and uses one of the pre-populated admin set choices. He then bulk uploads the content and saves it but does not publish it.
  • Scenario 3, “Wealthy Alumni College”: Tenant-only Editor begins uploading student works in bulk into the repository with draft metadata, licenses, and embargoes. 
  • Scenario 5, “Sunnydale Community College”: Student staff member is made Tenant Editor. She uploads minutes in batches with a spreadsheet of basic metadata. The collection is not yet published.

These scenarios helped us to envision all the ways that Hyku might be used for various IR users and content, and to define our collaborative workflows and user roles.  Also, without us realizing it at the time, they very much highlighted how essential bulk import is to this work. In three out of five of these examples, we envisioned works being uploaded in bulk by Tenant Editors, who might be an archivist/librarian, grad school staff member, or even a trusted student.  These users have metadata in an existing external format, and rekeying hundreds or thousands of metadata values would be a waste of their time.

Shifting away from the hypothetical to the actual, now that we are using Hyku Commons for real-world pilot repositories, the need for bulk import functionality is even more apparent.  For example, one of our partner institutions moved content from Digital Commons to CONTENTdm as a stop-gap when they lost access to Digital Commons due to cost. Now they want to move that content into Hyku. 

Using the Bridge2Hyku project’s CDM Bridge tool, export was a breeze.  We were able to extract all the files and metadata from CONTENTdm in a way that Hyku would understand.  But how to get the described works into Hyku?  The native Hyku batch import did not provide a solution, since it applied identical metadata to each item.  The records we wanted to bulk upload have complete, individual descriptions. We soon learned that this kind of desired bulk import was a much more complicated task, and reached out to Notch8 to find a solution.  

With Notch8’s help, we investigated HyBridge (the import counterpart to CDM Bridge’s export), Cdm_Migrator, and Bulkrax as potential bulk import solutions for Hyku Commons.  We selected Bulkrax for our project because it seemed to work best for our multi-tenant environment and was easiest to configure within our setup.

According to the Samvera Labs webpage, “Bulkrax is a batteries included importer for Samvera applications. It currently includes support for OAI-PMH (DC and Qualified DC), XML, Bagit, and CSV out of the box. It is also designed to be extensible, allowing you to easily add new importers into your application or to include them with other gems. Bulkrax provides a full admin interface including creating, editing, scheduling and reviewing imports.”

Check out this poster from Samvera Connect 2019 for more information about Bulkrax.

Bulkrax poster by Kevin Kochanski, used with permission

After working with Notch8 to install and update Bulkrax in Hyku Commons, we viewed developer-supplied walkthrough videos (like this one) and wiki documentation to get a better understanding of how to use the CSV importer.  It is now possible to bulk upload to Hyku Commons with Bulkrax by importing a zipped folder containing a folder of files and a properly formatted CSV file.  The CSV contains one row for each object’s descriptive metadata.  Additionally, the first four fields are administrative fields, which govern how the importer imports the files. 

Administrative Fields

  • item – Lists the name and extension of the item being imported, such as file.jpg. 
  • source_identifier – Establishes a persistent identifier for the object being imported. 
  • model – Identifies the worktype the work will be created as. 
  • collection – Determines what collection(s) the work will be added to. 
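
As a rough illustration of the package described above, the following sketch assembles a CSV whose first four columns are the administrative fields, followed by descriptive metadata, and zips it together with a folder of files. It is a sketch under stated assumptions, not a reference for Bulkrax itself: the `files/` folder name, the `metadata.csv` file name, the `title` column, the `GenericWork` model value, and the function name are all hypothetical, so check your own Bulkrax configuration and the Bulkrax wiki for the names your installation expects.

```python
import csv
import io
import zipfile

# The four administrative fields named in this post, in CSV column order.
ADMIN_FIELDS = ["item", "source_identifier", "model", "collection"]


def build_import_zip(rows, files, descriptive_fields=("title",)):
    """Return the bytes of a zip holding a CSV plus a folder of files.

    rows  -- list of dicts, one per object, keyed by column name
    files -- dict mapping file name (e.g. "file.jpg") to its bytes
    """
    header = ADMIN_FIELDS + list(descriptive_fields)
    csv_buffer = io.StringIO(newline="")
    writer = csv.DictWriter(csv_buffer, fieldnames=header)
    writer.writeheader()
    for row in rows:
        # Catch rows that point at a file missing from the package.
        if row["item"] not in files:
            raise ValueError(f"no file supplied for item {row['item']!r}")
        writer.writerow(row)

    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Assumed layout: CSV at the top level, binaries under files/.
        archive.writestr("metadata.csv", csv_buffer.getvalue())
        for name, data in files.items():
            archive.writestr(f"files/{name}", data)
    return zip_buffer.getvalue()


# Usage with made-up values for a single work.
example_rows = [{
    "item": "file.jpg",
    "source_identifier": "demo-0001",
    "model": "GenericWork",        # hypothetical worktype name
    "collection": "demo-collection",
    "title": "Example photograph",
}]
package = build_import_zip(example_rows, {"file.jpg": b"not a real image"})
```

Validating that every `item` has a matching file before zipping is a design choice for this sketch; it surfaces the most common packaging mistake before an import is attempted.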

One of our challenges is the lack of step-by-step documentation for these processes.  Bulk import is complex and a tad finicky, so a very detailed guide would be helpful.  Another is the need for separate parsers for custom worktypes, and the intervention of a developer to create them.  For our bulk upload to work for our OER worktype, for example, separate work had to be done to add the parser and to allow the relationships between items mentioned in the last post.  Lastly, we encountered a few oddities along the way, which we reported and which were added to the Bulkrax project board so that they can receive feedback from the community.

In considering bulk capabilities for our project, the next step is to look towards bulk export with Bulkrax. This functionality currently exists in a limited capacity, but it is in further development for wider usability.  In keeping with the “easy in, easy out” theme, there are many use cases in which we’d want the ability to export metadata as well as files from the Hyku Commons tenants. Stay tuned for additional developments on this process!


Within our IMLS grant project, we have been working hard with our product management team and developer Notch8 to define and develop consortially-focused improvements to Hyku. For example, last month we shared some logistics for building collaborative workflows, working towards a master dashboard to control multi-tenant user permissions.

At the same time as these development activities are taking place, this project has also focused on the practical aspect of making the existing version of Hyku usable for our consortial partners to pilot as a working institutional repository.  Our work has thus branched into two separate areas: Development and Production.

The Production arm of our work focuses on readying the existing Hyku Commons product for real-world pilot use starting this summer.  As a result of user testing from both the PALNI and PALCI sides, we’ve been submitting tickets for small bug fixes and minor improvements, which are now happening in parallel with the development of features as outlined in the IMLS grant.  Notch8 has devoted a lot of resources to our project in both arenas, and we’ve established a great working relationship and clear communication of needs from both sides.

To date we’ve created two clear and separate working instances of Hyku for Development and Production.  First, the Development instance serves two purposes:

  1. A sandbox for PALCI and PALNI institutions to preview and test a Hyku tenant
  2. A staging area for Notch8 to preliminarily roll out updates, bug fixes, and new features

Second, the Production instance is where work is deployed once tested in the Development environment and also where pilot repositories will be built. It will be publicly available as a working repository soon. 

We’ll share the Hyku Commons product (i.e., our Production instance) when it has pilot content that is ready to be viewed.  For now, check out the demo video below for a brief look at our Development environment. PALCI and PALNI libraries can request a test repository in the Development instance using this form.

Five minute Hyku Commons demo


At the end of 2019, we posted an introduction to our Hyku project, Scaling Up a Collaborative Consortial Institutional Repository (made possible with support from IMLS).  Now we are sharing some of the high-level goals and phases for the project, as well as a status update.  Stay tuned for progress on these activities!

Goals:

1. Contribute an affordable open source IR tool to consortial communities

2. Develop a model for collaboration and shared infrastructure that is easily adoptable

3. Further grow the Hyku community

Activities:

Phase 1: Specification – In progress

  • Needs assessment for use cases, workflows, and functionality
  • Specification gathering for ETD and OER worktypes and workflows 
  • Collaboration with external advisors for feedback
  • Exploration of consortium scale DOI services 
  • Distillation of specifications for development planning

Phase 2: Development – to start in April 2020

  • Building collaborative workflows 
  • Theming and branding development
  • Multi-tenant viewable works and searching
  • Enhanced data exports for improved discovery
  • ETD and OER worktype implementation and versioning
  • Integration with external Hyku development efforts 

Phase 3: Pilot and Communication

  • Early development testing
  • Pilot phase
  • Project reporting, documentation, and training 
  • Build out sustainability/governance/business models 
  • Outreach & communications 
  • Contribute code and development efforts back to Hyku/Samvera community

Update:

Working with our Product Management Team’s use case scenarios, we are currently defining improvements we want to make to Hyku to support consortial workflows. So far, we’ve identified the need for more levels of user permissions than standard Hyku offers, and tools to assign users to roles across more than one tenant. These will allow consortia to collaborate on things like collection development or metadata creation, if they so desire. 

Our next big area for exploration is the need for easy look-and-feel customization and for non-repository features in each tenant (things like widgets for displaying featured items, social media feeds, etc.).  Soon we’ll move on to development of these first two areas, while we continue to flesh out our needs for an ETD worktype and DOI services, among other features. Check back in on this space for more on these in the future.