This post is the first of a two-part look at bulk upload and data remediation in Hyku. Part one looks at the background of bulk operations and why they are difficult to do well. Part two will discuss our specific work to address some of these needs in the Hyku for Consortia project. (Photo above by Ryoji Iwata on Unsplash)
Bulk operations in Hyku have a long history. In the initial user survey, conducted way back in 2015, one of the main findings was that Hyku needed to support the “easy in and out” of metadata. Metadata migration/remediation/transformation has always been a major activity in libraries. Think back to what an enormous task retrospective conversion of card catalogs to MARC was. Any library system containing metadata has to be able to manage that data at a large scale.
The design team for Hyku knew that bulk operations would be key to allowing potential users to commit to migrating out of their current tools. Hyku entered a market with a number of existing repositories. This new solution might have been able to solve many of the community’s frustrations with those tools, but only if there was an easy way to migrate to it. The initial requirements and personas therefore both reflected the need for tools to upload and transform metadata from one system to another. Mockups reflected the need to both migrate and remediate data.
This work was then reflected in GitHub issues during the project development (see: https://github.com/samvera/hyku/issues?q=is%3Aissue+is%3Aopen+bulk), but other more basic needs for repository development (you need a repository to migrate data to, after all) took a higher priority. So a new grant project called Bridge2Hyku picked up where development left off and explored the issue of migration in more depth (https://bridge2hyku.github.io/). Our colleagues at the Bridge to Hyku project did great work analyzing not only how to upload data and objects to Hyku, but also how to get them out of some of the major repository systems currently in use.
All of this work then…but why are metadata migration and bulk creation/upload so difficult?
The nature of structured data is what makes it so powerful: you can index and search it, you can compare like to like, you can organize and sort. In short, it makes order out of chaos. And as humans, that’s what we naturally do: recognize patterns. But each of us might also see the world slightly differently. In the same way, different metadata schemas and repository systems can have their own ways of seeing the world. Some are quite simple and allow for the same basic type of description of everything. Others are quite granular, allowing for more nuanced description of subtle details that can be important and powerful. So any effort to migrate or convert from one system to another typically relies on a lot of human intelligence to see the patterns and make the connections.
But human capacity only goes so far. How do you analyze thousands of records? Analytical tools like OpenRefine can be helpful. So can guidelines for general rules on the major categories of migration as shown in crosswalks from other projects. But, as these examples perhaps show, these tools are not simple and not necessarily easy to pick up and learn. So any migration process is going to require either a lot of manual intellectual effort or the creation of new tools to help with this business of organizing and translating.
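To make the idea of a crosswalk concrete, here is a minimal sketch in Python. The field names and the semicolon convention for multi-valued fields are assumptions made up for illustration, not taken from any particular system. The sketch maps the fields someone has already decided on and sets aside everything else for a person to review, which is where the manual intellectual effort comes in.

```python
# Hypothetical crosswalk: source field name -> target field name.
CROSSWALK = {
    "Title": "title",
    "Author": "creator",
    "Date Published": "date_created",
    "Subject Terms": "subject",
}


def apply_crosswalk(record, crosswalk):
    """Split one source record into mapped fields and fields needing review."""
    mapped = {}
    unmapped = {}
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            # No rule yet: keep the value so a person can decide what to do.
            unmapped[field] = value
            continue
        # Assumption: multiple values are separated by semicolons.
        values = [v.strip() for v in value.split(";") if v.strip()]
        mapped.setdefault(target, []).extend(values)
    return mapped, unmapped


source = {
    "Title": "Field Notes",
    "Author": "Lee, A.; Cruz, B.",
    "Call Number": "QH 31",
}
mapped, unmapped = apply_crosswalk(source, CROSSWALK)
print("mapped:", mapped)
print("needs review:", unmapped)
```

Run over thousands of records, the "needs review" pile is usually the interesting part: it shows which source fields the crosswalk does not yet cover.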
The quirks of particular systems can also provide barriers. You may come up with a great crosswalk that works for one system but doesn’t capture the nuance of another. Within Hyku, for example, all works are sorted into worktypes. These types define the metadata schema used, the relationships between objects that can be created, and, in some cases, the way that the object itself is presented and handled within the repository.
Data from other systems that don’t use this type of organization then requires an extra step to define which worktype the data should be migrated to. The system that data is coming from can also prove a barrier. Some systems are opaque, making it hard to know exactly how data is stored. Others make it difficult to export data. Many systems can provide an XML feed of records through a tool using the OAI-PMH protocol, but these are then just records, not the objects themselves. Others might use a newer protocol like ResourceSync for export, but may be incompatible with systems still relying on OAI-PMH.
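As a rough illustration of both points, the Python sketch below parses a small, made-up OAI-PMH response in the common oai_dc format and picks a worktype for each record. The sample records, the identifier values, and the rule that maps a Dublin Core type to a worktype are all assumptions for the example. Notice that what comes back is only descriptive metadata: nothing in the parsed result contains the files themselves.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH 2.0 responses carrying oai_dc metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up sample response; a real feed would be fetched over HTTP.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Harbor at Dawn</dc:title>
          <dc:type>StillImage</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Annual Report</dc:title>
          <dc:type>Text</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

# Hypothetical rule for the extra step: Dublin Core type -> worktype name.
WORKTYPE_BY_DC_TYPE = {"StillImage": "Image"}
DEFAULT_WORKTYPE = "GenericWork"


def parse_records(xml_text):
    """Return one dict of metadata per record, with a chosen worktype."""
    root = ET.fromstring(xml_text)
    rows = []
    for rec in root.findall(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            # A record without a metadata section (for example, a deleted one).
            continue
        dc_type = dc.findtext("dc:type", default="", namespaces=NS)
        rows.append({
            "source_id": identifier,
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "worktype": WORKTYPE_BY_DC_TYPE.get(dc_type, DEFAULT_WORKTYPE),
        })
    return rows


rows = parse_records(SAMPLE)
for row in rows:
    print(row)
```

A lookup table like this only covers the easy cases. Records whose type is missing or ambiguous fall through to the default, and those are the ones a person would still need to check.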
Finally, issues can come from the very nature of materials themselves. A particular challenge we’ve had with migration relates to the inter-relationships between objects. As I’ve talked about before, and will likely write about on this blog in the future, one of the key needs we found to assist in the uptake of Open Educational Resources (OER) is the availability of related teaching tools or ancillary materials. A freely available textbook is great, but if there are also related quizzes, videos, or lecture slides, an educator has all they need to make the switch. In order to make these materials visible in an OER repository, we need to have the ability to define lots of different types of relationship like “translation of”, “part of”, or “replaced by” (for new editions).
Creating these relationships may be easy when materials are being uploaded as they are created on an ad-hoc basis. But migrating them to a new environment presents a new challenge: how do you create a relationship between materials that may be next in the queue to be created? There isn’t a simple solution. For us, it’s meant creating some new code to handle the creation of relationships as a second step in the data migration process. The point of this example isn’t necessarily the solution we found to this problem, but the acknowledgment that many other types of materials may present their own unique needs. While uniformity and standardization are good, it’s the balance between standardization and diversity that makes a repository useful.
So bulk operations in repositories are a hard nut to crack. There are similarities in any migration or conversion, but there are also a lot of specific challenges to every situation. In our next post, we’ll talk about the development of bulk upload functionality for Hyku Commons and how we addressed challenges in our own work.
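The general shape of a two-step approach can be sketched in a few lines of Python. This is not the code we wrote for Hyku. It is a simplified illustration in which the repository is just a dictionary, and the row layout, the identifiers, and the `import_batch` function are all hypothetical. The first pass creates every work with no relationships. The second pass links them, once every possible target has an identifier.

```python
def import_batch(rows):
    """Create all works first, then add relationships in a second pass."""
    repo = {}          # repository id -> stored work (stand-in for a real repository)
    id_by_source = {}  # identifier in the old system -> new repository id

    # Pass 1: create every work without any relationships.
    for n, row in enumerate(rows, start=1):
        repo_id = f"work-{n}"
        repo[repo_id] = {"title": row["title"], "relationships": []}
        id_by_source[row["source_id"]] = repo_id

    # Pass 2: every work in the batch now exists, so targets can be resolved.
    unresolved = []
    for row in rows:
        subject = id_by_source[row["source_id"]]
        for rel_type, target_source in row.get("relations", []):
            target = id_by_source.get(target_source)
            if target is None:
                # The target was not part of this batch; report it for follow-up.
                unresolved.append((row["source_id"], rel_type, target_source))
                continue
            repo[subject]["relationships"].append((rel_type, target))
    return repo, unresolved


batch = [
    # The quiz comes first in the queue, before the textbook it belongs to.
    {"source_id": "quiz-1", "title": "Chapter 1 Quiz",
     "relations": [("part of", "book-1")]},
    {"source_id": "book-1", "title": "Open Biology",
     "relations": [("replaced by", "book-2")]},
]
repo, unresolved = import_batch(batch)
print(repo)
print("unresolved:", unresolved)
```

In this example the quiz is linked to the textbook even though the textbook was created after it, while the "replaced by" link to a new edition that is not in the batch is reported rather than silently dropped.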
PALNI and PALCI are working together on Hyku because we believe in its potential for improving repository workflows and open access for libraries. But we are also working on it because we understand and value the benefits of collaborative work on open source software. On Friday, June 26th, I am doing a presentation for the West Virginia / Western Pennsylvania chapter of ACRL called “Collaborating for Innovation: Developing Consortial Open Source Software at PALCI,” part of which will involve a discussion of Hyku for Consortia. I’d like to take some time this month to delve into the philosophy that I’ll be discussing there about collaboration and consortia.
We’ve written about collaboration before, and so have others. Generally speaking, consortia help to increase the scale of the work that libraries can do. When consortia work together, they can increase that power even further. Collaboration on open source software projects is a particularly good example of the benefits of this kind of cooperation.
First, a little background on our two consortia. PALNI is a consortium of 24 private academic libraries, and PALCI is a consortium of 70 member academic libraries of varying types, from small to large, public to private. We both include aspects of what Lorcan Dempsey calls (in the second link above) the “classic library consortia activities … some combination of licensing, resource sharing and training, or sometimes manag[ing] a shared library system…” But we are also both trying to increase our value to our networks in new and innovative ways to help them meet new challenges.
One of those challenges we hear about is infrastructure for handling scholarly communication or other types of digital object management. Even with the diversity in our membership, we hear about these issues frequently:
Cost: Many repository solutions are simply too expensive, either in actual dollars or staff needed to successfully use them.
Adaptability: Many of our members have diverse needs when it comes to repositories. They have scholarly communication missions and also manage their own digital library content, and many solutions fit only one or the other successfully.
Limited choice: Consolidations and mergers in the vendor market seem to be limiting choice. The fear of getting locked into an infrastructure that will become untenable is real.
So here is a clear case where open source software may be helpful. First, it has a relatively low barrier to entry: the communities supporting many tools are open. While it may take some expertise to really engage, we have that expertise within our networks. Second, there are potential open source software solutions to these problems that could have a really high impact in meeting member needs. Finally, within PALNI and PALCI we already have some infrastructure and experience in repository management through PALNI’s research and other types of repository services and PALCI’s participation in the HykuDirect Pilot.
Our Hyku for Consortia project, then, is an opportunity to help meet these members’ needs in some specific ways. First, we can mitigate the risk they might take on in trying something new. By spreading out that risk among our consortia (each institution supporting the project through a little bit of staff time, or a portion of their membership fee), we decrease that risk for all. Second, we also extend the opportunity to our members to help shape a product to specifically suit their needs. We are doing this by including multiple members from both consortia in our Product Management teams, getting feedback from members testing the software, and having open discussions with the membership on our progress.
The sneakiest benefit of all, though, is that investment in this particular software (or other open source software projects) can have ripple effects in the rest of the environment. The developments we are working on with Hyku will be highly beneficial to other vendors or providers that may offer the service as well. By integrating newer, better standards, we raise the bar for other competing services. So even if our member libraries don’t participate in this specific project, they can experience the benefits in the long term.
It’s no surprise that recent global events related to the COVID-19 pandemic have affected libraries across the globe. As we focus on keeping distance to slow the spread, one bright spot is that we have been remotely collaborating on our cross-consortia repository from the beginning, so continuing the project has offered a welcome sense of continuity in troubled times.
Our last posts outline the goals and planned activities of our projects, and in the interim we’ve made excellent progress on defining the requirements and designing the planned outcomes for the first two of our major development activities:
Building collaborative workflows
Theming and branding development
With this blog post we’d like to focus on introducing what “building collaborative workflows” means to us. Consortial collaboration is more than just sharing costs. We want to create a tool that will allow us to jointly manage a multi-tenant repository infrastructure. Creating the flexibility for both IR workflows and more “traditional” library-owned content within the same instance of Hyku means enhancing the ability to manage user and tenant settings (enabling different workflows) through the consortial dashboard.
To address the rather broad task we’d given ourselves, we leaned into our collaborative process. We asked our Product Management Team to articulate the types of collections they hoped to build with Hyku. They described the types and sources of materials, as well as the people involved, which helped us identify where workflows overlapped and diverged among our consortium members.
From these descriptions, we began to brainstorm narrative scenarios of various workflows. These helped to highlight specific shared workflow tasks as well as gaps in the current Hyku product. We also examined the existing user roles and permissions available within and across tenants and articulated the need for some additional levels of permission through narrative documents, matrices, and visualizations of these shared workflows.
By working through this process we realized that a robust dashboard for user/role assignment, and the expansion of a few more roles, would enable us to manage these flexible workflow options. The current multi-tenant administrative dashboard for Hyku only allows for the creation of new tenants and the creation of users. We would need something far more powerful to assign users to our envisioned permission levels in multiple tenants.
With this basic idea and our specific needs for user levels articulated, we turned to work with our development partners at Notch8. Talking through each of our requirements documents, we have come up with a rough development plan. Some of our expectations will likely be adjusted based on the feasibility and difficulty of implementation, but our goal of “building collaborative workflows” will remain the same.
First, we will decouple the “role” functionality in Hyku from the “group” functionality. Currently, permissions are assigned at both levels, which can work at cross-purposes.
Next we will develop the dashboard needed to control these permissions. This part will require us to put our creative thinking caps back on to more fully define what this looks like.
Finally, we will work on implementing roles at the tenant level through the new dashboard.
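To give a sense of what these three steps are aiming at, here is a small conceptual sketch in Python. It is not Hyku code and not our final design. The role names, permission names, and the `Directory` class are all hypothetical. It shows one possible reading of the plan: permissions come only from a role that a user holds in a specific tenant, while groups simply record who belongs together and grant nothing on their own.

```python
from dataclasses import dataclass, field

# Hypothetical roles and the actions each one allows.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "depositor": {"read", "deposit"},
    "tenant_admin": {"read", "deposit", "manage_users"},
}


@dataclass
class Directory:
    """Tracks tenant-level roles separately from group membership."""
    roles: dict = field(default_factory=dict)   # (user, tenant) -> role name
    groups: dict = field(default_factory=dict)  # group name -> set of users

    def assign_role(self, user, tenant, role):
        if role not in ROLE_PERMISSIONS:
            raise ValueError(f"unknown role: {role}")
        self.roles[(user, tenant)] = role

    def add_to_group(self, group, user):
        # Groups only record membership; they carry no permissions here.
        self.groups.setdefault(group, set()).add(user)

    def can(self, user, tenant, action):
        role = self.roles.get((user, tenant))
        return role is not None and action in ROLE_PERMISSIONS[role]


directory = Directory()
directory.assign_role("pat", "tenant-a", "tenant_admin")
directory.assign_role("pat", "tenant-b", "viewer")
directory.add_to_group("metadata-team", "pat")

print(directory.can("pat", "tenant-a", "manage_users"))
print(directory.can("pat", "tenant-b", "deposit"))
```

The same person can hold a different role in each tenant, and adding them to a group changes nothing about what they are allowed to do, so the two mechanisms no longer work at cross-purposes.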
We hope you’ve enjoyed this little peek behind the curtain at a “collaborative workflow” of our own: a cross-consortial development process between partners in three different states and two different time zones, using shared online tools, working asynchronously but together. We look forward to sharing our results in the future.
This project was made possible in part by the Institute of Museum and Library