I had the pleasure of attending Open Repositories 2009, which took place May 18-21, 2009, at the Georgia Institute of Technology (Georgia Tech) in Atlanta, GA. The conference web site states “Open Repositories attempts to create an opportunity to explore the challenges faced by user communities and others in today’s world.”
The conference was more or less split into two halves. The first two days resembled a typical conference, with presentations running simultaneously in two large rooms. The last two days were held in a different location with smaller rooms, where users of different software packages (DSpace, Fedora, and EPrints) could focus on issues pertaining to their chosen platform. As this was a four-day conference, I’ll hit what I consider the highlights.
The first morning started off on an interesting note, with Cornell’s Simeon Warner discussing the need for Author IDs, which would link each author to a unique identifier instead of relying on their name, as is standard practice. This would enable functionality such as “let me see all papers by THIS Mr. Lee”, allow services to interact with only that Mr. Lee’s papers, and let one find his papers across all repositories. It would also improve statistics, since usage, co-usage, and citations could be measured accurately. Creating a standard Author ID is a sticky problem, however, as it raises issues of privacy, legality, effort, accuracy, longevity, openness, and control. Other challenges include handling papers with 10, or even 2,500, authors. Warner made the always-accurate point that author adoption of an open Author ID would be driven by the services it enables.
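To make the Author ID argument concrete, here’s a toy Python sketch (my own illustration, not anything Warner showed): the same papers indexed once by name string and once by a unique author ID. The identifiers like auth-0001 and the repository names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    title: str
    repository: str
    authors: tuple  # (display_name, author_id) pairs; IDs are hypothetical


def index_by(papers, key):
    """Build an index from a key (derived from name and/or ID) to papers."""
    index = defaultdict(list)
    for paper in papers:
        for name, author_id in paper.authors:
            index[key(name, author_id)].append(paper)
    return index


papers = [
    Paper("Paper A", "repo-1", (("Lee, J.", "auth-0001"),)),
    Paper("Paper B", "repo-2", (("Lee, J.", "auth-0002"),)),
    Paper("Paper C", "repo-2", (("Lee, J.", "auth-0001"), ("Smith, K.", "auth-0003"))),
]

by_name = index_by(papers, lambda name, author_id: name)
by_id = index_by(papers, lambda name, author_id: author_id)

# A name lookup conflates two different people who share a name...
print([p.title for p in by_name["Lee, J."]])
# ...while an ID lookup returns only THIS Mr. Lee's papers, across repositories.
print([p.title for p in by_id["auth-0001"]])
print(sorted({p.repository for p in by_id["auth-0001"]}))
```

The same ID-keyed index is what would make per-author usage and citation counts trustworthy, since nothing by the other Mr. Lee leaks in.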
Following Warner, Matthew Zumwalt of MediaShelf gave a lively presentation championing the Agile development of lightweight applications that sit on top of more involved and technically robust systems. He demonstrated a variety of small Ruby on Rails applications running on a Fedora backend. He argued that data needs to be separated from the software so that developers can quickly build “fun” applications that do what you need. In my opinion these applications were not fully realized, but the concept that the archival software does not have to be the presentation and interaction software is definitely worth exploring and championing.
The next session featured Pablo Fernicola of Microsoft, Adrian Stevenson from the University of Bath, and Julie Allinson from the University of York giving updates on SWORD, the Simple Web Service Offering Repository Deposit. The tools mentioned or demonstrated included web-based tools, desktop tools, MS Office plugins, and Facebook (and other) widgets. SWORD currently makes use of the Atom Publishing Protocol, which has pros and cons. I believe it was mentioned that SWORD v2 would likely move to a different protocol, but don’t quote me on that.
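For the curious, a SWORD deposit is at heart an HTTP POST of a package to a collection URI. The sketch below builds (but doesn’t send) such a request with Python’s standard library. The collection URL and credentials are hypothetical, and the Content-Disposition and X-Packaging headers reflect my understanding of the SWORD 1.x conventions layered on AtomPub, so check the spec before relying on them.

```python
import base64
import urllib.request


def build_sword_deposit(collection_url, payload, filename, user, password,
                        packaging="http://purl.org/net/sword-types/METSDSpaceSIP"):
    """Build (but do not send) a POST that deposits a zipped package."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"filename={filename}",
        # SWORD 1.x extension header; the packaging URI is an assumed example.
        "X-Packaging": packaging,
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(collection_url, data=payload,
                                  headers=headers, method="POST")


# Hypothetical repository endpoint and account.
req = build_sword_deposit("https://repo.example.edu/sword/deposit/123",
                          b"PK...", "thesis.zip", "alice", "secret")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it; on success the server
# should answer with an Atom entry describing the newly created item.
```

One gotcha: urllib normalizes header names, so the packaging header reads back as req.get_header("X-packaging").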
My notes from the afternoon sessions are a bit vague: I wrote down some interesting points but not who made them, so I’ll just list them here. One point was that “data curation is hard”, because there is a tension between solid standards and technical innovation, along with heavy requirements for ensuring provenance and handling custom data types or software. Micro-services, like the lightweight tools Zumwalt described, came up as low-barrier, low-commitment tools that leverage native OS file management to create flexible systems. “We are GOOD at file systems” was one statement, so it makes sense to build an “object system” on these ideas.
The keynote was from John Wilbanks, the Vice President of Science at Creative Commons. He also gave the keynote at the SPARC Digital Repositories Meeting 2008, which I attended, and this talk was a slight variation on that one. Wilbanks stresses a new way of thinking about copyright and digital information: “digital paper” is still the same old analog way of thinking about knowledge as something that exists on pen and paper. Researchers need to be able to see the return on investment, that repositories are worth it.
Tuesday began with a talk by Wayne Johnston of the University of Guelph Library on branding and promoting a repository. He talked about how social marketing (using commercial marketing principles to effect behavioral change) could be used to increase adoption of a repository. The name of a product or service is often the first impression, and “Institutional Repository” is a pretty bad one: bureaucratic and passive. The name of an IR should be meaningful if possible, which can be difficult. Promotion is also important; events, presentations, invitations to faculty, and promotional kits can all help. Graphic design matters too: the site has to look professional and be understandable to users.
Elizabeth Yakel of the University of Michigan then discussed MIRACLE (Making IRs a Collaborative Learning Environment). UM undertook a census, along with user interviews, user studies, and case studies, to take internal measures. They interviewed library and campus leaders, IT staff, other campus IRs, users, and contributors. Their mission was to change the message from “the IR” to one of “authors’ rights”, which she called the KEY to buy-in. The IR needs to help realize the library’s vision for the 21st century. Content, services, and sustainability alone may not be sufficient for actual IMPACT, however; the IR needs to be framed in terms of long-term library goals.
The Library of Congress outlined BagIt, a tool they created to manage the transfer of the enormous amounts of data they receive from various projects. Transfer is important to manage well because it is a large part of their daily operations, and digitization can produce anywhere from one to hundreds of files, existing in multiple locations simultaneously. Good tools reduce the number of tasks people perform and items they track. They also demonstrated an inventory program that records package and file events, which helps with risk assessment and storage audits: knowing your file holdings and their life cycle events reduces risk. They stressed that these modular tools and services can be extended but also exist independently, giving them flexibility and nimbleness.
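BagIt is essentially a directory convention: a bagit.txt declaration, a data/ directory holding the payload, and a manifest listing a checksum for every payload file. Here’s a minimal Python sketch of creating and validating a bag, assuming the layout later standardized in RFC 8493 (version 1.0) with SHA-256 checksums. It’s an illustration, not the Library of Congress’s own tools, and it skips optional tag files such as bag-info.txt.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Stream a file through SHA-256 so large scans need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(source_dir, bag_dir):
    """Copy source_dir into bag_dir/data and write bagit.txt plus a manifest."""
    bag = Path(bag_dir)
    payload = bag / "data"
    shutil.copytree(source_dir, payload)  # also creates bag_dir if needed
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8")
    lines = []
    for f in sorted(p for p in payload.rglob("*") if p.is_file()):
        rel = f.relative_to(bag).as_posix()  # e.g. "data/vol1/page001.tif"
        lines.append(f"{sha256_of(f)}  {rel}\n")
    (bag / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag


def validate_bag(bag_dir):
    """Return payload paths that are missing, altered, or not in the manifest."""
    bag = Path(bag_dir)
    problems, listed = [], set()
    for line in (bag / "manifest-sha256.txt").read_text(encoding="utf-8").splitlines():
        digest, rel = line.split(maxsplit=1)
        listed.add(rel)
        target = bag / rel
        if not target.is_file() or sha256_of(target) != digest:
            problems.append(rel)
    for f in sorted((bag / "data").rglob("*")):
        rel = f.relative_to(bag).as_posix()
        if f.is_file() and rel not in listed:
            problems.append(rel)
    return problems


# Usage: bag a hypothetical scanned volume, validate it, then simulate bit rot.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "scans" / "vol1"
    source.mkdir(parents=True)
    (source / "page001.tif").write_bytes(b"fake image bytes")
    bag = make_bag(Path(tmp) / "scans", Path(tmp) / "bag")
    clean_report = validate_bag(bag)
    (bag / "data" / "vol1" / "page001.tif").write_bytes(b"corrupted")
    corrupt_report = validate_bag(bag)

print(clean_report, corrupt_report)
```

Because the manifest travels with the files, the receiving end can re-run the validation after every transfer, which is exactly the kind of bookkeeping people shouldn’t have to do by hand.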
Over lunch, I attended a “Birds of a Feather” session where Tim Donohue and Sarah Shreeves demonstrated their beta of BibApp, a very nice-looking program that strives to match the researchers at an institution with their work and mines that data to identify collaborations and other links. It also allows depositing the work directly into a repository and tracks which Open Access permissions the work’s publisher allows. It looks to be an exciting and useful piece of software, and I am looking forward to the 1.0 release.
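BibApp’s actual matching is surely more involved, but the basic idea of mining collaborations from publication data can be sketched as counting co-author pairs. In this hypothetical Python example, each work is a list of author names and only researchers at the institution are counted; all names are invented.

```python
from collections import Counter
from itertools import combinations


def collaborations(works, local_researchers):
    """Count how often each pair of local researchers co-authored a work."""
    pairs = Counter()
    for authors in works:
        # Keep only institutional authors; sort so each pair has one spelling.
        local = sorted(set(authors) & local_researchers)
        for pair in combinations(local, 2):
            pairs[pair] += 1
    return pairs


works = [
    ["Alvarez", "Baker", "Outside, X."],
    ["Alvarez", "Baker"],
    ["Baker", "Chen"],
]
faculty = {"Alvarez", "Baker", "Chen"}
print(collaborations(works, faculty).most_common())
```

Sorting each author set before pairing ensures that (Alvarez, Baker) and (Baker, Alvarez) are counted as the same collaboration.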
John Kunze, Stephen Abrams, and Patricia Cruse discussed the California Digital Library’s project to re-envision its curation infrastructure as a set of micro-services rather than a single monolithic system. They believe it is safer and more cost-effective to accept the transient nature of systems and plan on managing change instead of resisting it. Each repository function is small and self-contained, which makes it easier to develop and maintain, and the low initial investment means any one function can be replaced as needed. They stressed clearly articulating needs and outcomes before implementing software: create STRATEGIES, then software.
Steve DiDomenico and Claire Stewart from Northwestern University discussed their Mounting Books Project, a large book scanning project taking place there. They outlined the issues they faced with hardware, student workers, brittle pages, large fold-outs, and so on, as well as the errors that come with massive volumes of data.
Wednesday was the beginning of the breakout sessions, so the presentations I attended became more DSpace-focused. The morning began with an overview of DuraSpace, the newly formed joint organization of DSpace and Fedora. Sandy Payette of Fedora Commons and Michele Kimpton of DSpace talked about future directions for the two software packages and for the organization. DuraSpace is the umbrella organization for DSpace and Fedora, and initially there will be no change to how each community governs and updates its software. Over time, the hope is that synergies will be found and the communities will move toward a single entity.
DSpace 2.0, a major restructuring, is planned for 2010. In the meantime, DSpace 1.6 will contain bug fixes and requested features and serve as a stepping stone to 2.0. DSpace 2.0 will no longer have strict communities, collections, and items; instead it will be built around entities, relationships, and properties, with the old community model built on top for the user interface. A detailed model was shown of how entities relate to each other, with properties that can apply to both entities and relationships. Additionally, a more flexible metadata schema allows any entity to have metadata, which can improve support for things such as journals with volumes, issues, and individual articles.
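Here’s a rough Python sketch of what an entity/relationship/property model looks like, using the journal, volume, issue, and article case. The class names and the “contains” relationship are my own hypothetical choices, not the DSpace 2.0 API, which hadn’t been released. The point is that both entities and relationships can carry properties.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    id: str
    kind: str
    properties: dict = field(default_factory=dict)  # any entity can have metadata


@dataclass
class Relationship:
    source: str
    target: str
    kind: str
    properties: dict = field(default_factory=dict)  # relationships carry properties too


class Store:
    """Hypothetical in-memory store; not modeled on any real DSpace class."""

    def __init__(self):
        self.entities = {}
        self.relationships = []

    def add(self, entity):
        self.entities[entity.id] = entity
        return entity

    def relate(self, source, target, kind, **props):
        self.relationships.append(Relationship(source.id, target.id, kind, props))

    def children(self, entity, kind="contains"):
        return [self.entities[r.target] for r in self.relationships
                if r.source == entity.id and r.kind == kind]


def walk(store, entity, depth=0):
    """Depth-first list of (depth, kind, id) following 'contains' links."""
    rows = [(depth, entity.kind, entity.id)]
    for child in store.children(entity):
        rows.extend(walk(store, child, depth + 1))
    return rows


store = Store()
journal = store.add(Entity("j1", "journal", {"title": "Journal of Examples"}))
volume = store.add(Entity("v3", "volume", {"number": 3}))
issue = store.add(Entity("i2", "issue", {"number": 2}))
article = store.add(Entity("a9", "article", {"title": "On Repositories"}))
store.relate(journal, volume, "contains")
store.relate(volume, issue, "contains")
store.relate(issue, article, "contains", position=1)

for depth, kind, eid in walk(store, journal):
    print("  " * depth + f"{kind} {eid}")
```

The old community/collection/item hierarchy could then be rebuilt on top by treating communities and collections as just more entity kinds joined by “contains” relationships.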
I attended the DSpace “manager track” in the afternoon, which moved away from the technical side of things and focused on use cases. Sue Kunda of Oregon State University demonstrated their ETD workflow, which wasn’t drastically different from our own here at Mason. Sean Thomas of the Massachusetts Institute of Technology demonstrated CiteLine, a web site that aims to provide functionality similar to BibApp. It uses open source software to create a centralized service for citation data and to make that data more interactive.
William E. Moen from the University of North Texas demonstrated their learning object repository, and discussed the problems and thought processes involved. They had to wrap their heads around how to break apart a course into granular elements and how to store these in a way that made sense to users. They utilized a Manakin-based interface to highly customize the presentation of the learning objects to people interacting with the system.
I didn’t note which individual gave the presentation, but four people from Texas A&M and the University of Texas worked on a project called Vireo, a very robust overhaul of DSpace for managing ETD submissions and the documents themselves. It incorporated many features that would be useful in DSpace proper, such as faceted browsing and a straightforward deposit mechanism, but unfortunately it was tailored specifically to ETDs and the particular needs of those documents. It was a very nice solution to the problem, and hopefully it will be made available to the public soon.
The day ended with a “minute madness” poster session, where each poster presenter had one minute to describe their project. Then we had the opportunity to talk with the presenters and learn more about their work.
Thursday was mostly a wrap-up day. I attended a presentation by Tim Donohue outlining various methods of customizing the DSpace 1.5 interface, and two members of the DSpace Foundation discussed updates from the Outreach Committee and their next steps. After that came the workshops. I chose one on SWORD, where I saw a demonstration of a nice SWORD deposit application built on Adobe AIR, a cross-platform runtime environment somewhat like Java. The application allowed for applying metadata to documents and then uploading them to the repository. Much of the discussion was aimed at developers, however.
To summarize: it was pretty clear that much of the discussion focused on developing local tools that are more nimble and replaceable than the software that stores the data for the long term. Separating data management from data interaction and display makes sense, as it is likely easier to build a tool that handles interaction with dynamic information types on TOP of the software that manages the complexities of digital preservation than to rely on one piece of software to do both. Additionally, this shows again how rarely out-of-the-box software truly meets user needs: the institutions realized they had to take control of the display of their information. Some of the more advanced institutions were building entire systems for managing and interacting with their data, but approaching it strategy first, software second.
It’s pretty clear to me that institutional repositories are still seeking a true foundation to build from, and that there is still a lack of understanding of just what the role of the repository is within the institution. Software foundations are joining together, new versions are rolling out, and people are creating highly customized interfaces to meet very specific internal needs; it’s very dynamic. I’m excited to see projects that attempt to tackle some of the biggest problems I face as a repository manager: dealing with authors and their work, the copyright issues surrounding that work, and getting that work into the repository.
The future of repositories is uncertain, but it is certainly going SOMEWHERE; where remains to be seen. I’m concerned that smaller institutions, or those that do not place a high priority on digital preservation and data curation, will rely heavily on either open source or commercial software to meet their initial needs without putting any dollars into hiring developers or enough staff to really do something with it. As Elizabeth Yakel said, the repository has to be a part of the library’s total vision, and I’m fearful that this isn’t the case in many places.