Digital Repositories and Preservation

Thoughts on digital repositories, digital preservation, and scholarly communication.


Open Access and Digital Preservation

May 14th, 2008 · 1 Comment · DSpace, Digital Preservation, Open Access, Rant

I previously mentioned that I was going to address my concerns with the way DSpace (and other repository software) is designed to allow open access to the information contained within the archive. As I prefer to keep things brief, this may not dive deeply enough into the issues at hand to be truly satisfactory, but I’ll give it a shot.

I’ll start by stating that I have no problem with open access to information. I feel that the current copyright laws are somewhat ludicrous, especially concerning durations (see: Disney): they stray too far from securing initial financial compensation for someone’s intellectual output, and seem designed more to ensure that a certain mouse remains the property of an animation company. Throw in the DMCA and nearly any librarian or computer geek is going to take issue with contemporary copyright law. Having said that, I still think that some form of copyright law is necessary unless there is a complete shift in the way creators are compensated for their intellectual output. That would require changes to a number of systems and to the way we as a culture think about how we pay for things. As I don’t think this will take place anytime soon, people still need to be able to protect their creations in some way, and an enforceable copyright law is a pretty good way to do that. The fact that copyright law exists, and will continue to exist, is something I think we have to accept without getting into the political realm.

Digital preservation, as I understand the term, really has nothing to do with copyright law or open access. It has to do with the creation of accurate digital surrogates (for analog materials), with as much information captured as is necessary to preserve the data of the original object. A locally archived digital facsimile of, say, a monograph that a library has purchased is completely within the realm of Fair Use as far as I’m concerned, and I think it would be hard to argue against this. The goal of digital preservation is to preserve information through the use of technology, and to ensure that digital bitstreams remain useful, accessible, and unaltered over a long period of time. There are few lawsuits to be faced when digitizing material that one has a license for.
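
The "unaltered" part of that goal is commonly checked with fixity information: record a checksum when a file enters the archive, then recompute it later and compare. Here is a minimal sketch in Python, assuming SHA-256 as the checksum. The function names are made up for illustration, and this is not a description of how DSpace or any particular repository does it.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unaltered(path, recorded_digest):
    """Compare a file's current digest with the one recorded at ingest."""
    return sha256_of(path) == recorded_digest
```

The digest returned by `sha256_of` would be stored alongside the item's metadata at ingest, and `is_unaltered` run on a schedule afterwards. A mismatch only says that the bits changed, not why, so it is a trigger for restoring from another copy rather than a fix in itself.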

The legal issues pop up when one somehow attempts to allow access to these digital surrogates. Universities get sued by publishers due to having poorly set-up E-Reserve systems, or improperly posting copyrighted information on their local Course Management Systems. Creating a publicly accessible digital collection is an exercise in locating copyright holders, requesting permissions, and crossing of fingers. However, this has nothing to do with digital preservation, and has everything to do with the access model employed by the owners of the digital content.

The most common use of DSpace is to store and provide access to PDFs of articles written by university professors and published in academic journals. Many other uses abound, but I’d wager that nearly 3/4 of the content in all the IRs is journal articles in some form. Here is my usual workflow when working with a faculty member:

  1. Get list of publications they wish to put in the IR
  2. Use an awfully incomplete SHERPA/RoMEO database (no offense meant) to hunt down copyright policies of the various journals on the list
  3. Use Google to hunt down copyright information of the journals not in S/R (usually the majority), and attempt to contact the journals directly
  4. Get data files from faculty member
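
Steps 1 through 3 amount to a lookup with a manual fallback. Here is a rough sketch in Python. The policy table, journal titles, and policy wording are all invented for illustration, and it does not query the real SHERPA/RoMEO database.

```python
# Hypothetical, hand-built policy table; real SHERPA/RoMEO data is not queried here.
KNOWN_POLICIES = {
    "journal of examples": "author manuscript only",
    "example letters": "publisher pdf allowed",
}


def triage(journals, policies=KNOWN_POLICIES):
    """Split journal titles into those with a known policy and those needing manual follow-up."""
    found, manual = {}, []
    for title in journals:
        policy = policies.get(title.strip().lower())
        if policy is None:
            manual.append(title)   # step 3: search the web, contact the journal
        else:
            found[title] = policy  # step 2: policy located in the table
    return found, manual


found, manual = triage(["Journal of Examples", "Obscure Quarterly"])
print(found)   # titles whose policy was in the table
print(manual)  # titles that still need to be chased by hand
```

In practice the second list tends to be the longer one, which is where most of the time goes.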

At this point, I process the vast majority of the publisher PDFs to strip the formatting out, as the copyright policies of nearly all of these journals insist that the final published PDF cannot be posted on an IR. As this would be giving away what they charge money for, I understand this. However, it adds HOURS of work for a list of any length, as I have to manually run each PDF through a program and save out a new PDF. This would not be an issue if the archive were closed. I could simply store the PDF (and metadata, of course) of any journal we had a license for at GMU, and ensure these PDFs remained in a version that a software reader could open. Additionally, these archived PDFs would be more accurate (the transformation process introduces errors), more desirable on an archival level (being the version that actually was published to the research world), and more attractive (the transformation process removes certain graphic elements, has limited fonts for the reformatted PDF, and so on).

To summarize: a journal that wishes to have some financial gain frequently chooses to not allow an open access repository to store and provide access to their material. This adds hours of tedious work and generates inferior archival content. This is not archiving, this is not digital preservation. It’s the digital repurposing of intellectual content to make it legally accessible. These are two different goals with two very different solutions. DSpace (or any open access repository software) is not a solution to digital preservation, it’s a solution to digital information storage and delivery. This is due to its open access philosophical model, which is not compatible with current copyright laws. You could argue that you CAN close off DSpace to the outside world… but then I’d certainly argue back that there are a NUMBER of better solutions to plain old digital asset management.


Harvard Law School mandates OA

May 8th, 2008 · 2 Comments · Institutional Politics, Open Access

The people saying that the Harvard Faculty of Arts and Sciences mandating OA wasn’t that important are going to have a hard time saying the same thing about this announcement. Another unanimous faculty vote for OA, and the first for a law school - and a prestigious one at that. An often-heard argument against OA mandates is that faculty are universally opposed to such things, but the Harvard example pretty clearly shows this isn’t always the case. It’s going to depend on the political climate of the institution, of course, along with how daring the administration feels in the first place to even spend the time on investigating mandates at all.

I recently spoke to a dean of research here, as I wanted to pick his brain about what he knew about MARS and the general research climate at the university. It wasn’t a very positive conversation. While I appreciate his candor, it’s fairly demeaning to have someone close to the top of the administrative ladder confirm many of your personal feelings about what you do: that researchers would have a hard time seeing why an IR is valuable, that the administration here would never mandate anything for fear of faculty backlash, and that my job is essentially futile. Whether or not he was correct remains to be seen, but one thing I got out of that conversation is that I face a tremendously uphill battle - which I already knew.

I don’t know how many more Harvard mandates there need to be before other universities take notice. Maybe it’s NOT good that Harvard is the first (or perhaps first big name) American institution to put these in place - I imagine that many administrative boards would find it quite easy to say “Well of course HARVARD can mandate open access, they are HARVARD. We’re just a <adjective describing size> <public /private> university!” Maybe if, say, THE Ohio State University or Virginia Tech started mandating OA, more boards would take notice… but I certainly can’t say for sure.

Thanks to Open Access News for the heads up and useful commentary.


Hiding behind the faults

May 7th, 2008 · 1 Comment · DSpace, Rant

In yet another “hey can you help me with something” AIM conversation with a colleague today, we discussed a few of DSpace’s myriad faults: how one cannot easily return to the page of a Collection whose title one has just edited, or really hit the back button in the browser EVER unless just casually browsing the archive, and how the email address of the system administrator is not set a single time in the dspace.cfg, but can instead be modified in a number of places, leading to mis-sent technical issue emails from time to time.

In conversations with a colleague within the library, or with one of the few faculty members who actually have a real interest in using DSpace, I frequently say something along the lines of “that would really be great, but the software makes that really really difficult, or impossible… DSpace kind of feels like software that couldn’t have possibly been designed for what it supposedly does.” This makes me feel like a complete idiot. Carpenters don’t blame their tools, or something along those lines, right? Unfortunately, I’m not such a great carpenter that I can build my own house. The metaphor here being DSpace as the house… you get it, right? So I more-or-less have to use DSpace and am more-or-less decent at tracking down a hack or a mod to make it a little bit better at what it does, but the foundation remains the same and the walls remain the same; all I’m doing is adding a window sash or a new carpet. I think I’ve exhausted this house thing for now.

But my point stands - I feel like I’m hiding behind the lack of a decent toolset to accomplish my job and to meet the needs of the users of the system. This is frustrating. I don’t necessarily blame any single developer, or even the group of developers, who created DSpace. It’s certainly superior to anything I could code. It’s just nearly unforgivably bad at doing anything about which one would have said “I certainly will need to do THAT some day…”, and I have no idea how that happened.

A non-library friend of mine asked me why I believe the intermingling of open access and the repository is a questionable one. My next post will likely address this.


Something I’ve been meaning to do

May 5th, 2008 · No Comments · Metablog

I’ve been tossing around the idea of creating a professional blog for a few months now, and here it is. I apologize for the use of the overwhelmingly popular Cutline theme, but it’s the best I know of without digging more than I’d like to at the moment, and it’s clear and easy to read. Who cares, anyway? Anyone who is serious about reading blogs just uses an RSS feed reader and wouldn’t notice if the page were full of flowers and bunnies.

That being said, my objectives for this site are as follows:

- To fill some sort of niche role in the digital repository/digital preservation community. Dorothea Salo, a widely read and outspoken individual (who, perhaps not coincidentally, previously held my current position), frequently laments the fact that so few people are speaking about the issues regarding institutional repositories, scholarly communication, and open access publishing. While I don’t claim to have any special insight or to know what people want to know, I do know that 1) I have a lot of opinions regarding institutional repositories and digital preservation and 2) I frequently experience something that makes me want to document it for someone else to read, because I think it might be valuable to someone in a position similar to mine. So while I can’t state what my aforementioned niche role will be just yet, I imagine one will become defined over time.

- To discuss repositories as they relate to digital preservation, and not necessarily scholarly communication. It’s important to realize that I have next to no philosophical allegiance to open-access publishing or changing the way scholars communicate with each other. My main focus during my graduate education was digital preservation, and one of the largest reasons I took this position was the opportunity to be working in that field in some manner. Little did I know how intimately IRs and open-access were intertwined! As I see this alignment between IRs and open-access as one of the largest threats to the long-term success of IRs as a digital preservation platform, I more-or-less simply deal with open-access as part of the system as opposed to promoting or evangelizing it. I’m not saying it’s silly to support, I just don’t have the interest or passion required to make it a soapbox issue for myself.

- To be positive about what I do, how I do it, and what the future will be. I mean absolutely no disrespect to the well-spoken and passionate Ms. Salo, but when I see her use the term “repository rat” it makes me scowl. I’m not saying I don’t understand what she means, but I don’t want to think of myself that way. More importantly, I don’t want others thinking of me that way. Being self-deprecating is a quick way to make others think you aren’t worth being positive about, either. While my work is frequently frustrating in a variety of ways, I don’t think it benefits the larger community to be negative about it. I’d like to strive for a balance between straight-up complaining and putting a happy face on everything. That may mean talking about small advances made while still being realistic about the larger scope of things, which I’m fine with. However, know that I’m not that down on my role, yet.

- To be brief. This may be one of the longest posts you see me make for some time. One thing a lot of bloggers sure seem to enjoy is typing, apparently thinking that everyone has a few hours a day to devote to reading RSS feeds. Well, I don’t. I don’t enjoy reading or writing huge rambling posts, so I won’t subject anyone who reads this to one if I can help it.

So that’s that. I’ve had some things on my mind that I’ll probably write up in the next week or so.

I’m going on record here officially stating that anything I say here is not sanctioned or approved by George Mason University or anyone associated with it, and is my own personal opinion.
