Digital Repositories and Preservation

Thoughts on digital repositories, digital preservation, and scholarly communication.


Farewell to Caveat Lector

June 24th, 2009 by Shane

I just wanted to make a small post congratulating Dorothea on both her long run as one of the most widely read and influential writers on the subject of repositories and her decision to hang up the keyboard on her blog Caveat Lector. I’m certain she will continue to be just as important to the community (if one can be defined) through future presentations and publications. I know she will continue to be someone I frequently go to for advice and insight (and a sometimes spirited discussion). So farewell, CavLec, and good luck to you, Dorothea.


2200 Words on Open Repositories 2009

June 2nd, 2009 by Shane

I had the pleasure of attending Open Repositories 2009, which took place on May 18-21, 2009, in Atlanta, GA, at Georgia Tech. The conference web site states “Open Repositories attempts to create an opportunity to explore the challenges faced by user communities and others in today’s world.”

The conference was more or less split into two halves. The first two days resembled a typical conference, with presentations taking place simultaneously in two large rooms. The last two days were held in a different location with smaller rooms, where users of different software packages could focus on issues pertaining to their chosen platform – DSpace, Fedora, and EPrints. As this was a four-day conference, I’ll hit what I consider the highlights.

The first morning started off interesting, with Cornell’s Simeon Warner discussing the need for Author IDs, which would enable authors to be linked to a unique identifier, instead of relying on their name as is the standard practice. This would allow functionalities such as “allow me to see all papers by THIS Mr. Lee”, and allow for services to interact with only Mr. Lee’s papers. Additionally, this could allow one to be able to see papers by Mr. Lee in all repositories. This allows for increased statistical measurements as well, as you can get accurate measures of usage and co-usage/citations. The creation of a standard Author ID is sticky, however, as issues such as privacy, legality, effort, accuracy, longevity, openness, and control are raised. Additional challenges are issues such as dealing with a paper with 10 or 2500 authors. Warner mentions the always-accurate statement that author adoption of an open AuthorID would be driven by the services it allows.
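To make the idea concrete, here is a minimal sketch (mine, not anything Warner showed) of why keying papers on an identifier rather than a name matters. The identifiers, names, titles, and the `group_by` helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical records: (author_id, author_name, paper_title, repository).
# The IDs, names, and titles are invented for illustration only.
RECORDS = [
    ("author:0001", "J. Lee", "Paper A", "arXiv"),
    ("author:0002", "J. Lee", "Paper B", "arXiv"),
    ("author:0001", "Jae Lee", "Paper C", "Local IR"),
]


def group_by(records, key_index):
    """Group paper titles by the field found at key_index."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[2])
    return dict(groups)


# Grouping by name merges two different people called "J. Lee"
# and splits one person who also publishes as "Jae Lee".
by_name = group_by(RECORDS, 1)

# Grouping by identifier gives "all papers by THIS Mr. Lee",
# across both repositories.
by_id = group_by(RECORDS, 0)

print(by_name)
print(by_id)
```

The hard parts Warner listed (privacy, accuracy, control, who assigns the IDs) are exactly the parts this toy example skips.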

Following Warner, Matthew Zumwalt of MediaShelf gave a lively presentation where he promoted and championed the Agile development of lightweight applications that live on top of more involved and technically robust systems. He demonstrated a variety of small Ruby on Rails applications that existed on top of a Fedora backend. He made the statement that data needs to be separated from the software so that developers can quickly build “fun” applications that do what you need. I’m of the opinion that these applications were clearly not fully realized, but the concept that the archival software does not necessarily have to be the presentation and interaction software is definitely worthwhile to explore and champion.

The next session featured Pablo Fernicola of Microsoft, Adrian Stevenson from the University of Bath, and Julie Allinson from the University of York giving some updates on SWORD, the Simple Web Service Offering Repository Deposit. Some of the tools mentioned or demonstrated were web-based, desktop-based, MS Office plugins, and Facebook (and other) widgets. SWORD currently makes use of the Atom Publishing Protocol, which has pros and cons. I believe it was mentioned that SWORD v2 would likely move to a different protocol, but don’t quote me on that.

My notes from the afternoon sessions are a bit vague, in that I have some interesting points noted but not where they came from – so I’ll just list them here. One point mentioned was that “data curation is hard”, in that there is a tension between solid standards and technical innovation. There are heavy requirements of ensuring provenance and dealing with custom data types or software. Micro-services, such as the ones Zumwalt mentioned, came up as low-barrier, low-commitment tools that leverage native OS file management tools to create flexible systems. “We are GOOD at file systems” was one statement, so it makes sense to create an “object system” using these ideas.

The keynote was from John Wilbanks, the Vice President of Science at Creative Commons. He gave the keynote at the SPARC Digital Repositories Meeting 2008, which I attended, and this was a slight variation on that presentation. Wilbanks stresses a new way of thinking about copyright and digital information – digital paper is still the same old analog way of thinking about knowledge existing through pen and paper. Researchers need to be able to see the return on investment – that repositories are worth it.

Tuesday began with a discussion by Wayne Johnston of the University of Guelph Library, where he focused on the branding and promotion of a repository. He talked about how social marketing – using commercial marketing principles to effect behavioral change – could be used to increase the adoption of a repository. The naming of a product or service is often the first impression, and “Institutional Repository” is pretty bad – bureaucratic and passive. The name of an IR should be meaningful if possible, which can be difficult. Promotion is also important – events, presentations, invitations to faculty, the creation of promotional kits, all can help. Graphic design is also an important factor – the site has to look professional and be understandable to users.

Elizabeth Yakel of the University of Michigan then discussed MIRACLE - Making IRs a Collaborative Learning Environment. UM undertook a census involving interviews with users, user studies, case studies, and such, to take internal measures. They conducted interviews with library and campus leaders, IT staff, other campus IRs, users, and contributors. Their mission was to change the message from the “IR” to one of “authors’ rights” – the KEY to buy-in, according to her. The IR needs to help realize the library’s vision in the 21st century. Content, services, sustainability of the IR may not be sufficient for actual IMPACT, however. The IR needs to be framed in terms of long-term library goals.

The Library of Congress outlined BagIt, a tool they created to manage the transfer of the incredibly large amount of data they received from various projects. Transfer is important to manage well because it’s a large part of their daily operations, and digitization can create one or hundreds of files, existing in multiple locations simultaneously – good tools reduce the number of tasks performed and items tracked by people. They also demonstrated an inventory program that records package and file events, which helps with risk assessment and storage audits. The knowledge of file holdings and the file life cycle events reduces risks. They stressed that these modular tools and services can both be extended and exist independently, giving them flexibility and nimbleness.
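For readers who haven’t seen a “bag”, here is a rough sketch of the idea, not the Library of Congress’s actual tooling. The layout (a data/ payload directory, a bagit.txt declaration, and a checksum manifest) follows the published BagIt specification as I understand it; the function names and the example file names are my own assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def make_bag(bag_dir, files):
    """Write payload files under data/ plus a declaration and a manifest."""
    bag_dir = Path(bag_dir)
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    for name, content in files.items():
        (data_dir / name).write_bytes(content)
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # One manifest line per payload file: "<checksum>  <relative path>".
    lines = [f"{sha256_of(p)}  data/{p.name}" for p in sorted(data_dir.iterdir())]
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )


def verify_bag(bag_dir):
    """Return payload paths that are missing or whose checksum differs."""
    bag_dir = Path(bag_dir)
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    problems = []
    for line in manifest.splitlines():
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems


# Hypothetical usage: build a small bag, check it, damage it, check again.
with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp) / "example_bag"
    make_bag(bag, {"page1.tif": b"scan one", "page2.tif": b"scan two"})
    clean = verify_bag(bag)      # nothing wrong yet
    (bag / "data" / "page2.tif").write_bytes(b"bit rot")
    damaged = verify_bag(bag)    # the altered file is reported

print(clean)
print(damaged)
```

The point is the one the presenters made: the receiving end can check a transfer against the manifest with a few lines of code instead of a person tracking hundreds of files by hand.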

Over lunch, I attended a “Birds of a Feather” session where Tim Donohue and Sarah Shreeves demonstrated their beta of BibApp, a very nice looking program that strives to match the researchers at an institution with their work, and mines the data to determine collaborations and other links. It also allows for depositing the work directly into a repository and for tracking the Open Access allowances the publisher of the work provides. It looks to be an exciting and useful piece of software and I am looking forward to the 1.0 release.

John Kunze, Stephen Abrams, and Patricia Cruse discussed the California Digital Library’s project to re-envision its curation infrastructure as a set of micro-services rather than a single monolithic system. They believe it is safer and more cost-effective to plan to deal with the transient nature of systems and plan on managing instead of resisting change. Each function of the repository is small and self-contained, which allows them to be more easily developed and maintained. This lower level of initial investment allows them to be easily replaced as needed. They stressed clearly articulating needs and outcomes before implementing software – create STRATEGIES, then software.

Steve DiDomenico and Claire Stewart from Northwestern University discussed their Mounting Books Project, which is a large book scanning project taking place there. They outlined the issues faced with hardware, student workers, brittle pages, large fold outs, and so on, and the errors caused by massive volumes of data.

Wednesday was the beginning of the breakout sessions, so the presentations became more DSpace-focused. The morning began with an overview of DuraSpace, the fairly new combination of DSpace and Fedora. Sandy Payette of Fedora Commons and Michele Kimpton of DSpace talked about the future directions for the two software packages and of the organization. DuraSpace is the umbrella organization for DSpace and Fedora, with no initial change in governance or in the ways the communities update the software. Over time there is the hope that synergies will be found and the communities will move towards a single entity.

DSpace 2 is planned for 2010 and is a major restructuring. In the meantime, DSpace 1.6 will contain bug fixes, add requested features, and be a stepping stone to 2.0. DSpace 2.0 will no longer have strict communities, collections, and items, but will instead be built around entities, relationships, and properties, with the old community model built on top for the user interface. A detailed model was shown of how entities relate to each other and how properties are applied to both. Additionally, a superior metadata schema is employed that allows for any entity to have metadata, which can allow for increased support for things such as journals with volumes, issues, and individual articles.
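I don’t have the model slide to reproduce, so here is my own toy sketch of what “entities, relationships, and properties” could look like, using the journal example. None of these class names, the `hasPart` relationship type, or the identifiers come from DSpace; they are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Anything in the repository; any entity may carry metadata properties."""
    entity_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed link between two entities, with its own properties."""
    source: Entity
    kind: str
    target: Entity
    properties: dict = field(default_factory=dict)


def children(relationships, parent, kind="hasPart"):
    """Return the entities that parent points to via the given kind of link."""
    return [r.target for r in relationships if r.source is parent and r.kind == kind]


def descend(relationships, root):
    """Walk hasPart links from root; return entity ids in depth-first order."""
    ids = [root.entity_id]
    for child in children(relationships, root):
        ids.extend(descend(relationships, child))
    return ids


# A journal with a volume, an issue, and one article, each with its own metadata.
journal = Entity("journal-1", {"title": "Example Journal"})
volume = Entity("volume-34", {"number": "34"})
issue = Entity("issue-6", {"number": "6"})
article = Entity("article-1", {"title": "Example Article"})

relationships = [
    Relationship(journal, "hasPart", volume),
    Relationship(volume, "hasPart", issue),
    Relationship(issue, "hasPart", article, {"pages": "1-10"}),
]

print(descend(relationships, journal))
```

A fixed community/collection/item hierarchy has no natural place for the volume and issue levels, which is why a free-form model like this is appealing for journals.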

I attended the DSpace “manager track” in the afternoon, which moved away from the technical side of things and focused on use cases. Sue Kunda of Oregon State University demonstrated their ETD workflow, which wasn’t drastically different from our own here at Mason. Sean Thomas of the Massachusetts Institute of Technology demonstrated CiteLine, a web site that looks to provide a similar functionality to BibApp. It uses open source software to create a centralized service for citation data and to make that data more interactive.

William E. Moen from the University of North Texas demonstrated their learning object repository, and discussed the problems and thought processes involved. They had to wrap their heads around how to break apart a course into granular elements and how to store these in a way that made sense to users. They utilized a Manakin-based interface to highly customize the presentation of the learning objects to people interacting with the system.

I didn’t note which individual made the presentation, but four people from both Texas A&M and the University of Texas worked on a project called Vireo, which was a very robust overhaul to DSpace for managing ETD submissions and the documents themselves. It incorporated many features that would be useful for DSpace proper, such as faceted browsing and a straightforward depositing mechanism, but unfortunately this was tailored specifically for ETD documents and the specific needs of those data types. It was a very nice solution to the problem and hopefully will be made available to the public soon.

The day ended with a poster session minute madness, where all the poster presenters had one minute to describe what their project was. Then we had the opportunity to interact with the presenters and learn about their projects.

Thursday was mostly a wrap-up day. I attended a presentation by Tim Donohue where he outlined various methods of customizing the DSpace 1.5 interface, and two members of the DSpace Foundation discussed updates from the Outreach Committee and their next steps. After that were workshops. I chose to attend one concerning SWORD, where I saw a demonstration of a nice SWORD depositing application that used the Adobe AIR framework, which is kind of a Java-esque multi-platform runtime environment. The application allowed for applying metadata to documents and then uploading them to the repository. Much of the discussion pertained to developers, however.

To summarize: it was pretty clear that much of the discussion focused on individuals developing tools locally that would be more nimble and replaceable than the software that stored the data for the long term. The separation of the data management and the data interaction/display makes sense, as it is likely easier to build a tool that can handle the needs of interacting with dynamic information types on TOP of the software that manages the complexities of digital preservation, instead of relying on one piece of software to do both. Additionally, this shows again how infrequently the out-of-the-box software truly meets user needs – the institutions realized they had to take control of the display of their information. Some of the more advanced institutions were building entire systems for the management of and interaction with their data, but looking at it in a strategies-first, software-second way.

It’s pretty clear to me that institutional repositories are still seeking a true foundation to build from, and there still is a lack of understanding of just what the role of the repository is within the institution. Software foundations are joining together, new versions are rolling out, people are creating highly customized interfaces to meet very specific internal needs – it’s very dynamic. I’m excited to see some projects that attempt to address some of the biggest problems I face as a repo manager - dealing with authors and their work, the copyright issues with the work, and getting that work into the repository.

The future of repositories is uncertain, but certainly going SOMEWHERE. Where that is remains to be seen. I’m concerned that smaller institutions, or those that do not place a high priority on digital preservation/data curation, will be heavily relying on either open source or commercial software to meet their initial needs, but not placing any dollars into hiring developers or enough staff to really do something. As Elizabeth Yakel said, the repository has to be a part of the library’s total vision, and I’m fearful that this isn’t the case in many places.


A field guide to misunderstandings about open access

April 7th, 2009 by Shane

Peter Suber has written an exceptionally thorough list of misunderstandings about open access. It’s far too long to even pull highlights from, but I just wanted to say it’s excellent and you should read it.

However, I doubt anyone reads this blog who doesn’t already read his.


The non-kittens-and-bunnies presentation

April 3rd, 2009 by Shane

I was fortunate enough to basically be handed the opportunity to give a conference presentation at Computers in Libraries by a too-kind colleague who I imagine lacked the time or interest in giving it. The topic I was asked to speak on was Green Open Access, and that’s basically it. Kind of a big topic to cover in any great detail, and, more difficult still, a big topic for me to cover from my point of view and experience without coming off as a downer. I didn’t want to go in as a mostly frustrated repoman, but I absolutely did not want to present Green OA as the magical wonderland of kittens and bunnies and smiling flowers. I seethe when technologies and trends are presented to librarians as OMG ISN’T THIS GREAT, because let me tell you, 95% of the time THEY ARE DECIDEDLY NOT.

At this point you should know that I am heavily debating just how professional I should be on this vaguely professional blog, and whether or not I should make those all-caps bits into hyperlinks pointing you to just exactly what I’m getting at.

So my goals were to be realistic and speak from my experiences while still providing broad coverage of the topic and contemporary status of things, and to do this in a creative, concise, and aesthetically pleasing way that didn’t consist of a default PP template with a bunch of bullet points.

By the way, anyone who presents with any frequency whatsoever owes themselves at least a MacBook, because Keynote absolutely blows away PowerPoint. And I’m no Mac zealot.

I’d like to think I pulled this off, the whole “broad overview peppered with harsh reality” thing. I’ll be recording a voiceover for my Keynote and putting it up on Google Video soon, so I’ll let readers be the judges.


Mandates don’t solve one of the biggest problems…

February 23rd, 2009 by Shane

While it’s an excellent example for institutions to follow, Boston University’s (and others’) mandate plans still aren’t going to solve one of the largest fundamental issues I see as a repoman - the fact that publishers still own the final, authoritative document, and the vast majority of them aren’t going to allow these documents to be freely shared in an open access repository. So the documents going into the repositories are, by and large, preprints or (if one uses my quasi-legal method) documents created by repurposing the content in the publisher PDF. This creates a repository that is full of material that might be excellent, valuable, and worthy of reading, but still requires digging up the authoritative version in order to cite the darn thing. Sure, one could figure out a way to cite the document pulled from the IR, but it’s certainly in one’s best interest to use the final version, if possible. Additionally, most authors you speak to will certainly agree that they would prefer individuals cite the published version, as well.

The argument can be made that the information being freely available might increase its scholarly impact and let people find it more easily, but I’m going to make the bold claim that most people doing research in a scholarly institution will likely have access to the authoritative version for the vast majority of things. Why would a researcher want to waste time with the IR middleman when they can go straight to the store and get the best version for “free” already? Why would an author be excited about making a cruddier version of their article available to the public when they are likely of the opinion that their research is just fine being published in a respected journal and likely available to nearly every other researcher they know through their own institution? (And if it isn’t, they’ll just send over the published PDF anyway - it’s their article, right?)

The longer I do this thing the less it makes sense to me, and that’s just frustrating.


Response to “A Comparison of Subject and Institutional Repositories in Self-archiving Practices” by J. Xia

January 6th, 2009 by Shane

Jingfeng Xia, A Comparison of Subject and Institutional Repositories in Self-archiving Practices
The Journal of Academic Librarianship
Volume 34, Issue 6, November 2008, Pages 489-495

In this article, Xia reports the results of a small investigation into the self-archiving practices of physicists at the University of Southampton. He specifically investigates contributions made by researchers to the arXiv subject repository vs. contributions made to Soton, the EPrints-based institutional repository at Southampton, with the hypothesis that active contribution to an SR will be mirrored by active contribution to a local IR. While the results are fairly interesting, I specifically found some of the observations made in the article worth mentioning and discussing.

In the introduction, Xia mentions the few mandates that have been created, specifically NIH and Harvard University, but states “it still remains questionable about how such mandate policies can be implemented in practice,” which I feel is a thoughtful statement. Much like IRs have not been successful merely by their creation, a mandate policy is not a magic bullet without successful implementation - likely requiring quite a bit of effort by a number of individuals within an institution. If a mandate was somehow created and implemented here at Mason, I can’t imagine how I would even begin to manage the amount of data that would be deposited and ensure its legality, especially with the nearly complete lack of support I currently have (and I’m hardly the only IR manager who would say this).

Xia makes some points that bear repeating: self-archiving hasn’t brought a satisfactory amount of content to the vast majority of IRs (although the definition of “satisfactory” varies between institutions), and faculty do not show interest in self-archiving, even with the increased awareness of IRs. Additionally, Xia mentions something that I think most upper administration don’t think about: “Among the operational tactics of the IRs that have played a crucial role in the making of the content are a liaison system and a mandate policy” [bolding mine]. This is becoming so blatantly obvious from the research and reports by managers that I can really no longer pretend that my solo operation will keep the IR here limping along. I need the assistance of my fellow librarians, but that requires a fairly major shift in the way the IR is viewed by the University Librarian on down. Liaisons aren’t necessarily going to place a high value on scholarly communication, and may feel too busy to deal with gathering content from researchers or managing deposits. This is compounded by the frustrating (yet necessary) copyright issues that frequently require substantial time spent on Google or waiting on a return phone call or email from a little-known publisher. I don’t fault liaisons for not jumping in to help with the IR, but it’s becoming obvious most IR managers need their help.

Xia’s method of comparing deposits to the SR vs. the local IR was quite thorough, and he made sure to check for name variations and verify the affiliations of authors and such. His findings, while perhaps unsurprising, bear discussion. As many as 453 articles were deposited to arXiv by 24 Southampton researchers while only 21 scholars have deposited 240 articles into the IR. Xia states that this “may indicate that the faculty authors are not as interested in working with their IR as with a SR.” I’d say that’s pretty obvious. The SR is going to reach their target audience far more directly; why duplicate the effort for what will likely be far less payoff? What interests me more is what Xia discusses next - the lack of quality of both the content and metadata in the IR itself. He mentions that a large number of the materials in the IR have likely been deposited by a third party via mediated archiving. He is forced to reject his hypothesis by stating that the most enthusiastic SR depositors tend to generate much less, if any, of their own IR content. Additionally, nearly half of the IR contributors have never worked with arXiv. Thus, no correlation is shown between SR and IR use.

However, what I find to be the most interesting finding in the study is the hilariously awful functionality of Southampton’s IR. Nearly 62% of the articles in Soton are abstract-only, and “some” have links to a full text on an external source. Amazingly, “of the 468 items with an external link to the full text, more than half (253 items) have a dead link.” How astoundingly useless. Xia mentions that this may be caused by indifferent authors who feel forced to self-archive under the pressure of institutional requirements, and who do not follow exact archiving procedure and do not load a full text. Additionally, mediated deposit does not appear to be the final solution, as Xia points out that “it is unrealistic to expect a third-party depositor to read all articles before uploading them and filling out metadata information,” which leads to incorrect, missing, or vague subject terms. He correctly states that mediated depositing can hardly be called “self-archiving.”
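A quick back-of-the-envelope check on the figures I’ve quoted from Xia puts them in perspective. The variable names are mine; the numbers are only the ones cited in this post, and the per-depositor averages are my own arithmetic rather than anything reported in the article.

```python
# Figures as quoted from Xia's article in this post.
arxiv_articles, arxiv_depositors = 453, 24
ir_articles, ir_depositors = 240, 21
external_links, dead_links = 468, 253

# Average deposits per depositor in each repository.
arxiv_per_person = arxiv_articles / arxiv_depositors
ir_per_person = ir_articles / ir_depositors

# Share of external full-text links that no longer resolve.
dead_share = dead_links / external_links

print(f"arXiv: {arxiv_per_person:.1f} articles per depositor")
print(f"IR: {ir_per_person:.1f} articles per depositor")
print(f"Dead external links: {dead_share:.0%}")
```

That works out to roughly 19 articles per depositor in arXiv against about 11 in the IR, and about 54% of the external links being dead.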

He concludes by stating the obvious: when an article has been deposited in one repository, the author(s) will be hesitant to make it available in a second. Also, energetic participation in an SR does not necessarily mean the same in an IR, and vice-versa. He thoughtfully says “[p]ersonal interest in an IR means much more for the development of institutional repositories than being obligated by mandate requirement.” Deposits of abstract-only items with missing metadata and broken links will hardly prove useful in either the short or long term.

This article, while seemingly a simple investigation into the number of deposits of a single group of researchers, really gets to the heart of many of the problems IRs face. Unfortunately, it really makes me more concerned for the future - a mandate alone isn’t the answer, nor is mediated deposit. It’s a larger solution that involves manpower, dedication and personal involvement, something I fear is largely lacking with regard to IRs. Especially with the huge budget issues my state is facing, I can hardly expect any new hiring to take place to increase the number of people involved in scholarly communication. It’s a daunting challenge.


Thoughts on “learning spaces” presentation

October 22nd, 2008 by Shane

Joan Lippincott of CNI was on the Mason campus today and gave a talk concerning “Net Gen” students (aka Millennials, Gen Y, Digital Natives). I suppose the most concise way to put this is that I disagree with the fundamental premise of an argument that rests on the following points:

  • The scholarly output of K-12 and undergraduate students is making a shift to a multimedia “remix” culture and this trend will continue on as these students progress into graduate/PhD education and into faculty positions.
  • Gen Y students learn far more from audiovisual information sources and collaborative environments than older generations do.
  • The library and university need to adapt to these new learning models in order to best serve their users.

I respect Joan Lippincott. She is obviously well-versed in the subject area and well-regarded in the field. However, I have to respectfully disagree with this type of thinking. I don’t believe that Gen Y students have fundamentally different educational needs from students like myself. I’m 28 years old, barely disconnected from Gen Y in age and not at all disconnected in their “connectedness”. I use online social networks, I do the vast majority of my research online, I watch (and even post) videos on YouTube, I read RSS feeds of blogs and share posts with friends, I post on a few forums, I instant message my friends and family as my main form of communication. I’d wager I’m more active online than a substantial portion of Gen Y, and a number of my friends (both younger and older) could make the same claim.

I feel that these types of discussions focus on the social and entertainment desires of students and mask them as the educational needs of this “new generation” of learners. Learning Commons-type discussions always mention the requirement for group work spaces and collaborative environments, to support the way that Gen Y’ers actually work. You know why young people (including myself) work this way? They don’t suddenly become MORE productive in a group, they simply want to hang out with their friends to distract them from the drudgery that is the reality of academia. You know why students are using social networks or playing MMORPGs? Well, besides to communicate with their peers and have fun, to NOT do anything academic. You know why a student would prefer to look at a picture or watch a video? Because it’s way easier than reading something that would nearly always be more informative about the subject at hand. You know why a student would be more interested in producing, say, a video than writing a paper? Because writing well is DIFFICULT and it’s far easier to gloss up poor research by packaging it in a video format that appears to involve a lot of work.

Yes, older people who think that games, social networks, collaborative learning environments, and the creation of audiovisual mashups are the future of education, the basic message I’m sending here is that young students don’t want to learn, they want to play. Presentations like the one I saw today essentially seem to be saying that we need to support this play (masked as educational needs) as much as possible in order to try to get some learning in there.

Why should the university and library spend so much time (not to mention MONEY) to support the production of multimedia products that will almost certainly be less informative than a well-crafted research paper on the same subject? Especially as students potentially enter a graduate and post-graduate environment, they will be reading and expected to produce quality research. Why should the library be expected to (potentially) slack on something like high-level research support, as they spend time and staff resources on audiovisual support? One thing I imagine nearly all reference librarians would agree with is that most students DO NOT KNOW how to do research, not even high-level undergraduates or graduate students, let alone analyze the information they acquire and then write a cogent argument or research paper. Joan responded to a statement of mine about traditional scholarly output being textual by saying that only a very small population of students will progress into graduate education, and that most will instead enter the work environment, where things happen differently. While I doubt they will be writing 25-page research papers in their job, the fundamentals of analyzing information, processing it, and creating meaningful output are still the problems at hand. I fail to see how this is solved by creating a suite of media creation stations - it requires instruction and support, not technology.

I asked Joan why things like games and online environments should be explored when the traditional educational model has functioned so well for so long. Her response centered on an example of beginner Electrical Engineering students (I believe) who were unable to pass the required math courses of the program. She explained how in a lecture-type class they employed a clicker system, where a professor asked the students to choose from 3 answers to a question. Students were getting the answer right about 1/3 of the time. The professor then told them to talk to someone near them and convince them their answer to another question was right, and then they answered again. This time 2/3 were correct. Therefore, the traditional system must be flawed in some way and new ways of instructing young students are required (in this example, I’m guessing the new way is collaborative learning).

This doesn’t say anything about the educational models that have been employed for QUITE a long time now, it says something about the state of education in the United States. Our schools are producing students that don’t have a high level of knowledge, reading comprehension, and analytical skills. Why must these weaknesses be supported in higher academics?

A colleague of mine mentioned that simply because Gen Y students are supposedly “connected” at all times and may expect information instantaneously doesn’t mean that the long-standing processes of research suddenly become quick and easy. Research is NOT instantaneous, it’s an iterative process that takes a lot of WORK. I agree with this.

If you talk to most students, they will say that their use of things like Facebook, hanging out with friends at the library, and game playing actively interfere with their educational output a large percentage of the time (a recent study at GWU that involved speaking with students about a library presence on Facebook backs this up). I don’t think it’s the library’s role to somehow teach students that school is HARD, but it certainly shouldn’t be to support their expectation that it’s EASY.

I think it’s interesting that most of the people I know who are vocally opposed to things like, say, the academic use of Second Life, are primarily slightly older than Gen Y’ers. We are almost universally heavy net users and recent graduates. We understand that school is frequently difficult and requires effort. I am of the opinion that older individuals frequently think of younger people as dramatically different from them, and I’d wager the primary difference is that the older people know how to put in a good day of work and younger people don’t. Please realize that I consider myself a slacker and a procrastinator, and I certainly don’t have calluses on my hands from years of tending fields. However, I did manage to get through two somewhat difficult educational programs while spending 80% of my waking hours on the ‘net without a learning commons or a Wii in the study room.

Tags: 6 Comments

Some news on the Anti-NIH bill

September 22nd, 2008 by Shane
I imagine most people with more than a passing interest in open access/copyright law-type news have heard about H.R. 6845, the Fair Copyright in Research Works Act. Essentially what is taking place is that various publishers are seriously concerned that the NIH policy will hurt their bottom line and that it takes advantage of their value-added services, like peer review and editing. They are now pressuring the government to end the policy that requires all NIH-funded research to be made publicly available.

The backlash against this has been more-or-less expected, but is still good to see. The Library Journal posted an article discussing the 47 copyright experts (what makes one an expert is not explained) and professors of law who have written a letter to the House Judiciary Chairman, along with 33 Nobel Prize-winning scientists who have written to Congress with their support of the NIH policy.

As you may know, I don’t have a huge philosophical attachment to Open Access. However, the actions and lies from these publishers make it far easier to want to fight the power and encourage authors to avoid the traditional publishing entities. If the publishers win this fight, and the NIH policy is struck down, it will be yet another clear example showing who controls Washington - big business and lobbyists.

Tags: No Comments.

Fantastic solution to a difficult problem

August 15th, 2008 by Shane
While this may be old news to some, it was new to me, and thus I think good to share. Yesterday on All Things Considered there was a story about a Carnegie Mellon project called reCAPTCHA, a security program designed to assist with the OCR’ing of words that computer programs are unable to recognize. In use by 40,000 sites including Ticketmaster, Facebook and Craigslist, reCAPTCHA shows a real security word along with a word from a scanned document that fooled the OCR software. The difficult word is entered by a number of different users, and when they agree, the word is incorporated into the scanned document. In this way, over 1.3 BILLION words have been entered, which are being used to assist with the digitization and OCR’ing of The New York Times, back to 1851.
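
As a rough illustration of the agreement idea described above, here is a minimal Python sketch. It is not reCAPTCHA's actual code: the class name `UnknownWord`, the `min_votes` threshold, and the rule that a guess only counts when the known security word is typed correctly are all assumptions made for the example.

```python
from collections import Counter


class UnknownWord:
    """Collects user transcriptions for one word the OCR software could not read.

    Hypothetical sketch: the names and the agreement threshold are assumptions,
    not details taken from the reCAPTCHA project.
    """

    def __init__(self, min_votes=3):
        # Assumed number of matching answers needed before a word is accepted.
        self.min_votes = min_votes
        self.votes = Counter()

    def submit(self, control_answer, control_guess, unknown_guess):
        """Count the guess for the unknown word only if the known word was typed correctly."""
        if control_guess.strip().lower() != control_answer.strip().lower():
            return False
        self.votes[unknown_guess.strip().lower()] += 1
        return True

    def accepted(self):
        """Return the agreed transcription, or None if users have not agreed yet."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.min_votes else None
```

Each call to `submit` represents one person solving one challenge; once `accepted()` returns a word, that word could be written back into the scanned document's text.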

THIS IS FREAKING BRILLIANT. Take all the wasted effort people spend entering in security words and convert it into something useful. If only this could be applied to any number of things… time spent reading your feeds on Google Reader or Bloglines (why?!) could run a protein folding Java applet for cancer research; every 20 minutes spent on Flickr could trigger a mandatory photography exercise to assist with computer image recognition; listening to one of the free music stations on Last.fm could require a user to first listen to a music sample and describe its characteristics… I can see the issue with these being that they’re nowhere near as fast as typing a word - humans have been recognizing strings of characters for a loooong time now - we see letters and words where they don’t even exist.

I was simply blown away by the ingenuity of the idea and the successful execution. Three cheers.

Tags: 1 Comment

A glimmer of hope?

July 8th, 2008 by Shane
Maybe someone here reads this blog, and decided to help me out. (This is unlikely.) But I got a call yesterday that gave me some hope that the University is actually moderately interested in what we’re doing here.

I spoke earlier of my meeting with the Dean of Research here, and how brutally honest he was about his feelings about MARS. Perhaps not surprisingly, he is no longer here. I highly doubt it had to do with MARS in the least, but perhaps his old-school thinking was holding him back in some larger way that was apparent to the administration… or maybe he just went somewhere else. Anyway, the new Dean of Research was previously an Assistant Dean of Research for a School here, and someone I have spoken to before. He was receptive to MARS and the potential for a higher adoption by the faculty, so having him in the position is already a Good Thing, in my opinion.

So about the call: The Director of Research Development, who is in the same office, called me yesterday, and we had a 50 (fifty!) minute conversation about what the IR did and how it could fit into their goals. This is dramatically different from anything I’ve experienced before. Someone calling ME? Thinking that I can potentially help on a larger University need? Whoa, get me a fainting couch, because I do believe I have the vapors.

The conversation was largely me explaining exactly what the IR is, and does, and so on, and a lot of thinking aloud by the both of us. She expressed a concern that “Mason Archival Repository Service” was too….. library-y, which I thought was kind of amusing. I don’t really disagree, and maybe a rebranding IS in order (can’t wait to bring that up…). That reminds me, I need to email that marketing professor back about a class project involving the IR.

So things aren’t exactly looking up, but it’s the first time I’ve felt like there are some opportunities for collaboration out there.

Tags: 1 Comment