Issue 7, 2009-06-26

Editorial Introduction – Code4Lib: Long May You Run

Tom Keays

The Code4Lib Journal mirrors the diversity and depth of interests and expertise of its readership. Our successes, indeed, are yours.

How Hard Can It Be? : Developing in Open Source

Joann Ransom with Chris Cormack and Rosalie Blake

In 2000 a small public library system in New Zealand developed and released Koha, the world’s first open source library management system. This is the story of how that came to pass and why, and of the lessons learnt in their first foray into developing in open source.

Extracting User Interaction Information from the Transaction Logs of a Faceted Navigation OPAC

Cory Lown and Brad Hemminger

This paper discusses the analysis of Apache web server logs from a faceted catalog interface (OPAC) at North Carolina State University. By grouping individual HTTP requests into user sessions and analyzing them in that context, requests can be understood as particular user actions, with more specificity as to the purpose and effect of each action. Client IP address and time are used as a sufficient proxy for determining user sessions from logs. Some initial exploratory findings of user behavior in the NCSU OPAC are provided, including that users make use of facets less than of text searching, and that some facet groups are used significantly more than others. Links are provided to the scripts used to make this session-based analysis, which could be modified for use with other faceted OPACs that use an Apache front-end.
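The session-grouping idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' script: the function name `group_sessions`, the `(ip, timestamp, path)` tuple layout, and the 30-minute inactivity gap are all assumptions made for illustration, since the abstract states only that client IP address and time serve as the proxy for a session.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def group_sessions(requests, gap=timedelta(minutes=30)):
    """Group (ip, timestamp, path) tuples into sessions.

    A new session starts whenever the same IP is silent for longer
    than `gap`. The 30-minute default is an assumed value.
    """
    by_ip = defaultdict(list)
    for ip, ts, path in requests:
        by_ip[ip].append((ts, path))

    sessions = []
    for ip, hits in by_ip.items():
        hits.sort(key=lambda h: h[0])  # order each IP's requests by time
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > gap:
                # Inactivity exceeded the gap: close the running session.
                sessions.append((ip, current))
                current = []
            current.append(hit)
        sessions.append((ip, current))
    return sessions


# Hypothetical requests: one IP with a long pause, and a second IP.
sample = [
    ("1.1.1.1", datetime(2009, 1, 1, 10, 0), "/search?q=maps"),
    ("1.1.1.1", datetime(2009, 1, 1, 10, 5), "/search?q=maps&facet=format"),
    ("1.1.1.1", datetime(2009, 1, 1, 11, 0), "/search?q=atlas"),
    ("2.2.2.2", datetime(2009, 1, 1, 10, 1), "/record/42"),
]
sample_sessions = group_sessions(sample)
```

Once requests are grouped this way, each request can be read in the context of the one before it in the same session, for example to tell a facet refinement from a fresh text search.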

Using a Web Services Architecture with Me, Myself and I

Stephen Meyer

The UW-Madison Libraries Library Course Page system is used to deliver electronic reserves materials and course-focused library instruction webpages to students. As part of a rewrite of our system we broke the application into three component pieces: a file repository, a course timetable data service, and an interface application for building and viewing individual course pages. The new three-piece system was written with an inward-facing, service-oriented architecture that allowed us to choose the best technology for each of the tasks the system needs to accomplish.

Deciphering Journal Abbreviations with JAbbr

Keith Jenkins

JAbbr is an online tool developed at Cornell University to help users decipher journal title abbreviations. This article discusses why these abbreviations are so problematic, and how traditional tools are often insufficient, and then describes the novel approach used by JAbbr. Given an abbreviation, JAbbr creates a regular expression for fuzzy matching, tests it against a list of serial titles extracted from the library catalog, and returns a list of possible matches to the user. JAbbr is available as a web site and as a web service.
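As a rough illustration of turning an abbreviation into a regular expression for fuzzy matching, the Python sketch below treats each abbreviated word as the prefix of a word in the full title and allows other words in between. This is an assumed simplification, not JAbbr's actual algorithm; the function names and the sample titles are hypothetical.

```python
import re


def abbreviation_to_regex(abbreviation):
    """Build a case-insensitive pattern from a journal abbreviation.

    Assumption: each abbreviated word is a prefix of a title word,
    in the same order, with arbitrary words allowed in between.
    """
    tokens = re.findall(r"[A-Za-z]+", abbreviation)
    if not tokens:
        raise ValueError("abbreviation contains no letters")
    # Each token must start a word; the lazy gap skips words such as "of the".
    body = r"\w*\b.*?\b".join(re.escape(t) for t in tokens)
    return re.compile(r"\b" + body + r"\w*", re.IGNORECASE)


def match_titles(abbreviation, titles):
    """Return the titles that the abbreviation could stand for."""
    pattern = abbreviation_to_regex(abbreviation)
    return [title for title in titles if pattern.search(title)]


# Hypothetical stand-in for a list of serial titles from a catalog.
serial_titles = [
    "Journal of the American Chemical Society",
    "Journal of Chemical Physics",
    "American Journal of Sociology",
]
candidates = match_titles("J. Am. Chem. Soc.", serial_titles)
```

A prefix-per-word pattern like this is deliberately loose, so it may return several candidates for a short abbreviation; returning a list of possible matches rather than a single answer is consistent with what the abstract describes.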

Repurposing ProQuest Metadata for Batch Ingesting ETDs into an Institutional Repository

Shawn Averkamp and Joanna Lee

This article describes the workflow used by the University of Iowa Libraries to populate their institutional repository and their catalog with the data collected by ProQuest UMI Dissertation Publishing during the submission of students’ theses and dissertations. Re-purposing the metadata from ProQuest allowed the University of Iowa Libraries to streamline the process for ingesting theses and dissertations into their institutional repository. The article includes a discussion of the benefits and limitations of the workflow described.

Bibliographic Metadata Extraction from Theses

Götz Hatop

This article presents the application of part-of-speech (POS) based statistical text analysis to the task of bibliographic metadata extraction from electronic dissertations. By using the approach described here it is possible to detect the title of a Ph.D. paper with an accuracy of about 80%. The accuracy measurements are done using a conceptually simple approach and implementation.