By John Durno
Introduction
In the fall of 2011, the University of Victoria (UVic) Libraries opted to transition from Ex Libris’ SFX link resolver to Serials Solutions’ 360 Link. This did not reflect dissatisfaction with Ex Libris’ technology, but rather the need to consolidate knowledgebase maintenance in our technical services unit following the implementation of the Summon discovery service in 2010. Because SFX was locally hosted and built with a modular architecture, it had been relatively easy to implement a number of customizations to better integrate it with our Voyager ILS (Integrated Library System), Relais ILL (Inter-Library Loan) Management System, and various other service and support workflows. It was deemed desirable to replicate many of these customizations in 360 Link.
360 Link provides more limited support for functional enhancements than SFX does, at least in its native interface. While it can arguably do most of what we wanted out of the box, its built-in ability to integrate with external systems is more limited than what we had been used to with a locally hosted system. We knew from experience that optimizing the linking behaviour of the system would require the ability to tweak it iteratively, which is simply not possible without direct access to the code.
360 Link does provide a fairly rich API (Talsky, 2008), which would make it possible to considerably extend its feature set, assuming a willingness to build and/or support a locally hosted interface [1]. That approach was considered but eventually rejected, primarily because of the high degree of interdependence between Summon and our link resolver. At the time, Summon was generating more traffic for our link resolver than all our other research databases combined. It therefore made sense to host them on the same infrastructure. Hosting the interface locally would have meant living with an additional and arguably unnecessary point of failure in the technical architecture of one of our core services.
Furthermore, we realized that because of the increased traffic generated by Summon, the relative importance of our link resolver as a service delivery point had escalated. In our previous environment, a link resolver outage would have inconvenienced our users, but because they would have been using native database interfaces they would still have been able to access most online content directly. If, as seemed likely, use of native search interfaces were to decline following our Summon deployment, a link resolver outage would in fact be much more problematic, because it would render our major discovery service all but useless. Consequently, our existing single-server link resolver implementation was not adequate to ensure continuity of service now that link resolving had been promoted to ‘essential’ status. This constituted a supporting argument for not hosting our own interface. While building and maintaining a robust, redundant infrastructure was not beyond our capability, it would involve more effort and expense than rolling out a single server.
And finally, there was the argument that application hosting was one of the services we were paying for. Simply put, it seemed reasonable to expect that moving to a vendor-hosted service would reduce our local hosting requirements.
Having rejected a locally hosted interface as a solution, we then investigated a hybrid approach whereby the core functionality would be delivered from the hosted interface, but the supporting functionality would be delivered by a set of modules developed and hosted by the Library. “Core functionality” was defined as linking out to full text online content; “supporting functionality” was defined as linking out to other Library resources and services such as the catalogue, ILL management system, and so forth, which would typically only be of interest to users when no online content was found. Because the availability of these supporting services would not affect the operation of the discovery service, it was deemed acceptable to host them on local infrastructure. And because traffic for these services was expected to be relatively light, we could host them on an existing server.
A Quick Demo
Before launching into a detailed description of our implementation, it may be useful to view the following screencast demo of the user-facing aspect of our link resolver enhancements. The user interface of our production system is also publicly viewable, with the caveat that over time it will likely evolve away from the system described in this article as features are added and dropped in response to changes in our technical environment.
Passing metadata
The key to the hybrid approach lay in the ability to pass metadata via OpenURL from 360 Link to several locally hosted modules. Fortunately, link resolvers are by definition highly interoperable with external systems, and they achieve this by means of a NISO standard (OpenURL, Z39.88-2004) for transmitting metadata over HTTP. A hybrid, loosely coupled approach appeared feasible largely thanks to the existence of OpenURL, which defines both a finite set of possible metadata elements and the means of their transmission (NISO, 2004).
In principle, passing metadata to our local modules should have been straightforward. 360 Link receives and parses OpenURL metadata as part of its normal operation, and it has the ability to link out to external targets.
However, it turned out that 360 Link does not have the innate ability to pass the OpenURL metadata it receives directly to external systems. Its outbound links are based on templates which, although customizable, could only be counted on to pass a subset of the information contained in the original OpenURL to the external target, since it is not realistically possible to define templates sophisticated enough to handle all of the permutations of metadata that could theoretically be found in an OpenURL. Consequently it was necessary to develop a workaround, since we wanted to ensure that all of the metadata in the original OpenURL reached the Library-hosted modules.
Javascript is of course the tool of choice when it comes to making web interfaces do things their creators never intended. Like many other hosted applications, 360 Link can accommodate custom HTML in its header and footer blocks, which can in turn include externally hosted javascript code. This enabled us to create and embed a simple function that extracts the OpenURL query string from within the browser itself, appends it to the URL pointing to the service requested by the user, and opens the resulting output in a popup window when the user clicks the appropriate link.
For example, if the user wanted to check the library catalogue, they would click on the following link:
```html
<a href="javascript:popitup('catalogue')">Check the UVic Libraries Catalogue</a>
```
This specifies the service they wish to access by passing the ‘catalogue’ parameter to the javascript function below. The function reads location.search (the query string of the current page) and calls substring(1) on it to drop the leading ‘?’, which yields the OpenURL string from the user’s browser. It then constructs the appropriate link and forwards the user to their destination.
```javascript
function popitup(service) {
  /* location of our locally hosted modules */
  var url = 'http://library.uvic.ca/extfiles/360Link';

  /* get the OpenURL string from the user's browser, dropping the leading '?' */
  var qstring = location.search.substring(1);

  /* default dimensions for the popup window */
  var height = 800;
  var width = 800;

  /* determine where the service specified by the user lives and append it to the url */
  switch (service) {
    case 'catalogue':
      url = url + '/voyager/index.php?';
      width = 900;
      break;
    case 'illo':
      url = url + '/relais/index.php?';
      break;
    case 'problem':
      url = url + '/problem/index.php?';
      height = 600;
      break;
    /* other services go here */
    default:
      return false;
  }

  /* open a new window and forward the user to the service they requested,
     appending the OpenURL string obtained earlier */
  var newwindow = window.open(url + qstring, 'name',
    'height=' + height + ',width=' + width +
    ',resizable=1,scrollbars=1,menubar=1,location=1,toolbar=1');

  /* window.open returns null if a popup blocker intervenes */
  if (newwindow && window.focus) {
    newwindow.focus();
  }
}
```
On receiving the ‘catalogue’ parameter, this function would open a popup window containing the output of a PHP script located at: http://library.uvic.ca/extfiles/360Link/voyager/index.php?$OPENURL. (The token $OPENURL is a placeholder for the contents of location.search.substring(1), which in this context is expected to be a valid KEV OpenURL.)
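For readers who have not looked closely at a KEV OpenURL, the string handed to the local modules is just a series of URL-encoded key/value pairs, some of which (such as rft.au) may repeat. The JavaScript sketch below is illustrative rather than part of the UVic code: it decodes such a string with the standard URLSearchParams interface, and the sample citation is made up.

```javascript
// Decode a KEV OpenURL query string into an object mapping each key to an
// array of values (keys such as rft.au may legitimately repeat).
function decodeKev(queryString) {
  const params = new URLSearchParams(queryString.replace(/^\?/, ''));
  const result = {};
  for (const [key, value] of params) {
    (result[key] = result[key] || []).push(value);
  }
  return result;
}

// Illustrative OpenURL 1.0 KEV string for a journal article (made-up citation).
const sample =
  'url_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal' +
  '&rft.jtitle=Journal+of+Examples&rft.atitle=An+Example+Article' +
  '&rft.issn=1234-5678&rft.au=Smith%2C+J&rft.au=Jones%2C+K';

const kev = decodeKev(sample);
console.log(kev['rft_val_fmt'][0]);     // info:ofi/fmt:kev:mtx:journal
console.log(kev['rft.au'].join(' | ')); // Smith, J | Jones, K
```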
Parsing metadata
Once the user has been forwarded to the correct service, the OpenURL metadata must be parsed before anything useful can be done with it. As nearly as we could determine, there appeared to be no pre-existing OpenURL parser class available for us to use in the scripting language of our choice (PHP), so we set out to create one.
The code for the parser class is freely available under the GNU GPL [2], so a detailed description of how it works is unnecessary; those interested should just refer to the code. A few high-level considerations are probably worth noting, however.
We determined early on that the class would only represent a partial implementation of the NISO OpenURL standard. Twenty-seven metadata formats have been defined for OpenURL, but in our experience only a small number are used. We immediately eliminated the XML formats, as we had never (with one exception) encountered an XML OpenURL in production; all known sources appear to use the key/encoded-value (KEV) formats. We also focussed on the most commonly encountered content/metadata types: journal, book, Dublin Core, dissertation, and patent. It was necessary to support version 0.1 of the standard as well as version 1.0, because a number of important sources (Google Scholar, for example) still use it. Version 0.1 was the development implementation, superseded by version 1.0 in 2004 when OpenURL became a formalized standard.
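The version handling described above can be sketched roughly as follows. This is not a port of the UVic PHP class: it is a JavaScript illustration, the key list is deliberately partial, and choosing rft.btitle or rft.jtitle for the single version 0.1 ‘title’ key based on the genre is our assumption about a sensible default.

```javascript
// Partial map from OpenURL 0.1 keys to their OpenURL 1.0 KEV equivalents.
// ('title' is handled separately because its 1.0 key depends on the genre.)
const V01_TO_V10 = new Map([
  ['genre', 'rft.genre'], ['atitle', 'rft.atitle'], ['issn', 'rft.issn'],
  ['eissn', 'rft.eissn'], ['isbn', 'rft.isbn'], ['volume', 'rft.volume'],
  ['issue', 'rft.issue'], ['spage', 'rft.spage'], ['epage', 'rft.epage'],
  ['pages', 'rft.pages'], ['date', 'rft.date'], ['aulast', 'rft.aulast'],
  ['aufirst', 'rft.aufirst'], ['auinit', 'rft.auinit'],
]);

const BOOK_GENRES = new Set(['book', 'bookitem']);

// Version 1.0 KEV OpenURLs identify themselves with url_ver=Z39.88-2004.
function isVersion10(params) {
  return params.get('url_ver') === 'Z39.88-2004';
}

// Return URLSearchParams using 1.0-style keys, whichever version came in.
function normalizeOpenUrl(queryString) {
  const params = new URLSearchParams(queryString);
  if (isVersion10(params)) return params;

  const out = new URLSearchParams({ url_ver: 'Z39.88-2004' });
  const genre = (params.get('genre') || '').toLowerCase();
  for (const [key, value] of params) {
    if (key === 'title') {
      // 0.1 has a single 'title' key; pick the book or journal title by genre.
      out.append(BOOK_GENRES.has(genre) ? 'rft.btitle' : 'rft.jtitle', value);
    } else if (V01_TO_V10.has(key)) {
      out.append(V01_TO_V10.get(key), value);
    }
    // Other 0.1 keys (sid, id, pid, ...) are simply dropped in this sketch.
  }
  return out;
}

const v10 = normalizeOpenUrl('genre=article&title=Journal+of+Examples&issn=1234-5678');
console.log(v10.get('rft.jtitle')); // Journal of Examples
```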
The class builds what is known in OpenURL parlance as a “context object”, which is just a fancy way of referring to OpenURL metadata when it has been bundled into a convenient package. The class is currently able to package 72 different metadata elements defined in the standard. It also supplies a number of utility methods to, for example, provide human-readable labels for the metadata elements, and determine which of the 72 possible elements are actually present in the context object.
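To make the ‘context object’ idea a little more concrete, here is a small JavaScript analogue that keeps recognized elements, supplies human-readable labels, and reports which elements are present. The class name, method names and labels are hypothetical and do not reproduce the API of the PHP class.

```javascript
// A few OpenURL 1.0 element names with human-readable labels. (The UVic PHP
// class covers 72 elements; this short list and its labels are illustrative.)
const LABELS = new Map([
  ['rft.atitle', 'Article/Chapter Title'],
  ['rft.jtitle', 'Journal Title'],
  ['rft.btitle', 'Book Title'],
  ['rft.issn', 'ISSN'],
  ['rft.isbn', 'ISBN'],
  ['rft.volume', 'Volume'],
  ['rft.issue', 'Issue'],
  ['rft.spage', 'Start Page'],
  ['rft.date', 'Date'],
  ['rft.aulast', 'Author Last Name'],
]);

class ContextObject {
  constructor(queryString) {
    this.values = new Map();
    for (const [key, value] of new URLSearchParams(queryString)) {
      // Keep known elements with non-empty values; the first occurrence wins.
      if (LABELS.has(key) && value.trim() !== '' && !this.values.has(key)) {
        this.values.set(key, value.trim());
      }
    }
  }

  get(key) {
    return this.values.get(key);
  }

  // Human-readable label for an element, falling back to its raw name.
  label(key) {
    return LABELS.get(key) || key;
  }

  // Names of the supported elements actually present, in LABELS order.
  present() {
    return [...LABELS.keys()].filter((key) => this.values.has(key));
  }
}

const co = new ContextObject('rft.volume=12&rft.jtitle=Journal+of+Examples&rft.issue=');
for (const key of co.present()) {
  console.log(`${co.label(key)}: ${co.get(key)}`);
}
// Journal Title: Journal of Examples
// Volume: 12
```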
Finally, the class attempts to compensate for bad data in a variety of ways. Perhaps not surprisingly, there are a number of what one might charitably refer to as imperfect implementations of OpenURL out in the wild. Missing format identifiers, confusion of genres and formats, the use of incompatible metadata elements, and failure to include key metadata elements are just some of the evils visited upon those who would attempt to parse OpenURLs.
In practice it’s not that bad, particularly if one does not insist on enforcing a rigorous application of the standard. Most problematic are missing format identifiers, given that different formats often need to be processed in different ways. In such cases it becomes necessary to guess the format based upon other clues within the OpenURL string, such as the genre or the presence of an rft.jtitle or rft.btitle element (journal and book title, respectively). There is a limit to how much of this is possible, however.
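A format-guessing fallback along these lines might look like the sketch below. The genre lists follow the journal and book KEV formats defined in the standard; treating an ISSN or ISBN as a further clue, and the order in which the clues are weighed, are our own assumptions rather than a description of the UVic parser.

```javascript
const JOURNAL_FMT = 'info:ofi/fmt:kev:mtx:journal';
const BOOK_FMT = 'info:ofi/fmt:kev:mtx:book';

// Genres that point to only one format. Others ('conference', 'proceeding',
// 'unknown') are valid in both the journal and book formats, so give no signal.
const JOURNAL_GENRES = new Set(['journal', 'issue', 'article', 'preprint']);
const BOOK_GENRES = new Set(['book', 'bookitem', 'report', 'document']);

// Return the format of a KEV OpenURL, guessing when rft_val_fmt is missing.
// Returns null when there are no usable clues.
function guessFormat(params) {
  const declared = params.get('rft_val_fmt');
  if (declared) return declared;

  const genre = (params.get('rft.genre') || '').toLowerCase();
  if (JOURNAL_GENRES.has(genre)) return JOURNAL_FMT;
  if (BOOK_GENRES.has(genre)) return BOOK_FMT;

  // Fall back on which title (or identifier) element is present.
  if (params.has('rft.jtitle') || params.has('rft.issn')) return JOURNAL_FMT;
  if (params.has('rft.btitle') || params.has('rft.isbn')) return BOOK_FMT;
  return null;
}

console.log(guessFormat(new URLSearchParams('rft.genre=conference&rft.jtitle=Proceedings+of+X')));
// info:ofi/fmt:kev:mtx:journal
```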
Since rolling out the parser in production we have made a number of minor tweaks and one major enhancement. The major enhancement involved supplementing the metadata in the OpenURL with data from the Crossref DOI API in cases where little metadata beyond the DOI is supplied by the OpenURL. [3]
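A minimal sketch of that supplementation step follows. It assumes the present-day Crossref REST API (https://api.crossref.org/works/{DOI}) and the fields of its ‘message’ response object, which may not be what the UVic Crossref class calls; the test for ‘little metadata beyond the DOI’ is a hypothetical heuristic. No network request is made here: the example only builds the lookup URL and shows how returned fields could fill the gaps.

```javascript
// Hypothetical heuristic: the OpenURL carries a DOI (rft_id=info:doi/...)
// but neither an article title nor a journal title.
function needsDoiLookup(params) {
  const hasDoi = (params.get('rft_id') || '').startsWith('info:doi/');
  return hasDoi && !params.get('rft.atitle') && !params.get('rft.jtitle');
}

// Crossref REST API URL for a single work; call only when needsDoiLookup() is true.
function crossrefWorkUrl(params) {
  const doi = params.get('rft_id').slice('info:doi/'.length);
  return 'https://api.crossref.org/works/' + encodeURIComponent(doi);
}

// Copy fields from a Crossref work "message" onto OpenURL 1.0 keys,
// filling only the elements the incoming OpenURL lacked.
function mergeCrossref(params, message) {
  const merged = new URLSearchParams(params);
  const fill = (key, value) => {
    if (value && !merged.get(key)) merged.set(key, String(value));
  };
  fill('rft.atitle', (message.title || [])[0]);
  fill('rft.jtitle', (message['container-title'] || [])[0]);
  fill('rft.issn', (message.ISSN || [])[0]);
  fill('rft.volume', message.volume);
  fill('rft.issue', message.issue);
  fill('rft.pages', message.page);
  const firstAuthor = (message.author || [])[0];
  if (firstAuthor) fill('rft.aulast', firstAuthor.family);
  const dateParts = message.issued && message.issued['date-parts'];
  fill('rft.date', dateParts && dateParts[0] && dateParts[0][0]);
  return merged;
}

// Made-up example: an OpenURL carrying nothing but a DOI.
const sparse = new URLSearchParams('url_ver=Z39.88-2004&rft_id=info%3Adoi%2F10.1000%2Fxyz123');
if (needsDoiLookup(sparse)) {
  console.log(crossrefWorkUrl(sparse)); // https://api.crossref.org/works/10.1000%2Fxyz123
}
```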
As noted above, parsing the OpenURL query string was intended to make as much metadata available to supporting services as possible. While that has generally worked as intended, there are instances in which less metadata is available than would have been the case had we used 360 Link’s native linking capability. Those are instances in which the metadata in the OpenURL is incomplete in some way (e.g. missing or truncated book or journal titles). 360 Link is sometimes able to fill in the missing metadata from its internal knowledgebase when it finds a match. We are therefore considering enhancing the parser class further with a callback to the Serials Solutions 360 XML API in cases where key data elements appear to be missing.
Building Services
Once we were able to parse the metadata, there were three kinds of services that we wanted to build:
- Web forms enabling the user to forward the citation information to various Library service points via email, without re-keying it. These service points included our Law Library Interlibrary Loan service, our Distance Education service, and our problem reporting service.
- Services to transmit the metadata to other applications. These included our Relais Interlibrary Loan management system, our Voyager Library catalogue, and Google Scholar.
- Services to present additional information about the item. Currently only one service exists in this category: a ‘persistent link’ pointing back to the item in 360 Link.
Building the web forms was a relatively straightforward exercise. In all cases, once the metadata had been packaged as a context object it was trivial to embed it in a web form. Additional fields were added to the form to allow the user to specify their name, email address, and other information relevant to the action being performed. On submission, the form is processed by a script that creates an HTML email message, sends it to the appropriate recipient(s), and provides an acknowledgement to the user. [4]
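One detail worth getting right in scripts like these is escaping the citation data before it goes into an HTML message, since OpenURL values arrive from arbitrary third-party sources. The UVic form handlers are written in PHP, where htmlspecialchars serves this purpose; the JavaScript sketch below only illustrates the idea, and its table layout is hypothetical. Sending the email is not shown.

```javascript
// Escape characters that are significant in HTML, so that citation data cannot
// break (or inject markup into) the generated message.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build an HTML table from [label, value] pairs, skipping empty values.
function citationTable(rows) {
  const cells = rows
    .filter(([, value]) => value !== undefined && value !== null && value !== '')
    .map(([label, value]) => `<tr><th>${escapeHtml(label)}</th><td>${escapeHtml(value)}</td></tr>`);
  return `<table>${cells.join('')}</table>`;
}

console.log(citationTable([['Article Title', 'Cats & <Dogs>'], ['Issue', '']]));
// <table><tr><th>Article Title</th><td>Cats &amp; &lt;Dogs&gt;</td></tr></table>
```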
Forwarding metadata to other applications, particularly the catalogue, was a somewhat more involved process, and we are still fine-tuning our algorithms as we come up against interesting edge cases.
The standard algorithm for linking from link resolvers to library catalogues goes something like this: if it’s a journal and it has an ISSN, search on that; otherwise do a title search. If it’s a book, do a title search. While not perfect, this is probably the most effective algorithm possible in the majority of cases, given the known limitations of searching against ISBNs. It is possible to achieve a baseline implementation of this algorithm using 360 Link’s native linking capabilities; however, we wanted more control in case there were ways we could improve the algorithm by tweaking it.
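The standard algorithm is simple enough to express in a few lines. In this sketch the citation object and the returned index names (‘issn’, ‘title’) are hypothetical; turning them into an actual Voyager query URL is deliberately left out.

```javascript
// Pick a catalogue search for a parsed citation using the standard
// resolver-to-catalogue algorithm. The citation shape and the returned
// index names are hypothetical; building a real Voyager URL is not shown.
function chooseCatalogueSearch(citation) {
  if (citation.format === 'journal') {
    const issn = citation.issn || citation.eissn;
    return issn
      ? { index: 'issn', term: issn }
      : { index: 'title', term: citation.jtitle || '' };
  }
  if (citation.format === 'book') {
    // Books get a title search even when an ISBN is present: a library may
    // hold another edition or binding, which carries a different ISBN.
    return { index: 'title', term: citation.btitle || '' };
  }
  return null; // other formats are not handled in this sketch
}

console.log(JSON.stringify(chooseCatalogueSearch({ format: 'journal', issn: '1234-5678' })));
// {"index":"issn","term":"1234-5678"}
```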
In practice, our tweaks have not been major. We still follow the logic of the standard algorithm; however, we have made some modifications to increase the likelihood of a successful match. For example, our Voyager ILS requires the dash to be present in an ISSN or EISSN search; we can ensure that there is one by pre-processing the incoming data. For book searches, we can throw in an author keyword if a title search alone looks like it might retrieve too many hits, but only in cases where we are not attempting to match against a chapter citation. (In the case of books, OpenURL provides no way to determine whether an author element refers to the author of a section or the author of the entire work; we assume the former if the ‘atitle’ element is present in the OpenURL, indicating the citation is for a chapter.) We also break up titles into main and secondary strings based on the presence of a colon to accommodate an idiosyncrasy in how Voyager parses title phrase searches, and remove certain punctuation characters that are known to cause problems. Finally, in cases where the OpenURL contains an ISBN but no title element, we run a search against the WorldCat basic search API to see if we can augment the metadata. Once the Voyager query string has been assembled, the user is forwarded to Voyager via a header redirect. [5]
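Several of these tweaks can be sketched as small helper functions. These are illustrations under stated assumptions, not the UVic Voyager module: the punctuation set is a guess, and since the article does not say how ‘too many hits’ is judged, the sketch substitutes a hypothetical rule that titles of three words or fewer are too broad. The WorldCat lookup is not shown.

```javascript
// Ensure an ISSN carries its hyphen, e.g. "12345678" -> "1234-5678".
// Values that are not 8 characters once compacted are returned unchanged.
function hyphenateIssn(issn) {
  const compact = String(issn).replace(/[^0-9Xx]/g, '').toUpperCase();
  return compact.length === 8 ? `${compact.slice(0, 4)}-${compact.slice(4)}` : issn;
}

// Split "Main title: subtitle" at the first colon.
function splitTitle(title) {
  const i = title.indexOf(':');
  if (i === -1) return { main: title.trim(), secondary: '' };
  return { main: title.slice(0, i).trim(), secondary: title.slice(i + 1).trim() };
}

// Strip punctuation assumed (hypothetically) to upset phrase searching.
function cleanTitle(title) {
  return title.replace(/["'?!;()[\]]/g, ' ').replace(/\s+/g, ' ').trim();
}

// Title terms for a book search, plus an author keyword for short (broad-looking)
// titles, but never for chapters, where rft.atitle is present and the author
// may be the chapter author. The three-word threshold is a hypothetical stand-in.
function bookSearchTerms(citation) {
  const main = cleanTitle(splitTitle(citation.btitle || '').main);
  const terms = [main];
  const looksBroad = main.split(' ').length <= 3;
  if (looksBroad && citation.aulast && !citation.atitle) terms.push(citation.aulast);
  return terms;
}

console.log(hyphenateIssn('12345678')); // 1234-5678
console.log(bookSearchTerms({ btitle: 'Dune: a novel', aulast: 'Herbert' }).join(' ')); // Dune Herbert
```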
Relais accepts an OpenURL-like syntax which is not entirely standard, so our Relais parser simply translates the incoming OpenURL into Relais-compatible syntax before forwarding the user on via a header redirect. There were no serious obstacles, as the Relais syntax is fairly well documented; the main difficulties were posed by idiosyncrasies in how certain data elements needed to be formatted in order to parse correctly into Relais. A bit of trial and error was needed to get it right, but our Relais parser has been stable for several months now. [6]
For the Google Scholar target parser, our general impression has been the less precision the better. We do specify fields using Google Scholar parameters for elements like journal title, book title, and author, but too much precision can exclude useful matches. Better to send a few keywords and let the ranking algorithm bring the useful results to the top. [7]
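A loose Scholar query of this kind might be built as below. The parameter names as_sauthors and as_publication are those used by Google Scholar’s advanced search form; they are not a documented, stable API, so treat them as assumptions. Limiting the query to the first six title words is likewise an arbitrary choice for illustration.

```javascript
// Build a deliberately loose Google Scholar URL: a few title keywords, plus
// author and publication restrictions when those elements are available.
function scholarUrl(citation, maxWords = 6) {
  const title = citation.atitle || citation.btitle || citation.jtitle || '';
  // Replace punctuation with spaces and keep only the first few words.
  const keywords = title
    .replace(/[^\p{L}\p{N}\s]/gu, ' ')
    .split(/\s+/)
    .filter(Boolean)
    .slice(0, maxWords);
  const params = new URLSearchParams({ q: keywords.join(' ') });
  if (citation.aulast) params.set('as_sauthors', citation.aulast);
  // For articles and chapters, name the journal or book as the publication.
  const publication = citation.atitle ? citation.jtitle || citation.btitle : '';
  if (publication) params.set('as_publication', publication);
  return 'https://scholar.google.com/scholar?' + params.toString();
}

console.log(scholarUrl({ atitle: 'Link resolvers: a review', jtitle: 'Journal of Examples', aulast: 'Smith' }));
// https://scholar.google.com/scholar?q=Link+resolvers+a+review&as_sauthors=Smith&as_publication=Journal+of+Examples
```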
Additional enhancements
In addition to the services above, we discovered shortly before our go-live date that 360 Link does not have the ability to prepend our proxy server prefix to the Refworks export function. This was particularly problematic because we use an instance of Refworks hosted by the Ontario Scholars Portal in order to comply with a Canadian hosting requirement mandated by provincial privacy legislation. On-campus users were being redirected appropriately from Refworks’ main site to the Ontario site, but off-campus users would hit a dead end on the main site, as Refworks had no way of knowing they should be redirected. Once again, we were able to hack the interface using javascript, this time to create a custom submit button that inserts the proxy prefix when the Refworks export is chosen. [8]
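The essence of the fix is to prepend the proxy prefix to the export target when the form is submitted. The sketch below assumes an EZproxy-style ‘login?url=’ prefix at a placeholder host, and a hypothetical helper that attaches the behaviour to a form; the code actually used is in the file cited in note [8].

```javascript
// Placeholder EZproxy-style prefix (hypothetical host); substitute your own.
const PROXY_PREFIX = 'https://proxy.example.edu/login?url=';

// Prepend the proxy prefix to a target URL unless it is already proxied.
function proxify(targetUrl, prefix = PROXY_PREFIX) {
  return targetUrl.startsWith(prefix) ? targetUrl : prefix + targetUrl;
}

// Browser-only (hypothetical helper): rewrite a form's action just before it
// is submitted, so off-campus users reach the export target via the proxy.
function attachProxyToForm(form, prefix = PROXY_PREFIX) {
  form.addEventListener('submit', () => {
    form.action = proxify(form.action, prefix);
  });
}

console.log(proxify('https://refworks.example.com/import?x=1'));
// https://proxy.example.edu/login?url=https://refworks.example.com/import?x=1
```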
Integrating 360 Link with Metalib
We originally acquired SFX as part of a package that also included Metalib, Ex Libris’ federated search tool. Although the two products operate independently from a technical standpoint, Metalib relies on SFX for some of its functionality at the interface level. As we did not wish to decommission Metalib or continue to maintain SFX following our 360 Link implementation, we were curious to see whether we could use 360 Link to replace SFX in this context.
As it turned out, it was possible. Earlier in this article I mentioned that with one exception we had never encountered an XML OpenURL. Metalib was the exception: it uses XML-formatted OpenURLs to communicate with SFX. Unfortunately, not only does our homegrown parser not parse XML OpenURLs, but 360 Link does not parse them either. [9]
Our original plan was to create middleware to translate the XML OpenURLs into their KEV equivalents; however, this proved unnecessary when we realized it is possible to tell Metalib to use OpenURL v0.1 rather than 1.0. Since the XML formats did not appear until version 1.0, activating the earlier version of the standard caused Metalib to revert to KEV.
Things we couldn’t do
We were unable to replicate two of our SFX enhancements in 360 Link. Because SFX was hosted locally and we had complete access to the application, we were able to embed pretty much anything we wanted to in the interface. This included services which required callbacks to external applications, namely:
- An Ulrich’s lookup to indicate whether the item was contained within a peer-reviewed journal.
- The results of the catalogue lookup, obviating the need for users to click on a catalogue link to determine whether or not we have the item in print.
In both cases, the best way to approximate the service was to link out to the external resource, which regrettably makes the process less efficient for the user.
We were unable to reconstruct these enhancements because Serials Solutions could not assign a ‘uvic.ca’ hostname to our 360 Link instance. [10] Consequently, Ajax callbacks to library-hosted middleware would not work. For security reasons, browsers prevent javascript from making requests to hosts other than the one that served the page (the “same origin policy”), which kept us from calling out to library middleware from the 360 Link results screen. This limitation regarding hostname configuration has other undesirable consequences as well, discussed below.
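The rule the browser applies can be illustrated in a couple of lines: two URLs share an origin only when scheme, host and port all match, so a results page served from the Serials Solutions host and a module on library.uvic.ca are necessarily different origins. The helper name below is hypothetical.

```javascript
// Two URLs share an origin only if scheme, host and port all match.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// A 360 Link results page and a library-hosted module: different hosts.
console.log(sameOrigin(
  'http://lg5jh7pa3n.search.serialssolutions.com/?sid=example',
  'http://library.uvic.ca/extfiles/360Link/voyager/index.php'
)); // false
```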
Transitioning the service
After many hacks and interface tweaks, it was time to move 360 Link into production. Cutting over a link resolver can be a time-consuming business, as links to the old resolver must be updated in every external resource that points to it. In practice, that often means updates to dozens or even hundreds of database interfaces, which is why it is not uncommon for sites transitioning between link resolvers to run the new one and the old one in tandem for a while to give their technical services staff time to make the necessary updates.
We did not want to run SFX and 360 Link in production simultaneously, for several reasons:
- The version of SFX we were running was reaching end of life at the end of 2011. We did not wish to upgrade a service we were about to phase out.
- Keeping both in production would have meant our technical services staff would have had to continue updating two knowledgebases even after the new service went live.
- Resource constraints in our technical services department meant that it was going to take several months to update all the third-party databases linking to our resolver.
- Linking to similar but different link resolvers might have confused some of our users.
In order to cut over cleanly, we changed the configuration of the webserver on host ‘sfx.uvic.ca’ to forward all OpenURL requests to our 360 Link instance at ‘lg5jh7pa3n.search.serialssolutions.com’ using Apache’s mod_rewrite module. The result was seamless for users, and allowed us to decommission SFX (the software, not the server) as soon as 360 Link went into production.
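The Apache rule itself is not reproduced in the article. As a rough illustration of what it has to accomplish, the JavaScript sketch below performs the same mapping: keep the OpenURL query string intact and swap in the 360 Link host. The assumption that the 360 Link instance accepts OpenURLs at its root path, and the ‘sfx_local’ path in the example, are ours.

```javascript
// Base URL of the 360 Link instance (assumes OpenURLs are accepted at the root path).
const NEW_BASE = 'http://lg5jh7pa3n.search.serialssolutions.com/';

// Map a request for the old SFX host onto 360 Link, carrying the OpenURL
// query string across unchanged. Returns null when there is no query string.
function forwardToResolver(oldUrl, newBase = NEW_BASE) {
  const { search } = new URL(oldUrl); // includes the leading '?'
  return search ? newBase + search : null;
}

// 'sfx_local' is a hypothetical SFX path used only for illustration.
console.log(forwardToResolver('http://sfx.uvic.ca/sfx_local?sid=google&genre=article&title=X'));
// http://lg5jh7pa3n.search.serialssolutions.com/?sid=google&genre=article&title=X
```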
Hostname issues
Of course, it would have been even easier to transition the service had we originally chosen a more generic hostname for our link resolver (for example, ‘resolver.library.uvic.ca’), and been able to transfer that hostname to the new service.
Neither condition applied in this case, however. By calling our SFX server ‘sfx.uvic.ca’ we guaranteed that our technical services staff would have to do more work than would otherwise have been necessary during a link resolver transition. Moreover, as it turned out, Serials Solutions was not able to assign an institutional hostname to a hosted instance of its link resolver, so even if we had had the foresight to choose a generic hostname we would not have been able to apply it to our new service.
This was somewhat unexpected, as it is not a limitation of hosted services in general. Typically, the customer can set up a CNAME alias in their DNS, while the external host updates their webserver configuration to associate the alias with the customer’s instance on their server. Unfortunately the best we could have done in this case would have been to set up a reverse proxy, a solution that would have had many of the disadvantages of hosting our own interface discussed in the introduction.
Apart from making it easier to transition to a new service, a local hostname would have enabled us to avoid the same-origin restrictions mentioned above.
Conclusions
I had three reasons for documenting our link resolver transition: first, on the chance that our experience will be useful to other sites contemplating a similar transition; second, to share the code that we created; and third, to lay out some background that might be useful for sites (including our own) contemplating the transition to vendor-hosted technologies more generally.
While the last is quite a large topic, I believe it is possible to generalize some points from our 360 Link implementation that have relevance to that broader scope:
Standards (even flawed ones) remain critical to achieving interoperability. No matter how feature-rich our hosted systems become, they will never do everything we want them to do. This means they will need to interoperate with a range of external systems, some of which have yet to be envisioned, let alone developed. Standards are critical for this. As we have seen above, systems developed around the OpenURL standard enabled us to develop supporting services, integrate with external systems, and efficiently transition from one service to another.
Where possible, develop to standards rather than proprietary APIs. Because most of our supporting services were built around the OpenURL standard, they should be portable in future. Developing extensively on top of a proprietary API can be considered a form of lock-in, as the decision to transition services means either jettisoning or extensively rewriting locally developed code. (Note that this applies just as much, if not more, to extensive customizations made to locally hosted systems.)
Consider your exit strategy. Looking forward, it is fairly obvious that we lost ground in this respect. It will not be possible to replicate our clean cut-over from SFX to 360 Link should we ever need to transition to another link resolver, because we do not ‘own’ the hostname this time around, nor do we have the ability to rewrite 360 Link URLs.
Assign a local hostname, if possible. As a corollary to the preceding point, there is an obvious administrative advantage to having the ability to assign a local hostname to a remotely-hosted service. There are also advantages from a development perspective, as it is possible to override some of the javascript ‘same origin’ restrictions for different hosts on the same domain. As customers, we need to ensure our vendors provide this capability, particularly as more of our services migrate to a hosted environment.
For libraries with technical staff, the option of working through vendor technical support is not always optimal. Achieving interoperability with local systems is often a matter of making many small tweaks after observing and testing the behaviour of applications in the wild. Even with the most responsive vendor support, this kind of iterative tinkering is always more efficient if it can be done directly.
Libraries typically need to enhance the functionality of vendor-supplied systems to meet local requirements, whether these be the addition of features or integration with the Library’s other systems and workflows. When locally hosted systems migrate offsite, it is important to retain as much of this capability as possible, while still maintaining the advantages of a hosted installation. Delivering the core service through the hosted interface while supporting the bulk of our enhancements locally is a functional split that has been working well, though not perfectly, in this context.
Notes
[1]. For example, Umlaut
https://github.com/team-umlaut/umlaut/wiki
[2]. All of the code referenced in this article is available from the UVicLinkResolver Github Repository:
https://github.com/jdurno/UVICLinkResolver.
The PHP OpenURL Parser class is available within the repository at:
https://github.com/jdurno/UVICLinkResolver/blob/master/modules/includes/contextobject.inc.php
[3]. Crossref/DOI class:
https://github.com/jdurno/UVICLinkResolver/blob/master/modules/includes/crossref.inc.php
[4]. For example, see the code for the problem report form at:
https://github.com/jdurno/UVICLinkResolver/tree/master/modules/problem
[5]. OpenURL -> Voyager module:
https://github.com/jdurno/UVICLinkResolver/blob/master/modules/voyager/index.php
[6]. OpenURL -> Relais module:
https://github.com/jdurno/UVICLinkResolver/blob/master/modules/relais/index.php
[7]. OpenURL -> Google Scholar module:
https://github.com/jdurno/UVICLinkResolver/blob/master/modules/google/index.php
[8]. See the anonymous function at the top of:
https://github.com/jdurno/UVICLinkResolver/blob/master/modules/popitup.js
[9]. Serials Solutions technical support, August 15, 2012
[10]. Serials Solutions technical support, August 8, 2012
References
NISO. (2004). Registry for the OpenURL Framework – ANSI/NISO Z39.88-2004. Retrieved from:
http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=Identify
Talsky, D. (2008). Auto-Populating an ILL form with the Serial Solutions Link Resolver API. Code4Lib Journal, (4). Retrieved from: http://journal.code4lib.org/articles/108
Acknowledgements
Much of the technical work described above was done by the UVic Libraries Web Developer, Ben Sheaff, and Senior Unix Administrator Sandy Gordon. The project benefitted enormously from their efforts and expertise.
About the Author
John Durno (jdurno <at/> uvic <dot/> ca ) manages systems at the University of Victoria Libraries on Vancouver Island, British Columbia.