by Donald Moses and Kirsta Stapelfeldt
The Evolution of UPEI’s IR
The provision of an Institutional Repository (IR) service has been a priority for the Robertson Library since 2008, with the release of the University of Prince Edward Island’s Strategic Research Plan 2008-2018. That document included as a strategic priority the development of “a publicly accessible institutional repository which will contain citations for all research conducted at UPEI, as well as, in accord with copyright protocols, access to copies of our research documents, publications, and reports.” [1] The first version of IslandScholar.ca was created shortly afterward. It included seeding the repository with faculty citations harvested from RefWorks, tying authentication in with UPEI’s LDAP service, the ability for faculty to add their own citations and attach the related publication, and integration with Sherpa/Romeo [2], OpenURL, and COinS [3].
In February 2012, the University released a new policy entitled Open Access & Dissemination of Research Output [4], which encourages:
- scholars to deposit their scholarly research in IslandScholar,
- scholars to publish in journals that allow scholars to retain copyright in their work,
- scholars to consider open access and affordable / sustainable scholarly communication venues,
- scholars to deposit other types of research output in IslandScholar (preprints, research data, etc.),
- students to deposit their honours, Master’s, or PhD theses on acceptance.
The policy specifies additional requirements, such as ensuring that content is harvestable and that deposited research data is linked to the final scholarly output. To fulfill these and other needs, a set of requirements was drafted for IslandScholar v2, and the new service was launched in October 2012. As with the first version of IslandScholar, the system was designed and is delivered using Islandora, software first developed by the Robertson Library.
Summary of Changes
IslandScholar V2 received a design update with a new, responsive theme that enables the site to be accessed from most mobile devices. The data structure in the repository was modified to include additional objects and relationships, with the aim of improving the granularity and accuracy of the system’s ontology and facilitating new metrics in the future. Other features include the ability for users to create and format citation lists and to access scholar-specific RSS feeds. The data in the repository is harvestable using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). An administrative user can now set an embargo date for content and expect the embargo to be lifted automatically. Scholar deposit has also become simpler, and the project has recently begun writing code to expose alternative metrics, creating additional value for scholars seeking to understand the scope of their scholarly impact, for IR administrators seeking to improve impact and guide researchers, and for administrative groups.
About the Islandora Architecture
Islandora is an open-source software framework designed to help institutions and organizations, and their audiences, collaboratively manage and discover digital assets using a best-practices framework. Islandora was originally developed by the University of Prince Edward Island’s Robertson Library, but is now implemented and contributed to by an ever-growing international community.
Islandora allows digital assets to be stewarded following best practices for asset modelling, by defining packages of data assets, derivatives, and XML-based administrative, technical, and descriptive metadata, with relationships to other data assets described in RDF. Islandora is designed not only to serve best practices for digital preservation but also to allow users to manage and discover assets, and diverse user groups to create digital exhibitions, perform and collaborate on research, and enrich resources.
A number of open source applications form the structure of the Islandora framework, which can be extended by integrating additional applications to meet new requirements. The front end of Islandora is provided by the popular content management system Drupal [5]. Elements of Islandora are templated so that Drupal’s theming layer can be extended to them (for example, object displays and behaviors, and search results displays). Drupal’s Form API is also used to build and edit forms for any XML-based metadata schema. Within Drupal, workflows can be orchestrated and numerous roles and permissions assigned, and Drupal supports integration with external identity stores using LDAP and Shibboleth.
The Islandora code layer provides integration with numerous third-party applications, either as direct plug-ins (e.g., ImageMagick [6], FITS [7], LAME [8], Tesseract [9], Internet Archive Bookreader [10], Open Seadragon [11]) or through the Islandora micro services engine [12]. These applications can be called upon to create derivatives and to perform data transformations or extraction; the resulting data is then stored in the digital object repository layer, built using FedoraCommons [13] software.
Micro services in Islandora leverage the JMS server that comes packaged with FedoraCommons. The FedoraCommons repository models and stores data assets, provides the repository’s RDF triple store (Mulgara), and governs security policies (written in XACML [14]) for access to and management of objects. In Islandora 7, a new API, Tuque [15], simplifies the process of interacting with and creating FedoraCommons data. FedoraCommons’ Generic Search Service (GSearch) processes the data and passes it to Solr, where it is indexed by Lucene. Islandora includes a Solr Client Module [16] that provides an administrative interface for configuring search, results, and facets. For an overview of this framework, see the diagram below.
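To make the micro services pattern concrete, the following Python sketch shows only the dispatch step a listener might perform after receiving a message from Fedora's JMS server. It assumes each message body is a Fedora 3-style Atom entry whose `<title>` names the API-M method (such as `addDatastream`), whose `<summary>` holds the object PID, and which carries a `<category>` with scheme `fedora-types:dsID` for the datastream ID. The broker connection is omitted, and `make_derivatives` and the `HANDLERS` table are hypothetical names, not Islandora code.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_fedora_message(body: str) -> dict:
    """Extract method name, PID and datastream ID from an Atom-style message body.

    Assumes the Fedora 3 API-M notification layout described above.
    """
    entry = ET.fromstring(body)
    method = entry.findtext(f"{ATOM}title", default="").strip()
    pid = entry.findtext(f"{ATOM}summary", default="").strip()
    dsid = None
    for category in entry.findall(f"{ATOM}category"):
        if category.get("scheme") == "fedora-types:dsID":
            dsid = category.get("term")
    return {"method": method, "pid": pid, "dsid": dsid}


def make_derivatives(pid: str, dsid: str) -> str:
    # Hypothetical handler: a real service would fetch the datastream, run a tool
    # such as ImageMagick or Tesseract, and add the derivative back to the object.
    return f"derivatives queued for {pid}/{dsid}"


# Hypothetical routing table: which API-M methods should trigger derivative creation.
HANDLERS = {
    "addDatastream": make_derivatives,
    "modifyDatastreamByReference": make_derivatives,
    "modifyDatastreamByValue": make_derivatives,
}


def dispatch(body: str):
    """Route one message to a handler, or return None if it is not relevant."""
    msg = parse_fedora_message(body)
    handler = HANDLERS.get(msg["method"])
    if handler is None or msg["dsid"] is None:
        return None
    return handler(msg["pid"], msg["dsid"])


sample = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title type="text">addDatastream</title>
  <category term="ir:123" scheme="fedora-types:pid" label="xsd:string"/>
  <category term="OBJ" scheme="fedora-types:dsID" label="xsd:string"/>
  <summary type="text">ir:123</summary>
</entry>"""

print(dispatch(sample))  # derivatives queued for ir:123/OBJ
```

Keeping parsing separate from routing means new derivative services can be added by registering another handler rather than changing the listener.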
Islandora releases “solution packs” designed to serve particular populations or types of data. The Robertson Library’s approach to modeling data in Islandora provides essential feedback for the ongoing development of an Institutional Repository solution pack. To learn more about the Islandora project and plans for solution packs, visit the project’s website at http://islandora.ca. The following describes the way that IslandScholar v2 leverages the Islandora architecture to provide a number of IR related services.
Modelling Metadata and Relationships in IslandScholar
IslandScholar contains collections of digital objects that adhere to Fedora content models. Within Fedora, content models define the composition of the digital objects (specifying what they contain — the “datastreams” — and how they behave — the “methods”). Currently the following types of objects are modeled in IslandScholar and the figure below illustrates their relationships:
- citations (both faculty publications and student theses)
- researchers
- departments
- grants
Citations
IslandScholar contains two content models related to citation-type metadata: faculty publications and student theses. Each of these objects (publications and theses) is composed of files (or “datastreams” in Fedora terminology). These files are described in Figure Three.
The student thesis content model is similar in structure, but the MODS metadata is extended with the <extension> element so that ETD-MS [19] metadata can be included. ETD-MS metadata is specific to electronic theses and dissertations and meets the metadata requirements for participation in Library and Archives Canada’s e-thesis harvesting program [20].
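The following Python sketch shows what a minimal thesis record with ETD-MS data inside the MODS `<extension>` element might look like. The ETD-MS namespace and the `degree` subelements (`name`, `level`, `discipline`, `grantor`) follow our reading of ETD-MS 1.0 [19] and should be checked against the standard; the `Thesis` genre term and the sample values are hypothetical, and the level vocabulary expected by Library and Archives Canada is not asserted here.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ETDMS_NS = "http://www.ndltd.org/standards/metadata/etdms/1.0/"
ET.register_namespace("mods", MODS_NS)
ET.register_namespace("etd", ETDMS_NS)


def q(ns, tag):
    """Return an ElementTree qualified name such as '{ns}tag'."""
    return f"{{{ns}}}{tag}"


def thesis_mods(title, author, degree_name, level, discipline, grantor):
    """Build a minimal thesis MODS record with ETD-MS degree data in <extension>."""
    mods = ET.Element(q(MODS_NS, "mods"))
    title_info = ET.SubElement(mods, q(MODS_NS, "titleInfo"))
    ET.SubElement(title_info, q(MODS_NS, "title")).text = title
    name = ET.SubElement(mods, q(MODS_NS, "name"), {"type": "personal"})
    ET.SubElement(name, q(MODS_NS, "namePart")).text = author
    ET.SubElement(mods, q(MODS_NS, "genre")).text = "Thesis"  # hypothetical local genre term

    # ETD-MS degree fields, nested inside the MODS <extension> element.
    extension = ET.SubElement(mods, q(MODS_NS, "extension"))
    degree = ET.SubElement(extension, q(ETDMS_NS, "degree"))
    for tag, value in (("name", degree_name), ("level", level),
                       ("discipline", discipline), ("grantor", grantor)):
        ET.SubElement(degree, q(ETDMS_NS, tag)).text = value
    return ET.tostring(mods, encoding="unicode")


record = thesis_mods("An example thesis", "Doe, Jane", "Master of Science",
                     "Masters", "Biology", "University of Prince Edward Island")
print(record)
```

Because the ETD-MS elements live in their own namespace inside `<extension>`, ordinary MODS consumers can ignore them while an ETD-MS transformation can pick them up.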
Researchers and Departments
Recognizing that many citations would have authors entered in a variety of ways, we chose to create authority records for the people and organizations represented in the data. We describe these entities using the Metadata Authority Description Schema (MADS), an XML schema that provides metadata about people and organizations (and other elements) [21]. MADS is a complementary schema to MODS and shares some of the same elements. Having authority records for researchers allows us to associate citations with the related researcher authority records, facilitating the creation of browse displays and search queries that return expected results. The figure below illustrates a researcher profile page for a faculty member; the page is composed of data from the MADS datastream, the thumbnail datastream from the researcher record, and data from objects that are related to the researcher.
Researcher records can be created in a number of ways within the IslandScholar system: through an associated import from the University’s LDAP system, adding a user through Drupal’s Users module, or by using Islandora’s XML form/Ingest framework. Basic data about researchers is contained within the University’s LDAP system; with the Drupal LDAP Integration module [23] enabled and configured, we are able to pull data directly from the campus LDAP server.
When a user is logged in with the administrator role, an Associate button is available for each citation. When selected, it searches both the LDAP directory and existing researcher records in the repository for likely name matches between the citation and researchers. This process highlights associations that already exist in LDAP and the repository, as well as other possible associations. The results are then presented to the user (see Figure Five). The administrator can then select and create the appropriate associations, which are written in RDF to the RELS-EXT datastream of the citation object.
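The article does not describe how likely matches are scored, so the following Python sketch substitutes a simple technique: normalize names, compare them with `difflib.SequenceMatcher`, and keep candidates above a threshold. It then serializes approved associations as Fedora-style RELS-EXT RDF/XML, with subjects and objects written as `info:fedora/` URIs. The `hasAuthor` predicate and its namespace are hypothetical placeholders; IslandScholar's actual vocabulary may differ, and the 0.8 threshold is an arbitrary example.

```python
import difflib
import unicodedata
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical vocabulary: the predicate IslandScholar writes may differ.
REL = "http://example.org/islandscholar/relations#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("rel", REL)


def normalize(name: str) -> str:
    """Lower-case, drop accents and punctuation, and turn 'Family, Given' into 'given family'."""
    name = unicodedata.normalize("NFKD", name)
    name = "".join(c for c in name if not unicodedata.combining(c))
    if "," in name:
        family, given = (part.strip() for part in name.split(",", 1))
        name = f"{given} {family}"
    name = "".join(c if c.isalnum() or c.isspace() else " " for c in name)
    return " ".join(name.lower().split())


def candidate_matches(citation_author, researchers, threshold=0.8):
    """Score every researcher name against a citation author; keep likely matches, best first."""
    target = normalize(citation_author)
    scored = []
    for pid, researcher_name in researchers.items():
        score = difflib.SequenceMatcher(None, target, normalize(researcher_name)).ratio()
        if score >= threshold:
            scored.append((round(score, 2), pid))
    return sorted(scored, reverse=True)


def rels_ext(citation_pid, researcher_pids):
    """Serialize the associations an administrator approved as RELS-EXT-style RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    about = {f"{{{RDF}}}about": f"info:fedora/{citation_pid}"}
    description = ET.SubElement(root, f"{{{RDF}}}Description", about)
    for pid in researcher_pids:
        resource = {f"{{{RDF}}}resource": f"info:fedora/{pid}"}
        ET.SubElement(description, f"{{{REL}}}hasAuthor", resource)
    return ET.tostring(root, encoding="unicode")


researchers = {"ir:mleggott": "Leggott, Mark", "ir:jdoe": "Doe, Jane"}
matches = candidate_matches("Leggott, M.", researchers)
print(matches)  # [(0.86, 'ir:mleggott')]
print(rels_ext("ir:123", [pid for _, pid in matches]))
```

The scoring only proposes candidates; as in the workflow described above, a person still confirms each association before anything is written to RELS-EXT.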
Occasionally the researcher record will not be available via the campus LDAP. In this case, to manually add a Researcher record, an authorized user navigates to the collection containing the researcher records, selects Add, selects the Authority content model and proceeds to the data entry form. Figures Six and Seven show a user’s view of how this process unfolds.
Grants
Grants have recently been added to the IslandScholar framework, reflecting a desire to flesh out the Researcher’s profile with additional data and to associate citations with the grants that funded them. This is currently a manual process in which we map a CSV export from UPEI’s Research Services to MODS. We are actively investigating the CASRAI Research Activity Profile [24] as a better match for modelling this data in the future. Grants are currently associated with researchers, and the data is displayed as a block on Researcher Profile pages.
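The CSV-to-MODS mapping is not spelled out in the article, so the following Python sketch is one possible mapping under stated assumptions. The column names (`grant_id`, `title`, `investigator`, `agency`, `start`, `end`), the role terms, and the use of `dateOther` for the grant period are all hypothetical, and the sample row is invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS)


def m(tag):
    """Qualify a tag name in the MODS namespace."""
    return f"{{{MODS}}}{tag}"


def add_name(parent, text, name_type, role):
    """Append a MODS <name> with a single text role term."""
    name = ET.SubElement(parent, m("name"), {"type": name_type})
    ET.SubElement(name, m("namePart")).text = text
    role_element = ET.SubElement(name, m("role"))
    ET.SubElement(role_element, m("roleTerm"), {"type": "text"}).text = role


def grant_to_mods(row):
    """Map one CSV row (hypothetical column names) to a MODS grant record."""
    mods = ET.Element(m("mods"))
    title_info = ET.SubElement(mods, m("titleInfo"))
    ET.SubElement(title_info, m("title")).text = row["title"]
    add_name(mods, row["investigator"], "personal", "researcher")
    add_name(mods, row["agency"], "corporate", "funder")
    origin = ET.SubElement(mods, m("originInfo"))
    for point in ("start", "end"):
        date = ET.SubElement(origin, m("dateOther"), {"encoding": "w3cdtf", "point": point})
        date.text = row[point]
    ET.SubElement(mods, m("identifier"), {"type": "local"}).text = row["grant_id"]
    return ET.tostring(mods, encoding="unicode")


def csv_to_mods(csv_text):
    """Return one MODS string per data row of the export."""
    return [grant_to_mods(row) for row in csv.DictReader(io.StringIO(csv_text))]


export = """grant_id,title,investigator,agency,start,end
G-001,Example project,"Doe, Jane",Example Funding Agency,2012-04-01,2015-03-31
"""
for record in csv_to_mods(export):
    print(record)
```

Keeping the mapping in one function per row makes it straightforward to swap in a different target schema, such as the CASRAI profile mentioned above, later on.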
Integration of Citation Style Language
In the first version of IslandScholar, a pre-defined, generic citation display was created. To provide options for citation formats for both display and export, one of the goals of this version of IslandScholar was to integrate the Citation Style Language as a method to reliably generate citations formatted in particular styles. Citation Style Language (CSL) is “an open XML-based language to describe the formatting of citations and bibliographies.” [25] There is a large collection of community contributed styles [26] available for download and these styles can be added to and used by IslandScholar. Figure Eight (below) illustrates the styles already added to IslandScholar and the option for adding additional styles. A list of currently installed styles appears, and a user with the appropriate permissions can upload a new style, which then becomes available in the system.
The IslandScholar administrator can set a default style for the entire system within the Scholar Administration panel (See Figure Nine).
A CSL editor [27] has recently been released that can be used to author your own custom styles if one from the community does not fit your use case. For example, at UPEI some researchers would like to be able to output their citations in the format required by particular funders and a custom style could accommodate this request.
To generate formatted citations from the MODS metadata in IslandScholar, developers on the project first wrote PHP code [28] to convert the MODS metadata stored in IslandScholar into the CSL JSON format. The code parses the MODS and outputs it in a format that conforms to CSL; for example, a “Journal Article” (a local genre term in the MODS metadata) is transformed to “article-journal” in the corresponding JSON. Names and dates are also reformatted to match CSL requirements, and for many other fields XPath queries of the MODS are used to populate CSL variables. The resulting CSL JSON is then handed to the citeproc-js [29] JavaScript module, the style is applied, and a formatted citation is the result.
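The project's converter is written in PHP [28]; the following Python sketch only illustrates the same idea: map the local genre to a CSL item type, turn MODS names into CSL name objects, turn a date such as `2012-10` into CSL `date-parts`, and fill other variables with XPath-style queries. Only the "Journal Article" to "article-journal" mapping comes from the text. The other genre terms, the assumption that every `<name>` is an author, and the use of `originInfo/dateIssued` for the date are simplifications. The resulting dictionary is the kind of CSL JSON item a processor such as citeproc-js would consume.

```python
import json
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Local genre term -> CSL item type. "Journal Article" comes from the text;
# the other local terms are hypothetical examples.
GENRE_TO_CSL = {"Journal Article": "article-journal", "Book": "book", "Thesis": "thesis"}

HOST = "mods:relatedItem[@type='host']"
# CSL variable -> XPath-style path into the MODS record.
SIMPLE_FIELDS = {
    "title": "mods:titleInfo/mods:title",
    "container-title": f"{HOST}/mods:titleInfo/mods:title",
    "volume": f"{HOST}/mods:part/mods:detail[@type='volume']/mods:number",
    "issue": f"{HOST}/mods:part/mods:detail[@type='issue']/mods:number",
}


def text(node, path):
    """Return the stripped text at path, or None if it is missing or empty."""
    value = node.findtext(path, namespaces=NS)
    return value.strip() if value else None


def mods_to_csl(mods_xml, item_id):
    """Convert one MODS record into a CSL JSON item (a dict)."""
    mods = ET.fromstring(mods_xml)
    item = {"id": item_id, "type": GENRE_TO_CSL.get(text(mods, "mods:genre"), "article")}
    for variable, path in SIMPLE_FIELDS.items():
        value = text(mods, path)
        if value:
            item[variable] = value

    # Names become CSL name objects; this sketch treats every <name> as an author.
    authors = []
    for name in mods.findall("mods:name", NS):
        family = text(name, "mods:namePart[@type='family']")
        if family:
            authors.append({"family": family,
                            "given": text(name, "mods:namePart[@type='given']") or ""})
    if authors:
        item["author"] = authors

    # A date such as "2012-10" becomes CSL date-parts: [[2012, 10]].
    issued = text(mods, "mods:originInfo/mods:dateIssued")
    if issued:
        parts = [int(p) for p in issued.split("-")[:3] if p.isdigit()]
        if parts:
            item["issued"] = {"date-parts": [parts]}

    start = text(mods, f"{HOST}/mods:part/mods:extent[@unit='page']/mods:start")
    end = text(mods, f"{HOST}/mods:part/mods:extent[@unit='page']/mods:end")
    if start:
        item["page"] = f"{start}-{end}" if end else start
    return item


sample = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>An example article</title></titleInfo>
  <name type="personal"><namePart type="given">Jane</namePart><namePart type="family">Doe</namePart></name>
  <genre>Journal Article</genre>
  <originInfo><dateIssued>2012-10</dateIssued></originInfo>
  <relatedItem type="host">
    <titleInfo><title>Example Journal</title></titleInfo>
    <part>
      <detail type="volume"><number>7</number></detail>
      <extent unit="page"><start>10</start><end>20</end></extent>
    </part>
  </relatedItem>
</mods>"""
print(json.dumps(mods_to_csl(sample, "ir:123"), indent=2))
```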
Importing and Exporting Citation Metadata
Importing/Ingesting citations into the Institutional Repository
IslandScholar leverages a number of different technologies to provide users with a choice of formats to be used when adding citations into the repository. The IslandScholar application currently supports the import of citations in RIS [30], EndNote XML [31], RefWorks XML, or MODS bibliographic formats. A RIS, EndNote XML, or RefWorks XML document can generate one or multiple digital objects on ingest (one for each record within the uploaded document). The Islandora 7 version of the Scholar solution pack module (currently under development) provides additional support for single and batch ingest using unique identifiers such as digital object identifiers (DOIs) and PubMed IDs (PMIDs).
At the Robertson Library, the primary tool used to manage and harvest citation metadata is RefWorks. Citations are harvested into RefWorks and then edited to include IslandScholar-specific information (identifiers for scholars and departments). Once the citations have been enriched, they are exported as RefWorks XML files. Figure Ten illustrates the options for a user when ingesting a RefWorks file of citations into the citation collection.
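The "one object per record" behaviour can be illustrated with RIS, where each record runs from a `TY` tag to an `ER` tag and each line has the form `XX  - value`. The following Python sketch splits a RIS file into one dictionary per record, each of which would become one digital object. It is not IslandScholar's code (which hands RIS to bibutils), and it ignores continuation lines for brevity.

```python
import re

# A RIS tag line: two-character tag, two spaces, a hyphen, then an optional value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  -(?: (.*))?$")


def split_ris(text):
    """Split RIS text into records: one dict of tag -> list of values per TY...ER block."""
    records, current = [], None
    for raw in text.splitlines():
        match = TAG_LINE.match(raw.rstrip())
        if not match:
            continue  # blank lines and continuation lines are ignored in this sketch
        tag, value = match.group(1), (match.group(2) or "").strip()
        if tag == "TY":
            current = {"TY": [value]}  # start of a new record
        elif tag == "ER":
            if current is not None:
                records.append(current)  # end of record: one future digital object
            current = None
        elif current is not None:
            current.setdefault(tag, []).append(value)
    return records


sample = """TY  - JOUR
AU  - Doe, Jane
TI  - First example
ER  - 

TY  - BOOK
AU  - Roe, Richard
TI  - Second example
ER  - 
"""
for record in split_ris(sample):
    print(record["TY"][0], record["TI"][0])
```

Repeated tags such as `AU` are kept as lists because a citation usually has more than one author.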
During RefWorks XML import, IslandScholar reads the Reference Type <rt> element from the RefWorks record, passes it through a matching XSLT and transforms the RefWorks XML to MODS. The relevant RefWorks XML is shown in Figure Eleven.
The RefWorks record is parsed, and since <rt> is Journal Article, the record is passed through the refworks_to_mods_journal.xsl [32] transformation (there are other transformations for other genres). Figure Twelve shows the resulting MODS record that is stored in IslandScholar.
For files in the RIS or EndNote format, IslandScholar uses bibutils [33] to transform those formats to standardized MODS metadata. Bibutils is a set of command line tools used to transform various bibliographic formats to MODS.
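The two ingest paths described above (an XSLT chosen by the RefWorks `<rt>` value, or a bibutils converter for RIS and EndNote) can be sketched as a small dispatcher. Only `refworks_to_mods_journal.xsl` is named in the text; the book stylesheet name is hypothetical, and the RefWorks element names (`<reference>`, `<rt>`) are assumed from the description. `ris2xml` and `endx2xml` are bibutils converters that write MODS to standard output; running them requires bibutils on the `PATH`.

```python
import subprocess
import xml.etree.ElementTree as ET

# RefWorks <rt> value -> stylesheet. Only refworks_to_mods_journal.xsl is named in the
# text; the book entry is a hypothetical example of "other transformations".
REFWORKS_XSLT = {
    "Journal Article": "refworks_to_mods_journal.xsl",
    "Book, Whole": "refworks_to_mods_book.xsl",
}

# bibutils converters that read a format and write MODS XML to standard output.
BIBUTILS = {"ris": "ris2xml", "endnote-xml": "endx2xml"}


def reference_types(refworks_xml):
    """Return the <rt> value of each <reference> in a RefWorks XML export (element names assumed)."""
    root = ET.fromstring(refworks_xml)
    return [ref.findtext("rt", default="").strip() for ref in root.iter("reference")]


def import_plan(fmt, rt=None):
    """Decide how an upload becomes MODS: ('xslt', file), ('bibutils', command) or ('none', None)."""
    if fmt == "refworks-xml":
        if rt not in REFWORKS_XSLT:
            raise ValueError(f"no stylesheet for RefWorks type {rt!r}")
        return ("xslt", REFWORKS_XSLT[rt])
    if fmt in BIBUTILS:
        return ("bibutils", BIBUTILS[fmt])
    if fmt == "mods":
        return ("none", None)  # already MODS, nothing to transform
    raise ValueError(f"unsupported format {fmt!r}")


def run_bibutils(command, path):
    """Run a bibutils converter on a file and return the MODS it prints."""
    result = subprocess.run([command, path], capture_output=True, text=True, check=True)
    return result.stdout


export = "<refworks><reference><rt>Journal Article</rt></reference></refworks>"
for rt in reference_types(export):
    print(import_plan("refworks-xml", rt))  # ('xslt', 'refworks_to_mods_journal.xsl')
print(import_plan("ris"))  # ('bibutils', 'ris2xml')
```

Failing loudly on an unknown `<rt>` value, rather than falling back to a generic stylesheet, surfaces genres that still need a transformation.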
Exporting / Sharing Data
Bibliography
A new feature added to IslandScholar V2 is the ability to select citations and add them to a list. The resulting list can be managed, and bibliographies generated in user-specified styles can be exported in several file formats. Users can add citations to a list from any search result page or from a researcher’s profile. Figure Thirteen illustrates a search result with two citations that have been added to a list.
When users wish to view or export the selected citations, they select the MY LIST option and choose the desired output citation format. Several export options are available, including HTML, RTF, PDF, and RIS. Figure Fourteen illustrates the options provided to a user.
In the case illustrated in Figure Fourteen, the HTML export option has been selected. An HTML export passes the citations through CSL and outputs the result to the screen (see Figure Fifteen).
RTF and PDF exports are handled by bibutils and the selected styles are applied. Bibutils also generates the tagged RIS export [34].
OAI-PMH
Another goal of IslandScholar V2 (and a requirement for e-thesis harvesting) was the integration of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). This was achieved by enabling and configuring the Islandora OAI module [35], which transforms the MODS datastream of digital objects in Islandora and exposes the result to OAI harvesters. Once the OAI module has been configured, a harvester can be pointed to particular collections. OAI output is governed by the module, and the MODS datastream is passed through different stylesheets to return the requested metadata. For example, a MODS to OAI-DC stylesheet is applied to return results from the researchers’ citation collection as OAI-DC. An OAI request like http://www.islandscholar.ca/oai2?verb=ListRecords&metadataPrefix=oai_dc&set=ir_citationCollection will provide the results shown in Figure Sixteen.
Breaking down this request: the harvester has asked for a list of records (verb=ListRecords) in the OAI DC metadata format (metadataPrefix=oai_dc) for the records contained in the repository’s ir:citationCollection collection (set=ir_citationCollection). In order for our e-theses to be harvested, IslandScholar must make ETD-MS metadata available. To facilitate this, we configure the Islandora OAI module to include a transformation [36] of MODS to ETD-MS and to expose the resulting metadata. For example, a request like http://www.islandscholar.ca/oai2?verb=ListRecords&metadataPrefix=oai_etdms&set=ir_thesisCollection will provide the results shown in Figure Seventeen.
In this case the harvester has requested records in the ETD-MS metadata format (metadataPrefix=oai_etdms) from the repository’s ir:thesisCollection collection (set=ir_thesisCollection). Figures Sixteen and Seventeen illustrate the ways that separate metadata profiles can be exposed. The module that enables this function was developed generically and includes a variety of configuration options. Figure Eighteen shows a screenshot of one section of the administration screen.
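From the harvester's side, the requests above follow the standard OAI-PMH pattern: send `ListRecords` with a `metadataPrefix` and optional `set`, then keep requesting with the `resumptionToken` from each response until none is returned (the protocol requires the token to be sent on its own). The following Python sketch builds those URLs and parses identifiers from a response page. The base URL is taken from the text and may no longer respond, so the network loop is shown but not called here.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base, prefix=None, set_spec=None, token=None):
    """Build a ListRecords URL; a resumptionToken must be sent without other arguments."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base}?{urllib.parse.urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one ListRecords response."""
    root = ET.fromstring(xml_text)
    ids = [header.findtext(f"{OAI}identifier") for header in root.iter(f"{OAI}header")]
    token = root.findtext(f".//{OAI}resumptionToken")
    # An empty or missing resumptionToken means this was the last page.
    token = (token or "").strip() or None
    return ids, token


def harvest(base, prefix, set_spec=None):
    """Yield record identifiers across all pages, following resumption tokens."""
    url = list_records_url(base, prefix, set_spec)
    while url:
        with urllib.request.urlopen(url) as response:
            ids, token = parse_page(response.read().decode("utf-8"))
        yield from ids
        url = list_records_url(base, token=token) if token else None


# Reproduces the ETD-MS request discussed above.
print(list_records_url("http://www.islandscholar.ca/oai2", "oai_etdms", "ir_thesisCollection"))
```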
RSS
IslandScholar V2 also provides a feed of each researcher’s citations, updated when new publications are associated with the researcher. This is valuable for researchers who wish to embed an RSS feed in a faculty profile page or make similar use of an up-to-date list of publications (often required for tenure decisions, etc.). In IslandScholar, a basic RSS feed for a researcher is generated by passing the results of a Solr query through an XSLT. This is a feature that we hope to develop more fully in future work. Figure Nineteen shows a sample feed [37].
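IslandScholar builds the feed by passing Solr results through an XSLT; the following Python sketch swaps that XSLT step for ElementTree to show the shape of the output. It assumes the Solr documents have already been retrieved, and their field names (`PID`, `title`, `date`) are hypothetical. The object URL pattern follows the researcher profile link in note [22].

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def researcher_feed(researcher_pid, docs, object_url):
    """Build an RSS 2.0 feed from Solr-style result documents (field names hypothetical)."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Publications for {researcher_pid}"
    ET.SubElement(channel, "link").text = object_url + researcher_pid
    ET.SubElement(channel, "description").text = "Citations associated with this researcher"
    # Newest citations first, as feed readers expect.
    for doc in sorted(docs, key=lambda d: d["date"], reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = doc["title"]
        ET.SubElement(item, "link").text = object_url + doc["PID"]
        ET.SubElement(item, "guid", {"isPermaLink": "false"}).text = doc["PID"]
        ET.SubElement(item, "pubDate").text = format_datetime(doc["date"])  # RFC 822-style date
    return ET.tostring(rss, encoding="unicode")


docs = [
    {"PID": "ir:101", "title": "Older example", "date": datetime(2011, 5, 1, tzinfo=timezone.utc)},
    {"PID": "ir:202", "title": "Newer example", "date": datetime(2012, 10, 1, tzinfo=timezone.utc)},
]
print(researcher_feed("ir:mleggott", docs, "http://www.islandscholar.ca/fedora/repository/"))
```

Using the object PID as a non-permalink `guid` lets feed readers recognize items they have already seen even if the display URL changes.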
A link to a researcher’s feed is available from their profile page.
Managing Access to Content
One of the primary methods used to manage access to Islandora content is the inclusion of a POLICY datastream in digital objects. POLICY datastreams are written in XACML, an XML-based language used to define access control policies [38] in Fedora. When combined with Drupal’s existing user authentication and roles, fine-grained access levels can be achieved at the collection, object, or datastream level. Figure Twenty shows a snippet from the POLICY datastream of a citation object that lists the users (fedoraAdmin, mleggott) and roles (administrator, Robertson Library) that can edit this object.
While every effort is made to ensure that content is accessible within the repository, in some cases associated content needs to be embargoed. The full text of student theses can be embargoed, and a global XACML policy applies to records that contain an embargo date. XACML policies can be difficult to write (a faulty policy can lock an object or even the entire repository), and although it is not used in the current version of IslandScholar, a visual XACML editor [39] is now part of the Islandora toolset. In a future version of IslandScholar, the XACML editor will be used to simplify the application and management of embargo policies.
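The decision logic described in the two paragraphs above can be modelled in a few lines. The following Python sketch is not a XACML engine and does not parse POLICY datastreams; it only mirrors the rules, combining for illustration the edit rule from Figure Twenty (named users or roles may edit) with a date-based embargo on full text. The `ObjectAccess` class and its `embargo_until` field are hypothetical, and the choice that editors can bypass the embargo is an assumption.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class ObjectAccess:
    """Simplified model of one object's POLICY rules plus an embargo date.

    Mirrors the decision logic only; it does not parse or evaluate XACML.
    """
    edit_users: Set[str] = field(default_factory=set)
    edit_roles: Set[str] = field(default_factory=set)
    embargo_until: Optional[date] = None  # hypothetical field; None means no embargo

    def can_edit(self, user: str, roles: Set[str]) -> bool:
        # Editing is denied unless the user is named or holds one of the named roles.
        return user in self.edit_users or bool(self.edit_roles & roles)

    def can_read_full_text(self, user: str, roles: Set[str], today: date) -> bool:
        # Editors always see the file; others only once the embargo date is reached.
        if self.can_edit(user, roles):
            return True
        return self.embargo_until is None or today >= self.embargo_until


thesis = ObjectAccess(
    edit_users={"fedoraAdmin", "mleggott"},
    edit_roles={"administrator", "Robertson Library"},
    embargo_until=date(2014, 1, 1),
)
print(thesis.can_edit("mleggott", set()))  # True
print(thesis.can_read_full_text("guest", {"anonymous user"}, date(2013, 6, 1)))  # False
print(thesis.can_read_full_text("guest", {"anonymous user"}, date(2014, 1, 1)))  # True
```

Because the embargo check compares against the current date at request time, the embargo "lifts itself" without anyone editing the policy, which matches the automatic lifting described earlier in the article.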
Scholar deposit
As with the first iteration of IslandScholar, the Robertson Library chose to ‘seed’ the repository with the collected citations of scholars at UPEI and to encourage researchers to contribute whichever version of each article is permitted by the publishing policy of the journal in which it was published. To help scholars decide which version of their work can legally be submitted, IslandScholar V2 retains and updates the integration with the Sherpa/Romeo API, which lets researchers review publisher copyright and self-archiving policies. In IslandScholar V2, researchers can also append any relevant associated research data to a citation by creating a .zip file of the materials and attaching it during the submission process. Figure Twenty-one illustrates the screen that is presented to a scholar appending documents to a citation.
Islandora’s microservices convert the uploaded file to PDF/A for long-term preservation. The system supports any office-type format, e.g., .doc, .docx, .odt, and .xls. The resulting files are stored as part of the citation object.
Metrics
IslandScholar V2 provides more advanced tools for measuring article downloads and page views, and has begun integrating altmetrics into the tool set. Figure Twenty-two shows how this appears for scholars. Scholars log into the system and can view all citations, with columns showing the date each citation was last viewed and the total number of times it has been viewed.
Navigating directly to the record will show a user how many times an article has been downloaded, and when it was last downloaded (See Figure Twenty-three).
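The article does not describe how views and downloads are recorded, so the following Python sketch assumes a hypothetical event log of `(pid, kind, timestamp)` tuples and shows how the figures displayed to scholars (total views and downloads, plus the most recent date of each) can be derived from it.

```python
from datetime import datetime

# Event kind -> (counter field, "last seen" field) in the per-object summary.
FIELDS = {"view": ("views", "last_viewed"), "download": ("downloads", "last_downloaded")}


def summarize(events):
    """Tally views and downloads per object PID and keep the latest timestamp of each kind."""
    stats = {}
    for pid, kind, when in events:
        if kind not in FIELDS:
            continue  # ignore other event types
        count_key, last_key = FIELDS[kind]
        entry = stats.setdefault(pid, {"views": 0, "downloads": 0,
                                       "last_viewed": None, "last_downloaded": None})
        entry[count_key] += 1
        if entry[last_key] is None or when > entry[last_key]:
            entry[last_key] = when
    return stats


events = [  # hypothetical log entries
    ("ir:123", "view", datetime(2013, 3, 1, 9, 30)),
    ("ir:123", "download", datetime(2013, 3, 1, 9, 31)),
    ("ir:123", "view", datetime(2013, 3, 4, 14, 0)),
]
print(summarize(events)["ir:123"])
```

Comparing timestamps rather than relying on log order keeps the "last viewed" value correct even if events arrive out of sequence.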
As publishing platforms for scholarly data, and the ways scholars share and interact with research, have evolved, altmetrics has emerged as an approach that aims to account for activities outside the traditional peer-reviewed journal stream [40]. From the citation view, IslandScholar V2 provides altmetrics to scholars and visitors to the system. While this feature is still in development, we are optimistic about its potential to show just-in-time altmetrics alongside more traditional measures of popularity and impact (such as downloads and views). If an article in the system has a DOI, PubMed ID (PMID), or arXiv identifier, a badge appears in the citation’s full record view. Hovering over the badge shows a pop-up with altmetric statistics and a link to a more complete picture of the article’s impact. This process is visible in Figure Twenty-four.
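The article does not name the badge provider. Assuming it is Altmetric.com's embeddable badge, which (as we understand it) reads the identifier from a `data-doi`, `data-pmid`, or `data-arxiv-id` attribute on a `div` with class `altmetric-embed` and needs Altmetric's embed script on the page, the following Python sketch shows how a record view might choose which identifier to expose. The attribute names and badge options should be verified against Altmetric's documentation; the preference order is an illustrative choice.

```python
from html import escape

# Identifier key -> badge attribute, in order of preference (attribute names assumed).
ID_ATTRIBUTES = (("doi", "data-doi"), ("pmid", "data-pmid"), ("arxiv", "data-arxiv-id"))


def altmetric_badge(identifiers):
    """Return badge markup for the first available identifier, or None if there is none."""
    for key, attribute in ID_ATTRIBUTES:
        value = identifiers.get(key)
        if value:
            return ('<div class="altmetric-embed" data-badge-type="donut" '
                    f'data-badge-popover="right" {attribute}="{escape(value)}"></div>')
    return None  # no supported identifier, so no badge is shown


print(altmetric_badge({"pmid": "12345678"}))
print(altmetric_badge({}))  # None
```

Returning `None` when no identifier is present mirrors the behaviour described above: the badge appears only for records with a DOI, PMID, or arXiv identifier.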
Conclusion
UPEI is proud of the recent upgrade to IslandScholar and of the new tools that have been integrated to address emerging best practices and the needs of IR users. At the same time, the team is committed to ongoing iteration and development. Embracing an open, community-centric approach, IslandScholar will continue to evolve, developing new tools and taking advantage of emerging ones. This work is also part of the ongoing development of Islandora and Islandora solution packs. Members of the community are warmly welcomed to visit and explore IslandScholar.ca and to provide feedback to help guide future development.
References and Notes
[1] Strategic Research Plan 2008-2018. http://research.upei.ca/files/research/v9%20Senate%2022Apr08.pdf
[2] Sherpa/Romeo. http://www.sherpa.ac.uk/romeo/
[3] COinS. http://ocoins.info/
[4] Open Access & Dissemination of Research Output. https://cab.upei.ca/sites/default/files/attachments/OpenAccessandDisseminationofResearchOutput.pdf
[5] Drupal. http://drupal.org
[6] ImageMagick. http://www.imagemagick.org/ (used in a variety of solution packs for image conversion)
[7] FITS. https://code.google.com/p/fits/ (integrated into the Islandora framework with the Islandora FITS module, https://github.com/Islandora/islandora_fits)
[8] LAME. http://lame.sourceforge.net/ (integrated into the Islandora framework with the Audio Solution Pack, https://github.com/Islandora/islandora_solution_pack_audio)
[9] Tesseract. https://code.google.com/p/tesseract-ocr/ (integrated into the Islandora framework with the Islandora OCR module, https://github.com/Islandora/islandora_ocr)
[10] Islandora Internet Archive Bookreader. https://github.com/Islandora/islandora_internet_archive_bookreader
[11] Islandora Open Seadragon Viewer. https://github.com/Islandora/islandora_openseadragon
[12] There are at least two ‘flavours’ of micro services used in the Islandora context: Python-based (https://github.com/Islandora/islandora_microservices) and PHP-based (https://github.com/roblib/php_listeners).
[13] Fedora Commons. http://fedora-commons.org
[14] Fedora XACML Policy Writing Guide. https://wiki.duraspace.org/display/FEDORA34/Fedora+XACML+Policy+Writing+Guide
[15] Build, Access, Modify and Delete Fedora objects with the Tuque interface. https://github.com/Islandora/islandora/wiki/Build,-Access,-Modify-and-Delete-Fedora-objects-with-the-Tuque-interface
[16] Islandora Solr Search module. https://github.com/Islandora/islandora_solr_search
[17] Metadata Object Description Schema (MODS) – http://www.loc.gov/standards/mods/
[18] XACML (eXtensible Access Control Markup Language) is an XML-based policy language enforced by Fedora at both the repository-wide and object level. Read more about XACML and Fedora at https://wiki.duraspace.org/display/FEDORA36/XACML+Policy+Enforcement. XACML is also discussed in more detail in a later section of this article.
[19] ETD-MS: an Interoperability Metadata Standard for Electronic Theses and Dissertations. http://www.ndltd.org/standards/metadata/etd-ms-v1.00-rev2.html
[20] About Electronic Theses: Harvesting Program. http://www.collectionscanada.gc.ca/thesescanada/027007-9200-e.html
[21] Metadata Authority Description Schema (MADS). http://www.loc.gov/standards/mads/
[22] The Researcher Profile page for Mark Leggott. http://www.islandscholar.ca/fedora/repository/ir:mleggott
[23] LDAP Integration Module. http://drupal.org/project/ldap_integration
[24] CASRAI Research Activity Profile. http://dictionary.casrai.org/research-activity-profile-draft
[25] Citation Style Language. http://citationstyles.org/
[26] Citation Styles. https://github.com/citation-style-language/styles and http://citationstyles.org/
[27] CSL Style Editor. http://editor.citationstyles.org/about/
[28] IslandScholar – converter.php. https://github.com/roblib/islandora_scholar_upei/blob/master/modules/citeproc/generators/converter.php
[29] citeproc-js. https://bitbucket.org/fbennett/citeproc-js/wiki/Home
[30] RIS File Format. http://en.wikipedia.org/wiki/RIS_%28file_format%29
[31] EndNote. http://en.wikipedia.org/wiki/EndNote
[32] XSLT used to transform RefWorks journal citations to MODS. https://github.com/roblib/islandora_scholar_upei/blob/master/xsl/refworks_to_mods_journal.xsl
[33] Bibutils. http://sourceforge.net/p/bibutils/home/Bibutils/
[34] Islandora Bibliography Export. https://github.com/roblib/islandora_scholar_upei/blob/master/citation/bibliography/Export.inc
[35] Islandora OAI Module. https://github.com/Islandora/islandora_oai
[36] ETD-MS XSLT. https://github.com/Islandora/islandora_oai/blob/7.x/transforms/mods_to_etdms.xsl
[37] Sample RSS feed output. http://www.islandscholar.ca/rss/mleggott
[38] Fedora XACML Policy Writing Guide (Fedora 3.6). https://wiki.duraspace.org/display/FEDORA36/Fedora+XACML+Policy+Writing+Guide
[39] Islandora XACML Editor. https://github.com/Islandora/islandora_xacml_editor
[40] Learn more about Altmetrics at http://altmetrics.org/manifesto/
About the Authors
Donald Moses, MLIS, is the digital initiatives and systems librarian at the University of Prince Edward Island’s Robertson Library, participates in the Islandora community, and is a member of the management team that oversees the Virtual Research Environment (VRE), including IslandScholar, at UPEI.
Kirsta Stapelfeldt, MA, MLIS, is the manager of the Islandora project at UPEI’s Robertson Library. She has helped develop the VRE framework at UPEI and has served as a subject matter expert on a variety of projects.
The authors would like to acknowledge the programmers that have developed the IslandScholar code: Paul Pound and Richard Wincewicz.