Search Results

Showing 30 articles matching "python"

The Use of Python to Support Technical Services Work in Academic Libraries

Issue 58 | 2023-12-04

Maria Collins, Xiaoyan Song, and Sherri Schon

Technical services professionals in academic libraries are firmly committed to digital transformation and have embraced technologies and data practices that reshape their work to be more efficient, reliable, and scalable. Evolving systems, constantly changing workflows, and management of large-scale data are constants in the technical services landscape. Maintaining one’s ability to work effectively in this kind of environment involves embracing continuous learning cycles and incorporating new skills – which in effect means training people in a different way and re-conceptualizing how libraries provide support for technical services work. This article presents a micro lens into this space by examining the use of Python within a technical services environment. The authors conducted two surveys and eleven follow-up interviews to investigate how Python is used in academic libraries to support technical services work and to learn more about training and organizational support across the academic library community. The surveys and interviews conducted for this research indicate that understanding the larger context of culture and organizational support is critical to illustrating the complications of this learning space for technical services. Consequently, this article will address themes that affect skills building in technical services at both a micro and macro level.

Standardization of Journal Title Information from Interlibrary Loan Data: A Customized Python Code Approach

Issue 57 | 2023-08-29

Jennifer Ye Moon-Chung

Interlibrary loan (ILL) data plays a crucial role in making informed journal subscription decisions. However, journal titles and International Standard Serial Numbers (ISSNs) are often entered inaccurately by requestors, and the resulting inconsistent or incomplete data presents challenges when attempting to make use of the ILL data. This article introduces a solution utilizing customized Python code to standardize journal titles obtained from user-entered data. The solution incorporates a preprocessing workflow that filters out irrelevant information and employs Application Programming Interfaces (APIs) to replace inaccurate titles with precise ones based on retrieved ISSNs, ensuring data accuracy. The solution then presents the processed data in a dashboard format, highlighting the most requested journals and enabling librarians to interactively explore the data. By adopting this approach, librarians can make well-informed decisions and conduct thorough analysis, resulting in more efficient and effective management of library resources.
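
As an illustration of the general approach (not the article’s actual code), the sketch below looks up a canonical journal title for a user-entered ISSN. It assumes the public CrossRef journals endpoint and the requests library; the article’s own choice of APIs may differ.

    # Illustrative sketch: look up a registered journal title by ISSN.
    # Uses the public CrossRef journals endpoint; the article's API choice may differ.
    import re
    import requests

    def canonical_title(issn):
        """Return the registered title for an ISSN, or None if it cannot be resolved."""
        if not re.fullmatch(r"\d{4}-\d{3}[\dXx]", issn):
            return None  # skip malformed user-entered ISSNs
        resp = requests.get(f"https://api.crossref.org/journals/{issn}", timeout=10)
        if resp.status_code != 200:
            return None
        return resp.json()["message"].get("title")

    # Example: replace a mistyped user-entered title with the registered one.
    print(canonical_title("0028-0836"))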

Apples to Oranges: Using Python and the pymarc library to match bookstore ISBNs to locally held eBook ISBNs

Issue 56 | 2023-04-21

Mitchell Scott

To alleviate financial burdens faced by students and to secure the benefits shown to follow when no-cost materials are available to students (greater equity and access and an increase in student success metrics), more and more libraries are leveraging their collections and acquisition processes to provide no-cost eBook alternatives to students. It is now common practice for academic libraries to partner with their campus bookstore and to receive a list of print and eBook materials required for an upcoming semester. Libraries take these lists and use various processes and workflows, some extremely labor-intensive and others semi-labor-intensive, to identify which of these titles they already own as unlimited-access eBooks, and which titles could be purchased as unlimited-access eBooks. The most common way to match bookstore titles to already licensed eBooks is to search the bookstore-provided ISBN or title in the Library Management System (LMS), the Analytics and Reporting layer of the LMS, the Library Discovery Layer, or another homegrown process. While some searching could potentially be automated, depending on the available functionality of the LMS or its Analytics component, the difficulty lies in matching the bookstore ISBN, often the print ISBN, to the library eBook ISBN. This article will discuss the use of Python, the pymarc library, and library eBook MARC records to create an automated identification process that accurately matches bookstore lists to library eBook holdings.
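
A minimal sketch of this kind of matching is shown below; it is not the article’s code. It assumes illustrative file names and that print ISBNs live in 020 $a and related-eBook ISBNs in 776 $z of the library’s eBook MARC records.

    # Minimal sketch: gather ISBNs from eBook MARC records and match bookstore ISBNs.
    # Field/subfield choices (020 $a, 776 $z) and file names are illustrative assumptions.
    from pymarc import MARCReader

    def collect_isbns(marc_path):
        """Gather print (020 $a) and related-eBook (776 $z) ISBNs from a MARC file."""
        isbns = set()
        with open(marc_path, "rb") as fh:
            for record in MARCReader(fh):
                if record is None:
                    continue  # skip records pymarc could not parse
                for tag, code in (("020", "a"), ("776", "z")):
                    for field in record.get_fields(tag):
                        for value in field.get_subfields(code):
                            if value.strip():
                                isbns.add(value.split()[0].replace("-", ""))
        return isbns

    ebook_isbns = collect_isbns("ebook_records.mrc")
    bookstore_isbns = {"9780143127741", "9780262033848"}  # ISBNs from the bookstore list
    matches = bookstore_isbns & ebook_isbns
    print(f"{len(matches)} bookstore titles are already held as library eBooks")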

Utilizing R and Python for Institutional Repository Daily Jobs

Issue 56 | 2023-04-21

Yongli Zhou

In recent years, the programming languages R and Python have become very popular and are used across many professions. They are not limited to data scientists or programmers; they can also help librarians perform many tasks more efficiently and achieve goals that were previously impractical. R and Python are scripting languages with relatively approachable syntax, so a librarian with minimal programming experience can learn them and begin applying them to daily work. This article provides examples of how to use R and Python to clean up metadata, resize images, and match transcripts with scanned images for the Colorado State University Institutional Repository.
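
As a hedged example of one such task (not the article’s code), the sketch below batch-resizes scanned images with the Pillow library; the directory names and target width are assumptions.

    # Illustrative sketch: batch-resize scanned images for repository ingest.
    # Uses Pillow; directory names and the target width are assumptions.
    from pathlib import Path
    from PIL import Image

    TARGET_WIDTH = 1200
    src, dest = Path("scans"), Path("resized")
    dest.mkdir(exist_ok=True)

    for tif in src.glob("*.tif"):
        with Image.open(tif) as img:
            ratio = TARGET_WIDTH / img.width
            resized = img.convert("RGB").resize((TARGET_WIDTH, round(img.height * ratio)))
            resized.save(dest / (tif.stem + ".jpg"), quality=90)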

Using Python Scripts to Compare Records from Vendors with Those from ILS

Issue 55 | 2023-01-20

Dan Lou

An increasing challenge libraries face is how to maintain and synchronize the electronic resource records from vendors with those in the integrated library system (ILS). Ideally, vendors send record updates to the library frequently. However, this is not a perfect solution, and over time record discrepancies can become severe, with thousands of records out of sync. That is what happened when our acquisitions and cataloging librarians noticed a major record discrepancy issue. In order to effectively identify the problematic records among tens of thousands of records on both sides, the author of this article developed solutions to analyze the data using Python functions and scripts. This data analysis helps to quickly narrow down the issue and reduce the cataloging effort.
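
A simplified sketch of this kind of comparison is shown below, using pandas and set arithmetic; the file and column names are assumptions, not the author’s actual data.

    # Illustrative sketch: find records present on one side but not the other.
    # File and column names are assumptions; the article's actual workflow may differ.
    import pandas as pd

    vendor = pd.read_csv("vendor_records.csv", dtype=str)
    ils = pd.read_csv("ils_records.csv", dtype=str)

    vendor_ids = set(vendor["title_id"].dropna().str.strip())
    ils_ids = set(ils["title_id"].dropna().str.strip())

    missing_in_ils = vendor_ids - ils_ids    # vendor titles never loaded
    orphaned_in_ils = ils_ids - vendor_ids   # ILS records the vendor no longer supplies

    print(f"{len(missing_in_ils)} records missing from the ILS")
    print(f"{len(orphaned_in_ils)} ILS records with no matching vendor record")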

Leveraging a Custom Python Script to Scrape Subject Headings for Journals

Issue 52 | 2021-09-22

Shelly R. McDavid, Eric McDavid, and Neil E. Das

In the current library fiscal climate, with yearly inflationary cost increases of 2-6% or more for many journals and journal package subscriptions, it is imperative that libraries make their budgets go further to expand their suite of resources. As a result, most academic libraries annually undertake some form of electronic journal review, employing factors such as cost per use to inform budgetary decisions. In this paper we detail some tech-savvy processes we created to leverage a Python script to automate journal subject heading generation within OCLC’s WorldCat catalog, the MOBIUS (a Missouri library consortium) catalog, and the VuFind library catalog, a now-retired catalog for CARLI (the Consortium for Academic and Research Libraries in Illinois). We also describe the rationale for the inception of this project, the methodology we utilized, the current limitations, and details of our future work in automating our annual analysis of journal subject headings by use of an OCLC API.

Automated Collections Workflows in GOBI: Using Python to Scrape for Purchase Options

Issue 49 | 2020-08-10

Katharine Frazier

The NC State University Libraries has developed a tool for querying GOBI, our print and ebook ordering vendor platform, to automate monthly collections reports. These reports detail purchase options for missing or long-overdue items, as well as popular items with multiple holds. GOBI does not offer an API, forcing staff to conduct manual title-by-title searches that previously took up to 15 hours per month. To make this process more efficient, we wrote a Python script that automates title searches and the extraction of key data (price, date of publication, binding type) from GOBI. This tool can gather data for hundreds of titles in half an hour or less, freeing up time for other projects.

This article will describe the process of creating this script, as well as how it finds and selects data in GOBI. It will also discuss how these results are paired with NC State’s holdings data to create reports for collection managers. Lastly, the article will examine obstacles that were experienced in the creation of the tool and offer recommendations for other organizations seeking to automate collections workflows.

Leveraging the RBMS/BSC Latin Place Names File with Python

Issue 48 | 2020-05-11

kalan Knudson Davis

To answer the relatively straightforward question “Which rare materials in my library catalog were published in Venice?” requires an advanced knowledge of geography, language, orthography, alphabet graphical changes, cataloging standards, transcription practices, and data analysis. The imprint statements of rare materials transcribe place names as they appear on the piece itself, such as Venetus or Venetiae, rather than in a recognizable, contemporary form, such as Venice, Italy. Rare materials catalogers recognize this geographic discoverability and selection issue and address it in a standardized way: to add consistency and normalization to imprint locations, they utilize hierarchical place names to create a special imprint index. However, this normalized and contemporary form of place name is often missing from legacy bibliographic records. This article demonstrates using a traditional rare materials cataloging aid, the RBMS/BSC Latin Place Names File, with programming tools, Jupyter Notebook and Python, to retrospectively populate a special imprint index for 17th-century rare materials. This methodology enriched 1,487 MAchine Readable Cataloging (MARC) bibliographic records with hierarchical place names (MARC 752 fields) as part of a small pilot project. This article details a partially automated solution to this geographic discoverability and selection issue; however, a human component is still ultimately required to fully optimize the bibliographic data.
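
The sketch below illustrates the general idea (in pymarc 5.x style, with a toy two-entry lookup standing in for the RBMS/BSC Latin Place Names File); it is an assumption-laden simplification, not the project’s notebook code.

    # Illustrative sketch: add a hierarchical place name (752) when a Latin imprint
    # form is recognized. The tiny lookup table, field handling, and file names are
    # assumptions; the project's actual mapping is far more extensive.
    from pymarc import MARCReader, MARCWriter, Field, Subfield

    LATIN_TO_MODERN = {
        "venetiis": ("Italy", "Venice"),
        "lugduni batavorum": ("Netherlands", "Leiden"),
    }

    with open("rare_books.mrc", "rb") as infile, open("enriched.mrc", "wb") as outfile:
        writer = MARCWriter(outfile)
        for record in MARCReader(infile):
            if record is None:
                continue
            place = ""
            for field in record.get_fields("260", "264"):
                subs = field.get_subfields("a")
                if subs:
                    place = subs[0].lower()
                    break
            for latin, (country, city) in LATIN_TO_MODERN.items():
                if latin in place:
                    record.add_field(Field(
                        tag="752", indicators=[" ", " "],
                        subfields=[Subfield("a", country), Subfield("d", city)]))
                    break
            writer.write(record)
        writer.close()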

Reporting from the Archives: Better Archival Migration Outcomes with Python and the Google Sheets API

Issue 46 | 2019-11-05

David W. Hodges and Kevin Schlottmann

Columbia University Libraries recently embarked on a multi-phase project to migrate nearly 4,000 records describing over 70,000 linear feet of archival material from disparate sources and formats into ArchivesSpace. This paper discusses tools and methods brought to bear in Phase 2 of this project, which required us to look closely at how to integrate a large number of legacy finding aids into the new system and merge descriptive data that had diverged in myriad ways. Using Python, XSLT, and a widely available if underappreciated resource—the Google Sheets API—archival and technical library staff devised ways to efficiently report data from different sources and present it in an accessible, user-friendly way. Responses were then fed back into automated data remediation processes to keep the migration project on track and minimize manual intervention. The scripts and processes developed proved very effective, and moreover, show promise well beyond the ArchivesSpace migration. This paper describes the Python/XSLT/Sheets API processes developed and how they opened a path to move beyond CSV-based reporting with flexible, ad-hoc data interfaces easily adaptable to meet a variety of purposes.
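
As a small, hypothetical example of the Sheets-based reporting pattern (not the authors’ code), the sketch below pushes report rows to a Google Sheet with the gspread client; the spreadsheet name, worksheet name, credentials file, and sample rows are placeholders.

    # Illustrative sketch: push remediation report rows into a Google Sheet for review.
    # Spreadsheet/worksheet names, the credentials file, and the rows are placeholders.
    import gspread

    gc = gspread.service_account(filename="service_account.json")
    sheet = gc.open("ArchivesSpace Migration QC").worksheet("diverged-titles")

    rows = [
        ["resource_id", "ead_title", "aspace_title", "action"],
        ["resource_42", "Sample papers", "Sample papers, 1900-1950", ""],
    ]
    sheet.append_rows(rows, value_input_option="RAW")
    print(f"Wrote {len(rows)} rows for review")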

Generating Geographic Terms for Streaming Videos Using Python: A Comparative Analysis

Issue 45 | 2019-08-09

Patrick Harrington

In libraries, the relationship between textual descriptions of audiovisual material and access to that material is a primary concern, as users expect to have access to all the library’s resources—which increasingly include audiovisual content—through a simple and effective web interface. At UW-Oshkosh, library staff developed a unique site for the streaming video collection that allows users to search for videos and browse topical collections across each of the library’s three streaming video vendors. In order to create more meaningful and topical collections, various programming tools and techniques were employed to identify geographical locations in vendor-supplied MARC records. This article describes three different methods for generating geographic terms for streaming videos using different Python libraries and evaluates them based on the number of terms generated, overlap in terms generated between the three methods, and the amount of cleanup needed to generate useful geographic terms.
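
The abstract does not name the three libraries compared, but one plausible method, shown below purely as an illustration, is to run spaCy’s named entity recognition over the vendor records’ summary notes (520 $a) and keep the place entities; the file name and field choices are assumptions.

    # Illustrative sketch of one possible method: extract place names from the
    # summary note (520 $a) of vendor MARC records with spaCy's NER.
    import spacy
    from pymarc import MARCReader

    nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

    with open("streaming_video.mrc", "rb") as fh:
        for record in MARCReader(fh):
            if record is None:
                continue
            summaries = [s for f in record.get_fields("520") for s in f.get_subfields("a")]
            if not summaries:
                continue
            doc = nlp(" ".join(summaries))
            places = sorted({ent.text for ent in doc.ents if ent.label_ == "GPE"})
            if places:
                title_field = record["245"]
                title = title_field["a"] if title_field else "(no title)"
                print(title, "->", ", ".join(places))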

Large-Scale Date Normalization in ArchivesSpace with Python, MySQL, and Timetwister

Issue 44 | 2019-05-06

Alicia Detelich

Normalization of legacy date metadata can be challenging, as standards and local practices for formulating dates have varied widely over time. With the advent of archival management systems such as ArchivesSpace, structured, machine-actionable date metadata is becoming increasingly important for search and discovery of archival materials. This article describes a recent effort by a group of Yale University archivists to add ISO 8601-compliant dates to nearly 1 million unstructured date records in ArchivesSpace, using a combination of Python, MySQL, and Timetwister, a Ruby gem developed at the New York Public Library (NYPL).
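
As a highly simplified illustration (not the project’s code, which relied on Timetwister for the hard cases), the sketch below derives ISO 8601 begin and end values from a couple of common unstructured date patterns.

    # Illustrative sketch: derive ISO 8601 begin/end values from simple unstructured
    # date expressions (e.g. "1923-1957", "circa 1890"). The pattern handling shown
    # here is an assumption; messier expressions were left to Timetwister.
    import re

    def normalize(expression):
        expr = expression.lower().replace("circa", "").replace("ca.", "").strip()
        m = re.fullmatch(r"(\d{4})\s*-\s*(\d{4})", expr)
        if m:
            return {"begin": m.group(1), "end": m.group(2), "certainty": None}
        m = re.fullmatch(r"(\d{4})", expr)
        if m:
            approx = expr != expression.lower().strip()
            return {"begin": m.group(1), "end": m.group(1),
                    "certainty": "approximate" if approx else None}
        return None  # leave for Timetwister or manual review

    print(normalize("1923-1957"))
    print(normalize("circa 1890"))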

Analyzing EZproxy SPU Logs Using Python Data Analysis Tools

Issue 42 | 2018-11-08

Brighid M. Gonzales

Even with the assortment of free and ready-made tools for analyzing EZproxy log files, it can be difficult to get useful, meaningful data from them. Using the Python programming language with its collection of modules created specifically for data analysis can help with this task, and ultimately result in better and more useful data customized to the needs of the library using it. This article describes how Our Lady of the Lake University used Python to analyze its EZproxy log files to get more meaningful data, including a walk-through of the code needed to accomplish this task.
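
A minimal sketch of this kind of analysis appears below; because EZproxy SPU log layouts are locally configured, the field layout, regular expression, and file name are assumptions rather than the article’s setup.

    # Illustrative sketch: tally EZproxy requests per licensed platform with pandas.
    # The log layout, regular expression, and file name are assumptions.
    import re
    import pandas as pd

    LINE = re.compile(r'^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] "(?:GET|POST) (?P<url>\S+)')

    rows = []
    with open("ezproxy_spu.log", encoding="utf-8", errors="replace") as log:
        for line in log:
            m = LINE.match(line)
            if m:
                rows.append(m.groupdict())

    df = pd.DataFrame(rows)
    df["host"] = df["url"].str.extract(r"https?://([^/]+)")
    report = (df.groupby("host")["user"]
                .agg(requests="count", unique_users="nunique")
                .sort_values("requests", ascending=False))
    print(report.head(10))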

Alma Enumerator: Automating repetitive cataloging tasks with Python

Issue 42 | 2018-11-08

Nausicaa Rose

In June 2016, the Wartburg College library migrated to a new integrated library system, Alma. In the process, we lost the enumeration and chronology data for roughly 79,000 print serial item records. Re-entering all this data by hand seemed an unthinkable task. Fortunately, the information was recorded as free text in each item’s description field. By using Python, Alma’s API and much trial and error, the Wartburg College library was able to parse the serial item descriptions into enumeration and chronology data that was uploaded back into Alma. This paper discusses the design and feasibility considerations addressed in trying to solve this problem, the complications encountered during development, and the highlights and shortcomings of the collection of Python scripts that became Alma Enumerator.
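
The sketch below shows the flavor of that parsing with a single regular expression over sample descriptions such as “v.12 no.3 (2004 Mar)”; the real descriptions varied far more, and the resulting values were written back through Alma’s API.

    # Illustrative sketch: parse enumeration/chronology out of free-text item
    # descriptions. The pattern and sample strings are assumptions.
    import re

    PATTERN = re.compile(
        r"v\.?\s*(?P<enum_a>\d+)"                # volume -> enumeration A
        r"(?:\s*no\.?\s*(?P<enum_b>\d+))?"       # issue  -> enumeration B
        r"(?:\s*\((?P<chron_i>\d{4})[^)]*\))?",  # year   -> chronology I
        re.IGNORECASE,
    )

    for description in ["v.12 no.3 (2004 Mar)", "v.7 (1998)", "index 1950-1960"]:
        m = PATTERN.search(description)
        if m:
            print(description, "->", m.groupdict())
        else:
            print(description, "-> needs manual review")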

Approaching the largest ‘API’: extracting information from the Internet with Python

Issue 39 | 2018-02-05

Jonathan E. Germann

This article explores the need for libraries to algorithmically access and manipulate the world’s largest API: the Internet. The billions of pages on the ‘Internet API’ (HTTP, HTML, CSS, XPath, DOM, etc.) are easily accessible and manipulable. Libraries can assist in creating meaning through the datafication of information on the world wide web. Because most information is created for human consumption, some programming is required for automated extraction. Python is an easy-to-learn programming language with extensive packages and community support for web page automation. Four Python packages (urllib, Selenium, BeautifulSoup, Scrapy) can automate almost any web page for projects of all sizes. An example warrant data project is explained to illustrate how well Python packages can manipulate web pages to create meaning through assembling custom datasets.
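
As a minimal, hypothetical example of the extraction pattern using two of the four packages named (urllib and BeautifulSoup), the sketch below scrapes a table from a placeholder URL; it is not the article’s warrant-data project.

    # Minimal sketch of the extraction pattern with urllib and BeautifulSoup.
    # The URL and CSS selector are placeholders.
    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    with urlopen("https://example.org/records") as response:
        soup = BeautifulSoup(response.read(), "html.parser")

    dataset = []
    for row in soup.select("table#results tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if cells:
            dataset.append(cells)

    print(f"Scraped {len(dataset)} rows")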

Leveraging Python to improve ebook metadata selection, ingest, and management

Issue 38 | 2017-10-18

Kelly Thompson and Stacie Traill

Libraries face many challenges in managing descriptive metadata for ebooks, including quality control, completeness of coverage, and ongoing management. The recent emergence of library management systems that automatically provide descriptive metadata for e-resources activated in system knowledge bases means that ebook management models are moving toward both greater efficiency and more complex implementation and maintenance choices. Automated and data-driven processes for ebook management have always been desirable, but in the current environment, they become necessary. In addition to initial selection of a record source, automation can be applied to quality control processes and ongoing maintenance in order to keep manual, eyes-on work to a minimum while providing the best possible discovery and access. In this article, we describe how we are using Python scripts to address these challenges.

Python, Google Sheets, and the Thesaurus for Graphic Materials for Efficient Metadata Project Workflows

Issue 35 | 2017-01-30

Jeremy Bartczak and Ivey Glendon

In 2017, the University of Virginia (U.Va.) will launch a two-year initiative to celebrate the bicentennial anniversary of the University’s founding in 1819. The U.Va. Library is participating in this event by digitizing some 20,000 photographs and negatives that document student life on the U.Va. grounds in the 1960s and 1970s. Metadata librarians and archivists are well-versed in the challenges associated with generating digital content and accompanying description within the context of limited resources. This paper describes how technology and new approaches to metadata design have enabled the University of Virginia’s Metadata Analysis and Design Department to rapidly and successfully generate accurate description for these digital objects. Python’s pandas module improves efficiency by cleaning and repurposing data recorded at digitization, while the lxml module builds MODS XML programmatically from CSV tables. A simplified technique for subject heading selection and assignment in Google Sheets provides a collaborative environment for streamlined metadata creation and data quality control.
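
A compressed sketch of those two pieces, pandas for cleanup and lxml for MODS generation, appears below; the column names and the very small MODS subset are assumptions, not the department’s actual templates.

    # Illustrative sketch: tidy digitization data with pandas, then emit a minimal
    # MODS record per row with lxml. Column names and the MODS subset are assumptions.
    import pandas as pd
    from lxml import etree

    MODS_NS = "http://www.loc.gov/mods/v3"

    df = pd.read_csv("digitization_log.csv")
    df["title"] = df["title"].str.strip()
    df["date"] = df["date"].fillna("undated")

    for _, row in df.iterrows():
        mods = etree.Element(f"{{{MODS_NS}}}mods", nsmap={None: MODS_NS})
        title_info = etree.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
        etree.SubElement(title_info, f"{{{MODS_NS}}}title").text = row["title"]
        origin = etree.SubElement(mods, f"{{{MODS_NS}}}originInfo")
        etree.SubElement(origin, f"{{{MODS_NS}}}dateCreated").text = str(row["date"])
        etree.ElementTree(mods).write(f"{row['identifier']}.xml",
                                      xml_declaration=True, encoding="UTF-8",
                                      pretty_print=True)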

Processing Government Data: ZIP Codes, Python, and OpenRefine

Issue 25 | 2014-07-21

Frank Donnelly

While there is a vast amount of useful US government data on the web, some of it is in a raw state that is not readily accessible to the average user. Data librarians can improve accessibility and usability for their patrons by processing data to create subsets of local interest and by appending geographic identifiers to help users select and aggregate data. This case study illustrates how census geography crosswalks, Python, and OpenRefine were used to create spreadsheets of non-profit organizations in New York City from the IRS Tax-Exempt Organization Masterfile. This paper illustrates the utility of Python for data librarians and should be particularly insightful for those who work with address-based data.
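
The sketch below illustrates the crosswalk step only, joining organization ZIP codes to a ZIP-to-borough table with pandas; file and column names are assumptions, and the article pairs this kind of processing with OpenRefine for further cleanup.

    # Illustrative sketch: keep only tax-exempt organizations whose ZIP codes fall
    # inside New York City and tag each with its borough. File and column names
    # are assumptions.
    import pandas as pd

    orgs = pd.read_csv("irs_exempt_orgs.csv", dtype=str)
    crosswalk = pd.read_csv("nyc_zip_to_borough.csv", dtype=str)  # columns: zip, borough

    orgs["zip5"] = orgs["ZIP"].str[:5]  # drop ZIP+4 suffixes
    nyc_orgs = orgs.merge(crosswalk, left_on="zip5", right_on="zip", how="inner")

    for borough, subset in nyc_orgs.groupby("borough"):
        subset.to_csv(f"exempt_orgs_{borough.lower().replace(' ', '_')}.csv", index=False)
        print(borough, len(subset))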

Augmenting the Cataloger’s Bag of Tricks: Using MarcEdit, Python, and PyMARC for Batch-Processing MARC Records Generated From the Archivists’ Toolkit

Issue 20 | 2013-04-17

Heidi Frank

Catalogers have traditionally created and edited MARC records on a one-by-one basis. Recently, it has become more common for catalogers to delve into scripting and programming tools in order to automate the processing of large numbers of records simultaneously. This article provides a case study showing how MARCXML archival records generated by the Archivists’ Toolkit (AT) can be modified in batches using the MarcEdit software, Python scripting and the PyMARC module in Python. It analyzes selected problems with the MARC records exported by the AT and shows how MarcEdit and Python are used to resolve them so that the records are formatted correctly for loading into the library’s local catalog. Similar methods could be used by catalogers dealing with any large set of MARC data, such as ebook records from vendors.
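
A minimal sketch of the batch pattern appears below: read the MARCXML exported from the Archivists’ Toolkit with pymarc, apply a correction, and write binary MARC for loading. The single fix shown (normalizing 856 indicators) is a generic placeholder, not the article’s list of corrections.

    # Illustrative sketch of the batch pattern: MARCXML in, corrected binary MARC out.
    # The 856 indicator fix and file names are placeholders.
    from pymarc import parse_xml_to_array, MARCWriter

    records = parse_xml_to_array("at_export.xml")

    with open("fixed_for_load.mrc", "wb") as out:
        writer = MARCWriter(out)
        for record in records:
            for field in record.get_fields("856"):
                field.indicators = ["4", "2"]  # normalize display handling
            writer.write(record)
        writer.close()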

Editorial

Issue 58 | 2023-12-04

Brighid M. Gonzales

Issue 58 of the Code4Lib Journal is bursting at the seams with examples of how libraries are creating new technologies, leveraging existing technologies, and exploring the use of AI to benefit library work. We had an unprecedented number of submissions this quarter and the resulting issue features 16 articles detailing some of the more unique and innovative technology projects libraries are working on today.

Enhancing Serials Holdings Data: A Pymarc-Powered Clean-Up Project

Issue 58 | 2023-12-04

Minyoung Chung and Phani Chaitanya Pendyala

Following the recent transition from Inmagic to Ex Libris Alma, the Technical Services department at the University of Southern California (USC) in Los Angeles undertook a post-migration cleanup initiative. This article introduces methodologies aimed at improving irregular summary holdings data within serials records using Pymarc, regular expressions, and the Alma API in MarcEdit. The challenge identified was the confinement of serials’ holdings information exclusively to the 866 MARC tag for textual holdings.

To address this challenge, Pymarc and regular expressions were leveraged to parse and identify various patterns within the holdings data, offering a nuanced understanding of the intricacies embedded in the 866 field. Subsequently, the script generated a new 853 field for captions and patterns, along with multiple instances of the 863 field for coded enumeration and chronology data, derived from the existing data in the 866 field.

The final step involved utilizing the Alma API via MarcEdit, streamlining the restructuring of holdings data and updating nearly 5,000 records for serials. This article illustrates the application of Pymarc for both data analysis and creation, emphasizing its utility in generating data in the MARC format. Furthermore, it posits the potential application of Pymarc to enhance data within library and archive contexts.
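
A simplified sketch of that transformation is shown below in pymarc 5.x style: a textual 866 statement such as “v.1(1990)-v.30(2019)” is parsed into an 853 caption/pattern field and a paired 863 field. Real holdings strings, and the subfield coding, are considerably more involved.

    # Simplified sketch: parse a textual 866 holdings statement and build paired
    # 853/863 fields with pymarc. The recognized pattern and subfield choices are
    # assumptions, not the project's full logic.
    import re
    from pymarc import Field, Subfield

    def holdings_fields(text_866):
        m = re.fullmatch(r"v\.(\d+)\((\d{4})\)-v\.(\d+)\((\d{4})\)", text_866.strip())
        if not m:
            return []  # pattern not recognized; leave the 866 untouched
        v1, y1, v2, y2 = m.groups()
        f853 = Field(tag="853", indicators=["2", "0"],
                     subfields=[Subfield("8", "1"), Subfield("a", "v."),
                                Subfield("i", "(year)")])
        f863 = Field(tag="863", indicators=["4", "0"],
                     subfields=[Subfield("8", "1.1"),
                                Subfield("a", f"{v1}-{v2}"),
                                Subfield("i", f"{y1}-{y2}")])
        return [f853, f863]

    for field in holdings_fields("v.1(1990)-v.30(2019)"):
        print(field)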

Pipeline or Pipe Dream: Building a Scaled Automated Metadata Creation and Ingest Workflow Using Web Scraping Tools

Issue 58 | 2023-12-04

Matthew Krc and Anna Oates Schlaack

Since 2004, the FRASER Digital Library has provided free access to publications and archival collections related to the history of economics, finance, banking, and the Federal Reserve System. The agile web development team that supports FRASER’s digital asset management system embarked on an initiative to automate collecting documents and metadata from US governmental sources across the web. These sources present their content on web pages but do not serve the metadata and document links via an API or other semantic web technologies, making automation a unique challenge. Using a combination of third-party software, lightweight cloud services, and custom Python code, the FRASER Recurring Downloads project transformed what was previously a labor-intensive daily process into a metadata creation and ingest pipeline that requires minimal human intervention or quality control.

This article will provide an overview of the software and services used for the Recurring Downloads pipeline, as well as some of the struggles that the team encountered during the design and build process, and current use of the final product. In hindsight, the project required a more detailed plan than was initially designed and documented. The fully manual process was not designed with automation in mind, which introduced inherent complexity in creating the pipeline. A more comprehensive plan could have made the iterative development process easier by defining a data model and documenting—and strategizing for—edge cases. Further initial analysis of the cloud services used would have defined the limitations of those services, and workarounds could have been accounted for in the project plan. While the labor-intensive manual workflow has been reduced significantly, the skill sets required to efficiently maintain the automated workflow present a sustainability challenge in distributing tasks between librarians and developers. This article will detail the challenges and limitations of transitioning and standardizing recurring web scraping across more than 50 sources to a semi-automated workflow and potential future improvements to the pipeline.

A practical method for searching scholarly papers in the General Index without a high-performance computer

Issue 58 | 2023-12-04

Emily Cukier

The General Index is a free database that offers unprecedented access to keywords and ngrams derived from the full text of over 107 million scholarly articles. Its simplest use is looking up articles that contain a term of interest, but the data set is large enough for text mining and corpus linguistics. Although it is positioned as a public utility, there is no user interface; one must download, query, and extract results from raw data tables. Not only is computing skill a barrier to use, but the file sizes are too large for most desktop computers to handle. This article will show a practical way to use the GI for researchers with moderate skills and resources. It will walk through building a bibliography of articles and visualizing the yearly prevalence of a topic in the General Index, using simple R programming commands and a modestly equipped desktop computer (code is available at https://osf.io/s39n7/). It will briefly discuss what else can be done (and how) with more powerful computational resources.

Using Airtable to download and parse Digital Humanities Data

Issue 58 | 2023-12-04

William K. Dewey

Airtable is an increasingly popular cloud-based format for entering and storing research data, especially in the digital humanities. It combines the simplicity of spreadsheets like CSV or Excel with a relational database’s ability to model relationships and link records. The Center for Digital Research in the Humanities (CDRH) at Nebraska uses Airtable data for two projects, African Poetics (africanpoetics.unl.edu) and Petitioning for Freedom (petitioningforfreedom.unl.edu). In the first project, the data focuses on African poets and news coverage of them, and in the second, the data focuses on habeas corpus petitions and individuals involved in the cases. CDRH’s existing software stack (designed to facilitate display and discovery) can take in data in many formats, including CSV, parse it with Ruby scripts, and ingest it into an API based on the Elasticsearch search index. The first step in using Airtable data is to download and convert it into a usable data format. This article covers the command line tools that can download tables from Airtable, the formats that can be downloaded (JSON being the most convenient for automation), and access management for tables and authentication. Python scripts can process this JSON data into a CSV format suitable for ingesting into other systems. The article goes on to discuss how this data processing might work. It also discusses the process of exporting information from the join tables, Airtable’s relational database-like functionality. Join data is not human-readable when exported, but it can be pre-processed in Airtable into parsable formats. After processing the data into CSV format, this article touches on how CDRH API fields are populated from plain values and more complicated structures including Markdown-style links. Finally, this article discusses the advantages and disadvantages of Airtable for managing data, from a developer’s perspective.
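
As an illustration of the download-and-flatten step (not CDRH’s scripts), the sketch below pages through a table via the Airtable REST API with requests and writes the record fields to CSV; the base ID, table name, token variable, and output file are placeholders.

    # Illustrative sketch: pull a table from the Airtable REST API and flatten the
    # record fields to CSV. Base ID, table name, token variable, and output file
    # are placeholders.
    import csv
    import os
    import requests

    BASE_ID = "appXXXXXXXXXXXXXX"
    TABLE = "Petitions"
    url = f"https://api.airtable.com/v0/{BASE_ID}/{TABLE}"
    headers = {"Authorization": f"Bearer {os.environ['AIRTABLE_TOKEN']}"}

    records, params = [], {}
    while True:
        payload = requests.get(url, headers=headers, params=params, timeout=30).json()
        records.extend(payload["records"])
        if "offset" not in payload:
            break
        params["offset"] = payload["offset"]  # Airtable pages results 100 at a time

    fieldnames = sorted({key for rec in records for key in rec["fields"]})
    with open("petitions.csv", "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=["airtable_id"] + fieldnames)
        writer.writeheader()
        for rec in records:
            writer.writerow({"airtable_id": rec["id"], **rec["fields"]})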

Developing a Multi-Portal Digital Library System: A Case Study of the new University of Florida Digital Collections

Issue 58 | 2023-12-04

Todd Digby, Cliff Richmond, Dustin Durden, and Julio Munoz

The University of Florida (UF) launched the UF Digital Collections in 2006. Since then, the system has grown to over 18 million pages of content. The locally developed digital library system consisted of an integrated public frontend interface and a production backend. As with other monoliths, adapting and making changes to the system became increasingly difficult as time went on and the size of the collections grew. As production processes changed, the system was modified to make improvements on the backend, but the public interface became dated and increasingly less mobile responsive. A decision was made to develop a new system, starting with decoupling the public interface from the production system. This article will examine our experience in rearchitecting our digital library system and deploying our new multi-portal, public-facing system. After an environmental scan of digital library technologies, it was decided not to use a current open-source digital library system. A relatively new programming team, whose members were new to the library ecosystem, allowed us to rethink many of our existing assumptions and provided new insights and development opportunities. Using technologies that include Python, APIs, Elasticsearch, ReactJS, PostgreSQL, and more has allowed us to build a flexible and adaptable system and to hire developers in the future who may not have experience building digital library systems.

Jupyter Notebooks and Institutional Repositories: A Landscape Analysis of Realities, Opportunities and Paths Forward

Issue 58 | 2023-12-04

Adrienne VandenBosch, Keith E. Maull, and Matthew Mayernik

Jupyter Notebooks are important outputs of modern scholarship, though the longevity of these resources within the broader scholarly record is still unclear. Communities and their creators have yet to holistically understand creation, access, sharing and preservation of computational notebooks, and such notebooks have yet to be designated a proper place among institutional repositories or other preservation environments as first class scholarly digital assets. Before this can happen, repository managers and curators need to have the appropriate tools, schemas and best practices to maximize the benefit of notebooks within their repository landscape and environments.

This paper explores the landscape of Jupyter notebooks today, and focuses on the opportunities and challenges related to bringing Jupyter Notebooks into institutional repositories. We explore the extent to which Jupyter Notebooks are currently accessioned into institutional repositories, and how metadata schemas like CodeMeta might facilitate their adoption. We also discuss characteristics of Jupyter Notebooks created by researchers at the National Center for Atmospheric Research, to provide additional insight into how to assess and accession Jupyter Notebooks and related resources into an institutional repository.

Beyond the Hype Cycle: Experiments with ChatGPT’s Advanced Data Analysis at the Palo Alto City Library

Issue 58 | 2023-12-04

M Ryan Hess and Chris Markman

In June and July of 2023 the Palo Alto City Library’s Digital Services team embarked on an exploratory journey applying Large Language Models (LLMs) to library projects. This article, complete with chat transcripts and code samples, highlights the challenges, successes, and unexpected outcomes encountered while integrating ChatGPT Pro into our day-to-day work.

Our experiments utilized ChatGPT’s Advanced Data Analysis feature (formerly Code Interpreter). The first goal was to test the Search Engine Optimization (SEO) potential of ChatGPT plugins. The second goal was to enhance our web user experience by revising our BiblioCommons taxonomy to better match customer interests and make the upcoming Personalized Promotions feature more relevant. ChatGPT helped us perform what would otherwise be a time-consuming analysis of customer catalog usage to determine a list of taxonomy terms better aligned with that usage.

In the end, both experiments proved the utility of LLMs in the workplace and the potential for enhancing our librarians’ skills and efficiency. The thrill of this experiment was in ChatGPT’s unprecedented efficiency, adaptability, and capacity. We found it can solve a wide range of library problems and speed up project deliverables. The shortcomings of LLMs, however, were equally palpable. Each day of the experiment we grappled with the nuances of prompt engineering, contextual understanding, and occasional miscommunications with our new AI assistant. In short, a new class of skills for information professionals came into focus.

Editorial: Big code, little code, open code, old code

Issue 57 | 2023-08-29

Péter Király

Paraphrasing the title of Christine L. Borgman’s inaugural lecture in Göttingen some years ago, “Big data, little data, open data,” I could say that the current issue of Code4Lib is about big code, little code, open code, old code. The good side of coding is that effective contributions can be made with different levels and types of background knowledge. This issue shows that even small modifications, or shared knowledge about the command-line usage of a tool, can be very useful to the user community. Let’s see what we have!

ChronoNLP: Exploration and Analysis of Chronological Textual Corpora

Issue 57 | 2023-08-29

Erin Wolfe

This article introduces ChronoNLP, a free and open-source web application designed to enable the application of Natural Language Processing (NLP) techniques to textual datasets with a time-based component. This interactive Python platform allows users to filter, search, explore, and visualize this data, allowing the temporal aspect to play a central role in data analysis. ChronoNLP makes use of several powerful NLP libraries to facilitate various text analysis techniques including topic modeling, term/TF-IDF frequency evaluation, automated keyword extraction, named entity recognition and other tasks through a graphical interface without the need for coding or technical knowledge. By highlighting the temporal aspect of specific types of corpora, ChronoNLP provides access to methods of parsing and visualizing the data in a user-friendly format to help uncover patterns and trends in text-based materials.

A Very Small Pond: Discovery Systems That Can Be Used with FOLIO in Academic Libraries

Issue 57 | 2023-08-29

Aaron Neslin, Jaime Taylor

FOLIO, an open source library services platform, does not have a front-end patron interface for searching and using library materials. Any library installing FOLIO will need at least one additional piece of software to perform those functions. This article evaluates which systems, in a limited marketplace, are available for academic libraries to use with FOLIO.

Editorial: Forget the AI, We Have Live Editors

Issue 56 | 2023-04-21

Sara Amato

Welcoming new editors to the Code4Lib Journal

ISSN 1940-5758