By Andrew Weaver and Ashley Blewer
Introduction
Despite its widespread adoption, optical media is widely recognized as a format with severe preservation risks due to its high potential for damage, degradation and technical obsolescence (Schweikert, 2018). This is particularly true of recordable optical media, such as CD-Rs, where the inherent vices of unstable base materials and variable recording quality combine to create a high level of uncertainty about the lifespan of these discs (Schüller, 2008).
Accordingly, an increasing amount of effort within the preservation field has been focused on exploring tools and workflows to migrate the data off these materials before it is permanently lost. Thanks to the generosity of practitioners, quite a lot of this work has been documented and made publicly available through resources such as George Blood LP’s 2014 report for the Library of Congress (Morel et al., 2014) and Alexander Duryee’s introduction to the topic in the Code4Lib Journal (Duryee, 2014). One optical format, however, has been broadly ignored by the existing body of work: the humble Video CD.
While never a dominant format in the Anglosphere, the Video CD, or VCD, was widely popular from the 1990s through the 2000s in Asia and other regions. As such, a dedicated exploration of preservation solutions for VCD has utility both as a resource for institutions that collect heavily in Pacific Rim materials and as a means to aid, in a minor way, the ongoing efforts to expand the digital preservation corpus beyond its traditional focus on issues prevalent in North America and Europe. This paper presents an overview of VCD as a format, the unique characteristics that affect preservation decisions, and a survey of existing tools and methods for migrating VCD contents off the original carrier and into digital preservation and access workflows.
Background & Context
VCD is a standard digital data format for storing video on a compact disc with the intended purpose of linear video playback. Essentially, VCD was an optical media disc (similar to a CD or DVD) that was used to store and play video. As a popular format in the 1990s and 2000s, VCD usage fits between LaserDisc and DVD, with an image quality comparable to VHS (Davidson and Lediaev, 1993).
VCD discs have the same dimensions and composition as CDs and DVDs: they are 120 mm (4.7 in.) in diameter and composed of a clear polycarbonate plastic substrate, a reflective metallic layer, and a clear plastic protective coating. Physical preservation efforts should adhere to the same practice as for other optical media formats.
The authority for the VCD specification is the “White Book” standard released in 1993 by a group of video technology companies (JVC, Panasonic, Philips, and Sony). The title “White Book” reflects the fact that this specification is part of a series of standards developed for compact discs around this time, known collectively as the Rainbow Books. The Rainbow Books began with the Red Book (CD-DA, Digital Audio) in 1980 and ran through the Purple Book (DDCD, Double Density). The White Book sits between the Beige Book (1992), which covers Photo CD technology and was written by Kodak, and the Blue Book (1995), which covers Enhanced CD technology and was written by Philips and Sony.
The VCD “White Book” standard was based on previous work on the Karaoke-CD system. Thus, VCDs included and built upon similar technical features, such as MPEG-1 video, closed captioning, and the ability to select and start from defined sections (Super Video Compact Disc, 2001).
As a format, VCDs were especially popular in Southeast Asia and China. VCD premiered in these markets at the right time, as VHS had not yet saturated the market as it had in other geographic regions. Optical media also carries less risk of mold and mildew accumulation than magnetic tape, making it more suitable for very warm climates.
Like CDs and DVDs, VCDs were either commercially pressed or they were “burned” onto writable media. While writing this paper, the authors encountered great numbers of these “burned” VCDs, both in collections that had been created by community members as well as items that were commercial releases that had been purchased for circulation. These VCDs are subject to the same elevated risks noted for writable optical media formats.
Brief overview of VCD technical characteristics
The defined length of a VCD was marketed as offering typically up to 74 or 80 minutes on one side, a bit less than a standard feature-length movie. This minute mark was just an estimate; size constraints were determined by the amount of video stored rather than video duration. Technically, VCD discs could store up to 800 MB of compressed video, with the most commonly produced sizes being 650 MB (74 minutes) or 700 MB (80 minutes) [1]. So while VCDs were advertised as having a certain number of minutes of video, the true maximum duration is determined by the video content and compression.
VCDs use the MPEG-1 encoding standard for compressing and storing both video and audio. MPEG-1 is defined in ISO/IEC 11172. Its video coding is closely related to, and sometimes conflated with, the earlier H.261 standard [2]. The MPEG-1 format was released the same year as VCD and sought to create video files that were of reasonable size and of sufficient quality (for the era).
The structure of a typical VCD is broken up into tracks of CD-ROM XA (“eXtended Architecture”) sectors:
- First track: Contains an ISO-9660 file system that stores 2048 bytes/sector. This track’s primary function is to hold metadata and pointers that point to specific content (e.g. a tracklist or chapter list). It can also hold still images or frames, mostly for use in menu screens. This track is small but crucial to the operation of a structured Video CD. It uses the CD-ROM XA formatting method “mode 2 form 1,” which takes up more space but allows for some error correction functionality.
- Subsequent tracks: The second and any other additional tracks contain raw MPEG tracks that store 2324 bytes/sector, with one MPEG data packet stored per sector. These tracks are not inherently part of the disc’s file system structure and thus cannot be mounted. Windows operating systems may assume the intended usage of these files, interpret the contents, and “mount” the files even though they are just raw data, similar to how some systems will virtualize the raw PCM tracks on audio CDs as WAV files. These tracks use CD-ROM XA formatting method “mode 2 form 2”; they do not support error correction but each sector has the capacity to store more data.
VCDs can also contain exceptions to this standard arrangement, for example, discs containing audio (CD-DA) tracks or only a single track without a file system.[3]
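The relationship between the advertised running times and the capacities quoted above can be checked with some simple arithmetic. The following is a minimal sketch, not part of the authors' workflow. It assumes the standard CD read rate of 75 sectors per second of playing time (a figure not stated in this paper) and uses the two per-sector payload sizes given in the track descriptions; the function name capacity_bytes is hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: a CD holds 75 sectors per second of playing time.
SECTORS_PER_SECOND=75

# capacity_bytes MINUTES PAYLOAD_BYTES
# Prints the number of payload bytes on a disc of the given length
# when every sector carries PAYLOAD_BYTES of data.
capacity_bytes() {
  local minutes="$1" payload="$2"
  echo $(( minutes * 60 * SECTORS_PER_SECOND * payload ))
}

for minutes in 74 80; do
  echo "$minutes min at 2048 bytes/sector: $(capacity_bytes "$minutes" 2048) bytes"
  echo "$minutes min at 2324 bytes/sector: $(capacity_bytes "$minutes" 2324) bytes"
done
```

Under these assumptions an 80-minute disc holds 836,640,000 bytes (roughly 798 MiB) at 2324 bytes per sector, compared with 737,280,000 bytes (roughly 703 MiB) at 2048 bytes per sector, which is consistent with the "up to 800 MB" figure for MPEG tracks that forgo error correction.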
Migration considerations
When browsing the contents of a VCD via the directory viewer of an operating system, there will be a folder called ‘mpegav’ (See Figure 1) that contains one or more .DAT “files”. These .DAT files are filesystem representations of the MPEG sectors, not the MPEG sectors themselves. The other visible folders and files form the non-video elements of the disc’s structure, and include information such as playback sequence metadata [4].
Any attempt to copy, open or access these .DAT files on a Linux system will fail with an I/O error. As noted above, however, the Windows operating system will virtualize and play this raw data. Legacy methods mention workarounds for Linux machines using the cdfs kernel driver, but unfortunately these are no longer viable because cdfs has been incompatible with the Linux kernel since at least 2013 [5].
Although directly copying data on a Windows machine may be viable for accessing content on Video CDs, copying and pasting the files from the disc onto a hard drive is insufficient as a preservation practice. The bulk of the remainder of this paper focuses on workflows designed for preservation-minded applications, in which a complete and replicable raw image of the original disc is produced. As with DVDs, where information such as titles may be lost if only the video information is extracted from the disc, the authors believe that the creation of an image, specifically a BIN/CUE pairing, is the most appropriate strategy when working with VCDs for preservation purposes.
The BIN/CUE formats work in combination to yield an exact replica of the disc. The .bin file holds the complete raw binary data of the disc contents and the .cue file stores metadata for the associated track layouts. This method helps ensure preservation of the complete structure of the disc.
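To make the role of the .cue file more concrete, the sketch below writes a small, hypothetical cue sheet for a two-track VCD image and lists the tracks it declares. The file contents are illustrative only and are not taken from any disc tested for this paper; the index times are invented, and the helper name list_tracks is an assumption for the example.

```shell
#!/usr/bin/env bash
# Hypothetical cue sheet for a two-track VCD image (illustrative values only).
cue_file="$(mktemp)"
cat > "$cue_file" <<'EOF'
FILE "output.bin" BINARY
  TRACK 01 MODE2/2352
    INDEX 01 00:00:00
  TRACK 02 MODE2/2352
    INDEX 01 00:06:00
EOF

# list_tracks CUE_FILE
# Prints one line per TRACK entry with its number and declared mode.
list_tracks() {
  awk '$1 == "TRACK" { print "track " $2 ": " $3 }' "$1"
}

list_tracks "$cue_file"
```

Each TRACK line records the track number and sector format, and each INDEX line records where the track starts within the .bin file as minutes:seconds:frames, which is what allows other tools to locate the file system track and the MPEG tracks inside a single raw image.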
An image, however, is only useful if it can be reliably interacted with in the future – and it is here that VCDs are particularly unique when compared to other optical formats. With their combination of an ISO-9660 track followed by an MPEG stream track or tracks, their images can’t be opened or played back in the same fashion as DVDs or other common data disc formats, and some extra steps are necessary to interact with the contents of BIN/CUE pairs made from VCDs. Accordingly, we will compare a few possible methods of accessing VCD data both within preservation workflows as well as for simple access.
Tools Tested for Creating Disc Images
Because Video CDs are based on the CD-ROM XA architecture, in selecting tools for comparison we sought out tools that have been used in workflows with other types of CD-ROM XA-based items. Mixed mode audio CDs [6], being composed of a single session with a first track containing data followed by successive tracks of PCM audio, seemed the most similar format with extant documentation available, such as the workflows shared by Johan van der Knijff (van der Knijff, 2015). Of the tools we selected, two (Isobuster and Cdrdao) are commonly used in mixed mode audio CD preservation, and one (VCDImager) is specific to VCDs.
The sample commands in the following sections (with the exception of the description of Isobuster) were written assuming the Linux version of the tool is being used.
Isobuster
Isobuster is a commercial data recovery application that is capable of creating raw disc images in the BIN/CUE format and is specific to the Windows operating system. Of the tools we surveyed, Isobuster is perhaps the most straightforward option for creating VCD images (See Figure 2). While Isobuster is commercial software that requires registration for full functionality, its licensing information notes that all VCD capabilities are within its free tier [7].
Process to create bin/cue
Conveniently, Isobuster is able to create a BIN/CUE pair from a VCD in a single step – the user simply selects ‘Extract CD raw (*.bin)’ from the drop down menu and Isobuster will output the pair in the chosen location.
Cdrdao
Cdrdao [8] (which stands for “CD-R disc at once”) is a free and open source command line utility for both Win32 and Linux that reads and writes audio and data CDs in ‘disc-at-once’ mode [9]. This method extracts data in one pass, as opposed to alternative methods that operate on a per-track or per-session basis. At the time of writing (July 2023) it is available through channels such as Ubuntu’s package manager and has had a release this year (2023), which suggests it is still a stable project. Cdrdao’s output when run on VCDs is a BIN/TOC pair, where the BIN file represents the raw data and the TOC (table of contents) functions in the same manner as a CUE sheet. Because Cdrdao does not generate a CUE sheet, workflows that use it must use the associated, and descriptively named, toc2cue tool as an additional step (See Figure 2).
Process to create bin/cue
The command (which assumes the default optical device) used in our testing to generate the BIN/TOC pair was:
cdrdao read-cd --device /dev/sr0 --paranoia-mode 2 --read-raw --driver generic-mmc-raw --datafile output.bin output.toc
It is worth noting that Cdrdao includes the CD Paranoia library [10], which aims to help achieve more accurate rips of audio CDs. According to its manual page, Cdrdao defaults to a setting of --paranoia-mode 3. In our testing we found that it was difficult to draw strong conclusions about the results of differing paranoia settings on damaged discs. There was a wide range of results across different discs and different drives, with no setting consistently providing a clear advantage with regard to discernible A/V errors. The test command here includes the setting for level 2, but that is more to highlight the use of the flag than to endorse it. Any workflow involving damaged discs would most likely benefit from some local experimentation to see how the different paranoia options interact with the drive or drives being used.
The second command used to generate the BIN/CUE pair is toc2cue output.toc output.cue. This uses the toc2cue tool (which is a part of the Cdrdao installation) and converts the .TOC file created by Cdrdao to a CUE sheet that can be used by other tools.
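The two commands above can be chained into a single step. The following is a minimal sketch under stated assumptions rather than the authors' own script: the function name image_vcd is hypothetical, the paranoia mode is exposed as an optional third argument (defaulting to Cdrdao's documented default of 3), and the final md5sum call is an addition that assumes GNU coreutils is installed.

```shell
#!/usr/bin/env bash
# image_vcd DEVICE BASENAME [PARANOIA_MODE]
# Reads a raw image with cdrdao, converts the TOC to a CUE sheet with
# toc2cue, and records an MD5 checksum of the resulting .bin file.
# Each step runs only if the previous one succeeded.
image_vcd() {
  local device="$1" base="$2" mode="${3:-3}"
  cdrdao read-cd --device "$device" --paranoia-mode "$mode" --read-raw \
    --driver generic-mmc-raw --datafile "$base.bin" "$base.toc" &&
  toc2cue "$base.toc" "$base.cue" &&
  md5sum "$base.bin" > "$base.bin.md5"
}

# Example (requires a disc in the drive):
#   image_vcd /dev/sr0 output 2
```

Exposing the paranoia mode as an argument makes it straightforward to image the same disc with several settings under different base names and compare the recorded checksums, which is one way to carry out the local experimentation suggested above.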
VCDImager
VCDImager is a ‘full-featured mastering suite for authoring, disassembling and analyzing Video CDs’ [11] that is maintained as a part of the GNU Project and has existed since the year 2000. As one of the few dedicated VCD projects that is still actively maintained, it was of great interest to us as a tool. The suite includes a tool for ripping the contents out of VCDs (vcdxrip), as well as a tool for generating BIN/CUE images that can be written to discs from extracted VCD information (vcdxbuild). As such, it does not directly create a BIN/CUE pair in the manner of the other tools tested, but rather creates a file dump with an XML file of technical metadata that can then be used to engineer a viable VCD image. Vcdxrip is also capable of extracting files from BIN/CUE images of VCDs.
Process to create BIN/CUE
The initial command to generate the BIN/CUE pair in our testing was vcdxrip -v -C=/dev/sr0, which generates a full dump of all the VCD’s contents and automatically converts the .DAT files to .MPG for easy use. It also generates an XML file that contains technical information about the VCD, which can be used both for analysis and to generate a BIN/CUE pair from the extracted VCD contents using the command vcdxbuild -b output.bin -c output.cue input.xml.
Comparison of damage handling
VCDImager’s vcdxrip tool was noted in many legacy online sources as being unable to handle damaged discs, and our testing confirms that its VCD extraction aborts when it hits the damaged portion of intentionally damaged test discs. Beyond this limitation on physical discs, vcdxrip also failed in an identical way when extracting files from a BIN/CUE pair generated from the damaged disc with alternate tools.
Cdrdao created a complete BIN/CUE pair from intentionally damaged test discs that could be split into its component elements as well as mounted and played. As noted in the section above, we found that using the flag --paranoia-mode 2 could at times produce different results than the tool default of --paranoia-mode 3, depending on the nature of the damage and the drive used. We recommend that any implementation include some local experimentation to confirm whether either setting gives better results.
Isobuster also created a complete BIN/CUE pair from damaged test discs that was able to be manipulated and played back.
Comparison of replicability
The following tables show both MD5 hashes and file sizes for the BIN outputs of each tool. Note that Isobuster and Cdrdao generated identical images, while VCDImager’s output (generated in its two-step process) differed in both file size and file fingerprint.
Table 2. Disc 2 – Pressed Commercial Release of Kikujiro no Natsu Disc 1 (See Figure 3) [13].
Tool | MD5 | Image size (bytes) |
Isobuster | 73ed9542490afc1162b87773a3cb24a6 | 677284272 |
Cdrdao | 73ed9542490afc1162b87773a3cb24a6 | 677284272 |
VCDImager | 6026fa6dfe985601a37ff0a3f9899d92 | 677418336 |
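The size difference in the table can also be expressed in sectors. The sketch below is an illustration rather than part of the original testing. It assumes that all three images store full raw sectors of 2352 bytes each (consistent with the --read-raw flag and the MODE2/2352 convention, but not stated explicitly in the table), and the function name sector_count is hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: raw BIN images store full 2352-byte sectors.
RAW_SECTOR_BYTES=2352

# sector_count SIZE_IN_BYTES
# Prints the number of raw sectors, or fails if the size is not an
# exact multiple of the raw sector size.
sector_count() {
  local size="$1"
  if (( size % RAW_SECTOR_BYTES != 0 )); then
    echo "size $size is not a whole number of raw sectors" >&2
    return 1
  fi
  echo $(( size / RAW_SECTOR_BYTES ))
}

shared=$(sector_count 677284272)    # Isobuster and Cdrdao images
rebuilt=$(sector_count 677418336)   # VCDImager image
echo "Isobuster/Cdrdao: $shared sectors"
echo "VCDImager: $rebuilt sectors"
echo "difference: $(( rebuilt - shared )) sectors"
```

Both sizes divide evenly by 2352, giving 287961 and 288018 sectors respectively, so under this assumption the rebuilt VCDImager image is 57 sectors longer than the images read directly from the disc.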