Bibliographic Metadata Extraction from Theses

This article presents the application of part-of-speech (POS) based statistical text analysis to the task of bibliographic metadata extraction from electronic dissertations. By using the approach described here it is possible to detect the title of a Ph.D. paper with an accuracy of about 80%. The accuracy measurements are done using a conceptually simple approach and implementation.