<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Rasmuson Library DVD Browser:  Fun with Screen Scraping and Drupal</title>
	<atom:link href="http://journal.code4lib.org/articles/469/feed" rel="self" type="application/rss+xml" />
	<link>http://journal.code4lib.org/articles/469</link>
	<description></description>
	<lastBuildDate>Wed, 22 May 2013 14:07:44 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
	<item>
		<title>By: mark411</title>
		<link>http://journal.code4lib.org/articles/469/comment-page-1#comment-2557</link>
		<dc:creator>mark411</dc:creator>
		<pubDate>Mon, 31 Aug 2009 13:32:07 +0000</pubDate>
		<guid isPermaLink="false">http://journal.code4lib.org/?p=469#comment-2557</guid>
		<description><![CDATA[Hi! Is it possible to retrieve information from covers.hearsay24.com ?]]></description>
		<content:encoded><![CDATA[<p>Hi! Is it possible to retrieve information from covers.hearsay24.com ?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Morlino</title>
		<link>http://journal.code4lib.org/articles/469/comment-page-1#comment-1301</link>
		<dc:creator>Mark Morlino</dc:creator>
		<pubDate>Mon, 22 Dec 2008 16:50:20 +0000</pubDate>
		<guid isPermaLink="false">http://journal.code4lib.org/?p=469#comment-1301</guid>
		<description><![CDATA[Hi Jonathan,

Thanks for your interest in our article.

We are just matching by title. IMDB and Rotten Tomatoes are both usually pretty good. If there are multiple matches, the first one is very often the correct one. Development and testing were time-consuming: whenever I encountered false positives or false negatives I modified the code to prevent them, which meant dumping all of the data and starting again from the first DVD to make sure the change to the screen-scraping code did not affect the processing of any of the other DVDs.

Freecovers.net is the only one that offers a public API. Basically, the API offers the same searching functionality as the web page but returns an XML document rather than an HTML document, so we can parse the XML into a data structure easily and consistently to look for matches. It is considerably easier to code and less error-prone than the screen scraping. Unfortunately, the Perl module we are using to parse the XML stores the list of possible title matches as a hash, which means we do not maintain the order in which they were sent. Freecovers also appends various information to the titles (to indicate a release, a language, a region, or sometimes a video game by the same name), so there are many DVDs that have cover images available from Freecovers that the program doesn&#039;t retrieve because it cannot determine which one to use.
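
One way around the ordering problem is to keep the candidates in a list rather than a hash and to compare titles with the appended notes stripped. A rough Python sketch (not the Perl code described in the article; the sample titles, file names, the skip list, and the base_title and pick_cover helpers are all made up for illustration):

```python
import re

# Hypothetical search results, in the order a cover service might return them.
# The parenthesised suffixes imitate appended release/language/region notes.
RESULTS = [
    ("Blade Runner (Video Game)", "cover-1.jpg"),
    ("Blade Runner (Final Cut) (R1)", "cover-2.jpg"),
    ("Blade Runner 2049", "cover-3.jpg"),
]

# Markers that suggest an entry is not the film itself (an assumed list).
SKIP_MARKERS = ("video game",)


def base_title(title):
    """Drop trailing parenthesised qualifiers and normalise case and spacing."""
    stripped = re.sub(r"(\s*\([^)]*\))+\s*$", "", title)
    return " ".join(stripped.lower().split())


def pick_cover(wanted, results):
    """Return the first cover whose base title equals the wanted title."""
    target = base_title(wanted)
    for title, cover in results:  # a list keeps the order the service sent
        lowered = title.lower()
        if any(marker in lowered for marker in SKIP_MARKERS):
            continue
        if base_title(title) == target:
            return cover
    return None


print(pick_cover("Blade Runner", RESULTS))
```

Because the results stay in a list, "first match wins" remains meaningful, which is the property a hash loses.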

The DVDs in our catalog all have call numbers that begin with &quot;DVD&quot;; the rest of the call number is based on the order in which the DVD was cataloged and whether or not it is part of a multi-DVD set. So it is fairly easy for us to search for them in the catalog by number. The browser can just search for the most recent DVD that it knows about and keep processing results until it finds a call number that does not begin with DVD.
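
The stopping rule can be sketched as follows. This is an illustrative Python sketch only: the call number format, the sample rows, and the new_dvds helper are assumptions, and the real browser queries the catalog rather than an in-memory list.

```python
# Hypothetical catalog rows sorted by call number, as (call_number, title).
CATALOG = [
    ("DVD 1201", "Known Title"),
    ("DVD 1202", "New Title A"),
    ("DVD 1203 disc 1", "New Title B"),
    ("E 99 .A1", "Not a DVD"),
]


def new_dvds(rows, last_known):
    """Collect rows after last_known until a call number lacks the DVD prefix."""
    found = []
    seen_last = False
    for call_number, title in rows:
        if not seen_last:
            # Skip everything up to and including the last DVD already processed.
            seen_last = call_number == last_known
            continue
        if not call_number.startswith("DVD"):
            break  # left the DVD range, so stop processing
        found.append((call_number, title))
    return found


print(new_dvds(CATALOG, "DVD 1201"))
```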

I hope this helps.

-Mark]]></description>
		<content:encoded><![CDATA[<p>Hi Jonathan,</p>
<p>Thanks for your interest in our article.</p>
<p>We are just matching by title. IMDB and Rotten Tomatoes are both usually pretty good. If there are multiple matches, the first one is very often the correct one. Development and testing were time-consuming: whenever I encountered false positives or false negatives I modified the code to prevent them, which meant dumping all of the data and starting again from the first DVD to make sure the change to the screen-scraping code did not affect the processing of any of the other DVDs.</p>
<p>Freecovers.net is the only one that offers a public API. Basically, the API offers the same searching functionality as the web page but returns an XML document rather than an HTML document, so we can parse the XML into a data structure easily and consistently to look for matches. It is considerably easier to code and less error-prone than the screen scraping. Unfortunately, the Perl module we are using to parse the XML stores the list of possible title matches as a hash, which means we do not maintain the order in which they were sent. Freecovers also appends various information to the titles (to indicate a release, a language, a region, or sometimes a video game by the same name), so there are many DVDs that have cover images available from Freecovers that the program doesn&#8217;t retrieve because it cannot determine which one to use.</p>
<p>The DVDs in our catalog all have call numbers that begin with &#8220;DVD&#8221;; the rest of the call number is based on the order in which the DVD was cataloged and whether or not it is part of a multi-DVD set. So it is fairly easy for us to search for them in the catalog by number. The browser can just search for the most recent DVD that it knows about and keep processing results until it finds a call number that does not begin with DVD.</p>
<p>I hope this helps.</p>
<p>-Mark</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Rochkind</title>
		<link>http://journal.code4lib.org/articles/469/comment-page-1#comment-1298</link>
		<dc:creator>Jonathan Rochkind</dc:creator>
		<pubDate>Sat, 20 Dec 2008 20:09:42 +0000</pubDate>
		<guid isPermaLink="false">http://journal.code4lib.org/?p=469#comment-1298</guid>
		<description><![CDATA[Oh, it also occurs to me that I&#039;m not sure how to identify _which_ bib records in my catalog represent movies. How are you determining that, by call number including &#039;DVD&#039; at your library?]]></description>
		<content:encoded><![CDATA[<p>Oh, it also occurs to me that I&#8217;m not sure how to identify _which_ bib records in my catalog represent movies. How are you determining that, by call number including &#8216;DVD&#8217; at your library?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Rochkind</title>
		<link>http://journal.code4lib.org/articles/469/comment-page-1#comment-1297</link>
		<dc:creator>Jonathan Rochkind</dc:creator>
		<pubDate>Sat, 20 Dec 2008 20:06:39 +0000</pubDate>
		<guid isPermaLink="false">http://journal.code4lib.org/?p=469#comment-1297</guid>
		<description><![CDATA[I would love more information about how you retrieve the information from imdb, rottentomatoes, and FreeCovers.net.  You are just matching on title?  Do you have much trouble with false positives or false negatives matching on title keyword? Do these three services offer any kind of an API, or are you screen-scraping to find content?]]></description>
		<content:encoded><![CDATA[<p>I would love more information about how you retrieve the information from imdb, rottentomatoes, and FreeCovers.net.  You are just matching on title?  Do you have much trouble with false positives or false negatives matching on title keyword? Do these three services offer any kind of an API, or are you screen-scraping to find content?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
