By: tomkeays

tomkeays — Mon, 24 May 2010 03:31:22 +0000

ISI Web of Science has changed its output format since the article was published, adding 14 new columns, so the Perl script needs to be modified slightly. Wherever the script indexes the recordline array:

instances of @recordline[4], which reference the SO (journal source) column, need to be changed to @recordline[8].
instances of @recordline[7], which reference the DT (document type) column, need to be changed to @recordline[11].
instances of @recordline[14], which reference the CR (cited reference) column, need to be changed to @recordline[25].
instances of @recordline[36], which reference the UT (ISI number) column, need to be changed to @recordline[50].
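
One way to avoid re-editing these indexes after every format change is to look columns up by their tag in the header row instead of by fixed position. The following is only a minimal sketch, assuming the first line of the export is a tab-delimited header row containing the column tags (SO, DT, CR, UT). The helper name column_index_map and the sample header and row are hypothetical, made up for illustration, and not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: map each column tag in the header row
# (e.g. "SO", "UT") to its zero-based position.
sub column_index_map {
    my ($header_line) = @_;
    $header_line =~ s/\r?\n\z//;    # strip trailing newline (and CR, if present)
    my @tags = split /\t/, $header_line;
    my %index_of;
    @index_of{@tags} = (0 .. $#tags);
    return %index_of;
}

# Made-up sample data standing in for a real export (assumption:
# a real file has many more columns, in whatever order ISI emits them).
my $header = "SO\tDT\tCR\tUT\n";
my $row    = "J Example\tArticle\tSmith A, 1999, J TEST, V1, P1\tISI:000000000000001\n";

my %col = column_index_map($header);

chomp $row;
my @recordline = split /\t/, $row;

# Look fields up by tag rather than by a hard-coded number.
my $publishedinjournal = $recordline[ $col{SO} ];
my $isinumber          = $recordline[ $col{UT} ];

print "$publishedinjournal\t$isinumber\n";
```

With a lookup like this, a change in the number or order of columns would not require touching the script, as long as the tags themselves stay the same.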

The full revised script is:

# PROCESS THE SECTION BELOW THE SCRIPT (BELOW THE  __DATA__ TOKEN)
# RECORD BY RECORD

# file handle for output to a file:
$outfile="./isioutrefs2004.txt";
open (OUTPUT, ">$outfile") or die "Cannot open $outfile: $!";

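# the first pass of this outer loop reads only the first line of the data
# (presumably the column-header row); the inner loop below then consumes
# every remaining line, so that first line is never processed as a record.
while ( <DATA> ) {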
    # remove trailing record separator if found:
    chomp;
    # split line by tab delimiter and put the result in an array:
    @recordline=split (/\t/,$_);
    $fullrec=$_ ;
    chomp (@recordline[11]);
    while ( <DATA> ) {
        # remove trailing record separator if found:
        chomp;
        # split line by tab delimiter and put the result in an array:
        @recordline=split (/\t/,$_);
        $fullrec=$_ ;
        # save the ISI number (UT column, index 50, i.e. the 51st element) in a separate variable
        $isinumber=@recordline[50];
        $publishedinjournal=@recordline[8];
        chomp (@recordline[11]);
        foreach (@recordline[25]) {
            $count++;
            @citedrefs=split /;/, $_;
            foreach (@citedrefs) {
                # do for each cited reference               
                $countrefs++;
                @onecite=split /,/,$_;
                # add a line with running number, PubYear, Title, and ISINumber to the output file
                print OUTPUT "$countrefs\t$onecite[1]\t$onecite[2]\t$isinumber\n";
                # this line just echoes some fields to the screen as the script runs.
                # it may be commented out.
                print "$countrefs\t$onecite[1]\t$onecite[2]\t$isinumber\n";
            }
        }
    }
}
# the next two lines just echo some fields to the screen as the script finishes.
# they may be commented out.
print "Total Refs: $countrefs\n";
print "Total Number of Articles Authored/Co-Authored by MCW: $count\n";
close (OUTPUT);

# the data to be processed begins below the __DATA__ token.
__DATA__

By: SmartiePants

SmartiePants — Mon, 22 Sep 2008 21:16:42 +0000

Bravo Alfred, Bravo.

Comments on: Mining Data from ISI Web of Science® Reports
