4.9.3 Reading A “Record” At A Time

Let us now assign $/ a value other than the empty string or the undefined value. To illustrate this use of $/, we return to the bibliographic example we have discussed off and on in this chapter. Assume we have a bibliographic file, or database, that holds all our bibliographic entries. For TeX/LaTeX bibliographic entries, we follow a fixed syntax that we have adopted for ourselves to mark individual entries. Two example entries from the database file are given below. The bibliographic file is stored as plain ASCII text.

@InBook{Abney92,
  author =      "Steven P. Abney",
  title =       "Principle-Based Parsing: Computation and Psycholinguistics",
  chapter =     "Parsing by Chunks",
  publisher =   "Kluwer Academic Publishers",
  year =        "1992",
  editor =      "Robert C. Berwick and Steven P. Abney and Carol Tenny",
  pages =       "257-278",
  address =     "Dordrecht, Netherlands"
}
       
@Book{Aoun93,
  author =  "J. Aoun and A. Li",
  title =  "The Syntax of Scope",
  publisher =  "MIT Press",
  year =  "1993",
  address =  "Cambridge, MA"
}

The entry type that follows @ indicates the nature of the entry. Here, the first entry is a chapter of a book and the second is a book. There are several other possibilities, such as a paper in conference proceedings, a Ph.D. thesis, and an unpublished manuscript. We assume there are no double quotes inside the value of an attribute such as title or author. We also assume that the attributes can occur in any order.
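The two assumptions can be made concrete with a minimal sketch. The entry, its key Sample01, and the hash %value are invented for illustration. Each attribute is searched for on its own, so the order of attributes in the entry does not matter, and [^"]+ stops at the closing quote only because we assume no double quotes occur inside a value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A made-up entry whose attributes are deliberately out of the usual order.
my $entry = <<'END';
@Book{Sample01,
  year = "1993",
  title = "A Sample Title",
  author = "A. Writer"
}
END

# Search for each attribute separately, so its position in the entry
# is irrelevant. [^"]+ captures everything up to the closing quote.
my %value;
for my $attr (qw(author title year)) {
    ($value{$attr}) = $entry =~ /$attr\s*=\s*"([^"]+)/;
}

print "$value{author}, $value{title}, $value{year}.\n";
```

Although year comes first in the entry and author last, the values are printed in the order author, title, year.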

Our goal is to write a program that goes through a bibliographic database like this one and, for each entry, prints the name of the author, the title, the publisher, the year of publication, and the address of the publisher. The program is given below.

 Program 4.39

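#!/usr/bin/perl

# Make the closing brace the input record separator, so that
# each read returns one whole bibliographic entry.
$/ = "}";
$file = $ARGV[0];
open (IN, $file) or die "Cannot open $file: $!";

while ($record = <IN>){
    # Capture everything after "@Type{key," as one string.
    # The literal brace is escaped because newer versions of Perl require it.
    if ($record =~ /@\w+\{\w+,(.+)/s){
       $attrValPair = $1;
       # Extract each attribute value; the attributes may occur in any order.
       ($author) = ($attrValPair =~ /author\s*=\s*"([^"]+)/);
       ($title) = ($attrValPair =~ /title\s*=\s*"([^"]+)/);
       ($publisher) = ($attrValPair =~ /publisher\s*=\s*"([^"]+)/);
       ($year) = ($attrValPair =~ /year\s*=\s*"([^"]+)/);
       ($address) = ($attrValPair =~ /address\s*=\s*"([^"]+)/);
       print "$author, $title, $publisher, $year, $address.\n";
    }
}
close IN;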

The program sets the value of the input record separator to }. This works only on the assumption that the closing brace does not occur anywhere inside an entry. As a result, every time we read from the file, we read a whole record, up to and including its closing brace. After Perl has read a record, we extract the attribute-value pairs as one string. From this string, we extract the values of the attributes we want and then print them, one line per record. If an attribute is missing from an entry, its place in the printed line is left empty. The output of this program looks like the following.

Steven P. Abney, Principle-Based Parsing: Computation and Psycholinguistics, Kluwer 
    Academic Publishers, 1992, Dordrecht, Netherlands.
J. Aoun and A. Li, The Syntax of Scope, MIT Press, 1993, Cambridge, MA.
Adriana Balletti, Generalized Verb Movement, Rosenburg and Sellier, 1990, Turin, Italy.
Joseph Bayer, Directionality and Logical Form, Kluwer Academic Publishers, 1996, Dordrecht, 
    Netherlands.
Joseph Bayer, Final Complementizers in Hybrid Languages, , November, 1995, Katholieke 
    Universiteit Brabant, Tilber.
Chris Collins, Local Economy, MIT Press, 1997, Cambridge, Massachusetts.
Nelson Correa, Principle-Based Parsing: Computation and Psycholinguistics, Kluwer Academic 
  Publishers, 1992, Dordrecht, Netherlands.
Peter W. Culicover, Principles and Parameters: An Introduction to
                 Syntactic Theory, Oxford University Press, 1997, Oxford, England.
M. Diesing, Indefinites, MIT Press, 1992, Cambridge, MA.
Samuel Epstein, Principle-Based Parsing: Computation and Psycholinguistics, Kluwer Academic    
    Publishers, 1992, Dordrecht, Netherlands.
M. V. Liliane Haegeman, Introduction to Government and Binding Theory, Oxford University 
    Press,  1994, Cambridge, MA.
Norbert Hornstein, Logical Form, From GB to Minimalism, Blackwell Publishers, 1995, 
   Cambridge, MA.
Mark Johnson, Principle-Based Parsing: Computation and Psycholinguistics, Kluwer Academic 
   Publishers, 1992, Dordrecht, Netherlands.

Some lines have been broken into two for printing.
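
The effect of setting $/ to } can also be seen in isolation. The following is a minimal sketch, not part of Program 4.39: the two entries, the keys Key1 and Key2, and the array @records are invented for illustration, and the data is read from a string opened as an in-memory file so that no file on disk is needed. It shows that each read stops right after a closing brace, and that the newline after the last brace comes back as a final whitespace-only "record" that has to be skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two tiny made-up entries held in a string.
my $data = <<'END';
@Book{Key1,
  year = "1993"
}

@Book{Key2,
  year = "1997"
}
END

# Open the string as an in-memory file.
open(my $in, '<', \$data) or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = "}";    # inside this block, each read stops after a closing brace
    while (my $record = <$in>) {
        # The newline after the last brace is read as one more record; skip it.
        next unless $record =~ /\S/;
        push @records, $record;
    }
}
close $in;

printf "%d records read\n", scalar @records;
print "First record ends with a brace\n" if $records[0] =~ /\}\z/;
```

Using local restores the previous value of $/ when the block ends, so later reads in a larger program are not affected. Program 4.39 does not need a check for the whitespace-only record because its pattern match fails on such a record and nothing is printed.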