11.1.2.2 Creating Digests for Files

  We can also compute the message digest of a file. This is convenient when the message to be sent is first stored in a file and then processed. It is also useful when we want to ensure that an important file has not been tampered with between the time we create it and the time we use it.

The following program reads a text file line by line and adds each line to the context of an MD5 hash. Finally, it creates a message digest for the whole content of the file. It prints the digest in hexadecimal format for our benefit and also stores the digest in binary form in a file that has the .digest extension. If the file we are digesting is called processes.tex, the digest file is called processes.tex.digest. We store the digest so that later, when we use the original file again, we can check whether it has been tampered with. We can also give the file to someone else, and that person can verify that the file is still in its original form. For this to work, the second party must also be given the digest and told which digest algorithm was used. The program is given below.

 Program 11.2

#!/usr/bin/perl
#file md5File.pl

use strict;
use Digest::MD5;

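my ($file, $context, $digest, $lineCount);

print "Give the name of a file to digest: ";
$file = <STDIN>;
chomp $file;

$context = Digest::MD5 -> new ();
$lineCount = 0;

# Add the file to the MD5 context one line at a time
open (IN, "<", $file) or die "Cannot open $file: $!\n";
while (<IN>){
     $context -> add ($_);
     $lineCount++;
   }
close IN;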

print "Added $lineCount lines to the digest context\n";
$digest = $context -> digest ();
printf "The  digest is %s\n", unpack ("H*", $digest);

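# Store the raw 16-byte digest in a companion file
open (OUT, ">", "$file.digest") or die "Cannot write $file.digest: $!\n";
binmode OUT;
print OUT $digest;
close OUT;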

The program starts a new MD5 context called $context. It reads the file line by line in a while loop, adding each line to $context as it is read. We also print a line count just to see how many lines the file contains. The digest is written to a new file at the very bottom of the program. An interaction with this program is shown below.

 

Give the name of a file to digest: processes.tex
Added 3264 lines to the digest context
The  digest is 48a38bf059676b9f7644e64b25de08e9

 

Thus, the digest printed above in hexadecimal format corresponds to all of the 3264 lines of text in the file processes.tex.

The Digest::MD5 module provides a shortcut for producing the digest of the content of a whole file: an MD5 context method called addfile. The following program is a rewrite of the previous program.

 Program 11.3

#!/usr/bin/perl
#file md5File1.pl

use strict;
use Digest::MD5;

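my ($file, $context, $digest);

print "Give the name of a file to digest: ";
$file = <STDIN>;
chomp $file;

$context = Digest::MD5 -> new ();

# Read raw bytes so that binary files are digested correctly
open (FILE, "<", $file) or die "Cannot open $file: $!\n";
binmode FILE;
$context -> addfile (*FILE);
close FILE;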

$digest = $context -> digest ();
printf "The  digest is %s\n", unpack ("H*", $digest);

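# Store the raw 16-byte digest in a companion file
open (OUT, ">", "$file.digest") or die "Cannot write $file.digest: $!\n";
binmode OUT;
print OUT $digest;
close OUT;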

 

This program can produce an MD5 digest of any file, whether text or binary. In Perl, one way to pass a filehandle as an argument to a function call is to place a * in front of its name, making it a so-called typeglob. The output of two runs of the program is given below.

 

Give the name of a file to digest: processes.tex
The  digest is 48a38bf059676b9f7644e64b25de08e9

Give the name of a file to digest: jk1.jpg
The  digest is 1178ebfe124542afcc586e766d96c1a5

 

Here, processes.tex is a text file whereas jk1.jpg is a binary graphic file. Notice that the digest for the file processes.tex was also produced by the previous program (Program 11.2), and the digest comes out the same whether we add the file to the context line by line or use the addfile method.
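The equivalence of the two approaches can be checked directly. The following is a minimal sketch, not one of the numbered programs: the subroutine names digest_by_lines and digest_by_addfile and the temporary sample file are assumptions made for illustration. It also passes a lexical filehandle to addfile instead of a typeglob, and uses the hexdigest method in place of unpack.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;
use File::Temp qw(tempfile);

# Digest a file one line at a time, in the manner of Program 11.2.
sub digest_by_lines {
    my ($path) = @_;
    open (my $in, "<", $path) or die "Cannot open $path: $!\n";
    binmode $in;
    my $context = Digest::MD5 -> new ();
    while (my $line = <$in>) {
        $context -> add ($line);
    }
    close $in;
    return $context -> hexdigest ();
}

# Digest a file in one call, in the manner of Program 11.3,
# but with a lexical filehandle rather than a typeglob.
sub digest_by_addfile {
    my ($path) = @_;
    open (my $in, "<", $path) or die "Cannot open $path: $!\n";
    binmode $in;
    my $context = Digest::MD5 -> new ();
    $context -> addfile ($in);
    close $in;
    return $context -> hexdigest ();
}

# Illustrative sample data written to a temporary file.
my ($fh, $sample) = tempfile (UNLINK => 1);
print $fh "first line\nsecond line\n";
close $fh;

printf "line by line: %s\n", digest_by_lines ($sample);
printf "addfile:      %s\n", digest_by_addfile ($sample);
```

Both lines of output should show the same 32 hexadecimal digits, since the MD5 context only sees the sequence of bytes and not how they were divided into calls to add.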

Please note that for the digest to have any value, we must write another program that looks at the digest and verifies it when the digested file is about to be used. Another point to note is that the digest can be stored in the same file as the original, instead of a separate file.
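One possible way to keep the digest in the same file is to append it as a final line. The sketch below works on strings to stay short; the trailer format "MD5-DIGEST: " and the subroutine names seal and verify are hypothetical choices for this illustration, not a standard convention.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical trailer: a last line of the form "MD5-DIGEST: <32 hex digits>".
my $TAG = "MD5-DIGEST: ";

# Return the content with its own digest appended as a trailer line.
sub seal {
    my ($content) = @_;
    return $content . $TAG . md5_hex ($content) . "\n";
}

# Split off the trailer, recompute the digest of what precedes it,
# and return 1 if the two digests agree, 0 otherwise.
sub verify {
    my ($sealed) = @_;
    return 0 unless $sealed =~ /\A(.*)\Q$TAG\E([0-9a-f]{32})\n\z/s;
    my ($content, $stored) = ($1, $2);
    return md5_hex ($content) eq $stored ? 1 : 0;
}

my $sealed = seal ("line one\nline two\n");
print verify ($sealed) ? "original\n" : "NOT original\n";

# Change one character of the content but leave the trailer alone.
(my $tampered = $sealed) =~ s/one/One/;
print verify ($tampered) ? "original\n" : "NOT original\n";
```

The digest must be computed over the content without the trailer, which is why verify separates the two before hashing.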

Below, we present a program that is given a file name as a command-line argument. It computes a new MD5 digest of the content of the file and compares it with the MD5 digest previously computed to see if the two digests are the same. If the two are the same, it concludes that the file has not been tampered with and is the original.

 Program 11.4

#!/usr/bin/perl
#file: md5FileVerify.pl
#usage: %md5FileVerify.pl originalFileName
use strict;
use Digest::MD5;

my ($file, $context, $oldDigest, $newDigest);

$file = $ARGV[0];
if (!(-e "$file.digest")){
      print "There is no digest file: $file.digest\n";
      print "Conclusion: The file is not original\n";
      exit;
   }

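$context = Digest::MD5->new();
open (FILE, "<", $file) or die "Cannot open $file: $!\n";
binmode FILE;
$context -> addfile (*FILE);
close FILE;
$newDigest = $context -> digest ();
print "The new digest is " , unpack ("H*", $newDigest), "\n";

# Read the stored digest as raw bytes; a line read could stop early
# if one of the 16 digest bytes happens to be a newline character
open (DIGEST, "<", "$file.digest") or die "Cannot open $file.digest: $!\n";
binmode DIGEST;
read (DIGEST, $oldDigest, 16);
close DIGEST;
print "The old digest is " , unpack ("H*", $oldDigest), "\n";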

if ($newDigest eq $oldDigest){
    print "Conclusion: The file is original\n";
  }
  else{
    print "Conclusion: The file is NOT original\n";
  }

 

The program obtains the name of the file to verify from the command line. It checks to see if a corresponding file with the extension .digest exists. If such a file does not exist, it concludes that the file cannot be verified as original and exits.

If the digest file exists, the program opens the file to verify and computes a new digest of its content, stored in $newDigest. It then reads the old digest from the digest file into the variable $oldDigest. If $oldDigest and $newDigest are equal, the program concludes that the file given as command-line argument is the original.

Here is the output of the program when it is called with the argument jk1.jpg.

 

%md5FileVerify.pl jk1.jpg
The new digest is 1178ebfe124542afcc586e766d96c1a5
The old digest is 1178ebfe124542afcc586e766d96c1a5
Conclusion: The file is original

 

Even if one character is changed in the original file, the program will not verify the file as the original.

If we alter one character of the file processes.tex using a text editor, and then we verify it, we get the following printout on the terminal.

 

%md5FileVerify.pl processes.tex
The new digest is 9a969870291ec7368a767acfb93fcd05
The old digest is 48a38bf059676b9f7644e64b25de08e9
Conclusion: The file is NOT original

 

The two digests are quite different after one character was changed in a file with more than 3000 lines of text.
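This sensitivity to small changes can be reproduced without any files. The following sketch is an illustration under assumed data: the 3000-line sample text and the helper differing_hex_digits are made up for this example. It digests a long string, changes a single character, and counts how many of the 32 hexadecimal digits differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Count the positions at which two equal-length strings differ.
sub differing_hex_digits {
    my ($x, $y) = @_;
    my $count = 0;
    for my $i (0 .. length ($x) - 1) {
        $count++ if substr ($x, $i, 1) ne substr ($y, $i, 1);
    }
    return $count;
}

# Illustrative text of 3000 identical lines.
my $original = "The quick brown fox jumps over the lazy dog\n" x 3000;

# Without the /g modifier only the first occurrence is replaced,
# so exactly one character of the whole text changes.
(my $altered = $original) =~ s/quick/quack/;

my $d1 = md5_hex ($original);
my $d2 = md5_hex ($altered);

print "original: $d1\n";
print "altered:  $d2\n";
printf "%d of 32 hex digits differ\n", differing_hex_digits ($d1, $d2);
```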

However, determining whether two files differ can be done easily with a command such as diff in Unix. Of course, diff needs both files, the original and the new one, to tell whether they are the same or different. Quite often, the two files are not both available. diff uses an efficient string comparison algorithm to find the differences between two files.
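The contrast between the two kinds of check can be sketched in Perl. This is an illustration only: the helpers write_temp and file_md5_hex and the sample contents are assumptions, and the core File::Compare module stands in for diff here. Unlike diff, its compare function only reports whether two files are equal and does not list the differences.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;
use File::Compare;
use File::Temp qw(tempfile);

# Hypothetical helper: write text to a temporary file and return its name.
sub write_temp {
    my ($text) = @_;
    my ($fh, $name) = tempfile (UNLINK => 1);
    print $fh $text;
    close $fh;
    return $name;
}

# Hypothetical helper: hexadecimal MD5 digest of the content of a file.
sub file_md5_hex {
    my ($path) = @_;
    open (my $in, "<", $path) or die "Cannot open $path: $!\n";
    binmode $in;
    my $hex = Digest::MD5 -> new () -> addfile ($in) -> hexdigest ();
    close $in;
    return $hex;
}

my $original = write_temp ("alpha\nbeta\n");
my $altered  = write_temp ("alpha\nBeta\n");

# Direct comparison: both files must be present.
# compare returns 0 if the files are equal, 1 if they differ, -1 on error.
print "compare: ", (compare ($original, $altered) == 0 ? "same" : "different"), "\n";

# Digest comparison: only the stored digest of the original is needed.
my $storedDigest = file_md5_hex ($original);
print "digest:  ", (file_md5_hex ($altered) eq $storedDigest ? "same" : "different"), "\n";
```

Once $storedDigest has been saved, the second check still works after the original file is gone, which is the situation a direct comparison cannot handle.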