The article describes a method to present minute differences in large document volumes in a space-efficient way.
Publication layout verification method
Disclosed is a method to calculate and present page differences for large page volumes in document processing workflows.
Publications have to be verified manually for correct layout and composition. Refining the aesthetic appearance of a publication requires repeatedly applying subtle changes to the document data and reviewing each change afterwards. It must also be verified that the changes affect only the intended part of the document, while the rest of the document remains unchanged. For large documents, verifying these repeated modifications is cumbersome and costly.
Markup languages handle the layout of a document, but a content change that the author targets at one page can still propagate to other pages that were not meant to change. Using a side-by-side comparison of the previous and the current version, the author verifies that the changes made to the document are as intended. Usually this verification is limited to a small number of pages; even when the selection is guided by the author's experience, many pages remain unchecked. For quality production, verifying the modification only on the target page is not sufficient.
The method combines a series of processing steps to identify page differences across large page volumes and present them on a single screen. This significantly reduces the processing time needed to detect minor changes.
The method is demonstrated with standard system tools (a PDF viewer, ImageMagick convert, and GIMP) and APL2 functions on Fedora 11.
The method starts by comparing two 'identical' PDF documents, Document_A.pdf and Document_B.pdf. In a first step, both are converted to raster images:
convert Document_A.pdf Document_A.bmp
convert Document_B.pdf Document_B.bmp
The conversion creates one .bmp file per PDF page:
Document_A-0.bmp ... Document_A-nn.bmp
Document_B-0.bmp ... Document_B-nn.bmp
nn denotes the index of the last page of the PDF document. Numbering starts at 0, so a 13-page document yields the files -0 to -12.
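The two conversions and the subsequent matching of corresponding page files can be scripted. The following Python sketch is not part of the original tool set (which uses APL2), and all function names in it are hypothetical. It builds the convert calls described above, runs them only when explicitly requested, and pairs Document_A-k.bmp with Document_B-k.bmp by numeric page index, assuming the file naming shown above.

```python
import re
import subprocess
from pathlib import Path

def conversion_command(pdf_path):
    """Build the convert call that rasterizes every page of pdf_path.

    Naming the output <stem>.bmp makes convert write one numbered file
    per page: <stem>-0.bmp, <stem>-1.bmp, ...
    """
    pdf = Path(pdf_path)
    return ["convert", str(pdf), str(pdf.with_suffix(".bmp"))]

def convert_all(pdf_paths, run=False):
    """Return the convert commands; execute them only if run=True (needs ImageMagick)."""
    commands = [conversion_command(p) for p in pdf_paths]
    if run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands

# Matches e.g. Document_A-12.bmp: document stem, '-', page index, '.bmp'.
PAGE_RE = re.compile(r"^(?P<stem>.+)-(?P<page>\d+)\.bmp$")

def page_index(paths, stem):
    """Map page index -> path for all page files that belong to one document stem."""
    pages = {}
    for p in paths:
        m = PAGE_RE.match(Path(p).name)
        if m and m.group("stem") == stem:
            pages[int(m.group("page"))] = p
    return pages

def pair_pages(paths, stem_a, stem_b):
    """Return (page, file_a, file_b) for every page present in both documents, in page order."""
    a = page_index(paths, stem_a)
    b = page_index(paths, stem_b)
    return [(n, a[n], b[n]) for n in sorted(a.keys() & b.keys())]
```

Pages are sorted as integers, so page 10 follows page 9 instead of page 1 as a plain string sort would order them. If the two documents differ in page count, the unmatched pages are the symmetric difference of the two page indexes, and that change is itself worth reporting. For a higher raster resolution, ImageMagick's `-density` option (for example `-density 150`) can be placed before the input file.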
For this demonstration, two different instances of cgi.pdf [1] are compared; although the document has only 13 pages, it is sufficient to show how the method works. Viewing the two documents in a PDF viewer revealed no difference. The BMP images, however, show a wider margin in the right image. This margin difference between the two BMP instances is identical on all pages. Because only the content is of interest, all margins were removed before further processing. For reference, page 1 of each instance is shown (Figure 1, Figure 2).
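The source does not state how the margins were removed (GIMP is among the listed tools). A minimal sketch of one automatic alternative, assuming each page is a grayscale image given as a list of pixel rows with white (255) as background: crop every page to the bounding box of its non-background pixels. The function names are hypothetical.

```python
def content_bbox(pixels, background=255):
    """Return (top, bottom, left, right), inclusive, of all non-background pixels.

    Returns None for a blank page.
    """
    rows = [y for y, row in enumerate(pixels) if any(v != background for v in row)]
    if not rows:
        return None
    cols = [x for x in range(len(pixels[0]))
            if any(row[x] != background for row in pixels)]
    return rows[0], rows[-1], cols[0], cols[-1]

def crop_margins(pixels, background=255):
    """Cut away all-background rows and columns around the page content."""
    box = content_bbox(pixels, background)
    if box is None:
        return []
    top, bottom, left, right = box
    return [row[left:right + 1] for row in pixels[top:bottom + 1]]
```

Because both instances are cropped to their content, a uniform margin offset between A and B disappears and the remaining pixels can be compared directly; if the cropped sizes still differ, the content extent itself has changed. ImageMagick's `-trim` option performs a comparable crop on whole image files.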
Fig. 1: BMP version of page 1 of cgi.pdf instance A.
Fig. 2: BMP version of page 1 of cgi.pdf instance B.
In a second processing step a function extracts the bitmaps (header and padding removed) from the corresponding pages of the Docu...
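In the original, this extraction is done with APL2 functions. As an illustration only, the following Python sketch strips the headers and the row padding from a BMP file, assuming an uncompressed 24-bit BMP with a BITMAPINFOHEADER; other bit depths or compressed files would need extra handling. The offsets used (pixel-data offset at byte 10; width, height, planes, bit count and compression from byte 18 on) follow the standard BMP layout. The name `extract_bitmap` is hypothetical.

```python
import struct

def extract_bitmap(data):
    """Strip headers and row padding from an uncompressed 24-bit BMP.

    Returns (width, height, rows): rows is a list of bytes objects with
    3 bytes per pixel (BGR order), ordered from the top scan line down.
    """
    if data[:2] != b"BM":
        raise ValueError("not a BMP file")
    (offset,) = struct.unpack_from("<I", data, 10)   # start of pixel data
    width, height, _planes, bpp, compression = struct.unpack_from("<iiHHI", data, 18)
    if bpp != 24 or compression != 0:
        raise ValueError("only uncompressed 24-bit BMPs are handled in this sketch")
    bottom_up = height > 0          # positive height: last scan line stored first
    height = abs(height)
    row_bytes = width * 3                    # pixel payload of one scan line
    stride = (row_bytes + 3) // 4 * 4        # stored line, padded to a multiple of 4
    rows = [data[offset + y * stride: offset + y * stride + row_bytes]
            for y in range(height)]
    if bottom_up:
        rows.reverse()
    return width, height, rows
```

Removing the headers (which may carry fields such as the resolution that do not affect the page content) and the padding leaves only the pixel payload, so corresponding pages can then be compared row by row.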