Wednesday, July 31, 2024

Creating PDFs of Vintage Manuals - Harder Than it Appears?

I have accumulated quite a bit of documentation over the years. It has always been on my "things to do someday" list to scan what I have and upload it to the Internet Archive. Recently, though, I was really annoyed to find that the only source for a manual I needed was a very expensive copy on eBay. After grinding my teeth about it for a while, I purchased the manual and waited for it to arrive.

Once it showed up at my house, it was at least well packaged and in good condition. After this, I took apart a less valuable manual, also spiral bound, to try scanning it. I am lucky enough to have a decent Brother duplexing MFP that also has an ADF for duplex scanning. That's where my luck ran out, though. I quickly discovered that my printer was unable to reliably feed the books, with their shiny covers and varying weights of paper. I found a workaround, where I would scan the covers separately on the flatbed, and then run the pages through the ADF. My next issue was that the MFP refused to duplex scan 8x10" documents. To get past that, I scanned all the pages single-sided, and used the really cool pdftk utility to interleave the pages into a single PDF.
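For anyone wanting to try the same trick, here is a minimal sketch of how the interleaving might be done with pdftk's shuffle operation. It assumes one PDF holding all the front sides and another holding all the back sides; the function name and the file names are placeholders, not the ones used for this scan.

```shell
# Hypothetical helper: interleave two single-sided scans into page order.
# $1 = PDF of front sides, $2 = PDF of back sides, $3 = output PDF.
# $4 = optional page range for the backs; pass "Bend-1" if the stack of
#      backs was scanned last page first.
interleave_scans() {
    fronts=$1
    backs=$2
    out=$3
    back_range=${4:-B}
    # "shuffle" takes one page from each input in turn: A1 B1 A2 B2 ...
    pdftk A="$fronts" B="$backs" shuffle A "$back_range" output "$out"
}
```

Calling `interleave_scans fronts.pdf backs.pdf manual.pdf` would cover the simple case, and adding `Bend-1` as a fourth argument handles a stack that was flipped over before the second pass.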

Now that the scanning part was done, I took a look at the result. Unfortunately, it wasn't fantastic. Quite a few of the pages, which were on a fairly lightweight bond paper, had significant bleed-through of the text on their backs to the front, due to the brightness of the scanner's lamp. A friend who does a lot more book scanning than I do suggested that I use GraphicConverter. That turned out to be a great idea, and I was able to change pages like this into ones like this.
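GraphicConverter is a GUI application, so there is no command to show for that step. As a rough command-line alternative (a different technique, not what was used here), ImageMagick's -white-threshold option forces every pixel brighter than a cutoff to pure white, which can wipe out faint show-through while leaving dark text alone. The helper name and the 85% default below are assumptions and would need tuning per paper stock.

```shell
# Hypothetical helper: push faint bleed-through to white with ImageMagick.
# $1 = scanned page image, $2 = cleaned output, $3 = optional threshold.
clean_page() {
    src=$1
    dst=$2
    threshold=${3:-85%}   # assumed starting point, not a measured value
    # Pixels brighter than the threshold become white; darker ones are kept.
    convert "$src" -white-threshold "$threshold" "$dst"
}
```

A lower threshold removes more show-through but starts to eat into light gray artwork, so it is worth checking a few pages by eye before running a whole book.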

This worked, but it was a heck of a lot of work. I quickly realized that I was going to be far too lazy to do this more than a few times. I spoke to my friend again, and they suggested that I get a dedicated ADF scanner that was well-thought-of by folks who do a lot of scanning. After a bit of research, I settled on a Kodak i2600 Document Scanner that was well-used, but fully functional, and available for a reasonable price on eBay. That scanner turned out to be exactly as described, and worked immediately, after I figured out how to properly unfold the input and output trays. With that scanner, I was able to much more easily generate output that looks like this.

To document this process, here is exactly what I did:

  1. remove the spiral binding from the book
  2. split the book into ~50 page sections
  3. run each section through the Kodak i2600, generating individual page TIFF images
  4. convert each TIFF image into a PDF
    1. for doc in *; do echo "$doc"; convert "$doc" "$doc.pdf"; done
  5. reduce the size of the very large (>100MB) individual pages to something more reasonable
    1. for file in *; do if [ ! -d "$file" ]; then echo "$file"; pdf2ps "$file" "../ps/$( echo "$file" | sed 's/pdf$/ps/' )"; fi; done
    2. for file in *; do if [ ! -d "$file" ]; then echo "$file"; ps2pdf "$file" "../pdf/$( echo "$file" | sed 's/ps$/pdf/' )"; fi; done
  6. last, I used pdftk to join the individual pages into a single book
    1. pdftk *.pdf cat output merged.pdf
  7. that's it!
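
Steps 4 through 6 above could also be rolled into one function. This is only a sketch under a few assumptions: the scans end in .tif, their names sort in page order (for example page-001.tif), and ImageMagick, Ghostscript and pdftk are all installed. The function name and the temporary directory layout are made up for illustration.

```shell
# Hypothetical wrapper for steps 4-6: TIFF pages in, one merged PDF out.
# $1 = directory of scanned .tif pages, $2 = merged book to write.
tiffs_to_book() {
    src_dir=$1
    out_pdf=$2
    work_dir=$(mktemp -d) || return 1
    mkdir "$work_dir/raw" "$work_dir/ps" "$work_dir/small" || return 1

    for tiff in "$src_dir"/*.tif; do
        [ -f "$tiff" ] || continue      # the glob matched nothing
        page=$(basename "$tiff" .tif)
        echo "$page" >&2                # progress report on stderr
        # Step 4: wrap the TIFF in a (large) PDF.
        convert "$tiff" "$work_dir/raw/$page.pdf" || return 1
        # Step 5: round-trip through PostScript to shrink the page.
        pdf2ps "$work_dir/raw/$page.pdf" "$work_dir/ps/$page.ps" || return 1
        ps2pdf "$work_dir/ps/$page.ps" "$work_dir/small/$page.pdf" || return 1
    done

    # Step 6: join the shrunken pages; the glob expands in sorted name order.
    set -- "$work_dir"/small/*.pdf
    [ -f "$1" ] || { rm -rf "$work_dir"; return 1; }
    pdftk "$@" cat output "$out_pdf"
    status=$?
    rm -rf "$work_dir"
    return "$status"
}
```

Something like `tiffs_to_book ./scans manual.pdf` would then handle one book. Keeping each stage in its own subdirectory means the final glob only ever picks up the shrunken pages.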

I'll be looking to improve the process, but this has worked well enough, so far.

 
