Sometimes having a digital copy of a book is a practical way of accessing information. At the moment I am moving around a lot, so I have time to read whilst commuting but don’t want to carry extra bulk around with me. For this reason I scanned a small book I often read. I wasn’t happy with the raw images and wanted something more consolidated (a PDF), with OCR so I can quickly search the text for what I am looking for.
After some research I trialled a few different programs and (on Mac) found that ScanTailor gave me the best results. Their homepage doesn’t have any Mac builds, but luckily someone has compiled ScanTailor-OSX (not all of the builds worked for me).
You can then create a new project and select the folder with your raw images. You can then follow the numbered steps in the top left.

EDIT: 07/May/2015 – If your original files are PDFs you will need to convert them to images first. On Mac you can do this using Automator.
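If you would rather stay on the command line, the same conversion can be sketched with ImageMagick. This is only an illustration, not part of the Automator route above: it assumes ImageMagick's convert is installed together with Ghostscript (which ImageMagick relies on to read PDFs), and the function name pdf_to_tiffs, the 300 DPI default and the page-%03d.tif naming are hypothetical choices.

```shell
#!/bin/sh
# Sketch: split a PDF into one TIFF per page with ImageMagick.
# Assumes `convert` (ImageMagick) and Ghostscript are installed.
# pdf_to_tiffs, the 300 DPI default and the output naming are illustrative.
pdf_to_tiffs() {
    pdf=$1
    outdir=$2
    dpi=${3:-300}
    mkdir -p "$outdir" || return 1
    # -density goes before the input so the PDF is rasterised at that resolution;
    # +adjoin asks for one output file per page, numbered by %03d
    convert -density "$dpi" "$pdf" +adjoin "$outdir/page-%03d.tif"
}
```

For example, pdf_to_tiffs book.pdf ./raw 300 would fill ./raw with numbered TIFFs that can be selected as the ScanTailor input folder.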

Depending on the text you scanned, you can then rotate the page(s) as necessary. You can use the “Apply to …” button to apply the orientation setting to all the images.
If you have split pages, select Auto and then press the small play button to apply the changes to all the images. The automatic split won’t get every page right, so you’ll need to adjust a few manually.
Go through all the other settings in a similar way…
In Output (step 6) I set the mode to Black and White and then changed only select pages to colour.
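I had a setting wrong somewhere, so each A5 page came out on a landscape A4 canvas. As I couldn’t find the cause, I wrote a quick script to crop the pages back down:
# Crop every TIFF to 3400x4808 px around its centre and write the result to ./new/
mkdir -p ./new/
for f in *.tif
do
    # +repage drops the leftover canvas offset so the output is exactly the cropped size
    convert "$f" -gravity Center -crop 3400x4808+0+0 +repage "./new/${f%.tif}_new.tif"
done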
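The 3400x4808 in the script is just the size that suited these scans. If your scans are at a different resolution, the crop size can be worked out from the paper size and the DPI. A minimal sketch, assuming an A5 page of 148 x 210 mm; a5_pixels is a hypothetical helper, not an ImageMagick command.

```shell
#!/bin/sh
# Sketch: pixel dimensions of an A5 page (148 x 210 mm) at a given DPI.
# a5_pixels is a hypothetical helper, not part of ImageMagick.
a5_pixels() {
    dpi=$1
    # pixels = mm * dpi / 25.4; scaled by 10 for integer maths, +127 rounds to nearest
    w=$(( (148 * dpi * 10 + 127) / 254 ))
    h=$(( (210 * dpi * 10 + 127) / 254 ))
    printf '%sx%s\n' "$w" "$h"
}

a5_pixels 600   # prints 3496x4961
```

The result can be dropped straight into the crop geometry, for example -crop "$(a5_pixels 600)+0+0".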

Crop and Canvas Page
The “-crop” image operator will simply cut out the part of all the images in the current sequence at the size and position you specify by its geometry argument.


convert rose: rose.gif
convert rose: -crop 40x30+10+10 crop.gif
convert rose: -crop 40x30+40+30 crop_br.gif
convert rose: -crop 40x30-10-10 crop_tl.gif
convert rose: -crop 90x60-10-10 crop_all.gif
convert rose: -crop 40x30+90+60 crop_miss.gif


Removing Canvas/Page Geometry
If this canvas and position image information is not wanted, then you can use the special “+repage” operator to reset the page canvas and position to match the actual cropped image.


convert rose: -crop 40x30+10+10 +repage repage.gif
convert rose: -crop 40x30+40+30 +repage repage_br.gif
convert rose: -crop 40x30-10-10 +repage repage_tl.gif
convert rose: -crop 90x60-10-10 +repage repage_all.gif
convert rose: -quiet -crop 40x30+90+60 +repage repage_miss.gif


Crop relative to Gravity
The offset position of the “-crop” by default is relative to the top-left corner of the image. However, by setting the “-gravity” setting, you can tell “-crop” to cut the image relative to either the center, a corner, or an edge of the image. The most common use of a gravitated crop is to crop the “center” of an image.


convert rose: -gravity Center -crop 32x32+0+0 +repage crop_center.gif

Once you have all the images you will probably want to combine them into a PDF. I used “Combine files into a single PDF” in Adobe Acrobat Pro to do this.
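If you don’t have Acrobat, ImageMagick can also write a multi-page PDF. The following is a sketch rather than the method used here: it assumes ImageMagick's convert is installed and allowed to write PDFs (some installs restrict this in their security policy), and images_to_pdf is a hypothetical wrapper name.

```shell
#!/bin/sh
# Sketch: combine page images into a single multi-page PDF with ImageMagick.
# images_to_pdf is an illustrative wrapper, not an ImageMagick command.
images_to_pdf() {
    out=$1
    shift
    [ "$#" -gt 0 ] || { echo "usage: images_to_pdf OUT.pdf IMAGE..." >&2; return 1; }
    # pages appear in the PDF in the order the images are listed
    convert "$@" "$out"
}
```

Calling images_to_pdf book.pdf ./new/*.tif relies on the shell expanding the glob in sorted filename order, so zero-padded page numbers in the filenames keep the pages in sequence.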

You can then use Acrobat’s Text Recognition to make the PDF searchable, which also lets you select text and create bookmarks.
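An open-source alternative to Acrobat for this step is OCRmyPDF, which adds a text layer to an existing PDF. This is a sketch under assumptions: the ocrmypdf command (and the Tesseract language data it uses) must be installed, and make_searchable and the eng default are illustrative, not something this workflow was tested with.

```shell
#!/bin/sh
# Sketch: add a searchable text layer to a scanned PDF with OCRmyPDF.
# Assumes the `ocrmypdf` command is installed; make_searchable is a hypothetical wrapper.
make_searchable() {
    src=$1
    dst=$2
    lang=${3:-eng}
    # -l selects the Tesseract language used for recognition
    ocrmypdf -l "$lang" "$src" "$dst"
}
```

For example, make_searchable book.pdf book_ocr.pdf writes a new file and leaves the original untouched. Bookmarks would still need to be added in a PDF editor.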

The final thing I wanted was for the table of contents (bookmarks) to appear when the PDF is opened. After some reading I found you can do this in the Document Properties (File -> Properties -> Initial View).

References:

DIY Book Scanning consists of four steps:
  1. Build a book scanner
  2. Scan a book
  3. Clean up the images and package into an ebook
  4. Enjoy your ebook
Step 1. Build a book scanner
The DIY Book Scanner community has cooperatively developed many different styles of book scanner, but they all generally have the following components in common:
  • A cradle which holds the book open, but not flat (to protect more delicate books)
  • A transparent platen (glass or acrylic) which flattens the pages
  • A camera (usually two cameras) which takes images of the pages
Scanners range in complexity from the simplest (a cardboard box and a camera), to the most common (the standard DIY Book Scanner), to the cutting edge (automatic page turners). If you’re interested in scanning one or two books, the cardboard box scanner is where to start. If you have a more ambitious plan, you will want the standard DIY Book Scanner.
Since the most difficult part of building a standard DIY Book Scanner is cutting out the parts, we have a special forum for people willing to pay someone else to cut out the parts for them.
Step 2. Scan a book
Once you have a scanner, scanning a book involves setting up your cameras and lighting, taking pictures, and turning the pages of the book. People using the DIY Book Scanner have reported page rates in the range of 14 pages per minute.
You’ll end up with memory cards containing your book images, which you should then copy to your computer. Use one directory for the left-page images and one for the right-page images. People have reported success using EyeFi wireless memory cards to copy the images over to the computer automatically, thus saving a step.
Step 3. Clean up the images and package into an ebook
Some members have shared their workflow for this step, but this gives the general idea:
Check your images to make sure you don’t have any pages missing or any pages duplicated. Once that’s done, you can combine the left and right page images into a single directory by renaming the images. On Windows, this can be done using Total Commander or Free Commander, and on OSX we have FileWrangler or Name Mangler.
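The renaming tools named above are GUI programs; the same merge can be sketched in a few lines of shell. This is an illustration under assumptions: both directories hold the same number of .jpg files, sorting by filename gives the shooting order, and each left page comes before its matching right page. The interleave_pages name and the page_NNNN.jpg scheme are hypothetical, and the sketch copies rather than renames so the originals stay in place.

```shell
#!/bin/sh
# Sketch: merge left-page and right-page images into one directory in reading order.
# Assumes equal numbers of .jpg files in both directories, sorted by name in shooting order.
# interleave_pages and the page_NNNN.jpg naming are illustrative.
interleave_pages() {
    left_dir=$1
    right_dir=$2
    out_dir=$3
    mkdir -p "$out_dir" || return 1
    n=0
    for l in "$left_dir"/*.jpg; do
        [ -e "$l" ] || continue
        n=$((n + 1))
        # left page of spread n becomes page 2n-1
        cp "$l" "$out_dir/$(printf 'page_%04d.jpg' $((2 * n - 1)))"
    done
    n=0
    for r in "$right_dir"/*.jpg; do
        [ -e "$r" ] || continue
        n=$((n + 1))
        # right page of spread n becomes page 2n
        cp "$r" "$out_dir/$(printf 'page_%04d.jpg' $((2 * n)))"
    done
}
```

For example, interleave_pages ./left ./right ./all produces page_0001.jpg, page_0002.jpg and so on in ./all, ready to be loaded into the clean-up program.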
The next stage is to get rid of image distortion, and crop the images so they look consistent from page to page. We use one of two awesome programs to do this, one is Scan Tailor and the other, for more command-line oriented users, is Book Scan Wizard.
The resulting images are then packaged into an ebook. Many of our members use Adobe Acrobat 9 to combine the images into a PDF and have it OCR the images as well. Others give the images directly to ABBYY FineReader and let it do the OCR and output.
Step 4. Enjoy your ebook
Once you have a PDF or other file, you can use other programs to convert to whatever ebook format you need. How this is done, and whether indeed it should be done, is a subject of ongoing debate.
Nevertheless, the hard work has been done, and your physical book is now in digital form!