Daemon-like scan2pdf with OCR on Linux

I spent some time rewriting my scanner script.

The script now has these features:

  • autocrop
  • OCR via cuneiform
  • daemon-like behavior
  • play a sound when script is finished
  • add TODO entry in my orgmode

To get the script running under Ubuntu 12.10 you need some packages:

apt-get install cuneiform sane-utils imagemagick exactimage

 

Now you need to find the name of your scanner by running

scanimage -L

In my case it is pixma:04A9173A_8447DA
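If you would rather not copy the name by hand, here is a minimal sketch that pulls it out of the scanimage -L output. It assumes each device line has the form device `NAME' is a ..., which is how SANE usually prints it; the function name list_scan_devices is made up for this example.

```shell
#!/bin/bash
# Extract device names from `scanimage -L`-style output read on stdin.
# Assumption: each device line looks like:  device `NAME' is a ...
list_scan_devices() {
    sed -n "s/^device \`\([^']*\)' is .*/\1/p"
}

# Usage with a scanner attached (takes the first device found):
#   scan_device=$(scanimage -L | list_scan_devices | head -n 1)
```

If more than one device is listed, check the output yourself and pick the right one instead of relying on head -n 1.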

You also need to create a folder to store your PDFs

mkdir $HOME/scans

Now you need to save the script somewhere

nano $HOME/scan2ocr.sh

 

#!/bin/bash

scan_device=pixma:04A9173A_8447DA
path=$HOME/scans/
org_file=/home/me/org/scans.org
sound=/usr/share/sounds/purple/receive.wav

while :

do

date=`date +%F-%H-%M-%S`

#scan image

scanimage --device-name $scan_device --format tiff --resolution 150 --mode Gray --button-controlled=yes >/tmp/scan.tiff

# crop image

convert /tmp/scan.tiff -crop `convert /tmp/scan.tiff -virtual-pixel edge -blur 0x15 -fuzz 15% -trim -format '%[fx:w]x%[fx:h]+%[fx:page.x]+%[fx:page.y]' info:` +repage /tmp/scan_crop.tiff

# create one-page pdf
#tiff2pdf -o /home/me/test.pdf -p A4 -F -f /home/me/test_crop.tiff

# OCR

cuneiform -l ger -f hocr -o /tmp/scan_ocr.hocr /tmp/scan_crop.tiff

# combine *.hocr and *.tiff to pdf file

hocr2pdf -i "/tmp/scan_crop.tiff" -s -o "$path$date.pdf" < "/tmp/scan_ocr.hocr"

# add TODO entry for org-file

echo "* TODO sort [[file:$path$date.pdf]] :scans:" >> $org_file

aplay $sound

done

 

Then make it executable:

chmod +x $HOME/scan2ocr.sh

 

Now you can run the script with

$HOME/scan2ocr.sh

The script waits for you to press the scan button and starts processing as soon as you do. Once the image is scanned and processed, it plays a sound to let you know it is done.
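The script as posted runs OCR even if scanimage failed and left an empty file behind. A small guard could skip that round; this is only a sketch, and scan_ok is a hypothetical helper, not part of the script above.

```shell
#!/bin/bash
# Return success only if the scan file exists and is not empty.
scan_ok() {
    [ -s "$1" ]
}

# Hypothetical use inside the loop, right after the scanimage call:
#   scan_ok /tmp/scan.tiff || { sleep 1; continue; }
```

The sleep keeps the loop from spinning at full speed if the scanner is unplugged or busy.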

 

You might need to change the path to the sound file. If you don't want the orgmode integration, delete the echo line that appends to $org_file.
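Instead of editing the loop, both extras could be made optional. The following is a rough sketch, assuming an empty variable means "feature off"; add_todo and notify_done are names invented for this example.

```shell
#!/bin/bash
# Leave either variable empty to switch that feature off.
org_file=""   # e.g. /home/me/org/scans.org
sound=""      # e.g. /usr/share/sounds/purple/receive.wav

# Append a TODO entry for the given PDF, unless org_file is empty.
add_todo() {
    [ -n "$org_file" ] || return 0
    echo "* TODO sort [[file:$1]] :scans:" >> "$org_file"
}

# Play the sound only if one is configured and the file exists.
notify_done() {
    [ -n "$sound" ] && [ -f "$sound" ] || return 0
    aplay -q "$sound"
}
```

In the loop you would then call add_todo "$path$date.pdf" and notify_done in place of the echo and aplay lines.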
