Daemon-like scan2pdf with OCR in Linux

I spent some time rewriting my scanner script.

The script now has these features:

  • autocrop
  • OCR via cuneiform
  • daemon-like behavior
  • play a sound when the script has finished a scan
  • add a TODO entry to my orgmode file

To get the script running under Ubuntu 12.10 you need some packages:

apt-get install cuneiform sane-utils imagemagick exactimage
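
Before the first scan it can help to confirm that every command the script calls is actually installed. The following is only a minimal sketch: missing_commands is a hypothetical helper name, not part of any of the packages, and the list of commands is taken from the script further down.

```shell
# Hypothetical helper: print each command from the argument list
# that cannot be found in PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Commands the scanner script calls. Note that aplay is not in the
# apt-get line above; on Ubuntu it comes from the alsa-utils package.
missing_commands scanimage convert cuneiform hocr2pdf aplay
```

If the call prints nothing, all five commands were found.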


Now you need to find the name of your scanner by running

scanimage -L

In my case it is pixma:04A9173A_8447DA
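
If you would rather not copy the name by hand, it can be cut out of the scanimage -L output. This is a rough sketch that assumes the usual line format, device `NAME' is a DESCRIPTION; first_scan_device is a hypothetical helper name, and it simply takes the first device listed.

```shell
# Hypothetical helper: read `scanimage -L` style output on stdin and
# print the device name from the first matching line.
# Assumes lines of the form:  device `NAME' is a DESCRIPTION
first_scan_device() {
    sed -n "s/^device \`\([^']*\)'.*/\1/p" | head -n 1
}

# Usage (needs a connected scanner):
#   scan_device=$(scanimage -L | first_scan_device)
```

When no scanner is found the helper prints nothing, so an empty result is worth checking for before starting a scan.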

You also need to create a folder to store your PDFs

mkdir $HOME/scans

Now you need to save the script somewhere

nano $HOME/scan2ocr.sh




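#!/bin/bash

# settings: adjust these to your setup
scan_device="pixma:04A9173A_8447DA"   # name reported by scanimage -L
path="$HOME/scans/"                   # output folder, with trailing slash
org_file="/path/to/your/file.org"     # placeholder: set to your org file
sound="/path/to/your/sound.wav"       # placeholder: set to your sound file

while :
do
    date=$(date +%F-%H-%M-%S)

    # scan image (waits until the scan button is pressed)
    scanimage --device-name "$scan_device" --format tiff --resolution 150 --mode Gray --button-controlled=yes > /tmp/scan.tiff

    # crop image
    convert /tmp/scan.tiff -crop "$(convert /tmp/scan.tiff -virtual-pixel edge -blur 0x15 -fuzz 15% -trim -format '%[fx:w]x%[fx:h]+%[fx:page.x]+%[fx:page.y]' info:)" +repage /tmp/scan_crop.tiff

    # create one-page pdf
    #tiff2pdf -o /home/me/test.pdf -p A4 -F -f /home/me/test_crop.tiff

    # OCR
    cuneiform -l ger -f hocr -o /tmp/scan_ocr.hocr /tmp/scan_crop.tiff

    # combine *.hocr and *.tiff to pdf file
    hocr2pdf -i "/tmp/scan_crop.tiff" -s -o "$path$date.pdf" < "/tmp/scan_ocr.hocr"

    # add TODO entry for org-file
    echo "* TODO sort [[file:$path$date.pdf]] :scans:" >> "$org_file"

    aplay "$sound"
done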



and make it executable with


chmod +x $HOME/scan2ocr.sh


Now you can run the script with

$HOME/scan2ocr.sh

The script waits for you to press the scan button on the device and starts processing as soon as it is pressed. Once the image has been scanned and processed, it gives you feedback by playing a sound, then waits for the next button press.
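
Each scan ends up as a PDF named after the time it was started, because the script simply joins $path and $date. That only works when $path ends in a slash. A small sketch of a more forgiving variant follows; pdf_path is a hypothetical helper name, and the script above does not use it.

```shell
# Hypothetical helper: build the output PDF path from a folder ($1) and
# a timestamp ($2). A single trailing slash on the folder is dropped
# first, so the result contains exactly one separator.
pdf_path() {
    printf '%s/%s.pdf\n' "${1%/}" "$2"
}

# Same timestamp format as in the script.
pdf_path "$HOME/scans" "$(date +%F-%H-%M-%S)"
```

The timestamp has a resolution of one second, which is enough here since a single scan takes longer than that.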


You might need to change the path to the sound file, and you may not want to use the orgmode integration at all.
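
One way to make both parts optional is to wrap them in small functions that do nothing when they are not configured. This is only a sketch under my own assumptions: add_org_todo and play_done_sound are hypothetical names, and an empty org file setting is taken to mean "no orgmode integration".

```shell
# Hypothetical helper: append a TODO entry for the PDF ($2) to the org
# file ($1). Does nothing when the org file argument is empty.
add_org_todo() {
    [ -n "$1" ] || return 0
    printf '* TODO sort [[file:%s]] :scans:\n' "$2" >> "$1"
}

# Hypothetical helper: play the sound file ($1) only if it exists and
# aplay is installed; otherwise stay silent.
play_done_sound() {
    if [ -f "$1" ] && command -v aplay >/dev/null 2>&1; then
        aplay -q "$1"
    fi
}

# Usage inside the loop, replacing the echo and aplay lines:
#   add_org_todo "$org_file" "$path$date.pdf"
#   play_done_sound "$sound"
```

With org_file="" the script then only writes the PDFs, and a missing sound file no longer produces an error on every scan.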


