Tuesday, October 26, 2010

Creating djvu documents (ebooks) from scans

The best way to make ebooks from paper books is to scan the pages and convert the scans to djvu rather than pdf. Pdf is poor at compressing page images, while djvu is designed for scanned documents and can produce a small file without a visible loss of quality.

After getting all the scans in place there are a few things to do to get the final djvu document.

  • Rename all files so that they sort in page order, for example
    book-001.jpg book-002.jpg ... book-024.jpg book-025.jpg ... book-165.jpg book-166.jpg
    The prefix can be anything instead of book-
    The crucial thing is that the numbers are zero-padded, so that alphabetical sorting matches numerical sorting. With names like 1.jpg 2.jpg ... 10.jpg 11.jpg ... the file 10.jpg would come before 2.jpg.
    We do not want that, hence the padded names.

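    If the files are currently numbered without padding, e.g. book-1.jpg book-2.jpg ... book-10.jpg book-11.jpg ..., then put 1-9 in one folder, 10-99 in another, 100-999 in another and so on. Now go to each folder and bulk rename the files with the Perl version of rename, adding as many zeros as that folder needs (two for 1-9 and one for 10-99 when the target is three digits). For the 1-9 folder: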

    rename -v 's/book-/book-00/' book-*.jpg
  • Now, collect all files in one place and convert any that are not already jpg to jpg.
  • Now, run the jpg -> djvu script

    The script can be found at http://www.howtoforge.com/creating_djvu_documents_on_linux

    I use a modified version suitable only for black and white scans. It is as follows -

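    #!/bin/bash
    # any2djvu-bw

    # the `which` results are quoted so that a missing tool gives an empty string for -z
    if [ -z "`which anytopnm`" -o -z "`which ppmtopgm`" -o -z "`which pgmtopbm`" -o -z "`which cjb2`" ]; then
      echo "Error: anytopnm, ppmtopgm, pgmtopbm and cjb2 are needed"
      exit 1
    fi

    shopt -s extglob

    # NOTE: the values below are missing from this listing and are assumed here; adjust them
    DEFMASK="*.jpg"   # default file mask (assumed)
    DPI=300           # resolution of the scans (assumed)
    # uncomment the following line to compile a bundled DjVu document
    # OUTFILE="book.djvu"   # assumed placeholder name

    function usage() {
      echo "usage:"
      echo "$0 [\"REGEXP\"]"
      echo "    converts single pages with the default mask $DEFMASK (or REGEXP if provided)"
      echo "    in the current directory to single-page black and white djvu documents"
    # uncomment the following line to compile a bundled DjVu document
    # echo "    and bundles them as a djvu file $OUTFILE"
    }

    # use the mask given on the command line, otherwise the default one
    if [ -n "$1" ]; then
      MASK="$1"
    else
      MASK="$DEFMASK"
    fi

    for i in $MASK; do
      if [ ! -e "$i" ]; then
        usage
        echo "Error: current directory must contain files with the mask $MASK"
        exit 1
      fi
      # skip pages that have already been converted
      if [ ! -e "$i.djvu" ]; then
        echo "$i"
        # convert to greyscale, reduce to black and white, then encode with cjb2
        anytopnm "$i" | ppmtopgm | pgmtopbm -value 0.499 > "$i.pbm"
    # in netpbm >= 10.23 the above line can be replaced with the following:
    #   anytopnm "$i" | ppmtopgm | pamditherbw -value 0.499 > "$i.pbm"
        cjb2 -dpi $DPI "$i.pbm" "$i.djvu"
        rm -f "$i.pbm"
      fi
    done

    # uncomment the following line to compile a bundled DjVu document
    #djvm -c $OUTFILE $MASK.djvu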
  • The djvu joiner in the script does not work well, so use the method given in http://en.wikisource.org/wiki/Help:DjVu_files#Method_1 instead:

    djvm -c outputfile.djvu book-*.djvu
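
Before joining, it can help to confirm that every scan really has a converted page, since a page that failed to convert would silently go missing from the book. The following is only a rough sketch: missing_pages is a made-up helper name, and it assumes the conversion script above, which writes each page as FILE.djvu next to FILE (for example book-001.jpg.djvu).

```shell
#!/bin/bash
# Sketch: report scans that have no converted page yet.
# missing_pages is a hypothetical helper, not part of djvulibre.
# Assumption: each page FILE was converted to FILE.djvu in the same directory.

missing_pages() {
  local f missing=0
  for f in "$@"; do
    # -s is true only if the file exists and is not empty
    if [ ! -s "$f.djvu" ]; then
      echo "missing: $f.djvu"
      missing=$((missing + 1))
    fi
  done
  # succeed only when nothing is missing
  [ "$missing" -eq 0 ]
}
```

In the directory with the scans it could be used as: missing_pages book-*.jpg && djvm -c outputfile.djvu book-*.djvu
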
Now we have a properly arranged ebook. Enjoy!
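
A side note on the renaming step: sorting the files into folders by number of digits can probably be avoided by padding all the numbers in one pass. The following is a minimal sketch, not a tested tool. The helper names pad_name and pad_all are made up for this post, and it assumes that each file name ends in a number followed by an extension (1.jpg, book-12.jpg and so on).

```shell
#!/bin/bash
# Sketch: zero-pad the page numbers of scanned files in one pass.
# pad_name and pad_all are hypothetical helper names.
# Assumption: every file name ends in digits followed by an extension.

# pad_name PREFIX WIDTH FILE: print the padded name for FILE
pad_name() {
  local prefix="$1" width="$2" base="${3##*/}" ext num
  case $base in *.*) ;; *) return 1 ;; esac   # need an extension
  ext=${base##*.}               # extension, e.g. jpg
  num=${base%.*}                # name without the extension
  num=${num##*[!0-9]}           # keep only the trailing digits
  [ -n "$num" ] || return 1     # no page number in this name
  num=${num#"${num%%[!0]*}"}    # drop leading zeros (printf would read 008 as octal)
  [ -n "$num" ] || num=0
  printf "%s%0${width}d.%s\n" "$prefix" "$num" "$ext"
}

# pad_all PREFIX WIDTH FILES...: rename each file to its padded name
pad_all() {
  local prefix="$1" width="$2" f new
  shift 2
  for f in "$@"; do
    new=$(pad_name "$prefix" "$width" "$f") || continue
    [ "${f##*/}" = "$new" ] && continue       # already has the padded name
    # -n refuses to overwrite an existing file
    mv -n -- "$f" "$(dirname -- "$f")/$new"
  done
}
```

For example, pad_all book- 3 *.jpg in the scan directory would turn 1.jpg and 10.jpg into book-001.jpg and book-010.jpg. Files that are already padded are left alone, so it should be safe to run it again after adding more scans.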

