Dependencies
============

* Python ≥ 2.5

* An OCR engine:

  + OCRopus_ ≥ 0.2 (tested with 0.2 and 0.3.1) —
    document analysis and OCR system

  + Cuneiform_ ≥ 0.7 (tested with 0.7, 0.8, 0.9, 1.0) —
    document analysis and OCR system

  + Ocrad_ (tested with 0.17 and 0.21) —
    OCR program

  + GOCR_ ≥ 0.40 (tested with 0.48) —
    OCR program

  + Tesseract_ ≥ 2.00 (tested with 2.04 and 3.00) —
    an OCR system

* DjVuLibre_ ≥ 3.5.21 —
  library for the DjVu_ file format

* python-djvulibre_ ≥ 0.1.14 —
  Python bindings for DjVuLibre_

* PyICU_ -
  Python bindings for IBM's ICU_ C++ API

* lxml_ —
  Python bindings for libxml2_

* html5lib_ -
  HTML parser based on the HTML5_ specification

* argparse_ —
  Python command line parser
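
Before running, it can help to confirm which of the dependencies above are actually available. The script below is a minimal sketch of such a check. It assumes Python 3.3 or later (for ``shutil.which``), so it is not meant to run under the older interpreters listed above. The import names (``djvu.decode`` for python-djvulibre, ``icu`` for PyICU, ``lxml.etree`` for lxml) and the engine executable names (``ocroscript``, ``cuneiform``, ``ocrad``, ``gocr``, ``tesseract``) are assumptions. The helper functions ``check_python_modules`` and ``check_ocr_engines`` are hypothetical and not part of this package. The script does not check version numbers, and it does not cover DjVuLibre itself beyond what importing python-djvulibre implies.

```python
import importlib
import shutil

# Hypothetical mapping: package name -> module to import (names are assumptions).
PYTHON_MODULES = {
    "python-djvulibre": "djvu.decode",
    "PyICU": "icu",
    "lxml": "lxml.etree",
    "html5lib": "html5lib",
    "argparse": "argparse",
}

# Hypothetical mapping: OCR engine -> executable expected on PATH (assumed names;
# OCRopus 0.2/0.3 was driven through the "ocroscript" interpreter).
OCR_ENGINES = {
    "OCRopus": "ocroscript",
    "Cuneiform": "cuneiform",
    "Ocrad": "ocrad",
    "GOCR": "gocr",
    "Tesseract": "tesseract",
}


def check_python_modules(modules=PYTHON_MODULES):
    """Return {package: bool}, True if the package's module can be imported."""
    status = {}
    for package, module in modules.items():
        try:
            importlib.import_module(module)
        except ImportError:
            status[package] = False
        else:
            status[package] = True
    return status


def check_ocr_engines(engines=OCR_ENGINES):
    """Return {engine: path or None} for each engine executable found on PATH."""
    return {name: shutil.which(exe) for name, exe in engines.items()}


if __name__ == "__main__":
    for package, ok in sorted(check_python_modules().items()):
        print("%-18s %s" % (package, "ok" if ok else "MISSING"))
    engines = check_ocr_engines()
    for name, path in sorted(engines.items()):
        print("%-18s %s" % (name, path or "not found"))
    # Only one OCR engine is required, so warn only if none is present.
    if not any(engines.values()):
        print("warning: no supported OCR engine found on PATH")
```

Only one OCR engine needs to be installed, which is why the script warns only when none of them is found.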

.. _OCRopus:
   http://code.google.com/p/ocropus/
.. _Cuneiform:
   http://launchpad.net/cuneiform-linux
.. _Ocrad:
   http://www.gnu.org/software/ocrad/
.. _GOCR:
   http://jocr.sourceforge.net/
.. _Tesseract:
   http://code.google.com/p/tesseract-ocr/
.. _DjVuLibre:
   http://djvu.sourceforge.net/
.. _DjVu:
   http://djvu.org/
.. _python-djvulibre:
   http://jwilk.net/software/python-djvulibre.html
.. _PyICU:
   http://pyicu.osafoundation.org/
.. _ICU:
   http://www-306.ibm.com/software/globalization/icu/
.. _lxml:
   http://codespeak.net/lxml/
.. _libxml2:
   http://xmlsoft.org/
.. _html5lib:
   http://code.google.com/p/html5lib/
.. _HTML5:
   http://www.whatwg.org/specs/web-apps/current-work/
.. _argparse:
   http://code.google.com/p/argparse/

.. vim:ft=rst ts=3 sw=3 et tw=72
