Welcome to CorrectOCR’s documentation!
- Workflow
- Configuration
- Commands
- Heuristics
- Correction Interface
- CorrectOCR package
History
CorrectOCR is based on code created by:
Caitlin Richter (ricca@seas.upenn.edu)
Matthew Wickes (wickesm@seas.upenn.edu)
Deniz Beser (dbeser@seas.upenn.edu)
Mitchell Marcus (mitch@cis.upenn.edu)
See their article “Low-resource Post Processing of Noisy OCR Output for Historical Corpus Digitisation” (LREC-2018) for further details. It is available online: http://www.lrec-conf.org/proceedings/lrec2018/pdf/971.pdf
The original Python 2.7 code (see the "original" tag in the repository) has been licensed under Creative Commons Attribution 4.0 (CC-BY-4.0; see also license.txt in the repository).
The code has subsequently been updated to Python 3 and further expanded by Mikkel Eide Eriksen (mikkel.eriksen@gmail.com) for the Copenhagen City Archives. The changes are mainly structural; the algorithms are generally preserved as-is. Pull requests are welcome!
Requirements
Python >= 3.6
For package dependencies see requirements.txt.
They can be installed using pip install -r requirements.txt
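
Before installing the dependencies, it can be useful to confirm that the interpreter meets the stated minimum. The following is a minimal sketch, not part of CorrectOCR itself; the helper name python_is_supported and the constant MIN_PYTHON are hypothetical and used only for illustration.

```python
import sys

# Minimum version taken from the requirement stated above (Python >= 3.6).
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only major and minor, e.g. (3, 8) >= (3, 6).
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if python_is_supported():
        print("Python version OK; run: pip install -r requirements.txt")
    else:
        sys.exit("Python >= 3.6 is required")
```

Running this script with the interpreter you intend to use reports whether it satisfies the version requirement before pip is invoked.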