To scan and recognize Chinese pages, I went looking for an OCR engine for my Linux distribution.
After several tests, I chose Tesseract, which was originally developed… at HP Labs! (It is now released under the Apache license.)
Unfortunately, the available Ubuntu package was built from version 2.x, which does not support Chinese (version 3.x is needed)…
So I installed version 3.0 from scratch.
This is not too difficult, but there are a few steps to follow, and I will guide you through them…