Some assistance/advice for OCRing?

hedge@beehaw.org · 1 year ago

Some assistance/advice for OCRing?

MasterBuilder@lemmy.one · 1 year ago

I use ocrmypdf, after being a bit frustrated with gscan2pdf. There is a simple ui available, but I just created a tiny script that does the ocr , deskew, etc. In one operation with wildcard file selection.

I also installed a jbig compressor that really shrinks images. My processed docs are generally 40% to 80% smaller, and it seems to get better tesseract output than gscan does.

donio@beehaw.org · edit-2 1 year ago

OCRmyPDF is what I use as well, had good luck with it on boardgame rulebooks that sometimes come with missing or partial embedded text. Combined with recoll and the Emacs pdf-tools mode I have it all indexed and at my fingertips.

hedge@beehaw.org · 11 months ago

deleted by creator

MasterBuilder@lemmy.one · 11 months ago

I don’t know, but there might be pdf viewers that permit editing layers. Try LibreOffice Draw or gscan2pdf. Maybe The Gimp can do it.