Converting a scanned document into a compressed, searchable PDF with redactions

Created on 2022-08-27T06:52:40-05:00

This card pertains to a resource available on the internet.

$ infile=scan.pdf
$ tmpfile=$(mktemp)
$ outfile=searchable-scan.pdf
$ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$tmpfile" "$infile"
$ ocrmypdf -l eng --deskew "$tmpfile" "$outfile"
$ rm "$tmpfile"

The order of compression matters. The article's author found that running the Ghostscript (gs) optimization before OCRmyPDF shaved the file from 1.5 MB to 1 MB. Running only OCRmyPDF took the scanner's raw output from 7.9 MB to 2.7 MB.
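
The steps above can be wrapped in a small script that also prints the before and after sizes, which makes it easy to check the effect of the ordering on your own scans. This is only a sketch: the function names compress_and_ocr and size_bytes are hypothetical helpers, not part of Ghostscript or OCRmyPDF, and the gs and ocrmypdf flags are the ones from the commands above.

```shell
#!/bin/sh
# Sketch: compress with Ghostscript first, then OCR with OCRmyPDF.
# compress_and_ocr and size_bytes are hypothetical helper names.

size_bytes() {
    # Print the size of the file "$1" in bytes.
    wc -c < "$1" | tr -d ' '
}

compress_and_ocr() {
    infile=$1
    outfile=$2
    tmpfile=$(mktemp) || return 1

    # Step 1: recompress with the /ebook preset into the temporary file.
    # Step 2: deskew and add an English text layer to the compressed copy.
    if gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook \
          -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$tmpfile" "$infile" &&
       ocrmypdf -l eng --deskew "$tmpfile" "$outfile"
    then
        status=0
    else
        status=1
    fi

    # Remove the temporary file whether or not the steps succeeded.
    rm -f "$tmpfile"
    [ "$status" -eq 0 ] || return 1

    echo "before: $(size_bytes "$infile") bytes"
    echo "after:  $(size_bytes "$outfile") bytes"
}
```

After sourcing the file, a call such as compress_and_ocr scan.pdf searchable-scan.pdf would run both steps and report the two sizes. The temporary file is removed even when gs or ocrmypdf fails, so failed runs do not leave stray files behind.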