Dangerzone will convert possibly insecure files like PDF, DOCX into a safe PDF

PenguinCoder@beehaw.org to Technology@beehaw.org – 67 points
Dangerzone
dangerzone.rocks

From a security standpoint, PDFs are literally one of the worst formats I deal with daily. Very useful and predictable for the end user, yes, but very dangerous because of the capabilities the format allows.

Dangerzone works like this: You give it a document that you don't know if you can trust (for example, an email attachment). Inside of a sandbox, Dangerzone converts the document to a PDF (if it isn't already one), and then converts the PDF into raw pixel data: a huge list of RGB color values for each page. Then, in a separate sandbox, Dangerzone takes this pixel data and converts it back into a PDF.
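
To make the hand-off between the two sandboxes concrete, here's a rough sketch of the idea in Python. It is not Dangerzone's actual code: the function names, the 8-byte header, and the `max_side` limit are all made up for illustration. The point is only that the second stage accepts nothing except dimensions plus exactly the right number of RGB bytes, so there's no room for scripts, fonts, or embedded objects to ride along.

```python
import struct

BYTES_PER_PIXEL = 3  # one byte each for R, G, B


def pack_page(width, height, rgb):
    """Stage 1 (untrusted sandbox): emit dimensions plus raw RGB bytes.

    Hypothetical wire format: two big-endian uint32 values, then pixels.
    """
    if len(rgb) != width * height * BYTES_PER_PIXEL:
        raise ValueError("pixel buffer does not match dimensions")
    return struct.pack(">II", width, height) + rgb


def unpack_page(blob, max_side=10_000):
    """Stage 2 (separate sandbox): accept only well-formed pixel data."""
    if len(blob) < 8:
        raise ValueError("truncated header")
    width, height = struct.unpack(">II", blob[:8])
    # max_side is an assumed sanity limit, not a value from Dangerzone.
    if not (0 < width <= max_side and 0 < height <= max_side):
        raise ValueError("implausible page dimensions")
    rgb = blob[8:]
    # Anything other than exactly width * height * 3 bytes is rejected.
    if len(rgb) != width * height * BYTES_PER_PIXEL:
        raise ValueError("pixel buffer does not match dimensions")
    return width, height, rgb


# A 2x1 "page": one red pixel followed by one white pixel.
blob = pack_page(2, 1, bytes([255, 0, 0, 255, 255, 255]))
width, height, rgb = unpack_page(blob)
```

A real second stage would then write those pixels into a fresh PDF; that part is left out here.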

So it basically rasterizes it? I wonder how it affects file size
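
For a rough sense of scale, the raw pixel data for one page is just width times height times 3 bytes. The sketch below assumes a US Letter page at 150 DPI; the post doesn't say what resolution Dangerzone actually renders at, so treat both numbers as placeholders.

```python
def raw_page_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Size of one uncompressed RGB page, given its size in inches and a DPI."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# Assumed example: US Letter (8.5 x 11 in) at an assumed 150 DPI.
letter_150 = raw_page_bytes(8.5, 11, 150)
print(letter_150)  # 6311250 bytes, roughly 6 MiB per page before compression
```

That's the intermediate pixel buffer, not the final file. The output PDF would normally store the page images compressed, so it should come out smaller than this, though likely still bigger than a text-based original.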

No mention of OCR? Copy-pasting links or data will be a joy...

There is an optional OCR pass, from what I understand.

Yeah, it definitely increases the size and removes some functionality that others may rely on. But for presenting content, which is what a PDF SHOULD BE for, it has typically worked fine. I've been using pandoc and some homegrown scripts to do this sort of thing for a while.