How to summarize a scanned PDF document with AI?

meowmeowmeow@lemmy.ml to Asklemmy@lemmy.ml – 23 points –
  1. Don't have ChatGPT
  2. OCR needed
  3. Preferably Android

Thanks.


It will be a great deal quicker just to read the damn thing.

  1. Download any OCR app from F-Droid or your preferred store.
  2. Copy the recognized text.
  3. Run llama-gpt¹ if you want something self-hosted, or any LLM² on HuggingChat if you want a ready-made solution.
  4. Paste the text and write something like "summary:" below it.

¹Theoretically possible on mobile, but for better performance, run it on a PC.

²The default one should do the job.

Disclaimer: I think it should work, but I haven't done anything like that before.
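For long scans, the pasted text may not fit into the model's input in one go. Here is a minimal sketch of steps 2 to 4, assuming the OCR text is already in a string. The names `chunk_text` and `build_prompt` are made up for illustration, and the 6000-character limit is an arbitrary assumption, not a limit of any particular model.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 6000) -> List[str]:
    """Split text into chunks of at most max_chars characters.

    Paragraphs (separated by blank lines) are kept together where possible;
    a single paragraph longer than max_chars is cut at the limit.
    max_chars=6000 is an arbitrary assumption, tune it for your model.
    """
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Hard-split a paragraph that cannot fit into one chunk on its own.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def build_prompt(chunk: str) -> str:
    """Append the 'summary:' cue below the text, as in step 4."""
    return f"{chunk}\n\nsummary:"
```

Each prompt can then be pasted into the chat one at a time, and the partial summaries can be summarized again if a single overall summary is needed.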

I have actually tried it, but with doc files on a PC, running Python.

My main issue is that the models that do it well need a commercial licence. I have the pay grade to experiment on my own during work time, but not to spend the company's money on it. And IT just signed a contract to get GPT-4 as part of Bing Chat Pro.

Android won't be easy, but you can slap together a Python script that runs Tesseract or EasyOCR and feeds the output to a pretrained LLM like T5. Those are well-known and well-documented, so ChatGPT can probably write the script for you without too many hiccups.
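A rough sketch of such a script, not a tested recipe: it assumes the `pdf2image`, `pytesseract`, `transformers`, `torch` and `sentencepiece` packages plus the Tesseract and Poppler binaries are installed, and it uses the small `t5-small` checkpoint only as an example. The function names, the 300 dpi setting and the generation settings are my own assumptions. Summary quality from a small T5 model will be modest.

```python
from typing import Callable, Iterable, List


def ocr_pdf(pdf_path: str, lang: str = "eng") -> List[str]:
    """OCR every page of a scanned PDF; returns one string per page."""
    # Imported lazily so the rest of the script loads without these packages.
    from pdf2image import convert_from_path  # needs Poppler installed
    import pytesseract  # needs the tesseract binary installed

    pages = convert_from_path(pdf_path, dpi=300)  # 300 dpi is an assumption
    return [pytesseract.image_to_string(page, lang=lang) for page in pages]


def make_t5_summarizer(model_name: str = "t5-small") -> Callable[[str], str]:
    """Load a T5 checkpoint and return a text -> summary function."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    def summarize(text: str) -> str:
        # T5 expects a task prefix; input beyond 512 tokens is truncated.
        inputs = tokenizer(
            "summarize: " + text,
            return_tensors="pt",
            truncation=True,
            max_length=512,
        )
        output_ids = model.generate(**inputs, max_new_tokens=120, num_beams=4)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    return summarize


def summarize_pages(pages: Iterable[str], summarize: Callable[[str], str]) -> str:
    """Summarize each non-empty page separately and join the results."""
    parts = []
    for page in pages:
        text = " ".join(page.split())  # collapse OCR line breaks and spacing
        if text:
            parts.append(summarize(text))
    return "\n".join(parts)


def main(pdf_path: str) -> None:
    print(summarize_pages(ocr_pdf(pdf_path), make_t5_summarizer()))
```

Calling `main("scan.pdf")` (a placeholder file name) prints one short summary per page. Summarizing page by page keeps each input under T5's length limit, at the cost of losing context that spans pages.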

chatGPT can probably write the script for you

From OP:

  1. Don't have ChatGPT

I read that as either "I don't have premium" or "I can't run this data through ChatGPT for whatever reason".

Free ChatGPT is viable for writing scripts in any case.

Yeah, maybe they don't have API access. I didn't think about it that way.

I'm guessing they meant they don't want to use ChatGPT, considering it's free.

Well, you give OpenAI a lot of personal data, so it's not free from a certain point of view. That may be the reason why OP doesn't want to use it.

There are plenty of valid reasons not to want to use it; it was just the wording that seemed odd.

And you can run that in Termux, so you can use it on Android.

What's the worth of AI-generated summaries if they are not factually reliable? The new AI-generated previews in Google search results (and I believe Google, as a large company, has more resources than most of us do) contain so many obvious factual errors (e.g. made-up names, wrong places, false dates) that I really doubt current-generation AI is ready to be a reliable help in this use case.

I, too, like the idea of not having to do all this work manually. But we’re not there yet.