ChatGPT, how do I use OCR in Word? – 614 points


ChatGPT is a revolution in surrealism.

No joke, there's a whole world of memes and interpretations we can get from them

What is it showing? What did it learn from in order to do that?

Like r/DisneyVacation, but with whatever the AI was smoking in slide 4

This is the only valid take tbh

It then gave me step-by-step text instructions on how to use the OCR feature in Microsoft Word to import text from a picture, and admitted in step 3 that the function doesn't exist. There were 6 steps.

and admitted in step 3 that the function doesn't exist.


I noticed that on my fourth read through. I'm still finding new obscure things in it.

Reading this is what I imagine having a stroke feels like.

It's also a glimpse of what it's like dealing with someone with dementia.

Some of the current thought on shortcomings of LLM capabilities actually takes influence from human cognitive science, and what can be learned from those with neurological impairments. It's thought that human language abilities are strongly dissociated from other reasoning abilities because individuals with aphasia can lack the ability to speak or comprehend language, yet be able to solve mathematical problems, engage in logical reasoning, enjoy music, categorize objects and events, etc.

It's shown that LLMs develop a crude world model for performing reasoning tasks, yet it's inextricably tied up with their language functionalities (since they are ONLY language based). The hope for future research is to develop AIs with world models and planning faculties that are decoupled from the language analysis module, which would mitigate hallucination and aid in interpretability.

Here's the Linux version of this:

I was following it correctly up until the part where you have to place a child on your laptop. I wish these things would let you know the parts required beforehand.

I regularly feel like I've turned into a magician when 30 layers deep into my process of "fixing" something. So at least that is accurate.

But are you using child labor yet?

This is some /r/surrealmemes shit right here.

Part of me wants AI to never evolve so it can keep making images like these forever.

Old technology never dies.

Except when it's closed source and on a company server somewhere

The specific programs may be lost but the idea behind them won't be.

What is fun to me is that it completely made up a bunch of computer and office accessories that don't exist.

Just wait, soon we'll all be editing documents using tiny scalpels.

Of course, the only person who would try to do this is doing an essay about marijuana.

That was common in graphic design back in the (pre-90s) day.

I worked in a print shop in the 90s (until 95) and we still used X-Acto knives for our layouts. We had a computer but no one really knew how to use it for graphic design yet.

The text in the image represents how accurate it tends to be whenever I try to OCR a document.

for windows use, try powertoys's powerocr

It uses the windows built in API for ocr. Isn't very good in my experience.

If you're not getting good results, have you tried Ocr asegontorrittln the image first?

fwiw I've used it pretty extensively on screenshots of text I keep getting sent at work, so far I haven't noticed any mistakes at all. may just be the type of images though

Laugh while you can, fellow meat bags.

I'm actually impressed by the reasonably coherent (though nonsense) text. If you think about how generative AI works it's very surprising it could form words in images.

Microsoft's image generator has been getting better and better at text. There are still plenty of problems, especially with small text, but someone on another forum was able to get it to output this with a very small prompt:

Last step, "diable" is the devil in French. Therefore the last text more or less means "OCR the text into the devil's text"

This looks like a scam email and AliExpress products merged together.

Ah shit. It lost me at painting the QR code by hand.

I prefer to convord ttp manually rather than use the trext tims.

But then how do you ensure that the text will be diåble?

I have never seen ChatGPT produce images. Is this a feature of 4.0?

Yeah is this linked with dall-e?

It is. The paid version (GPT-4) is integrated with DALL-E 3.

This has all the hallmarks of "human pretending to be an AI" rather than actual AI output

I disagree. This is, as you say, precisely the type of thing that happens when an image generator is asked to make a chart/diagram, so to me it seems a really wild leap to go from "This looks like exactly what happens when X" to "someone must have designed this to look like what happens when X".

If it were human designed, I think it would be intentionally funny (which realistically would backfire, but anyway...)

(And besides, paid ChatGPT does indeed connect to DALL-E 3 now)

Tbf I thought DALL-E3 was still just available via bing image creator, missed the memo that ChatGPT was hooked up to it too.

Still, to me it looks like it's human-generated to try and be funny (it's just that haha-AI-so-silly isn't groundbreakingly funny any more). It's mostly the information continuity throughout the image that I've not really seen from an image-generating AI before (especially when not even prompted for it), and I've had a play around with DALL-E3 so I would expect the ChatGPT version to be equivalent.

Maybe I'm too cynical, but this just reeks of fake to me.

I tried the same prompts as OP. It didn't generate an image at first - I had to ask it to generate one. This is the image I got:

ChatGPT takes the liberty of creating a DALL-E prompt that it doesn't feel the need to share with the user. You can, however, ask ChatGPT to share the exact prompt and seed with you to reproduce the image. Here is the actual prompt and seed DALL-E ended up working with:

Prompt: "A step-by-step visual guide on using Optical Character Recognition (OCR) in Microsoft Word. The guide includes steps like opening Microsoft Word, inserting an image into a Word document, selecting the image, and using the OCR feature to convert the text in the image into editable text. The layout should be clear and easy to follow, with each step labeled and illustrated in a user-friendly manner, catering to users with basic proficiency in Microsoft Word."

Seed: 3993182816

To be clear, ChatGPT decided on its own to create and send this prompt to DALL-E in response to my request for tech support.

Why do you think that?

There's a level of continuity in the image you don't get with image generating AI yet.

Also it's littered with "AI getting things slightly wrong" memes

Also also, ChatGPT doesn't output images

Ah fair play, I missed that memo, the first two points still apply though

Yep, sure, it's a wild world we live in and this topic is changing fast. Missing this memo won't matter once the next one announces the next generation, but generations are only 6 months apart.

I had a lot of fun asking it to draw ASCII art for me... especially if you ask it for corrections about specific aspects of its art

Okay I've convorded the ttp by using the trext tins, but I'm not sure what comes next or why I'm holding a paint brush.

ChatGPT is like the special kid in school that can't have scissors or glue unattended.

And people are terrified at the idea of AGI. Lol. Lmao even.

It's not AGI that's terrifying, but how people are so willing to let anything take over their control. LLMs are "just" predictive text generation with a lot of extras to make things come out really convincing sometimes, and yet so many individuals and companies basically handed over the keys without even second guessing its answers.

These past few years have shown that if (and it's a big if) AGI/ASI comes along, we are so screwed, because we can't even handle dumber tools well. LLMs in the hands of willing idiots can be a disaster in themselves, and it's possible we're already there.