But Claude said tumor!

ElCanut@jlai.lu to Technology@beehaw.org – 394 points –
79

"AI is nowhere near to being ready to replace you at your job. It is, however, ready enough to convince your boss that it's ready to replace you at your job."

I remember reading an article or blog post years ago that persuasively argued that the danger of AI is not going to be that it ends up doing things better than humans, but that it causes a lot of harm when entrusted with tasks it actually isn't good at. I think that thesis seems much more plausible now, watching people respond to clearly flawed AI systems.

Never attribute to malevolence that which can be explained by incompetence.

Including the end of humanity at the hands of the robots apparently

That reminds me of a fairly recent article about research around visualisation systems to aid with interpretable or explainable AI systems (XAI). The idea was that if we can make AI systems that explain their reasoning, then they can be a useful tool, especially in the hands of domain experts.

Turns out that the fancy visualisations that made it easier to understand how the model had come to a conclusion actually made subject matter experts less accurate in catching errors. This surprised researchers and when they later tried to make sense of it, they realised that they had inadvertently dialled up people's likelihood to trust the model because it looked legit.

One of my favourite aphorisms is "all models are wrong, some are useful." Seems that the tricky part is figuring out how wrong and how useful.

This is nothing new though. For decades, managers have fallen for "solution in a box" sales pitches even though front line workers know it's doomed to fail as soon as they set eyes on it. This time the solution just happens to be "AI."

It’s worse now than ever though. Many managers have been steeped in tech optimism their whole working careers. The failures of “revolutionary new systems” have been forgotten about while the successes of other things are lauded.

They’ve been primed to jump on any new “innovation” and at the same time B2B marketing has started adopting some of the most manipulative practices that used to be used only on consumers. They’ve crafted a narrative that shapes discourse so the main objections that appear are irrelevant to the actual issues managers might run into.

Stuff like “but what if it is TOO good?!” and “what if the wrong people get their hands on this AMAZINGLY POWERFUL new tech?!”

Instead of “but does this actually understand anything or is it just giving output that looks correct?” or “Wait, so, how was this training data obtained? Will there be legal issues from deliverables made with this?”

The average manager has been primed by the zeitgeist to ask the sales rep the kinds of questions they want to answer.

Seems to me that a lot of the world's problems start with "well, the managers think..." They all seem extremely bad at the whole managing thing, good thing we don't overpay them or anything like that.

Probably bosses are trying to convince AI that AI is ready.

[...] ready enough to convince your boss that it's ready to replace you at your job."

That's great though. Then said boss can rehire the people they fired for a noicely risk-adjusted premium.

Stupidity traditionally hurts (the wallet)

Using a Large Language Model for image detection is peak human intelligence.

I had to prepare a high level report to a senior manager last week regarding a project my team was working on.

We had to make 5 professional recommendations off of data we reported.

We gave the 5 recommendations with lots of evidence and references to why we came to that decision.

The top question we got was: “What are ChatGPT’s recommendations?”

Back to the drawing board this week because LLMs are more credible than teams of professionals with years of experience and bachelor's- or master's-level education on the subject matter.

It is quite terrifying that people think these unoriginal and inaccurate regurgitators of internet knowledge, with no concept of or heuristic for correctness... are somehow an authority on anything.

All you need to succeed on this planet is the self confidence to say things. It literally does not matter the accuracy. It’s how you express it. I wish I knew this when I was younger. I’d cut out all the imposter syndrome that held me back.

I wish it was that easy. If you go too long it's boring, and if you're too confident you sound arrogant. At this point I've kind of just accepted there are people who can sell, and that I'm not one of those people.

I think this depends on the crowd. Unfortunately, the intelligent crowd and the crowd with money and power are not exactly the same. Though hopefully there is overlap.

You, and we, are better off for it.

The issue is that it's been forgot (Remember the 5th of November)

The only thing you need to do to realise how bad they are is to play chess against one. Compared with a chess bot from 30 years ago, it really shows.

you fool

"these are chatgpt's recommendations we just provided research to back them up and verify the ai's work"

"What do we pay you guys for then? You are all fired and Tummy the intern will do everything with ChatGPT from here on out!"

You joke but several sections of our HR department got cut and replaced with Enterprise GPT-4. We talk to an internal chatbot now about HR questions and some forms.

You should see if you can get it to hallucinate a pay raise or 3 months vacation.

It did the opposite lmao. I asked it what my vacation leave was because you need to verify leave amounts before you’re allowed to request any additional leave. It said I had 0 in my balance and I know for a fact I have at least a week left 🤪 took almost a month to sort it out. Had to provide balance screenshots and everything. I’d probably be fucked if I hadn’t manually screenshotted my leave amounts beforehand.

Why can't they just use a simple calendar app system where you book it off???? Who would use a large language model for that rubbish?

That is the least worst implementation!

I knew one HR person who cared about employees and did her best to help out. She only lasted 6 months.

Haha and then the conversation would then be “Yes but can we see ChatGPT’s research?”

That's when you drop trou, bend over, spread the cheeks, and ask them to let you know when they're done reviewing ChatGPT's "research".

My butt is much too perky for these goons. They don’t deserve it.

"It came up with more or less the same recommendations. Though it didn't fully understand the specific target goals of your project, so our recommendations are more complete and actionable."

I think this points to a large problem in our society: how we train and pick our managers. Oh wait we don't. They pick us.

I mean, as long as you are the one prompting ChatGPT, you can probably get it to spit out the right recommendations. Works until they fire you because they are convinced AI made you obsolete.

AI cars are still running over pedestrians and people think computers are to the point of medical diagnosis?

There are some very impressive AI/ML technologies that are already in use as part of existing medical software systems (think: a model that highlights suspicious areas on an MRI, or even suggests differential diagnoses). Further, other models have been built and demonstrated to perform extremely well on sample datasets.

Funnily enough, those systems aren't using language models 🙄

(There is Google's Med-PaLM, but I suspect it wasn't very useful in practice, which is why we haven't heard anything since the original announcement.)

I have read some headline that said that some of these models just measure the age of the patient and the quality of the machine taking the scans.

I have read some headline

Really.

Says all you need to know about their opinion lol

Still, AI misalignment is a real issue. I just don't remember which model was studied and found to be misaligned.

That and bias, absolutely need improvements. That doesn't mean LLMs can't be extremely effective if given appropriate tasks. The problem is that the people who make decisions about where they're used aren't technical enough to understand their strengths and limitations.

I don’t think technical knowledge gives as good a sense as a lot of experience working with one.

Like saying the guys who designed a particular car would know best how it’ll perform on various racetracks. My sense is a driver would have a better sense.

I guess what I meant by technical knowledge was less about general tech and more about LLM tech specifically.

Eh. Depends on which tech is being used and how. For a lot of things, relatively basic ML models purposefully trained do a pretty good job, and are, in fact, limited by the diagnoses in the training data. But more generalized "AI" tools seem rather... questionable.

Like, you can train an SVM on fMRIs to compare structures in the brain between patients diagnosed with bipolar disorder and those that are not diagnosed with it, and it will have an accuracy rate on new patients basically equal to the accuracy rate of the doctors who did the diagnosing in the training set. But you'll have a much harder time creating a model that takes in fMRIs and reports back answers to the question of "which brain disease or abnormality do I have?"

This stuff works much closer to advertised when it's narrowly defined and purpose built, but the people making and funding this work want catch-all doctor replacements, because of course they do, because there's way more money in charging hospitals and patients 10% less than a doctor's salary than there is in providing tools that make doctors' efforts in diagnosing specific illnesses easier.

Or, at least there is if you can pull it off.

Precisely. Many of the narrowly scoped solutions work really well, too (for what they're advertised for).

As of today though, they're nowhere near reliable enough to replace doctors, and any breakthrough on that front is very unlikely to be a language model IMO.

And they should no more replace doctors in the future than x-ray machines did in the past. We should never want them to.

They are already used in medicine reliably. Often. Welcome to the future. Computers are pretty good tools for many things actually.

Peak intelligence is realizing an LLM doesn't care whether its tokens represent chunks of text, sound, images, videos, 3D models, paths, hand movements, floor planning, emojis, etc.

The keyword is: "multimodal".

As for being able to correctly correlate some "chunks of MRI scan" with the word "tumor"... that's all about the training (which I'd bet Claude is missing... did I hear "investment opportunity"? Guy isn't wrong).

I am glad that "I googled why I was coughing and it said I had cancer and would die in 7 days so farewell you are a good friend" will live on for more years.

I'm not following this story..

a friend sent me MRI brain scan results and I put it through Claude

...

I annoyed the radiologists until they re-checked.

How was he in a position to annoy his friend's radiologists?

Money. Guy is loaded, he can annoy anyone he wants.

I think it's being framed wrongly for the narrative by the guy posting the screenshot.

A friend sent me MRI brain scan results

Without more context I have to assume guy was still convinced of his brain tumor, knew a friend who knew and talked about Claude, had said friend run results through Claude and told guy whose brain was scanned that Claude gave a positive result, and friend went to multiple doctors for a second, third, fourth opinion.

In America we have to advocate hard when there is an ongoing, still unsolved issue, and that includes using all tools at your disposal.

maybe his friend is also a radiologist and sent op a picture of his own head

Maybe consider a tool made for the task and not just some random Claude, which isn't trained on this at all and just makes up some random impression of what an expert could respond in a dramatic story?!

I know of at least one other case in my social network where GPT-4 identified a gas bubble in someone's large bowel as "likely to be an aggressive malignancy," leading to said person fully expecting they'd be dead by July, when in fact they were perfectly healthy.

These things are not ready for primetime, and certainly not capable of doing the stuff that most people think they are.

The misinformation is causing real harm.

This is nothing but a modern spin on "hey internet, what's wrong with me? WebMD: it's cancer."

To be honest, it is not made to diagnose medical scans and it is not supposed to be. There are different AIs trained exactly for that purpose, and they are usually not public.

Exactly. So the organisations creating and serving these models need to be clearer about the fact that they're not general purpose intelligence, and are in fact contextual language generators.

I've seen demos of the models used as actual diagnostic aids, and they're not LLMs (plus require a doctor to verify the result).

I need help finding a source, cuz there are so many fluff articles about medical AI out there...

I recall that one of the medical AIs that the cancer VC gremlins have been hyping turned out to have horribly biased training data. They had scans of cancer vs. not-cancer, but they were from completely different models of scanners. So instead of being calibrated to identify cancer, it became calibrated to identify what model of scanner took the scan.
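To make that failure mode concrete, here is a toy Python sketch of that kind of shortcut learning. Everything in it is made up for illustration: the `lesion` scores, the scanner labels "A" and "B", and the two hand-written rules are assumptions, not data or code from any real medical system. The "learner" simply keeps whichever rule scores best on the training scans.

```python
# Toy illustration only: all scans, scores, and scanner names are invented.

def accuracy(rule, scans):
    """Fraction of scans where the rule's prediction matches the label."""
    return sum(rule(s) == s["cancer"] for s in scans) / len(scans)

def by_scanner(scan):
    # Shortcut rule: "cancer" if the scan came from scanner model A.
    return scan["scanner"] == "A"

def by_lesion(scan):
    # Rule based on the (hypothetical) image signal we actually care about.
    return scan["lesion"] > 0.5

# Biased training set: every cancer scan came from scanner A,
# every non-cancer scan from scanner B. The lesion signal is a bit noisy.
train = (
    [{"lesion": 0.9, "scanner": "A", "cancer": True}] * 4
    + [{"lesion": 0.3, "scanner": "A", "cancer": True}] * 1
    + [{"lesion": 0.1, "scanner": "B", "cancer": False}] * 4
    + [{"lesion": 0.7, "scanner": "B", "cancer": False}] * 1
)

# Hypothetical second hospital: only scanner A, mostly healthy patients.
deploy = (
    [{"lesion": 0.9, "scanner": "A", "cancer": True}] * 2
    + [{"lesion": 0.1, "scanner": "A", "cancer": False}] * 8
)

rules = {"scanner": by_scanner, "lesion": by_lesion}

# "Training": keep the rule with the highest accuracy on the training scans.
chosen = max(rules, key=lambda name: accuracy(rules[name], train))

print("chosen rule:", chosen)
print("train accuracy:", accuracy(rules[chosen], train))
print("deploy accuracy:", accuracy(rules[chosen], deploy))
```

On the training scans the scanner rule looks perfect, so it gets picked; at the second hospital it flags every scan as cancer, because all it ever learned was which machine took the picture.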

Wasn't there something about CVs for job applications and the AI ended up figuring out that black people or women are less likely to get the job so adjusted accordingly? Or how in England during COVID, poorer schools got lower predicted grades while wealthier schools got higher ones, even against the teachers' grades, regardless of the work done?

I am failing to find a source, but there is also a story about an older predictive model that worked great at one hospital, but failed miserably at the next. There was just enough variation in everything that the model broke.

(I think the New England Journal of Medicine podcast, but I am not finding the episode.)

Unpopular opinion incoming:

I don't think we should ignore AI diagnoses just because they are wrong sometimes. The whole point of AI diagnosis is to catch things physicians don't. No AI diagnosis comes without a physician double checking anyway.

For that reason, I don't think it's necessarily a bad thing that an AI got it wrong. Suspicion was still there and physicians double checked. To me, that means this tool is working as intended.

If the patient was insistent enough that something was wrong, they would have had them double check or would have gotten a second opinion anyway.

Flaming the AI for not being correct is missing the point of using it in the first place.

I don't think it's necessarily a bad thing that an AI got it wrong.

I think the bigger issue is why the AI model got it wrong. It got the diagnosis wrong because it is a language model and is fundamentally not fit for use as a diagnostic tool. Not even a screening/aid tool for physicians.

There are AI tools designed for medical diagnoses, and those are indeed a major value-add for patients and physicians.

The minute I see some tool praising the glory of AI, I block them. Engaging with them is a futile waste of time.

I feel like the book I, Robot provides some fascinating insight into this... specifically "Liar!"

exactly how hard did beer person have to try to miss the point when they read a thread about how an AI confidently provided a wrong diagnosis and warning about how we shouldn't always trust AI and proceeded to write a reply accusing Misha Saul of being a tech bro who believed an AI over a human doctor