OpenAI just admitted it can't identify AI-generated text. That's bad for the internet and it could be really bad for AI models.
businessinsider.com
In January, OpenAI launched a system for identifying AI-generated text. This month, the company scrapped it.
Text written before 2023 is going to be exceptionally valuable because that way we can be reasonably sure it wasn’t contaminated by an LLM.
This reminds me of some research institutions pulling up sunken ships so that they can harvest the steel and use it to build sensitive instruments. You see, before the nuclear tests there was hardly any radiation anywhere. However, after America and the Soviet Union started nuking stuff like there’s no tomorrow, pretty much all steel on Earth has been a little bit contaminated. Not a big issue for normal people, but scientists building super sensitive equipment certainly notice the difference between pre-nuclear and post-nuclear steel
The background radiation did go up, but saying "there was hardly any radiation anywhere" is wrong. Today's steel (and background radiation) is pretty much back to pre-nuke levels. See: Low-background steel, Background radiation.
It is also worth noting that we can make low or no radiation-contaminated steel, it's just really expensive and hard and happens in very low quantities.
We could even make isotopically pure iron, yeah.
Wonder how much that would cost per kilogram
Not really. If it's truly impossible to tell the text apart, then it doesn't really pose a problem for training AI. Otherwise, next-gen AI will be able to tell apart text generated by current gen AI, and it will get filtered out. So only the most recent data will have unfiltered shitty AI-generated stuff, but they don't train AI on super-recent text anyway.
This is not the case. Model collapse is a studied phenomenon for LLMs and leads to deteriorating quality when models are trained on the data that comes from themselves. It might not be an issue if there were thousands of models out there but there are only 3-5 base models that all the others are derivatives of IIRC.
I don't see how that affects my point.
Today's AI detector can't tell apart the output of today's LLM.
Future AI detector WILL be able to tell apart the output of today's LLM.
Of course, future AI detector won't be able to tell apart the output of future LLM.
So at any point in time, only recent text could be "contaminated". The claim that "all text after 2023 is forever contaminated" just isn't true. Researchers would simply have to be a bit more careful including it.
Your assertion that a future AI detector will be able to detect current LLM output is dubious. If I give you the sentence "Yesterday I went to the shop and bought some milk and eggs", there is no way for you or any detection system to tell if that was AI generated or not with any significant degree of certainty. What can be done is statistical analysis of large data sets to see how they "smell", but saying around 30% of this dataset is likely LLM generated does not get you very far in creating a training set.
I'm not saying that there is no solution to this problem, but blithely waving away the problem saying future AI will be able to spot old AI is not a serious take.
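To put rough numbers on why a detector that is only statistically right doesn't get you a clean training set, here's a minimal Python sketch. The detector rates (`tpr`, `fpr`) and the 30% contamination share are illustrative assumptions, not measurements of any real tool, and `filter_outcome` is a hypothetical helper made up for this example.

```python
def filter_outcome(ai_share, tpr, fpr):
    """Return (residual_ai_share, human_text_lost) after dropping flagged documents.

    ai_share: fraction of the corpus that is AI-generated (assumed)
    tpr: fraction of AI text the detector correctly flags (assumed)
    fpr: fraction of human text the detector wrongly flags (assumed)
    """
    # Share of the whole corpus that survives the filter, split by origin.
    ai_kept = (1 - tpr) * ai_share
    human_kept = (1 - fpr) * (1 - ai_share)
    # How contaminated the kept set still is.
    residual_ai_share = ai_kept / (ai_kept + human_kept)
    # Fraction of the human-written text that was thrown away.
    human_text_lost = fpr
    return residual_ai_share, human_text_lost


if __name__ == "__main__":
    residual, lost = filter_outcome(ai_share=0.30, tpr=0.50, fpr=0.10)
    print(f"AI share left in the kept set: {residual:.1%}")
    print(f"Human text thrown away: {lost:.1%}")
```

With these made-up rates, roughly 19% of what you keep is still AI-generated, and you've discarded 10% of the human text to get there. A better detector shifts the numbers, but the trade-off has the same shape.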
If you give me several paragraphs instead of a single sentence, do you still think it's impossible to tell?
"If you zoom further out you can definitely tell it's been shopped because you can see more pixels."
What they're getting towards (one thing, anyways) is that "indistinguishable to the model" and "the same" are two very different things.
IIRC, one possibility is that LLMs which learn from one another will make such incremental changes to what's considered "acceptable" or "normal" language structuring that, over time, more noticeable linguistic changes begin to emerge that go unnoticed by the models.
As it continues, this phenomenon creates a "positive feedback loop" in which the gap progressively widens -- still undetected, because the quality of training data is going down -- to the point where models basically "collapse" in their effectiveness.
So even if their output is indistinguishable now, how the tech is used (I guess?) will determine whether or not a self-destructive LLM echo chamber is produced.
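The feedback loop is easier to see in a toy version. A minimal sketch, assuming a one-dimensional Gaussian stands in for "the model" and each generation is fitted only to samples drawn from the previous one. This is a stand-in for illustration, not how any real LLM is trained, and `simulate_collapse` and its parameters are hypothetical names.

```python
import random
import statistics


def simulate_collapse(generations=200, samples_per_generation=10, seed=0):
    """Refit a Gaussian to its own samples; return the variance at each generation."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 stands in for the original "human" data
    variances = [sigma ** 2]
    for _ in range(generations):
        # "Publish": draw a small batch of samples from the current model.
        data = [rng.gauss(mu, sigma) for _ in range(samples_per_generation)]
        # "Train": fit the next model only on that batch.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        variances.append(sigma ** 2)
    return variances


if __name__ == "__main__":
    history = simulate_collapse()
    for gen in (0, 50, 100, 200):
        print(f"generation {gen:3d}: variance = {history[gen]:.3e}")
```

Each refit tends to lose a little of the tails, and the loss compounds, so the variance drifts toward zero over many generations even though no single step looks dramatic. Mixing fresh "generation 0" data back into every batch is the obvious thing to try if you want to see how much real data slows the drift.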
no, they won't. We have already built the models that we have already built. Any current works in progress are the future AI you are talking about. And we just can't do it. OpenAI themselves have admitted that the ones they tried making just didn't work. And it won't, because language is not just the statistical correlations between words that have already been written in the past.
There is not enough entropy in text to even detect current model output. it’s game over.
People still tap into the real world while AI does not do that yet. Once AI is able to actively learn from real-world sensors, the problem might disappear, no?
They already do. where do you think the training corpus comes from? The real world. It's curated by humans and then fed to the ml system.
Problem is that the real world now has a bunch of text generated by ai. And it has been well studied that feeding that back into the training will destroy your model (because the networks would then effectively be trained to predict their own output, which just doesn't make sense)
So humans still need to filter that stuff out of the training corpus. But we can't detect which ones are real and which ones are fake. And neither can a machine. So there's no way to do this properly.
The data almost always comes from the real world, except now the real world also contains "harmful" (to ai) data that we can't figure out how to find and remove.
There are still people in between, building training data from their real world experiences. Now the digital world may become overwhelmed with AI creations, so training may lead to model collapse. So what if we give AI access to cameras, microphones, all that, and even let it articulate them? It would also need to be adventurous, searching for spaces away from other AI work. There is lots of data in there which is not created by AI, although at some point it might become so as well. I am leaving aside at the moment the obvious dangers of this approach.
The wording of every single article has such an anti AI slant, and I feel the propaganda really working this past half year. Still nobody cares about advertising companies, but LLMs are the devil.
Existing datasets still exist. The bigger focus is in crossing modalities and refining content.
Why is the negative focus always on the tech and not the political system that actually makes it a possible negative for people?
I swear, most of the people with heavy opinions don't even know half of how the machines work or what they are doing.
Probably because LLMs threaten to (and have already started to) shittify a truly incredible number of things like journalism, customer service, books, scriptwriting etc all in the name of increased profits for a tiny few.
again, the issue isn't the technology, but the system that forces every technological development into functioning "in the name of increased profits for a tiny few."
that has been an issue for the fifty years prior to LLMs, and will continue to be the main issue after.
removing LLMs or other AI will not fix the issue. why is it constantly framed as if it would?
we should be demanding the system adjust for the productivity increases we've already seen, as well to what we expect in the near future. the system should make every advancement a boon for the general populace, not the obscenely wealthy few.
even the fears of propaganda. the wealthy can already afford to manipulate public discourse beyond the general public's ability to keep up. the bigger issue is in plain sight, but is still being largely ignored for the slant that "AI is the problem."
Yep, the problem was never LLMs, but billionaires and the rich. The problems have always been the rich for thousands of years, and yet they are immensely successful at deflecting their attacks to other groups for those thousands of years. They will claim it's Chinese immigrants, or blacks, or Mexicans, or gays, or trans people. Now LLMs and AI are the new boogieman.
We should be talking about UBI, not LLMs.
It’s a capitalism problem not an AI or copyright problem.
Sure, but let's say you try to solve this problem. What's the first thing you think a coordinated group could do: get sensible regulations about AI, or overthrow global capitalism? It's framed the way it is because unless you want to revolt, that's the framework we're gonna have to use to deal with it. I suppose we could always do nothing about AI specifically and focus on just overthrowing capitalism, but during that time lots of harm will come to lots of workers because of AI use. I don't think anticapitalism has reached a critical mass (we need this for any real system-wide attacks on and alternatives to capitalism), so I think dealing with this AI problem and trying to let everyone else know about how it's really a capitalism thing would do more to build support and avert harm to workers. I hate that it's like that too, but those choices are basically the real options we have moving forward from my pov.
You tell me what "sensible regulations about AI" are that don't hurt small artists and creators more than they centralize the major players and enrich copyright hoarding, copyright-maximalist corporations. (Seriously, this isn’t bait. I’ve been wracking my mind on the issue for months. Because the only serious proposals so far are expanding the already far-too-broad copyright rights to things like covering training or granting artists more rights to their work during their lifetime - something that will only hurt small artists) We desperately need more fair use, not less. The only "sensible regulations" that we should and could be talking about is some form of UBI. That's it.
UBI is a bandaid that doesn't solve the core issues of production under capitalism: the people with capital still control production, still make more money than everyone else, and still have more money and power to use influencing the politicians that write the laws surrounding UBI. And expecting me to solve the AI problem in a comment section is like me asking you to implement UBI in a way that landlords don't just jack up rent or businesses don't inflate prices with more cash and demand floating around. Also, what's your plan for when the level of UBI legislated, or the planned increases in UBI, is no longer sufficient to pay for housing, food and other necessities? What do you do to counter the fact that the capitalists still have more access to politicians and media empires they can use to discredit and remove UBI?
UBI is a bandaid, sure. But bandaids actually help; “sensible AI regulations” - a nothing phrase that will most likely materialize as yet another expansion of copyright — will actively make things worse. UBI is achievable, and can be expanded on once it’s enacted. You establish protections and regulations that actually help people, and dare opposition to ever try to take them away; instead of carrying water for copyright maximalists along the way.
Exactly. We need to break apart copyright with a crowbar. It's a broken system that only benefits the rich, and AI has the opportunity to turn the entire system into a pile of unenforceable garbage.
Why does legislation or regulation surrounding AI necessarily have to be copyright maximalism, but UBI regulations are somehow, in some undescribed way, going to be strong enough to prevent lobbying from the people who still control the means of production? Your argument gets to use the magic regulations that don't get challenged or changed, but my argument is stuck with the one mainstream idea that has people worried?
Because those are the only “sensible AI regulations” seriously being talked about. Tell me any other actual regulatory schemes that are being proposed that aren’t, and I’ll be happy to talk about those, and likely support them. I’m not getting the hostility, btw. fwiw this (getting stronger consumer protection laws passed) is literally my job; I’m going to go out on a limb here and say we probably agree with more than we disagree, based on your comment history. Obviously UBI won’t be enough to - will never be enough to - oust capitalists from having an outsized influence in policy, but what I don’t support at all are regulations that would further centralize the corporate IP holders and tech companies that would actually benefit from the copyright maximalist proposals currently being bandied about by the fear mongering anti generative AI discourse.
Fundamentally we’re not going to copyright our way out of the externalities AI brings with it.
My argument is not limited to the only regulations being currently talked about any more than your argument is limited by what types of UBI are currently talked about, because I'm not hearing any talk on UBI.
Friend you haven’t proposed ANY thing except “sensible” regulations. I’ve asked you to elaborate on what those might be, but I’m cautioning against the regulatory schemes currently being proposed and considered. Maybe I’m too in the weeds on this issue — again, literally my job — but I can tell you the only proposals actually being discussed right now are copyright maximalist.
There's no such thing as "sensible regulations" for AI. AI is a technological advantage. Any time you regulate that advantage, other groups that don't have those regulations will fuck you over. Even if you start talking about regulations, the corpos will take over and fuck you over with regulations that only hurt the little guy.
Hell, even without regulations, we're already seeing this on the open-source vs. capitalism front. Google admitted that it lost some advantages because of open-source AI tools, and now these fucking cunts are trying to hold on to their technology as close as possible. This is technology that needs to be free and open-source, and we're going to see a fierce battle with multi-billion-dollar capitalistic corporations clawing back whatever technological gains OSS acquired, until you're forced to spend hundreds or thousands of dollars to use a goddamn chess bot.
GPLv3 is key here, and we need to force these fuckers into permanent copyleft licenses that they can't revoke. OpenAI is not open, StabilityAI is not the future, and Google is not your friend.
Isn't forcing a copyleft licence exactly a regulation that would be sensible, though? So why wouldn't regulations and legislation work if that's your solution too?
There's never been a bill that had the word "copyleft" or "GNU General Public License" on it at all, and thanks to corpo lobbyists, there probably never will be. We have to be realistic here, and the only realistic option is to encourage as much protected open-source software on the subject as possible.
This isn’t a technological issue, it’s a human one
I totally agree with everything you said, and I know that it will never ever happen. Power is used to get more power. Those in power will never give it up, only seek more. They intentionally frame the narrative to make the more ignorant among us believe that the tech is the issue rather than the people that own the tech.
The only way out of this loop is for the working class to rise up and murder these cunts en masse
Viva la revolucion!
It is a completely understandable stance in the face of the economic model, though. Your argument could be fitted to explain why firearms shouldn’t be regulated at all. It isn’t the technology, so we should allow the sale of actual machine guns (outside of weird loopholes) and grenade launchers.
The reality is that the technology is targeted by the people affected by it because we are hopeless in changing the broader system which exists to serve a handful of parasitic non-working vampires at the top of our societies.
Edit: not to suggest that I’m against AI and LLM. I want my fully automated luxury communism and I want it now. However, I get why people are turning against this stuff. They’ve been fucked six ways from Sunday and they know how this is going to end for them.
Plus, a huge amount of AI doomerism is being pushed by the entrenched monied AI players, like OpenAI and Meta, in order to use a captured government to regulate potential competition out of existence.
Exactly. I work in AI (although not the LLM kind, just applying smaller computer vision models), and my belief is that AI can be a great liberator for humanity if we have the right political and economic apparatus. The question is what that apparatus is. Some will say it's an inherent feature of capitalism, but that's not terribly specific, nor does it explain the relatively high wealth equality that existed briefly during the middle of the 20th century in America. I think some historical context is important here.
Historical Precedent
During the Industrial Revolution, we had an unprecedented growth in average labor productivity due to automation. From a naïve perspective, we might expect increasing labor productivity to result in improved quality of life and fewer working hours, i.e., the spoils of that productivity being felt by all.
But what we saw instead was the workers lived in squalor and abject poverty, while the mega-rich captured those productivity gains and became stupidly wealthy.
Many people at the time took note of this and sought to answer this question: why, in an era of greater-than-ever labor productivity, is there still so much poverty? Clearly all that extra wealth is going somewhere, and if it's not going to the working class, then it's evidently going to the top.
One economist and philosopher, Henry George, wrote a book exploring this very question, Progress and Poverty. His answer, in short, was rent-seeking:
Rent-seeking takes many forms. To list a few examples:
George's argument, essentially, was that the privatization of the economic rents borne of god-given things — be it land, minerals, or ideas — allowed the rich and powerful to extract all that new wealth and funnel it into their own portfolios. George was not the only one to blame these factors as the primary drivers of sky-high inequality; Nobel-prize winning economist Joseph Stiglitz has stated:
George's proposed remedies were a series of taxes and reforms to return the economic rents of those god-given things to society at large. These include:
such as in the Norwegian model:
(continued)
Present Day
Okay, so that's enough about the past. What about now?
Well, monopolization of land and housing via the housing crisis has done tremendous harm:
And that is just one form of rent-seeking. Imagine the collective toll of externalities (e.g., the climate crisis), monopolistic/oligopolistic markets such as energy and communications, monopolization of valuable intellectual property, etc.
So I would tend to say that — unless we change our policies to eliminate the housing crisis, properly price in externalities, eliminate monopolies, encourage the growth of free and open IP (e.g., free and open-source software, open research, etc.), and provide critical public goods/services such as healthcare and education and public transit — we are on a trajectory for AI to be Gilded Age 2: Electric Boogaloo. AI merely represents yet another source of productivity growth, and its economic spoils will continue to be captured by the already-wealthy.
I say this as someone who works as an AI and machine learning research engineer: AI alone will not fix our problems; it must be paired with major policy reform so that the economic spoils of progress are felt by all, not just the rich.
Joseph Stiglitz, in the same essay I referred to earlier, has this to say:
Dude seek help. If you truly “work in AI” your post was such slop that it was 100% written by an LLM. If you’re going to propagandize, do it well. BRB regurgitating my scraped wall of text from Wikipedia combined with some vague leftist concepts to sound educated and progressive (when I’m really not.) lmao
Technology is but a tool. It cannot tell you how to use it. If it's in the hands of a writer it's a helpful sounding board. If it's in the hands of a Netflix producer it's an anti-labor tool. We need to protect people's livelihoods.
Journalism and customer service can't possibly get worse than they already are.
Books and movies are not at risk - there will always be lots of people willing to write good content for both, and the best content will be published. And "the best" will be a hybrid of humans and AI working together - which is what has some people in that industry so scared. Just like factory workers were scared when machines entered that industry.
It's an irrational fear - there are still factory workers today. Probably more than ever. And there will still be human writers - it's an industry that will never go away.
If, however, you refuse to work with AI... then yeah, you're fucked. Pretty soon you'll be unemployable and nobody will publish your work, which is why the movie publishers aren't going to budge. They recognise a day is coming where they can't sell movies and tv shows that were made exclusively by humans and they are never going to sign a contract locking them into a dead-end path.
You make it sound like the quality is what will increase with human/AI partnership. What will realistically happen is an expected rate of output. Why can't you deliver a book every year with an AI ghost writing? People who work slowly or meticulously will be phased out by those that can quickly throw together a collage of their own words and their guided filler. It's amazing and futuristic. It can be very useful and inspirational. But I do not share your optimism that it will make creative industries better. It will allow a single person to put together a script that would've taken a team... But the better content will now be drowning in this sea. Unfortunately, I expect an equivalent of the media explosion that happened when the reality tv format became an ok thing, eventually leading to shows half filmed on phones. The end result will be double the marvel movies every year.
I am so tired of techno-fetishist AI bros complaining every single time any of the many ways in which AI will devastate and rot out daily lives is brought up.
"It's not the tech! It's the economic system!"
As if they're different things? Who is building the tech? Who is pouring billions into the tech? Who is protecting the tech from proper regulation, smartass? I don't see any worker coops using AI.
"You don't even know how it works!"
Just a thought-terminating cliché to try to avoid any discussion or criticism of your precious little word generators. No one needs to know how a thing works to know its effects. The effects are observable reality.
Also, nobody cares about advertising companies? What the hell are you on about?
they are different things. it's not exclusively large companies working on and understanding the technology. there's a fantastic open-source community, and a lot of users of their creations.
would destroying the open-source community help prevent the big-tech from taking over? that battle has already been lost and needs correction. crying about the evil of A.I. doesn't actually solve anything. "proper" regulation is also relative. we need entirely new paradigms of understanding things like "I.P." which aren't based on a century of lobbying from companies like disney. etc.
and yes, understanding how something works is important for actually understanding the effects, when a lot of tosh is spewed from media sites that only care to say what gets people to engage.
i'd say a fraction of what i see as vaguely directed anger towards anything A.I. is actually relegated to areas that are actual severe and important breaches of public trust and safety, and i think the advertising industry should be the absolute focal point on the danger of A.I.
Are you also arguing against every other technology that has had their benefits hoarded by the rich?
It's mostly large companies. Some models are open source (of which only some are also community driven), but the mainstream ones are the ones being entirely funded by, legally protected by, and pushed onto everything by capitalist oligarchs.
What other options do you have? I'm sick and tired of people like you seeing workers lose their jobs, seeing real people used like meat puppets by the internet, seeing so many artists risking their livelihoods, seeing that we'll have to lose faith in everything we see and read because it could be unrecognizably falsified, and CLAIMING you care about it, only to complain every single time any regulation or way to control this is proposed, because you either don't actually care and are just saying it for rhetoric, or you do care but only to the point you can still use your precious little toys restriction-free. Just overthrow the entire economic system of all countries on earth, otherwise don't do anything, let all those people burn! Do you realize how absurd you sound?
It's sociopathic. I don't say it as an insult, I say it applying the definition of a word: it's a complete lack of empathy and care for your fellow human beings, it's viewing an immaterial piece of technology, nothing but a thoughtless word generator, as inherently worth more than the livelihood of millions. I'm absolutely sick of it. And then you have the audacity to try to seem like the reasonable ones when arguing about this, knowing if you had your way so many would suffer. Framing it as anti-capitalism knowing that if you had your way you'd pave the way for the oligarchs to make so many more billions off of that suffering.
it's like you just ignored my main points.
get rid of the A.I. = the problem is still the problem. has been especially for the past 50 years, any non-A.I. advancement continues the trend in the exact same way. you solved nothing.
get rid of the actual problem = you did it! now all of technology is a good thing instead of a bad thing.
false information? already a problem without A.I. always has been. media control, paid propagandists etc. if anything, A.I. might encourage the main population to learn what critical thought is. it's still just as bad if you get rid of A.I.
" CLAIMING you care about it, only to complain every single time any regulation or way to control this is proposed, because you either don’t actually care and are just saying it for rhetoric" think this is called a strawman. i have advocated for particular A.I. tools to get much more regulation for over 5-10 years. how long have you been addressing the issue?
you have given no argument against A.I. currently that doesn't boil down to "the actual problem is unsolvable, so get rid of all automation and technology!" when addressed.
which again, solves nothing, and doesn't improve anything.
should i tie your opinions to the actual result of your actions?
say you succeed. A.I. is gone. nothing has changed. inequality is still getting worse and everything is terrible. congratulations! you managed to prevent countless scientific discoveries that could help countless people. congrats, the blind and deaf lose their potential assistants. the physically challenged lose potential house-helpers. etc.
on top of that, we lose the biggest argument for socializing the economy going forward, through massive automation that can't be ignored or denied while we demand a fair economy.
for some reason i expect i'm wasting my time trying to convince you, as your argument seems more emotionally motivated than rationalized.
What are you on about? Who's talking about "completely getting rid of AI"? And you accuse me of strawmanning? I didn't even argue that it should be stopped. I argued that every single time anyone tries or suggests doing anything to curtail these things people like you jump out to vehemently defend your precious programs from regulation or even just criticism, because we should either completely destroy capitalism or not do anything at all, there is no in between, there is nothing we can do to help anyone if it's not that.
Except there is. There are plenty of things that can be done to help the common people besides telling them "well just tough it out until we someday magically change the fundamentals of the economic system of the entire world, nerd". It just would involve restricting what these things can do. And you don't want that. It's fine but own up to it. Trying to have this image that you really do care about helping but just don't want to help at all unless it's via an incredibly improbable miracle pisses me off.
For someone who accuses others of not understanding how AI works, to then say something like this is absurd. I hope you're being intellectually dishonest and not just that naive. There is absolutely no comparison between a paid propagandist and the unrecognizable replicas of real things you could fabricate with AI.
People are already abusing voice actors by sampling them and making covers with their voices without their permission and certainly without paying. We can already make amateur videos of the person speaking to pair it up with the generated audio. In a few years when the technology inevitably gets better I will be able to perfectly fabricate a video that can ruin someone's life with a few clicks. If this process is sophisticated enough there will be minimal points of failure, there will be almost nothing to investigate and try to figure out if the video is false or not. No evidence will ever mean anything, it could all be fabricated. If you don't see how this is considerably worse than ANYTHING we have right now to falsify information, then there is nothing I can say to ever convince you. "Oh, but if nothing can be demonstrably true anymore, the masses will learn critical thought!" Sure.
This is what I mean. You people lack any kind of nuance. You can only work in this "all or nothing" thinking. No "anti-AI" person wants to fully and completely destroy every single machine and program powered by artificial intelligence, jesus christ. It's almost like it's an incredibly versatile tool that has many uses that can be used for good and bad, It's almost like we should, call me an irrational emotional snowflake if you want, put regulations in place so the bad uses are heavily restricted, so we can live with this incredible technology without feeling constantly under threat because we are using it responsibly.
Instead what you propose is, don't you dare limit anything, open the flood gates and let's instead change the economic system so that the harmful uses don't also destroy people economically. Except the changes you want not only don't fix some of the problems that unregulated and free AI use for everything brings, they go against the interests of every single person with power in this system, so they have an incredibly minuscule chance of ever being close to happening, much less happening peacefully. I'd be okay if it was your ultimate goal, but if you're not willing to have a compromise on something that could minimize the harm this is doing in the meantime without being a perfect solution, why shouldn't I assume you just don't care? What reasons are you giving me to not believe that you simply prefer seeing the advancements of technology rather than the security of your fellow humans, and you're just saying this as an excuse to keep it that way?
Right, because that's the way to socialize the economy. By having a really good argument. I'm sure it will convince the people that have immeasurable amounts of wealth and power precisely because the economy is not socialized. It will be so convincing they will willingly give all of that up.
then what the fuck are you even arguing? i never said "we should do NO regulation!" my criticism was against blaming A.I. for things that aren't problems created by A.I.
i said "you have given no argument against A.I. currently that doesn’t boil down to “the actual problem is unsolvable, so get rid of all automation and technology!” when addressed."
because you haven't made a cohesive point towards anything i've specifically said this entire fucking time.
are you just instigating debate for... a completely unrelated thing to anything i said in the first place? you just wanted to be argumentative and pissy?
i was addressing the general anti-A.I. stance that is heavily pushed in media right now, which is generally unfounded and unreasonable.
I.E. addressing op's article with "Existing datasets still exist. The bigger focus is in crossing modalities and refining content." i'm saying there is a lot of UNREASONABLE flak towards A.I. you freaked out at that? who's the one with no nuance?
your entire response structure is just.. for the sake of creating your own argument instead of actually addressing my main concern of unreasonable bias and push against the general concept of A.I. as a whole.
i'm not continuing with you because you are just making your own argument and being aggressive.
I never said "we can't have any regulation"
i even specifically said " i have advocated for particular A.I. tools to get much more regulation for over 5-10 years. how long have you been addressing the issue?"
jesus christ you are just an angry accusatory ball of sloppy opinions.
maybe try a conversation next time instead of aggressively wasting people's time.
So, like, everything? I've talked about the problems and shit AI will cause us and your only response consistently was "Yeahhhh but if we completely ban my precious AI then we won't have all of its nice things! Better to wait until capitalism is magically solved". Then the problems that weren't economics-related you handwaved away. Please illuminate me on a problem you think is real, caused by AI, and that you would be willing to regulate within the bounds of our current system?
I've argued against your mentality and the mentality of the people that always show up on posts concerned about AI to defend it. I've even responded to specific phrases you said. What else do you even want?
If you don't want people to feel threatened by AI, maybe be willing to fix its threats? Maybe don't just go to something people find concerning and say "we can't dare to do anything about this!" and try to reframe it as some sort of tragic prevention by the ignorant masses?
That means nothing to me. Those are just words. Your actions have been vehemently defending AI and trying to convince me that curtailing it is pointless, while also trying to appear concerned about its threats. Those are completely contradictory positions that you held now, and that's what I pointed out. I don't care what you have been doing for years.
When have I ever obligated you to respond to me? I've never hidden that this topic and people who think like you make me angry. If you didn't want to deal with it you could have just ignored me.
It does make me angry. It makes me angry to see so many people be threatened so unfairly. It makes me angry to see people not care about that and prefer seeing further sophistication of a thoughtless algorithm over lives, but at least those people are honest. Dishonesty is what makes me livid. Those that have the same attitude but know they'd sound awful if they said it straightforwardly, so they try to find a hip and cool thing to parade as. It's totally anti-capitalist to not want to stop capitalist corporations abusing workers to replace them bro, trust me.
Yah I think it's fairly obvious that people are both fascinated and scared by the tech and also acknowledge that under a different economic structure, it would be extremely beneficial for everyone and not just for the very few. I think it's more annoying that people like you assume that everyone is some sort of diet Luddite when they're just trying to see how the tool has the potential to disrupt many, many jobs and probably not in a good way. And don't give me this tired comparison about the industrial revolution because it's a complete false equivalence.
We built a machine to mimic human writing. There's going to be a point where there is no difference. We might already be there.
The machine used to mimic human text uses human text. If it can't find the difference between its text and human text, it will begin using AI text to mimic human text. This will eventually lead to errors, repetitions, and/or less human-like text.
We are already seeing it 1 year into GPT as human authors bow out when not paid.
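To make that feedback loop concrete, here is a toy sketch, not a simulation of any real LLM. It assumes a four-word "vocabulary", that each generation of the model writes with a sampling temperature a bit below 1, and that the next generation learns that output perfectly. The names `next_generation` and `simulate` and the starting probabilities are made up for illustration.

```python
import math


def entropy_bits(probs):
    """Shannon entropy of a word distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def next_generation(probs, temperature=0.8):
    """One round of the loop: the model writes at `temperature`,
    then the next model is fit exactly to what it wrote.

    Sampling at temperature T turns p into p**(1/T), renormalised.
    """
    sharpened = [p ** (1.0 / temperature) for p in probs]
    total = sum(sharpened)
    return [p / total for p in sharpened]


def simulate(probs, generations=10, temperature=0.8):
    """Return the entropy of the word distribution after each generation."""
    history = [entropy_bits(probs)]
    for _ in range(generations):
        probs = next_generation(probs, temperature)
        history.append(entropy_bits(probs))
    return history


# Hypothetical "human" word frequencies for a 4-word vocabulary.
human = [0.4, 0.3, 0.2, 0.1]
for gen, h in enumerate(simulate(human)):
    print(f"generation {gen}: entropy = {h:.3f} bits")
```

Under these assumptions the entropy drops every round: the most common word keeps gaining share and the rare ones fade, which is the "repetitions" failure mode in miniature. Real training pipelines mix in fresh human data, so treat this as the worst case only.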
Predictable issue if you knew the fundamental technology that goes into these models. Hell it should have been obvious it was headed this way to the layperson once they saw the videos and heard the audio.
We're less sensitive to patterns in massive data than the machines are, so the point at which we can't tell fact from AI fiction from the content alone comes before the point at which these machines can't tell. Good luck with the FB aunts.
A GAN's final goal is to develop content that is indistinguishable... Are we surprised?
Edit since the person below me made a great point. GANs may be limited, but there's nothing that says you can't set up a generator LLM and a detector LLM with the sole purpose of improving the generator.
For laymen who might not know how GANs work:
Two AI are developed at the same time. One that generates and one that discriminates. The generator creates a dataset, it gets mixed in with some real data, then all of that gets fed into the discriminator whose job is to say "fake or not".
Both AI get better at what they do over time. This arms race creates more convincing generated data over time. You know your generator has reached peak performance when its twin discriminator has a 50/50 success rate. It's just guessing at that point.
There literally cannot be a better AI than the twin discriminator at detecting that generator's work. So anyone trying to make tools to detect chatGPT's writing is going to have a very hard time of it.
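The 50/50 endpoint can be shown with a few lines of arithmetic, no neural nets needed. This is a sketch under toy assumptions: "real" data is a bell curve centred at 0, "generated" data is the same bell curve shifted by `gap`, and the discriminator is the best one that could possibly exist for that pair. The function names are made up for the example.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def best_discriminator_accuracy(gap: float) -> float:
    """Accuracy of the best possible discriminator when real samples come
    from N(0, 1), generated samples from N(gap, 1), mixed half and half.

    The optimal rule is a threshold halfway between the two means,
    which is right with probability Phi(|gap| / 2).
    """
    return normal_cdf(abs(gap) / 2.0)


# As the generator improves, the gap between fake and real shrinks.
for gap in (3.0, 1.0, 0.3, 0.0):
    acc = best_discriminator_accuracy(gap)
    print(f"gap = {gap:.1f}  best possible accuracy = {acc:.3f}")
```

Once the gap is zero, even the ideal discriminator is right exactly half the time, which is the "just guessing" point described above.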
Fantastically put!
Tx!
Unless I'm mistaken, aren't GANs mostly old news? Most of the current SOTA image generation models and LLMs are either diffusion-based, transformers, or both. GANs can still generate some pretty darn impressive images, even from a few years ago, but they proved hard to steer and were often trained to generate a single kind of image.
I haven't been in decision analytics for a while (and people smarter than I are working on the problem) but I meant more along the lines of the "model collapse" issue. Just because a human gives a thumbs up or down doesn't make it human written training data to be fed back. Eventually the stuff it outputs becomes "most likely prompt response that this user will thumbs up and accept". (Note: I'm assuming the thumbs up or down have been pulled back into model feedback).
Per my understanding that's not going to remove the core issue which is this:
Any sort of AI detection arms race is doomed. There is ALWAYS new 'real' video for training and even if GANs are a bit outmoded, the core concept of using synthetically generated content to train is a hot thing right now. Technically whoever creates fake videos to train on would have a bigger training set than the checkers.
Since we see model collapse when we feed too much of this back to the model we're in a bit of an odd place.
We've not even had an LLM available for the entire year but we're already having trouble distinguishing.
Making waffles so I only did a light google but I don't really think chatgpt is leveraging GANs for its main algos, simply that the GAN concept could be applied easily to LLM text to further make delineation hard.
We're probably going to need a lot more tests and interviews on critical reasoning and logic skills. Which is probably how it should have been but it'll be weird as that happens.
sorry if grammar is fuckt - waffles
So a few tidbits you reminded me of:
You're absolutely right: there's what's called an alignment problem between what the human thinks looks superficially like a quality answer and what would actually be a quality answer.
You're correct in that it will always be somewhat of an arms race to detect generated content, as lossy compression and metadata scrubbing can do a lot to make an image unrecognizable to detectors. A few people are trying to create some sort of integrity check for media files, but it would create more privacy issues than it would solve.
We've had LLMs for quite some time now. I think the most notable release in recent history, aside from ChatGPT, was GPT2 in 2019, as it introduced a lot of people to the concept. It was one of the first language models that was truly "large," although they've gotten much bigger since the release of GPT3 in 2020. RLHF and the focus on fine-tuning for chat and instructability wasn't really a thing until the past year.
Retraining image models on generated imagery does seem to cause problems, but I've noticed fewer issues when people have trained FOSS LLMs on text from OpenAI. In fact, it seems to be a relatively popular way to build training or fine-tuning datasets. Perhaps training a model from scratch could present issues, but generally speaking, training a new model on generated text seems to be less of a problem.
Critical reading and thinking was always a requirement, as I believe you say, but certainly it's something needed for interpreting the output of LLMs in a factual context. I don't really see LLMs themselves outperforming humans on reasoning at this stage, but the text they generate certainly will make those human traits more of a necessity.
Most of the text models released by OpenAI are so-called "Generative Pretrained Transformer" models, with the keyword being "transformer." Transformers are a separate model architecture from GANs, but are certainly similar in more than a few ways.
These all align with my understanding! Only thing I'd mention is that when I said "we've not had llms available" I meant "LLMs this powerful ready for public usage". My b
Yeah, that's fair. The early versions of GPT3 kinda sucked compared to what we have now. For example, it basically couldn't rhyme. RLHF or some of the more recent advances seemed to turbocharge that aspect of LLMs.
On the one hand, our AI is designed to mimic human text, on the other hand, we can detect AI generated text that was designed to mimic human text. These two goals don't align at a fundamental level
So every accusation of cheating/plagiarism etc. and the resulting bad grades need to be revised because the AI checker incorrectly labelled submissions as "created by AI"? OK.
i laughed pretty hard when south park did their chatgpt episode. they captured the school response accurately with the shaman doing whatever he wanted, in order to find content "created by AI."
I mean, based on the dozens of AI generated essays I've seen, it's not hard to spot them based on the many hallucinated references they all have in them.
I mean, the entire goal of the technology was to create human-like text.
This just illustrates the major limitation of ML: Access to reliable training data. A machine that has no concept of internal reasoning can never be truly trusted to solve novel problems, and novel problems, from minor issues to very complex ones, are solved in a bunch of professions every day. That's what drives our world forward. If we rely too heavily on AI to solve problems for us, the issue of obtaining reliable training data to train future AI's will only expand. That's why I currently don't think AI's will replace large swaths of the work force, but to a larger degree be used as a tool by the humans in the workforce.
Relax, everybody. I have figured out the solution. We pass a law that all AI generated text has to be in Pig Latin or Ubbi Dubbi.
i wonder why Google is still not considering buying reddit and other forums where personal discussion takes place and most of the user base sorts quality content free of charge. it has been established already that Google queries are way more useful when coupled with reddit
Making google better is not google's goal. Growth is their goal.
I'm honestly under the impression Google Search is one of their less valuable products, even if it's the one everyone associates the company's name with.
Why buy it when you can get the same data for free?
Why buy data for accuracy when you don't care and support your company with seo spam?
I wonder if AI generated texts (or speech) will impact our language. Kinda interesting thing to think about.
OpenAI also financially benefits from keeping the hype train rolling. Talking about how disruptive their own tech is gets them attention and investments. Just take it with a grain of salt.
It's not possible to tell AI generated text from human writing at any level of real world accuracy. Just accept that.
Citation needed
The entropy in text is not good enough to provide enough space for watermarking. No it does not get better in longer text because you have control over i lot/chunking. You have control over top-k and temperature and prompt which creates infinite output space. Open text-generation-webui, go to the parameter page and count the number of parameters you can adjust to guide outcome. In the future you can add wasm encoded grammar to that list too.
Server side hashing / watermarking can be trivially defeated via transformations / emoji injection. Latent space positional watermarking breaks easily with post processing. It would also kill any company trying to sell it (Apple be like … you want all your chats at openAI or in the privacy of your phone?) and ultimately be massively dystopian.
Unlike plagiarism checks you can’t compare to a ground truth.
Prompt guidance can box in the output space to a point you could not possibly tell it’s not human. The technology has moved from central servers to the edge: even if you could build something for one LLM, there is always another one not in your control, like a local LLaMA, which is open source (see how quickly Stable Diffusion 2 VAE watermarking was removed after release).
In a year your iphone will have a built in LLM. Everything will have LLMs, some highly purpose bound with only a few M parameters. Finetuning like LoRA is accessible to a large number of people with consumer GPUs today and will be commoditized in a year. Since it can shape the output, it again increases the possibility space of outputs and will scramble patterns.
Finally, the bar is not “better than a flip of a coin”. If you are going to accuse people or ruin their academic career, you need triple nine accuracy or you’ll wrongfully accuse hundreds of essays a semester.
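The arithmetic behind that is short. A rough sketch, assuming a hypothetical 50,000 honestly written essays run through a detector in one semester (that number is invented for illustration, as is the function name):

```python
def expected_false_accusations(honest_essays: int, false_positive_rate: float) -> float:
    """Expected number of human-written essays wrongly flagged as AI."""
    return honest_essays * false_positive_rate


# Hypothetical volume of honestly written essays checked in one semester.
HONEST_ESSAYS = 50_000

for label, rate in (("two nines (1% false positives)", 0.01),
                    ("triple nine (0.1% false positives)", 0.001)):
    flagged = expected_false_accusations(HONEST_ESSAYS, rate)
    print(f"{label}: about {flagged:.0f} students wrongly flagged")
```

At a 1% false positive rate that is 500 wrongly flagged essays out of 50,000, and even at triple nine it is still 50.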
The most likely detection would be if someone finds a remarkably stable signature that magically works for all the models out there (100s by now), doesn’t break with updates (lol - see chatgpt presumably getting worse), survives quantisation and somehow can be kept secret from everyone including AI which can trivially spot patterns in massive data sets. Not Going To Happen.
Even if it was possible to detect, it would be model or technology specific and lagging technology - we are moving at 2000 miles an hour and in a year it may not be transformers. There’ll be GAN or RNN elements fused into it or something completely new.
The entire point of the technology is to approximate humanity - plus we are moving at it from the other direction - more and more conventional tools embed AI (from your camera not being able to take non AI touched pictures anymore to Photoshop infill to word autocomplete to new spellchecking and grammar models).
People latch onto the idea that you can detect it because it provides an escapism fantasy and copium so they don’t have to face the change that is happening. If you can detect it you can keep it out. You can’t. Not against anyone who has even the slightest idea of how to use this stuff.
It’s like gunpowder was invented and Samurai would throw themselves into the machine guns because it rendered decades of training and perfection, of knowledge about fortification, war and survival moot.
On video detection will remain viable for a long time due to the available entropy. Text. It’s always been snakeoil and everyone peddling it should be shot.
How not? You ever talk to Chat-GPT, it's full of blatant lies and failure to understand context.
And? Blatant lies are not exclusive to AI texts. Every right wing media is full of blatant lies, yet are written by humans (for now).
The problem is, if you properly prompt the AI, you get exactly what you want. Prompt it a hundred times, and you get a hundred different texts, posted to a hundred different social media channels, generating hype. How on earth will you be able to detect this?
Just like your comment you say? Indistinguishable from human - garbage in, garbage out .
If you actually used the technology rather than being a stochastic parrot, you’d understand:)
I.. did... it was useless after I realized any research I asked it to help with led to it lying to me.
You don't know what you are talking about. The two have been distinguishable.
Clearly not :)
FWIW It's not clear cut if AI generated data feeding back into further training reduces accuracy, or is generally harmful.
Multiple papers have shown that images generated by high quality diffusion models, with a proportion of real images in the mix (30-50%), improve the adversarial robustness of the models. Similar things might apply to language modeling.
There is one way OpenAI can be near 100% sure whether a piece of text was written by or with the help of ChatGPT. They could compare the piece of text against every conversation ChatGPT has ever had. (not saying it's a good idea)
Nope. You’d just ask chatgpt to generate the conversation with emojis instead of spaces and replace the emojis after.
A million variations of this approach AND it would push people towards Apple who will launch an on-phone LLM in the next 12 months.
In a year the technology will run locally on any computer - it’s time to give up on the fantasy that this can be detected or controlled. Today you can run a GPT 3.5 alike with 30B parameters on a consumer GPU at home that, with the right finetuning, will reach chatgpt performance.
Just let the idea go, it doesn’t work.
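Here is roughly what the emoji trick does to a lookup against logged conversations. This is only a sketch: it assumes the provider stores a hash of every output and checks submissions by exact match, which nobody in this thread has confirmed. `OutputLog` and `fingerprint` are made-up names, not anything OpenAI exposes.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Exact-match fingerprint of a piece of text (assumed lookup scheme)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class OutputLog:
    """Hypothetical server-side record of everything the model produced."""

    def __init__(self):
        self._seen = set()

    def record(self, text: str) -> None:
        self._seen.add(fingerprint(text))

    def contains(self, text: str) -> bool:
        return fingerprint(text) in self._seen


SEPARATOR = "🙂"  # the user asks the model to use this instead of spaces

log = OutputLog()
generated = SEPARATOR.join(["This", "essay", "was", "machine", "written."])
log.record(generated)  # what the server actually saw and stored

# The user swaps the emojis back to spaces before handing the text in.
submitted = generated.replace(SEPARATOR, " ")

print("raw model output found in log:", log.contains(generated))
print("submitted text found in log:  ", log.contains(submitted))
```

The server only ever saw the emoji version, so the cleaned-up essay has no match in the log. Fuzzy matching would catch this particular swap, but then the user just picks a transformation the matcher does not normalise.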
If it could, it couldn’t claim that the content it produced was original. If AI generated content were detectable, that would be a tacit admission that it is entirely plagiarized.
Being detectable does not mean plagiarism. The way they did it was by using a fixed rule for generating high entropy words. These are words that can be replaced with a large number of different words without changing the meaning of the sentence. Given any original passage of text, it's very unlikely for those words to all exactly follow the rule set by the generator, but a generated text will always have this rule followed, so they can be distinguished. Likewise, you can take any original passage and replace words in this fashion to increase the odds of it being detected as AI generated and the resulting text will still be original text.
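A toy version of that kind of rule, to show the mechanics. This is not OpenAI's actual scheme: the hash-based "green word" rule, the `is_green`, `watermark` and `green_z_score` helpers, and the tiny synonym table are all assumptions made up for this sketch.

```python
import hashlib
import math


def is_green(prev_word: str, word: str) -> bool:
    """Secret rule: given the previous word, about half of all words count as 'green'."""
    digest = hashlib.sha256(f"{prev_word}|{word}".encode("utf-8")).digest()
    return digest[0] % 2 == 0


def watermark(words, synonyms):
    """Rewrite a text so replaceable words follow the rule whenever a green option exists."""
    out = [words[0]]
    for word in words[1:]:
        options = [word] + synonyms.get(word, [])
        green = [w for w in options if is_green(out[-1], w)]
        out.append(green[0] if green else word)
    return out


def green_z_score(words) -> float:
    """How many standard deviations the green count sits above the 50% chance level."""
    pairs = list(zip(words, words[1:]))
    if not pairs:
        raise ValueError("need at least two words")
    hits = sum(is_green(a, b) for a, b in pairs)
    n = len(pairs)
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)


# Hypothetical synonym table standing in for the model's "high entropy" choices.
SYNONYMS = {
    "big": ["large", "huge", "giant"],
    "fast": ["quick", "rapid", "speedy"],
    "said": ["stated", "noted", "claimed"],
}

original = "he said the big dog was fast and the big cat was fast too".split()
marked = watermark(original, SYNONYMS)
print("original z-score:", round(green_z_score(original), 2))
print("marked z-score:  ", round(green_z_score(marked), 2))
```

A detector that knows the rule looks for a green count far above chance. On a text this short the scores are noisy, which is part of the false positive worry raised elsewhere in the thread, and it also shows the second point above: running a human-written passage through `watermark` makes it look generated while it is still original text.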
Here's the thing though - the probabilities for word choice come from the data the model was trained on. While someone that uses a substantially different writing style / word choice than the LLM could easily be identified as being not from the LLM, someone with a similar writing style might be indistinguishable from the LLM.
Or, to oversimplify: given that Reddit was a large portion of the input data for ChatGPT, all you need to do is write like a Redditor to sound like ChatGPT.
I think you're trying to handwave at someone who knows more about the steganographic watermarking approach than you do.
AI content isn’t watermarked, or detection would be trivial. What he’s talking about is that certain words have a certain probability of appearing after certain other words in a certain context. While there is some randomness to the output, certain words or phrases are unlikely to appear because the data the model was based on didn’t use them.
All I’m saying is that the more a writer’s writing style and word choice are similar to the data set, the more likely their original content would be flagged as AI generated.
I wonder if it was too many false positives, like when some tool said the US constitution was written by AI. Which seems quite logical considering that LLMs imitate humans very closely and cannot by themselves prevent hallucinations which is the best predictor of whether it was written by a person in good faith or not.
The silver lining I suppose is they admit the AI has yet to become self aware so that's good.
That's not how it works lol, just like a math problem isn't self aware
An LLM can never be self aware as it has no thoughts. It's not even really AI, people just call it that because most people are unfamiliar with the concept of machine learning so AI is a simpler term to use with the public.