Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train ChatGPT

misk@sopuli.xyz to Technology@lemmy.world – 1696 points –
tomshardware.com

If this is true, then we should prepare to be shouted at by ChatGPT for not already knowing about that simple error.

ChatGPT now just says “read the docs!” to every question.

Hey ChatGPT, how can I ...

"Locking as this is a duplicate of [unrelated question]"

ChatGPT is going to get trained into thinking those two questions are duplicates and end up giving bullshit outdated answers to every question.

And then links to a similar sounding but ultimately totally unrelated site.

Stack Overflow was the pioneer of hallucinations.

Nah, it just marks your question as duplicate.

Already had that happen with perplexity, like, no mate, I’m asking you.

You joke.

This would have been probably early last year? Had to look up how to do something in fortran (because fortran) and the answer was very much in the voice of that one dude on the Intel forums who has been answering every single question for decades(?) at this point. Which means it also refused to do anything with features newer than 1992 and was worthless.

Tried again while chatting with an old work buddy a few months back and it looks like they updated it to acknowledge that f99 and f03 exist. So I assume that was all Stack Overflow.

Take all you want, it will only take a few hallucinations before no one trusts LLMs to write code or give advice

[…]will only take a few hallucinations before no one trusts LLMs to write code or give advice

Because none of us have ever blindly pasted some code we got off google and crossed our fingers ;-)

It's way easier to figure that out than check ChatGPT hallucinations. There's usually someone saying why a response in SO is wrong, either in another response or a comment. You can filter most of the garbage right at that point, without having to put it in your codebase and discover that the hard way. You get none of that information with ChatGPT. The data spat out is not equivalent.

That's an important point, and it ties into the way ChatGPT and other LLMs take advantage of a flaw in the human brain:

Because it impersonates a human, people are more inherently willing to trust it. To think it's "smart". It's dangerous how people who don't know any better (and many people that do know better) will defer to it, consciously or unconsciously, as an authority and never second guess it.

And the fact it's a one on one conversation, no comment sections, no one else looking at the responses to call them out as bullshit, the user just won't second guess it.

Split segment of data without pii to staging database, test pasted script, completely rewrite script over the next three hours.

When you paste that code you do it in your private IDE, in a dev environment and you test it thoroughly before handing it off to the next person to test before it goes to production.

Hitting up ChatPPT for the answer to a question that you then vomit out in a meeting as if it’s knowledge is totally different.

We should already be at that point. We have already seen LLMs' potential to inadvertently backdoor your code and to inadvertently help you violate copyright law (I guess we do need to wait to see what the courts rule, but I'll be rooting for the open-source authors).

If you use LLMs in your professional work, you're crazy. I would never be comfortable opening myself up to the legal and security liabilities of AI tools.

If you use LLMs in your professional work, you're crazy

Eh, we use copilot at work and it can be pretty helpful. You should always check and understand any code you commit to any project, so if you just blindly paste flawed code (like with stack overflow,) that's kind of on you for not understanding what you're doing.

The issue on the copyright front is the same kind of professional standards and professional ethics that should stop you from just outright copying open-source code into your application. It may be very small portions of code, and you may never get caught, but you simply don't do that. If you wouldn't steal a function from a copyleft open-source project, you wouldn't use that function when copilot suggests it. Idk if copilot has added license tracing yet (been a while since I used it), but absent that feature you are entirely blind to the extent to which its output is infringing on licenses. That's a huge legal liability to your employer, and an ethical coinflip.


Regarding understanding of code, you're right. You have to own what you submit into the codebase.

The drawbacks/risks of using LLMs or copilot have more to do with the fact that it generates the likely code, which means it's statistically biased to generate whatever common and unnoticeable bugged logic exists in the average github repo it trained on. It will at some point give you code that you read and say "yep, looks right to me", and which then actually has a subtle buffer overflow issue, or actually fails in an edge case, in a way that is just unnoticeable enough.

And you can make the argument that it's your responsibility to find that (it is). But I've seen some examples thrown around on twitter of just slightly bugged loops; I've seen examples of it replicating known vulnerabilities; and we have that package name fiasco in that first article above.

If I ask myself "would I definitely have caught that?", the answer is only a maybe. If it replicates a vulnerability that existed in open-source code for years before it was noticed, do you really trust yourself to identify that the moment copilot suggests it to you?

I guess it all depends on stakes too. If you're generating buggy JavaScript who cares.

I feel like it would have to cause an actual disaster with assets getting destroyed to become part of common knowledge (like the Challenger shuttle or something).

Yeah but if you're not feeding it protected code and just asking simple questions for libraries etc then it's good

Maybe for people who have no clue how to work with an LLM. They don't have to be perfect to still be incredibly valuable, I make use of them all the time and hallucinations aren't a problem if you use the right tools for the job in the right way.

The last time I saw someone talk about using the right LLM tool for the job, they were describing turning two minutes of writing a simple map/reduce into one minute of reading enough to confirm the generated one worked. I think I'll pass on that.

[…] confirm the generated one worked. I think I’ll pass on that.

LLM wasn't the right tool for the job, so search engine companies made their search engines suck so bad that it was an acceptable replacement.

Honestly? I think search engines are actually the best use for LLMs. We just need them to be "explainable" and actually cite things.

Even going back to the AOL days, Ask Jeeves was awesome and a lot of us STILL write our google queries in question form when we aren't looking for a specific factoid. And LLMs are awesome for parsing those semi-rambling queries like "I am thinking of a book. It was maybe in the early 00s? It was about a former fighter pilot turned ship captain leading the first FTL expedition and he found aliens and it ended with him and humanity fighting off an alien invasion on Earth" and can build on queries to drill down until you have the answer (Evan Currie's Odyssey One, by the way).

Combine that with citations of what page(s) the information was pulled from and you have a PERFECT search engine.

That may be your perfect search engine, I just want proper boolean operators on a search engine that doesn't think it knows what I want better than I do, and doesn't pack the results out with pages that don't match all the criteria just for the sake of it. The sort of thing you described would be anathema to me, as I suspect my preferred option may be to you.

Yeah, every time someone says how useful they find LLM for code I just assume they are doing the most basic shit (so far it’s been true).

The quality really doesn't matter.

If they manage to strip any concept of authenticity, ownership or obligation from the entirety of human output and stick it behind a paywall, that's pretty much the whole ball game.

If we decide later that this is actually a really bullshit deal -- that they get everything for free and then sell it back to us -- then they'll surely get some sort of grandfather clause because "Whoops, we already did it!"

People keep saying this but it’s just wrong.

Maybe I haven’t tried the language you have but it’s pretty damn good at code.

Granted, whatever it puts out needs to be tested and possibly edited but that’s the same thing we had to do with Stack Overflow answers.

I've tried a lot of scenarios and languages with various LLMs. The biggest takeaway I have is that AI can get you started on something or help you solve some issues. I've generally found that anything beyond a block or two of code becomes useless. The more it generates the more weirdness starts popping up, or it outright hallucinates.

For example, today I used an LLM to help me tighten up an incredibly verbose bit of code. Today was just not my day and I knew there was a cleaner way of doing it, but it just wasn't coming to me. A quick "make this cleaner: [code]" and I was back to the rest of the code.

This is what LLMs are currently good for. They are just another tool like tab completion or code linting.

Have you tried recent models? They're not perfect no, but they can usually get you most of the way there if not all the way. If you know how to structure the problem and prompt, granted.

See, this is why we can't have nice things. Money fucks it up, every time. Fuck money, it's a shitty backwards idea. We can do better than this.

Someone comes up with something good: look what I made, we can use this to better humanity!

Corporations: How can we make money off of this?

So they pulled a "reddit"?

These companies don't realise their most engaged users generate a disproportionate amount of their content.

They will just go to their own spaces.

I think this is a good thing in the long run, the internet will become decentralised again.

I don't know. It feels a bit like "When I quit my employer will realize how much they depended on me." The realization tends to be on the other side.

But while SO may keep functioning fine it would be great if this caused other places to spring up as well. Reddit and X/Twitter are still there but I'm glad we have the fediverse.

Companies get hit hard by unplanned vacancies. It won't take them down, but it can cost them buckets of money in expenses, lost revenue, or both. The thing is, the people that left will never know that and their coworkers will never see it; only people in finance and budget will know how to quantify the impact.

Well, reddit is doing fine so far. Shareholders are happy

And then Stack Overflow will go the same way Digg did.

god damn, I went over to Digg yesterday to see what it's been like and I shit you not, it is links to reddit threads and instagram posts

Same as any other social media. Reddit has a lot of twitter, Tumblr and 4chan screenshots, TikTok videos, etc. Lemmy is not much different.

I hope it doesn't end up like it did on Reddit, where all those protests did not result in anything at all.

Lemmy's bigger than ever, and that's a direct consequence of reddit's enshittification, so there's that at least.

Reddit/Stack/AI are the latest examples of an economic system where a few people monetize and get wealthy using the output of the very many.

Mmm this golden goose tastes delicious!

You're forgetting a silly and funny company whose name starts with "G"

First, they sent the missionaries. They built communities, facilities for the common good, and spoke of collaboration and mutual prosperity. They got so many of us to buy into their belief system as a result.

Then, they sent the conquistadors. They took what we had built under their guidance, and claimed we "weren't using it" and it was rightfully theirs to begin with.

Oh I didn't consider deleting my answers. Thanks for the good idea Barbra StackOverflow.

I'd be shocked if deleted comments weren't retained by them

I think the reason for those bans is that they don't want you rebelling and are showing that they don't need you personally, thus ban.

Of course it's all retained.

Isn't it amazing that places like this built on user support and contribution turn around and pull a “we don’t need you”?

They think they are too big to die by now. That userbase grows like crops, and isn't conscious of how it's being treated.

That's a bit like how monopolists and Ponzi scheme owners think. It works sometimes.

They have been un-deleting after they ban.

Isn't that illegal in most countries?

In Europe GDPR gives you the right to have all your data deleted. All you do is send in a request and SO has to remove everything of yours, not just anonymize it. There are some exceptions for legal reasons, eg where financial transactions are involved, but comments should not be exempt.

It is still just a "trust us" deal. They say they have deleted it, and all you can do is trust them. They could possibly get into legal troubles if it was shown they were lying, but that could be easily avoided as well.

GDPR is ok, but much of it is based on good actors doing what they should.

Letting corporations "disrupt" forums was a mistake.

Maybe we should replace Stack Overflow with another site where experts can exchange information? We can call it "Experts Exchange".

Codidact ... Stack Overflow had a mass exodus of mods 2-3 years ago and some of them made Codidact.

Any discussion on making it ActivityPub enabled?

I didn't see any, but would be curious if anyone else had.

At the end of the day, this is just yet another example of how capitalism is an extractive system. Unprotected resources are used not for the benefit of all but to increase and entrench the imbalance of assets. This is why they are so keen on DRM and copyright and why they destroy the environment and social cohesion. The thing is, people want to help each other; not for profit but because we have a natural and healthy imperative to do the most good.

There is a difference between giving someone a present and then them giving it to another person, and giving someone a present and then them selling it. One is kind and helpful and the other is disgusting and produces inequality.

If you're gonna use something for free then make the product of it free too.

An idea for the fediverse and beyond: maybe we should be setting up instances with copyleft licences for all content posted to them. I actually don't mind if you wanna use my comments to make an LLM. It could be useful. But give me (and all the other people who contributed to it) the LLM for free, like we gave it to you. And let us use it for our benefit, not just yours.

An idea for the fediverse and beyond: maybe we should be setting up instances with copyleft licences for all content posted to them. I actually don’t mind if you wanna use my comments to make an LLM. It could be useful. But give me (and all the other people who contributed to it) the LLM for free, like we gave it to you. And let us use it for our benefit, not just yours.

This seems like a very fair and reasonable way to deal with the issue.

Agreed on that last part, making that the default would be a great solution. I could also use a signature in comments, like that guy who always puts the "Commercial AI thingy" but automatically.

Begun, the AI wars have.

Faces on T-shirts, you must print. Fake facts into old forum comments, you must edit. Poison the data well, you must.

I mean we aren't even fighting AI, we are still fighting greedy little turds

Problem is, it still results in turning the Internet to shit. We just do it manually to preempt the AI doing it.

Messages that people post on Stack Exchange sites are literally licensed CC-BY-SA, the whole point of which is to enable them to be shared and used by anyone for any purpose. One of the purposes of such a license is to make sure knowledge is preserved by allowing everyone to make and share copies.

That license would require ChatGPT to provide attribution every time it used anyone's training data from there, and would also require every output using that training data to be placed under the same license. This would legally prevent anything ChatGPT created, even in part, using this training data from being closed source. Assuming they aren't planning on doing that, this is massively shitting on the concept of licensing.

CC attribution doesn't require you to necessarily have the credits immediately with the content, but it would result in one of the world's longest web pages as it would need to have the name of the poster and a link to every single comment they used as training data, and stack overflow has roughly 60 million questions and answers combined.

IF its outputs are considered derivative works.

Ethically and logically it seems like output based on training data is clearly derivative work. Legally I suspect AI will continue to be the new powerful tool that enables corporations to shit on and exploit the works of countless people.

Maybe but I don’t think that is well tested legally yet. For instance, I’ve learned things from there, but when I share some knowledge I don’t attribute it to all the underlying sources of my knowledge. If, on the other hand, I shared a quote or copypasta from there I’d be compelled to do so I suppose.

I’m just not sure how neural networks will be treated in this regard. I assume they’ll conveniently claim that they can’t tie answers directly to underpinning training data.

Share Alike

I can't wait to download my own version of the latest gpt model

It does help to know what those funny letters mean. Now we wait for regulators to catch up..

/tangent

If anything, we're a very long way from anything close to intelligent, OpenAI (and subsequently MS, being publicly traded) sold investors on the pretense that LLMs are close to being "AGI" and now more and more data is necessary to achieving that.

If you know the internet, you know there's a lot of garbage. I for one can't wait for garbage-in garbage-out to start taking its toll.

Also I'm surprised how well open source models have shaped up, it's certainly worth a look. I occasionally use a local model for "brainstorming" in the loosest terms, as I generally know what I'm expecting, but it's sometimes helpful to read tasks laid out. There's also comfort in knowing that nothing ever needs to leave my network, and in a pinch I even got some answers when my network was offline.

It gives a little hope, while corps get to blatantly violate copyright while wielding it so heavily themselves, that advancements in open source have been so great.

The enshittification is very real and is spreading constantly. Companies will leech more from their employees and users until things start to break down. Acceleration is the only way.

Accelerationism is like being on a plane and wishing it crashes when one of the engines fails.

I mean, sure but in the context of individual websites I don't see it being a big deal. There will be replacements, and relatively quickly. Accelerationism applied to major societal structures is a terrible idea though.

That's a terrible analogy, implying the wish that everyone on the plane dies if one engine fails.

It's like an airline company has been complete shit for decades, wanting to see them fail fast so that a better airline company can take their place.

Except it's not like a plane because we can stop using specific websites whenever we like, and build our own websites to whittle away at their hegemony.

primary use for AI is self destructing your website.

Pleasing tech illiterate shareholders

Remember when adding the word blockchain to an Iced Tea company's name caused share prices to jump?

is this real? I can't tell anymore.

I googled it and I wish it wasn’t

a little-known micro-cap stock called Long Island Iced Tea Corp. (LTEA) said Thursday that it’s now “Long Blockchain Corp.,” and its stock leaped more than 200 percent at the open of trading. Shares closed up 183 percent.

🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️

This is like my friend who "invested" in Doggy (not Doge) coin "because it was going to explode and become highly valuable" even though it was only worth like .1% of what Doge was worth like two years back..... He's a teacher.

Or my other friend that invested thousands in Ethereum like 2 years back, while knowing basically nothing about "The Ethereum Network", or anything crypto related. He just knew that he could potentially make money off of it like he could with stocks. I asked him like a year later if he ever made anything off of it and he said "not really", and said he had reinvested the money into other things (I forget which, it wasn't crypto related) 🤣

I despise this use of mod power in response to a protest. It's our content to be sabotaged if we want - if Stack Overlords disagree then to hell with them.

I'll add Stack Overflow to my personal ban list, just below Reddit.

Once submitted to stack overflow/Reddit/literally every platform, it's no longer your content. It sucks, but you've implicitly agreed to it when creating your account.

While true, it's stupid that things are that way. They shouldn't be able to hide behind the idea that "we're not responsible for what our users publish, we're more like a public forum" while also having total ownership over that content.

I fully understand why they are doing this, but we are just losing a mass of really useful knowledge. What a shame...

And while it hurts now, it's REALLY going to hurt when large swaths of useful answers that don't exist anywhere else are gone and there's nothing replacing them.

No one writes hundreds of pages of documentation for their stuff anymore. Without the collected knowledge learned from experience there, what do we have?

Unless we have source code to read, very little.

I'm still feeling the pain of google search results sucking combined with most of the large coding forums being gone and reddit slowly going to garbage. Stack Overflow was the last bastion of collected knowledge of its type... and it's not like it was 25 years ago where we still had phonebook-sized manuals for almost all major software, because agile has killed the concept of exhaustive definitive documentation for a given version of something.

I used to sorta roll my eyes at people shouting about federating everything, but at this point I'm scared and agreeing with them.

Vandalism is always reverted on SO, even if done by the original author. No knowledge is lost. Suing OA for violating the CC-BY license might be possible, but I'd wager SO is not interested in suing them, and since they hold the rights, not much can be done by others.

Eventually, we will need a fediverse version of StackOverflow, Quora, etc.

Those would be harvested to train LLMs even without asking first. 😐

At this point I’m assuming most if not all of these content deals are essentially retroactive. They already scraped the content and found it useful enough to try and secure future use, or at least exclude competitors.

They scraped the content, liked the results, and are only making these deals because it's cheaper than getting sued.

Can they really sue (with a chance of winning) if you scrape content that's submitted by users? That's insane.

Honestly? I'm down with that. And when the LLMs end up pricing themselves out of usefulness, we'll still have the fediverse version. Having free sites on the net with solid crowd-sourced information is never a bad thing even if other people pick up the data and use it.

It's when private sites like Duolingo and Reddit crowd source the information and then slowly crank down the free aspect that we have the problems.

The Ad sponsored web model is not viable forever.

The Ad sponsored web model is not viable forever.

a thousand times this

I’d rather the harvesting be open to all than only the company hosting it.

Assuming the federated version allowed contributor-chosen licenses (similar to GitHub), any harvesting in violation of the license would be subject to legal action.

Contrast that with Stack Exchange, where I assume the terms dictated by Stack Exchange deprive contributors of recourse.

SO already was. Not even harvested as much as handed to them. Periodic data dumps and a general forced commitment to open information were a big part of the reason they won out over other sites that used to compete with them. SO most likely wouldn't have existed if Experts Exchange didn't paywall their entire site.

As with everything else, AI companies believe their training data operates under fair use, so they will discard the CC BY-SA 4.0 license requirements regardless of whether this deal exists. (And if a court ever finds it's not fair use, they are so many layers of fucked that this situation won't even register.)

Not fediverse, but open-source and community run: https://codidact.com

Oh this looks decent. British non-profit, I like it. Registering.

Smells too much like Duolingo. Here, everyone jump in and answer all the questions. 5 years later, ohh look at this gold mine of community data we own....

This was actually the whole original point of Duolingo. The founder previously created Recaptcha to crowd source machine vision of scanned books.

His whole thing is crowd sourcing difficult tasks that machines struggle with by providing some sort of reason to do it (prevent spam at first and learn a language now)

From what I understand Duolingo just got too popular and the subscription service they offer made them enough money to be happy with.

Everything you write on here is public. There's nothing stopping anyone from using that data for training

Yeah but didn't you see the sovereign citizens who think licenses are magic posting giant copyright notices after their posts? Lol

It's so childish, ai tools will help billions of the poorest people access life saving knowledge and services, help open source devs like myself create tools that free people from the clutches of capitalism, but they like living in a world of inequity because their generational wealth earned from centuries of exploitation of the impoverished allows them a better education, better healthcare, and better living standards than the billions of impoverished people on the planet so they'll fight to maintain their privilege even if they're fighting against their own life getting better too. The most pathetic thing is they pretend to be fighting a moral crusade, as if using the answers they freely posted and never expected anything in return for is a real injustice!

And yes I know people are going to pretend that they think tech bros won't allow poor people to use their tech, and they base this on assuming that how everything always works will suddenly just flip into reverse at some point or something? Like how mobile phones are only for rich people and only rich people can sell via the internet and only rich people can start a YouTube channel...

We already have the SO data. We could populate such a tool with it and start from there.

Data should be socialized and machine learning algorithms should be nationalized for public use.

Better yet, copyright should be abolished completely.

It should stay for creative works but that's it. It should protect people who actually write books, compose music, make art, and sing. It shouldn't be held by corporations forever by leeching off their workers.

Creative works of individuals specifically... Corporations should explicitly be deemed not people and not possessing of the same rights as people, and the fact that needs to be said just goes to show how far down the shit hole we've fallen

Corporations should be outlawed from owning houses and land as well. Maybe they can own the building, but they must be forced to rent the land from Us.

Why does OpenAI want 10 year old answers about using jQuery whenever anyone posts a JavaScript question, followed by aggressive policing of what is and isn't acceptable to re-ask as technology moves on?

I'm going to run out of sites at this pace.

Right? It seems like the modern internet is made up of like 5 monolithic sites, and unlimited SEO spam.

I know that's not literally true, but it sure feels like it.

Rather than delete, modify the question so it's wrong. Then the AI will hallucinate.

I just expect it to insult the user while not answering the question.

As a large language model, I expect you to use the search function. Asshole.

Have you tried to read the fucking manual you filthy lazy fuck? Marked as solved. Is there anything else I can do to help you? 😊

Reddit did almost the same, and don't forget guys to delete your Reddit account

It won't matter, they would have all of your comments archived already. Even if you overwrite them AI will be scraping the copies they keep.

it creates a lot of poisoned data especially if you like edit half your posts with nonsense

That's trivial to filter if you just look at how much time has passed between posting and editing. Reddit comments are only very rarely updated after more than a day.
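
The time-gap check described above could be sketched roughly like this. It is only an illustration, not how Reddit or any scraper actually works: the `Comment` record, its field names, and the one-day cutoff are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


# Hypothetical comment record; the field names are assumptions for illustration.
@dataclass
class Comment:
    body: str
    posted_at: datetime
    edited_at: Optional[datetime] = None  # None means the comment was never edited


# Assumed cutoff: edits made more than a day after posting are treated as suspect.
MAX_EDIT_DELAY = timedelta(days=1)


def is_suspect_edit(comment: Comment, max_delay: timedelta = MAX_EDIT_DELAY) -> bool:
    """Return True if the comment was edited long after it was posted."""
    if comment.edited_at is None:
        return False
    return comment.edited_at - comment.posted_at > max_delay


def filter_late_edits(comments: List[Comment]) -> List[Comment]:
    """Keep comments that were never edited or were edited soon after posting."""
    return [c for c in comments if not is_suspect_edit(c)]
```

Anyone holding an archived copy could also fall back to the pre-edit text instead of dropping the comment, which is why late edits probably poison less than people hope.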

A malicious response by users would be to employ an LLM instructed to write plausible-sounding but very wrong answers to historical and current questions, then have an army of users upvote the known wrong answers while downvoting accurate ones. This would poison the data, I would think.

All use of generative AI (e.g., ChatGPT and other LLMs) is banned when posting content on Stack Overflow. This includes "asking" the question to an AI generator then copy-pasting its output as well as using an AI generator to "reword" your answers.

Ironic, isn't it?

Interestingly I see nothing in that policy that would disallow machine generated downvotes on proper answers and machine generated upvotes on incorrect ones. So even if LLMs are banned from posting questions or comments, it looks like Stack Overflow is perfectly fine with bots voting.

Sounds like it would require some significant resources to combat.

That said, that plan comes at a cost to presumably innocent users who will bark up the wrong trees.

Maybe we need a technical questions and answers site on the fediverse!

Not gonna stop your knowledge being fed to an AI.

Is there an actual way to stop it? I don't think so. At least, moving to the fediverse would stop any particular corporation from having the monopoly of it, prevent reddit-like abuse of power, would give users more power, among a few other things.

Nothing stopping them from scraping that too

While I think the reaction of StackOverflow is not good, I don't understand the users either.

EDIT: seems like the language model won't be free, I understand then.

OpenAI is a terribly misleading name.

That is how it started. It was a non-profit with the goal to release all their patents and research for free.

That lasted for a few years, and then the people running it realized they could instead all become filthy rich and nobody could do anything about it. So they did that.

But don't worry, they are a capped for-profit now! They can only make 100 times the amount of money they have in investments. So they'll stop when they have reached ... checks notes.... Around $1.3 trillion.

Aren’t a lot of answers outdated on stackoverflow?

You are now banned from stackoverflow

Half the time I look on Stack Overflow it feels like the answer is irrelevant by today's standards

That's what happens when new posts aren't allowed to exist if it asks a similar question to an old one.

This question is deleted for off topic

I will answer some questions with my old account using gpt 4 to poison the data.

If you want to poison SO a little at the same time providing valid answers that help users, use outlook.com email domain for new accounts. It seems to not have anti throwaway countermeasures while being accepted by SO. And it seems fitting to bash the corporate with the corporate.

lol wow this is going even more poorly than I thought it would, and I thought my kneejerk reaction to the initial announcement was quite pessimistic.

If we can't delete our questions and answers, can we poison the well by uploading masses of shitty questions and answers? If they like AI we could have it help us generate them.

Poison the well by using AI-generated comments and answers. There isn't currently a way to reliably determine if content is human or AI-generated, and training AI on AI is the equivalent of inbreeding.

The poison was there all along the way. The poison is us

Inserts spider man meme

Angry users claim they are enabled to delete their own content from the site through the "right to forget," a common name for a legal right most effectively codified into law through the EU's General Data Protection Regulation (GDPR). Among other things, the act protects the ability of the consumer to delete their own data from a website, and to have data about them removed upon request. However, Stack Overflow's Terms of Service contains a clause carving out Stack Overflow's irrevocable ownership of all content subscribers provide to the site

It really irritates me when a ToS simply states that they will act against the law.

It's not quite that simple, though. GDPR is only concerned with personally identifiable information. Answers and comments on SO rarely contain that kind of information as long as you delete the username on them, so it's not technically against GDPR if you keep the contents.

You could argue that people can be identified by their writing style. I have no idea how far you'd get with that though.

Frankly I don’t see any way whatsoever that this would fly, and that’s a good thing!

Imagine what it would mean for software development if one angry dev could request the deletion of all their contributions at a moment's notice by pointing to a right to be forgotten. Documentation is really not meaningfully different from that.

Instead of solely deleting content, what if authors had instead moved their content/answers to something self-owned? Can SO even claim ownership legally of the content on their site? Seems iffy in my own, ignorant take.

Everything you submit to StackOverflow is licensed under either MIT or CC depending on when you submitted it.

Regardless of the license (apart perhaps from public domain) it is legally still your copyright, since you produced the content. Pretty sure in EU they cannot prevent you from deleting your content.

But those two licenses give everyone an irrevocable right to do certain things with your content forever and displaying it on a website is one of those things (assuming they follow the other requirements of the license).

So does that mean anyone is allowed to use said content for whatever purposes they'd like? That'd include AI stuff too, I think? Interesting twist there, hadn't thought about it like this yet. Essentially posters would be agreeing to share that data/info publicly. No different than someone learning how to code from looking at examples made by their professors or someone else doing the teaching/talking, I suppose. Hmm.

CC (not sure about MIT) virtually always requires attribution, but as GitHub Copilot showed, right now open-"media" authors have basically no way of enforcing their rights.

They can. It's in the TOS when you make your account. They own everything you post to the site.

For years, the site had a standing policy that prevented the use of generative AI in writing or rewording any questions or answers posted. Moderators were allowed and encouraged to use AI-detection software when reviewing posts. Beginning last week, however, the company began a rapid about-face in its public policy towards AI.

I listened to an episode of The Daily on AI, and the stuff they fed into the engines included the entire Internet. They literally ran out of things to feed it. That's why YouTube created their auto-generated subtitles - literally, so that they would have more material to feed into their LLMs. I fully expect reddit to be bought out/merged within the next six months or so. They are desperate for more material to feed the machine. Everything is going to end up going to an LLM somewhere.

I think auto-generated subtitles were to fulfil an FCC requirement, some years ago, for content subtitling. It has however turned out super useful for LLM feeding.

If I was Stack Overflow I would've transferred my backups to OpenAI weeks before the announcement for this very reason.

This is also assuming the LLMs weren't already fed with scraped SO data years ago.

It's a small act of rebellion but SO already has your data and they'll do whatever they want with it, including mine.

It’s true that it’s mostly a symbolic act, but the rebellion matters, especially from old accounts. It’s also a nice way to mark the time after which I never participated in SO again. After my ban expires, I’ll deface my questions again. And again. Until they permaban me.

There’s also the possibility of adding to the wonderful irony of making the AI more useful than the original by having content that’s no longer accessible through the original. It doesn’t get more enshittified than that, even if Prashanth Chandrasekar is too out of touch to ever regret his decision.

OpenAI clearly already scraped the pre-LLM (aka actually useful) content from SO, this entire deal is happening after the fact to avoid litigation.

I think you're 100% correct in assuming they've already fed it data scraped from SO. I've previously gotten code samples from ChatGPT that were clearly from SO, down to the comments in the code. Even reverse searched some of the code and found the question it was from.

Wasn't ChatGPT already trained on all the answers last year?

They seem to only be watching the questions right now. You’re automatically prevented from deleting an accepted answer, but if you answered your own question (maybe because SO was useless for certain niche questions a decade ago so you kept digging and found your own solution), you can unaccept your answer first and then delete it.

I got a 30 day ban for “defacing” a few of my 10+ year old questions after moderators promptly reverted the edits. But they seem to have missed where I unaccepted and deleted my answers, even as they hang out in an undeletable state (showing up red for me and hidden for others).

And comments, which are a key part to properly understanding a lot of almost-correct answers, don’t seem to be afforded revision history or to have deletes noticed by moderators.

So it seems like you can still delete a bunch of your content, just not the questions. Do with that what you will.

Can we change our answers? Change your answers to garbage, don't delete them. Do it slowly.

If you have low karma, then edits are reviewed by multiple people before the edit is saved. That's primarily in place to prevent spammers, who could otherwise post a valid question then edit it a few months later, transforming the message into a link to some shitty website.

Even with high karma, that just means your edit is temporarily trusted. It still gets reviewed and will be reverted if it's a bad edit.

And any time an edit is reverted, that's a knock against your karma. There's a community enforced requirement for all edits to be a measurable improvement.

Even moderation decisions are reviewed by multiple people - so if someone rejects a post because it's spam, when they should have rejected it because it's off topic (or approved it) then that is also going to be caught and undone. And any harmful contribution (edit or moderation decision) will result in your action being undone and your karma going down. If your karma goes down too fast, your access to the site is revoked. If you do something really bad, then they'll ban your IP address.
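
To make the workflow described above concrete, here is a rough sketch of it as a toy model. This is not Stack Overflow's actual implementation: the thresholds, the penalty size, and every name (`submit_edit`, `review`, `TRUSTED_KARMA`, and so on) are made up for illustration, and it simplifies by charging karma for any rejected edit, whether or not it had already gone live.

```python
from dataclasses import dataclass, field

# Hypothetical numbers: the comments above give no actual thresholds.
TRUSTED_KARMA = 2000   # at or above this, edits go live before review
APPROVALS_NEEDED = 2   # "reviewed by multiple people"
REVERT_PENALTY = 2     # karma lost when an edit is rejected or reverted


@dataclass
class User:
    name: str
    karma: int


@dataclass
class Post:
    body: str
    history: list = field(default_factory=list)  # earlier bodies, oldest first


@dataclass
class Edit:
    author: User
    post: Post
    new_body: str
    live: bool = False


def _apply(edit: Edit) -> None:
    """Make the edit visible, keeping the previous body in the revision history."""
    edit.post.history.append(edit.post.body)
    edit.post.body = edit.new_body
    edit.live = True


def submit_edit(author: User, post: Post, new_body: str) -> Edit:
    """Apply the edit immediately only for high-karma users; otherwise queue it."""
    edit = Edit(author, post, new_body)
    if author.karma >= TRUSTED_KARMA:
        _apply(edit)  # temporarily trusted, still subject to review
    return edit


def review(edit: Edit, votes: list) -> None:
    """Resolve an edit from reviewer votes (True means 'this is an improvement')."""
    approved = sum(votes) >= APPROVALS_NEEDED
    if approved:
        if not edit.live:
            _apply(edit)
        return
    if edit.live:
        # Roll back a temporarily trusted edit to the previous revision.
        edit.post.body = edit.post.history.pop()
        edit.live = False
    edit.author.karma -= REVERT_PENALTY


# Example: a high-karma user vandalises a post, and reviewers revert it.
post = Post("Use a dict for O(1) lookups.")
veteran = User("veteran", karma=5000)
bad_edit = submit_edit(veteran, post, "garbage")
review(bad_edit, [False, False, True])
print(post.body, veteran.karma)
```

The point of the sketch is that high karma only changes when an edit becomes visible, not whether it gets reviewed, which is why slowly editing answers into garbage would still be caught under the process described.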

Moderators can also lock a controversial post, so only people with high karma can touch it at all.

... keep in mind Stack Overflow doesn't just allow editing your own posts, you can edit any content on the website, similar to Wikipedia.

It's honestly a good overall approach, but around when Jeff Atwood left in 2012 it started drifting off course towards the shit show that is Stack Overflow today.

It's a shame, only corporations are going to benefit from the hard work and labour of so many talented people.

If the Stack Overflow site remains available then it still serves the same purpose it did before. I personally use ad blockers and don’t pay to use the site, which must not be cheap to operate. The bigger problem is if talented people refuse to share their expertise with people like me because they aren’t being compensated for their efforts.

In the article the dude was banned for 7 days for changing his answer.

So wait a few days, then do it slowly.

I'm almost sure the site has already been scraped of its current content for the LLM.

Yup, but that's not the point IMO, it's to remove quality content from the site so visitors see how crappy it is and stop using it.

Great idea. Then I’ll turn to ChatGPT for higher quality answers.

I don't understand what anyone wins from this

Corporations are foundationally evil

And how do they not win more if we poison the entire Internet?

It's like being in a toxic relationship with kids involved

Set boundaries

Follow rules

Don't destroy the fucking fruit of your bodies just because you are angry at each other

Fuck those guys, like a lot, for taking your given data and selling

And fuck open ai for trying to make money from scientific discoveries meant for all of humanity

But what the fuck with ruining the entire Internet?

Who gets anything then?

If language models will ruin the Internet, why be afraid that normal human responses are available? Wut?

Maybe a better act of rebellion would be to scrape the data on stack, self host it, and move to an open source platform. Easy for me to say though, when I only ever coded Hello World

Maybe we should start asking questions that iterate loops billions of times. Something semi-malicious that a person would recognize but an AI wouldn't.

Nah, the training data probably doesn't quite work that way. The AI would be very unlikely to test code, just regurgitate the most likely response based on its training sets. Instead, just filling posts with random bits and pieces of unrelated code and responses might be better.

The reddit Steve method again.

I mean, they could just do what reddit does and restore from backup automatically lol

This sort of thing is so self-sabotaging. The website already has your comment, and a license to use it. By deleting your stuff from the web you only ensure that the AI is definitely going to be the better resource to go to for answers.

I'm not sure about that... in Europe don't you have the right to insist that a website no longer use your content?

Not when you've agreed to a terms of service that hands over ownership of your content to Stack Overflow, leaving you merely licensed to use your own content.

Also backups and deleted flags. Whatever comment you submitted is likely backed up already and even if you click the delete button you're likely only just changing a flag.
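
For readers who haven't seen the "deleted flag" pattern (often called a soft delete), here is a minimal sketch of the idea. It is a toy in-memory model, not a claim about how Stack Overflow actually stores posts; `Comment`, `CommentStore`, and their methods are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Comment:
    comment_id: int
    author: str
    body: str
    deleted_at: Optional[datetime] = None  # None means "not deleted"


class CommentStore:
    """Toy in-memory store; a real site would use a database table."""

    def __init__(self) -> None:
        self._rows: Dict[int, Comment] = {}

    def add(self, comment: Comment) -> None:
        self._rows[comment.comment_id] = comment

    def delete(self, comment_id: int) -> None:
        # "Deleting" only stamps the row; the body stays in storage.
        self._rows[comment_id].deleted_at = datetime.now(timezone.utc)

    def visible(self) -> List[Comment]:
        # What ordinary visitors see.
        return [c for c in self._rows.values() if c.deleted_at is None]

    def everything(self) -> List[Comment]:
        # What the operator (or a backup) still has.
        return list(self._rows.values())


store = CommentStore()
store.add(Comment(1, "alice", "Use a context manager here."))
store.add(Comment(2, "bob", "This answer is outdated."))
store.delete(1)

print(len(store.visible()), "visible,", len(store.everything()), "stored")
```

With this design the delete button changes what visitors can see, while the text itself remains available to whoever runs the site.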

That's why I'm not going to bother contributing to future content.

Were they trying to protect ChatGPT from all the bad and convoluted answers?

Frankly, the solution here isn’t vandalism, it’s setting up a competing site and copying the content over. The license of Stack Overflow makes that explicitly legal. Anything else is just playing around and hoping that a company acts against its own interests, which has rarely ever worked before.
