Data contamination expert 👌

ElCanut@jlai.lu to Lemmy Shitpost@lemmy.world – 1344 points –

I used some tools to corrupt about 10 years of comments and posts of mine.

While that's the correct thing to do in my opinion, it would be a mistake to assume that Reddit didn't store your original comments.

By corrupting their dataset, you may actually be helping them recognize maliciously edited comments.

it would be a mistake to assume that Reddit didn't store your original comments.

They were fairly specific about not doing that (I'd imagine largely because of GDPR).

I deleted 10 years of "content" before I left and checked their policies. They apparently actually do properly delete from their servers.

I've got a bridge in the desert I'd like to sell you.

GDPR is no joke. Storing a handful of comments is not worth the penalty if they get caught.

Note that I speak from experience as part of a company that needs to comply with the regulations. We do it because the risk of violation is 10000000% not worth it no matter how annoying and arduous it is to comply.

But the GDPR only covers European users tho.

That's true but it's far easier to globally implement rather than trying to segment. Very difficult to accurately prove a user isn't EU resident across an entire userbase.

That's probably why they don't let you access Reddit with a VPN, so they can have some idea of location.

Yeah, I mean I knew that when I was doing it.

Sometimes all you can do is make a symbolic gesture that really does nothing, and even if it does nothing, you should still do it.

Leaving and supporting Lemmy by paying for some developer fees (I'm on the Patreon), posting, and commenting is probably 100x more damaging to Reddit.

FWIW, I requested an old Reddit account's data the other day under CCPA and all the contamination was in there. My guess is their backend updates every so often. I guess I made a good call to edit my comments and leave them there to simmer before I deleted them along with the account. Perhaps this is the way?

Mass edits made rapidly are obviously suspect, too... If the same user edits more than a dozen comments in, say, a minute, you have to ask what's going on.
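To make that heuristic concrete, here is a rough sketch of a sliding-window check. Everything in it is an assumption for illustration: the function name `flag_burst_editors`, the `(user, timestamp)` input shape, and the thresholds (a dozen edits, 60 seconds) are taken from the numbers floated above, not from anything Reddit is known to run.

```python
from collections import deque


def flag_burst_editors(edits, max_edits=12, window_seconds=60):
    """Return the set of users who made more than `max_edits` edits
    within any `window_seconds` span.

    `edits` is an iterable of (user, timestamp_in_seconds) pairs, with
    each user's edits in chronological order (hypothetical input shape).
    """
    recent = {}  # user -> deque of timestamps still inside the window
    flagged = set()
    for user, ts in edits:
        window = recent.setdefault(user, deque())
        window.append(ts)
        # Drop edits that have fallen out of the window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_edits:
            flagged.add(user)
    return flagged


# One user edits every second, another every ten seconds.
sample = [("fast_user", t) for t in range(13)]
sample += [("slow_user", t * 10) for t in range(13)]
suspects = flag_burst_editors(sample)
```

A check this simple would presumably be easy to dodge by spacing the edits out, which fits with the advice elsewhere in the thread to go slowly.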

Can't post a genius idea like this one without posting the links of the tools

It's not my idea, but I could probably dig up the tool I used. Dollars to donuts, it doesn't work any more.

This might have been the tool I used. I don't think so, because I overwrote everything with one message, but google around and you'll find similar ones.

https://github.com/adriantache/YARCO

If you overwrote with a single message, then your messages are back to what they were.

Not necessarily true. I overwrote several thousand comments with a different tool and used three different quotes on greed. I have periodically checked and about two dozen came back. I just manually changed them at that point.

This would be better if it fed the parent comment into ChatGPT prefixed with "create a plausible but factually incorrect aggressive response to"

Feed the machine to the machine!

A tool like that would almost definitely require api access to function. If that was still possible, most of us wouldn't be here having this conversation.

A tool like that would almost definitely require api access to function. If that was still possible, most of us wouldn't be here having this conversation.

No it didn't use the API. You had to run it in browser and be logged in to reddit.

The tool I used had an extension for Firefox. You then used that Reddit extension so you could get more scrolling on your post history. Then you pressed a button and it would insert gibberish for all comments and posts. Then you'd go to the next page and do it again.

I think Reddit caught on to this. I tried destroying my comment history (~7 years with 600k karma) with a few of the available tools on GitHub.

I found my account permabanned the next time I tried to log in. People should attempt to eliminate/poison as much as possible, but Reddit has all the comments and modifications in a database somewhere, ready to sell to whichever AI company is the highest bidder.

They have to do something to make money after taking away awards. The advertising is absolute shit and not worth the $100 entry fee.

I edited mine via a tool to say fuck Reddit and Steve Huffman is a greedy pig boy.

What do you mean by corrupt?

I used a tool that edited my comments to replace them with gibberish. Supposedly Reddit still retains deleted comments, but if you edit them, it only keeps the latest version. So by editing them you make the comments worthless.

I also edited my comments to be basically a Lemmy ad and completely deleted the posts except in a few communities where it could be helpful in the future.

What tool? I'd like to use it as well.

I used redact.dev

Thank you

Edit - This worked great thank you. Was able to scrub my Twitter as well.

Just redacted 5 years' worth of comments with this. Now to let my account sit for a few months so their backups have only my latest masterpieces. Thanks!!!

I ran a script over all of my comments (through my browser) to edit them into something about how spez had back stabbed the community. I had tens? hundreds of thousands? of comments.

It took several hours to run, but I did a forward pass (newest to oldest) and a backwards pass (oldest to newest). It bugged out because it had to run so long but I think I got it all.

I'm not sure this will really do anything, because you could pretty easily statistically isolate anyone who did what I did and roll their account history back to a prior state in the training data.
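For what it's worth, the rollback half of that is trivial if edit history is kept at all. A minimal sketch, assuming a hypothetical store that maps each comment ID to a list of `(timestamp, text)` revisions; whether Reddit actually keeps anything like this is exactly what the thread is arguing about, so the names `rollback_to` and `history` are made up for illustration.

```python
def rollback_to(history, cutoff):
    """For each comment, return the newest revision made strictly
    before `cutoff`. Comments with no earlier revision are skipped.

    `history` maps comment_id -> list of (timestamp, text) tuples
    (a hypothetical revision store, not a known Reddit schema).
    """
    restored = {}
    for comment_id, revisions in history.items():
        earlier = [rev for rev in revisions if rev[0] < cutoff]
        if earlier:
            # Pick the revision with the largest timestamp before the cutoff.
            restored[comment_id] = max(earlier, key=lambda rev: rev[0])[1]
    return restored


history = {
    "c1": [(100, "original comment"), (900, "gibberish")],
    "c2": [(950, "posted after the cutoff")],
}
restored = rollback_to(history, cutoff=500)
```

The cutoff would be whatever moment the burst of edits started, so the hard part is spotting the burst, not undoing it.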

Regardless, it was the least I could do on the way out the door.

I simply got permabanned and my account disappeared.

It replaces them with gibberish. I did the same for my 12+ years' worth.