How big would the backlash and consequences be if one of the instances, e.g. .world or .ml, turned out to be selling data?

Ask Lemmy@lemmy.world – 40 points – 4 months ago

Are there any laws against it? Will the admins walk scot-free? This question just popped into my head; it's not serious, but do feel free to answer.

Why pay money for it if you can simply set up your own instance and get it all anyway?

Perhaps because that way you won't get IPs, e-mails, passwords, user agents, and other such data. I don't see why there would be any reason to transfer anything past account ID + data.

That stuff would all be considered PII, so it would have major GDPR implications. I also don't know if any of that would even be valuable though, except to three-letter agencies.

Does GDPR apply to non-corporate systems though? Lemmy isn't a corporation, right? It's just a group of people creating code that can run and interface with other self-hosted servers.

Yes. To my understanding, GDPR doesn't care who you are; if you have users and you track their data, then you're covered under it.

Interesting. I feel that would be extremely hard to enforce in a federated system.

Well, no. Because users choose what they share on the fediverse by writing it and posting it.
Servers processing IPs, user agents, emails, etc. as part of security is not part of the agreement to share with the fediverse.

So, an instance that federates will be able to receive the publicly shared information for free (usernames, display names, profiles, posts & comments). They won't get any PII that a user does not explicitly share (by writing it in a comment).
But if an instance started selling the information of their own users, then that would be in violation of GDPR.
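
To make that split concrete, here's a rough sketch in Python. The class names, field names, and the `to_federated` helper are all made up for illustration; this is not Lemmy's actual schema or its ActivityPub payload. The only point is that the public half is what leaves the home instance, while the security-related data stays local.

```python
from dataclasses import dataclass, asdict

# Hypothetical field names for illustration; not Lemmy's real schema.
@dataclass
class PublicProfile:
    username: str
    display_name: str
    bio: str

@dataclass
class LocalAccount:
    profile: PublicProfile
    email: str       # stays on the home instance
    last_ip: str     # stays on the home instance
    user_agent: str  # stays on the home instance

def to_federated(account: LocalAccount) -> dict:
    """Return only the publicly shared fields that other instances would receive."""
    return asdict(account.profile)

account = LocalAccount(
    profile=PublicProfile("alice", "Alice", "I like cats"),
    email="alice@example.com",
    last_ip="203.0.113.7",
    user_agent="ExampleBrowser/1.0",
)
payload = to_federated(account)
print(sorted(payload))  # lists only the public field names
```

In this sketch, an admin selling data would be selling the `LocalAccount` half, which no federating instance ever sees.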

Yep, only the necessary data is federated. The other relevant data that's logged (which is much less than what other social media platforms collect, to be fair) could potentially be abused.

They couldn't sell the data - someone who wanted the data would just start their own benign-looking federated instance and get the data for free.

I think only the instance that the person reads from gets that person's click trail. The sending and receiving instances get the private messages between users A and B, but I don't know if other instances get those. I do think it's an anti-privacy design in Lemmy that the person's read actions are logged. I would change the architecture to avoid that, among other things. Alternatively, I've thought of running my own instance just to avoid leaking this info.

There are some other privacy concerns I have with lemmy design choices -- like, it's not going to be hard for a random user out there to get a given Lemmy user's IP address, which is kind of asking for trouble. Like, even aside from doxing potential, let's say that someone gets pissed in a discussion and decides to DDoS the other user's connection or something like that.

IRC had issues with that.

I get the impression that lemmy's designers wanted to build a meme propagation system rather than a discussion forum. Well they got what they wanted.

Ehhh. As much as I have annoyances with the devs on some issues, I think that it's more that it's just hard to design a distributed system like this and think through all the tradeoffs and security and privacy issues.

Like, there were some cross-site scripting issues in the past in lemmy. I didn't spend a lot of time looking into them, but there were some web dev types who were kinda scathing, said that this is something that an experienced Web dev should know about. But I don't think that the lemmy devs thought "oh, let's add cross-site scripting security holes". I think that it was probably just that they didn't have someone with a lot of Web security experience -- which is its own little unique field -- looking at what they were doing.

If you want to permit inline images -- which may or may not be a good idea, agree that they aren't essential -- then there are going to be tradeoffs. If you have a user's home instance fetch and serve all the images, which is what they do with comment text, then that avoids exposing a user's IP on comment view to random other people...but then it also increases bandwidth costs to run a lemmy instance. Maybe by a lot. And if instances are mutating comments to redirect images to versions that they host, then if you want to do pubkey/privkey signing of comments, which might be a good idea down the road, you're gonna introduce more complexity, because that'd invalidate a comment's signature. Lemmy would have to do something like expose both the original comment and the mutated comment and let a client validate the signature. Maybe have a signature on images to ensure that another instance isn't just replacing the images with something else. But then that maybe breaks if a remote site generates an image dynamically and its content changes every time it's served. Lot of tradeoffs and unintended side effects. And it's a distributed system with different people who may or may not trust various other people to do various things and may not all agree on what acceptable risks are.
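
Here's a toy Python sketch of the signature problem, under some loud assumptions: a plain SHA-256 digest stands in for a real public-key signature, and the proxy URL and the `rewrite_images` helper are hypothetical, not anything Lemmy actually does. It only shows that rewriting image links changes the comment text, so a signature made over the original no longer checks out unless the original is kept around too.

```python
import hashlib
import re
from urllib.parse import quote

# Stand-in for a real public-key signature: a plain SHA-256 digest.
def sign(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def verify(text: str, signature: str) -> bool:
    return sign(text) == signature

# Hypothetical proxy endpoint on the reader's home instance.
PROXY_PREFIX = "https://home.example/proxy?url="

def rewrite_images(markdown: str) -> str:
    """Point every markdown image at the home instance's proxy."""
    return re.sub(
        r"(!\[[^\]]*\]\()([^)]+)(\))",
        lambda m: m.group(1) + PROXY_PREFIX + quote(m.group(2), safe="") + m.group(3),
        markdown,
    )

original = "Look at this: ![cat](https://remote.example/cat.png)"
signature = sign(original)
mutated = rewrite_images(original)

print(mutated)
print(verify(mutated, signature))   # checks the rewritten comment against the old signature
print(verify(original, signature))  # checks the original, served alongside the rewrite
```

Serving both versions lets a client validate the original while displaying the proxied one, at the cost of extra payload and complexity.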

I use a VPN. Good luck to whoever decides to DDoS the CIA, FBI and/or NSA.

FBI/NSA/CIA man: don't pay attention to me, as you can see all I do is torrent shit movies and watch porn.

Selling data?

Bro, it's all out there for free now.

Tangentially related, but that's the whole reason I'm on Lemmy. If my instance admins do anything I'm not on board with, I have the option to change instances with minimal disturbance in my overall Lemmy experience.

What would they sell? Upvotes? IP addresses?

The content of our body text posts and comments for LLM bullshit.

They don't even need to federate for that, just bots.

Yeah but the OP is about if the instance owners themselves were selling it. They already have access to everything without needing to scrape it with a bot.

Why would they pay for that rather than just setting up an instance and federating?

IP is probably my guess as everything else can be freely collected by anyone.

Well, I would close my account. Dunno how many other people would do the same. But I'm getting pretty tired of being considered just a target for ads and products, and given the ethos of the federated community, it would be a huge violation of trust in my opinion, over and above the norm, because they have all sworn that they don't and won't do that.

Already quit one social media site I was on for over a decade. Wouldn't be any worse than that.

Sorry to tell you, but all your Lemmy information is already in all kinds of data broker databases. Data on Lemmy isn't private, like, at all. It's the whole philosophy of the federated ecosystem.

Suppose for a moment that you're distributing free samples of whatever. Is there any law of some country forbidding me from gathering those samples and selling them? Probably not. But it's still a shitty and immoral move, and people would be [IMO rightfully] disgusted by my actions.

I think that it's the same deal here. We're basically granting our content for free, to our instances and everyone our instances federate with. It would be really hard to sue the admins for selling it to a third party, but I don't expect their userbases or other instances to be exactly happy with it - they'd probably get defederated, and users would flock to other instances.

I don't think defederation would do anything, as it's a blacklist, not a whitelist.

Let's look at your proposed situation. Say I create Instance A. A year later it becomes clear that I'm providing data to someone else. Instances defederate with me.

Now I create Instance B and repeat.
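
A quick Python sketch of that difference, with made-up instance names and helper functions (this is not how Lemmy stores or checks its federation lists, just the logic of the two models):

```python
# Hypothetical instance names for illustration only.
blocklist = {"instance-a.example"}
allowlist = {"trusted-one.example", "trusted-two.example"}

def federates_with_blocklist(domain: str) -> bool:
    """Blocklist model: everyone is accepted unless explicitly blocked."""
    return domain not in blocklist

def federates_with_allowlist(domain: str) -> bool:
    """Allowlist model: nobody is accepted unless explicitly approved."""
    return domain in allowlist

for domain in ("instance-a.example", "instance-b.example"):
    print(domain, federates_with_blocklist(domain), federates_with_allowlist(domain))
```

Under the blocklist model the brand-new Instance B gets through by default; under the allowlist model it would have to be approved first.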

I'm focusing mostly on active instances (with users and comms), as OP exemplified with .ml or .world. For those, redoing the process to harvest more data isn't simply a matter of "create instance B and repeat" - it takes ages to grow an instance to such a large size.

What you're talking about is a different albeit related situation: someone could create an instance right off the bat to harvest data, stay low-key, and once detected flush out and repeat with another instance. I think that this concern hasn't been addressed yet simply because it isn't that big of an issue; but once it becomes an issue, perhaps the guarantee and endorsement systems could be reused to weed those out.

It's not like we won't see it coming. Once threads.net starts federating with the big instances, it's all over.

The information that can be gathered and sold is pretty intense, even when your account is on another instance:

All your posts and comments (this is public information, no cause for alarm)
Everything you upvote/downvote
Every time you load an image from another instance (link tracking, can probably be correlated with the IP address you signed into Facebook with)
With enough posts in your feed tracking you, a general picture of your interests and time of day you interact with lemmy will form, which can be sold

Time-of-day tracking by third parties already happened on Reddit; there were external sites dedicated to it. Not Reddit selling it, just that that information necessarily gets exposed and can be mined. So that's not really unique to the Threadiverse. Though being aware of it is probably a good idea.

What are they gonna do with that though? Not like they can show you ads. And again, they can do that without federating.

To my knowledge, the only universal law would be that people cannot lie about giving away personal info. If the site said upfront and outright that information was given away, I am almost positive they could get away with it, similar to how contracts normally work.

Are there any laws against it where? You need to specify a jurisdiction.

My main reaction would be that whoever is paying for that data is a fool. It's available for free.

Three-letter agencies are already ripping the whole of Lemmy. And I bet "AI" corps are too, so the admins don't need to sell a fuck.

You are asking for a friend, right? ;)

Of course

Does that friend happen to be the admin of a large Lemmy instance?

Putting my two cents here. First, Lemmy is not a unified thing; it's a concept, just like the internet. In that sense, you can't sue Lemmy any more than you can sue the internet. Theoretically, if you could, which entity would it be: the people who write the code, or the admins of each individual instance? You would need to take on each individual instance, each in a different court case, which can take months or years and a shitton of money. Since your data is shared between instances, and since any individual can host an instance, you may be looking at a court case per person, since each is a separate entity. There are also waiting lists, so you may have to wait months or even a year between cases for different instances.

And like, realistically, at most one person will try to sue an instance. What complicates this more is that we are talking about different countries with different laws and different relations. People talk about the GDPR like it's the Gospel, when in reality it's so much more complicated than that. For Europe, Big Tech is the pressing issue for now, so Lemmy isn't gonna be making headlines anytime soon. Moreover, this includes Mastodon, Kbin, etc., pretty much any fediverse software that can view Lemmy instances; the same applies to them.

Lastly, decentralization was never about privacy, at least in the beginning. In fact, it's probably worse for it, since you have to trust hundreds if not more parties with your data. Ironically, decentralization was made specifically for cases like this: the Lemmy spirit can live on without any specific instance, and the software is open so anyone can fork or use it, so there is no head to cut off.