You'll waste more time trying to figure out how to do this than it would take to move a monitor and keyboard to the server, do the install, and plug the monitor and keyboard back into your main computer. Once the server is up, you can administer it over the network via ssh.

I was wondering what that ominous music was when I woke up this morning

"The world seeing [their] work" is not equal to "Some random company selling access to their regurgitated content, used without permission after explicitly attempting to block it".

LLMs and image generators - that weren't trained on content that is wholly owned by the group creating the model - are theft.

Not saying LLMs and image generators are innately thievery. It's like the whole "illegal mp3" argument. mp3s are just files with compressed audio. If they contain copyrighted work, and were obtained illegitimately, THEN they're thievery. Same with content generators.

I'll grant that PHP is set up to allow some super shitty code, but in fairness to the language, WordPress is a dumpster fire (compounded by endless awful plugins). That's compounded by its ubiquity, so it's a massive target.

I just set up mbin as a single-user instance, and other than a bug I found (that they fixed live with me, in chat, including PRs), it's been awesome.

I hope your instance continues to work well for you 👍

There is /kbin which seems down all the time and its fork MBin which seems to have a good community but is written in PHP which I try to avoid.

Can you expand on the reasoning for avoiding PHP? I get avoiding Java; the JRE is a disaster, and a resource hog.

In fairness, a lot of the more exceptional engineers I've worked with couldn't write their way out of a wet paper bag.

On top of that, even great technical writers are often bad at picking - or sticking with - an appropriate target audience.

My first distribution was Slackware 7.1 when I was in high school. It took a week to download the .iso on dialup, and I had to use a download manager (GetRight) so that I could resume the partial download any time the connection dropped (usually because someone had to use the phone).

I'm old o_o

I still vividly remember not being able to figure out how to install new packages, or knowing how to compile from source.

Never mind the hazards of producing it; it's fucking annoying to look at while the sun is out.

I live in Arizona, so double fuck me.

I was hired at a small company a number of years ago. Contract-to-hire. One of those "we want to see you prove yourself before we actually hire you" deals. My role was to take over all of technical operations (cloud architecture, sysadmin, desktop support, the whole deal), so that the CTO didn't have to do it all himself.

One time - about a week in - I spent the entire day playing with kinetic sand in the main lobby (which was in full view of every developer and the CTO). Mostly, I was building little bricks (something like 0.5x1x2cm), and stacking them in a 2 sided 90 degree wall.

When asked what I was doing by several people throughout the day, I said "I'm rebuilding your network". I'm certain I looked like a crazy person. Honestly, it's not a totally invalid assessment in general, even now.

What I was actually doing was planning out the subnets, ACLs, and general routing for a series of servers (web front-ends, api servers, DB servers, etc), and weighing the pros and cons of AWS LBs vs HAProxy for various applications.
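For anyone curious what that kind of planning looks like once it leaves the sandbox, here's a rough sketch using Python's standard `ipaddress` module. The `10.0.0.0/16` range, the tier names, and the allowed flows are all made up for illustration; they aren't the actual network from this story.

```python
import ipaddress

# Hypothetical address space and tiers (assumptions, not the real network).
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
TIERS = ["web", "api", "db"]

# Hypothetical ACL: which tier may open connections to which other tier.
ALLOWED_FLOWS = {("web", "api"), ("api", "db")}


def plan_subnets(network, tiers, new_prefix=24):
    """Assign one /new_prefix subnet to each tier, in order."""
    subnets = network.subnets(new_prefix=new_prefix)
    return {tier: next(subnets) for tier in tiers}


def tier_of(plan, address):
    """Return the tier whose subnet contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for tier, subnet in plan.items():
        if ip in subnet:
            return tier
    return None


def is_allowed(plan, src, dst):
    """Check whether traffic from src to dst matches an allowed tier flow."""
    return (tier_of(plan, src), tier_of(plan, dst)) in ALLOWED_FLOWS


plan = plan_subnets(VPC_CIDR, TIERS)
for tier, subnet in plan.items():
    print(f"{tier}: {subnet}")
```

With this layout the web tier can reach the api tier, but not the db tier directly, which is the sort of rule you'd then translate into real ACLs or security groups.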

Over the next few days, I built out the new network and started migrating legacy servers into it. I demo'd the process and accompanying documentation (which I mostly kept in case I had to build another network, or rebuild this one after some catastrophic total loss), and they seemed impressed.

My 3 month contract was converted to direct-hire within 3 weeks, after a number of other enhancements (like centralized ssh auth via OpenLDAP - rather than everyone sharing the same default user RSA key - and total systems monitoring via Nagios). Each one came with about a day's worth of playing with some fidget or fixing some non-technical thing (like hanging a bunch of framed items in the lobby, which they'd been meaning to do, but wasn't a high priority, especially for the technical staff).

They'd have had all the reason in the world to assume the new guy was full of shit and was about to wash out, but after that they assumed that when I looked like I was majorly slacking off (usually well away from my desk, tinkering with something mindless) that I was about to build some new thing into the network, or up-end a process, or some other crazy (but ultimately useful) thing.

They definitely didn't mind when I would pace and talk to myself like a nut-bar (which I did/do frequently).

This "fair use" argument is excellent if used specifically in the context of "education, not commercialization". Best one I've seen yet, actually.

The only problem is that isn't marketing itself as educational, or as a commentary on the work, or as parody. They tout themselves as a search engine. They also have paid "pro" and "enterprise" plans. Do you think they're specifically contextualizing their training data based on which user is asking the question? I absolutely do not.

I agree that their replies are a little... over the top. That's all kind of a distraction from the main topic though, isn't it? Do we really need to be rendering armchair diagnoses about someone we know very little about?

I mean, if I posted a legitimate concern - with evidence - and I was dog-piled with a bunch of responses that I was a nutter, I'd probably go on the defensive too. Some people don't know how to handle criticism or stressful interactions; it doesn't mean we should necessarily write them (or their verified concerns) off.

Why bother? Paid, non-transferable cloud backups, low-spec hardware that wears out in a few months, over-hyped/half-finished games (assuming they're ever released), back catalogs that aren't available if you don't subscribe or repurchase every generation... Just skip em.

If you want AAA games, there's plenty you can play mobile or on PC (or both), or if you specifically want indie, there's plenty of them too on , individual websites, and steam (among many others; GoG, HumbleBundle, etc). You frequently don't even need to pay for these games, since a lot of them are free or via user-decided donations (mostly re: indies).

Hardware that can run them ranges everywhere from GPD handhelds to Steam Deck to any number of either's competitors, and they also function as more than just game machines since they run either Linux or Windows.

Nintendo who?

"Your honor, we can use whatever data we want because model training is probably fair use, or whatever".

I don't know what's worse, the fact that you think creators don't have the right to dictate how their works are used, or that you apparently have no idea what fair use is.

This might help;

My wife and I went to see the eclipse (it was our honeymoon, literally) a few months ago and I had an identical experience xD

"Holy shit, are these laser-beams of sun cutting across the back of my eyeballs all the time?"

Mind you, it's anything shiny, not just chrome, but why add to the problem?

I always use the browser versions (partly because I don't like installing things, and partly because I run Linux), so it pretty much always shows me as away. And I don't care.

Eh. This is not a new argument, and not the first evidence of it. I don't think you're gonna be high on their list of retaliation targets, if you register at all (to say nothing of the low-to-middling reach of the fediverse in general).

Hell, just look at photographers/painters v. image generators, or the novel/article/technical authors v. ... practically all LLMs really, or any other of a dozen major stories about "AI" absorbing content and spitting out huge chunks of essentially unmodified code/writing/images.

Hello fellow mbin user! I just got my personal instance set up 👍

rsync can resume partial transfers, but you really should break that file up. Trying to do it in one go is crazy.
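If it helps, here's a minimal sketch of the "break it up" part in Python. The function names, the `.partNNNNN` naming, and the 64 MiB default are my own choices, not anything rsync requires. The idea is that each piece is small enough that a dropped connection only costs you one chunk (and rsync's `--partial` flag keeps whatever did arrive), then you reassemble and compare checksums on the far end.

```python
import hashlib
from pathlib import Path


def split_file(source, chunk_dir, chunk_size=64 * 1024 * 1024):
    """Write source into numbered chunk files; return their paths in order."""
    source = Path(source)
    chunk_dir = Path(chunk_dir)
    chunk_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with source.open("rb") as src:
        index = 0
        while True:
            data = src.read(chunk_size)
            if not data:
                break
            # Zero-padded index keeps the chunks sortable by name.
            chunk_path = chunk_dir / f"{source.name}.part{index:05d}"
            chunk_path.write_bytes(data)
            chunks.append(chunk_path)
            index += 1
    return chunks


def join_files(chunks, destination):
    """Concatenate chunk files, in the given order, into destination."""
    with Path(destination).open("wb") as dst:
        for chunk in chunks:
            dst.write(Path(chunk).read_bytes())


def sha256_of(path):
    """Hash a file in 1 MiB blocks so large files don't need to fit in RAM."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for block in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(block)
    return digest.hexdigest()
```

Transfer the chunk directory, run `join_files(sorted(chunks), "rebuilt.bin")` on the other side, and compare `sha256_of` for the original and the rebuilt file.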

you got some criticism and now you’re saying everyone else is a bot or has an agenda

Please look up ad hominem, and stop doing it. Yes, their responses are a distraction from the topic at hand, but so were the random posts calling OP paranoid. I'd have been on the defensive too.

[Our company] publish[s] open source work ... anyone is free to use it for any purpose, AI training included

Great, I hope this makes the models better. But you made that decision. OP clearly didn't. In fact, they attempted to use several methods to explicitly block it, and the model trainers did it anyway.

I think that the anti-AI hysteria is stupid virtue signaling for luddites

Many loudly outspoken figures against the use of stolen data for the training of generative models work in the tech industry, myself included (I've been in the industry for over two decades). We're far from Luddites.

LLMs are here

I've heard this used as a justification for using them, and reasonable people can discuss the merits of the technology in various contexts. However, this is not a justification for defending the blatant theft of content to train the models.

whether or not they train on your random project isn’t going to affect them in any meaningful way

And yet, they did it while ignoring explicit instructions to the contrary.

there are more than enough fully open source works to train on

I agree, and model trainers should use that content, instead of whatever they happen to grab off every site they happen to scrape.

Better to have your work included so that the LLM can recommend it to people or answer questions about it

I agree if you give permission for model trainers to do so. That's not what happened here.

Agreed on all counts.

My reply initially had a "if you had a fleet of these things..." addendum, but OP's post read (to me) as though he was converting commodity hardware into a makeshift home server, so I removed it because it was almost certainly not relevant.

Oh I remember those disks :D I think I had to either pull them off the ISO, or download them separately so that I could boot the system to the point where A: the install could occur at all and B: it had enough drivers to use the CD-ROM drive XD

I'm not quite sure whose argument you're making here. It reads like you agree with OP and me (e.g. "LLMs shouldn't be using other people's content without permission", et al).

But you called OP paranoid... I assumed because you thought OP thought their content was being used without their permission. And it's extremely clear that this is what is happening...

What am I missing?

I already replied to the essence of this in my reply to your other post about how "illegal downloads aren't theft because it's a copy", but I'll mention here that this is even more evidence that you aren't a creator, and I suggest that your opinions on this subject aren't relevant, and you should avoid subjecting other people to them.

"evidence suggests that you probably aren't a creator" "As a result, I suggest that your opinions aren't relevant"

Aside from the fact that these are not character attacks, I encourage you to refute my assumptions. Otherwise, my points will stand on their own.

The number of hours I put into figuring out what X was, the difference between XFree86 and X.Org, fixing resolution and DPI issues, installing video card drivers (mostly nVidia)... I think all that tinkering prepared me for my career as a systems admin.

I think Slackware came with KDE, which is probably why I leaned toward it for so long. I've been using XFCE for many years, now.

The MPAA and music industry would beg to differ. As would the US courts, as well as any court in a country we share copyright agreements with.

Consider that if a movie uses a scene from another movie without permission, or a music producer uses a melody without permission, or either of them use too much of an existing song without permission, everyone sues everyone else, and they win.

Consider also that if a large corporation uses an individual's content without permission, we have documented cases of the individual suing, and winning (or settling).

Some other facts to consider;

  • An mp3 file is not inherently illegal. Nor is a torrent file/tracker/download.
  • If the mp3 file contains audio you don't own the rights to, it is illegal, same for the torrent you used to download/distribute it. In the eyes of the law, it's theft.
  • A trained LLM or image generation model is not inherently theft, if you only use open-source or licensed/owned content to train it
  • (at odds in our conversation) What of a model that was trained with content the trainer didn't own?

In the mp3 example, it's largely an individual stealing from a large company. On the Internet, this is frequently cheered as the user "sticking it to the man" (unless, of course, you're an indie creator who can't support yourself because everyone's downloading your content for free). Discussions regarding the morality of this have been had - and will be had - for a long time, but its legality is a settled matter: It's not legal.

In the case of "AI" models, it's large companies stealing from a huge number of individuals who have no support or established recourse.

You're suggesting that it's fine because, essentially, the creators haven't lost anything. This makes it extremely clear to me that you've never attempted to support yourself as a creator (and I suspect you haven't created anything of meaning in the public domain either).

I guess what it comes down to is this: If creators can be stolen from without consequence, what incentive does anyone have to create anything? Are you going to work your 40-60 hours a week, then come home and work another 20-40 hours to create something for no personal benefit other than the act of creation? Truly, some people will. Most won't.

Agreed on all points, except my personal interpretation of "fair use" specific to the case of generative models.

You call out "doesn't replace the original work". Is that not how you see an LLM Q/A bot replacing a user going to a git repo for established examples, or a website for an article (generating page views, subscriptions, ad revenue), or similar? Why would anyone go to the source materials if they're getting their answer from the bot?

This is practically the same as when Google started showing articles in AMP, and not bringing people to the original website, is it not?

The first sentence directly addresses your comment "it's not theft" with "the law says it is".

The rest of the post attempts to explain why it is so and some of the moral or ethical discussions surrounding some examples.

First, a chat bot is not an API. Second, they were talking about the formatting and delivery method of the data, not the content.

Regarding the output of the model: Some repos are entirely READMEs by their nature. No code, just documentation and walkthroughs. Notwithstanding that, if I set a flag that says "don't use my data" and they use it anyway, that's theft, even if it's only one file, even if the file is just a description of the code. That's my work, not yours. You don't get to use it however you want, unless I specifically note that it's public domain (or you use it and follow the license, like attributing me, or linking to the repo, etc).
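To make "a flag that says don't use my data" concrete: one common form of that flag is a robots.txt rule, and honoring it takes a well-behaved crawler a handful of lines. This is only a sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `ExampleAIBot` name, and the example.com URL are all hypothetical, and I'm assuming robots.txt is the kind of flag in question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/README.md"
    print(may_fetch(ROBOTS_TXT, "ExampleAIBot", url))
    print(may_fetch(ROBOTS_TXT, "SomeBrowser", url))
```

The check is trivial to run before every request, so skipping it is a choice, not a technical limitation.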

As to the difference between a bot and a human (re: stack overflow)? The former is a representative of a company (automation or not, whether it's a bot or a page on their corporate site), the latter is a person relating experience and opinion. The legal difference is that one is using the data commercially, and the other is just a person in the world, answering another person's question for no reason other than a desire to be helpful (and if they're decent, attributing the source instead of claiming that they're generating wisdom on their own).

That last parenthetical used to be called plagiarism, by the way.

You left out that they refuse to let end users control updates on the system unless they resort to hacky bullshit (and even that doesn't work consistently). As far as I know (and have experienced on Windows Server) this extends to enterprise as well.

I'm currently avoiding Apple silicon until more apps are compiled to work on it. My last bad experience with this was trying to run VirtualBox on the host and Ubuntu as a guest, and it ran slow as crap because some part of VirtualBox wasn't ready for Apple silicon yet.

Disclaimer: I generally avoid Apple like the plague; my comment and experience are specific to a job that really wanted me to use a MacBook in my role as a Linux systems admin. My specific complaint may well have been addressed literally years ago by now.

It's not paranoia if you have proof that they're stealing your content without permission or compensation.

You come off as an AI bro apologist. What they're doing isn't okay.

I give it a few days. It might have already happened, I haven't been checking the news today.

"Best practice" isn't a catch-all rebuttal. Best practices are contextual. I'm keen to see your justification for encryption beyond "all sites should encrypt everything always".

My assertion is that this isn't necessary in this case. Why do you think that it is necessary to encrypt open-source, freely available, non-controversial site content?

I might have missed it, but it doesn't look like their site accepts payment data, or has a login of any kind.

Why would the lack of SSL concern you?

There's no need to encrypt this data. Any entity that is watching you knows how to see the domains you visit, and everything on this site is on the main page, or a click away from it.

An SSL certificate here is nothing more than security theater, or marketing.

