Open Source Initiative tries to define Open Source AI

ylai@lemmy.ml to Open Source@lemmy.ml – 143 points –
theregister.com


So the cover art I made for a friend's album isn't open source, even though I released it as CC BY-SA... because you can't make it yourself?

I would consider the "source code" for artwork to be the project file, with all of the layers intact and whatnot. The Photoshop PSD, the GIMP XCF or the Krita KRA. The "compiled" version would be the exported PNG/JPG.

You can license a compiled binary under CC BY if you want. That would allow users to freely decompile/disassemble it or to bundle the binary for their purposes, but it's different from releasing source code. It's closed source, but under a free license.

What counts as source, and what doesn't, would depend on the format.

You can create a picture by hand, using no input data.

I challenge you to do the same for model weights. If you truly just sit down and type away numbers in a file, then yes, the model would have no further source. But that is not something that can be done in practice.

I challenge you to recreate the Mona Lisa.

My point is that these models are so complex that they're closer to art than to anything reproducible.

I don't see your point. What is the "source" for the Mona Lisa that I would use? For LLMs I could reproduce them given the original inputs.

Creating those inputs may be an art, but so could any piece of code. No one claims that code being elegant disqualifies it from being open source.

Are you sure that you can reproduce the model, given the same inputs? Reproducibility is a difficult property to achieve. I wouldn't think LLMs are reproducible.

In theory, if you have the inputs, you have reproducible outputs, modulo perhaps some small deviations due to non-deterministic parallelism. But if those effects are large enough to make your model perform differently you already have big issues, no different than if a piece of software performs differently each time it is compiled.
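To make the "small deviations" concrete, here is a minimal sketch of why the order of a parallel reduction can matter: floating-point addition is not associative, so adding the same numbers in a different grouping can change the last bits of the result. The helper names `left_to_right` and `pairwise` are made up for illustration and are not taken from any training framework.

```python
from functools import reduce
import operator


def left_to_right(values):
    """Add values strictly in order, like a single-threaded reduction."""
    return reduce(operator.add, values)


def pairwise(values):
    """Add values as a tree, like a reduction split across workers."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise(values[:mid]) + pairwise(values[mid:])


values = [0.1, 0.2, 0.3]
a = left_to_right(values)  # computed as (0.1 + 0.2) + 0.3
b = pairwise(values)       # computed as 0.1 + (0.2 + 0.3)

print(a == b)      # False: same inputs, different grouping
print(abs(a - b))  # the gap is tiny, on the order of 1e-16
```

Whether rounding differences like this ever add up to a model that behaves noticeably differently is a separate question that this toy example does not answer.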

That's the theory for some paradigms that were specifically designed to have the property of determinism.

Most things in the world, even computers, are non-deterministic.

Nondeterminism isn't necessarily a bad thing for systems like AI.

I think technically, the source should be the native format of whatever image manipulation program you use. For vector graphics there is the SVG format, but the native editor format is still preferable. Otherwise, whoever gets the end copy cannot easily modify or reproduce it, only copy it. But of course it depends on the definition of "easy" and a lot of other factors. Licensing is hard, and I am not a lawyer.