Open source pioneer Bruce Perens gets one thing right and most things wrong in a recent interview on the future of open source. He’s absolutely correct that “our [open source] licenses aren’t working anymore,” even if he’s wrong as to why. (He says “businesses have found all of the loopholes.”)
No, the problem is that open source has never been more important, yet it has never been less relevant to the biggest technology trends of our time: cloud computing and artificial intelligence. In 2024, we need open source to catch up with these technologies.
Clouds gathering over open source
It’s fashionable in some quarters to blame companies like MongoDB (disclosure: I work for MongoDB), Neo4j, Elastic, HashiCorp, etc., for allegedly polluting open source with licenses like the Business Source License, Commons Clause, and Server Side Public License (SSPL). But the problem isn’t so much these companies as the fact that they tried to distribute cloud services under open source licenses that simply don’t work for the cloud.
Don’t believe me? Ask Stefano Maffulli, executive director of the Open Source Initiative (OSI), which shepherds the Open Source Definition (OSD). In an interview, Maffulli told me, “Open source kind of missed the evolution of the way software is distributed and executed.” All open source licenses were conceived in a pre-cloud era and assume an outdated method for distributing software. With the Affero General Public License (AGPL), the OSI embraced a hack that wasn’t cloud native. As such, Maffulli continues, “We didn’t really pay attention to what was going on and that led to a lot of tension in the cloud business.”
Some of that tension played out while I was working at AWS. My current employer, MongoDB, tried to get the SSPL approved as an official open source license by the OSI. Eventually, the company withdrew from the process, which was unfortunate. If you like the GPL, you should like the SSPL, as it’s basically a cloudified GPL. Unlike the Business Source License and more recent licenses, the SSPL doesn’t discriminate against certain kinds of use of the software (i.e., there is no restriction on running the software in production for commercial or competitive purposes). It simply says that if you distribute the software as a service, you need to make available all other software used to run it, because what good is freedom to inspect, modify, and run software if the essential software infrastructure to power it is completely closed? (The differences between the AGPL and SSPL have been clearly delineated elsewhere.)
In 2024, the OSI needs to get serious about updating its open source definition to be relevant for the cloud. It doesn’t need to be the SSPL, but it does need to reflect the fact that most software isn’t distributed in the same way the OSD’s “open source” contemplates. We’re still using horse-and-buggy definitions of open source to try to capture the electric cars and rocket ships of our modern reality.
Making open source meaningless in the AI era
As much as cloud has outpaced open source, AI has rendered it utterly meaningless. I’ve discussed this at length elsewhere, but it comes down to a fundamental question: What is the “code” that open source would hope to preserve?
In a conversation with Aryn CEO Mehul Shah, we hashed through this problem of “code.” Quoting that article at length:
The first is to think of curated training data like the source code of software programs. If we start there, then training (gradient descent) is like compilation of source code, and the deep neural network architecture of transformer models or [large language models] is like the virtual hardware or physical hardware that the compiled program runs on. In this reading, the weights are the compiled program.
This seems reasonable but immediately raises key questions. First, that curated data is often owned by someone else. Second, although the licenses are on the weights today, this may not work well because those weights are just floating-point numbers. Is this any different from saying you’re licensing code, which is just a bunch of 1s and 0s? Should the license be on the architecture? Probably not, as the same architecture with different weights can give you a completely different AI. Should the license then be on the weights and architecture? Perhaps, but it’s possible to modify the behavior of the program without access to the source code through fine-tuning and instruction tuning. Then there’s the reality that developers often distribute deltas or differences from the original weights. Are the deltas subject to the same license as the original model? Can they have completely different licenses?
We can’t, in short, simply say a large language model is open source, because we can’t even yet decide what, exactly, should be open. This is similar to the problem the SSPL was trying to resolve, but it’s even more complicated. “There is no settled definition of what open source AI is,” argues Mike Linksvayer, head of developer policy at GitHub. We’re nowhere near resolving that quandary.
Fortunately, this time around, the OSI isn’t asleep at the OSD wheel and is actively working through what the OSD should be for AI. However, Maffulli stresses, “It’s an extremely complex scenario.” My New Year’s wish for our industry is that the OSI takes responsibility for upgrading the OSD for both cloud and AI. We’ve spent the last few years castigating companies for not abiding by open source principles that the OSI failed to make relevant for the biggest trends in software. This year, that needs to stop.
Copyright © 2024 IDG Communications, Inc.