by benlivengood on 6/26/2025, 12:21:02 AM
Open source and libre/free software are particularly vulnerable to a future where AI-generated code is ruled to be either infringing or public domain.
In the former case, disentangling AI edits from human edits could tie a project up in legal proceedings for years, and projects don't have the funding to fight a copyright suit. Specifically, code that is AI-generated and subsequently modified or incorporated into the rest of the code would raise the question of whether the subsequent human edits were non-fair-use derivative works.
In the latter case, the license restrictions no longer apply to portions of the codebase, raising similar issues for code derived from those portions; a project that is only 98% OSS/FS licensed suddenly has much less leverage in takedowns against companies abusing the license terms, since it would have to prove that infringers are definitely using the human-authored, licensed code.
Proprietary software is only mildly harmed in either case; it would require would-be copyright claimants to disassemble the proprietary binaries and try to make the case that AI-generated code infringed, without being able to see the codebase itself. And plenty of proprietary software has public domain code in it already.
by JonChesterfield on 6/26/2025, 12:02:22 AM
Interesting. Harder line than the LLVM one found at https://llvm.org/docs/DeveloperPolicy.html#ai-generated-cont...
I'm very much the old man shouting at clouds about this stuff. I don't want to review code the author doesn't understand and I don't want to merge code neither of us understands.
by acedTrex on 6/26/2025, 12:49:38 AM
Oh hey, the thing I predicted in my blog titled "yes i will judge you for using AI" happened lol
Basically, I think open source has traditionally HEAVILY relied on hidden competency markers to judge the quality of incoming contributions. LLMs turn that entire concept on its head by presenting code that has the competence markers but none of the backing experience. It is a very, very jarring experience for experienced individuals.
I suspect that virtual or in person meetings and other forms of social proof independent of the actual PR will become far more crucial for making inroads in large projects in the future.
by ants_everywhere on 6/26/2025, 12:11:25 AM
This is signed off primarily by RedHat, and they tend to be pretty serious/corporate.
I suspect their concern is not so much whether users own the copyright to AI output, but rather the risk that AI will spit out code from its training set that belongs to another project.
Most hypervisors are closed source and some are developed by litigious companies.
by Havoc on 6/25/2025, 11:41:51 PM
I wonder whether the motivation is really legal? I get the sense that some projects are just sick of reviewing crap AI submissions
by hughw on 6/26/2025, 12:25:42 AM
I'd hope there could be some distinction between using an LLM as a super autocomplete in your IDE, vs giving it high-level guidelines and having it generate substantive code. It's a gray area, sure, but if I made a contribution I'd want to be able to use the labor-saving features of Copilot, say, without danger of it copying an algorithm from open source code. For example, today I generated a series of case statements and Copilot detected the pattern and saved me tons of typing.
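To illustrate the kind of pattern-completion I mean (a made-up example, not from any real codebase; the enum and function names are invented): after the first case or two is typed, the autocomplete fills in the rest by pattern rather than inventing an algorithm.

    /* Hypothetical C sketch: the repetitive shape an LLM autocomplete
     * tends to complete after seeing the first case. */
    enum reg { REG_STATUS, REG_CONTROL, REG_DATA, REG_IRQ };

    static const char *reg_name(enum reg r)
    {
        switch (r) {
        case REG_STATUS:  return "status";
        case REG_CONTROL: return "control";
        case REG_DATA:    return "data";
        case REG_IRQ:     return "irq";
        default:          return "unknown";
        }
    }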
by Aeolun on 6/26/2025, 1:28:17 AM
This seems absolutely impossible to enforce. All my editors give me AI-assisted code hints. Zed, Cursor, VS Code. All of them now show me autocomplete that comes from an LLM. There's absolutely no distinction between that code and code that I've typed out myself.
It's like complaining that I may have no legal right to submit my stick figure because I potentially copied it from the drawing of another stick figure.
I'm firmly convinced that these policies are only written to have plausible deniability when stuff with generated code gets inevitably submitted anyway. There's no way the people that write these things aren't aware they're completely unenforceable.
by bgwalter on 6/26/2025, 8:55:52 AM
It is interesting to read the pro-AI rant in the comments on the linked commit. The person who is threatening to use "AI" anyway has almost no contributions either in qemu or on GitHub in general.
This is the target group for code generators. All talk but no projects.
by daeken on 6/25/2025, 11:49:04 PM
I've been trying out Claude Code (the tool I've found most effective in terms of agentic code gen/manipulation) on an emulator project of mine for the last few days. Part of it is a compiler from an architecture definition to a disassembler/interpreter/recompiler. I hit a fairly minor compiler bug and decided to ask Claude to debug and fix it. Some things I noted:
1. My C# code compiled just fine and even ran, but Claude was convinced that I was missing a closing brace on a lambda near where the exception was occurring. The diff was ... putting the existing brace on a new line. It confidently stated that was the problem and declared it fixed.
2. It did figure out that an unexpected type was being seen, and implemented a pathway that allowed for it to get to the next error, but didn't look into why that type had gotten there; that was the actual bug, not the unhandled type. So it "fixed" it, but just kicked the can down the road.
3. When figuring out the issue, it just looked at the stack trace. That was it. It was running the compiler itself; it could've embedded some debug code (like I did) and worked out what the actual issue was, but it didn't even try. The exception was just a NotSupportedException with no extra details to work off of, so adding even a crumb of context would have let you solve the issue.
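(Purely as a sketch, in C with invented names, the "crumb of context" is just reporting which value was unexpected before bailing out, e.g.:)

    /* Hypothetical sketch: fail with the offending node kind in the
     * message instead of a bare "not supported" error. */
    #include <stdio.h>
    #include <stdlib.h>

    static void compile_node(int node_kind)
    {
        if (node_kind != 0 /* the only kind handled here */) {
            fprintf(stderr, "compile_node: unsupported node kind %d\n",
                    node_kind);
            abort();
        }
        /* ... actual compilation ... */
    }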
Now, is this the simplest emulator you could throw AI at? No, not at all. But neither is qemu. I'm thoroughly unconvinced that current tools could provide real value on codebases like these. I'm bullish on them for the future, and I use GenAI constantly, but this ain't a viable use case today.
by wyldfire on 6/25/2025, 11:52:38 PM
I understand where this comes from but I think it's a mistake. I agree it would be nice if there were "well settled law" regarding AI and copyright, but there are relatively few rulings and next to zero legislation on which to base a position.
In addition to a policy to reject contributions from AI, I think it may make sense to point out places where AI generated content can be used. For example - how much of QEMU project's (copious) CI setup is really stuff that is critical content to protect? What about ever-more interesting test cases or environments that could be enabled? Something like "contribute those things here instead, and make judicious use of AI there, with these kinds of guard rails..."
by saurik on 6/27/2025, 12:40:00 AM
As someone who once worked on a product that had to carefully walk the line of legality, I haven't found any mention in this discussion of what I imagine is a key problem for qemu, that doesn't face other projects: as an emulator, they are already under a lot of scrutiny for legality, and so they are going to need to be a lot more conservative than other random projects with respect to increasing their legal risk.
by flerchin on 6/26/2025, 1:42:25 PM
I suppose the practical effect will be that contributors who use AI will have to defend their code as if they did not. To me, this implies more ownership of the code and deep understanding of it. This exchange happens fairly often in PRs I'm involved with:
"Why did you do this insane thing?"
"IDK, claude suggested it and it works."
by zoobab on 6/26/2025, 10:06:52 AM
Big Tech now controls QEMU?
"Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>"
by tqwhite on 6/26/2025, 3:30:38 PM
I don't blame them for worrying about it. The policy should not be to forbid it but to make sure you don't leave artifacts, because I guarantee people are going to use a bot to write their code. Hell, in six months I doubt you will be able to get a code editor that doesn't use AI for code completion at least.
Also, AI coded programs will be copyrightable just like the old days. You think the big corps are going to both not use bot coding and give up ownership of their code? Fat chance.
Remember the Mickey Mouse copyright extension? If the courts aren't sensible, we will have one of those the next day.
The old days ended very abruptly this time.
by abhisek on 6/26/2025, 3:21:20 AM
> It's best to start strict and safe, then relax.
Makes total sense.
I am just wondering how we differentiate between AI-generated code and human-written code that is influenced by or copied from some unknown source. The same licensing problem may happen with human code as well, especially for OSS, where anyone can contribute.
Given the current usage, I am not sure AI-generated code has an identity of its own. It's really a tool in the hands of a human.
by randomNumber7 on 6/26/2025, 10:34:38 PM
I know a secret. You can read the code the AI generated for you and check whether it does what you want. It is still faster than writing it yourself most of the time.
by caleblloyd on 6/26/2025, 5:58:22 AM
Signed by mostly people at RedHat, which is owned by IBM, which makes Watson, which beat humans in Jeopardy in 2011.
> These are early days of AI-assisted software development.
Are they? Or is this just IBM slowly destroying another acquisition?
Meanwhile the Dotnet Runtime is fully embracing AI. People on the outside may laugh at that, but you have extremely talented engineers like Stephen Toub and David Fowler advocating for it.
So enterprises: next time you have an IBM rep trying to sell you AI services, do yourself a favor and go to any other number of companies out there who are actually serious about helping you build for the future.
And since I am a North Carolina native, here’s to hoping IBM and RedHat get their stuff together.
by ludicrousdispla on 6/26/2025, 9:00:43 AM
>> The tools will mature, and we can expect some to become safely usable in free software projects.
It should be possible to build a useful AI code generator for a given programming language solely from the source code for the language itself. Doing so, however, would require some maturity.
by b0a04gl on 6/26/2025, 4:02:21 AM
there's no audit trail for how most code gets shaped anyway: a teammate's intuition from a past outage, a one-liner from some old jira ticket, even the shape of a func pulled from habit. none of that is reviewable but it still gets trusted lol
ai moves faster than group consensus. this ban won't slow down the tech, it'll just make projects like qemu harder to enter, harder to scale, harder to test properly
so if we maintain code like this we gotta know the trade we're making: we're preserving trust but limiting throughput. maybe that's fine, idk, but don't confuse it with future proofing
i kinda feel it exposes that trust in oss is social, not epistemic. we accept complex things if we know who dropped them, and we reject clean things if they smell synthetic
so the real question isn't "did we use ai?" it's "can we even maintain this in 6mo?" and if the answer's yes it doesn't really matter who produced the code fr
by UrineSqueegee on 6/26/2025, 2:20:34 PM
If AI using books to train isn't copyright infringement, then the outputted code isn't copyrighted material either.
by naveed125 on 6/26/2025, 2:13:36 AM
Coolest thing I've seen today.
by randomNumber7 on 6/26/2025, 10:26:15 PM
I mean for low level C code the current LLMs are not that helpful anyway.
On the other hand I am 100% sure that every company that doesn't use LLMs will be out of business in 10 years.
by BurningFrog on 6/26/2025, 12:51:28 AM
Would it make sense to include the complete prompt that generated the code with the code?
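One purely hypothetical way to do that would be extra commit trailers alongside the existing Signed-off-by lines (this is not an existing QEMU or git convention, just an illustration with invented names):

    hw/example: add frobnicator register model

    ...

    Signed-off-by: Jane Developer <jane@example.org>
    AI-Tool: ExampleCodeGen 1.2
    AI-Prompt: "Add a register model for the frobnicator device,
      following the style of the existing widget model."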
by incomingpain on 6/26/2025, 11:34:33 AM
Using AI code generators, I have been able to get the code base large enough that the AI was starting to make nonsense changes.
However, my overall experience has me thinking about how this is going to be a massive boon to open source. So many patches and so many new tools will be created to streamline getting new packages into repos. Everything can be tested.
Open source is going to be epically boosted now.
QEMU deciding to sit out this acceleration is crazy to me, but it's probably what is going to give Xen/Docker/Podman the lead.
by N1H1L on 6/26/2025, 2:32:24 AM
I use LLMs for generating documentation: I write my code, and ask Claude to write my documentation.
by mattl on 6/26/2025, 12:38:02 AM
I'm interested to see how this plays out. I'd like a similar policy for my projects, but also a similar policy/T&C that prohibits the crawling of the content too.
by curious_cat_163 on 6/26/2025, 12:01:32 AM
That’s very conservative.
by jssjsnj on 6/26/2025, 2:38:59 AM
Oi
by jekwoooooe on 6/25/2025, 11:50:23 PM
When will people give up this archaic practice of sending patches over emails?
by Art9681 on 6/26/2025, 12:21:01 AM
This is a "BlockBuster laughs Netflix out of the room" moment. I am a huge fan of QEMU and used it throughout my career. The maintainers have every right to govern their project as they see fit. But this is a lot of mental gymnastics to justify clinging to punchcards in a world where we now have magnetic tape and keyboards to do things faster. This tech didn't spawn weeks ago. Every major project has had at least two years to prepare for this moment.
Pull your pants up.
by teruakohatu on 6/25/2025, 11:36:46 PM
So essentially it's "let us cover ourselves by saying it's not allowed", and in practice that means rejecting code that a human reviewer thinks is AI generated.
Universities have this issue too, despite many offering students and staff Grammarly (Gen AI) while also trying to ban Gen AI.
by pretoriusdre on 6/26/2025, 2:15:52 AM
AI generated code is generally pretty good and incredibly fast.
Seeing this new phenomenon must be difficult for those people who have spent a long time perfecting their craft. Essentially, they might feel that their skillsets are being undermined. It would be especially hard for people who associate a lot of their self-identity with their job.
Being a purist is noble, but I think that this stance is foolish. Essentially, people who chose not to use AI code tools will be overtaken by the people who do. That's the unfortunate reality.
by sysmax on 6/26/2025, 1:58:24 AM
I wish people would make a distinction regarding the size/scope of the AI-generated parts. Like with video copyright law, where a 5-second clip from a copyrighted movie is usually considered fair use and not frowned upon.
Because for projects like QEMU, current AI models can actually do mind-boggling stuff. You can give it a PDF describing an instruction set, and it will generate you wrapper classes for emulating particular instructions. Then you can give it one class like this and a few paragraphs from the datasheet, and it will spit out unit tests checking that your class works as the CPU vendor describes.
Like, you can get from 0% to 100% test coverage several orders of magnitude faster than doing it by hand. Or refactoring, where you want to add support for a particular memory virtualization trick and you need to update 100 instruction classes based on a straightforward but not 100% formal rule. A human developer would be pulling their hair out, while an LLM will do it faster than you can get a coffee.
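As a rough idea of what those generated pieces look like (a hypothetical C sketch; the CPU state, instruction semantics, and test names are all invented, not taken from QEMU or any real datasheet): one small handler per instruction, plus a matching unit test stamped out from the datasheet wording.

    #include <assert.h>
    #include <stdint.h>

    /* Invented CPU state and an "add rd, rs" style handler, the kind of
     * per-instruction boilerplate an LLM can generate from a datasheet. */
    typedef struct {
        uint32_t regs[16];
        uint32_t carry;
    } cpu_state;

    static void exec_add(cpu_state *cpu, int rd, int rs)
    {
        uint64_t wide = (uint64_t)cpu->regs[rd] + cpu->regs[rs];
        cpu->regs[rd] = (uint32_t)wide;
        cpu->carry = (uint32_t)(wide >> 32);  /* carry out, per the (invented) spec */
    }

    /* ...and the matching generated unit test. */
    static void test_add_sets_carry(void)
    {
        cpu_state cpu = { .regs = { [1] = 0xFFFFFFFFu, [2] = 1 } };
        exec_add(&cpu, 1, 2);
        assert(cpu.regs[1] == 0);
        assert(cpu.carry == 1);
    }

    int main(void)
    {
        test_add_sets_carry();
        return 0;
    }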
by maerF0x0 on 6/27/2025, 2:45:29 PM
I was literally musing on a parallel subject yesterday morning: how engineers kinda did it to themselves with Open Source. LLM code generators would probably not be possible without the large corpus of totally available (readable) lines of mostly valid code hosted in places like github.com.
It strikes me that Open Source was inevitable, as companies and engineers found it economical (a good choice) to amortize the cost of building something across many of them, with less of the legality/negotiation/constraints they would face doing it collectively in a closed source way. It was kind of a good-will community thing. For FOSS, in the case of companies, by not competing on things like a JavaScript framework and instead on the product features themselves. Or, in the case of engineers, by allowing themselves access to that same code across many employers.
Now that gambit is approaching the point where most projects that could be assembled entirely from FOSS (and lots can) are becoming easier and easier to generate. It'd take a team of 20(?) expensive nerds weeks or months to build a website in 1995, but now a normal individual can simply ask an LLM to make one.
Across my ~30 years in software (eep), it seems that the table stakes for minimum viable software keep growing and growing. It used to be a push to have an API, now it's minimum viable; it used to be a push to get "live" notifications (via polling!), now it's minimum viable to push via websockets. Etc etc for the surviving set of features.