by yorwba on 11/1/2023, 10:56:46 AM
If you just want to mess with naïve opportunistic scrapers Nightshade-style, you can
1. apply a substitution cipher to your text
2. apply the same substitution cipher to a font file
3. display your text with the modified font
This will annoy scrapers, but also blind people, people who disable web fonts, people who use automatic translation, copy-pasters, etc.
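A minimal sketch of the cipher side of the steps above, assuming only ASCII lowercase letters are remapped. The helper names `make_cipher`, `encode`, and `font_remap` are hypothetical, and actually rewriting the font file's character map (step 2) would need a font-editing tool that is not shown here; the sketch only computes which glyph each served character should draw.

```python
import random
import string


def make_cipher(seed: int) -> dict[str, str]:
    """Return a random substitution mapping over ASCII lowercase letters."""
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)  # seeded, so the mapping is reproducible
    return dict(zip(letters, shuffled))


def encode(text: str, mapping: dict[str, str]) -> str:
    """Substitute every mapped character; leave everything else untouched."""
    return "".join(mapping.get(ch, ch) for ch in text)


def font_remap(cipher: dict[str, str]) -> dict[str, str]:
    """For each character in the served text, name the glyph the modified
    font should draw for it (the original letter). This is the inverse
    of the cipher."""
    return {enc: plain for plain, enc in cipher.items()}


cipher = make_cipher(seed=2023)          # assumption: any fixed seed works
plain = "don't train on my art"
served = encode(plain, cipher)           # step 1: what goes into the HTML
rendered = encode(served, font_remap(cipher))  # step 3: what a reader would see
```

A scraper that reads the HTML gets `served`, while a browser using the remapped font shows text equal to `plain`. Anyone who has the font can invert the mapping the same way, which is why this only stops naïve scrapers.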
If you really only want to make AI training more difficult, without affecting anyone else, then you're asking for the impossible, and Nightshade doesn't deliver on that promise either.
by selfhoster11 on 11/1/2023, 10:05:11 AM
Content restriction isn't, and should never be, the business of web standards.
This proposal would join the likes of Encrypted Media Extensions (which was a mistake to adopt) and Web Environment Integrity (which will be a mistake if adopted).
I've been reading about the Nightshade project (this article has the right threads to follow if you want to dig into it or see a brief summary of the test results: https://venturebeat.com/ai/meet-nightshade-the-new-tool-allowing-artists-to-poison-ai-models-with-corrupted-training-data/)
Why is HTML not being updated to better protect the IP of individuals? We need something more granular than "Don't scrape me". It's fine for you to scrape my art content, or even to download my entire SoundCloud without paying a dime and go DJ with it, but don't train your models on my art. That's my edge.
As a day trader, I understand this well. With a lot of grit, I'm in that fraction of human traders who can beat the market, with passed prop challenges and real data over time to back it up. If AI trains on all of my concepts, there is no way I can thrive, and only a very few people will become ultra valuable. That's not the world I want to live in.