Hi there, if you don’t want me to hit you, please carry this sign that says “please don’t hit me” with you always. Otherwise, I can’t possibly be held responsible if I hit you. Because it’s in my nature to hit you. I can’t live without hitting people. It’s just who I am and what I do. Thank you for your understanding in this delicate matter.
Sincerely,
OpenAI
@aral Arguably, opt-out search indexing helped the web become what it is today, but maybe it’s time to make these things opt-in, and granularly so. That is, I opt in to allowing indexing for the purpose of search but not for the purpose of LLMs. I’m not sure how that would work in practice. Technically speaking, it would be easy, but it all depends on everyone playing by the rules.
@ramsey And what a wonderful place the web is today.
@aral Here’s an idea I just had for a system around this. https://phpc.social/@ramsey/110939386127543841
@ramsey @aral there should be a web standard for content licensing. As in, a standardised way of signalling permissible freedoms around content.
What is interesting is how data and information that's unlicensed is considered ok to be used freely by companies like "Open"AI but stuff like art, software, etc. are copyrighted with all the rights reserved to their authors by default.
https://en.m.wikipedia.org/wiki/Berne_Convention
Why doesn't this apply to webpages?
@aral So essentially no one (especially not OpenAI) wants to write a responsible webcrawler or a webcrawler protocol more tractable than a robots file? This is how we get global indexers which, if not enforced into the public domain, inevitably result in a stolen data market (the basis of the modern web).
@aral At least they're nicer than #advertisers: the latter don't allow opt-out at all, if they can get away with it.
@aral Did a traffic engineer write this?
@aral
This excerpt from James Bridle #jamesbridle in Ways of Being seems apposite here.
@aral the fuck is this?
@Ray_Of_Sunlight “Add us to the robots.txt on your site if you don’t want us to train our AI on it.”
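For anyone wondering what that opt-out looks like in practice, here is a rough sketch, not an official recipe. It assumes the crawler identifies itself as `GPTBot` (the user agent OpenAI has published for its crawler) and uses Python's standard `urllib.robotparser` to show how a rule-abiding crawler would read the file. The `example.org` URL, the `SomeSearchBot` name, and the `is_allowed` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the GPTBot user agent, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a well-behaved crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "SomeSearchBot"):
        print(agent, is_allowed(agent, "https://example.org/post"))
```

Note that nothing here enforces anything: the file only has an effect if the crawler chooses to check it, which is the whole complaint.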
@aral I'm genuinely curious what you perceive the solution to this to be. They modelled it after an already-in-place model that has seemingly worked well. I'd postulate that if companies took web presence and other technical matters more seriously by hiring and paying people better, this wouldn't be a problem or even a conversation.
@unhook2048 (a) the web is basically synonymous with surveillance capitalism today so “model that has worked seemingly well” is rather subjective.
(b) The solution is glaringly obvious: make it opt in.
(Oh, will no one opt in if you make it opt in? Then maybe the thing you’re making shouldn’t exist in the first place.)
@aral well okay then. I mean, you're not wrong regarding the opt-in. I suppose I just wish we lived in a world where the capitalistic approach to things could be ignored. That would allow us to act and move in a direction we agreed would ultimately be better for humanity, rather than relying on political leaders and C-level execs and hoping they'd be ... well, not how they are now.
I also think the competitive nature of LLMs is a step in the right direction; Claude exists.
@aral Oh, you're saying I hit you even though you were carrying a sign that says "please don't hit me"?
Didn't you get the memo we sent out yesterday? Signs must now say "please don't strike me".
@aral
what do you think robots.txt is?
and copying isn't the moral equivalent of battery.
@bigMouthCommie Bit of a dick, are we?