Cool project: "Nepenthes" is a tarpit to catch (AI) web crawlers.
"It works by generating an endless sequence of pages, each with dozens of links, that simply go back into the tarpit. Pages are randomly generated, but in a deterministic way, causing them to appear to be flat files that never change. Intentional delay is added to prevent crawlers from bogging down your server, in addition to wasting their time. Lastly, optional Markov-babble can be added to the pages, to give the crawlers something to scrape up and train their LLMs on, hopefully accelerating model collapse."
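For anyone wondering how a page can be "randomly generated, but in a deterministic way", here is a minimal sketch of the general idea. This is not Nepenthes's actual code: `links_for`, `render`, `WORDS`, and the `secret` parameter are all hypothetical names made up for illustration. The only assumption is that the random generator is seeded from the request path, so the same URL always produces the same set of links.

```python
import hashlib
import random

# Hypothetical vocabulary used to build fake link paths.
WORDS = ["amber", "basin", "cedar", "delta", "ember", "fjord", "grove", "heron"]


def links_for(path, n_links=12, secret="example-secret"):
    """Return the same pseudo-random links every time for a given path."""
    # Seed a private RNG from a hash of the path, so no state has to be stored.
    digest = hashlib.sha256((secret + path).encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    # Every link points back into the same generator: the "tarpit".
    return [
        "/" + "/".join(rng.choice(WORDS) for _ in range(3))
        for _ in range(n_links)
    ]


def render(path):
    """Render a tiny HTML page whose content depends only on the path."""
    items = "\n".join(f'<a href="{link}">{link}</a>' for link in links_for(path))
    return f"<html><body>\n{items}\n</body></html>"
```

Calling `render("/some/page")` twice gives byte-identical output, which is what would make the pages look like static files to a crawler. The intentional delay and the Markov babble from the description are left out of this sketch.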
@tante I have mixed feelings.
Crawlers should respect robots.txt….
At the same time: there is clearly an emotionally based bias happening with LLMs.
I feel weird about the idea of actively sabotaging. Considering it is only towards bad actors… and considering maybe robots.txt often are too restrictive in my opinion… the gray areas overlap a bit.
Why should we want to actively sabotage AI dev? Wouldn’t that lead to possible catastrophic results? Who benefits from dumber AI?
The difference between a human reading a website and writing an article ‘inspired by’ what they’ve read, and an LLM consuming and outputting content the same way, is that we recognize an LLM is a tool and can do the same thing faster.
Reading is training. Reading isn’t copying. Output is the issue: not input. It’s worrisome to see so many not grasp this.
Looking/copying isn’t stealing. It just isn’t. No one lost their website.
@altruios @dalias @woe2you @tante
By this logic, I should be allowed to scrape movies encoded in H.264 off of streaming websites and transcode them to H.265.
The transcoder is inspired by the original copy and creates something new and different (at the byte level). Since the output product is now something completely different (but still contains the same information, a watchable movie), by your logic this is now completely legal instead of obvious piracy.
Yet if I were to do this, I would be breaking the law. Why is it different when AI does it versus when ffmpeg does it? They're both just software reading and interpreting information and transforming it into something else.
@altruios @dalias @woe2you @tante
Maybe you have a point about AI training data being a collage of various works, and one could argue that transforming the art in a creative way is fair use, but I don't think any of the technologies we currently call AI are capable of being creative.
LLMs are glorified Markov chains. It's still just rote copying. If I were a student, and on one of my papers I made a "collage" of other people's writings on a topic and presented it as my own original thought (which is what AI does), that would be considered plagiarism.
Regardless of legality (a concept I don't ascribe much value to), AI *is* harming artists. The primary selling point of AI is to put actual human creatives out of work using their own creations, and sure it can't do that right now in any comparable way, but it might be able to soon. If anything, the only ethical decision is poisoning their models. This is the only way to protect our actual creatives from capitalism.
And from the other point of view, let's argue that the models *won't* get any better (maybe because we poisoned them). If the models don't get better, we are wasting vast amounts of electricity to generate nothing of any real value. This will further accelerate the destruction of our only habitat.
I can't think of any good reasons to allow this technology to continue existing. The benefits don't seem to outweigh the risks. The only real use they seem to have is replacing search engines, but that's only because the search engines have gone completely to shit. Why not just invest our resources into fixing the search engines instead?
@tyami @altruios @dalias @tante "Vast amounts of electricity" is hugely overstating it. According to this paper: https://www.cell.com/joule/fulltext/S2542-4351(23)00365-3 GPT-3 cost 1287 MWh to train. That's a little under 10.5 watt-hours per daily active user of ChatGPT, i.e. around 6% of the energy used by an Xbox Series X or 2% of a high-end gaming PC. Let's ban Call of Duty, shall we?
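The post above doesn't say how many daily active users or what console and PC energy figures it assumes, so here is a rough sketch that only backs those numbers out of what was stated (1287 MWh, 10.5 Wh per user, 6% and 2%). The variable names are made up, and the results are implied values, not figures taken from the paper.

```python
# Figures stated in the post.
TRAINING_MWH = 1287   # GPT-3 training energy
WH_PER_USER = 10.5    # claimed share per daily active user

# 1 MWh = 1,000,000 Wh
training_wh = TRAINING_MWH * 1_000_000

# User count the post must be assuming for the per-user figure to hold.
implied_daily_users = training_wh / WH_PER_USER

# Device energy the 6% and 2% comparisons imply (time period unstated).
implied_xbox_wh = WH_PER_USER / 0.06
implied_pc_wh = WH_PER_USER / 0.02

print(f"implied daily users: {implied_daily_users / 1e6:.1f} million")
print(f"implied Xbox energy: {implied_xbox_wh:.0f} Wh")
print(f"implied gaming PC energy: {implied_pc_wh:.0f} Wh")
```

So the per-user figure works out if you assume roughly 123 million daily active users, and the comparisons imply about 175 Wh for the console and 525 Wh for the PC. Note this only covers training, not running the model afterwards.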
@woe2you @altruios @dalias @tante
At a glance, this paper seems to suggest that searches using LLMs use tremendously more power than standard Google searches, and that if every Google search were an LLM interaction, it would consume as much power as the entire country of Ireland.
Training is not the only thing that uses power. *Interacting* with a model seems to use tremendous amounts of power at scale, and this is supported by the paper you linked.
To add insult to injury, in my experience, the AI summaries at the top of Google search results seem to basically match what I would've found on the first page, right down to the wording. This does not seem to be worth the 23-30x increase in power consumption versus a typical search.
@tyami @altruios @dalias @tante It's only "tremendous amounts" and "the entire country of Ireland" when you conveniently gloss over the fact that it's 8.5 billion searches per day performed by a hair shy of 5 billion people. Meanwhile the population of Ireland is around 5.3 million. Their energy usage is a drop in the bucket when you compare it to ANYTHING at global scale. Gaming. Banking. Christmas lights. Sports stadia. Street lighting.
@woe2you @altruios @dalias @tante
It is a tremendous amount of power relative to the amount of useful work done, which is near zero. I can just click on the first link on the page and get the same info without wasting 6-8 Wh of power, and it would likely be more accurate than what the Markov chain generated.
You mentioned Call of Duty, but at least that serves a tangible purpose. It entertains people. It legitimately brings some people joy to play that game.
I don't think Google's AI does *anything* useful. If anything it just annoys me and provides useless, redundant, or wrong information. I'd imagine most people would agree with me that saving one click is not tangibly improving anyone's life in the same way a video game does.
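To put the numbers from the last few posts side by side, here is a back-of-the-envelope sketch. It assumes every one of the 8.5 billion daily searches costs the 6-8 Wh quoted above and rounds "a hair shy of 5 billion people" to 5 billion. Combining figures from different posts like this is purely illustrative, and the function names are made up.

```python
# Figures quoted in the thread.
SEARCHES_PER_DAY = 8.5e9   # Google searches per day
PEOPLE = 5e9               # "a hair shy of 5 billion", rounded up
WH_LOW, WH_HIGH = 6, 8     # claimed Wh per LLM-backed search


def daily_gwh(wh_per_search):
    """Total energy per day in GWh if every search cost this much."""
    return SEARCHES_PER_DAY * wh_per_search / 1e9


def annual_twh(wh_per_search):
    """Same total, scaled to a 365-day year, in TWh."""
    return daily_gwh(wh_per_search) * 365 / 1000


def wh_per_person_per_day(wh_per_search):
    """Energy per searcher per day, at about 1.7 searches per person."""
    return SEARCHES_PER_DAY / PEOPLE * wh_per_search


for wh in (WH_LOW, WH_HIGH):
    print(
        f"{wh} Wh/search: {daily_gwh(wh):.0f} GWh/day, "
        f"{annual_twh(wh):.1f} TWh/year, "
        f"{wh_per_person_per_day(wh):.1f} Wh/person/day"
    )
```

Under those assumptions the total is about 51-68 GWh per day (roughly 19-25 TWh per year), which comes to about 10-14 Wh per person per day. Both framings in the argument above come from the same numbers: large in aggregate, small per user.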
@tyami @altruios @dalias @tante You're talking as though Google still aims to bring you useful results that you can click through to. It doesn't any more; its strategy is now to keep you on its own site as long as possible. Its incentives no longer align with yours as a user of search. The solution to that is simple: just stop using Google.
@woe2you @altruios @dalias @tante
I am aware that Google is an ad company and not a search company, but that's irrelevant.
My whole point is that these AI models are being created to do useless things, and that is accelerating the destruction of our only habitat. It is an extreme waste of resources that should not be allowed to continue unimpeded.
@woe2you @altruios @dalias @tante
There is a lot we can do to prevent it, but voting with your wallet (or your clicks) has never been an effective option.
You are approaching this from a free market point of view, but the issue with this anarcho-capitalist view on the world is that these ideologies fall apart as soon as you realize monopolies exist.
There are no viable alternatives to Google at the moment, and there will not ever be one. The entire search space is a shit-show. Sure, small players pop up every once in a while, and they work well for some time, but they are all backed by VC money. Eventually the investors are going to want to turn a profit, and that's when it all goes to shit. Not to mention, if everyone attempted to create their own competing search engine, the ensuing crawler spam would destroy the internet.
The immediate solution is to poison their models and make AI non-viable for companies like Google. The long-term solution is to destroy capitalism so search engines are no longer incentivized to be shitty. Anything less is a band-aid that will be torn off again when your alternative search engine decides to join the VC hype-train and integrate AI slop.
@tyami @altruios @dalias @tante Oops. I was proceeding from the assumption that this discussion was remotely grounded in reality. I've seen Lycos rise and fall. I've seen Altavista rise and fall. I've seen Yahoo rise and fall. Google too will fall, if people like you don't continue to prop it up with one hand while wringing the other.
@woe2you @altruios @dalias @tante
The reality is obvious. It's not like the '90s, when the internet was small and it was practical to build your own search engine.
Nowadays it is impossible to compete with Google. Even Microsoft can't do it without resorting to anti-competitive tactics, and even with the bullshit they pull they only have a 4% market share. Hand-waving about using alternatives doesn't solve any real problems. People have already been doing this for at least a decade. The problem is that no significant number of people will use the others, because the others suck.
We all have to exist under capitalism, and people literally cannot afford to take a principled stance and use an alternative search engine. If someone used an alternative search engine at their job, and now has to spend extra time wading through garbage, they won't be able to keep up with everyone else. This principled decision could cost people their job.
Even if all of us techy-folks did use alternate search engines, it wouldn't make a difference. Average people search with Google. My parents won't understand why Google is bad, even if I tried to explain it to them. They will keep using it, because they always have. In their minds, Google = Search. This is obviously not ideal, but it's how the real world works.
Google is not going to stop what they're doing out of the goodness of their hearts. We first have to make it unprofitable (in a realistic way, not vote with your wallet bullshit), and then we have to disincentivize the underlying greed that causes them and other companies to behave like this (by destroying capitalism).
Anything else is a half-measure not based in the real world. We don't need a thousand competing search engines; we need one good search engine without a profit incentive so it works for us instead of against us.
Obviously a monoculture would be bad if we take this idea to its extreme, just like any idea, but my point is, more competing search engines is not necessarily better, and using them doesn't solve problems in any tangible way. DuckDuckGo just added AI slop too, and they were supposed to be the good guys. The alternatives are no better.