Reddit sues to block Perplexity from scraping Google search results

noop500

Seniorius Lurkius
6
Subscriptor
Wtf, sounds like Google may have a case against them for unauthorized access, but how does Reddit have a case if the data is coming from Google (and appears to be public?), a partner Reddit gives access to? Or maybe Reddit would have a case against Google for letting these bad actors get to the data, but that seems like a stretch as well. Is Reddit just butthurt that someone found a way around their licensing extortion plans? Last time I used Reddit was June 12, 2023. I lurk (no account) on Lemmy (Boost Android app) and Discuit these days.
 
Upvote
36 (53 / -17)

ashypans

Wise, Aged Ars Veteran
101
Subscriptor
I don't much care for the AI monetization of comments and user-generated content on platforms like Reddit or by ...ahem ...other associated Condé Nast properties.

but...
These companies are clearly full of it with excuses like the ones they are putting forward. Claiming it is public data is not going to get them anywhere; it clearly ain't. I don't expect anyone to be particularly sympathetic to Perplexity's case that they have no part in this simply because they were not directly engaging in the scraping. And then to act surprised that Reddit proceeded to litigation and didn't first approach any of them when they got caught is more BS. If I caught someone red-handed, walking past my no trespassing sign to snap a photo of my art for the 18th time so they could sell it on Etsy or something, I certainly wouldn't be talking it out with them before calling the cops. Saying it's okay because it's publicly visible in my front yard ain't gonna cut it either.
 
Upvote
54 (69 / -15)

ghostcarrot

Ars Scholae Palatinae
646
Perplexity’s spokesperson, Jesse Dwyer, told Ars the company chose to post its statement on Reddit “to illustrate a simple point.”

“It is a public Reddit link accessible to anyone, yet by the logic of Reddit’s lawsuit, if you mention it or cite it in any way (which is your job as a reporter), they might just sue you,” Dwyer said.

This guy thinks we're stupid.
 
Upvote
124 (135 / -11)
As much as I dislike AI scraping, it's kind of hard to be too sympathetic to Reddit here. We're talking about public-facing content that apparently doesn't even require you to visit Reddit to view... data that Reddit only cares about because they'd rather sell it off to companies instead. Data that isn't generated by Reddit as a company and is being sold off without any compensation given to the actual content creators. I don't see why I should be particularly enthused about Reddit's rent seeking here.

Though this article does a good job demonstrating how useless Perplexity's product seems to be. Reading the article I'm struggling to figure out what their service even is. Is it really just sending an API call to another LLM and a google search stapled together? I don't know why anyone would need a proprietary service for that.
 
Upvote
197 (203 / -6)
As much as I dislike AI scraping, it's kind of hard to be too sympathetic to Reddit here. We're talking about public-facing content that apparently doesn't even require you to visit Reddit to view... data that Reddit only cares about because they'd rather sell it off to companies instead. Data that isn't generated by Reddit as a company and is being sold off without any compensation given to the actual content creators. I don't see why I should be particularly enthused about Reddit's rent seeking here.

Though this article does a good job demonstrating how useless Perplexity's product seems to be. Reading the article I'm struggling to figure out what their service even is. Is it really just sending an API call to another LLM and a google search stapled together? I don't know why anyone would need a proprietary service for that.
It reminds me of the dot-com boom companies where their product was a "browser" that was just a skin wrapped around the IE web component.

The clever ones cashed out for millions before the bubble popped.
 
Upvote
121 (122 / -1)

NoSkill

Ars Praetorian
496
Subscriptor
If I understand this correctly, the business model of Perplexity is to provide search results by searching on Google and summarizing the results?
Let me explain. No, there is too much, let me sum up.
Redditors create content that Reddit sells to train Google's AI, while Google scrapes Reddit to provide search results and further engage users with AI summaries. Perplexity scraped the search results and summaries and used them to train its AI. Corporations win, creators lose.
 
Upvote
93 (96 / -3)

Boskone

Ars Legatus Legionis
13,075
Subscriptor
Wtf, sounds like Google may have a case against them for unauthorized access, but how does Reddit have a case if the data is coming from Google (and appears to be public?), a partner Reddit gives access to?
They improperly used Google to acquire data from Reddit to which they held no license. Two different claims.
 
Upvote
34 (37 / -3)

AvianLyric

Smack-Fu Master, in training
80
Subscriptor++
“We won’t be extorted, and we won’t help Reddit extort Google, even if they’re our (huge) competitor,” Perplexity wrote.
That’s an amusing stance for Perplexity: both acknowledging that their product is entirely dependent on a free service Google provides, and claiming that Google is a “huge” competitor.

If my product was only viable as long as my huge competitor continued to allow me to abuse their free products, I wouldn’t be shouting it from the rooftops. People might start wondering what on earth my product actually did.
 
Upvote
100 (100 / 0)
Serious question: Since when is Reddit a reliable source of information?
You are missing something. The AI slop companies (that includes OpenAI) prefer to sling all the shit they can find against the wall for training and then try to slap on guard rails after the fact in an attempt to stop their creation from being as racist, toxic and generally obnoxious as the data they trained it on.

Filtering what you feed your AI so that it is reliable, factual and sane is far, far too hard for them, you see ;)
 
Upvote
40 (41 / -1)
Meanwhile, if I ask my friend Steve to google something for me, that is perfectly fine to do and Steve probably won't get sued. He may use my data inappropriately though.

Weird how 'on a computer' is suddenly 'as a computer'. Anyway, those are funny thoughts that don't apply.

Along the same lines others have noted, I am in awe that the users who created the content have no part in this story except to be the entire thing of value that gives all of the other services and interested parties any reason to exist at all. So fucking weird.

I don't mean to knock the trove of valuable content on reddit when I say that I generally don't see value in reddit search results for the things I'm looking for lately. Maybe 50/50 if I'm feeling generous. There is an argument that reddit has contributed to longer times spent searching for answers for my use case.

I have no experience with Perplexity. Does anyone here have experience with it as a daily search tool?
 
Upvote
24 (26 / -2)
Serious question: Since when is Reddit a reliable source of information?
Sadly, the AI companies don't really care about reliability, or they would scrape Wikipedia and Project Gutenberg and call it good. They're operating on the theory that the more human-produced content they have, the more plausibly human their models' output will be. If they get enough, their models will understand the world and be competent office workers.

Until recently, most Reddit posts were written by humans. Racists, flat-earthers, shit-posters, it didn't matter; it was all "human-generated content." As the rest of the Web was increasingly flooded by AI slop, sources of actual human writing became increasingly valuable.
 
Upvote
37 (39 / -2)
Serious question: Since when is Reddit a reliable source of information?
Since Google got consistently worse over the course of several years.

It's wild to say this, but if you search for something on Reddit you're getting an actual person's opinion/experience. It won't be an ad. Or AI. Or a paid advertisement masquerading as a review. It'll just be a handful of people discussing something. And somehow this may be the best result you can get: just some guy's opinion.

I put in a garden drip irrigation system this summer. It sucks but reddit was by far the best resource for that.

I used to use YouTube for a lot of instructional/informational stuff, but that has gotten massively worse since Shorts took off.

So perhaps the best way I can explain it is that Reddit has stayed (mostly) the same while the rest of the internet has declined rapidly. They didn't become more trustworthy; they just became the least-untrustworthy by default?
 
Upvote
76 (81 / -5)

chantries

Wise, Aged Ars Veteran
143
I have no experience with Perplexity. Does anyone here have experience with it as a daily search tool?
My ISP has given me free access to Perplexity Pro. I'm not a maven of search engines nor of composing prompts for LLMs in general. I've found Perplexity useful for some more complicated searches/summaries. For example:

"Provide a bibliography of open sources in English or French that explore the distinction that Foucault makes between subjectivity and individuality."

"Using https://plato.stanford.edu/entries/montaigne/ and other sources suggest an order for reading the essays in Book 3 of Montaigne's Essays in a thematic way. Answer in Markdown format."

For me, at least, Perplexity generated replies and actual sources that I would have had a lot of difficulty coming up with myself and saved me the time that a close reading of the lengthy entry in the SEP would have consumed.
 
Upvote
10 (14 / -4)

masteraleph

Ars Scholae Palatinae
718
Serious question: Since when is Reddit a reliable source of information?
1) Human-generated content is valuable even if it's not provided by experts. "What office chair should I get?" "Which mixer will best handle 5 pounds of flour?" "Does what this handyman installed under my sink pass the sniff test?" "What hotel should I stay at in Boston?" "What's the best way to run a 57" Samsung monitor with an M4 Mac?" are all questions whose answers are valuable when written by people. Some of those people will be wrong, but this is classic "wisdom of the crowds" stuff that hearkens back to earlier days of the internet when there were a lot more subject-specific forums.

2) Building on 1, forums in many places have shut down over the last decade, and Reddit has become the home for a lot of information, or at least a lot of publicly searchable information (since the alternative for specialized communities often rests in Discord or other services that are not publicly searchable).

3) There are some subreddits that are actually really good. See, for example, AskHistorians, AskElectricians, etc. Some of the information is dubious, but many of these places are full of "I live and breathe this stuff and I just want to comment on the stuff that I know" types of folks. Reddit has a poor reputation both because a certain number of folks who know little and say much post there and because of certain poorly or maliciously moderated subreddits. But it can also be incredibly useful; I frequently use Google searches specifying Reddit as the site simply because I know that I'm likely to run into human-generated content that has actually answered questions other people had that I now have, or that's at least related to my question.
 
Upvote
62 (62 / 0)

evanTO

Ars Scholae Palatinae
1,107
[SerpApi’s spokesperson said] "As stated on our website, ‘The crawling and parsing of public data is protected by the First Amendment of the United States Constitution. We value freedom of speech tremendously.’”
Reddit isn't the government. So many Americans don't understand their own constitution...
 
Upvote
50 (52 / -2)

buback

Ars Scholae Palatinae
771
I put in a garden drip irrigation system this summer. It sucks but reddit was by far the best resource for that.
It's an ecosystem: Google steers you to Reddit, you learn and post back to Reddit, Google uses your answer to steer others to Reddit.

Meanwhile all the old mom and pop websites like "Moist Joe's drip irrigation forum" wither on the vine. It's the Wal-Martization of the open internet.
 
Upvote
62 (62 / 0)

uopx

Smack-Fu Master, in training
74
Subscriptor
The Data Wars begun they have.

Interesting to note that no one is concerned about the actual people generating the useful data: the John and Jane Does who took the time to educate themselves, become subject matter experts, and interact with other people on Reddit. God forbid they see a dime out of this bonanza.
 
Upvote
37 (37 / 0)
For me, at least, Perplexity generated replies and actual sources that I would have had a lot of difficulty coming up with myself and saved me the time that a close reading of the lengthy entry in the SEP would have consumed.
Examples like this make it difficult for me to say 'AI tools should go away'. I don't think the clever tricks to making the tools sound cool are actually artificial intelligence, but I do see value in the ability to parse copious amounts of data in a short amount of time, with some understanding of the context to my queries, and the logical next steps after the first query. I feel like we're finally getting good digital assistants, and all we had to do was burn down the planet.

Thanks for sharing the experience as well as the topic you were looking into.
 
Upvote
10 (11 / -1)
When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. For example, this license includes the right to use Your Content to train AI and machine learning models, as further described in our Public Content Policy. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.
-- https://redditinc.com/policies/user-agreement
Reddit don't own our content.

They have a perpetual license to republish it, sell it and use it however they want, but they don't own the content.

I don't see how they can sue someone else for stealing it. The people who own the content are us; we have provided them a non-exclusive, irrevocable license to use it. But I don't see how they have standing to claim that someone else can steal it.

Since even by their terms, they don't own it. They simply have a license to it.
 
Upvote
12 (20 / -8)
Meanwhile, if I ask my friend Steve to google something for me, that is perfectly fine to do and Steve probably won't get sued. He may use my data inappropriately though.

Weird how 'on a computer' is suddenly 'as a computer'. Anyway, those are funny thoughts that don't apply.

Along the same lines others have noted, I am in awe that the users who created the content have no part in this story except to be the entire thing of value that gives all of the other services and interested parties any reason to exist at all. So fucking weird.

I don't mean to knock the trove of valuable content on reddit when I say that I generally don't see value in reddit search results for the things I'm looking for lately. Maybe 50/50 if I'm feeling generous. There is an argument that reddit has contributed to longer times spent searching for answers for my use case.

I have no experience with Perplexity. Does anyone here have experience with it as a daily search tool?
You said it best.

I can only echo: it's so weird. All these firms, every single one, make money by organizing content made by other people, and then try to pretend it's "their" content.
 
Upvote
23 (23 / 0)