ChatGPT-style search represents a 10x cost increase for Google, Microsoft

Thegs

Ars Scholae Palatinae
879
Subscriptor++
Exactly what I was thinking. AI has the potential to destroy SEO as we know it and show us the information that we're actually looking for, and I can see that having a very negative impact on Google's profits, especially since sites pay Google for preferential treatment.

Bard may be a major step forward for customer usability, but Bard2 will be a shameless cash-whore.
By chance I saw a question on Superuser today. Someone wanted to know if the IP addresses 192.168.1.50/20 and 192.168.2.200/20 could communicate directly, without a router. The answer is yes, as they are both in the same subnet, but before asking Superuser, they asked ChatGPT. ChatGPT very unhelpfully told them "that 192.168.1.x and 192.168.2.x cant communicate with 20 subnet mask." When you think about it, of course that's the answer ChatGPT would return: the 192.168.0.0/16 IP space is typically carved into what were formerly Class C private networks, now known as /24 subnets. But nothing actually forces you to use a /24 in that space; it's just the most likely use of it, so that's what ChatGPT responded with.

It's really not anywhere near ready for prime time as a search engine. So far all we've managed to do is string together words that have the highest probability of following the words that came before them, in a manner that is a semi-convincing simulacrum of human language. But the highest-probability answer isn't necessarily the correct answer, yet the chatbot returns it authoritatively, so people will treat it as such.
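For what it's worth, the subnet math is easy to sanity-check yourself. Here's a quick sketch with Python's standard ipaddress module (my own example, not from the Superuser thread):

```python
import ipaddress

# Both hosts were given a /20 prefix (mask 255.255.240.0), so the
# network portion is the first 20 bits of each address.
a = ipaddress.ip_interface("192.168.1.50/20")
b = ipaddress.ip_interface("192.168.2.200/20")

print(a.network)               # 192.168.0.0/20
print(b.network)               # 192.168.0.0/20
print(a.network == b.network)  # True: same subnet, no router needed
```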
 
Upvote
14 (14 / 0)
I also don't see the problem with tacking ads onto the side of the chat page, driven by the chat's content via traditional algorithms. I don't like it, but Google's board won't be asking my opinion.

It will be an interesting experiment for them to have the AI handle ad placement. Tell advertisers "pay for clicks/impressions, but no AdWords control, you let the AI decide where it thinks it should place your ads"… and see what happens. 🙄

I finally started using ChatGPT this week, and have been underwhelmed. My main beef is the severe hallucination problem, which makes it great for fiction and propaganda, and bad for most anything else (without very careful human review). More importantly, OpenAI (and other LLM developers) say they're training new versions of the AI to reduce hallucinations considerably. I don't want them reduced! I want a different AI model which, by design, says nothing if it doesn't have something good to say (instead of making things up at all costs). It's a fundamental flaw in this type of model.

I read somewhere (maybe here) that that is exactly what they are thinking of doing: having different modes of response, "precise", "creative" and "normal", whatever normal is.
 
Upvote
1 (1 / 0)

doubleyewdee

Ars Scholae Palatinae
836
Subscriptor++
AMEN. And that's the Real Issue. They can't give a first page of garbage!
They might have to actually do a Deep Search like it's 1999.
[I work at Microsoft, formerly Bing, now Azure Machine Learning -- speaking on my own behalf, I don't have special knowledge about anything here]

As usual I'm late to the comments party but there's a variety of interesting factors at play here. The per-interaction serving cost of LLMs is, compared to the typical cost of serving results from an inverted index lookup (grossly oversimplifying web search), astronomical. However, the state of the art in "internet scale generation + serving of a distributed information index" has had about two decades to mature. In contrast, AI model serving is relatively nascent, and internet-scale serving is effectively a toddler. Expect advances here as profit motives incentivize further R&D. Lots of people are working very hard to make this cost less.
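To make the cost gap concrete, the heart of classic keyword retrieval is little more than a hash lookup plus a merge of posting lists. A deliberately oversimplified sketch (toy data, nothing resembling a real engine):

```python
from collections import defaultdict

# Tiny inverted index: term -> set of document IDs, built ahead of time.
docs = {
    1: "chatgpt style search is expensive to serve",
    2: "traditional search uses an inverted index",
    3: "inverted index lookups are cheap",
}
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(query: str) -> list:
    # Per query: a handful of dictionary lookups and a set intersection.
    # Compare that to a forward pass through billions of parameters.
    terms = query.lower().split()
    if not terms:
        return []
    return sorted(set.intersection(*(index[t] for t in terms)))

print(search("inverted index"))  # [2, 3]
```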

However, if you use Bing Chat today, what it's doing is pretty obvious: generating, executing, and synthesizing results from one or more traditional search queries. So what I find really interesting, as a now outside observer of this new scenario, is that all the extant costs of having that massive knowledge index and search engine infrastructure, are being piled on top of the new costs of hosting an LLM to synthesize the data. If I'm a search engine, I'm not necessarily ecstatic about that. If I'm a search engine with 97% worldwide share, I'm certainly not excited about scaling that out immediately in response to a competitor's threat. Now I have to do all the work of being a search engine in addition to this new thing. However, I'm confident Google can figure it out. They got YouTube working for them when, at one time, the costs of owning and operating the platform seemed likely to dwarf any conceivable profit they could wring from it.
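In rough pseudocode, the pattern I'm describing looks something like the sketch below. To be clear, this is a generic retrieve-then-synthesize loop with stand-in objects, not Bing's actual implementation (which I have no inside knowledge of):

```python
class FakeLLM:
    # Stand-in for a hosted large language model.
    def generate(self, prompt: str) -> str:
        return "stub model output for: " + prompt[:40]

class FakeSearchEngine:
    # Stand-in for the existing, already-expensive web index.
    def top_results(self, query: str, k: int = 3) -> list:
        return [f"snippet {i} for '{query}'" for i in range(k)]

def chat_answer(user_question, search_engine, llm):
    # 1. Have the model rewrite the question as one or more search queries.
    queries = llm.generate(f"Rewrite as web search queries: {user_question}")
    # 2. Run each query against the traditional index (the old cost).
    snippets = []
    for q in queries.splitlines():
        snippets.extend(search_engine.top_results(q))
    # 3. Pay for a second, large forward pass to synthesize an answer from
    #    those snippets (the new cost, stacked on top of the old one).
    prompt = f"Question: {user_question}\nSources:\n" + "\n".join(snippets)
    return llm.generate(prompt + "\nAnswer with citations:")

print(chat_answer("how do CIDR masks work?", FakeSearchEngine(), FakeLLM()))
```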

I think, also, it might be a little bit early to declare that "10x cost" is a reasonable factor. I suspect the real number will end up being lower, because the incentives are just so good to make that the case. Not to mention, it may force a pivot in fundamental assumptions about search engines, knowledge indexing, etc. As AI-generated content proliferates, it seems likely to me that our ability to sift out genuine, human-created content will need to improve. That genuine human content, particularly the quality stuff, is a tiny fraction of the garbage that ends up in search indices today. If the profit motives shift to fewer, higher-quality data sources, it seems plausible that all that godawful content farm spam that has made the traditional/current search experience so increasingly awful over the last decade, may dry up, in which case the cost of the existing service naturally drops.

There's also the real possibility that these additive AI assistant style data synthesizers will, in fact, end up too expensive or too impractical to give away in an ad-supported manner, and will end up being bundled with some sort of subscription service or pay-per-use model, further upending the ad-supported internet search-driven mechanism generating so much traffic today. I can't say, as a user, that I would be sad to see the current advertisement-based revenue model of most of the internet be severely disrupted in favor of a pay-for-use mechanism.
 
Upvote
21 (21 / 0)

doubleyewdee

Ars Scholae Palatinae
836
Subscriptor++
If they were to open source the model, we can prune it and it'd be able to run just fine, even if it's not fully featured.

This is why AI will be so disruptive to Google's business model.
We'll have models that are very capable, that can run locally from a 200MB file.

That's not how any of this works.

ETA: we're (correctly) criticizing state-of-the-art models with billions of parameters as being inadequate/unfit for task. Pruning and otherwise downsizing is the opposite direction anybody in the industry is headed. There are great applications for narrow-purpose smaller/lower parameter AI models, but "synthetic human language" doesn't fit in that wheelhouse even a little, and that is unlikely to change.
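For context, "pruning" in this discussion usually means something like naive magnitude pruning: zero out the smallest weights and hope the model survives. A toy sketch (my own illustration, not anything from a production system):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of the weights.

    Storage shrinks if you then store the matrix sparsely, but capacity
    shrinks with it; you don't get a 200MB GPT-class model for free.
    """
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.random.randn(4, 4)
print(magnitude_prune(w, 0.75))  # roughly 75% of entries zeroed
```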

Sorry for the initial snippy response.
 
Last edited:
Upvote
7 (7 / 0)

stux

Ars Scholae Palatinae
812
They'll probably solve this the same way they solved the YouTube problem, with bespoke hardware. Neurally-designed silicon is a thing and Google can afford to spin up a fab.

In a way this gives them a business model to build stupendous amounts of tensor capacity.

SkyNet can then run in the spare time. Ouch.
 
Upvote
4 (4 / 0)
ChatGPT-style search represents a 10x cost increase for Google, Microsoft

And that cost is the extra energy used for each search. So in the wake of crypto dying, we have the tech dude bros of Silicon Valley deciding they can't allow the environment to recover, so they invent something that will consume just as much power and be just as useless to humanity. Unless they can get the power consumption down to the level of current search costs, it should remain in the lab. It may require dedicated hardware that can do it more efficiently than they do it now, but given the climate crisis we're in, the last thing we need is something like this.
 
Upvote
-3 (1 / -4)

Abhi Beckert

Ars Tribunus Angusticlavius
8,981
Of course, more compute equals more power consumption which equals more carbon.

More hardware to meet the compute requirements also means more carbon.
Most modern data centers either are carbon neutral or working towards that and pretty close to the finish line.

They say it's to be green, but I'm pretty sure they mostly just care about greenbacks. Zero-emission energy is cheaper.
 
Upvote
-3 (1 / -4)

Fatesrider

Ars Legatus Legionis
24,977
Subscriptor
Text chats do have the advantage that you can design them to display multiple profit-generating links at once in the result. Voice chat is limited by its medium to presenting you with one result at a time.

Still, once the novelty wears off, are people really going to be that much more enthusiastic about chatting at a computer than speaking to one? I'm not sure. I think the headline grabbing tech is not going to be at the forefront of things, but relegated to the background. That is, improving things (e.g. customer service bots, code completion tools) that already exist and have narrower and more easily defined use cases. I will (again) warn that one of the best use cases for this tech as it exists today is propaganda, which requires content that is merely plausible sounding, a lower bar than all the other use cases that require results that are also true.

Having said that, the image bots seem to be very useful as a concept-phase tool as-is too. This is all going to be very complicated and will affect different industries at different rates.
I very much agree with your assessment.

Engagement right now is driven by novelty, and that's what's bringing people to the platforms. The corollary to that engagement is the cost. If the cost is high now with a lot of people, imagine how much WORSE the math gets when the novelty wears off and you're not even getting eyes on the ads the AI drew in.
 
Upvote
0 (0 / 0)
No new technology is cheap at the outset, but AI-based chatbots and search tools are here to stay.
I use Bing Chat now to search the web in ways so intuitive that I don't know how I survived until now.

Simple example, I was looking for a new school for my son with certain conditions. Previously, I had to do multiple searches and then research the criteria of each result. With bing search, I simply told the search engine what I was looking for, and it gave me exactly what I needed.

This tech is legit.
 
Upvote
5 (5 / 0)
Today Google search works by building a huge index of the web, and when you search for something, those index entries get scanned and ranked and categorized, with the most relevant entries showing up in your search results. Google's results page actually tells you how long all of this takes when you search for something, and it's usually less than a second.
Entries don't get "ranked and categorized" when you search. That happens BEFORE you search and it certainly doesn't take less than a second.
 
Upvote
1 (2 / -1)

brewejon

Ars Scholae Palatinae
1,285
Bing Search actually seems to do a pretty good job of embedding links to sources in its answers. If Google can do something similar with Bard, it could embed, e.g., sponsored links in its answers; I'm sure advertisers would gladly pay for that.
But they’d still have to embed non-sponsored links, at which point you’ve kinda just replicated … Google search results.
 
Upvote
1 (1 / 0)

Darkness1231

Ars Praefectus
4,560
Subscriptor++
Another Reuters report says that Microsoft has already met with advertisers to detail its plan of "inserting [ads] into responses generated by the Bing chatbot," but it's unclear how awkward that would be or if consumers will react when a chatbot suddenly kicks over into an ad break.

The problem with chat-interface search is going to be trusting the results. That's two-fold: one side is that some people won't trust the results, a problem which will be magnified by including ads, which suggest the answer is biased. The other side is that some people will trust the results even when they shouldn't, and presenting ads as facts could easily drift toward fraud, especially if the ads are driven through an automated system where no human is checking their appropriateness.
"trusting the results" is indeed the problem. CNET fired up some ChatGPT reporters apparently, and they sounded great. Well, the article read well I should say. The problem is, there were several statements (of fact) in the article and each of them was wrong.

ChatGPT is an LLM, based on 300M web discussions and dog nose what else (links to various kitty vids).
But it is designed to "sound" good. You can have a chat with it. The problem is gender related. Okay, stay with me here for a minute. I am only partially joking. Pretend we are at a party, chatting...

If someone asks Me my opinion on something, well, I will give them an opinion. Mine. Now, if I don't know anything at all about it, I will still come up with something. Because I'm a man. We all do it. Even more so in non-office related chats (one hopes).

How do I know men do this? My Wife will either report the information, say she knows nothing about it, or do a quick search on her phone to clarify what is being discussed. All logical, and reasonable. Not very manly at all.
 
Upvote
-4 (2 / -6)

doubleyewdee

Ars Scholae Palatinae
836
Subscriptor++
Entries don't get "ranked and categorized" when you search. That happens BEFORE you search and it certainly doesn't take less than a second.

That’s not entirely accurate. There is static ranking that occurs during the generation of the indices, and is effectively baked in, but there is also query time “dynamic” ranking to attempt to bubble up the best 10 results out of the order of magnitude more the search pulls up. The static (page) rank is a big part of it, but at query time, and definitely in under a second, other factors are used to attempt to pull up the best results.

A sample of factors might be:
  • user location vs page “location”. I live in Mount Vernon, WA. If I search for Mount Vernon I’m probably not as interested in George Washington’s home as someone from, say, Iowa might be.
  • freshness. if my query appears to be looking for news or recent data (determined, at query time, by a model) then newer pages will get a boost
  • user profile: ambiguous terms may be disambiguated and influence rank based on multiple factors

Google’s query-time benchmark number is the sum total of time to retrieve the index data (from a boatload of backend services, of which the web index is the primary), collate the various data, and finally rank it all to prioritize rendering.

Also there are some ads in there. :)
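A toy illustration of that static-plus-dynamic split, entirely my own simplification (real rankers use learned models over hundreds of signals, not two hand-written boosts):

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    static_rank: float        # baked in at index-build time (link graph, quality, ...)
    published_days_ago: int
    region: str

def query_time_score(page: Page, user_region: str, wants_fresh: bool) -> float:
    score = page.static_rank               # start from the precomputed rank
    if page.region == user_region:
        score *= 1.4                        # cheap per-query location boost
    if wants_fresh:
        score *= 1.0 / (1 + page.published_days_ago / 30.0)  # freshness decay
    return score

pages = [
    Page("mountvernonwa.gov", 0.7, 400, "WA"),
    Page("mountvernon.org (George Washington's home)", 0.9, 400, "VA"),
]
# A searcher in Washington State sees the local result bubble up first.
for p in sorted(pages, key=lambda p: query_time_score(p, "WA", wants_fresh=False),
                reverse=True):
    print(p.url)
```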
 
Upvote
5 (5 / 0)

JaneDoe

Ars Tribunus Militum
1,510
Subscriptor
Would companies be willing to pay a few bucks more per user per month to have access to an AI assistant that knows everything contained in your company's Sharepoint, your OneDrive, and your mailbox, and can provide you with information on demand or compose emails and letters for you with minimal input?
Who will retrain that multiple times every day for each employee? May cost more than a few bucks to do this.
 
Upvote
3 (3 / 0)

bbottema

Seniorius Lurkius
46
Subscriptor++
This is what disruptive technological breakthroughs are all about, and they give newer companies a chance to break through. Large companies such as Google are much more concerned with maintaining the status quo when it comes to costs vs. profit, because they are ball-and-chained to their investors and the stock market. 10x more cost to enable AI-powered search? Uhh... our investors are not gonna like that, let's focus on something else for a bit, like yet another chat app. And now OpenAI/Microsoft have gained the edge.

Meanwhile, a newer company might just see opportunity here, despite the less-than-Google-high profits. Or perhaps they shape their service offering completely differently, because that too is much easier to manage when you're small. Frankly, it speaks to Microsoft's visionary management that they are ahead of the curve with their Bing revolution, pioneering despite the costs. Considering the negative news of the last ten years regarding Google and its management, it's no surprise that Google isn't able to move solidly past R&D and instead bungled its rushed press conference. They're playing catch-up, and they're lagging behind. They should probably look at that broken leg first...
 
Upvote
1 (1 / 0)

pusher robot

Ars Tribunus Militum
2,825
Subscriptor
Who will retrain that multiple times every day for each employee? May cost more than a few bucks to do this.
More likely it would be a standard model that can run searches in your environment and digest the results, same as the Bing AI does now. In fact if you are logged in to an Office 365 account, your OneDrive, SharePoint, Outlook and Teams are already included in your Bing search results.
 
Upvote
2 (2 / 0)
Google needs to realise that search is a solved problem, and has been for 20 years. Not a single person on the planet who doesn't work for Google wants Google search to be any "better", except for removing the ads. It works perfectly. Just give me Wikipedia, StackOverflow, YouTube... maybe IMDB... etc. It works fine. Just forget about it. The fact that it makes so much money has misled them into thinking that continuing to "improve" it is the way forward.

Actually I would like them to blacklist cplusplus.com, because that's the bad one that's always wrong/out of date. cppreference.com is the correct one. Sometimes I'm not paying attention and end up at the wrong site. But that and the ads are literally my only problems with Google search.

I do NOT want to have to have a "conversation" with a computer. Ever. Avoiding conversations is probably a significant part of the reason why I'm using a computer in the first place.
 
Upvote
-3 (2 / -5)

Jeff S

Ars Legatus Legionis
10,922
Subscriptor++
From my perspective, it would increase the time spent by users on each search engine as it becomes more interactive and you can get the answers without leaving Google or Bing.
So, yes more compute, but I would expect a higher user engagement and time spent on the platform, so potentially more revenue.
Google, Bing, etc. face an interesting problem here, it seems to me. Their language models require sources of knowledge to learn from: blogs, news sites, academic writing, artists, etc. Those all have to exist on separate pages. But if they are successful in essentially hoovering up all the knowledge from those sites, and then using their language models to re-write the ideas in a way that doesn't technically violate copyright, they might kill off the very sources of knowledge that their language models depend on to learn from.

That is, all those news sites, blogs, artists, academic sites, etc require users to come to them and use them, in order for them to get the revenue they need to continue to exist and create new ideas.

The traditional model of a search engine being used for discovery, and then clicking through to the sites thus discovered, gives a business model to those sites, but if Google or Bing or Amazon take the knowledge, with no compensation, and users never click through, those primary sources die. Then Google or Bing start to become less useful as they have ever decreasing sources of new knowledge.

Of course, the content creators are going to fight back. Most content creators used to view search engines as useful allies, so they allowed free access to the web crawlers that indexed their sites. If those creators start seeing these 'search' engines essentially taking all their content without any compensation in the form of users driven to them, they're going to start blocking those crawlers and preventing them from learning from their content, while still allowing better-behaved search engines, the ones that keep bringing users to their sites, to see it.
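Mechanically, the first line of defense is just robots.txt: keep letting in the crawler that still sends you readers, shut out the one that only harvests training data. A rough sketch using Python's standard robotparser (the "AI-Scraper" user-agent name below is made up for illustration):

```python
import urllib.robotparser

# What a publisher's robots.txt might look like: the traditional crawler
# stays welcome, the purely extractive one gets the door.
robots_txt = """\
User-agent: Googlebot
Allow: /

User-agent: AI-Scraper
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/article"))   # True
print(rp.can_fetch("AI-Scraper", "https://example.com/article"))  # False
```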
 
Upvote
2 (2 / 0)
Weirdly, if Google wanted to return solid results in less time, all they would have to do is re-enable Boolean search and actually honor limiters (like the - they seem to now ignore, or the + that was removed with G+ launch) as an advanced option. For most general searches Google works fine, but the second you start to get technical or try to be very, VERY specific you’re immediately going to get irrelevant or content farm results, and nothing more.
 
Upvote
5 (5 / 0)
Weirdly, if Google wanted to return solid results in less time, all they would have to do is re-enable Boolean search and actually honor limiters (like the - they seem to now ignore, or the + that was removed with G+ launch) as an advanced option. For most general searches Google works fine, but the second you start to get technical or try to be very, VERY specific you’re immediately going to get irrelevant or content farm results, and nothing more.

Yeah, it's really annoying and there's no way to stop it.
 
Upvote
1 (1 / 0)

JoHBE

Ars Praefectus
4,134
Subscriptor++
Microsoft already has another obvious path to profitability with this, by adding capabilities to Office 365. Would companies be willing to pay a few bucks more per user per month to have access to an AI assistant that knows everything contained in your company's Sharepoint, your OneDrive, and your mailbox, and can provide you with information on demand or compose emails and letters for you with minimal input? I'm guessing that would be an easy sell for most. Forget New Bing, New Clippy could be huge.
Sounds good on the surface, but how practical is this going to be? The processing needed is gargantuan and isn't "real time" right now, and does one company's data even remotely approach the SCALE that these LLMs NEED to start being useful? I'm not sure all, or even any, of those issues are trivial to solve.
 
Upvote
0 (0 / 0)
A couple of weeks ago I speculated on here that we might be looking at the "3D TV of the early-to-mid-20s".

It's starting to look more and more like that wild guess, based on a gut hunch, might serve me pretty well in 10 years. At least there's hope!
Learning to instinctively recognize the hype cycle is a good thing. It'll help you avoid the snake oil. I've been using self-driving cars as my go-to for the hype around all of this, but 3D TV is a good analogy too.

The thing is - even if these GPT models were accurate with their facts (and they are not) why do we think that people using search engines want to read a wall of text instead of just getting a link to the relevant page? What is the actual use case for these models beyond their entertainment value?

I was playing around with an implementation of a chat-based search engine at You.com the other day and ... it's probably the best user interface I could think of for a thing like this. It shows you links to relevant pages, it cites its sources, and it attempts to summarize what it finds. And... the experience is kinda terrible? As in, I don't see why, beyond novelty, I'd read the output of the chat instead of just going to the pages it links to. Why am I reading its nearly verbatim summary of the first paragraph of a Wikipedia article when I could just read the Wikipedia article?

Also, its summaries are wrong. Laughably wrong in some cases. As an easy example, ask it about a random professor at a random university; just ask it something like "give me the biography of Dr. William Riker, professor of Computer Science at the University of Michigan". It will give you a biography and cite sources, but those sources reference some other professor, often not even at the same university you asked about. ChatGPT will do exactly the same thing, though without citing sources, because it's not a search engine. I don't know if Bing will do it too, but I bet it does.
 
Upvote
0 (0 / 0)

JoHBE

Ars Praefectus
4,134
Subscriptor++
Most modern data centers either are carbon neutral or working towards that and pretty close to the finish line.

They say it's to be green, but I'm pretty sure they mostly just care about greenbacks. Zero-emission energy is cheaper.
Well, even if they are "green", they use up green energy that could have been used for potentially more urgent/useful needs. It's not like we have all the time in the world to decarbonize, it's a fucking race against time!!
 
Upvote
-1 (0 / -1)

jdale

Ars Legatus Legionis
18,261
Subscriptor
Google needs to realise that search is a solved problem, and has been for 20 years. Not a single person on the planet who doesn't work for Google wants Google search to be any "better", except for removing the ads. It works perfectly. Just give me Wikipedia, StackOverflow, YouTube... maybe IMDB... etc. It works fine. Just forget about it. The fact that it makes so much money has misled them into thinking that continuing to "improve" it is the way forward.
Online search is crappy. It prioritizes producing lots of hits over the right hits. It turns up pages that lack search terms, or even any relation to the search terms. There is still a lot that can be improved.
Actually I would like them to blacklist cplusplus.com, because that's the bad one that's always wrong/out of date. cppreference.com is the correct one. Sometimes I'm not paying attention and end up at the wrong site. But that and the ads are literally my only problems with Google search.
And as you mention, online search has a problem in that it includes a lot of results that are automatically generated spam content (a problem which AI will make worse) rather than original content. I don't need to see a site that ran Wikipedia's article through some filters to make it look like their own content.
I do NOT want to have to have a "conversation" with a computer. Ever. Avoiding conversations is probably a significant part of the reason why I'm using a computer in the first place.
While I'm not on board with your second sentence here, I agree that having a conversation is not my goal. If I do a search, I just want a quick and correct answer.
 
Upvote
0 (0 / 0)

TheManIsANobody

Ars Scholae Palatinae
724
Subscriptor++
I'm not going to lie, being able to ask e.g. "what does the /v1/frob/buzz API response look like" and not have to trawl our sprawling Confluence wiki/JIRA system (or the Google/Microsoft/whatever equivalents) would be incredibly useful. However, it does sound like it would be expensive to build and maintain such a specially trained model, so who knows when that would be a viable product.
You could consider something like backstage (https://backstage.io/). My company uses it for internal stuff and all the API docs are synced from github to backstage for searching.
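In the meantime, even dumb lexical search over an exported wiki gets you partway there. A minimal sketch with scikit-learn (the doc snippets are made up, borrowing the /v1/frob/buzz example above):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Pretend these snippets were pulled from Confluence/JIRA exports.
docs = [
    "GET /v1/frob/buzz returns {id, status, last_frobbed_at} as JSON",
    "POST /v1/frob creates a new frob and returns 201 with a Location header",
    "Deployment runbook: rotate the API key quarterly",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)

query = "what does the /v1/frob/buzz response look like"
scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
print(docs[scores.argmax()])  # the GET /v1/frob/buzz snippet
```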
 
Upvote
0 (0 / 0)

irvinky

Smack-Fu Master, in training
67
By chance I saw a question on Superuser today. Someone wanted to know if the IP addresses 192.168.1.50/20 and 192.168.2.200/20 could communicate directly, without a router. The answer is yes, as they are both in the same subnet, but before asking Superuser, they asked ChatGPT. ChatGPT very unhelpfully told them "that 192.168.1.x and 192.168.2.x cant communicate with 20 subnet mask." When you think about it, of course that's the answer ChatGPT would return: the 192.168.0.0/16 IP space is typically carved into what were formerly Class C private networks, now known as /24 subnets. But nothing actually forces you to use a /24 in that space; it's just the most likely use of it, so that's what ChatGPT responded with.

It's really not anywhere near ready for prime time as a search engine. So far all we've managed to do is string together words that have the highest probability of following the words that came before them, in a manner that is a semi-convincing simulacrum of human language. But the highest-probability answer isn't necessarily the correct answer, yet the chatbot returns it authoritatively, so people will treat it as such.
If you have any search engine skills, you can ask it follow-up questions to get more and more in-depth answers, and keep in mind we're on an old version without internet access.

It's certainly a better tool if you have an idea of what you are doing and how to ask questions. That is how I would sum myself up, not an expert, but just smart enough to make incredible use of it so far for my work and projects.

If you want to challenge why I trust its answers: my code runs and is already more advanced than what I started with, and with all the time saved I'm even able to ask it for best practices on how to do the same thing in two different ways.
 
Upvote
1 (2 / -1)
Unless Bard is very different from ChatGPT, running it on a consumer-grade machine, even a high-end gaming PC, does not sound practical.

According to Wikipedia, GPT-3's parameters take up 800 gigs.
https://en.wikipedia.org/wiki/GPT-3
I just added an 18TB drive for $250. I don't think space is a concern. What's the cost in CPU/GPU cycles?
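Back-of-envelope on that 800 gigs (my own arithmetic, using the 175-billion-parameter figure from the same Wikipedia article): the weights alone are hundreds of GiB, and for interactive inference they have to sit in fast (V)RAM rather than on a spinning disk, which is where the CPU/GPU cost really bites.

```python
params = 175e9            # GPT-3 parameter count, per the linked article
bytes_per_param = 4       # 32-bit floats; half precision would halve this

gib = params * bytes_per_param / 2**30
print(f"{gib:.0f} GiB just to hold the weights")  # ~652 GiB at fp32

# Every generated token touches essentially all of those weights, so the
# working set has to live in GPU/accelerator memory to be interactive.
```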
 
Upvote
-4 (1 / -5)

pusher robot

Ars Tribunus Militum
2,825
Subscriptor
I don't know if Bing will do it too, but I bet it does.
You lose!

[attached screenshot: msedge_IH1O3A6Ahp.png]
 
Upvote
0 (0 / 0)