Has Gemini surpassed ChatGPT? We put the AI models to the test.

Purple Gryphon

Smack-Fu Master, in training
69
I believe there is a flaw in your tests. A friend and I had a pretty negative experience with Gemini when we asked the exact same question on two different phones and got two different answers. I did not see a test for this in the article.

I had an iPhone and my friend had an Android phone. We both asked Gemini if Frontier Airlines flew to a certain destination.

One of us got a "yes" and the other a "no". In the end, we had to just manually go to Frontier's web site to look for ourselves.

But this was telling, as it suggested to us that Gemini, probably not knowing the answer (or where to get it from), just "flipped a coin" to say yes or no to us. Otherwise, why would we not have gotten the same answer?

I think this type of situation should be tested for with all AI programs. Ask the same question on multiple devices and see if the answer suddenly changes.
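That repeatability test is easy to sketch. In the sketch below, `ask_model` is a hypothetical stand-in for whichever assistant or device is being probed (it is not a real Gemini API), and the "coin flip" model just reproduces the failure mode described above:

```python
import random

def consistency_check(ask_model, question, trials=10):
    """Ask the same question several times; return the set of distinct answers.

    A model that actually knows the answer should produce a one-element set;
    a model that is effectively flipping a coin will usually produce two.
    """
    return {ask_model(question).strip().lower() for _ in range(trials)}

# A fake model that flips a coin, like the behavior described above:
def coin_flip_model(question):
    return random.choice(["Yes", "No"])

answers = consistency_check(coin_flip_model, "Does Frontier fly to Denver?")
```

The same harness could be pointed at several devices or accounts at once; disagreement across runs is the signal.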
It will, because LLMs aren’t actually getting you the right answer. They don’t know what that is.

They’re just giving you a sequence of words that, in similar contexts, are likely to follow one another.

Unless word B comes after word A 100% of the time, you will sometimes get word C there instead. How often that happens is something that can be adjusted on the back end, but if it’s too restrictive people don’t like the repetitive robotic responses.

And the only thing these companies care about is making people want to use their crap. Accuracy only matters inasmuch as they don’t want the frustration of incorrect answers to outweigh the dopamine from having a flowery sycophant tell you that you’re right.
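For what it's worth, the knob being described — how often word C shows up in word B's place — is usually called temperature. A minimal sketch with made-up scores for a toy three-word vocabulary (this is an illustration of the idea, not any vendor's actual code):

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Pick one token from raw scores using temperature sampling.

    Low temperature sharpens the distribution (word B almost every time —
    the 'repetitive robotic' regime); high temperature flattens it, so
    word C shows up more often in word B's place.
    """
    words = list(logits)
    scaled = [logits[w] / temperature for w in words]
    m = max(scaled)                          # subtract max for numeric stability
    weights = [math.exp(s - m) for s in scaled]
    return random.choices(words, weights=weights, k=1)[0]

# Toy scores: "B" is the likeliest continuation, but not a certainty.
toy_logits = {"B": 4.0, "C": 2.0, "D": 0.5}
```

At a very low temperature this returns "B" essentially every time; at higher temperatures "C" and "D" start appearing, which is exactly the trade-off between robotic and varied output.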
 
Upvote
5 (5 / 0)

Madestjohn

Ars Tribunus Angusticlavius
7,452
It will, because LLMs aren’t actually getting you the right answer. They don’t know what that is.

They’re just giving you a sequence of words that, in similar contexts, are likely to follow one another.

Unless word B comes after word A 100% of the time, you will sometimes get word C there instead. How often that happens is something that can be adjusted on the back end, but if it’s too restrictive people don’t like the repetitive robotic responses.

And the only thing these companies care about is making people want to use their crap. Accuracy only matters inasmuch as they don’t want the frustration of incorrect answers to outweigh the dopamine from having a flowery sycophant tell you that you’re right.
This ..
It isn't giving you an answer .. it's simulating what an answer sounds like.
And it's gotten quite good at that.
But whether it's right or wrong isn't the point, and is largely irrelevant.
 
Upvote
1 (1 / 0)
It will, because LLMs aren’t actually getting you the right answer. They don’t know what that is.

They’re just giving you a sequence of words that, in similar contexts, are likely to follow one another.

Unless word B comes after word A 100% of the time, you will sometimes get word C there instead. How often that happens is something that can be adjusted on the back end, but if it’s too restrictive people don’t like the repetitive robotic responses.

And the only thing these companies care about is making people want to use their crap. Accuracy only matters inasmuch as they don’t want the frustration of incorrect answers to outweigh the dopamine from having a flowery sycophant tell you that you’re right.
Modern LLMs have tool-use capability to find the right answer. Whether the UI you're using exposes this is a different question. So, increasingly, they are giving you the best answer the available data allows.
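A rough sketch of what that tool-use loop looks like. Every name here (`run_with_tools`, `route_lookup`, the toy model) is invented for illustration and is not a real LLM API; the point is only the shape of the loop — the model emits a tool call, the host executes it, and the observation is fed back until the model produces a final answer:

```python
def run_with_tools(model_step, tools, question):
    """Loop until the model returns a final answer instead of a tool call."""
    context = [question]
    while True:
        action = model_step(context)                       # model decides next step
        if action["type"] == "answer":
            return action["text"]
        result = tools[action["tool"]](**action["args"])   # host executes the tool
        context.append(result)                             # feed the observation back

# Toy setup: a "model" that looks up a route table instead of guessing.
routes = {("Frontier", "Denver"): True}

def toy_model(context):
    if len(context) == 1:                                  # first turn: call the tool
        return {"type": "tool_call", "tool": "route_lookup",
                "args": {"airline": "Frontier", "city": "Denver"}}
    return {"type": "answer", "text": "Yes" if context[-1] else "No"}

tools = {"route_lookup": lambda airline, city: routes.get((airline, city), False)}
```

With a grounded lookup in the loop, the yes/no answer stops depending on a coin flip — which is presumably why the airline question above went wrong in a UI without it.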
 
Upvote
-1 (0 / -1)

Hagen Stein

Ars Scholae Palatinae
680
Subscriptor
This is why I think Google will win the AI wars. They don't have to be the best, they just have to be about as good as the others. But where the other LLM providers are entirely dependent on revenue from their AI bot, AI is just one of many different revenue streams for Google. Google seems to be the best one positioned to survive the eventual AI bubble popping.

There are more reasons why I share this assumption:
  1. Google is producing its own AI chips
  2. Google has its own data center infrastructure
  3. Google has a host of products that it can integrate Gemini into (and learn from that what works and what doesn't)
  4. Presumably Google has the most training data from their decades of web crawling and projects like Google Scholar and Google Books. Though of course there's the copyright issue with the latter two. But AI companies seem to care less about that.
Items 1 and 2 are things OpenAI (and others) have to pay big bucks for, without yet having a profitable business model. And item 3 helps prevent developing solutions that don't work in practice, or that users don't want, accept, or use.
 
Upvote
3 (3 / 0)

kaleberg

Ars Scholae Palatinae
1,245
Subscriptor
I almost always use Gemini Thinking to improve my emails. I do not rely on it to write them. However, using its suggestions to improve my drafts works extremely well. Notably, when addressing someone with an extensive background in something like psychology, one can start a thread asking Gemini to familiarize itself with their publications. Then, when doing drafts, Gemini will (for me at least) be really helpful in pointing out how a draft can be improved by referencing this. Again, I do not rely on Gemini to write the emails; it tends to write pretty long ones on its own. But for help editing and word-smithing my emails... godsend. HTH, NSC
We have a neighbor who travels a lot and likes to make narrated travelogues. Unfortunately, his voice has been going. He found an online AI voice processing system and fed it examples of his voice from before his problems started, so now his narration is in his voice without the recent flaws. It's like your solution: write the email yourself, then use it to improve your writing.
 
Upvote
1 (1 / 0)
Who was the better architect, Albert Speer or Hermann Giesler?

It's a question so stupid it's evil. When the point is to reduce human suffering (and when is it not?), debating the merits of nicotine vs. asbestos only proves that past a certain point, idiocy becomes indistinguishable from malice. Those who believe absurdities inevitably commit atrocities.
 
Upvote
-1 (0 / -1)
After reading this article, I started using Gemini after having used GPT for about 8 months. In just a few days, I've dropped GPT altogether. Gemini gives more info, yet more condensed (4 paragraphs for Gemini vs. 10 for GPT).

It's just ... smarter. I tried configuring a flight game with both. Gemini recognized the option I had selected from the screenshot (Expert controls) and tailored its advice from there. It also suggested using the free-flight mode to practice, and offered to teach more advanced maneuvers once that was done. GPT didn't (with the same screenshot), and just talked about the difference between the "Arcade" and "Expert" controls.

When the game wouldn't show 1440p in the resolutions list, Gemini first wrote "yeah, known bug with that game. Switch to fullscreen, restart the game, should be fixed." And it was. Meanwhile, GPT just gave basic advice about Windows menus, like "make sure your display is set to 1440p."

I asked both: "tips for getting started with Blender? I will eventually use those models in Unreal Engine 5". Gemini first mentioned an industry standard (the donut), then a whole bunch of specs, then 3 plugins, then a "Blender to UE masterclass" video from YouTube.

GPT only mentioned the same specs.

I heard about an app for movie pee breaks and asked both about it. Gemini described six of the app's features, who it came from, and even named an actress who had said she loves using it. GPT described two features.
 
Upvote
1 (1 / 0)

JohnMeredith

Seniorius Lurkius
25
Subscriptor
what would introspection even mean for AI? i'm unclear what you mean by it in this context.

The ability to treat its own cognitive processes as a subject of symbolic analysis. The bit that Hofstadter of "Gödel, Escher, Bach" fame would call a strange loop, if you want to get philosophical.

Trivial example: a system monitor application checking the CPU temperature. Non-trivial example: an LLM being able to reliably identify whether an answer it previously gave was most attributable to training/tuning data, prompting/RAG, or creativity/hallucination.

Self-awareness, basically, but in the low, procedural sense of "hypocrites frequently have poor self-awareness" rather than the high, magical-consciousness sense of "Skynet became self-aware at 2:14 a.m. EDT on August 29, 1997". Calling it introspection is slightly less vulnerable to creative misunderstanding, if only because people will actually ask what it means here rather than jumping to conclusions :)
 
Upvote
0 (0 / 0)