Samsung allegedly boosting benchmark performance

Status
Not open for further replies.
The sad reality is that everyone does this. This is the primary reason it annoys me when people use SunSpider as a cross-platform benchmark: SunSpider these days is a test of how well your JS engine cheats at SunSpider, nothing more. nVidia started "optimizing" for 3DMark in their drivers almost a decade ago when their FX series struggled to compete legitimately, and today both companies do this sort of thing regularly and extensively - how do you think nVidia or AMD can release a driver that magically doubles performance in some new title? Heck, what do you think that 'free' marketing money that companies get for tacking a "Plays best on Intel" or "The Way It's Meant To Be Played" logo at the start of their game is really paying for?

This is, regrettably, the way of the industry now. And when journalists report results like SunSpider and GeekBench and GLBenchmark and make sweeping judgments of device performance based on them, they're only encouraging this sort of hackery.
 
Upvote
-14 (45 / -59)

dmsilev

Ars Tribunus Angusticlavius
7,267
Subscriptor
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005401#p25005401:qo43e0ci said:
ShlomoAbraham[/url]":qo43e0ci]Any theories as to why? If the chip is there, why not let other apps use it? Save battery life?

That'd be my guess. Set up a profile that runs the thing flat-out when it's in "benchmark the processor" mode and then throttles back for everything else to improve the battery life.
 
Upvote
35 (39 / -4)
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005401#p25005401:22sihxhm said:
ShlomoAbraham[/url]":22sihxhm]Any theories as to why? If the chip is there, why not let other apps use it? Save battery life?
I'd suspect they're ramping up the voltage as well - the chip might be able to handle short bursts of increased voltage and heat, but would have a shortened lifespan if used extensively.
 
Upvote
37 (40 / -3)
ATi and nVidia were both caught in the past doing this with video card benchmarks, too. I particularly remember one incident involving Quake 3 benchmark optimizations about twelve years ago where ATi was doing some specific optimizations that would only trigger when it detected the calling application was "quake3.exe"

Made it really easy to prove, too. Just rename the EXE and you'd see performance drop considerably.
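
For anyone who wants the mechanism spelled out, here is a rough Python sketch of name-based special-casing and why the rename test exposes it. This is not actual driver code: the function name `driver_profile`, the profile labels, and the renamed file are all hypothetical, and the only detail taken from the post is the "quake3.exe" check.

```python
import ntpath  # parses Windows-style paths on any platform

# Hypothetical list of executables the driver treats specially.
SPECIAL_CASED = {"quake3.exe"}


def driver_profile(executable_path: str) -> str:
    """Pick a rendering profile based only on the caller's file name."""
    name = ntpath.basename(executable_path).lower()
    return "tuned" if name in SPECIAL_CASED else "default"


# Same binary, two names: only the file name differs between the runs.
print(driver_profile(r"C:\Games\quake3.exe"))
print(driver_profile(r"C:\Games\quack3.exe"))
```

Because the decision depends on the name rather than on what the program does, renaming the file is enough to switch the profile, which is the whole test.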
 
Upvote
45 (47 / -2)

Adam Starkey

Ars Scholae Palatinae
1,039
Subscriptor
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005475#p25005475:3rqyvvhj said:
Firehawke[/url]":3rqyvvhj]ATi and nVidia were both caught in the past doing this with video card benchmarks, too. I particularly remember one incident involving Quake 3 benchmark optimizations about twelve years ago where ATi was doing some specific optimizations that would only trigger when it detected the calling application was "quake3.exe"

Made it really easy to prove, too. Just rename the EXE and you'd see performance drop considerably.

Yeah, that was pretty douchey, but in a way not quite as obnoxious as this. At least in that case one could make the argument that a well-written application *could* get that performance out of those video cards. With a combination of good drivers, APIs, and game engine code, those figures were potentially reachable. By contrast, there's no way a Samsung device is going to run like the benchmark rigging suggests, as the devices would most likely run uncomfortably hot and suffer much higher rates of in-warranty failure.

All in all, asdf25's comment pretty much sums it up.
 
Upvote
39 (43 / -4)
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005521#p25005521:36d6t1a7 said:
Adam Starkey[/url]":36d6t1a7]
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005475#p25005475:36d6t1a7 said:
Firehawke[/url]":36d6t1a7]ATi and nVidia were both caught in the past doing this with video card benchmarks, too. I particularly remember one incident involving Quake 3 benchmark optimizations about twelve years ago where ATi was doing some specific optimizations that would only trigger when it detected the calling application was "quake3.exe"

Made it really easy to prove, too. Just rename the EXE and you'd see performance drop considerably.

Yeah, that was pretty douchey, but in a way not quite as obnoxious as this. At least in that case one could make the argument that a well-written application *could* get that performance out of those video cards. With a combination of good drivers, APIs, and game engine code, those figures were potentially reachable. By contrast, there's no way a Samsung device is going to run like the benchmark rigging suggests, as the devices would most likely run uncomfortably hot and suffer much higher rates of in-warranty failure.

All in all, asdf25's comment pretty much sums it up.

I look at it this way: It definitely proves that benchmark cheating is widespread and that you should ALWAYS take this stuff with a grain-- perhaps even an entire shaker-- of salt. I do believe this is good reason to call Samsung on the carpet to explain themselves, though. There's no excuse for this kind of blatant bullshit.
 
Upvote
31 (33 / -2)
This in a way reinforces my gripe with the mobile market. Companies aren't as interested in making excellent consumer products as they are making sales.

I suppose it's a chicken and egg thing except making a good phone which will build good will is a lot slower than attacking the spec sheet and getting immediate sales.

I think Samsung have demonstrated that they prefer the latter: win the contest on tech specs, make the sales and abandon the phone when the next one nears release.
 
Upvote
31 (32 / -1)

dnjake

Ars Tribunus Militum
2,519
This is a two-edged sword. I don't know of anything that went this far. But I do know from my days as a professional that it was common to put a large focus on how well a product did on measurable benchmarks that had little to do with how the product was used in the real world. Companies take that kind of approach because it serves their marketing. It serves their marketing because customers make buying decisions based on that kind of benchmark. The complaints about Samsung would be more impressive if users were reporting that their products did not work satisfactorily on the applications that they actually use them for. Presumably there are some games where this kind of spec could make at least a little difference. But it is hard to imagine much else.
 
Upvote
-10 (10 / -20)

Adam Starkey

Ars Scholae Palatinae
1,039
Subscriptor
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005537#p25005537:k088ww83 said:
Firehawke[/url]":k088ww83]
I look at it this way: It definitely proves that benchmark cheating is widespread and that you should ALWAYS take this stuff with a grain-- perhaps even an entire shaker-- of salt. I do believe this is good reason to call Samsung on the carpet to explain themselves, though. There's no excuse for this kind of blatant bullshit.

I totally agree. The auto industry sending out ringers to magazines to test is pretty much the accepted norm. It's wrong, and if no-one calls them out on it, then there's no incentive for anyone to play fair.

Unfortunately, this'll rattle around the tech-sphere irritating a handful of nerds who already understand that those benchmarks are useful only as a cheap way of irritating other fanboys. The average Joe will know nothing about any of this; all he'll know is that the guy at Best Buy reliably informed him that the S4 smoked everything in its path when magazines benchmarked it, and Candy Crush will therefore look AWESOME!!! Job done, Samsung. :(
 
Upvote
21 (22 / -1)
The sad part is that the blame is going to land on the scapegoats from their R&D/engineering departments. The idiots from Upper Management who ordered this act of stupidity on the other hand are going to get away with it.

As I said before, Samsung is a company filled with some of the most amazing engineering and design teams the world has ever seen. Sadly, all those smarts and creativity go down the drain because of the idiot bosses.
 
Upvote
-8 (13 / -21)

aiken_d

Ars Tribunus Militum
2,038
The funny thing about this is that it probably didn't do them any good at all. iOS users prioritize UX over benchmarks. WP8 users work for Microsoft. So this is just competitive against other Android devices, for the narrow segment of people who look at benchmarks... Who were probably all going to buy Samsung anyway. Why risk the PR black eye?

Did it sell even one more phone? We've got a lot of highly technical Android users here... did any of you change a purchasing decision based on these benchmarks?
 
Upvote
11 (27 / -16)

Adam Starkey

Ars Scholae Palatinae
1,039
Subscriptor
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005587#p25005587:2i5v92qc said:
Sixclaws[/url]":2i5v92qc]The sad part is that the blame is going to land on the scapegoats from their R&D/engineering departments. The idiots from Upper Management who ordered this act of stupidity on the other hand are going to get away with it.

Samsung is not in the habit of eating crow. My guess is that no-one's going to be made a scape-goat for anything.

As I said before, Samsung is a company filled with some of the most amazing engineering and design teams the world has ever seen. Sadly, all those smarts and creativity go down the drain because of the idiot bosses.

Accusing Samsung's bosses of being idiots is seriously misguided.
 
Upvote
5 (12 / -7)

TheFLP

Ars Praetorian
427
Subscriptor++
Curious to know which two people downvoted every single suggestion that Samsung is underclocking for battery life considerations, and why they bothered to crawl out from under their rocks.

Valuing battery life over performance is a smart thing to do in a phone. On the other hand, manipulating the benchmarks is dishonest and therefore stupid for anyone who values integrity. I guess that makes it a wash.
 
Upvote
-11 (24 / -35)

Adam Starkey

Ars Scholae Palatinae
1,039
Subscriptor
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005607#p25005607:2jzsagrr said:
aiken_d[/url]":2jzsagrr]The funny thing about this is that it probably didn't do them any good at all. iOS users prioritize UX over benchmarks. WP8 users work for Microsoft.

And Best Buy customers just want something to help them feel good about an arbitrary choice.

This seems to need repeating in almost every thread here, so I'll step up this time: you and I are reading Ars, we do not represent the broader market.
 
Upvote
19 (20 / -1)

F22Rapture

Wise, Aged Ars Veteran
193
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005475#p25005475:3m1o2p0w said:
Firehawke[/url]":3m1o2p0w]ATi and nVidia were both caught in the past doing this with video card benchmarks, too. I particularly remember one incident involving Quake 3 benchmark optimizations about twelve years ago where ATi was doing some specific optimizations that would only trigger when it detected the calling application was "quake3.exe"

Made it really easy to prove, too. Just rename the EXE and you'd see performance drop considerably.

Was that specific to the benchmark itself though? Making game-specific optimizations isn't really cheating. Cheating would be putting in extra effort to make the benchmark run better than the game does (optimizing a JS engine for Sunspider), or unlocking extra hardware functions not available to anything else (as Samsung seems to have done), or some other hack which misleads the benchmark (such as ATi reducing default image quality and accuracy to get a higher FPS).
 
Upvote
8 (11 / -3)
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005607#p25005607:1an206p9 said:
aiken_d[/url]":1an206p9]Did it sell even one more phone? We've got a lot of highly technical Android users here... did any of you change a purchasing decision based on these benchmarks?

I would say that it doesn't necessarily have to have influenced an end user. As the article says, Ars uses these benchmarks, and Ars makes recommendations; if the benchmark influences Ars's recommendations, then the benchmark score will indirectly influence anyone who is influenced by Ars. The same applies to any tech publication.

Car analogy works here. Motoring journos tend to be quite a fickle bunch and quite often rev heads. If they are loaned a car which is unrepresentative of the fleet, ie special tune or selected to make sure there aren't any creaks or funny noises, then that review isn't exactly representative of the normal situation.

Michelin restaurant reviews have the right approach. Come in unannounced, pay cash and don't say who they are. Once you put aside the reviewer's inherent bias or slant (some people like things others don't), you can be sure that the review reflects a faithful experience.
 
Upvote
17 (17 / 0)

Tyler X. Durden

Ars Tribunus Angusticlavius
9,166
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005415#p25005415:1svkoork said:
dmsilev[/url]":1svkoork]
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005401#p25005401:1svkoork said:
ShlomoAbraham[/url]":1svkoork]Any theories as to why? If the chip is there, why not let other apps use it? Save battery life?

That'd be my guess. Set up a profile that runs the thing flat-out when it's in "benchmark the processor" mode and then throttles back for everything else to improve the battery life.
Exactly, which means they know damn well that, when push comes to shove, the speed vs. battery trade-off they are using for everything else is what the vast majority of users will enjoy, but that for a non-zero number of [potential] customers the e-peen of benchmarks moves product.

I'm not sure which is sadder of the two, the e-peen attitude or blatant deception used to play to it.
 
Upvote
6 (7 / -1)
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005441#p25005441:1a6n4nud said:
charleski[/url]":1a6n4nud]
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005401#p25005401:1a6n4nud said:
ShlomoAbraham[/url]":1a6n4nud]Any theories as to why? If the chip is there, why not let other apps use it? Save battery life?
I'd suspect they're ramping up the voltage as well - the chip might be able to handle short bursts of increased voltage and heat, but would have a shortened lifespan if used extensively.
If this is the case - and I'm guessing you're correct and it is - then I'm not sure what the problem is.

I'm the first to jump on the bash-Android bandwagon, and these days Samsung is pretty analogous to Android, but I don't see the problem here. It makes sense for manufacturers to design chips with theoretical peak performance, but limit the hardware from operating (at least regularly) at that peak. Why would a benchmark not test the theoretical limit of a processor?

Everyday usability is measured in other ways. Which is why I moved away from Android after owning two devices. They were the "most powerful" when I bought them, but the experience was garbage. Are there any objective benchmarks out there which measure UX?

EDIT: cleaned up words, added explanation/answer to my confusion:

From fuzzyfuzzyfungus (below):

This Samsung configuration, though, was slightly different: If (and only if) specific benchmark .apks were running, the CPU governor would lock the CPU into an otherwise unavailable maximum frequency mode. If anything other than those benchmarks, no matter how demanding it might be, was running, normal frequency behavior applied. That's essentially false advertising on their part.
 
Upvote
-13 (12 / -25)

Tyler X. Durden

Ars Tribunus Angusticlavius
9,166
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005645#p25005645:3dc2zwzb said:
F22Rapture[/url]":3dc2zwzb]
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005475#p25005475:3dc2zwzb said:
Firehawke[/url]":3dc2zwzb]ATi and nVidia were both caught in the past doing this with video card benchmarks, too. I particularly remember one incident involving Quake 3 benchmark optimizations about twelve years ago where ATi was doing some specific optimizations that would only trigger when it detected the calling application was "quake3.exe"

Made it really easy to prove, too. Just rename the EXE and you'd see performance drop considerably.

Was that specific to the benchmark itself though? Making game-specific optimizations isn't really cheating. Cheating would be putting in extra effort to make the benchmark run better than the game does (optimizing a JS engine for Sunspider), or unlocking extra hardware functions not available to anything else (as Samsung seems to have done), or some other hack which misleads the benchmark (such as ATi reducing default image quality and accuracy to get a higher FPS).
If it is the incident I'm thinking of, I'm pretty sure it involved pre-loading and maintaining in cache certain textures that showed up in a demo sequence in those games that was commonly used for benchmarking (because it was a consistent use and exercising of the game engine that could be duplicated over and over).

So the optimizations would really only work for that particular map and to a certain extent that path.
 
Upvote
12 (12 / 0)
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005641#p25005641:2td7zlq9 said:
Adam Starkey[/url]":2td7zlq9]This seems to need repeating in almost every thread here, so I'll step up this time: you and I are reading Ars, we do not represent the broader market.

However, how many people listen to your advice regarding technology? And how will this affect the perceptions of those who aren't particularly loyal to Samsung? Just recently we've heard about the Galaxy S4 Active's issues with water and today we hear about this. In the long run, a continuing pattern of this kind of behavior could hurt Samsung.
 
Upvote
6 (7 / -1)
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005705#p25005705:yyrvb1q4 said:
thebonafortuna[/url]":yyrvb1q4]
I'm the first to jump on the bash-Android bandwagon, and these days Samsung is pretty analogous to Android, but I don't see the problem here. It makes sense for manufacturers to design chips with theoretical peak performance, but limit the hardware from operating (at least regularly) at that peak. Why would a benchmark not test the theoretical limit of a processor?

The issue is that the 'theoretical limits' of the processor change specifically when the phone recognizes that certain benchmarks are running, in a way that is entirely unavailable at other times.

A CPU frequency governor, as standard on virtually everything for years now, is the 'honest' implementation of this behavior: if you do something that eats CPU time, CPU frequency increases. If usage drops, the CPU decreases frequency, and sometimes also cuts voltage or even puts part of itself to sleep. Nothing wrong with that, totally sensible to use headroom when you need it and go to sleep when not needed.

This Samsung configuration, though, was slightly different: If (and only if) specific benchmark .apks were running, the CPU governor would lock the CPU into an otherwise unavailable maximum frequency mode. If anything other than those benchmarks, no matter how demanding it might be, was running, normal frequency behavior applied. That's essentially false advertising on their part.
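
To make the difference concrete, here is a minimal Python sketch of the two behaviors described above. Everything in it is an assumption for illustration: the frequencies are made-up round numbers, the package name is a placeholder, and real governors live in the kernel and are far more involved than a linear scale.

```python
# Hypothetical numbers and names, chosen only to illustrate the idea.
MIN_MHZ = 300
NORMAL_MAX_MHZ = 1200   # ceiling available to ordinary apps
BOOST_MAX_MHZ = 1600    # ceiling reserved for whitelisted benchmarks
BENCHMARK_PACKAGES = {"com.example.benchmark"}  # placeholder package name


def honest_governor(load: float) -> int:
    """Scale frequency with CPU load, whatever app is running."""
    load = min(max(load, 0.0), 1.0)  # clamp load to the range 0..1
    return round(MIN_MHZ + load * (NORMAL_MAX_MHZ - MIN_MHZ))


def rigged_governor(load: float, foreground_package: str) -> int:
    """Same scaling, except whitelisted benchmarks get a locked boost."""
    if foreground_package in BENCHMARK_PACKAGES:
        return BOOST_MAX_MHZ  # locked at a frequency nothing else can reach
    return honest_governor(load)


# A demanding game at full load versus a nearly idle benchmark app.
print(rigged_governor(1.0, "com.example.game"))
print(rigged_governor(0.1, "com.example.benchmark"))
```

In this sketch the game can never exceed the normal ceiling no matter how hard it pushes, while the benchmark gets the boosted frequency even when it is barely doing anything. The input that decides the outcome is the app's identity, not its workload.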
 
Upvote
33 (34 / -1)

Infrasound

Smack-Fu Master, in training
52
Happens with video games. The whole "save battery" argument aside (which I get is a good thing), you have something which should reflect overall performance, i.e. the benchmark is meant to give a reasonably honest and transparent reflection of the hardware by pushing it in all directions and into the corners.

Raw clockspeed isn't the be-all and end-all of performance it was even five years ago, and if you don't understand why that's so then enjoy buying a new phone every 21.8 months when your provider will waive the rest of the contract because they're generous.

Should they do this? I'm on the fence. MS did it with their own apps and got taken to the cleaners for it a few years back, and Nvidia and ATI do it with benchmark software. The end result, however, is still always going to be that you get what you pay for. The companies who make these devices have nailed down cost per unit and expected volumes, so as some have said, by the time you get your handset it's all marketing.
 
Upvote
0 (1 / -1)

truepusk

Ars Tribunus Militum
1,746
Subscriptor
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005461#p25005461:jjfo0sr0 said:
atomo[/url]":jjfo0sr0]
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005443#p25005443:jjfo0sr0 said:
MacsAre1[/url]":jjfo0sr0]Two words: Battery life.

One word: Marketing.

Obviously marketing....
I think he was responding to the silly questions/comments above (context).
 
Upvote
4 (4 / 0)

truepusk

Ars Tribunus Militum
1,746
Subscriptor
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005529#p25005529:34iucxus said:
pbrice68[/url]":34iucxus]Samsung has always been trash. This discovery is not at all surprising.

I remember before Apple switched over to Intel for its Mac processors, a dual G4 came out, probably around 2000/2002. Apple faked/gamed the hell out of the benchmarks to make it look like the system could compete with and defeat Windows/Intel systems. That said, at that time no one bought into it, so it's easier to feel burnt here.

Also, as some knowledgeable posters have been pointing out, there are several recent examples of this from other companies. Kudos to Anand and the original trailblazers that raised the alarm bell. So many have been caught with their hand in the cookie jar over the years and recently it's hard to get too outraged.
 
Upvote
-12 (10 / -22)

truepusk

Ars Tribunus Militum
1,746
Subscriptor
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005641#p25005641:v9t7jooh said:
Adam Starkey[/url]":v9t7jooh]
This seems to need repeating in almost every thread here, so I'll step up this time: you and I are reading Ars, we do not represent the broader market.

Best Buy probably isn't the broader market anymore. I don't think many analysts think it will be long before they go the way of Circuit City and those who don't are probably betting on Best Buy to be able to make major changes to adapt.

Furthermore, regarding Ars and the broader market, I have a hair to split. Maybe Ars isn't the broader market, but more and more it seems like Ars, in its staff, articles, readership, and comments, is representing the lowest common denominator, especially compared to a decade ago. I'm sure part of it is me becoming old, but from what used to be an Ivy League techy staff to what they have now post-acquisition, and from a lot of insightful, open-minded commentators to what I see now... That aside, kudos to this article and many other bright spots that still keep me coming back.
 
Upvote
-1 (4 / -5)
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005779#p25005779:383feedy said:
fuzzyfuzzyfungus[/url]":383feedy]
[url=http://meincmagazine.com/civis/viewtopic.php?p=25005705#p25005705:383feedy said:
thebonafortuna[/url]":383feedy]
I'm the first to jump on the bash-Android bandwagon, and these days Samsung is pretty analogous to Android, but I don't see the problem here. It makes sense for manufacturers to design chips with theoretical peak performance, but limit the hardware from operating (at least regularly) at that peak. Why would a benchmark not test the theoretical limit of a processor?

The issue is that the 'theoretical limits' of the processor change specifically when the phone recognizes that certain benchmarks are running, in a way that is entirely unavailable at other times.

A CPU frequency governor, as standard on virtually everything for years now, is the 'honest' implementation of this behavior: if you do something that eats CPU time, CPU frequency increases. If usage drops, the CPU decreases frequency, and sometimes also cuts voltage or even puts part of itself to sleep. Nothing wrong with that, totally sensible to use headroom when you need it and go to sleep when not needed.

This Samsung configuration, though, was slightly different: If (and only if) specific benchmark .apks were running, the CPU governor would lock the CPU into an otherwise unavailable maximum frequency mode. If anything other than those benchmarks, no matter how demanding it might be, was running, normal frequency behavior applied. That's essentially false advertising on their part.
Ahh, that makes sense. Thank you for the detailed response. I learn something new every day on this site.
 
Upvote
4 (4 / 0)