.NET/SQL Optimization: SQL-side vs. code-side

svdsinner

Ars Legatus Legionis
15,142
Subscriptor
Question: You are comparing two ways to compute something expensive:
  1. Computing the result on the SQL server and fetching the answer via SqlClient.
  2. Bringing the SQL data into memory via SqlClient and doing the computation in .NET code.
Beyond execution time, how do you factor in that #2 will use more memory and put more pressure on the garbage collector? (That wonderful aspect of .NET where, when GC kicks in, your app just hangs for a bit and you can't control when that happens.)

Example: if #1 finishes in 2 seconds and #2 finishes in .4 seconds but sends 1.6MB into GC each run, how do you decide whether that extra 1.6MB of GC pressure is worth it for the faster execution time?

Assume: this expensive calculation is happening 10-100 times/minute and the server has plenty of memory and cores to handle the load.
Assume: this expensive calculation is faster to compute in .NET than in T-SQL. (Otherwise this is all moot.)
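One way to put numbers on both sides of the trade-off is to measure wall time and allocated bytes around each run with `GC.GetTotalAllocatedBytes`. A minimal sketch; `ComputeInMemory` here is a made-up stand-in for the .NET-side calculation, not the OP's actual workload:

```csharp
using System;
using System.Diagnostics;

long before = GC.GetTotalAllocatedBytes(precise: true);
var sw = Stopwatch.StartNew();
long result = ComputeInMemory(100_000);
sw.Stop();
long allocated = GC.GetTotalAllocatedBytes(precise: true) - before;

Console.WriteLine($"result={result}, elapsed={sw.ElapsedMilliseconds}ms, allocated≈{allocated / 1024} KiB");

// Hypothetical stand-in for approach #2: the fetched rows live in memory
// and the aggregate is computed in .NET.
long ComputeInMemory(int rows)
{
    var data = new long[rows];                 // simulates the materialized result set
    for (int i = 0; i < rows; i++) data[i] = i;
    long sum = 0;
    foreach (var d in data) sum += d;
    return sum;
}
```

The absolute numbers depend on the machine, but the point is that the 2s-vs-0.4s figure and the 1.6MB-per-run figure can both come out of the same harness, so you're comparing measurements rather than guesses.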
 

rain shadow

Ars Tribunus Angusticlavius
6,360
Subscriptor++
1.6MB is not a ton of GC pressure unless you're on an old or underpowered machine of some kind.

Precompute the result and store it on the server, or have the client start computing the value before the user actually needs it.

Or spawn a separate .NET process that computes the value and nothing else; it will probably be done before it ever needs a GC.

Also, background and concurrent GC have been available for a few years now. If the client still freezes from time to time, maybe look into those.
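For reference, on modern .NET the GC flavor is selected via project properties (or equivalently via runtimeconfig.json). A sketch of the csproj settings being alluded to here:

```xml
<PropertyGroup>
  <!-- Server GC: per-core heaps, better throughput on multi-core server boxes. -->
  <ServerGarbageCollection>true</ServerGarbageCollection>
  <!-- Background/concurrent GC; on by default in recent runtimes, shown for clarity. -->
  <ConcurrentGarbageCollection>true</ConcurrentGarbageCollection>
</PropertyGroup>
```

Whether server GC actually reduces the pauses the OP describes depends on the workload; it trades memory footprint for throughput, so it's worth measuring rather than assuming.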
 
if #1 finishes in 2 seconds and #2 finishes in .4 seconds but sends 1.6MB into GC each run
I seriously question whether they were doing the same work.

If you can query and fetch all the data in LESS time remotely than performing it on the server in a well-structured SQL query, what pathological issue are you dealing with?
 

fitten

Ars Legatus Legionis
54,717
Subscriptor++
If you can query and fetch all the data in LESS time remotely than performing it on the server in a well-structured SQL query, what pathological issue are you dealing with?
I was wondering the same thing. There are other things I would consider, too. For example, we may need to include the data transfer time. SQL Server should be able to do some things pretty fast and then send a small result set to the client. If every query sends all the data over to the client for calculations, that's going to add up.

10-100 times per minute at 1.6MB isn't too bad for today's networks, but the OP doesn't say how many clients are running or where they are. Across 100 clients you'll start getting to a point where someone might notice, and it's probably not something you'd want to do to just any random client(s) out in the world.

And then we might want to consider what data we are sending to a client. Sending over aggregate data / summaries / etc. is one thing, but I would think twice if I were sending over confidential / personal data.

IME, SQL Server is pretty good at computation (aggregates, etc.), and if the server itself is beefy (we would generally have pretty big on-prem machines... many cores, gobs of memory... or host it in the cloud and scale it as you need), then you don't have to worry about all the clients being high(er)-end machines, either.
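To make the transfer-time point concrete, here's a back-of-envelope sketch. All numbers are assumptions built from the OP's figures, not measurements: 1.6MB per run, 100 runs/minute, 100 clients for approach #2, versus roughly 100 bytes per run for a server-side aggregate:

```csharp
using System;

// Rough model: MB moved per minute = MB per run × runs/minute × clients.
double MegabytesPerMinute(double mbPerRun, int runsPerMinute, int clients)
    => mbPerRun * runsPerMinute * clients;

double codeSide = MegabytesPerMinute(1.6, 100, 100);     // OP's worst case, 100 clients
double sqlSide  = MegabytesPerMinute(0.0001, 100, 100);  // ~100 bytes per aggregate row

Console.WriteLine($"code-side: {codeSide:F0} MB/min, sql-side: {sqlSide:F2} MB/min");
```

That works out to about 16GB/minute (roughly 2 Gbit/s sustained) for the code-side version, which is the kind of number that makes "someone might notice" an understatement.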
 

Apteris

Ars Tribunus Angusticlavius
9,395
Subscriptor
Example: if #1 finishes in 2 seconds and #2 finishes in .4 seconds but sends 1.6MB into GC each run, how do you decide whether that extra 1.6MB of GC pressure is worth it for the faster execution time?
By thinking about what is making your stakeholders the most unhappy at the moment, and improving on that particular pain point.

I read your question as "#1 is a bit slower, but #2 is a bit more annoying because of the GC, which should I choose?" We can't answer that; we don't have the context you have. Perhaps I'm misunderstanding your question; let me know.

You may find this relevant: Implementing SLOs -- chapter 2 of the Google SRE book.
 

Apteris

Ars Tribunus Angusticlavius
9,395
Subscriptor
If you can query and fetch all the data in LESS time remotely than performing it on the server in a well-structured SQL query, what pathological issue are you dealing with?
I've run into this with recursive queries. The data was stored in a relational database, but in effect it was a tree with thousands of nodes. The SQL query would construct the tree progressively by following the foreign-key relations, and once finished would ask questions of the final structure.

Downloading the data in one go and then using Java code on the server (the programming language doesn't matter much here) to filter and query it proved an order of magnitude faster (at least; this was some time ago), and it was easier to understand as well.
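A minimal C# sketch of that shape, with a made-up edge list standing in for the real data: fetch the parent→child rows once with a flat query, index them in memory, and then answer tree questions in code instead of running a recursive CTE per question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical parent->child rows, as they might come back from one flat query.
var rows = new (int Parent, int Child)[] { (1, 2), (1, 3), (2, 4), (2, 5), (3, 6) };

// Index the edges once; every subsequent question is an in-memory traversal.
var children = rows
    .GroupBy(r => r.Parent, r => r.Child)
    .ToDictionary(g => g.Key, g => g.ToList());

int CountDescendants(int node)
{
    if (!children.TryGetValue(node, out var kids)) return 0;
    return kids.Count + kids.Sum(CountDescendants);
}

Console.WriteLine(CountDescendants(1));   // node 1 has 5 descendants in this sample
```

The one-time download plus dictionary build replaces the server re-walking the foreign keys on every query, which is where the order-of-magnitude difference tends to come from.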