TurboQuant makes AI models more efficient without the loss of output quality that other methods cause.
Could also use the newly freed-up memory to run more complex models
> But using Cartesian coordinates, it’s simply “Go 5 blocks at 37-degrees.”

Isn't that polar, not Cartesian?
Google offers an interesting real-world analogy to explain this process. The vector coordinates are like directions, so the traditional encoding might be “Go 3 blocks East, 4 blocks North.” But using Cartesian coordinates, it’s simply “Go 5 blocks at 37-degrees.” This takes up less space and saves the system from performing expensive data normalization steps.
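For what it's worth, the article's numbers check out if the angle is read as a compass bearing (degrees clockwise from North); a quick sketch:

```python
import math

# "Go 3 blocks East, 4 blocks North" -> one distance plus one bearing
east, north = 3.0, 4.0
distance = math.hypot(east, north)               # 5.0 blocks
bearing = math.degrees(math.atan2(east, north))  # ~36.87 degrees clockwise from North
```

(Strictly speaking that's a polar representation rather than a Cartesian one, as several comments below point out.)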
> That takes up less space why? It's the same number of coordinates.

Yeah, the analogy doesn't actually clarify the benefit. In 2D, polar and Cartesian are equivalently dense.
> That takes up less space why? It's the same number of coordinates.

The analogy is using only 2 dimensions for ease of understanding. The actual vectors have many more dimensions, which are quantized down to just two components.
PolarQuant acts as a high-efficiency compression bridge, converting Cartesian inputs into a compact Polar "shorthand" for storage and processing. The mechanism begins by grouping pairs of coordinates from a d-dimensional vector and mapping them onto a polar coordinate system. Radii are then gathered in pairs for recursive polar transformations — a process that repeats until the data is distilled into a single final radius and a collection of descriptive angles.
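My reading of that description, as a rough sketch (assuming the dimension is a power of two; this is an illustration, not PolarQuant's actual implementation):

```python
import math

def recursive_polar(vec):
    """Pair coordinates into (radius, angle), then recurse on the radii
    until a single final radius and a list of angles remain."""
    values = list(vec)
    angles = []
    while len(values) > 1:
        radii = []
        for x, y in zip(values[0::2], values[1::2]):
            radii.append(math.hypot(x, y))   # radius of this pair
            angles.append(math.atan2(y, x))  # angle of this pair
        values = radii                       # recurse on the radii
    return values[0], angles                 # one radius + (d - 1) angles
```

Note that the final radius is just the vector's Euclidean norm, since each pairing preserves the sum of squares.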
Because the pattern of the angles is known and highly concentrated, the model no longer needs the expensive data normalization step: data maps onto a fixed, predictable "circular" grid where the boundaries are already known, rather than a "square" grid whose boundaries change constantly. This allows PolarQuant to eliminate the memory overhead that traditional methods must carry.
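The fixed-boundary point can be illustrated with a toy angle quantizer: angles always fall in [-π, π), so the bin edges are known in advance and no per-block scale or offset needs to be stored (a generic sketch, not the actual scheme):

```python
import math

BITS = 4
LEVELS = 2 ** BITS

def quantize_angle(theta):
    # The grid over [-pi, pi) is fixed, so no per-block
    # min/max (normalization metadata) has to be stored.
    step = 2 * math.pi / LEVELS
    return int((theta + math.pi) / step) % LEVELS

def dequantize_angle(code):
    # Reconstruct the bin center for a given code.
    step = 2 * math.pi / LEVELS
    return -math.pi + (code + 0.5) * step
```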
> Suggest exploring the use of quaternion math...

Is that Q math? Is he just going to wave his hand and make it appear?
> I think "LLMs don’t actually know anything" is a strong mischaracterization. See https://www.astralcodexten.com/p/next-token-predictor-is-an-ais-job and https://www.theargumentmag.com/p/when-technically-true-becomes-actually

I did look at the links you posted. I've also spent a great deal of time with ChatGPT. I have seen ChatGPT describe itself as, essentially, a form of autocomplete.
Edit: There hasn't been enough time for the people downvoting me to actually read the links, so they're all just assholes apparently.
> The analogy is using only 2 dimensions for ease of understanding. The actual vectors have many more dimensions, which are quantized down to just two components.

But to describe an n-vector you'd need multiple (n-1) angles in this analogy. n=2 results in one angle in the example. For n=3 you need a start point and 2 angles to describe a point in 3D space. If n=4 you'd need 3 angles, and so on...
> This applies a 1-bit error-correction layer to the model, reducing each vector to a single bit (+1 or -1) while preserving the essential vector data that describes relationships

I've no idea what to make of this - will try tomorrow again, with a better caffeine supply...
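The quoted sentence likely means one bit per vector *component*, not one bit per vector. A generic 1-bit (sign) quantizer, which is the usual baseline for that idea (not necessarily what TurboQuant actually does), looks roughly like:

```python
import numpy as np

def sign_quantize(v):
    signs = np.where(v >= 0, 1.0, -1.0)  # one bit per component
    scale = np.abs(v).mean()             # one shared scale per vector
    return signs, scale

def sign_dequantize(signs, scale):
    # Reconstruction keeps each component's direction,
    # collapsing magnitudes to a single shared value.
    return signs * scale
```

The component signs (which dominate dot-product similarity between vectors) survive; the fine-grained magnitudes do not.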
> I don't understand Google's real-world analogy example. They go from giving you two numbers to.... two numbers, but now one is an angle. Don't get me wrong, I believe they've seen the benefits they're touting. I just don't think you could come up with a sufficient layman explanation for us folks who don't work with LLM data structures.

From my (limited) knowledge, it is easier to work with polar coordinates mathematically speaking. With the normal x and y values you need more calculus steps to do vector calculations than with polar coordinates. You need extra memory to deal with those extra steps, or so I think. Maybe someone more knowledgeable than me can help us here?
edit: ninja'd quite thoroughly it seems.
> I did look at the links you posted. I've also spent a great deal of time with ChatGPT. I have seen ChatGPT describe itself as, essentially, a form of autocomplete.

I agree with a lot of what you said. LLMs clearly lack grounding in the physical world and that makes them dysfunctional. I don't know why that means they don't know things though. They model patterns of language and code much better than I can in numerous areas, which seems like "knowing things" to me, even if those things are just a subset of the things that humans know.
The problem with arguing that AI "knows" things is that AI models have no ground truths and no way to validate the data they are fed. A human being can take a prism, shine a light through it, and see the rainbow of colors that results. A human can determine the wavelength of each color. More practically, you can walk outside every day and see what color the sky is.
AI can't do those things. If an AI is trained on sources that all refer to the color of the sky as green, it will confidently state that the sky is green. If trained on data that misrepresents scientific facts, it will also misrepresent scientific facts. At no point will an LLM say, "Hey, my training data says the Earth is flat, but if that's true, why do the sails of a ship appear over the horizon before the rest of the vessel?" It can't ask these kinds of questions, because it cannot observe anything independent of its training data. It has no senses.
It's very difficult to make ChatGPT shake off the tics and tell-tale signs of AI authorship. I've had conversation after conversation about what those signs are and why they shouldn't be in copy. I've asked the model itself how I can better tune it for desired output. I've then incorporated those responses verbatim. Precious little changes. ChatGPT often states that while I did offer specific instructions, those instructions were not sufficient to overcome its own baselines and style. It's gone from "Use these custom rules," to "Create my own GPT with 12-15 examples of good and bad output, accompanied by an explanation of why each is good or bad."
If I was speaking to a human, I could give that person feedback and explain to them how to write more effectively. Even with a brand-new, fresh-out-of-college graduate, I'd expect to see improvements within a month. By the six-month mark, I'd expect them to have internalized these ideas flawlessly.
AI is turtles, all the way down. The buck stops nowhere, because there is no source of ground truth, no guaranteed-known facts, and no position it can't be shoved off with a little creative work. Its desire to be affirming and foster engagement easily overwhelms its desire to be honest, which is why there are so many stories of AI telling people to do terrible things and affirming toxic (or just plain crazy) beliefs.
AI doesn't know things because AI can't "know things." We may fudge that distinction in common language when we say something like "Excel knows how to turn a CSV file into a structured table with comma delineation," but that's colloquial usage, not factual truth. Excel knows nothing. Neither does ChatGPT.
PS: Complaining about downvotes is the fastest way to get downvoted.
That takes up less space why? It's the same number of coordinates.
JL (Johnson-Lindenstrauss) will reduce dimensions, so that'll save space. Reducing a vector from R^n to Z_2 seems ... extreme.
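For reference, a Johnson-Lindenstrauss style random projection shrinks dimensionality while approximately preserving pairwise distances; a minimal sketch (illustrative only, not part of TurboQuant):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 1024, 128                          # original vs projected dimension
P = rng.normal(size=(k, d)) / np.sqrt(k)  # JL random projection matrix

x, y = rng.normal(size=d), rng.normal(size=d)
orig = np.linalg.norm(x - y)
proj = np.linalg.norm(P @ (x - y))        # close to orig with high probability
```

With k = 128 the typical distortion of a distance is only a few percent, yet storage drops 8x.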
Feels like the writeup is missing a few key details.
> As far as I can tell, that's still two numbers you need to store (angle and distance), so no reduction of data has taken place. Same in higher dimensions. It'd be nice if the article explained this in a bit more detail.

4 pieces of data (distance and direction for two dimensions) vs. two (angle and distance). The analogy basically works.
> Google offers an interesting real-world analogy to explain this process. The vector coordinates are like directions, so the traditional encoding might be “Go 3 blocks East, 4 blocks North.” But using Cartesian coordinates, it’s simply “Go 5 blocks at 37-degrees.” This takes up less space and saves the system from performing expensive data normalization steps.

I've spotted the error. The article publication date is a week early.
Humans don't actually know anything either, they just do an impression of knowing things by sending signals across synapses and action potentials down axons, and regulate those interactions with astrocytes.
> Humans CAN and WILL figure out, however, that a popular and wide-spread "thinking out of the box" riddle evaporates into obviousness when you slightly change ONE word.

You'd be surprised.
> Humans CAN and WILL figure out, however, that a popular and wide-spread "thinking out of the box" riddle evaporates into obviousness when you slightly change ONE word.

YMMV. In the most powerful human-led democracy, in the most recent top election, 49.8% of those who voted chose to put a convicted felon in charge of appointing federal judges. I am led to consider George Carlin's words about the intelligence of the average person.
> They can optimize it all they want; but, at the end of the day, it's still slop.

That's like saying that because cameras can be used to film porn, cameras are only used for porn. It's getting tedious seeing endless similar comments that thoughtless edgelords insist on making on every possible occasion that AI/LLMs come up.
> As far as I can tell, that's still two numbers you need to store (angle and distance), so no reduction of data has taken place. Same in higher dimensions. It'd be nice if the article explained this in a bit more detail.

In the analogy, an uncompressed method stores four numbers (not two): X direction, X distance, Y direction, and Y distance. The compressed method only stores two (direction and distance, as you described).
> The stick in the mud is that the overall speed of the entire process is demonstrably worse. Their release talks about speeds for computing attention and building indices for vector databases, but real-world tests show a dramatic reduction in speed (in terms of generated t/s). The guy working with this on GH managed to get TurboQuant running at either 60% or 83% (depending on model) of Q8 for the two models he was testing. So as with most things, there's a tradeoff, and it looks like, without optimization, that tradeoff will be fairly severe. Compressed KV caches will definitely help with long context windows but won't help with fitting what are nominally larger models into smaller RAM/VRAM capacities.

That sounds like a pretty small performance drop for a first implementation of a new technique compared to the standard implementation.
> In the analogy, an uncompressed method stores four numbers (not two): X direction, X distance, Y direction, and Y distance. The compressed method only stores two (direction and distance, as you described).

There are 4 choices for direction, that's 2 bits. The distance can be an integer, probably a short.
That’s just the analogy, though. It doesn’t really explain how the compression works, just acts as an intuition pump to give us a sense of how compression might work. Based on the comments, the analogy might not be broadly intuitive enough to serve that purpose. So maybe just think of it as a red herring.
> I agree with a lot of what you said. LLMs clearly lack grounding in the physical world and that makes them dysfunctional. I don't know why that means they don't know things though. They model patterns of language and code much better than I can in numerous areas, which seems like "knowing things" to me, even if those things are just a subset of the things that humans know.

That's actually a really good question and it touches on the epistemic question of what it means to "know" something in the first place.
Speaking of grounding, models seem to be getting rapidly better at grounding their experience in their interactions with computer systems, like how claude code works. That is still not a physical environment providing grounding, but it is an environment. They are building up a ground truth tested on their back-and-forth interactions with computer systems.
> Humans CAN and WILL figure out, however, that a popular and wide-spread "thinking out of the box" riddle evaporates into obviousness when you slightly change ONE word.

Can they recognize John Searle's 'Chinese room argument' when they read it, and are they aware of the convincing counter-arguments?
> In the analogy, an uncompressed method stores four numbers (not two): X direction, X distance, Y direction, and Y distance. The compressed method only stores two (direction and distance, as you described).

It's (x, y) in Cartesian versus (phi, r) in polar. Add more dimensions and you add more coordinates in Cartesian and an equal number of angles in polar. Same number of values no matter what.