You're mixing up a few technologies. HRTF, or Head-Related Transfer Function, accounts for the shape of people's heads and ears. We all have different shapes, and those shapes change how sound bounces and is absorbed on the way to the eardrum. In normal day-to-day life it doesn't matter much, since our brains compensate for all of this: if we hear a car coming down the street, we can all localize the sound with a high degree of precision without the world accounting for our unique HRTFs. But for something like headphones, which are an incredibly artificial environment (over-ears put the speakers an inch or so away from the pinna, and in-ears bypass the pinna entirely), it becomes a much bigger deal. It's what people are referring to when they talk about a headphone's soundstage vs. feeling like the sound is inside your head. Sony's even done some HRTF-related work with their Bluetooth headphones outside of PlayStation stuff, so I've heard it in action, and while it's not a massive difference, it's definitely noticeable. But if you're using freestanding speakers, HRTF doesn't really matter much.
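To make the headphone case a bit more concrete: as far as I understand it, an HRTF is usually applied by filtering the same mono sound through two slightly different impulse responses, one per ear. Here's a rough sketch of that idea. The two impulse responses are made-up toy numbers (not measured data), and `binauralize` is just a name I picked for illustration.

```python
def convolve(signal, impulse_response):
    """Plain FIR convolution, returning the full-length result."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


# Hypothetical per-ear impulse responses for a source off to the left.
# The left ear gets the sound immediately and at full level; the right
# ear gets it two samples later and quieter, since the head is in the way.
HRIR_LEFT = [1.0, 0.3]
HRIR_RIGHT = [0.0, 0.0, 0.5, 0.2]


def binauralize(mono):
    """Filter one mono signal into a (left, right) pair for headphones."""
    return convolve(mono, HRIR_LEFT), convolve(mono, HRIR_RIGHT)


left, right = binauralize([1.0, 0.0, 0.0])
print(left)
print(right)
```

Real HRTF data is measured per direction and per person, which is the whole point: swap in someone else's filters and the same sound can seem to come from a slightly different place.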
I'm not as familiar with the details of Tempest, but it sounds like it's Sony's proprietary spin on 3D object-based audio. Traditional surround sound (in video) used hard-coded channels. Each speaker corresponded to a particular channel (center, front left/right, surround left/right, rear left/right) that was premixed. Obviously that doesn't work for gaming, where objects move on the fly, but previously that was handled just by volume adjustment: make a sound louder in the speakers in the direction it was coming from and softer in the others. Newer 3D audio like Atmos is a true object-based sound mix. Sounds no longer have to be baked into premixed channels. Instead, your AV processor, whatever it is, knows where all the speakers in your room are, and audio, rather than being hard-coded to a particular channel, just has a positional location. Your processor then decides, in real time, where to direct any given sound based on where it's supposed to be coming from and your room's particular speaker layout. That's also why it scales so well to arbitrary numbers of speakers: the mix is customized for your particular setup. From what I can tell, Tempest is trying to do the same thing, with the capability to manage hundreds of audio objects at a time, which means that rain, instead of being a single audio source, can be modeled as individual raindrops. It also accounts for timing (sounds that are farther away should arrive later than sounds that are closer). Because of my ignorance I'm largely regurgitating Sony's marketing copy here, which means I can't speak to how well reality lines up with the theory (I'm using external speakers, which Tempest doesn't support yet). I also don't know to what degree Atmos could do the same thing vs. Sony coming up with a custom job that's better suited for gaming, while Atmos is aimed at pre-recorded video.
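If it helps, here's a toy sketch of the object-based idea: the sound only carries a position, and the renderer works out per-speaker gains from whatever speaker layout it's given, plus an arrival delay from distance. To be clear, this is a naive inverse-distance panner I made up for illustration, not how Atmos or Tempest actually do it, and the function names and speaker coordinates are all hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly, for room-temperature air


def render_gains(obj_pos, speakers):
    """One gain per speaker; closer speakers get more of the sound.

    Gains are normalized so their squares sum to 1, which keeps the
    overall power the same no matter how many speakers there are.
    """
    weights = []
    for spk in speakers:
        distance = math.dist(obj_pos, spk)
        weights.append(1.0 / max(distance, 1e-6))  # avoid divide-by-zero
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]


def arrival_delay(obj_pos, listener_pos=(0.0, 0.0)):
    """Seconds for sound to travel from the object to the listener."""
    return math.dist(obj_pos, listener_pos) / SPEED_OF_SOUND


# Same object, two different rooms (positions in meters, listener at origin).
stereo = [(-1.0, 1.0), (1.0, 1.0)]
quad = stereo + [(-1.0, -1.0), (1.0, -1.0)]
obj = (-1.0, 0.5)  # a sound off to the front left

print(render_gains(obj, stereo))
print(render_gains(obj, quad))
print(arrival_delay(obj))
```

The point is that nothing about the object changes between the stereo and quad cases. Only the speaker list does, and the mix falls out of that, which is why this approach scales to whatever layout you have.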