I genuinely wonder what they can actually do with models that remain largely black boxes, other than add more background instructions to the initializing prompt to try to establish guardrails.
We know that even having the models 'reveal' their reasoning steps is hardly bulletproof, and one of the defining traits of these things (both good and bad) is that they're non-deterministic. There's no simple feature flag to toggle, no code to comment out.
It's also wild to me that they're asking businesses and investors to sign onto their platforms when hugely impactful behavioral shifts can just kinda...happen. This isn't the first time (remember the laziness thing?) and it won't be the last that LLM 'productivity' is adversely impacted for unclear reasons.
It's like betting everything on a specific horse and rider in a race, but the horse is known to have spurious, uh...outbursts. Sometimes on the track.