We had some suspicion something like this might be possible after exploring vector steering, where you could push a model by adding particular vectors at particular layers to, say, change its mood, or make it always bring up King George III, or whatever you like. I imagine this method is somewhat similar, if rather more advanced.
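For readers unfamiliar with vector steering: the core trick really is that simple. You add a fixed "steering vector" to a model's activations at one chosen layer during the forward pass, and downstream computation shifts accordingly. Here is a minimal sketch on a toy stack of linear+ReLU layers standing in for a transformer's residual stream; the steering vector here is just a made-up constant, whereas in practice it would be extracted from the model (e.g. as a difference of mean activations on contrasting prompts).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a transformer: a stack of random linear layers
# with ReLU. (Illustrative only -- real steering targets the
# residual-stream activations of an actual language model.)
dim, n_layers = 8, 4
weights = [rng.standard_normal((dim, dim)) * 0.5 for _ in range(n_layers)]

def forward(x, steer_layer=None, steer_vec=None):
    """Run the toy model, optionally adding a steering vector
    to the activations after one chosen layer."""
    for i, w in enumerate(weights):
        x = np.maximum(x @ w, 0.0)      # linear + ReLU
        if i == steer_layer:
            x = x + steer_vec           # the "steering" step
    return x

x = rng.standard_normal(dim)
baseline = forward(x)

# A hypothetical steering direction; in real use this would be
# derived from the model's own activations, not hand-picked.
v = np.full(dim, 0.5)
steered = forward(x, steer_layer=1, steer_vec=v)

print(np.linalg.norm(steered - baseline))  # nonzero: the output shifted
```

In a real setup the same intervention is typically done with a forward hook on one transformer block, so every generated token is computed under the shifted activations.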
However, this article is missing the most bemusing part of this project, where Anthropic taught an AI to conduct proper Maoist self-criticism.