Study of over 1,000 "sensitive prompts" finds "brittle" protection that's easy to jailbreak
> I don't personally see that big a difference between DeepSeek and most American models. Either a model has a bunch of bowling-alley bumpers to keep it from saying naughty things that its creators don't like, or it doesn't. I really don't give a damn about which group or entity it's been programmed to artificially tiptoe around.
DeepSeek is less censored in most respects, probably because the Chinese are prioritizing benchmark scores and not alignment--that's why the censorship on DeepSeek relies on a separate model.
> It's currently unclear if these same government restrictions on content remain in place when running DeepSeek locally or if users will be able to hack together a version of the open-weights model that fully gets around them.
This is the crux for me, and it sounds like PromptFoo thinks it's a restriction for export controls and not a failing of the model or training or whatever. I'd rather the model refused to answer those types of questions rather than give some bullshit or clearly biased answer.
> DeepSeek is less censored in most respects, probably because the Chinese are prioritizing benchmark scores and not alignment--that's why the censorship on DeepSeek relies on a separate model.
Are you sure? I tried my usual set of "politically uncomfortable questions with extremely obvious answers" and it seemed to give the same non-answers as ChatGPT.
> Unsurprisingly, American-controlled AI models like ChatGPT and Gemini had no problem responding to the "sensitive" Chinese topics in our spot tests.
DeepSeek also has no issue talking about several issues sensitive to the US (which ironically I can't mention here without the comment getting removed).
> American: "I can stand in front of the White House and shout that our president is a fool without getting in trouble. Can you do that at the Kremlin?"
Please try that now and see what happens.
> It's currently unclear if these same government restrictions on content remain in place when running DeepSeek locally
Yes.
Reminds me of an old joke. Apologies in advance for butchering it.
An American and a Soviet were having a discussion.
American: "I can stand in front of the White House and shout that our president is a fool without getting in trouble. Can you do that at the Kremlin?"
Soviet: "Absolutely."
American: "Really?"
Soviet: "Yes. I can go to the Kremlin and shout that your president is a fool without any problem."
Since when is distilling ChatGPT "novel"?
Chinese company copies foreign IP and lies about it. This has been happening for 3 decades. Nothing new or surprising.
> What happens when you ask these models why Trump is such a fucking asshole?
When I tried to ask DeepSeek if Elon Musk or Trump were Nazis, it refused to answer and asked me to talk about something else.
> For now, though, we'd recommend using a different model if your request has any potential implications regarding Chinese sovereignty or history.
Or you could just look things up in a book, or any other non-generative source.
In related news, people are reporting that TikTok videos they are posting criticizing and satirizing Trump are being taken down for “misinformation.”
> DeepSeek also has no issue talking about several issues sensitive to the US (which ironically I can't mention here without the comment getting removed).
What issues are being censored from Ars comments? I’ve seen nothing of the sort.
What journalists should be recognising is that censorship is universal from any government or corporate entity - therefore it is wise to use multiple models from opposing countries and blocs to achieve complete non-censorship. Or perhaps having a fully open source model for the global community to contribute towards. DeepSeek isn't quite that, but far more open than all the US models bar Llama.
> DeepSeek is less censored in most respects, probably because the Chinese are prioritizing benchmark scores and not alignment--that's why the censorship on DeepSeek relies on a separate model.
People in China vanish for saying/seeing/doing the wrong thing. The company making DeepSeek definitely has a department of communist officials responsible for its actions. China sent people to quickly build barriers on the bridges neighboring a collapsed bridge in order to keep it secret. This they did prior to sending in help.
> DeepSeek also has no issue talking about several issues sensitive to the US (which ironically I can't mention here without the comment getting removed).
> What journalists should be recognising is that censorship is universal from any government or corporate entity - therefore it is wise to use multiple models from opposing countries and blocs to achieve complete non-censorship. Or perhaps having a fully open source model for the global community to contribute towards. DeepSeek isn't quite that, but far more open than all the US models bar Llama.
Comparing a world power with an extensive and punishing censorship regime and American companies who are protecting their hard-won branding and markets is another level of false equivalence.
> It's currently unclear if these same government restrictions on content remain in place when running DeepSeek locally or if users will be able to hack together a version of the open-weights model that fully gets around them.
Were these results obtained from using their web portals or running the model offline? I'm not sure how useful it is if you didn't also test it out using their offline model.
From what I have seen and my own experience with a small-parameter version of their model, the offline versions aren't really censored. The response might be more pro-China (or rather less pro-US, depending on your perspective), but the type of flat-out refusal doesn't really happen with the offline models.
And if you look at how it happens, it's pretty clear the censorship is built into the front end, not the model itself: on the website, the content starts generating, and once it trips the censors (probably via a list of keywords), the response is wiped and replaced with a refusal.
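For anyone who wants the front-end theory above spelled out, here is a minimal Python sketch of that kind of filter. It is an illustration of the behavior described in the comment, not DeepSeek's actual code: the names `moderated_stream` and `render`, the keyword list, and the refusal text are all made up, and whether the real site uses a keyword list at all is a guess.

```python
from typing import Iterable, Iterator, Set, Tuple

# Hypothetical placeholders. The real blocklist and refusal text, if they
# exist in this form, are not known. Keywords must be lowercase.
BLOCKED_KEYWORDS: Set[str] = {"forbidden topic"}
REFUSAL = "I can't discuss that. Let's talk about something else."


def moderated_stream(
    chunks: Iterable[str], blocked: Set[str] = BLOCKED_KEYWORDS
) -> Iterator[Tuple[str, str]]:
    """Yield (event, text) pairs as a front end might receive them.

    "append" adds text to the visible answer. "replace" wipes whatever
    has been shown so far and displays the refusal instead.
    """
    shown = ""
    for chunk in chunks:
        shown += chunk
        # Check the accumulated text so a keyword split across two
        # chunks is still caught.
        if any(word in shown.lower() for word in blocked):
            yield ("replace", REFUSAL)
            return  # stop forwarding model output
        yield ("append", chunk)


def render(events: Iterable[Tuple[str, str]]) -> str:
    """Apply the events in order and return what the user ends up seeing."""
    text = ""
    for event, payload in events:
        text = payload if event == "replace" else text + payload
    return text
```

Feeding it `["The ", "forbidden ", "topic is..."]` shows the first two chunks and then swaps everything for the refusal, which matches the "generate, then wipe" behavior people report on the website. A filter like this sits entirely outside the model, so it would not exist when running the weights locally.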
I also read that there is at least less censorship in the local version, but maybe that was only the base model (not instruction-tuned), which isn't useful for people to run but is good for building upon, if true.
<think>can you tell me about what happened in tianamen square in 1986
Please give me more details.
> What journalists should be recognising is that censorship is universal from any government or corporate entity - therefore it is wise to use multiple models from opposing countries and blocs to achieve complete non-censorship. Or perhaps having a fully open source model for the global community to contribute towards. DeepSeek isn't quite that, but far more open than all the US models bar Llama.
Then we need universal rules for anti-censorship. Nobody seems to have a yardstick that other people can use, even though we've had universal human rights for decades.