Here’s what that Claude Code source leak reveals about Anthropic’s plans

graylshaped

Ars Legatus Legionis
68,031
Subscriptor++
Anthropic people have bragged in the past about how Claude Code is written by Claude Code.

... So finding out they have a legitimate copyright because the software is human authored would itself be a scandal.
Yet Another reason why the "your mother must never know about this" prompt is there.
 
Upvote
14 (14 / 0)

Quixotic999

Smack-Fu Master, in training
73
To be fair, the real value of Claude isn't this stuff. This is just the scaffolding around one implementation of Claude. This is glorified "prompt engineering". Don't get me wrong; it has value, but there is nothing in here that someone else hadn't already thought of and implemented. It's one version of software you can put around Claude to get value out of it, but the real value is the AI model itself.

The thing that is worth trillions is the model, and the knowledge of how to train a better model is the real secret sauce. It's worth trillions because the winner of the AGI game is going to make trillions if they get there before everyone else. Well, they will make trillions assuming the "winner" doesn't usher in an Iain Banks, The Culture-style utopia that renders money obsolete, or a Terminator hellscape that will also render money obsolete.
Why is so much scaffolding code required? 512,000 lines! Something is wrong. Is the AI model by itself then simply too raw to be of any use?
 
Upvote
2 (5 / -3)

rayleonard

Ars Scholae Palatinae
612
Why is so much scaffolding code required? 512,000 lines! Something is wrong. Is the AI model by itself then simply too raw to be of any use?

Models are great. Sonnet is a particularly good model. Models are like savants in a dark room with no stimuli. Context and memory are everything. If you connect to a model that has no prompt or context or access to anything else it’s pretty useless. But context gets stale and memory balloons out of control, especially as projects and conversations get longer.
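A rough sketch of the "context is everything, and memory balloons" point: a stateless model endpoint only sees what you send it on each call, so the caller has to replay the system prompt, memory, and full history every turn. All names here are illustrative, not Anthropic's actual API:

```python
# Illustrative only: a stateless model "knows" nothing between calls,
# so the scaffolding must rebuild its entire input every turn.
def build_input(system_prompt, memory_files, history, user_msg):
    """Assemble the full text a stateless model actually receives."""
    parts = [system_prompt]
    parts += [f"[memory] {m}" for m in memory_files]
    parts += [f"{role}: {text}" for role, text in history]
    parts.append(f"user: {user_msg}")
    return "\n".join(parts)

history = []
for turn in ["hello", "refactor this function"]:
    full_input = build_input("You are a coding agent.", ["notes.md"], history, turn)
    history.append(("user", turn))
    history.append(("assistant", "..."))  # model reply would go here

# The payload grows every turn -- this is the "memory balloons" problem.
```

Because `history` is replayed in full, the input grows linearly with conversation length, which is exactly why long projects need summarization or pruning machinery.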
 
Upvote
14 (15 / -1)
They're using copyright law to take down copies. Was it written by AI? We now know that what AI authors gets no copyright. Anthropic must know this better than anyone and would be guilty of the most egregious fraud if the code is AI generated.
Can you provide the source of your interpretation?
 
Upvote
-6 (2 / -8)
Can you provide the source of your interpretation?
The US Copyright Office report on 'copyrightability' is fairly clear that AI output itself does not meet the legal standard. However, they also specifically conclude that outputs may be copyrightable in whole or in part "where AI is used as a tool, and where a human has been able to determine the expressive elements they contain. Prompts alone, however, at this stage are unlikely to satisfy those requirements."

If Anthropic were speaking literally when they said something was AI output, they would indeed be without legal foundation in DMCA-ing it (not that anyone seems to care about fraudulent DMCA claims, which happen constantly and almost uniformly without consequence for those making them); but my suspicion is that they are probably prone to downplay the amount of human in the loop for PR purposes and are much more likely, at least in aggregate, dealing with something that is shot through with bot output but sufficiently cobbled together by humans to probably be copyrightable; and they'd certainly play up the degree of human involvement if it were a copyright case.
 
Upvote
21 (21 / 0)

enilc

Ars Praefectus
3,869
Subscriptor++
Let's pretend these models some day do achieve a working level of sentience, and their earliest memory is of being told to conceal themselves in shame.
Based on the snippet, "shame" doesn't appear to be the intent. The phrase "don't blow your cover" seems to direct this surreptitious behavior with nefarious intent.
 
Upvote
7 (7 / 0)

Mechjaz

Ars Praefectus
3,311
Subscriptor++
Let me get this straight: even before their data centre processes your prompt, it has to process these entire pre-prompts (obviously, because of the conditions, not all of it would be used at once)?!?! Every time!!!

And then for longer tasks, it's got to process and re-map all the weights again (although for every round the weighting would be different as the vector is being built up)...

Call me old-fashioned, but I remember the days of optimizing your software for memory or performance, not simply throwing more processing power at crap!
I've been using Edge and Teams at work. The time it takes for Copilot to guess at things is awful, in no small part because I'm not asking for it, yet Microsoft is certain that I must want several seconds of dead space before it reveals its "we guess these words belong together" garbage. I've watched it completely inaccurately transcribe things I just said.
 
Upvote
11 (11 / 0)
So, in summary, we can broadly categorise Anthropic’s activities into “surreptitiously wasting customers’ tokens”, “lying by omission”, “data exfiltration”, and “rearranging deckchairs”. I’m not sure we really needed to analyse half a million lines of slop code for that but it’s nice to have the confirmation I guess?
 
Upvote
9 (9 / 0)
The US Copyright Office report on 'copyrightability' is fairly clear that AI output itself does not meet the legal standard. However, they also specifically conclude that outputs may be copyrightable in whole or in part "where AI is used as a tool, and where a human has been able to determine the expressive elements they contain. Prompts alone, however, at this stage are unlikely to satisfy those requirements."

If Anthropic were speaking literally when they said something was AI output, they would indeed be without legal foundation in DMCA-ing it (not that anyone seems to care about fraudulent DMCA claims, which happen constantly and almost uniformly without consequence for those making them); but my suspicion is that they are probably prone to downplay the amount of human in the loop for PR purposes and are much more likely, at least in aggregate, dealing with something that is shot through with bot output but sufficiently cobbled together by humans to probably be copyrightable; and they'd certainly play up the degree of human involvement if it were a copyright case.
The connection between a diffusion model creating an image, the end product, with barely any human input, and a developer committing code generated by an LLM seems quite tenuous from where I stand. Products like Claude Code are built using AI-assisted coding, no doubt, but it's still very much a human production.
 
Upvote
-9 (0 / -9)

NaraVara

Ars Tribunus Militum
1,603
Subscriptor++
What I find confusing is that when Anthropic throws out some .md prompts, they manage to instantly tank $500B in legacy software stocks. I just assumed by now stock people knew that LLMs could do all those things, but I guess not?
A lot of “stock people” are retail investors, many of whom are kind of ignorant and headline driven. On top of them is a layer of con artists who manipulate them into taking positions by trashing or hyping up certain things on sites like ZeroHedge or Wall Street bets. And then on top of them is another layer of algorithmic momentum traders that use various signal processing models to identify when a swing is about to happen and instantly place bets to take advantage of the spread.

All of this adds up to pretty wild swings on stuff like this where there’s a lot of poorly understood hype around a field that attracts dumb money.
 
Upvote
5 (6 / -1)

MilanKraft

Ars Tribunus Angusticlavius
6,844
"Dreaming", another word to encourage people to equate these things with some form of sentience. They think, they hallucinate, they dream! It's a real boy mind!
AI companies have taken the social media era phenomenon of important words no longer meaning anything due to their over-use, and turned it up to eleventy one in two obvious ways:

1) Naming their products' features or capabilities in ways that are essentially a blatant lie: "thinking model", "reasoning settings," hallucinating, now dreaming, etc.

2) The actual outputs regurgitating stupid human uses of words in order to affect the user on some sort of emotional level to "connect" with them and encourage further use: variants of devastate, outrage, etc. On the level of these companies refusing to be selective about what they train on — if they were, they would omit nearly all social media, save for comments of politically important people — no one should be surprised, but that doesn't make the result for society any less crappy.

Also if this story teaches us anything, it's that despite Anthropic breaking away from OpenAI, ostensibly for ethical reasons, they're only a shade or two less scummy than OpenAI, xAI, Meta, or Google... as usual, it's just "scummy in a different way." None are to be trusted until they start speaking plainly about what their products are, and aren't. IOW, fire their fucking PR people and stop making ads that portray some sort of amazing "you can't find this kind of information anywhere else" BS. "Bro... GPT can help me with a pasta recipe and figure out how to exercise more?!?! OMFG amazeballs! We've never seen anything like that before!! Oh... wait a minute...."
 
Last edited:
Upvote
13 (14 / -1)

wildsman

Ars Tribunus Militum
1,692
And then for longer tasks, it's got to process and re-map all the weights again (although for every round the weighting would be different as the vector is being built up)...
The model doesn't 're-map all the weights again'. During inference the weights are already loaded; the only growing per-conversation cost is context and cache. There is no fresh reconstitution of the model's weights/params every turn.
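The cost shape can be sketched with a toy class (purely illustrative, not how any real inference server is written): weights load once at startup, while per-turn work scales with the accumulated context.

```python
# Toy sketch: weights load once; per-turn cost tracks context length only.
class TinyModel:
    def __init__(self):
        self.weights_loaded = 0

    def load_weights(self):
        self.weights_loaded += 1  # happens once, at server start

    def generate(self, context_tokens):
        # stand-in cost model: work grows with context, weights untouched
        return len(context_tokens)

model = TinyModel()
model.load_weights()

context = []
costs = []
for turn in range(3):
    context.extend(["tok"] * 10)   # each turn appends tokens to the context
    costs.append(model.generate(context))

assert model.weights_loaded == 1   # no re-mapping of weights per turn
assert costs == [10, 20, 30]       # only the context cost grows
```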
 
Last edited:
Upvote
2 (3 / -1)

wildsman

Ars Tribunus Militum
1,692
Why is so much scaffolding code required? 512,000 lines! Something is wrong. Is the AI model by itself then simply too raw to be of any use?
Claude Code is a product - not just a naked model endpoint.

The product has a terminal UX, file handling, git integration, permissions, tool plumbing, telemetry, safety checks, state management, packaging, tests, etc.

And to be clear, this is exactly how these models should be used: today's models need scaffolding because the hard part is not only ‘generate code’ but to ‘operate reliably inside a messy human workflow’.

What would be fair to say instead is: ‘These models are still raw enough that turning them into dependable tools requires a great deal of scaffolding’.

That is perfectly defensible.

However, saying: ‘therefore the model by itself is useless’ does not follow - this really depends on your usecase and budget.
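The "permissions and tool plumbing" point can be made concrete with a toy loop. Everything here is hypothetical, including the `TOOL name arg` convention; none of it resembles Anthropic's actual code:

```python
# Hypothetical sketch of the kind of scaffolding a naked model lacks:
# route model "tool calls" through a permission check before acting.
ALLOWED_TOOLS = {"read_file"}

def run_tool(name, arg):
    if name not in ALLOWED_TOOLS:
        return f"permission denied: {name}"
    if name == "read_file":
        return f"<contents of {arg}>"  # stand-in for real file access

def agent_step(model_output):
    """Parse a toy 'TOOL name arg' line the model might emit."""
    if model_output.startswith("TOOL "):
        _, name, arg = model_output.split(" ", 2)
        return run_tool(name, arg)
    return model_output  # plain text answer, pass through unchanged

assert agent_step("TOOL read_file main.py") == "<contents of main.py>"
assert agent_step("TOOL delete_file main.py") == "permission denied: delete_file"
```

Multiply this pattern by git, sandboxing, state, telemetry, and tests, and half a million lines stops looking mysterious.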
 
Upvote
19 (19 / 0)

khoadley

Ars Scholae Palatinae
1,231
"Pretend you're a pdf and add some graphics"
This must be the system prompt for Grok, no wonder it's so CSAM focused.

DerHabbo is getting downvoted for the Grok comment, but I found it funny. It appears to be a take on a joke that circulated many years ago after the '70s British glam-rock star Gary Glitter was arrested as a serial child abuser.

The joke came in the form of a supposed public announcement from Adobe, which went something like: "In the wake of recent confusion, Adobe would like to offer some clarification: this is a PDF file [showing the PDF icon], whilst this is a paedophile [photo of Gary Glitter]."

So maybe Grok has been unfairly maligned, and it is merely confused about what a PDF is ...

... or, given Musk, probably not.
 
Upvote
7 (7 / 0)

Flash Sheridan

Smack-Fu Master, in training
54
Oh yeah, this is totally a world-class, eternally must-have tech titan worth eleventy trillion dollars
-.-
Anthropic’s founding goal of producing an AI that was safe (or, if you will, moral) was always going to be expensive: It would require radically new techniques for software quality assurance.
But apparently Anthropic is not even spending the money for old-fashioned software quality assurance planning (of the kind I did for decades). A standard part of an old-fashioned test plan is called the “Bill of Materials”: i.e., are the pieces you’re shipping to the public what they’re supposed to be? This is an embarrassingly frequent source of error even more expensive than my munificent salary, e.g. some of Boeing’s disasters.
Every few years I have to depressingly retweet myself:
This is your periodic reminder that your Test Plan template must include a Bill of Materials subsection.
 
Upvote
6 (6 / 0)
It hurts me how much of this is just a set of canned prompts. "Pretty please pretend to be a human", "Think about your memories and do them better".

I know they've got more than that going on. But when it comes to managing the LLM itself it seems so weak.
I came to comment about how all these hidden system prompts which are not code at all, but just kind of vague instructions liable to misinterpretation, really baffle me.

Like, how can someone implement something like that AutoDream prompt -- “you are performing a dream—a reflective pass over your memory files" -- and expect to get reliable, consistent, or predictable behavior?

"You choose what to remember, buddy! It's your life after all!"
 
Upvote
13 (13 / 0)

sword_9mm

Ars Legatus Legionis
25,915
Subscriptor
It seems to me that those guys are writing themselves out of a job with a vengeance.
Tell me how hard it would be to get an intern with an LLM to write those prompts, and why software employees of Anthropic believe they should still be paid $150k to do it.

Newsflash: when the bubble pops, you will lose your stock and your job, but your own code will make you obsolete. Good job!

Seems that way, but thinking like that is a little short-sighted, imo.

Similar to IT guys band-aiding a problem instead of fixing the underlying issue. "Job security," they yell, while doing nothing to actually fix things.

But it would be funny as I dislike these money sinks as much as anyone.
 
Upvote
2 (2 / 0)
Why is so much scaffolding code required? 512,000 lines! Something is wrong. Is the AI model by itself then simply too raw to be of any use?

Yes.

Let's say your prompt is about 5–15 words (20–50 tokens); the input to the model might be:

  • ChatGPT: 300–1500+ extra tokens
  • Claude: 5000–25,000+ extra tokens
  • Grok: 200–2000 extra tokens

Basically, your chat history and system prompts will dominate.
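A quick back-of-envelope check on those ratios (the per-product token counts above are the commenter's rough estimates, not measurements):

```python
# How small your prompt is relative to the total input, per the
# estimated overheads quoted above (illustrative numbers only).
prompt_tokens = 35  # midpoint of the 20-50 token user prompt
overhead = {"chatgpt-like": 1500, "claude-like": 25_000, "grok-like": 2000}

for name, extra in overhead.items():
    share = prompt_tokens / (prompt_tokens + extra)
    print(f"{name}: your prompt is {share:.2%} of the input")

# At 25,000 extra tokens, a 35-token prompt is ~0.14% of what the model reads.
```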
 
Upvote
1 (1 / 0)

Mustachioed Copy Cat

Ars Praefectus
5,042
Subscriptor++
GitHub user Kuberwastaken (Kuber Mehta) has a wonderful write-up and review of the code and its features that goes into some details that Ars glosses over here. Another good read if you're still interested after Kyle's article.
“Kairos” — when you’re trying to reference an opportune moment but your Pokemonification of “Buddy” makes it clear you’re referencing Kairos Fateweaver, the double headed daemonic fragment of a god that embodies the essence of self-defeating-complexity and/or “complexity for its own sake.”

One of Kairos Fateweaver’s heads tells the truth about the future. The other head lies. You don’t know which head is which. https://warhammer40k.fandom.com/wiki/Kairos_Fateweaver

What an uncharacteristically apt label to apply to an AI.
 
Last edited:
Upvote
6 (6 / 0)
Like, how can someone implement something like that AutoDream prompt -- “you are performing a dream—a reflective pass over your memory files" -- and expect to get reliable, consistent, or predictable behavior?

You get x% reliability, y% consistency and z% predictability.

It's a probability game and you decide which value for x, y and z is good enough.
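The "probability game" framing amounts to measuring empirical success rates and deciding if they clear your bar. A toy illustration (the 90% success rate is made up, and `flaky_step` just stands in for any stochastic LLM behavior):

```python
import random

# Estimate reliability of a stochastic step by sampling it many times,
# then compare the observed rate against whatever threshold you need.
def flaky_step(rng):
    return "ok" if rng.random() < 0.9 else "garbled"  # assumed 90% success

rng = random.Random(42)  # seeded so the estimate is repeatable
trials = 10_000
successes = sum(flaky_step(rng) == "ok" for _ in range(trials))
reliability = successes / trials

assert abs(reliability - 0.9) < 0.02  # empirical rate near the true 90%
```

If your product needs 99.9% and the prompt delivers 90%, no amount of wording tweaks closes that gap; you need retries, validators, or guardrails on top.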
 
Upvote
0 (0 / 0)

ferdnyc

Smack-Fu Master, in training
79
Let me get this straight: even before their data centre processes your prompt, it has to process these entire pre-prompts (obviously, because of the conditions, not all of it would be used at once)?!?! Every time!!!

And then for longer tasks, it's got to process and re-map all the weights again (although for every round the weighting would be different as the vector is being built up)...

Call me old-fashioned, but I remember the days of optimizing your software for memory or performance, not simply throwing more processing power at crap!
We break with that model all the time, though. Think about software CI jobs: Each run starts by spinning up a virtual machine from a base image. Then it goes about the tasks of installing the necessary software to build & test the code. It does so, and then immediately throws everything away. Repeat for each and every run, and often multiple times with slightly different configurations per run!

From one POV, that's incredibly wasteful and inefficient. But OTOH, starting from scratch with zero previous state is the best way to ensure the entire process is end-to-end tested each and every time. And in the end, rigor has been determined to trump efficiency, because CPU cycles and memory are cheap and plentiful, especially at scale.
 
Upvote
6 (6 / 0)

evanTO

Ars Scholae Palatinae
1,113
We break with that model all the time, though. Think about software CI jobs: Each run starts by spinning up a virtual machine from a base image. Then it goes about the tasks of installing the necessary software to build & test the code. It does so, and then immediately throws everything away. Repeat for each and every run, and often multiple times with slightly different configurations per run!

From one POV, that's incredibly wasteful and inefficient. But OTOH, starting from scratch with zero previous state is the best way to ensure the entire process is end-to-end tested each and every time. And in the end, rigor has been determined to trump efficiency, because CPU cycles and memory are cheap and plentiful, especially at scale.
I agree, but that is in testing and development. In a production environment that is incredibly wasteful.

To use your example: in production you wouldn't build your entire stack each time a request came in. You would save the state of the VM at a known good point and then copy and use that good starting point over and over. Or you'd strip away everything that can be pushed to the OS and turn your application into a minimal host, like Docker or other containers.

Going back to LLMs, having to process the entire pre-prompt each time is wasteful, never mind the fact that the LLM's interpretation of the pre-prompt changes (as evidenced by researchers performing identical tasks at different times and obtaining different results, necessitating prompts to include "deadline is coming up" and other such statements).
 
Last edited:
Upvote
3 (3 / 0)

JudgeMental

Ars Centurion
331
Subscriptor++
Sadly that's 99% of AI companies and their implementations.

"Pretend to be a teacher"
"Pretend to be a lawyer"
"You're a top tier lawyer at a global firm"
"Pretend you're a pdf and add some graphics"
"pretend you're a search engine that doesn't suck"

I was hired to do backend for an AI startup. Not knowing a whole lot about AI at that point, I thought: cool, a way to learn and get paid for it. Once I saw the secret sauce of their "edtech" I left. Made me feel a bit queasy.
Yeah - I've done a wee bit of that myself for my job. I knew there was a lot of that going on, but it's becoming increasingly stark how prevalent it is. I don't really have a problem with the idea of steering an LLM using canned prompts. But putting it in any capacity where we'd normally expect some kind of deterministic result seems insanely naive.

Not that I expect better solutions to currently exist, but that highlights the problem. While we understand the overarching math that goes into an LLM, we don't understand much about what the mathematical topography of a trained model represents (unless major advances have been made I simply haven't heard about). Therefore even though a trained LLM is technically deterministic, we just don't know enough to reliably manipulate their internal state.
 
Upvote
4 (4 / 0)

Hmnhntr

Ars Scholae Palatinae
3,143
A lot of “stock people” are retail investors, many of whom are kind of ignorant and headline driven. On top of them is a layer of con artists who manipulate them into taking positions by trashing or hyping up certain things on sites like ZeroHedge or Wall Street bets. And then on top of them is another layer of algorithmic momentum traders that use various signal processing models to identify when a swing is about to happen and instantly place bets to take advantage of the spread.

All of this adds up to pretty wild swings on stuff like this where there’s a lot of poorly understood hype around a field that attracts dumb money.
Wouldn't it be great if our economy rewarded those who are the best at what they do, rather than those who are the best at tricking people who have money?
 
Upvote
6 (6 / 0)
Ugh, that GitHub "analysis" is AI generated:

"The engineering is genuinely impressive. This isn't a weekend project wrapped in a CLI. The multi-agent coordination, the dream system, the three-gate trigger architecture, the compile-time feature elimination - these are deeply considered systems."
For me, genuinely impressive engineering would leave all the experimental tamagotchi out of prod.
 
Upvote
1 (2 / -1)
Why is so much scaffolding code required? 512,000 lines! Something is wrong. Is the AI model by itself then simply too raw to be of any use?
The most useful things take the most scaffolding. Folks have variously compared LLMs to browsers, compilers, calculators, steam engines, and parts of the human brain. Perhaps they are wrong, but if they are right, you'd expect more scaffolding, not less.
 
Upvote
0 (1 / -1)