For as long as AI Large Language Models (LLMs) have been around (well, for as long as modern ones have been accessible online, anyway) people have tried to coax the models into revealing their system prompts. The system prompt is essentially the model’s fundamental directives on what it should do and how it should act. Such healthy curiosity is rarely welcomed, however, and creative efforts at making a model cough up its instructions are frequently met with a figurative glare and stern tapping of the Terms & Conditions sign.
Anthropic have bucked this trend by making system prompts public for the web and mobile interfaces of all three incarnations of Claude. The prompt for Claude Opus (their flagship model) is well over 1,500 words long, with different sections specifically for handling text and images. The prompt does things like help ensure Claude communicates in a useful way, taking into account the current date and an awareness of its knowledge cut-off (that is, the date after which Claude has no knowledge of events). There’s some stylistic stuff in there as well, such as Claude being specifically told to avoid obsequious-sounding filler affirmations, like starting a response with any form of the word “Certainly.”
While Claude’s source code (and, more importantly, its training data and resulting model weights) remains under wraps, Anthropic have been rather more forthcoming than others when it comes to sharing other details about inner workings, for instance by showing how human-interpretable features and concepts can be extracted from LLMs (using Claude Sonnet as the example).
Naturally, safety is a concern with LLMs, which is as good an opportunity as any to remind everyone of Goody-2, undoubtedly the world’s safest AI.