Research

Our research teams investigate the safety, inner workings, and societal impacts of AI models – so that artificial intelligence has a positive impact as it becomes increasingly capable.

Interpretability

Understanding the inner workings

The mission of the Interpretability team is to discover and understand how large language models work internally, as a foundation for AI safety and positive outcomes.

Alignment

Ensuring helpful and harmless models

The Alignment team works to understand the risks of AI models and develop ways to ensure that future ones remain helpful, honest, and harmless.

Societal Impacts

Real-world implications

Working closely with the Anthropic Policy and Safeguards teams, Societal Impacts is a technical research team that explores how AI is used in the real world.

Frontier Red Team

Security and robustness

The Frontier Red Team analyzes the implications of frontier AI models for cybersecurity, biosecurity, and autonomous systems.

Interpretability · Oct 29, 2025

Signs of introspection in large language models

Can Claude access and report on its own internal states? This research finds evidence for a limited but functional ability to introspect—a step toward understanding what's actually happening inside these models.

Interpretability · Mar 27, 2025

Tracing the thoughts of a large language model

Circuit tracing lets us watch Claude think, uncovering a shared conceptual space where reasoning happens before being translated into language—suggesting the model can learn something in one language and apply it in another.

Alignment · Feb 3, 2025

Constitutional Classifiers: Defending against universal jailbreaks

These classifiers filter the overwhelming majority of jailbreaks while remaining practical to deploy. A prototype withstood over 3,000 hours of red teaming with no universal jailbreak discovered.

Publications