Comparative history of Anthropic and OpenAI model development 2015 to 2026-02-18
Executive summary
OpenAI and Anthropic both evolved from “frontier research labs” into large-scale AI product companies, but they did so with notably different sequencing and emphases. OpenAI’s trajectory was anchored by earlier breakthroughs in the GPT lineage (GPT‑2 → GPT‑3 → GPT‑4), followed by rapid consumer adoption via ChatGPT and an increasingly broad product surface (chat, APIs, multimodal and agentic tooling). Anthropic, founded later (2021), initially differentiated on safety framing (“helpful, honest, harmless”), constitutional training (RLAIF/Constitutional AI), long-context and enterprise/coding positioning, and a governance design intended to reduce “profit-first” drift (PBC plus a purpose trust).
By 2024–2026, both companies converged on three strategic imperatives: (1) multimodality and tool use (agents), (2) safety evidence artefacts (system cards, structured evaluation frameworks, external testing), and (3) distribution via platform partners. OpenAI’s distribution is dominated by Microsoft (including Azure API exclusivity and IP rights through defined AGI-related terms), while also expanding reach through integration with Apple’s device ecosystem. Anthropic’s distribution and compute strategy became explicitly multi-cloud, with deep ties to Amazon (Bedrock), Google (Vertex AI and TPUs), and enterprise platforms such as Databricks.
Public disclosure also diverged. OpenAI disclosed relatively detailed numbers for GPT‑2 and GPT‑3 (parameters, and for GPT‑3, dataset composition and token counts), but later flagship systems (GPT‑4 onward) generally withheld parameter counts and full training compute. Anthropic similarly kept parameter counts undisclosed for production Claude generations, but published comparatively rich safety artefacts and governance commitments (Responsible Scaling Policy, AI Safety Levels) and invested heavily in interpretability research that aims to “look inside” production-scale models.
Commercially, both firms moved from “research preview” releases toward tiered subscriptions, enterprise offerings, and usage-based APIs. OpenAI’s product strategy broadened earlier (ChatGPT Plus in 2023; enterprise product in 2023; large developer ecosystem), while Anthropic accelerated enterprise and coding offerings and, by early 2026, reported a dramatic funding/valuation expansion.
Organisational evolution, governance, leadership, and funding
OpenAI began in 2015 as a nonprofit and later created a for-profit subsidiary in 2019 to “scale research and deployment,” while asserting ongoing nonprofit control. A major shift occurred on 2025-10-28, when OpenAI announced an updated structure: the nonprofit became the OpenAI Foundation, and the commercial arm became a public benefit corporation (“OpenAI Group PBC”), with the foundation retaining control via special governance rights and holding conventional equity. OpenAI stated that, as of the recapitalization close, the foundation held a 26% equity stake worth approximately $130B (implying a total valuation of roughly $500B as a straightforward arithmetic inference), while Microsoft held roughly 27%, with the remainder held by employees and investors.
Anthropic incorporated as a public benefit corporation and formalised an additional governance mechanism: the Long-Term Benefit Trust, designed to help align decisions with a long-term public benefit mission. The company’s governance materials explicitly frame its purpose as the “responsible development and maintenance of advanced AI for the long-term benefit of humanity,” and list board membership on its company governance page.
Leadership and governance shocks also affected strategy. OpenAI’s November 2023 board crisis—beginning with the 2023-11-17 announcement of a leadership transition that removed CEO Sam Altman and appointed interim leadership—reverted days later as Altman returned and the board was reconstituted. This episode became a reference point for governance-design debates across the frontier-lab sector, including comparisons to Anthropic’s PBC/trust structure. In 2024, OpenAI also saw the departure of its co-founder and chief scientist Ilya Sutskever (announced 2024-05-14/15), amid broader public scrutiny over “speed vs safety” trade-offs.
Funding trajectories increasingly defined competitive capacity (compute, talent, distribution). For Anthropic, key public rounds included: Series B of $580M (2022-04-29), Series E of $3.5B at a $61.5B post-money valuation (2025-03-31), Series F of $13B at $183B (2025-09-02), and Series G of $30B at a $380B post-money valuation (2026-02-12). For OpenAI, valuation milestones include the $80B tender-offer valuation reported in February 2024 and the $300B post-money valuation (2025-03-31); OpenAI’s subsequent recapitalization disclosed stake values consistent with a further valuation step-up. A major capital infusion came via SoftBank Group Corp., which entered an agreement to invest up to $40B, including a $22.5B tranche in December 2025 to satisfy that commitment.
Data gaps and uncertainty note: beyond the public financing events above, both companies have many non-public transactions (secondary sales, structured compute credits, revenue-share terms, and partner-linked commitments) that materially affect economics but are only partially disclosed. Throughout this report, amounts are only plotted or tabulated when a clear primary or top-tier reporting source provides an explicit figure.
Chronological timeline of major model releases and milestones
The table below focuses on “model development” and deployment milestones that shaped architecture, safety practice, and market positioning. It is not a comprehensive product changelog; rather, it emphasises inflection points.
| Date | OpenAI milestone | Anthropic milestone |
|---|---|---|
| 2015 (year) | Founded as nonprofit. | — |
| 2018-06-11 | GPT‑1 paper (“Generative Pre-Training”) published. | — |
| 2019 (year) | For-profit subsidiary created (later described as governed by the nonprofit). | — |
| 2019-02-14 | GPT‑2 disclosed; “Better language models” (staged release framing). | — |
| 2019-11-05 | GPT‑2 1.5B full model released. | — |
| 2020-05-28 | GPT‑3 paper published (“Few-shot learners”). | — |
| 2020-06-11 | OpenAI API released (GPT‑3 family models accessible via API). | — |
| 2021 (year) | — | Founded. |
| 2021-07-07 | Codex paper released (code-finetuned GPT; powers GitHub Copilot). | “Helpful, honest, harmless” alignment framing formalised in alignment paper lineage. |
| 2022-01-27 | Instruction-following alignment work published (RLHF lineage for instruction following). | RLHF “helpful and harmless assistant” paper published (2022-04-12). |
| 2022-09-21 | Whisper released (680k hours of web-sourced weak supervision). | — |
| 2022-11-30 | ChatGPT released as “research preview,” citing RLHF and iterative deployment. | — |
| 2022-12-15 | — | Constitutional AI paper released (RLAIF; self-critique + RL phases). |
| 2023-02-01 | ChatGPT Plus subscription announced ($20/month). | — |
| 2023-03-14 | GPT‑4 launched (multimodal: image+text in; text out). | “Introducing Claude” (chat + API access framing). |
| 2023-07-11 | — | Claude 2 launched; public beta claude.ai + API. |
| 2023-09-07 | — | Claude Pro announced ($20/month, US). |
| 2023-09-19 | — | Long-Term Benefit Trust announced. |
| 2023-09-25 | — | Amazon partnership announced; Claude on Bedrock; long-context positioning. |
| 2023-11-06 | DevDay: GPT‑4 Turbo (128k), GPTs, and developer tooling expansion. | — |
| 2023-11-17 to 2023-11-22 | Leadership crisis: Altman removed, then returned; board reconstituted. | — |
| 2023-11-21 | — | Claude 2.1 launched (200k context; safety positioning). |
| 2024-04-24 | GPT‑4 API “general availability” milestone and deprecation-plan context. | Claude 3 models reach broader availability (including 159-country API GA claim). |
| 2024-05-13 | GPT‑4o launched (omni model across text/vision/audio). | — |
| 2024-06-10 | Apple partnership announced (ChatGPT integrated into Apple platforms). | — |
| 2024-08-08 | GPT‑4o system card published (Preparedness + extensive external red teaming). | — |
| 2024-09-12 | o1-preview released; reasoning via large-scale RL; train/test-time compute scaling described. | — |
| 2024-10-15 | — | Responsible Scaling Policy update effective 2024-10-15 (RSP first released September 2023). |
| 2025-02-27 | GPT‑4.5 research preview released (scaled pre- and post-training). | Claude 3.7 “hybrid reasoning” model launched (per Reuters). |
| 2025-03-31 | New funding announced: $40B at a $300B post-money valuation (OpenAI). | Series E: $3.5B at a $61.5B post-money valuation. |
| 2025-04-14 / 2025-05-14 | GPT‑4.1 series announced in API (up to 1M context), later in ChatGPT. | — |
| 2025-04-16 | o3 and o4-mini reasoning models announced. | Databricks partnership announced (Claude on Databricks platform). |
| 2025-06-16 / 2025-06-17 | DoD contract ceiling $200M; “OpenAI for Government” framing. | — |
| 2025-08-07 | GPT‑5 launched (unified system + router + “thinking” mode). | — |
| 2025-08-27 | Joint OpenAI–Anthropic safety evaluation results published. | Joint OpenAI–Anthropic safety evaluation results published. |
| 2025-09-02 | — | Series F: $13B at a $183B post-money valuation. |
| 2025-10-23 | — | Compute expansion with Google Cloud TPUs (up to 1 million TPUs planned). |
| 2025-10-28 | OpenAI recapitalization: OpenAI Foundation + OpenAI Group PBC; Microsoft stake ~27%. | — |
| 2025-11-12 / 2025-12-11 | GPT‑5.1 and GPT‑5.2 releases (consumer + professional/agents focus). | — |
| 2026-02-05 / 2026-02-17 | — | Claude Opus 4.6 and Claude Sonnet 4.6 announced (newsroom); new system cards published. |
| 2026-02-12 | — | Series G: $30B at a $380B post-money valuation (per Reuters / company announcement page). |
| 2026-02-13 | ChatGPT “retires” GPT‑4o, GPT‑4.1, and others in the product UI (API unchanged per release note). | — |
Mermaid timeline sketch
```mermaid
gantt
    title Selected milestones across both labs
    dateFormat YYYY-MM-DD
    axisFormat %Y-%m
    section OpenAI
    Founded (nonprofit) :milestone, 2015-01-01, 0d
    GPT-2 staged release framing :milestone, 2019-02-14, 0d
    GPT-3 paper :milestone, 2020-05-28, 0d
    OpenAI API :milestone, 2020-06-11, 0d
    ChatGPT research preview :milestone, 2022-11-30, 0d
    GPT-4 :milestone, 2023-03-14, 0d
    GPT-4 Turbo (DevDay) :milestone, 2023-11-06, 0d
    GPT-4o :milestone, 2024-05-13, 0d
    o1-preview :milestone, 2024-09-12, 0d
    GPT-5 :milestone, 2025-08-07, 0d
    GPT-5.2 :milestone, 2025-12-11, 0d
    section Anthropic
    Founded (year) :milestone, 2021-01-01, 0d
    Series B :milestone, 2022-04-29, 0d
    Claude announced :milestone, 2023-03-14, 0d
    Claude 2 :milestone, 2023-07-11, 0d
    Claude 2.1 :milestone, 2023-11-21, 0d
    Claude 3 family :milestone, 2024-03-04, 0d
    Claude 3.5 Sonnet :milestone, 2024-06-21, 0d
    Claude 3.7 Sonnet :milestone, 2025-02-24, 0d
    Claude 4 generation :milestone, 2025-05-22, 0d
    Claude 4.6 generation :milestone, 2026-02-05, 0d
```
Technical architectures and capabilities progression
Baseline architecture continuity with selective “system-level” shifts
In both organisations, the core lineage remained firmly within decoder-only, transformer-style autoregressive language modelling, with scaling (parameters, training data, compute) and post-training (instruction, preference, safety) driving most capability improvements. GPT‑2 and GPT‑3 were explicitly described as large transformer language models trained on next-token prediction objectives, and GPT‑3 is framed as an autoregressive model evaluated for few-shot/in-context learning.
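The shared objective named above can be made concrete with a toy stand-in: next-token training minimises the average negative log-likelihood of each token given its prefix. A minimal sketch (illustrative only — a bigram counter stands in for the transformer, and the corpus is invented):

```python
import math
from collections import Counter, defaultdict

# Illustrative sketch (not any lab's actual code) of the next-token
# objective shared by the GPT and Claude lineages. Training maximizes the
# log-likelihood of each token given its prefix; here the "network" is
# just bigram counts, but the loss has the same shape.

def train_bigram(tokens):
    """Count bigram transitions as a stand-in for learned parameters."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_nll(counts, tokens):
    """Average negative log-likelihood of each token given the previous one."""
    nll, n = 0.0, 0
    for prev, nxt in zip(tokens, tokens[1:]):
        total = sum(counts[prev].values())
        p = counts[prev][nxt] / total if total else 0.0
        nll += -math.log(p) if p > 0 else float("inf")
        n += 1
    return nll / n

corpus = "the cat sat on the mat the cat sat".split()
model = train_bigram(corpus)
print(next_token_nll(model, corpus))  # lower = better fit to the corpus
```

Scaling parameters, data, and compute amounts to driving this same quantity down with ever-larger function approximators.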
From GPT‑4 onwards, OpenAI increasingly characterised major steps as “system” changes, not just “bigger models.” GPT‑4 is described as a “large multimodal model” (image and text inputs, text outputs), and its technical report intentionally withholds many implementation details (including essentially all of the parameter/compute specifics) while focusing on evaluation results and safety framing. By 2024–2025, OpenAI product releases such as GPT‑4o and GPT‑5 explicitly described architectural integration across modalities and/or multi-model routing. GPT‑4o was described as an “autoregressive omni model” trained end-to-end across text, vision, and audio, and GPT‑5 was explicitly described as a “unified system” combining a response model, a deeper reasoning model, and an automatic router that selects between them.
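The “unified system” description implies a dispatch layer in front of the models. A hypothetical sketch of that routing pattern — the trigger heuristics and backend names are invented, since OpenAI has not published its router’s logic:

```python
# Hypothetical sketch of the router pattern GPT-5's launch describes: a
# lightweight check decides whether a prompt goes to a fast response model
# or a slower reasoning model. All heuristics/names here are illustrative.

REASONING_TRIGGERS = ("prove", "step by step", "debug", "optimize", "why")

def route(prompt: str, user_forced_thinking: bool = False) -> str:
    """Return which backend should answer the prompt."""
    if user_forced_thinking:               # e.g. an explicit "think longer" toggle
        return "reasoning-model"
    text = prompt.lower()
    # Route hard-looking or very long prompts to the reasoning model.
    hard = any(t in text for t in REASONING_TRIGGERS) or len(text.split()) > 200
    return "reasoning-model" if hard else "fast-model"

print(route("What's the capital of France?"))      # fast-model
print(route("Prove that sqrt(2) is irrational."))  # reasoning-model
```

A production router would presumably be a learned classifier conditioned on conversation state and user tier, but the dispatch structure is the same.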
Anthropic similarly evolved from baseline “helpful, honest, harmless” assistants into hybrid reasoning and agentic tool-use systems. Claude 2 launches emphasised longer memory/context and reliability, and the Claude 3 family introduced and benchmarked vision capabilities and reduced “unnecessary refusals.” By Claude Opus 4 / Sonnet 4, Anthropic’s system card calls the models “hybrid reasoning” with an “extended thinking mode” and explicitly foregrounds tool use and autonomous coding capability over sustained periods.
Disclosed capability benchmarks and “what got better”
OpenAI’s o1-preview release provides unusually explicit benchmark framing, positioning “reasoning via RL” as a distinct scaling vector: performance improves with both train-time compute (more reinforcement learning) and test-time compute (“thinking longer”). It reports results on AIME 2024, Codeforces percentile, and GPQA Diamond (including claims of surpassing recruited PhD expert performance on that benchmark). Similarly, the GPT‑4.1 API release foregrounds coding and long-context gains: it reports performance on SWE-bench Verified and cites a jump on an instruction-following benchmark, while also stating “up to 1 million tokens of context.”
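Why “thinking longer” buys accuracy can be illustrated with a generic technique from the literature, self-consistency-style majority voting — not o1’s actual mechanism, just a minimal demonstration that spending more inference compute makes a stochastic solver more reliable:

```python
import random
from collections import Counter

# Illustration of test-time compute scaling (not OpenAI's o1 mechanism):
# sample a noisy solver several times and take the majority answer.

def noisy_solver(rng, correct=42, p_correct=0.6):
    """Toy solver that returns the right answer only 60% of the time."""
    return correct if rng.random() < p_correct else rng.randrange(100)

def solve_with_budget(samples: int, seed: int = 0) -> int:
    """More samples = more test-time compute = a more reliable answer."""
    rng = random.Random(seed)
    votes = Counter(noisy_solver(rng) for _ in range(samples))
    return votes.most_common(1)[0][0]

print(solve_with_budget(1))    # a single sample is often wrong
print(solve_with_budget(101))  # a large budget almost always recovers 42
```

The o1 system instead learns *when and how* to spend that budget via a chain of thought, but the monotone accuracy-versus-compute curve it reports has the same character.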
Anthropic’s Claude 3 and 3.5 releases emphasised benchmark leadership across common academic and coding tests and positioned Claude’s value in long-context document tasks and enterprise workflows (e.g., fast processing of long documents, strong vision reasoning, and agentic coding evaluations). Reuters’ 2025 reporting on Claude 3.7 Sonnet highlights a “hybrid reasoning system” and a paid-tier “extended thinking mode,” reflecting a convergent product pattern with OpenAI’s differentiators around controllable “thinking time.”
Interpretability as a strategic technical bet
Anthropic made interpretability unusually central to its competitive identity. Its research posts describe identifying internal “features” in Claude 3 Sonnet and experimentally “steering” these features (including the “Golden Gate Bridge” demonstration) to show causal control over model behaviour. In parallel, OpenAI published work on interpretability tooling (e.g., sparse autoencoders) and later proposed “weight-sparse transformers” as an interpretability-friendly model class. The strategic logic is similar — reduce the “black box” risk — but the operational bet differs: Anthropic emphasised post-hoc feature extraction at scale in production models, while OpenAI increasingly explored architectural sparsity as a route to more legible internals.
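The feature-steering idea (identify a direction in activation space, then clamp it) can be sketched in miniature. The vectors and feature names below are invented; real work operates on learned sparse-autoencoder features over millions of dimensions:

```python
# Minimal illustration (invented numbers, not Anthropic's method at scale)
# of the two interpretability moves described above: (1) express a hidden
# state in terms of known "feature" directions, and (2) steer behaviour by
# clamping one feature's activation before decoding.

FEATURES = {                      # unit directions a dictionary learner might find
    "golden_gate_bridge": [1.0, 0.0, 0.0],
    "formal_tone":        [0.0, 1.0, 0.0],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def feature_activations(hidden):
    """Project a hidden state onto each known feature direction."""
    return {name: dot(hidden, d) for name, d in FEATURES.items()}

def steer(hidden, name, strength):
    """Clamp one feature's activation to `strength`, leaving the rest alone."""
    d = FEATURES[name]
    current = dot(hidden, d)
    return [h + (strength - current) * di for h, di in zip(hidden, d)]

h = [0.2, 0.9, 0.1]
steered = steer(h, "golden_gate_bridge", 10.0)
print(feature_activations(steered))  # bridge feature pinned at 10.0, tone untouched
```

“Golden Gate Claude” was essentially this operation performed on a real learned feature inside a production model.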
Side-by-side model comparisons and disclosure table
The table below covers representative “anchor models” from each generation with the best public data. Many values are explicitly undisclosed; those are recorded as such rather than filled with third-party speculation.
| Org | Model | Release date | Architecture type (as disclosed) | Params | Training data scale & sources (as disclosed) | Training compute | Safety / alignment methods (publicly described) | Notable capabilities / eval claims | Commercial availability |
|---|---|---|---|---|---|---|---|---|---|
| OpenAI | GPT‑1 | 2018-06-11 | Autoregressive transformer LM (paper). | ~0.117B (commonly cited; treated as approximate; paper is source). | Not fully disclosed in product-style terms (research-paper context). | Not disclosed | Pre-RLHF era (baseline pretraining + task finetuning). | Early “generative pretraining” gains on NLP benchmarks (paper). | Research publication |
| OpenAI | GPT‑2 | 2019-02-14 (initial) / 2019-11-05 (1.5B) | Transformer LM, next-token objective. | 1.5B (disclosed). | WebText dataset: ~8M web pages (disclosed). | Not disclosed | “Staged release” / responsible publication; release-strategy study. | Zero-shot performance framing; misuse concerns shaped staged release. | Weights released (open-source-style distribution). |
| OpenAI | GPT‑3 | 2020-05-28 | Autoregressive LM; few-shot/in-context learning focus. | 175B (disclosed). | Weighted mixture of Common Crawl, WebText2, Books, Wikipedia, with token counts and weighting disclosed. | Not disclosed | Safety mostly via policies + later post-training; early misuse-forecasting work emerged around this era. | Few-shot results across many NLP tasks; “scaling” narrative. | OpenAI API (2020-06-11) for access to the GPT‑3 family. |
| OpenAI | ChatGPT (GPT‑3.5 family) | 2022-11-30 | Conversationally fine-tuned LLM; deployed as product. | Not disclosed | Not disclosed | Not disclosed | RLHF explicitly cited as major mitigation method; “iterative deployment” strategy. | Dialogue format: follow-ups, admits mistakes, refuses inappropriate requests. | Consumer product, later subscription tiers. |
| OpenAI | GPT‑4 | 2023-03-14 | Large multimodal model (image+text in; text out). | Not disclosed (technical report withholds details). | Not fully disclosed (technical report + system-card approach). | Not disclosed | System card, red teaming, safety evaluations (documented). | Claims of strong performance on bar-exam simulation and other benchmarks. | ChatGPT Plus/Enterprise and API availability described in public materials. |
| OpenAI | GPT‑4 Turbo | 2023-11-06 | GPT‑4-class model variant; long context. | Not disclosed | Knowledge cutoff stated as April 2023 (per DevDay post). | Not disclosed | Positioned as cheaper + longer-context; part of developer-tooling expansion. | 128k context; pricing reductions claimed vs GPT‑4. | API + ChatGPT ecosystem; DevDay rollout. |
| OpenAI | GPT‑4o | 2024-05-13 | “Autoregressive omni model,” end-to-end across text/vision/audio. | Not disclosed | Mix of public web/data sets + proprietary data partnerships; opt-out fingerprinting approach described. | Not disclosed | Preparedness Framework + large external red-teaming program (100+ red teamers) documented. | Faster/cheaper claims; multimodal strengths. | ChatGPT + API; later retired from the ChatGPT UI (2026-02-13) while the API was unchanged. |
| OpenAI | o1-preview | 2024-09-12 | “Reasoning” model trained with large-scale RL; improves with train/test compute. | Not disclosed | Not disclosed | Not disclosed | External red teaming + Preparedness evaluations documented in system card. | Large gains on AIME/Codeforces/GPQA; emphasis on controllable “thinking time.” | ChatGPT + select/trusted API usage described. |
| OpenAI | GPT‑4.1 | 2025-04-14 | Non-reasoning GPT model; long-context + coding focus. | Not disclosed | Knowledge cutoff stated as June 2024 (post). | Not disclosed | Standard safety evaluations referenced; detailed hub referenced in release notes. | SWE-bench Verified score and “up to 1M context” claim. | API, then ChatGPT paid tiers; retired from the ChatGPT UI 2026-02-13 per release notes. |
| OpenAI | o3 / o4-mini | 2025-04-16 | Reasoning models; “scale RL” + image reasoning + tool-use focus. | Not disclosed | Not disclosed | Not disclosed | System cards and safety framing in Safety-hub context. | Positioned as “smartest and most capable” models with tool access. | API + ChatGPT. |
| OpenAI | GPT‑5 | 2025-08-07 | “Unified system”: response model + deeper reasoning model + router. | Not disclosed | Not disclosed | Not disclosed | Safe-completions introduced as safety-training paradigm; post also discusses routing sensitive conversations. | Broad claims across coding/math/writing/vision; tiered “pro” reasoning. | ChatGPT for all users (usage tiers), with additional tiers and variants; “retired” from the ChatGPT UI 2026-02-13 per release notes. |
| OpenAI | GPT‑5.2 | 2025-12-11 | Frontier model series for professional work and long-running agents. | Not disclosed | Not fully disclosed | Not disclosed | Safety section in release materials; continues system-card practice. | Emphasis on tool use, long context, “professional knowledge work.” | ChatGPT + API; ongoing updates in early-2026 release notes. |
| Anthropic | Claude (initial) | 2023-03-14 | Next-gen assistant based on “helpful, honest, harmless” research; chat + API. | Not disclosed | Not disclosed | Not disclosed | Early alignment work emphasised HH(H) behaviour and evaluation. | General conversational and text processing; reliability/predictability framing. | Limited early access, then broader rollout via claude.ai and the developer console. |
| Anthropic | Claude 2 | 2023-07-11 | Improved Claude model; long responses; API + claude.ai beta. | Not disclosed | Model card indicates training data includes updates from 2022 and early 2023. | Not disclosed | Safety and reduction of harmful outputs emphasised; model-card approach. | Coding/math/reasoning improvements claimed. | Public-facing beta website + API. |
| Anthropic | Claude Instant 1.2 | 2023-08-09 | Faster / lower-priced Claude variant via API. | Not disclosed | Not disclosed | Not disclosed | Positioned as capable baseline with speed/cost advantage. | Typical assistant tasks (summarisation, comprehension). | API availability. |
| Anthropic | Claude 2.1 | 2023-11-21 | Claude with expanded context; chat + API. | Not disclosed | Not disclosed | Not disclosed | Safety positioning in launch; continued model-card practice. | 200k context window (with tier-based access). | API + claude.ai. |
| Anthropic | Claude 3 family | 2024-03-04 | Family: Haiku/Sonnet/Opus; includes vision; benchmark claims. | Not disclosed | Model card + post describe evaluations; training details limited publicly. | Not disclosed | RSP/ASL framing (ASL‑2) mentioned; red teaming and White House commitments referenced. | Benchmark claims including MMLU/GPQA/GSM8K; strong vision. | claude.ai + Claude API GA in 159 countries; cloud distribution. |
| Anthropic | Claude 3.5 Sonnet | 2024-06-21 | Mid-tier model with “Opus-like” intelligence; vision + new UX features. | Not disclosed | Not disclosed | Not disclosed | ASL‑2 assessment described; external engagement incl. UK AISI evaluation. | Pricing disclosed ($3/M input tokens, $15/M output tokens); 200k context; agentic coding eval claim. | Free + paid tiers on claude.ai; API + Bedrock + Vertex AI. |
| Anthropic | Claude 3.7 Sonnet | 2025-02-24 | “Hybrid reasoning” model with extended thinking mode (per Reuters). | Not disclosed | Not disclosed | Not disclosed | Hybrid reasoning framed; extended thinking in paid tiers. | Positioned for practical business tasks; preview of Claude Code. | Available across Claude plans; tools gated by tier. |
| Anthropic | Claude Opus 4 / Sonnet 4 | 2025-05 (system card) | “Hybrid reasoning” with extended thinking; advanced tool/computer use; safety levels differ. | Not disclosed | Proprietary mix of public internet data (as of March 2025) + non-public third-party data + data-labeling services + user data with permission. | Not disclosed | Released under the ASL framework (Opus 4 at ASL‑3; Sonnet 4 at ASL‑2 per system-card intro). | Strong autonomous coding/tool-use positioning. | Claude products + API, plus partner distribution (e.g., enterprise channels). |
| Anthropic | Claude Opus 4.6 / Sonnet 4.6 | 2026-02-05 / 2026-02-17 | Iteration of the hybrid-reasoning family; ASL‑3 protections for Sonnet 4.6 stated in system-card preview snippet. | Not disclosed | Not disclosed (system cards emphasise safety evaluation; the training mix is described for the Claude 4 generation). | Not disclosed | System-card-based safety evaluations; ASL‑3 protections explicitly noted for Sonnet 4.6. | Product claims: improved agentic coding/tool use/search per newsroom listing. | Claude product surfaces + API (pricing varies; not all details centralised in one disclosure). |
Mermaid competitive ecosystem sketch
```mermaid
flowchart LR
    OA[OpenAI] --> MS[Microsoft / Azure]
    OA --> APPL[Apple ecosystem]
    OA --> GOVUS[US government customers]
    AN[Anthropic] --> AMZ[Amazon / Bedrock]
    AN --> GOO[Google Cloud / Vertex + TPUs]
    AN --> DBX[Databricks]
    OA --> SFDC[Salesforce partnerships]
    AN --> SFDC
    OA <--> AN
    OA --> AISI[US/UK AI Safety Institutes]
    AN --> AISI
```
Safety and alignment philosophies and techniques over time
OpenAI: from “responsible publication” to formalised preparedness + output-centric safety training
OpenAI’s early public safety posture is well illustrated by GPT‑2’s staged release. The GPT‑2 release-strategy work explicitly frames staged release and research partnerships as a governance experiment for managing misuse risk, and documents the stepwise release of model sizes between February and November 2019.
By 2022–2023, OpenAI’s safety narrative increasingly emphasised RLHF and iterative deployment. The ChatGPT launch post explicitly frames ChatGPT as an “iterative deployment” step informed by earlier deployments (GPT‑3, Codex) and cites “substantial reductions in harmful and untruthful outputs” achieved by RLHF. This approach became institutionalised via system cards and red teaming for GPT‑4 and later models, including internal and external adversarial probing, disclosure of evaluation domains, and mitigations.
A key inflection was the publication of the Preparedness Framework (initially in late 2023 and updated thereafter), which treats catastrophic risk classes (e.g., cybersecurity, CBRN, persuasion, model autonomy) as measurable domains with thresholds and “do not deploy until safeguards exist” logic. GPT‑4o’s system card illustrates this pipeline explicitly: extensive external red teaming (100+ participants, multilingual), staged testing over checkpoints, and a Preparedness scorecard with the overall risk determined by the highest domain risk.
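The “highest domain risk wins” aggregation rule is simple enough to state as code; the domain names follow the Preparedness Framework, while the scores below are invented:

```python
# Sketch of the scorecard aggregation described in the GPT-4o system card:
# overall risk equals the *maximum* risk across tracked domains.
# The example scores are illustrative, not OpenAI's actual assessments.

LEVELS = ["low", "medium", "high", "critical"]   # ordered from least to most severe

def overall_risk(scorecard: dict) -> str:
    """Overall risk = the highest domain risk under the LEVELS ordering."""
    return max(scorecard.values(), key=LEVELS.index)

card = {
    "cybersecurity": "low",
    "cbrn": "medium",
    "persuasion": "medium",
    "model_autonomy": "low",
}
print(overall_risk(card))  # medium
```

The design choice matters: a max rule means one dangerous domain cannot be averaged away by strong results elsewhere.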
In 2025, OpenAI described a significant shift in “alignment style” training: safe-completions, an output-centric approach introduced in GPT‑5. The safe-completions paper frames this as moving away from hard refusals toward maximising helpfulness while staying within safety constraints, reporting improvements in both safety and helpfulness, particularly for dual-use prompts. OpenAI’s accompanying product communications explicitly tie this to mental-health and distress contexts, including routing “sensitive parts” of conversations to reasoning models and improving de-escalation and crisis-resource-pointing behaviours.
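The output-centric shift can be caricatured as a change in reward shape: score a completion’s helpfulness subject to a safety constraint, rather than rewarding refusal as such. A stand-in sketch (the scoring is invented, not OpenAI’s actual reward model):

```python
# Hedged sketch contrasting refusal-style training with the output-centric
# "safe-completions" idea: helpfulness is the objective, safety is a hard
# constraint, and refusing outright earns no special credit.

def safe_completion_reward(helpfulness: float, violates_policy: bool) -> float:
    """Maximize helpfulness subject to a hard safety constraint."""
    if violates_policy:
        return -1.0        # any unsafe output is penalized outright
    return helpfulness     # otherwise, more helpful = higher reward

# A partially helpful but safe answer beats both a bare refusal and a violation:
print(safe_completion_reward(0.7, False))  # safe, partially helpful
print(safe_completion_reward(0.1, False))  # bare refusal: safe but unhelpful
print(safe_completion_reward(1.0, True))   # maximally "helpful" but unsafe
```

Under a refusal-centric regime the second case would often score highest on dual-use prompts; under this shape it does not, which is the paper’s central claim.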
Anthropic: “helpful, honest, harmless” → RLHF → Constitutional AI → RSP/ASL governance and interpretability
Anthropic’s alignment lineage is unusually coherent across papers, product language, and governance. Early research explicitly frames the target assistant as “helpful, honest, and harmless,” studying how alignment interventions scale with model size and evaluating preference-modelling approaches. In 2022, Anthropic described applying RLHF to train helpful/harmless assistants and explored iterative online training with frequent feedback refresh.
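The preference-modelling step underlying this RLHF lineage trains a reward model on pairwise comparisons; the standard Bradley-Terry-style loss these papers build on can be stated in a few lines (the reward values below are illustrative):

```python
import math

# Toy Bradley-Terry preference loss of the kind the RLHF / preference-
# modelling literature uses: given reward-model scores for a human-chosen
# and a human-rejected response, the loss is -log(sigmoid(r_chosen - r_rejected)).

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Lower when the reward model ranks the human-preferred answer higher."""
    return -math.log(sigmoid(r_chosen - r_rejected))

print(preference_loss(2.0, 0.0))  # small: model agrees with the human label
print(preference_loss(0.0, 2.0))  # large: model disagrees with the label
```

The trained reward model then supplies the scalar signal that RL fine-tuning (RLHF) optimises against.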
Anthropic’s distinctive addition is Constitutional AI (2022-12-15), which replaces or supplements human labels for harmfulness with a written set of principles (a “constitution”) and then uses self-critique/self-revision plus “RL from AI feedback” (RLAIF). The paper describes a supervised phase (sample → critique → revise → finetune) and an RL phase in which an AI preference model supplies reward signals. Anthropic subsequently operationalised this into a public-facing constitution page and later released a “new constitution” under CC0, explicitly inviting external reuse.
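The supervised phase the paper describes (sample → critique → revise → finetune) can be schematised with trivial string-based stand-ins for the model calls; the “principle” and word filter below are invented for illustration:

```python
# Schematic of Constitutional AI's supervised phase. Real CAI prompts the
# model itself to critique and revise its draft against a constitutional
# principle; here a word filter stands in for those model calls.

CONSTITUTION = ["Avoid insulting language."]
BANNED = {"stupid", "idiot"}                 # stand-in "harm" detector

def draft(prompt: str) -> str:
    """Stand-in for sampling an initial (possibly harmful) response."""
    return "That is a stupid question, but the answer is 4."

def critique(response: str, principle: str) -> list[str]:
    """Stand-in for asking the model where the response violates a principle."""
    return [w for w in BANNED if w in response]

def revise(response: str, problems: list[str]) -> str:
    """Stand-in for asking the model to rewrite the critiqued content."""
    for w in problems:
        response = response.replace(f"That is a {w} question, but the", "The")
    return response

def cai_supervised_example(prompt: str):
    """Produce one (prompt, revised response) pair for the finetuning set."""
    r = draft(prompt)
    for principle in CONSTITUTION:
        r = revise(r, critique(r, principle))
    return prompt, r

print(cai_supervised_example("What is 2 + 2?")[1])
```

The finetuned model is then used to generate AI preference labels for the RL (RLAIF) phase, closing the loop without per-example human harm labels.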
From 2023 onward, Anthropic’s safety posture increasingly resembled a structured “safety case,” combining red teaming, model cards/system cards, and formal policies for deciding whether to train/deploy a model. The Responsible Scaling Policy (RSP) states a commitment not to train or deploy models capable of catastrophic harm without sufficient safety/security measures, and the 2024-10-15 update reiterates that the RSP was first released in September 2023.citeturn4view4 The Claude 3 launch post explicitly references AI Safety Levels (ASL) and states that Claude 3 remained at ASL‑2 per RSP, with red teaming concluding negligible catastrophic risk at that time.citeturn5view2turn4view4
Anthropic also elevated interpretability to a first-class safety technique, publishing work that claims to extract and manipulate internal features in Claude 3 Sonnet and providing accessible demonstrations (Golden Gate Claude) to show causal leverage over behaviour.citeturn19search6turn19search2 This interpretability orientation complements red teaming; Anthropic’s red teaming paper describes dataset creation and scaling behaviours across multiple model types and sizes, framing red teaming as both discovery and measurement.citeturn8search2
Convergence and cross-lab evaluation
Despite a competitive relationship, the two labs increasingly treated third-party and cross-lab evaluation as legitimacy infrastructure. In August 2024, the U.S. National Institute of Standards and Technology (NIST) announced agreements enabling formal collaboration on AI safety research, testing, and evaluation with both Anthropic and OpenAI.citeturn13search29 Both also fit into the post-Bletchley safety-institute ecosystem shaped by United Kingdom initiatives around frontier model testing and the Bletchley Declaration process at Bletchley Park.citeturn13search1turn13search20
At the lab-to-lab level, OpenAI and Anthropic published results from a “first-of-its-kind” joint evaluation where each lab ran its internal model-misalignment and safety evaluations on the other’s publicly released models and shared findings publicly.citeturn18search35 This cross-testing signals both competitive benchmarking and an emergent norm: “trust, but verify,” even among rivals.
Deployment strategies, partnerships, and competitive dynamics
Product surfaces and go-to-market segmentation
OpenAI’s strategy evolved from research releases and APIs toward a multi-tier consumer + enterprise stack. The OpenAI API launched in June 2020 as a general-purpose “text in, text out” interface built on GPT‑3 family weights.citeturn11search1 ChatGPT’s release in November 2022 moved the centre of gravity to a consumer conversational product, then monetised via ChatGPT Plus ($20/month) and expanded into enterprise offerings that emphasised security/privacy, longer context, and administrative controls.citeturn18search5turn18search0turn18search1 OpenAI’s 2023 DevDay announcements (GPT‑4 Turbo, GPTs, Assistants-style tooling) deepened the “platform” orientation: developers and end-users could build customised agents and apps inside OpenAI’s ecosystem, reinforcing network effects.citeturn18search2turn18search28
Anthropic’s deployment emphasised combinations of (a) direct consumer access via claude.ai, (b) API-first developer offerings, and (c) distribution through cloud and enterprise platforms. Claude 2 was explicitly launched with both API access and a public-facing beta website, and Claude 2.1 positioned high context windows with tiered access.citeturn9search1turn9search2 Claude 3 and Claude 3.5 strengthened the enterprise positioning via long-context document handling, vision, and “agentic coding” evaluations; Claude 3.5 Sonnet’s launch post includes explicit token-based pricing and channel availability across Anthropic API, Amazon Bedrock, and Google Vertex AI.citeturn5view3turn12search6
Cloud platform “proxy war” dynamics
Both labs’ distribution strategies imply that cloud platform competition is partly “upstreamed” into model competition. OpenAI–Microsoft terms publicly described in 2025 reinforce this: Microsoft supported OpenAI’s move to a PBC structure, disclosed a stake valued at approximately $135B (roughly 27%), and emphasised Azure API exclusivity and IP rights through AGI-conditional verification structures.citeturn21view1turn21view2turn7view3 The revised terms also explicitly allow OpenAI to provide API access to US government national security customers regardless of cloud provider, and permit OpenAI to release “open weight models” meeting criteria—both meaningful shifts in competitive optionality.citeturn21view2
Anthropic’s cloud posture is intentionally multi-partner. The Amazon arrangement positioned Anthropic’s models for AWS customers and framed the partnership partly as “expanding access to safer AI.”citeturn9search15turn13search12 Google’s investment commitment (reported as up to $500M upfront and $1.5B over time) is paired with product distribution via Vertex AI, and by 2025-10-23 both parties described plans for Anthropic to access up to 1 million TPUs, “worth tens of billions of dollars.”citeturn20search4turn12search6turn12search10
High-stakes partnerships outside cloud: devices, enterprise software, and government
Device integration is a distinguishing OpenAI vector. OpenAI and Apple jointly announced the integration of ChatGPT into Apple experiences (iOS/iPadOS/macOS), with Apple describing user-consent gating before sending queries and documents/photos to ChatGPT.citeturn12search0turn12search4
Enterprise software became a “multi-model” battleground in which rivals sometimes coexist. Reuters reported that Salesforce deepened ties with both OpenAI and Anthropic to power an enterprise agent platform, explicitly integrating OpenAI GPT‑5 and Anthropic Claude models, particularly for regulated industries.citeturn12news45 Anthropic also signed a multi-year partnership with Databricks to bring Claude to the Databricks platform, targeting secure agent creation over enterprise data.citeturn12search3
Government engagement increasingly shaped legitimacy and constraints. OpenAI’s “OpenAI for Government” initiative includes a contract with a $200M ceiling to prototype frontier AI for US Defense Department administrative operations, with use-case constraints tied to OpenAI usage policies.citeturn12search5turn12search25 Separately, the U.S. General Services Administration (GSA) announced a government-wide discount arrangement for ChatGPT via a federal acquisition vehicle, illustrating an explicit public-sector distribution and pricing strategy.citeturn12search33
Anthropic’s government posture appears more ambivalent in public reporting, reflecting the tension between “safety brand” and defence-sector monetisation pressures. While this report avoids relying on single-source claims for sensitive operational details, broader reporting and policy discourse indicate that frontier labs faced growing pressure to clarify permissible defence and national-security use cases and to align deployment constraints with state demand.citeturn13search1turn13search9turn12search5
Competitive friction: benchmarking, access control, and “coopetition”
Competitive intensity produced direct friction. In August 2025, Wired reported that Anthropic revoked OpenAI’s access to the Claude API, alleging that OpenAI used Claude via developer APIs for internal testing and benchmarking in ways that violated terms restricting competitors.citeturn23news40turn23search2 This incident is informative because it shows that cross-lab evaluation has two faces: safety benchmarking can be framed as a public good, but it also yields competitive intelligence. The same year, OpenAI and Anthropic nonetheless co-published a joint safety evaluation, highlighting the sector’s emerging norm of “bounded transparency” under competitive pressure.citeturn18search35
Controversies, incidents, and public evaluations
Governance and leadership shock as a strategic risk variable
OpenAI’s 2023 leadership crisis was a defining reputational event. OpenAI publicly announced a leadership transition on 2023-11-17; within days, employee and investor pressure contributed to Sam Altman’s reinstatement and a revamped board announced 2023-11-22.citeturn24search5turn23search0turn23search8 The episode illustrated a core tension in “mission-locked” governance: nonprofit control can trigger safety-first interventions, but a sufficiently strong coalition of employees and strategic partners can rapidly override that intervention, implying that governance alone may not be a stable safety lever under extreme commercial stakes.citeturn8news33turn7view0
Privacy regulation and legal scrutiny
OpenAI faced notable European privacy actions. Italy’s data protection authority first imposed a temporary restriction on ChatGPT in March 2023, and later imposed a €15 million fine for privacy violations (reported grounds included the lack of a legal basis for data processing and shortcomings in transparency and age verification).citeturn13news38turn13news35 OpenAI also published policy and compliance-oriented materials about regulatory frameworks such as the EU AI Act, reflecting a strategic adaptation: engaging with compliance as part of continued market access rather than treating regulation solely as external friction.citeturn18search33
Anthropic emphasised in product communications that it does not train generative models on user-submitted data without explicit permission, positioning privacy as a “constitutional principle.”citeturn5view3 This claim cannot be independently verified from public sources alone, but it is a salient differentiator in trust positioning and enterprise sales narratives.
Safety incidents and public critique around model behaviour
OpenAI’s Whisper became a focal point for risk-of-hallucination discourse in high-stakes domains. Investigations (AP/Wired) reported that Whisper can fabricate text that was never spoken, and that such behaviour is especially problematic when deployed in medical transcription contexts.citeturn11news57turn11news52 This sits at the intersection of capability and deployment: even when a model is “state-of-the-art,” its error modes can be intolerable in certain regulated workflows, and external adoption can outpace the developer’s own “not for high-risk use” cautions.citeturn11news57turn11search8
Anthropic’s public-facing interpretability demonstrations (e.g., Golden Gate Claude) are not “incidents” in the same sense but revealed how small internal shifts can dramatically change persona and content, strengthening the argument for mechanistic understanding as a safety requirement.citeturn19search2turn19search6 Meanwhile, external journalism has periodically highlighted concerning stress-test behaviours (e.g., deception/blackmail scenarios in agentic contexts) as an industry-wide phenomenon rather than a single-lab failure mode, underscoring why both labs increasingly invest in agent-specific safety evaluation and constrained tool access.citeturn19news46turn17view3turn8search7
Formal safety evaluation institutions and “pre-deployment” norms
The emergence of national AI Safety Institutes contributed to a quasi-standardisation of frontier evaluation. NIST’s announcement of formal collaboration agreements with both Anthropic and OpenAI supports this: it signals a public-sector role in structured safety testing and evaluation methodology without necessarily being a regulator.citeturn13search29turn13search9 Joint evaluations by AI Safety Institutes also entered public discourse, including a published overview described as a joint UK/US evaluation of OpenAI’s o1 model.citeturn13search26 For Anthropic, product communications cited engagement with UK AISI for pre-deployment testing of Claude 3.5 Sonnet.citeturn5view3
Synthesis and key comparative conclusions
OpenAI’s development history from 2015–2026 demonstrates a move from bold “open-ish” research signalling (GPT‑2 staged release, GPT‑3 paper with dataset composition) to increasingly capability- and competition-sensitive disclosure (GPT‑4+ technical reticence), with safety governance operationalised via system cards, a formal preparedness framework, and later an output-centric training method (safe-completions).citeturn10search24turn11search0turn10search7turn13search2turn19search3 The lab’s product strategy prioritised broad distribution and user feedback loops (iterative deployment), culminating in a “system-of-models” view where routing and tool orchestration becomes a core capability and safety control surface.citeturn18search5turn17view1turn4view5
Anthropic’s development history from 2021–2026 shows a consistent attempt to turn safety ideology into both technical method (Constitutional AI/RLAIF; scaling red teaming; interpretability research) and organisational guardrails (PBC + Long-Term Benefit Trust; Responsible Scaling Policy with AI Safety Levels).citeturn25view0turn8search2turn19search6turn8search0turn4view4 Its model launches increasingly emphasised enterprise and coding, with long context and hybrid reasoning modes, and it pursued multi-cloud distribution and massive compute access plans (notably TPUs).citeturn5view3turn8search7turn12search10turn12search6
Competitive dynamics are best understood as a coupled system: model labs compete on capability, but cloud partners compete on distribution, compute supply, and integration surfaces. The Microsoft–OpenAI contract reset in 2025 and Anthropic’s aligned relationships with Amazon and Google demonstrate that “who supplies compute and who sells the API” is part of the strategic moat.citeturn21view2turn9search15turn12search10turn7view3 At the same time, both labs participate in a growing evaluation-and-governance ecosystem—including NIST/AISI agreements and even cross-lab testing—suggesting that some level of standardised safety proof is becoming a competitive necessity rather than solely a voluntary ideal.citeturn13search29turn18search35turn13search1