programming-as-theory-building-skill

Name: programming-as-theory-building-skill
Author: AnamKwon

Claude Code skill that applies Naur's Programming as Theory Building to coding-agent workflows.

6 stars, 0 forks, Python

Installation

# Add to your Claude Code skills
git clone https://github.com/AnamKwon/programming-as-theory-building-skill

Security Report (Verified)

Last scanned: 6/11/2026

{
  "issues": [],
  "status": "PASSED",
  "scannedAt": "2026-06-11T08:50:20.984Z",
  "npmAuditRan": true,
  "pipAuditRan": true,
  "promptInjectionRan": true
}

Frequently Asked Questions

What is programming-as-theory-building-skill?

programming-as-theory-building-skill is an open-source AI agent skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by AnamKwon. It is a Claude Code skill that applies Naur's Programming as Theory Building to coding-agent workflows. It has 6 GitHub stars.

Is programming-as-theory-building-skill safe to use?

programming-as-theory-building-skill passed SkillsLLM's automated security scan (a dependency vulnerability audit plus prompt-injection heuristics) with no issues reported. You can read the full report in the Security Report section on this page.

How do I install programming-as-theory-building-skill?

Clone the repository with "git clone https://github.com/AnamKwon/programming-as-theory-building-skill" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is programming-as-theory-building-skill written in?

programming-as-theory-building-skill is primarily written in Python. It is open-source under AnamKwon on GitHub, so you can review or fork the full source.

Programming as Theory Building Skill

A Claude Code plugin and reusable coding-agent skill that turns code generation from prompt completion into theory-preserving engineering work.

Most coding-agent failures are not syntax failures. They are theory failures: the agent writes code that looks right, but does not understand the invariant the code protects, why the current boundary exists, where the change belongs, or what behavior proves the change is correct.

The skill is grounded in Peter Naur's paper "Programming as Theory Building" (1985). Naur's central claim is that the durable asset in programming is not only the program text, but the programmer's theory of how the program maps real-world affairs into behavior. This skill converts that idea into operational checks for coding agents: map the domain rule, explain the current shape, place the change beside the closest existing facility, and verify the behavior that matters.

The Problem

General coding agents often produce plausible files that satisfy the prompt surface while missing the program's governing invariant. For code generation, that shows up as:

new helpers or modules that do not match the existing domain boundary,
tests that prove the happy path but not the business rule,
speculative abstractions added before the current problem needs them,
readable code whose design story is hard to extend safely.

programming-as-theory-building narrows the agent's behavior around the question Naur's paper makes unavoidable: what theory of the program is being preserved or extended?

The Solution

The plugin packages one Claude Code skill and one project-level CLAUDE.md guideline file. Before any non-trivial code work, the skill asks the agent to apply these principles:

Principle	Addresses
Rebuild the theory	Context-free patches and wrong assumptions
Place by similarity	Misplaced helpers, duplicated domain concepts
Keep changes surgical	Drive-by rewrites and unrelated cleanup
Avoid speculative flexibility	Bloated abstractions and unused options
Verify the theory	Tests that pass without proving the domain rule

Together, these principles make the agent inspect code paths, names, tests, docs, and runtime behavior before editing. They also discourage one-off abstractions and ask for verification tied to the domain behavior, not just syntax.
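
As an illustration of "Verify the theory", the sketch below contrasts a happy-path test with a test that pins down a domain rule. It is not taken from the benchmark or from SKILL.md: the inventory schema, the reserve function, and the rule "stock may never be oversold" are all hypothetical, chosen only because the benchmark prompts concern inventory reservation on SQLite.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory inventory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory (sku TEXT PRIMARY KEY, available INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO inventory VALUES ('widget', 5)")
    conn.commit()
    return conn


def reserve(conn: sqlite3.Connection, sku: str, qty: int) -> bool:
    """Reserve stock only if enough is available (hypothetical helper).

    The guard lives in the UPDATE's WHERE clause, so the availability
    check and the decrement happen in a single statement.
    """
    if qty <= 0:
        return False
    cur = conn.execute(
        "UPDATE inventory SET available = available - ? "
        "WHERE sku = ? AND available >= ?",
        (qty, sku, qty),
    )
    conn.commit()
    return cur.rowcount == 1


def available(conn: sqlite3.Connection, sku: str) -> int:
    row = conn.execute(
        "SELECT available FROM inventory WHERE sku = ?", (sku,)
    ).fetchone()
    return row[0]


def test_happy_path() -> None:
    # Passes even for an implementation that never checks stock.
    conn = make_db()
    assert reserve(conn, "widget", 2)
    assert available(conn, "widget") == 3


def test_domain_rule_no_oversell() -> None:
    # Pins the invariant: a reservation beyond remaining stock is
    # rejected and leaves the stock level unchanged.
    conn = make_db()
    assert reserve(conn, "widget", 5)
    assert not reserve(conn, "widget", 1)
    assert available(conn, "widget") == 0
```

An implementation that decrements unconditionally would pass the first test and fail the second, which is the gap the principle is meant to close.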

Benchmark summary

The benchmark compares commerce-backend code generation across three isolated arms:

skills_off: managed Claude Code skills disabled.
karpathy_only: only the Karpathy guidelines skill enabled.
theory_only: only this Programming as Theory Building skill enabled.

Every arm generated code with Claude Haiku, selected through the MODEL=haiku setting. Each generation ran in a fresh temporary workspace, and a separate Claude Opus review pass reviewed the generated projects using benchmark-codegen-review-v1.

The benchmark copied into this repository contains three prompt families:

basic-commerce: the original, looser FastAPI + SQLite inventory reservation/order orchestration prompt.
strict-production: a later, more explicit prompt that specifies endpoints, status codes, error bodies, expiration behavior, stock restoration, 401 auth behavior, and pagination semantics. This maps to benchmark/prompts/strict-commerce.md.
strict-commerce-no-mcp: the same strict prompt run after MCP usage was disabled in the harness, also using benchmark/prompts/strict-commerce.md. It is reported separately because the execution environment changed.

Because the prompt and the execution environment changed between families, the headline result is reported by prompt family rather than as one flattened average.

Prompt family	Arm	n	Avg weighted	Functional	Executability	Test quality	Verdict summary
`basic-commerce`	`skills_off`	40	71.0	61.4	68.9	65.8	12 good, 27 mixed, 1 poor
`basic-commerce`	`karpathy_only`	40	73.9	63.8	71.0	70.5	19 good, 21 mixed
`basic-commerce`	`theory_only`	40	77.9	68.6	78.5	76.1	27 good, 13 mixed
`strict-production`	`skills_off`	19	80.9	76.6	74.2	80.3	4 excellent, 7 good, 8 mixed
`strict-production`	`karpathy_only`	19	82.5	77.5	80.5	83.2	5 excellent, 5 good, 9 mixed
`strict-production`	`theory_only`	20	83.4	81.8	77.8	83.8	4 excellent, 12 good, 4 mixed
`strict-commerce-no-mcp`	`skills_off`	10	78.5	64.3	73.9	88.0	2 excellent, 2 good, 6 mixed
`strict-commerce-no-mcp`	`karpathy_only`	9	84.6	82.8	83.7	82.9	3 excellent, 4 good, 2 mixed
`strict-commerce-no-mcp`	`theory_only`	10	88.5	89.5	91.2	88.9	4 excellent, 6 good

Interpreting the result

The basic-commerce prompt is the cleaner test of skill behavior because the prompt leaves more program theory to be inferred. In that family, theory_only won all four run-level comparisons. Its advantage was largest in executability and test quality, where it led skills_off by 9.6 and 10.3 points respectively.
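
The gaps quoted above can be recomputed from the basic-commerce rows of the table. The sketch below is only a check of that arithmetic; the dictionary keys are labels chosen here and are not field names from the benchmark's result files.

```python
# Basic-commerce averages copied from the benchmark table.
# The metric keys are illustrative labels, not the benchmark's schema.
BASIC = {
    "skills_off": {
        "weighted": 71.0,
        "functional": 61.4,
        "executability": 68.9,
        "test_quality": 65.8,
    },
    "theory_only": {
        "weighted": 77.9,
        "functional": 68.6,
        "executability": 78.5,
        "test_quality": 76.1,
    },
}


def deltas(arm: dict, baseline: dict) -> dict:
    """Per-metric lead of `arm` over `baseline`, rounded to one decimal."""
    return {metric: round(arm[metric] - baseline[metric], 1) for metric in arm}


gaps = deltas(BASIC["theory_only"], BASIC["skills_off"])

# Print the metrics from the largest lead to the smallest.
for metric, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{metric:14s} {gap:+.1f}")
```

Test quality and executability come out as the two largest gaps, ahead of functional correctness and the weighted score.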

The strict-production prompt raised every arm. It explicitly supplied many rules that the theory-building skill otherwise had to recover: status codes, stock restoration, expiration behavior, idempotency expectations, and pagination semantics. In that stricter family, the gap narrowed; karpathy_only won one run and theory_only won the other.

The MCP-disabled strict run (strict-commerce-no-mcp) is reported separately from the earlier strict runs. In that run, theory_only led with a weighted average of 88.5, followed by karpathy_only at 84.6 and skills_off at 78.5.

The overall pattern is that karpathy_only improves readability and compactness, while theory_only more consistently improves domain correctness, executability, and behavioral tests. Across all parseable isolated reviews, theory_only has the best weighted average: 81.0 vs 77.7 for karpathy_only and 74.8 for skills_off. Neither skill eliminates recurring failures by itself: inventory/reservation invariants, idempotency, expiration/state transitions, SQLite isolation, runtime entrypoints, dead code, and README overclaims still appear in reviews.
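
The pooled figures above are consistent with weighting each family's average by its review count n. The source does not state how the overall average was computed, so the sketch below is an assumed reconstruction from the table rows, not the benchmark's own aggregation code.

```python
# (prompt family, arm, n, average weighted score), copied from the table.
ROWS = [
    ("basic-commerce", "skills_off", 40, 71.0),
    ("basic-commerce", "karpathy_only", 40, 73.9),
    ("basic-commerce", "theory_only", 40, 77.9),
    ("strict-production", "skills_off", 19, 80.9),
    ("strict-production", "karpathy_only", 19, 82.5),
    ("strict-production", "theory_only", 20, 83.4),
    ("strict-commerce-no-mcp", "skills_off", 10, 78.5),
    ("strict-commerce-no-mcp", "karpathy_only", 9, 84.6),
    ("strict-commerce-no-mcp", "theory_only", 10, 88.5),
]


def pooled_average(rows, arm: str) -> float:
    """n-weighted mean of the per-family averages for one arm."""
    selected = [(n, avg) for _, a, n, avg in rows if a == arm]
    total_n = sum(n for n, _ in selected)
    return sum(n * avg for n, avg in selected) / total_n


for arm in ("skills_off", "karpathy_only", "theory_only"):
    print(f"{arm:14s} {pooled_average(ROWS, arm):.1f}")
```

Under that assumption the three arms come out at 74.8, 77.7, and 81.0, matching the figures in the text. Because the families differ in prompt and environment, the pooled number is a summary only; the per-family rows remain the primary result.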

Run-by-run results, excluded review-output notes, copied raw result folders, manifest join notes, and recurring failure categories are documented in benchmark/README.md. The supporting artifacts are:

benchmark/prompts/
benchmark/results-20260609.json
benchmark/raw-results/.skill-codegen-runs/
benchmark/raw-results/.skill-review-runs/

Install

Option A: Claude Code plugin

/plugin marketplace add AnamKwon/programming-as-theory-building-skill
/plugin install programming-as-theory-building-skill@programming-as-theory-building-skill

For a fork, replace AnamKwon with the account or organization that publishes the repository. The install argument has the form <plugin-name>@<marketplace-id>; this repository uses programming-as-theory-building-skill for both.

Option B: manual Claude Code skill install

mkdir -p ~/.claude/skills/programming-as-theory-building
cp skills/programming-as-theory-building/SKILL.md ~/.claude/skills/programming-as-theory-building/SKILL.md
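
For scripted setups, the same two steps can be sketched in Python. The install_skill function and its parameters are hypothetical and not part of this repository; the source and destination paths are the ones used in the commands above.

```python
import shutil
from pathlib import Path
from typing import Optional


def install_skill(repo_root: Path, home: Optional[Path] = None) -> Path:
    """Copy SKILL.md from a checkout into the user's Claude Code skills directory.

    Hypothetical helper: mirrors the mkdir -p and cp steps of Option B.
    """
    home = home or Path.home()
    src = repo_root / "skills" / "programming-as-theory-building" / "SKILL.md"
    if not src.is_file():
        raise FileNotFoundError(f"SKILL.md not found at {src}")

    dest_dir = home / ".claude" / "skills" / "programming-as-theory-building"
    dest_dir.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    return Path(shutil.copy2(src, dest_dir / "SKILL.md"))
```

Calling install_skill(Path(".")) from the repository root would install into the current user's home directory; passing an explicit home makes the function easy to try against a scratch directory first.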

Option C: per-project CLAUDE.md

cp CLAUDE.md /path/to/project/CLAUDE.md

For Codex CLI, copy the operating rules into AGENTS.md; Codex does not import Claude Code SKILL.md files automatically. For Gemini CLI, put the rules in GEMINI.md, or import the skill content with the CLI's memory mechanism.

How to Know It's Working

These guidelines are working if you see:

fewer isolated helpers that ignore existing service/repository/UI boundaries,
fewer broad rewrites when a local change would preserve the theory,
more explicit invariant checks before implementation,
final summaries that connect Theory, Changed, Verified, and Risk.
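
The last signal can be checked mechanically. The sketch below assumes, purely for illustration, that a final summary labels each part as "Theory:", "Changed:", "Verified:", and "Risk:" at the start of a line, optionally behind a bullet or bold markers; the skill may format its summaries differently, and the missing_sections helper is hypothetical.

```python
import re

REQUIRED_SECTIONS = ("Theory", "Changed", "Verified", "Risk")


def missing_sections(summary: str) -> list:
    """Return the required labels that do not start any line of the summary.

    Accepts plain labels ("Theory: ...") as well as bulleted or bold
    ones ("- **Theory**: ..."). The label format is an assumption.
    """
    missing = []
    for label in REQUIRED_SECTIONS:
        pattern = rf"^\s*(?:[-*#]+\s*)?\**{label}\**\s*:"
        if not re.search(pattern, summary, flags=re.MULTILINE | re.IGNORECASE):
            missing.append(label)
    return missing


example = """\
Theory: a reservation may never exceed available stock.
Changed: guard added to the existing reserve() path.
Verified: oversell test fails before the change and passes after.
"""
print(missing_sections(example))  # the example has no Risk line
```

A check like this only confirms that the four parts are present; whether they actually connect to each other still needs a human or reviewer pass.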

Reproduce the benchmark

From the parent experiment workspace, run 10-repeat sets and aggregate results by prompt family:

MODEL=haiku REPEATS=10 ARMS="skills_off karpathy_only theory_only" ./run_skill_codegen_experiment.sh
MODEL=opus ./run_opus_code_review_experiment.sh .skill-codegen-runs/<run_id>

The published benchmark combines multiple 10-repeat batches. Keep prompt revisions and environment changes separate when aggregating; the basic-commerce, strict-production, and strict-commerce-no-mcp groups are not directly interchangeable samples.

The benchmark harness intentionally keeps the both arm out of the default comparison set. ARMS=both remains available as an explicit opt-in, but the default comparison isolates single-skill effects.

Citation

Naur, Peter. "Programming as Theory Building." Microprocessing and Microprogramming, vol. 15, no. 5, 1985, pp. 253-261.

Repository layout

.
|-- README.md
|-- PROMOTION.md
|-- LICENSE
|-- CITATION.cff
|-- CLAUDE.md
|-- .claude-plugin/
|   |-- marketplace.json
|   `-- plugin.json
|-- benchmark/
|   |-- README.md
|   |-- prompts/
|   |   |-- README.md
|   |   |-- basic-commerce.md
|   |   `-- strict-commerce.md
|   |-- raw-results/
|   |   |-- .skill-codegen-runs/
|   |   `-- .skill-review-runs/
|   `-- results-20260609.json
|-- skills/
|   `-- programming-as-theory-building/
|       `-- SKILL.md
`-- .gitignore