by AtomFlow-AI
# Add to your Claude Code skills
git clone https://github.com/AtomFlow-AI/MoleCode
Official repository for MoleCode: unlocking structural intelligence in large language models.
MoleCode presents molecules as code and enables LLMs to operate on and reason about chemistry directly. Instead of making language models reconstruct molecular structure from cryptic strings, MoleCode lets them read, write, and edit the structure itself.
Please visit the AtomFlow website.
A molecule is a graph: atoms are nodes, bonds are edges, and chemistry emerges from the topology. Yet large language models are almost always fed molecules as linear strings like SMILES, where the graph is implicit — connectivity is positional, branches are syntactic, and rings hide inside index digits. Before an LLM can do any chemistry, it must first reconstruct the graph from the syntax, spending reasoning budget on structural bookkeeping.
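To make that bookkeeping concrete, here is a toy Python sketch that recovers the atom list and edge list from a tiny SMILES subset: unbracketed organic-subset atoms, `=`/`#` bonds, parentheses for branches, and single-digit ring closures. It is not part of MoleCode and not a real SMILES parser; charges, stereo, bracket atoms, and `%nn` closures are deliberately out of scope, and aromatic bonds are simply recorded with order 1.

```python
import re

# Toy tokenizer for a tiny SMILES subset (illustration only, not a full parser).
TOKEN = re.compile(r"Cl|Br|[BCNOSPFI]|[cnops]|[=#]|[()]|\d")
BOND_SYMBOL = {"=": 2, "#": 3}


def smiles_subset_to_graph(smiles):
    """Return (atoms, edges); each edge is (atom_index, atom_index, bond_order)."""
    atoms, edges = [], []
    prev = None          # index of the atom the next bond attaches to
    branch_stack = []    # atoms to return to when ')' closes a branch
    open_rings = {}      # ring digit -> (atom index, bond order written at opening)
    bond = 1
    for tok in TOKEN.findall(smiles):
        if tok in BOND_SYMBOL:
            bond = BOND_SYMBOL[tok]
        elif tok == "(":
            branch_stack.append(prev)
        elif tok == ")":
            prev = branch_stack.pop()
        elif tok.isdigit():
            if tok in open_rings:
                # The ring bond only becomes known here, far from where it opened.
                start, opening_bond = open_rings.pop(tok)
                edges.append((start, prev, max(bond, opening_bond)))
            else:
                open_rings[tok] = (prev, bond)
            bond = 1
        else:
            atoms.append(tok)
            index = len(atoms) - 1
            if prev is not None:
                edges.append((prev, index, bond))
            prev, bond = index, 1
    return atoms, edges


atoms, edges = smiles_subset_to_graph("Oc1ccc(Cl)cc1")  # para-chlorophenol
print(atoms)  # ['O', 'c', 'c', 'c', 'c', 'Cl', 'c', 'c']
print(edges)  # the ring bond (1, 7, 1) is emitted last, after the closing '1'
```

The ring bond between atoms 1 and 7 is invisible until the parser reaches the final character, which is exactly the kind of positional reconstruction the paragraph above describes.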
MoleCode makes the structure the language. Every atom and bond is written as a typed declaration with a persistent identifier, serialized as a Mermaid graph. Topology becomes directly readable, editable, and auditable inside the context window, and the format is deterministically and losslessly interconvertible with SMILES / MOL via RDKit (no learned model, no information loss).
graph TB
subgraph chlorophenol["para-chlorophenol"]
chlorophenol_C_1[C]
chlorophenol_O_1[OH]
chlorophenol_C_2[CH]
chlorophenol_C_3[CH]
chlorophenol_C_4[C]
chlorophenol_Cl_1[Cl]
chlorophenol_C_5[CH]
chlorophenol_C_6[CH]
chlorophenol_C_1 === chlorophenol_C_2
chlorophenol_C_2 --- chlorophenol_C_3
chlorophenol_C_3 === chlorophenol_C_4
chlorophenol_C_4 --- chlorophenol_C_5
chlorophenol_C_5 === chlorophenol_C_6
chlorophenol_C_6 --- chlorophenol_C_1
chlorophenol_C_1 --- chlorophenol_O_1
chlorophenol_C_4 --- chlorophenol_Cl_1
end
The same `Subgraph → Node → Edge` grammar covers small molecules, polymers, and Markush structures, and it extends to reaction mechanisms and multimodal document parsing.
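Because each declaration sits on its own line, the graph can be recovered without SMILES-style bookkeeping. The sketch below is a hypothetical, regex-based reader written for illustration. It is not MoleCode's own parser (the library converts through RDKit), and it only understands the constructs in the para-chlorophenol example above, plus `{...}` abbreviation nodes and an optional `|label|` on edges.

```python
import re
from collections import defaultdict

SUBGRAPH = re.compile(r'^\s*subgraph\s+(\w+)(?:\["([^"]*)"\])?\s*$')
NODE = re.compile(r"^\s*(\w+)\s*(?:\[([^\]]+)\]|\{([^}]+)\})\s*$")
EDGE = re.compile(r"^\s*(\w+)\s*(---|===|-\.-)(?:\|(\w+)\|)?\s*(\w+)\s*$")
BOND_ORDER = {"---": 1, "===": 2, "-.-": 3}


def parse_molecode(text):
    """Split a MoleCode graph into subgraph titles, node labels and bonds."""
    subgraphs, nodes, bonds = {}, {}, []
    for line in text.splitlines():
        if m := SUBGRAPH.match(line):
            subgraphs[m.group(1)] = m.group(2) or m.group(1)
        elif m := NODE.match(line):
            node_id, label, abbreviation = m.groups()
            nodes[node_id] = label if label is not None else "{" + abbreviation + "}"
        elif m := EDGE.match(line):
            a, op, _tag, b = m.groups()
            bonds.append((a, b, BOND_ORDER[op]))
    return subgraphs, nodes, bonds


def neighbors(bonds):
    """Adjacency sets, read straight off the edge lines."""
    adjacency = defaultdict(set)
    for a, b, _order in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


CHLOROPHENOL = """graph TB
    subgraph chlorophenol["para-chlorophenol"]
    chlorophenol_C_1[C]
    chlorophenol_O_1[OH]
    chlorophenol_C_2[CH]
    chlorophenol_C_3[CH]
    chlorophenol_C_4[C]
    chlorophenol_Cl_1[Cl]
    chlorophenol_C_5[CH]
    chlorophenol_C_6[CH]
    chlorophenol_C_1 === chlorophenol_C_2
    chlorophenol_C_2 --- chlorophenol_C_3
    chlorophenol_C_3 === chlorophenol_C_4
    chlorophenol_C_4 --- chlorophenol_C_5
    chlorophenol_C_5 === chlorophenol_C_6
    chlorophenol_C_6 --- chlorophenol_C_1
    chlorophenol_C_1 --- chlorophenol_O_1
    chlorophenol_C_4 --- chlorophenol_Cl_1
    end"""

subgraphs, nodes, bonds = parse_molecode(CHLOROPHENOL)
print(subgraphs)  # {'chlorophenol': 'para-chlorophenol'}
print(sorted(neighbors(bonds)["chlorophenol_C_1"]))
# ['chlorophenol_C_2', 'chlorophenol_C_6', 'chlorophenol_O_1']
```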
| | SMILES | MoleCode |
| --- | --- | --- |
| Topology | implicit, positional | explicit, named nodes & edges |
| Atom identity | none | persistent IDs (stable across prompt → reasoning → output) |
| Editing | whole-string rewrite | local graph op (add a methyl = 1 node + 1 edge) |
| Validation | fragile string parsing | deterministic RDKit round-trip |
| Reasoning behavior | memorizes syntax | generalizes over structure |
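The "Editing" row can be illustrated with a small, hypothetical text edit; the helper below is not part of the `molecode` API. Attaching a methyl to ethanol's CH2 carbon gives propan-2-ol, and the new group costs one node line and one edge line placed before `end`. Because the labels in these examples spell out hydrogens (`[CH2]`), the sketch also rewrites the attachment atom's label; every other line stays as it was.

```python
ETHANOL = [
    "graph TB",
    '    subgraph M["substrate"]',
    "    M_C_1[CH3]",
    "    M_C_2[CH2]",
    "    M_O_1[OH]",
    "    M_C_1 --- M_C_2",
    "    M_C_2 --- M_O_1",
    "    end",
]


def add_methyl(lines, site, new_id, site_label):
    """Hypothetical helper: attach CH3 to `site` with one node and one edge line."""
    edited = []
    for line in lines:
        if line.strip().startswith(site + "["):
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}{site}[{site_label}]"  # the site loses one hydrogen
        edited.append(line)
    # Insert the two new declarations just before the subgraph's closing 'end'.
    end = max(i for i, text in enumerate(edited) if text.strip() == "end")
    edited[end:end] = [f"    {new_id}[CH3]", f"    {site} --- {new_id}"]
    return edited


propanol = add_methyl(ETHANOL, site="M_C_2", new_id="M_C_3", site_label="CH")
print("\n".join(propanol))
```

A string-based representation would instead need the whole SMILES rewritten (`CCO` becomes `CC(C)O`), with the branch placement worked out by hand.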
Empirically (see the MoleCode paper and docs/06-why-it-works.md):
pip install molecode # from PyPI — pulls in rdkit + networkx
Or from source (for the examples, the Agent Skill, and development):
git clone https://github.com/AtomFlow-AI/MoleCode.git
cd MoleCode
pip install -e .
`pip install molecode` gives you the library (`molecode.molecule`, `molecode.polymer`, `molecode.markush`, `molecode.prompts`, `molecode.llm`). The runnable `examples/` and the Agent Skill live in the repository. Full API reference → docs/api.md.
from rdkit import Chem
from molecode import mol_to_mermaid, mermaid_to_mol, mol_to_smiles
# SMILES -> MoleCode graph
graph = mol_to_mermaid(Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O"), name="Aspirin")
print(graph)
# MoleCode graph -> SMILES (lossless round-trip)
assert mol_to_smiles(mermaid_to_mol(graph)) == Chem.CanonSmiles("CC(=O)Oc1ccccc1C(=O)O")
MoleCode ships as a ready-to-use Agent Skill, so coding agents can clone this repo and immediately reason over and edit molecules at the explicit-graph level — no extra setup, no MCP server required.
| Agent | How it picks MoleCode up |
| --- | --- |
| Claude Code | Auto-discovers the skill at .claude/skills/molecode/. Just ask it to understand or edit a molecule. |
| Codex (and other agents) | Reads AGENTS.md at the repo root and uses the bundled CLI; interface metadata in agents/openai.yaml. |
Instead of asking the model to hand-write SMILES — error-prone for anything non-trivial — the skill has it convert → inspect the named atoms/bonds → edit the graph → validate, all through one stable CLI:
python .claude/skills/molecode/scripts/molecode_convert.py doctor
python .claude/skills/molecode/scripts/molecode_convert.py smiles-to-molecode "CCO" --name Ethanol
python .claude/skills/molecode/scripts/molecode_convert.py validate --input edited.mmd # formula, counts, round-trip
python .claude/skills/molecode/scripts/molecode_convert.py molecode-to-smiles --input edited.mmd
The skill bundles the six conversion directions (SMILES / PSMILES / Markush ↔
MoleCode), the `validate`, `compare` (Markush-aware isomorphism) and `doctor`
commands, a syntax reference for hand-editing graphs, and a file-based edit
workflow built for large molecules. See .claude/skills/molecode/SKILL.md.
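The `validate` command reports formula and counts. As a rough illustration of why such a check is cheap on this format, the hypothetical sketch below reads a molecular formula straight off the node labels. This assumes, as in the examples here, that each label spells out its hydrogens; it is not the skill's implementation (which goes through RDKit). The result is written in Hill order, and abbreviation nodes, charges, and isotopes are ignored.

```python
import re
from collections import Counter

ATOM_NODE = re.compile(r"^\s*\w+\s*\[([^\]]+)\]\s*$")  # skips {abbreviation} nodes
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def labels_from_molecode(text):
    return [m.group(1) for line in text.splitlines() if (m := ATOM_NODE.match(line))]


def hill_formula(labels):
    """Sum element counts from labels such as 'CH3', 'OH', 'Cl'; Hill ordering."""
    counts = Counter()
    for label in labels:
        for symbol, digits in ELEMENT.findall(label):
            counts[symbol] += int(digits) if digits else 1
    if "C" in counts:
        order = ["C", "H"] + sorted(s for s in counts if s not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(
        symbol + (str(counts[symbol]) if counts[symbol] > 1 else "")
        for symbol in order
        if counts[symbol]
    )


ACETIC_ACID = """graph TB
    subgraph Ac["acetic acid"]
    Ac_C_1[CH3]
    Ac_C_2[C]
    Ac_O_1[O]
    Ac_O_2[OH]
    Ac_C_1 --- Ac_C_2
    Ac_C_2 === Ac_O_1
    Ac_C_2 --- Ac_O_2
    end"""

print(hill_formula(labels_from_molecode(ACETIC_ACID)))  # C2H4O2
```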
`molecode.molecule`: atoms are `prefix_Element_Number[Label]` nodes; bonds are `---` (single), `===` (double), and `-.-` (triple), with `===|E|`/`===|Z|` and `_R`/`_S` for stereochemistry. → syntax reference
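The naming convention is regular enough to decode mechanically. The hypothetical helpers below split a node ID into prefix, element, and number (matching the element and number at the end, since a prefix may itself contain underscores) and map an edge line to a bond type with an optional `|E|`/`|Z|` tag. The `_R`/`_S` stereocentre markers are not handled, because their exact placement is not shown here.

```python
import re

NODE_ID = re.compile(r"^(?P<prefix>\w+?)_(?P<element>[A-Z][a-z]?)_(?P<number>\d+)$")
EDGE = re.compile(r"^\s*(\w+)\s*(---|===|-\.-)(?:\|([EZ])\|)?\s*(\w+)\s*$")
BOND_TYPE = {"---": "single", "===": "double", "-.-": "triple"}


def decode_node_id(node_id):
    m = NODE_ID.match(node_id)
    if m is None:
        raise ValueError(f"not a prefix_Element_Number ID: {node_id!r}")
    return m["prefix"], m["element"], int(m["number"])


def decode_edge(line):
    m = EDGE.match(line)
    if m is None:
        raise ValueError(f"not a bond line: {line!r}")
    a, op, stereo, b = m.groups()
    return a, b, BOND_TYPE[op], stereo  # stereo is 'E', 'Z' or None


print(decode_node_id("chlorophenol_Cl_1"))    # ('chlorophenol', 'Cl', 1)
print(decode_edge("but_C_2 ===|E| but_C_3"))  # ('but_C_2', 'but_C_3', 'double', 'E')
```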
`molecode.polymer`: the repeat unit stays explicit as a subgraph carrying a symbolic ×n count, with `TL`/`TR` terminus markers, so the graph does not blow up with chain length. → polymer docs
from molecode.polymer import polymer_to_mermaid, mermaid_to_psmiles
graph = polymer_to_mermaid("*NCCCCCC(=O)*", n=8, name="Nylon-6") # PSMILES -> graph
mermaid_to_psmiles(graph) # -> '*NCCCCCC(=O)*'
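To see why the symbolic ×n matters, the hypothetical sketch below writes out a two-ended PSMILES repeat unit n times and counts heavy atoms in the organic subset. It is plain string manipulation for illustration, not `molecode.polymer`. It assumes the two `*` attachment points sit at the very start and end of the string, which holds for the Nylon-6 unit above. An explicit atom graph grows linearly with n, while the MoleCode subgraph keeps one repeat unit plus the count.

```python
import re

HEAVY_ATOM = re.compile(r"Cl|Br|[BCNOSPFI]|[cnops]")  # organic subset, no brackets


def expand_repeat_unit(psmiles, n):
    """Hypothetical helper: spell out n head-to-tail copies of '*...*'."""
    if psmiles.count("*") != 2 or not (psmiles.startswith("*") and psmiles.endswith("*")):
        raise ValueError("expected exactly one '*' at each end of the repeat unit")
    return "*" + psmiles[1:-1] * n + "*"


def heavy_atoms(smiles):
    return len(HEAVY_ATOM.findall(smiles))


unit = "*NCCCCCC(=O)*"  # Nylon-6, as in the example above
for n in (1, 8, 100):
    print(n, heavy_atoms(expand_repeat_unit(unit, n)))  # 8, 64 and 800 heavy atoms
```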
`molecode.markush`: variable R-groups and named substituents become abbreviation nodes in curly braces (`{R1}`, `{Boc}`, `{Ar}`), something plain SMILES cannot express. A built-in graph-isomorphism comparator scores predictions up to abbreviation expansion. → Markush docs
graph TB
subgraph Mol["molecule name"]
Mol_C_1[C]
Mol_O_1[OH]
Mol_X_1{Boc}
Mol_X_2{R1}
Mol_C_1 --- Mol_O_1
Mol_C_1 --- Mol_X_1
end
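The comparator's core idea, matching graphs regardless of how their nodes are named, can be sketched with a brute-force labeled-graph isomorphism check. This is a hypothetical toy: it tries every node permutation (so it only suits a handful of nodes), compares labels and bond orders, and does not expand abbreviations, which the MoleCode comparator is described as handling. For real use, networkx (already a MoleCode dependency) provides `networkx.is_isomorphic` with `node_match`/`edge_match` callbacks.

```python
from itertools import permutations


def edge_set(edges, mapping=None):
    """Undirected bonds as {(frozenset({a, b}), order)}, optionally renamed."""
    rename = mapping or {}
    return {(frozenset((rename.get(a, a), rename.get(b, b))), order) for a, b, order in edges}


def is_isomorphic(graph1, graph2):
    """Brute-force labeled-graph isomorphism; only sensible for tiny graphs."""
    (labels1, edges1), (labels2, edges2) = graph1, graph2
    if sorted(labels1.values()) != sorted(labels2.values()):
        return False
    target = edge_set(edges2)
    if len(edge_set(edges1)) != len(target):
        return False
    ids1 = list(labels1)
    for candidate in permutations(labels2):  # every way to rename graph1's nodes
        mapping = dict(zip(ids1, candidate))
        if all(labels1[i] == labels2[mapping[i]] for i in ids1) and edge_set(edges1, mapping) == target:
            return True
    return False


reference = ({"Mol_C_1": "C", "Mol_O_1": "OH", "Mol_X_1": "{Boc}"},
             [("Mol_C_1", "Mol_O_1", 1), ("Mol_C_1", "Mol_X_1", 1)])
prediction = ({"P_X_1": "{Boc}", "P_O_1": "OH", "P_C_1": "C"},
              [("P_X_1", "P_C_1", 1), ("P_O_1", "P_C_1", 1)])
print(is_isomorphic(reference, prediction))  # True: only the IDs differ
```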
MoleCode is a drop-in representation for any LLM — feed the grammar as a system prompt, hand the model a graph, and validate its output deterministically. The examples/ folder has runnable scripts for all four task families (they run offline by default, printing the exact prompt; set MOLECODE_API_KEY to call a model):
python examples/01_molecule_roundtrip.py # SMILES <-> graph (lossless)
python examples/02_polymer_roundtrip.py # polymers with ×n
python examples/03_markush_roundtrip.py # abbreviation nodes & isomorphism
python examples/04_understanding.py # count atoms / formula / rings ...
python examples/05_generation.py # de novo design under constraints
python examples/06_editing.py # local graph edits (add/del/substitute)
python examples/07_reasoning.py # reaction-product prediction
python examples/08_image_to_molecode.py # OCSR: molecule image -> MoleCode (vision model)
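When a model returns a graph, a few deterministic checks can run before any chemistry toolkit is involved. The hypothetical checker below is not part of `molecode`, whose validation uses an RDKit round-trip. It flags edges that reference undeclared nodes and atoms whose bond orders plus labelled hydrogens exceed a common neutral valence; labels it cannot parse, such as charged atoms, are skipped rather than guessed.

```python
import re

NODE = re.compile(r"^\s*(\w+)\s*\[([^\]]+)\]\s*$")
EDGE = re.compile(r"^\s*(\w+)\s*(---|===|-\.-)(?:\|\w+\|)?\s*(\w+)\s*$")
LABEL = re.compile(r"([A-Z][a-z]?)(H(\d*))?")  # e.g. C, CH, CH3, OH, Cl
ORDER = {"---": 1, "===": 2, "-.-": 3}
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1}


def check_graph(text):
    """Return human-readable problems; an empty list means these checks passed."""
    labels, bonds, problems = {}, [], []
    for line in text.splitlines():
        if m := NODE.match(line):
            labels[m.group(1)] = m.group(2)
        elif m := EDGE.match(line):
            a, op, b = m.groups()
            bonds.append((a, b, ORDER[op]))
    undeclared = sorted({x for a, b, _ in bonds for x in (a, b) if x not in labels})
    problems += [f"edge uses undeclared node {x}" for x in undeclared]
    used = {node: 0 for node in labels}  # summed bond orders per declared atom
    for a, b, order in bonds:
        for x in (a, b):
            if x in used:
                used[x] += order
    for node, label in labels.items():
        m = LABEL.fullmatch(label)
        if m is None:
            continue  # charges, isotopes, etc. are out of scope for this sketch
        element, h_group, h_digits = m.groups()
        hydrogens = 0 if h_group is None else int(h_digits or 1)
        limit = MAX_VALENCE.get(element)
        if limit is not None and used[node] + hydrogens > limit:
            problems.append(f"{node} [{label}] exceeds valence {limit}")
    return problems


bad_output = "X_C_1[CH3]\nX_O_1[OH]\nX_C_1 === X_O_1"
print(check_graph(bad_output))
# ['X_C_1 [CH3] exceeds valence 4', 'X_O_1 [OH] exceeds valence 2']
```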
The reusable ingredients: