by avidevelops
# Add to your Claude Code skills
```shell
git clone https://github.com/avidevelops/claude-architect-exam-prep
```

This guide is compiled from a detailed Q&A thread analyzing exam-style questions for the Claude Certified Architect – Foundations certification. It breaks down multi-agent architectures, context management, tool design, and batch processing principles, focusing heavily on architectural tradeoffs and best practices over raw prompt engineering.
The Claude Architect Foundations exam tests your ability to design resilient, production-ready AI systems. It consists of scenario-based multiple-choice questions focusing on Domain 1: Agentic Architecture, Domain 4: API & Orchestration, and Domain 5: Context Management. The exam prioritizes structural, deterministic solutions (like schema design and tool boundaries) over probabilistic approaches (like prompt instructions).
This guide covers core architectural scenarios you are likely to encounter, including:
## Scenario: Schema Design & Self-Correction
Question:
An automated invoice extraction pipeline occasionally outputs structured JSON where the extracted line items do not add up to the total amount extracted from the invoice. What is the best architectural approach to handle this semantic error?
Options:
A) Extract a calculated_total field alongside the stated_total field, compare them, and flag mismatches for human review.

✅ Correct Answer: A — Extract calculated_total and flag discrepancies
Why this is correct:
JSON schemas prevent syntax errors but not semantic errors (like bad math). The most robust self-correction pattern is extracting both what the document explicitly states (stated_total) and what the model calculates from the line items (calculated_total). If they don't match, the record is flagged for human review. This improves reliability without fabricating data.
Why the others are weaker:
💡 Exam Takeaway:
Design self-correction flows by extracting both a stated value and a calculated value, routing mismatches to human review rather than silently "fixing" source document errors.
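The stated-vs-calculated pattern can be sketched as a post-extraction validation step. This is a minimal illustration; the field names (`stated_total`, `calculated_total`, `line_items`, `needs_review`) are hypothetical, not part of any exam or API:

```python
# Validate extracted invoice JSON: recompute the total from line items,
# compare it to the total stated on the invoice, and flag mismatches
# for human review instead of silently "fixing" them.
def validate_invoice(extraction: dict, tolerance: float = 0.01) -> dict:
    calculated = round(sum(item["amount"] for item in extraction["line_items"]), 2)
    return {
        **extraction,
        "calculated_total": calculated,
        "needs_review": abs(calculated - extraction["stated_total"]) > tolerance,
    }

flagged = validate_invoice({
    "stated_total": 110.00,
    "line_items": [{"amount": 50.00}, {"amount": 55.00}],
})
print(flagged["needs_review"])  # True: line items sum to 105.00, not 110.00
```

The key design choice is that the record is enriched and routed, never mutated: both totals survive in the output so a reviewer can see exactly what disagreed.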
## Scenario: Batch Processing & Cost Optimization
Question:
You are preparing to process 50,000 legacy documents using the Batch API. An initial test on 500 documents reveals that 18% of them require 2-3 prompt refinements to extract data correctly. What is the most cost-efficient strategy for scaling this workload?
Options:
✅ Correct Answer: A — Refine interactively first, batch process later
Why this is correct:
The recommended practice is to refine prompts on a representative sample before batch-processing large volumes. Iterative resubmissions at scale destroy the cost savings of the Batch API. By fixing the failure modes on a small, representative sample, you ensure the prompt is robust, maximizing first-pass success when you finally trigger the 50,000-document Batch run.
Why the others are weaker:
💡 Exam Takeaway:
Refine prompts on a representative sample first to maximize first-pass success, then run the full volume through the Batch API to minimize expensive resubmissions.
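The intuition can be made concrete with back-of-the-envelope arithmetic. All numbers below except the 18% failure rate, the 500-document sample, and the 50,000-document volume (which come from the scenario) are invented for illustration, including the per-document cost and the average retry count:

```python
# Rough cost comparison: resubmitting failures at full scale versus
# burning retries on the 500-document sample first.
DOCS = 50_000
SAMPLE = 500
FAIL_RATE = 0.18       # from the scenario: 18% need refinement
RETRIES = 2.5          # assumed average of the "2-3 refinements"
COST_PER_DOC = 0.01    # hypothetical per-document cost, any unit

# Naive: run everything, then resubmit the 18% that fail, 2-3 times each.
naive = DOCS * COST_PER_DOC + DOCS * FAIL_RATE * RETRIES * COST_PER_DOC

# Recommended: pay for the retries only on the sample, then one clean run.
refined_first = SAMPLE * (1 + FAIL_RATE * RETRIES) * COST_PER_DOC + DOCS * COST_PER_DOC

print(f"resubmit at scale:   {naive:,.2f}")
print(f"refine sample first: {refined_first:,.2f}")
```

Under these assumptions the retry cost shrinks from 225 cost units (50,000 × 18% × 2.5) to about 2.25 (500 × 18% × 2.5), which is the entire argument for sampling before scaling.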
## Scenario: SLA Management & Async APIs
Question:
Your system processes asynchronous user requests with a strict Service Level Agreement (SLA) requiring results within 30 hours of submission. You plan to use the Message Batches API, which can take up to 24 hours to complete. Which batch submission schedule best meets the SLA while maximizing cost efficiency?
Options:
✅ Correct Answer: D — Submit batches every 4 hours
Why this is correct:
Worst-case total turnaround time is the time a document waits for the next batch window plus the maximum batch processing time (24 hours). If you submit every 4 hours, a document arriving right after a cutoff waits 4 hours + 24 hours processing = 28 hours. This leaves a 2-hour safety buffer under the 30-hour SLA.
Why the others are weaker:
💡 Exam Takeaway:
Calculate worst-case SLA by adding batch frequency interval to the 24-hour max processing time; always leave an operational buffer.
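The worst-case arithmetic from the answer above is simple enough to sketch directly; the function and constant names are illustrative:

```python
# Worst case = a request arrives just after a cutoff, waits one full
# submission interval, then the batch takes its maximum processing time.
MAX_BATCH_HOURS = 24   # Batch API worst-case completion (from the scenario)
SLA_HOURS = 30

def worst_case_hours(submit_interval: int) -> int:
    return submit_interval + MAX_BATCH_HOURS

for interval in (2, 4, 6, 8, 12):
    wc = worst_case_hours(interval)
    print(f"submit every {interval:>2}h -> worst case {wc}h, buffer {SLA_HOURS - wc}h")
```

A 4-hour interval yields a 28-hour worst case (2-hour buffer); a 6-hour interval hits the 30-hour SLA exactly with zero margin, which is why the answer with an explicit operational buffer is preferred.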
## Scenario: Schema Engineering & Provenance
Question:
An extraction pipeline processes technical manuals. A specific manual lists two conflicting battery capacities: one in the text and a different one in a detailed specs table. Historical data shows the specs table is correct 90% of the time. How should the extraction schema handle this?
Options:
✅ Correct Answer: D — Capture all values with source locations
Why this is correct:
Forcing premature collapse into a single value is an anti-pattern. If the specs table is only right 90% of the time, hard-coding a preference guarantees a 10% error rate. Modifying the schema to accept an array of values with explicit source locations preserves full provenance, allowing downstream business logic or human reviewers to make an informed reconciliation.
Why the others are weaker:
💡 Exam Takeaway:
Preserve conflicting source data in the structured output instead of forcing premature collapse into a single value.
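A provenance-preserving output shape might look like the following. The structure and field names (`values`, `source`) are hypothetical, shown only to make the pattern concrete:

```python
# Extraction output that keeps both conflicting battery capacities,
# each tagged with where it came from, instead of collapsing to one.
battery_capacity = {
    "field": "battery_capacity_mah",
    "values": [
        {"value": 5000, "source": "body text, p. 12"},
        {"value": 5200, "source": "specs table, p. 47"},
    ],
}

def needs_reconciliation(field: dict) -> bool:
    # More than one distinct value means downstream logic or a human
    # reviewer must reconcile; the 90%-reliable source preference can
    # be applied there, with the conflict still visible.
    return len({v["value"] for v in field["values"]}) > 1

print(needs_reconciliation(battery_capacity))  # True
```

The reconciliation rule (e.g. "prefer the specs table") lives in deterministic downstream code where it can be audited and changed, not baked irreversibly into the extraction step.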
## Scenario: Tool Interfaces & Identifiers
Question:
An agent uses a search_documents tool to find files, and subsequently uses share_document(document_id, email) and move_document(document_id, folder) to act on them. How should the search_documents tool format its output to ensure reliable chaining?
Options:
B) Return structured data that includes the document_id and metadata for each result.

✅ Correct Answer: B — Structured data with document IDs
Why this is correct:
Multi-step workflows require clear input/output contracts. Because the downstream tools (share, move) require a specific machine-usable identifier (document_id), the upstream search tool must return exactly that ID in a structured format alongside the human-readable metadata.
Why the others are weaker:
💡 Exam Takeaway:
Multi-step tool workflows require machine-usable identifiers; tools meant for chaining should always return structured data containing explicit IDs.
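A minimal sketch of the contract, with stubbed tools (the function bodies, IDs, and email are invented; only the tool names come from the scenario):

```python
# search_documents returns structured results carrying the same
# document_id that the downstream tools accept, so the agent can
# chain calls without parsing prose.
def search_documents(query: str) -> list[dict]:
    # Stubbed results; a real implementation would query a document store.
    return [
        {"document_id": "doc_8f3a", "title": "Q3 Budget", "modified": "2024-05-01"},
        {"document_id": "doc_91c2", "title": "Q3 Forecast", "modified": "2024-05-03"},
    ]

def share_document(document_id: str, email: str) -> str:
    return f"shared {document_id} with {email}"

# The ID flows through the chain unchanged -- no brittle text parsing.
results = search_documents("Q3")
print(share_document(results[0]["document_id"], "cfo@example.com"))
```

Had `search_documents` returned a prose summary instead, the agent would have to guess or re-derive identifiers, and any formatting drift would silently break `share_document` and `move_document`.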
## Scenario: Agentic Tool Integration

Question: