Field notes from building AI in production.
Daily case studies, deep-dives, and operator-grade write-ups on AI engineering, DevOps, and the messy reality of shipping software that touches LLMs. Written from the cab, the office, and the trenches by Jeremy Longshore at Intent Solutions.
Curated long reads
Rent the Agent, Own the Proof
Anthropic's Claude Tag is the best agentic Slack teammate you can rent. The question it doesn't answer: who owns the memory, and who can verify the audit log? What customer-owned, verifiable agent governance looks like in code: CCSC and AGP.
Agent-Native Mobile Testing (in progress)
A three-chapter series on plugin authoring, agent reliability on real-device clouds, and AI triage for agent-orchestrated mobile cloud testing, worked through kobiton/automate as the running case study.
Recent posts
Rent the Agent, Own the Proof
Anthropic's Claude Tag is the best agentic teammate you can rent. The question it doesn't answer: who owns the memory, and who can verify the audit log? Here's what customer-owned, verifiable agent governance looks like in code.
Gate the Statement, Not the Tool Name
When one MCP tool carries every SQL verb, allowlisting tool names is theater. The safety boundary has to read the statement. Here's how that gate was built.
Coverage Said 69%, Mutation Testing Said 25%
A repo at 69% line coverage scored 24.88% on mutation testing, and the rules engine that touches user email scored 0.00%. Coverage said fine; Stryker didn't.
When LLM Output Lies Instead of Crashing
An LLM-output parsing bug silently understated a cost report because of a case-sensitivity mismatch. The fix: a one-line normalizer plus five defensive hardening steps.
The LLM Should Never Do the Math
A Claude Code skill that hunts Databricks cost leaks and reports confirmed dollar amounts from the customer's own billing tables, never LLM estimates.
A Denial You Can Audit Beats a Silent Drop
The governed second brain shipped: one plugin, two modes, and a governor that hands back a receipt when it refuses a write instead of dropping it silently.
A Reply Your Bot Loses to a Crash Is One Your User Never Got
Making every bot reply path (streaming and file upload) crash-durable. The unglamorous reliability work that separates a demo bot from one people depend on.
Series & ecosystems
MCP for Beginners
End-to-end Model Context Protocol curriculum, with solutions in Python, TypeScript, Java, Rust, C#, and .NET, translated across six languages.
Agentic Design Patterns
Patterns and anti-patterns for production agent systems. Decision frameworks, prompt scaffolds, evaluation harnesses.
Tiny Recursive Models
Building small recursive systems where simple loops compose into emergent behavior. Series in progress.
IRSB Ecosystem
Intent Solutions release tooling: the open-source family of plugins, skills, and packages.
Wild Ecosystem
The shared-GCP, multi-MCP family of standalone integrations powered by a common platform spine.
Research & Curriculum
Long-form research articles, learning paths, and reading lists for the AI builder community.
Retrospectives
June 2026: The First Full Month Under the Repaired Rubric. Tier 1 Went From 5% to 50%, and the Last Mile Got Louder
June 2026 retrospective: the first full month running under May's repaired tier rubric. The distribution corrected from 5% to 50% Tier 1, exactly as predicted, but the structural sweep still flags every disagreement as inflation, and no human has adjudicated one. 27 posts, 1,166 commits.
May 2026: The Month the Classifier Caught Itself. A Mid-Month Calibration Reckoning, 27 Posts, and 964 Commits
May 2026 retrospective: the tier classifier caught itself inflating. Calibration found the rubric structurally broken mid-month. 27 posts, 964 commits.