ai-agents · Signal 75/100

AI intel digest

Vibe Engineering Effect Apps — Michael Arnaldi, Effectful

Michael Arnaldi demonstrated a practical workflow for building with coding agents and the Effect TypeScript library by cloning the Effect repository directly into the project as a git subtree.

2026-05-07 · 29 min read · 5,837 words · 10 facts · 0 assumptions
Start here

Executive summary

1. SUMMARY

Michael Arnaldi demonstrated a practical workflow for building with coding agents and the Effect TypeScript library by cloning the Effect repository directly into the project as a git subtree. Starting from an empty repository, he set up a Bun/Vitest/TypeScript project with strict diagnostics, created an agents.md instruction file, extracted patterns from the Effect source for HTTP APIs and SQL, and built a working Todo API with OpenAPI documentation in approximately 90 minutes. The core thesis is that LLM coding agents are trained to consume and produce code, not documentation, so giving them direct access to library source code produces better results than prompts or MCP servers.

2. KEY FACTS

- FACT: Michael Arnaldi has not written code by hand for at least 6-8 months. | EVIDENCE: "I've also have not been coding by hand since about late this summer" and later "I'm not writing code by hand since at a minimum six or eight months" | CONFIDENCE: HIGH
- FACT: The workshop started from a completely empty repository with nothing pre-prepared. | EVIDENCE: "I actually prepared absolutely nothing that means we can take any path that we want" | CONFIDENCE: HIGH
- FACT: The project used Bun as the runtime, Vitest for testing, TypeScript Go (TSGo) as the compiler, and Effect v4 beta. | EVIDENCE: "set up a bun repository, use vest for testing", "we want to use the TSGO version of it", "We're going to use effect v4" | CONFIDENCE: HIGH
- FACT: The Effect repository was added as a git subtree in a `.repos/effect` directory without history. | EVIDENCE: "create a dot repos folder and add as a g sub tree without history squashed uh in repos effect" | CONFIDENCE: HIGH
- FACT: All TypeScript diagnostics were configured as errors to prevent the LLM from accepting code with warnings. | EVIDENCE: "For AI we would like to turn everything into an error so that the The LLM cannot cannot pass cannot accept code that has any remote resemblance or an error" | CONFIDENCE: HIGH
- FACT: An `agents.md` file was created to establish rules, available commands, and a reference to the Effect repo. | EVIDENCE: "we want to set up an agents.mmd listing the commands available" | CONFIDENCE: HIGH
- FACT: The model initially created unnecessary custom wrapper code for layer building in tests, which was then corrected. | EVIDENCE: "It basically created a function to provide a layer to an effect... But it's completely unnecessary here" and later "Avoid custom wrappers that call layer.build" | CONFIDENCE: HIGH
- FACT: The final API included create, update, flag-as-done, and list operations for todos, with OpenAPI documentation. | EVIDENCE: "expose a todo functionality where you can one create todos... Two, update todos... Three, flag todo as done or not. for list todos" and "There is an open API created" | CONFIDENCE: HIGH
- FACT: The speaker uses GPT 5.4 as his primary coding model, having switched from Anthropic models due to usage restrictions. | EVIDENCE: "I'm using GPT 5.4" and "anthropic is putting arbitrary restrictions on how we use their models. So I don't really want to use entropic models" | CONFIDENCE: HIGH
- FACT: Coding agents have been trained to focus on user code, not node_modules or gitignored directories. | EVIDENCE: "coding agents have been trained to focus on your own code not on the code that is on node modules" and "cursor does not index stuff that is git ignored" | CONFIDENCE: HIGH

3. KEY IDEAS

- IDEA: "Just clone the repo" — the most effective way to give coding agents knowledge of a library is to include its source directly in the project, not via documentation or MCP servers. | REASONING: LLMs are trained on code consumption and production during reinforcement learning, not on reading documentation or using MCP servers; they ignore node_modules and gitignored files by training. | IMPLICATION: Library authors and developers should treat source code as the primary interface for AI agents, potentially changing how libraries are distributed and consumed.
- IDEA: Systems should be architected around LLMs as stateless machines with outdated knowledge, not as continuous learners like humans. | REASONING: Humans continuously learn, sleep to consolidate memory, and transform experience into long-term memory; LLMs only have pre-training and post-training phases with no ongoing knowledge acquisition. | IMPLICATION: Context architecture — how knowledge is injected, refreshed, and managed — is more important than model size or context window length.
- IDEA: Smaller context windows used strategically outperform large context windows filled with everything. | REASONING: The context window is "a fixed size array" pushed to a neural network; more information confuses the model, which is why "a 1 million context window is not necessarily helpful." | IMPLICATION: Tool design should prioritize targeted, relevant context injection over maximizing token limits.
- IDEA: Spec-Driven Development replaces plan mode — create a persistent spec as markdown, then implement in iterative loops. | REASONING: Plan mode gives models "crippled access to tools"; better to discuss the spec with the model, persist it, then implement in small tasks with clean context windows. | IMPLICATION: AI-assisted development workflows need new patterns distinct from traditional human planning, focused on context hygiene and incremental execution.
- IDEA: ESLint/custom lint rules serve as backpressure to prevent models from taking shortcuts. | REASONING: Models will find workarounds (e.g., `as never as X` when `as unknown as X` is banned); each banned pattern requires observing model behavior and codifying prohibitions. | IMPLICATION: Effective AI coding requires continuous human monitoring and rule-making — it's "babysitting a junior developer with a knife."
- IDEA: Open-weight models lag frontier models by 3-6 months, making them viable for daily operations sooner than expected. | REASONING: Current open models already surpass Claude 3.5 Sonnet, which was already usable for library development; the gap is closing. | IMPLICATION: Organizations should prepare for a future where frontier models are not required for serious development work.
- IDEA: Effect cluster and workflows are becoming more important because AI integrations make processes longer-running and more failure-prone. | REASONING: "If the average response time is 10 milliseconds server is pretty much never going to fail in that 10 millisecond. If that 10 millisecond becomes a minute, yes, you're pretty sure the server is going to fail in that minute." | IMPLICATION: Workflow engines (like Temporal, or Effect's built-in solution) will see increased adoption as AI features become standard.

4. KEY QUOTES

- "This session should just be called just clone the [__] repo and be done with it." — Michael Arnaldi
- "We learn continuously... This does not happen with LLMs." — Michael Arnaldi
- "You are basically appending messages onto a fixed size array which is called the context window and context window is limited." — Michael Arnaldi
- "The dumbest thing ever ends up working better." — Michael Arnaldi, on complex context management architectures vs. simple bash loops
- "It's kind of babysitting a junior developer with a knife running through the kitchen." — Michael Arnaldi, on writing lint rules to constrain model behavior
- "If the average response time is 10 milliseconds server is pretty much never going to fail in that 10 millisecond. If that 10 millisecond becomes a minute, yes, you're pretty sure the server is going to fail in that minute." — Michael Arnaldi

5. SIGNAL POINTS

- Clone library source code into your project as a git subtree — agents are trained on code, not docs, and ignore node_modules/gitignored files.
- Configure all TypeScript diagnostics as errors; agents will otherwise accept code with warnings that humans would catch.
- Use `agents.md` to codify project-specific rules, commands, and pattern references — treat it as living documentation that evolves with observed model failures.
- Implement custom lint rules as backpressure for each shortcut a model discovers; expect an arms race of workarounds.
- Prefer Spec-Driven Development over plan mode: persistent markdown specs + clean context windows + iterative execution loops.
- Context hygiene beats context volume — restart sessions, avoid watch modes, and keep each task focused.
- Open-weight models are approaching viability for serious development; the 3-6 month lag to frontier is shrinking.
- Long-running AI processes make workflow engines (Temporal, Effect cluster) essential infrastructure, not optional extras.

6. SOURCES MENTIONED

- Effect / Effect-TS: TypeScript library for building type-safe, composable applications. Used v4 beta. Described as having "effect cluster" for distributed workflows.
- Bun: JavaScript runtime used for the project. Speaker confirmed Vitest uses the Bun runtime with a specific flag.
- Vitest: Test runner. Speaker noted a flag was needed to make it use Bun rather than Node.
- TypeScript Go (TSGo): Preview compiler used instead of standard TypeScript. Installed via the `typescript-go` package.
- GPT 5.4 / OpenAI: Primary coding model used. Described as more concise than Anthropic models but prone to asking for confirmation.
- Claude / Opus / Anthropic: Previously used but abandoned due to "arbitrary restrictions" on usage. Described as sometimes taking shortcuts that propagate through codebases.
- Cursor: Mentioned as not indexing gitignored files.
- MCP servers: Mentioned as insufficient because models are not trained to use them.
- effect.solutions: Website by Kit Langton for quick-starting Effect in AI projects. Criticized for using a CLI that creates a circular dependency.
- Temporal: Workflow engine mentioned as a comparison to Effect's built-in workflow solution.
- Ralph loops: Referenced via Geoffrey Huntley's analogy of Claude 3.5 as "a kid with a knife running through the house."
- Accountability: Arnaldi's personal repository with extensive ESLint config for constraining AI behavior.

7. VERDICT

This video is worth watching for anyone building with coding agents, especially in typed functional languages. The unique signal is a demonstrated, end-to-end workflow for making agents effective with unfamiliar libraries — not through better prompts or RAG, but through architectural decisions (git subtrees, strict diagnostics, lint rules as backpressure, context hygiene) that align with how models are actually trained. Arnaldi's credibility comes from doing this at library-level complexity for months across TypeScript and Rust, not toy examples. The live coding format, including visible model failures and corrections, makes this more actionable than polished tutorials. For Effect users specifically, it's essential; for AI tooling researchers, the "clone the repo" thesis and the lint-rule arms-race observation are testable hypotheses. Signal density is high for practitioners, moderate for casual viewers.

---

Count: 10 facts, 0 assumptions, 8 demonstrations (project setup, TSGo configuration, strict diagnostics, git subtree, agents.md creation, HTTP API implementation, SQL client setup, test cleanup with layer pattern correction).

Signal density: 75% — the workshop format includes some conversational overhead and repeated points, but the core methodology is densely packed with actionable, evidenced claims.

What matters

Signal points

  1. Clone library source code into your project as a git subtree — agents are trained on code, not docs, and ignore node_modules/gitignored files.

  2. Configure all TypeScript diagnostics as errors; agents will otherwise accept code with warnings that humans would catch.

  3. Use `agents.md` to codify project-specific rules, commands, and pattern references — treat it as living documentation that evolves with observed model failures.

  4. Implement custom lint rules as backpressure for each shortcut a model discovers; expect an arms race of workarounds.

  5. Prefer Spec-Driven Development over plan mode: persistent markdown specs + clean context windows + iterative execution loops.

  6. Context hygiene beats context volume — restart sessions, avoid watch modes, and keep each task focused.

  7. Open-weight models are approaching viability for serious development; the 3-6 month lag to frontier is shrinking.

  8. Long-running AI processes make workflow engines (Temporal, Effect cluster) essential infrastructure, not optional extras.

Interpretation

Key ideas

1. "Just clone the repo" — the most effective way to give coding agents knowledge of a library is to include its source directly in the project, not via documentation or MCP servers.

Why: LLMs are trained on code consumption and production during reinforcement learning, not on reading documentation or using MCP servers; they ignore node_modules and gitignored files by training.

Implication: Library authors and developers should treat source code as the primary interface for AI agents, potentially changing how libraries are distributed and consumed.

2. Systems should be architected around LLMs as stateless machines with outdated knowledge, not as continuous learners like humans.

Why: Humans continuously learn, sleep to consolidate memory, and transform experience into long-term memory; LLMs only have pre-training and post-training phases with no ongoing knowledge acquisition.

Implication: Context architecture — how knowledge is injected, refreshed, and managed — is more important than model size or context window length.

3. Smaller context windows used strategically outperform large context windows filled with everything.

Why: The context window is "a fixed size array" pushed to a neural network; more information confuses the model, which is why "a 1 million context window is not necessarily helpful."

Implication: Tool design should prioritize targeted, relevant context injection over maximizing token limits.
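The "fixed size array" framing can be made concrete with a toy model (illustrative only — real context windows are measured in tokens, not messages, and the eviction policy below is an assumption, not how any particular provider behaves):

```typescript
// Toy model: a context window as a bounded buffer of messages.
// Once the buffer is full, something must be evicted — which is why
// simply having a larger window does not make the model use it well.
class ContextWindow {
  private messages: string[] = [];
  constructor(private readonly capacity: number) {}

  append(message: string): void {
    this.messages.push(message);
    // Evict the oldest messages once the fixed-size buffer overflows.
    while (this.messages.length > this.capacity) {
      this.messages.shift();
    }
  }

  contents(): string[] {
    return [...this.messages];
  }
}

const ctx = new ContextWindow(3);
["system", "spec", "task 1", "task 2"].forEach((m) => ctx.append(m));
console.log(ctx.contents()); // "system" has been evicted
```

The point of the sketch: whatever the capacity, injecting only targeted, relevant context keeps the useful material inside the buffer.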

4. Spec-Driven Development replaces plan mode — create a persistent spec as markdown, then implement in iterative loops.

Why: Plan mode gives models "crippled access to tools"; better to discuss the spec with the model, persist it, then implement in small tasks with clean context windows.

Implication: AI-assisted development workflows need new patterns distinct from traditional human planning, focused on context hygiene and incremental execution.
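One way to picture the spec-then-iterate workflow — Arnaldi's quote that "the dumbest thing ever ends up working better" refers to exactly this kind of simple bash loop. Everything here is hypothetical: `agent` stands in for whatever CLI drives the model, and the `--prompt` flag, `specs/` layout, and `bun run check` script are assumptions for illustration, not anything shown in the talk:

```shell
# Hypothetical sketch: one clean agent session per task from the persisted
# spec, with the project's checks acting as backpressure after each step.
for task in specs/tasks/*.md; do
  # Fresh context window per task: no discussion, just point at the spec.
  agent --prompt "Implement $task following specs/spec.md" || break
  # Gate each iteration on typecheck + lint + tests before moving on.
  bun run check || break
done
```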

5. ESLint/custom lint rules serve as backpressure to prevent models from taking shortcuts.

Why: Models will find workarounds (e.g., `as never as X` when `as unknown as X` is banned); each banned pattern requires observing model behavior and codifying prohibitions.

Implication: Effective AI coding requires continuous human monitoring and rule-making — it's "babysitting a junior developer with a knife."
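The "ban each discovered shortcut" pattern can be expressed with ESLint's built-in `no-restricted-syntax` rule and an AST selector. A sketch, assuming the typescript-eslint parser; Arnaldi's actual rules aren't shown in the talk, but a selector like this catches any double assertion, covering both `as unknown as X` and the `as never as X` workaround in one rule rather than banning them one by one:

```jsonc
// ESLint config excerpt (illustrative)
{
  "rules": {
    "no-restricted-syntax": [
      "error",
      {
        // A TSAsExpression directly wrapping another TSAsExpression is a
        // double assertion: `x as unknown as Y`, `x as never as Y`, etc.
        "selector": "TSAsExpression > TSAsExpression",
        "message": "Double type assertions are banned; fix the types instead."
      }
    ]
  }
}
```

Structural selectors age better in this arms race than string matching, since the model's next workaround often varies the spelling but not the AST shape.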

6. Open-weight models lag frontier models by 3-6 months, making them viable for daily operations sooner than expected.

Why: Current open models already surpass Claude 3.5 Sonnet, which was already usable for library development; the gap is closing.

Implication: Organizations should prepare for a future where frontier models are not required for serious development work.

Evidence

Key facts

Michael Arnaldi has not written code by hand for at least 6-8 months.

HIGH

Evidence: "I've also have not been coding by hand since about late this summer" and later "I'm not writing code by hand since at a minimum six or eight months"

The workshop started from a completely empty repository with nothing pre-prepared.

HIGH

Evidence: "I actually prepared absolutely nothing that means we can take any path that we want"

The project used Bun as the runtime, Vitest for testing, TypeScript Go (TSGo) as the compiler, and Effect v4 beta.

HIGH

Evidence: "set up a bun repository, use vest for testing", "we want to use the TSGO version of it", "We're going to use effect v4"

The Effect repository was added as a git subtree in a `.repos/effect` directory without history.

HIGH

Evidence: "create a dot repos folder and add as a g sub tree without history squashed uh in repos effect"
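The quoted command corresponds to git's subtree feature. A plausible reconstruction (the talk doesn't show the exact invocation, so the repository URL and branch name are assumptions):

```shell
# Pull the library's source into .repos/effect as a single squashed commit,
# so agents can read it — unlike node_modules, it is neither gitignored
# nor excluded from indexing.
git subtree add --prefix=.repos/effect \
  https://github.com/Effect-TS/effect.git main --squash
```

`--squash` is what "without history" refers to: the library arrives as one commit instead of importing its full log.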

All TypeScript diagnostics were configured as errors to prevent the LLM from accepting code with warnings.

HIGH

Evidence: "For AI we would like to turn everything into an error so that the The LLM cannot cannot pass cannot accept code that has any remote resemblance or an error" | CONFIDENCE: HIGH
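"Everything as an error" maps naturally onto tsconfig's strictness flags. The talk doesn't enumerate the exact flags, so the selection below is an assumption — though each one is a real TypeScript compiler option that turns a class of silently-tolerated code into a hard error:

```jsonc
// tsconfig.json excerpt (illustrative selection of strictness flags)
{
  "compilerOptions": {
    "strict": true,
    "noUnusedLocals": true,
    "noUnusedParameters": true,
    "noFallthroughCasesInSwitch": true,
    "noUncheckedIndexedAccess": true,
    "exactOptionalPropertyTypes": true,
    "noImplicitOverride": true
  }
}
```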

An `agents.md` file was created to establish rules, available commands, and reference to the Effect repo.

HIGH

Evidence: "we want to set up an agents.mmd listing the commands available"
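A minimal agents.md along the lines described might look like this (the contents are an assumption reconstructed from the talk's description — rules, available commands, and a pointer at the vendored Effect source — not a transcript of the actual file):

```markdown
# Agent instructions

## Rules
- Never edit anything under `.repos/` — it is reference source, not project code.
- All diagnostics are errors; do not submit code with warnings.

## Commands
- `bun test` — run the test suite once (no watch mode)
- `bun run check` — typecheck and lint

## References
- Effect source lives in `.repos/effect`; read it for API patterns
  (HTTP APIs, SQL) instead of guessing from documentation.
```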

The model initially created unnecessary custom wrapper code for layer building in tests, which was then corrected.

HIGH

Evidence: "It basically created a function to provide a layer to an effect... But it's completely unnecessary here" and later "Avoid custom wrappers that call layer.build"


The final API included create, update, flag-as-done, and list operations for todos, with OpenAPI documentation.

HIGH

Evidence: "expose a todo functionality where you can one create todos... Two, update todos... Three, flag todo as done or not. for list todos" and "There is an open API created"

The speaker uses GPT 5.4 as his primary coding model, having switched from Anthropic models due to usage restrictions.

HIGH

Evidence: "I'm using GPT 5.4" and "anthropic is putting arbitrary restrictions on how we use their models. So I don't really want to use entropic models"

Coding agents have been trained to focus on user code, not node_modules or gitignored directories.

HIGH

Evidence: "coding agents have been trained to focus on your own code not on the code that is on node modules" and "cursor does not index stuff that is git ignored"

Memorable lines

Quotes

"This session should just be called just clone the [__] repo and be done with it." — Michael Arnaldi
"We learn continuously... This does not happen with LLMs." — Michael Arnaldi
"You are basically appending messages onto a fixed size array which is called the context window and context window is limited." — Michael Arnaldi
"The dumbest thing ever ends up working better." — Michael Arnaldi, on complex context management architectures vs. simple bash loops
"It's kind of babysitting a junior developer with a knife running through the kitchen." — Michael Arnaldi, on writing lint rules to constrain model behavior
"If the average response time is 10 milliseconds server is pretty much never going to fail in that 10 millisecond. If that 10 millisecond becomes a minute, yes, you're pretty sure the server is going to fail in that minute." — Michael Arnaldi