ai-agents · Signal 78/100

AI intel digest

Skills at Scale — Nick Nisi and Zack Proser, WorkOS

Nick Nisi and Zack Proser from WorkOS's Applied AI team led a workshop on "skills at scale" — portable, markdown-based instruction units that teach AI tools how to perform specific tasks.

2026-05-06 · 24 min read · 4,896 words · 10 facts · 0 assumptions
Start here

Executive summary

Nick Nisi and Zack Proser from WorkOS's Applied AI team led a workshop on "skills at scale" — portable, markdown-based instruction units that teach AI tools (Claude, Codex, Cursor) how to perform specific tasks. They demonstrated that a single skill.md file with YAML frontmatter can route automatically to the right task, enforce constraints, and even execute deterministic scripts via interpolation (e.g., `!`git log``). The workshop included live audience Q&A on governance challenges (versioning, shared vs. forked skills, model drift) and hands-on building of a "repo roast" skill. They also showcased how WorkOS uses skills to power 15 framework integrations in a CLI that installs and configures auth automatically. The core thesis: skills compress institutional knowledge into minimal, composable, cross-tool units that survive model updates.

Sources mentioned

WorkOS — speakers' employer; hiring; offers auth/MCP security products
Claude (Anthropic) — primary tool used by the speakers; supports skills, a skill builder/creator skill, and a plugin marketplace
Codex (OpenAI) — supports skills; the Codex skills marketplace mentioned as good for code review
Cursor — supports skills
Vercel — `npx skills` tool for cross-directory skill symlinking
Claude Agent SDK — powers the WorkOS CLI with 15 framework integrations
Nano Banana (Google) — image generation model; Zack built a skill wrapper for v3
Remotion — video generation skill in the "superpowers" library; generates videos from prompts
Obsidian — note-taking app; Nick's ideation skill outputs to an Obsidian vault
Superpowers — skills library mentioned by the audience and Zack; contains non-coding skills like Remotion video generation
Slidev — presentation framework used for the workshop slides; has a skill in the repo

Verdict

This video is worth watching for anyone building agent infrastructure or scaling AI adoption across engineering teams. The unique signal is not the existence of skills (known to Claude/Cursor power users) but the operational depth: WorkOS's eval framework, the `!`script`` interpolation pattern, progressive disclosure architecture, and live governance discussion with a 60-engineer team. You will not find this level of production-hardened skill strategy in documentation or blog posts. The Q&A section alone — covering versioning, shared vs. forked skills, model drift, and skill proliferation — is ahead of where most organizations are today. Signal density is high for practitioners, moderate for casual users.

Count: 10 facts, 0 assumptions, 10 demonstrations (script interpolation, eval framework, progressive disclosure, skill loading, live repo-roast demo, Nano Banana pipeline, Remotion video, Codex review skill, Claude skill builder, sub-agent context isolation). Signal density: 78/100.

What matters

Signal points

  1. Skills are already a cross-tool standard: Claude, Codex, Cursor, and Claude Desktop all load the same skill.md format — no rewrite needed

  2. The `!`script`` interpolation pattern is the most underutilized power move: it turns speculative LLM behavior into deterministic, auditable workflows

  3. WorkOS's public skills repo and eval framework exist today; this is not theoretical — they ship production skills with regression tests

  4. The governance problem is real and unsolved: teams are already blocking shared skills due to fear of merge conflicts, model drift, and skill proliferation

  5. Progressive disclosure (lazy-loading reference files) is the architectural answer to "how do I give my agent 10,000 lines of context without breaking it"

  6. The speakers' personal workflows (6-8 months without manual coding) demonstrate that agent-first development is already viable for DX engineers

  7. Skills calling skills, plus sub-agents with isolated context, yield emergent composability patterns that resemble a microservices architecture for agent behavior


Interpretation

Key ideas

1

Constraints outperform prescriptions in skill design

Why: Speakers observed that overly verbose step-by-step instructions bloat context and get ignored; short constraint lists ("never be vague", "always cite line + commit") produce better adherence

Implication: The optimal skill is a minimal contract of boundaries, not a script — this scales better across model versions
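A sketch of what the constraint style looks like in practice — the two bullets in quotes come from the talk; the skill name, description wording, and third constraint are illustrative assumptions:

```markdown
---
name: repo-roast
description: Use when the user asks for a critical review ("roast") of a repository
---

Roast the repository's code and history. Constraints:

- Never be vague.
- Always cite line + commit.
- Keep the report under one page.
```

The body is a short list of boundaries rather than a step-by-step script, which is the property the speakers argue holds up across model versions.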

2

Progressive disclosure solves the context window bloat problem

Why: Instead of loading all repo knowledge upfront (like claude.md), skills reference external files that load only when the LLM decides they're relevant

Implication: Teams can encode vast institutional knowledge without hitting token limits or degrading performance
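A minimal sketch of progressive disclosure inside a skill.md (the file name and path are assumptions, not the workshop's actual layout):

```markdown
---
name: repo-roast
description: Use when the user asks to roast or score a repository
---

If the user asks for a score, read references/scoring-rubric.md
before answering. Otherwise, do not load it.
```

Only the frontmatter and the one-line pointer sit in context up front; the rubric costs tokens only on the runs that actually need it.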

3

Skills are the "DRY pattern for the agentic era"

Why: Developers repeat the same prompts across conversations, projects, and tools; skills codify once and run anywhere

Implication: This creates a new layer of abstraction — portable behavior units that decouple institutional knowledge from any single model or IDE

4

Governance of shared skills mirrors code dependency management

Why: Audience questions surfaced conflicts between centralized skill repos (review overhead, model drift) and individual forks; speakers responded with versioning, marketplace segregation (public/internal/personal), and plugin-style interfaces

Implication: Organizations need package-manager-like semantics for skills before team-wide adoption scales

5

Deterministic script injection bridges the gap between LLM non-determinism and reliable workflows

Why: Without scripts, the LLM speculates on what "latest commits" means; with `!`git log``, the exact data is injected verbatim

Implication: This pattern enables audit trails, reproducible reports, and compliance-sensitive agent workflows
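A sketch of the pattern in a skill.md body — the `!`command`` syntax is the one demonstrated in the talk, while the specific `git log` flags and wording are illustrative:

```markdown
## Recent commits

!`git log --oneline -n 10`

Summarize the commits above. Cite hashes verbatim; do not invent history.
```

At load time the `!`git log ...`` line is replaced with the command's literal output, so every run reports from the same underlying data instead of the model's guess at what "latest commits" means.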

6

Skills should be evaluated against baseline performance, not just "does it work"

Why: WorkOS runs A/B tests (with/without skill) and grades on a rubric; skills must improve performance measurably and maintain 80-90% reliability

Implication: Skills require CI/CD-like rigor; they are production artifacts, not ad-hoc prompts
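WorkOS's framework is not shown in detail, so the following is only a sketch of the A/B shape described: `run_task` is a hypothetical callback that runs the agent on a task (with or without the skill loaded), grades the transcript against a rubric, and returns pass/fail.

```python
def reliability(results):
    """Fraction of graded runs that passed the rubric."""
    return sum(results) / len(results)


def evaluate_skill(run_task, task, runs=10, threshold=0.8):
    """A/B a skill against the no-skill baseline over several runs.

    To pass, the skill must beat the baseline AND clear an absolute
    reliability bar (the talk cites 80-90%).
    """
    baseline = [run_task(task, skill=False) for _ in range(runs)]
    treated = [run_task(task, skill=True) for _ in range(runs)]
    base_rate = reliability(baseline)
    skill_rate = reliability(treated)
    return {
        "baseline": base_rate,
        "with_skill": skill_rate,
        "passed": skill_rate > base_rate and skill_rate >= threshold,
    }
```

The point of the structure is the comparison: a skill that merely "works" but does not move the with-skill rate above the baseline fails the gate, the same way a regression test would.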

Evidence

Key facts

Skills are supported by Claude, Codex, Cursor, and Claude Desktop

HIGH

Evidence: codec supports them, cloud supports them, cursor supports them, uh the uh desktop apps like like cloud desktop supports it

A skill is typically a folder containing skill.md with YAML frontmatter (name, description) plus optional scripts, references, and images

HIGH

Evidence: they're more like a folder with a skill.mmd file in them... they can have references... scripts... images
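The folder shape described can be sketched as follows (directory names are illustrative):

```
repo-roast/
├── skill.md        # YAML frontmatter (name, description) + instructions
├── references/     # optional docs, lazy-loaded via progressive disclosure
├── scripts/        # optional deterministic helpers
└── images/         # optional assets
```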

The description field in frontmatter is used by the LLM at runtime for routing, not for human readability

HIGH

Evidence: this description is is incredibly powerful and loaded. This is what the LLM is going to use at runtime to essentially do routing

Script interpolation using `!`command`` syntax allows deterministic data injection into the LLM context

HIGH

Evidence: if you use the bang and then back tick... Claude will do like an interpolation of that... it'll just replace this with a list of the stale to-dos

WorkOS maintains a public GitHub repo (workos/skills) and an internal skills marketplace

HIGH

Evidence: GitHub work OS skills... we also have uh some like internal skills... there's an O specialist, there's a DX specialist

WorkOS has a formal evaluation framework for public skills that benchmarks performance with/without the skill

HIGH

Evidence: we do ship uh... a whole eval framework... doing several runs where it will load claude without the skill and ask it to do a task and then load it with a skill

Both speakers report not writing code manually for 6-8+ months, working entirely through agents

HIGH

Evidence: when is the last time you wrote a line of code by yourself?... probably six or eight months now" / "Same


Vercel's `npx skills` tool symlinks skills into multiple directories for cross-tool loading

HIGH

Evidence: if you've ever used like the MPX skills, uh, tool from Verscell, that is just kind of sim linking them all into all of these different directories

The "repo roast" workshop skill uses progressive disclosure to load reference files only when needed

HIGH

Evidence: if you're doing like a scoring... load the scoring rubric... If we're not doing scoring, you don't have to load that

WorkOS's CLI uses the Claude Agent SDK and composes multiple skills to power 15 framework integrations

HIGH

Evidence: one CLI uses this same pattern to power 15 framework integrations — each one a skill composed with others, wired into an agent that installs and configures auth

Memorable lines

Quotes

"It's almost like carrying, if you will, the DRY pattern into the agentic era in a way." — Nick Nisi
"Descriptions are routing rules, right? They're they're less for us and they're more for the AI to determine when to use it." — Zack Proser
"Without scripts, the AI is just speculating on what you mean when you say go get the latest commits." — Zack Proser
"It can still get it... it can still decide not to follow things that you have. I've definitely had cases where I'm like, 'Do this, this, and then this,' and it skip the step in the middle, and I say, why'd you skip that? And it's like, 'Oh, yeah, you told me to do it. I I didn't feel like it.'" — Nick Nisi
"Wait a week while working on it and then go back and ask Claude analyze my week's worth of work and then what are the skills I should split out of that based on this." — Nick Nisi
"I just say hey now it's on v3 go update it... one user prompt of child running through field, nothing exists, and then 30 seconds later, you have a video of it running through." — Zack Proser on the Nano Banana skill pipeline