AI intel digest
Skills at Scale — Nick Nisi and Zack Proser, WorkOS
Executive summary
Nick Nisi and Zack Proser from WorkOS's Applied AI team led a workshop on "skills at scale" — portable markdown-based instruction units that teach AI tools (Claude, Codex, Cursor) how to perform specific tasks. They demonstrated that a single skill.md file with YAML frontmatter can route automatically to the right task, enforce constraints, and even execute deterministic scripts via interpolation (e.g., `!`git log``). The workshop included live audience Q&A on governance challenges (versioning, shared vs. forked skills, model drift) and hands-on building of a "repo roast" skill. They also showcased how WorkOS uses skills to power 15 framework integrations in a CLI that installs and configures auth automatically. The core thesis: skills compress institutional knowledge into minimal, composable, cross-tool units that survive model updates.
Sources mentioned
- WorkOS — speakers' employer; hiring; offers auth/MCP security products
- Claude (Anthropic) — primary tool used by speakers; supports skills, a skill builder/creator skill, and a plugin marketplace
- Codex (OpenAI) — supports skills; Codex skills marketplace mentioned as good for code review
- Cursor — supports skills
- Vercel — `npx skills` tool for cross-directory skill symlinking
- Claude Agent SDK — powers the WorkOS CLI with 15 framework integrations
- Nano Banana (Google) — image generation model; Zack built a skill wrapper for v3
- Remotion — video generation skill in the "superpowers" library; generates videos from prompts
- Obsidian — note-taking app; Nick's ideation skill outputs to his Obsidian vault
- Superpowers — skills library mentioned by the audience and Zack; contains non-coding skills like Remotion video generation
- Slidev — presentation framework used for the workshop slides; has a skill in the repo
Verdict
This video is worth watching for anyone building agent infrastructure or scaling AI adoption across engineering teams. The unique signal is not the existence of skills (known to Claude/Cursor power users) but the operational depth: WorkOS's eval framework, the `!`script`` interpolation pattern, progressive disclosure architecture, and live governance discussions with a 60-engineer team. You will not find this level of production-hardened skill strategy in documentation or blog posts. The Q&A section alone — covering versioning, shared vs. forked skills, model drift, and skill proliferation — is ahead of where most organizations are today. Signal density is high for practitioners, moderate for casual users.
Count: 10 facts, 0 assumptions, 10 demonstrations (script interpolation, eval framework, progressive disclosure, skill loading, repo roast live demo, Nano Banana pipeline, Remotion video, Codex review skill, Claude skill builder, sub-agent context isolation). Signal density: 78.
Signal points
- Skills are already a cross-tool standard: Claude, Codex, Cursor, and Claude Desktop all load the same skill.md format — no rewrite needed
- The `!`script`` interpolation pattern is the most underutilized power move: it turns speculative LLM behavior into deterministic, auditable workflows
- WorkOS's public skills repo and eval framework exist today; this is not theoretical — they ship production skills with regression tests
- The governance problem is real and unsolved: teams are already blocking shared skills due to fear of merge conflicts, model drift, and skill proliferation
- Progressive disclosure (lazy-loading reference files) is the architectural answer to "how do I give my agent 10,000 lines of context without breaking it"
- The speakers' personal workflows (6-8 months without manual coding) demonstrate that agent-first development is already viable for DX engineers
- Skills calling skills + sub-agents with isolated context = emergent composability patterns that resemble microservices architecture for agent behavior
Key ideas
Constraints outperform prescriptions in skill design
Why: Speakers observed that overly verbose step-by-step instructions bloat context and get ignored; short constraint lists ("never be vague", "always cite line + commit") produce better adherence
Implication: The optimal skill is a minimal contract of boundaries, not a script — this scales better across model versions
Progressive disclosure solves the context window bloat problem
Why: Instead of loading all repo knowledge upfront (like claude.md), skills reference external files that load only when the LLM decides they're relevant
Implication: Teams can encode vast institutional knowledge without hitting token limits or degrading performance
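A minimal sketch of what such a skill file might look like, combining the frontmatter routing and lazy-reference ideas from the talk; the description text, word limit, and reference path here are illustrative assumptions, not the workshop's actual file:

```markdown
---
name: repo-roast
description: Use when the user asks for a critical review ("roast") of a
  repository. Covers code quality, structure, and commit hygiene.
---

# Repo Roast

## Constraints
- Never be vague; always cite file, line, and commit.
- Keep the roast under 500 words.

## Progressive disclosure
- If the user asks for a score, load `references/scoring-rubric.md`.
- Otherwise, do not load the rubric; keep context small.
```

The rubric file stays out of context until the model decides scoring is in play, which is the lazy-loading behavior described above.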
Skills are the "DRY pattern for the agentic era"
Why: Developers repeat the same prompts across conversations, projects, and tools; skills codify once and run anywhere
Implication: This creates a new layer of abstraction — portable behavior units that decouple institutional knowledge from any single model or IDE
Governance of shared skills mirrors code dependency management
Why: Audience questions surfaced conflicts between centralized skill repos (review overhead, model drift) and individual forks; speakers responded with versioning, marketplace segregation (public/internal/personal), and plugin-style interfaces
Implication: Organizations need package-manager-like semantics for skills before team-wide adoption scales
Deterministic script injection bridges the gap between LLM non-determinism and reliable workflows
Why: Without scripts, the LLM speculates on what "latest commits" means; with `!`git log``, the exact data is injected verbatim
Implication: This pattern enables audit trails, reproducible reports, and compliance-sensitive agent workflows
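As a sketch of the pattern inside a skill body: only the `!`command`` syntax is from the talk; the surrounding wording and the specific git flags are assumptions.

```markdown
## Recent activity

Analyze the commits below. Do not guess at history; this list is
injected verbatim before the model reads the prompt:

!`git log --oneline -20`
```

Because the command output replaces the placeholder deterministically, two runs over the same repo state see identical data, which is what makes the resulting reports auditable.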
Skills should be evaluated against baseline performance, not just "does it work"
Why: WorkOS runs A/B tests (with/without skill) and grades on a rubric; skills must improve performance measurably and maintain 80-90% reliability
Implication: Skills require CI/CD-like rigor; they are production artifacts, not ad-hoc prompts
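WorkOS's actual eval framework wasn't shown in detail. A rough sketch of the A/B shape under assumptions (a `run_task` callable standing in for a full agent run, rubric scores in [0, 1], and an 80% reliability bar taken from the talk's "80-90%" figure):

```python
from statistics import mean
from typing import Callable

def evaluate_skill(
    run_task: Callable[[bool], float],  # hypothetical: runs the task, returns a rubric score in [0, 1]
    runs: int = 5,
    pass_threshold: float = 0.8,
) -> dict:
    """A/B a skill: score several runs with and without it loaded."""
    baseline = [run_task(False) for _ in range(runs)]   # skill not loaded
    with_skill = [run_task(True) for _ in range(runs)]  # skill loaded
    # Fraction of with-skill runs that clear the rubric bar.
    reliability = sum(s >= pass_threshold for s in with_skill) / runs
    return {
        "baseline_mean": mean(baseline),
        "skill_mean": mean(with_skill),
        "improves": mean(with_skill) > mean(baseline),
        "reliable": reliability >= pass_threshold,
    }

# Stubbed scores for illustration; a real harness would invoke the agent.
scores = {False: iter([0.4, 0.5, 0.45, 0.5, 0.4]),
          True: iter([0.9, 0.85, 0.95, 0.9, 0.8])}
report = evaluate_skill(lambda use_skill: next(scores[use_skill]))
print(report["improves"], report["reliable"])  # → True True
```

A skill ships only when both flags are true: it must beat the no-skill baseline and clear the reliability bar, which is the CI/CD-like gate the idea above describes.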
Key facts
Skills are supported by Claude, Codex, Cursor, and Claude Desktop
Confidence: HIGH. Evidence: "codec supports them, cloud supports them, cursor supports them, uh the uh desktop apps like like cloud desktop supports it"
A skill is typically a folder containing skill.md with YAML frontmatter (name, description) plus optional scripts, references, and images
Confidence: HIGH. Evidence: "they're more like a folder with a skill.mmd file in them... they can have references... scripts... images"
The description field in frontmatter is used by the LLM at runtime for routing, not for human readability
Confidence: HIGH. Evidence: "this description is is incredibly powerful and loaded. This is what the LLM is going to use at runtime to essentially do routing"
Script interpolation using `!`command`` syntax allows deterministic data injection into the LLM context
Confidence: HIGH. Evidence: "if you use the bang and then back tick... Claude will do like an interpolation of that... it'll just replace this with a list of the stale to-dos"
WorkOS maintains a public GitHub repo (workos/skills) and an internal skills marketplace
Confidence: HIGH. Evidence: "GitHub work OS skills... we also have uh some like internal skills... there's an O specialist, there's a DX specialist"
WorkOS has a formal evaluation framework for public skills that benchmarks performance with/without the skill
Confidence: HIGH. Evidence: "we do ship uh... a whole eval framework... doing several runs where it will load claude without the skill and ask it to do a task and then load it with a skill"
Both speakers report not writing code manually for 6-8+ months, working entirely through agents
Confidence: HIGH. Evidence: "when is the last time you wrote a line of code by yourself?... probably six or eight months now" / "Same"
Vercel's `npx skills` tool symlinks skills into multiple directories for cross-tool loading
Confidence: HIGH. Evidence: "if you've ever used like the MPX skills, uh, tool from Verscell, that is just kind of sim linking them all into all of these different directories"
The "repo roast" workshop skill uses progressive disclosure to load reference files only when needed
Confidence: HIGH. Evidence: "if you're doing like a scoring... load the scoring rubric... If we're not doing scoring, you don't have to load that"
WorkOS's CLI uses the Claude Agent SDK and composes multiple skills to power 15 framework integrations
Confidence: HIGH. Evidence: "one CLI uses this same pattern to power 15 framework integrations — each one a skill composed with others, wired into an agent that installs and configures auth"
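The symlink trick described for `npx skills` can be sketched in a few lines of Python; the per-tool directory names (`.claude/skills`, `.cursor/skills`) are assumptions about where each tool looks, not verified paths, and the demo runs in a sandbox directory:

```python
import tempfile
from pathlib import Path

def link_skill(skill_dir: Path, tool_dirs: list[Path]) -> None:
    """Symlink one skill folder into each tool's skills directory,
    so a single copy of the skill loads in every tool."""
    for tool_dir in tool_dirs:
        tool_dir.mkdir(parents=True, exist_ok=True)
        link = tool_dir / skill_dir.name
        if not link.is_symlink():
            link.symlink_to(skill_dir.resolve(), target_is_directory=True)

# Demo in a temp sandbox; real targets would be the tools' own
# skills directories under the user's home.
root = Path(tempfile.mkdtemp())
skill = root / "skills" / "repo-roast"
skill.mkdir(parents=True)
(skill / "skill.md").write_text("---\nname: repo-roast\n---\n")
targets = [root / ".claude" / "skills", root / ".cursor" / "skills"]
link_skill(skill, targets)
print(all((t / "repo-roast" / "skill.md").exists() for t in targets))  # → True
```

Edits to the single source folder are instantly visible to every tool, which is why symlinking beats copying for keeping skills in sync.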
Quotes
“It's almost like carrying, if you will, the DRY pattern into the agentic era in a way.” — Nick Nisi
“Descriptions are routing rules, right? They're they're less for us and they're more for the AI to determine when to use it.” — Zack Proser
“Without scripts, the AI is just speculating on what you mean when you say go get the latest commits.” — Zack Proser
“It can still get it... it can still decide not to follow things that you have. I've definitely had cases where I'm like, 'Do this, this, and then this,' and it skip the step in the middle, and I say, why'd you skip that? And it's like, 'Oh, yeah, you told me to do it. I I didn't feel like it.'” — Nick Nisi
“Wait a week while working on it and then go back and ask Claude analyze my week's worth of work and then what are the skills I should split out of that based on this.” — Nick Nisi
“I just say hey now it's on v3 go update it... one user prompt of child running through field, nothing exists, and then 30 seconds later, you have a video of it running through.” — Zack Proser, on the Nano Banana skill pipeline