🧱 argil.io
Building Your AI Engineering Team


25 min read
Last updated March 25, 2026


This is a hands-on guide to building a full-stack AI engineering team for software development. Not a team of human engineers augmented by AI, but a team where AI agents are the primary builders, orchestrated and reviewed by you.

This playbook is for developers and technical founders who want to scale their output dramatically by running multiple AI coding agents in parallel. It's designed for those who are already building production applications and want to level up their workflow from "one person with AI" to "one person orchestrating an AI team."

This guide was written based on:

  • Real production experience across multiple applications
  • Hundreds of hours working with Claude Code and Conductor
  • Battle-tested workflows for Linear, GitHub, and AI orchestration

You don't need to implement every recommendation here from day one. Start simple, and refer back to this guide as your needs grow.

Use the links below to jump back to any section as you go:

🧠 Part 1: The Mental Model

🔧 Part 2: The Tool Stack

⚙️ Part 3: Claude Code Setup

🎭 Part 4: Conductor Orchestration

📋 Part 5: Linear Workflow

📝 Part 6: Specification with SpecWright

👁️ Part 7: Feedback with Pointa

🔄 Part 8: The Five Workflows

🐛 Part 9: Debugging

👀 Part 10: Code Reviews

📦 Part 11: Version Control & Git

🏗️ Part 12: CI/CD Pipeline

🔍 Part 13: Audits

🤖 Part 14: Automation & Agents

🧠 Part 1: The Mental Model

When you work on a codebase, there are five main workflows you go through:

The Five Workflows

  1. Build New Features - Something complex that you have to define properly. A new capability, a new user flow, a new integration.

  2. Improvements - Little things here and there, usually in batches. Could be UI tweaks, UX polish, small logic fixes, or detail work. Not bugs, just making things better.

  3. Bug Reports - Something is broken. It's not working as expected. You need to capture it, understand it, and fix it.

  4. Review AI Work - The AI completed a feature, improvement, or bug fix. Now you need to verify it works correctly and matches your intent.

  5. Audits - General review of your codebase across four dimensions:

    • Performance - How fast is the app?
    • Security - Is it safe from attacks?
    • Reliability - Is it tested and stable?
    • Cost - What are the server and API costs, especially for AI products?

What Makes a Great Workflow?

For any of these workflows, you want the best output possible, at the lowest price, as fast as possible. But how do we know a workflow is working well? Three dimensions:

  1. Human Efficiency - The workflow is effective and efficient for the human. It doesn't require doing things over and over. It's quick and intuitive.

  2. AI Context Quality - The workflow provides the AI with the right amount of context to do its job well. Not too little (guessing), not too much (context rot).

  3. History Generation - As work happens (whether by human or AI), the workflow generates a record of what was done. This history is critical: for looking back at decisions, for audits, and for the AI to reference later.

These three components define whether a workflow is actually working.

🔧 Part 2: The Tool Stack

To build an AI engineering team, you need tools for different purposes. Here's the stack I use:

  • Issue Tracking - Linear. Track work, manage state, assign to humans or AI.
  • Orchestration - Conductor. Run multiple AI agents in parallel, manage git branches.
  • Model Provider - Claude Code, Codex CLI, Gemini CLI. AI coding agents; each model brings different strengths.
  • Code Repository - GitHub. Version control, PRs, collaboration.
  • Specification - SpecWright. Define features before building them.
  • Feedback & Bugs - Pointa. Annotate UI, report bugs with full context.

These are my choices. You can use whatever alternatives you prefer: Jira instead of Linear, Cursor instead of Claude Code, GitLab instead of GitHub, and so on.

But here's the critical point: you need a tool for each of these six purposes. Each one is solving a specific problem in the AI engineering workflow:

  1. Issue Tracking - Without it, you can't assign work to AI agents or track what state things are in. You'll lose track of what's done, what's in progress, and what needs review.

  2. Orchestration - Without it, you can only run one AI agent at a time. Orchestration lets you parallelize work across multiple agents, each on their own branch, without stepping on each other.

  3. Model Provider - This is your AI coding agent. The one that actually writes code. You don't have to pick just one. Claude Code, Codex CLI, and Gemini CLI each bring different strengths to the table. Claude is excellent at following complex instructions and maintaining architectural coherence across large codebases. Codex brings OpenAI's reasoning to the mix and can surprise you on certain types of problems. Gemini has a massive context window that makes it strong for tasks requiring broad codebase awareness. In practice, you'll develop a feel for which model handles which type of task best. The important thing is that your orchestration layer (Conductor) is model-agnostic, so you can route different work to different agents.

  4. Code Repository - Version control is non-negotiable. Every change needs to be tracked, reviewable, and recoverable.

  5. Specification - Before the AI builds, you need to define what to build. Vague prompts produce vague results. A specification tool forces clarity before code.

  6. Feedback & Bugs - You need a way to give the AI feedback on what it built. Visual annotations, bug reports with full context, improvement suggestions. This closes the loop.

Skip any of these six, and you'll hit a wall. The workflow breaks down. The specific tools matter less than having all six covered.

βš™οΈ Part 3: Claude Code Setup

Claude Code is your primary AI coding agent. Getting the setup right is critical for productivity.

CLAUDE.md: The Project Memory

The CLAUDE.md file is the most important file in your project when working with Claude Code. It's the persistent memory, the project bible that the agent reads at the start of every conversation. Think of it as the onboarding document you'd give a new engineer on their first day.

A great CLAUDE.md should contain:

1. Project overview - What the project is, what tech stack it uses, how it's structured. One paragraph, not a novel.

2. File structure - A brief map of where things live. The agent needs to know that components are in components/, data files in lib/, pages in app/. Don't list every file, just the patterns.

3. Naming conventions - kebab-case for files, camelCase for variables, PascalCase for components. Whatever your project uses, spell it out. LLMs default to whatever was most common in their training data, which may not match your project.

4. Styling conventions - If you use Tailwind, show common class patterns. If you have a design system, reference it. Include the small things: "use rounded-sm not rounded", "cards use bg-card border border-border". These details prevent a thousand tiny inconsistencies.

5. Architecture decisions - Document the "why" behind non-obvious choices. "We use Server Components by default and only add 'use client' when interactivity is needed." "We use Supabase for auth because X." Without this, the LLM will make different architectural decisions every session.

6. Common patterns with examples - Show a code snippet of how you create a new page, a new component, a new API route. The LLM will mimic the pattern. If you show it clean code, it writes clean code. If your examples are messy, everything it writes will be messy.

7. Things NOT to do - This is just as important as what to do. "Don't add comments unless the logic is genuinely non-obvious." "Don't create new utility files for one-time operations." "Don't add error handling for impossible scenarios." These guardrails prevent the LLM from over-engineering.
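
Put together, a minimal skeleton might look like the following. The stack, paths, and conventions here are illustrative placeholders; swap in your own.

```markdown
# CLAUDE.md

## Project Overview
Next.js app with TypeScript and Tailwind, deployed on Vercel.

## File Structure
- app/ - pages and API routes
- components/ - shared React components
- lib/ - data access and utilities

## Conventions
- Files: kebab-case. Variables: camelCase. Components: PascalCase.
- Styling: Tailwind only. Cards use bg-card border border-border.

## Architecture
- Server Components by default; add 'use client' only when interactivity is needed.

## Do NOT
- Add comments for genuinely obvious logic.
- Create new utility files for one-time operations.
- Add new dependencies without asking first.
```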

Converting Cursor Rules to CLAUDE.md

If you're coming from Cursor, you probably have a .cursorrules file. The good news: most of it translates directly. The main difference is that CLAUDE.md is markdown (more readable, more structured) while .cursorrules tends to be a flat list of instructions.

Take your existing rules, organize them under clear headings, and add the project-specific context that Cursor rules typically lack (file structure, architecture decisions, code examples). The result is a richer, more useful document.

Auto-Updating CLAUDE.md

One question that comes up a lot: should the AI update its own CLAUDE.md? Yes, but with guardrails.

Claude Code has a built-in memory system that can save learnings across conversations. But the CLAUDE.md itself should be treated as a curated document, not an append-only log. Periodically review it. Remove outdated information. Keep it focused.

A good practice: after a significant feature or refactor, ask Claude Code to propose updates to CLAUDE.md based on what changed. Review the proposal, keep what's useful, discard the rest. This way the document evolves with the project without becoming bloated.

Claude Code Settings That Matter

A few configuration choices that make a real difference:

  • Model selection - Use Opus for complex architectural work, Sonnet for straightforward implementations and quick iterations. The cost difference is significant, so match the model to the task complexity.
  • Permissions - Set up auto-allow for read operations and common tools. Every permission prompt you have to approve manually is a context switch that breaks your flow.
  • Custom slash commands - Create /commands for repetitive tasks. A /commit command that follows your commit message format. A /pr command that creates PRs with your template. These save seconds each time, which adds up to hours over weeks.
  • MCP servers - Connect the tools that give Claude Code superpowers. Linear for reading issues, Pointa for reading bug reports, browser automation for visual verification. The more context the agent can access without you copying and pasting, the better the output.
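
As an illustration, a custom slash command is just a markdown file in the project's .claude/commands/ directory. A hypothetical /commit command might look like this; the rules are yours to define:

```markdown
<!-- .claude/commands/commit.md -->
Stage the relevant changes and create a commit.

Rules:
- Use conventional commit format: type(scope): description
- Keep the subject line under 72 characters
- Never include unrelated files in the commit
```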

🎭 Part 4: Conductor Orchestration

Conductor is where you go from "one AI assistant" to "an AI engineering team." It lets you run multiple Claude Code agents in parallel, each on their own git branch, each working on a different task.

Why Orchestration Matters

Without orchestration, you're bottlenecked. You give the AI a task, wait for it to finish, review it, then give it the next task. That's sequential. It works, but it doesn't scale.

With Conductor, you can have five agents working simultaneously: one building a new feature, another fixing a bug, a third doing UI improvements, a fourth writing tests, and a fifth running an audit. Each on its own branch, each unaware of the others. You review work as it comes in, not as a single queue.

This is the difference between having one junior developer and having a team. The parallel throughput is what makes the "AI engineering team" mental model work in practice.

Key Concepts

Workspaces - Each workspace is an isolated environment. It gets its own branch, its own Claude Code instance, its own conversation history. When you open a workspace, you're opening a fresh desk for a specific task. Give it a clear objective and let it work.

Branch management - Conductor handles git branching automatically. Each workspace creates a branch, commits work to it, and when you're done, you create a PR. No branch naming debates, no merge conflicts between agents because they're working on separate branches.

Scripts - Conductor supports scripts that run before or after agent tasks. The most useful ones:

  • Build validation - Run npm run build after the agent finishes to catch compilation errors before you even look at the code.
  • Lint checks - Run your linter automatically so the agent's code matches your style.
  • Test execution - Run your test suite to catch regressions immediately.

For a Next.js + Vercel setup, a validation script that runs next build is almost mandatory. It catches TypeScript errors, import issues, and SSR problems that the agent might not notice from its terminal alone.
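
As a sketch, assuming Conductor is configured to run a shell script after the agent finishes (check Conductor's documentation for the exact hook setup), the validation could be as simple as:

```sh
#!/bin/sh
# Post-task validation: fail fast if the agent's work doesn't build.
set -e            # stop at the first failing command

npm run lint      # style and static-analysis errors
npm run build     # next build: TypeScript errors, imports, SSR problems
npm test          # regressions

echo "Validation passed"
```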

The Conductor Workflow

Here's how a typical Conductor session looks:

  1. Open Linear - Look at your backlog. Pick 3-5 tasks that can be done in parallel.
  2. Create workspaces - One workspace per task. Give each a clear objective based on the Linear issue.
  3. Let them run - The agents work simultaneously. You can monitor progress, but don't micromanage.
  4. Review as they finish - When an agent completes, review the diff, test the changes, merge or send back for revisions.
  5. Iterate - Some tasks need a follow-up. Reopen the workspace, give feedback, let it iterate.

The key insight: your job shifts from "writing code" to "defining work and reviewing results." You become the engineering manager, not the engineer.

Making Sure Work Gets Committed

A common concern: what if the agent does work but doesn't commit? Conductor handles this by design. Each workspace operates on its own branch, and the conversation with the agent is persistent. But to be safe, you can:

  • Include "commit your work when you're done" in your initial prompt
  • Use Conductor's scripts to auto-commit on completion
  • Check the workspace diff before closing to make sure nothing is lost

Conversation History

Conductor preserves the full conversation history within each workspace. If you close a workspace and reopen it later, the context is there. This is important for iterative tasks where you need multiple rounds of feedback.

Claude Code also has its own conversation history per project. Between the two, you have a complete audit trail of every decision, every prompt, and every change made by every agent.

📋 Part 5: Linear Workflow

Linear is your command center. Every piece of work, whether done by you or by AI, should be tracked as a Linear issue. This isn't bureaucracy, it's how you maintain sanity when running multiple agents in parallel.

Issue States for AI Work

The default Linear states work well, but here's how I think about them in an AI engineering context:

  • Backlog - Defined but not prioritized. Could be done someday.
  • Todo - Prioritized and ready to be picked up. Has enough context for an agent to start.
  • In Progress - An agent (or you) is actively working on it. A Conductor workspace is open.
  • In Review - Work is done, PR is open, waiting for your review.
  • Done - Merged and deployed.
  • Cancelled - Won't be done. Documented why.

The critical transition is Todo → In Progress. Before an issue can move to In Progress, it needs to have enough context for the AI to work autonomously. That means:

  • A clear description of what needs to be done
  • Links to relevant files or existing code
  • Any design references or screenshots
  • Explicit constraints (what NOT to do)
  • Acceptance criteria for non-obvious edge cases

If an issue doesn't have this, it's not ready for an agent. It stays in Backlog until you flesh it out.
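
As a sketch, an agent-ready issue description might follow a template like this (the field names and content are just a suggestion):

```markdown
## What
Add a CSV export button to the reports page.

## Context
- Relevant code: app/reports/page.tsx, lib/export.ts
- Design reference: [screenshot link]

## Constraints
- Do NOT add a new dependency for CSV generation.
- Do NOT touch the PDF export path.

## Acceptance
- An empty report exports a file with headers only.
- Dates use the user's locale.
```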

Labels for AI Work

Labels help you filter and batch work. Here are the ones I use:

  • Feature - New functionality
  • Bug - Something broken
  • Improvement - Polish, UX tweaks, small enhancements
  • Audit - Performance, security, reliability, or cost review
  • Infra - CI/CD, deployment, developer tooling
  • Docs - Documentation updates

The Jarvis Pattern

Here's an automation goal worth working toward: a dedicated AI user in Linear (I call mine "Jarvis") that can be assigned issues. The workflow:

  1. You create an issue in Linear with full context
  2. You assign it to Jarvis
  3. An automation picks up the assignment, opens a Conductor workspace, and starts the agent
  4. The agent works on the issue, creates a PR, and moves the issue to "In Review"
  5. You review the PR, merge or send back

This isn't fully automated yet for most teams, but the pieces are there. Linear has an API, Conductor has an API, and the glue between them is a simple script. The point is: design your workflow as if this automation exists, even if you're doing the steps manually today. When the automation comes, your process won't need to change.
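
To make the glue concrete, here is a minimal TypeScript sketch of the dispatch logic. The issue shape and the "jarvis" username are hypothetical stand-ins, not Linear's real API types; the actual webhook payload and workspace-creation call would come from the Linear and Conductor docs.

```typescript
// Hypothetical shape of an issue as received from a Linear webhook.
interface LinearIssue {
  id: string;
  title: string;
  assignee: string;
  description: string;
}

// Only dispatch to an agent when the issue is assigned to the AI user
// and actually has context to work from.
function shouldDispatchToAgent(issue: LinearIssue, aiUser: string): boolean {
  return issue.assignee === aiUser && issue.description.trim().length > 0;
}

// Build the initial workspace prompt from the issue content.
function buildAgentPrompt(issue: LinearIssue): string {
  return [
    `Work on Linear issue ${issue.id}: ${issue.title}`,
    issue.description,
    "Commit your work when you're done and open a PR.",
  ].join("\n\n");
}

const issue: LinearIssue = {
  id: "ENG-42",
  title: "Add dark mode toggle",
  assignee: "jarvis",
  description: "Toggle in settings; persist the preference in localStorage.",
};

if (shouldDispatchToAgent(issue, "jarvis")) {
  console.log(buildAgentPrompt(issue));
}
```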

Batching Work

Not every issue needs its own Conductor workspace. Improvements especially should be batched. If you have ten small UI tweaks, create one workspace and give the agent the full batch. It's more efficient than creating ten separate workspaces for ten two-minute changes.

The rule of thumb: if a task takes less than 15 minutes of agent time, batch it with similar tasks. If it takes more, give it its own workspace.
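
The rule of thumb can be sketched as a small planning function. The estimateMinutes field is a hypothetical property for illustration, not something Linear provides out of the box:

```typescript
interface Task {
  title: string;
  estimateMinutes: number; // hypothetical per-task estimate of agent time
}

// Large tasks (15+ minutes) each get their own workspace;
// everything smaller is grouped into a single batch workspace.
function planWorkspaces(tasks: Task[]): Task[][] {
  const small = tasks.filter((t) => t.estimateMinutes < 15);
  const large = tasks.filter((t) => t.estimateMinutes >= 15);
  const workspaces = large.map((t) => [t]);
  if (small.length > 0) workspaces.push(small);
  return workspaces;
}
```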

πŸ“ Part 6: Specification with SpecWright

If you've read Four Rules of Vibe Coding, you know Rule 1: Define Before You Build. SpecWright is the tool that makes this practical at scale.

Why Specifications Matter for AI Teams

When you have one AI agent, you can get away with being vague. You're right there, watching it work, course-correcting in real time. When you have five agents running in parallel, each on a different task, vague instructions multiply into chaos. Agent 1 assumes one architecture, Agent 2 assumes another, and now you have a merge conflict that's really an architecture conflict.

Specifications solve this by forcing clarity before code. Each agent gets a clear, unambiguous definition of what to build, how it should behave, and what constraints to respect.

The SpecWright Workflow

SpecWright guides you through building a specification step by step:

  1. Define the feature - What is it? What problem does it solve? Keep it to one or two sentences.
  2. Questioning process - SpecWright prompts you with questions about edge cases, user scenarios, and constraints. You answer them, and it builds the spec from your answers. This is the same questioning process from Rule 1, but structured and guided.
  3. Job stories - Define the behaviors. "When X, I want Y, so I can Z." No personas, no ceremony. Just what the feature does.
  4. No-gos and rabbit holes - Explicitly state what's out of scope and what pitfalls to avoid. This is where you prevent over-engineering.
  5. Design breadboard - Map out the screens, interactions, and navigation flows using the breadboarding technique.
  6. Technical choices - Document the key technical decisions: which libraries, which patterns, which trade-offs.
  7. Preview and export - Review the full specification, then export it in a format the AI can consume.

Linking SpecWright with Linear

The ideal flow is:

  1. Create a feature in SpecWright → generates a complete specification
  2. Create a Linear issue for the feature → paste the spec link or embed the spec content
  3. Open a Conductor workspace → the agent reads the Linear issue, which contains the spec
  4. The agent builds against the spec, not against a vague prompt

This chain (SpecWright → Linear → Conductor → Claude Code) is the backbone of the Build workflow. Each tool adds a layer of context and clarity. By the time Claude Code starts writing code, it has everything it needs to work autonomously.

For smaller improvements that don't need a full spec, skip SpecWright and write the context directly in the Linear issue. The spec workflow is for features, not for "change the button color."

πŸ‘οΈ Part 7: Feedback with Pointa

Pointa closes the feedback loop. After the AI builds something, you need a way to tell it what's right, what's wrong, and what needs to change. Pointa does this visually.

How Pointa Works

Pointa is a Chrome extension that lets you annotate your running application directly in the browser. Click on any element, draw a box around a problem area, and add a note. Pointa captures:

  • The annotation - Your visual markup and comment
  • Element context - The DOM structure, CSS styles, and component hierarchy of what you clicked on
  • Console logs - Any errors or warnings in the console at the time of annotation
  • Network requests - Recent API calls and their responses
  • Screenshot - A visual snapshot of the current state

All of this context is packaged into something the AI can understand. No more "the button looks weird" with zero context. Now it's "the button looks weird" plus the exact element, its styles, its parent components, and any errors happening in the background.

Pointa → Linear → Conductor

Here's the feedback loop in practice:

For bugs:

  1. You're testing a feature in the browser
  2. Something is broken. You annotate it with Pointa
  3. Pointa creates a Linear issue automatically with all the context (screenshot, element, logs, network requests)
  4. You open a Conductor workspace for the bug fix
  5. The agent reads the Linear issue, sees the Pointa context, and knows exactly what to fix

For improvements:

  1. You're reviewing AI work in the browser
  2. Something works but could be better. You annotate it with Pointa
  3. You batch multiple annotations into a single Linear issue (or create one per annotation for bigger changes)
  4. The agent picks up the improvements with full visual context

For feature feedback: If you're reviewing a feature that an agent just built and it needs changes, you can annotate directly and push the feedback back to the same Linear issue. The agent sees both the original spec and your visual feedback.

MCP Integration

Pointa has an MCP (Model Context Protocol) server that connects directly to Claude Code. This means the agent can read Pointa annotations without you copy-pasting anything. When you reference a Pointa annotation in your prompt, the agent fetches the full context automatically: the screenshot, the element details, the console logs, everything.

This is the kind of integration that saves minutes per interaction. Over hundreds of interactions, that's hours of your life back.

🔄 Part 8: The Five Workflows

Now let's put it all together. Here's how each of the five workflows works end-to-end with your AI engineering team.

Workflow 1: Building New Features

This is the most involved workflow because features require the most context and carry the most risk.

Step 1: Specify - Use SpecWright to define the feature. Go through the questioning process, write job stories, set no-gos, breadboard the design, and lock down technical choices. This typically takes 20-40 minutes of focused thinking.

Step 2: Create the issue - Create a Linear issue with the spec attached. Add labels, set priority, and write a brief summary at the top. The summary is for you. The spec is for the agent.

Step 3: Break it down - If the feature is large, break it into sub-issues in Linear. Each sub-issue should be independently implementable: data model first, then service layer, then backend, then frontend. This is Rule 2 (Divide and Conquer) applied at the project management level.

Step 4: Open workspaces - Create Conductor workspaces for each sub-issue. If the sub-issues are truly independent (e.g., backend doesn't need frontend to exist yet), run them in parallel. If they're sequential, run them one at a time.

Step 5: Review and iterate - As each workspace completes, review the diff, test the changes locally, and either approve or send back with feedback. Use Pointa to annotate visual issues.

Step 6: Merge and test - Merge PRs in dependency order. Run the full test suite after each merge. Test the complete feature end-to-end once everything is merged.

Workflow 2: Improvements

Improvements are the bread and butter of daily work. Small tweaks, UI polish, copy changes, UX refinements.

Step 1: Batch - Collect improvements as you notice them. Use Pointa to annotate visual issues on the fly. Don't create a Conductor workspace for each one.

Step 2: Group and create - Once you have a batch (5-15 items), group them logically. All UI tweaks together, all copy changes together, all logic fixes together. Create one Linear issue per group.

Step 3: Single workspace - Open one Conductor workspace per group. Give the agent the full list. It's more efficient to handle "fix these 8 things" in one session than to context-switch across 8 workspaces.

Step 4: Quick review - Improvements are low-risk. Scan the diff, do a quick visual check, merge. Don't over-review polish work.

Workflow 3: Bug Reports

Bugs need fast turnaround and precise context.

Step 1: Capture - When you find a bug, use Pointa to annotate it. Include:

  • What you expected to happen
  • What actually happened
  • Steps to reproduce
  • The Pointa annotation with console logs and network requests

Step 2: Create issue - Pointa can create the Linear issue directly. If not, create it manually with the Pointa context attached.

Step 3: Fix - Open a Conductor workspace. Give the agent the Linear issue. The Pointa context usually gives the agent enough information to find and fix the bug without extensive exploration.

Step 4: Verify - Test the fix locally. If the bug was in a critical path (payments, auth, data integrity), write a regression test before merging.

Workflow 4: Reviewing AI Work

This is the workflow you'll use most often. Every time an agent completes work, you review it.

Step 1: Read the diff - Start with the code. What files changed? What was added, modified, deleted? Does the structure make sense?

Step 2: Check for red flags - Look for common AI mistakes:

  • Files that shouldn't have been modified
  • New dependencies that weren't discussed
  • Overly complex solutions for simple problems
  • Missing error handling in critical paths
  • Hardcoded values that should be configurable

Step 3: Test locally - Pull the branch, run the app, test the feature. Click through the happy path. Try edge cases. Break things intentionally.

Step 4: Visual check - For UI changes, compare against the spec or design. Use Pointa to annotate anything that doesn't look right.

Step 5: Approve or iterate - If it's good, merge. If it needs changes, annotate with Pointa, update the Linear issue, and send the agent back to work. Most features take 2-3 iterations to get right. That's normal.

Workflow 5: Audits

Audits are proactive. You're not reacting to a bug or building a feature. You're systematically reviewing your codebase.

Step 1: Choose a dimension - Pick one: performance, security, reliability, or cost. Don't try to audit everything at once.

Step 2: Create the audit issue - In Linear, create an issue with the audit scope. Be specific: "Audit all API routes for proper authentication checks" or "Audit frontend bundle size and identify optimization opportunities."

Step 3: Run the audit - Open a Conductor workspace. Give the agent the audit prompt. Let it scan the codebase and report findings.

Step 4: Triage findings - The agent will likely find more issues than you can fix immediately. Triage them. Critical security issues get fixed now. Performance optimizations go to the backlog. Nice-to-haves get labeled and tracked.

Step 5: Fix and verify - For each fix, create a separate Linear issue and Conductor workspace. Audit fixes should be small, targeted, and independently verifiable.

Audit cadence: Run audits regularly. Security audits monthly at minimum. Performance audits before and after major features. Cost audits whenever you're scaling up or your bill surprises you.
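
For example, a scoped security-audit prompt might look like this. It's a sketch; tailor the paths and severity buckets to your stack:

```markdown
Audit all API routes under app/api/ for authentication and authorization.

For each route, report:
- Does it verify the session before touching data?
- Does it check ownership (users can only access their own records)?
- Severity: critical / high / low

Output a findings list only. Do not change any code.
```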

πŸ› Part 9: Debugging

Debugging is where most time gets wasted. The pattern is always the same: something doesn't work, you tell the AI "fix it," the AI guesses, the fix doesn't work, you go back and forth five times, and 30 minutes later you're still stuck.

The fix for bad debugging is better context. Always.

The Debugging Checklist

Before you ask the AI to fix anything, gather these:

  1. What you expected - Be specific. "The button should open a modal" not "it doesn't work."
  2. What actually happened - Exact behavior. "Clicking the button does nothing. No error in console."
  3. Console logs - Copy the relevant errors and warnings. If there are none, that's information too.
  4. Network requests - If the bug involves API calls, check the Network tab. Did the request fire? What was the response?
  5. Steps to reproduce - Exact sequence of actions from a clean state.
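
Packaged as a template, a report that follows this checklist might look like (contents are illustrative):

```markdown
## Expected
Clicking "Save" closes the modal and shows a confirmation toast.

## Actual
Clicking "Save" does nothing. No error in the console.

## Steps to Reproduce
1. Log in and open Settings
2. Change the display name
3. Click "Save"

## Console / Network
- No console output
- The POST to /api/settings never fires
```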

Pointa + Linear for Faster Debugging

This is where Pointa shines. Instead of manually gathering all that context:

  1. Reproduce the bug in the browser
  2. Click the Pointa extension
  3. Annotate the problem area
  4. Pointa captures everything: screenshot, element context, console logs, network requests
  5. Send it to Linear as a bug report
  6. Give it to a Conductor agent

The agent now has more context than you could reasonably type out in a prompt. It knows the exact element, the styles applied, the component hierarchy, the errors in the console, and the failing network requests. That's usually enough to go straight to the root cause.

When Debugging Gets Stuck

If the AI can't find the bug after 2-3 attempts, don't keep going in circles. Instead:

  1. Add more logging - Ask the agent to add strategic console.log statements around the suspected area. Run the app, reproduce the bug, and feed the logs back.
  2. Narrow the scope - Instead of "fix this bug," try "what are all the possible reasons this component wouldn't render?" Let the AI think before it acts.
  3. Check recent changes - Use git log and git diff to see what changed recently. Bugs are usually caused by the most recent changes.
  4. Reproduce in isolation - Can you reproduce the bug in a simpler context? Strip away complexity until you find the minimal reproduction case.

The goal is to give the AI enough signal to find the needle in the haystack. More context, fewer guesses.
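
The "add more logging" tactic can be as simple as instrumenting the suspect function so each step reports its state. A toy example (the discount logic is made up):

```typescript
// Strategic logging: every step announces its inputs and intermediate
// values, so the failure point is visible instead of guessed at.
function applyDiscount(price: number, code: string): number {
  console.log("[applyDiscount] input", { price, code });
  const normalized = code.trim().toUpperCase();
  console.log("[applyDiscount] normalized", { normalized });
  const rate = normalized === "LAUNCH10" ? 0.1 : 0;
  console.log("[applyDiscount] rate", { rate });
  return price * (1 - rate);
}
```

Reproduce the bug once, copy the log output, and feed it back to the agent; that trace is usually worth more than another round of guessing.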

👀 Part 10: Code Reviews

Reviewing AI-generated code is different from reviewing human code. Humans have consistent patterns, style preferences, and reasoning you can follow. AI output varies, sometimes dramatically, between sessions.

What to Look For

Architecture alignment - Does the code follow the patterns established in your codebase? Check that new files are in the right directories, imports use the right aliases, and the overall structure matches your conventions. This is where a good CLAUDE.md pays dividends.

Scope creep - Did the AI do more than asked? New dependencies you didn't request? Extra files "for future use"? Features that weren't in the spec? LLMs love to add things. Your job is to trim.

Error handling - AI tends to go to extremes: either no error handling at all, or try-catch blocks around everything. Look for the middle ground: error handling where it matters (external API calls, user input, critical business logic) and trust in internal code where errors would indicate a real bug.

Security basics - SQL injection, XSS, exposed secrets, missing auth checks. AI doesn't intentionally write insecure code, but it also doesn't think about security the way an experienced human does. Scan for the obvious stuff.

Performance - N+1 queries, unnecessary re-renders, loading entire datasets when you only need a subset. AI optimizes for "does it work" not "does it perform well."

Readability - Could a human (or future AI) understand this code without the original prompt? If the logic requires explanation, it probably needs refactoring.

The Review Process

  1. Skim the diff - Get the big picture. What changed, what's new, what's deleted.
  2. Check file structure - Are new files in the right place? Were existing files modified unnecessarily?
  3. Read the logic - Walk through the main code paths. Does the flow make sense?
  4. Test it - Pull the branch, run the app, test the feature. Automated tests are necessary but not sufficient.
  5. Annotate issues - Use GitHub PR comments for code-level feedback. Use Pointa for visual issues. Be specific about what to change.

Don't Nitpick

This is important. AI-generated code won't match your personal style perfectly. That's okay. If it works, is readable, follows the project conventions, and passes tests, that's good enough. Don't spend 20 minutes reformatting code that's already functional. Save your review energy for things that matter: correctness, security, architecture.

πŸ“¦ Part 11: Version Control & Git

Git hygiene matters more with AI agents than with human developers. Humans understand context implicitly. AI agents work in isolation, and their code needs to be mergeable without surprises.

Branch Strategy

Keep it simple:

  • main - Production branch. Always deployable. Protected.
  • Feature branches - One per Conductor workspace. Named automatically by Conductor.
  • No develop branch. No staging branch (unless your deployment requires it). Fewer branches, fewer problems.

Merge Conflicts

Merge conflicts are the biggest practical headache with parallel AI agents. Two agents can modify the same file without knowing about each other. Here's how to minimize and handle them:

Prevention:

  • Assign tasks that touch different areas of the codebase to different agents
  • If two tasks might touch the same file, run them sequentially instead of in parallel
  • Keep shared files (like global styles or utility functions) out of feature scope when possible

When conflicts happen:

  1. Don't panic. Most merge conflicts are trivial (import order, adjacent lines).
  2. Merge the target branch into the feature branch: from the feature branch, run git merge main (after pulling the latest main).
  3. If the conflict is in generated files (like package-lock.json), delete the file and run npm install to regenerate it. This is almost always the right move for lock files.
  4. For code conflicts, review both changes and decide which to keep. If both are needed, merge manually.
  5. Never resolve conflicts blindly. Understand what both sides were trying to do.
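The workflow above can be sketched end-to-end in a throwaway repo. This is a self-contained demo, not your real repo: all file names, branch names, and commit messages are placeholders, and the resolution (keeping both sides) is just one possible outcome of step 4.

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git checkout -qb main
git config user.email "agent@example.com"
git config user.name "agent"

# Base commit, then diverging edits to the same line on two branches
echo "original line" > app.txt
git add app.txt && git commit -qm "feat: initial commit"
git checkout -qb feature
echo "feature change" > app.txt
git commit -qam "feat: feature edit"
git checkout -q main
echo "main change" > app.txt
git commit -qam "fix: main edit"

# Step 2: merge the target branch into the feature branch
git checkout -q feature
if ! git merge main 2>/dev/null; then
  # Step 4: both sides touched the same line; review both, then resolve
  # deliberately. Here both changes are needed, so keep both.
  printf "feature change\nmain change\n" > app.txt
  git add app.txt
  git commit -qm "fix: merge main into feature, keeping both changes"
fi
cat app.txt
```

The point of the explicit if-block is step 5: the conflict is resolved by writing the file you actually want, not by blindly accepting one side.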

The package-lock.json Problem

This deserves its own section because it comes up constantly. When two agents add different dependencies, package-lock.json conflicts are guaranteed. The fix:

  1. Resolve package.json manually, keeping both sides' dependency additions
  2. Delete package-lock.json
  3. Run npm install to regenerate a clean lock file
  4. Commit the result

Add this to your CLAUDE.md so agents know not to include lock file changes in their diffs, or at minimum know how to handle the conflict.
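As a starting point, here is the kind of snippet you might add. The exact wording is illustrative; adapt it to your project's conventions.

```markdown
## Lock files

- Do not include package-lock.json changes in a diff unless the task adds
  or removes a dependency.
- If package-lock.json has a merge conflict: resolve package.json manually
  (keep both sides' dependencies), delete package-lock.json, run
  npm install to regenerate it, and commit the result. Never hand-edit
  a lock file to resolve a conflict.
```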

Branch Cleanup

After a PR is merged, delete the branch. No exceptions. Stale branches accumulate fast when you're running multiple agents daily. GitHub can auto-delete branches on merge; turn that setting on.

Periodically run git fetch --prune to clean up remote tracking branches locally. Keep your git tree clean. A cluttered branch list is a cluttered mind.

Commit Messages

Set a commit message convention in your CLAUDE.md and enforce it. I use a simple format:

type: brief description

- detail 1
- detail 2

Where type is one of: feat, fix, improve, refactor, test, docs, infra.

The agent will follow whatever convention you set, but you have to set it. Without guidance, you'll get inconsistent messages that make git log useless.
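You can also enforce the convention mechanically. Here is a minimal commit-msg hook sketch using the type list above (adjust the list to your own conventions); the hook body would live in .git/hooks/commit-msg, marked executable.

```shell
# Validate that a commit message's first line starts with an allowed type.
check_commit_msg() {
  case "$1" in
    feat:*|fix:*|improve:*|refactor:*|test:*|docs:*|infra:*)
      return 0 ;;
    *)
      echo "Bad commit message. Expected: type: brief description" >&2
      echo "Valid types: feat, fix, improve, refactor, test, docs, infra" >&2
      return 1 ;;
  esac
}

# In the actual hook, git passes the path to the message file as $1:
# check_commit_msg "$(head -n 1 "$1")" || exit 1

check_commit_msg "feat: add session refresh" && echo "accepted"
```

A hook like this catches drift even when an agent ignores the convention in CLAUDE.md.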

πŸ—οΈ Part 12: CI/CD Pipeline

CI/CD is your safety net. When AI agents are writing code and creating PRs, you need automated checks that catch problems before they reach production.

Essential CI Checks

At minimum, every PR should run:

  1. TypeScript compilation - tsc --noEmit catches type errors that the agent might not see.
  2. Linting - ESLint catches style violations and common mistakes.
  3. Build - npm run build (or next build for Next.js) catches SSR errors, missing imports, and configuration issues.
  4. Tests - Run your test suite. If tests fail, the PR doesn't merge.

These four checks catch the vast majority of issues. Add more as needed, but don't start with a 20-minute CI pipeline that blocks every PR. Fast feedback is more important than comprehensive checks.
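The four checks can be wrapped in one fail-fast script. The helper below is a sketch: the command strings in the commented call are examples, and you would substitute your project's actual scripts.

```shell
# Run a list of check commands in order, stopping at the first failure.
run_checks() {
  for check in "$@"; do
    echo "Running: $check"
    if ! sh -c "$check"; then
      echo "FAILED: $check" >&2
      return 1
    fi
  done
  echo "All checks passed"
}

# In CI you would call something like:
# run_checks "npx tsc --noEmit" "npx eslint ." "npm run build" "npm test"
```

Failing fast keeps feedback tight: an agent gets the first broken check back in seconds instead of waiting for the whole pipeline.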

Deployment Pipeline

For a Vercel + Next.js setup:

  • Preview deployments - Vercel creates a preview URL for every PR automatically. Use this to visually test changes without pulling the branch locally. This is one of the best features of Vercel for AI-assisted development.
  • Production deployment - Merging to main triggers a production deploy. This should be automatic and fast.
  • Rollback - Know how to roll back a deploy. With Vercel, you can redeploy any previous deployment in seconds. Test this once so you know the drill before you need it under pressure.

What NOT to Automate (Yet)

Don't over-invest in CI early on. The following are nice to have but not essential when you're starting:

  • Visual regression testing (e.g., Chromatic) - useful at scale, overkill for small teams
  • Performance budgets - good idea eventually, but don't block PRs on it from day one
  • Dependency scanning - important for production apps, but can wait until you have a stable dependency set

Start with the four essential checks. Add more only when you've been burned by something they would have caught.

πŸ” Part 13: Audits

Audits are the proactive counterpart to bug fixes. Instead of waiting for something to break, you systematically review your codebase for weaknesses.

Performance Audit

Prompt the agent to review:

  • Frontend - Bundle size, unnecessary re-renders, image optimization, lazy loading, client vs. server components
  • Backend - Query efficiency (N+1 queries), caching opportunities, API response times, database indexing
  • Infrastructure - CDN configuration, edge functions, serverless cold starts

Ask for a ranked list of findings by impact. Fix the top 3-5 issues. Don't try to optimize everything.

Security Audit

Focus on the OWASP Top 10:

  • Injection - SQL injection, command injection, XSS
  • Authentication - Session management, password handling, token expiration
  • Authorization - Access control checks on every route, proper role verification
  • Data exposure - Sensitive data in responses, API keys in client code, overly permissive CORS
  • Configuration - Security headers, HTTPS enforcement, error message verbosity

For production apps handling user data or payments, consider a professional security audit in addition to AI-assisted reviews. AI catches the obvious stuff. Humans catch the creative attacks.

Reliability Audit

Check:

  • Error handling - Are errors caught and handled gracefully? Do users see helpful error messages?
  • Test coverage - Are critical paths tested? What happens if an external API goes down?
  • Logging - Can you diagnose issues from logs alone? Is there enough logging (but not too much)?
  • Monitoring - Are you alerted when things break? Or do you find out from users?

Cost Audit

Especially relevant for AI products:

  • Token usage - Are you using the right model for each task? Are prompts optimized?
  • Database - Are you on the right plan? Are there unused indexes or tables?
  • Infrastructure - Are you paying for resources you're not using?
  • Third-party APIs - Are there cheaper alternatives? Can you reduce API calls through caching?

Running Audits with Conductor

The process is the same for all audit types:

  1. Create a Linear issue with the audit scope
  2. Open a Conductor workspace
  3. Give the agent a focused audit prompt: "Audit all API routes in app/api/ for proper authentication and authorization. Report findings ranked by severity."
  4. Review the findings
  5. Create follow-up Linear issues for each actionable finding
  6. Fix in priority order

Don't try to audit everything at once. One dimension at a time, one area of the codebase at a time. Depth beats breadth.

πŸ€– Part 14: Automation & Agents

The final piece is automation. Once your workflows are solid and repeatable, you can start automating the repetitive parts.

Custom Agents in Claude Code

Claude Code supports custom slash commands that act as specialized agents. Create commands for tasks you do repeatedly:

  • /audit-security - Runs a security audit on the current codebase using a predefined checklist
  • /update-docs - Reviews recent changes and proposes updates to CLAUDE.md and README
  • /write-tests - Analyzes a file or feature and generates integration tests for critical paths
  • /review-pr - Reads the current branch diff and provides a structured code review

These aren't complex automations. They're just pre-written prompts that save you from typing the same thing over and over.
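Creating one is just writing a markdown file: Claude Code picks up custom slash commands from .claude/commands/ in your repo. The prompt text below is illustrative, not a canonical checklist.

```shell
# Create a custom /audit-security command for Claude Code.
mkdir -p .claude/commands
cat > .claude/commands/audit-security.md <<'EOF'
Audit the codebase for security issues. Check, in order:
1. Authentication and authorization on every API route
2. Input validation and injection risks (SQL, command, XSS)
3. Secrets or API keys exposed in client-side code
4. CORS configuration and security headers
Report findings ranked by severity, with file paths and suggested fixes.
EOF
ls .claude/commands
```

Because the file lives in the repo, the command is version-controlled and available to anyone (human or agent) working in that project.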

Triggered Automations

Some actions should happen automatically without you thinking about it:

On every commit:

  • Lint check
  • Type check
  • Build validation

On every PR:

  • Full test suite
  • Preview deployment
  • Automated code review (via CodeRabbit or Conductor's built-in review)

On merge to main:

  • Production deployment
  • Documentation check (did the CLAUDE.md or README need updating?)

Periodically:

  • Weekly security audit
  • Monthly cost review
  • Quarterly performance audit

The Jarvis Vision

The end goal is a system where Linear is the brain, Conductor is the muscle, and you're the decision-maker. The flow:

  1. You create a well-specified issue in Linear
  2. You assign it to Jarvis (your AI user)
  3. The system picks up the assignment, opens a Conductor workspace, feeds the issue context to Claude Code
  4. The agent works autonomously: reads the spec, writes code, runs tests, creates a PR
  5. The PR triggers automated checks and code review
  6. You get a notification: "PR ready for review"
  7. You review, approve, and merge

Today, steps 2-4 require manual intervention (you open the workspace and paste the context). But the architecture is designed so that when the automation layer is ready, the workflow doesn't change. You've already been working this way manually.

Skills vs. Slash Commands

A common question: should you use Claude Code Skills (available through the Claude app and API) or custom slash commands?

Slash commands are simpler and more practical for daily coding work. They're project-specific, version-controlled (they live in your repo), and execute within the current Claude Code session. Use them for task-level automation: running audits, generating tests, formatting commits.

Skills are more powerful but heavier. They're better suited for complex, multi-step workflows that span multiple tools or require specialized domain knowledge. Think of skills as the "senior engineer" and slash commands as "keyboard shortcuts."

Start with slash commands. Move to skills only when you need the additional capability.

πŸ“‹ Your AI Engineering Team Checklist

Here's a practical checklist to track your progress. You don't need everything on day one. Start with the basics and build up.

Foundation (Start Here)

  • Set up a comprehensive CLAUDE.md for your project
  • Install Conductor and create your first workspace
  • Connect Linear to your project workflow
  • Set up GitHub with branch protection on main

Tool Integration

  • Install Pointa and test the annotation β†’ Linear flow
  • Create your first specification in SpecWright
  • Connect MCP servers to Claude Code (Linear, Pointa, browser)
  • Set up Conductor validation scripts (build + lint)

Workflow Mastery

  • Complete one full Build workflow (spec β†’ implement β†’ review β†’ merge)
  • Run your first batch improvement session (5+ items in one workspace)
  • File and fix your first bug using the Pointa β†’ Linear β†’ Conductor flow
  • Complete your first codebase audit (pick any dimension)

CI/CD

  • Set up TypeScript compilation check on PRs
  • Set up lint check on PRs
  • Set up build check on PRs
  • Set up test execution on PRs
  • Enable Vercel preview deployments

Automation

  • Create 3+ custom slash commands for repetitive tasks
  • Set up automated code review on PRs
  • Design the Jarvis workflow (even if manual for now)
  • Set up commit message conventions and enforce via CLAUDE.md

Git Hygiene

  • Enable auto-delete branches on merge
  • Add package-lock.json conflict resolution to CLAUDE.md
  • Establish branch naming conventions
  • Set up commit message format in CLAUDE.md

Ongoing

  • Monthly security audit
  • Quarterly performance audit
  • Regular CLAUDE.md review and updates
  • Review and refine workflows based on what's working

Written with ❀️ by a human (still)