When to license a managed multi‑agent platform — and when to roll your own stack.
What Counts as “Build” and “Buy”?
| Path | Typical Stack | How You Pay |
| --- | --- | --- |
| Build / Self‑host | Open‑source frameworks (LangGraph OSS, CrewAI Core), your own vector DB, Kubernetes or ECS, direct LLM APIs | Cloud compute (e.g., EC2 g5.xlarge ≈ $1.01/hr) + model tokens (GPT‑4.1 input $2/1M, output $8/1M) + salaries |
| Buy / SaaS Agent‑Ops | LangGraph Platform, CrewAI Cloud, Agent Engine (Vertex AI), Agents for Amazon Bedrock / Bedrock Flows | Usage metering ($0.001/node on LangGraph Plus), per‑execution tiers (CrewAI from $99/mo for 1,000 executions), node‑transition fees ($0.035 per 1,000 on Bedrock Flows) |
“Build” means you operate the graph, storage and guard‑rails yourself; “Buy” means the vendor hosts the control‑plane and often the data‑plane too.
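The metered "Buy" options charge on different units (nodes, whole executions, node transitions), so a like-for-like comparison needs your own workload profile. The sketch below converts the list prices quoted above into a rough monthly bill; the workload figures are illustrative assumptions, and real vendor plans add minimums, tiers, and overage rules.

```python
# Convert the published SaaS list prices above into a rough monthly bill.
# Workload figures are illustrative assumptions, and the vendors meter on
# different units (nodes vs. executions vs. transitions), so the three
# results are not directly comparable with one another.

nodes = 2_000_000        # graph-node executions per month (assumption)
executions = 10_000      # whole-workflow runs per month (assumption)
transitions = 2_000_000  # node transitions per month (assumption)

node_metering = nodes * 0.001                            # $0.001 per node (LangGraph Plus)
execution_tiers = 99 * max(1, -(-executions // 1_000))   # $99 per 1,000 executions (assumed to scale linearly)
transition_fees = transitions / 1_000 * 0.035            # $0.035 per 1,000 transitions (Bedrock Flows)

print(f"Node metering:    ${node_metering:,.0f}/mo")     # $2,000
print(f"Execution tiers:  ${execution_tiers:,.0f}/mo")   # $990, if tiers scaled linearly
print(f"Transition fees:  ${transition_fees:,.0f}/mo")   # $70
```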
Speed‑to‑Value vs Depth‑of‑Control
| Decision Factor | Build Wins When… | Buy Wins When… | Notes |
| --- | --- | --- | --- |
| Regulated data | You must keep all PII in‑house or within a specific VPC. | The vendor offers on‑prem/hybrid (e.g., LangGraph Enterprise) at parity cost. | EU‑AI‑Act fines run up to €35 M or 7 % of revenue for prohibited practices. |
| Time‑to‑launch | You already run Kubernetes and have DevOps and LLM skills in‑house. | You need a working MVP this quarter; LangGraph Cloud deploys in minutes. | |
| Unit economics | Node volume is high (> 20 M nodes/mo), so SaaS metering outgrows a fixed GPU cluster. | Workloads are bursty (< 5 M nodes/mo), so pay‑as‑you‑go beats always‑on GPUs. | See the break-even sketch after this table. |
| Talent availability | You can fund 2–3 Agent Orchestrators (≈ $190–250 k each). | You lack scarce agent‑ops talent; the platform supplies observability and fallback prompts. | |
| Vendor lock‑in risk | The agent IP is strategically important and must stay in‑house. | The vendor supports open protocols (Agent2Agent, MCP) that ease migration. | |
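A quick sanity check on the unit-economics row is to find the monthly node volume at which per-node SaaS metering overtakes an always-on GPU cluster. A minimal sketch, using the $0.001/node and g5.xlarge rates quoted in this article; the cluster size is an assumption, and the comparison deliberately ignores talent and compliance costs.

```python
# Compute-only break-even: at what monthly node volume does $0.001/node
# SaaS metering exceed the cost of running a small fixed GPU cluster?
# Rates are from this article; the 3-node cluster size is an assumption.

GPU_HOURLY_RATE = 1.01       # EC2 g5.xlarge on-demand, $/hr
GPU_NODES = 3                # always-on inference nodes (assumption)
HOURS_PER_MONTH = 730
SAAS_RATE_PER_NODE = 0.001   # LangGraph Plus node metering

fixed_cluster_cost = GPU_NODES * GPU_HOURLY_RATE * HOURS_PER_MONTH
break_even_nodes = fixed_cluster_cost / SAAS_RATE_PER_NODE

print(f"Fixed cluster:  ${fixed_cluster_cost:,.0f}/mo")
print(f"Break-even at:  {break_even_nodes:,.0f} nodes/mo")   # ~2.2 M nodes/mo
# Compute alone crosses over around 2.2 M nodes/mo; adding talent and
# compliance to the self-host side pushes the practical crossover toward
# the > 20 M nodes/mo figure used in the table.
```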
Quick TCO Calculator (12‑Month Horizon)
| Cost Component | Self‑Host Example | SaaS Example |
| --- | --- | --- |
| Compute | 3× g5.xlarge nodes × 730 h/mo ≈ $22 k/yr | Node fees: 5 M nodes × $0.001 = $5 k |
| Stand‑by / control plane | Kubernetes + monitoring (EKS, Prometheus) ≈ $3 k | Stand‑by minutes: 60 k min/mo × 12 × $0.0036 ≈ $2.6 k |
| LLM API | 120 M input + 30 M output tokens on GPT‑4.1 ≈ $1 k | Same |
| Storage / vector DB | Pinecone serverless, 50 GB ≈ $1.2 k | Often bundled |
| Talent | 2 FTE ≈ $420 k fully loaded | 0.5 FTE FinOps ≈ $55 k |
| Compliance & audit | Internal ISO + EU‑AI‑Act QMS ≈ €52 k ≈ $56 k | Vendor attestation usually included |
| Total, Year 1 | ≈ $503 k | ≈ $64 k |
Change any line item and the equation flips. Use the blank template below to plug in your own numbers.
Blank template variables
- AGENT_COUNT =
- NODE_VOLUME_MONTH =
- GPU_NODES =
- TOKEN_IN =
- TOKEN_OUT =
- FTE_ORCH =
- FTE_FINOPS =
- AUDIT_COST =
Multiply by published rates to model your scenario.
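One way to wire the template up, as a minimal sketch: the variable names mirror the blanks above, the rates are the list prices quoted in this section, and every value assigned below is a placeholder assumption to overwrite with your own figures.

```python
# 12-month build-vs-buy TCO sketch using the blank-template variables.
# All assigned values are placeholder assumptions; rates ($1.01/hr GPU,
# $2/$8 per 1M tokens, $0.001/node) are the list prices quoted above.

AGENT_COUNT = 4                  # production agents (drives node volume; not used directly)
NODE_VOLUME_MONTH = 420_000      # graph nodes/month (~5 M/yr, as in the example table)
GPU_NODES = 3                    # always-on g5.xlarge instances
TOKEN_IN = 120_000_000           # GPT-4.1 input tokens per year
TOKEN_OUT = 30_000_000           # GPT-4.1 output tokens per year
FTE_ORCH = 2                     # agent orchestrators (self-host)
FTE_FINOPS = 0.5                 # FinOps coverage (SaaS)
AUDIT_COST = 56_000              # ISO + EU-AI-Act QMS, self-host only

llm_api = TOKEN_IN / 1e6 * 2 + TOKEN_OUT / 1e6 * 8   # token spend, both paths

self_host_year1 = (
    GPU_NODES * 1.01 * 730 * 12      # compute, on-demand pricing
    + 3_000                          # EKS + monitoring (assumption)
    + llm_api
    + 1_200                          # vector DB (assumption)
    + FTE_ORCH * 210_000             # fully loaded salary (assumption)
    + AUDIT_COST
)

saas_year1 = (
    NODE_VOLUME_MONTH * 12 * 0.001   # node metering
    + 2_600                          # stand-by minutes (assumption)
    + llm_api
    + FTE_FINOPS * 110_000           # FinOps salary (assumption)
)

print(f"Self-host, year 1: ${self_host_year1:,.0f}")
print(f"SaaS, year 1:      ${saas_year1:,.0f}")
```

Swap in reserved-instance pricing, your real salary bands, and your audit quote; as the table above shows, the talent line usually dominates the outcome.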
Hidden Costs Often Missed
- Observability: Without tools like AgentOps, you will be building tracing dashboards from scratch.
- Guard‑rails & policy: SaaS bundles OpenAI/Bedrock guard‑rails; self‑hosting means building your own regex filters and moderation endpoints.
- Upgrades: Vendor pushes new features (memory retention, budget sentinels) automatically; DIY requires sprint time.
- Opportunity cost: Business Insider notes firms now spin up internal tools overnight with AI coding aids, tilting the build‑vs‑buy equation.
Decision Framework
Rule of Thumb:
• Buy if speed‑to‑market or compliance assurance dominates.
• Build if agent workloads are mission‑critical, high‑volume, and you can amortize talent.
Five key questions for the steering committee
1. Will data residency or model‑training restrictions block SaaS?
2. Do we expect > 300 % growth in node volume next year?
3. Can we hire/retain at least one senior Agent Orchestrator?
4. Is the agent graph competitive IP we must own?
5. What is our risk tolerance for EU‑AI‑Act penalties?
A “yes” on #1 or #4 leans Build; a “no” on #2–#3 leans Buy.
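For steering-committee workshops, the rule of thumb can be written down as a toy helper; the sketch below simply encodes the "yes on #1/#4, no on #2/#3" logic and is a conversation aid, not a scoring methodology. Question #5 is left to judgment: it decides how much weight the compliance rows in the decision table carry.

```python
# Toy encoding of the rule of thumb above. Question #5 (EU-AI-Act risk
# tolerance) is deliberately left out of the logic and handled as a
# qualitative weight on the compliance discussion.

def lean(data_residency_blocks_saas: bool,      # question 1
         node_growth_over_300_pct: bool,        # question 2
         can_hire_orchestrator: bool,           # question 3
         agent_graph_is_core_ip: bool) -> str:  # question 4
    if data_residency_blocks_saas or agent_graph_is_core_ip:
        return "lean Build (or hybrid)"         # a "yes" on #1 or #4
    if not node_growth_over_300_pct or not can_hire_orchestrator:
        return "lean Buy"                       # a "no" on #2 or #3
    return "no clear lean -- run the full TCO model"

print(lean(False, True, True, False))   # -> no clear lean -- run the full TCO model
```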
Hybrid Compromise: “Bring‑Your‑Own‑Data Plane”
Many platforms now offer a hybrid mode: a SaaS control plane plus self‑hosted execution in your VPC (LangGraph Enterprise, Bedrock via PrivateLink). You get vendor upgrades and logging without letting raw PII leave your cloud boundary. Costs sit between the two extremes, but the model often satisfies security teams.
Bottom Line
Prompt engineering became a commodity; agent orchestration is the new leverage point. Whether you license LangGraph/CrewAI Cloud or self‑host an open stack comes down to volumes, velocity, and governance. Run the TCO numbers, weigh compliance exposure, and decide where you want to spend scarce agent‑ops talent: writing business logic, or patching Kubernetes YAML.