The phrase agentic AI covers too much ground. It is a research agenda, a product category, a budget line, and (most of the time in vendor decks) last year's chatbot with a new deck cover. Gartner put numbers on the noise in June 2025: roughly 130 of the several thousand vendors claiming agentic capability ship anything that qualifies, and over 40% of agentic AI projects are forecast to be cancelled by the end of 2027 [1]. That is the backdrop for everything below.
This pillar walks the category from the definition down to the runtimes. The useful distinction is not agent versus not-agent. It is between systems that can rewrite their own plan mid-execution and systems that cannot. That line predicts which workloads get automated, which earn their seven-figure platform bills, and which keep a human in the loop for reasons no amount of tool-use fine-tuning will fix [2].
What qualifies as an agent
Strip the marketing. An agent is a loop: a system that observes state, decides the next action with some freedom of choice, and can revise that decision when new information arrives. The last clause is where most agentic products fail the definition. They replan only when a human clicks a button, which makes them workflow engines with a nicer frontend.
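The loop can be made literal. A minimal sketch, with hypothetical tool and planner names; a real planner scores candidate actions with a model rather than a hand-written function, but the shape is the same: the planner is consulted again on every turn against updated state, so new information can change the next action without a human click.

```python
def run_agent(goal, tools, planner, max_steps=10):
    """Observe-decide-revise loop. The plan is rebuilt each turn
    from current state, which is what separates an agent from a
    workflow engine with a nicer frontend."""
    state = {"goal": goal, "history": []}
    for _ in range(max_steps):
        action = planner(state)               # decide, with freedom of choice
        if action is None:                    # planner judges the goal met
            break
        result = tools[action](state)         # act
        state["history"].append((action, result))  # observe new information
    return state

# Toy workload: look up a ticket, then route it (names invented)
tools = {
    "lookup": lambda s: {"ticket": 42, "tier": 2},
    "route":  lambda s: "queued-tier-2",
}

def planner(state):
    done = [action for action, _ in state["history"]]
    if "lookup" not in done:
        return "lookup"
    if "route" not in done:
        return "route"
    return None

final = run_agent("triage ticket 42", tools, planner)
```

The point of the sketch is the control flow, not the toy planner: the decision function sees the full history every turn, so a failed or surprising result on turn two can change what turn three does.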
Russell and Norvig formalised the taxonomy a quarter-century ago: simple reflex, model-based, goal-based, utility-based, learning. The terms survive because they still map onto shipping code. Most enterprise deployments in April 2026 are goal-based agents with a utility function bolted on: a planning model that scores candidate actions, a tool-use harness that can retry or substitute a tool, and a memory store that lets the agent carry context across turns without asking the user to paste it back.
The tell that matters in a procurement meeting is simpler than the taxonomy suggests. Ask the vendor what happens when step three of a five-step plan fails. If the answer is the user gets an error, it is not an agent. If the answer is the planner re-scores, picks an alternative tool, and continues, it is. Write that into the evaluation rubric before the demo starts.
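What the re-score-and-continue answer looks like in code, as a sketch; the tool names are invented and the planner's re-scoring is reduced here to an ordered fallback list per step.

```python
def execute_plan(plan, tools, alternatives):
    """Run a plan; on a step failure, substitute an alternative tool
    and continue rather than surfacing an error to the user."""
    log = []
    for step in plan:
        candidates = [step] + alternatives.get(step, [])
        for tool_name in candidates:
            try:
                result = tools[tool_name]()
                log.append((tool_name, "ok", result))
                break
            except Exception as exc:
                log.append((tool_name, "failed", str(exc)))
        else:
            # No candidate worked: escalate, do not error out silently
            log.append((step, "escalated", "no viable tool"))
    return log

def flaky():
    raise TimeoutError("upstream CRM timed out")

tools = {
    "crm_api": flaky,                    # the primary tool fails...
    "crm_export": lambda: "rows:120",    # ...the fallback continues the plan
    "dedupe": lambda: "rows:117",
}
log = execute_plan(["crm_api", "dedupe"], tools,
                   alternatives={"crm_api": ["crm_export"]})
```

Run against a vendor demo, the equivalent question is whether their execution log contains failed-then-substituted entries like these, or whether the first exception reaches the user.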
"The question is not whether it is an agent. The question is whether it can change its mind without asking permission."
The runtime field consolidated faster than expected
As of April 2026, five runtimes cover most serious enterprise procurement. LangGraph is the LangChain team's stateful graph runtime; its 1.0 release landed in October 2025 and the LangGraph Platform is generally available for long-running deployments, with nearly 400 companies running agents on it through beta [3]. AWS Strands Agents shipped 1.0 on July 15, 2025 with four orchestration patterns (Swarms, Graphs, Agents-as-Tools, Handoffs) and model-agnostic support across Bedrock, Anthropic, Ollama, Meta, and LiteLLM providers [4].
Microsoft Agent Framework entered public preview on October 1, 2025 and shipped 1.0 GA on April 3, 2026, explicitly the convergence of AutoGen and Semantic Kernel into one SDK, with A2A and MCP interop baked in [5]. OpenAI's Agents SDK launched in March 2025 alongside the Responses API, which the company now recommends over Chat Completions for new work; the older Assistants API is deprecated with a sunset date of August 26, 2026 [8]. CrewAI is the independent holdout; its AMP suite and Flows architecture power a claimed 1.4 billion agentic automations at customers including PwC, IBM, Capgemini, and NVIDIA [9].
Worth noticing: the hyperscalers converged on open-source SDKs with commercial runtimes, not closed platforms. That is a different industry structure than we had in early 2025, and it changes the exit-cost calculation on every procurement.
| Runtime | Backing org | Milestone | Strength |
|---|---|---|---|
| LangGraph | LangChain | 1.0 Oct 2025; Platform GA | Stateful graphs, durable execution, strong OSS community |
| AWS Strands Agents | AWS Open Source | 1.0 GA Jul 15, 2025 | Model-agnostic, native Bedrock AgentCore integration |
| Microsoft Agent Framework | Microsoft | Preview Oct 1, 2025; 1.0 GA Apr 3, 2026 | AutoGen + Semantic Kernel, .NET + Python, MCP/A2A |
| OpenAI Agents SDK | OpenAI | Mar 2025; Responses API default | Built-in web_search, file_search, computer_use, MCP |
| CrewAI + AMP | CrewAI Inc. | Enterprise-GA; on-prem + cloud | Role-based multi-agent, explicit enterprise tenancy |
Where agents are earning their keep
The honest list of production-grade agentic workloads is shorter than the conference circuit suggests, but still long enough to justify the category. What they share: narrow schemas, structured audit trails, and an outcome metric that closes the loop without a human reviewer in the inner path.
Salesforce's own year-one Agentforce deployment is the clearest public scorecard. The service agent handled more than 1.5 million support requests (the majority resolved without humans) while the SDR agent worked over 43,000 leads and generated $1.7 million in new sales pipeline [6]. Across the broader Agentforce base (18,000+ customers in 124 countries), Salesforce reports over $100 million in annualised cost savings and a 34% productivity lift. Those are vendor figures, but the per-workload specificity of the breakdown is the part to read.
The counter-example every 2026 steering committee cites is Klarna. The company claimed its AI assistant had done the work of 700 customer-service agents in 2024; by mid-2025 it began rehiring humans after edge cases, emotional interactions, and multi-step resolutions dragged satisfaction scores down [10]. The lesson is not that agents fail. It is that agents fail on exactly the workloads where complexity is uncorrelated with volume. Pick the wrong workload and scale becomes the enemy.
| Workload | Why it works | Guardrail in place |
|---|---|---|
| IT ticket triage and routing | Narrow schema, strong priors | Human approval on tier 2+ escalation |
| Invoice reconciliation | Structured inputs, complete audit trail | Threshold-gated autonomous close |
| Compliance document review | Repetitive, low stakes per item | Spot-check sampling at 7-10% |
| Sales-lead enrichment & routing | Tolerant of imperfect decisions | Outcome metric closes the loop |
| L1 internal HR/IT support | Bounded intents, logged ground truth | Escalation on confidence drop |
Five tells of agent washing
Gartner's Anushree Verma named the phenomenon in June 2025. Agent washing is workflow automation, RPA, or chat UX rebranded as agentic capability, and it is the reason Gartner forecasts 40%+ of agentic AI projects to be cancelled by the end of 2027 [1]. The good news: the tells are observable in a forty-minute demo if you know what to look for.
We screen every vendor against the five below. Four or more tells, and the product is a rules engine with a language model bolted on. A review team should agree on scoring before the demo; drift on the replanning criterion alone can turn a 2/5 into a 4/5 depending on who is watching.
1. The demo uses the same three happy-path queries every run. Ask for the failure-mode log. Shipped products keep one; prototypes do not.
2. No eval suite is publicly documented. Not a blog post. A numbered suite, versioned, with a changelog and a nightly run posted to a channel.
3. Replanning requires a human click. Watch the state machine. If every branch needs user intent to advance, it is a workflow with conversation on top.
4. Tool failures surface as user-facing errors. An agent that cannot retry, substitute, or escalate a failing tool is not reasoning about tool use in any meaningful sense.
5. Pricing is per seat, not per successful outcome. Outcome-priced agents exist; Agentforce is explicit about it. Seat-priced ones are usually chatbots with a project manager.
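Agreeing on scoring before the demo is easier when the rubric is literal. A sketch; the field names are made up, not a standard schema.

```python
def agent_washing_score(tells):
    """Count the five tells observed in a demo; four or more
    flips the verdict. `tells` maps tell name -> bool."""
    keys = [
        "same_happy_path_demo",      # 1: identical queries every run
        "no_public_eval_suite",      # 2: no versioned, documented evals
        "replan_needs_human_click",  # 3: no autonomous replanning
        "tool_failures_hit_user",    # 4: failures surface as errors
        "per_seat_pricing",          # 5: seats, not outcomes
    ]
    score = sum(bool(tells.get(k, False)) for k in keys)
    verdict = ("rules engine with an LLM bolted on"
               if score >= 4 else "evaluate further")
    return score, verdict

score, verdict = agent_washing_score({
    "same_happy_path_demo": True,
    "no_public_eval_suite": True,
    "replan_needs_human_click": True,
    "tool_failures_hit_user": True,
    "per_seat_pricing": False,
})
```

The value is not the arithmetic; it is that every reviewer records the same five booleans, which is what kills the 2/5-versus-4/5 drift.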
The economics, written by finance not marketing
Serious agentic deployments are not cheap. Industry commentary through 2025 put enterprise first-year program cost in the seven-to-eight-figure range, once you add platform licence, integration partner, and the redirected staff time buyers routinely forget. The programs that survive into year two share one trait. They measure savings against one specific instrumented workflow and report the number to the CFO, on time, without adjustment. The ones that do not, do not renew.
Productivity-minute arithmetic (30 minutes saved per employee per week) is how the first wave embarrassed itself. The March 10, 2025 ServiceNow acquisition of Moveworks at $2.85 billion is a useful data point on what the market pays for a working agentic tier at the employee layer; the deal closed on December 15, 2025 [11]. Futurum's 1H 2026 enterprise-AI reports and CXToday's Agentforce coverage both show the vocabulary shift: direct financial impact is displacing productivity gains in analyst write-ups, which mirrors what buyers now demand in procurement.
Our editorial position, for the record: measure one workflow, instrument the before and after, and refuse to scale the program until you can show a defensible dollar number on the first. Teams that skip the measurement step do not fail at procurement. They fail at renewal.
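The before-and-after instrumentation reduces to arithmetic a CFO can audit. A sketch; every number below is invented for illustration, and the point is that each input is measured on one workflow, not assumed fleet-wide.

```python
def workflow_savings(before_minutes, after_minutes, volume_per_month,
                     loaded_cost_per_hour, platform_cost_per_month):
    """Net monthly dollar figure for ONE instrumented workflow:
    measured time delta, measured volume, loaded labour rate,
    minus the platform cost attributed to this workflow."""
    saved_hours = (before_minutes - after_minutes) * volume_per_month / 60
    gross = saved_hours * loaded_cost_per_hour
    return gross - platform_cost_per_month

# Hypothetical invoice-reconciliation workflow
net = workflow_savings(before_minutes=18, after_minutes=4,
                       volume_per_month=2_000, loaded_cost_per_hour=55,
                       platform_cost_per_month=12_000)
```

If the measured `after_minutes` or the real per-workflow platform cost pushes `net` negative, that is the answer, and it is better to learn it before scaling than at renewal.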
Observability is the part that keeps them running
A production agent without observability is a prototype that pages on-call. The category matured fast in 2025. LangSmith added end-to-end native OpenTelemetry support to its SDK, letting teams pipe agent traces into Datadog, Grafana, Jaeger, or any OTel-compliant backend [12]. AgentCore on AWS and Foundry observability on Azure ship with the same OTel spec so traces cross cloud boundaries without translation.
The specific signals that matter are unglamorous. Per-tool latency and per-tool error rate tell you whether the tool harness is the bottleneck. Per-step outcome flags tell you whether the planner is choosing good actions. Replan counts per conversation tell you whether the agent is thrashing. If a vendor cannot show you all three on a default dashboard, the observability story is a slide deck.
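All three signals fall out of a single pass over trace data. A sketch against a simplified span shape; this is a stand-in for OTel-style spans, not any particular backend's schema.

```python
from collections import defaultdict

def agent_signals(spans):
    """Aggregate per-tool latency/error rate and replan counts
    per conversation from a flat list of span dicts."""
    per_tool = defaultdict(lambda: {"calls": 0, "errors": 0, "latency_ms": 0.0})
    replans = defaultdict(int)
    for s in spans:
        if s["kind"] == "tool_call":
            t = per_tool[s["tool"]]
            t["calls"] += 1
            t["latency_ms"] += s["latency_ms"]
            t["errors"] += s["status"] == "error"
        elif s["kind"] == "replan":
            replans[s["conversation_id"]] += 1
    for t in per_tool.values():
        t["avg_latency_ms"] = t["latency_ms"] / t["calls"]
        t["error_rate"] = t["errors"] / t["calls"]
    return dict(per_tool), dict(replans)

spans = [
    {"kind": "tool_call", "tool": "search", "latency_ms": 120.0, "status": "ok"},
    {"kind": "tool_call", "tool": "search", "latency_ms": 480.0, "status": "error"},
    {"kind": "replan", "conversation_id": "c1"},
    {"kind": "tool_call", "tool": "search", "latency_ms": 150.0, "status": "ok"},
]
tool_stats, replan_counts = agent_signals(spans)
```

A vendor's default dashboard should be showing aggregates of exactly this shape; if producing them requires a services engagement, the observability story is a slide deck.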
Frequently asked
- What is agentic AI, in one paragraph?
  An agent is a loop. Observe, reason, act, revise the plan mid-run when new information shows up. That last clause is where most products fail the definition. The procurement test is shorter than any taxonomy: can it replan without a human click? Gartner ran the numbers in June 2025 and they were brutal. Roughly 130 vendors meet the bar out of the thousands who claim it [1]. Over 40% of agentic projects will be cancelled by end of 2027, usually on cost, unclear value, or risk controls nobody thought about until the security review (the part buyers forget). Everything else is a workflow with a language model glued on.
- Which agent runtimes should an enterprise team evaluate in 2026?
  Five names cover most procurement. LangGraph. AWS Strands Agents. Microsoft Agent Framework. OpenAI Agents SDK. CrewAI. LangGraph 1.0 shipped in October 2025 with the Platform GA behind it [3]. Strands hit 1.0 on July 15, 2025 with four orchestration patterns (Swarms, Graphs, Agents-as-Tools, Handoffs) [4]. Microsoft Agent Framework went to public preview October 1, 2025 and reached 1.0 GA on April 3, 2026; the convergence of AutoGen and Semantic Kernel into one SDK [5]. OpenAI's Agents SDK launched March 2025 alongside the Responses API, now the default for new work [8]. CrewAI is the independent holdout, still the procurement winner where role-based multi-agent on-prem is the requirement [9]. The pattern we see most: one framework per cloud, federate the rest via MCP.
- What is agent washing and how do I spot it?
  Agent washing is RPA with a chat window. Or workflow automation with LLM garnish. Gartner's Anushree Verma named it in mid-2025 and the term stuck [1]. Five tells, all watchable inside a forty-minute demo. The demo reuses three happy-path queries every run. No public eval suite exists (a blog post does not count). Replanning needs a human click. Tool failures come back as user-facing errors. Pricing is per seat, not per outcome. Four or more and the product is a rules engine with a language model bolted on the front. Score the rubric before the demo starts; drift on the replanning criterion is the part that flips a 2/5 into a 4/5 depending on who is watching.
- Where do agents earn their keep?
  Bounded workflows. Narrow schemas. Logged ground truth. IT ticket triage. Invoice reconciliation. Compliance document review. Sales-lead enrichment. L1 internal HR/IT support. Those are the ones paying back. Salesforce's year-one Agentforce run is the clearest public scorecard we have: 1.5M+ support cases handled (most without humans), 43,000+ leads worked, $1.7M of new sales pipeline [6]. Klarna is the counter-example every 2026 steering committee cites. Volume without complexity is automatable. Complexity without volume is not. Klarna misread the line and started rehiring humans by mid-2025 [10], a caution the buyers who picked the wrong workload now rehearse on every slide.
- What are guardian agents, and are they real?
  A guardian watches another agent and can veto it. That is the one-line version. Gartner projects the category will hold 10 to 15% of the agentic market by 2030 [7]. Products shipping today are narrow, which is fine. Prompt-injection guardrails. Output-schema validators. Policy-check agents that refuse disallowed tool combinations. Evaluate one by asking what happens when the guardian gets it wrong. A guardian with only false positives is an annoyance. A guardian with only false negatives is a liability. The procurement question buyers forget to ask, every time.
- How does agentic AI relate to MCP and RAG?
  Three layers of the same stack, not competitors. MCP is the wire protocol an agent uses to call tools and fetch context: the plumbing. See our MCP pillar. RAG is a retrieval pattern for grounding a model in a document corpus. Agentic RAG places that retrieval inside a planning loop so the agent decides what to fetch next; see the agentic RAG glossary entry. Most production systems we review run all three. An agent on top. MCP underneath. RAG as one of several tools the agent can reach for. The part that surprises first-time buyers: the architecture converges within a year of serious deployment. Implementation trade-offs live in our MCP vs RAG comparison.
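The agent-on-top, retrieval-as-a-tool shape can be sketched in a few lines; `retrieve` and `generate` here are toy stand-ins for a vector-store query and a model call, and the stopping rule is invented for illustration.

```python
def agentic_rag(question, retrieve, generate, max_rounds=3):
    """Retrieval inside the planning loop: after each round the
    agent decides whether it has enough context or should fetch
    more, instead of retrieving once up front."""
    context = []
    answer = ""
    for _ in range(max_rounds):
        context += retrieve(question, context)
        answer, need_more = generate(question, context)
        if not need_more:
            break
    return answer, len(context)

# Toy corpus and stand-in callables
docs = ["A2A is an interop protocol.", "MCP standardises tool calls."]

def retrieve(question, context):
    # Fetch the next unseen document, if any
    return [docs[len(context)]] if len(context) < len(docs) else []

def generate(question, context):
    # Toy stopping rule: keep fetching until two documents are in context
    return " ".join(context), len(context) < 2

answer, n_ctx = agentic_rag("what does MCP do?", retrieve, generate)
```

Classic RAG is this loop with `max_rounds=1`; the agentic variant is the same plumbing with the fetch decision moved inside the planner, which is why the three layers converge rather than compete.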