Most MCP tutorials stop at “the agent can call tools now.”
That is usually enough for read-only tools, search tools, or simple command wrappers. It is not enough once your MCP server needs to do one of two more interesting things:
- call back into a model during a tool run
- pause and ask the user for structured input before continuing
In MCP terms, those are sampling and elicitation.
They are easy to mix up, mostly because they both make the client-server loop feel more interactive. Under the hood, though, they solve different problems. Sampling lets the server ask the client to make a model request. Elicitation lets the server ask the client to collect structured input from the user.
That distinction matters in Pydantic AI because, as of March 23, 2026, the official docs say FastMCPToolset still does not support elicitation or sampling. If you need either feature, the path you want is the standard MCPServer client family.
This guide shows the practical setup.
If you want the broader decision guide first, read our comparison of MCPServer, FastMCPToolset, and MCPServerTool. If you are still wiring ordinary MCP tools into an agent, the companion article on connecting Pydantic AI to MCP servers with FastMCPToolset is the better starting point.
What you’ll learn:
- what MCP sampling actually does in a Pydantic AI client
- how `agent.set_mcp_sampling_model()` changes server behavior
- how to attach an `elicitation_callback` and return `accept`, `decline`, or `cancel`
- where MCP’s schema limits and security rules show up in real code
- why this is one of the clearest cases for choosing `MCPServer` over `FastMCPToolset`
Time required: 25-35 minutes
Difficulty level: Intermediate
Step 1: Keep the Mental Model Straight
I think the cleanest way to remember these features is to ask one question:
Who needs something extra in the middle of the workflow?
If the server needs another model call, that is sampling.
If the server needs another piece of user input, that is elicitation.
Here is the short version:
| Feature | What triggers it? | What comes back? | Best use case |
|---|---|---|---|
| Sampling | The MCP server asks the client to make a model request | A model response | Server-side generation, classification, transformation, summarization |
| Elicitation | The MCP server asks the client to gather structured user input | A structured user response or refusal | Booking flows, approvals, missing parameters, human confirmation |
Both features make MCP workflows feel less brittle. Instead of forcing every input up front, the server can ask for what it actually needs at the moment it needs it.
That is the upside. The tradeoff is that your client integration now matters a lot more.
Step 2: Use MCPServer, Not FastMCPToolset
This is the first decision to make, and it is not a subtle one.
The official Pydantic AI FastMCP client docs explicitly say FastMCPToolset does not yet support elicitation or sampling. So even if you have been using FastMCP everywhere else, this is the point where you switch to the standard MCP client classes:
- `MCPServerStdio`
- `MCPServerStreamableHTTP`
- `MCPServerSSE`
For most local examples, MCPServerStdio is the easiest place to start. It keeps the whole flow visible, and it is the transport used in the Pydantic AI documentation examples for both sampling and elicitation.
Step 3: Install the MCP Client Support
You need the mcp extra rather than the fastmcp extra used in the other integration path:
```bash
uv init pydantic-ai-mcp-workflows
cd pydantic-ai-mcp-workflows
uv add "pydantic-ai-slim[mcp]"
```
Or with pip:
```bash
python -m venv .venv
source .venv/bin/activate
pip install "pydantic-ai-slim[mcp]"
```
You will still need a model provider configured for your agent, for example:
```bash
export OPENAI_API_KEY="your_api_key_here"
```
If your MCP server itself needs credentials, pass those to the server process separately. Do not assume the subprocess magically inherits everything you use in your shell.
Step 4: Turn On Sampling the Right Way
Sampling is the part people usually miss on the first try.
Attaching an MCPServerStdio server to an agent is not enough by itself. If the server wants to call back into a model through MCP sampling, the client has to expose a sampling model first. In Pydantic AI, that is what agent.set_mcp_sampling_model() does.
Here is a minimal client setup:
```python
import asyncio

from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerStdio

svg_server = MCPServerStdio(
    "python",
    args=["generate_svg.py"],
)

agent = Agent(
    "openai:gpt-5.2",
    toolsets=[svg_server],
)


async def main() -> None:
    agent.set_mcp_sampling_model()
    async with agent:
        result = await agent.run(
            "Create an SVG hero graphic of a maintenance robot with a red warning beacon."
        )
    print(result.output)


asyncio.run(main())
```
What that method does is simple but important: it sets a sampling model on every registered MCPServer toolset. If you do not pass a model explicitly, the agent’s own model is used.
That means these two setups are both valid:
```python
agent.set_mcp_sampling_model()
agent.set_mcp_sampling_model("openai:gpt-5.2-mini")
```
The second pattern is often the more practical one. Your main agent might need a stronger model, while server-initiated sampling calls are cheaper classification or formatting work.
When to disable it
You can also shut sampling off at the server reference:
```python
from pydantic_ai.mcp import MCPServerStdio

server = MCPServerStdio(
    "python",
    args=["generate_svg.py"],
    allow_sampling=False,
)
```
That is useful when you want MCP tools but do not want the server making model requests through the client. It is also a nice defensive default in environments where every model call must be deliberate and observable.
Step 5: Understand What the Server Is Doing During Sampling
Sampling can feel mysterious until you look at it from the server’s side.
The server is already inside a tool call. Then it realizes it needs model help to finish the job. Instead of talking to OpenAI or Anthropic directly, it asks the connected MCP client to create a message on its behalf.
That gives you a few benefits:
- the client keeps control of model access
- the server can stay model-agnostic
- the whole workflow can still be mediated by the app that owns the session
In practice, this is useful for servers that generate code, write SVG or SQL, classify inputs, or turn rough user requests into a more constrained output format.
It is not a replacement for the main agent. Think of it more like a server-local “I need one more inference to finish this tool call” escape hatch.
Step 6: Add Elicitation When the Server Needs Missing Human Input
Elicitation solves a different problem.
Sometimes the user asks for something underspecified. A server can technically guess, but it should not. Booking a release window, choosing a target environment, or approving a risky action are all good examples. That is where elicitation fits.
At a high level, the flow looks like this:
- The user asks the agent to do something
- The agent calls an MCP tool
- The MCP server realizes a required value is missing
- The server sends an elicitation request to the client
- The client collects structured input and returns `accept`, `decline`, or `cancel`
- The server continues or exits cleanly
Here is a server example using FastMCP on the server side:
```python
from mcp.server.fastmcp import Context, FastMCP
from pydantic import BaseModel, Field

app = FastMCP("release_ops")


class ReleaseRequest(BaseModel):
    service: str = Field(description="Service to deploy")
    environment: str = Field(description="Target environment")
    window_start: str = Field(description="Deployment window start in ISO 8601")
    rollback_ready: bool = Field(description="Rollback steps already prepared")


@app.tool()
async def schedule_release(ctx: Context) -> str:
    result = await ctx.elicit(
        message="I need deployment details before I can schedule this release.",
        schema=ReleaseRequest,
    )
    if result.action == "accept" and result.data:
        release = result.data
        return (
            f"Release scheduled for {release.service} in {release.environment} "
            f"at {release.window_start}. Rollback ready: {release.rollback_ready}."
        )
    if result.action == "decline":
        return "Release scheduling skipped because the request was declined."
    return "Release scheduling cancelled."


if __name__ == "__main__":
    app.run(transport="stdio")
```
That tool does not guess. It pauses, asks for exactly what it needs, and then resumes with structured data.
Step 7: Handle Elicitation on the Pydantic AI Client
On the client side, you attach an elicitation_callback when creating the MCP server instance. The callback receives request metadata and returns an ElicitResult.
Here is a terminal-based client example:
```python
import asyncio
from typing import Any

from mcp.client.session import ClientSession
from mcp.shared.context import RequestContext
from mcp.types import ElicitRequestParams, ElicitResult
from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerStdio


async def handle_elicitation(
    context: RequestContext[ClientSession, Any, Any],
    params: ElicitRequestParams,
) -> ElicitResult:
    print(f"\nServer request: {params.message}\n")

    schema = params.requestedSchema or {}
    properties = schema.get("properties", {})
    content: dict[str, Any] = {}

    for field_name, field_info in properties.items():
        prompt = field_info.get("description") or field_name.replace("_", " ")
        field_type = field_info.get("type")
        raw = input(f"{prompt}: ").strip()

        if field_type == "integer":
            content[field_name] = int(raw)
        elif field_type == "number":
            content[field_name] = float(raw)
        elif field_type == "boolean":
            content[field_name] = raw.lower() in {"true", "1", "yes", "y"}
        else:
            content[field_name] = raw

    choice = input("\nAccept, decline, or cancel? [a/d/c]: ").strip().lower()
    if choice == "a":
        return ElicitResult(action="accept", content=content)
    if choice == "d":
        return ElicitResult(action="decline")
    return ElicitResult(action="cancel")


release_server = MCPServerStdio(
    "python",
    args=["release_server.py"],
    elicitation_callback=handle_elicitation,
)

agent = Agent(
    "openai:gpt-5.2",
    toolsets=[release_server],
)


async def main() -> None:
    async with agent:
        result = await agent.run("Schedule the next production release for the checkout service.")
    print(result.output)


asyncio.run(main())
```
There are two details here that are easy to underestimate:
- the client owns the user interaction surface
- the server does not have to guess whether silence means “no” or “not yet”
That makes elicitation much nicer than cramming every possible parameter into the first user prompt.
Step 8: Respect the Schema Limits
The MCP elicitation spec is intentionally narrow here.
For form-mode elicitation, requestedSchema is limited to flat objects with primitive properties only. The supported building blocks are:
- `string`
- `number`
- `integer`
- `boolean`
- `enum` (enumerated strings)
For strings, the spec also allows formats such as:
- `email`
- `uri`
- `date`
- `date-time`
This is one of those constraints that feels annoying until you try to build a portable client UI. Flat, primitive schemas are much easier to render consistently across terminals, desktop apps, and web clients.
So if your first instinct is to send a deeply nested schema with arrays of sub-objects, stop there. Simplify the interaction. Ask in stages if you need to.
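A quick way to internalize the limits is a small validator. This is not part of any SDK, just a stdlib sketch of the form-mode rules described above:

```python
ALLOWED_PRIMITIVES = {"string", "number", "integer", "boolean"}


def is_valid_form_schema(schema: dict) -> bool:
    """Rough check of MCP form-mode limits: a flat object whose
    properties are all primitives or enumerated strings."""
    if schema.get("type") != "object":
        return False
    for prop in schema.get("properties", {}).values():
        if "enum" in prop:
            continue  # enumerated string values are allowed
        if prop.get("type") not in ALLOWED_PRIMITIVES:
            return False  # rejects arrays and nested objects
    return True
```

Run your `requestedSchema` drafts through something like this and the "ask in stages" advice stops feeling like a restriction and starts feeling like a design constraint.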
Step 9: Treat Security Rules as Product Rules, Not Footnotes
The MCP spec is very direct about this.
Form mode is for structured input that the client may see. It is not for sensitive secrets. Credentials, payment details, and anything else that should bypass the MCP client belong in URL mode. The spec also requires the client to make the requesting server clear, allow decline and cancel paths, and show the destination domain before navigation in URL mode.
That has a practical implication for Pydantic AI integrations:
- use elicitation for approvals, missing parameters, dates, counts, labels, environment names, and other ordinary workflow fields
- do not use form elicitation for passwords, API keys, OAuth credentials, or payment details
If a workflow needs secure sign-in, design that as a dedicated auth flow rather than a convenient prompt.
Step 10: Common Mistakes to Avoid
These are the ones I keep seeing:
Reaching for FastMCPToolset out of habit
It is a great default for ordinary MCP wiring. It is the wrong tool for this job.
Forgetting agent.set_mcp_sampling_model()
If the server expects sampling and the client never exposes a sampling model, the workflow falls apart fast.
Returning only “success” paths
Elicitation is not just about acceptance. Your callback should handle accept, decline, and cancel as first-class outcomes.
Using schemas that are too rich
Keep form-mode input flat. If the flow needs more nuance, split it into multiple elicitation steps.
Asking for sensitive information in form mode
That is not just awkward. It runs against the spec’s trust and safety rules.
Step 11: When This Pattern Is Worth It
Not every MCP tool needs this extra machinery.
But if you are building:
- code-generation servers that need one internal model pass
- workflow tools that pause for human confirmation
- enterprise tools that gather a few missing fields mid-run
- multi-step assistants that should ask instead of guessing
then sampling and elicitation are exactly the features that make MCP feel less like remote function calling and more like a real interaction layer.
That is also why this is such a strong MCPServer use case. You are not just exposing tools. You are exposing a conversation boundary with rules.