Most MCP tutorials stop at “the agent can call tools now.”
That is usually enough for read-only tools, search tools, or simple command wrappers. It is not enough once your MCP server needs to do one of two more interesting things:
- call back into a model during a tool run
- pause and ask the user for structured input before continuing
In MCP terms, those are sampling and elicitation.
They are easy to mix up, mostly because they both make the client-server loop feel more interactive. Under the hood, though, they solve different problems. Sampling lets the server ask the client to make a model request. Elicitation lets the server ask the client to collect structured input from the user.
That distinction matters in Pydantic AI because, as of March 23, 2026, the official docs say FastMCPToolset still does not support elicitation or sampling. If you need either feature, the path you want is the standard MCPServer client family.
This guide shows the practical setup.
If you want the broader decision guide first, read our comparison of MCPServer, FastMCPToolset, and MCPServerTool. If you are still wiring ordinary MCP tools into an agent, the companion article on connecting Pydantic AI to MCP servers with FastMCPToolset is the better starting point.
What you’ll learn:
- what MCP sampling actually does in a Pydantic AI client
- how `agent.set_mcp_sampling_model()` changes server behavior
- how to attach an `elicitation_callback` and return `accept`, `decline`, or `cancel`
- where MCP’s schema limits and security rules show up in real code
- why this is one of the clearest cases for choosing `MCPServer` over `FastMCPToolset`
Time required: 25-35 minutes
Difficulty level: Intermediate
Step 1: Keep the Mental Model Straight
I think the cleanest way to remember these features is to ask one question:
Who needs something extra in the middle of the workflow?
If the server needs another model call, that is sampling.
If the server needs another piece of user input, that is elicitation.
Here is the short version:
| Feature | What triggers it? | What comes back? | Best use case |
|---|---|---|---|
| Sampling | The MCP server asks the client to make a model request | A model response | Server-side generation, classification, transformation, summarization |
| Elicitation | The MCP server asks the client to gather structured user input | A structured user response or refusal | Booking flows, approvals, missing parameters, human confirmation |
Both features make MCP workflows feel less brittle. Instead of forcing every input up front, the server can ask for what it actually needs at the moment it needs it.
That is the upside. The tradeoff is that your client integration now matters a lot more.
Step 2: Use MCPServer, Not FastMCPToolset
This is the first decision to make, and it is not a subtle one.
The official Pydantic AI FastMCP client docs explicitly say FastMCPToolset does not yet support elicitation or sampling. So even if you have been using FastMCP everywhere else, this is the point where you switch to the standard MCP client classes:
- `MCPServerStdio`
- `MCPServerStreamableHTTP`
- `MCPServerSSE`
For most local examples, MCPServerStdio is the easiest place to start. It keeps the whole flow visible, and it is the transport used in the Pydantic AI documentation examples for both sampling and elicitation.
Step 3: Install the MCP Client Support
You need the mcp extra rather than the fastmcp extra used in the other integration path:
```bash
uv init pydantic-ai-mcp-workflows
cd pydantic-ai-mcp-workflows
uv add "pydantic-ai-slim[mcp]"
```
Or with pip:
```bash
python -m venv .venv
source .venv/bin/activate
pip install "pydantic-ai-slim[mcp]"
```
You will still need a model provider configured for your agent, for example:
```bash
export OPENAI_API_KEY="your_api_key_here"
```
If your MCP server itself needs credentials, pass those to the server process separately. Do not assume the subprocess magically inherits everything you use in your shell.
Step 4: Turn On Sampling the Right Way
Sampling is the part people usually miss on the first try.
Attaching an MCPServerStdio server to an agent is not enough by itself. If the server wants to call back into a model through MCP sampling, the client has to expose a sampling model first. In Pydantic AI, that is what agent.set_mcp_sampling_model() does.
Here is a minimal client setup:
```python
import asyncio

from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerStdio

svg_server = MCPServerStdio(
    "python",
    args=["generate_svg.py"],
)

agent = Agent(
    "openai:gpt-5.2",
    toolsets=[svg_server],
)


async def main() -> None:
    agent.set_mcp_sampling_model()
    async with agent:
        result = await agent.run(
            "Create an SVG hero graphic of a maintenance robot with a red warning beacon."
        )
    print(result.output)


asyncio.run(main())
```
What that method does is simple but important: it sets a sampling model on every registered MCPServer toolset. If you do not pass a model explicitly, the agent’s own model is used.
That means these two setups are both valid:
```python
agent.set_mcp_sampling_model()
agent.set_mcp_sampling_model("openai:gpt-5.2-mini")
```
The second pattern is often the more practical one. Your main agent might need a stronger model, while server-initiated sampling calls are cheaper classification or formatting work.
When to disable it
You can also shut sampling off at the server reference:
```python
from pydantic_ai.mcp import MCPServerStdio

server = MCPServerStdio(
    "python",
    args=["generate_svg.py"],
    allow_sampling=False,
)
```
That is useful when you want MCP tools but do not want the server making model requests through the client. It is also a nice defensive default in environments where every model call must be deliberate and observable.
Step 5: Understand What the Server Is Doing During Sampling
Sampling can feel mysterious until you look at it from the server’s side.
The server is already inside a tool call. Then it realizes it needs model help to finish the job. Instead of talking to OpenAI or Anthropic directly, it asks the connected MCP client to create a message on its behalf.
That gives you a few benefits:
- the client keeps control of model access
- the server can stay model-agnostic
- the whole workflow can still be mediated by the app that owns the session
In practice, this is useful for servers that generate code, write SVG or SQL, classify inputs, or turn rough user requests into a more constrained output format.
It is not a replacement for the main agent. Think of it more like a server-local “I need one more inference to finish this tool call” escape hatch.
Step 6: Add Elicitation When the Server Needs Missing Human Input
Elicitation solves a different problem.
Sometimes the user asks for something underspecified. A server can technically guess, but it should not. Booking a release window, choosing a target environment, or approving a risky action are all good examples. That is where elicitation fits.
At a high level, the flow looks like this:
- The user asks the agent to do something
- The agent calls an MCP tool
- The MCP server realizes a required value is missing
- The server sends an elicitation request to the client
- The client collects structured input and returns `accept`, `decline`, or `cancel`
- The server continues or exits cleanly
Here is a server example using FastMCP on the server side:
```python
from mcp.server.fastmcp import Context, FastMCP
from pydantic import BaseModel, Field

app = FastMCP("release_ops")


class ReleaseRequest(BaseModel):
    service: str = Field(description="Service to deploy")
    environment: str = Field(description="Target environment")
    window_start: str = Field(description="Deployment window start in ISO 8601")
    rollback_ready: bool = Field(description="Rollback steps already prepared")


@app.tool()
async def schedule_release(ctx: Context) -> str:
    result = await ctx.elicit(
        message="I need deployment details before I can schedule this release.",
        schema=ReleaseRequest,
    )
    if result.action == "accept" and result.data:
        release = result.data
        return (
            f"Release scheduled for {release.service} in {release.environment} "
            f"at {release.window_start}. Rollback ready: {release.rollback_ready}."
        )
    if result.action == "decline":
        return "Release scheduling skipped because the request was declined."
    return "Release scheduling cancelled."


if __name__ == "__main__":
    app.run(transport="stdio")
```
That tool does not guess. It pauses, asks for exactly what it needs, and then resumes with structured data.
Step 7: Handle Elicitation on the Pydantic AI Client
On the client side, you attach an elicitation_callback when creating the MCP server instance. The callback receives request metadata and returns an ElicitResult.
Here is a terminal-based client example:
```python
import asyncio
from typing import Any

from mcp.client.session import ClientSession
from mcp.shared.context import RequestContext
from mcp.types import ElicitRequestParams, ElicitResult
from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerStdio


async def handle_elicitation(
    context: RequestContext[ClientSession, Any, Any],
    params: ElicitRequestParams,
) -> ElicitResult:
    print(f"\nServer request: {params.message}\n")

    schema = params.requestedSchema or {}
    properties = schema.get("properties", {})
    content: dict[str, Any] = {}

    for field_name, field_info in properties.items():
        prompt = field_info.get("description") or field_name.replace("_", " ")
        field_type = field_info.get("type")
        raw = input(f"{prompt}: ").strip()

        if field_type == "integer":
            content[field_name] = int(raw)
        elif field_type == "number":
            content[field_name] = float(raw)
        elif field_type == "boolean":
            content[field_name] = raw.lower() in {"true", "1", "yes", "y"}
        else:
            content[field_name] = raw

    choice = input("\nAccept, decline, or cancel? [a/d/c]: ").strip().lower()
    if choice == "a":
        return ElicitResult(action="accept", content=content)
    if choice == "d":
        return ElicitResult(action="decline")
    return ElicitResult(action="cancel")


release_server = MCPServerStdio(
    "python",
    args=["release_server.py"],
    elicitation_callback=handle_elicitation,
)

agent = Agent(
    "openai:gpt-5.2",
    toolsets=[release_server],
)


async def main() -> None:
    async with agent:
        result = await agent.run("Schedule the next production release for the checkout service.")
    print(result.output)


asyncio.run(main())
```
There are two details here that are easy to underestimate:
- the client owns the user interaction surface
- the server does not have to guess whether silence means “no” or “not yet”
That makes elicitation much nicer than cramming every possible parameter into the first user prompt.
Step 8: Respect the Schema Limits
The MCP elicitation spec is intentionally narrow here.
For form-mode elicitation, requestedSchema is limited to flat objects with primitive properties only. The supported building blocks are:
- `string`
- `number`
- `integer`
- `boolean`
- `enum` (enumerated strings)
For strings, the spec also allows formats such as:
- `email`
- `uri`
- `date`
- `date-time`
This is one of those constraints that feels annoying until you try to build a portable client UI. Flat, primitive schemas are much easier to render consistently across terminals, desktop apps, and web clients.
So if your first instinct is to send a deeply nested schema with arrays of sub-objects, stop there. Simplify the interaction. Ask in stages if you need to.
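A quick way to internalize the limits is a small validator. This is not part of any SDK, just a stdlib sketch of the form-mode rules described above:

```python
ALLOWED_PRIMITIVES = {"string", "number", "integer", "boolean"}


def is_valid_form_schema(schema: dict) -> bool:
    """Rough check of MCP form-mode limits: a flat object whose
    properties are all primitives or enumerated strings."""
    if schema.get("type") != "object":
        return False
    for prop in schema.get("properties", {}).values():
        if "enum" in prop:
            continue  # enumerated string values are allowed
        if prop.get("type") not in ALLOWED_PRIMITIVES:
            return False  # rejects arrays and nested objects
    return True
```

Run your `requestedSchema` drafts through something like this and the "ask in stages" advice stops feeling like a restriction and starts feeling like a design constraint.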
Step 9: Treat Security Rules as Product Rules, Not Footnotes
The MCP spec is very direct about this.
Form mode is for structured input that the client may see. It is not for sensitive secrets. Credentials, payment details, and anything else that should bypass the MCP client belong in URL mode. The spec also requires the client to make the requesting server clear, allow decline and cancel paths, and show the destination domain before navigation in URL mode.
That has a practical implication for Pydantic AI integrations:
- use elicitation for approvals, missing parameters, dates, counts, labels, environment names, and other ordinary workflow fields
- do not use form elicitation for passwords, API keys, OAuth credentials, or payment details
If a workflow needs secure sign-in, design that as a dedicated auth flow rather than a convenient prompt.
Step 10: Common Mistakes to Avoid
These are the ones I keep seeing:
Reaching for FastMCPToolset out of habit
It is a great default for ordinary MCP wiring. It is the wrong tool for this job.
Forgetting agent.set_mcp_sampling_model()
If the server expects sampling and the client never exposes a sampling model, the workflow falls apart fast.
Returning only “success” paths
Elicitation is not just about acceptance. Your callback should handle accept, decline, and cancel as first-class outcomes.
Using schemas that are too rich
Keep form-mode input flat. If the flow needs more nuance, split it into multiple elicitation steps.
Asking for sensitive information in form mode
That is not just awkward. It runs against the spec’s trust and safety rules.
Step 11: When This Pattern Is Worth It
Not every MCP tool needs this extra machinery.
But if you are building:
- code-generation servers that need one internal model pass
- workflow tools that pause for human confirmation
- enterprise tools that gather a few missing fields mid-run
- multi-step assistants that should ask instead of guessing
then sampling and elicitation are exactly the features that make MCP feel less like remote function calling and more like a real interaction layer.
That is also why this is such a strong MCPServer use case. You are not just exposing tools. You are exposing a conversation boundary with rules.