TABLE OF CONTENTS
- Key concepts to know before you start
- Where to find batch testing
- Add test queries
- Add queries manually
- Generate sample queries with Freddy
- Run the batch test
- Review test results
- Interpret what you see
- Best practices
Run multiple test queries at once to evaluate your AI agent's responses, identify gaps, and confirm it is ready before going live. Batch testing is the fastest way to get a broad view of how well your agent handles the questions employees actually ask.
Key concepts to know before you start
The following terms come up throughout this article.
Batch test: a single run that sends a list of queries to the agent at once and records the result for each one.
Query: a test question, phrased the way an employee would actually ask it.
Answered / Unanswered: the status of each result, indicating whether the agent found relevant content and formed a response.
Freddy: the AI assistant that can generate sample queries from your agent's connected knowledge sources.
Knowledge source: the content (articles, Q&A pairs) the agent draws on to answer queries.
Where to find batch testing
All batch testing controls live under the Test section of AI Agent Studio.
Open AI Agent Studio and select the agent you want to test.
In the left navigation, click Test.
The Test page opens. If no queries have been run yet, it shows a Begin your first evaluation prompt with an Add queries button in the center of the screen.
Add test queries
Before running a test, you need a set of queries. You can write them yourself, let Freddy generate them, or do both.
Add queries manually
Use this approach when you want to test specific scenarios — questions you know employees ask, edge cases, or phrasing variations.
On the Test page, click Add queries.
The Add queries screen opens. Click inside the text area and type your questions, one per line.
Click Run queries to execute, or continue to the next section to generate additional queries first.
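If you draft your query list in a text file first, a small script can tidy it before you paste it into the Add queries text area. This is a generic sketch, not part of AI Agent Studio; the sample queries and the `clean_query_list` helper are illustrative:

```python
# Tidy a hand-written query list: the batch test expects one query per
# line, so strip blanks and drop case-insensitive duplicates while
# preserving the original order.

raw_queries = """
Can I get a loaner laptop?
How do I reset my VPN password?

can i get a loaner laptop?
How do I reset my VPN password?
"""

def clean_query_list(text: str) -> list[str]:
    """Return unique, non-empty queries in their original order."""
    seen = set()
    cleaned = []
    for line in text.splitlines():
        query = line.strip()
        key = query.lower()
        if query and key not in seen:
            seen.add(key)
            cleaned.append(query)
    return cleaned

queries = clean_query_list(raw_queries)
print("\n".join(queries))  # paste this output into the Add queries box
```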
Write questions the way employees actually phrase them, not the way the knowledge article is titled. For example, "Can I get a loaner laptop?" tests differently than "Loaner laptop policy."
Generate sample queries with Freddy
If you do not have a query list ready, click Generate sample queries in the top-right corner of the Add queries screen. Freddy reads your agent's connected knowledge sources and automatically produces up to 50 sample questions.
On the Add queries screen, click Generate sample queries.
The Generate sample queries panel slides in from the right. It shows a numbered list of sample queries drawn from your knowledge base, a query count badge (for example, 50), and a language selector (defaults to EN).
Review the list. If the questions do not look right for your knowledge base, click Regenerate to produce a fresh set.
When the list looks good, click Add queries. The generated questions are added to your query list. Click Cancel to close the panel without adding anything.
Generated queries reflect the content of your connected knowledge sources. If a topic is missing from the generated list, it may indicate a gap in your knowledge base — not just in the test set.
Run the batch test
Once your queries are in the list, click Run queries at the bottom of the Add queries screen, or Run query from the top-right corner of the Test results page. The agent processes every query in the list and returns results.
You can navigate away from the Test section while the test runs. The results will be waiting when you return.
Review test results
After the test completes, the Test page shows a results table with one row per query. The table has five columns, covering the query text, its status (Answered or Unanswered), the agent's response, the answer source, and a rating control.
Filter results
Use the Filters control at the top of the results table to narrow what you see. For example, click the Answered filter (the button shows a live count, such as 2 Answered) to show only the queries the agent successfully responded to, or filter by Unanswered to focus on gaps.
Export results
Click Export in the top-right area of the results page to download the test results. Use this to share findings with your team or keep a record before making knowledge base changes.
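Once you have the export, you can summarize it with a short script. The sketch below assumes the export is a CSV with "Query" and "Status" columns; those column names are an assumption, so adjust them to match your actual file:

```python
# Tally Answered vs Unanswered rows in an exported results file and list
# the gaps to fix. The in-memory CSV below stands in for the download;
# column names ("Query", "Status") are assumed, not confirmed.

import csv
import io
from collections import Counter

export_csv = io.StringIO(
    "Query,Status\n"
    "Can I get a loaner laptop?,Answered\n"
    "How do I book a meeting room?,Unanswered\n"
    "How do I reset my VPN password?,Answered\n"
)

rows = list(csv.DictReader(export_csv))
status_counts = Counter(row["Status"] for row in rows)
unanswered = [row["Query"] for row in rows if row["Status"] == "Unanswered"]

print(status_counts)  # e.g. Counter({'Answered': 2, 'Unanswered': 1})
print(unanswered)
```

To run this against a real export, replace the `io.StringIO` stand-in with `open("export.csv", newline="")`.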
Manage queries
Click Manage query to return to the query list. From there you can add new queries, remove existing ones, or generate additional sample queries before running another test.
Interpret what you see
Use the status and response quality to decide what to do next.
Answered queries
An Answered status means the agent found relevant content and formed a response. Review the response text and answer source to confirm:
The response is accurate and matches the knowledge source.
The source cited is the correct one for that topic.
The phrasing is clear and appropriate for your employees.
If the response looks good, click the thumbs-up icon. If it is inaccurate or off-topic despite being answered, click thumbs-down and consider updating the underlying knowledge source.
Unanswered queries
An Unanswered status means the agent could not find relevant content. This is the most actionable signal batch testing gives you. Common causes:
The topic is not covered in any connected knowledge source.
The knowledge exists but uses different terminology from the query.
The knowledge source is not yet indexed or is excluded from this agent.
For each Unanswered query, decide whether to add a new knowledge article, add a Q&A pair for that specific question, or update an existing article to include the missing topic.
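When there are many unanswered queries, grouping them by a shared keyword can reveal whether several misses point at the same missing topic. This is a rough, purely illustrative heuristic; the stopword list and sample queries are made up:

```python
# Group unanswered queries by their first non-stopword so related misses
# cluster together, hinting at a single knowledge-base gap rather than
# many unrelated ones.

from collections import defaultdict

STOPWORDS = {"how", "do", "i", "a", "the", "can", "my", "to", "is", "what"}

unanswered = [
    "How do I book a meeting room?",
    "Can I book a projector?",
    "What is the parental leave policy?",
]

groups = defaultdict(list)
for query in unanswered:
    words = [w.strip("?.,").lower() for w in query.split()]
    keywords = [w for w in words if w not in STOPWORDS]
    topic = keywords[0] if keywords else "misc"
    groups[topic].append(query)

for topic, queries in groups.items():
    print(f"{topic}: {len(queries)} unanswered")
```

In this sample, both booking questions land under one topic, suggesting a single missing article about bookings rather than two separate gaps.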
Best practices
Start with Generate sample queries. If you are new to batch testing, let Freddy create the first set. It gives you broad coverage immediately and surfaces topics you may not have thought to test.
Mix generated and manual queries. Use Freddy for volume and coverage. Add manual queries for edge cases, workflow triggers, and known pain points employees raise with the support team.
Test before every significant change. Run a batch test after adding new knowledge sources, changing workflows, or updating agent instructions. Treat it as a quick regression check, not just a pre-launch step.
Rate every result. Thumbs-up and thumbs-down ratings build a record of what the agent does well and where it struggles. Review these ratings over time to track improvement.
Re-run after fixing gaps. After updating a knowledge source or adding a Q&A pair, run the affected queries again to confirm the agent now answers them correctly before deploying changes.