TABLE OF CONTENTS
- Key concepts to know before you start
- Where to find batch testing
- Add test queries
- Add queries manually
- Generate sample queries with Freddy
- Run the batch test
- Review test results
- Interpret what you see
- Best practices
Run multiple test queries at once to evaluate your AI agent's responses, identify gaps, and confirm it is ready before going live. Batch testing is the fastest way to get a broad view of how well your agent handles the questions employees actually ask.
Key concepts to know before you start
The following terms come up throughout this article.
Batch test: a single run that sends a list of queries to the agent at once and records the result for each one.
Query: a test question, phrased the way an employee would actually ask it.
Answered / Unanswered: the status of each result, indicating whether the agent found relevant content and formed a response.
Freddy: the AI assistant that can generate sample queries from your agent's connected knowledge sources.
Knowledge source: the content (articles, Q&A pairs) the agent draws on to answer queries.
Where to find batch testing
All batch testing controls live under the Test section of AI Agent Studio.
Open AI Agent Studio and select the agent you want to test.
In the left navigation, click Test.
The Test page opens. If no queries have been run yet, it shows a Begin your first evaluation prompt with an Add queries button in the center of the screen.
Add test queries
Before running a test, you need a set of queries. You can write them yourself, let Freddy generate them, or do both.
Add queries manually
Use this approach when you want to test specific scenarios — questions you know employees ask, edge cases, or phrasing variations.
On the Test page, click Add queries.
The Add queries screen opens. Click inside the text area and type your questions, one per line.
Click Run queries to execute, or continue to the next section to generate additional queries first.
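If you draft your query list in a text file first, a small script can tidy it before you paste it into the Add queries text area. This is a generic sketch, not part of AI Agent Studio; the sample queries and the `clean_query_list` helper are illustrative:

```python
# Tidy a hand-written query list: the batch test expects one query per
# line, so strip blanks and drop case-insensitive duplicates while
# preserving the original order.

raw_queries = """
Can I get a loaner laptop?
How do I reset my VPN password?

can i get a loaner laptop?
How do I reset my VPN password?
"""

def clean_query_list(text: str) -> list[str]:
    """Return unique, non-empty queries in their original order."""
    seen = set()
    cleaned = []
    for line in text.splitlines():
        query = line.strip()
        key = query.lower()
        if query and key not in seen:
            seen.add(key)
            cleaned.append(query)
    return cleaned

queries = clean_query_list(raw_queries)
print("\n".join(queries))  # paste this output into the Add queries box
```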
Write questions the way employees actually phrase them, not the way the knowledge article is titled. For example, "Can I get a loaner laptop?" tests differently than "Loaner laptop policy."
Generate sample queries with Freddy
If you do not have a query list ready, click Generate sample queries in the top-right corner of the Add queries screen. Freddy reads your agent's connected knowledge sources and automatically produces up to 50 sample questions.
On the Add queries screen, click Generate sample queries.
The Generate sample queries panel slides in from the right. It shows a numbered list of sample queries drawn from your knowledge base, a query count badge (for example, 50), and a language selector (defaults to EN).
Review the list. If the questions do not look right for your knowledge base, click Regenerate to produce a fresh set.
When the list looks good, click Add queries. The generated questions are added to your query list. Click Cancel to close the panel without adding anything.
Generated queries reflect the content of your connected knowledge sources. If a topic is missing from the generated list, it may indicate a gap in your knowledge base — not just in the test set.
Run the batch test
Once your queries are in the list, click Run queries at the bottom of the Add queries screen, or Run query from the top-right corner of the Test results page. The agent processes every query in the list and returns results.
You can navigate away from the Test section while the test runs. The results will be waiting when you return.
Review test results
After the test completes, the Test page shows a results table with one row per query. The table has five columns, covering the query text, its status (Answered or Unanswered), the agent's response, the answer source, and a rating control.
Filter results
Use the Filters control at the top of the results table to narrow what you see. For example, click the Answered filter (the button shows a live count, such as 2 Answered) to show only the queries the agent successfully responded to, or filter by Unanswered to focus on gaps.
Export results
Click Export in the top-right area of the results page to download the test results. Use this to share findings with your team or keep a record before making knowledge base changes.
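Once you have the export, you can summarize it with a short script. The sketch below assumes the export is a CSV with "Query" and "Status" columns; those column names are an assumption, so adjust them to match your actual file:

```python
# Tally Answered vs Unanswered rows in an exported results file and list
# the gaps to fix. The in-memory CSV below stands in for the download;
# column names ("Query", "Status") are assumed, not confirmed.

import csv
import io
from collections import Counter

export_csv = io.StringIO(
    "Query,Status\n"
    "Can I get a loaner laptop?,Answered\n"
    "How do I book a meeting room?,Unanswered\n"
    "How do I reset my VPN password?,Answered\n"
)

rows = list(csv.DictReader(export_csv))
status_counts = Counter(row["Status"] for row in rows)
unanswered = [row["Query"] for row in rows if row["Status"] == "Unanswered"]

print(status_counts)  # e.g. Counter({'Answered': 2, 'Unanswered': 1})
print(unanswered)
```

To run this against a real export, replace the `io.StringIO` stand-in with `open("export.csv", newline="")`.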
Manage queries
Click Manage query to return to the query list. From there you can add new queries, remove existing ones, or generate additional sample queries before running another test.
Interpret what you see
Use the status and response quality to decide what to do next.
Answered queries
An Answered status means the agent found relevant content and formed a response. Review the response text and answer source to confirm:
The response is accurate and matches the knowledge source.
The source cited is the correct one for that topic.
The phrasing is clear and appropriate for your employees.
If the response looks good, click the thumbs-up icon. If it is inaccurate or off-topic despite being answered, click thumbs-down and consider updating the underlying knowledge source.
Unanswered queries
An Unanswered status means the agent could not find relevant content. This is the most actionable signal batch testing gives you. Common causes:
The topic is not covered in any connected knowledge source.
The knowledge exists but uses different terminology from the query.
The knowledge source is not yet indexed or is excluded from this agent.
For each Unanswered query, decide whether to add a new knowledge article, add a Q&A pair for that specific question, or update an existing article to include the missing topic.
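When there are many unanswered queries, grouping them by a shared keyword can reveal whether several misses point at the same missing topic. This is a rough, purely illustrative heuristic; the stopword list and sample queries are made up:

```python
# Group unanswered queries by their first non-stopword so related misses
# cluster together, hinting at a single knowledge-base gap rather than
# many unrelated ones.

from collections import defaultdict

STOPWORDS = {"how", "do", "i", "a", "the", "can", "my", "to", "is", "what"}

unanswered = [
    "How do I book a meeting room?",
    "Can I book a projector?",
    "What is the parental leave policy?",
]

groups = defaultdict(list)
for query in unanswered:
    words = [w.strip("?.,").lower() for w in query.split()]
    keywords = [w for w in words if w not in STOPWORDS]
    topic = keywords[0] if keywords else "misc"
    groups[topic].append(query)

for topic, queries in groups.items():
    print(f"{topic}: {len(queries)} unanswered")
```

In this sample, both booking questions land under one topic, suggesting a single missing article about bookings rather than two separate gaps.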
Best practices
Start with Generate sample queries. If you are new to batch testing, let Freddy create the first set. It gives you broad coverage immediately and surfaces topics you may not have thought to test.
Mix generated and manual queries. Use Freddy for volume and coverage. Add manual queries for edge cases, workflow triggers, and known pain points employees raise with the support team.
Test before every significant change. Run a batch test after adding new knowledge sources, changing workflows, or updating agent instructions. Treat it as a quick regression check, not just a pre-launch step.
Rate every result. Thumbs-up and thumbs-down ratings build a record of what the agent does well and where it struggles. Review these ratings over time to track improvement.
Re-run after fixing gaps. After updating a knowledge source or adding a Q&A pair, run the affected queries again to confirm the agent now answers them correctly before deploying changes.