TABLE OF CONTENTS
- Key concepts to know before you start
- What each approach does
- How they compare
- When to use each approach
- How they work together
- Best practices
AI Agent Studio gives you two ways to evaluate your agent before it reaches employees: the Preview panel for interactive, conversation-by-conversation testing, and the Test section for running many queries at once. Each serves a different purpose. This article explains what each approach does, where it fits in the build cycle, and how to use them together.
Key concepts to know before you start
The following terms come up throughout this article.
- Preview panel: a slide-in panel in AI Agent Studio for testing one conversation at a time, across multiple turns.
- Test section: the area where you run a list of queries against the agent in a single pass and review the results in a table.
- Batch test: a single run of that query list; each query is single-turn and evaluated independently.
- Answered / Unanswered: the status shown for each query in batch test results.
- Knowledge source: the content, such as knowledge articles, that the agent draws on to answer questions.
- Handover: the point at which the agent escalates a conversation to a human.
- Freddy: the AI assistant that can generate sample test queries from your knowledge base.
What each approach does
Preview: test one conversation at a time
The Preview panel opens as a slide-in panel on the right side of AI Agent Studio. You type a message, the agent responds, and you can keep the conversation going across multiple turns — just as an employee would. Because the panel stays open while you work, you can make a change in Build and immediately send a follow-up message to see whether it took effect.
Use Preview when you want to:
- Test how the agent handles a multi-turn conversation, where earlier messages affect later ones.
- Check that a specific workflow triggers correctly when an employee asks in a natural way.
- Verify that a knowledge article produces a clear, accurate answer before you publish it.
- Confirm that the human handover path works — for example, that the agent escalates when it should.
- Explore an edge case or unusual phrasing that you want to investigate interactively.
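All of these scenarios depend on conversation state. The sketch below is a minimal, hypothetical model of that idea in Python — the `Conversation` class and its canned replies are illustrations only, not AI Agent Studio's API:

```python
class Conversation:
    """Hypothetical model of a Preview conversation: context lives
    only as long as the conversation itself."""

    def __init__(self):
        self.history = []  # earlier turns that can affect later replies

    def send(self, message):
        self.history.append(message)
        # A real agent would reason over the full history; here we only
        # show that later turns can see earlier ones.
        return f"(seen {len(self.history)} turn(s)) reply to: {message}"


# Multi-turn: the second message is answered with the first in context.
chat = Conversation()
chat.send("How do I reset my VPN password?")
print(chat.send("And on mobile?"))  # history now holds both turns

# Starting a new conversation means a fresh object: a clean context,
# with nothing carried over from the previous conversation.
fresh = Conversation()
print(len(fresh.history))  # 0
```

This mirrors the reset behavior described next: using New conversation is the equivalent of constructing `fresh` here.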
Each new conversation in the Preview panel starts with a clean context. The agent does not carry information from a previous conversation into a new one. Use the New conversation option to reset context when you want to start a fresh test.
Test: evaluate many queries at once
The Test section lets you build a list of queries and run them all against the agent in a single pass. Results appear in a table showing whether each query was Answered or Unanswered, a preview of the agent's response, the knowledge source it referenced, and controls to rate each response.
Use the Test section when you want to:
- Check broad coverage — for example, whether the agent can handle all the common IT or HR questions in your knowledge base.
- Identify which topics the agent cannot answer before going live.
- Compare results before and after adding a knowledge source or changing agent instructions.
- Generate a set of test queries using Freddy when you do not have an existing list.
- Export results to share with your team or keep as a deployment record.
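Conceptually, a batch run is a loop over independent, single-turn queries. The sketch below models how a results table gets its Answered and Unanswered statuses — `ask_agent` and its keyword matching are hypothetical stand-ins, not how AI Agent Studio actually evaluates queries:

```python
def ask_agent(query, knowledge):
    """Hypothetical single-turn call: return (answer, source) if any
    knowledge article shares a word with the query, else (None, None)."""
    for source, text in knowledge.items():
        if any(word in text.split() for word in query.lower().split()):
            return text, source
    return None, None


def run_batch(queries, knowledge):
    """Run every query independently — no context carries between
    them — and record a row per query, mirroring the results table."""
    results = []
    for q in queries:
        answer, source = ask_agent(q, knowledge)
        results.append({
            "query": q,
            "status": "Answered" if answer else "Unanswered",
            "response_preview": (answer or "")[:60],
            "source": source,
        })
    return results


knowledge = {"vpn-article": "reset your vpn password from the portal"}
rows = run_batch(["How do I reset my vpn?", "Book a meeting room"], knowledge)
print([r["status"] for r in rows])  # ['Answered', 'Unanswered']
```

The Unanswered row is the interesting output: it is the signal to add or fix a knowledge source before going live.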
Batch tests are single-turn. Each query is treated independently — the Test section does not simulate multi-turn conversations or carry context between queries.
How they compare
- Scope: Preview tests one conversation at a time; the Test section runs many queries in a single pass.
- Turns: Preview supports multi-turn conversations where earlier messages affect later ones; batch tests are single-turn.
- Best for: Preview suits workflows, handover paths, and edge cases; the Test section suits coverage checks and before-and-after comparisons.
- Output: Preview shows the live conversation; the Test section shows a results table with Answered or Unanswered status, a response preview, the referenced knowledge source, and rating controls.
- Extras: the Test section supports Freddy-generated queries and exporting results.
When to use each approach
While building the agent
Use the Preview panel as your primary testing tool during the build phase. It gives you immediate feedback as you add knowledge, adjust instructions, and configure workflows. After each meaningful change, open a conversation in Preview and ask the kinds of questions your employees are likely to ask.
Run a batch test in the Test section when you want to check the overall state of the agent after a significant change — for example, after adding a large knowledge source or enabling a new workflow. This tells you whether the change improved coverage, introduced gaps, or had no effect on unrelated areas.
Before going live
Before deploying the agent, run a comprehensive batch test to confirm that the agent can handle the full range of expected questions. Review Unanswered results and address any gaps in your knowledge base. Then use Preview to walk through the most important end-to-end scenarios — particularly any multi-step workflows or handover paths — to confirm the experience is right.
After making changes
- Small changes — updating a single knowledge article or adjusting a fallback message: use Preview to verify the specific area affected.
- Larger changes — adding a new knowledge source, modifying a workflow, or changing agent instructions: run a batch test to confirm the change works and has not broken anything else. Follow up with Preview for any failed or borderline results.
During ongoing quality checks
After the agent is live, use the Test section periodically to run the same set of queries and compare results over time. If you notice a drop in the Answered rate, switch to Preview to investigate the affected topics interactively and trace the cause.
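One way to spot such a drop is to diff two runs of the same query set. The sketch below assumes each run has been reduced to a query-to-status mapping — that shape is an assumption for illustration, not the actual export format:

```python
def find_regressions(previous, current):
    """Return queries that were Answered before but are Unanswered
    now — the topics worth investigating interactively in Preview."""
    return [query for query, status in previous.items()
            if status == "Answered" and current.get(query) == "Unanswered"]


before = {"reset vpn": "Answered", "expense policy": "Answered"}
after = {"reset vpn": "Answered", "expense policy": "Unanswered"}
print(find_regressions(before, after))  # ['expense policy']
```

Because the query set is identical across runs, any regression points at a change on the agent side rather than a change in what was asked.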
How they work together
Preview and Test are designed to complement each other. A typical workflow looks like this:
1. Build and refine with Preview. Use the Preview panel while you add knowledge, configure workflows, and adjust instructions. Test specific scenarios interactively as you go.
2. Run a batch test to check coverage. Once the agent feels solid, use the Test section to run a broader set of queries. Review which topics come back as Unanswered.
3. Investigate gaps with Preview. For any Unanswered results or unexpected responses in the batch test, switch to Preview and explore those queries conversationally. This often reveals whether the gap is a missing knowledge article, a phrasing mismatch, or something in the agent's instructions.
4. Fix, then retest. Make the necessary changes in Build, then run the affected queries again in the Test section to confirm the improvement.
5. Validate end-to-end with Preview. Before deploying, walk through the most critical user journeys in Preview to confirm the full experience — not just individual answers — is ready.
Best practices
Do not skip batch testing before deployment. Preview is fast and flexible, but it only tests what you think to test. A batch test surfaces gaps you might not think to look for.
Do not skip Preview for multi-step scenarios. Batch tests are single-turn. Anything that requires a back-and-forth — a workflow that collects information across several messages, or a conversation that leads to handover — can only be validated in Preview.
Use Generate sample queries to build your first test list. If you are new to the Test section, let Freddy create an initial set drawn from your knowledge base. Then add manual queries for edge cases and workflows.
Rate batch test results. Use the thumbs-up and thumbs-down controls on each result. Over time, ratings help you spot patterns in where the agent performs well and where it needs improvement.
Keep a consistent query set. Running the same queries across multiple test runs lets you measure whether changes improve or degrade the agent's performance over time.