AI Agents, also known as Agentic AI, seem to be the next major development in generative AI (Gen AI). Many of the players in unified communications and contact center, as well as the foundation model providers themselves (e.g., OpenAI, Google Gemini, Anthropic Claude), have introduced – or likely will introduce – AI Agent capabilities. Many of the major enterprise platform providers have surged forward with AI Agents – Salesforce Agentforce is one of several examples.
Today, Salesforce introduced the Agentforce Testing Center so customers can test and monitor AI Agents. The Testing Center is built on the Salesforce Platform and is integrated with the Salesforce Data Cloud. The specific capabilities include:
- AI-generated tests for Agentforce: Users can test the different ways a customer (internal or external) may pose a question or interact with an agent. AI Agent builders (those humans creating the AI Agents) can use Agent Builder’s Plan Tracer feature to investigate the AI Agent’s reasoning process. Agentforce Testing Center can auto-generate synthetic interactions and test them in parallel to see how frequently they result in the right outcome. The test results can then be used to refine the AI Agents so they deliver a higher level of accuracy.
- Sandboxes for Agentforce and Data Cloud: This capability will replicate an organization’s data and metadata into a sandbox so that the AI Agents can be prototyped and tested without disrupting the business or exposing the data.
- Monitoring and observability for Agentforce: A two-pronged feature that enables AI Agents to be securely tested before being released into production and, once they are live, monitors the AI Agents for adoption, accuracy, etc.
- Usage monitoring in Digital Wallet: Provides visibility into usage and consumption of credits.
Agentforce Testing Center is in a closed pilot today and will be generally available for use in sandboxes in early December. Sandbox support for Agentforce and Data Cloud is generally available today.
No Jitter spoke with Alice Steinglass, EVP and GM of Salesforce Platform, for more insight into how the Testing Center works.
Responses have been edited for clarity.
No Jitter (NJ): Could you provide a little more detail on how the monitoring and observability of the AI Agents works?
Alice Steinglass (Steinglass): One of the challenges of rolling out an AI Agent is that it has a nondeterministic aspect. It's different from sort of a classic automation where you just test it once, for example, and it works. [With AI Agents] you really need to see what's happening when you roll it out at your company – is it working the way you want it to and is it being used by the departments or [external customers] so that you can look at the return on investment? [Maybe you’re] trying to improve your sales calls. You need some observability and metrics built into the piece that you're trying to look at.
We’ve made the Einstein Trust Layer show up as part of the audit trail so you can dig in and see what is actually going into the Agent, what's being masked by Salesforce and what's happening inside the Trust Layer. That ability to audit is really important for companies who are looking at rolling these [Agents] out.
All of this is built on top of Data Cloud, which allows us to tie it into other metrics, to build dashboards and reports and connect it to other things happening across your CRM instance. So in addition to the out-of-box reports we give our customers, you can customize [Testing Center] to build the reports that matter for your business and are connected to your ROI.
NJ: You mentioned that these AI Agents are nondeterministic. Can you provide more detail on how Plan Tracer works? It seems to provide some visibility into the ‘black box’ that is Gen AI.
Steinglass: When I ask the Agent a question, it will break that question down. First, it will try to figure out what it is I was trying to ask – what is the topic of this question? It will then do some reasoning around that, using the instructions [it is given] for the topic. Say you wanted to check the status of your order. To do that, [the AI Agent] needs to get your name, so it looks you up to get that information, then it needs to call an API to get the order status, then it needs to return [the status] to you. That’s just a basic example, but there are three different actions the Agent takes to make that happen.
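The decomposition described above – classify the utterance into a topic, then execute that topic's actions in order, keeping a trace of each step – can be sketched in a few lines. Everything below (the function names, the keyword "classifier," the order-status flow) is a hypothetical illustration of the pattern, not Salesforce's actual API:

```python
# Hypothetical sketch of an agent planner loop: classify the utterance
# into a topic, then run that topic's actions in sequence. Not Salesforce code.

def classify_topic(utterance: str) -> str:
    """Stand-in for the LLM's topic-classification step."""
    if "order" in utterance.lower():
        return "order_status"
    return "unknown"

# Each topic maps to an ordered list of actions, mirroring the example:
# look up the customer, call an API for the order, return the status.
TOPIC_ACTIONS = {
    "order_status": [
        lambda ctx: ctx.update(customer=f"record for {ctx['name']}"),
        lambda ctx: ctx.update(status="shipped"),  # stand-in for an API call
        lambda ctx: ctx.update(reply=f"Your order is {ctx['status']}."),
    ],
}

def run_agent(utterance: str, name: str) -> dict:
    ctx = {"name": name, "trace": []}
    topic = classify_topic(utterance)
    ctx["trace"].append(f"topic={topic}")
    for i, action in enumerate(TOPIC_ACTIONS.get(topic, [])):
        action(ctx)
        ctx["trace"].append(f"action {i + 1} done")
    return ctx

result = run_agent("Where's my order?", "Alice")
print(result["reply"])  # → Your order is shipped.
print(result["trace"])  # the step-by-step plan, à la Plan Tracer
```

The `trace` list is the point: it records which topic was chosen and which actions ran, which is the kind of visibility Plan Tracer surfaces for the builder.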
What we've done is make that process visible to you inside the [Agent Builder] so that you can construct those topics and actions, and then test to see what's going on inside the planner as you’re testing it in real time. Maybe I want to adjust what topics should be assigned to which categories, or you [determine that you] need to hook up another Agent action. So, in short, as I’m building out the Agent, I can see how it's actually working, how it's reasoning, how it's thinking, and what it's actually doing. But there's another level to it.
Say that Service Agent did exactly what I wanted, but what I really want to do is roll this out for 1,000 users – or a million users. They won’t all ask for their order in the same way, right? One of them might say, ‘Where are the jeans I got?’ while another says ‘Hey, my jeans haven’t shown up yet’ and someone else will say, ‘Where’s my order?’ There will be many ways people ask the same question. So, we’ve also enabled you to run, in parallel, LLM-generated test [phrases] that allow you to see if those 1,000 possible utterances will be categorized into the correct topics. Are they going to instigate the right actions 95% of the time or 90% of the time? And, when they don't, where does it go wrong?
That's the Testing Center feature that we're announcing as part of this release. It is the ability to take the Plan Tracer that we had before and then put it in that high volume of testing that allows you to roll out an Agent with more confidence across your organization.
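The high-volume testing step – many phrasings of the same request, scored in parallel against the topic each should land in – can be sketched as a simple accuracy harness. The utterances, the keyword "classifier," and the function names below are all made up for illustration; in Agentforce the variants are LLM-generated and the classification is done by the agent's planner:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in classifier; in practice this would be the agent's planner.
def classify_topic(utterance: str) -> str:
    text = utterance.lower()
    if "order" in text or "shown up" in text:
        return "order_status"
    return "unknown"

# Hypothetical generated utterances, each paired with the expected topic.
TEST_CASES = [
    ("Where are the jeans I got?", "order_status"),
    ("Hey, my jeans haven't shown up yet", "order_status"),
    ("Where's my order?", "order_status"),
]

def run_suite(cases):
    with ThreadPoolExecutor() as pool:  # evaluate the cases in parallel
        predicted = list(pool.map(classify_topic, (u for u, _ in cases)))
    failures = [(u, p) for (u, want), p in zip(cases, predicted) if p != want]
    accuracy = 1 - len(failures) / len(cases)
    return accuracy, failures  # failures show where it goes wrong

accuracy, failures = run_suite(TEST_CASES)
print(f"{accuracy:.0%} of utterances hit the right topic")
for utterance, got in failures:
    print(f"  MISSED: {utterance!r} -> {got}")
```

In this toy run the first phrasing misses the topic, so the harness reports both the aggregate hit rate and the specific utterances that went wrong – the two outputs the interview describes.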
NJ: That sounds like the intent detection that’s used in most IVR/IVA and voice/chat bots.
Steinglass: It’s similar, but there are two differences. One is that it’s designed from the ground up for these generative AI scenarios. It's designed for the Agents. We do need that same intent detection to be able to look at the topics, but within those topics we need to look at how they map to the actions that need to be taken and the reasoning that the [AI Agent] will do around those actions. That is a little more complicated than the sort of decision trees of IVRs in the past.
The other is that these [AI Agents] are not just answering call center calls. We're using Agents across the enterprise for a whole series of different problems, and so there are different things they can do. Right now, for example, Salesforce is shipping both a Service Agent and multiple different Sales Agents. We also have an open ecosystem, and our partners are also shipping different capabilities, everything from quality management to creating documents or helping you with other processes. All those different pieces are part of Agentforce, so we need testing tools that scale beyond what IVR testing tools in the past would do.
NJ: Who creates the AI Agents?
Steinglass: There are three different groups of people creating the Agents. First, Salesforce is creating a number of out-of-the-box Agents. Those are a great way to get started. But almost everybody will want to customize them, so most admins will take an out-of-the-box agent and then customize it for their organization.
The admin will make sure it's referencing the right knowledge articles or looking at the right parts of the website, making sure the right governance and guardrails are in place, etc. When you start adding those customizations, your admin would likely work with the business partners to identify the right scenarios and what the [organization] wants the agents to be able to do. We’ve built [Agentforce] as no code, low code to help enable that. If you have a professional development team who's building a bunch of capabilities, it's all open and accessible, so you can connect it into pro code development, into other systems and connect it into APIs.
Lastly, our partners are the third group who can build and ship the Agents.