AI-Powered Chatbots Are Answering Citizens’ Questions – What Are the Risks?
Short codes have become a common way of communicating, whether by necessity or by marketing. Think, for example, of the well-publicized 9-1-1 and 9-8-8 emergency numbers that save critical time in urgent situations. Other three-digit short codes (most notably 2-1-1 and 3-1-1) let callers reach targeted recipients who handle mostly non-life-threatening, routine calls (according to one study, up to 80% of such calls are non-emergencies). While a sewer line backup may, in fact, be an emergency, especially to the people dealing with it right then, in most cases it does not rise to the level that requires prompt dispatch of fire, police, or ambulance services.
With this in mind, the San Francisco Chronicle recently ran an article reporting that a few smaller California cities were opting to use AI-powered chatbots to field the routine questions that 3-1-1 callers ask. On many levels, a move of this nature makes sense, freeing up the valuable and costly time of contact center staff for other tasks. But the risks of this technology misfiring, in the form of inaccurate responses, are significant, and the downsides are worth identifying and analyzing before any entity makes this leap.
According to the Chronicle article, the majority of inbound requests those cities receive (between 60 and 70%) are routine questions, not service requests. This is in sharp contrast to the statistics the City of San Francisco compiled regarding its own 3-1-1 calls. The city does not fully use chatbots (yet) to answer calls made to 3-1-1, although it does use some AI-based systems to help process uploaded photos and to suggest the correct city agency to assist. As a point of reference, the appropriate agency within the city has said that 80% of the calls received in the first eight months of this year were “informational requests,” not service requests, and thus did not require personal human interaction.
This metric is the baseline. But many complex challenges would likely bring such a system to its knees in a municipality with a larger population than the California municipalities that have adopted AI platforms more fully. Based on San Francisco’s statistics, approximately one in five of the calls received by the information line were service requests. In any case, the quality of the responses generated by any AI system is only as good as the underlying data it draws on: the system can only return the contextually appropriate answers that the service providers have loaded for routine questions.
There are some seriously important questions that must be posed, considered, and answered in the name of public safety. What constitutes “routine?” How does the system respond when the query is not routine? Who decides what is routine and what isn’t? How often is the definition of routine revisited in light of the real-world call load? What happens when the caller, for any number of reasons, cannot understand the prompts? What happens when the system can’t understand the caller because of language barriers, accents, or speech impediments, or when the system simply can’t understand the question posed or the problem to be addressed? Most important of all, however, is identifying where and how the “off-ramp” is designed so that people in critical situations can get the help they need as quickly as possible.
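To make the “off-ramp” idea concrete, here is a minimal sketch of one way such routing logic could work. Everything in it is hypothetical: the intent names, the confidence threshold, and the `route_call` function are illustrative assumptions, not a description of any deployed 3-1-1 system. The key design choice is that the bot handles a query only when it is both confidently understood and on an approved routine list; everything else escalates to a human by default.

```python
# Hypothetical off-ramp logic for a 3-1-1 chatbot (illustrative only).
# The intent names and threshold below are invented for this sketch.

ROUTINE_INTENTS = {
    "trash_pickup_schedule",
    "permit_office_hours",
    "pothole_report_status",
}

CONFIDENCE_THRESHOLD = 0.85  # below this, the bot should not guess


def route_call(intent: str, confidence: float) -> str:
    """Return 'bot' only for high-confidence routine intents; else 'human'.

    Failing toward the human operator is the point: an unrecognized
    accent, an unclear question, or a non-routine problem (a sewer
    backup, say) all land in the same safe default.
    """
    if intent in ROUTINE_INTENTS and confidence >= CONFIDENCE_THRESHOLD:
        return "bot"
    return "human"  # the off-ramp


# A routine, confidently classified question stays with the bot...
print(route_call("trash_pickup_schedule", 0.95))  # bot
# ...but anything non-routine or uncertain goes to a person.
print(route_call("sewer_backup", 0.99))           # human
print(route_call("permit_office_hours", 0.40))    # human
```

The inverse design, where the bot handles everything it cannot positively classify as an emergency, is the dangerous one: it answers the "who decides what is routine" question implicitly, and in the wrong direction.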
I am currently reading Meredith Broussard’s new book More than a Glitch, which posits that bias of all kinds is implicit in almost everything that relies on large mathematical models to tackle problems. Broussard identifies three separate sources of bias, ranging from the most commonly mentioned, race, to others including gender and ability. But most importantly, she identifies the issue that she calls “technochauvinism.” Here’s how she defines the term: “A kind of bias that considers computational solutions to be superior to all other solutions.” According to Broussard, “Technochauvinist optimism [has] led to companies spending millions of dollars on technology and platforms that marketers promised would ‘revolutionize’ and digitize everything from rug-buying to interstellar travel. Rarely have those promises come to pass, and often the digital reality is not much better than the original. In many cases, it is worse. Behind technochauvinism are very human factors like self-delusion, racism, bias, privilege, and greed.”
The point here is that until we come to recognize the technochauvinistic bias built into these systems, and the subsequent biases in the training data, the outcomes generated by AI-enabled tools are, despite the best of intentions, by definition suspect. They may remain useful for 80% of the queries, but when you are in the 20% and your call cannot be addressed by a human in an appropriately timely manner, the system (any system) has failed.
Everyone engaged in the process, from the “idea” people to those charged with testing and deploying these systems (and the word “testing” is of particular importance), must become a stakeholder in ensuring that such innovations truly function as intended. AI is here, and it will become more prevalent in handling routine tasks. But its limitations, including its lack of common sense, common experience, and conscience, must force those who seek to deploy its power to recognize and design for those limitations, so that people whose queries fall outside the routine can get the help they need with minimal aggravation.