(Author’s note: this article in its entirety was written without the help of generative AI (Gen AI) in any way, nor was AI used to generate any graphics, either.)
Leveraging the large language models (LLMs) provided by your unified communications as a service (UCaaS) provider will be necessary for maximizing the value and productivity your organization gets from its UCaaS solution. This article explores the use of LLMs within several UCaaS ecosystems, including Zoom, Cisco Webex, Google Meet, and Microsoft Teams.
Our intent in preparing this overview is not to compare these solutions; rather, we hope to highlight some of the interesting features they provide and to discuss areas where each excels and impresses us.
The content in this article is based on a presentation the authors gave at Enterprise Connect 2024 and is presented in three parts:
- Part 1: How generative AI works and how LLMs are trained.
- Part 2: An overview of the features in the UCaaS platforms and a deeper dive into two key use cases: meeting summaries and text refining.
- Part 3: The risks associated with using Gen AI in these UCaaS platforms.
AI Is Everywhere
Everyone is talking about artificial intelligence (AI), and all vendors are adding some sort of AI capability to their solutions. Given the hype, we would not be surprised if some enterprising vendors added AI to our “Smartie-O’s” or to our toilet paper.
Levity aside, of the 173 sessions presented at Enterprise Connect 2024, 71 mentioned AI at least once in the session description. It is a timely topic, and it is time for some straight talk on what seems to matter most, at least to us.
Artificial Intelligence and Large Language Models
Before discussing the specific LLMs and how they are used in your UCaaS solution, a short discussion of how generative AI (Gen AI) works and how LLMs are trained might be useful so that you can understand their power as well as their limitations. For simplicity, we use Gen AI and LLM interchangeably in what follows.
First, Gen AI, and the LLMs behind it, rely heavily on the concept of machine learning (ML). Machine learning can include a number of subdisciplines, such as classification, predictive analytics, supervised and unsupervised learning, and deep learning. Generative AI takes advantage of these capabilities and may also incorporate other AI technologies, including natural language processing, vision, speech-to-text, text-to-speech, and so forth.
The concept of neural networks is very important in training an LLM and using it for Gen AI. Think of a neural network as a computational model that tries to mimic the brain. The brain is a highly interconnected network of neurons. When presented with a stimulus or input, these neurons fire and stimulate other neurons until they elicit some kind of response.
Computational neural networks mimic the brain in that they accept inputs which then stimulate other layers of computational neurons via mathematical functions until an output is computed.
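To make this concrete, here is a minimal sketch of such a network in Python using NumPy. The layer sizes and random weights are purely illustrative stand-ins and bear no relation to any production LLM:

```python
import numpy as np

def relu(x):
    # Simple nonlinearity: a "neuron" contributes only when its input is positive.
    return np.maximum(0, x)

# Toy network: 3 inputs -> 4 hidden neurons -> 2 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # input -> hidden weights
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)   # hidden -> output weights

x = np.array([0.5, -1.0, 2.0])                  # an input "stimulus"
hidden = relu(x @ W1 + b1)                      # hidden-layer activations
output = hidden @ W2 + b2                       # the network's "response"
print(output)
```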
Generative AI models are trained to recognize patterns in data and then use those patterns to generate new, but similar data patterns. For example, if a generative AI model is trained on English words and sentences, it learns the statistical likelihood of one word following another and then uses those probabilities to generate a coherent sequence of words which we call sentences.
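A toy next-word model makes this idea concrete. The sketch below simply counts how often one word follows another in a tiny made-up corpus and then samples words from those counts; real LLMs learn far richer statistics, but the underlying principle is the same:

```python
from collections import Counter, defaultdict
import random

# Tiny corpus standing in for "English words and sentences."
corpus = "the meeting starts at noon . the meeting ends at one .".split()

# Count how often each word follows another (bigram statistics).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def next_word(word):
    # Sample the next word in proportion to how often it followed `word`.
    words, counts = zip(*following[word].items())
    return random.choices(words, weights=counts)[0]

# Generate a short "sentence" by repeatedly sampling the next word.
word, sentence = "the", ["the"]
for _ in range(5):
    word = next_word(word)
    sentence.append(word)
print(" ".join(sentence))
```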
Generative AI is used in a number of different disciplines including genetics, developing new pharmaceuticals, language translation, and image generation. For our purposes in this article, we will discuss Gen AI in terms of language – which includes creating sentences, revising text, summarizing text, creating text-based responses from inputs, etc.
Computers Do Not Understand Words
Although we have been discussing words and language, computers understand numbers, not words. When training a generative AI model, words are encoded as numbers. There are three key aspects of words and word phrases which are used to encode words into numbers:
- The semantics of a particular word – what the word means within the context of a sentence or phrase where the word is found.
- The position of the word in the sentence or phrase. This includes both the relative word position in relation to other words and the absolute position of where the word appears in a sentence.
- The attention a word may have in a particular sentence or phrase. Attention essentially “scores” words based on how relevant they are to others in a particular sentence or phrase.
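The sketch below illustrates these three ideas in miniature, not how any specific LLM implements them: each word gets a vector (semantics), a position-dependent vector is added (position), and a scaled dot-product score estimates how relevant each word is to the others (attention). All of the vectors here are random stand-ins for values a real model would learn:

```python
import numpy as np

rng = np.random.default_rng(0)
sentence = ["the", "agenda", "drives", "the", "meeting"]

# Semantics: each vocabulary word gets a vector (random here for illustration).
vocab = {w: rng.normal(size=8) for w in set(sentence)}
word_vectors = np.stack([vocab[w] for w in sentence])

# Position: add a vector that depends on where the word sits in the sentence,
# so the two occurrences of "the" are no longer identical.
positions = np.stack([rng.normal(size=8) for _ in sentence])
encoded = word_vectors + positions

# Attention: score how relevant each word is to every other word
# (a scaled dot product followed by a softmax).
scores = encoded @ encoded.T / np.sqrt(8)
attention = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
print(np.round(attention, 2))   # row i: how much word i "attends" to each word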
Tokenization
When encoding words into numbers, LLMs use a concept called tokenization both to create a vocabulary and to reduce the number of individual words used when training the neural networks underlying the LLM. In the tokenization process, frequently used words should not be split into smaller subwords; however, rare words should be decomposed into meaningful subwords. In addition, prefixes, suffixes, and commonly used portions of words may be tokenized separately, which reduces the number of distinct forms for a particular word.
Tokenization effectively reduces the total number of words required to train a model, and it makes training the model computationally more efficient.
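As a simple illustration of the idea (real tokenizers, such as byte-pair encoding, learn their vocabularies from data, and each vendor's tokenizer differs), the sketch below splits words against a small, hand-picked vocabulary of whole words and word fragments:

```python
# Hypothetical mini-vocabulary: common words stay whole, rare words are
# covered by prefixes, suffixes, and frequently used word fragments.
vocab = ["meeting", "summar", "ize", "ization", "un", "schedul", "ed", "e", "s", "the"]

def tokenize(word):
    # Greedy longest-match: repeatedly peel off the longest vocabulary piece.
    pieces = []
    while word:
        match = max((v for v in vocab if word.startswith(v)), key=len, default=word[0])
        pieces.append(match)
        word = word[len(match):]
    return pieces

for w in ["meeting", "summarize", "summarization", "unscheduled"]:
    print(w, "->", tokenize(w))
```

Note how "summarize" and "summarization" share the fragment "summar," so the model does not need to treat them as two unrelated words.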
Training an LLM
The tokens that make up the words, phrases, sentences, and paragraphs are encoded based on semantics, position, and attention into a complex set of numerical matrices that are used as inputs to a neural network. When training the network, parameters between the input nodes, the hidden layers within the neural network, and the output nodes are then computed. While training the model, this computation from inputs to outputs is called forward propagation.
Once the output is computed, it is compared with a desired output or target output. The “error” between the computed output and the desired output is computed, and the parameters in the neural network are adjusted using a process called back propagation. Once the neural network parameters have been adjusted, the input is again pushed through the neural network, the error measured, and the parameters are again adjusted.
The process of forward and backward propagation continues until the error between the computed output and the desired output is “small,” meaning that the neural network produces output reasonably close to what it was trained to produce.
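The loop below sketches this cycle for a deliberately tiny, single-layer model: compute an output (forward propagation), measure the error against the target, and adjust the parameters (back propagation), repeating until the error shrinks. Real LLMs follow the same loop with billions of parameters and far more sophisticated machinery; the data and learning rate here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))                     # toy inputs
y = (X.sum(axis=1, keepdims=True) > 0) * 1.0     # desired ("target") outputs

W, b = rng.normal(size=(3, 1)) * 0.1, np.zeros((1,))
lr = 0.5                                         # learning rate

for step in range(200):
    # Forward propagation: compute the network's output from the inputs.
    pred = 1 / (1 + np.exp(-(X @ W + b)))

    # Measure the "error" between computed and desired output.
    error = pred - y
    loss = (error ** 2).mean()

    # Back propagation: adjust the parameters to reduce that error.
    grad = pred * (1 - pred) * error
    W -= lr * (X.T @ grad) / len(X)
    b -= lr * grad.mean(axis=0)

print("final loss:", round(float(loss), 4))
```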
Neural networks can contain a varying number of parameters. By way of illustration, the number of adjustable parameters in ChatGPT 3.5 is 175 billion, while ChatGPT 4 is reported to have 1.7 trillion parameters, roughly a tenfold increase in the number of parameters computed in the model.
Once this error is small enough and the model performs reasonably well, these computed parameters can be used to do interesting things like summarizing a meeting, determining what action items were in a meeting, generating text based on an outline, and so forth. Right now, text-based input is generally used with Gen AI LLMs; consequently, the accuracy of that text, whether it comes from speech-to-text, typing, or some other input mechanism, is very important.
Using the Trained Model
It is important to point out that the LLMs in Microsoft Copilot, Zoom AI Companion, Cisco AI Assistant for Webex, and Google Gemini for Workspace are not trained using your company’s data. When used within these UCaaS platforms, these Gen AI solutions all use pretrained models that were trained using data available to the companies who built the LLMs; only the parameters from the trained LLMs are used within these respective UCaaS solutions.
Even though an LLM is trained, it may not be trained to your specific domain or vocabulary. There are processes that can be used to generate more relevant responses, but which do not modify the trained parameters. For example, Microsoft Copilot uses a process called “grounding” which examines your content (files, emails, chat messages, contacts, calendars, meetings, etc.). Copilot modifies the prompts that you might give to the LLM using this information, which is found in the Microsoft Graph, to provide better input phrasing to and better output information from the LLM.
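Conceptually, grounding looks something like the sketch below. The function names and retrieved snippets are hypothetical stand-ins, not the actual Copilot or Microsoft Graph APIs; the point is that the prompt is enriched with your content while the model's trained parameters stay untouched:

```python
# Conceptual sketch only: the function and data names below are hypothetical,
# not the actual Microsoft Graph or Copilot interfaces.
def retrieve_context(user, prompt):
    # Stand-in for a search over the user's files, mail, chats, and calendar.
    return [
        "Calendar: Project Falcon status meeting, Tuesday 10:00",
        "Chat: 'Dana owns the budget revision action item'",
    ]

def ground_prompt(user, prompt):
    # The model's trained parameters are untouched; only the prompt is enriched.
    context = retrieve_context(user, prompt)
    return (
        "Use only the context below when it is relevant.\n"
        "Context:\n- " + "\n- ".join(context) + "\n\n"
        "Request: " + prompt
    )

print(ground_prompt("avery@example.com", "Summarize my open action items."))
```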
Why Can LLMs Give Different Answers to the Same Prompt?
Many people wonder why an LLM might give a different response when given the same input prompt. LLMs have an adjustable parameter called “temperature” which is used to determine how much randomness can exist in the output. Just like people may not give the exact same response when asked the exact same question, LLMs are programmed to have some randomness or variability in the responses they give, all of which are based upon probabilities between words in the sentences and phrases they have been trained on.
By adjusting the “temperature” parameter in the LLM, responses can be made more strict and structured or more random and variable. Users cannot adjust the temperature parameter within the interfaces to the LLMs in the aforementioned UCaaS solutions.
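The sketch below shows the mechanics in miniature: the model's raw next-word scores are divided by the temperature before being converted into probabilities, so a low temperature almost always picks the top-scoring word while a high temperature spreads the choices out. The words and scores are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
words = ["meeting", "call", "session", "party"]
logits = np.array([3.0, 2.5, 1.0, -1.0])   # the model's raw next-word scores

def sample(temperature):
    # Lower temperature sharpens the distribution (more deterministic output);
    # higher temperature flattens it (more varied output).
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return rng.choice(words, p=probs)

for t in (0.2, 1.0, 2.0):
    print(t, [sample(t) for _ in range(5)])
```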
LLM Hallucination
Because LLMs are simply probability models represented by numbers, LLMs and Gen AI do not understand words and phrases the way humans do. Consequently, when a prompt touches on words and phrases that appear together, or in regularly recurring sequences, in the data the model was trained on, the model may exhibit something called hallucination. Hallucination is a phenomenon in which the large language model generates a response that is factually incorrect or unrelated to a user’s prompt.
LLMs can and do hallucinate, and they may make mistakes. Therefore, it is very important to review the output of an LLM because you and your organization are ultimately responsible for the content.
Want to know more?
Part Two of this article can be found here; Part Three can be found here.