If you talk to a man in a language he understands, that goes to his head. If you talk to him in his language, that goes to his heart.
-- Nelson Mandela
I would venture to guess that most people had their first encounter with natural language processing (NLP) when Apple added Siri to the iPhone. Starting with the iPhone 4S, you could ask "her" simple questions such as "Who was the 12th president of the United States?" (Zachary Taylor) and "Will you marry me?" (We hardly know one another). Personally, I use Siri on a near-daily basis to get me where I need to go and to find the best Indian, Thai, or Mediterranean restaurant once I arrive.
Of course, NLP isn't limited to iPhones today. You can now talk to your Android devices, and contact centers are increasingly adding automated "Tell me what you are calling about" functionality. It's not out of the realm of possibility that typing will one day become as old-fashioned as rotary telephones and stick shifts.
The Basics of Natural Language Processing
To understand NLP, it's important to know what's going on under the covers. While a detailed look at NLP is beyond the scope of this article, a few simple concepts should give most people enough knowledge to be dangerous.
First, there is the intent. As the name implies, the intent is the intention conveyed by the user. For instance, "weather" is the intent of the question "Will it rain today?"
You can classify intents into two groups. Casual intents are like small talk. Greetings such as "hello" and "goodbye" are casual intents. If I say "Hi" to a text bot, an appropriate response might be "What can I do for you today?" The same can be said for affirmative and negative responses -- "Yes," "Thank you," and "Not today" fall into the casual intent category.
The second group, business intents, corresponds directly to the focus of the statement or conversation. "When will my package arrive?" would direct the NLP system to return a date or send a tracking number.
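The two groups tend to be handled differently by the bot that sits on top of the NLP engine. Casual intents can usually be answered with canned text, while business intents need real code behind them. Here is a minimal sketch of that split. The intent names, the replies, and the `delivery_status` handler are hypothetical placeholders for illustration, not part of any particular NLP product.

```python
from typing import Callable, Dict

# Hypothetical intent names; use whatever intents your own NLP application defines.
CASUAL_REPLIES: Dict[str, str] = {
    "greeting": "What can I do for you today?",
    "thanks": "You're welcome.",
    "goodbye": "Goodbye!",
}


def delivery_status() -> str:
    # Placeholder business handler: a real one would query a shipping system
    # for a delivery date or tracking number.
    return "Let me look up your tracking number."


BUSINESS_HANDLERS: Dict[str, Callable[[], str]] = {
    "delivery_status": delivery_status,
}


def respond(intent: str) -> str:
    """Answer casual intents with canned text and route business intents to code."""
    if intent in CASUAL_REPLIES:
        return CASUAL_REPLIES[intent]
    handler = BUSINESS_HANDLERS.get(intent)
    if handler is None:
        return "Sorry, I didn't catch that."
    return handler()
```

For example, `respond("greeting")` returns "What can I do for you today?", while `respond("delivery_status")` hands off to the business handler.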
The next big concept of NLP is entities. An entity is the metadata of an intent.
Like intents, entities come in multiple flavors. You can think of a nominal entity as a noun. For example, car is a nominal entity. So are city, book, movie, and person.
A named entity is more like a proper noun. Using my nominal entities above as examples, their named entity counterparts might be Chicago, "The Great Gatsby," "Love Actually," and, of course, "Andrew Prokop."
Returning to my "Will it rain today?" question: if weather is the intent, then rain, snow, and hail are valid entities.
Composite entities consist of a number of component entities. Size, color, brand, and category could be the component entities for a product details composite entity.
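One way to keep intents and the three flavors of entities straight in code is a small data model. This is a hypothetical sketch, not wit.ai's schema. `Entity`, `ParsedUtterance`, and their field names are my own illustration, and the "Acme" backpack is made-up sample data. A nominal entity has only a `kind`, a named entity also carries a `value`, and a composite entity holds component entities.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entity:
    """One piece of metadata attached to an intent (hypothetical model)."""
    kind: str                    # nominal type, e.g. "city" or "movie"
    value: Optional[str] = None  # named value, e.g. "Chicago"; None if only nominal
    components: Dict[str, "Entity"] = field(default_factory=dict)  # composites only

    @property
    def is_composite(self) -> bool:
        return bool(self.components)


@dataclass
class ParsedUtterance:
    """An utterance broken into its intent and the entities that qualify it."""
    text: str
    intent: str
    entities: List[Entity] = field(default_factory=list)


# A composite "product details" entity built from four component entities.
product = Entity(kind="product_details", components={
    "size": Entity("size", "large"),
    "color": Entity("color", "blue"),
    "brand": Entity("brand", "Acme"),
    "category": Entity("category", "backpack"),
})

# The weather question: intent "weather", with "rain" as its entity.
question = ParsedUtterance("Will it rain today?", "weather",
                           [Entity("weather_condition", "rain")])
```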
Teach Me
Once you've designed your intents and entities, the next step is to train your system. This requires you to enter a series of questions that the NLP system might encounter. For a weather bot, you might enter the following:
"Will it be sunny today?"
"What are the chances of it raining today?"
"Is there snow in the forecast?"
"Do I need to bring an umbrella to work?"
These questions train the system to recognize the many ways that a user might ask for the same information. A simple rule for training is that you can never provide the system with too much data. Additionally, training is not a one-and-done operation. A well-behaved NLP system must be trained and retrained throughout its entire lifecycle.
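To see why more example phrasings help, consider a deliberately simple intent matcher trained on the weather questions above. This is a toy nearest-example classifier based on word overlap. It is not how wit.ai or any production NLP engine works internally. The `ToyIntentClassifier` name and the delivery examples are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokens(text: str) -> Counter:
    """Lower-case word counts, ignoring punctuation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


class ToyIntentClassifier:
    """Nearest-example intent matcher: a toy, not how wit.ai works internally."""

    def __init__(self) -> None:
        self.examples: List[Tuple[Counter, str]] = []  # (word counts, intent)

    def train(self, intent: str, utterances: List[str]) -> None:
        # Training is additive: call train() again whenever new phrasings appear.
        for utterance in utterances:
            self.examples.append((tokens(utterance), intent))

    def classify(self, text: str) -> str:
        query = tokens(text)
        best_intent, best_overlap = "unknown", 0
        for example, intent in self.examples:
            overlap = sum((query & example).values())  # number of shared words
            if overlap > best_overlap:
                best_intent, best_overlap = intent, overlap
        return best_intent


clf = ToyIntentClassifier()
clf.train("weather", [
    "Will it be sunny today?",
    "What are the chances of it raining today?",
    "Is there snow in the forecast?",
    "Do I need to bring an umbrella to work?",
])
clf.train("delivery", ["When will my package arrive?", "Where is my order?"])
```

Here `clf.classify("Should I bring an umbrella?")` matches the umbrella example and returns "weather", while a phrasing that shares no words with any example falls back to "unknown". Every new phrasing added through `train()` widens what the matcher recognizes, which is the reason to keep retraining.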
NLP for the Masses
I learn best by doing, so I was overjoyed when I was told about Facebook's wit.ai NLP engine. Wit.ai is a free cloud service that provides developers with an easy-to-use console to create intents and entities, and then train them into an "application." I put application in quotes because a wit.ai application isn't an application in the traditional sense. It's more like an intelligent database of information that a "real" application can use to turn spoken and written language into its actionable components.
Because wit.ai is a cloud service, you interact with it through Web services. While wit.ai exposes APIs to manage intents and entities, more important to an end-user application are the APIs that pass human language to the service for processing.
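Here is a sketch of what one of those Web service calls might look like from Python, using only the standard library. It assumes wit.ai's HTTP `/message` endpoint, which takes the text in a `q` query parameter and an API version date in `v`, and is authenticated with the app's server access token as a Bearer header. The version date below is a placeholder, and `build_message_request` and `parse_text` are hypothetical names. Check the current wit.ai HTTP API documentation before relying on any of it.

```python
import json
import urllib.parse
import urllib.request

WIT_MESSAGE_URL = "https://api.wit.ai/message"


def build_message_request(text: str, token: str,
                          version: str = "20240101") -> urllib.request.Request:
    """Build a GET request asking wit.ai to parse one text utterance.

    The version string is a placeholder date (YYYYMMDD); pin whichever API
    version your application was built against.
    """
    query = urllib.parse.urlencode({"v": version, "q": text})
    return urllib.request.Request(
        f"{WIT_MESSAGE_URL}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


def parse_text(text: str, token: str) -> dict:
    """Send the request and decode the JSON reply (needs network access and a real token)."""
    with urllib.request.urlopen(build_message_request(text, token)) as resp:
        return json.load(resp)
```

Separating request construction from sending keeps the part that builds the URL and headers testable without a network connection or a live token.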
For educational purposes, I created a wit.ai logistics "application" and then wrote Python code to ask wit.ai questions about product delivery. This code can be easily placed inside a text bot to interact with a human being who is interested in products that he or she has purchased.
Getting Geeky
Do you remember diagramming sentences in elementary school? As you might recall, you visualized the parts of a sentence by separating the subject from verbs, gerunds, participles, predicate nominatives, etc. Once a sentence is diagrammed, you can literally see its structure. This allows you to be certain that your sentence is saying exactly what you want it to say.
Image from Wikipedia
Wit.ai isn't all that different from sentence diagramming. In my application, I send it sentences, and it returns a JSON representation of each sentence parsed into its actionable components.
For instance, if I send wit.ai "Send my package," it returns:*
* Note: I am leaving out most of the data that wit.ai returns to keep this simple to read.
Wit.ai knows that this sentence has something to do with delivery, but not much else. Of course, I didn't give it much information to work with. So, let's upgrade the sentence to "Send my package to 708 Lincoln Avenue, Saint Paul, Minnesota." With this new data, wit.ai returns:
Notice how wit.ai has pulled out the address portion.
Let's take this even deeper with "Send my package to 708 Lincoln Avenue, Saint Paul, Minnesota tomorrow morning."
I still receive:
But I also receive:
Not only do I still receive the location, but I now also have a date and time for that shipment. Also, notice that wit.ai reports high confidence that the data it is providing is correct. This parsed data can now be fed directly into an order entry system without any human interaction, reducing the risk of human error.
Wit.ai supports both text and speech, and my Python application supports both. I've found that wit.ai parses text with a high degree of accuracy, but speech is hit or miss. While results were mostly decent, there were quite a few addresses it never understood. For instance, Arapaho Road always turned into something like "Are wrapping hope roar road."
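Before feeding parsed data straight into an order entry system, it is sensible to check those confidence values and send anything shaky to a person. Here is a minimal sketch of such a gate. It assumes a wit.ai-style response with an "intents" list of name/confidence pairs and an "entities" dictionary that maps each entity name to a list of matches carrying a "body" (the matched text) and a "confidence". That shape is an assumption, because wit.ai's response format has changed across API versions, so verify it against the version you call. The 0.8 threshold and the `confident_parse` name are also hypothetical.

```python
from typing import Any, Dict, List, Optional

CONFIDENCE_THRESHOLD = 0.8  # assumption: tune this for your own application


def confident_parse(response: Dict[str, Any],
                    threshold: float = CONFIDENCE_THRESHOLD) -> Optional[Dict[str, Any]]:
    """Return the top intent and entity text, or None if anything is below threshold.

    Assumes an "intents" list and an "entities" dict of match lists; a None
    result means the request should be routed to a human instead.
    """
    intents = response.get("intents", [])
    if not intents:
        return None
    top = max(intents, key=lambda intent: intent["confidence"])
    if top["confidence"] < threshold:
        return None
    entities: Dict[str, List[str]] = {}
    for name, matches in response.get("entities", {}).items():
        for match in matches:
            if match["confidence"] < threshold:
                return None  # one uncertain field is enough to need a person
            entities.setdefault(name, []).append(match["body"])
    return {"intent": top["name"], "entities": entities}
```

Rejecting the whole request when any single field is uncertain is a deliberately conservative choice. A misread address is worse than a short delay while a human checks it.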
Mischief Managed
It's important to realize that the goal of NLP is not to take action on the data it parses. That job is left to the next piece (or pieces) of software down the line. In the case of my application, it asks wit.ai to parse the data, but it relies on other services (e.g., OpenWeatherMap) to answer the user's question. However, knowing that the user wants to know about "rain" in "Saint Paul, Minnesota" "next Thursday" makes an application's job simple.
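To make that division of labor concrete, here is a sketch of the downstream step for the weather example. Once parsing has produced a location, separate code builds the query for the weather service. The sketch assumes OpenWeatherMap's 5-day forecast endpoint with its `q` (city, state, country) and `appid` parameters. It also assumes the location has already been normalized to that form. Picking out "next Thursday" from the returned forecast is left out. `weather_request_url` is a hypothetical name, so check OpenWeatherMap's documentation for current endpoints and parameters.

```python
import urllib.parse
from typing import Dict, List

OWM_FORECAST_URL = "https://api.openweathermap.org/data/2.5/forecast"


def weather_request_url(entities: Dict[str, List[str]], api_key: str) -> str:
    """Turn an already-parsed location into a forecast query for a downstream service.

    The NLP step only identified what the user wants; this function is the
    next piece of software down the line that acts on it.
    """
    location = entities["location"][0]  # assumed normalized, e.g. "Saint Paul,MN,US"
    query = urllib.parse.urlencode({"q": location, "appid": api_key})
    return f"{OWM_FORECAST_URL}?{query}"
```

For example, `weather_request_url({"location": ["Saint Paul,MN,US"]}, "KEY")` produces a forecast URL with the city URL-encoded into the `q` parameter. The code never needs to know how the user originally phrased the question.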
So, where is this technology useful? A better question might be, "Where isn't it useful?" From help desks, to contact centers, to text bots, to whatever, processing language into actionable components is the future of communication. In fact, I will venture to say that within the next several years, it will be nearly impossible to know "who" you are talking to when you make that customer service call or chat. For better or worse, machines will become indistinguishable from humans.
Follow Andrew Prokop on Twitter and LinkedIn!
@ajprokop
Andrew Prokop on LinkedIn