No Jitter is part of the Informa Tech Division of Informa PLC

Seasalt.ai Aims to Address Cross-Border Markets With Voice Recognition


Image: a soundwave (Yevhenii Yarmolenko)
One of the realities of a global economy is that the communications tools designed for one geographic market don't always meet the needs of another. Seasalt.ai, which provides a customizable speech recognition engine to enterprise contact center customers, has developed speech and language technologies for text messages and voice calls in enterprise contact centers, and has begun focusing on the southeast Asian call center market.
The company's six applications are bundled into its "SeaSuite" and include a framework for automated chatbot responses, a speech-to-text (STT) transcription feature that can be customized to understand different languages, and an automated response tool for contact centers. Seasalt.ai recently announced an expanded partnership with Twilio to offer SeaX, a contact center bundle aimed at multinational businesses in the southeast Asian market.
No Jitter recently asked Seasalt.ai co-founder Xuchen Yao why the company chose to focus on the southeast Asian market, and what he sees as the exciting milestones and obstacles for speech recognition technology in the contact center.
Responses have been edited for clarity.

Seasalt.ai's calling card is a customizable speech recognition engine. In what ways can this engine be customized by your customers?
With so many digital channels and with how different customer support looks today, we saw customization as the key to our offering. Not only is our speech-to-text engine fully customizable, but so is our text-to-speech engine. Having this ability is extremely helpful when a contact center needs to pronounce an agent's name, a product name, or a customer's name correctly. We are able to reduce the last 5% of errors compared with off-the-shelf AI.
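Seasalt.ai doesn't detail how its customization works, but one common approach to that "last 5%" (proper names the base model mis-hears) is post-transcription correction against a customer-supplied lexicon. Here is a minimal sketch of that general technique; the function name, vocabulary, and similarity cutoff are all invented for illustration:

```python
import difflib

def correct_transcript(transcript: str, custom_vocab: list[str], cutoff: float = 0.8) -> str:
    """Replace words the recognizer likely mis-heard with close matches
    from a customer-supplied vocabulary (agent, product, customer names)."""
    corrected = []
    for word in transcript.split():
        # Strip punctuation for matching, but keep it in the output.
        core = word.strip(".,!?")
        matches = difflib.get_close_matches(core, custom_vocab, n=1, cutoff=cutoff)
        if matches and core not in custom_vocab:
            corrected.append(word.replace(core, matches[0]))
        else:
            corrected.append(word)
    return " ".join(corrected)

# A contact center registers its product and agent names:
vocab = ["SeaSuite", "SeaX", "Nguyen"]
print(correct_transcript("please ask agent Nugyen about SeaSuite pricing", vocab))
# → "please ask agent Nguyen about SeaSuite pricing"
```

Production engines typically bias recognition toward the custom vocabulary inside the decoder rather than patching the output text, but the post-hoc version shows the idea with no model in the loop.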
SeaX is the contact center tool that is part of your overall SeaSuite. It features a tool that call center agents can use to transcribe and categorize incoming calls from customers. What are your plans for moving to automatic transcription and categorization using speech recognition?
We built automatic transcription and categorization into SeaX, and it is available now as part of our meeting intelligence system, SeaMeet. It allows for extracting topics and follow-up actions from a call, detecting both customer and agent emotions and sentiments from a conversation, and coaching agents to be better objection handlers or service representatives.
All of these advanced features are enabled by our AI engines: SeaVoice (for speech recognition and synthesis) and SeaWord (for natural language processing).
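SeaMeet's internals aren't public, but the features Yao lists (topic extraction, follow-up actions, sentiment) can be illustrated with a toy keyword-driven pass over a transcript. Everything below is invented for illustration; a production system like the one described would use trained models, not word lists:

```python
import re

# Invented sentiment word lists and follow-up cue phrases (illustration only).
NEGATIVE = {"frustrated", "angry", "cancel", "terrible"}
POSITIVE = {"great", "thanks", "perfect", "resolved"}
FOLLOW_UP = re.compile(r"\b(?:i'll|i will|we will|let me)\b(.+)", re.IGNORECASE)

def analyze(utterances, topic_keywords):
    """utterances: list of (speaker, text) pairs in call order.
    topic_keywords: {topic_name: set of trigger words}.
    Returns (topics discussed, agent follow-up actions, customer sentiment)."""
    topics, follow_ups, score = set(), [], 0
    for speaker, text in utterances:
        words = set(re.findall(r"[a-z']+", text.lower()))
        for topic, triggers in topic_keywords.items():
            if words & triggers:
                topics.add(topic)
        match = FOLLOW_UP.search(text)
        if speaker == "agent" and match:
            follow_ups.append(match.group(1).strip())
        if speaker == "customer":
            score += len(words & POSITIVE) - len(words & NEGATIVE)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return topics, follow_ups, sentiment

call = [
    ("customer", "I'm frustrated, my invoice is wrong again."),
    ("agent", "Sorry about that. I'll send a corrected invoice today."),
    ("customer", "Thanks, that would be great."),
]
topics, actions, mood = analyze(call, {"billing": {"invoice", "charge", "refund"}})
# topics == {"billing"}; one follow-up action; mood == "positive"
```

The structure mirrors what Yao describes: the same transcript feeds topic detection, action-item extraction, and per-speaker sentiment in one pass.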
Because Seasalt.ai's tech stack is built in-house, we can customize deeply and ensure integration is seamless. Our goal is to deliver a very smooth experience to both customers and agents. Apple's strategy of building everything in-house (hardware, software, CPU, GPU) very much resonates with us.
You're aiming at the southeast Asian market, calling it "underserved." How do you see Seasalt gaining traction and owning the market there?
For perspective, southeast Asia as a group has the world's fourth-largest population, after India, China, and the EU. Yet businesses there struggle with the availability of speech and language technologies that would let them easily scale and engage not only within their own country or region but globally as well. The consequence shows up in the fragmentation we see in the market.
If you look at multi-channel messaging, each country has its own popular chat app as the dominant in-country platform: Zalo, Line, KakaoTalk, Telegram, or WhatsApp, for example. A second major factor in fragmentation is the sheer number of languages across the region. Southeast Asia has 12 official languages, most of which are low-resource.
(Editor's note: low-resource languages are languages that are native to a sizeable number of people, but which may not have substantial amounts of data sets for training an AI model.)
From a tech/product perspective, SEA (and the Asia Pacific region as a whole) poses the biggest technical challenges to contact centers, with multiple languages and countless dialects. At Seasalt.ai, we created SeaSuite to tackle that issue. Our thought was that if we could make it work there, the rest of the world would be easier, especially regions that face similar challenges, like Eastern Europe and Africa.
As we see growth in key markets like Taiwan, the Philippines, and Vietnam, we are excited to see the region collaborating and opening up again. Our team speaks seven different languages, and our meetings run in everything from native English to local Taiwanese languages. We are uniquely suited to experience these challenges firsthand and to train our technology on them, ensuring we execute and deliver effectively. Many of our own internal dilemmas as a company are exactly the challenges our customers are experiencing. We know we can solve them.
We certainly didn't focus on southeast Asia first because it was easy; we focused on it because it's very hard, and we genuinely believe that has made us that much better. As a result, Seasalt.ai is positioned to bring our product to a similarly challenging market in Europe or a super-competitive market like the U.S.
The company founders have a background in voice and speech recognition software, with previous projects bringing wake words to products and building out chatbot engines. What do you see as the most exciting milestones in voice and speech recognition technology? What are the biggest obstacles?
In the past few years, the most exciting milestones have been end-to-end deep learning for speech recognition models and its democratization thanks to open source and more affordable GPUs. With the help of the Kaldi speech recognition toolkit and recent advances in technology and datasets, open-source speech recognition models are now able to compete with the commercial models from Google, Amazon, and Microsoft that we see in the market.
The biggest obstacles in speech are:
  • Understanding the semantics of speech. Getting voice transcribed correctly is table stakes at this point, but understanding context and hidden meaning is far more challenging without world knowledge.
  • Building speech recognition at an extremely low cost. Using Kaldi and recent developments in speech recognition technology, it's not very difficult to build "a" speech recognition engine. The challenge is to build speech recognition for many languages at extremely low cost, especially for a small team. The trick is getting annotated training data for all of these languages. How can we train the models cost-efficiently without an enormous number of GPUs? How can we reuse previous training runs to save on training costs?
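Yao's last point, reusing previous training runs, hints at transfer learning: keep a model pretrained on a high-resource language frozen and fit only a small output layer on the target language's scarce data. A toy numpy sketch of that idea follows; the "encoder" here is just a frozen random projection standing in for a real pretrained acoustic model, and the data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a "pretrained" acoustic encoder: a fixed projection learned on
# a high-resource language. In practice this would be a deep network's weights.
ENCODER = rng.normal(size=(20, 8))  # 20-dim features -> 8-dim embeddings

def encode(x):
    return np.tanh(x @ ENCODER)  # frozen: never updated below

def train_head(X, y, classes, epochs=200, lr=0.5):
    """Fit only a small softmax output layer on top of the frozen encoder,
    so a new (low-resource) language needs far less data and compute."""
    H = encode(X)  # (n, 8) embeddings, computed once and reused
    W = np.zeros((H.shape[1], classes))
    for _ in range(epochs):
        logits = H @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        grad = H.T @ (p - np.eye(classes)[y]) / len(y)
        W -= lr * grad
    return W

# Tiny synthetic "low-resource" dataset: 2 sound classes, only 30 examples.
centers = rng.normal(size=(2, 20))
y = rng.integers(0, 2, size=30)
X = centers[y] + 0.1 * rng.normal(size=(30, 20))
W = train_head(X, y, classes=2)
acc = ((encode(X) @ W).argmax(axis=1) == y).mean()
```

Training only the head touches an 8-by-2 weight matrix instead of the whole encoder, which is the cost saving Yao is describing: the expensive part of training is done once on a high-resource language and amortized across many low-resource ones.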
We are adding what we learn, and what we continue to research, into Seasalt.ai's offerings as it becomes ready for prime time; we handle low-resource languages well compared with the other vendors we see in deals. We are active in both open-source speech and research: we continue to work with Johns Hopkins projects and support those efforts to this day by mentoring students, co-authoring papers, and participating in workshops together.