During the opening keynote at Cisco WebexOne 2024 on Wednesday, October 23, Cisco Executive Vice President and Chief Product Officer Jeetu Patel spoke with Alex Wang, CEO of Scale AI, and Fareed Zakaria, host of CNN’s Fareed Zakaria GPS (The Global Public Square) and a Washington Post columnist. In a separate conversation, Jay Patel, SVP & GM of Webex Customer Experience Solutions, offered a few potentially contrarian thoughts on the usefulness of proprietary data.
Data, Data and More Data
Wang described his company’s role in an AI ecosystem that consists of three major components: compute, which companies such as Nvidia supply; algorithms, which OpenAI, Google, Anthropic and others create; and data, which is where Scale operates. “Our role in the ecosystem is to produce all the frontier data – the very advanced data that goes into the training of these algorithms,” Wang said.
Scale was founded in 2016 to address what Wang and his cofounder Lucy Guo saw as perhaps the key limiter on AI systems: data. “I think everybody today recognizes that data is gold, and data will differentiate and power the next generation of powerful AI capabilities,” Wang said.
This “data will differentiate” point is critical for enterprises exploring AI, whether through current deployments or potential future implementations for their employees and/or contact centers. Wang said the industry has essentially exhausted the entire Internet in training the current crop of models – which means AI models need new sources of data.
“There are basically two sources of data that will play a role. One is new data that we generate. This will be very advanced data such as reasoning data, agentic data, multimodal data, or robotics data, that you cannot find on the Internet,” Wang said. “And then there's private, proprietary data. GPT-4 was trained on about one petabyte of data, and JPMorgan Chase, just as one example, has 150 petabytes of data. So the amount of enterprise data that's out there is just enormous.”
And while there are certainly millions of companies, each with many petabytes of data of their own, having that data doesn’t mean it is good or potentially useful.
“There is a working assumption that everybody has proprietary data that is useful. I'm not sure that's true,” said Jay Patel, SVP & GM, Webex Customer Experience Solutions. “It is true that you have proprietary data – because it is yours – but a lot of businesses are, at times, quite similar in what they do. A utility probably has the same data set, broadly speaking, as another utility – consumption records – but perhaps that’s just a result of the nature of that business.”
The same may hold true in other sectors, where there are many commonalities in what people want to get done and the data they need to do it. So while data is, of course, important to AI deployments, it may make more sense to move ahead with a deployment than to spend the time, money and effort getting historical data into perfect order.
“I just wonder whether the massive investment needed to harmonize data, etc., may not be worth it in the end. Maybe you can tackle 80% of the problem with 20% of the data, and that gets you most of the way there,” Patel said. “So I do think it's critical, but I think there needs to be a bit more critical thinking about how you're going to use it.”
Productivity Gains from AI
Zakaria spoke about the potential productivity gains from AI by citing Robert Solow’s productivity paradox: “You can see the computer age everywhere but in the productivity statistics.”
As Zakaria said, in the 1990s computers were increasingly everywhere but there was no corresponding uptick in productivity. “Productivity is the elixir of economic growth, and that’s comprised of just two things: how many workers you have and how productive they are. Multiply those two, and that's what gives you economic growth,” Zakaria said.
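Zakaria’s formulation is simple multiplication, and can be sketched in a few lines (the workforce and productivity figures below are purely illustrative, not from the keynote):

```python
# Economic output = number of workers x output per worker (productivity).
# The figures here are hypothetical, chosen only to illustrate the arithmetic.
workers = 160_000_000        # hypothetical workforce size
productivity = 125_000       # hypothetical output per worker, in dollars

output = workers * productivity

# If productivity rises 2% while the workforce stays flat,
# total output grows by that same 2%.
output_next = workers * (productivity * 1.02)
growth = output_next / output - 1
print(f"Growth: {growth:.1%}")  # → Growth: 2.0%
```

This is why productivity gains matter so much in economies with flat or shrinking workforces: with the first factor fixed, growth has to come entirely from the second.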
The productivity gains of the 1990s – an average of 9.6 percent per year from 1990 to 1995 – were not caused by the Internet, Zakaria said, because the Internet was just beginning. “It was the old mainframes working their way into the back-office functions, which suddenly made finance and accounting and logistics all much, much faster,” he said.
The recent 2.7 percent productivity growth in 2023 is, according to Zakaria, likely attributable to the software revolution – what Marc Andreessen called software “eating the world.” “Software controlling hardware has finally produced new increases in productivity. It'll be 10 years, I suspect, before we start to see the real transformative effects of AI.”
Welcome to the New Nuclear Age?
Zakaria also highlighted AI’s current energy demands. Estimates vary widely, but here are a few recent examples:
- Goldman Sachs Research: At present, data centers worldwide consume 1-2% of overall power, but this percentage will likely rise to 3-4% by the end of the decade.
- Barclays: Data centers account for 3.5% of US electricity consumption today, and data center electricity use could be above 5.5% in 2027 and more than 9% by 2030.
- IDC: Global datacenter electricity consumption [is likely] to more than double between 2023 and 2028 with a five-year CAGR of 19.5% and reaching 857 Terawatt hours (TWh) in 2028.
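As a back-of-envelope sanity check on the IDC figure (this is rough compounding, not IDC’s methodology), a 19.5% annual growth rate compounded over the five years from 2023 to 2028 does indeed imply more than a doubling:

```python
# Compound the IDC CAGR forward: 19.5% per year over 5 years (2023 -> 2028).
cagr = 0.195
years = 5

multiplier = (1 + cagr) ** years     # total growth factor over the period
consumption_2028 = 857               # TWh, IDC's projected 2028 figure
implied_2023 = consumption_2028 / multiplier

print(f"Growth factor: {multiplier:.2f}x")        # roughly 2.4x
print(f"Implied 2023 baseline: {implied_2023:.0f} TWh")
```

The growth factor works out to roughly 2.4x, consistent with IDC’s “more than double” characterization.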
The throughline across these estimates is that AI is driving massive demand for electricity that current energy production is unlikely to meet. “There's simply no way you can do it,” Zakaria said. “The currently estimated energy use for AI is under 1% of GDP. For total energy, the estimate is by 2030 it'll be 10%. That's impossible.”
Zakaria suggested that nuclear power – fission, at least, but perhaps fusion reactors in the future – is one way to meet that demand. In recent weeks, Amazon, Google and Microsoft have all invested in nuclear power. He also suggested that more power-efficient chips would help reduce the electricity required. “And you’re beginning to see this happen. Google’s GPUs are more efficient than Nvidia’s,” Zakaria said. “Nvidia’s are better [performance-wise], but Google's are pretty good, too.”