There’s a new entrant in the cohort of AI-related terms and acronyms – “AI Factory.” The term does not refer to the use of AI in manufacturing facilities, nor does it refer to a factory in which AI products go rolling off a conveyor belt. “AI Factory” seems to have originated with Harvard professors Marco Iansiti and Karim Lakhani who in 2020 published the article “Competing in the Age of AI,” which later led to a book by the same name.
Fast forward to March 2022, when Nvidia founder and CEO Jensen Huang said that AI data centers process massive amounts of data to train and refine AI models: “Raw data comes in, is refined, and intelligence goes out — companies are manufacturing intelligence and operating giant AI factories.” At the time, Huang was pitching Nvidia’s H100 which was built to speed up the training of large language models (LLMs). In this context, it seemed that the AI factory concept centered on data centers designed to produce and train AI models.
Two years later, in March 2024 at Nvidia’s GTC event, Huang announced the Blackwell GPU, the successor to its H100 (Hopper) platform. According to Nvidia, Blackwell enables organizations to build and run real-time generative AI at a fraction of the energy consumption and cost of its predecessor.
Two months later at Dell Technologies World in May 2024, Huang compared the “AI factory” concept to factories that, during the industrial revolution, used water to produce electricity. He said that today’s data centers transform data and electricity to produce intelligence that is “formulated [as] tokens that can then be expressed in any information modality that we’d like it to be.”
Still, the definition of AI factory seems to depend on who’s using it. Dell’s article on how it makes the AI Factory real states that AI factories produce “actionable intelligence, fresh content and new insights” that every company will need. It also states that the Dell AI Factory brings AI close to where data resides to “minimize latency, lower costs and maintain data security by keeping sensitive information within a controlled environment.” That statement appears to argue for locked-down, perhaps even on-premises, data centers.
In June 2024, during his Computex keynote, Huang announced that multiple companies would use Nvidia networking and infrastructure for enterprises to build AI factories and data centers that would drive generative AI breakthroughs. During that speech, Huang said:
The next industrial revolution has begun. Companies and countries are partnering with NVIDIA to shift the trillion-dollar traditional data centers to accelerated computing and build a new type of data center — AI factories — to produce a new commodity: artificial intelligence. From server, networking and infrastructure manufacturers to software developers, the whole industry is gearing up for Blackwell to accelerate AI-powered innovation for every field.
(Emphasis ours.)
That quote seemed to tilt the scales in favor of the AI factory term mostly being a marketing phrase to help one of the world’s most valuable companies become even more valuable. But maybe there was more to the story.
To help determine that, No Jitter spoke with Bradley Shimmin, Chief Analyst, AI & Data Analytics with Omdia. Brad has more than 25 years of experience in IT technology, analyzing data and information architectures. His expertise also covers market evaluation and go-to-market messaging, and he has a background in software development and database administration.
No Jitter (NJ): We’ve seen the term “AI factory” popping up and while it sounded to me like marketing speak, I figured it’d be better to speak to someone like yourself who covers AI all the time.
Brad Shimmin (Shimmin): Does the world's largest company even need to market? [laughs] I don't know. It is marketing speak, but it's interesting to hear them say this, because I think if [Jensen Huang] had taken to the stage just a year and a half ago, amid the Gen AI craze, [the term] would have meant something completely different.
Eighteen months or so ago, Nvidia and many [other] companies with a vested interest in building out the infrastructure necessary to build what we call frontier-scale models – think GPT-4o and beyond in scale, number of parameters, etc. – [these companies] would have all said that the [AI factory] is really [about] helping companies build and train these big models. But the market shifted.
You know the concept of the long tail of a market? Companies are always shooting for it because over the long haul, they make more money that way. What we’ve found is that inferencing is a much bigger opportunity than training simply because training takes a lot of time and a lot of data. [Training] happens sporadically, but inferencing happens all the time. And, inferencing is extremely expensive in terms of processing power and energy consumption, etc. So, inferencing [rather than training] is really the big pull on AI.
(Editor’s note: “Inferencing” refers to the activity of running data through a live, already-trained AI model to produce predictions or conclusions.)
If you set that in your mind and think about what Jensen is saying, he's talking about factories that aren't creating AI, but factories that are themselves AI. And it's more than that, in a way, because when you think about what a generative AI model does, it's in the name — it generates content, whether that’s an image, a summarization of text, or code – whatever. Those aren't throwaway.
[What the models generate] are a living part of the enterprise data landscape. You can look at these models in that light as indeed AI factories that are producing data. It even gets more literal in that a great opportunity is starting to open up with the concept of synthetic data. Have you heard very much about that?
NJ: I have, but I don't quite understand: How do you create data synthetically?
Shimmin: Yeah, it's weird, right? Say I work in health care, and I'm building a regression model to predict readmittance to the hospital based on age and other variables. Because of HIPAA requirements, I cannot enter some of the data that would help me to make a better prediction because [patient data] is private.
What you can do is generate that data as synthetic data to help your model produce better outputs. That doesn't break any privacy requirements because [the data isn’t] real – but it works like real data. And this is the real catch, and why it's such an opportunity. This was all happening before generative AI, but with generative AI, the models themselves benefit from the synthetic data in just the same way traditional predictive models do – like that example in healthcare for hospital readmittance. Have you heard of a model called Phi that Microsoft built a while back?
(Editor’s Note: Synthetic data is created programmatically via several techniques: machine learning-based models (including Transformer-based models), agent-based models and hand-engineered methods. Whereas real data is gathered in the real world, synthetic data is digitally fabricated to imitate actual data in terms of structure, features and characteristics, mimicking the statistical properties of the original dataset. There are tools that check for discrepancies between the real data and synthetic data along three metrics: fidelity, utility and privacy. Each of these categories has specific tests that can be used. See the included links for more detail.)
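To make the note above concrete, here is a minimal Python sketch of the “train on synthetic, test on real” pattern. All of it is illustrative: the patient-style features, the per-class Gaussian sampling and the logistic-regression utility check are invented stand-ins, not a production synthetic-data pipeline or anything Shimmin described.

```python
# A minimal sketch, assuming invented data: sample synthetic rows that mimic
# the per-class statistics of a small "real" table, then check utility by
# training on the synthetic rows and scoring on the real ones.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for real (private) patient features: age, prior visits, length of stay.
X_real = rng.normal([60, 2, 4], [12, 1.5, 2], size=(500, 3))
y_real = (X_real @ [0.03, 0.4, 0.2] + rng.normal(0, 1, 500) > 3.5).astype(int)

# "Generate" synthetic rows by sampling each class's mean/covariance, so the
# fabricated rows imitate the statistical properties of the originals.
X_syn, y_syn = [], []
for label in (0, 1):
    cls = X_real[y_real == label]
    X_syn.append(rng.multivariate_normal(cls.mean(axis=0), np.cov(cls.T), 500))
    y_syn.append(np.full(500, label))
X_syn, y_syn = np.vstack(X_syn), np.concatenate(y_syn)

# Utility check: a model trained only on synthetic data should still score
# reasonably on held-out real data if the synthetic set carried the signal.
model = LogisticRegression().fit(X_syn, y_syn)
print("accuracy on real data:", model.score(X_real, y_real))
```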
NJ: I actually have not heard of that one.
Shimmin: It's fascinating. It's based on a scientific paper called Textbooks Are All You Need, which is itself a play on an earlier paper that Google put out called Attention Is All You Need. The [Textbooks] paper is really fascinating because it talks about generating prompt and response pairs that you can use to train a model. So you don't actually need real data to train these generative models. You can use synthetic data that other models have either built, scored or ranked to give yourself the cleanest, best data to train your model on.
Let's say you go into a given field, like with Merck searching for a new drug. You can create this synthetic data set that is specific to your domain without breaking any privacy rules and, without actually having data, you can just make more and more data that refines and further refines your models so that they can do a better job than if they didn't have it.
Getting back to Jensen, I think that's really in the spirit of what he's talking about. We're not just building factories that pump out AI, we're building actual AI factories that pump out synthetic data. It's a different view, I think, of what he and anyone else would have taken a year and a half ago. It's very much now a market where companies are realizing that these models, in a way, represent the data that is their corporate data.
IBM took to the stage last month in Boston at IBM Think, and said, [if you take] generative AI models, and start pushing your data into the model to help train it, by either changing its parameters or simply sending it to the model as [retrieval augmented generation] RAG, the model becomes a representation – and that’s a very specific word they were using – a representation of your company's data. That, too, points to what Huang was saying – you're building these factories that are the new representation of your corporate data.
(Editor’s note: In “The Future of AI is open” presentation at IBM Think, IBM’s SVP and Director of Research Darío Gil used the term representation to illustrate how InstructLab, a new IBM/Red Hat open-source project, represents and augments enterprise data by creating and merging “changes to LLMs without having to retrain the model.” Gil contrasted the InstructLab approach with retrieval-augmented generation (RAG) and fine-tuning. With RAG, the LLM doesn’t change; it just benefits from grounding on the enterprise’s data. With fine-tuning, the model is altered to suit a use case, so the enterprise potentially ends up with multiple copies of the model which it must then manage. Gil’s comments on data representation begin at about 19 minutes.)
NJ: How is it that the AI factory, or maybe the models themselves, represent what the corporation itself knows?
Shimmin: The one thing that generative AI has really driven home is the fact that there are many representations of knowledge. And by representation, I'm really talking about vectorizing information. Have you dug into the vector databases at all?
NJ: I've heard about them, read the definition and done some digging, but I can't say I fully understand what they are.
(Editor’s note: A vector database is a collection of data stored as mathematical representations; these numerical representations are known as vectors. Read on for a more detailed explanation of how the data turns into vectors.)
Shimmin: It’s cool because anything can be vectorized. An image, a piece of music, a piece of text, can basically be turned into a number. And guess what turns it into a number? A generative model. But the number that’s produced is a huge string of floating-point numbers — very long numbers that form a vector, which is like a direction in a multi-dimensional space, like the XY graph we did as kids.
You take that representation of the music or the audio from a call recording or a photograph or a chart and you vectorize it and feed it into a model. That sequential data can basically be turned into, I want to say, a living part – which I guess it is, in a way, but it earns a representation as a concept within that model.
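As a rough illustration of that vectorizing step, here is a minimal sketch assuming the open-source sentence-transformers library and an off-the-shelf embedding model; the snippets and model name are illustrative, not anything Shimmin cited. The same pattern, scaled up and backed by a vector database, is the core of a RAG pipeline.

```python
# A minimal sketch: turn text into vectors and compare them, assuming the
# sentence-transformers library and a small local embedding model.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

docs = [
    "Customer asked how to port a phone number to a new carrier.",
    "Quarterly revenue grew 12% on strong services demand.",
]
query = "How do I move my number to another provider?"

# Each string becomes a long array of floating-point numbers, a direction
# in a high-dimensional space, as described above.
doc_vecs = encoder.encode(docs, normalize_embeddings=True)
query_vec = encoder.encode(query, normalize_embeddings=True)

# With normalized vectors, the dot product is cosine similarity: vectors
# pointing in similar directions encode similar meaning.
scores = doc_vecs @ query_vec
print(docs[int(np.argmax(scores))])  # prints the number-porting sentence
```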
Anthropic recently published some research they did in mapping out concepts that exist within their Claude [LLM] model. They conducted an experiment where they located the concept of the Golden Gate Bridge inside the model, in the weights of the model, and then sort of amplified those weights, and suddenly Claude thought it was the Golden Gate Bridge. That is a fascinating little rabbit hole.
(Editor’s note: This page details Golden Gate Claude and links to the research paper in which Anthropic researchers began mapping the inner workings of its Claude 3 Sonnet AI model. In the “mind” of Claude, researchers said they found a specific combination of neurons in Claude’s neural network that activate when it encounters a mention (or a picture) of the Golden Gate Bridge in San Francisco. When the researchers increased the strength of the activation of those neurons, Golden Gate Claude’s responses focused on the Golden Gate Bridge even when it wasn’t directly relevant to the prompt – if asked “what it imagines it looks like, [the model would] likely tell you that it imagines it looks like the Golden Gate Bridge.”)
Think about these models as living, in a way, because you can fine tune them whenever you want, you can show them contextual learning through RAG pipelines whenever you want — so they are sort of living representations of corporate knowledge. And you can see how Huang is not wrong…it’s not hype at all. It's just the reality of the era that we've entered and are fully now living in.
Back in the 70s, the era was the database. Everything lived in the database. We strove to have these perfectly rationalized and normalized databases to represent the value of all the knowledge in the company. And that’s shifting a bit here, because, as you said, these models are starved for data. The data that you're talking about is unlabeled data, the data that you just suck into the model from PDFs and forum posts and Reddit subreddits and all that. But [like] the synthetic data I mentioned earlier, you can have the models document the [unlabeled] data, describe it and turn it into something more valuable.
That's why that paper from Microsoft on Phi and Textbooks Are All You Need is so telling. It's expensive to have humans sit down and label data. Companies have been doing data labeling for a living for quite a while. If you can augment or supplement humans with AI to [turn unlabeled data into labeled data], suddenly you have much more valuable information that these models can consume.
(Editor’s note: Unlabeled data (or unsupervised data) refers to data that lacks distinct identifiers or classifications – i.e., the data doesn't have "tags" or "labels" that indicate its characteristics or qualities. Unlabeled data can contain hidden patterns, structures, and relationships that can be discovered through an unsupervised (machine) learning algorithm. By finding underlying patterns in the unlabeled data, a generative model can produce new instances that are indistinguishable from the original data, leading to applications in text generation, image synthesis, and more.)
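As one concrete way to do what Shimmin describes, using AI to turn unlabeled data into labeled data, here is a minimal sketch assuming the zero-shot classification pipeline from the Hugging Face transformers library; the texts, candidate labels and confidence threshold are all illustrative.

```python
# A minimal sketch of AI-assisted labeling: a zero-shot classifier assigns
# candidate labels to raw text, with low-confidence cases left for humans.
from transformers import pipeline

labeler = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

unlabeled = [
    "The agent resolved my billing issue in five minutes.",
    "I've been on hold for an hour and nobody can help me.",
]
candidate_labels = ["billing", "technical support", "complaint", "praise"]

for text in unlabeled:
    result = labeler(text, candidate_labels=candidate_labels, multi_label=True)
    # Keep only confident labels; everything else gets routed to a reviewer.
    kept = [lbl for lbl, score in zip(result["labels"], result["scores"]) if score > 0.7]
    print(text, "->", kept or "needs human review")
```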
NJ: One of the other things I've been wondering about is that all these foundation models, at least, are in the cloud – they're owned by Microsoft, Google, OpenAI, etc. But enterprises want to keep their data to themselves and locked down. So, can these models, or some version of these models, be run on premise, inside of a locked-down enterprise data center facility where they don't need to necessarily send data out into the cloud?
Shimmin: That's part of that shift I was talking about – it's not training, but inferencing – [that’s where the profit-seeking AI companies are shifting to focus on the “long tail” of the market].
That shift came down to innovations made mostly by the open-source community – not only, but primarily, the open sourcing of smaller models like Meta’s Llama and that Phi model I mentioned, which is Apache open-source software. And they're not alone. Google has Gemma, a family of sub-7-billion-parameter models that you can do whatever you want with.
It's not just as a steppingstone to get you into their cloud hosted models. It's legitimately a preferred methodology for many enterprises to use these smaller models, because you can very efficiently fine tune them on some good, labeled data, if you will, to do specific tasks like answer questions on your specific product in a call center. You can do that by taking the call recordings that you've made over the years in your call center, vectorize them, feed them into a model, and train the model on them.
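As a rough sketch of that fine-tuning step (not a recipe Shimmin laid out), the following assumes a Hugging Face-style workflow with the transformers, peft and datasets libraries, LoRA adapters so the job fits on a single GPU, and a couple of hypothetical Q&A pairs standing in for transcribed call recordings.

```python
# A minimal LoRA fine-tuning sketch on hypothetical call-center Q&A pairs,
# assuming the transformers, peft and datasets libraries are installed.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder small model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Stand-ins for Q&A pairs distilled from transcribed call recordings.
pairs = [
    {"text": "Q: How do I reset my router?\nA: Hold the reset button for 10 seconds."},
    {"text": "Q: Is there a fee to change plans?\nA: No, plan changes are free."},
]
dataset = Dataset.from_list(pairs).map(
    lambda row: tokenizer(row["text"], truncation=True, max_length=256),
    remove_columns=["text"],
)

# LoRA trains a small set of adapter weights instead of the whole model,
# which is what makes single-GPU fine-tuning of these models practical.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

Trainer(
    model=model,
    args=TrainingArguments(output_dir="call-center-lora",
                           per_device_train_batch_size=2, num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
model.save_pretrained("call-center-lora")  # small adapter, kept in-house
```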
You can run that all inside of a single GPU – an A100 from Nvidia or other [chipsets] – and with maybe 32 gigs of RAM. It'll do the job. I have a bunch of old laptops running several of these models locally. [Some of these models] can fit in space so small that people are running them on the Raspberry Pi system-on-a-chip. The announcements that Apple made at WWDC were focused on these smaller models – they have their own family of models that are all very small.
The big frontier models are good at generalizing and demonstrating emergent capabilities – you wouldn't call it creative thought or planning; it looks like it, but it’s not. The smaller models, if they have really good data and they're trained intelligently – like Microsoft’s Phi model, which I think is sub-2 billion parameters, by the way, so very tiny – can, for specific tasks, be better than a big model. This is because you can more easily customize [the small models], they can be run totally privately and they're extremely fast and efficient at inferencing.
You can do all sorts of magical stuff by feeding the first five books of Harry Potter into Anthropic’s Claude 3.5 model, but the inferencing cost of that is astoundingly high – not to mention the time and energy spent. But with some of the techniques that the open-source community is inventing as we go, you can build your own huge context window into a smaller Llama model and run that on your MacBook Pro.
I manage a database that tracks job postings for AI and the skills and techniques, etc., that people are hiring for, and the most popular model skills companies want are not for OpenAI, Microsoft or Google – those big frontier models. Instead, they're for the Llama family of models. That's the top desired skill set, because those [models] can be secured, and you can put them into the software that you want without worrying about what Meta might do later down the road with things like updating the model.
The API services like Claude, OpenAI’s ChatGPT, Gemini and others, change over time. You've probably heard many people complaining about how ChatGPT gets stupider as time goes on. It's because it's constantly evolving. If you're an enterprise and you need a model that reads receipts and outputs JSON documents with the numbers from those receipts, and to do so repeatedly and consistently, you're not going to use ChatGPT for that. You'll use a small model that is honed for that job.
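To illustrate the receipts-to-JSON pattern Shimmin mentions, here is a minimal sketch assuming a small local instruction-tuned model served through the Hugging Face transformers pipeline; the model name, prompt and schema are illustrative, and the validation step reflects the point that consistency, not brilliance, is what this job needs.

```python
# A minimal sketch of narrow, repeatable extraction with a small local model.
import json
from transformers import pipeline

generator = pipeline("text-generation",
                     model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder

receipt = "ACME MART  2024-06-02  subtotal 41.50  tax 3.32  total 44.82"
prompt = (
    "Extract the fields from this receipt and reply with JSON only, "
    'using the keys "date", "subtotal", "tax" and "total".\n'
    f"Receipt: {receipt}\nJSON:"
)

raw = generator(prompt, max_new_tokens=80, return_full_text=False)[0]["generated_text"]

# Validate before trusting the output: small models are consistent on narrow
# jobs, but a failed parse should be retried or flagged, never passed along.
try:
    record = json.loads(raw.strip())
    assert isinstance(record, dict)
    assert {"date", "subtotal", "tax", "total"} <= record.keys()
except (json.JSONDecodeError, AssertionError):
    record = None  # route to a retry or to human review
print(record)
```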
NJ: With respect to AI factories and all the various topics we've been talking about, what do you think our readers should know if they're going into a buying decision, or an RFP phase? I know there’s lots of different topics in there, but maybe at a high level.
Shimmin: At a high level, they should do a couple of things. First, they should adopt an open stance toward the technology stack they build, meaning they should build a stack that they know is going to be obsolete in a couple of months and prepare for that because it's just moving so rapidly that you must do that.
Relatedly, they need to build their projects to fail and understand that to move in step with the market, they can't take 8 to 10 months to roll out a solution. And frankly, the capability of generative AI, its applicability across many use cases, and the way it can be assembled rather than built mean you don't have to hire a team of data scientists to get some natural language processing done.
For a call center, for example, you want to build, iterate rapidly and fail, and move forward in a flexible manner. That's why these open-source tools have such value. There's not a tremendous amount of lock-in in them, and you can move at your own pace, and not be trapped into well, wherever Adobe takes you, for example.
So, it's build, build to be open, build to be iterative, and build to fail.
The folks you write for are not dealing with problems that are completely unique to them or that aren't found in other verticals and horizontal use cases. They should look around and find examples of innovation and know that it's not going to take them a year to catch up with that. If they sit down with Claude 3.5 with Artifacts – Artifacts is a new capability they recently released – and have it write the front-end software and the back end code, they can basically build their own generative AI with generative AI solutions. They can move quickly with it. It's not a market they need to take a step back on.
Where they need to spend their time is on the data – making sure they have high-quality data to bring to these models, to train them, to fine-tune them, and to do context learning through RAG and other techniques. In other words, if a company is trying to build its own chatbot for a call center and can't correctly parse the data [from] the recordings it wants to use for [the bot], it's just going to create a crappy model. It doesn't matter how good the model is; if you feed bad data into it, it's not going to be what you want.
NJ: In this context, what does good data mean?
Shimmin: Well, in the realm of generative AI right now, you would start [by answering this question]: Is it free of any sort of legal impingement? Is it something that I don’t have to worry about getting sued for later? Second, the data must be appropriate. [To give an extreme example], you can’t use data from math quizzes to train a model to take an order through the drive through.
Then you move on to the quality of the data. If it's inconsistent, filled with errors or bad punctuation, or missing altogether – creating significant holes in terms of knowledge – then guess what happens with the model? As we've all seen with hallucinations, the models will only hallucinate when they don't have a representative pattern in their trained weights. Like a trained golden retriever, they'll do their best to bring the paper back. If there's no paper, then the slipper comes back.
After quality, well, there’s a huge laundry list of these things, but it's really: Are you free to use the data? Is the data appropriate? Is it of high quality?
The problem of high-quality data is no different than it was [a few years ago]. What's different now is our ability to use generative AI to get more out of the data we have. You’ve heard of knowledge graphs, I'm sure. When building those property graphs of connections between elements, well, guess what? Generative AI is really good at doing that.
And we're just talking about creating a set of nodes and edges that represent the connectedness between items in your data set that you know. Before now, you had to write some really complex queries and hire some really smart people to figure that out. Because these models are so good at searching these huge opportunity spaces, so to speak – whether it's looking for molecules that haven't been made before for Merck and other pharma companies, or whether it's creating property graphs for building a knowledge graph – put [the models] in the AI factory, and let that AI factory churn out valuable information.
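To make the property-graph idea concrete, here is a minimal sketch that assumes a generative model has already been asked to pull (subject, relation, object) triples out of unstructured text; the hard-coded triples below stand in for that model output, and networkx holds the resulting graph.

```python
# A minimal sketch: load LLM-extracted triples into a property graph.
import networkx as nx

# Stand-ins for triples a generative model might return when asked to list
# the entity relationships it finds in a batch of documents.
triples = [
    ("Aspirin", "TREATS", "Headache"),
    ("Aspirin", "INTERACTS_WITH", "Warfarin"),
    ("Warfarin", "PRESCRIBED_FOR", "Atrial fibrillation"),
]

graph = nx.MultiDiGraph()  # nodes plus typed, attributed edges
for subject, relation, obj in triples:
    graph.add_node(subject, kind="entity")
    graph.add_node(obj, kind="entity")
    graph.add_edge(subject, obj, type=relation, source="llm-extraction")

# Simple traversal: everything connected to Aspirin, and how.
for _, neighbor, data in graph.out_edges("Aspirin", data=True):
    print(f"Aspirin -[{data['type']}]-> {neighbor}")
```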
Want to know more?
This article provides an overview of the AI-generated responses that Google pulled and fixed. This is how Google responded to the kerfuffle.
Per this IBM article, synthetic data is data that has been created artificially through computer simulation or that algorithms can generate to take the place of real-world data. This new data can be used as a placeholder for test data sets and is more frequently used for the training of machine learning models because of its benefit to data privacy.
This is Microsoft’s Phi-3 family of open small language models (SLMs). And here is the link to the paper: Textbooks Are All You Need, as well as a link to the Attention Is All You Need paper written by Google researchers in 2017 that proposed the Transformer architecture on which today’s generative AI models are based.
This article provides an overview of a knowledge graph, which is a “graph-based data model to store details about entities, the relationships between those entities, and groupings or categorizations of those entities.” A property graph is a “type of graph model where relationships not only are connections but also carry a name (type) and some properties.”
This article provides an overview of vector databases and retrieval augmented generation (RAG). This research paper dives directly into how vector databases and LLMs can be used together to address some of the challenges of LLMs, including “hallucination, bias, real-time knowledge updates, and the high costs of implementation and maintenance in commercial settings.”
This IBM article quotes IBM’s SVP and Director of Research Darío Gil who said that “less than 1% of enterprise data has made it into AI models” and that same article quotes Gil’s presentation in which he said that “businesses can begin to build a representation of their data that can be fed into models to solve their most pressing issues.”