As 2023 rolled along and the generative AI announcements accumulated, vendors were pushed toward a necessary discussion: what data the models were trained on, how they were trained, what biases may have been introduced, and how those inputs (among others) might affect the outputs of those LLMs, including toxicity, bias, hallucinations, and factually wrong responses.

According to Omdia’s AI and intelligent automation research director Natalia Modjeska, one major challenge companies face is data. “Specifically, high-quality, scalable, reliable, and trusted enterprise data to fine-tune and ground foundational models to make them usable. We know that many organizations still struggle with that because of the chronic underinvestment in data quality, management, and governance.

“I like to say that one can build a beautiful generative AI house and maybe even host a house-warming party, but you won’t be able to live there unless the house has a proper foundation and unless utilities such as water and gas mains, telecommunication, etc., have been laid in the ground. Data is that foundational infrastructure for AI.”

Relatedly, enterprises have grown increasingly wary of how their data might be exposed on the Internet or used, without their consent or even their knowledge, to train LLMs. According to a recent article by Metrigy analyst and frequent No Jitter (NJ) contributor Irwin Lazar, “IT and business leaders continue to express concern over the training of models and how company data is protected. We expect vendors to continue to message (and differentiate) around their security, governance, and compliance capabilities and partnerships.”

Focus on AI and Data in No Jitter Roll

Over the course of 2023, responsible AI, usage policies, protecting personally identifiable information (PII), managing data, etc., all became part and parcel of the discussion around generative AI and how it uses data. A few examples include:

For Further Reading

Much of NJ’s Conversations series asked many of the companies directly involved in incorporating and implementing AI and generative AI-based solutions how the capabilities of those models could be used and controlled. A few key points included:

The need for data security, guardrails, etc.

Asking questions, at least, about the data used to train the models.

Keeping “humans in the loop” to evaluate what the generative AI “engine” produces as a backstop against toxicity, hallucination, incorrect information, etc.

Also look at these articles regarding the use of data by AI: