Multiple macro trends have aligned to create unprecedented pressure on companies to seek new efficiency gains. The transformational potential of generative AI architectures in mining, analyzing, and leveraging operational data presents a unique opportunity for enterprises to not only streamline their operations but also discover untapped avenues for growth. This article delves into a comparative analysis of various generative AI architectures, examining their capabilities and limitations in the context of KPI optimization.
Our survey spans from vanilla Large Language Models (LLMs) to sophisticated DB Knowledge Mining Agents, encompassing intermediary architectures such as fine-tuned LLMs, Retrieval Augmented Generation (RAG), and LLMs coupled with code interpreters. Each architecture is qualitatively evaluated against a standard scorecard covering grounding/validation on operational data, precise quantitative insights, new knowledge discovery, transparency and explainability, knowledge update frequency, responsiveness, ideation bias reduction, and DB analysis scale.
Key insights highlight that while vanilla LLMs offer high simplicity and responsiveness, they fall short on grounding/validation on operational data, precise quantitative insights, and ideation bias reduction. In contrast, DB Knowledge Mining Agents excel across all evaluated dimensions, demonstrating superior capability in grounding their outputs in real operational data, providing precise quantitative insights, discovering new knowledge, and managing biases effectively. However, this comes at the cost of higher technical complexity and longer integration time.
This whitepaper underscores the importance of selecting the right generative AI architecture tailored to an enterprise's specific needs and operational challenges. It provides actionable insights into how businesses can leverage AI to uncover hidden patterns in data, enhance decision-making, and ultimately achieve operational excellence. As AI continues to evolve, understanding these architectures' nuanced strengths and limitations will be crucial for businesses aiming to stay at the forefront of innovation and competitive advantage.
A comprehensive study by MIT Sloan Management Review and BCG, involving over 3,000 respondents from more than 25 industries, found that organizations employing AI to improve or create new KPIs realized enhanced business benefits. AI-driven KPIs are more forward-looking and connected, leading to substantial improvements over legacy performance metrics.
The research underscores that AI-enriched KPIs, or "smart KPIs", offer predictive insights and situational awareness, fostering better coordination among corporate functions. This points to AI's capacity for providing quantitative insights and managing biases by producing more aligned and predictive KPIs. Examples include General Electric and Sanofi, which utilize smart predictive and prescriptive KPIs for forecasting and corrective actions, respectively (MIT Sloan).
The application of AI in developing KPIs has also been shown to discover interdependencies among indicators, suggesting AI's capability for new knowledge discovery. By creating KPI "ensembles" that bundle distinct KPIs for connected business activities, AI helps uncover hidden patterns and relationships, thereby grounding decisions in operational data and enhancing cross-functional performance. The emphasis the research places on making KPIs more visible and transparent suggests that AI can contribute to greater transparency and explainability in operational decision-making. This aligns with the requirement for AI systems to be understandable by humans, particularly in how decisions or recommendations are derived.
As enterprises navigate the promise and complexities of integrating generative AI into their workflows, choosing the right architecture is a critical factor in successful implementation.
We would like to survey the capacity of various architectures to leverage knowledge hidden in an enterprise's operational data. This knowledge consists essentially of patterns that reflect the underlying factors affecting the organization's most important metrics.
For each architecture we will highlight the key capabilities as well as the challenges and limitations, and summarize them in a scorecard across the following dimensions:
- Grounding/validation on operational data
- Precise quantitative insights
- New knowledge discovery
- Transparency and explainability
- Knowledge update frequency
- Responsiveness
- Ideation bias reduction
- DB analysis scale
Our objective is to provide a comprehensive, qualitative assessment covering both technical and practical aspects of each architecture. The survey is based on a large number of conversations with industry experts from hyperscalers and system integrators, with enterprises in various stages of Gen-AI implementation, and with our research team.
We selected a range of AI architectures for analysis, including LLMs, Fine-tuned LLMs, LLMs + RAG, LLM + Code Interpreter, and DB Knowledge Mining Agents. These were chosen based on their prevalence in current discussions and applications in the field of AI and their potential impact on KPI optimization.
These criteria were selected to cover the range of capabilities necessary for effectively leveraging AI for KPI optimization in enterprise environments.
The evaluation considers how well each architecture performs across these dimensions, identifying their strengths and weaknesses in the context of optimizing KPIs and leveraging enterprise operational data for strategic insights and operational improvements.
The table presented below offers a summary of the scorecards for the different generative AI architectures evaluated in this article, focusing on their capacity to generate insights for operational enhancement derived from database analysis.
The chart below showcases the characteristics of the surveyed architectures across the evaluated dimensions:
The following sections provide a detailed overview of each architecture and offer a scorecard with explanations behind each score.
We start our survey with the simplest architecture, which involves a high-performance Large Language Model (LLM) such as GPT-4, Claude Opus, or Gemini Ultra, all of which have demonstrated emergent common-sense reasoning capabilities. Combined with their vast knowledge, it's natural to consider them useful tools for bouncing ideas off. They can read customer feedback, provide qualitative insights into user engagement issues, or propose personalized language for a marketing campaign. These models excel in generating coherent narratives and engaging content, leveraging their extensive training on diverse datasets. However, their application in operational decision-making faces critical challenges, primarily due to their limitations in grounding and validation against real operational data.
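To make the pattern concrete, here is a minimal sketch of the vanilla-LLM approach, assuming the OpenAI Python SDK; the model name, prompts, and feedback strings are illustrative, not a recommendation:

```python
# A minimal sketch of the "vanilla LLM" pattern using the OpenAI Python SDK.
# Model name, prompts, and feedback strings are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

feedback = [
    "Checkout kept timing out on mobile.",
    "Love the product range, but shipping estimates are vague.",
    "Couldn't apply my discount code at checkout.",
]

# Ask the model for qualitative themes. Note the absence of any grounding
# step: the output reflects the model's priors, not validated operational data.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are an analyst summarizing customer feedback."},
        {"role": "user", "content": "Identify recurring engagement issues:\n" + "\n".join(feedback)},
    ],
)
print(response.choices[0].message.content)
```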
Real-world Implications: A key risk to consider is basing decisions on inaccurate or entirely fabricated insights. For instance, in a business context, an LLM might generate a convincing analysis of market trends that, in reality, is not supported by current data. This misalignment can lead businesses to pursue ineffective strategies, allocate resources suboptimally, or miss out on crucial market opportunities.
Scorecard:
Mitigation Strategies: Despite these challenges, there are strategies businesses can employ to leverage the strengths of LLMs while minimizing risks. These include combining LLM outputs with analytical tools, continuous retraining and updates, RAG, and others.
What is fine-tuning?
LLM fine-tuning is the process of adjusting an LLM's weights based on a specific dataset or task to improve its performance in that domain, allowing the model to generate responses that are more accurate and relevant to specialized topics or industries. In practice, this involves preparing a specialized dataset, training the model on it to adjust its parameters, and applying techniques to prevent overfitting, so that the model becomes more adept at the targeted tasks without losing its general applicability. Achieving the desired balance often requires iterative evaluation and adjustment.
LLMs can be fine-tuned with structured operational data from CRMs, ERPs, and logs. This significantly enhances their ability to contextualize and interpret the unique nuances of a business, enabling them to provide better-articulated, business-aware insights, automate nuanced customer interactions, and support decision-making processes.
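As a rough illustration, here is a minimal sketch of launching a fine-tuning job with the OpenAI Python SDK; the training file name and base model are assumptions made for the example:

```python
# A minimal sketch of launching an LLM fine-tuning job via the OpenAI SDK.
# The training file name and base model are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

# Training data: one JSON object per line, each a short chat example drawn
# from the enterprise's own domain (e.g. CRM notes, support transcripts).
training_file = client.files.create(
    file=open("operational_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Launch the job; the provider trains asynchronously, and the resulting
# checkpoint can be queried like any other model once it completes.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)
```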
However, fine-tuned LLMs carry inherent limitations. They excel at identifying textual patterns but are limited in the analysis of time series, geo-spatial data, or complex relational structures. Thus, while fine-tuned LLMs can infer from several data modalities that can easily be serialized as text, offering qualitative insights based on the LLM's "intuition" and biases, they generally cannot quantify those insights, including their impact, significance, confidence, and correlation with the KPI metric. In addition, they are still limited to small portions of the data due to context window length limitations.
Understanding the strengths and limitations of fine-tuned LLMs allows businesses to leverage them most effectively. These models are invaluable for generating insights from textual data, automating responses based on pattern recognition, and identifying potential trends or issues from historical and contextual analysis. For holistic insight discovery, especially in the presence of observational time series data (e.g. transactional data), fine-tuned LLMs lack the required quantitative time series analysis skills. Integrating LLMs with dedicated analytical tools and models can offer a comprehensive approach. This synergy between LLMs' text-based analytical prowess and traditional quantitative analysis ensures a holistic strategy for data-driven decision-making, maximizing operational efficiency and strategic foresight.
Scorecard:
The integration of Retrieval Augmented Generation (RAG) augments LLM outputs with up-to-date, contextually relevant information fetched from documents. This ensures that the AI system does not rely solely on its pre-existing knowledge (which might be outdated) but can access and incorporate current data. In the context of e-commerce, RAG could be used to fetch the latest customer reviews or product information, providing a more accurate and grounded understanding of the causes behind cart abandonment. This method helps mitigate some of the limitations of LLMs by reducing hallucinations and producing responses based on up-to-date information. For example, internal reports on factors impacting user engagement may be incorporated into the context. It's important to note, however, that manually produced reports may be outdated, biased, and insufficiently granular.
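A minimal RAG sketch, assuming the OpenAI SDK and NumPy, with illustrative document snippets and model names: embed the documents once, retrieve the closest match by cosine similarity, and prepend it to the prompt:

```python
# A minimal RAG sketch: embed documents, retrieve by cosine similarity,
# and prepend the top match to the prompt. Document texts and model names
# are illustrative assumptions.
import numpy as np
from openai import OpenAI

client = OpenAI()

docs = [
    "Q3 report: cart abandonment rose 12% after the shipping-fee change.",
    "Survey: users cite mandatory account creation as a checkout blocker.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(docs)

def answer(question):
    q_vec = embed([question])[0]
    # Cosine similarity of the question against each stored document vector.
    sims = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
    context = docs[int(np.argmax(sims))]
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Context: {context}\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content

print(answer("Why is cart abandonment increasing?"))
```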
Scorecard:
This architecture pairs the LLM with a code interpreter capable of executing code that the LLM generates, much like the paid versions of ChatGPT and Gemini Advanced.
In the context of KPI optimization, the LLM can generate ideas for relevant database queries, then execute them using the code interpreter.
It’s important to note that ChatGPT’s code interpreter runs in an isolated environment, and therefore can’t communicate with any enterprise database.
However, implementing the same architecture in the enterprise environment, with access to operational systems like the CRM and the ERP, unlocks the ability to evaluate insights on the actual data, as sketched below.
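Here is a toy sketch of that loop against a local SQLite stand-in for an operational database; the table schema and prompts are hypothetical, and a production version would need sandboxing, SQL validation, and read-only credentials:

```python
# A toy sketch of the "LLM + code interpreter" loop: the model proposes a SQL
# query, we execute it, and feed the result back for interpretation.
# Table and column names are hypothetical.
import sqlite3
from openai import OpenAI

client = OpenAI()
db = sqlite3.connect("operations.db")  # stand-in for a CRM/ERP extract

def ask_llm(prompt):
    resp = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

# Step 1: the LLM ideates a query (the limited, biased idea set).
sql = ask_llm(
    "Write one SQLite query against orders(order_id, status, checkout_seconds) "
    "to test whether long checkouts correlate with abandonment. Return SQL only."
)

# Step 2: execute the generated query on real data. In production this step
# must be sandboxed and validated before execution.
rows = db.execute(sql).fetchall()

# Step 3: ground the interpretation in the actual result set.
print(ask_llm(f"Query: {sql}\nResult: {rows}\nWhat does this suggest?"))
```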
For example, the AI could identify a common complaint among users about a cumbersome checkout process and verify it against the actual data. The Achilles' heel of this approach is that it doesn't solve the "ideation bottleneck": it will only test the limited and biased set of ideas that the LLM generates. DB Knowledge Mining Agents represent a robust approach to resolving this bottleneck.
Scorecard:
"I think AI agentic workflows will drive massive AI progress this year — perhaps even more than the next generation of foundation models. This is an important trend, and I urge everyone who works in AI to pay attention to it." (Andrew Ng, DeepLearning.AI)
DB Knowledge Mining Agents combine LLMs, RAG, a code interpreter, and a systematic, unbiased hypothesis generation component.
This architecture adds a systematic hypothesis generation and testing process, which generates billions of queries that combine, transform, and aggregate data from multiple sources, measure correlations, and discover meaningful patterns. These patterns are then translated to natural language and fed into the LLM through RAG or in-context. In e-commerce, this means not only understanding textual feedback from customers but also analyzing transaction logs, user interaction data, and product performance metrics to identify patterns, correlations, and drivers of KPIs, and to reason about and propose potential actions to improve those KPIs. For instance, this method may discover that cart abandonment is 30% higher when the user lives within half a mile of a grocery store. By running billions of queries, DB Knowledge Mining Agents can pinpoint specific factors that lead to abandonment, such as price sensitivity, product availability issues, or checkout process friction, and suggest actionable strategies to address them. This additional knowledge critically grounds the LLM, reduces hallucinations, yields precise quantitative insights, and adds transparency and explainability by surfacing the supporting evidence.
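While the internals of such agents vary, a toy sketch conveys the idea of systematic, unbiased hypothesis enumeration; the CSV file and column names are hypothetical, and a real agent would enumerate vastly more transforms, joins, and aggregations:

```python
# A toy sketch of systematic hypothesis generation: enumerate candidate
# segmentations of the data and measure how strongly each one separates the
# KPI, instead of relying on the LLM to ideate. Column names are hypothetical.
import itertools
import pandas as pd

df = pd.read_csv("sessions.csv")  # e.g. one row per checkout session
kpi = "abandoned"                 # binary KPI: 1 = cart abandoned

candidate_features = ["device", "price_band", "distance_to_store_band"]

hypotheses = []
for r in (1, 2):
    for combo in itertools.combinations(candidate_features, r):
        # Hypothesis: the KPI rate differs meaningfully across these segments.
        rates = df.groupby(list(combo))[kpi].mean()
        lift = rates.max() - rates.min()
        hypotheses.append((combo, round(lift, 3)))

# Rank patterns by effect size; the top ones are verbalized and fed to the
# LLM (via RAG or in-context) as grounded, quantified evidence.
for combo, lift in sorted(hypotheses, key=lambda h: -h[1])[:5]:
    print(combo, "abandonment-rate spread:", lift)
```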
The screenshot below shows the enhanced conversational experience enabled by DB knowledge mining agents.
Scorecard:
In conclusion, our exploration of generative AI architectures for KPI optimization reveals a spectrum of capabilities, from the basic insights of vanilla LLMs to the advanced analytical power of DB Knowledge Mining Agents. While simpler AI models offer a starting point, their limitations in operational grounding and ideation bias reduction highlight the necessity for more sophisticated systems.
Among the architectures examined, DB Knowledge Mining Agents stand out for their exceptional ability to derive actionable insights from vast data sources, showcasing the potential for AI to significantly enhance decision-making and operational efficiency. However, their complexity and the challenges of integration should not be underestimated.
The path to effectively leveraging AI for KPI optimization is multifaceted, requiring a strategic approach that aligns AI capabilities with business objectives, alongside a commitment to ethical considerations and ideation bias reduction. As we move forward, the integration of AI into business practices offers not just improved operational metrics, but a reimagining of business strategy and performance in the AI era.
We're actively working on creating a comprehensive set of benchmarks that quantitatively compare different approaches and architectures across a meaningful array of datasets and use cases. These benchmarks aim to provide clear, data-driven insights into the performance, efficiency, and effectiveness of each architecture in real-world scenarios. We hope to equip technology leaders with actionable information that can guide the strategic implementation of Gen-AI in enhancing operational excellence.