Commerce organizations rely on vast amounts of structured data to run their operations. Every new product listed, customer interaction, and sales transaction generates data, most of it stored in a familiar format: tables. Whether it’s product catalogs, sales histories, or customer information in a CRM, this raw tabular data forms the backbone of modern retail operations. Yet turning it into actionable insights remains one of the most time-consuming and costly challenges commerce organizations face today.
Before it can be used effectively, tabular data typically needs to go through multiple preprocessing steps, such as cleaning, labeling, and deduplication, depending on the specific use case. Once prepared, traditional machine learning models like XGBoost are commonly applied to extract insights or make predictions. While these models perform well on standard tasks like classification and regression, they typically require supervised training tailored to each new dataset and objective, making them less flexible and harder to scale across large enterprise databases.
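To make that traditional workflow concrete, here is a minimal sketch: a few generic preprocessing steps followed by a supervised XGBoost classifier. The file name, column names, and the churn objective are illustrative assumptions, not a real schema; the point is that every new dataset and prediction target needs its own version of this pipeline and its own training run.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Hypothetical CRM export: file name, columns, and target are illustrative only.
df = pd.read_csv("customers.csv")

# Typical preprocessing: deduplicate, clean, and encode before modeling.
df = df.drop_duplicates(subset="customer_id")
df = df.dropna(subset=["churned"])                           # the label must be present
df["country"] = df["country"].astype("category").cat.codes   # naive categorical encoding

X = df[["age", "country", "orders_last_year", "avg_basket_value"]]
y = df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A supervised model trained specifically for this dataset and this objective.
model = XGBClassifier(n_estimators=200, max_depth=4)
model.fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))
```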
Thanks to their breakthrough performance on text-rich tasks, Large Language Models (LLMs) are now showing promising potential in automating parts of the data pipeline. Unlike traditional models, LLMs can interpret large volumes of text without requiring task-specific architectures or retraining. This looks like a compelling opportunity for businesses, promising to drastically cut the manual effort involved in the data pipeline and deliver valuable insights instantly.
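By contrast, here is a hedged sketch of how an LLM might be pointed at the same kind of record with no training step at all: the row is serialized into text and the objective is stated directly in the prompt. It uses the OpenAI Python client; the model name, prompt wording, and record are assumptions for illustration, not a production recipe.

```python
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

# Serialize a (hypothetical) CRM row into plain text.
row = {"age": 42, "country": "FR", "orders_last_year": 3, "avg_basket_value": 57.0}
row_text = ", ".join(f"{col}: {val}" for col, val in row.items())

# No task-specific training: the objective is stated directly in the prompt.
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {"role": "system", "content": "You are a data analyst for a retailer."},
        {"role": "user", "content": f"Customer record: {row_text}. "
                                    "Will this customer churn? Answer 'yes' or 'no'."},
    ],
)
print(response.choices[0].message.content)
```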
Here’s the catch: the magic of LLMs quickly fades in real-world settings, because private enterprise data is very different from the public, text-heavy datasets LLMs are typically trained on. It’s messy, noisy, heterogeneous, and deeply tied to internal processes. We’re talking millions of rows and columns, linked by complex, domain-specific relationships that constantly evolve, a far cry from the mostly unstructured data that powers LLM training.
Recent research confirms the challenge: while LLMs perform well on public datasets, their performance can drop dramatically, sometimes by more than 85%, when faced with real enterprise data [1, 2, 3]. For example, an SAP study showed that popular LLMs like GPT-4, Claude 3.5, and Llama 3.1 performed well on public benchmarks but struggled on actual customer tables, with performance plummeting almost to zero when tasks involved internal knowledge (see Figure 1).
While LLMs have shown impressive results on many tasks, they face significant limitations when applied to real-world enterprise data. Here are some of the reasons why they often fall short in these complex business scenarios: the sheer scale of enterprise tables, the gap between the unstructured text LLMs are trained on and highly structured records, the domain-specific knowledge locked inside internal processes, and data that is messy, heterogeneous, and constantly evolving.
To bridge this gap, we need models specifically designed to address the unique challenges of enterprise tabular data. Tabular AI is an emerging and rapidly growing field in AI focused precisely on this: building models from the ground up to handle large, complex, and highly structured datasets. Unlike LLMs, which are built primarily around unstructured text sequences, Tabular AI models can reason over high-dimensional structured data and ultimately surface business insights that traditional methods might miss. It’s an exciting domain that’s growing fast, and one that can truly make a difference for companies ready to leverage their data more effectively.
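As one illustration of what this direction looks like in practice (an example from the open-source Tabular AI ecosystem, not Neuralk-AI’s own models), pretrained tabular models such as TabPFN expose a scikit-learn-style interface and make predictions on a new table without any task-specific architecture or gradient training. The sketch below assumes the tabpfn package is installed and uses a small synthetic table as a stand-in for enterprise data.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier  # open-source pretrained tabular model, shown as an example of the field

# Small synthetic stand-in for a structured enterprise table.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# No task-specific architecture or retraining: the pretrained model is
# conditioned on the training rows and predicts the test rows directly.
clf = TabPFNClassifier()
clf.fit(X_train, y_train)
preds = clf.predict(X_test)
print("Held-out accuracy:", accuracy_score(y_test, preds))
```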
🚀 At Neuralk-AI, we’re excited about the progress we’ve made in driving the Tabular AI revolution and can’t wait to share what we’ve been working on. Stay tuned for more updates coming soon!
[1] Jan-Micha Bodensohn, Ulf Brackmann, Liane Vogel, Anupam Sanghi, and Carsten Binnig. 2025. Unveiling Challenges for LLMs in Enterprise Data Engineering. arXiv preprint arXiv:2504.10950.
[2] Moe Kayali, Fabian Wenz, Nesime Tatbul, and Çağatay Demiralp. 2025. Mind the Data Gap: Bridging LLMs to Enterprise Data Integration. In 15th Annual Conference on Innovative Data Systems Research.
[3] Jan-Micha Bodensohn, Ulf Brackmann, Liane Vogel, Matthias Urban, Anupam Sanghi, and Carsten Binnig. 2024. LLMs for Data Engineering on Enterprise Data. In Proceedings of Workshops at the 50th International Conference on Very Large Data Bases, VLDB 2024, Guangzhou, China, August 26-30, 2024.