Commerce organizations rely on vast amounts of structured data to run their operations. Every new product listed, customer interaction, and sales transaction generates data, most of it stored in a familiar format: tables. Whether it’s product catalogs, sales histories, or customer information in a CRM, this raw tabular data forms the backbone of modern retail operations. Yet turning it into actionable insights remains one of the most time-consuming and costly challenges commerce organizations face today.
Before it can be used effectively, tabular data typically needs to go through multiple preprocessing steps, such as cleaning, labeling, and deduplication, depending on the specific use case. Once prepared, traditional machine learning models like XGBoost are commonly applied to extract insights or make predictions. While these models perform well on standard tasks like classification and regression, they typically require supervised training tailored to each new dataset and objective, making them less flexible and harder to scale across large enterprise databases.
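To make that traditional workflow concrete, here is a minimal sketch: a few generic preprocessing steps followed by a supervised XGBoost classifier. The file name, column names, and the churn objective are illustrative assumptions, not a real schema; the point is that every new dataset and prediction target needs its own version of this pipeline and its own training run.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Hypothetical CRM export: file name, columns, and target are illustrative only.
df = pd.read_csv("customers.csv")

# Typical preprocessing: deduplicate, clean, and encode before modeling.
df = df.drop_duplicates(subset="customer_id")
df = df.dropna(subset=["churned"])                           # the label must be present
df["country"] = df["country"].astype("category").cat.codes   # naive categorical encoding

X = df[["age", "country", "orders_last_year", "avg_basket_value"]]
y = df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A supervised model trained specifically for this dataset and this objective.
model = XGBClassifier(n_estimators=200, max_depth=4)
model.fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))
```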
Thanks to their breakthrough performance on text-rich tasks, Large Language Models (LLMs) are now showing promising potential in automating parts of the data pipeline. Unlike traditional models, LLMs can interpret large volumes of text without requiring task-specific architectures or retraining. This looks like a compelling opportunity for businesses, promising to drastically cut the manual effort involved in the data pipeline and deliver valuable insights instantly.
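By contrast, here is a hedged sketch of how an LLM might be pointed at the same kind of record with no training step at all: the row is serialized into text and the objective is stated directly in the prompt. It uses the OpenAI Python client; the model name, prompt wording, and record are assumptions for illustration, not a production recipe.

```python
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

# Serialize a (hypothetical) CRM row into plain text.
row = {"age": 42, "country": "FR", "orders_last_year": 3, "avg_basket_value": 57.0}
row_text = ", ".join(f"{col}: {val}" for col, val in row.items())

# No task-specific training: the objective is stated directly in the prompt.
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {"role": "system", "content": "You are a data analyst for a retailer."},
        {"role": "user", "content": f"Customer record: {row_text}. "
                                    "Will this customer churn? Answer 'yes' or 'no'."},
    ],
)
print(response.choices[0].message.content)
```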
Here’s the catch: the magic of LLMs quickly fades in real-world settings, because private enterprise data is very different from the public, text-heavy datasets LLMs are typically trained on. It’s messy, noisy, heterogeneous, and deeply tied to internal processes. We’re talking millions of rows and columns, linked by complex, domain-specific relationships that constantly evolve, a far cry from the mostly unstructured data that powers LLM training.
Recent research confirms the challenge: while LLMs perform well on public datasets, their performance can drop dramatically, sometimes by more than 85%, when faced with real enterprise data [1, 2, 3]. For example, an SAP study showed that popular LLMs like GPT-4, Claude 3.5, and Llama 3.1 performed well on public benchmarks but struggled on actual customer tables, with performance plummeting almost to zero when tasks involved internal knowledge (see Figure 1).
While LLMs have shown impressive results on many tasks, they face significant limitations when applied to real-world enterprise data. Here are some of the reasons why they often fall short in these complex business scenarios: the sheer scale of enterprise tables, the gap between the unstructured text LLMs are trained on and highly structured records, the domain-specific knowledge locked inside internal processes, and data that is messy, heterogeneous, and constantly evolving.
To bridge this gap, we need models specifically designed to address the unique challenges of enterprise tabular data. Tabular AI is an emerging and rapidly growing field in AI focused precisely on this: building models from the ground up to handle large, complex, and highly structured datasets. Unlike LLMs, which are built primarily around unstructured text sequences, Tabular AI models can reason over high-dimensional structured data and ultimately surface business insights that traditional methods might miss. It’s an exciting domain that’s growing fast, and one that can truly make a difference for companies ready to leverage their data more effectively.
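As one illustration of what this direction looks like in practice (an example from the open-source Tabular AI ecosystem, not Neuralk-AI’s own models), pretrained tabular models such as TabPFN expose a scikit-learn-style interface and make predictions on a new table without any task-specific architecture or gradient training. The sketch below assumes the tabpfn package is installed and uses a small synthetic table as a stand-in for enterprise data.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier  # open-source pretrained tabular model, shown as an example of the field

# Small synthetic stand-in for a structured enterprise table.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# No task-specific architecture or retraining: the pretrained model is
# conditioned on the training rows and predicts the test rows directly.
clf = TabPFNClassifier()
clf.fit(X_train, y_train)
preds = clf.predict(X_test)
print("Held-out accuracy:", accuracy_score(y_test, preds))
```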
🚀 At Neuralk-AI, we’re excited about the progress we’ve made in driving the Tabular AI revolution and can’t wait to share what we’ve been working on. Stay tuned for more updates coming soon!
[1] Jan-Micha Bodensohn, Ulf Brackmann, Liane Vogel, Anupam Sanghi, and Carsten Binnig. 2025. Unveiling Challenges for LLMs in Enterprise Data Engineering. arXiv preprint arXiv:2504.10950.
[2] Moe Kayali, Fabian Wenz, Nesime Tatbul, and Çağatay Demiralp. 2025. Mind the Data Gap: Bridging LLMs to Enterprise Data Integration. In 15th Annual Conference on Innovative Data Systems Research.
[3] Jan-Micha Bodensohn, Ulf Brackmann, Liane Vogel, Matthias Urban, Anupam Sanghi, and Carsten Binnig. 2024. LLMs for Data Engineering on Enterprise Data. In Proceedings of Workshops at the 50th International Conference on Very Large Data Bases, VLDB 2024, Guangzhou, China, August 26-30, 2024.