Article available in Forbes - By Rocio Wu, Contributor (read here)
Thanks to Chance Mathisen for his contribution
In the current wave of generative AI innovation, industries that live in documents and text — legal, healthcare, customer support, sales, marketing — have been riding the crest. The technology transformed legal and clinical workflows almost overnight, and companies like Harvey and OpenEvidence scaled to roughly $100 million in ARR in just three years. Customer support followed closely behind, with AI-native players automating resolution, summarization, and agent workflows at unprecedented speed.
But industries built on structured data have not been as quick to adopt genAI. In financial services, insurance, and industrials, AI teams still stitch together thousands of task-specific machine learning models — each with its own data pipeline, feature engineering, monitoring, retraining schedule, and failure modes. These industries require a general-purpose primitive for structured data, an LLM-equivalent for rows and tables instead of sentences and paragraphs.
We believe that primitive is now emerging: tabular foundation models. And they represent a major opportunity for industries sitting on massive databases of structured, siloed, and confidential data.
How LLMs Devoured Unstructured Data (And Why They’re So Good At It)
LLMs use attention mechanisms to understand relationships between words while simultaneously capturing context, nuance, and meaning across sentences and entire documents. As these models scaled, an unprecedented supply of freely available text across the internet provided trillions of tokens that taught them how language works across domains, styles, and use cases. Models that could read, write, summarize, and reason over text suddenly became everyday business tools — drafting emails, answering tickets, and redlining contracts in seconds.
Entrepreneurs quickly recognized the pattern: plug into a foundation model’s API, wrap it in a vertical interface, solve a painful workflow, and sell seats to high-value knowledge workers. Thousands of AI-native startups followed, forming a virtuous cycle: application companies drove demand, foundation model providers reinvested in better capabilities, and improved models enabled even more powerful applications. Domain by domain, LLMs devoured unstructured data wherever it lived.
Where Current LLMs Hit A Wall: Understanding Structured Data
But LLMs were trained on text, not tables. When asked to work with structured data, they flatten spreadsheets into token sequences and strip away the meaning encoded in schemas, column relationships, data types, and numerical semantics.
The typical workaround is indirect. The model generates SQL or Python, hands it off to an external system for execution, and hopes the result is correct. This works for simple queries, but breaks down quickly. A single ambiguous column name — “revenue” versus “revenue_id” — can derail an entire analysis or forecast.
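To make that failure mode concrete, here is a minimal, hypothetical sketch (the table and column names are invented for illustration) of how a generated query that grabs the wrong column produces a result that looks plausible but means nothing:

```python
import pandas as pd

# Hypothetical orders table with two similar-sounding columns:
# "revenue" holds dollar amounts, "revenue_id" is an internal identifier.
orders = pd.DataFrame({
    "region": ["NA", "NA", "EU", "EU"],
    "revenue": [1200.0, 950.0, 700.0, 1100.0],
    "revenue_id": [10483, 10484, 10485, 10486],
})

# What the analyst asked for: total revenue by region.
intended = orders.groupby("region")["revenue"].sum()

# What a generated query can silently return if it picks the wrong column:
# summing identifiers yields numbers that look reasonable but are meaningless.
derailed = orders.groupby("region")["revenue_id"].sum()

print(intended)
print(derailed)
```

Nothing errors out; the only defense is a human (or a model that actually understands the schema) noticing that the numbers make no sense.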
This problem compounds in large enterprises. Years of tech debt, acquisitions, and mergers leave behind dozens of siloed, brittle systems. Current LLMs and agents have improved greatly, but they still can’t reliably understand and manipulate an organization’s data, which lives across different ERPs, CRMs, data warehouses, and spreadsheets. A single query can force an agent to join tables that were never designed to fit together, built by teams that no longer exist.
As a result, high-stakes sectors like financial services and healthcare remain anchored to their trusted (and sprawling) stacks of traditional ML models. Startups have built agents that write Excel formulas or execute Python notebooks via natural language, but when it comes to actuarial-level accuracy, large-scale forecasting, or multi-table reasoning that drives million-dollar decisions, the heavy lifting still falls to libraries like XGBoost and LightGBM.
LLMs can interact with structured data, but they are not the right engine to model it.
Unlocking The $600 Billion Opportunity With Tabular Foundation Models
Structured data requires a foundation model built natively for it: one that understands schemas, column relationships, and numerical semantics from the ground up, rather than treating tables as flattened text.
The market opportunity here is staggering. The global data analytics market alone is projected to exceed $600 billion by 2030, and the industries most reliant on structured data — financial services, insurance, and healthcare — represent trillions of dollars in market cap that have yet to fully leverage generative AI.
Tabular foundation models may be the key to unlocking that TAM for startups. TFMs are trained to reason over rows and columns the way LLMs reason over sentences and pages. They deliver state-of-the-art predictions across classification, regression, and time-series tasks in seconds rather than hours.
Unlike traditional machine learning, TFMs can work with messy, heterogeneous data out of the box. They can deal with missing values, inconsistent formats, and ambiguous column names with no feature engineering, no model selection, and no hyperparameter tuning required.
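As a rough sketch of what that workflow looks like in practice, the snippet below assumes a scikit-learn-style interface such as the one exposed by TabPFN, the open-source model from Prior Labs (one of the companies mentioned below); exact APIs, data-size limits, and hardware requirements vary by model and version, and the public dataset here is used purely for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier  # pip install tabpfn; may download weights and prefer a GPU

# A small public tabular dataset standing in for enterprise data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# No feature engineering, no model selection, no hyperparameter tuning:
# a single pretrained model is fit and queried directly.
clf = TabPFNClassifier()
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]

print(f"ROC AUC: {roc_auc_score(y_test, proba):.3f}")
```

The contrast with a typical XGBoost or LightGBM pipeline is less about the lines of code and more about what is absent: no per-task training runs, no tuning loops, and no retraining schedule to maintain.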
A new generation of companies is building in this space, including Rowspace, Prior Labs, Fundamental, Intelligible AI, Kumo AI, Neuralk AI, Avra AI, and Wood Wide AI, each exploring different architectural approaches to representing tabular and relational data, learning cross-column dependencies, and generalizing across tasks.
The operational implications of TFMs are profound. Rather than maintaining a fragmented portfolio of brittle, task-specific models, enterprises can consolidate around a single foundation that generalizes across use cases. This would dramatically reduce the cost and complexity of building, monitoring, and retraining models.
But there are also real risks for startups building in this space. As LLMs get better at coding, some argue that generating analysis scripts on the fly could eliminate the need for specialized tabular models altogether. Open-source pressure may also compress technical differentiation, as happened with now-commoditized image models.
This makes distribution and business models critical. Technical advantage alone will not be durable. TFMs must be embedded into enterprise workflows, sold with clear ROI, and priced in ways that reflect the value of reliability and reduced operational overhead — before the technology advantage’s shelf life runs out.
Catalyzing A New Set of Startups
For industries where AI adoption has lagged, TFMs offer a reset. Use cases that once required months of data science work — custom pipelines, bespoke features, continuous retraining — can now be tackled with a single, general-purpose model that delivers reliable results out of the box.
In healthcare, that means patient risk stratification and diagnostic prediction.
In financial services, credit decisioning and fraud detection.
In insurance, claims triage and pricing optimization.
In manufacturing, predictive maintenance and demand forecasting.
These problems have been addressed with traditional ML for years — but never with the speed, flexibility, or scalability that a foundation model enables.
For founders, this is a greenfield opportunity. Just as LLMs unlocked a wave of AI-native companies built on text, TFMs open the door to startups tackling structured-data problems that were previously too slow, too expensive, or too complex to solve at scale. As investors with a long history of backing infrastructure and applications that power financial services, healthcare, and regulated industries, we believe tabular foundation models represent the next major opportunity to unlock AI adoption in these sectors. If you’re working on tabular foundation models, building applications on top of them, or tackling structured-data problems in these industries, we’d love to hear from you.