Tabular Foundation Models are coming for your Spreadsheets (Part 1 of 5)
Foundation models—the technology behind ChatGPT and image generators—are now being built specifically for the rows and columns of your business data. This series explores how they work, their value for enterprises, limitations, and future developments in the field.
Here's a curious fact: while AI has been generating photorealistic images, writing poetry, and passing bar exams, the humble spreadsheet has remained stubbornly resistant to foundation models.
Think about it. ChatGPT can write a Shakespearean sonnet about supply chain optimization. DALL-E can paint your quarterly results as a Renaissance masterpiece. But ask a large language model to predict which of your customers will churn next month based on your CRM data? That's where things get... awkward.
For years, the undisputed champions of tabular data (that's the technical term for data organized in rows and columns—spreadsheets, databases, CSV files) have been gradient boosted trees. Names like XGBoost and LightGBM might not roll off the tongue like Claude and LeChat, but they've been quietly powering everything from credit scoring to fraud detection.
Until now.
What Exactly Is a Tabular Foundation Model?
A foundation model is a large AI model that's been pre-trained on massive amounts of data, designed to be applied across many different tasks without needing to be retrained from scratch each time.
For example, GPT-4 is a foundation model for text. It learned the structure of language from trillions of words, and now it can tackle everything from translation to coding to creative writing—all without needing task-specific retraining.
A Tabular Foundation Model (TFM) applies the same philosophy to structured data. Instead of learning from text or images, it learns from millions of tables with varying structures, column types, and relationships. The goal? Understand the deep patterns that exist across all tabular problems, from predicting house prices to classifying diseases to forecasting demand.
The key insight: even though a customer churn dataset looks nothing like a medical diagnosis dataset, the underlying statistical patterns and relationships share surprising similarities. By training on vast amounts of diverse tabular data, TFMs develop emergent in-context learning capabilities. This allows TFMs to recognize patterns and apply that knowledge to new tables they have never encountered, something traditional machine learning models like gradient boosted trees cannot do.
Why Should You Care?
If you work with data in any capacity—and in 2026, who doesn't?—this matters for three reasons:
Speed: Traditional machine learning requires extensive model training and hyperparameter tuning. This process can take days or weeks. With tabular foundation models, you can get competitive predictions in seconds. Literally, seconds.
Small or messy data: Here's the plot twist: these models often shine brightest when your dataset is small or imperfect. Traditional methods struggle with small datasets (under 10,000 rows) because they don't have enough examples to learn patterns. But a foundation model has already learned general patterns from millions of tables. It arrives at your dataset with prior knowledge, like a new employee who's already worked in the industry for years. Similarly, it handles missing values with more ease, having seen a multitude of different tables during its training.
Simplicity: No more spending hours tuning 47 different hyperparameters. Tabular foundation models work out of the box. Feed them your data, and they make predictions. That's it. (Well, mostly—we'll get into the nuances in later articles.)
How It Actually Works
Let's peek under the hood, just a bit.
Imagine you want to predict whether a bank customer will default on their loan. The traditional approach:
Collect historical data (customers who did/didn't default)
Clean the data (handle missing values, encode categories)
Engineer features (create new columns that might be predictive)
Train a model (probably XGBoost or LightGBM)
Tune hyperparameters (cross-validate, grid search, hopes and prayers)
Evaluate and iterate (repeat steps 2-5 several times; a rough code sketch of this workflow follows below)
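To make that concrete, here is a minimal sketch of steps 3 to 6 using XGBoost and scikit-learn. The file name, the columns, and the hyperparameter grid are all made up for illustration; a real project would adapt every one of these choices.

```python
# Illustrative sketch only: "loans.csv", its columns, and the grid are hypothetical.
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBClassifier

loans = pd.read_csv("loans.csv")                           # historical customers, numeric features assumed
loans["debt_to_income"] = loans["debt"] / loans["income"]  # step 3: feature engineering

X = loans.drop(columns=["defaulted"])
y = loans["defaulted"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Steps 4-5: train and tune with a cross-validated grid search
search = GridSearchCV(
    XGBClassifier(eval_metric="logloss"),
    param_grid={"max_depth": [3, 6, 9], "learning_rate": [0.01, 0.1], "n_estimators": [200, 500]},
    cv=5,
    scoring="roc_auc",
)
search.fit(X_train, y_train)

# Step 6: evaluate on held-out data (then go back and iterate)
print(search.best_params_, search.score(X_test, y_test))
```

Every choice in that snippet (which features to engineer, which grid to search, which metric to optimize) typically gets revisited several times before the model is good enough.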
With a tabular foundation model like NICL, it works differently:
Collect historical data
Engineer features (create new columns that might be predictive)
Feed it to the model (yes, with missing values and all)
Get predictions
The magic is in what's called in-context learning. Rather than adjusting its internal weights to fit your specific dataset (traditional training), the model uses your training data as "context"—much like how you might show ChatGPT a few examples before asking it to continue a pattern. The model processes your entire training set in one forward pass and immediately makes predictions on new data.
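To see how little changes for the user, here is a minimal sketch using the open-source tabpfn package (TabPFN is one of the models described below); NICL's own interface may differ, and the dataset is the same hypothetical loans.csv as above.

```python
# Minimal sketch with the open-source `tabpfn` package (scikit-learn-style interface).
# The dataset is hypothetical; other tabular foundation models expose similar APIs.
import pandas as pd
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

loans = pd.read_csv("loans.csv")
X = loans.drop(columns=["defaulted"])
y = loans["defaulted"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = TabPFNClassifier()          # no hyperparameter grid to tune
clf.fit(X_train, y_train)         # "fit" stores the training rows as context;
                                  # the network's weights are not updated
probabilities = clf.predict_proba(X_test)  # one forward pass over context + new rows
```

The .fit() call is near-instant because nothing is being optimized at this point; the heavy lifting happened once, during the model's pre-training.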
The term for this architecture is a Prior-data Fitted Network (PFN). It's a transformer (yes, the same architecture behind ChatGPT) that has been trained to approximate Bayesian inference—essentially, to make optimal predictions by considering all possible explanations for the patterns in your data, weighted by how plausible each explanation is.
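In notation, the quantity a PFN is trained to approximate is the Bayesian posterior predictive distribution: for a new row $x$ and your dataset $D$,

$$p(y \mid x, D) \;=\; \int p(y \mid x, \theta)\, p(\theta \mid D)\, d\theta$$

where each $\theta$ is one possible explanation of how the features relate to the target, and $p(\theta \mid D)$ measures how plausible that explanation is given your data. The transformer never computes this integral explicitly; during pre-training it learns to output predictions that behave as if it had.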
Differences in approaches
Several tabular foundation models have emerged in recent years, each with a slightly different approach to the core problem: how do you train a neural network to be good at all tabular problems, not just the one in front of you? Below are some of the main players.
NICL: Developed by the team at Neuralk-AI with a clear focus on real-world enterprise needs rather than benchmark optimization, NICL is a proprietary architecture designed from the ground up for production-scale deployment. It handles all table sizes (millions of rows and thousands of columns) while maintaining fast inference times, a critical requirement for industry applications where data volumes far exceed academic benchmarks.
TabPFN (and its successors TabPFN-2, TabPFN-2.5): Developed by researchers who later founded Prior Labs, TabPFN has evolved from handling 1,000 training samples to nearly 100,000. It's the model that proved this whole concept could work.
TabDPT: Pre-trained on real-world tables (rather than synthetic ones), this model uses a technique called column-masking to learn patterns from actual datasets on OpenML (a leading repository of datasets and resources for data science).
TabICL: Designed by researchers at INRIA, this model can handle tables with up to 500,000 rows, addressing one of the key limitations of earlier approaches.
What's Coming in This Series
This is Part 1 of a five-part series describing Tabular Foundation Models for non-technical audiences. This article sets the stage for TFMs; Part 2 will look back at the history of traditional machine learning, its great value, and its limitations. Part 3 will dive into the techniques and requirements for training Tabular Foundation Models, while Part 4 will take a closer look at what TFMs bring to the table. Finally, in Part 5 we'll share some of the exciting unknowns about TFMs, and what comes next for this space. Spoiler: the intersection with large language models is getting very interesting.
Key Takeaways
→ Tabular Foundation Models are pre-trained AI systems designed specifically for structured data (spreadsheets, databases, CSV files)
→ They learn patterns from millions of diverse tables and can apply that knowledge to new datasets without retraining
→ Key advantages: speed (seconds vs. hours), performance on small or messy data, and simplicity (no hyperparameter tuning)
→ The leading models can now handle very large datasets, making them relevant for enterprise use cases
Next up: Part 2 explores traditional machine learning—how random forests, gradient boosting, and their friends actually work, and why they've ruled the tabular world for decades.
👉 Reach out with any questions or article suggestions!
Glossary of Terms
Tabular data: Data organized in rows and columns—think spreadsheets, database tables, CSV files
Foundation model: A large AI model pre-trained on massive data, designed to be applied across many tasks
In-context learning: Making predictions by using training examples as context, without adjusting model weights
Hyperparameter: Settings that control how a machine learning model trains (e.g., learning rate, tree depth)
Prior-data Fitted Network (PFN): A neural network trained to approximate Bayesian inference on new datasets
Bayesian inference: A statistical approach that updates beliefs based on evidence, considering all possible explanations
Feature engineering: Creating new input columns from existing data to help a model make better predictions
Prior: The implicit knowledge embedded in the model from pre-training, including statistical patterns (how numerical features typically relate to outcomes), column relationships (common dependencies between features), data distributions (what "normal" tabular data looks like), and feature importance patterns (which types of columns tend to be predictive)