Introducing TabBench, an open-source benchmark built by Neuralk-AI to evaluate tabular ML models on practical, real-world industry tasks, starting with commerce-related use cases.
We put our Tabular Foundation Model head-to-head against the recently released LLM-based classifier from Mistral AI, focusing on a real-world challenge every retailer faces: Product Categorization, a classification task that is ubiquitous in commerce and consists of assigning each product to the right category.
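To make the task concrete, here is a minimal sketch of product categorization framed as a supervised classification problem. The column names, brands, and categories are purely illustrative, not taken from TabBench or any real catalog.

```python
import pandas as pd

# Hypothetical product catalog rows: structured attributes plus free-text fields.
catalog = pd.DataFrame({
    "title": [
        "Leather ankle boots",
        "Wireless noise-cancelling headphones",
        "Organic cotton t-shirt",
    ],
    "brand": ["Acme", "SoundCo", "GreenWear"],
    "price": [129.99, 199.00, 24.50],
    # The target label a categorization model must predict.
    "category": ["Footwear > Boots", "Electronics > Audio", "Apparel > Tops"],
})

# The classification task: given the attribute columns (X), predict the category (y).
X = catalog.drop(columns=["category"])
y = catalog["category"]
```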
The result of the comparison? A score of 93% for our Tabular AI model vs. 39% for Mistral’s LLM.
The metric we used to evaluate performance is the F1-score, a key classification metric that balances Precision (the proportion of predicted instances that are correct) and Recall (the proportion of actual instances that are successfully identified); it is the harmonic mean of the two.
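For reference, a minimal sketch of how an F1-score can be computed with scikit-learn. The labels are illustrative, and the use of macro averaging (which weights every category equally, and so penalizes models that only do well on the most frequent categories) is an assumption, not necessarily the exact setup used in TabBench.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Illustrative ground-truth and predicted categories (not benchmark data).
y_true = ["shoes", "shoes", "jackets", "dresses", "dresses", "dresses"]
y_pred = ["shoes", "jackets", "jackets", "dresses", "shoes", "dresses"]

precision = precision_score(y_true, y_pred, average="macro")
recall = recall_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")

print(f"Precision: {precision:.2f}, Recall: {recall:.2f}, F1: {f1:.2f}")
```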
While Mistral’s model struggled with class imbalance and noisy signals, resulting in an F1-score of 39%, Neuralk-AI’s Tabular AI model soared to 93%.
Our Tabular Foundation Model is designed for structured data (data formatted in rows and columns, like retail product catalogs), not the polished text that LLMs are optimized for. It doesn’t just read tables: it understands both the meaning behind the data and the complex interrelations across rows and columns, which makes its performance robust to noise, imbalance, and real-world messiness.
Accurate product categorization eliminates the need for manual data processing, often slashing time-to-market delays by days or weeks, and drives up conversion rates by powering more relevant search and recommendation systems.
Are you ready to transform your product categorization? Get your free expert evaluation today by filling out the form here.