Neuralk-AI is looking for a PhD researcher in synthetic data generation with strong expertise in generative models and tabular representation learning. This PhD position focuses on advancing the next generation of tabular foundation models using high-fidelity synthetic data for in-context learning.
You must have a Master’s degree and be eligible to enroll in a PhD program to apply.
You will collaborate with our Paris-based research team (4 members) and conduct your work under the supervision of a leading academic research group from a top UK university.
Neuralk is a deep-tech company building the next generation of Foundation Models for Data Science. Our mission is to build the predictive layer for businesses, transforming data science from a series of one-off initiatives, stitched together across silos, overly bespoke, and dependent on a handful of specialists, into a durable capability: a scalable predictive infrastructure that continuously learns from an organization’s data and powers decisions across the enterprise.
Our product is a Data Science agent, powered by our Foundation Models, that assists data scientists throughout their workflow, from problem framing to robust, production-ready models. We focus on the hardest and most common data problems in companies: structured datasets describing customers, operations, risks or financial activity.
As an early-stage, well-funded AI startup, Neuralk builds on state-of-the-art research to solve concrete business challenges. We value clarity over complexity, strong fundamentals over hype, and fast iteration grounded in rigorous engineering. Our ambition is to redefine how predictive AI is built and used in organizations, at scale.
Joining Neuralk means working hard in a fast-moving, research-driven environment, with a high level of ownership and the opportunity to shape a core product at the intersection of machine learning, engineering and real-world impact.
As a PhD Researcher in Synthetic Data Generation, you will:
• Develop advanced generative models for realistic and diverse tabular data.
• Work at the intersection of foundational ML research and real-world industrial AI applications.
• Contribute directly to the performance and generalization of our in-house Tabular Foundation Model for in-context learning.
• Model Design: Develop deep generative models (e.g., transformer-, diffusion-, or flow-based) for tabular synthetic data generation that captures complex real-world distributions.
• Evaluation: Define task-aware fidelity metrics to assess the usefulness of synthetic data for pre-training.
• Pretraining Support: Improve pretraining convergence of our tabular in-context learning (ICL) models by generating informative samples that guide learning dynamics.
• Curriculum Learning: Create generation pipelines with controllable task complexity to enable curriculum-based ICL training.
• Collaboration: Work closely with Neuralk engineers and external academic partners on experiment design, model evaluation, and deployment readiness.
• Publication & Conferences: Publish your findings in top-tier machine learning venues and participate actively in the international research community.
• Master’s degree in Computer Science, Machine Learning, or a closely related field.
• Experience with at least one family of generative models (GANs, Flows, Diffusion, VAEs) applied to structured data.
• Solid knowledge of machine learning, particularly model training, evaluation, and data representation.
• Good communication skills in English.
• Capacity to work independently while collaborating effectively with interdisciplinary teams.
• A mindset driven by research impact and real-world applications.
• Publication record in ML conferences or workshops (e.g., NeurIPS, ICLR, ICML).
• Experience with curriculum learning, causal modeling, or representation learning for structured data.
• Background in data-centric AI or meta-learning techniques.
• Familiarity with frameworks such as SynthCity, SDV, or TabPFN.
Get in touch and we will get back to you shortly.