Member of Technical Staff, Data Platform Lead
Palo Alto, US | on-site | full time | senior | Mar 11, 2026
About this role
WHO WE ARE
Odyssey (https://odyssey.ml/) is an AI lab pioneering general-purpose world models: causal, multimodal systems that learn to predict and interact with the world over long horizons, while generating real-time, interactive simulations from any starting point. This foundational technology promises to revolutionize robotics, science, healthcare, education, gaming, defense, and beyond.
WHAT WE'RE LOOKING FOR
We need a deeply experienced Data Platform Lead to take full ownership of our data practice. This is a crucial technical leadership position focused on architecture, strategy, and getting things done. You should be an expert with serious, hands-on data engineering chops, capable of defining the long-term architectural vision while still diving into the code. Success in this role requires a complete understanding of the data lifecycle: from partnering with Operations to source data, to designing robust data recipes, to ensuring the resulting data assets are optimized for our world models.
WHAT YOU’LL DO
- Define and implement the long-term technical architecture for our data platform, ensuring scalability, reliability, and support for high-volume, multimodal datasets.
- Take ownership of the end-to-end data lifecycle, from sourcing and acquisition to delivery for machine learning model training.
- Design and build robust data processing pipelines, including data recipes for cleaning, feature engineering, and normalization, specifically addressing the complexity of inputs required for world models.
- Develop and manage the data curation system, including flexible metadata schemas, evolving labels, and modular tagging pipelines, to allow researchers to dynamically categorize, resample, and select high-quality training data.
- Work closely with ML Research and Engineering teams to understand immediate and future data requirements, translating research needs into actionable data infrastructure and acquisition strategies.
- Lead the integration of sophisticated signals and quality filtering into the data flow, such as VLM analysis, pose estimation, and aesthetic scoring, to ensure training datasets meet high quality standards.
- Drive the strategy for data acquisition, weighing the trade-offs between acquisition methods against budget constraints and quality requirements.
WHO YOU ARE
- You live and breathe data, with a strong belief in data quality and diversity as a primary lever for optimizing model performance.
- 8+ years building data platforms, focused on data architecture and engineering.
- Experience supporting ML teams, specifically preparing and optimizing data for model training.
- Great at designing and building reliable, high-volume data pipelines (ETL/ELT).
- Expert in cloud data technologies like data warehousing and lakehouse architectures (e.g., Snowflake, Databricks, BigQuery, and AWS S3/Redshift).
- Proficient with modern data processing frameworks (e.g., Spark, Flink, Kafka) and various databases (NoSQL, graph, relational).
- Skilled at setting up practical data governance, quality checks, and metadata management.
- A strong technical leader who can set a clear technical direction and mentor other engineers.
- Experienced with complex data types (images, video, text) and signal processing.
- Degree in Computer Science, Engineering, or a related field.