Skip to content
flint
Back to jobs
hyphenconnect

Synthetic Data Engineer (AI Data/Training)

United States, US mid Apr 24, 2026

About this role

We are seeking a talented and innovative Synthetic Data Engineer. In this role, you will design and implement domain-specific synthetic data generation pipelines, ensuring high-quality data management for training loops. Your expertise will drive the success of data processing and model training within the organization.   Responsibilities: Design domain-specific synthetic data generation (SDG) pipelines via self-instruct and constitutional prompting. Implement automated quality scoring and de-duplication systems. Manage data pipelines that feed directly into SFT and DPO training loops. Qualifications: Proven experience building large-scale data pipelines (Airflow, Spark, Ray). Deep knowledge of prompt engineering for data generation. Familiarity with dataset distillation and bias mitigation. Offices: United States ( United States);
Sign in Apply