Data Automation Engineer

$800k – $1500k/yr | Delhi, IN | remote | full time | mid | 17d ago

About this role

About Manifest Global

Manifest Global is building the infrastructure for global human capital mobility, connecting students, schools, universities, and employers across 50+ countries. Our portfolio spans Cialfo (AI-powered college counseling, 2,000+ schools), BridgeU (university guidance for international schools globally), Kaaiser (trusted study abroad counseling across India and Southeast Asia), and Explore (AI-powered university outreach, 1,000+ university partners). Together, we move talent across borders at scale. $80M raised. Still early.

What This Role Is

When a student in India opens Cialfo to research universities in the UK, Canada, or Australia, they are looking at data that someone collected, validated, and kept current. University profiles. Course listings. Entry requirements. Application deadlines. Tuition fees. Scholarship information. Across 544 partner universities and thousands more, every piece of that data has to be accurate, up to date, and trustworthy enough to inform one of the most significant decisions a young person makes.

Right now, a meaningful portion of that work is manual. The University Data Engineering team has the domain knowledge to know what correct university data looks like. What it doesn't have is the technical capability to automate the work that shouldn't be manual. That's what this role brings.

As Data Automation Engineer, you'll own the automation function end to end: building the scrapers, AI-powered workflows, and data pipelines that replace manual data collection with reliable, production-grade automation. You'll report into Engineering and work directly alongside the University Data Engineering team as the sole owner of the technical stack.

This is a building role. There is no existing automation function to inherit. You are creating it.

What Makes This Role Different

Most data engineering roles involve maintaining pipelines someone else designed or contributing to a function where the architecture decisions are made above you. This one doesn't work that way.

You make the tooling decisions. N8N or Python. Firecrawl or Playwright. Claude API or rule-based extraction. You explain your reasoning and you own the outcome.

The University Data Engineering team will QC your output honestly. When something is wrong, you'll hear it clearly. That's a feature, not a friction point: it's what makes the automations better.

The work ships directly to a product used by hundreds of thousands of students making real university decisions. When you build a notification classifier that saves the team 20 hours a week, or a quality audit agent that catches data errors before they reach students, the impact is immediate and visible.

What You'll Build

Your first 90 days have a defined backlog, in rough order of priority:

- A notification classification pipeline handling 450 alerts per week, replacing six hours of daily manual signal triage across the team.
- A signal addressal workflow processing 150 signals per week, replacing six hours of daily Core data updates.
- An automated quality audit agent running nightly across all recent data updates, replacing six to seven hours of daily manual accuracy checks.
- A rankings and key stats ingestion system covering 4,441 universities, replacing the full manual collection cycle for QS, THE, and US News rankings.
- Entry requirements automation for 150+ universities using dynamic, JavaScript-rendered pages that currently require manual extraction.

Beyond the initial backlog, you own the full 25-task automation portfolio: maintaining what exists, rebuilding scrapers when source sites change structure, and designing new automations as the team's data commitments grow.

What You Own

The complete automation stack

- Build and maintain the full technical stack: N8N workflows, Python scripts, LLM-powered extraction pipelines, and web scrapers.
- Own production reliability: when a scraper fails silently because a university website restructured, you diagnose and fix it without waiting to be told. You have monitoring in place so you know before the team does.
- Make architecture and tooling decisions independently and explain your reasoning clearly.

AI output accuracy

- Own the accuracy and reliability of every LLM-powered pipeline you build: the notification classifier, the quality audit agent, and any Claude or OpenAI integration that does real production work.
- Own the test sets, the iteration cycles, and the decision to go live. The University Data Engineering Leads trust your QC gates when you've earned that trust through your accuracy track record.

Upload pipeline independence

- Working within Engineering, own direct API write access to Core, Contentful, and Explore for approved data types, removing the manual ticket loop for routine data pushes.
- Build and maintain API clients against real production systems with rate limits, authentication, pagination, and error handling.

What Success Looks Like

The markers below reflect where the automation function is today. We'll calibrate the specifics once you're in the seat. These are directional, not fixed.

By 90 days, the notification classification pipeline will be live and saving the team 20+ hours per week. You'll have diagnosed and fixed at least one automation that broke in production without asking Engineering for help. The University Data Engineering team will be coming to you with data collection problems and getting back working solutions. You'll have made at least one tooling decision that changed how the team operates and can explain clearly why.

From there, the automation portfolio will be growing. Manual work that currently takes hours will be running reliably in the background. Coverage will be expanding. The team will be operating at a scale it currently can't reach because the manual ceiling is gone.

Over time, the automation function you've built will be the reason Cialfo can offer students more accurate, more comprehensive university data than any platform in the market. That's the consequence of what this role builds.

The specifics will be calibrated once you're in the role. The direction won't change.

What You Bring

What's non-negotiable:

- You write production-grade Python. Not scripts you contributed to. Scripts you wrote, owned, ran, and fixed when they broke in production.
- You've built and maintained REST API clients against real production systems, including rate limits, auth, pagination, and error handling. Not tutorial projects.
- You've built scrapers for dynamic pages, JavaScript-rendered content, and sites that actively block scrapers. You've rebuilt scrapers when source sites changed structure and you know what monitoring needs to be in place so you find out before the team does.
- You've owned an automation or data pipeline without a senior engineer making the architecture decisions above you. You've been the person others came to when something broke.
- You've shipped something in production where Claude or OpenAI was doing real work: classification, extraction, structured output. You've dealt with the accuracy and reliability problems that come with LLM-powered pipelines and you know how to build the test sets and iteration cycles that get them to production quality.

Strongly preferred:

- Four to six years of relevant experience. Independent ownership in messy production environments takes time to develop.
- N8N or equivalent workflow automation: you can read, edit, and build workflows without needing a tutorial.
- Experience with unstructured, inconsistent source data: PDFs, scraped HTML, university websites with no consistency across 500 sources.

Most importantly, you read the description of what this team is building and thought: I know how to build this and I'm ready to own it from day one. That's who this role is for.

Why Manifest

Manifest Global is building the infrastructure for global human capital mobility, connecting students, schools, universities, and employers across 50+ countries. Our portfolio spans Cialfo (AI-powered college counselling, 2,000+ schools), BridgeU (university guidance for international schools globally), Kaaiser (trusted study abroad counselling since 1997 across India and Southeast Asia), and Explore (AI-powered university outreach platform). Together, we move talent across borders at scale.

$700B flows annually in remittances from migrant workers. 85M workers will be missing from developed economies by 2030. We're building the operating system that changes that. $80M raised. Still early.

For this role specifically, the quality of Cialfo's university data is one of the most direct determinants of whether the platform genuinely serves students well. Accurate deadlines, correct fees, reliable entry requirements: these are the things a student in Delhi or Dubai is relying on when they make a decision about where to apply. The Data Automation Engineer who builds the systems that keep that data accurate and comprehensive is doing work that connects directly to outcomes that matter for real people.

Cialfo is part of Manifest Global, a multi-brand group building the infrastructure for global human capital mobility, operating across 50+ countries with $80M raised from Tiger Global, SIG, and Square Peg.

Offices: Delhi, India