hyphenconnect

Multimodal AI Systems Architect (AI Engineering)

US · Senior · Apr 24, 2026

About this role

We are seeking a Multimodal AI Systems Architect to develop and optimize AI systems that integrate vision and audio models. The role focuses on improving our voice-to-voice interactions and multimodal retrieval capabilities while keeping our systems efficient and innovative.

Responsibilities:
- Integrate vision encoders and audio-native models into the core agent reasoning loops.
- Optimize streaming latency for voice-to-voice AI interactions.
- Architect multimodal RAG (retrieval-augmented generation) systems that can retrieve insights from videos and PDFs.

Qualifications:
- Experience with Whisper, CLIP, and multimodal LLM integration.
- Knowledge of streaming architectures and WebRTC.
- Expertise in cross-modal alignment.

Offices: United States