Member of Technical Staff - Compilers

$180k – $400k/yr · San Francisco, US · on-site · full time · senior · Mar 10, 2026

Skills

ai accelerators, c++, gpus, llvm, mlir, python, triton, tvm, xla

About this role

About Us

Gimlet Labs is building the first heterogeneous neocloud for AI workloads. As AI systems scale, the industry is hitting fundamental limits in power, capacity, and cost with today’s homogeneous, vertically integrated infrastructure. Gimlet addresses this by decoupling AI workloads from the underlying hardware. Our platform intelligently partitions workloads into components and orchestrates each component onto the hardware that best fits its performance and efficiency needs. This approach enables heterogeneous systems across multi-vendor and multi-generation hardware, including the latest emerging accelerators. These systems unlock step-function improvements in performance and cost efficiency at scale.

On top of this foundation, Gimlet is building a production-grade neocloud for agentic workloads. Customers use Gimlet to deploy and manage their workloads through stable, production-ready APIs, without having to reason about hardware selection, placement, or low-level performance optimization. Gimlet works with foundation labs, hyperscalers, and AI-native companies to power real production workloads built to scale to gigawatt-class AI datacenters.

MISSION

Gimlet Labs is seeking a Member of Technical Staff focused on compiler infrastructure for ML execution systems, spanning IR transformations, runtime systems, kernel orchestration, scheduling, and serving optimization. You will help build the execution stack that transforms modern AI workloads into efficient programs running across heterogeneous hardware. The work spans runtime systems, compiler infrastructure, scheduling, memory movement, kernel orchestration, and serving optimization for large-scale inference workloads. This is not a traditional language compiler or backend code generation role.
We are looking for engineers who think deeply about execution behavior: IR transformations, runtime optimization, scheduling, memory locality, kernel composition, distributed execution, and heterogeneous serving infrastructure.

https://gimletlabs.ai/blog/low-latency-spec-decode-corsair

RESPONSIBILITIES

- Design and implement compiler and runtime pipelines for large-scale AI inference workloads
- Build and evolve IR transformations, lowering passes, and execution optimizations across graph, tensor, and kernel representations
- Optimize execution for latency, throughput, memory efficiency, and heterogeneous hardware utilization
- Develop scheduling, partitioning, and kernel orchestration strategies across accelerators and serving runtimes
- Work on execution systems spanning compiler infrastructure, runtime behavior, memory movement, and kernel dispatch
- Integrate new model architectures, execution patterns, and serving optimizations into the stack
- Collaborate closely with systems, runtime, and kernel engineers to ensure correctness and performance across the full execution pipeline

QUALIFICATIONS

- Strong systems and performance engineering fundamentals
- Experience building compiler systems, compiler-adjacent infrastructure, or execution/runtime systems
- Experience implementing IR transformations, compiler passes, lowering logic, or code generation systems
- Ability to reason about execution behavior, memory systems, scheduling, and hardware efficiency
- Strong software engineering skills in C++ and/or Python

PREFERRED QUALIFICATIONS

- Experience with MLIR, LLVM, XLA, TVM, Triton, or similar compiler/runtime infrastructure
- Experience optimizing ML inference or serving workloads
- Familiarity with runtime systems, kernel dispatch, launch APIs, or memory allocators
- Experience working with GPUs, AI accelerators, or heterogeneous hardware systems
- Experience profiling and debugging performance-critical systems
- Familiarity with scheduling, partitioning, or kernel-level optimizations