ML Ops Engineer (EMEA Remote)

Pragmatike via LinkedIn
Remote | Montenegro, Rio Grande do Sul, Brazil | Mid-level | CLT | Posted 25 days ago | Full-time | Engineering and Information Technology | IT Services and IT Consulting | 25 applicants

Estimated Salary

R$ 7.020,00 - R$ 10.530,00

Job Description

Location: Fully remote (EMEA timezone)

Start date: ASAP

Languages: Fluent English required

Industry: Cloud Computing / AI / European Deep-Tech SaaS

About The Role

Pragmatike is recruiting on behalf of a fast-scaling, well-funded distributed cloud infrastructure startup building next-generation AI-native cloud services. The company is redefining how compute is delivered by providing GPU-powered infrastructure for AI/ML workloads, secure storage, and high-speed data transfer through a decentralized architecture that significantly reduces environmental impact compared to traditional cloud providers.

We are seeking an ML Ops Engineer with strong experience in production-grade model serving and infrastructure for AI systems. This is a highly technical, hands-on role focused on building scalable, reliable, and efficient ML inference platforms powering real-time AI applications.

You will be responsible for designing and operating the core infrastructure that serves machine learning models at scale. You will work closely with infrastructure, platform, and applied AI teams to ensure high availability, low latency, and cost-efficient inference systems. Strong ownership, a production mindset, and experience with distributed GPU systems are essential.

Your Responsibilities

  • Build and operate production-grade model serving infrastructure using frameworks such as vLLM, TGI, Triton, or equivalent
  • Design and implement robust deployment pipelines with blue/green and canary rollout strategies for ML models
  • Develop and maintain auto-scaling systems, multi-model serving architectures, and intelligent request routing layers
  • Optimize GPU utilization, memory efficiency, network throughput, and model artifact storage performance
  • Design observability systems for tracking inference latency, throughput, GPU usage, cost metrics, and system health
  • Manage model registries and CI/CD pipelines enabling automated and reproducible model deployments
  • Own the full lifecycle of ML systems from development through production, including operational support and on-call responsibilities
  • Define engineering best practices and contribute to platform scalability in a fast-moving startup environment

Required Qualifications

  • 4+ years of experience in ML Ops, Platform Engineering, SRE, or similar infrastructure roles focused on ML systems
  • Hands-on experience with model serving frameworks such as vLLM, TGI, Triton, or equivalent
  • Strong background in container orchestration and operating GPU-based workloads in production
  • Experience with MLOps tooling including model registries, experiment tracking, and automated deployment pipelines
  • Proficiency in Python and infrastructure-as-code tools (e.g., Terraform, Helm, or similar)
  • Strong understanding of distributed systems, performance tuning, and production reliability engineering
  • Ability to effectively use AI coding assistants to accelerate development and debugging workflows
  • Ownership mindset with the ability to operate independently in a remote-first environment

Preferred Qualifications

  • Experience with ML platforms such as Kubeflow, MLflow, or KubeAI
  • Knowledge of GPU scheduling, CUDA/ROCm optimization, or multi-tenant inference systems
  • Experience with cost optimization across different GPU types and inference workloads
  • Background in early-stage startups or greenfield infrastructure projects
  • Proven experience building production systems from scratch rather than maintaining legacy platforms

Why Join Us

  • Take ownership of critical infrastructure powering a rapidly scaling AI-native cloud platform
  • Build foundational ML inference systems from the ground up in a high-growth, well-funded startup
  • Work at the intersection of distributed systems, GPU computing, and sustainable cloud architecture
  • Gain deep expertise in next-generation AI infrastructure and large-scale model serving systems
  • Influence core engineering decisions and define best practices that will scale with the company

Pragmatike is committed to a fair, transparent, and inclusive recruitment process. We do not discriminate based on age, disability, gender, gender identity or expression, marital or civil partner status, pregnancy or maternity, race, religion or belief, sex, or sexual orientation.

In accordance with GDPR, your personal data will be processed lawfully, fairly, and securely, and used solely for recruitment purposes, including sharing it with our client(s) for employment consideration.
