DevOps/MLOps Engineer (ML / LLM Infrastructure)

Київстар via LinkedIn

Remote · All, Missouri, US · Mid-level · CLT · Today

Estimated Salary

R$ 7,425.00 - R$ 11,138.00

Technologies

Python GCP Docker Kubernetes Jenkins Git GitHub Mobile AI


Job Description

We are looking for a DevOps Engineer to design, build, and operate the infrastructure behind our LLM platform.

You will be responsible for keeping our ML infrastructure reliable, scalable, and efficient, from data pipelines to training and inference.

In this role, you will develop and maintain CI/CD pipelines, orchestration workflows, and observability for distributed ML workloads across GPU/TPU/CPU environments.

This is a DevOps-first role with strong exposure to ML infrastructure.

You will work closely with ML Engineers and Data Engineers, while focusing on building a robust, automated, and production-grade platform that accelerates model development and delivery.

About Us

Kyivstar.Tech is a Ukrainian hybrid IT company and a resident of Diia.City.

We are a subsidiary of Kyivstar, one of Ukraine’s largest telecom operators.

Our mission is to change lives in Ukraine and around the world by creating technological solutions and products that unleash the potential of businesses and meet users’ needs.

More than 600 KS.Tech specialists work daily in various areas: mobile and web solutions, as well as the design, development, support, and technical maintenance of high-performance systems and services.

We believe in innovations that truly bring quality changes and constantly challenge conventional approaches and solutions.

Each of us embraces an entrepreneurial culture that keeps us moving, evolving, and creating something new.

Responsibilities

• Design, build, and operate scalable ML infrastructure on GCP (GKE), supporting both experimentation and production workloads for LLMs and NLP systems.
• Manage Kubernetes-based environments (GKE): deployment, scaling, upgrades, and reliability of training and inference workloads across GPU/TPU/CPU pools.
• Build and maintain CI/CD pipelines (GitHub Actions, Jenkins) to automate testing, training, and deployment of ML services and infrastructure.
• Implement infrastructure as code (Terraform, Ansible) to provision and manage cloud resources in a reproducible, secure, and cost-efficient way.
• Ensure observability of ML systems: monitoring, logging, and alerting for infrastructure, pipelines, and production inference workloads.
• Collaborate with ML Engineers and Data Engineers to design and support reliable training and inference pipelines.
• Optimize resource utilization and cost, improving the efficiency of training and serving infrastructure.
• Troubleshoot and resolve issues across the ML platform, from data pipelines to distributed training and production deployments.
• Contribute to engineering best practices: code reviews, automation, and continuous improvement of platform reliability and developer experience.

Required Qualifications

• Experience: 4+ years in DevOps, Platform Engineering, or ML Infrastructure roles, with a strong understanding of production systems and distributed workloads.
• Cloud & Infrastructure: Hands-on experience with GCP; experience with other major cloud platforms is a plus. Strong understanding of cloud-native architectures and experience designing scalable systems for compute- and data-intensive workloads.
• Kubernetes & Containers: Solid experience with Docker and Kubernetes (preferably GKE), including deploying, scaling, and operating production workloads. Familiarity with Helm and Kubernetes networking fundamentals.
• CI/CD & Automation: Experience building and maintaining CI/CD pipelines (GitHub Actions, Jenkins, or similar) to automate testing, deployment, and infrastructure changes.
• Workflow Orchestration: Experience with Airflow (or similar tools).
• Infrastructure as Code: Strong experience with Terraform (preferred) or similar tools for provisioning and managing infrastructure in a reproducible way.
• Programming: Strong hands-on experience with scripting languages (Bash and/or Python).
• Observability & Reliability: Experience with monitoring and logging systems (e.g., Prometheus, Grafana). Understanding of reliability, alerting, and debugging in distributed systems.
• ML Infrastructure Understanding: Familiarity with the ML lifecycle (training, evaluation, inference) and experience supporting ML workloads in production environments.
• Collaboration: Ability to work closely with ML Engineers and Data Engineers, translating ML requirements into reliable and scalable infrastructure solutions.

What We Offer

• Office or remote: it's up to you.
• Remote onboarding
• Performance bonuses
• Learning opportunities through the company's library, internal resources, and partner programs
• Health and life insurance
• Wellbeing program and corporate psychologist
• Reimbursement of Kyivstar mobile communication expenses

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are made by humans.

If you would like more information about how your data is processed, please contact us.


Similar Jobs

AI Development Engineer – DevX Platform, 13+ years exp.

TMS via Remote Rocketship

Remote · 25 days ago

R$ 9k - 14k/month

Mid-level · CLT

Job Description: • Design and build robust backend services and microservices that power the DevX platform ecosystem. • Integrate Large Language Models (LLMs) and custom AI models to enable features like semantic code search, automated refactoring, and natural language infrastructure provisioning. •...

JavaScript TypeScript React Vue Node +14


Lead Backend Developer (Golang, Java or Python)

Brillio via LinkedIn

Remote · New York, New York, US · 6 days ago

R$ 8k - 9k/month

Senior · CLT

Lead Backend Developer (Golang, Java or Python) About Brillio: Brillio is one of the fastest growing digital technology service providers and a partner of choice for many Fortune 1000 companies seeking to turn disruption into a competitive advantage through innovative digital adoption. Brillio, reno...

Python Java Go Rust MySQL +14


Software Engineer- Sr. Consultant (GenAI/Cloud)

Visa via Teal

Remote · Bellevue, Washington, US · 11 days ago

R$ 12k - 19k/month

Senior · CLT

the position Visa’s Technology Organization is a community of problem solvers and innovators reshaping the future of commerce. We operate the world’s most sophisticated processing networks capable of handling more than 65k secure transactions a second across 80M merchants, 15k Financial Institutions...

React Python Java Go MySQL +12

Medical · Dental · Vision


AI & LLM Developer — Senior

Open Insurance via Indeed

Remote · 27 days ago

R$ 16k - 25k/month

Senior · CLT

Location: Remote or Hybrid (if US Located) Employment Type: Contract — Full-Time Department: Engineering / Product Development Experience Level: Senior (5–8+ years) Reports To: Director of Engineering Role Overview We are seeking a highly skilled Senior AI & LLM Developer with deep, hands-on experie...

JavaScript TypeScript Python Java Go +15

Competitive contract compensation commensurate with experience. Pay: From $4,000.00 per month



Information

Level: Mid-level

Contract: CLT

Location: All, Missouri, US

Remote: Yes

Currency: BRL

Published: Today

Source: LinkedIn

AI Job Analysis

Salary estimate, technology match, and requirements analysis generated with artificial intelligence.
