Middle DevOps Engineer

EPAM Systems, Inc. via Glassdoor
Remote, Brazil, Mid-level, CLT, Today

Estimated Salary

R$ 7.425,00 - R$ 11.138,00

Job Description

We are expanding our delivery team with a Middle DevOps Engineer focused on reliable Kubernetes and Linux platforms for AI and research workloads.


You will help automate and optimize GPU-enabled orchestration with Kubernetes and Volcano, supporting scheduling, quotas, and scripting in Python and Shell in a client-facing environment.


Apply to help build efficient, scalable compute environments.

Responsibilities

• Deploy and operate GPU-enabled Kubernetes clusters and standalone Linux compute environments to keep scheduling and performance efficient
• Implement and support Volcano job scheduling, including queue setup, Pod execution, GPU allocation, and namespace quota enforcement
• Administer Kubernetes environments end-to-end, covering namespaces, RBAC, resource quotas, and workload isolation approaches
• Build and maintain Python and Shell automation to simplify job submission, resource provisioning, and system reporting
• Collaborate with orchestration, optimization, and observability teams to improve scheduling efficiency, capacity utilization, and researcher workflows
• Monitor platform health and resource usage, sharing data and feedback to meet optimization and reporting needs
• Recommend improvements to infrastructure, tooling, and automation workflows to boost performance, scalability, and usability
• Ensure operations provide a smooth and effective experience for researchers running diverse AI and computational workloads

Requirements

• 2+ years of hands-on experience in DevOps or infrastructure engineering roles supporting complex, large-scale environments
• Expert-level knowledge of Kubernetes administration and orchestration, including namespaces, Pod scheduling/distribution, PVC, NFS, and resource quota management
• Practical experience with the Volcano scheduler for GPU job execution, queue configuration, workload prioritization, and Kubernetes integration
• Proven background managing GPU cluster environments in Kubernetes and on standalone Linux compute nodes
• Advanced scripting skills in Python for infrastructure automation, plus proficiency with UNIX Shell scripting (e.g., Bash)
• Strong Linux system administration capability, including troubleshooting, performance tuning, and configuration management
• Solid understanding of infrastructure automation and orchestration concepts and related tooling
• Fluent English communication skills (spoken and written) for direct client interaction

Nice to have

• Helm for Kubernetes application package management
• Monitoring and observability tooling, especially Prometheus, Grafana, and Loki
• Infrastructure as Code tools such as Terraform
• Multi-cloud Kubernetes exposure, including Amazon EKS and Google GKE
• Azure Networking knowledge, including VPN, ExpressRoute, and network security
• Familiarity with AI-assisted coding tools (e.g., GitHub Copilot, ChatGPT, Claude)
• Experience with hybrid (cloud + on-premises) scheduling and resource optimization

Information

Level: Mid-level
Contract: CLT
Location: Brazil
Remote: Yes
Currency: BRL
Published: Today
Source: Glassdoor
