Remote, LinkedIn

Site Reliability Engineer - AI Agents

Jobgether, Brazil, 25 applications, posted yesterday

Estimated salary

R$ 7k - 10k/month

Mid-level, CLT

Curation score: 49%

Internal indicator from 0 to 100: salary transparency, stack, useful description, and quality signals in the listing. It is not a match against your CV.

Job description

Aggregated text for quick reading. Always check the original source when submitting your application.

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Site Reliability Engineer - AI Agents based in Brazil.

This role sits at the intersection of platform engineering, site reliability, and applied AI, focusing on the systems that power production-grade AI agents at scale. You will help design, operate, and evolve the infrastructure that enables orchestration, execution, and serving of AI-driven workflows across internal tools and external-facing products. The environment is fast-moving and highly technical, requiring strong production discipline applied to emerging AI technologies. You will work closely with data, ML, and engineering teams to ensure reliability, observability, and scalability of agentic systems. Beyond operations, the role emphasizes building developer-facing platforms, APIs, and SDKs that make AI infrastructure accessible and reusable across teams. This is a high-impact opportunity to shape foundational systems for next-generation AI agent platforms in a globally distributed organization.

Accountabilities

You will be responsible for building and operating the infrastructure backbone that supports AI agent systems in production, ensuring reliability, scalability, and usability across engineering teams.

  • Design, build, and operate cloud-native infrastructure supporting AI agent workflows, including orchestration, execution, and model serving
  • Ensure high reliability, scalability, and observability of distributed agentic systems across internal and external products
  • Develop platform capabilities such as APIs, SDKs, and self-service tools to enable efficient consumption of AI infrastructure
  • Manage compute, deployment, and serving infrastructure for AI and ML workloads in production environments
  • Build and maintain CI/CD pipelines enabling safe, reliable, and rapid deployment of AI services and agent workflows
  • Implement Infrastructure as Code using tools such as Terraform to provision and manage AWS-based environments
  • Design and operate observability systems, including monitoring, alerting, and incident response tailored to AI/ML workloads
  • Define reliability patterns, failure handling mechanisms, and recovery strategies for LLM and agent-based systems
  • Collaborate with AI, Data Engineering, and Product teams to transition experimental prototypes into production-grade systems
  • Manage Kubernetes-based container orchestration environments to ensure efficient scaling and deployment of services
  • Implement security controls and access management best practices across infrastructure layers
  • Document system architecture, operational procedures, and best practices to support platform adoption and knowledge sharing

Requirements

The ideal candidate is a strong infrastructure or SRE engineer with platform engineering experience and exposure to ML or AI-driven systems in production.

  • 5+ years of experience in Site Reliability Engineering, Platform Engineering, Infrastructure Engineering, or similar roles
  • Hands-on experience supporting ML infrastructure, model serving, or MLOps pipelines in production environments
  • Experience building developer platforms, internal tooling, APIs, or SDKs used at scale by engineering teams
  • Strong understanding of platform engineering principles, including self-service infrastructure and developer experience design
  • Proficiency with Infrastructure as Code tools, particularly Terraform
  • Strong experience with Kubernetes and containerized environments (Docker)
  • Solid cloud infrastructure experience, preferably AWS
  • Strong scripting skills (bash/shell) and proficiency in at least one programming language (Python preferred)
  • Experience designing and operating observability, monitoring, and alerting systems
  • Experience with incident response, on-call rotations, and production reliability ownership
  • Strong collaboration skills across data, AI, and engineering organizations
  • High ownership mindset and ability to operate in fast-paced, high-stakes production environments
  • Familiarity with AI agent systems, LLM-based applications, or orchestration frameworks is a strong plus

Benefits

  • Competitive compensation package with performance-based incentives
  • Fully remote working model across eligible countries, including Brazil
  • Comprehensive healthcare coverage (medical, dental, and vision where applicable)
  • Retirement savings programs with employer contributions (where applicable)
  • Flexible PTO policy and paid company holidays
  • Mental health and wellness support programs
  • Learning and development budget for professional and technical growth
  • Opportunity to work on cutting-edge AI agent infrastructure at global scale
  • Distributed, high-ownership engineering culture with strong collaboration across teams
  • Exposure to advanced platform engineering and applied AI systems

How Jobgether Works

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Why Apply Through Jobgether?

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, assessing responses, and identifying potential inconsistencies or verification signals in application materials based on available information. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are made by humans. If you would like more information about how your data is processed, please contact us.
