Remote LinkedIn

Site Reliability Engineer Lead (Observabilidade)

Jobgether • São Paulo • 25 applications Today

Estimated salary

R$ 15k - 23k/month

Senior CLT

50%

Curation score

Internal indicator from 0 to 100: salary transparency, stack, usefulness of the description, and signs of listing quality. It is not a match against your CV.

Stack

Kubernetes Python Docker Java AWS Go AI

Job description

Aggregated text for quick reading. Always check the original source when submitting your application.

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Site Reliability Engineer Lead (Observabilidade) in Brazil.

This role is focused on building and leading a high-impact SRE function responsible for platform reliability, observability, and incident management at scale. You will combine technical depth with strong people leadership, guiding a team responsible for defining and evolving observability standards, SLOs, and reliability practices. The environment is fast-paced, cloud-native, and highly data-driven, requiring close collaboration with engineering, platform, and security teams. You will play a key role in ensuring full visibility across systems, reducing operational toil, and improving system resilience. This position involves shaping both strategy and execution, establishing best practices that enable proactive monitoring and faster incident resolution. It is a leadership role for someone passionate about reliability engineering, automation, and building high-performing technical teams.

Accountabilities

Lead, mentor, and develop a high-performing SRE team, fostering collaboration, technical excellence, and continuous learning.
Define the SRE strategy, roadmap, and priorities aligned with cloud and business objectives.
Establish and evolve observability standards, including metrics, logs, and traces across systems and applications.
Drive adoption and governance of SLIs, SLOs, and error budgets for critical services.
Oversee the evolution of observability platforms using tools such as Prometheus, Grafana, OpenTelemetry, Loki, and Tempo.
Design and implement actionable alerting strategies to reduce noise and improve incident response efficiency.
Lead incident management processes, including escalation, war rooms, communication, and post-mortem reviews.
Ensure blameless post-incident analysis and drive systemic improvements based on recurring issues and data insights.
Promote automation initiatives to reduce operational toil and improve engineering efficiency.
Collaborate with Cloud Engineering, Platform Engineering, and Security teams to align reliability initiatives.
Manage team capacity, priorities, and trade-offs while ensuring high-quality delivery.
Report reliability metrics, risks, and team progress to senior leadership.

Requirements

Proven experience leading technical teams such as SRE, DevOps, or Cloud Engineering.
Strong hands-on experience with SRE principles including SLIs, SLOs, error budgets, and toil reduction.
Experience with observability and APM tools such as Datadog, New Relic, or Dynatrace.
Solid knowledge of telemetry systems (metrics, logs, traces) using Prometheus and OpenTelemetry (Grafana ecosystem).
Experience with Infrastructure as Code tools such as Terraform or AWS CDK.
Strong scripting and programming skills in Python, Bash, and at least one language such as Go or Java.
Experience with logging and tracing solutions at scale such as Loki, Tempo, Jaeger, or ELK Stack.
Strong cloud experience, preferably in AWS environments.
Experience with containers and orchestration technologies such as Docker, Kubernetes, or ECS.
Solid understanding of incident management and post-mortem processes.
Strong Linux systems knowledge and troubleshooting skills.
English proficiency for technical reading and writing.
(Nice to have) Experience with FinOps, chaos engineering, AIOps, or large-scale distributed systems.

Benefits

Competitive CLT employment model with stable full-time structure.
Comprehensive health and dental insurance plans.
Life insurance coverage and wellness support programs.
Flexible working hours (8h/day, Monday to Friday).
Home office support, equipment provision, and mobility assistance for remote setup.
Meal and food allowance with flexible usage.
Childcare assistance and extended parental leave policies.
Learning and development support, including courses, books, and education subsidies.
Access to mental, physical, and financial well-being platforms and benefits.
Stock options and performance-based bonuses.
Birthday day off and additional lifestyle perks.

How Jobgether Works

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Why Apply Through Jobgether?

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
