Aggregated text for quick reading. Always check the original source when submitting your application.
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Tech Manager - SRE in Brazil.
This is a high-impact technical leadership role responsible for shaping and scaling Site Reliability Engineering practices within a fast-growing B2B SaaS environment. You will play a central role in evolving operational maturity, ensuring system reliability, and enabling the company’s next phase of global scalability. The role combines hands-on technical depth with people leadership, focusing on building a culture of reliability, observability, and data-driven decision-making. You will lead initiatives across incident management, SLO/SLI implementation, and error budget governance while transforming operations into a strategic driver of product excellence. Working closely with engineering, product, and customer-facing teams, you will help bridge technical operations with business outcomes. This is a remote-first position designed for a leader who thrives in complex, high-scale, and high-responsibility environments.
Accountabilities
- Lead the design and implementation of modern SRE practices, including SLOs, SLIs, error budgets, and blameless postmortems.
- Build and scale a reliability engineering culture that integrates engineering, operations, and customer feedback into a unified system.
- Oversee incident management and response processes, ensuring clear communication, rapid resolution, and continuous learning from failures.
- Develop and evolve observability frameworks, runbooks, and monitoring systems to support global-scale operations.
- Drive Customer Reliability Engineering practices by connecting customer support insights to product and engineering roadmaps.
- Establish data-driven operational governance using reliability, performance, and DORA metrics to guide engineering decisions.
- Lead, mentor, and develop technical teams, growing future tech leads and engineering leaders within the organization.
- Collaborate with senior stakeholders to align reliability strategy with product, business, and customer success goals.
- Champion the adoption of AI-driven operations, including copilots, agents, and automation in incident management and support workflows.
Requirements
- Proven experience leading technical teams in cloud-native environments, in functions such as DevOps, SRE, or Cloud Operations.
- Strong background in AWS architecture, including observability, security, scalability, and cost optimization.
- Hands-on experience with modern SRE principles, including SLO/SLI management, error budgets, and incident response frameworks.
- Experience leading major production incidents, including executive-level communication and coordination.
- Strong people management experience with a focus on developing high-performing engineering teams.
- Deep technical understanding of distributed systems, cloud infrastructure, and operational reliability.
- Experience with Infrastructure as Code (Terraform), CI/CD pipelines (GitHub), and observability tools such as New Relic.
- Strong communication skills with the ability to translate between technical and executive-level discussions.
- Experience operating in high-pressure, SLA-driven SaaS environments.
Differentials
- Experience in SaaS companies with embedded Customer Support + SRE models (e.g., Stripe-like environments).
- Exposure to AI-driven operations such as agents, copilots, or intelligent automation in incident workflows.
- AWS certifications (Solutions Architect, Security Specialty, etc.).
- Experience in multi-geography, high-scale SaaS environments with strict SLAs.
- Track record of transforming existing teams and processes without disruptive restructuring.
Benefits
- Remote-first work model with flexibility.
- Meal and food allowance via a flexible card system.
- Comprehensive health and dental insurance coverage.
- Home office support allowance.
- Wellness and mental health support programs.
- Education and professional development support.
- Life insurance coverage.
- Extended parental leave policies (including maternity and paternity benefits).
- Total Pass fitness and wellness access.
- Birthday day off.
- Strong influence on strategic decisions in a high-growth environment.
How Jobgether Works
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Why Apply Through Jobgether?
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.