[Remote] AI/LLM Evaluation & Alignment Software Engineer

LeoTech via Jobright
Remote, US, Mid-level, CLT, 22 days ago

Estimated Salary

R$ 11.250,00 - R$ 13.333,00

Job Description

Note: This is a remote position open to candidates in the USA.


LeoTech is passionate about building software that solves real-world problems in the Public Safety sector.


The AI/LLM Evaluation & Alignment Software Engineer will ensure that Large Language Model (LLM) and Agentic AI solutions are accurate and aligned with public safety workflows by designing evaluation frameworks and implementing bias-mitigation strategies.


Responsibilities

  • Build and maintain evaluation frameworks for LLMs and generative AI systems tailored to public safety and intelligence use cases
  • Design guardrails and alignment strategies to minimize bias, toxicity, hallucinations, and other ethical risks in production workflows
  • Partner with AI engineers and data scientists to define online and offline evaluation metrics (e.g., model drift, data drift, factual accuracy, consistency, safety, interpretability)
  • Implement continuous evaluation pipelines for AI models, integrated into CI/CD and production monitoring systems
  • Collaborate with stakeholders to stress test models against edge cases, adversarial prompts, and sensitive data scenarios
  • Research and integrate third-party evaluation frameworks and solutions; adapt them to our regulated, high-stakes environment
  • Work with product and customer-facing teams to ensure explainability, transparency, and auditability of AI outputs
  • Provide technical leadership in responsible AI practices, influencing standards across the organization
  • Contribute to DevOps/MLOps workflows for deployment, monitoring, and scaling of AI evaluation and guardrail systems (experience with Kubernetes is a plus)
  • Document best practices and findings, and share knowledge across teams to foster a culture of responsible AI innovation

Skills

  • Bachelor's or Master's in Computer Science, Artificial Intelligence, Data Science, or related field
  • 3–5+ years of hands-on experience in ML/AI engineering, with at least 2 years working directly on LLM evaluation, QA, or safety
  • Strong familiarity with evaluation techniques for generative AI: human-in-the-loop evaluation, automated metrics, adversarial testing, red-teaming
  • Experience with bias detection, fairness approaches, and responsible AI design
  • Knowledge of LLM observability, monitoring, and guardrail frameworks, e.g., Langfuse, LangSmith
  • Proficiency with Python and modern AI/ML/LLM/Agentic AI libraries (LangGraph, Strands Agents, Pydantic AI, LangChain, HuggingFace, PyTorch, LlamaIndex)
  • Experience integrating evaluations into DevOps/MLOps pipelines, preferably with Kubernetes, Terraform, ArgoCD, or GitHub Actions
  • Understanding of cloud AI platforms (AWS, Azure) and deployment best practices
  • Strong problem-solving skills, with the ability to design practical evaluation systems for real-world, high-stakes scenarios
  • Excellent communication skills to translate technical risks and evaluation results into insights for both technical and non-technical stakeholders

Benefits

  • 3 weeks of paid vacation, right out of the gate
  • Generous medical, dental, and vision plans
  • Sick leave and paid holidays

Company Overview

LeoTech is leading the effort to support public safety around the nation. It was founded in 2018 and is headquartered in Los Angeles, California, USA, with a workforce of 51-200 employees. Its website is https://leotechnologies.com.

Information

Level: Mid-level
Contract: CLT
Location: US
Remote: Yes
Currency: BRL
Published: 22 days ago
Source: Jobright
