
LLM Data Engineer | United States | Fully Remote

careerbox.42web via Careerbox.42web
Remote | GB | Mid-level | CLT | 10 days ago

Estimated Salary

R$ 7.425,00 - R$ 11.138,00

0 of 100

Average

Job Score

Job Description

We are seeking an experienced AI/LLM Data Engineer to build and maintain the data pipeline for our Generative AI platform.


The ideal candidate will be well-versed in the latest Large Language Model (LLM) technologies and have a strong background in data engineering, with a focus on Retrieval-Augmented Generation (RAG) and knowledge-base techniques.


This role sits in the AI COE within DX Tech & Digital.


As an AI/LLM Data Engineer, you will report to the Director, AI Solutions & Development, who oversees the AI COE.


You will work on highly visible strategic projects, collaborating with cross-functional teams to define requirements and deliver high-quality AI solutions.


The ideal candidate will have a passion for Generative AI and LLMs, with a proven track record of delivering innovative AI applications.


Responsibilities
• Design, implement, and maintain an end-to-end multi-stage data pipeline for LLMs, including Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) data processes
• Identify, evaluate, and integrate diverse data sources and domains to support the Generative AI platform
• Develop and optimize data processing workflows for chunking, indexing, ingestion, and vectorization of both text and non-text data
• Benchmark and implement various vector stores, embedding techniques, and retrieval methods
• Create a flexible pipeline supporting multiple embedding algorithms, vector stores, and search types (e.g., vector search, hybrid search)
• Implement and maintain auto-tagging systems and data preparation processes for LLMs
• Develop tools for text and image data crawling, cleaning, and refinement
• Collaborate with cross-functional teams to ensure data quality and relevance for AI/ML models
• Work with data lakehouse architectures to optimize data storage and processing
• Integrate and optimize workflows using Snowflake and various vector store technologies

Requirements
• Master's degree in Computer Science, Data Science, or a related field
• 3-5 years of work experience in data engineering, preferably in AI/ML contexts
• Proficiency in Python, JSON, HTTP, and related tools
• Strong understanding of LLM architectures, training processes, and data requirements
• Experience with RAG systems, knowledge base construction, and vector databases
• Familiarity with embedding techniques, similarity search algorithms, and information retrieval concepts
• Hands-on experience with data cleaning, tagging, and annotation processes (both manual and automated)
• Knowledge of data crawling techniques and associated ethical considerations
• Strong problem-solving skills and ability to work in a fast-paced, innovative environment
• Familiarity with Snowflake and its integration in AI/ML pipelines
• Experience with various vector store technologies and their applications in AI
• Understanding of data lakehouse concepts and architectures
• Excellent communication, collaboration, and problem-solving skills
• Ability to translate business needs into technical solutions
• Passion for innovation and a commitment to ethical AI development
• Experience building LLM pipelines using frameworks such as LangChain, LlamaIndex, Semantic Kernel, or OpenAI functions
• Familiarity with LLM parameters such as temperature, top-k, and repeat penalty, and with data science metrics and methodologies for evaluating LLM output

Preferred Skills
• Experience with popular LLM/RAG frameworks
• Familiarity with distributed computing platforms (e.g., Apache Spark, Dask)
• Knowledge of data versioning and experiment tracking tools
• Experience with cloud platforms (AWS, GCP, or Azure) for large-scale data processing
• Understanding of data privacy and security best practices
• Practical experience implementing data lakehouse solutions
• Proficiency in optimizing queries and data processes in Snowflake or Databricks
• Hands-on experience with different vector store technologies

Benefits
US employee benefits package.


Apply to this job

Similar Jobs

Remote | San Francisco, California, US | 8 days ago

R$ 15k - 23k/month

Senior | CLT

the Role We’re looking for a motivated Entry-Level Python / AI Engineer to join our growing engineering team in San Francisco. This role is ideal for recent graduates or early-career engineers who are passionate about Python, machine learning, and building real-world AI-driven products. You’ll work ...

• Competitive base salary up to $150,000
• Mentorship from experienced AI and software engineers
• Opportunity to work on cutting-edge AI products

Remote | BR | 14 days ago

R$ 13k - 19k/month

Senior | CLT

Fusemachines Founded in 2013, Fusemachines is a global provider of enterprise AI products and services, on a mission to democratize AI. Leveraging proprietary AI Studio and AI Engines, the company helps drive the clients’ AI Enterprise Transformation, regardless of where they are in their Digital AI...

Remote | BR | 2 days ago

R$ 7k - 11k/month

Mid-level | CLT

We are more than a machine; we are people who transform and create infinite possibilities. We work to simplify and boost business for everyone by offering smart financial solutions. Here, we invest in technology, promote development, and foster innovation to...


Python Developer

SIS Innov & Tech via WhatJobs
Remote | Taguatinga, Tocantins, BR | 5 days ago

R$ 6k - 10k/month

Mid-level | CLT

About the Company: With more than 20 years in the market, we are a strategic Innovation and Digital Transformation consultancy. Our specialty is driving our clients' demands forward by integrating processes, people, and high-performance technology. About the Role: Developer. Solid experience with Pyt...


Information

Level: Mid-level
Contract: CLT
Location: GB
Remote: Yes
Currency: BRL
Posted: 10 days ago
Source: Careerbox.42web

AI Job Analysis

Salary estimate, technology match, and requirements analysis produced with Artificial Intelligence
