Job Description

We are seeking a skilled AI Engineer with 3+ years of experience to implement AI solutions using large language models (LLMs). The role involves managing the end-to-end lifecycle of LLM-based applications, including configuring high-performance inference engines, architecting advanced Agentic AI workflows, and bridging model capabilities with business logic using RAG and CAG patterns.

Responsibilities

Configure and optimize vLLM and other inference frameworks to deliver low-latency, high-throughput model serving that meets performance targets
Design and implement Retrieval-Augmented Generation (RAG) pipelines using vector databases and Cache-Augmented Generation (CAG) strategies to minimize redundant computation and enhance efficiency
Deploy and tune vLLM clusters to provide scalable, high-throughput, low-latency API endpoints for open-source LLMs, ensuring reliable service delivery
Design, develop, and maintain Apache Airflow DAGs and RAGFlow workflows to automate AI lifecycle tasks including data ingestion, automated evaluation, and prompt versioning for continuous improvement
Develop, version-control, and refine system prompts using Chain-of-Thought (CoT) techniques to enhance LLM reasoning capabilities and output quality
Implement CAG strategies to optimize key-value (KV) cache reuse, reducing compute costs for long-context tasks and improving resource efficiency
Author and refine system prompts using Agentic AI techniques to ensure consistent and robust performance across multiple LLM backends

Required Competencies and Certifications

Bachelor’s degree in Information Technology, Computer Science, Finance, or a related field
At least 3 years of hands-on experience working with large language models (LLMs), including expertise in vLLM and model quantization techniques such as AWQ and GPTQ
Strong proficiency in Apache Airflow for designing and scheduling complex AI and data pipelines
Experience with RAGFlow or similar deep-document Retrieval-Augmented Generation frameworks and vector databases
Proven experience building multi-agent AI systems using external APIs and tools to execute multi-step tasks effectively
Advanced programming skills in Python, and practical experience with containerization and orchestration technologies including Docker and Kubernetes
Experience using AI observability tools to monitor and analyze latency, cost, and hallucination rates for model performance optimization

AI Engineer

About ELLIOTT MOSS CONSULTING PTE. LTD.
