Job Description

· We are seeking a skilled AI Engineer with at least 3 years of hands-on experience in designing, building, and deploying Large Language Model (LLM)-based solutions.

· The ideal candidate will be responsible for the end-to-end lifecycle of AI applications, from high-performance model inference and optimization to the development of advanced Agentic AI workflows using RAG and CAG patterns.

· This role requires close collaboration with product, data, and engineering teams to translate business requirements into scalable, reliable, and cost-efficient AI systems.

Required Skills & Qualifications

· Bachelor’s degree in Information Technology, Computer Science, Finance, or a related field.

· At least 3 years of experience working with Large Language Models (LLMs) in production environments.

· Hands-on expertise with vLLM and model quantization techniques such as AWQ and GPTQ.

· Strong proficiency in Apache Airflow for scheduling and orchestrating complex data and AI pipelines.

· Experience with RAGFlow or similar deep-document Retrieval-Augmented Generation (RAG) frameworks.

· Practical experience with vector databases (e.g., FAISS, Milvus, Pinecone, Weaviate).

· Proven ability to design and implement multi-agent systems that leverage tools and external APIs to perform multi-step tasks.

· Advanced proficiency in Python, Docker, and Kubernetes.

· Experience using AI observability and monitoring tools to track latency, cost, throughput, and hallucination rates.

Key Responsibilities

· Configure, deploy, and optimize vLLM and other inference frameworks to ensure low-latency, high-throughput LLM serving.

· Design and implement RAG pipelines using vector databases and Cache-Augmented Generation (CAG) strategies to reduce redundant computation and improve response quality.

· Deploy and tune vLLM clusters to support scalable, production-grade API endpoints for multiple open-source LLMs.

· Design, implement, and maintain Apache Airflow DAGs and RAGFlow pipelines to automate the AI lifecycle, including data ingestion, indexing, evaluation, and prompt/version management.

· Develop, version-control, and continuously refine system prompts, applying techniques such as Chain-of-Thought (CoT) to improve reasoning accuracy and consistency.

· Implement CAG strategies to optimize KV cache reuse and minimize compute costs for long-context and multi-step AI tasks.

· Build and refine Agentic AI workflows, enabling autonomous task planning, tool usage, and API orchestration across different LLM backends.

· Monitor and analyze AI system performance using observability tools, ensuring reliability, cost efficiency, and controlled hallucination rates.

· Collaborate with cross-functional teams to align AI solutions with business objectives, security standards, and scalability requirements.

Experience Level

· 3+ years of relevant experience in AI/ML engineering, with demonstrated production experience in LLM-based systems.

AI Engineer

About ELLIOTT MOSS CONSULTING PTE. LTD.