SalaryPeak

AI Engineer

QUESS SELECTION & SERVICES PTE. LTD.
Singapore | 2+ years | Posted Jan 24, 2026

Salary Range

SGD 54,000 - SGD 84,000/year

SGD 4,500 - SGD 7,000/month


Skills Required

TensorFlow, Machine Learning, Airflow, Pipelines, Information Technology, Keras, PyTorch, Python, Docker, API, Scheduling, Bridge, Apache, Databases, Linux, C++

Job Description

Requirements:

  • Bachelor’s degree in Information Technology, Computer Science, Finance, or a related field.
  • At least 3 years of experience with LLMs, including hands-on expertise with vLLM and model quantization (AWQ/GPTQ).
  • Strong proficiency in Apache Airflow for scheduling complex data and AI pipelines.
  • Experience with RAGFlow (or similar deep-document RAG frameworks) and vector databases.
  • Experience building multi-agent systems that use tools and external APIs to complete multi-step tasks.
  • Advanced proficiency in Python, Docker, and Kubernetes.
  • Experience with AI observability tools to track latency, cost, and hallucination rates.

Job Summary

We are looking for a skilled AI Engineer with 3+ years of experience to assist with the implementation of AI solutions. In this role, you will be responsible for the end-to-end lifecycle of LLM-based applications, from configuring high-performance inference engines like vLLM to architecting advanced agentic AI workflows. You will bridge the gap between raw model capabilities and project-specific business logic using Retrieval-Augmented Generation (RAG) and Cache-Augmented Generation (CAG) patterns.

Key Responsibilities

  • Configure and optimize vLLM and other inference frameworks to ensure low-latency, high-throughput model serving.
  • Design and implement RAG pipelines using vector databases and CAG strategies to minimize redundant computation.
  • Deploy and tune vLLM clusters to provide high-throughput, low-latency API endpoints for various open-source LLMs.
  • Design and maintain Apache Airflow DAGs and RAGFlow workflows to automate the end-to-end AI lifecycle, including data ingestion, automated evaluation, and prompt versioning.
  • Develop and version-control sophisticated system prompts, employing techniques like Chain-of-Thought (CoT) to improve reasoning.
  • Implement CAG strategies to optimize KV cache reuse and reduce compute costs for long-context project tasks.
  • Author and refine system prompts using agentic techniques to ensure consistent performance across different LLM backends.