Salary Range
SGD 84,000 - SGD 93,600/year
SGD 7,000 - SGD 7,800/month
Job Description
Role Name: LLM / AI Quality Engineer
Work Location: Serangoon North
Qualifications:
The ideal candidate should possess:
- 3+ years in software testing/QA with strong test methodology and tooling, including hands-on API and performance testing.
- Familiarity with programming (e.g., Python, TypeScript) and experience with CI/CD and version control.
- Cloud basics (AWS/Azure/GCP) and microservices fundamentals.
- Degree / Diploma in CS/IT or equivalent.
Preferred (AI/ML Focus)
- Understanding of ML concepts and MLOps; experience with model validation and monitoring in production.
- Experience with AI-specific security testing and vulnerability assessment.
- Familiarity with any of the following evaluation, guardrail, or observability tools: LangSmith, Weights & Biases, RAGAS, TruLens, Promptfoo, DeepEval, Guardrails/LlamaGuard, Presidio; plus OpenTelemetry-style LLM tracing.
- Practical exposure to Azure OpenAI, Amazon Bedrock, or Google Vertex AI and to model gateways; know-how in quota management and token accounting.
Tooling & Automation
- Modern automation frameworks (e.g., Playwright, Cypress, Selenium), API test tools (Postman/REST Assured), performance tools (k6/JMeter), and CI/CD integration.
- Data evaluation pipelines for RAG (embedding validation, filtering, drift detection).
Traits
- Outcome-oriented, high standards; strong communication and collaboration; customer-focused; proficient in written and spoken English.
Telco Context (Nice-to-Have)
- Experience testing copilots/agents for BSS/OSS, NOC analytics, and enterprise care; ability to tie evaluation KPIs to business metrics such as CSAT, AHT, FCR, and MTTR.
What you will do:
As an LLM / AI Quality Engineer, you will lead the end-to-end evaluation of AI applications (LLM features, RAG systems, and multi-agent workflows) to ensure they meet business outcomes, safety requirements, and platform standards. You will own test design, execution, and reporting across offline, pre-production, and production stages, integrating with CI/CD and working closely with product, data, and platform teams.
1) AI/LLM Evaluation & Test Design
- Define evaluation strategies (golden sets, adversarial suites, regressions), pass/fail gates, and SLOs for quality, safety, latency, and cost.
- Establish rubric-based human reviews (usefulness, faithfulness, safety, clarity) and calibrate annotators.
- Apply LLM-as-judge evaluation where appropriate, with calibration and spot checks.
2) RAG, Retrieval, & Grounding
- Measure retrieval precision/recall, MRR/nDCG, and answer faithfulness to sources; detect hallucinations and citation errors.
- Test chunking, prompt templates, filters, and policy chains; monitor for stale or poisoned content.
3) Agentic & Tool-Use Scenarios
- Validate multi-step plans, tool selection, error recovery, retries, and idempotency for functions with side effects.
- Contract-test JSON schemas and structured outputs across services.
4) Non-Functional, Performance & Cost
- Run token-aware load/soak tests (varying context length, temperature, and batching); track p50/p95/p99 latency, throughput, timeouts, cache hit rate, and cost per successful task.
- Recommend optimizations (prompt/policy changes, retrieval tweaks, caching).
5) Security, Privacy & Safety
- Red-team for prompt injection, data exfiltration, and indirect injection via retrieved content; validate pre- and post-inference guardrails.
- Enforce PII controls, data residency, and compliance checks; align with organizational security testing practices.
6) Observability & CI/CD Integration
- Implement lineage for prompts, datasets, and versions, along with trace-based evals; automate them in CI (pre-merge golden-set tests, nightly adversarial runs), with canary/A-B releases in production and defined rollback criteria.
- Produce clear, decision-ready reports with risk assessments and release recommendations.
7) Project Delivery & Collaboration
- Analyze requirements, enhance test plans with additional cases, prepare environments (including cloud), execute tests per plan, and drive defect resolution.
- Provide regular status updates; manage test activities to schedule; support SIT/UAT and production readiness.
8) Performance, API & Platform Testing (Carry-over)
- Execute API, performance, and load testing for microservices/web services that underpin AI features; integrate automated testing into CI/CD.
9) Team & Standards
- Adopt and improve test standards/methodology; share practices, train teams, participate in peer reviews, and pursue self-directed learning.
About BASIL TECHNOLOGIES PTE. LTD.