Salary Range
SGD 60,000 - SGD 72,000/year
SGD 5,000 - SGD 6,000/month
Skills Required
Dashboards, Service Design, Security Compliance, Triage, Evidence Management, Configuration Tracking, Traceability, Operational Risk Management, Supply Chain Security, RCA, Low Latency, Concurrent Programming
Job Description
Responsibilities:
- Design & own the service observability usage model: ensure all service metrics, logs, and traces flow into Elastic Cloud (the authoritative store); maintain dashboards & SLOs; evaluate pragmatic use of CloudWatch and AWS Managed Prometheus / Grafana for supplemental or fallback views.
- Build proactive, noise-reduced alerting and incident response playbooks; drive post-incident RCA & remediation tracking against a closure SLA.
- Optimize service performance (profiling, caching layers, autoscaling heuristics, concurrency tuning) to meet latency & throughput targets.
- Implement secure supply chain & runtime controls (image scanning, SBOM consumption, secrets management, TLS / mTLS) leveraging shared platform tooling.
- Curate operational runbooks, golden dashboards, and reliability & production readiness checklists.
- Integrate model / guardrail service telemetry (latency, queue depth, GPU/CPU utilization) into unified Elastic Cloud views.
- Support compliance & audit evidence collection (access logs, config lineage, change histories) via automated evidence capture fed into Elastic.
- Introduce configuration drift detection & policy-as-code guardrails (OPA / Kyverno) at the workload / namespace layer to enforce baseline controls.
- Mentor engineers on production readiness, observability patterns, and operational excellence; evolve on-call playbooks.
- Participate in (and improve) an equitable on-call rotation, focusing on sustainable alert volumes & burnout prevention.
Requirements:
- 4+ years (or equivalent impact) in SRE / Production Ops / Platform / Reliability for SaaS or high-throughput services.
- Working knowledge of AWS & Kubernetes (deployment, troubleshooting, networking concepts) sufficient to collaborate effectively with platform owners; owning cluster upgrade orchestration is not required.
- Familiarity with Infrastructure as Code & GitOps (Terraform, Argo, etc.) to consume modules, review changes, and enforce policy.
- Observability implementation & usage (metrics, logs, traces, profiling) with Elastic Cloud; understanding of Prometheus / OpenTelemetry concepts.
- Proven on-call & incident management experience (triage, MTTR reduction, RCA authorship).
- Scripting / automation in Python, Bash, or Go for ops tooling.
- Security & compliance awareness: vulnerability management, image scanning, supply chain controls.
- Clear, concise communication of operational risk & trade-offs to technical & non-technical stakeholders.
About FPT ASIA PACIFIC PTE. LTD.