Site Reliability Engineer
ALLEGIS GROUP SINGAPORE PRIVATE LIMITED
Salary Range
SGD 120,000 - SGD 180,000/year
SGD 10,000 - SGD 15,000/month
Job Description
OVERVIEW
We’re hiring a Site Reliability Engineer to support a key global technology client. You’ll join a modern, cloud‑native engineering environment and partner closely with development teams to improve the reliability, scalability, and automation of distributed platforms. The role blends software engineering with reliability ownership: you’ll design and build internal services and tooling, streamline CI/CD, implement Infrastructure‑as‑Code at scale, and strengthen observability so issues are found and fixed before they impact users.
This position offers high autonomy and visibility. You’ll work across well‑documented systems and established tooling, prepare proof‑of‑concepts to influence change, and drive pragmatic automation (in Go or Python) that reduces manual effort and makes releases safer and faster. If you enjoy hands‑on engineering, diagnosing complex problems, and landing improvements in real production environments, this is an opportunity to make a clear and measurable impact.
DESCRIPTION
As a Site Reliability Engineer, you will:
- Build internal platforms, services, and APIs that enable self‑service provisioning, safe deployments, and efficient day‑to‑day operations.
- Enhance CI/CD workflows (e.g., Jenkins or similar) to increase deployment reliability, add guardrails, and improve developer experience and velocity.
- Implement and evolve Infrastructure‑as‑Code using Terraform (and related patterns) to standardize environments, reduce configuration drift, and improve repeatability.
- Define and operationalize SLIs/SLOs and error budgets, build actionable dashboards, and tune alerts to reflect user experience and business risk.
- Operate Kubernetes workloads at scale; improve resilience, performance, and cost‑efficiency through sound engineering and automation.
- Strengthen observability (metrics, logs, traces) using Prometheus and complementary platforms; drive root‑cause analysis and preventative fixes.
- Automate routine work and periodic upgrade cycles (preferably in Go/Python) to eliminate toil and reduce change risk.
- Troubleshoot complex incidents across compute, networking, containers, and deployments; participate in a shared on‑call rotation and contribute to post‑incident reviews.
- Collaborate with engineers, architects, and product stakeholders to translate requirements into secure, observable, and scalable infrastructure solutions.
- Document patterns and best practices; mentor teams on reliability‑first ways of working and platform standards.
QUALIFICATIONS
- Strong hands‑on experience with AWS (production environments) and cloud‑native architectures; familiarity with hybrid or multi‑cloud concepts is a plus.
- Practical expertise operating Kubernetes (deployments, day‑2 operations, and troubleshooting).
- Solid CI/CD skills with Jenkins or similar tools (pipeline design, release safety, rollbacks).
- Proficiency in Infrastructure‑as‑Code (Terraform) and Git‑based workflows for environment management.
- Programming/automation in Go and/or Python (production‑quality code; tooling and services, not just scripts).
- Observability experience with Prometheus and dashboards/alerting tuned to SLIs/SLOs; familiarity with platforms such as Grafana, Datadog, or CloudWatch is welcome.
- Networking fundamentals for distributed systems: DNS, load balancing, VPC design, security groups, and layer‑7 routing/proxies.
- Sound understanding of secure system design (least privilege, secrets management, change control) and performance/reliability trade‑offs.
- Excellent communication skills and the ability to operate independently in distributed, asynchronous teams while influencing stakeholders through clear proposals and POCs.
- 7+ years in SRE/DevOps/Infrastructure/Software Engineering with a track record of operating production‑grade systems at scale.
PROFESSIONAL ATTRIBUTES
- Ownership: You’re accountable across both build and run; you close the loop with measurable outcomes.
- Automation first: You remove toil with durable solutions, not quick fixes.
- Engineering rigor: You apply design patterns, testing, and code reviews to platform work.
- Influence without authority: You use documentation, POCs, and calm communication to align teams.
- Proactive and visible: You work independently across time zones and keep stakeholders informed.
We regret that only shortlisted candidates will be contacted.
EA Registration No: R21103843, Andrew Jonas Matthew
Allegis Group Singapore Pte Ltd, Company Reg No. 200909448N, EA License No. 10C4544
Similar Jobs
Network Operations Engineer
ALLEGIS GROUP SINGAPORE PRIVATE LIMITED
SGD 72,000 - SGD 120,000/yr
Network Engineer
ALLEGIS GROUP SINGAPORE PRIVATE LIMITED
SGD 96,000 - SGD 168,000/yr
DevOps Engineer
ALLEGIS GROUP SINGAPORE PRIVATE LIMITED
SGD 72,000 - SGD 144,000/yr