Site Reliability Engineer (AI & Automation)
TOSS-EX PR PTE. LTD.
Salary Range
SGD 96,000 - SGD 132,000/year
SGD 8,000 - SGD 11,000/month
Job Description
Job Summary
We are seeking a highly skilled Site Reliability Engineer (SRE) with strong expertise in AI-driven operations, automation, and cloud platforms. The ideal candidate will be responsible for ensuring high availability, performance, scalability, and reliability of mission-critical systems while leveraging AI/ML and automation tools to enhance operational efficiency and incident management.
Key Responsibilities
Reliability & Operations
- Ensure high availability, scalability, and performance of production systems.
- Define and manage SLIs, SLOs, and SLAs.
- Perform root cause analysis (RCA) and implement preventive measures.
- Manage incident response, problem management, and postmortems.
Automation & AI Integration
- Design and implement AI-driven monitoring, alerting, and anomaly detection solutions.
- Automate repetitive operational tasks using scripts, workflows, and orchestration tools.
- Leverage AIOps platforms to predict and prevent incidents.
- Build self-healing systems using automation frameworks.
Cloud & Infrastructure
- Manage and optimize infrastructure on cloud platforms (GCP/AWS/Azure).
- Implement Infrastructure as Code (IaC) using tools like Terraform or CloudFormation.
- Ensure resilience, failover strategies, and disaster recovery readiness.
Required Skills & Qualifications
Technical Skills
- Strong experience in Linux/Unix systems administration
- Proficiency in Python, Java, or Go for automation
- Hands-on experience with containerization (Docker, Kubernetes)
- Experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI)
- Expertise in cloud platforms (GCP preferred, AWS/Azure acceptable)
- Knowledge of Infrastructure as Code (Terraform, Ansible, Puppet)
AI & Automation
- Experience with AIOps tools (e.g., Dynatrace, Moogsoft, Datadog AI features)
- Understanding of machine learning basics for anomaly detection
- Experience in building or integrating automation frameworks and bots
- Familiarity with chatbots, auto-remediation scripts, and predictive analytics
SRE Practices
- Strong understanding of SRE principles
- Experience with incident management and reliability engineering
- Knowledge of capacity planning and performance tuning