SalaryPeak

Technical Lead - Infrastructure Site Reliability Engineering (Core Java 1.8, Spring Boot, Quartz)

EXASOFT PTE. LTD.
Singapore | 8+ years | Posted 3w ago

Salary Range

SGD 78,000 - SGD 90,000/year

SGD 6,500 - SGD 7,500/month

Skills Required

Kubernetes, XML, Communications Lead, Design Review, Load Balancing, Spring Boot, Core Java, Communication Protocols, Messaging Services, Enterprise Architecture, Hybrid Cloud, Oracle Linux

Job Description

Responsibilities:

  • Lead and mentor a team of SRE engineers and platform developers across distributed locations.
  • Translate architectural requirements into detailed technical solutions and engineering tasks.
  • Review and guide design, code, automation pipelines, Kubernetes deployments, and IaC modules.
  • Architect and optimize enterprise Linux environments, virtualization platforms, clustering, and NFS storage.
  • Lead deployment and tuning of messaging systems including IBM MQ, RabbitMQ, and Erlang/Mnesia engines.
  • Oversee implementation of load balancers (F5 LTM/ASM/ASR), proxies, networking, and hybrid connectivity.
  • Design, build, and optimize cloud and hybrid environments across AWS/Azure/GCP.
  • Lead Kubernetes architecture, resource organization, workload scaling, observability, and resilience patterns.
  • Guide container platform enhancements including service mesh, pod security, and cluster governance.
  • Architect IaC frameworks using Terraform, Ansible, CloudFormation, and Chef.
  • Develop reusable automation standards, modules, libraries, and operational workflows.
  • Drive automation for provisioning, patching, compliance, deployments, and operational tasks.
  • Build platform utilities, automation services, schedulers, and orchestration engines using Core Java, Spring Boot, Quartz, and Erlang.
  • Develop integration modules for messaging systems, connectors, event pipelines, and internal microservices.
  • Implement operational tooling, log processors, alert engines, webhook handlers, and reliability frameworks.
  • Design and maintain observability pipelines using Prometheus, Grafana, Datadog, and Splunk.
  • Ensure reliability, performance, and cluster tuning across relational and NoSQL platforms.
  • Communicate technical decisions, risks, and progress clearly to stakeholders.
  • Support delivery governance and contribute to engineering roadmaps and improvement initiatives.

Requirements:

  • Hands-on experience in Core Java 1.8, Spring, Spring Boot, and Quartz
  • Hands-on experience with messaging systems such as RabbitMQ and IBM MQ
  • Experience in Linux Operating Systems (Oracle Linux 7.6)
  • Experience in Application Servers, preferably IBM WebSphere / Apache Tomcat 8.5.x
  • Experience in Oracle SQL and PL/SQL
  • Strong experience in XML, XSL, XSLT
  • Experience in Hardware Security Modules (HSM) such as payShield 9000 / nShield XC High
  • Experience in communication protocols such as REST, AMQP, JMS
  • Experience in DevOps teams and industry-standard CI/CD tools, including Jenkins, GitLab, Kubernetes, Docker, Terraform, Ansible, CloudFormation, Chef CI, and GitHub Actions, plus scripting in Python, Go, and Bash
  • Cloud exposure on AWS, Azure, or GCP
  • Experience in supporting large, complex, high-availability, high-volume applications
  • Understanding of failover mechanisms and disaster recovery
  • Excellent communication and interpersonal skills
  • Experience in Virtualization Technologies, including Oracle VM (OVM) in LAC / VMware environments
  • Good understanding of infrastructure components, including F5 load balancers
  • Experience in monitoring tools such as Tivoli and Splunk
  • Experience in payments processing systems or the banking/financial services industry
  • Experience in Shell scripting
  • Experience in process-driven environments (e.g., ITIL)
  • 8+ years of engineering experience with a strong infrastructure SRE background.
  • Experience designing and optimizing enterprise infrastructure, high‑availability systems, and messaging platforms.
  • Experience leading developers/SREs, conducting technical reviews, and driving engineering best practices.
  • Proven ability to troubleshoot complex multi‑layered issues across infra, container, messaging, and application ecosystems.
  • Strong history of involvement in incident management, RCA, reliability improvements, and operational readiness.