SalaryPeak

Lead Infrastructure Engineer (OpenShift, AWS, Docker, Terraform, GenAI)

Unison Group
Singapore, Singapore
Posted Apr 8, 2026

Salary not disclosed by employer

Job Description

Overview

We are seeking an experienced Senior GenAI Platform Engineer / OpenShift SME to lead and manage enterprise-scale infrastructure supporting GenAI applications. This role focuses on OpenShift platform engineering, hybrid cloud environments, disaster recovery (DR), and security for highly scalable and resilient AI platforms.

Requirements

• 10+ years of experience in infrastructure engineering / platform engineering.
• Strong expertise in managing OpenShift (OCP) in enterprise production environments.
• Hands-on experience in infrastructure sizing, capacity planning, and performance tuning for AI workloads.
• Experience supporting Oracle Database from an infrastructure/application standpoint.
• Strong knowledge of certificate management, secrets handling, and key management.
• Experience with CI/CD pipelines and infrastructure automation.
• Solid background in security, vulnerability management, and compliance.
• Proven experience in designing and implementing Disaster Recovery (DR) solutions.
• Experience with AWS cloud services and hybrid cloud environments.
• Strong experience with Docker and Kubernetes.
• Excellent coordination and stakeholder management skills across cross-functional teams.

Key Responsibilities

• Lead and manage end-to-end infrastructure for enterprise GenAI applications hosted on OpenShift (OCP).
• Own capacity planning, sizing, and performance optimization of OpenShift clusters and related infrastructure components.
• Manage and optimize infrastructure including Oracle DB, Redis, Elastic DB, PostgreSQL, Dell ECS storage, and Linux environments (RedHat/Ubuntu).
• Design and implement Disaster Recovery (DR) strategies ensuring high availability, resilience, and business continuity.
• Lead E2E DR setup including replication, failover, testing, and documentation in collaboration with infra and network teams.
• Manage certificate lifecycle (TLS/SSL), secrets, and key management across platforms.
• Implement vulnerability management, patching, and remediation across Kubernetes, containers, and infrastructure.
• Support and coordinate penetration testing and address security findings.
• Work with AWS services (EC2, VPC, CloudWatch, Lambda, Bedrock) in hybrid cloud environments.
• Build and maintain infrastructure automation using Terraform and CloudFormation.
• Manage observability using monitoring, logging, alerting tools, and Control-M schedulers.
• Collaborate with DevOps, Security, and Development teams for platform reliability and performance.
• (Bonus) Work with or support open-weight LLM models for AI/ML use cases.