Job Description

OPERATIONS SUPPORT ENGINEER (Consultant/Senior Consultant)

KEY RESPONSIBILITIES:

Infrastructure Management: Design, build, and maintain critical cloud infrastructure platforms encompassing compute, storage, networking, containerisation, virtualisation, DNS, monitoring, and supporting systems across development, staging, and production environments. Monitor and manage cloud monitoring services, including CloudWatch Logs, alarms, synthetic monitoring, and integrated third-party solutions.

Monitoring and Observability: Implement and maintain robust monitoring and observability frameworks for all platform components utilising modern tooling including AWS CloudWatch Canaries, StackOps, Prometheus, Grafana, and ELK stack implementations. Establish comprehensive observability practices to support proactive problem diagnosis and provide actionable insights into system health and performance metrics.

Compliance and Security: Maintain adherence to Whole-of-Government platform standards, compliance frameworks, and security requirements through continuous monitoring using government-approved security and monitoring solutions. Implement security controls, including privileged access management (using tools such as CyberArk), security hardening, and compliance monitoring.

Automation and Infrastructure as Code: Develop and maintain infrastructure using Infrastructure as Code (IaC) tools, including Terraform, Ansible, and AWS CloudFormation, to ensure repeatable, automated, and version-controlled deployments. Apply platform standards when automating infrastructure and adopting modern operational practices to improve efficiency and reliability.

Site Reliability Engineering: Identify and eliminate toil (repetitive, manual operational tasks) to improve developer and infrastructure engineer efficiency, and use error budget management to improve overall system reliability. Define, track, and report on SRE metrics, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets.

Platform Operations: Manage virtualisation platforms, including VMware vSphere and Hyper-V, covering capacity monitoring, performance optimisation, and lifecycle management. Administer AWS services, including EC2, ECS, S3, RDS (PostgreSQL and MS SQL), Lambda, CloudFormation, CloudWatch, IAM, and VPC configurations, together with Docker/Kubernetes container platforms and physical server infrastructure.

Network and System Administration: Demonstrate proficiency with local networking technologies including TCP/IP, DNS, DHCP, VPN configurations, and routing protocols. Execute comprehensive platform patching strategies leveraging automation to maintain security and stability whilst minimising service disruption.

Business Continuity: Maintain backup, disaster recovery, and high availability solutions for critical platform components, including AWS Fault Injection Simulator (FIS) testing and multi-Availability Zone configurations. Support initiatives to containerise traditional workloads and maintain the container orchestration platforms that run them.

Collaboration and Documentation: Collaborate effectively with application teams to support platform stability, performance, and scalability requirements. Create and maintain comprehensive platform documentation, operational runbooks, and standard operating procedures. Support team development through knowledge sharing and mentoring on platform operations and modern infrastructure practices.

SENIOR OPERATIONS SUPPORT ENGINEER - ADDITIONAL REQUIREMENTS

Leadership and Management: Lead infrastructure engineering teams to deliver comprehensive managed services for entire IT infrastructure environments. Direct desktop engineering teams to provide first-level support and technical problem resolution for end-user communities.

Strategic Operations: Oversee and direct daily IT infrastructure operations, ensuring reliable and secure system, service, and application performance. Monitor and manage incident response for business-critical systems, with a focus on timely resolution to prevent operational delays and service outages.

Organisational Engagement: Engage effectively with organisational management, and establish guidelines, policies, and procedures while maintaining strong oversight of their execution. Manage multiple concurrent deadlines as a self-directed professional with strong prioritisation skills.

Operational Excellence: Monitor and respond to data centre issues and incidents, and perform routine operational checks on servers, network devices, storage, and environmental systems. Track IT asset inventory to ensure full equipment accountability and end-of-life management.

Incident and Change Management: Respond promptly to system alerts, alarms, and incidents, escalating to support teams according to defined procedures. Support incident troubleshooting and recovery activities, and manage planned maintenance, change requests, and scheduled outages. Coordinate hardware installation, replacement, and decommissioning, along with media handling and secure storage management.

EXPERIENCE AND SKILLS REQUIRED

Technical Expertise:

Advanced experience with enterprise virtualisation platforms (VMware vSphere, Hyper-V)
Proficiency in Linux and Windows Server administration
Expertise in server monitoring tool installation and regular patching of virtual and physical servers
Ability to perform comprehensive health checks on servers, storage, and virtualisation platforms
Strong experience with infrastructure automation tools (Ansible, Puppet, Chef)
Proficiency with container technologies (ECS, Docker, Kubernetes)
Experience with monitoring and observability platforms
Infrastructure as Code expertise (Terraform, AWS CloudFormation, Ansible)
Solid understanding of networking concepts and technologies
Scripting capabilities in Python, PowerShell, Bash, and Node.js
Experience with high-availability and disaster recovery solutions including AWS FIS
Proficiency with GitHub tools, CI/CD pipeline setup, and workflow management

Professional Qualifications: Bachelor’s degree in Computer Science, Information Technology, or a related technical discipline, with demonstrated experience in infrastructure operations and engineering. Strong understanding of enterprise infrastructure components, with proven experience supporting infrastructure modernisation initiatives.

Core Competencies: Excellent analytical and problem-solving capabilities with strong documentation skills and effective communication abilities for both technical and non-technical stakeholders.

Desired Certifications:

VMware Certified Professional (VCP) for vSphere
Microsoft Certified: Windows Server
Red Hat Certified Engineer (RHCE)
AWS Certified Solutions Architect or AWS Certified SysOps Administrator
Additional certifications in networking, security, or government IT standards

Previous experience in government or highly regulated environments is strongly preferred.

Please note that the salary will be determined based on the candidate’s experience, skills, and overall suitability for the role.

About SEDHA CONSULTING PTE. LTD.