Job Description

Job Summary

As a Data Engineer at Statworks (S) Pte Ltd, you will lead the development and optimization of data pipelines and cloud-based Data Lake solutions. You will drive the implementation of standardized data models and enable a unified customer view, collaborating with cross-functional teams to deliver scalable, high-quality data solutions that support business objectives.

Responsibilities

Design and build scalable data pipelines to ingest diverse data from organizational, social media, and public sources for downstream analytics consumption
Collaborate with cross-functional teams to source, integrate, and make data accessible for business use cases
Develop and implement effective solution designs aligned with business requirements and technical standards
Communicate proactively with key stakeholders to identify and address risks, issues, and concerns impacting project delivery
Manage project timelines, milestones, and deliverables to meet quality standards
Develop and execute coordinated communication plans for internal and external stakeholders throughout initiative execution
Oversee handover of projects to business-as-usual operations and conduct post-implementation reviews to validate objectives and capture lessons learned
Build batch data pipelines using Apache Spark (Spark SQL, DataFrame API) or Hive Query Language (HQL) to process large-scale datasets
Develop streaming data pipelines leveraging Apache Spark Structured Streaming or Apache Flink on Kafka to enable real-time data processing
Implement and maintain NoSQL database solutions, including Azure Cosmos DB, to support flexible data storage and retrieval
Utilize RESTful APIs and GraphQL to facilitate efficient data delivery and integration
Apply big data ETL processing tools, data modeling, and data mapping techniques to ensure data quality and consistency
Work with Hadoop ecosystem components and file formats such as Avro, Parquet, and ORC for optimized data storage
Write and maintain shell/bash scripts to automate data workflows and operational tasks
Integrate multiple data sources, including relational databases (SQL Server, Oracle, DB2, Netezza), NoSQL/document databases, and flat files
Employ CI/CD and supporting tools such as Jenkins, Bamboo, Azure DevOps, Bitbucket, Artifactory, and Jira to automate deployment and maintain code quality
Apply DevOps practices using Git version control to support collaborative development and continuous integration
Debug, fine-tune, and optimize large-scale data processing jobs to enhance performance and reliability
Analyze complex problems and develop innovative solutions to meet evolving business needs

Required competencies and certifications

Expertise in Databricks platform for data engineering and analytics
Experience with at least one cloud infrastructure provider (Azure or AWS)
Proficiency in building batch and streaming data pipelines using Apache Spark and Apache Flink
Familiarity with RESTful APIs and GraphQL for data integration
Experience with big data ETL tools, data modeling, and Hadoop file formats
Basic scripting skills in shell/bash
Experience working with diverse data sources including relational and NoSQL databases
Proficiency with CI/CD tools and DevOps practices including Git version control
Strong problem analysis and debugging skills for large-scale data environments

Preferred competencies and qualifications

Certifications related to Data and Analytics (not mandatory but advantageous)
