Salary Range
SGD 84,000 - SGD 120,000/year
SGD 7,000 - SGD 10,000/month
Skills Required
Apache Spark, Oracle, Azure, Big Data, Data Modeling, Pipelines, Hadoop, Scripting, ETL, Data Quality, Data Engineering, SQL, SQL Server, Apache, Databases, Business Requirements
Job Description
Job Summary
As a Data Engineer at Statworks (S) Pte Ltd, you will lead the development and optimization of data pipelines and cloud-based Data Lake solutions. You will drive the implementation of standardized data models and enable a unified customer view, collaborating with cross-functional teams to deliver scalable, high-quality data solutions that support business objectives.
Responsibilities
- Design and build scalable data pipelines to ingest diverse data from organizational, social media, and public sources for downstream analytics consumption
- Collaborate with cross-functional teams to source, integrate, and make data accessible for business use cases
- Develop and implement effective solution designs aligned with business requirements and technical standards
- Communicate proactively with key stakeholders to identify and address risks, issues, and concerns impacting project delivery
- Manage project timelines, milestones, and deliverables to meet quality standards
- Develop and execute coordinated communication plans for internal and external stakeholders throughout initiative execution
- Oversee handover of projects to business-as-usual operations and conduct post-implementation reviews to validate objectives and capture lessons learned
- Build batch data pipelines using Apache Spark (Spark SQL, DataFrame API) or Hive Query Language (HQL) to process large-scale datasets
- Develop streaming data pipelines using Apache Spark Structured Streaming or Apache Flink with Kafka to enable real-time data processing
- Implement and maintain NoSQL database solutions, including Cosmos DB, to support flexible data storage and retrieval
- Utilize RESTful APIs and GraphQL to facilitate efficient data delivery and integration
- Apply big data ETL processing tools, data modeling, and data mapping techniques to ensure data quality and consistency
- Work with Hadoop ecosystem components and file formats such as Avro, Parquet, and ORC for optimized data storage
- Write and maintain shell/bash scripts to automate data workflows and operational tasks
- Integrate multiple data sources, including relational databases (SQL Server, Oracle, DB2, Netezza), NoSQL/document databases, and flat files
- Employ CI/CD and related delivery tools such as Jenkins, JIRA, Bitbucket, Artifactory, Bamboo, and Azure DevOps to automate deployment and maintain code quality
- Apply DevOps practices using Git version control to support collaborative development and continuous integration
- Debug, fine-tune, and optimize large-scale data processing jobs to enhance performance and reliability
- Analyze complex problems and develop innovative solutions to meet evolving business needs
Required competencies and certifications
- Expertise in the Databricks platform for data engineering and analytics
- Experience with at least one cloud infrastructure provider (Azure or AWS)
- Proficiency in building batch and streaming data pipelines using Apache Spark and Apache Flink
- Familiarity with RESTful APIs and GraphQL for data integration
- Experience with big data ETL tools, data modeling, and Hadoop file formats
- Basic scripting skills in shell/bash
- Experience working with diverse data sources, including relational and NoSQL databases
- Proficiency with CI/CD tools and DevOps practices, including Git version control
- Strong problem analysis and debugging skills for large-scale data environments
Preferred competencies and qualifications
- Certifications related to Data and Analytics (not mandatory but advantageous)