Job Description

Job Summary

Design, build, and maintain scalable, fault-tolerant data pipelines using Python, SQL, and orchestration tools. Manage package dependencies, ensure security compliance, and optimize code for production data workflows.

Responsibilities

Develop modular Python code with robust error handling and logging for data pipelines
Write advanced SQL queries using joins, window functions, and optimization techniques to support data processing
Process and transform data using Pandas and integrate streaming data with Kafka producers and consumers
Design, build, and maintain Directed Acyclic Graphs (DAGs) and flows using orchestration tools such as Apache Airflow and Prefect
Ensure pipelines are idempotent, scalable, and fault-tolerant to support reliable data workflows
Implement logging, monitoring, and alerting mechanisms to maintain pipeline observability and operational health
Manage Python package installations, upgrades, and dependency resolution across development, UAT, and production environments
Maintain dependency manifests (e.g., requirements.txt) with version pinning to ensure environment consistency
Support deployments in restricted or air-gapped environments by managing package and dependency constraints
Analyze vulnerability reports from security scanning tools and remediate security issues by upgrading or replacing vulnerable libraries
Fix broken imports, deprecated APIs, and compatibility issues arising from library updates while maintaining pipeline stability
Collaborate with security teams to ensure compliance with organizational security standards and secure coding practices
Refactor legacy code in data ingestion APIs, data transformation (Pandas/SQL), model training/inference pipelines, and orchestration workflows to improve modularity, readability, and performance
Ensure backward compatibility and minimize disruption to production systems during code changes
Validate data and enforce schema consistency and data quality across pipeline stages
Implement unit and integration tests for data pipelines to ensure reliability before deployment
Troubleshoot pipeline failures, perform root cause analysis, provide production support, and drive continuous improvement of workflows
Handle Kafka schema evolution and message serialization/deserialization to maintain streaming data integrity
Work effectively in regulated or high-security environments, applying security and reliability best practices

Preferred Competencies and Qualifications

Preferably at least 2 to 3 years of experience in data engineering
Prior experience working with production data pipelines
Experience handling dependency conflicts, library upgrades, and refactoring in live systems
Ability to work across multiple layers including API, data processing, orchestration, and machine learning pipelines

Data Engineer

About BASIL TECHNOLOGIES PTE. LTD.