Lead I - Data Engineering
Location: Mumbai
Posted: 10 days ago

Role Summary: We are seeking a Big Data Engineer with a deep understanding of data pipeline development and cloud technologies. The ideal candidate will have strong coding expertise in Python, PySpark, and SQL, along with experience in data warehouse solutions such as Snowflake, BigQuery, and Delta Lake. The role requires proficiency in ETL tools such as Informatica, AWS Glue, Databricks, and Dataproc. The successful candidate will design, develop, and optimize data pipelines, ensuring efficient and scalable data solutions, and will also handle performance tuning, defect management, and ongoing improvement of customer satisfaction.

Key Responsibilities:

Data Pipeline Development & Design:

- Develop and maintain scalable, efficient data pipelines for ingesting, transforming, and joining data from various sources.
- Build ETL solutions using tools such as Informatica, AWS Glue, Databricks, and Dataproc.
- Code in Python, PySpark, and SQL to deliver high-quality, optimized data processing.
- Document and communicate milestones and stages for end-to-end delivery of data pipelines.
- Optimize data pipeline performance, ensuring low resource consumption and fast processing times.

Data Storage & Management:

- Develop and manage data storage solutions, including relational databases, NoSQL databases, and data lakes (Snowflake, BigQuery, Delta Lake).
- Ensure data solutions are cost-effective, scalable, and reliable.
- Perform performance tuning of code and optimize it for the underlying infrastructure.

Collaboration & Customer Engagement:

- Interface with users to clarify requirements, provide solutions, and ensure seamless integration of data solutions.
- Present design options to customers and conduct product demos.
- Collaborate with cross-functional teams to identify opportunities for value addition and improve customer satisfaction through effective data solutions.

Documentation & Configuration:

- Create and review templates, guidelines, and standards for design, processes, and development.
- Develop and review deliverable documents such as design documents, architecture documents, and business requirements.
- Define and govern configuration management plans, ensuring compliance within the team.

Testing & Quality Assurance:

- Create and review unit test cases, test scenarios, and execution plans.
- Collaborate with the testing team to provide clarifications and support during testing phases.
- Track and resolve production bugs quickly and efficiently.

Performance & Estimation:

- Work on performance tuning and optimization of data processes and pipelines.
- Contribute to estimating effort and resource requirements for data-related projects.

Defect & Knowledge Management:

- Perform root cause analysis (RCA) on defects and take proactive measures to improve quality and reduce the recurrence of issues.
- Contribute to project documents, libraries, and knowledge repositories.

Cloud & Big Data Tools Expertise:

- Work with cloud platforms (AWS, Azure, Google Cloud) to optimize data solutions and ETL pipelines.
- Use Big Data frameworks (Hadoop, Spark, Hive, Presto) for efficient data processing and analysis.
- Stay current on trends and best practices in data engineering, cloud technologies, and big data tools.

Skills & Qualifications:

Required Skills:

- 5+ years of experience in Big Data Engineering.
- Expertise in coding with SQL, Python, PySpark, and other programming languages for data manipulation.
- Experience with ETL tools such as Informatica, AWS Glue, Dataproc, and Azure Data Factory (ADF).
- Strong hands-on experience with cloud platforms such as AWS, Azure, or Google Cloud.
- Proficiency in designing and optimizing data warehouses (Snowflake, BigQuery, Delta Lake).
- Expertise in performance tuning and cost optimization for data solutions.
- Familiarity with data security concepts and best practices.

Preferred Skills (Nice to Have):

- Experience migrating reporting solutions to AWS Athena for improved query performance.
- Exposure to Kafka/Debezium for real-time data streaming.
- Understanding of disaster recovery (DR) solutions on AWS.
- Familiarity with CI/CD pipelines and DevOps tools.

Key Metrics for Success:

- Adherence to engineering processes and coding standards.
- Timely delivery of projects in line with schedules and SLAs.
- Reduction of post-delivery defects and quick resolution of production bugs.
- Efficiency improvements in data pipelines (e.g., faster run times, reduced resource consumption).
- Number of data security incidents or compliance breaches.
- Achievement of relevant certifications and completion of mandatory training.

Certifications & Development:

- Complete relevant domain/technology certifications to stay competitive in the field.
- Participate in ongoing training programs to enhance your expertise.