Role Proficiency:
This role requires proficiency in developing data pipelines, including coding and testing for ingesting, wrangling, transforming, and joining data from various sources. The ideal candidate should be adept in ETL tools like Informatica, Glue, Databricks, and DataProc, with strong coding skills in Python, PySpark, and SQL. This position demands independence and proficiency across various data domains. Expertise in data warehousing solutions such as Snowflake, BigQuery, Lakehouse, and Delta Lake is essential, including the ability to calculate processing costs and address performance issues. A solid understanding of DevOps and infrastructure needs is also required.
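For illustration, a minimal PySpark sketch of the kind of ingest/wrangle/transform/join pipeline work described above. The sources, paths, and column names are hypothetical; a production pipeline would add configuration, error handling, and tests around this core.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_enrichment").getOrCreate()

# Ingest: read raw data from two hypothetical sources
orders = spark.read.option("header", True).csv("s3://raw-bucket/orders/")  # assumed path
customers = spark.read.parquet("s3://raw-bucket/customers/")               # assumed path

# Wrangle/transform: normalize types and drop obviously bad rows
orders_clean = (
    orders
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("amount", F.col("amount").cast("double"))
    .filter(F.col("amount").isNotNull())
)

# Join and aggregate into a curated table
daily_revenue = (
    orders_clean
    .join(customers, on="customer_id", how="left")
    .groupBy(F.to_date("order_ts").alias("order_date"), "region")
    .agg(F.sum("amount").alias("revenue"))
)

# Load: write partitioned output for downstream consumers
daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet("s3://curated-bucket/daily_revenue/")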
Outcomes:
Act creatively to develop pipelines and applications by selecting appropriate technical options, optimizing application development, maintenance, and performance through design patterns and the reuse of proven solutions. Support the Project Manager in day-to-day project execution and account for the developmental activities of others. Interpret requirements and create optimal architecture and design solutions in accordance with specifications. Document and communicate milestones/stages for end-to-end delivery. Code using best standards; debug and test solutions to ensure best-in-class quality. Tune code performance and align it with the appropriate infrastructure, understanding the cost implications of licenses and infrastructure. Create data schemas and models effectively. Develop and manage data storage solutions including relational databases, NoSQL databases, Delta Lakes, and data lakes. Validate results with user representatives and integrate the overall solution. Influence and enhance customer satisfaction and employee engagement within project teams.
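As a small, hedged illustration of the "create data schemas and models" outcome above, the sketch below defines an explicit PySpark schema and enforces it at ingestion. The field names and source path are assumptions, not a prescribed model.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("schema_demo").getOrCreate()

# Explicit data model for a hypothetical orders feed
order_schema = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("customer_id", StringType(), nullable=False),
    StructField("order_ts", TimestampType(), nullable=True),
    StructField("amount", DoubleType(), nullable=True),
])

# Enforcing the schema on read keeps downstream joins and aggregations predictable
orders = spark.read.schema(order_schema).json("s3://raw-bucket/orders/")  # assumed location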
Measures of Outcomes:
• Adherence to engineering processes and standards
• Adherence to schedule/timelines
• Adherence to SLAs where applicable
• Number of defects post-delivery
• Number of non-compliance issues
• Reduction in recurrence of known defects
• Quick turnaround of production bugs
• Completion of applicable technical/domain certifications
• Completion of all mandatory training requirements
• Efficiency improvements in data pipelines (e.g. reduced resource consumption, faster run times)
• Average time to detect, respond to, and resolve pipeline failures or data issues
• Number of data security incidents or compliance breaches
Outputs Expected:
Code:
Develop data processing code with guidance, ensuring performance and scalability requirements are met. Define coding standards, templates, and checklists. Review code for team and peers.
Documentation:
Create/review checklists, guidelines, and standards for design/process/development. Create/review deliverable documents, including design documents, architecture documents, infra costing, business requirements, source-target mappings, test cases, and results.
Configure:
Test:
Create/review test scenarios and execution. Review test plans and strategies created by the testing team. Provide clarifications to the testing team.
Domain Relevance:
Leverage a deeper understanding of business needs. Learn more about the customer domain and identify opportunities to add value. Complete relevant domain certifications.
Manage Project:
Manage Defects:
Estimate:
Estimate effort and plan resources for projects.
Manage Knowledge:
Contribute to and consume knowledge repositories such as SharePoint, libraries, and client universities. Review reusable documents created by the team.
Release:
Design:
Create/review designs (LLD, SAD)/architecture for applications, business components, and data models.
Interface with Customer:
Manage Team:
Certifications:
Skill Examples:
Proficiency in SQL, Python, or other programming languages used for data manipulation. Experience with ETL tools such as Apache Airflow, Talend, Informatica, AWS Glue, Dataproc, and Azure ADF. Hands-on experience with cloud platforms like AWS, Azure, or Google Cloud, particularly with data-related services (e.g. AWS Glue, BigQuery). Conduct tests on data pipelines and evaluate results against data quality and performance specifications. Experience in performance tuning. Experience in data warehouse design and cost improvements. Apply and optimize data models for efficient storage, retrieval, and processing of large datasets. Communicate and explain design/development aspects to customers. Estimate time and resource requirements for developing/debugging features/components. Participate in RFP responses and solutioning. Mentor team members and guide them in relevant upskilling and certification.
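As one hedged example of testing a pipeline against data quality specifications, the sketch below runs simple assertions over a pipeline output. The path, columns, and thresholds are assumptions; a real project might instead use a dedicated framework such as Great Expectations or Deequ.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq_check").getOrCreate()
df = spark.read.parquet("s3://curated-bucket/daily_revenue/")  # assumed pipeline output

row_count = df.count()
null_revenue = df.filter(F.col("revenue").isNull()).count()
negative_revenue = df.filter(F.col("revenue") < 0).count()

# Basic quality gates on the curated table
assert row_count > 0, "pipeline produced no rows"
assert null_revenue == 0, f"{null_revenue} rows have NULL revenue"
assert negative_revenue == 0, f"{negative_revenue} rows have negative revenue"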
Knowledge Examples:
Knowledge of various ETL services used by cloud providers, including Apache PySpark, AWS Glue, GCP DataProc/Dataflow, Azure ADF, and ADLF. Proficient in SQL for analytics and windowing functions. Understanding of data schemas and models. Familiarity with domain-related data. Knowledge of data warehouse optimization techniques. Understanding of data security concepts. Awareness of patterns, frameworks, and automation practices.
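To make the "SQL for analytics and windowing functions" item concrete, here is a small illustrative example that computes a running total per region with PySpark's Window API and with the equivalent window SQL. The data and column names are made up.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("window_demo").getOrCreate()

daily_revenue = spark.createDataFrame(
    [("2024-01-01", "EU", 120.0), ("2024-01-02", "EU", 90.0), ("2024-01-01", "US", 200.0)],
    ["order_date", "region", "revenue"],
)

# Running total per region, ordered by date
w = Window.partitionBy("region").orderBy("order_date").rowsBetween(Window.unboundedPreceding, Window.currentRow)
running = daily_revenue.withColumn("running_revenue", F.sum("revenue").over(w))

# The same computation expressed as window SQL
daily_revenue.createOrReplaceTempView("daily_revenue")
running_sql = spark.sql("""
    SELECT order_date, region, revenue,
           SUM(revenue) OVER (PARTITION BY region ORDER BY order_date
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_revenue
    FROM daily_revenue
""")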
Additional Comments:
Senior Data Engineer: UST Global® is looking for a highly energetic and collaborative Senior Data Engineer with experience leading enterprise data projects around Business and IT operations. The ideal candidate should be an expert in leading projects in developing and testing data pipelines and data analytics efforts, with proactive issue identification and resolution mechanisms using traditional, new, and emerging technologies. Excellent written and verbal communication skills and the ability to liaise with everyone from technologists to executives are key to success in this role.
As a Senior Data Engineer at UST Global, this is your opportunity to:
• Assemble large, complex sets of data that meet non-functional and functional business requirements
• Identify, design, and implement internal process improvements, including re-designing infrastructure for greater scalability, optimizing data delivery, and automating manual processes
• Build the infrastructure required for optimal extraction, transformation, and loading of data from various data sources using GCP/Azure and SQL technologies
• Build analytical tools that utilize the data pipeline, providing actionable insight into key business performance metrics, including operational efficiency and customer acquisition
• Work with stakeholders, including the Executive, Product, Data, and Design teams, to support their data infrastructure needs and assist with data-related technical issues
• Oversee the integration of new technologies and initiatives into data standards and structures
• Strong background in data warehouse design
• Strong knowledge of Scala, Spark, PySpark, Python, and SQL
• Experience in cloud platform (GCP/Azure) data migration: source/sink mapping, pipeline builds, workflow implementation, ETL, and data validation processing
• Strong verbal and written communication skills to effectively share findings with stakeholders
• Experience in data analytics, optimization, and machine learning techniques is an added advantage
• Understanding of web-based application development tech stacks like Java, ReactJS, and NodeJS is a plus
Key Responsibilities:
• 20% requirements and design
• 60% coding and testing
• 10% reviewing code written by developers, analyzing and helping to solve problems
• 10% deployments and release planning
You bring:
• Bachelor’s degree in Computer Science, Computer Engineering, or a software-related discipline; a Master’s degree in a related field is an added plus
• 6+ years of experience in data warehousing and Hadoop/Big Data
• 3+ years of experience in strategic data planning, standards, procedures, and governance
• 4+ years of hands-on experience in Scala
• 4+ years of experience in writing and tuning SQL and Spark queries
• 3+ years of experience working as a member of an Agile team
• Experience with Kubernetes and containers is a plus
• Experience in understanding and managing Hadoop log files
• Experience with Hadoop’s multiple data processing engines, such as interactive SQL, real-time streaming, data science, and batch processing, handling data stored in a single platform on YARN
• Experience in data analysis, data cleaning (scrubbing), data validation and verification, data conversion, data migrations, and data mining
• Experience in all phases of the data warehouse life cycle: requirement analysis, design, coding, testing, deployment, and ETL flow
• Experience in architecting, designing, installing, configuring, and managing Apache Hadoop clusters
• Experience in analyzing data in HDFS through MapReduce, Hive, and Pig is a plus
• Experience building and optimizing ‘big data’ data pipelines, architectures, and data sets
• Strong analytic skills related to working with unstructured datasets
• Experience in migrating big data workloads
• Experience with data pipeline and workflow management tools such as Airflow (see the sketch after this list)
• Cloud administration
For this role, we value:
• The ability to adapt quickly to a fast-paced environment
• Excellent written and oral communication skills
• A critical thinker who challenges assumptions and seeks new ideas
• Proactive sharing of accomplishments, knowledge, lessons, and updates across the organization
• Experience designing, building, testing, and releasing software solutions in a complex, large organization
• Demonstrated functional and technical leadership
• Demonstrated analytical and problem-solving skills (the ability to identify, formulate, and solve engineering problems)
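As referenced in the Airflow bullet above, here is a minimal, illustrative Airflow 2.x DAG skeleton. The DAG id, schedule, and task bodies are placeholders rather than a prescribed design.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # placeholder: pull raw data from a source system
    pass

def transform():
    # placeholder: run the Spark/SQL transformations
    pass

def load():
    # placeholder: publish curated tables to the warehouse
    pass

with DAG(
    dag_id="daily_revenue_pipeline",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load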