Ensures the expansion and optimization of VNSNY’s data and data pipeline architecture, as well as the optimization of data flow and collection for cross-functional teams. Participates in data initiatives and ensures that optimal data delivery architecture is consistent across ongoing projects. Architects and designs processes and code frameworks that enable the optimization and re-engineering of VNSNY’s data architecture systems to support the next generation of products and data initiatives. Works under general supervision.
Responsibilities
- Creates and maintains optimal data pipeline architecture, integrating large, complex data sets that meet functional and non-functional business requirements.
- Identifies, designs, and implements internal process improvements, such as automating manual processes, optimizing data delivery, and re-designing infrastructure for greater scalability.
- Builds the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS cloud-native technologies.
- Builds analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
- Works with project stakeholders to assist with data-related technical issues and support data infrastructure needs.
- Works with the data and analytics team to strive for greater functionality in our data systems.
- Participates in special projects and performs other duties, as required.
Qualifications
Education: Bachelor’s degree in Computer Science, Statistics, Informatics, Information Systems or a related field required.
Experience:
- Minimum of five years of experience with advanced SQL, relational databases, and query authoring required.
- Experience building and optimizing big data pipelines, architectures, and data sets required.
- Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement required.
- Strong analytic skills related to working with structured and unstructured datasets required.
- Experience building processes supporting data transformation, data structures, metadata, dependency, and workload management required.
- Working knowledge of message queuing, stream processing, and highly scalable data stores required.
- Experience with relational SQL and NoSQL databases, such as Oracle, SQL Server, MySQL, Postgres, MongoDB, and Snowflake, required.
- Experience with data pipeline and workflow management tools, such as AWS Glue and Airflow, required.
- Experience with AWS cloud services, such as EC2, RDS, S3, Athena, Redshift, and DynamoDB, required.
- Experience with stream-processing systems, such as Storm and Spark Streaming (PySpark), required.
- Experience with object-oriented/functional languages, such as Python, Java, C++, and Scala, required.