Chicago, IL, USA
Data Quality Engineer

Department

BSD CTD - User Services - GDC


About the Department

The Center for Translational Data Science (CTDS) at the University of Chicago is a research center whose mission is to develop the discipline of translational data science and apply it to impactful problems in biology, medicine, healthcare, and the environment. We envision a world in which researchers have ready access to the data and tools they need to make data-driven discoveries that increase our scientific knowledge and improve the quality of life. We architect ecosystems of large-scale commons of research data, computing resources, applications, tools, and services so that the broader research community can use data at scale to pursue scientific inquiry and accelerate discovery. Learn more at https://gdc.cancer.gov/, https://gen3.org/, https://stats.gen3.org/, and https://ctds.uchicago.edu/.

This at-will position is wholly or partially funded by contractual grant funding which is renewed under provisions set by the grantor of the contract. Employment will be contingent upon the continued receipt of these grant funds and satisfactory job performance.


Job Summary

The job works independently to perform a variety of activities relating to software support and/or development. Analyzes, designs, develops, debugs, and modifies computer code for end user applications, beta general releases, and production support. Guides development and implementation of applications, web pages, and user interfaces using a variety of software applications, techniques, and tools. Solves complex problems in administration, maintenance, integration, and troubleshooting of the code and application ecosystem currently in production.

The Data Quality Engineer is a problem solver with a background in data integrity and testing who ensures that high-quality data and metadata are distributed to the cancer research community. This is an opportunity to elevate your career working with one of the world's largest collections of harmonized cancer genomic data. This role focuses on the Genomic Data Commons, which is at the forefront of both cutting-edge research and production systems supporting cancer research. You will be the lead engineer for data quality and integrity, joining a team of engineers developing innovative technologies in the pursuit of discovery through data-driven cancer research. You will focus on leading data quality efforts related to data integration, higher-level data products, and distribution to the cancer research community, working across multiple teams to build and automate frameworks such as anomaly detection, reporting, and alerting to ensure data quality. You will gain expertise not only in the data itself but also in the systems used to interrogate the data and understand gaps in data quality. Data and metadata quality has a broad scope, so you are expected to work collaboratively across teams to determine priorities and the best methods for achieving objectives. Additionally, support for end users will be required through user communications and documentation.

Responsibilities

Drive the design of the data QA infrastructure and execution of testing protocols to validate pipelines, integrated datasets, and data products.

Use a combination of exploratory, regression, and automated testing to ensure data quality standards. Assess appropriate inclusion/exclusion of data based on the defined data dictionary.

Assist in evaluation and development of data dictionaries and utilize data specification and code to validate data as it relates to quality.

Assist in data release planning and implementation based on stakeholder requirements and data availability. 

Proactively identify potential data issues and downstream impact. Identify existing data issues and perform research and root cause analyses to determine resolution. Work collaboratively with software engineers, bioinformaticians, and stakeholders to achieve and verify resolution.

Establish and maintain processes and standards to improve data quality assurance and implement efficiencies in data management.

Define measurements and metrics to conduct and present routine data reports to the project team and stakeholders.

Participate in data acquisition and integration planning efforts including data modeling, data dictionary definitions, and data harmonization pipeline development.

Develop a deep understanding of multiple genomic datasets and the technical data management software and processes of the underlying system.

Define data quality and integrity criteria and develop a comprehensive data quality management plan to lead key data QC efforts through team collaboration for all phases of the data management life cycle.

Contribute written knowledge and expertise to system documentation, user documentation, scientific manuscripts, reporting, grant proposals and reports, and presentation materials. Stay abreast of existing and emerging technologies and QC tools in the cancer genomics space.

Use a deep understanding of the data, scientific goals and methodology, and underlying biological and translational concepts in assigned data commons and cloud environments to provide user support in high-profile and difficult cases.

Coordinate on user management and issue resolution with functional teams, including, but not necessarily limited to, operations, development, design, bioinformatics, data science, project management, and information security.

Designs new systems, features, and tools. Solves complex problems and identifies opportunities for technical improvement and performance optimization. Reviews and tests code to ensure appropriate standards are met.

Utilizes technical knowledge of existing and emerging technologies, including public cloud offerings from Amazon Web Services, Microsoft Azure, and Google Cloud.

Performs other related work as needed.


Minimum Qualifications

Education:

Minimum requirements include a college or university degree in a related field.


Work Experience:

Minimum requirements include knowledge and skills developed through 5-7 years of work experience in a related job discipline.


Certifications:

---

Preferred Qualifications

Education:

Bachelor's degree in Computer Science, Informatics, Bioinformatics, Biological Sciences, or a related field.

Master's or doctoral degree in Computer Science, Informatics, Bioinformatics, Biological Sciences, or a related field highly preferred.

Experience:

Experience working in data quality and integrity engineering or testing.

Experience with data modeling, analysis, design, development, testing, and documentation.

Experience with data quality standards and practices.

Experience writing and executing data-centric test cases to validate data.

Experience writing database queries, reading and understanding database queries, and utilizing other database artifacts.

Experience with Python.

Experience working with Linux/Unix systems and basic shell scripting.

Experience with biospecimen and clinical data curation.

Experience with advanced high-throughput genomic technologies.

Experience providing bioinformatics services or support.

Experience using NCI datasets (TCGA, TARGET, and CGCI).

Experience with graph and NoSQL databases.

Preferred Competencies

Ability to lead across a collaborative team environment.

Ability and willingness to acquire new programming languages, statistical and computational methods, and background in research area.

Ability to prioritize and manage workload to meet critical project milestones and deadlines.

Confidentiality related to sensitive matters such as strategic initiatives, trade secrets, quiet periods, and scientific discoveries yet to be put in the public domain.

Ability to take a broad plan and break it into incremental tasks and oversee the completion of each task.

Ability to come into a team used to minimal supervision and oversight and ensure accountability for deliverables and outcomes.

Ability to persuade others to adopt new structures or systems to meet objectives.

Ability to earn the trust of management and the authority needed to successfully coordinate the team.

Working Conditions

Office environment.

Application Documents

Resume (required)

Cover Letter (preferred)


When applying, the document(s) MUST be uploaded via the My Experience page, in the section titled Application Documents of the application.


Job Family

Information Technology


Role Impact

Individual Contributor


Scheduled Weekly Hours

40


Drug Test Required
 

No


Health Screen Required
 

No


Motor Vehicle Record Inquiry Required
 

No


Pay Rate Type

Salary


FLSA Status

Exempt


Pay Range

$98,940.00 - $137,000.00

The included pay rate or range represents the University’s good faith estimate of the possible compensation offer for this role at the time of posting.


Benefits Eligible

Yes

The University of Chicago offers a wide range of benefits programs and resources for eligible employees, including health, retirement, and paid time off. Information about the benefit offerings can be found in the Benefits Guidebook.


Posting Statement
 

The University of Chicago is an Affirmative Action/Equal Opportunity/Disabled/Veterans Employer and does not discriminate on the basis of race, color, religion, sex, sexual orientation, gender, gender identity, national or ethnic origin, age, status as an individual with a disability, military or veteran status, genetic information, or other protected classes under the law. For additional information please see the University's Notice of Nondiscrimination.

 

Staff job seekers in need of a reasonable accommodation to complete the application process should call 773-702-5800 or submit a request via the Applicant Inquiry Form.

 

We seek a diverse pool of applicants who wish to join an academic community that places the highest value on rigorous inquiry and encourages a diversity of perspectives, experiences, groups of individuals, and ideas to inform and stimulate intellectual challenge, engagement, and exchange.

 

All offers of employment are contingent upon a background check that includes a review of conviction history.  A conviction does not automatically preclude University employment.  Rather, the University considers conviction information on a case-by-case basis and assesses the nature of the offense, the circumstances surrounding it, the proximity in time of the conviction, and its relevance to the position.

 

The University of Chicago's Annual Security & Fire Safety Report (Report) provides information about University offices and programs that provide safety support, crime and fire statistics, emergency response and communications plans, and other policies and information. The Report can be accessed online at: http://securityreport.uchicago.edu. Paper copies of the Report are available, upon request, from the University of Chicago Police Department, 850 E. 61st Street, Chicago, IL 60637.
