Position Purpose:
The Software Engineer Principal, Reliability Engineering, is responsible for establishing standards for designing and delivering resilient software systems, and for working with product teams to guide the implementation of these standards. In establishing these standards, Software Engineer Principals, RE, collaborate with other Software Engineer Principals and the broader Enterprise Architecture organization to construct standards that meet long-term organizational goals. Software Engineer Principals, RE, are knowledgeable about both general reliability patterns and the specific domain areas they work in. They mentor more junior engineers to grow their capabilities.
Key Responsibilities:
70% Delivery & Execution:
- Collaborates and pairs with other product team members (UX, engineering, and product management) to create secure, reliable, scalable software solutions
- Documents, reviews, and ensures that all quality and change control standards are met
- Writes custom code or scripts to automate infrastructure, monitoring services, and test cases
- Writes custom code or scripts to do "destructive testing" to ensure adequate resiliency in production
- Creates meaningful dashboards, logging, alerting, and responses to ensure that issues are captured and addressed proactively
- Contributes to enterprise-wide tools to drive destructive testing, automation, or engineering empowerment
- Identifies product enhancements (client-facing or technical) to create a better experience for the end users
- Identifies unsecured code areas and implements fixes as they are discovered, with or without tooling
- Identifies, implements, and shares technical solutions that can be used across the organization
- Creates and architects foundational code elements that can be reused many times by a product
- Creates meaningful architecture diagrams and other documentation needed for security reviews or other interested parties
- Defines Service Level Objectives for products to continuously measure their reliability in production and help prioritize backlog work
20% Support & Enablement:
- Fields questions from other product teams or support teams
- Monitors tools and participates in conversations to encourage collaboration across product teams
- Provides application support for software running in production
- Proactively monitors production Service Level Objectives for products
- Works with vendors and the open-source community to help identify and implement feature enhancements in software products
- Works with other product teams to create API specifications and contracts for shared data
- Proactively reviews the performance and capacity of all aspects of production: code, infrastructure, data, and message processing
- Triages high-priority issues and outages as they arise
10% Learning:
- Participates in and leads learning activities around modern software design and development core practices (communities of practice)
- Learns, through reading, tutorials, and videos, new technologies and best practices being used within other technology organizations
- Attends conferences and learns how to apply new technologies where appropriate
Direct Manager/Direct Reports:
Typically reports to the Software Engineer Manager or Sr. Manager, Technology Director or Sr. Director.
Travel Requirements:
Typically requires overnight travel less than 10% of the time.
Physical Requirements:
Most of the time is spent sitting in a comfortable position and there is frequent opportunity to move about. On rare occasions there may be a need to move or lift light articles.
Working Conditions:
Located in a comfortable indoor area. Any unpleasant conditions would be infrequent and not objectionable.
Minimum Qualifications:
- Must be eighteen years of age or older.
- Must be legally permitted to work in the United States.
- Mastery of an object-oriented programming language (preferably Java)
Preferred Qualifications:
- 6-8 years of relevant work experience
- Mastery of effective database selection and data modeling within both SQL and NoSQL paradigms
- Mastery of production systems design including High Availability, Disaster Recovery, Performance, Efficiency, and Security
- Mastery of a cloud computing platform and the associated automation patterns it provides (preferably Google Cloud and Terraform)
- Mastery of a modern container orchestration platform (preferably Kubernetes)
- Mastery of modern observability tooling (preferably Prometheus and OpenTelemetry)
- Mastery of a modern scripting language (preferably TypeScript or Python)
- Mastery of writing SQL queries against a relational database
- Mastery of a version control system (preferably Git)
- Proficient in a CI/CD toolchain (preferably GitHub Actions)
- Proficient in query optimization and troubleshooting
- Proficient in destructive testing methodologies and tools
- Proficient in a modern web application framework
- Proficient in defensive coding practices and patterns for high availability
- Proficient in modern microservice-based architectures and methodologies
- Proficient in successful application of design patterns
- Proficient in test-driven development and effective unit test creation
- Experience with a front-end technology and framework such as HTML, CSS, JavaScript, AngularJS, or React
- Experience in working with 12-factor methodology and understanding its benefits, and able to demonstrate appropriate patterns to more junior team members
Minimum Education:
The knowledge, skills and abilities typically acquired through the completion of a bachelor's degree program or equivalent degree in a field of study related to the job.
Preferred Education:
No additional education
Minimum Years of Work Experience:
6
Preferred Years of Work Experience:
No additional years of experience
Minimum Leadership Experience:
None
Preferred Leadership Experience:
None
Certifications:
None
Competencies:
- Action Oriented: Taking on new opportunities and tough challenges with a sense of urgency, high energy and enthusiasm
- Business Insight: Applying knowledge of business and the marketplace to advance the organization's goals
- Collaborates: Building partnerships and working collaboratively with others to meet shared objectives
- Communicates Effectively: Developing and delivering multi-mode communications that convey a clear understanding of the unique needs of different audiences
- Cultivates Innovation: Creating new and better ways for the organization to be successful
- Drives Results: Consistently achieving results, even under tough circumstances
- Global Perspective: Taking a broad view when approaching issues; using a global lens
- Interpersonal Savvy: Relating openly and comfortably with diverse groups of people
- Manages Ambiguity: Operating effectively, even when things are not certain or the way forward is not clear
- Manages Complexity: Making sense of complex, high quantity, and sometimes contradictory information to effectively solve problems
- Nimble Learning: Actively learning through experimentation when tackling new problems, using both successes and failures as learning fodder
- Optimizes Work Processes: Knowing the most effective and efficient processes to get things done, with a focus on continuous improvement
- Self-Development: Actively seeking new ways to grow and be challenged using both formal and informal development channels
- Situational Adaptability: Adapting approach and demeanor in real time to match the shifting demands of different situations