Chicago, IL, US
14 hours ago
Lead Site Reliability Engineer
Job Locations: US-TX-Westlake | US-TX-Austin | US-CO-Lone Tree | US-IL-Chicago
Requisition ID: 2025-106791
Posted Date: 1 day ago (1/8/2025 1:40 PM)
Category: Engineering & Software Development
Salary Range: USD $107400.00 - $178900.00 / Year
Application deadline: 1/17/2025
Position Type: Full time

Your Opportunity

At Schwab, you’re empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us “challenge the status quo” and transform the finance industry together.

 

The Client Trading Experience Technology team is essential to the operational reliability of real-time trading applications that run 24x7x365 in locations across the world. We partner with multiple support teams to provide guidance and drive adoption of key reliability engineering practices in support of large-scale, mission-critical trading services. We are looking for skilled candidates who are enthusiastic about learning new and existing technologies to deliver solutions that improve the resiliency of our production systems. The role requires a high level of responsibility and accountability, and it offers a foundational structure for professional development and career growth.

As a Lead Site Reliability Engineer, you will be responsible for proactively preventing production incidents by supporting application releases in our software deployment pipeline. During blameless post-mortems, you will have the opportunity to recommend improvements to monitoring and other production processes and to work with the respective teams to design and implement those recommendations. You will work closely with development teams on symbol updates, server restarts, certificate and system account management, and patching coordination when appropriate. Other key responsibilities include return-to-service activities, on-call rotation, and proactive monitoring.

 

Responsibilities include, but are not limited to:

- Practice a Site Reliability Engineering mindset and solve problems through automation and instrumentation.
- Partner with the Architects, Development Leads, Business Partners, and other SREs on the team to ensure implementations are architected and designed for production resiliency.
- Perform production support and application deployments, and provide a rapid response for critical trading applications.
- Proactively detect, troubleshoot, report, and resolve all issues impacting production applications.
- Proactively perform system monitoring, including reviewing system and application logs.
- Implement and collaborate on solutions that increase the monitoring and observability of systems at scale.
- Work with development teams to provide feedback and perform recommended upgrades.
- Advocate for Schwab’s Reliability Engineering principles, guidelines, and standards.
- Foster a culture of learning through education and knowledge sharing around reliability practices, processes, and tools.
- Participate in on-call escalations during market hours and off-hours.

What you have

To ensure that we fulfill our promise of “challenging the status quo,” this role has specific qualifications that successful candidates should have:

 

Required Qualifications:

- 5+ years of experience with large-scale enterprise system administration, application support, or incident handling.
- 5+ years of experience with RHEL Linux administration or Windows server administration.
- 5+ years of experience, with a proven track record, supporting enterprise production environments while adhering to various DevOps and SRE frameworks.
- 5+ years of experience building application dashboards for proactive monitoring and setting up alerts.
- 5+ years of experience with logging and application monitoring tools (AppDynamics, Splunk, Dynatrace, ThousandEyes).
- 5+ years of experience supporting applications on cloud platforms such as AWS and Pivotal Cloud Foundry (PCF).
- 5+ years of experience using Atlassian tools (Jira, Confluence, Bamboo).

 

Preferred Qualifications:

- Experience researching and building dashboards with Grafana and Prometheus.
- Strong understanding of and experience with Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) offerings such as Pivotal Cloud Foundry (PCF).
- Experience with Continuous Integration/Continuous Delivery (CI/CD) pipelines.
- Understanding of high-availability enterprise systems and of leveraging tools to automate proactive, and eventually predictive, availability solutions.
- Receptive, approachable teammate with the ability to interact positively with business partners, technology teams, offshore teams, and professional services.
- Strong advocate with excellent written and verbal communication skills.

 

In addition to the salary range, this role is also eligible for bonus or incentive opportunities.

Why work for us?

Own Your Tomorrow embodies everything we do! We are committed to helping our employees ignite their potential and achieve their dreams. Our employees get to play a central role in reinventing a multi-trillion-dollar industry, creating a better, more modern way to build and manage wealth.

 

Benefits: We offer a competitive and flexible package designed to help you make the most of your life at work and at home, today and in the future.
