Job Description:
At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. Responsible Growth is how we run our company and how we deliver for our clients, teammates, communities and shareholders every day.
One of the keys to driving Responsible Growth is being a great place to work for our teammates around the world. We’re devoted to being a diverse and inclusive workplace for everyone. We hire individuals with a broad range of backgrounds and experiences and invest heavily in our teammates and their families by offering competitive benefits to support their physical, emotional, and financial well-being.
Bank of America believes in the importance of both working together and offering flexibility to our employees. We use a multi-faceted approach for flexibility, depending on the various roles in our organization.
Working at Bank of America will give you a great career with opportunities to learn, grow and make an impact, along with the power to make a difference. Join us!
This job is responsible for providing front-line support to end users, responding to issues related to incidents and problem management governance for multiple applications, and leading triage activities on all business-impacting incidents. Key responsibilities include ensuring compliance with incident management and problem management policies and procedures, serving as a focal point for the customer, client, and associate experience, restoring complex production incidents under tight Service Level Agreements, and pursuing root cause and problem resolution follow-ups.
Responsibilities:
Leads production support triage efforts, manages bridge line troubleshooting, engages in technical research, and escalates issues to leadership as needed
Ensures all impacts are accurately recorded and documented in the system of record, oversees that documents and wikis are updated and available for use during triage, and supports the documentation of application flows, upstream/downstream impacts during outages, the customer experience, and contacts for support needs
Identifies and/or validates business impacts through interpretation of monitors, dashboards, and logs to communicate with leadership and vendors
Manages activities to identify incident root cause, resolution, preventative actions, and change requests, and reports on incident data quality
Promotes and enforces production governance during triage/testing and identifies production failure scenarios, vulnerabilities, and opportunities for improvement
Serves as a subject matter expert for applications within a portfolio, leveraging extensive knowledge of application functionalities and application flows
Assesses and prioritizes research requests, ad hoc reports, and offline incidents at the direction of senior team members and delegates work as needed to team members and peers
Required Qualifications:
Experience architecting a large-scale production database platform.
5+ years of experience in database management.
Strong knowledge of Postgres production and contingency replication features and configuration.
Strong knowledge of Postgres HA clustered environments.
Strong hands-on experience with failover/migration and data restores in an HA environment.
Ability to support database patching and provide continuous support for the application.
Proficient in handling crontabs and data backups using pgBackRest.
Proficient in handling Kubernetes/OpenShift clusters for PostgreSQL.
Creating and maintaining documentation, troubleshooting playbooks, testing failover and recovery plans.
Perform regular database maintenance tasks.
Ability to write Ansible playbooks.
PostgreSQL DBA experience in a 24x7 production environment.
Desired Qualifications:
Proficient in one of the following scripting languages: Python, Bash
Skills:
Production Support
Risk Management
Automation
Collaboration
Innovative Thinking
Solution Design
Solution Delivery Process
Stakeholder Management
Shift:
1st shift (United States of America)
Hours Per Week:
40