Menlo Park, CA, USA
14 days ago
NAGIOS ENGINEER
Job seekers, please send resumes to resumes@hireitpeople.com

Must Have Skills:

Experience in Unix or Python scripting.
Experience in DevOps technologies such as Jenkins / Ansible / Puppet / Docker / Kubernetes.

Job Responsibilities:

Operational Performance & Stability: Works with various teams to ensure that the in-scope applications/platforms meet performance and stability requirements. Manages Major Incidents through to Mitigation/Resolution.
Problem Management: Performs Post-Incident Reviews of all Major Incidents and determines the Action Items required to avoid similar issues and minimize downtime in future Incidents.
Monitoring and Metrics: Works with Application Development to ensure that assigned applications/platforms have the appropriate monitoring and metrics in place to measure performance and stability.
Identify Functional and Non-Functional Improvements: Acts as the Operations representative in Value Stream planning and prioritization sessions to ensure that the Operational needs of assigned applications/platforms are addressed. Holds quarterly Operational Performance Reviews with Value Stream management.
Release Planning & Coordination: Works with the SCM and Development teams to ensure that Production releases for in-scope applications/platforms are properly planned and coordinated. This includes holding Change/Release implementation reviews to ensure thorough and appropriate implementation plans; reviewing and signing off on change tickets for the assigned Value Stream; participating in Program Increment Planning Sessions as a liaison for Operations and Infrastructure support; and providing information regarding upcoming critical changes to the Value Stream.
Operational Readiness: Ensures that applications/platforms are Operationally ready for Production. This includes an annual review of all SOPs/Knowledge Articles; a monitoring review for any new Feature launch or other significant change that may impact monitoring; an SOP/Knowledge Article review for any new Feature launch or other significant change that may impact support documentation; and training of Command Center and Application 1st level Support on new SOPs, Knowledge Articles, and any other support-related needs.
Performs Monthly Capacity Analysis of applications/platforms within the Value Stream. Creates and Maintains Operationally focused ELK Dashboards for the Value Stream.

Responsibilities the subcontractor is expected to shoulder and execute:

Actively provide data for and participate in root cause analysis. Share knowledge globally between various teams. Analyze systems and make recommendations to prevent possible incidents. Strive for continuous improvement and make recommendations.