Performs engineering and systems operational support for business unit-aligned functions including Application Support, Cloud Enablement, Environment Management, Digital tier and Web Operations, Platform Engineering and Basic DMZ Network Operations. Coordinates workflows using Continuous Integration and Continuous Deployment (CI/CD) pipelines and associated technologies. Scripts in PowerShell, Python, and Java. Supports production applications using Docker, cloud computing, and cloud services platforms such as Amazon Web Services (AWS) and Azure. Troubleshoots technology issues using Ping, Traceroute, DNS, routing, and search and analytics engines such as Splunk and Datadog. Monitors system performance and error logs to ensure a high level of application availability. Provides technical expertise and assistance for crisis management during major system outages or issues with major business impact. Consults with application support and development groups on application problems, new releases, new applications, systems, and infrastructure.
Primary Responsibilities:
• Participates in engineering and builds end-to-end systems environments using Linux and Windows operating systems.
• Implements or configures software to optimize operational efficiency.
• Troubleshoots technical problems, identifies gaps in the end-to-end process, and develops automated solutions to mitigate incidents.
• Creates automated alerts to monitor software performance and application availability.
• Develops queries to monitor logs.
• Creates dashboards for users to monitor application performance and health.
• Develops and modifies scripts for software application deployment and specialized utility programs.
• Documents procedures, including application flows and dependency diagrams.
• Develops and maintains monitoring and instrumentation capabilities for existing suite of applications.
• Establishes plans for continuous improvement regarding stability and availability.
• Coordinates the adoption of new tools and processes within the team.
• Escalates incidents requiring support to internal teams (development, network, and hardware) and to vendors as required.
• Coordinates all activities and provides updates to operations and management.
• Onboards new applications and facilitates the upgrade and modification of existing applications.
• Assists in incident management and problem management to identify root causes and coordinate fixes.
• Creates contingency plans covering key areas of vulnerability within the system.
• Coordinates systems installation and configuration, as well as infrastructure changes.
• Develops tools to automate the collection and analysis of operational data.
• Participates in evaluating application designs to assess their reliability, performance, usage, maintainability, and cost of ownership.
Education and Experience:
Bachelor’s degree (or foreign education equivalent) in Computer Science, Engineering, Information Technology, Information Technology and Management, Information Systems, Mathematics, Physics, or a closely related field and three (3) years of experience as a Senior Systems Engineer (or closely related occupation) providing systems services support for Java/Tomcat applications, Docker containers, Kubernetes, and AWS ECS/EKS platforms.
Or, alternatively, Master’s degree (or foreign education equivalent) in Computer Science, Engineering, Information Technology, Information Technology and Management, Information Systems, Mathematics, Physics, or a closely related field and one (1) year of experience as a Senior Systems Engineer (or closely related occupation) providing systems services support for Java/Tomcat applications, Docker containers, Kubernetes, and AWS ECS/EKS platforms.
Skills and Knowledge:
Candidate must also possess:
• Demonstrated Expertise (“DE”) configuring software deployment tools to enable an end-to-end, Continuous Integration/Continuous Delivery (CI/CD) platform; and modifying programming language scripts for software application deployments and targeted utility purposes using Jenkins, Bitbucket, and IBM UrbanCode Deploy.
• DE providing systems services support for Java/Tomcat applications, Docker containers, Kubernetes, and AWS ECS/EKS platforms; and enabling DevOps and Site Reliability Engineering (SRE) practices and principles in multi-Cloud environments using AWS Kubernetes and Amazon ECS.
• DE configuring internet-facing network, traffic-routing, firewall, and webserver environments, and troubleshooting problems across component tiers and operating systems, using F5 and VMware NSX Advanced Load Balancer (Avi).
• DE creating and instrumenting monitors and dashboards to track, alert, and present application metrics, traces, and logs using Splunk and Datadog.