As a Cloud Service Reliability Engineer, you will drive the execution and evolution of our Cloud Service Quality (CSQ) framework with a
strong emphasis on service reliability, Cloud operations, and best-in-class customer experience. Your role involves implementing Cloud
Service Quality program components across all CPL Cloud SaaS products, collaborating with cross-functional teams to understand and
evaluate Cloud Service risks, monitor reliability standards, and measure service performance.
Key Responsibilities:
• Service Reliability: Implement Cloud Service Quality program components to provide an objective and measurable
assessment of cloud service health, as well as to identify best practices to improve operational excellence.
• Incident Management: Lead post-incident analysis to continuously improve the reliability and quality of Cloud services by
conducting root cause analysis, implementing corrective and preventative actions for incidents affecting service performance,
and ensuring minimal service disruption during outages.
• Data Analytics and KPIs/Metrics: Develop, maintain, and conduct data analytics by defining and implementing insightful business
metrics, key performance indicators (KPIs), and dashboards using PowerBI. Monitor KPIs for service resiliency (SLA, Mean
Time, Root Cause) and service delivery to inform strategic decisions and drive improvements, including analyzing operational
data to enhance cloud performance.
• SLI/SLO Implementation: Provide expertise to assist teams in identifying and implementing effective Service Level Indicators
(SLIs) and Service Level Objectives (SLOs) to align with business goals and user experience, with a focus on Cloud
operational metrics.
• Managed Supplier Program: Assist in implementing a supplier relationship program for critical cloud service providers:
defining firm metrics and targets for responsiveness, root cause analysis (RCA), and prevention; measuring supplier
performance; setting clear expectations for maintenance and issue resolution; and collaborating with suppliers to enhance
operational reliability.
• Collaboration: Collaborate with cross-functional teams to understand and evaluate cloud service risks, providing
recommendations to enhance resilience and performance.
• Continuous Improvement: Monitor and track progress of continuous improvement actions in both service reliability and Cloud
operational practices, ensuring their effective implementation.
• Reporting: Participate in management meetings and provide quality-related updates and insights to the management team.
Secondary Responsibilities:
• Software Quality Support: Contribute to implementing software quality program components and maintaining quality standards
across our software products.
• PowerBI Maintenance: Support the maintenance of PowerBI visualizations and reports related to software quality metrics.
Qualifications:
• Bachelor’s degree in computer science, engineering, or a related field.
• Proven experience in Cloud Service reliability engineering or a similar role.
• Knowledge of Cloud platforms (e.g., AWS, Azure, GCP) and understanding of Cloud operations best practices.
• Proficiency in PowerBI, data analytics, and scripting or programming.
• Familiarity with QA methodologies and delivery practices, such as DevOps, Scaled Agile, and CI/CD.
• Excellent problem-solving and communication skills.
Education:
• Bachelor’s degree (or similar) with a concentration in a discipline that focuses on problem-solving, data analytics, cloud
service quality, or information systems, or equivalent experience.
Competencies:
• Data-driven decision-making and visualization.
• Microsoft Office Suite: Word, PowerPoint, Excel, PowerBI.