Site Reliability Optimization Analyst/Lead/Scientist
Movius Interactive
Job Title: Site Reliability Optimization Analyst/Lead/Scientist
Created: 16-Sep-2024
Department: SRE

Job Summary
As an Analyst/Lead/Scientist, you will play a pivotal role in improving our SRE team's ability to proactively identify and resolve issues within the infrastructure of our voice and text messaging platform. You will apply your expertise in AI/ML and data analysis to model system behavior, flag anomalies, and analyze large-scale datasets, driving data-driven optimization and ensuring effective use of ML capabilities.

Key Duties & Responsibilities
1. AI/ML Model Development: Develop and implement AI/ML models that analyze system behavior, identify anomalies, and predict potential issues.
2. Collaborative Problem Solving: Work closely with the SRE and DevOps teams to identify relevant data logs, analyze system behavior, and develop AI/ML models that address issues.
3. Data Analysis and Modeling: Conduct in-depth analysis of large-scale datasets (SQL and NoSQL) to extract insights and build predictive models.
4. Anomaly Detection: Develop robust anomaly detection algorithms that flag unusual system behavior and prevent potential disruptions.
5. Data-Driven Optimization: Optimize system performance and resource allocation based on data-driven insights and AI/ML recommendations.
6. ML Capability Utilization: Ensure effective integration and use of ML capabilities across the SRE team to improve operational efficiency and reliability.
7. Telemetry Data Analysis: Analyze large telemetry datasets from various sources (e.g., call logs, performance metrics, system logs) to identify patterns, trends, and anomalies.
8. Alerting Optimization: Develop and refine alerting rules based on data-driven insights to ensure timely notification of critical issues while minimizing alert fatigue.
9. Proactive Issue Identification: Use data analysis techniques and AI/ML models to identify potential system issues or outages before they occur.
10. Root Cause Analysis: Investigate incidents to identify their root causes and implement preventive measures.
11. Data Visualization: Create clear, informative visualizations that communicate findings to stakeholders and support decision-making.

Qualifications/Skills/Abilities

Minimum Requirements
Formal Education: Bachelor's degree in Computer Science, Information Technology, or a related field with a specialization in data science (or equivalent experience).
Experience (type & duration): 5+ years of experience in data analysis or data science, preferably in a technical or engineering environment. Telecom domain experience is a plus.
Skills:
- Proficiency in data analysis tools (e.g., SQL, Python).
- Strong understanding of statistical concepts and techniques.
- Experience with data visualization tools (e.g., Tableau, Power BI, Kibana, Grafana, Domo).
- Familiarity with cloud-based infrastructure and applications (e.g., AWS, Azure, GCP).
- Ability to work effectively in a fast-paced, collaborative environment.
- Experience with large datasets, log analysis, and tools such as Elastic and Domo (or similar) is a significant advantage.
- Strong knowledge of AI/ML algorithms and frameworks (e.g., TensorFlow, PyTorch).
- Experience with anomaly detection techniques and tools.

Accreditation/Certifications/Licenses (Preferred):
- Advanced degree in data science or a related field.
- Experience in the telecom domain.
- Certification in data science or machine learning.