Paris, France
7 days ago
Site Reliability Engineer, AI Platform

Algolia was built to help users deliver an intuitive search-as-you-type experience on their websites and mobile apps. We provide a search API used by thousands of customers in more than 100 countries. Billions of search queries are answered every month thanks to the code we push into production every day.

Join the AI Platform: Building core components to speed up AI delivery

The AI Platform is dedicated to enabling AI product delivery by providing other teams with turnkey tools, frameworks, and features, so that they can focus on their core business instead of redundant work that falls outside their expertise. The AI Platform covers two areas: allowing teams to quickly design new models (AI development), and generating and serving predictions in production (AI productionization).

We’re looking for problem solvers with an entrepreneurial mindset—people who focus on outcomes and use data to drive decisions. If you're passionate about reliability, scalability, and automation, and want to contribute to a platform that powers AI at scale, we’d love to hear from you!

The team brings together a variety of roles, from Site Reliability Engineers to Machine Learning specialists, with a strong focus on Data Engineering. Most team members are fully remote and come with different skill sets and backgrounds. Your experience, knowledge, and perspective will add to this diversity and help the team deliver products that make a difference.

Day to day you will:
- Implement, maintain, and improve the infrastructure that powers the AI Platform
- Ensure the reliability and performance of Kubernetes-based deployments across cloud providers (GCP, AWS, Azure)
- Develop and maintain infrastructure as code
- Optimize CI/CD pipelines and deployment processes
- Enhance monitoring, observability, and alerting systems
- Contribute to incident response and post-mortem analysis

You might be a fit if you have:
- Hands-on experience with Kubernetes and container orchestration in production environments
- Experience with cloud providers (GCP, AWS, or Azure)
- Experience with automation and infrastructure as code (e.g., Terraform)
- Solid knowledge of CI/CD pipelines and deployment automation
- Familiarity with monitoring and observability tools (e.g., Datadog)
- A problem-solving mindset and a proactive approach to improving system reliability
- Excellent spoken and written English skills

Ideally, you would also have:
- Programming skills in Go and/or Python
- Exposure to incident response and on-call best practices

We’re looking for someone who can live our values:
- GRIT: Problem-solving and perseverance capability in an ever-changing and growing environment.
- TRUST: Willingness to trust our co-workers and to take ownership.
- CANDOR: Ability to receive and give constructive feedback.
- CARE: Genuine care about other team members, our clients and the decisions we make in the company.
- HUMILITY: Aptitude for learning from others, putting ego aside.

#LI-Remote
