Cupertino, CA, 95015, USA
19 hours ago
Site Reliability Engineer
Description: The main objective of this role is to support the ramp-up of Task Queue and Pulsar for a platform. The role will eventually support the upgrade and migration of that tool with Pulsar to enhance features for internal users.

WHAT
Proactive capacity monitoring: periodic checks on growth and utilization metrics to predict when additional resources will be needed, or when existing resources can be decommissioned or shrunk. For example, if we see sustained growth in CPU utilization in proprietary tool boxes and it is predicted to reach 100% in 3 months, SRE would help detect that and procure and provision additional hardware. Another example is when the C* SRE informs the dev team that the storage of one of their clusters is above 60% and is expected to reach 100% in 3 months.

WHY
This gives the dev team an opportunity to work on reclaiming some space or to confirm that additional space needs to be provided. This proactive monitoring will apply to all platforms offered by the Workflow Platform. In the case of K8s workloads, SRE will work with the Platform team to automate the autoscaling capabilities of those platforms.

HOW
If migration from Kube to Kubernetes will make autoscaling possible, the SRE team will help the Platform team migrate platforms there.

Reactive 24/7 1st layer support: this is the traditional on-call 1st layer support in which SRE partners get paged when there's an incident and reach out to the Dev team if runbooks are not available or are not enough to solve the issue. Depending on the platform and use cases onboarded, SLAs for the on-call support will vary. Task Queue is expected to onboard an App Store use case for Crystal C for which SLAs may become more aggressive.

What you will be doing
* Pulsar ramp-up: initial work to ramp up Pulsar operation in production
* Get knowledge transfer and ownership of the Pulumi recipes built by the Platform team to bring up Pulsar clusters in Cloud accounts.
* Set up alerts and write runbooks for each of them.
* Help load test, stress test and fine-tune Pulsar properties
* Write K8s operators to autoscale Pulsar infrastructure.
* Adapt Pulumi recipes to eventually deploy Pulsar clusters in proprietary cloud

Top Skills Details:
Experience/Familiarity with the following:
- Platforms: Kubernetes, virtualization, Linux OS
- Dependencies: Pulsar, Kafka, Cassandra, common cloud services (Object Store, Load Balancing)
- Scripting Languages: Java, Python, TypeScript or Go
- Automation Tools: Terraform, Pulumi, Helm Charts, Spinnaker, Puppet, Kustomize
- Business Continuity: Multi-Region design patterns, Blast Radius reduction, GSLB/GTM concepts, Alerting & Incident Management

Experience Level:
6+ years of experience needed.

Additional Benefits:
Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, the benefits available for this temporary role may include the following:
+ Medical, dental & vision
+ Critical Illness, Accident, and Hospital
+ 401(k) Retirement Plan: pre-tax and Roth post-tax contributions available
+ Life Insurance (Voluntary Life & AD&D for the employee and dependents)
+ Short and long-term disability
+ Health Spending Account (HSA)
+ Transportation benefits
+ Employee Assistance Program
+ Time Off/Leave (PTO, Vacation or Sick Leave)

*This posting will close on September 20th.

About TEKsystems:
We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership.
TEKsystems is an Allegis Group company. The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.