Who We Are
Bloomberg is the global leader in business and financial data. Providing real-time and historical market data to our customers – reliably, accurately, and quickly – is at the heart of what we do, and the Ticker Plant system is the core that makes it happen. Our system processes hundreds of billions of unique market events every single day. We ingest and process events from hundreds of exchanges and thousands of other financial institutions, 24 hours a day, around the world, on millions of financial instruments across all asset classes, including stocks, bonds, commodities, currencies, and crypto. We disseminate corresponding updates to our clients in real time, after the events have been normalized and enriched by our systems. In addition, we respond to billions of requests for current snapshot and historical data every day, retrieved from our petabytes of recorded market history, to which we add terabytes of new data.
The SRE team is central to Ticker Plant's success! We are engineers whose expertise centers on the emergent properties of a large-scale, distributed, real-time market data system. Our mission aligns with our customers' expectations, and we focus on the characteristics of the system they care about, namely:
Correctness - the data a customer sees should accurately reflect the marketplace
Performance - real-time latencies should be minimized; requests should be served without delay
Availability - system components will fail; in a sufficiently large system, parts of it fail all the time, but the system as a whole should not fail
At the scale at which we operate, we cannot achieve these goals without sophisticated monitoring, proactive management, and automated response mechanisms. Thus, we concern ourselves with latency analysis, capacity management, cluster organization, deployment and configuration, fault tolerance, and telemetry. In addition to developing software, we also advise our partner component teams on the development of resilient software, and we analyze and fix system failures as they happen.
What's in it for you:
Design and develop predictive data models for our system capacity
Build systems capable of early detection of issues through metrics and signals, and develop automated correction and remediation strategies
Develop Python/C++ services, libraries and tools that implement our designs
Proactively scale our services to stay ahead of ever-increasing market data demands by driving capacity planning, instrumentation and performance analysis
Define service level objectives and apply them to drive measurable service improvement
Manage entire projects, including meeting with partners and building implementation plans
Share your accomplishments at internal forums and speak at industry conferences (e.g. SRECon)
We’ll trust you to:
Code – to read, debug, and write production-quality code.
Design – write code that integrates with components across the entire system, often in collaboration with component teams. This involves assessing workflows and designing appropriate interfaces that provide consistent access to the vital functionality, and then building the applications that can perform many workflows.
Analyze – SRE is concerned with the behavior of our system. We are often asked to consider the impact of potential changes before they reach production, or to analyze why the system is not behaving as expected.
You’ll need to have:
4+ years working with an object-oriented programming language (C/C++, Python, Java, etc.)
A degree in Computer Science, Engineering, Mathematics, or a similar field of study, or equivalent work experience
An understanding of Computer Science fundamentals such as data structures and algorithms
Prior contributions to system design and architecture, and to scaling fault-tolerant distributed systems
An honest approach to problem-solving, and the ability to collaborate with peers, partners, and management
We’d love to see:
Comfort with data analysis and with quantifying the decision-making process
Monitoring - assessing system health and performance, understanding SLIs and SLOs and alerting mechanisms
Distributed systems - heterogeneity, fault tolerance, network and node failure, local inconsistencies (delays in convergence of shared state)
Cluster management - clusters, deployments, staging, configuration management, A/B testing
Workflow automation through orchestration
Operating systems - processes, threads, and scheduling, file systems, memory management, performance tuning; knowledge of Linux or another POSIX-based system is especially useful
Salary Range = 160000 - 240000 USD Annually + Benefits + Bonus
The referenced salary range is based on the Company's good faith belief at the time of posting. Actual compensation may vary based on factors such as geographic location, work experience, market conditions, education/training and skill level.
We offer one of the most comprehensive and generous benefits plans available and offer a range of total rewards that may include merit increases, incentive compensation (exempt roles only), paid holidays, paid time off, medical, dental, vision, short and long term disability benefits, 401(k) + match, life insurance, and various wellness programs, among others. The Company does not provide benefits directly to contingent workers/contractors and interns.