Position Details: Senior Site Reliability Engineer - 1259050E

Location: Beaverton, OR
Openings: 1

Description:

Senior Site Reliability Engineer

Every engineering team at Client is responsible for running and operating the software that it builds. Site Reliability Engineers (SREs) standardize and support the rapidly growing teams throughout our organization by assessing their architecture, helping them design scalable services, and fostering excellent operational practices. It is a mission-critical role: ensuring that our systems are always healthy, monitored, automated, and designed to scale.

What makes Reliability Engineering different at Client?

  • It is software engineering! We resolve problems with a focus on making sure they don't happen again. We aim to improve observability into our systems and to automate ourselves out of our jobs.

  • We are engineers who are either embedded in specific development teams, where we drive operational improvements, or part of the CORE Site Reliability Team, where we focus on innovating in the observability and resilience engineering space.

Responsibilities

  • Define roadmap and architecture based on technology and business needs.

  • Build holistic visibility into SLIs, SLOs, SLAs, dependency graphs, and the past performance of software, networks, and systems, so that we can continue to scale without increasing operational burden or toil.

  • Share your knowledge by giving brown-bag sessions and tech talks and by evangelizing appropriate technology and engineering best practices.

  • Build infrastructure and drive projects that break things, with the aim of improving the robustness of production systems.

  • Use the core Site Reliability Engineering principles of monitoring, emergency response, capacity planning, and production readiness reviews to run the platform.

  • Step back to observe patterns and develop innovative tools and automation to minimize toil. Use those learnings to drive the best operational practices.

  • Partner with the broader Client organization to build a culture of rigorously learning from incidents.

  • Unblock, support, and effectively communicate across teams to achieve results.

  • Diagnose production incidents and develop fixes that can be implemented quickly and efficiently.

  • Design and implement the observability strategy for Client Global Technology.

Required Skills

  • Proficiency in Java, with 5+ years of experience.

  • Experience with JavaScript on frontend (React, Angular, etc.) and backend (Node.js) components.

  • 3 years’ experience in building cloud-based enterprise systems, ideally on AWS.

  • Basic understanding of DNS, networking, virtualization, and Linux.

  • Experience with Docker/Containers and Serverless patterns.

  • Expertise in designing, debugging, and running fault-tolerant, large-scale distributed systems.

  • Expertise in using NoSQL datastores to build highly scalable solutions.

  • Experience with messaging (pub-sub) patterns.

  • Good understanding of asynchronous/non-blocking RESTful API approaches and frameworks.

  • Basic understanding of the following tools: ServiceNow, Jira, Jenkins, Splunk, SignalFx, New Relic.

  • Good communication skills.

Good-to-Have Skills

  • Experience with Python or Scala.

  • Experience with test-driven development.

  • Background in ITIL or Lean is a plus.

  • Experience with instrumenting code to add metrics and traces.

  • Demonstrated negotiation and influencing skills.

Education

Requires a bachelor’s degree in Computer Science, Engineering, IT, or a related field; an MBA is a plus. Minimum of 2 years of relevant work experience.
