
Site Reliability Engineer 3

Work from home Full-time role Hiring

About the position

The Site Reliability Engineering team designs and builds the global infrastructure on which MongoDB deploys its services, focusing on the flagship MongoDB Atlas platform. As customers grow and globalize, services must satisfy demands for low-latency requests around the globe and comply with various data sovereignty requirements. The SRE Team’s mission is to build this increasingly complex infrastructure, while continually lowering the operational burden associated with it and increasing internal visibility into the health of the system. They are strong believers in infrastructure-as-code and self-healing systems. The SRE Team is fully integrated with all other engineering teams, working closely together with a soft and traversable boundary between their areas of responsibility. Candidates based in New York City are sought for a hybrid working model.

MongoDB is built for change, empowering customers and people to innovate at the speed of the market. They have redefined the database for the AI era, enabling innovators to create, transform, and disrupt industries with software. MongoDB’s unified database platform, the most widely available, globally distributed database on the market, helps organizations modernize legacy workloads, embrace innovation, and unleash AI. Their cloud-native platform, MongoDB Atlas, is the only globally distributed, multi-cloud database and is available across AWS, Google Cloud, and Microsoft Azure. With offices worldwide and nearly 60,000 customers, including 75% of the Fortune 100 and AI-native startups, relying on MongoDB for their most important applications, they are powering the next era of software.

MongoDB's Leadership Commitment guides decisions, interactions, and success. To drive personal growth and business impact, they are committed to developing a supportive and enriching culture for everyone, offering employee affinity groups, fertility assistance, and a generous parental leave policy to support employee wellbeing and professional and personal journeys.

MongoDB is committed to providing necessary accommodations for individuals with disabilities within their application and interview process and provides equal employment opportunities to all employees and applicants.

Responsibilities

  • Design and build the infrastructure for a global cloud service that comprises hundreds of thousands of MongoDB clusters, processes a billion metrics per day, and replicates tens of billions of database writes to our backup service
  • Design, implement, and troubleshoot the automation and monitoring of services that seamlessly span the globe, including several cloud providers
  • Become an expert in infrastructure performance, helping us optimize from the application level all the way through the firmware
  • Build for resilience. Our goal is that nobody’s pager goes off, ever. Are we there yet? No. Are we really close? Very. While we work on that, participate in a weekly on-call rotation
  • Improve our infrastructure capabilities, optimizing for cost, simplicity, and maintainability

Requirements

  • 3+ years of experience running a mission-critical service at scale in a Linux environment
  • Firm grasp of at least one modern programming language, beyond basic scripting
  • Familiarity with web and network protocols and standards (HTTP, TLS, DNS, etc.)
  • Bachelor’s degree in Computer Science or equivalent experience
  • Experience writing automation tools and eagerness to "automate all the things"

Nice-to-haves

  • Experience building large applications from scratch, complete with CI/CD infrastructure
  • Experience in networking, security, hardware, or OS performance tuning
  • Experience with at least one of the major cloud providers (Amazon Web Services, Google Cloud, Microsoft Azure)
  • Experience managing Kubernetes clusters or some other container orchestration infrastructure
  • Experience with observability of large-scale distributed systems

Benefits

  • Fertility assistance
  • Generous parental leave policy
  • Employee affinity groups

