This Position is Closed

This job is no longer accepting applications.

Senior Site Reliability Engineer (Production Excellence)

Poshmark

Redwood City, California, USA

Engineering & Technical

Posted 5 months ago

Employment Type: Full-Time

Work Location: Remote

About This Role

ABOUT POSHMARK

Poshmark is the leading fashion marketplace where style comes alive through discovery, self-expression, and human connection. Powered by a vibrant community of 165 million members, Poshmark brings real people and taste to shopping through a social experience shaped by shared discovery. Buying and selling fashion feels simple, joyful, and personal, while every item tells its own story. Poshmark empowers sellers to grow meaningful businesses, keeps fashion in circulation longer, and gives shoppers access to unique and trusted finds, from everyday pieces to one-of-a-kind vintage and luxury.

SENIOR SITE RELIABILITY ENGINEER (PRODUCTION EXCELLENCE)

We are looking for a Senior Site Reliability Engineer to serve as the guardian of our complex, web-scale ecosystem. You won't just be "managing" systems; you will be the architect of their health, ensuring they are monitored, automated, and designed to scale flawlessly. The ideal candidate is an SRE purist who believes that automation is the antidote to toil and that deep application knowledge is the key to operating large-scale systems.

6-Month Accomplishments

Audit & Observe: Deep-dive into the Poshmark tech stack and infrastructure requirements.
Automate Toil: Master and improve existing automation tools/frameworks within the CloudOps organization.
Primary Integration: Transition from secondary on-call support to a primary contributor on small to medium-scale architectural projects.

12+ Month Accomplishments

System Ownership: Execute complex communications and infrastructure projects independently.
Precision Alerting: Engineer meaningful alerts and high-fidelity dashboards that reduce "alert fatigue" and focus on system health.
Architectural Evolution: Identify systemic gaps and lead the implementation of infrastructure improvements to bolster uptime.
Incident Leadership: Serve as a core pillar of the on-call rotation, leading incident response and blameless post-mortems.

Responsibilities

Serve as the primary point of accountability for the health, performance, and capacity of mission-critical, internet-facing services.
Partner with development teams beginning at the design phase to ensure all platforms are built with "operability" and "recoverability" at their core.
Improve and exchange tools that automate the deployment and monitoring of custom applications in large-scale UNIX environments.
Thrive in a fast-paced environment where you bridge the gap between "moving fast" and "staying up."
Participate in a structured 12x7 on-call rotation designed to maintain 24/7 support for production environments.

Desired Skills

Battle-Proven Experience: 5–8+ years in a Systems Engineering or Site Reliability role, specifically within a startup or fast-growing environment.
Scale Mastery: Proven track record in a UNIX-based, large-scale web operations role.
Production Support Mindset: Extensive experience providing 24/7 support for high-traffic production environments.
Cloud Architecture: Expert-level experience with AWS, GCP, or Azure.

The SRE Toolkit

CI/CD & Config: Jenkins, Ansible, and Terraform.
Observability: Hands-on experience with Datadog, New Relic, Graphite, or Nagios.
Orchestration: Deep knowledge of Kubernetes and Docker.
Code: Strong scripting/coding skills used for infrastructure-as-code and automation.

Technologies We Use

Languages/Servers: Ruby, JavaScript, Node.js, Tomcat, Nginx, HAProxy.
Data & Messaging: MongoDB, RabbitMQ, Redis, ElasticSearch.
Infrastructure: AWS (EC2, RDS, CloudFront, S3), Kubernetes, Docker.

Note: 1) Poshmark is currently unable to provide visa sponsorship for this position. 2) This is a hybrid role based out of Redwood City, CA.
