This Position is Closed

Senior Site Reliability Engineer (Production Excellence)

Poshmark | Redwood City, California, USA | Engineering & Technical

Posted 2 weeks ago

Employment Type: Full-Time

Work Location: Remote

About This Role

ABOUT POSHMARK

Poshmark is a leading fashion resale marketplace powered by a vibrant, highly engaged community of buyers and sellers and real-time social experiences. Designed to make online selling fun, more social and easier than ever, Poshmark empowers its sellers to turn their closet into a thriving business and share their style with the world. Since its founding in 2011, Poshmark has grown its community to over 130 million users and generated over $10 billion in GMV, helping sellers realize billions in earnings, delighting buyers with deals and one-of-a-kind items, and building a more sustainable future for fashion. For more information, please visit www.poshmark.com, and for company news, visit newsroom.poshmark.com.

SENIOR SITE RELIABILITY ENGINEER (PRODUCTION EXCELLENCE)

We are looking for a Senior Site Reliability Engineer to serve as the guardian of our complex, web-scale ecosystem. You won't just be "managing" systems; you will be the architect of their health, ensuring they are monitored, automated, and designed to scale flawlessly. The ideal candidate is an SRE purist who believes that automation is the antidote to toil and that deep application knowledge is the key to operating large-scale systems.

6-Month Accomplishments

  • Audit & Observe: Deep-dive into the Poshmark tech stack and infrastructure requirements.
  • Automate Toil: Master and improve existing automation tools/frameworks within the CloudOps organization.
  • Primary Integration: Transition from secondary on-call support to a primary contributor on small to medium-scale architectural projects.

12+ Month Accomplishments

  • System Ownership: Execute complex communications and infrastructure projects independently.
  • Precision Alerting: Engineer meaningful alerts and high-fidelity dashboards that reduce "alert fatigue" and focus on system health.
  • Architectural Evolution: Identify systemic gaps and lead the implementation of infrastructure improvements to bolster uptime.
  • Incident Leadership: Serve as a core pillar of the on-call rotation, leading incident response and blameless post-mortems.

Responsibilities

  • Serve as the primary point of accountability for the health, performance, and capacity of mission-critical, internet-facing services.
  • Partner with development teams beginning at the design phase to ensure all platforms are built with "operability" and "recoverability" at their core.
  • Improve and exchange tools that automate the deployment and monitoring of custom applications in large-scale UNIX environments.
  • Thrive in a fast-paced environment where you bridge the gap between "moving fast" and "staying up."
  • Participate in a structured 12x7 on-call rotation designed to maintain 24/7 support for production environments.

Desired Skills

  • Battle-Proven Experience: 5–8+ years in a Systems Engineering or Site Reliability role, specifically within a startup or fast-growing environment.
  • Scale Mastery: Proven track record in a UNIX-based, large-scale web operations role.
  • Production Support Mindset: Extensive experience providing 24/7 support for high-traffic production environments.
  • Cloud Architecture: Expert-level experience with AWS, GCP, or Azure.

The SRE Toolkit

  • CI/CD & Config: Jenkins, Ansible, and Terraform.
  • Observability: Hands-on experience with Datadog, New Relic, Graphite, or Nagios.
  • Orchestration: Deep knowledge of Kubernetes and Docker.
  • Code: Strong scripting/coding skills used for infrastructure-as-code and automation.

Technologies We Use

  • Languages/Servers: Ruby, JavaScript, Node.js, Tomcat, Nginx, HAProxy.
  • Data & Messaging: MongoDB, RabbitMQ, Redis, ElasticSearch.
  • Infrastructure: AWS (EC2, RDS, CloudFront, S3), Kubernetes, Docker.

Note: 1) Poshmark is currently unable to provide visa sponsorship for this position. 2) This is a hybrid role based out of Redwood City, CA.
