Software Engineer — Warehouse Pipelines
Posted 4 weeks ago
Employment Type: Full-Time
Work Location: Remote
About This Role
ABOUT POSTHOG
We're shipping every product that companies need (https://posthog.com/handbook/why-does-posthog-exist), from their first day to the day they IPO, and beyond. The operating system for folks who build products.
We started with open-source product analytics, launched out of Y Combinator's W20 cohort (https://posthog.com/handbook/story). We've since shipped more than a dozen products (https://posthog.com/products), including:
- A built-in data warehouse (https://posthog.com/docs/data-warehouse), so users can query product and customer data together using custom SQL insights.
- A customer data platform (https://posthog.com/docs/cdp), so they can send their data wherever they need with ease.
- PostHog AI (https://posthog.com/max), an AI-powered analyst that answers product questions, helps users find useful session recordings, and writes custom SQL queries.
Next on the roadmap are CRM, Workflow, revenue analytics, and support products. When we say every product, we really mean it!
We Are
- Product-led. More than 100,000 companies have installed PostHog, mostly driven by word-of-mouth. We have intensely strong product-market fit.
- Well-funded. We've raised more than $100m from some of the world's top investors (https://posthog.com/handbook/strategy/investors). We're set up for a long, ambitious journey.
- Default alive. Revenue is growing 10% MoM on average, and we're very efficient. We raise money to push ambition and grow faster, not to keep the lights on.
We're focused on building an awesome product for end users, hiring exceptional teammates, shipping fast, and being as weird as possible (https://posthog.com/deskhog).
WHAT YOU'LL BE DOING
As a Software Engineer on the Warehouse Pipeline team in the Data Stack group (https://posthog.com/teams/data-stack), you'll build and iterate on our data import system.
Our import workers are built in Python. We pull data from APIs and databases in batches, process it in memory with Apache Arrow, and move it into object storage in open table formats.
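To give a flavor of that pattern, here is a minimal sketch; it is illustrative, not our actual worker code, and fetch_pages(), the schema, and the bucket path are hypothetical stand-ins for a paginated source:

import pyarrow as pa
import pyarrow.parquet as pq
from pyarrow import fs

def fetch_pages():
    # Hypothetical source: yields one page of row dicts per API call
    # or database cursor fetch.
    yield [{"id": 1, "email": "a@example.com"}]
    yield [{"id": 2, "email": "b@example.com"}]

def import_source(bucket_path: str) -> None:
    # bucket_path is a bucket/prefix path like "my-bucket/imports".
    schema = pa.schema([("id", pa.int64()), ("email", pa.string())])
    s3 = fs.S3FileSystem()  # credentials resolved from the environment
    with pq.ParquetWriter(f"{bucket_path}/users.parquet", schema, filesystem=s3) as writer:
        for page in fetch_pages():
            # Each page becomes an in-memory Arrow RecordBatch, then
            # gets appended to the file in object storage.
            writer.write_batch(pa.RecordBatch.from_pylist(page, schema=schema))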
You'll build and maintain our source library, finding creative ways to keep it manageable at scale. You'll revamp our schema management strategy and build resilient systems (e.g. logging, observability, testing).
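As one small example of what resilience can mean here, a retry wrapper with logging around a flaky source call; the tenacity library and the function are chosen purely for illustration:

import logging
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

logger = logging.getLogger("import_worker")

@retry(stop=stop_after_attempt(5), wait=wait_exponential(min=1, max=60), reraise=True)
def fetch_page(url: str) -> dict:
    # Transient network failures are retried with exponential backoff,
    # and every attempt is logged so stuck imports are easy to spot.
    logger.info("fetching %s", url)
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()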
You'll debug stateful data workflows by digging into k8s pod metrics, and schedule jobs using Temporal.io (https://temporal.io). As you can see, there's a huge breadth of challenges and opportunities to tackle, and nothing is off-limits.
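For illustration, scheduling that work with Temporal's Python SDK might look roughly like this; the names are hypothetical, not our actual workflow code:

from datetime import timedelta
from temporalio import activity, workflow

@activity.defn
async def run_import(source_id: str) -> None:
    ...  # pull batches, process with Arrow, land them in object storage

@workflow.defn
class ImportWorkflow:
    @workflow.run
    async def run(self, source_id: str) -> None:
        # Temporal retries the activity and persists workflow state,
        # which is what makes these stateful jobs debuggable.
        await workflow.execute_activity(
            run_import,
            source_id,
            start_to_close_timeout=timedelta(hours=1),
        )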
The PostHog Data Stack group provides both a core product for our users and a foundational platform (https://posthog.com/blog/data-warehouse-at-posthog) for our internal teams. Data is a first-class product at PostHog, not an afterthought.
You will have the chance to push the boundaries of what our Data Stack team can do while ensuring we remain stable and production-ready.
You now know what you’ll be doing, but what about what you’ll need to bring along?
YOU’LL FIT RIGHT IN IF:
- You're a builder. You bring strong skills in building resilient systems, with experience in Kubernetes, Docker, and S3 at scale. We build in Python; async Python and Temporal.io skills are welcome.
- You have hands-on experience with batch processing and modern data formats. We use Arrow to stream data (see the sketch after this list). Experience with Iceberg and/or Delta is welcome; we don't expect you to have experience with all three (although that would be great).
- You're more than a connector of things. Building pipelines is more than configuring tools to make them work together; it's about actually building the tooling used in data warehousing pipelines. We need you to have experience building tools, not just wiring up off-the-shelf ones.
- You bring experience with creating and maintaining data pipelines. You are comfortable debugging stateful, async data workflows by digging into k8s pod metrics.
- You bring a mix of skills. It’s not just about the Data Pipeline work. You’ll need strong backend skills as we run a complex system.
- You love getting things done. Engineers at PostHog have an incredible amount of autonomy to decide what to work on, so you’ll need to be proactive and just git it done.
- You’re ready to do the best work of your career. We have incredible distribution, a big financial cushion and an amazing team. There’s probably no better place to see how far you can go.
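Here is the sketch promised above: streaming Arrow record batches into an open table format. It assumes the deltalake Python package and a hypothetical upstream generator, and it's illustrative rather than a description of our internal stack:

import pyarrow as pa
from deltalake import write_deltalake

schema = pa.schema([("id", pa.int64()), ("event", pa.string())])

def batches():
    # Hypothetical upstream source yielding Arrow record batches.
    yield pa.RecordBatch.from_pylist([{"id": 1, "event": "pageview"}], schema=schema)

# A RecordBatchReader lets us append the whole stream to a Delta table
# in object storage without materializing it in memory first.
reader = pa.RecordBatchReader.from_batches(schema, batches())
write_deltalake("s3://my-bucket/events", reader, mode="append")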
If this sounds like you, we should talk.
We are committed to ensuring a fair and accessible interview process. If you need any accommodations or adjustments, please let us know.
WHAT’S IN IT FOR YOU?
Now that we've told you what you'll be building with us, let's talk about what we'll be building for you.