I am a Senior Software Engineer who started my career learning Ruby and Elixir at the same time back in 2017 and fell in love with both languages. After a couple of years working with both, and a few more focused on Ruby, I decided to focus exclusively on Elixir 4 years ago and haven’t looked back. I am passionate about sharing what I know and learning from others wherever I go, and I strongly believe communities are the foundation of all technology.
Background jobs are a staple in any seasoned engineer’s toolbelt. But in an ecosystem where OTP is readily available and concurrency is a first-class citizen, where exactly does a library like Oban fit? And more importantly, what happens when you hit the “scaling wall” of 10 million jobs a day?
Background processing at this magnitude requires a shift in strategy. It’s no longer just about using the library; it’s about leveraging advanced database techniques and the full power of Distributed Elixir (Erlang). Drawing from a production-proven implementation, we will explore:
- The “Why”: A critical comparison of Oban vs. pure GenServers vs. Broadway. When is the overhead of a database-backed queue a feature, and when is it a bottleneck?
- The “How”: Solving real-world challenges including strict rate-limiting for brittle external APIs, managing job distribution across a multi-node cluster, and handling dynamic scheduling (from crontabs to self-enqueueing recursive workers).
- The “Cost”: Tactical patterns for maintaining Postgres health under high-volume job churn, focusing on index bloat, pruning, and transactional integrity.
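To give a flavor of the “Cost” point, keeping the jobs table healthy largely comes down to Oban’s plugin configuration. The following is a minimal sketch, not the talk’s actual setup; `MyApp`, the queue names, and every numeric value are illustrative assumptions, while `Oban.Plugins.Pruner` and `Oban.Plugins.Lifeline` are real Oban plugins:

```elixir
# config/config.exs — hypothetical app; tune values to your own workload.
config :my_app, Oban,
  repo: MyApp.Repo,
  plugins: [
    # Delete completed/cancelled/discarded jobs older than 24h to limit
    # table and index bloat under high job churn.
    {Oban.Plugins.Pruner, max_age: 60 * 60 * 24},
    # Rescue jobs left "executing" by a crashed node back to "available".
    {Oban.Plugins.Lifeline, rescue_after: :timer.minutes(30)}
  ],
  queues: [default: 10, sync: 5]
```

Aggressive pruning is usually the single biggest lever: a jobs table that stays small keeps its indexes hot and its autovacuum cheap.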
Key Takeaways:
- The “When” and “Why” of Background Jobs: A clear mental model for choosing between GenServers (ephemeral/fast), Broadway (external stream processing), and Oban (transactional/reliable).
- Postgres Health Under Load: Specific strategies for pruning, indexing, and IOPS management when your job table becomes the busiest part of your database.
- Distributed Patterns: How to orchestrate Oban across a Distributed Erlang cluster (e.g., on Fly.io), ensuring workers are distributed correctly without overloading specific nodes.
- Real-World Resilience: Patterns for implementing rate limiters for external APIs and building self-healing recursive workers that can survive third-party downtime.
- Dynamic Scheduling: Moving beyond static crontab configs to handle complex business requirements with runtime scheduling and self-enqueueing logic.
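To make the “self-enqueueing recursive worker” idea concrete, here is a minimal sketch. `MyApp.SyncWorker`, `MyApp.ExternalAPI.fetch_page/1`, and all queue/timing values are hypothetical; only the `Oban.Worker` behaviour, the generated `new/2` helper, and the `{:snooze, seconds}` return value come from Oban itself:

```elixir
defmodule MyApp.SyncWorker do
  # Hypothetical self-enqueueing worker: each run processes one page of an
  # external API and enqueues the next page as a fresh job, so a node crash
  # or rate limit never loses the overall sync.
  use Oban.Worker, queue: :sync, max_attempts: 5

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"cursor" => cursor} = args}) do
    # MyApp.ExternalAPI.fetch_page/1 is an assumed API client, not a real one.
    case MyApp.ExternalAPI.fetch_page(cursor) do
      {:ok, %{"next_cursor" => nil}} ->
        # Last page reached; the recursion stops here.
        :ok

      {:ok, %{"next_cursor" => next}} ->
        # Recurse via the queue instead of looping in-process.
        args
        |> Map.put("cursor", next)
        |> new(schedule_in: 5)
        |> Oban.insert()

        :ok

      {:error, :rate_limited} ->
        # Back off without consuming one of the job's retry attempts.
        {:snooze, 60}
    end
  end
end
```

Because each page is its own job, a third-party outage simply snoozes or retries the current page; the rest of the pipeline resumes from the persisted cursor once the API recovers.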
Target Audience:
- Engineers running Elixir in production, especially those dealing with large volumes of data and interested in architecture design.