Scaling FHIR Data Processing with Promise Pools and Redis

When you're working with healthcare data, scale becomes a major challenge very quickly. At Flexpa, we connect to hundreds of healthcare systems to retrieve patient records. Each system has its own quirks, rate limits, and capabilities.

In this post, I'll share how we built a system that reliably processes healthcare records from our network of hundreds of payers using two key technologies: Promise Pools for controlled concurrency and Redis for distributed job processing.

The Challenge: Processing FHIR Data at Scale

Healthcare data exchange using FHIR comes with several inherent challenges - we call them the four Vs:

Volume: A single patient can have thousands of records across different resource types (Medication, Condition, Procedure, ExplanationOfBenefit, etc.)
Variety: FHIR defines over 100 resource types, each with its own schema and relationships
Velocity: Healthcare APIs impose strict rate limits that vary widely across payers, requiring careful management of request speed
Variance: FHIR server implementations vary significantly in their capabilities, adherence to standards, and performance characteristics

These "Four Vs" are compounded when working with multiple payers simultaneously. Each FHIR server implements the specification differently, supports different subsets of resources, and handles operations with varying levels of reliability.

Our solution needed to efficiently handle these FHIR-specific challenges while maintaining reliability, scalability, and performance.

Flexpa's Extract Architecture

Our jobs system distributes work across multiple servers through Redis queues, but the real magic happens in our Promise Pool implementation:

┌──────────────────────────────────────────────────────────────────┐
│                FHIR Record Extract Architecture                  │
└──────────────────────────────────────────────────────────────────┘

┌─────────────────┐    ┌───────────────────────────────────────────┐
│   Redis Queue   │    │            Worker Process                 │
│                 │    │  ┌─────────────────┐   ┌────────────────┐ │
│  • Jobs         │    │  │  Promise Pool   │   │ Resource Map   │ │
│  • Rate Limits  │◄══►│  │                 │   │                │ │
│  • Job State    │    │  │ Concurrency: 5  │══►│ Patient/1      │ │
└─────────────────┘    │  │ Auto-retry      │   │ Condition/A    │ │
                       │  │ Deduplication   │   │ Medication/X   │ │
                       │  └─────────────────┘   └────────────────┘ │
                       │          ║                    ▲           │
                       │          ║                    ║           │
                       │          ▼                    ║           │
                       │  ┌─────────────────┐          ║           │
                       │  │  FHIR Servers   │          ║           │
                       │  │                 │          ║           │
                       │  │ • $everything   │══════════╝           │
                       │  │ • Search        │                      │
                       │  │ • References    │                      │
                       │  └─────────────────┘                      │
                       └───────────────────────────────────────────┘
                                     ║
                                     ║
                                     ▼
┌──────────────────────────────────────────────────────────────────┐
│                      Healthcare Payers                           │
│                                                                  │
│  • Different rate limits                                         │
│  • Varying FHIR implementations                                  │
│  • Custom capabilities                                           │
└──────────────────────────────────────────────────────────────────┘

We use BullMQ, a Redis-based queue for Node.js, to manage job processing across multiple distributed worker processes. These workers handle different types of jobs including patient data consents.

At the heart of our architecture is our custom Promise Pool implementation for controlled concurrency when making API calls, coupled with an Extract-Transform-Load pipeline specialized for FHIR data.

This architecture allows us to scale horizontally by adding more worker processes while maintaining tight control over resource utilization and API rate limits.

What is a Promise Pool?

Imagine you need to make 1,000 API requests. You have two obvious approaches:

Make all 1,000 requests at once (but you'll likely hit rate limits and overload the server)
Make requests one at a time (but this will be painfully slow)

A Promise Pool gives you a middle ground - it's like having a small team of workers that process requests concurrently.

Our own implementation of this idea has five key features that make it perfect for healthcare data:

Configurable concurrency: We can adjust the number of concurrent requests based on each API's rate limits
Automatic deduplication: The pool tracks which URLs have been requested to prevent redundant calls
Lazy execution: Tasks aren't processed until results are requested, allowing better batching
Smart error handling: Failed requests can be retried with customizable strategies
Predictable ordering: Results are returned in a consistent order regardless of completion time

Putting this all together, we get a workflow that looks something like this:

┌────────────────────────────────────────────────────────────────────┐
│           Lifecyle of a FHIR Resource in our Promise Pool          │
└────────────────────────────────────────────────────────────────────┘

   Initial Query         Resource Map          Reference Resolution
  ┌───────────┐        ┌───────────────┐       ┌─────────────────────┐
  │           │        │               │       │                     │
  │ Patient/1 ├───────►│ Patient/1     │       │ For each found      │
  │           │        │               │       │ reference:          │
  └───────────┘        └───────────────┘       │                     │
                              │                │ 1. Check if         │
                              │                │    already in       │
  ┌───────────┐               │                │    resource map     │
  │           │        ┌───────────────┐       │                     │
  │ Condition?├───────►│ Condition/A   │       │ 2. If not, add      │
  │ patient=1 │        │ Condition/B   │◄─┐    │    to Promise       │
  │           │        │               │  │    │    Pool queue       │
  └───────────┘        └───────────────┘  │    │                     │
                              │           │    │ 3. Once fetched,    │
                              │           │    │    add to map       │
  ┌───────────┐               │           │    │    and check its    │
  │           │        ┌───────────────┐  │    │    references       │
  │ Medication├───────►│ Medication/X  │◄─┼────┤                     │
  │ ?patient=1│        │ Medication/Y  │  │    │ 4. Limit depth      │
  │           │        │               │  │    │    to prevent       │
  └───────────┘        └───────────────┘  │    │    infinite loops   │
                           │              │    │                     │
                           ▼              │    └────────┬────────────┘
                     ┌───────────────┐    │             │
                     │ Final unified │    │             │
                     │ resource map  │    │             │
                     │ with all      │    │             │
                     │ interconnected│    │             ▼
                     │ FHIR resources│    │     ┌────────────────────┐
                     └───────────────┘    │     │ Promise Pool       │
                                          │     │ Queue              │
                                          │     │                    │
                                          └─────┤ New refs           │
                                                │ added here         │
                                                └────────────────────┘

Why Redis?

While Promise Pools handle concurrency within a single process, we needed a way to coordinate work across multiple servers and handle failures gracefully. That's where Redis comes in.

Redis serves as the backbone of our job processing system by:

Storing job data: We use BullMQ (a Redis-based queue library) to track what jobs need to be processed
Coordinating workers: Multiple servers can pull jobs from the same queues without conflicts
Surviving crashes: If a server fails, Redis preserves the job data so it can be picked up later
Managing rate limits: Redis helps us track and enforce API rate limits across our entire system

This approach lets us scale out by simply adding more servers when we need to process more data, without changing our code.

FHIR-Specific Challenges and Solutions

Working with healthcare data at scale has taught us valuable lessons about the unique challenges of processing FHIR data. Let me share the most important insights that shaped our approach.

FHIR-Specific Extraction Strategies

Our approach to fetching FHIR data adapts to each healthcare provider's unique implementation:

Try $everything first: We first attempt FHIR's specialized $everything operation, which returns all resources associated with a patient in a single request
Fall back gracefully: If that fails (or returns incomplete data), we use individual resource-type queries
Use the CapabilityStatement: We examine each server's published CapabilityStatement to determine what resources and operations it supports
Optimize requests: We create tailored requests based on the server's documented capabilities
Adapt dynamically: We maintain different strategies for different endpoints based on their real-world behavior

For endpoints requiring special handling, we create secondary Promise Pools with custom configurations. This lets us adapt our concurrency settings based on each payer's characteristics - some APIs can handle many simultaneous requests, while others require a more careful approach.

Lessons Learned Working with FHIR

1. Implementation Variance Is the Norm

FHIR is a standard, but implementations vary dramatically across healthcare providers:

Some servers support $everything but return incomplete results
Many servers claim to support search parameters in their CapabilityStatement but return errors when used
Pagination implementations range from excellent to completely broken
Rate limits can vary from 5 requests per second to 500+
Error handling differs dramatically across providers

Our system dynamically adapts to each provider's unique behavior, not just what they claim to support.

2. Reference Resolution Requires Careful Handling

FHIR resources are interconnected through references - a medication references a patient, which references a provider, and so on:

Circular references are common in healthcare data
Some references point to nonexistent resources
Reference chains can be arbitrarily deep
Some references are more important than others for business logic

Our system tracks all references, automatically fetches missing resources, uses depth limits to prevent infinite loops, and deduplicates requests to avoid redundant API calls.

3. Incremental Sync Requires Multiple Approaches

Most providers support _lastUpdated filtering to only retrieve data that has changed, but implementation quality varies widely. We've built a hybrid approach:

Time-based filtering for most resources
Complete refreshes for critical data types
Resource-specific synchronization strategies for problematic endpoints

This ensures data completeness while minimizing unnecessary data transfer.

4. Error Handling Must Be FHIR-Aware

FHIR servers return specialized OperationOutcome resources for errors. Our error handling:

Identifies FHIR-specific error patterns
Adapts retry strategies based on error types
Automatically modifies problematic requests (like removing _count when rejected)
Uses exponential backoff to avoid overwhelming struggling servers

Beyond handling errors, we also collect detailed metadata about each resource type, monitor reference relationships, and continuously optimize our approach based on this information.

The Results

This architecture has been running in production at Flexpa for years, and the results speak for themselves:

Millions of records processed across hundreds of different healthcare systems
99.9% reliability even when working with unstable APIs
Optimal throughput for each healthcare provider based on their capabilities
Complete data capture with automatic reference resolution
Graceful error handling that adapts to each provider's quirks

The combination of Promise Pools for controlled concurrency within each process and Redis for coordinating across multiple servers has proven to be exceptionally powerful. It gives us the best of both worlds - fine-grained control over individual API interactions and the ability to scale horizontally when we need more capacity.

Healthcare data is complex and messy, but with the right architecture, it's possible to build systems that handle it reliably at scale. We hope sharing our approach helps others facing similar challenges.