Distributed Locks in Microservices — What Actually Works in Production

December 9, 2025 · Asad Ali

Distributed locks are one of those topics that look deceptively simple when you read the theory… and then immediately explode in your face when you try to implement them in a real distributed system. I’ve had outages, race conditions, duplicate workflows, and state corruption caused by bad locking assumptions — enough scars to write this article with strong opinions.

If you’re building microservices that coordinate shared resources, schedule jobs, or perform cross-process updates, you’ll eventually need a distributed lock. The trick is choosing the right style of lock for the problem instead of blindly applying whatever you saw in a blog or StackOverflow snippet.

When a distributed lock is actually needed

In microservices, you typically reach for distributed locking when:

  • Only one node should perform a critical operation at a time (e.g. sending payouts, billing cycles, nightly rollups)
  • You have a shared resource (file, job, aggregate, tenant, external API) that must not be mutated concurrently
  • You’re orchestrating workflows in message-driven systems where duplicate events can arrive
  • You want to prevent concurrency races across a horizontally scaled fleet of workers

Warning: Most teams introduce distributed locks because they skipped idempotency. A lock is not a replacement for idempotent business logic.

The real problem: clocks lie, nodes die, networks split

A distributed lock isn’t just “set a key in Redis and release later”. You must assume:

  • Processes crash before releasing the lock
  • Network partitions cause a node to think it still owns a lock it already lost
  • Clock drift ruins time-based expiry semantics (especially in VMs or containers)
  • Message handlers retry and re-enter critical sections unexpectedly

In other words, you’re designing for chaos, not for the happy path.

Locking approaches that actually work

Let’s walk through battle-tested lock mechanisms I’ve used across projects — and the pitfalls that come with each.

1. Redis-based locks (Redlock or simpler variants)

Redis is the most common distributed locking tool because it offers atomic operations (`SET NX PX`). But how you use Redis determines whether your lock is safe or a ticking time bomb.

Atomic lock acquisition

// Returns true only when the key did not already exist, i.e. the lock was acquired
bool acquired = await db.StringSetAsync(
    key: "locks:invoice:123",
    value: instanceId,                  // unique owner ID, e.g. a GUID per process
    when: When.NotExists,               // SET ... NX
    expiry: TimeSpan.FromSeconds(30));  // SET ... PX 30000

This works fine only if:

  • The lock expiration is longer than any critical section
  • You renew the lock when approaching expiry
  • You release the lock only if you still own it (see the sketch below)

Anti-pattern: Never use Redis locks without a unique lock owner ID. Otherwise two nodes can release each other’s locks.
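
Releasing is where most home-grown Redis locks go wrong: the delete has to be conditional on still owning the key, and that compare-and-delete must run atomically on the server. A minimal sketch, assuming StackExchange.Redis (as in the snippet above) and the same instanceId used at acquisition:

// Release only if we still own the lock; the Lua script runs atomically inside Redis
const string releaseScript = @"
    if redis.call('GET', KEYS[1]) == ARGV[1] then
        return redis.call('DEL', KEYS[1])
    else
        return 0
    end";

await db.ScriptEvaluateAsync(
    releaseScript,
    new RedisKey[] { "locks:invoice:123" },
    new RedisValue[] { instanceId });

The same compare-then-act pattern applies to renewal: keep the ownership check and swap the DEL for a PEXPIRE.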

Redlock — the controversial distributed algorithm

Redlock (the quorum-based algorithm that acquires the lock on a majority of independent Redis instances) improves fault tolerance over a single Redis node, but it comes with complexity. Martin Kleppmann’s famous critique still applies: Redlock’s safety rests on timing assumptions, so under partitions, clock jumps, or long process pauses two clients can end up holding the same lock. Don’t use it where correctness requirements are strict.

When Redlock is fine:

  • Background jobs
  • Leader election
  • Periodic maintenance work

When not to use it: financial transactions, shared counter increments, or anything requiring strong consistency.
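
If you do adopt Redlock on .NET, you rarely hand-roll the quorum logic; a library such as RedLock.net wraps it. Roughly, the usage looks like this (a sketch assuming the RedLockNet package and three independent Redis endpoints; resource names and hosts are illustrative):

// Each multiplexer points at an independent Redis instance (no replication between them)
var factory = RedLockFactory.Create(new List<RedLockMultiplexer>
{
    new RedLockMultiplexer(ConnectionMultiplexer.Connect("redis-a:6379")),
    new RedLockMultiplexer(ConnectionMultiplexer.Connect("redis-b:6379")),
    new RedLockMultiplexer(ConnectionMultiplexer.Connect("redis-c:6379")),
});

using (var redLock = await factory.CreateLockAsync("jobs:nightly-rollup", TimeSpan.FromSeconds(30)))
{
    if (redLock.IsAcquired)
    {
        // critical section: reduced collision probability, not a correctness guarantee
    }
}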

Architecture insight: A lock is not a transaction. It just reduces the probability of collisions — it does not guarantee correctness.

2. Database row-level locks (SQL Server, PostgreSQL)

Traditional relational databases already support strong locking semantics. Sometimes the simplest distributed lock is: use the database.

SQL Server example

BEGIN TRAN;
-- The update lock makes competing transactions block on this row until COMMIT
SELECT * FROM LockTable WITH (UPDLOCK, ROWLOCK)
WHERE Name = 'BillingCycle';
-- critical section
COMMIT;

This ensures only one node enters the critical section at a time; the others block until the transaction commits. PostgreSQL’s SELECT ... FOR UPDATE works similarly.
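
From application code the same pattern is just a transaction wrapped around a locking read. A sketch of what that might look like with EF Core against SQL Server (the LockEntries DbSet and surrounding names are made up for the example):

// Hold the row lock for the duration of the transaction
await using var tx = await dbContext.Database.BeginTransactionAsync();

var lockRows = await dbContext.LockEntries
    .FromSqlRaw("SELECT * FROM LockTable WITH (UPDLOCK, ROWLOCK) WHERE Name = {0}", "BillingCycle")
    .ToListAsync();

// critical section: run the billing cycle exactly once across the fleet

await tx.CommitAsync();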

Pros:

  • Strong guarantees
  • No extra infrastructure
  • Works beautifully when locks relate to DB-backed aggregates

Cons:

  • Not ideal for long-running operations
  • Risk of deadlocks if you mix different lock orderings
  • Doesn’t scale well for high-contention scenarios

Tip: For aggregate-level consistency, DB locks are usually simpler and safer than distributed caches.

3. Blob storage leases (Azure Blob Lease API)

I’ve used Blob Leases a lot in event-driven microservices. A blob lease is essentially a distributed mutex with built-in expiry and renewal.

They’re perfect for:

  • Leader election
  • Ensuring only one Function App instance processes a partition
  • Cross-region coordination of scheduled jobs

The API is simple and reliable:

  • Acquire lease (60 seconds)
  • Renew regularly
  • Release explicitly (the full loop is sketched below)

Best practice: If your microservices run on Azure and you don’t want Redis or Zookeeper, Blob Leases are one of the safest distributed locks available.
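
Here is roughly what that acquire/renew/release loop looks like with the Azure.Storage.Blobs SDK (the blob name, timings, and work loop are illustrative, and the lease blob is assumed to already exist):

// container is an existing BlobContainerClient; BlobLeaseClient lives in Azure.Storage.Blobs.Specialized
BlobClient blob = container.GetBlobClient("leases/billing-cycle-leader");
BlobLeaseClient lease = blob.GetBlobLeaseClient();

// Acquire for 60 seconds; a 409 (LeaseAlreadyPresent) means another instance is the leader
await lease.AcquireAsync(TimeSpan.FromSeconds(60));
try
{
    while (moreWorkToDo)
    {
        await DoOneChunkOfWorkAsync();
        await lease.RenewAsync();   // keep ownership while the job is still running
    }
}
finally
{
    await lease.ReleaseAsync();     // hand the lease back instead of waiting for expiry
}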

4. Optimistic concurrency instead of locking

Sometimes the best lock is no lock at all: just use optimistic concurrency. EF Core, raw SQL with a rowversion column, and almost every ORM support concurrency tokens.

public class Invoice
{
    public int Id { get; set; }
    public decimal Amount { get; set; }

    // Maps to a SQL Server rowversion column; EF Core uses it as the concurrency token
    [Timestamp]
    public byte[] RowVersion { get; set; }
}

On update, if another process has modified the record in the meantime, you get a concurrency exception. You retry, or you bail, as sketched below.
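
In EF Core that exception is DbUpdateConcurrencyException, and a small bounded retry around the unit of work is usually all you need (the invoice lookup and mutation here are placeholders):

for (var attempt = 1; attempt <= 3; attempt++)
{
    var invoice = await dbContext.Invoices.SingleAsync(i => i.Id == invoiceId);
    invoice.Amount += adjustment;   // whatever the business operation is

    try
    {
        await dbContext.SaveChangesAsync();
        break;  // success: our RowVersion matched, nobody raced us
    }
    catch (DbUpdateConcurrencyException)
    {
        // Someone else updated the row first; drop the stale state and retry (EF Core 5+)
        dbContext.ChangeTracker.Clear();
        if (attempt == 3) throw;
    }
}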

This works insanely well for:

  • Financial aggregates
  • Inventory adjustments
  • Profile updates

Warning: Teams often overuse distributed locks because they misunderstand optimistic concurrency. Use the DB before adding infrastructure.

5. Message-driven idempotency instead of locking

Many microservices problems that appear to need locking are actually event deduplication problems. When handlers are idempotent, duplicate deliveries simply no-op, and you don’t need a lock at all.

Patterns like:

  • Outbox
  • Inbox / dedup store
  • Idempotency tokens
  • Versioned aggregates

reduce the need for locks dramatically.
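
As a concrete illustration of the inbox/dedup idea, a handler can record processed message IDs and let a unique constraint close the race. A sketch with EF Core (the ProcessedMessages table and handler names are invented for the example):

// ProcessedMessages has a unique index on MessageId, so a duplicate insert fails loudly
public async Task HandleAsync(InvoicePaid message)
{
    if (await dbContext.ProcessedMessages.AnyAsync(m => m.MessageId == message.MessageId))
        return; // duplicate delivery: nothing to do

    await ApplyBusinessLogicAsync(message);

    dbContext.ProcessedMessages.Add(new ProcessedMessage { MessageId = message.MessageId });
    await dbContext.SaveChangesAsync(); // same unit of work as the business change
}

The unique index is what actually prevents the race; the AnyAsync check is just a fast path for the common duplicate.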

Debugging insight: 80% of the “we need a lock” requests I’ve reviewed over the years turned out to be “we need idempotency”.

The real decision tree: how I pick a locking strategy

After years of battling distributed systems, I now follow this sequence:

  1. Can we make the operation idempotent? If yes, do that. No lock needed.
  2. Is the resource stored in a relational DB? Use DB row locks or optimistic concurrency.
  3. Do we need cross-process coordination for background jobs? Use Blob Lease or Redis lock.
  4. Does the workflow require strict, linearizable correctness? Avoid Redis; consider DB or specialized coordination systems (Zookeeper/etcd).
  5. Is the operation long-running? Use a renewable lock (Blob Lease / Redis with renewal).

What has burned me in production

I’ve lost count of the incidents where:

  • Redis expired a lock too early because clock drift made one instance renew late
  • A Function App scaled to 20 instances and all of them believed they were the leader
  • A SQL-based lock caused a deadlock chain and stalled multiple microservices
  • Message deduplication was missing, so locking didn’t prevent double-processing anyway

Anti-pattern: Using distributed locks as a band-aid for poor domain design.

If you reach for locking too early, you end up patching symptoms instead of fixing architecture.

Final thoughts

Distributed locks are neither good nor bad. They are specialised tools with specific guarantees and very sharp edges. The trick is understanding your domain invariants, how much extra infrastructure and operational complexity you can tolerate, and how the system behaves under partitions and retries.

The best microservices I’ve built rely heavily on idempotency and optimistic concurrency, and only rarely need a true distributed lock. When I do reach for one, I choose based on the consistency model — not convenience.

Design for failure, assume retries, and always verify ownership before releasing locks. Those three habits alone have saved me from more outages than any tool or cloud service.

— Asad