Distributed locks are one of those topics that look deceptively simple when you read the theory… and then immediately explode in your face when you try to implement them in a real distributed system. I’ve had outages, race conditions, duplicate workflows, and state corruption caused by bad locking assumptions — enough scars to write this article with strong opinions.
If you’re building microservices that coordinate shared resources, schedule jobs, or perform cross-process updates, you’ll eventually need a distributed lock. The trick is choosing the right style of lock for the problem instead of blindly applying whatever you saw in a blog or StackOverflow snippet.
When a distributed lock is actually needed
In microservices, you typically reach for distributed locking when:
- Only one node should perform a critical operation at a time (e.g. sending payouts, billing cycles, nightly rollups)
- You have a shared resource (file, job, aggregate, tenant, external API) that must not be mutated concurrently
- You’re orchestrating workflows in message-driven systems where duplicate events can arrive
- You want to prevent concurrency races across a horizontally scaled fleet of workers
The real problem: clocks lie, nodes die, networks split
A distributed lock isn’t just “set a key in Redis and release later”. You must assume:
- Processes crash before releasing the lock
- Network partitions cause a node to think it still owns a lock it already lost
- Clock drift ruins time-based expiry semantics (especially in VMs or containers)
- Message handlers retry and re-enter critical sections unexpectedly
In other words, you’re designing for chaos, not for the happy path.
Locking approaches that actually work
Let’s walk through battle-tested lock mechanisms I’ve used across projects — and the pitfalls that come with each.
1. Redis-based locks (Redlock or simpler variants)
Redis is the most common distributed locking tool because it offers atomic operations (`SET NX PX`). But how you use Redis determines whether your lock is safe or a ticking time bomb.
Atomic lock acquisition
// SET locks:invoice:123 <instanceId> NX PX 30000: atomic acquire with expiry
bool acquired = await db.StringSetAsync(
    key: "locks:invoice:123",
    value: instanceId,                   // unique owner token (e.g. a GUID per instance)
    when: When.NotExists,                // NX: only succeed if nobody holds the lock
    expiry: TimeSpan.FromSeconds(30));   // PX: auto-expire so a crashed owner can't block forever

if (!acquired)
    return; // someone else owns the lock
This works fine only if:
- The lock expiration is longer than any critical section
- You renew the lock when approaching expiry
- You release the lock only if you still own it (see the release sketch below)
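For that last point, the usual trick is an atomic compare-and-delete: check that the stored value is still your own token and delete it in one step, typically via a small Lua script so nothing can sneak in between. A minimal sketch with StackExchange.Redis, reusing the key and instanceId from the acquisition example above:

// Release only if we still own the lock. GET + DEL must happen atomically,
// so both run inside a single Lua script on the Redis server.
const string releaseScript = @"
    if redis.call('GET', KEYS[1]) == ARGV[1] then
        return redis.call('DEL', KEYS[1])
    else
        return 0
    end";

await db.ScriptEvaluateAsync(
    releaseScript,
    keys: new RedisKey[] { "locks:invoice:123" },
    values: new RedisValue[] { instanceId });

Renewal follows the same ownership check, just with a PEXPIRE instead of the DEL.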
Redlock — the controversial distributed algorithm
Redlock (the multi-Redis-instance consensus algorithm) improves fault tolerance over a single Redis node but comes with complexity. Martin Kleppmann’s famous critique is valid: Redlock’s safety rests on timing assumptions (bounded clock drift, process pauses, and network delay), so it can violate mutual exclusion under partitions or long pauses. Don’t use it where correctness requirements are extremely strict.
When Redlock is fine:
- Background jobs
- Leader election
- Periodic maintenance work
When not to use it: financial transactions, shared counter increments, or anything requiring strong consistency.
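If you do go the Redlock route in .NET, the community RedLock.net package wraps the multi-instance dance for you. A rough sketch (namespaces, endpoints and the lock name are assumptions from memory, so verify against the package docs):

using RedLockNet.SERedis;
using RedLockNet.SERedis.Configuration;
using StackExchange.Redis;

// One multiplexer per independent Redis instance; Redlock needs a majority of them.
var factory = RedLockFactory.Create(new List<RedLockMultiplexer>
{
    new RedLockMultiplexer(ConnectionMultiplexer.Connect("redis1:6379")),
    new RedLockMultiplexer(ConnectionMultiplexer.Connect("redis2:6379")),
    new RedLockMultiplexer(ConnectionMultiplexer.Connect("redis3:6379")),
});

using var redLock = await factory.CreateLockAsync("locks:nightly-rollup", TimeSpan.FromSeconds(30));

if (redLock.IsAcquired)
{
    // Only the instance that won a majority of the Redis nodes gets here.
}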
2. Database row-level locks (SQL Server, PostgreSQL)
Traditional RDBMS systems already support strong locking semantics. Sometimes the simplest distributed lock is: use the database.
SQL Server example
BEGIN TRAN;

SELECT Name
FROM LockTable WITH (UPDLOCK, ROWLOCK)
WHERE Name = 'BillingCycle';

-- critical section: the update lock on this row is held until COMMIT,
-- so any other node running the same SELECT blocks here

COMMIT;
This ensures that only one node at a time enters the critical section; everyone else blocks on the row lock until the holder commits or rolls back. PostgreSQL’s SELECT ... FOR UPDATE works similarly.
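From application code the pattern is just a transaction that takes the row lock before doing the work. A minimal sketch with Microsoft.Data.SqlClient (the connection string is an assumption; the LockTable schema is carried over from the SQL above):

using Microsoft.Data.SqlClient;

using var conn = new SqlConnection(connectionString);
conn.Open();
using var tx = conn.BeginTransaction();

// Take the row lock; a second node running this statement blocks until we commit.
using var cmd = new SqlCommand(
    "SELECT Name FROM LockTable WITH (UPDLOCK, ROWLOCK) WHERE Name = @name",
    conn, tx);
cmd.Parameters.AddWithValue("@name", "BillingCycle");
cmd.ExecuteScalar();

// ... critical section: do the billing-cycle work on this connection ...

tx.Commit(); // releases the lock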
Pros:
- Strong guarantees
- No extra infrastructure
- Works beautifully when locks relate to DB-backed aggregates
Cons:
- Not ideal for long-running operations
- Risk of deadlocks if you mix different lock orderings
- Doesn’t scale well for high-contention scenarios
3. Blob storage leases (Azure Blob Lease API)
I’ve used Blob Leases a lot in event-driven microservices. A blob lease is essentially a distributed mutex with built-in expiry and renewal.
They’re perfect for:
- Leader election
- Ensuring only one Function App instance processes a partition
- Cross-region coordination of scheduled jobs
The API is simple and reliable (see the sketch after this list):
- Acquire lease (60 seconds)
- Renew regularly
- Release explicitly
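Here is roughly what that looks like with the Azure.Storage.Blobs SDK. The container and blob names are made up for illustration, and the blob has to exist before you can lease it:

using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized;

var blob = new BlobClient(connectionString, "coordination", "billing-leader.lock");
var lease = blob.GetBlobLeaseClient();

// Acquire a 60-second lease (finite leases must be 15-60s); a 409 means another instance holds it.
await lease.AcquireAsync(TimeSpan.FromSeconds(60));
try
{
    // ... leader-only work; call RenewAsync() well before the 60 seconds run out ...
    await lease.RenewAsync();
}
finally
{
    await lease.ReleaseAsync();
}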
4. Optimistic concurrency instead of locking
Sometimes the best lock is no lock — just use optimistic concurrency. EF Core, SQL, and almost every ORM support concurrency tokens.
using System.ComponentModel.DataAnnotations;

public class Invoice
{
    public int Id { get; set; }
    public decimal Amount { get; set; }

    [Timestamp] // maps to a rowversion column; EF Core treats it as a concurrency token
    public byte[] RowVersion { get; set; }
}
On update, if another process modified the record, you get a concurrency exception. You retry — or bail.
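In practice the update path looks something like this (the DbContext, DbSet and variables are hypothetical; the DbUpdateConcurrencyException handling is the point):

using Microsoft.EntityFrameworkCore;

var invoice = await dbContext.Invoices.SingleAsync(i => i.Id == invoiceId);
invoice.Amount += adjustment;

try
{
    // EF Core adds WHERE RowVersion = <original value> to the UPDATE;
    // zero rows affected means someone else changed the row first.
    await dbContext.SaveChangesAsync();
}
catch (DbUpdateConcurrencyException)
{
    // Lost the race: reload the entity and retry, or surface the conflict to the caller.
}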
This works insanely well for:
- Financial aggregates
- Inventory adjustments
- Profile updates
5. Message-driven idempotency instead of locking
Many microservices problems that appear to need locking are actually event deduplication problems. When message handling is idempotent, duplicates simply no-op and there is nothing left for a lock to protect.
Patterns like:
- Outbox
- Inbox / dedup store
- Idempotency tokens
- Versioned aggregates
reduce the need for locks dramatically.
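As an illustration of the inbox/dedup idea, a handler can record the message ID before doing any work, so a redelivery becomes a no-op. A sketch using Dapper and PostgreSQL’s ON CONFLICT (the table, message type and connection are all assumptions):

using Dapper;
using Npgsql;

public async Task HandleAsync(PaymentRequested message)
{
    await using var conn = new NpgsqlConnection(connectionString);

    // Insert the message ID first; 0 rows affected means we've already processed it.
    var inserted = await conn.ExecuteAsync(
        "INSERT INTO processed_messages (message_id) VALUES (@id) ON CONFLICT DO NOTHING",
        new { id = message.MessageId });

    if (inserted == 0)
        return; // duplicate delivery: no-op

    // ... actual handler logic, ideally in the same transaction as the insert ...
}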
The real decision tree: how I pick a locking strategy
After years of battling distributed systems, I now follow this sequence:
- Can we make the operation idempotent? If yes, do that. No lock needed.
- Is the resource stored in a relational DB? Use DB row locks or optimistic concurrency.
- Do we need cross-process coordination for background jobs? Use Blob Lease or Redis lock.
- Does the workflow require strict, linearizable correctness? Avoid Redis; consider DB or specialized coordination systems (ZooKeeper/etcd).
- Is the operation long-running? Use a renewable lock (Blob Lease / Redis with renewal).
What has burned me in production
I’ve lost count of the incidents where:
- Redis expired a lock too early because clock drift made one instance renew late
- A Function App scaled to 20 instances and all of them believed they were the leader
- A SQL-based lock caused a deadlock chain and stalled multiple microservices
- Message deduplication was missing, so locking didn’t prevent double-processing anyway
If you reach for locking too early, you end up patching symptoms instead of fixing architecture.
Final thoughts
Distributed locks are neither good nor bad. They are specialised tools with specific guarantees, and very sharp edges. The trick is understanding your domain invariants, what your infrastructure can tolerate, and how everything behaves under partitions and retries.
The best microservices I’ve built rely heavily on idempotency and optimistic concurrency, and only rarely need a true distributed lock. When I do reach for one, I choose based on the consistency model — not convenience.
Design for failure, assume retries, and always verify ownership before releasing locks. Those three habits alone have saved me from more outages than any tool or cloud service.