Race conditions in async code

What do you think is wrong with this particular block of code?

// Method on a service class (class body omitted for brevity)
async createEntity(name: string) {
  const existing = await this.userRepo.findOne({ where: { name } });
  if (existing) throw new ConflictException('Name already exists');
  await this.userRepo.save({ name });
}

It looks fine, works fine in most cases, and probably passed your tests. I’ve seen this type of logic in various code reviews, especially from junior developers.

It’s also the kind of logic you’ll find sprinkled all over production systems. Harmless little findOne or SELECT checks followed by some save call. And 99.9% of the time, it’ll do exactly what it says on the tin.

Until it doesn’t.

As you may have guessed from the title of this post, there is a race condition hidden in this perfectly normal-looking code. Many people assume that because JavaScript is single-threaded, it is safe from the race conditions that plague multi-threaded languages. But single-threaded ≠ synchronous: every await hands control back to the event loop, which is free to run other requests before yours resumes.

This particular kind is called TOCTOU: Time of Check to Time of Use, a pretty self-explanatory name.

The bug lies in the window of opportunity between these two lines:

const existing = await this.userRepo.findOne({ where: { name } });
// <-- Context switch happens here
await this.userRepo.save({ name });

Suppose two requests come in at the same time with the same name, maybe from a user double-clicking the button, or just two people picking the same name:

Both requests hit findOne()
Both see nothing
Both proceed to save()

Now you’ve got two users with the same name, or the DB rejects the second insert with a unique constraint violation and your server answers with a 500 (which is close to the best possible scenario, except for the 500 part).
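If you want to see the interleaving for yourself, here is a minimal sketch. It does not use a real ORM: InMemoryUserRepo, tick, and simulateDoubleClick are hypothetical stand-ins I made up for illustration, with a zero-delay timer playing the role of the database round trip.

```typescript
// Hypothetical in-memory stand-in for a repository. Every call yields to the
// event loop first, the way a real database round trip would.
type User = { name: string };

const tick = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

class InMemoryUserRepo {
  readonly rows: User[] = [];

  async findOne(name: string): Promise<User | undefined> {
    await tick(); // simulated network latency
    return this.rows.find((row) => row.name === name);
  }

  async save(user: User): Promise<void> {
    await tick(); // simulated network latency
    this.rows.push(user);
  }
}

// Same check-then-act shape as the createEntity snippet in this post.
async function createEntity(repo: InMemoryUserRepo, name: string): Promise<void> {
  const existing = await repo.findOne(name);
  if (existing) throw new Error('Name already exists');
  await repo.save({ name });
}

// Fire two requests for the same name without waiting in between.
async function simulateDoubleClick(): Promise<number> {
  const repo = new InMemoryUserRepo();
  await Promise.all([createEntity(repo, 'ada'), createEntity(repo, 'ada')]);
  return repo.rows.length;
}

simulateDoubleClick().then((count) => console.log(`rows named "ada": ${count}`));
```

Both calls finish their findOne before either save has written anything, so neither throws and the repo ends up with two rows for the same name.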

Beginners, and even some experienced devs, get used to writing imperative flows that “just work”. The logic is sound in a vacuum but doesn’t hold up under concurrency, traffic spikes, or retries.

Race conditions like these are everywhere.

They are easy to miss because they only show up on the rare occasions when the stars align, yet they often cause the most trouble by sneaking into places you wouldn’t expect. Anywhere with shared state, concurrent access, and a lack of coordination is an opportunity for a race.

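The TOCTOU issue I introduced earlier is quite easy to solve. You can enforce uniqueness at the DB level and have the server handle the resulting exception, or be proactive and make the check and the write atomic with a transaction. Keep in mind that a transaction at a default isolation level such as READ COMMITTED typically does not close this window on its own; it needs a stricter level such as SERIALIZABLE, or an explicit lock.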
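Here is a rough sketch of the first option, letting the constraint be the source of truth and translating its failure into a conflict. Everything in it is an assumption for illustration: UniqueUserRepo is an in-memory stand-in for a table with a unique index, and the 'UNIQUE_VIOLATION' and 'CONFLICT' codes are made up, so check what your own driver and framework actually throw.

```typescript
// Hypothetical error helpers: errors are tagged with a made-up `code` string.
function codedError(code: string, message: string): Error & { code: string } {
  return Object.assign(new Error(message), { code });
}

function hasCode(err: unknown, code: string): boolean {
  return typeof err === 'object' && err !== null && (err as { code?: unknown }).code === code;
}

const tick = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

// In-memory stand-in for a table with a unique index on `name`.
class UniqueUserRepo {
  private readonly names = new Set<string>();

  async save(user: { name: string }): Promise<void> {
    await tick(); // simulated round trip
    // The check and the insert run with no await between them,
    // so nothing can interleave here.
    if (this.names.has(user.name)) {
      throw codedError('UNIQUE_VIOLATION', `duplicate name: ${user.name}`);
    }
    this.names.add(user.name);
  }
}

// No findOne pre-check: attempt the write and map the constraint failure.
async function createEntity(repo: UniqueUserRepo, name: string): Promise<void> {
  try {
    await repo.save({ name });
  } catch (err: unknown) {
    if (hasCode(err, 'UNIQUE_VIOLATION')) throw codedError('CONFLICT', 'Name already exists');
    throw err; // anything else is a genuine failure
  }
}

// Two concurrent requests for the same name: one wins, one gets a conflict.
async function simulateDoubleClick(): Promise<string[]> {
  const repo = new UniqueUserRepo();
  const attempt = (name: string) =>
    createEntity(repo, name).then(
      () => 'created',
      (err: unknown) => (hasCode(err, 'CONFLICT') ? 'conflict' : 'unexpected error'),
    );
  return Promise.all([attempt('ada'), attempt('ada')]);
}
```

The losing request now gets a clean conflict instead of a 500, and there is no window between a check and a write because the storage layer does both in one step.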

But not all race conditions involve a database and they’re not always this easy to reason about. Some common examples include:

Syncing with a third-party API
Firing off webhooks or processing them
Processing retries
Scheduling jobs with CRON or queue workers
Updating cache that might be stale by the time it gets read

When any of these race, your system starts behaving in unexpected, hard-to-reproduce ways: silent failures, double charges, missed events, phantom data, and the like.

There is no silver bullet that fixes every race condition, but in many cases one of these techniques will get you most of the way:

Enforcing idempotency, short-circuiting repeat requests
Using atomic job locks (e.g., Redis SETNX with expiry)
Versioning (ETags, updatedAt, etc.) to detect stale updates
Debouncing/deduplicating at the client/handler level
Preferring command-style APIs over partial updates (e.g., PUT over PATCH)
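To make the idempotency idea concrete, here is a small sketch of short-circuiting repeat requests by key. It is only an illustration under some loud assumptions: chargeCard is a hypothetical slow side effect, chargeOnce and the 'order-42' key are made-up names, and the Map only protects a single process. With several server instances you would need a shared store instead, which is where something like the Redis lock above comes in.

```typescript
type Charge = { id: number; amountCents: number };

const tick = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

let chargesMade = 0;

// Hypothetical slow side effect, standing in for a call to a payment provider.
async function chargeCard(amountCents: number): Promise<Charge> {
  await tick(); // simulated round trip
  chargesMade += 1;
  return { id: chargesMade, amountCents };
}

// One in-flight or finished result per idempotency key (single process only).
const resultsByKey = new Map<string, Promise<Charge>>();

function chargeOnce(idempotencyKey: string, amountCents: number): Promise<Charge> {
  const known = resultsByKey.get(idempotencyKey);
  if (known) return known; // repeat request: reuse the first attempt's result

  // There is no await between the lookup above and the set below,
  // so no other request can slip in between check and use.
  const pending = chargeCard(amountCents);
  resultsByKey.set(idempotencyKey, pending);

  // If the attempt fails, forget the key so a later retry can try again.
  pending.catch(() => resultsByKey.delete(idempotencyKey));
  return pending;
}

// Two deliveries of the same request, e.g. a client retry or a double-click.
async function simulateRetry(): Promise<{ sameCharge: boolean; chargesMade: number }> {
  const [first, second] = await Promise.all([
    chargeOnce('order-42', 1999),
    chargeOnce('order-42', 1999),
  ]);
  return { sameCharge: first.id === second.id, chargesMade };
}
```

The design choice worth noticing is that the promise itself is stored, not the finished result. That way the second request finds an entry even while the first is still in flight, which is exactly the gap the original check-then-save code left open.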

The takeaway isn’t to use a particular library or technique. Any time you have shared state and concurrency, assume the worst and design for it defensively, not optimistically.
