Architecting Multi-Tenant SaaS Applications on Azure with ASP.NET Core and EF Core

December 2, 2025 Β· Asad Ali

Multi-tenant SaaS looks deceptively simple on whiteboards and brutally unforgiving in production. The moment you mix ASP.NET Core, EF Core, Azure App Service, Azure SQL, Azure AD B2C, and a real billing story, you’ve signed up for a distributed systems problem dressed as β€œjust another web app.” In this article I’m going to walk through how I actually design and operate multi-tenant SaaS platforms on Azure β€” the patterns that survive real incidents, the traps that will absolutely leak tenants or melt your shard, and the opinionated architecture I’d use if I had to ship a new SaaS platform tomorrow.

Why multi-tenancy is the real architecture test

Most teams underestimate multi-tenancy because they treat it as a routing concern: β€œWe’ll figure out which tenant it is from the host name or token and add a filter.” That’s how you accidentally build a platform that:

  • Leaks data across tenants due to a single misordered middleware.
  • Deadlocks EF Core on a hot shard under noisy-neighbor load.
  • Breaks your identity flow because Azure AD B2C was wired without tenant context.
  • Is impossible to bill correctly because events are not tenant-scoped and not idempotent.

ASP.NET Core consistently ranks among the most-used web frameworks in the Stack Overflow Developer Survey, and EF Core is Microsoft's official data access library for .NET. Combine that with Azure’s growth (and Microsoft’s claim that 90% of Fortune 500 companies run on its cloud), and you get the de-facto stack many of us are using for SaaS. That’s exactly why you can’t wing this architecture.

Choosing the tenancy model before you write a single line of code

Multi-tenancy is primarily about isolation trade-offs. Martin Fowler, Vaughn Vernon, and Sam Newman all circle around the same truth in different ways: your domain and your operational model must align. For SaaS, the first real architectural decision is not β€œmicroservices or monolith” β€” it’s your tenancy model.

Shared everything: efficient, dangerous if you’re sloppy

Shared-everything means:

  • Single logical database (maybe scaled via elastic pools / hyperscale) shared across tenants.
  • TenantId column on every multi-tenant table.
  • Application layer enforces tenant isolation.

This model is operationally cheap and works well for hundreds to a few thousand tenants with moderate data volumes. The death blow comes when:

  • You rely on β€œremembered filters” instead of globally enforced tenant predicates.
  • You let reporting queries run without tenant constraints.
  • You use naive EF Core global query filters that are poisoned by a wrong model cache key.

Shared-everything is fine, but you must treat tenant isolation as non-negotiable, as strict as auth. A missing WHERE TenantId clause is not a bug; it’s a breach.
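
To make “TenantId column on every multi-tenant table” concrete, here is the marker-interface convention I lean on; ITenantEntity shows up again in the EF Core section below. The interface is deliberately empty: TenantId stays a shadow property owned by the DbContext, so application code can neither forget to set it nor tamper with it. (The naive filter example later declares TenantId directly on the entity; this convention removes it from the CLR type entirely.)

// Empty marker: entities opt in to multi-tenancy, but the TenantId column
// itself is a shadow property stamped by the DbContext (shown later).
public interface ITenantEntity
{
}

public class Order : ITenantEntity
{
    public Guid Id { get; set; }
    public string CustomerName { get; set; } = default!;
    // No TenantId property on purpose; the context owns the column.
}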

Isolated everything: safe, expensive, and sometimes necessary

Isolated-everything typically means:

  • One database per tenant (or per small group of tenants).
  • Possibly one resource group or even subscription per strategic customer.
  • Network-level isolation using VNETs, private links, and possibly dedicated App Service plans.

This is what you use when a tenant is big enough to behave like its own customer environment (think enterprise customers with strong data residency or audit requirements). You pay in:

  • Provisioning complexity (onboarding flows, migration flows, per-tenant schema updates).
  • Management overhead (1000+ databases means 1000+ indexes, backups, failovers, and noise).
  • Query / reporting complexity (cross-tenant analytics needs a data warehouse or lakehouse path).

In my last multi-tenant platform, we ended up with three tiers: shared DB for small tenants, shard-per-region for mid-tier, and dedicated DB for top-tier customers. Trying to put everyone into one database or one-database-per-tenant from day one is just religious; the right answer is almost always tiered.

Hybrid: the only model that usually survives growth

Hybrid models combine shared and isolated:

  • Core metadata in a shared β€œcontrol plane” database.
  • Tenant data in sharded or isolated β€œdata planes.”
  • Billing, identity, feature flags, and licensing are central, but data path is fanned out.

This is the model I normally recommend for Azure-based SaaS:

  1. A shared control DB (Azure SQL) for tenant registry, subscriptions, plans, feature flags, billing meters.
  2. A pool of tenant databases either sharded by region or capacity (elastic pools, hyperscale, or multiple logical servers).
  3. Routing layer at the app level that maps tenant to connection string + capabilities.

It gives you flexibility to move a tenant to an isolated DB when their usage (or legal department) demands it, without refactoring your entire stack.
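
To make the routing layer concrete, here is a minimal sketch of the ITenantConnectionResolver used later when registering the DbContext. It assumes the TenantContext record carries the ShardKey that also appears in the diagram further down; the shard map itself would live in the control DB and be cached in memory.

public interface ITenantConnectionResolver
{
    string ResolveConnectionString(TenantContext tenant);
}

public sealed class TenantConnectionResolver : ITenantConnectionResolver
{
    // ShardKey -> connection string, loaded from the control DB and cached.
    private readonly IReadOnlyDictionary<string, string> _shardConnectionStrings;

    public TenantConnectionResolver(IReadOnlyDictionary<string, string> shardConnectionStrings)
        => _shardConnectionStrings = shardConnectionStrings;

    public string ResolveConnectionString(TenantContext tenant) =>
        _shardConnectionStrings.TryGetValue(tenant.ShardKey, out var cs)
            ? cs
            : throw new InvalidOperationException(
                $"No shard mapped for tenant {tenant.Key} (shard {tenant.ShardKey}).");
}

Moving a tenant to an isolated database then becomes a shard-map update in the control plane, not a code change.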

Making ASP.NET Core tenancy-aware without leaking tenants

If you get tenant resolution wrong in ASP.NET Core, everything else cascades into chaos. Tenant identity must be derived once per request, validated, and then made available to the entire pipeline in a safe, immutable way.

Where tenant resolution actually belongs in the middleware pipeline

I’ve seen teams put tenant resolution after authentication, after routing, or even deep into MVC filters. This is how you end up with middleware that occasionally uses the previous request’s tenant context under load. Remember: Kestrel reuses connections; if you cache tenant context in static or incorrectly scoped services, you will leak tenants across concurrent requests.

A sane pipeline ordering for multi-tenant apps usually looks like:

// Program.cs (ASP.NET Core 8 / minimal hosting)
var builder = WebApplication.CreateBuilder(args);

// Kestrel & hosting tuned for multi-tenant workloads
builder.WebHost.ConfigureKestrel(options =>
{
    options.Limits.MaxConcurrentConnections = 10000; // tune with load testing
    options.Limits.MaxRequestBodySize = 10 * 1024 * 1024;
    options.AddServerHeader = false;
});

// Add services
builder.Services.AddHttpContextAccessor();

builder.Services.AddScoped<ITenantContextAccessor, HttpContextTenantContextAccessor>();

builder.Services.AddAuthentication(/* Azure AD B2C config */);

builder.Services.AddAuthorization();

builder.Services.AddControllers();

var app = builder.Build();

// 1. Correlation IDs & logging
app.Use(async (ctx, next) =>
{
    var traceId = ctx.Request.Headers["x-trace-id"].FirstOrDefault() ?? Guid.NewGuid().ToString("N");
    ctx.Items["TraceId"] = traceId;
    using (Serilog.Context.LogContext.PushProperty("TraceId", traceId))
    {
        await next();
    }
});

// 2. Tenant resolution MUST come before auth when you derive policies per-tenant
app.UseMiddleware<TenantResolutionMiddleware>();

// 3. Authentication / authorization
app.UseAuthentication();
app.UseAuthorization();

// 4. Routing
app.MapControllers();

app.Run();

Why tenant before auth? Because in some designs, your auth policies, token validation parameters, or scopes are tenant-specific. If you don’t know which tenant this is when you validate the token, you can’t apply the tenant’s policies (e.g., B2C authority, allowed issuers, or tenant-specific scopes).
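
One way to wire that up, expanding the AddAuthentication(/* Azure AD B2C config */) placeholder from the snippet above: validate the token normally, then cross-check the issuer against the tenant resolved earlier in the pipeline. This is a sketch, not the only design; AllowedIssuer is a hypothetical per-tenant field from the registry, and the configuration key is illustrative.

builder.Services.AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
    .AddJwtBearer(options =>
    {
        options.Authority = builder.Configuration["AzureAdB2C:Authority"];

        options.Events = new JwtBearerEvents
        {
            OnTokenValidated = context =>
            {
                var tenant = context.HttpContext.RequestServices
                    .GetRequiredService<ITenantContextAccessor>().Current;

                var issuer = context.Principal?.FindFirst("iss")?.Value;

                // A token that is perfectly valid for tenant A must not be
                // accepted when the request targets tenant B's host name.
                // AllowedIssuer is a hypothetical per-tenant registry field.
                if (tenant is null || issuer != tenant.AllowedIssuer)
                {
                    context.Fail("Token issuer does not match the resolved tenant.");
                }

                return Task.CompletedTask;
            }
        };
    });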

A real tenant resolution middleware that doesn’t poison the pipeline

This is roughly what I use in production:

public sealed class TenantResolutionMiddleware
{
    private readonly RequestDelegate _next;
    private readonly ITenantRegistry _tenantRegistry;
    private readonly ILogger<TenantResolutionMiddleware> _logger;

    public TenantResolutionMiddleware(
        RequestDelegate next,
        ITenantRegistry tenantRegistry,
        ILogger<TenantResolutionMiddleware> logger)
    {
        _next = next;
        _tenantRegistry = tenantRegistry;
        _logger = logger;
    }

    public async Task InvokeAsync(HttpContext context, ITenantContextAccessor tenantAccessor)
    {
        // Avoid blocking calls here, or you'll starve the thread pool under load.
        var host = context.Request.Host.Host;
        var path = context.Request.Path.Value ?? string.Empty;

        // Strategy 1: subdomain (acme.saas.com)
        var tenantKey = ExtractTenantFromSubdomain(host)
                        ?? ExtractTenantFromPath(path)
                        ?? ExtractTenantFromHeader(context.Request);

        if (tenantKey is null)
        {
            _logger.LogWarning("Tenant resolution failed for {Host}{Path}", host, path);
            context.Response.StatusCode = StatusCodes.Status400BadRequest;
            await context.Response.WriteAsync("Tenant not specified");
            return;
        }

        var tenant = await _tenantRegistry.GetTenantAsync(tenantKey, context.RequestAborted);
        if (tenant is null || !tenant.IsActive)
        {
            _logger.LogWarning("Unknown or inactive tenant {TenantKey}", tenantKey);
            context.Response.StatusCode = StatusCodes.Status404NotFound;
            await context.Response.WriteAsync("Unknown tenant");
            return;
        }

        tenantAccessor.SetCurrentTenant(tenant);

        // Also push tenant info into logging scope
        using (Serilog.Context.LogContext.PushProperty("TenantId", tenant.Id))
        using (Serilog.Context.LogContext.PushProperty("TenantKey", tenant.Key))
        {
            await _next(context);
        }
    }

    private static string? ExtractTenantFromSubdomain(string host)
    {
        // e.g., acme.app.saas.com -> acme
        // Reserved subdomains (www, app, api, ...) should be rejected here;
        // the registry lookup catches everything else that isn't a tenant.
        var parts = host.Split('.');
        if (parts.Length < 3) return null;
        return parts[0];
    }

    private static string? ExtractTenantFromPath(string path)
    {
        // e.g., /t/acme/orders
        if (!path.StartsWith("/t/")) return null;
        var segments = path.Split('/', StringSplitOptions.RemoveEmptyEntries);
        return segments.Length >= 2 ? segments[1] : null;
    }

    private static string? ExtractTenantFromHeader(HttpRequest request)
        => request.Headers["x-tenant"].FirstOrDefault();
}

The key points that seem obvious but teams keep missing:

  • Don’t store tenant context in static fields or singletons; keep it request-scoped (see the accessor sketch after this list).
  • Don’t rely on AsyncLocal without a clear scoping strategy and tests under load.
  • Never perform heavy I/O (like reading from a remote config service) on every request; you’ll kill Kestrel with thread pool starvation. Cache aggressively with short TTLs.
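
For completeness, here’s a minimal sketch of the HttpContextTenantContextAccessor registered in Program.cs above. It parks the tenant in HttpContext.Items, so the context dies with the request, and because it reads through IHttpContextAccessor (AsyncLocal under the hood) the same instance stays correct even if something long-lived ends up holding a reference to it. That property matters for the EF Core model cache discussion next.

public interface ITenantContextAccessor
{
    TenantContext? Current { get; }
    void SetCurrentTenant(TenantContext tenant);
}

public sealed class HttpContextTenantContextAccessor : ITenantContextAccessor
{
    private const string Key = "__TenantContext";
    private readonly IHttpContextAccessor _httpContextAccessor;

    public HttpContextTenantContextAccessor(IHttpContextAccessor httpContextAccessor)
        => _httpContextAccessor = httpContextAccessor;

    // Reads through the ambient HttpContext, so there is nothing to leak
    // across concurrent requests even under Kestrel connection reuse.
    public TenantContext? Current =>
        _httpContextAccessor.HttpContext?.Items[Key] as TenantContext;

    // Only TenantResolutionMiddleware should call this.
    public void SetCurrentTenant(TenantContext tenant)
    {
        var ctx = _httpContextAccessor.HttpContext
                  ?? throw new InvalidOperationException("No active HTTP request.");
        ctx.Items[Key] = tenant;
    }
}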

Building EF Core multi-tenancy without poisoning the model cache

This is where many otherwise solid teams fall over. EF Core was not designed with multi-tenancy as a first-class concept, and its internal model cache can absolutely betray you if you’re not careful.

Global query filters: powerful and dangerous

Global query filters are the obvious way to enforce TenantId predicates:

public class TenantDbContext : DbContext
{
    private readonly ITenantContextAccessor _tenantAccessor;

    public TenantDbContext(DbContextOptions<TenantDbContext> options,
        ITenantContextAccessor tenantAccessor) : base(options)
    {
        _tenantAccessor = tenantAccessor;
    }

    public DbSet<Order> Orders => Set<Order>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        var tenantId = _tenantAccessor.Current?.Id ?? Guid.Empty;

        modelBuilder.Entity<Order>().HasQueryFilter(o => o.TenantId == tenantId);
        // ... more entities
    }
}

This looks right and WILL leak data in production if you run multi-tenancy with variable tenant contexts in the same process. The reason: EF Core caches the model per DbContext type + options. If you capture tenantId in OnModelCreating, that value is baked into the model and reused for all subsequent tenants.

This is model cache poisoning. The first tenant to touch the context wins.

Correct model caching with a custom cache key

The fix is twofold: keep any actual tenant ID out of the cached model entirely, and vary the model cache key only by tenant category (e.g., schema variant), never by individual tenant. Rebuilding the model per tenant on a large system will crater your performance.

In practice, I use:

  • Global query filters based on a parameter (tenant ID via EF.Property), not a captured value.
  • Custom model cache key only when schema itself changes (e.g., per-tenant schemas or table prefixes).

public class MultiTenantDbContext : DbContext
{
    private readonly ITenantContextAccessor _tenantAccessor;

    public MultiTenantDbContext(DbContextOptions<MultiTenantDbContext> options,
        ITenantContextAccessor tenantAccessor) : base(options)
    {
        _tenantAccessor = tenantAccessor;
    }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // TenantId is a shadow property, so the filter goes through EF.Property
        // and application code can neither forget nor tamper with the column.
        foreach (var entityType in modelBuilder.Model.GetEntityTypes())
        {
            if (typeof(ITenantEntity).IsAssignableFrom(entityType.ClrType))
            {
                modelBuilder.Entity(entityType.ClrType)
                    .Property<Guid>("TenantId");

                var parameter = Expression.Parameter(entityType.ClrType, "e");
                var tenantIdProperty = Expression.Call(
                    typeof(EF),
                    nameof(EF.Property),
                    new[] { typeof(Guid) },
                    parameter,
                    Expression.Constant("TenantId"));

                // CAUTION: the model -- including this accessor reference -- is
                // built once and cached. The accessor must resolve the tenant
                // from ambient request state (HttpContext/AsyncLocal) at query
                // time; one that holds per-instance state would pin the first
                // request's tenant forever, which is the exact bug we're avoiding.
                var tenantId = Expression.Property(
                    Expression.Property(
                        Expression.Constant(_tenantAccessor),
                        nameof(ITenantContextAccessor.Current)),
                    nameof(TenantContext.Id));

                var equal = Expression.Equal(tenantIdProperty, tenantId);
                var lambda = Expression.Lambda(equal, parameter);

                modelBuilder.Entity(entityType.ClrType).HasQueryFilter(lambda);
            }
        }
    }

    public override Task<int> SaveChangesAsync(CancellationToken cancellationToken = default)
    {
        // Stamp the shadow TenantId on every new multi-tenant row; without this,
        // inserts carry Guid.Empty and become invisible to the query filter.
        var tenantId = _tenantAccessor.Current?.Id
            ?? throw new InvalidOperationException("No tenant in scope for a write.");

        foreach (var entry in ChangeTracker.Entries<ITenantEntity>()
                     .Where(e => e.State == EntityState.Added))
        {
            entry.Property<Guid>("TenantId").CurrentValue = tenantId;
        }

        return base.SaveChangesAsync(cancellationToken);
    }
}

This approach keeps any specific tenant ID out of the cached model; the filter evaluates the tenant ID at query time through the accessor. One subtlety: the accessor reference itself is captured in the cached model, which is why it must resolve the tenant from ambient request state rather than per-instance fields. (EF Core also special-cases query filters that reference the context’s own fields, evaluating them against the current context instance; that’s the pattern the EF docs use.) For more complex scenarios (e.g., per-tenant schemas), you must implement IModelCacheKeyFactory and include the schema identifier in the key, but only for groups, not per actual tenant, or your startup times will explode.
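
Since TenantModelCacheKeyFactory.cs shows up in the solution layout later, here’s roughly the shape of it. SchemaGroup is a hypothetical property on MultiTenantDbContext naming the schema variant a tenant belongs to; the point is that the key varies by group, never by tenant.

// A minimal sketch. SchemaGroup is a hypothetical property identifying the
// schema *variant* a tenant belongs to; keying on the tenant id itself would
// rebuild and cache one model per tenant.
public sealed class TenantModelCacheKeyFactory : IModelCacheKeyFactory
{
    public object Create(DbContext context, bool designTime)
    {
        var schemaGroup = (context as MultiTenantDbContext)?.SchemaGroup ?? "default";
        return (context.GetType(), schemaGroup, designTime);
    }
}

// Registered when configuring the context:
// options.ReplaceService<IModelCacheKeyFactory, TenantModelCacheKeyFactory>();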

DbContext lifecycles and hot-tenant pressure

Under heavy write load from a single noisy tenant, EF Core can exhibit query plan regressions and lock escalation if you:

  • Re-use DbContexts for too long (e.g., singleton or long-lived scoped contexts with large change trackers).
  • Run cross-tenant operations using one context instance while global filters create large, complex WHERE clauses.

I strongly recommend:

  • Using short-lived, request-scoped DbContexts for APIs.
  • Using IDbContextFactory<T> (not pooling) for background workers consuming messages per-tenant.
  • Configuring max batch size and command timeout per tenant tier, as in the registration below.

// For request-scoped contexts; background workers resolve tenants explicitly
// via IDbContextFactory instead.
builder.Services.AddDbContext<MultiTenantDbContext>((sp, options) =>
{
    var tenant = sp.GetRequiredService<ITenantContextAccessor>().Current;

    var connectionString = sp.GetRequiredService<ITenantConnectionResolver>()
        .ResolveConnectionString(tenant!);

    options.UseSqlServer(connectionString, sql =>
    {
        sql.EnableRetryOnFailure(5);
        // Budget resources by tier: free tenants get short timeouts and small batches.
        sql.CommandTimeout(tenant!.Tier == TenantTier.Free ? 15 : 60);
        sql.MaxBatchSize(tenant!.Tier == TenantTier.Free ? 10 : 100);
    });

    options.EnableSensitiveDataLogging(false);
    // NoTracking as the default read behavior; opt back in per query for writes.
    options.UseQueryTrackingBehavior(QueryTrackingBehavior.NoTracking);
});

Free tier tenants should not get the same query timeout and resource budget as your biggest paying enterprise customer. Bake this into the data layer; don’t depend on humans to remember it.

Identity and access with Azure AD B2C that actually respects tenants

Azure AD B2C is powerful but unforgiving when you try to use it as a multi-tenant auth system without a clear design. The Azure Architecture Center has good reference scenarios, but the nuances show up when you mix tenants, policies, and per-tenant branding and user stores.

Three real-world patterns for tenant-aware B2C

  1. Single B2C tenant, app-tenant mapping in your control DB.
    You maintain one Azure AD B2C tenant, with custom policies or user flows, and put tenant membership in your own database. B2C authenticates users; your app determines which SaaS tenant they belong to.
  2. Per-customer B2C integration using external IdPs.
    You still have a central B2C instance, but each SaaS tenant configures their enterprise IdP (Azure AD, ADFS, Okta). You map the external claims to your internal tenant model.
  3. Separate B2C instances for very large or regulated customers.
    Rare but sometimes necessary when customers demand strict directory isolation.

Most SaaS teams land on (1) + (2). The core rule: your tenant model lives in your control plane DB, not in Azure AD B2C. B2C is an identity provider, not your tenancy source of truth.

Enforcing tenant membership at the API layer

The mistake I see is trusting a tenantId claim from the token and mapping it 1:1 to your tenant. Never do that in isolation. Instead:

  • Resolve the target tenant from host/path/header using your middleware.
  • Extract user identity from the B2C token.
  • Check in your DB whether this user is a member/admin of this tenant, which the filter below enforces.

[AttributeUsage(AttributeTargets.Method | AttributeTargets.Class)]
public sealed class RequireTenantMembershipAttribute : Attribute, IAsyncAuthorizationFilter
{
    public async Task OnAuthorizationAsync(AuthorizationFilterContext context)
    {
        var tenant = context.HttpContext.RequestServices
            .GetRequiredService<ITenantContextAccessor>().Current;

        if (tenant is null)
        {
            context.Result = new BadRequestObjectResult("Tenant not resolved.");
            return;
        }

        var user = context.HttpContext.User;
        if (user.Identity?.IsAuthenticated != true)
        {
            context.Result = new UnauthorizedResult();
            return;
        }

        // Default inbound claim mapping renames "oid"; check both shapes.
        var userId = user.FindFirst("oid")?.Value // Azure AD object id
                     ?? user.FindFirst("http://schemas.microsoft.com/identity/claims/objectidentifier")?.Value
                     ?? user.FindFirst(ClaimTypes.NameIdentifier)?.Value;

        if (string.IsNullOrEmpty(userId))
        {
            context.Result = new ForbidResult();
            return;
        }

        var membershipService = context.HttpContext.RequestServices
            .GetRequiredService<ITenantMembershipService>();

        var hasAccess = await membershipService.UserHasAccessAsync(tenant.Id, userId,
            context.HttpContext.RequestAborted);

        if (!hasAccess)
        {
            context.Result = new ForbidResult();
        }
    }
}

This pattern has saved me more than once. A misconfigured B2C policy or a stale token cache can otherwise let a user β€œswitch tenants” by manipulating the host header if you’re not validating membership server-side.

Azure resource isolation: what we actually isolate per tenant

Pat Helland famously said, β€œThere is no such thing as a stateless architecture.” For SaaS, that state has to map cleanly to Azure resources. But you can’t give every tenant their own subscription and Kubernetes cluster; you’d drown in management overhead.

Control plane vs data plane in Azure

I like splitting Azure resources into two layers:

  • Control plane: everything that governs tenants, plans, identities, config, and metering.
  • Data plane: actual tenant data, compute, messaging, storage.

A typical logical layout:

  • Single subscription (or a small number by region).
  • One or more resource groups per environment (prod, staging) and sometimes per region.
  • Shared App Services or AKS for the app tier.
  • Shared Azure Service Bus namespaces for messaging, with per-tenant topics/queues only for high-volume tenants.
  • Azure SQL elastic pools or hyperscale for tenant DBs.

ASCII view of a practical multi-tenant SaaS on Azure

                      +----------------------------+
      Browser / SPA   |        Azure AD B2C        |
  +------------------>|  (User flows, policies,    |
                      |   external IdPs, tokens)   |
                      +--------------+-------------+
                                    |
                             OIDC / JWT (id_token, access_token)
                                    |
                                    v   traceId / spanId
                        +-----------+------------+
   HTTPS / HTTP/2/3     |   Azure App Service    |
  +-------------------->|  ASP.NET Core (Kestrel)|
                        |------------------------|
                        |  [1] Correlation Id    |
                        |  [2] TenantResolution  |
                        |  [3] AuthN / AuthZ     |
                        |  [4] Controllers       |
                        +-----------+------------+
                                    |
                                    | ITenantContext (TenantId, Tier,
                                    |   ShardKey, Region)
                                    v
                   +----------------+----------------+
                   |   Tenant Routing / ConnResolver |
                   +----------------+----------------+
                                    |
             +----------------------+------------------------+
             |                                               |
             v                                               v
+-------------------------+                   +---------------------------+
|  Control Plane DB       |                   |   Tenant Data Plane       |
|  Azure SQL              |                   |   Azure SQL Elastic Pool  |
|  Tenants, Plans,        |                   |   or Hyperscale           |
|  Features, Billing      |                   |   shard-{A,B,C...}        |
+-----------+-------------+                   +------------+--------------+
            |                                                   |
            |                                                   |
            v                                                   v
  +-------------------+                             +-------------------+
  | Azure Service Bus |                             | Azure Service Bus |
  | (Billing Events,  |                             | (Domain Events,   |
  |  Provisioning)    |                             |  Outbox Inbox)    |
  +---------+---------+                             +---------+---------+
            |                                                   |
            v                                                   v
    +---------------+                                   +---------------+
    | Billing /     |                                   | Background    |
    | Metering Svc  |                                   | Workers       |
    +---------------+                                   +---------------+

Observability:
  - App Insights / OpenTelemetry across App Service, Workers
  - traceId / spanId propagated via headers
  - TenantId logged in structured logs for every span

This simple diagram hides a lot of pain: if you do not propagate traceId and TenantId consistently, your incidents will turn into archaeology expeditions.

SaaS billing and metering: where most teams cheat and regret it

You can duct-tape billing for an MVP; you cannot do that for a real SaaS business. Udi Dahan and Greg Young have written and spoken extensively on messaging and consistency — billing is where you either respect those principles, or your CFO will run incident retrospectives with you.

Key principles of sane SaaS billing

  • Billing events must be idempotent β€” no matter how many times you process them, the customer is charged once.
  • Billing events must be tenant-scoped — every meter has a TenantId and a unique idempotency key (a key sketch follows this list).
  • Billing must be eventually consistent β€” never lock the main flow for billing writes.
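
A small habit that pays off: derive the idempotency key deterministically from whatever makes the event unique, never from a random GUID minted at send time. A sketch (BillingKeys is a made-up name; the meter naming and daily granularity are illustrative):

// Deterministic idempotency key: replaying the same billable action always
// produces the same key, so the consumer can dedupe safely.
public static class BillingKeys
{
    public static string For(Guid tenantId, string meter, string entityId, DateOnly day)
        => $"{tenantId:N}:{meter}:{entityId}:{day:yyyyMMdd}";
}

// Usage: BillingKeys.For(tenant.Id, "storage-gb", containerId, today)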

Outbox + dedicated billing processor

Whenever a billable action happens (seat created, GB stored, API call beyond free tier), I record it in the same transaction as the domain change using an outbox table per tenant DB or per shard.

public class OutboxMessage
{
    public long Id { get; set; }
    public Guid TenantId { get; set; }
    public string MessageType { get; set; } = default!;
    public string Payload { get; set; } = default!;
    public DateTime OccurredAtUtc { get; set; }
    public string IdempotencyKey { get; set; } = default!;
    public bool Processed { get; set; }
    public DateTime? ProcessedAtUtc { get; set; }
}

// Lives in an application service; _db is the tenant's MultiTenantDbContext
// and _jsonOptions a configured JsonSerializerOptions instance.
public async Task RecordBillableEventAsync(TenantId tenantId, BillableEvent evt)
{
    var entity = new OutboxMessage
    {
        TenantId = tenantId.Value,
        MessageType = evt.GetType().FullName!,
        Payload = JsonSerializer.Serialize(evt, _jsonOptions),
        OccurredAtUtc = DateTime.UtcNow,
        IdempotencyKey = evt.IdempotencyKey
    };

    _db.OutboxMessages.Add(entity);

    // The outbox row must commit atomically with the domain change: add it to
    // the same DbContext and let a single SaveChangesAsync flush both.
    await _db.SaveChangesAsync();
}

Separate workers (Azure Functions, Azure Container Apps, or WebJobs) drain these outbox messages, send them to a central Billing Service via Service Bus, and mark them processed in an idempotent way.

public class BillingOutboxProcessor
{
    private readonly IDbContextFactory<MultiTenantDbContext> _dbFactory;
    private readonly ServiceBusSender _billingSender;

    public BillingOutboxProcessor(
        IDbContextFactory<MultiTenantDbContext> dbFactory,
        ServiceBusSender billingSender)
    {
        _dbFactory = dbFactory;
        _billingSender = billingSender;
    }

    public async Task ProcessAsync(CancellationToken cancellationToken)
    {
        await using var db = await _dbFactory.CreateDbContextAsync(cancellationToken);

        var batch = await db.OutboxMessages
            .Where(x => !x.Processed)
            .OrderBy(x => x.Id)
            .Take(500)
            .AsTracking() // updates to Processed must be tracked even if NoTracking is the default
            .ToListAsync(cancellationToken);

        foreach (var msg in batch)
        {
            var sbMessage = new ServiceBusMessage(msg.Payload)
            {
                Subject = msg.MessageType,
                // MessageId doubles as the duplicate-detection key on the bus.
                MessageId = msg.IdempotencyKey,
                ApplicationProperties =
                {
                    ["tenantId"] = msg.TenantId.ToString(),
                    ["occurredAt"] = msg.OccurredAtUtc
                }
            };

            await _billingSender.SendMessageAsync(sbMessage, cancellationToken);

            msg.Processed = true;
            msg.ProcessedAtUtc = DateTime.UtcNow;
        }

        // Marking happens after the sends: a crash mid-batch means re-sending,
        // so delivery is at-least-once and the consumer must dedupe.
        await db.SaveChangesAsync(cancellationToken);
    }
}

Service Bus’s MessageId can help prevent duplicates at the consumer side, but you still must design your Billing Service as idempotent. Replay will happen β€” DLQ replays, partial failures, or manual reprocessing during incident mitigation.
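
On the consumer side, “idempotent” means something concrete: record the idempotency key and apply the charge in one transaction, and treat a duplicate-key violation as “already processed.” A sketch, where BillingDbContext, ProcessedEvent, and ApplyMeterAsync are hypothetical; the load-bearing part is a unique index on ProcessedEvents.IdempotencyKey:

public sealed class BillingEventConsumer
{
    private readonly IDbContextFactory<BillingDbContext> _dbFactory;

    public BillingEventConsumer(IDbContextFactory<BillingDbContext> dbFactory)
        => _dbFactory = dbFactory;

    public async Task HandleAsync(ServiceBusReceivedMessage message, CancellationToken ct)
    {
        await using var db = await _dbFactory.CreateDbContextAsync(ct);
        await using var tx = await db.Database.BeginTransactionAsync(ct);

        // Insert-first dedupe: the unique index on IdempotencyKey makes a
        // concurrent duplicate fail here instead of double-charging.
        db.ProcessedEvents.Add(new ProcessedEvent
        {
            IdempotencyKey = message.MessageId,
            ProcessedAtUtc = DateTime.UtcNow
        });

        try
        {
            await db.SaveChangesAsync(ct);
        }
        catch (DbUpdateException)
        {
            return; // already processed: replay, DLQ reprocessing, etc.
        }

        await ApplyMeterAsync(db, message, ct); // update balances / usage rows

        await db.SaveChangesAsync(ct);
        await tx.CommitAsync(ct);
    }

    private static Task ApplyMeterAsync(
        BillingDbContext db, ServiceBusReceivedMessage message, CancellationToken ct)
        => Task.CompletedTask; // domain-specific metering elided
}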

Azure App Service deployment shapes for multi-tenant workloads

You can absolutely run a serious multi-tenant SaaS on Azure App Service if you understand its limits and behavior under pressure. The trap is running everything on one S1 plan and hoping autoscale will save you. It won’t.

Separate horizontal scaling by traffic profile, not by microservice doctrine

In production, we typically split:

  • Core API App Service Plan (high RPS, latency-sensitive).
  • Background processing plan (Functions, WebJobs, or worker service).
  • Admin/reporting plan (lower RPS, heavier queries).

Some tenants (top-tier) may get a dedicated App Service Plan with their own instance of the API to guarantee noisy-neighbor isolation. App Service autoscale rules should factor in:

  • CPU
  • Requests per second
  • Queue length of Service Bus (for worker plans)

Configure App Service correctly for ASP.NET Core:

  • Run on Linux (cgroups) if you understand container behavior and want better control; otherwise Windows is fine.
  • Set WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT sensibly.
  • Ensure HTTP/2 is enabled, especially behind Application Gateway / Front Door.

Kestrel under multi-tenant load

Kestrel’s request queue and thread pool behavior matter a lot more once you’re under real tenant traffic:

  • Long synchronous DB calls in tenant resolution will starve threads.
  • Misconfigured MaxRequestBodySize combined with a few tenants uploading giant files will monopolize workers.
  • If you push too many concurrent connections, your per-instance limits might cause request queuing and timeouts that look like β€œrandom” tenant slowness.

You must load test with realistic tenant mixes. A single hot tenant calling a heavy endpoint at 200 RPS can completely distort your scaling behavior.

Observability and lifecycle: multi-tenancy makes debugging harder by default

The Google SRE books hammer this point: you can’t operate what you can’t observe. Multi-tenancy multiplies this β€” every incident you debug must answer: β€œIs this a platform issue or a single tenant’s behavior?”

What I consider non-negotiable in observability

  • Every log line contains TraceId and TenantId (and user id where relevant).
  • Distributed tracing across App Service, workers, and DB using OpenTelemetry or App Insights SDK.
  • Per-tenant dashboards — latency, error rates, and throughput by tenant.

The Serilog wiring for the first of those looks like:

Log.Logger = new LoggerConfiguration()
    .Enrich.FromLogContext()
    .Enrich.WithProperty("Application", "SaaS.Api")
    .WriteTo.Console(outputTemplate:
        "{Timestamp:O} [{Level:u3}] (tenant={TenantId}) (trace={TraceId}) {Message:lj}{NewLine}{Exception}")
    .CreateLogger();

Once you have this, the question β€œIs this just ACME Corp complaining again or is this a systemic issue?” becomes answerable in seconds rather than hours.

Tenant lifecycle management as a first-class domain

Tenants aren’t just rows in a table; they go through full lifecycles:

  • Provisioned (DB created, migrations applied, seed data inserted).
  • Active (billing starts, quotas enforced).
  • Suspended (overdue invoices, maybe read-only access).
  • Cancelled (data retention timer starts, export windows honored).

Automate these as workflows, ideally with something like Durable Functions or a saga orchestrator if it spans multiple services. But Durable Functions have their own gotchas β€” zombie orchestrations and replay behavior will surprise anyone who hasn’t read the docs carefully. If you use them for tenant provisioning, keep orchestrations idempotent and externalize long-running side effects.
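
To keep those transitions honest, I like encoding the lifecycle as an explicit transition table instead of scattered if-checks. A minimal sketch, using the states from the list above:

public enum TenantStatus { Provisioned, Active, Suspended, Cancelled }

// An explicit transition table keeps provisioning workflows from moving a
// tenant through an illegal shortcut (e.g., Cancelled -> Active).
public static class TenantLifecycle
{
    private static readonly Dictionary<TenantStatus, TenantStatus[]> Allowed = new()
    {
        [TenantStatus.Provisioned] = new[] { TenantStatus.Active, TenantStatus.Cancelled },
        [TenantStatus.Active]      = new[] { TenantStatus.Suspended, TenantStatus.Cancelled },
        [TenantStatus.Suspended]   = new[] { TenantStatus.Active, TenantStatus.Cancelled },
        [TenantStatus.Cancelled]   = Array.Empty<TenantStatus>()
    };

    public static bool CanTransition(TenantStatus from, TenantStatus to)
        => Allowed.TryGetValue(from, out var next) && next.Contains(to);
}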

A realistic solution structure for a multi-tenant SaaS

A folder structure that has worked well for me on ASP.NET Core + EF Core:

πŸ“‚ src
β”œβ”€β”€πŸ“‚ Api
β”‚   β”œβ”€β”€πŸ“‚ Auth
β”‚   β”‚   β”œβ”€β”€πŸ“„ B2CAuthenticationExtensions.cs
β”‚   β”‚   β””β”€β”€πŸ“„ RequireTenantMembershipAttribute.cs
β”‚   β”œβ”€β”€πŸ“‚ Middlewares
β”‚   β”‚   β”œβ”€β”€πŸ“„ TenantResolutionMiddleware.cs
β”‚   β”‚   β””β”€β”€πŸ“„ CorrelationIdMiddleware.cs
β”‚   β”œβ”€β”€πŸ“‚ Controllers
β”‚   β”‚   β”œβ”€β”€πŸ“„ TenantAdminController.cs
β”‚   β”‚   β”œβ”€β”€πŸ“„ BillingController.cs
β”‚   β”‚   β””β”€β”€πŸ“„ OrdersController.cs
β”‚   β”œβ”€β”€πŸ“‚ Observability
β”‚   β”‚   β””β”€β”€πŸ“„ LoggingExtensions.cs
β”‚   β”œβ”€β”€πŸ“„ Program.cs
β”‚   β””β”€β”€πŸ“„ CompositionRoot.cs
β”œβ”€β”€πŸ“‚ Application
β”‚   β”œβ”€β”€πŸ“‚ Tenants
β”‚   β”‚   β”œβ”€β”€πŸ“‚ Commands
β”‚   β”‚   β”‚   β”œβ”€β”€πŸ“„ ProvisionTenantCommand.cs
β”‚   β”‚   β”‚   β””β”€β”€πŸ“„ SuspendTenantCommand.cs
β”‚   β”‚   β”œβ”€β”€πŸ“‚ Queries
β”‚   β”‚   β”‚   β””β”€β”€πŸ“„ GetTenantUsageQuery.cs
β”‚   β”œβ”€β”€πŸ“‚ Billing
β”‚   β”‚   β”œβ”€β”€πŸ“‚ Commands
β”‚   β”‚   β”‚   β””β”€β”€πŸ“„ GenerateInvoiceCommand.cs
β”‚   β”‚   β””β”€β”€πŸ“‚ Services
β”‚   β”‚       β””β”€β”€πŸ“„ BillingCalculator.cs
β”‚   β”œβ”€β”€πŸ“‚ Orders
β”‚   β”‚   β”œβ”€β”€πŸ“‚ Commands
β”‚   β”‚   β”œβ”€β”€πŸ“‚ Queries
β”‚   β”‚   β””β”€β”€πŸ“‚ Events
β”‚   β””β”€β”€πŸ“‚ Behaviors
β”‚       β””β”€β”€πŸ“„ MultiTenantValidationBehavior.cs
β”œβ”€β”€πŸ“‚ Domain
β”‚   β”œβ”€β”€πŸ“‚ Tenants
β”‚   β”‚   β”œβ”€β”€πŸ“„ Tenant.cs
β”‚   β”‚   β”œβ”€β”€πŸ“„ TenantPlan.cs
β”‚   β”‚   β””β”€β”€πŸ“„ TenantLifecycle.cs
β”‚   β”œβ”€β”€πŸ“‚ Billing
β”‚   β”‚   β”œβ”€β”€πŸ“„ BillableEvent.cs
β”‚   β”‚   β””β”€β”€πŸ“„ Invoice.cs
β”‚   β”œβ”€β”€πŸ“‚ Orders
β”‚   β”‚   β”œβ”€β”€πŸ“„ Order.cs
β”‚   β”‚   β””β”€β”€πŸ“„ OrderLine.cs
β”‚   β””β”€β”€πŸ“‚ ValueObjects
β”‚       β””β”€β”€πŸ“„ TenantId.cs
β””β”€β”€πŸ“‚ Infrastructure
    β”œβ”€β”€πŸ“‚ Persistence
    β”‚   β”œβ”€β”€πŸ“„ MultiTenantDbContext.cs
    β”‚   β”œβ”€β”€πŸ“„ TenantModelCacheKeyFactory.cs
    β”‚   β””β”€β”€πŸ“‚ Configurations
    β”‚       β”œβ”€β”€πŸ“„ TenantConfiguration.cs
    β”‚       β”œβ”€β”€πŸ“„ OrderConfiguration.cs
    β”‚       β””β”€β”€πŸ“„ OutboxConfiguration.cs
    β”œβ”€β”€πŸ“‚ Sharding
    β”‚   β”œβ”€β”€πŸ“„ TenantConnectionResolver.cs
    β”‚   β””β”€β”€πŸ“„ ShardMap.cs
    β”œβ”€β”€πŸ“‚ Messaging
    β”‚   β”œβ”€β”€πŸ“„ BillingOutboxProcessor.cs
    β”‚   β””β”€β”€πŸ“„ ServiceBusConfiguration.cs
    β”œβ”€β”€πŸ“‚ Tenants
    β”‚   β”œβ”€β”€πŸ“„ TenantRegistry.cs
    β”‚   β””β”€β”€πŸ“„ TenantMembershipService.cs
    β””β”€β”€πŸ“‚ Migrations
        β””β”€β”€πŸ“‚ ControlPlane
            β””β”€β”€πŸ“„ *.cs

This keeps multi-tenancy first-class where it belongs: from the domain model to persistence, not as an afterthought in a couple of middleware classes.

Failure modes and traps I see repeatedly

To wrap up, let me call out a few specific production-grade failure modes that have bitten either my teams or teams I’ve helped:

  • EF Core model cache poisoning β€” capturing tenant-specific values in OnModelCreating. Fix it with proper query filters and cache keys.
  • Thread pool starvation in tenant resolution β€” doing synchronous network or key vault calls on every request. Cache aggressively; keep tenant lookup O(1) and in-memory where possible.
  • Hot partitions in Azure Storage or Service Bus β€” if your partition key is TenantId, your biggest tenant becomes your hottest partition. Add another dimension (time bucket, entity) to spread load.
  • Cross-tenant leaks via in-memory caches — caching authz decisions or per-tenant settings without including the tenant key in the cache key. This will leak behavior between tenants (a simple guard is sketched after this list).
  • Stale B2C config / token validation parameters β€” if you rotate keys or change policies and don’t refresh your OpenID metadata cache, some tenants will intermittently fail auth for hours.
  • Tenant affinity creating shard imbalance β€” always putting early big tenants into shard-0 β€œfor convenience”; two years later shard-0 is on fire and you can’t move them without complex live migrations.
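
For the in-memory cache leak specifically, the cheapest guard is a wrapper that composes the tenant ID into every key. A sketch over IMemoryCache (TenantScopedCache is a made-up name):

public sealed class TenantScopedCache
{
    private readonly IMemoryCache _cache;
    private readonly ITenantContextAccessor _tenantAccessor;

    public TenantScopedCache(IMemoryCache cache, ITenantContextAccessor tenantAccessor)
    {
        _cache = cache;
        _tenantAccessor = tenantAccessor;
    }

    public Task<T?> GetOrCreateAsync<T>(string key, Func<Task<T>> factory, TimeSpan ttl)
    {
        var tenantId = _tenantAccessor.Current?.Id
            ?? throw new InvalidOperationException("No tenant in scope.");

        // The tenant id is part of the physical cache key, so entries can never
        // bleed across tenants even when the logical key collides.
        return _cache.GetOrCreateAsync($"{tenantId:N}:{key}", entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = ttl;
            return factory();
        });
    }
}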

Multi-tenant SaaS on Azure with ASP.NET Core and EF Core is absolutely doable, and honestly, it’s a great stack. But you need to treat tenant isolation, identity, billing, and observability as core architecture, not garnish. If you build your platform around those four pillars from day one, the rest β€” REST endpoints, DTOs, and frontends β€” is comparatively easy.