Enterprise Database Design Patterns

This document covers production-grade patterns for both OLTP and reporting systems. For normalization fundamentals (1NF-5NF), see Data Modeling & Normalization.

Two Workloads, Two Design Philosophies

Before diving into patterns, understand that OLTP and Reporting have fundamentally different goals:

AspectOLTP (Transactional)Reporting (Analytics)
Primary goalCorrectness under concurrencyFast, consistent queries
Schema shapeNormalized (3NF), narrow tablesDenormalized (star/snowflake), wide
Optimized forWrites, point lookups, small transactionsReads, aggregations, large scans
IndexesSelective, minimalCovering, partition-aligned
Success metricsLatency (p95/p99), throughput, deadlocksQuery time, analyst productivity

Design principle: Build OLTP for correctness first, Reporting for usability + speed, and connect them with reliable pipelines.


Domain-First Modeling

Senior engineers avoid “pretty but wrong” schemas by starting from business concepts:

Key Concepts

ConceptDefinitionExample
InvariantRule that must always hold“Balance ≥ 0 unless overdraft enabled”
AggregateData that must change atomicallyOrder + OrderLines (same transaction)
EntityHas identity + lifecycleCustomer, Account, Policy
Value ObjectNo identity, compared by valueMoney, Address, DateRange
EventImmutable fact for audit + projectionsPaymentReceived, AddressChanged

Relationship Design Guidelines

  • Prefer 1:M relationships for clarity and performance
  • Resolve M:N with intersection tables (add attributes where needed)
  • Enforce cardinality explicitly:
    • NOT NULL FKs → mandatory relationship
    • UNIQUE constraint → 1:1 relationship

Transactions & Concurrency

This is what separates toy databases from enterprise systems.

ACID Properties

PropertyMeaningPractical Impact
AtomicityAll changes commit together or none doFailed payment doesn’t leave partial state
ConsistencyConstraints/invariants hold at commitBalance never goes negative (if enforced)
IsolationConcurrent transactions don’t interfereTwo users buying last item → one succeeds
DurabilityCommitted data survives crashesConfirmed order isn’t lost on restart

Isolation Levels & Anomalies

Isolation LevelDirty ReadNon-Repeatable ReadPhantom ReadWrite Skew
Read Uncommitted✗ Possible✗ Possible✗ Possible✗ Possible
Read Committed✓ Prevented✗ Possible✗ Possible✗ Possible
Repeatable Read✓ Prevented✓ Prevented✗ Possible✗ Possible
Serializable✓ Prevented✓ Prevented✓ Prevented✓ Prevented

Anomaly definitions:

  • Dirty read: Reading uncommitted changes that may roll back
  • Non-repeatable read: Same row returns different values within one transaction
  • Phantom read: New rows appear in a range query during transaction
  • Write skew: Two transactions each make “safe” decisions that combined violate a rule

Concurrency Patterns

PatternWhen to UseImplementation
Atomic update with guardSimple state changesUPDATE ... SET stock = stock - 1 WHERE stock >= 1
Optimistic (OCC)Low contention, retries acceptableAdd version column, check on update
Pessimistic lockingHigh contention, conflicts costlySELECT ... FOR UPDATE, keep txn short
Idempotency keysRetries in distributed systemsStore idempotency_key with UNIQUE

OCC Example:

1
2
3
4
5
6
7
8
9
-- Read with version
SELECT id, balance, version FROM account WHERE id = 123;
-- Returns: balance=100, version=5

-- Update with version check
UPDATE account
SET balance = 80, version = version + 1
WHERE id = 123 AND version = 5;
-- If 0 rows affected → someone else changed it → retry

Deadlock Prevention

  • Lock resources in consistent order (e.g., always by ascending ID)
  • Keep transactions short — no user interaction mid-transaction
  • Avoid mixing long reads with writes in same transaction

Keys & Identifiers

Primary Key Strategy

ApproachProsConsUse When
Surrogate PKStable, never changesNo business meaningDefault choice for most tables
Natural PKBusiness meaning, no extra columnMay change, often compositeImmutable business identifiers
Composite PKNo extra column for junctionsComplex joins, harder to referenceJunction/intersection tables

Best practice: Use surrogate PKs + keep business keys as UNIQUE columns:

1
2
3
4
5
CREATE TABLE customer (
    customer_id   BIGINT PRIMARY KEY,        -- surrogate (stable)
    email         VARCHAR(255) UNIQUE,       -- business key
    external_ref  VARCHAR(100) UNIQUE        -- integration key
);

Identifier Types Comparison

TypeGenerationIndex SizeLocalityBest For
Sequential INTCentralizedSmallExcellentSingle-DB OLTP
UUID v4DistributedLargePoorDistributed systems
UUID v7 / ULIDDistributedLargeGoodDistributed + time-ordering
Snowflake IDDistributedMediumGoodHigh-volume distributed

External ID Safety

  • Never use external IDs as internal PKs (they can change or collide)
  • Always maintain a mapping: internal_id ↔ external_ref

Time, History & Auditability

Timestamp Rules

  • Store all timestamps in UTC
  • Distinguish between:
    • Event time: when it actually happened
    • Record time: when it was stored/processed
    • Effective date: when it takes effect (may be future)

History Patterns Comparison

PatternUse CaseProsCons
Audit columnsBasic trackingSimple, minimal overheadNo history, only latest
Type 2 SCDTrack attribute changesFull history, point-in-timeComplex queries, more storage
Event sourcingComplete audit trailImmutable, replay-ableComplex reads, eventual state
Temporal tablesSystem-managed historyAutomatic, SQL standardDB-specific support

Audit Columns (Minimum)

1
2
3
4
5
6
7
8
CREATE TABLE order (
    order_id    BIGINT PRIMARY KEY,
    -- business columns...
    created_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    created_by  VARCHAR(100) NOT NULL,
    updated_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_by  VARCHAR(100) NOT NULL
);

Type 2 SCD Example

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
CREATE TABLE customer_history (
    customer_sk   BIGINT PRIMARY KEY,       -- surrogate for this version
    customer_id   BIGINT NOT NULL,          -- business key
    name          VARCHAR(100),
    tier          VARCHAR(20),
    valid_from    DATE NOT NULL,
    valid_to      DATE,                     -- NULL = current
    is_current    BOOLEAN NOT NULL DEFAULT TRUE
);

-- Find customer state at a specific date
SELECT * FROM customer_history
WHERE customer_id = 123
  AND valid_from <= '2024-06-15'
  AND (valid_to IS NULL OR valid_to > '2024-06-15');

OLTP Schema Design

Table Design Rules

  • Keep tables narrow — avoid wide “god tables”
  • Store atomic attributes — avoid JSON blobs unless justified
  • Use explicit status columns with constrained values
  • Model state transitions intentionally — document valid transitions

Constraint Checklist

ConstraintUse ForExample
NOT NULLRequired fieldsemail VARCHAR(255) NOT NULL
UNIQUEBusiness keys, idempotency keysUNIQUE (email)
FOREIGN KEYReferential integrityREFERENCES customer(customer_id)
CHECKDomain rules (ranges, enums)CHECK (amount >= 0)
DEFAULTSensible defaults, timestampsDEFAULT CURRENT_TIMESTAMP

Common OLTP Anti-Patterns

Anti-PatternProblemFix
Lists in columnsBreaks 1NF, can’t query/indexChild table with FK
Magic strings for statusTypos, invalid statesCHECK constraint or enum table
Over-indexingSlow writes, wasted storageIndex only what queries need
Read-modify-write without lockRace conditions, lost updatesOCC or atomic updates
Wide tables (50+ columns)Poor cache utilizationSplit into related tables

Dimensional Modeling (Reporting/Analytics)

Star Schema Fundamentals

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
                    ┌─────────────┐
                    │  dim_date   │
                    └──────┬──────┘
┌─────────────┐     ┌──────┴──────┐     ┌─────────────┐
│dim_customer │────▶│ fact_sales  │◀────│ dim_product │
└─────────────┘     └──────┬──────┘     └─────────────┘
                    ┌──────┴──────┐
                    │ dim_store   │
                    └─────────────┘
ComponentPurposeCharacteristics
Fact tableMeasurements/events at defined grainNarrow, many rows, numeric measures
DimensionDescriptive context for analysisWide, fewer rows, text attributes

Grain: The #1 Design Decision

Define grain as a sentence before designing:

Grain StatementFact TableTypical Measures
“One row per order linefact_order_linesquantity, unit_price, line_total
“One row per transactionfact_transactionsamount, fee
“One row per account per dayfact_daily_balancebalance, available_credit

Rule: Never mix grains in one fact table.

Fact Types

TypeCan Sum AcrossExampleHandling
AdditiveAll dimensionsSales amount, quantitySum freely
Semi-additiveSome dims, not timeAccount balanceUse snapshot facts, avg over time
Non-additiveNoneConversion rate, ratioStore components, compute in query

SCD (Slowly Changing Dimension) Types

TypeBehaviorUse CaseTrade-off
Type 0Never changesFixed attributes (birth date)No flexibility
Type 1OverwriteCorrections, non-historicalLoses history
Type 2New row per changeTrack history over timeComplex, more storage
Type 3Previous value in columnLimited history (1 prior value)Only partial history

Type 2 Example:

1
2
3
4
customer_sk | customer_id | name    | tier   | valid_from | valid_to   | is_current
------------|-------------|---------|--------|------------|------------|------------
1001        | C123        | Alice   | Bronze | 2023-01-01 | 2024-03-14 | false
1002        | C123        | Alice   | Silver | 2024-03-15 | NULL       | true

Common Patterns

  • Transaction fact + Snapshot fact: Events vs point-in-time state
  • Bridge tables: Many-to-many dimensions (customer ↔ segments)
  • Role-playing dimensions: Same dim, multiple FKs (order_date, ship_date → dim_date)
  • Junk dimensions: Low-cardinality flags combined into one dim

Data Pipelines (OLTP → Reporting)

Pipeline Approaches

ApproachLatencyComplexityBest For
Batch ETLHours/DailyLowTraditional BI, stable reports
Batch ELTHours/DailyMediumCloud DW with transformation power
CDCMinutesMediumNear-real-time, event-driven
StreamingSecondsHighReal-time dashboards, alerts

Reliability Patterns

PatternPurposeImplementation
Outbox patternAtomic business write + event publishWrite to outbox table in same txn
Idempotent loadsSafe reruns without duplicatesNatural keys + MERGE/upsert
Late-arriving dimHandle facts before dimensions exist“Unknown” dimension record + backfill
ReconciliationProve reporting matches sourceRow counts, sums, balance checks

Guiding principle: Reporting is only as good as its lineage, deduplication, and reconciliation.


Indexing Strategy

AspectOLTPReporting
Index forPK lookups, FK joins, uniquenessDimension key joins, partition filters
Index countMinimal (writes matter)More liberal (reads dominate)
Query patternPoint lookups, small rangesLarge scans, aggregations
Pre-aggregationRarelySummary tables, materialized views

Query Optimization Tips

  • Keep predicates sargable — avoid WHERE YEAR(date_col) = 2024
  • Fetch only needed columns — avoid SELECT *
  • Use covering indexes for hot queries
  • Partition-align filters (time-based queries on time-partitioned tables)

Partitioning & Data Lifecycle

Partitioning Strategy

Partition TypeBest ForExample
Range (time)Logs, transactions, eventsMonthly partitions on created_at
ListRegional data, tenant isolationPartition per region or tenant_id
HashEven distributionHash on customer_id

Benefits: Faster queries (partition pruning), efficient archival/purge, parallel processing.

Data Lifecycle Tiers

TierAgeStorageAccess Pattern
Hot0-30 daysPrimary DB/SSDFrequent reads/writes
Warm30-365 daysSecondary/cheaperOccasional queries
Cold1+ yearsArchive/objectRare, compliance only

Schema Evolution

Expand/Contract Pattern (Safe Default)

1
2
3
4
5
1. EXPAND    → Add new nullable columns/tables
2. BACKFILL  → Populate in batches
3. DUAL-WRITE→ Keep old + new in sync (if needed)
4. MIGRATE   → Move consumers to new schema
5. CONTRACT  → Drop old columns after confidence window

Safe vs Dangerous Changes

Safe (Online)Dangerous (Requires Planning)
Add nullable columnDrop column with data
Add new tableRename column
Add index (CONCURRENTLY)Change column type
Add CHECK constraint (NOT VALID)Add NOT NULL to existing column

Security & Data Quality

Security Checklist

  • Minimize PII — store only what’s necessary
  • Encrypt — in transit (TLS) and at rest
  • Least privilege — separate read/write/admin roles
  • Audit trail — log sensitive data access and changes

Data Quality Layers

LayerWhenHow
PreventionAt write timeConstraints, validation, reference tables
DetectionContinuousProfiling, anomaly checks, threshold alerts
CorrectionOn discoveryDocumented fix procedures, backfills
ReconciliationPeriodicSource vs derived comparisons

Running Example: E-Commerce OLTP → Star Schema

This example shows how the same business concepts map from normalized OLTP to dimensional reporting.

OLTP Schema (Normalized, 3NF)

erDiagram
    CUSTOMER ||--o{ ORDER : places
    ORDER ||--|{ ORDER_LINE : contains
    PRODUCT ||--o{ ORDER_LINE : "appears in"
    CATEGORY ||--o{ PRODUCT : groups

    CUSTOMER {
        bigint customer_id PK
        varchar email UK
        varchar name
        varchar tier
        timestamp created_at
    }
    ORDER {
        bigint order_id PK
        bigint customer_id FK
        varchar status
        timestamp order_date
        timestamp shipped_date
    }
    ORDER_LINE {
        bigint order_id FK
        bigint product_id FK
        int quantity
        decimal unit_price
    }
    PRODUCT {
        bigint product_id PK
        varchar sku UK
        varchar name
        bigint category_id FK
        decimal current_price
    }
    CATEGORY {
        bigint category_id PK
        varchar name
    }

Characteristics: Normalized, no redundancy, optimized for writes.

Star Schema (Dimensional, for Reporting)

erDiagram
    DIM_DATE ||--o{ FACT_SALES : "order date"
    DIM_CUSTOMER ||--o{ FACT_SALES : who
    DIM_PRODUCT ||--o{ FACT_SALES : what

    FACT_SALES {
        bigint date_sk FK
        bigint customer_sk FK
        bigint product_sk FK
        int quantity
        decimal unit_price
        decimal line_total
        decimal discount_amount
    }
    DIM_DATE {
        bigint date_sk PK
        date full_date UK
        int year
        int quarter
        int month
        varchar month_name
        int day_of_week
        boolean is_weekend
        boolean is_holiday
    }
    DIM_CUSTOMER {
        bigint customer_sk PK
        bigint customer_id
        varchar email
        varchar name
        varchar tier
        date valid_from
        date valid_to
        boolean is_current
    }
    DIM_PRODUCT {
        bigint product_sk PK
        bigint product_id
        varchar sku
        varchar product_name
        varchar category_name
        decimal price_at_load
    }

Grain: One row per order line (sale transaction)

Characteristics: Denormalized (category_name in dim_product), surrogate keys, SCD Type 2 on customer.

Transformation Mapping

OLTP SourceStar Schema TargetTransformation
order.order_datefact_sales.date_skLookup in dim_date
customer.*dim_customer.*SCD Type 2 (track tier changes)
product.* + category.namedim_product.*Flatten join, snapshot price
order_line.quantity * unit_pricefact_sales.line_totalComputed measure

Sample Queries

OLTP Query (get customer’s recent orders):

1
2
3
4
5
SELECT o.order_id, o.order_date, SUM(ol.quantity * ol.unit_price) as total
FROM order o
JOIN order_line ol ON o.order_id = ol.order_id
WHERE o.customer_id = 123 AND o.order_date > CURRENT_DATE - 30
GROUP BY o.order_id, o.order_date;

Star Schema Query (sales by category and month):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
SELECT
    d.year, d.month_name,
    p.category_name,
    SUM(f.line_total) as revenue,
    SUM(f.quantity) as units_sold
FROM fact_sales f
JOIN dim_date d ON f.date_sk = d.date_sk
JOIN dim_product p ON f.product_sk = p.product_sk
WHERE d.year = 2024
GROUP BY d.year, d.month_name, p.category_name
ORDER BY d.year, d.month, revenue DESC;

Quick Reference Checklists

OLTP Design Review

  • Invariants identified and enforced (constraints, triggers, app logic)?
  • Concurrency strategy defined (OCC, pessimistic, atomic updates)?
  • Idempotency keys for retry-safe operations?
  • Audit columns on sensitive tables?
  • Indexes aligned with top query patterns?
  • Foreign keys enforcing referential integrity?

Reporting Design Review

  • Grain stated as a sentence for each fact table?
  • Fact types classified (additive/semi-additive/non-additive)?
  • SCD type chosen for each dimension attribute?
  • Conformed dimensions shared across marts?
  • Late-arriving data strategy defined?
  • Reconciliation checks automated?

For fastest enterprise readiness, learn these topics in order:

PriorityTopicKey Concepts
1Transactions & ConcurrencyACID, isolation levels, OCC, idempotency
2Normalization & Constraints1NF-3NF, when to enforce what, anomalies
3Dimensional ModelingGrain, facts, dimensions, SCD types
4Indexing StrategyOLTP vs reporting, covering indexes, sargability
5Partitioning & ArchivalTime-based partitions, hot/warm/cold tiers
6Data PipelinesCDC, ETL/ELT, reconciliation, idempotent loads
7Schema EvolutionExpand/contract, safe migrations