Introduction

In 2025, every modern software system generates logs. These logs serve as critical instruments for developers, enabling them to debug errors, trace the flow of requests, and evaluate system performance. These are typically referred to as application logs: they are verbose, ephemeral, and primarily intended for engineering teams.

However, there is another category of logs with significantly higher stakes: audit logs. Unlike application logs, audit logs are not primarily designed for debugging. Their purpose is accountability.
They systematically record events to answer critical questions such as:

Who accessed sensitive information?
What changes were executed?
When did these actions occur?
Where and through which mechanism were these actions performed?

Essentially, audit logs are concerned less with diagnosing code issues and more with providing verifiable evidence of system activity. To fulfill this purpose, audit logs must maintain strict ordering, immutability, and consistency, since regulators, auditors, and even courts may rely on them to evaluate compliance and accountability.

Initially, I presumed the solution would be straightforward: simply insert each event into the application’s existing database. This approach works at a small scale, until certain limitations become apparent:

The audit table can grow extremely large, impacting the performance of the primary database.
Variations in logging practices across multiple services complicate the reconstruction of a complete event sequence.
Most critically, regulations and compliance frameworks such as GDPR, HIPAA, and SOC 2 impose requirements that exceed what a conventional database table can guarantee, including immutability, strict ordering, and consistency across distributed services.

It became evident that audit logging is not merely a functional feature; it is fundamentally a distributed systems challenge.

In this article, I will document the journey from the naive “just use the DB” approach to a thoughtfully designed system capable of scaling, maintaining event order, and achieving high availability in a global audit logging platform.

Why not just keep audit logs in the main application database?

After encountering the limitations of the “just put it in the DB” approach, I began to ponder a bigger question: why should every developer or startup have to reinvent the wheel when it comes to audit logging?

Consider this: every serious application requires audit logs for compliance, trust, and accountability. Yet most teams find themselves:

Adding another table in their application database.
Writing custom middleware to capture user actions.
Struggling with issues related to storage growth, ordering, and compliance requirements.

The outcome? Developers spend weeks building and maintaining something that is not their core product but remains critical for customers, regulators, and auditors.

This led to a realization: what if audit logging could be offered as a service? Imagine a micro-SaaS that any developer can easily integrate, similar to how they use Stripe for payments or Auth0 for authentication. Instead of each startup tackling audit logging independently, they could simply send events to a dedicated platform that guarantees:

Immutable storage (preventing accidental deletions or tampering).
Consistent ordering across distributed services.
Compliance-ready retention policies.
Easy integration via SDKs and APIs.

Developers shouldn’t have to waste time building compliant audit logs when they could be focusing on features that matter to their users.

Of course, it's easy to say this. Building it is another challenge. So, how do we design such a system to be reliable, scalable, and compliant?

Let’s begin by examining my initial architecture for this problem.

The Initial Architecture: Simple and Straightforward

The most natural starting point for an audit logging system is also the simplest:

Expose an API endpoint where client applications can send audit events.
Validate the incoming payloads.
Write them directly into a relational database.

This design is clean and easy to get started with. A relational DB provides structured storage and a straightforward way to persist logs.

But the limitations surface quickly:

As event volume grows, the database becomes a bottleneck.
Writing directly to the DB increases latency for the client.
A single ingestion service creates availability risks if it fails.

So while the design works at small scale, it doesn’t hold up for high-volume, low-latency, and highly available systems.

So I evolved this architecture: replacing the relational database with ClickHouse for scale, introducing a local queue to keep ingestion fast, and running multiple ingestion services behind a load balancer for high availability.

Why This Implementation Struggles

Database Bottlenecks

Relational databases, such as PostgreSQL, are designed for transactional workloads, which makes them less suitable for handling high-volume, append-only writes at scale. As the audit log table expands to millions or even billions of rows, the system experiences slower query performance, bloated indexes, and challenging storage management.

Latency for Clients

Latency refers to the delay a client experiences between sending an event and receiving confirmation. When every client request must directly interact with the database, factors like network delays, locks, and slow write operations contribute to increased response times. From the client's perspective, the system appears sluggish.

Single Point of Failure

Relying on a single ingestion service and database creates a lack of redundancy. If the ingestion service crashes or the database becomes unavailable, clients are unable to log events. This situation undermines the reliability that audit logs are expected to provide.

Scaling Across Multiple Clients

When hundreds or thousands of services are simultaneously pushing logs, a single ingestion pipeline struggles to keep up. Without load balancing, all traffic is directed to one service instance, resulting in overload and potential loss of events.

Updated Architecture for Scalability and Low-Latency Ingestion

The sections below give a compact, technical view of the evolved design, which targets high availability (HA) and scalability, and explain how each component contributes to those goals.

Event Structure: Capturing the Right Details

One of the first things I had to figure out was what a good audit log event actually looks like.
Unlike normal application logs, where you can throw in anything you want, audit logs need a clear and consistent structure.

Here’s an example of the format:

{
    "timestamp": "2025-08-31T18:58:02+05:30",
    "event": "User Login",
    "user": {
        "id": "12345",
        "role": "Admin",
        "ip": "192.168.1.1"
    },
    "resource": {
        "type": "Patient Record",
        "id": "67890"
    },
    "action": "View"
}

The timestamp shows when the action happened (and later we’ll see why that’s trickier than it sounds in distributed systems). The event is a simple label like User Login, while user, resource, and action capture who did something, on what, and how (e.g., View, Update, Delete).
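To make “clear and consistent structure” concrete, here is a minimal Python sketch that parses and validates an event of the shape above before it is accepted. The class names, the `parse_event` function, and the `ALLOWED_ACTIONS` set are illustrative assumptions, not part of any real SDK; a production service would more likely validate against its Protobuf schema.

```python
import json
from dataclasses import dataclass
from datetime import datetime

# Illustrative allow-list; the article mentions View, Update, and Delete as examples.
ALLOWED_ACTIONS = {"View", "Update", "Delete"}
REQUIRED_FIELDS = {"timestamp", "event", "user", "resource", "action"}


@dataclass(frozen=True)
class Actor:
    id: str
    role: str
    ip: str


@dataclass(frozen=True)
class Resource:
    type: str
    id: str


@dataclass(frozen=True)
class AuditEvent:
    timestamp: datetime
    event: str
    user: Actor
    resource: Resource
    action: str


def parse_event(raw: str) -> AuditEvent:
    """Parse one JSON audit event and reject anything that doesn't fit the schema."""
    data = json.loads(raw)  # invalid JSON raises JSONDecodeError, a ValueError
    if not isinstance(data, dict):
        raise ValueError("event must be a JSON object")
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {data['action']!r}")
    # fromisoformat() keeps offsets such as +05:30, so the instant is unambiguous.
    ts = datetime.fromisoformat(data["timestamp"])
    if ts.tzinfo is None:
        raise ValueError("timestamp must carry a UTC offset")
    try:
        user = Actor(**data["user"])
        resource = Resource(**data["resource"])
    except TypeError as exc:  # missing, unexpected, or non-mapping nested fields
        raise ValueError(f"malformed user/resource: {exc}") from exc
    return AuditEvent(ts, data["event"], user, resource, data["action"])
```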

Payload Considerations: JSON vs. Compression vs. Protobuf

When building a high-volume audit logging system, one of the first questions is how should events be sent from client applications to the ingestion service?

The obvious first choice is JSON: it’s human-readable, widely supported, and easy to debug. But JSON comes with overhead:

It’s verbose (field names repeated for every record).
Payload size is larger compared to binary formats.
Larger payloads mean more network bandwidth and higher latency.

Another option is compression. Compressing JSON can shrink payload size significantly, but it’s not free:

Compression and decompression consume CPU cycles on both client and server.
For high-throughput systems, this can actually increase latency and strain infrastructure.
Logs are usually small and frequent, so the compression overhead often outweighs the network savings.
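A quick way to see this trade-off is to measure it. This standard-library Python sketch gzips one event and then a batch of 1,000 similar events (the sample values are made up). It measures payload size only, not CPU cost, and leaves Protobuf out because that requires generated classes. A lone small event has little repetition to exploit and still pays gzip's fixed header overhead, while a batch of similar events compresses far better per event.

```python
import gzip
import json

# Sample event mirroring the structure shown earlier (values are illustrative).
EVENT = {
    "timestamp": "2025-08-31T18:58:02+05:30",
    "event": "User Login",
    "user": {"id": "12345", "role": "Admin", "ip": "192.168.1.1"},
    "resource": {"type": "Patient Record", "id": "67890"},
    "action": "View",
}


def sizes(events):
    """Return (raw_json_bytes, gzipped_bytes) for a list sent as one payload."""
    raw = json.dumps(events).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


single_raw, single_gz = sizes([EVENT])
batch = [dict(EVENT, resource={"type": "Patient Record", "id": str(i)}) for i in range(1000)]
batch_raw, batch_gz = sizes(batch)

print(f"1 event:     {single_raw} B raw, {single_gz} B gzipped")
print(f"1000 events: {batch_raw} B raw, {batch_gz} B gzipped "
      f"({batch_gz / 1000:.1f} B per event)")
```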

That’s where Protobuf (Protocol Buffers) shines.

It’s a compact binary format, much smaller on the wire than JSON.
Serialization and deserialization are typically cheaper than compression, with far less CPU overhead.
Since audit log schemas don’t change frequently, Protobuf’s strict structure is a great fit.

To further reduce overhead, the client SDK doesn’t send each log event individually. Instead, it keeps a small in-memory buffer and flushes events in batches (for example, every 1 second). This batching strategy avoids excessive network calls while still keeping latency low.

Format	Size per event (bytes)	Total for 1,000 events (bytes)	Total (approx.)
JSON (raw)	800	800,000	800 KB
Compressed JSON (gzip)	200	200,000	200 KB
Protobuf (binary)	180	180,000	180 KB

So the final choice for payloads is Protobuf, which is the most efficient option for this system.
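The client-side batching described above could look roughly like this. `BatchingBuffer` and its `send` callback are hypothetical names, and the defaults (500 events, 1 second) are placeholders. For brevity, the time-based flush is only checked when a new event arrives; a real SDK would also run a background timer and flush on shutdown so a quiet client doesn't hold events indefinitely.

```python
import threading
import time
from typing import Callable, List


class BatchingBuffer:
    """Client-side buffer that flushes when full or when flush_interval has elapsed.

    `send` is a hypothetical transport callback (for example, an HTTPS POST of a
    serialized batch); it is injected so the sketch stays self-contained.
    """

    def __init__(self, send: Callable[[List[dict]], None],
                 max_batch: int = 500, flush_interval: float = 1.0) -> None:
        self._send = send
        self._max_batch = max_batch
        self._flush_interval = flush_interval
        self._buf: List[dict] = []
        self._lock = threading.Lock()
        self._last_flush = time.monotonic()

    def add(self, event: dict) -> None:
        with self._lock:
            self._buf.append(event)
            due = (len(self._buf) >= self._max_batch
                   or time.monotonic() - self._last_flush >= self._flush_interval)
        if due:
            self.flush()

    def flush(self) -> None:
        with self._lock:
            batch, self._buf = self._buf, []
            self._last_flush = time.monotonic()
        if batch:
            self._send(batch)  # network I/O happens outside the lock
```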

Load Balancer: The Front Door of the System

To make the ingestion service reliable, we can’t just run a single instance; that would be a single point of failure. Instead, we place a load balancer in front:

Distributes traffic across multiple ingestion service instances, so if one fails, others continue serving requests.
Handles TLS termination: clients connect over HTTPS, the load balancer decrypts the traffic, and it forwards requests to the ingestion services over the internal network, sparing them the TLS work.
Improves scalability: more ingestion nodes can be added behind the load balancer without changing client configuration.

In short, the load balancer provides high availability, better security, and smooth scaling. It is an essential component for scaling this system.

Ingestion Service: Fast, Durable, and Simple

The ingestion service is designed to handle client requests quickly without making them wait for heavy processing. Its flow is simple:

Receive Event - The client sends an event to the ingestion API.
Push to Queue - The event is immediately pushed into a Redis queue. Redis Append-Only File (AOF) persistence is enabled so queued events survive a Redis restart; how much can still be lost in a crash depends on the appendfsync policy (with everysec, up to about one second of writes).
Respond to Client - As soon as the event is queued, the service responds with success, keeping client latency low.
Workers - Background workers running on the same server consume events from Redis and write them into ClickHouse in batches; ClickHouse is built for very high insert throughput and performs best with fewer, larger inserts.
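The receive, enqueue, acknowledge, and drain flow can be sketched in Python with `queue.Queue` standing in for Redis so the example runs on its own. With the redis-py client, enqueueing and dequeueing would roughly correspond to `rpush` and `blpop` on a list key. `insert_batch` is a hypothetical function that would write rows to ClickHouse.

```python
import queue
import threading
from typing import Callable, List

# Stand-in for the Redis list used in the article.
events: "queue.Queue[bytes]" = queue.Queue()


def handle_ingest(payload: bytes) -> dict:
    """Request handler: enqueue and acknowledge immediately; no DB write on this path."""
    events.put(payload)
    return {"status": "accepted"}


def run_worker(insert_batch: Callable[[List[bytes]], None],
               stop: threading.Event, max_batch: int = 1000) -> None:
    """Drain the queue in batches until stopped and empty."""
    while not stop.is_set() or not events.empty():
        try:
            first = events.get(timeout=0.1)  # block briefly, like BLPOP with a timeout
        except queue.Empty:
            continue
        batch = [first]
        while len(batch) < max_batch:  # grab whatever else is already waiting
            try:
                batch.append(events.get_nowait())
            except queue.Empty:
                break
        insert_batch(batch)  # hypothetical ClickHouse writer
```

One caveat: once a worker pops an event, it is gone from the queue, so a crash before `insert_batch` succeeds loses that batch. Redis's `LMOVE`/`BLMOVE` commands (Redis 6.2+) support the reliable-queue pattern of moving an item to a processing list until it is safely written.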

This design decouples ingestion latency (client-facing) from storage latency (database writes). Clients get fast acknowledgment, while the system persists events durably in the background.

Why ClickHouse for audit log storage?

At first, it feels natural to put audit logs into a relational database like PostgreSQL or MySQL; after all, that’s where the rest of our application data lives. But audit logs behave very differently from normal application records: they’re append-only, high-volume, and rarely updated. That makes traditional relational databases struggle under the load.

This is where ClickHouse shines. It’s a columnar database built for fast inserts and analytical queries on massive datasets. Instead of storing data row by row like PostgreSQL, it stores it by column, so a query like “show me all access events by user X in the last 90 days” reads only the columns it needs and runs very fast.

Some features that make it a great fit for audit logging:

High-throughput writes – well-provisioned clusters can ingest millions of rows per second, especially with batched inserts.
Columnar storage – queries over specific fields (like user ID, resource type, or action) are much faster and lighter.
Built-in compression – reduces storage costs for large volumes of logs.
Data retention policies – you can automatically drop or move old logs, which is critical for compliance.
Horizontal scalability – clusters can scale out as log volume grows, so you’re not stuck on a single box.
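Here is what a table for the event structure above might look like, built by a Python helper that returns the DDL string. The table and column names, the partitioning, the sort key, and the 365-day retention are all assumptions. The `TTL` clause is ClickHouse's mechanism for expiring old rows, and the right period is whatever your compliance requirements dictate.

```python
def audit_table_ddl(table: str = "audit_events", retention_days: int = 365) -> str:
    """Return ClickHouse DDL for a hypothetical audit table with TTL-based retention."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    # `table` is interpolated as-is; only pass trusted, validated names.
    return f"""CREATE TABLE IF NOT EXISTS {table}
(
    seq           UInt64,                 -- global sequence number used for ordering
    event_time    DateTime,
    event         LowCardinality(String),
    user_id       String,
    user_role     LowCardinality(String),
    user_ip       String,
    resource_type LowCardinality(String),
    resource_id   String,
    action        LowCardinality(String)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, event_time, seq)
TTL event_time + INTERVAL {retention_days} DAY"""


print(audit_table_ddl(retention_days=365))
```

Sorting by `(user_id, event_time, seq)` keeps each user's events physically together, which suits per-user queries like the 90-day example; a different query pattern would call for a different key.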

In short, ClickHouse isn’t just a database; it’s a log analytics engine. For something like audit logs, where you need both speed and reliability, it’s a far better choice than a general-purpose relational database.

So with a load balancer in front, ingestion services backed by Redis queues, and ClickHouse handling storage and retention, the system is finally solid enough to scale, stay available, and absorb traffic bursts without falling over. At this point, we’ve solved the basics: fast ingestion, durability, and reliable querying.

But here’s where the real challenge begins. Even with all this in place, distributed systems bring their own set of headaches. One of the trickiest, especially for audit logs, is making sure events stay in the right order when different machines keep slightly different notions of time.

When Time Lies: The Hidden Problem in Audit Log Ordering

On paper, the setup looks pretty solid.

A client fires an event, say a User Login. The SDK packages it in Protobuf and sends it over HTTPS to the load balancer. From there, it is passed to one of the ingestion services. That service quickly pushes the event into Redis, tells the client “all good,” and a worker eventually writes it to ClickHouse.

End-to-end, it feels clean: client → ingestion → queue → database.

But here’s where things start to break: ordering events by timestamp isn’t as reliable as it sounds.

Take this simple example. At 18:58:02, a user logs in through Server A. One second later, at 18:58:03, they view a record, this time through Server B.

Now, if Server B’s clock is running just two seconds behind Server A’s, the View is stamped 18:58:01 and the logs show it happening before the Login. Suddenly, your audit trail tells a very different story: someone looked at sensitive data without even logging in.
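The scenario is easy to reproduce. In this small Python sketch, Server A's clock is accurate and Server B's runs two seconds slow (the `stamp` helper is illustrative); sorting the recorded timestamps puts the View first.

```python
from datetime import datetime, timedelta


def stamp(true_time: datetime, skew_seconds: float) -> datetime:
    """Timestamp an event using a server clock that is off by skew_seconds."""
    return true_time + timedelta(seconds=skew_seconds)


t0 = datetime(2025, 8, 31, 18, 58, 2)
events = [
    ("Login", stamp(t0, 0)),                          # Server A: accurate clock
    ("View", stamp(t0 + timedelta(seconds=1), -2)),   # Server B: 2 s behind
]
by_timestamp = [name for name, ts in sorted(events, key=lambda e: e[1])]
print(by_timestamp)  # ['View', 'Login']
```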

That’s not just a weird edge case. In compliance-heavy systems, it’s a red flag.

Why does this happen? A few reasons:

Clock drift: every server keeps its own time, and clocks never stay perfectly aligned.
Network delays: one request can take longer to travel, so an event that happened later might actually get recorded earlier.
Retries: if a request fails and is retried, a timestamp assigned on arrival no longer reflects when the action really happened.

When everything sits on a single machine, ordering is easy. But once you scale out with load balancers, multiple ingestion servers, and queues, time itself becomes slippery.

And that’s why “just sort by timestamp” doesn’t cut it when you’re building a reliable audit log system.

Physical Clocks vs Logical Clocks

Before diving deeper into distributed ordering, it helps to clarify what we mean by time in a distributed system.

Physical Clocks

A physical clock is what you’re used to: the wall-clock time on your server. Each machine has its own internal clock, which can drift slightly due to hardware imperfections. That’s why the same instant might read 18:58:02 on Server A and 18:58:04 on Server B.

Physical clocks are intuitive: you can timestamp events with them, display them in logs, or sort data chronologically. But in distributed systems, relying solely on physical clocks is risky: clocks drift, networks add latency, and retries can make later events appear earlier.

Keeping Physical Clocks in Sync

NTP (Network Time Protocol) synchronizes clocks across a network of computers. Servers periodically query NTP servers and correct drift in their local clocks. NTP narrows the gap between machines but cannot eliminate it, so the ordering problems above remain.
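The core of NTP's correction is a small calculation over the four timestamps of one request/response exchange. The sketch below shows the standard offset and delay formulas; the function name is illustrative. It assumes the outbound and return trips take equally long, and any asymmetry shows up directly as error in the estimate.

```python
from typing import Tuple


def ntp_offset_and_delay(t0: float, t1: float, t2: float, t3: float) -> Tuple[float, float]:
    """Clock offset and round-trip delay from one NTP exchange.

    t0: client send time (client clock)
    t1: server receive time (server clock)
    t2: server send time (server clock)
    t3: client receive time (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2  # positive: the client clock is behind the server
    delay = (t3 - t0) - (t2 - t1)         # network round trip, excluding server processing
    return offset, delay


# Client clock 2 s behind the server, 50 ms each way, 10 ms server processing.
offset, delay = ntp_offset_and_delay(t0=100.00, t1=102.05, t2=102.06, t3=100.11)
print(round(offset, 3), round(delay, 3))  # 2.0 0.1
```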

Logical Clocks

A logical clock, on the other hand, doesn’t care about actual wall-clock time. Instead, it tracks the order of events relative to each other.

In essence, logical clocks answer the question: “Which happened first?” rather than “What time did this happen?” which is exactly what you need for audit logging and event sequencing in distributed systems.

Tackling the Time Problem

So, if timestamps from individual servers are unreliable, how do we restore order? That’s where logical clocks come in. Two concepts are key to addressing this issue:

Lamport clocks
Vector clocks

Lamport Clocks

Lamport clocks assign a simple counter to every event. Every time a server performs an action, it increments its counter. When servers communicate, they attach their counters to messages, and the receiving server sets its own counter to one more than the larger of its current value and the incoming one.

Here’s the magic: even if the clocks on Server A and Server B are wildly different, Lamport clocks give us a consistent ordering of causally related events. If Event X happened before Event Y in the system, the Lamport counter ensures that X’s number is smaller than Y’s, no matter which server processed them.

Lamport clocks don’t tell you actual wall-clock time. They only ensure relative ordering which, for audit logs, is often exactly what you need.

Lamport clocks help us reason about the sequence of events. The rules are:

Each node maintains a counter.
Before an event, increment your counter.
When sending a message, include the counter.
On receiving a message, set your counter to max(local, received) + 1.

This way, we can tell “happened-before” relationships without perfectly synchronized clocks.
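Here is a minimal Python sketch of those four rules. The class and method names are my own; a real system would also attach a node ID to break ties when two events share a counter value.

```python
class LamportClock:
    """Lamport logical clock implementing the four rules listed above."""

    def __init__(self) -> None:
        self.counter = 0

    def tick(self) -> int:
        """Local event: increment first, then use the new value as its timestamp."""
        self.counter += 1
        return self.counter

    def send(self) -> int:
        """Sending is an event too: tick, then attach the value to the message."""
        return self.tick()

    def receive(self, received: int) -> int:
        """Merge an incoming counter: max(local, received) + 1."""
        self.counter = max(self.counter, received) + 1
        return self.counter


server_a, server_b = LamportClock(), LamportClock()
for _ in range(5):
    server_a.tick()             # Server A has already handled five events
login = server_a.send()         # Login is sent with counter 6
view = server_b.receive(login)  # Server B jumps to max(0, 6) + 1 = 7
print(login, view)              # 6 7
```

Note the asymmetry: if X happened before Y, X gets a smaller number, but a smaller number alone doesn't prove X happened before Y, because unrelated events on different servers can be numbered in either order. Detecting that case is what vector clocks add.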

Vector Clocks

Instead of a single counter, each server maintains a vector of counters, one for every server in the cluster.

Whenever an event happens, the server increments its own entry in the vector. When messages are exchanged, vectors are merged by taking the maximum value for each server. Now you can detect concurrent events: events that happened independently and can’t be ordered relative to each other.

Think of it like a distributed ledger where each participant keeps a personal record but can reconcile with others to see who did what and when. This extra detail helps maintain accurate audit logs across multiple servers and queues, even under retries, network delays, or clock drift.

Instead of depending on “wall time,” each node tracks its own version of events.

By comparing these vectors element by element, we can tell whether an event happened before, after, or concurrently with another, even without synchronized clocks. It’s a simple yet powerful idea: Amazon’s original Dynamo key-value store, for example, used vector clocks to track causality between versions and detect conflicting concurrent updates when time itself is unreliable.
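A compact Python sketch of vector clocks, using dictionaries keyed by node name so cluster membership doesn't have to be fixed up front. The helper names are illustrative.

```python
from typing import Dict

VC = Dict[str, int]


def increment(vc: VC, node: str) -> VC:
    """Return a copy of vc with this node's own entry bumped (a local event)."""
    out = dict(vc)
    out[node] = out.get(node, 0) + 1
    return out


def merge(local: VC, received: VC, node: str) -> VC:
    """On receive: element-wise max, then count the receive as a local event."""
    keys = local.keys() | received.keys()
    merged = {k: max(local.get(k, 0), received.get(k, 0)) for k in keys}
    return increment(merged, node)


def compare(a: VC, b: VC) -> str:
    """Classify a relative to b: 'before', 'after', 'equal', or 'concurrent'."""
    keys = a.keys() | b.keys()
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


login = increment({}, "A")     # event on node A: A=1
view = merge({}, login, "B")   # B receives it: A=1, B=1
backup = increment({}, "C")    # unrelated event on node C
print(compare(login, view), compare(login, backup))  # before concurrent
```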

Distributed Counters

A clean way to fix event ordering is to move away from timestamps and use a single monotonic counter: a counter that only ever increases, guaranteeing a global order.

Think of it as a “universal sequence number” that each event must pass through before being accepted into the system.

Redis-Based Monotonic Counter

A simple implementation uses Redis’s atomic increment operation (INCR). Every incoming event, from any server, requests the next counter value. Redis executes INCR atomically, so no two events ever get the same number, no matter how many servers or threads are writing simultaneously.

So, even if Server A’s clock is behind Server B’s, the ordering is now based on the sequence number, not time.

Example:

User Login -> gets counter = 10001

User View Record -> gets counter = 10002

This is a simple but powerful idea: a monotonic source of truth for ordering. Scaling it is harder, though, because a single Redis counter is once again a single point of failure.
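Redis's `INCR` guarantees uniqueness because Redis executes commands one at a time. With the redis-py client, each event would call something like `r.incr("audit:seq")` (the key name is an assumption). To keep this sketch runnable without a Redis server, a lock-protected in-process counter stands in for it; it gives the same unique, increasing values within one process, but none of Redis's sharing across servers.

```python
import threading


class AtomicCounter:
    """In-process stand-in for Redis INCR on a single key.

    Redis runs commands one at a time, so INCR never hands out the same value
    twice; here a lock gives the same guarantee across threads in one process.
    """

    def __init__(self, start: int = 0) -> None:
        self._value = start
        self._lock = threading.Lock()

    def incr(self) -> int:
        with self._lock:
            self._value += 1
            return self._value


seq = AtomicCounter(start=10000)
print(seq.incr(), seq.incr())  # 10001 10002
```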

Updated Architecture After @Abhishek Prakash’s Comment

In my initial design, I had the Redis-based monotonic counter sitting at the ingestion layer. It felt like a clean solution: easy to scale and simple to manage on the server side. But there was a subtle issue hiding underneath.

Let’s say a user performs two actions in sequence: Login (Event A) and View Record (Event B). Both events are sent from the same client server, one after another.
Now imagine there’s a bit of network delay or routing difference, and somehow Event B reaches the ingestion service before Event A.

Since the ingestion service assigns the counter, Event B gets ID N, and Event A gets N+1.
Boom: now our audit log incorrectly shows the user “viewed a record before logging in.”
This happens because we’re relying on the order of arrival, not the order of occurrence.

That’s when it clicked: the counter needs to live closer to where events actually happen.

The right approach is to have the SDK obtain the sequence number from the distributed counter, instead of assigning it at the ingestion layer.

Now, every time the SDK prepares an event, it first requests a new ID from this distributed counter service and attaches that ID to the event payload before sending it forward.

This ensures the sequence is decided before the event leaves the client’s environment, so even if network delays shuffle the arrival order, the true logical order stays intact.

It also avoids the complexity of maintaining a Lamport clock or persistent counter on the client side. The SDK stays lightweight, while the distributed counter (backed by Redis or another reliable store) guarantees monotonic ordering across all clients.
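The updated flow can be sketched as follows. `AuditSDK` is a hypothetical client wrapper, and `next_seq` stands in for a call to the shared counter service (here just `itertools.count`). Even when the network delivers View Record first, sorting by sequence number restores the order in which the SDK recorded the events.

```python
import itertools
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SequencedEvent:
    seq: int
    name: str


class AuditSDK:
    """Hypothetical client SDK that stamps each event with a sequence number
    from the shared counter before the event leaves the client."""

    def __init__(self, next_seq: Callable[[], int]) -> None:
        self._next_seq = next_seq  # in production: a call to the counter service

    def record(self, name: str) -> SequencedEvent:
        return SequencedEvent(seq=self._next_seq(), name=name)


counter = itertools.count(10001)
sdk = AuditSDK(lambda: next(counter))
sent = [sdk.record("Login"), sdk.record("View Record")]

arrived = list(reversed(sent))                 # network delay: View Record arrives first
stored = sorted(arrived, key=lambda e: e.seq)  # order by seq, not by arrival
print([e.name for e in stored])                # ['Login', 'View Record']
```

The trade-off is an extra round trip to the counter for every event before it can be sent, and the counter remains a shared dependency that has to be kept available.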

Taking This to Global Scale

Google’s Spanner takes this concept much further with something called TrueTime.

TrueTime combines atomic clocks and GPS receivers to keep clocks tightly synchronized across datacenters, with uncertainty typically in the low milliseconds. It doesn’t just give a timestamp; it gives a bounded uncertainty window.

For example: this event happened at 18:58:03 ± 2 ms.

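Spanner uses these bounds through a commit wait: after choosing a transaction’s timestamp, it waits until that timestamp is guaranteed to be in the past on every clock before making the commit visible. As a result, if one transaction commits before another starts, the first always gets the smaller timestamp, giving a strict global order at the cost of a short wait (roughly twice the uncertainty).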

It’s essentially a highly sophisticated, globally distributed monotonic time system, blending the reliability of physical time with the guarantees of logical ordering.
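To make commit wait concrete, here is a toy simulation. It is not Google's API: `now()` returning an interval and `after()` mirror the TrueTime operations described in the Spanner paper, but the uncertainty `epsilon` here is simply a parameter layered on the local clock, whereas the real guarantee comes from GPS and atomic clocks.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Toy TrueTime: now() is an interval assumed to contain the true time."""

    def __init__(self, epsilon: float) -> None:
        self.epsilon = epsilon  # assumed clock uncertainty, in seconds

    def now(self) -> TTInterval:
        t = time.time()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, ts: float) -> bool:
        """True once ts is definitely in the past."""
        return self.now().earliest > ts


def commit(tt: SimulatedTrueTime) -> float:
    """Choose a commit timestamp, then wait until it is certainly in the past."""
    ts = tt.now().latest
    while not tt.after(ts):  # commit wait: roughly 2 * epsilon
        time.sleep(tt.epsilon / 4)
    return ts
```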

Why I Chose a Single Monotonic Counter

In my design, I chose to use a single monotonic counter stored in Redis as the authoritative sequence generator for audit events.

Every event fetches its unique sequence number from this counter (through the SDK, as described in the updated architecture) before it is sent and queued. That way, no matter which server processed it or how clocks drift, the ordering is based on a single, consistent source of truth.

This approach keeps the system simple, predictable, and in sync even as it scales across multiple nodes and services.

Conclusion

Building a reliable audit log system in a distributed environment is harder than it looks. Simple “sort by timestamp” logic can fail in subtle but serious ways. Lamport clocks, vector clocks, and distributed counters provide tools to bring order to the chaos, giving you confidence that your audit trails are accurate.

Of course, that’s just one piece of the puzzle. Encrypting these logs end to end and validating their authenticity is another challenge altogether, one we’ll cover in the next post.

Stay tuned for more deep technical explorations like this; distributed systems demand engineering, not assumptions.

Designing Scalable Audit Logging Systems: Tackling Clock Drift and More