Merkle DAGs Explained

Tamper-Evident Evidence Chains for AI Governance

Heath Emerson, MBA — Founder & CEO

February 2026 | apotheon.ai

Executive Summary

Every AI agent makes decisions. In regulated industries, every decision must be accountable. The question is not whether organizations should record what their AI systems do—that much is mandated by the EU AI Act (Article 12), HIPAA, SOC 2, and a growing body of regulation worldwide. The question is how those records are produced, stored, and verified.

Traditional logging—text files, SIEM aggregation, database records—is the de facto standard. It is also fundamentally inadequate for the challenge of AI governance. Log files can be silently modified, selectively deleted, or retroactively altered by anyone with administrative access. They provide no mathematical proof that the record is complete. They require the auditor to trust the entity that produced the log. In an era where AI agents make millions of autonomous decisions daily, trust-based audit trails are a liability.

Merkle-DAGs (Merkle Directed Acyclic Graphs) replace trust with mathematics. Rooted in Ralph Merkle’s 1979 invention and battle-tested across Git, IPFS, Bitcoin, and Ethereum, Merkle-DAGs produce tamper-evident, content-addressed, cryptographically linked evidence chains where modifying any historical record is not merely difficult—it is mathematically detectable. This paper explains how Merkle-DAGs work, why they are superior to traditional logging for AI audit trails, and how Apotheon.ai’s THEMIS governance runtime implements them to create immutable evidence for every AI decision.

The Problem with Traditional Logging

Traditional logging has served enterprise IT for decades. Applications write events to log files. Log aggregation platforms—Splunk, the ELK stack, Datadog, Graylog—collect these files into centralized repositories. SIEM platforms layer analytics, correlation, and alerting on top. For operational monitoring, this architecture works. For regulatory evidence of AI decision-making, it does not.

The Six Fundamental Weaknesses

1. Mutability. Traditional log files are mutable by design. An administrator with file-system access can modify, delete, or reorder log entries without leaving any trace of the change. The OWASP Top 10 for 2025 ranks “Security Logging and Alerting Failures” as its ninth most critical vulnerability category, noting that log integrity is frequently compromised because logs are not properly protected from tampering. When an auditor reviews a log, they are trusting that no one with privileged access has altered it. For AI governance, where the organization producing the log is often the same entity being audited, this trust assumption is untenable.

2. Deletion without detection. A line removed from a log file leaves no gap. There is no mathematical mechanism to prove that a record once existed and was subsequently removed. Contrast this with a hash chain: removing a node changes the hash of every descendant, making deletion immediately detectable by anyone who holds a prior root hash.

3. No completeness proof. Traditional logs provide no mechanism for an external party to verify that the log contains every event that occurred. A system could log 999 out of 1,000 agent decisions and the missing entry would be undetectable. In industries like healthcare and financial services, where regulatory penalties apply per violation, the inability to prove completeness is a material compliance risk.

4. Volume and noise. Modern enterprises generate enormous log volumes. Dell has reported generating over 125 billion log events per day across its infrastructure. AI agent systems compound this problem: each agent action may trigger dozens of log entries across multiple services. SIEM log management research consistently identifies data overload as a primary cause of missed security incidents—teams drown in data while missing critical signals.

5. Inconsistent formats and siloed storage. Legacy logging typically means fragmented, inconsistent formats, manual searches, and data silos. Different services log in different schemas. Different teams use different retention policies. When a regulator requests a complete audit trail for a specific AI agent across its lifetime, reconstructing that trail from disparate log sources is error-prone at best and impossible at worst.

6. Trust dependency. The most fundamental weakness: traditional logging requires the auditor to trust the log producer. The organization claims the log is complete and unmodified. The auditor accepts this claim because there is no alternative. For AI systems making consequential decisions—credit scoring, medical triage, fraud detection—this trust model is inadequate for the evidentiary standard that regulations increasingly demand.

Traditional logging answers the question ‘What did the system report?’ Merkle-DAGs answer the question ‘What did the system actually do, and can you prove no records are missing?’
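The hash-chain contrast drawn in weakness 2 can be sketched in a few lines. This is a minimal illustration, assuming SHA-256 and a hypothetical `chain_head` helper; the function name, the genesis value, and the log strings are invented for the example and are not taken from THEMIS.

```python
import hashlib

GENESIS = b"\x00" * 32  # assumed starting value for an empty chain


def chain_head(entries: list[str]) -> str:
    """Fold entries into one head hash; each step commits to the previous hash."""
    head = GENESIS
    for entry in entries:
        head = hashlib.sha256(head + entry.encode("utf-8")).digest()
    return head.hex()


log = ["agent-7 read record 12", "agent-7 denied export", "agent-7 closed session"]
trusted_head = chain_head(log)      # value the auditor saved earlier

tampered = [log[0], log[2]]         # the middle entry silently removed
deletion_detected = chain_head(tampered) != trusted_head
print(deletion_detected)  # True
```

A plain text file offers no equivalent check: the shortened file is as well-formed as the original.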

Merkle-DAGs: The Data Structure

A Merkle-DAG (Merkle Directed Acyclic Graph) is a data structure where every node is identified by the cryptographic hash of its contents, and nodes link to one another by referencing these hashes. The structure inherits its name from Ralph Merkle, who invented the Merkle tree in 1979 as part of his doctoral thesis on public-key cryptography and digital signatures at Stanford University. Merkle’s original insight was that a hierarchical hashing structure could enable efficient verification of data integrity—proving that a specific item belongs to a large set without disclosing the entire set or incurring linear computational costs.

From Merkle Trees to Merkle-DAGs

A Merkle tree is a balanced binary tree where each leaf node contains the hash of a data block, and each internal node contains the hash of its children. The root hash—a single value at the top of the tree—summarizes the entire dataset. Changing any leaf alters the root, making tampering instantly detectable.
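As a rough sketch of the root computation described above, the following builds a binary Merkle root with SHA-256. The leaf and interior prefixes (0x00 and 0x01) and the rule of pairing an odd node with itself are assumptions chosen for brevity, not details from the source; production designs often handle odd levels differently.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(blocks: list[bytes]) -> bytes:
    """Root of a binary Merkle tree; an odd node out is paired with itself."""
    if not blocks:
        raise ValueError("need at least one block")
    # Prefix bytes keep leaf hashes and interior hashes in separate domains.
    level = [sha256(b"\x00" + block) for block in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumed padding rule for odd levels
        level = [sha256(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


blocks = [b"decision-1", b"decision-2", b"decision-3", b"decision-4"]
root = merkle_root(blocks)
altered = merkle_root([b"decision-1", b"decision-X", b"decision-3", b"decision-4"])
print(root != altered)  # True
```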

A Merkle-DAG generalizes this structure by relaxing three constraints. First, there are no balance requirements: the graph can have any shape. Second, every node can carry a payload, not just leaf nodes. Third, and most importantly, a node can have multiple parents, allowing branches to reconverge. This makes Merkle-DAGs more flexible than strict binary trees while preserving the critical property: content addressing means that the identifier (Content Identifier, or CID) of any node is permanently bound to the contents of that node and all its descendants. Two nodes with the same CID represent exactly the same subgraph.

Core Properties

Immutability. Merkle-DAG nodes are immutable. Any change to a node—its content, its links, its metadata—alters its hash and therefore its CID. Because parent nodes reference children by CID, changing a child invalidates the parent’s hash, which invalidates the grandparent’s hash, cascading up through the entire graph. The result: modifying any historical record creates a detectably different DAG.

Content addressing. Nodes are identified by what they contain, not where they are stored. This eliminates the need for a central naming authority and enables deduplication—identical content always produces the same CID, regardless of where or when it is generated.
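The two properties above, immutability and content addressing, can be illustrated with a simplified identifier. The `cid` function below is hypothetical: it hashes a canonical JSON encoding with SHA-256, whereas real CIDs (for example in IPFS) also encode the hash algorithm and data codec.

```python
import hashlib
import json


def cid(payload: str, links: list[str]) -> str:
    """Hypothetical CID: SHA-256 over a canonical encoding of payload and links."""
    encoded = json.dumps({"payload": payload, "links": links},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


leaf_a = cid("tool call: lookup", [])
leaf_b = cid("tool call: lookup", [])      # same content, created elsewhere
parent = cid("agent response", [leaf_a])

# Content addressing: identical content yields the identical identifier.
same_content_same_cid = leaf_a == leaf_b

# Immutability: editing the child changes its CID, and therefore the parent's.
edited_leaf = cid("tool call: lookup (edited)", [])
edited_parent = cid("agent response", [edited_leaf])
change_cascades = edited_parent != parent

print(same_content_same_cid, change_cascades)  # True True
```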

Self-verification. The CID of a node is uniquely determined by the contents of its payload and those of all its descendants. Any party with the root CID can verify the integrity of the entire graph by recomputing hashes. No trust in the storage provider is required.

Efficient verification. Merkle proofs allow verification of individual records in O(log n) time. To verify that a specific entry exists in a dataset of one million records, an auditor needs approximately 20 sibling hashes (about 640 bytes at 32 bytes per SHA-256 hash), not the entire million-record dataset. One 2025 benchmark (Gupta et al.) reports Merkle tree verification latency of 12.4 to 21.6 microseconds, with proof sizes of 96 to 192 bytes.
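A minimal sketch of an inclusion proof shows where the O(log n) figure comes from. The helper names (`build_levels`, `prove`, `verify`), the domain-separation prefixes, and the odd-level padding rule are assumptions made for this example rather than a description of any particular product.

```python
import hashlib
import math


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """All tree levels, leaves first; an odd node out is paired with itself."""
    if not leaves:
        raise ValueError("need at least one leaf")
    levels = [[leaf_hash(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur = cur + [cur[-1]]
        levels.append([node_hash(cur[i], cur[i + 1])
                       for i in range(0, len(cur), 2)])
    return levels


def prove(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root, each tagged True if the sibling is on the left."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    node = leaf_hash(leaf)
    for sibling, sibling_is_left in proof:
        node = node_hash(sibling, node) if sibling_is_left else node_hash(node, sibling)
    return node == root


records = [f"record-{i}".encode() for i in range(1000)]
levels = build_levels(records)
root = levels[-1][0]
proof = prove(levels, 421)
print(len(proof), verify(records[421], proof, root))  # 10 True
print(math.ceil(math.log2(1_000_000)))                # 20
```

The verifier touches only the leaf, the proof, and the root; it never needs the other 999 records.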

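Acyclicity guarantee. A node's CID is computed from its contents, including the CIDs it links to, so a node can only reference nodes that already exist. Creating a cycle, where node A references B and B references A, would require constructing content whose own hash is already embedded in it, which is computationally infeasible with a cryptographic hash function such as SHA-256. This property guarantees that links in the evidence chain always point from newer records to older ones.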

Where Merkle-DAGs Are Battle-Tested

Merkle-DAGs are not a theoretical construct. They underpin some of the most widely deployed systems in computing. Git, the version control system used by virtually every software development team worldwide, stores repository history as a Merkle-DAG where each commit is a node linked by its content hash. IPFS (InterPlanetary File System) uses Merkle-DAGs to store and retrieve files across a distributed network, with content addressing enabling deduplication and integrity verification without a central authority. Bitcoin and Ethereum use Merkle trees to organize transactions within blocks, enabling lightweight clients to verify transaction inclusion using only the block header and a logarithmic-sized proof. Amazon DynamoDB uses Merkle trees for efficient comparison and reconciliation of state between database replicas.

How THEMIS Implements Merkle-DAGs for AI Governance

THEMIS, Apotheon.ai’s governance runtime, uses Merkle-DAGs as its core evidence layer. Every AI agent action that passes through THEMIS generates an evidence node that is cryptographically linked to its predecessors, creating a tamper-evident chain that records not just what happened, but who authorized it, what policy was evaluated, and whether compliance was proven.

Evidence Node Structure

Each THEMIS evidence node contains eight fields, all of which are hashed together to produce the node’s CID:

content_hash: A SHA-256 hash of the agent action’s content—the prompt, response, tool invocation, or data access. The raw content is stored separately (encrypted at rest); the hash proves what content was processed without including the content itself in the chain.

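parent_cid: The CID of the previous node(s) in the chain. Here "parent" means predecessor in time: a new node links back to the node(s) that came before it, as a Git commit links to its parent commits. This is the link that creates the DAG structure. Because the parent's CID is embedded in the child's hash, modifying any historical node changes the CID of every node that was appended after it, so tampering is detectable all the way down the chain.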

policy_result: The outcome of THEMIS’s policy evaluation: PASS, DENY, or WARN. This proves that governance was evaluated at the moment of action, not retrofitted.

zk_proof: An optional zero-knowledge proof (zk-SNARK, approximately 200 bytes) proving that the policy was correctly evaluated without revealing the policy’s internals or the data it operated on. This enables auditors to verify compliance without accessing sensitive information.

caller_identity: A signed identifier for the agent, user, or service that initiated the action. This is cryptographically linked to the organization’s identity provider, creating an unbroken chain of accountability.

tenant_id: The namespace identifier that ensures multi-tenant isolation. Evidence from different tenants exists in separate DAG branches, preventing cross-contamination.

timestamp: An ISO-8601 timestamp with a cryptographic nonce to prevent replay attacks. Combined with the monotonic sequence number, this enables total ordering of events within a tenant.

node_signature: An ECDSA or Ed25519 digital signature over the entire payload, generated by THEMIS’s signing key (stored in a hardware security module or vault). This provides non-repudiation—the node can be verified without contacting THEMIS.
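The eight fields listed above can be sketched as a small data structure. Everything beyond the field names is an assumption for illustration: the canonical JSON encoding, the choice to sign the seven other fields before computing the CID over all eight, the timestamp-plus-nonce string format, and above all the use of HMAC-SHA256 as a standard-library stand-in for the ECDSA or Ed25519 signature the text specifies (HMAC is symmetric and does not provide non-repudiation).

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceNode:
    content_hash: str
    parent_cid: list[str]
    policy_result: str          # "PASS", "DENY", or "WARN"
    zk_proof: Optional[str]     # optional proof, hex-encoded
    caller_identity: str
    tenant_id: str
    timestamp: str              # ISO-8601 plus nonce (format is hypothetical)
    node_signature: str


def canonical(fields: dict) -> bytes:
    """Deterministic JSON so the same fields always hash identically."""
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_node(signing_key: bytes, **fields) -> EvidenceNode:
    # ASSUMPTION: the signature covers the seven other fields.
    # HMAC is only a stand-in for an asymmetric signature scheme.
    sig = hmac.new(signing_key, canonical(fields), hashlib.sha256).hexdigest()
    return EvidenceNode(node_signature=sig, **fields)


def node_cid(node: EvidenceNode) -> str:
    """CID as SHA-256 over all eight fields."""
    return hashlib.sha256(canonical(asdict(node))).hexdigest()


node = make_node(
    b"demo-key",
    content_hash=hashlib.sha256(b"prompt + response").hexdigest(),
    parent_cid=[],
    policy_result="PASS",
    zk_proof=None,
    caller_identity="agent:triage-01",
    tenant_id="tenant-a",
    timestamp="2026-02-01T12:00:00Z/nonce-9f3c",
)
print(node_cid(node))
```

Changing any single field, including the policy result, yields a different CID, which is what lets later nodes detect the change.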

The Append-Only Guarantee

THEMIS’s Merkle-DAG is append-only by construction, not by policy. New nodes reference existing nodes by CID. Because CIDs are derived from content hashes, there is no mechanism to insert a node “before” an existing node without changing the existing node’s hash (which would cascade through all its descendants). Similarly, deleting a node breaks the parent reference of every node that links to it. The append-only property is not enforced by access controls that could be circumvented—it is enforced by the mathematics of cryptographic hash functions.
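A small sketch makes the append-only argument concrete. It assumes a hypothetical in-memory store keyed by CID, with `append` and `verify_chain` helpers invented for this example; a verifier walks the parent links from the newest node and rejects the chain if any node is missing or no longer hashes to its own key.

```python
import hashlib
import json


def cid_of(node: dict) -> str:
    encoded = json.dumps(node, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def append(store: dict, head, payload: str) -> str:
    """Add a node that references the current head; return the new head CID."""
    node = {"payload": payload, "parent_cid": head}
    new_cid = cid_of(node)
    store[new_cid] = node
    return new_cid


def verify_chain(store: dict, head) -> bool:
    """Walk parent links from head; every node must exist and match its CID."""
    current = head
    while current is not None:
        node = store.get(current)
        if node is None or cid_of(node) != current:
            return False  # a historical node was deleted or modified
        current = node["parent_cid"]
    return True


store, head = {}, None
for action in ["plan", "call tool", "respond"]:
    head = append(store, head, action)
intact = verify_chain(store, head)

middle = store[head]["parent_cid"]
del store[middle]                       # attempt to delete a historical node
after_delete = verify_chain(store, head)
print(intact, after_delete)  # True False
```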

External Anchoring

Periodically, THEMIS computes the current Merkle root, the single hash that summarizes the entire evidence chain, and commits it to an external witness. This witness can be a public blockchain (Ethereum, for maximum independence), a notarization service (RFC 3161 timestamping authority), or an organizational audit system (such as an enterprise ledger maintained by internal audit). The anchored root creates an independent timestamp that cannot be altered by either the AI system operator or Apotheon.ai. Because the chain keeps growing after each anchor, an auditor recomputes the root over the records that existed at anchor time and compares it against the anchored root to verify that none of those records has been modified, deleted, or reordered since the anchor was committed.
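The auditor's check against an anchor can be sketched as follows. This is an illustration under stated assumptions: a simple hash-chain head stands in for the Merkle root, and the `Anchor` record shape (a record count plus a root) is hypothetical rather than a documented THEMIS or witness format.

```python
import hashlib
from dataclasses import dataclass


def chain_head(records: list[bytes]) -> str:
    """Hash-chain head used here as a stand-in for the Merkle root."""
    head = b"\x00" * 32
    for record in records:
        head = hashlib.sha256(head + record).digest()
    return head.hex()


@dataclass(frozen=True)
class Anchor:
    """What the external witness stores (hypothetical shape)."""
    record_count: int
    root: str


def make_anchor(records: list[bytes]) -> Anchor:
    return Anchor(record_count=len(records), root=chain_head(records))


def consistent_with_anchor(records: list[bytes], anchor: Anchor) -> bool:
    """True if the first record_count records still hash to the anchored root."""
    if len(records) < anchor.record_count:
        return False  # records were removed after anchoring
    return chain_head(records[:anchor.record_count]) == anchor.root


records = [b"d1", b"d2", b"d3"]
anchor = make_anchor(records)

records.append(b"d4")                               # later appends are fine
still_consistent = consistent_with_anchor(records, anchor)

records[1] = b"d2-rewritten"                        # history rewritten
rewrite_detected = not consistent_with_anchor(records, anchor)
print(still_consistent, rewrite_detected)  # True True
```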

Why Merkle-DAGs Are Superior to Traditional Logging

The following table provides a systematic comparison across every dimension that matters for AI governance audit trails.

Regulatory Requirements and Merkle-DAG Alignment

EU AI Act (Article 12: Record-Keeping)

Article 12 of the EU AI Act mandates that high-risk AI systems must technically allow for the automatic recording of events (logs) over the lifetime of the system. These logs must enable identification of situations that may result in risk, support post-market monitoring, and facilitate operational oversight. Article 19 requires providers to retain these automatically generated logs for a minimum of six months. The draft standard ISO/IEC DIS 24970 (AI System Logging) is being developed to provide concrete implementation guidance, but it will not be finalized until late 2026—after the August 2026 general application deadline for high-risk obligations.

A critical observation from industry analysis: Article 12 implicitly requires tamper-resistant logging, yet no harmonized technical standard specifies how tamper resistance should be implemented. THEMIS’s Merkle-DAG evidence chain provides a ready-made answer: mathematically tamper-evident records with external anchoring, retention as long as required, and automatic generation without manual intervention. The hash chain itself constitutes the “automatic recording of events” that Article 12 requires, with the added property that its integrity is independently verifiable.

HIPAA (Security Rule)

HIPAA's Security Rule requires covered entities to implement audit controls that record and examine activity in information systems containing protected health information (PHI). The Security Rule updates proposed in 2025 emphasize cryptographic safeguards. THEMIS's Merkle-DAG provides a complete audit trail of every AI agent interaction with PHI, including the zero-knowledge proof that no PHI was exposed in agent outputs, with tamper evidence that supports HIPAA's integrity requirements. The crypto-shredding mechanism (described under Crypto-Shredding for GDPR Compliance) addresses the intersection of audit retention and the HIPAA minimum necessary standard.

SOC 2 (Trust Services Criteria)

SOC 2 Type II audits require evidence that controls operated effectively over a period of time. Traditional approaches rely on periodic sampling—auditors test a sample of transactions and extrapolate. THEMIS’s Merkle-DAG enables continuous evidence: every transaction is recorded, and the hash chain proves that the recorded set is complete and unmodified. This shifts auditing from sampling to census verification, reducing both the probability of undetected violations and the effort required by auditors.

SEC 2026 Examination Priorities

The SEC’s 2026 examination priorities place AI governance alongside cybersecurity as a top-tier concern. For financial services firms deploying AI agents in trading, compliance, and customer service, the ability to produce an immutable audit trail of every AI decision is becoming a baseline expectation. THEMIS’s Merkle-DAG provides exactly this: a tamper-evident record of every agent action with cryptographic proof of policy compliance, identity verification, and temporal ordering.

Advanced: Merkle-DAG Engineering for Scale

Hash Function Selection

THEMIS uses SHA-256 as its default hash function, providing 128-bit collision resistance (about 2¹²⁸ operations to find a collision). SHA-256 is standardized by NIST, supported by hardware acceleration on modern CPUs (Intel SHA Extensions, ARM Cryptographic Extensions), and is the same algorithm used by Bitcoin; Git, by contrast, still defaults to SHA-1 and offers SHA-256 only as an optional object format. For post-quantum readiness, THEMIS's hash function is pluggable: organizations can substitute SHA-3 or BLAKE3 without changing the DAG structure. NVIDIA's cuPQC library (v0.4) demonstrates GPU-accelerated Merkle tree computation with hash-based post-quantum signature schemes, validating that the performance overhead of Merkle structures is manageable even at extreme scale.
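One way a pluggable hash function might look is sketched below. The `make_hasher` factory and the algorithm-tagged digest format are assumptions for illustration. BLAKE3 is not in the Python standard library, so BLAKE2b appears here purely as a stand-in for a third option.

```python
import hashlib

# Allow-list of fixed-output algorithms available in the standard library.
SUPPORTED = {"sha256", "sha3_256", "blake2b"}


def make_hasher(algorithm: str = "sha256"):
    """Return a function mapping bytes to an algorithm-tagged hex digest."""
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported hash algorithm: {algorithm}")

    def hasher(data: bytes) -> str:
        digest = hashlib.new(algorithm, data).hexdigest()
        # The tag tells a verifier which function to recompute with.
        return f"{algorithm}:{digest}"

    return hasher


sha2 = make_hasher("sha256")
sha3 = make_hasher("sha3_256")
print(sha2(b"node"))
print(sha3(b"node"))
```

Tagging each digest with its algorithm means nodes hashed under different functions can coexist in one graph while a migration is in progress.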

DAG Topology for Multi-Agent Systems

In single-agent systems, the Merkle-DAG is a simple hash chain: each node has exactly one parent, forming a linked list. In multi-agent systems—where agents operate concurrently, spawn sub-agents, or collaborate on shared tasks—the DAG branches. Each agent maintains its own chain, and coordination events create merge nodes with multiple parents. This is precisely analogous to how Git manages concurrent development: branches diverge, work proceeds in parallel, and merges create nodes with multiple parents. The Merkle-CRDTs research (2020) demonstrates that this structure naturally embeds causality information, allowing any party to reconstruct the causal ordering of events across agents.
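The branching and merging described above can be sketched with a toy DAG. The helpers `add_node`, `ancestors`, and `happened_before` are hypothetical names for this example; the point is that causal order between two events falls out of the parent links alone.

```python
import hashlib
import json


def add_node(dag: dict, payload: str, parents: list[str]) -> str:
    """Store a node keyed by its CID; parents must already exist in the DAG."""
    missing = [p for p in parents if p not in dag]
    if missing:
        raise KeyError(f"unknown parent CIDs: {missing}")
    node = {"payload": payload, "parents": sorted(parents)}
    encoded = json.dumps(node, sort_keys=True, separators=(",", ":"))
    cid = hashlib.sha256(encoded.encode("utf-8")).hexdigest()
    dag[cid] = node
    return cid


def ancestors(dag: dict, cid: str) -> set:
    """Every node reachable by following parent links from cid."""
    seen, stack = set(), list(dag[cid]["parents"])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dag[current]["parents"])
    return seen


def happened_before(dag: dict, earlier: str, later: str) -> bool:
    return earlier in ancestors(dag, later)


dag = {}
root = add_node(dag, "task assigned", [])
a1 = add_node(dag, "agent A: fetch data", [root])
b1 = add_node(dag, "agent B: draft summary", [root])
merge = add_node(dag, "coordinator: combine results", [a1, b1])

print(happened_before(dag, a1, merge))  # True
print(happened_before(dag, a1, b1))     # False: the branches are concurrent
```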

Crypto-Shredding for GDPR Compliance

The tension between audit retention and the right to erasure is one of the most challenging problems in AI governance. THEMIS resolves it through crypto-shredding. All personally identifiable information within evidence nodes is encrypted with a per-user key stored in a separate vault (HashiCorp Vault, AWS KMS, Azure Key Vault, or GCP KMS). When a data subject exercises their right to erasure, THEMIS destroys the encryption key. The evidence node persists—its hash, parent references, policy results, and timestamps remain intact, preserving the integrity of the audit chain. But the personal data within the node becomes cryptographically inaccessible. The log entry proves that an action occurred and was governed; it no longer reveals whose data was involved.
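The crypto-shredding flow can be sketched with standard-library pieces only. Two loud caveats: `toy_cipher` is a deliberately simple XOR keystream for illustration and must not be used for real data (an authenticated cipher such as AES-GCM would take its place), and `KeyVault` is a hypothetical in-memory stand-in for an external key service.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """TOY stream cipher (SHA-256 in counter mode, XOR). Illustration only."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))  # same call decrypts


class KeyVault:
    """In-memory stand-in for an external key store (hypothetical)."""

    def __init__(self) -> None:
        self._keys = {}

    def create(self, user_id: str) -> bytes:
        self._keys[user_id] = secrets.token_bytes(32)
        return self._keys[user_id]

    def get(self, user_id: str) -> bytes:
        return self._keys[user_id]  # raises KeyError once shredded

    def shred(self, user_id: str) -> None:
        del self._keys[user_id]


vault = KeyVault()
ciphertext = toy_cipher(vault.create("user-42"), b"name=Jane Doe")

# The evidence node commits to the ciphertext, never to the plaintext.
node_hash = hashlib.sha256(b"policy=PASS|" + ciphertext).hexdigest()
readable_before = toy_cipher(vault.get("user-42"), ciphertext)

vault.shred("user-42")  # data subject exercises the right to erasure
hash_unchanged = (
    hashlib.sha256(b"policy=PASS|" + ciphertext).hexdigest() == node_hash
)
print(readable_before, hash_unchanged)  # b'name=Jane Doe' True
```

Because the node hash covers only ciphertext, destroying the key leaves every CID and parent link in the chain valid.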

Performance Characteristics

Evidence node creation in THEMIS adds less than 10 milliseconds of latency per agent action. The hash computation itself (SHA-256 of the node payload) takes microseconds on modern hardware. The majority of the latency budget is consumed by the digital signature (ECDSA signing, approximately 1–3 ms) and the vault interaction for key retrieval (1–5 ms, amortized via caching). Merkle root recomputation is incremental: only the path from the new leaf to the root is recalculated, requiring O(log n) hash operations regardless of the total number of nodes. External anchoring is batched (configurable interval, typically every 60 seconds), amortizing the cost of the external write across all nodes created in that window.
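Incremental root maintenance can be sketched with a frontier of subtree roots, one per height. This is one common technique and an assumption here, not a confirmed description of how THEMIS does it; the class and method names are invented for the example. Each append merges equal-height subtrees like carries in binary addition, so the work per append is bounded by the tree height.

```python
import hashlib


def h(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


class IncrementalMerkle:
    """Append-only accumulator keeping one subtree root ("peak") per height."""

    def __init__(self) -> None:
        self.peaks = []  # peaks[i] is the root of a 2**i-leaf subtree, or None
        self.size = 0

    def append(self, leaf: bytes) -> int:
        """Add a leaf; return how many subtree merges were needed."""
        node = hashlib.sha256(leaf).digest()
        height = 0
        # Like binary addition with carry: merge equal-height subtrees upward.
        while height < len(self.peaks) and self.peaks[height] is not None:
            node = h(self.peaks[height], node)
            self.peaks[height] = None
            height += 1
        if height == len(self.peaks):
            self.peaks.append(node)
        else:
            self.peaks[height] = node
        self.size += 1
        return height

    def root(self) -> bytes:
        """Fold the peaks, smallest (newest) subtree first, into one root."""
        if self.size == 0:
            raise ValueError("empty tree has no root")
        acc = None
        for peak in self.peaks:
            if peak is not None:
                acc = peak if acc is None else h(peak, acc)
        return acc


tree = IncrementalMerkle()
worst = max(tree.append(f"node-{i}".encode()) for i in range(1000))
print(worst)  # 9
```

For 1,000 appends the worst case is 9 merges, against roughly 1,000 hashes if the whole tree were rebuilt each time.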

Real-World Scenarios

Healthcare: Proving HIPAA Compliance Across 10 Million Patient Interactions

A hospital system deploys AI agents for patient triage, appointment scheduling, and clinical decision support. Over one year, the system processes 10 million patient interactions, each involving protected health information. A HIPAA auditor requests evidence that PHI was never exposed in agent outputs across the entire dataset.

With traditional logging, the hospital would need to search 10 million log entries, verify that none were modified, and demonstrate that no entries are missing. This is operationally infeasible and fundamentally unprovable—the auditor cannot verify that logs were not selectively deleted.

With THEMIS's Merkle-DAG, the hospital provides the current Merkle root and the externally anchored root from the start of the audit period. The auditor verifies that these are consistent (no modifications since anchoring). For any specific patient interaction, the auditor requests an inclusion proof of approximately 24 sibling hashes (log₂ of 10 million is about 23.3) and verifies in microseconds that the record exists, includes a zk-SNARK proving PHI compliance, and has not been altered. Completeness of the anchored record set is mathematically checkable: removing any node would change the root hash.

Financial Services: Demonstrating Fair Lending Compliance

A bank deploys AI agents for credit underwriting. Regulators require evidence that the model does not discriminate across protected classes. With traditional logging, the bank produces model output logs and statistical analyses. The regulator has no mechanism to verify that the logs are complete or that unfavorable results were not selectively removed.

With THEMIS, every credit decision generates a Merkle-DAG entry containing a zero-knowledge fairness proof—mathematical evidence that demographic parity constraints were satisfied without revealing the applicant’s protected attributes. The hash chain proves that every decision is accounted for. External anchoring proves the evidence existed at the time of the decision. The regulator verifies mathematics, not trust.

Cross-Border Operations: Satisfying Multiple Jurisdictions Simultaneously

A multinational corporation deploys AI agents across the EU, US, and APAC. Each jurisdiction has different logging requirements: the EU AI Act mandates six-month retention with tamper resistance; HIPAA requires six-year retention with cryptographic safeguards; Singapore’s MAS requires real-time auditability. THEMIS’s Merkle-DAG satisfies all three simultaneously: the evidence chain is append-only and tamper-evident (EU AI Act), stored with configurable retention periods and crypto-shredding for GDPR compatibility (HIPAA + GDPR), and queryable in real-time with O(log n) inclusion proofs (MAS).

Conclusion: From Logging to Evidence

The shift from traditional logging to Merkle-DAG evidence chains is not incremental—it is categorical. Traditional logs record what a system reports. Merkle-DAGs prove what a system did. Traditional logs require trust in the log producer. Merkle-DAGs require trust only in the mathematics of hash functions—mathematics that has been tested across billions of transactions in Git, IPFS, Bitcoin, and Ethereum over two decades.

For AI governance, this distinction is existential. As AI agents make more consequential decisions—in healthcare, finance, defense, and critical infrastructure—the evidentiary standard for accountability will continue to rise. Regulators will not accept “we logged it” when they can demand “prove it.” THEMIS’s Merkle-DAG evidence layer provides that proof: tamper-evident by construction, verifiable in microseconds, complete by mathematical guarantee, and anchored to independent witnesses that neither the AI operator nor the governance provider can retroactively alter.

The audit trail of the future is not a log file. It is a cryptographic evidence chain. THEMIS builds it into the runtime of every AI agent it governs.

References

Merkle, R. C. (1979). “Secrecy, Authentication, and Public Key Systems.” Ph.D. Dissertation, Stanford University. Formalized the Merkle tree for digital signatures.

Merkle, R. C. (1989). “A Certified Digital Signature.” Advances in Cryptology — CRYPTO ’89. First published formal definition of the Merkle tree.

IPFS Documentation. “Merkle Directed Acyclic Graphs (DAGs).” https://docs.ipfs.tech/concepts/merkle-dag/

Sanjuan, H. et al. (2020). “Merkle-CRDTs: Merkle-DAGs Meet CRDTs.” arXiv:2004.00107. Demonstrated DAG-embedded causality for distributed systems.

Wikipedia / Grokipedia. “Merkle Tree.” Historical survey of applications in blockchain, Git, IPFS, DynamoDB, and P2P systems.

Watanabe, Y. et al. (2025). “Proof of Authenticity of General IoT Information with Tamper-Evident Sensors and Blockchain.” IEEE R10-HTC 2025. arXiv:2512.18560.

Zhou, J. et al. (2023). “Dynamic Data Integrity Auditing Based on Hierarchical Merkle Hash Tree in Cloud Storage.” Electronics 12(3):717, MDPI.

ForensiBlock (2023). “A Provenance-Driven Blockchain for Digital Forensics.” arXiv:2308.03927. Distributed Merkle trees for evidence traceability.

Yu, M. et al. (2019). “Coded Merkle Tree: Solving Data Availability Attacks in Blockchains.” IACR ePrint 2019/1139.

Gupta, A. et al. (2025). “Mastering Hash Functions, Digital Signatures & Merkle Trees.” JSRT Journal. Benchmarks: 12.4–21.6 µs verification, 96–192 byte proofs, 231x speedup.

NVIDIA (September 2025). “Improve Data Integrity with Accelerated Hash Functions and Merkle Trees in cuPQC 0.4.” Technical Blog.

Liu, X. et al. (2025). “Authenticated Private Set Intersection: A Merkle Tree-Based Approach.” arXiv:2506.04647. Formal analysis of Merkle tree integrity properties.

PMC / PLOS ONE (2024). “Algorithm for Key Transparency with Transparent Logs.” PMC11585852. 95% key verification under tamper, effective attack detection.

OWASP (2025). “A09: Security Logging and Alerting Failures.” OWASP Top 10:2025.

OWASP (2025). “Top 10 for Agentic Applications.” Agentic AI Security Initiative.

NetWitness (February 2026). “SIEM Log Management: 6 Costly Mistakes to Avoid.” Dell: 125B log events/day statistic.

Latitude (2025). “Audit Logs in AI Systems: What to Track and Why.” Best practices for cryptographic integrity in AI logging.

European Union (2024). EU AI Act, Article 12: Record-Keeping; Article 19: Automatically Generated Logs. Regulation (EU) 2024/1689.

ISO/IEC DIS 24970 (2025–2026, draft). “Artificial Intelligence — AI System Logging.” Under development by CEN-CENELEC JTC 21.

VDE (2026). “EU AI Act: AI System Logging.” Analysis of Article 12 requirements and ISO/IEC 24970 alignment.

VeritasChain Standards Organization (January 2026). “The EU AI Act’s Logging Requirements Are Clear. The Implementation Standards Are Not.” Analysis of crypto-shredding for GDPR/AI Act reconciliation.

SecurePrivacy (2026). “EU AI Act 2026 Compliance Guide: Key Requirements Explained.” Implementation analysis.

SEC (November 2025). 2026 Examination Priorities: AI Governance and Cybersecurity.

HIPAA Security Rule (2025 update). Cryptographic safeguards and audit control requirements.

Apotheon.ai (2026). THEMIS: Trusted Hash-Based Evidence Management & Integrity System. Internal Technical Documentation.
