
Mining attacks on PoRA (Proof of Random Access)


This is a crosspost with ethresear.ch/t/mining-attacks-on-pora-proof-of-random-access.

Abstract

This article analyzes the security of the Proof of Random Access (PoRA) consensus mechanism against potential mining attacks. The focus is on two main attack vectors: the shrink attack, where an attacker uses cheaper or smaller storage devices with the same throughput to gain an economic advantage, and the Moore attack, where an attacker leverages advancements in technology to increase throughput while maintaining the same cost. The article examines these attacks under both unlimited and limited mining throughput scenarios and provides mathematical analysis to determine the conditions under which the attacks could be economically viable for miners.

Security model

In our security model, we consider an attacker who aims to optimize their mining procedure to gain an economic advantage over honest miners. The attacker can employ various strategies to achieve this goal:

  1. Equipment replacement: The attacker may replace their storage equipment with more efficient or cost-effective alternatives. This strategy allows the attacker to maintain or increase their mining performance while reducing costs.

  2. Partial fraction mining: The attacker can utilize a portion of their storage space and throughput for one client while using the remaining resources for another client. This allows the attacker to optimize their resource allocation and potentially gain an advantage by serving multiple clients simultaneously.

We will analyze two specific attack vectors:

  1. Shrink attack: In this attack, the attacker replaces their storage device with a cheaper one that offers the same performance. For example, an attacker might replace a more expensive 1TiB NVMe drive with a cheaper 1TB drive that has the same throughput. This allows the attacker to reduce their storage costs while maintaining their mining performance. The key idea behind the shrink attack is that the attacker can take advantage of the fact that the consensus mechanism may not distinguish between storage devices of different sizes as long as they offer the same throughput.

  2. Moore attack: This attack involves the attacker replacing their storage equipment with a newer, more advanced version that offers better performance at the same cost. For example, an attacker might replace a gen 4 storage device with a gen 5 device that has twice the throughput at the same cost. This allows the attacker to increase their mining performance without incurring additional expenses.

It is important to note that the Proof of Random Access (PoRA) consensus mechanism scales well: there is no single-machine limitation. This scalability allows for the possibility of using RAM rigs on tiny devices, such as single-board computers or even smartphones, to participate in the mining process. The absence of a single-machine limitation opens up new opportunities for attackers to optimize their mining setups and potentially gain an advantage over honest miners.

Furthermore, the ability to perform partial fraction mining, where an attacker can allocate a portion of their storage space and throughput to one client while using the remaining resources for another client, adds another layer of complexity to the security model. This flexibility in resource allocation allows attackers to optimize their mining strategies and potentially gain an edge by serving multiple clients simultaneously.

Throughout the following sections, we will examine these attack vectors in detail, considering both unlimited and limited mining throughput scenarios. Our analysis will focus on determining the conditions under which these attacks could be economically viable for miners, and we will provide mathematical derivations to support our findings.

Shrink attack

In the following sections, we consider three different attacks. The first attack assumes that the mining throughput is not limited and shows how, in this case, the adversary can gain an advantage over honest miners by using a different hardware configuration. In the following two attacks, we assume that the mining throughput has been limited to mitigate the first attack, but we show that new issues arise: the adversary can be economically incentivized to drop some part of the stored file, and to perform the Moore attack using more efficient hardware than that of honest miners.

In our cost analyses we make a few reasonable assumptions about the relationships between the costs and make conservative estimates of the adversary's advantage.

Unlimited throughput: achieving advantage over honest miners

In this scenario, we consider the case where an attacker reduces the size of their memory module to gain an economic advantage. We make a pessimistic assumption that if the memory size is reduced by half, the maintenance cost (energy consumption) and throughput will remain the same, while the cost will be reduced by half. In reality, the energy consumption would likely be lower, but this assumption can only make our analysis more conservative.

To compare the cost efficiency of the attacker and the reference miner, we normalize the values by the cost and present them in the following table:

|           | Cost | Maintenance | Throughput |
|-----------|------|-------------|------------|
| reference | 1    | $A$         | 1          |
| attacker  | 1    | $\chi A$    | $\chi$     |
  • The reference miner's cost of purchasing one unit of hardware is set to 1 as a baseline; this is without loss of generality as we normalize other values with respect to this, eliminating a free variable.

  • $A \sim 1$ represents the maintenance cost (energy consumption) per time unit for the reference miner, which is assumed to be close to 1 for simplicity.

  • The reference miner's throughput is also normalized to 1.

  • The attacker's cost is set to 1, assuming that one unit of attacker's hardware has the same cost as that of a reference miner. This is a conservative estimate, since one unit of attacker's hardware is assumed to be less efficient than that of a reference miner.

  • The attacker's maintenance cost is $\chi A$, where $\chi > 1$ represents the throughput advantage of the attacker. This is because the attacker's memory size is smaller, but the energy consumption per unit of memory remains the same.

  • The attacker's throughput is $\chi$, reflecting their advantage in terms of throughput per unit cost.

To compare the total cost efficiency, we calculate the throughput per unit of total cost (cost + maintenance) for both the reference miner and the attacker:

Reference miner: $\frac{1}{1+A}$

Attacker: $\frac{\chi}{1+\chi A} = \frac{1}{1/\chi+A}$

Since $\chi > 1$ when the attacker reduces the memory size, we can conclude that:

$$\frac{1}{1+A} < \frac{\chi}{1+\chi A} = \frac{1}{1/\chi+A}$$

This inequality demonstrates that the attacker has a better total cost efficiency compared to the reference miner.
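A quick numerical sanity check of this comparison (a sketch; $A$ and $\chi$ are illustrative values, not measurements):

```python
# Throughput per unit of total cost (purchase + maintenance) for the
# reference miner vs. the shrink attacker, as in the table above.

def efficiency(throughput, cost, maintenance):
    """Throughput per unit of total cost."""
    return throughput / (cost + maintenance)

A = 1.0  # maintenance (energy) cost per time unit, normalized

for chi in (1.0, 2.0, 4.0):
    reference = efficiency(1.0, 1.0, A)
    attacker = efficiency(chi, 1.0, chi * A)
    print(f"chi={chi}: reference={reference:.3f}, attacker={attacker:.3f}")

# For any chi > 1 the attacker's value is strictly larger,
# matching 1/(1+A) < 1/(1/chi + A).
```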

Therefore, the original PoRA is not resistant to shrink attacks under unlimited mining throughput conditions. The only way to protect against this vulnerability is to limit the mining rewards, which would discourage attackers from exploiting this weakness.

Limited throughput: not storing part of the file

In this scenario, we consider the case where the mining throughput is limited to an optimal value of 1, and we analyze the cost efficiency for an attacker who uses only a fraction $p$ of their memory.

Let's define the following variables:

  • $n$: the number of random accesses

  • $q = 1 - p \ll 1$, assuming $qn \lesssim 1$

  • $n_e$: the effective average number of accesses, given by $n_e = (1-p^n)/(1-p) \approx (1 - \exp(-qn))/q$

  • $p_s$: the success probability, given by $p_s = p^n \approx \exp(-qn)$

  • $\tau$: the slowdown of sampling, given by $\tau = n_e/(n \cdot p_s) = (\exp(qn)-1)/(qn)$

  • $B$: the sampling cost, assumed to be much smaller than 1 ($B \ll 1$) to ensure that the main cost of the algorithm is not the CPU PoW

We can compare the reference miner and the attacker using the following table:

|                         | Cost | Maintenance | Sampling        | Throughput |
|-------------------------|------|-------------|-----------------|------------|
| reference               | 1    | $A$         | $B$             | 1          |
| attacker                | 1    | $\chi A$    | $\chi B$        | $\chi$     |
| attacker ($p$ fraction) | $p$  | $\chi p A$  | $\chi p B \tau$ | $\chi p$   |
  • The reference miner's costs and throughput are normalized to 1.

  • The attacker's maintenance cost, sampling cost, and throughput are scaled by $\chi$ when using the full memory.

  • When the attacker uses only a fraction $p$ of their memory, their cost, maintenance, and throughput are scaled by $p$, while the sampling cost is scaled by $p\tau$ to account for the slowdown in sampling.

For $qn \lesssim 1$, we have $\tau \sim 1$, which means $B\tau \ll 1$.

To consume all throughput, the attacker must satisfy the equation $\chi p = 1$.

For efficient mining, the following condition must be met:

$$p\cdot (1 + \chi A) + B \tau < 1 + A + B$$

Simplifying this condition (using $\chi p = 1$), we get:

$$p + (\tau - 1) B < 1,$$

or equivalently, since $1 - p = q$,

$$q > (\tau - 1) B \approx qnB/2,$$

which is satisfied whenever

$$n B \lesssim 2.$$

To estimate $B$, let's consider the example of a Samsung 970 SSD with a throughput of 2 GB/s, a TDP of 6 W, and a value size of 1 MB. The hash efficiency for a CPU is 30 MH/J, and for an ASIC it is 3 GH/J.

The additional TDP for sampling will be:

  • For CPU: $2\text{e}9 / 1\text{e}6 / 30\text{e}6 \approx 6\text{e-}5$ W

  • For ASIC: $2\text{e}9 / 1\text{e}6 / 3\text{e}9 \approx 6\text{e-}7$ W

By dividing these values by the TDP, we can roughly estimate $B$ to be in the range of $1\text{e-}5$ to $1\text{e-}7$.

This means that $n$ should be greater than $1\text{e}5$ to $1\text{e}7$ to make the shrink attack inefficient, which may not be practical in real-world scenarios.

When $qn \lesssim 1$, the attack remains profitable for any such $p$. For example, with a storage size of 1 TB, a value size of 1 MB, $n=1\text{e}4$, and $B=1\text{e-}5$, we can take $q=1\text{e-}4$, which means that 100 MB of data could be forgotten while still providing economic benefits for the miner.
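The profitability condition and the worked example above can be checked with a short script (a sketch using the formulas from this section; the numbers are the example values, not measurements):

```python
import math

# Check p + (tau - 1)*B < 1 for the example: 1 TB storage, 1 MB values,
# n = 1e4 random accesses, B = 1e-5, q = 1e-4 (i.e. 100 MB dropped).

n = 10_000
B = 1e-5
q = 1e-4
p = 1 - q

p_s = p ** n                      # success probability, ~exp(-q*n)
n_e = (1 - p ** n) / (1 - p)      # effective average number of accesses
tau = n_e / (n * p_s)             # sampling slowdown, ~(exp(q*n) - 1)/(q*n)

lhs = p + (tau - 1) * B
print(f"p_s ~ {p_s:.4f}, tau ~ {tau:.4f}")
print(f"p + (tau - 1)*B = {lhs:.6f} -> attack profitable: {lhs < 1}")
```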

Limited throughput: Moore attack

When considering the Moore attack, it's important to note that miners will align their throughput to the limit imposed by the system. Let's analyze the cost efficiency of the reference miner and the attacker in this scenario.

|                         | Cost + Maintenance | Sampling          | Throughput      |
|-------------------------|--------------------|-------------------|-----------------|
| reference               | 1                  | $B$               | 1               |
| attacker                | 1                  | $B$               | $\chi$          |
| attacker ($p$ fraction) | $p$                | $\theta B p \tau$ | $\theta \chi p$ |
  • The reference miner's cost, maintenance, and throughput are normalized to 1, with a sampling cost of $B$.

  • After upgrading their hardware, the attacker's cost and maintenance remain the same, but their throughput increases by a factor of $\chi$.

  • When the attacker uses only a fraction $p$ of their upgraded hardware, their sampling cost is scaled by $\theta p\tau$, where $\theta$ is a throughput utilization parameter, and their throughput is scaled by $\theta \chi p$.

$\theta$ represents the throughput utilization parameter, which indicates the fraction of the attacker's upgraded throughput that they actually use. For example, if $\theta = 0.8$, the attacker is utilizing 80% of their upgraded throughput. This parameter allows us to model situations where the attacker may not be using their full upgraded capacity, either intentionally or due to technical limitations.

To consume all available throughput, the attacker must satisfy the equation $\theta \chi p = 1$.

For efficient mining, the following condition must be met:

$$p \cdot (1 + \theta B \tau) < 1 + B$$

Expanding this condition using the approximations for $\tau$ and $p_s$ from the previous section, we get:

$$(1-q)\left(1 + \chi^{-1} B (1 + qn/2)\right) < 1 + B$$

Simplifying further:

$$\chi^{-1} nB/2 - 1 - B - \chi^{-1} qnB/2 < 0$$

$$\chi^{-1} pnB/2 < 1 + B$$

$$n B \lesssim 2 \chi$$

To find the optimal values for the arbitrary parameters $\theta$ and $p$, we need to perform additional calculations. Taking the partial derivative of $p \cdot (1 + \chi^{-1} B \tau)$ with respect to $q$, we get:

$$\partial_q \left(p \cdot (1 + \chi^{-1} B \tau)\right) \approx -(1 + \chi^{-1} B (1 + qn/2)) + \chi^{-1} Bn/2 \cdot (1 + 2qn/3) = \chi^{-1} Bn/2 - (1 + \chi^{-1} B) + \chi^{-1} B q n (n/3 - 1/2) = 0$$

Solving for $q$, we get:

$$q = \frac{-Bn/2 + \chi + B}{B n (n/3 - 1/2)} > 0$$

This result suggests that $n$ should be greater than $1\text{e}5$ to $1\text{e}7$ to make the Moore attack inefficient.

For example, consider a storage size of 1 TB, a value size of 1 MB, $n=1\text{e}4$, $B=1\text{e-}5$, and $\chi=2$. Plugging these values into the equation for $q$, we get $q \approx 0.005$, which means that 5 GB of data could be forgotten while still providing economic benefits for the attacker.
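The same estimate can be reproduced directly from the formula for $q$ (a sketch; the inputs are the example values above):

```python
# Optimal dropped fraction for the Moore attack,
# q = (chi + B - B*n/2) / (B*n*(n/3 - 1/2)).

n = 10_000    # number of random accesses
B = 1e-5      # relative sampling cost
chi = 2.0     # throughput advantage of the upgraded hardware

q = (chi + B - B * n / 2) / (B * n * (n / 3 - 1 / 2))
print(f"q ~ {q:.4f}, dropped data ~ {q * 1000:.1f} GB out of 1 TB")
# q comes out to roughly 0.006, i.e. on the order of 5 GB of a 1 TB file.
```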

RAM rig

In the original PoRA paper, the authors compare the performance of a Samsung 970 EVO NVMe SSD and 256GB DDR4-3200 RAM. Based on the calculations in the previous sections, we arrive at a counterintuitive conclusion: when there are no throughput limitations, only the throughput matters, not the size of the storage. To further illustrate this point, let's compare the efficiency of a Crucial T705 1TB NVMe SSD and Crucial 8GB DDR5-4800 RAM.

|      | Cost (USD) | TDP (W) | Throughput (GB/s) |
|------|------------|---------|-------------------|
| NVMe | 188        | 15      | 13.7              |
| DDR5 | 25         | 10      | 72                |

The table above compares the cost, thermal design power (TDP), and throughput of the two storage devices. The NVMe SSD has a higher cost and TDP but a lower throughput compared to the DDR5 RAM.

To calculate the cost efficiency of each device, we need to consider the maintenance cost and the amortization of the equipment over its lifetime. Let's assume that the maintenance cost for 1W of power is about 4.4 USD per year and that the equipment is amortized over 4 years.

For the NVMe SSD, the cost per 1 GB/s of throughput per year is:

$(188/4 + 15 \cdot 4.4) / 13.7 = 8.25$ USD

For the DDR5 RAM, the cost per 1 GB/s of throughput per year is:

$(25/4 + 10 \cdot 4.4) / 72 = 0.70$ USD

The results show that the DDR5 RAM is significantly more cost-efficient than the NVMe SSD when considering the cost per 1 GB/s of throughput per year. This finding supports the idea that, in the absence of throughput limitations, using high-throughput RAM can be more economically viable for mining than using NVMe SSDs, despite the difference in storage capacity.
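These figures can be reproduced with a few lines (a sketch; prices, TDP and throughput come from the table above, and the 4-year amortization and 4.4 USD per watt-year are the stated assumptions):

```python
AMORTIZATION_YEARS = 4
USD_PER_WATT_YEAR = 4.4

devices = {
    "NVMe (Crucial T705 1TB)": {"cost": 188, "tdp": 15, "gbps": 13.7},
    "DDR5 (Crucial 8GB-4800)": {"cost": 25, "tdp": 10, "gbps": 72},
}

for name, d in devices.items():
    yearly_cost = d["cost"] / AMORTIZATION_YEARS + d["tdp"] * USD_PER_WATT_YEAR
    print(f"{name}: {yearly_cost / d['gbps']:.2f} USD per GB/s per year")

# NVMe: ~8.25 USD, DDR5: ~0.70 USD per GB/s per year.
```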

Conclusion

The analysis of the shrink and Moore attacks on the PoRA consensus mechanism highlights potential vulnerabilities in the system. The article demonstrates that without proper limitations on mining rewards and a sufficiently high number of random accesses, attackers could gain economic benefits by using cheaper, smaller storage devices or leveraging advancements in technology to increase throughput. To mitigate these risks, the PoRA mechanism should be designed with appropriate parameters, such as limiting mining rewards and ensuring a high number of random accesses. Additionally, the comparison between NVMe storage and RAM suggests that RAM-based mining rigs could pose a significant threat to the security of the system, as they are more cost-effective per unit of throughput.

Further research

We plan to publish an article soon on green (no PoW inside) proofs of storage, based on statistics, economics, and zkSNARK cryptography, suitable for our decentralized storage research, which is available at:

Minimal fully recursive zkDA rollup with sharded storage


Current zk rollup state

zkRollups scale execution efficiently, but publish all blocks on L1. This is not scalable for storage and forbids recursive rollups: if we deploy a rollup on a rollup, we need to publish all the blocks of the inner rollup on the outer rollup, and the outer rollup will publish all its blocks on L1.

*Figure: native rollup*

There were some attempts to solve this problem, like validiums, but they are weak on both decentralization and security (2 of 3 in Vitalik's trilemma).

Existing improvements in unlocking data availability and decentralized storage

Chia

Chia introduced a novel consensus algorithm, Proof of Space and Time (PoST), a more decentralized and energy-efficient alternative to Proof of Work (PoW). PoST uses storage space as a resource to secure the network. The current capacity of the Chia Network is 33 EiB.

EthStorage

EthStorage is a replication-based DA and storage solution managed by a smart contract.

Our results

In our research draft, we propose a solution for storage and data availability that is friendly to zk rollups and unlocks new scalability opportunities.

Sharding instead of replication

We propose using a $k$-of-$n$ threshold data representation: any $k$ numbers from the source file are transformed into $n$ numbers, and any $k$ of these $n$ numbers can restore the source $k$ numbers. This is called Shamir's Secret Sharing.

This approach allows us to utilize storage 10-20 times more efficiently than the replication-based approach, according to our modeling.

Also, it gives us better protection from physical-level attacks, like target node destruction.

Unlimited horizontal scalability

We propose to use a 2-level nested rollup structure (below we describe why this is possible). The top-level rollup manages the participants of low-level rollups and mixes them to prevent the accumulation of malicious participants in one low-level rollup. Low-level rollups manage the data stored in the nodes.

Polynomial commitments everywhere

We propose to use Merkle trees at the top level of the database. However, the minimal structure is a polynomial commitment to a cluster of data. So, it is very friendly to rollups, because we can use the same polynomial commitment to represent a rollup's block.

Also, out of the box we get a data availability oracle (just provide a random opening of the polynomial commitment) and all the linear algebra we need for sharding.

Data mining

Nodes can use the data for mining, as in Chia. The result of mining is a zero-knowledge proof of data availability.

Storage complexity is leveled, so storing random data and storing zeros have the same cost.

Nodes can join the network with a trustless zk proof of their capacity.

Bring it all together

ZK rollups usually publish the proof of execution and the block data on-chain. But our data availability and proof of storage are themselves zk proofs. So, we can merge it all together and publish the proof of execution, data availability, and storage as one single ZK proof.

This unlocks the deployment of rollups on rollups, rollups on rollups on rollups, and so on, and a way to transform Web2 into Web3.

Also, we can prevent the bloating of the blockchain: if we publish a snapshot state of the rollup, the previous history can be removed.

*Figure: zkDA rollup*

Some economics

On 1 Jan 2024, the cost of storing 1 GiB was:

  • Ethereum $1.8M
  • EthStorage $10k
  • Celestia $300
  • Near $10

Based on a Hetzner SX294 with an 8x blowup factor (what we need for >100 bits of security), the annual cost of storing 1 GB is $0.15.

The cost will be lower on specialized rigs.

Call for discussion and feedback

We believe our proposed solution has the potential to significantly improve the scalability and efficiency of zk rollups and upgrade Web2 to Web3. However, we acknowledge that this is still a research draft and there may be challenges or considerations we haven't fully addressed.

We welcome discussion, feedback, and constructive criticism from the community. If you have insights, ideas, or see potential issues with our approach, please share them.

Blockchain Sharded Storage. Web2 Costs and Web3 Security with Shamir Secret Sharing


This is a crosspost with ethresear.ch/t/blockchain-sharded-storage-web2-costs-and-web3-security-with-shamir-secret-sharing.

Igor Gulamov, ZeroPool, March 2024. Thanks to Ivan Oleynikov for editing and feedback.

Abstract

CPU scaling for blockchain is solved. However, storage scaling is still a problem. This document describes a horizontally scalable fault-tolerant storage solution for blockchain that can process large amounts of data (beyond petabytes) with Web2 storage overhead and Web3 security. With this solution, rollups no longer need to store their blocks on-chain. In other words, we can upgrade validiums to rollups and nest the rollups recursively into each other with close to zero cost of data storage.

Our solution uses Shamir's Secret Sharing to split the payload into shards and distribute it among nodes for storage, and later retrieve the shards and recover the payload using Fast Fourier Transform. Nodes are paid by the file owner for storing the shards, which is periodically verified using zkSNARK. We employ a special shuffling technique to decide which nodes will store the shards of a given file. As long as at least half of the network behaves honestly, this technique ensures that a malicious adversary can not cause a DoS attack on the file by controlling a critical number of its shards.

We present the details of our solution, analyze the cryptographic security guarantees it provides and propose a set of economic incentives to motivate honest node behavior.

Introduction

Problem Statement

One of the solutions used for storage scaling on blockchain today is a replication of data. It stores each chunk of payload on multiple nodes. Nodes produce zero-knowledge proofs of data availability. When the number of nodes storing the chunks is low, the network distributes the chunks to other nodes.

Let's discuss the features of this approach that make it more expensive than Web2 storage.

  1. Redundancy. If we want to build a 50% fault-tolerant network with 128 bits of security, we need to replicate data 128 times. It means that the network should store and transfer 128 times more data than the payload.

  2. Preventing deduplication. The replicas of stored data are identical, therefore the nodes storing them could try to collude and deduplicate them: collectively store only one replica, saving costs, while billing the network for storing many replicas. The network mitigates this by applying a distinct encoding to each replica and requiring each node to periodically prove that it holds the encoding given to it. This encoding makes conversion between replicas computationally hard, making deduplication impractical.

    The drawback of this approach is that zero-knowledge proofs now need to be aware of the encoding, adding extra computational overhead for honest nodes.

  3. Data distribution. The nodes holding replicas of a file may go offline at any moment. When this happens, the network redistributes the replicas to more nodes to ensure the needed level of redundancy. An adversary who controls a large portion of the network could use this mechanism to try to collect all replicas of a given file.

    The network mitigates this by periodically randomly shuffling nodes between the pools of different files. Informally speaking, this ensures that the fraction of the adversary's nodes in each pool stays close to the fraction of its nodes in the whole network.

In the following sections, we propose our solution to this problem with better security and performance than replication. Additionally, our solution is natively zkSNARK-friendly, as its polynomial computations can be efficiently done in a zkSNARK. That means that we can include proofs of data availability in proofs of rollup state transitions with little overhead. It will allow us to upgrade validiums to rollups with close to zero cost of data storage.

Our solution implements data redistribution similarly to the above to account for nodes going offline. We additionally periodically shuffle the nodes between the pools to make sure that malicious nodes are distributed evenly between the pools, and that the adversary cannot monotonically increase the presence of malicious nodes in a given pool due to honest nodes of that pool going offline over time.

Use Cases

Our proposed solution can be seen as a very big decentralized HDD with large (megabytes) sectors. Like bare-metal HDDs, it provides CRUD operations on sectors, which is inefficient for many cases like small files or databases. However, it is very efficient for storing big files, like videos, images, and backups.

Also, it is efficient for rollups: the rollups can store their blocks in sectors of the network and merge state transition zk proofs with proofs of data availability.

This approach makes validiums obsolete. We can upgrade validiums to rollups keeping the same level of security and cost of the storage.

On the rollup level, we can implement all remaining use cases, like databases, small files, and so on.

This can help solve the problem of blockchain bloat. We can store the full rollup state at a checkpoint, and then all rollup history before the checkpoint can be removed.

Also, very cheap storage may enable us to implement many Web2 solutions, like messengers, social networks, blogs, and games, on the blockchain with true decentralization.

Preliminaries

Shamir's Secret Sharing

Shamir's Secret Sharing is a method for distributing a secret among a group of participants, each of which is allocated a share of the secret. The secret can be reconstructed only when a sufficient number of shares are combined. The sufficient number is called the threshold. The threshold can be any number between 1 and the total number of shares. The secret cannot be reconstructed from any number of shares less than the threshold.

One of the simplest ways to implement Shamir's Secret Sharing is to use a polynomial of degree N-1: we represent the N-sized secret as a polynomial of degree N-1, evaluate this polynomial at M points, and distribute the resulting M values among M participants. The secret can be restored from any N of these values.

The way we use Shamir's Secret Sharing here can be alternatively characterized as encoding the data with a Reed–Solomon Error Correcting Code and decoding with erasures (not errors), especially since the message we encode is not secret. In the following, we keep calling it Secret Sharing because more readers may be familiar with this term.

For well-selected $N$ and $M$, we can restore the secret even if most of the participants go offline. We will use this property to build a fault-tolerant storage of publicly available data.

Polynomial Computation for Data Recovery

One can recover a secret shared using Shamir's scheme via Lagrange interpolation; we briefly outline the mechanism below.

Let $p(x)$ be a polynomial of degree $N-1$, and let the secret be the evaluation representation of this polynomial over the evaluation domain $\mathbf{D}=\{0,\ 1,\ 2,\ \ldots,\ N-1\}$:

$$\mathbf{S} = \{p(0),\ p(1),\ p(2),\ \ldots,\ p(N-1)\}.$$

We will compute the polynomial over the extended evaluation domain $0,\ 1,\ 2,\ \ldots,\ M-1$ and distribute the values to $M$ participants.

Consider the case when all but $N$ of the participants go offline. So, we get the following values:

$$\mathbf{V} = \{p(k_0),\ p(k_1),\ p(k_2),\ \ldots,\ p(k_{N-1})\}$$

over the evaluation domain

$$\mathbf{K} = \{k_0,\ k_1,\ k_2,\ \ldots,\ k_{N-1}\}.$$

Let's define Lagrange polynomials over the evaluation domain $\mathbf{K}$:

$$\mathbf{L}_i(x) = c_i \prod_{j \neq i} \left(x - k_j\right),$$

where $c_i$ is a constant coefficient such that $\mathbf{L}_i(k_i) = 1$.

Let's define the matrix $\mathbf{L}_{ij}=\mathbf{L}_i(j)$.

Then the secret can be restored as follows:

$$\mathbf{S}_j = \sum_{i} \mathbf{V}_i \cdot \mathbf{L}_{ij}$$

What happens if some of the participants are malicious and send incorrect values? There is more than one way to solve this. For our particular case, we will Merkleize all values and distribute them to all participants with Merkle proofs. Then we can check the correctness of each value by verifying its Merkle proof against the root. If the proof is incorrect, we can ignore the value. In terms of error-correcting codes, this corresponds to an erasure.
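The sharing and recovery described above can be illustrated with a small sketch over a toy prime field (the field, $N$, $M$ and the evaluation domains are illustrative; a real deployment would use a SNARK-friendly field and FFTs):

```python
# N-of-M sharing and recovery via Lagrange interpolation over GF(P).

P = 2**61 - 1  # a prime; all arithmetic is mod P

def poly_eval(coeffs, x):
    """Evaluate the polynomial with the given coefficients at x (mod P)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def share(coeffs, m):
    """Shares are evaluations of p over the extended domain 0..m-1."""
    return [poly_eval(coeffs, x) for x in range(m)]

def recover(points, n):
    """Recover S_j = p(j), j = 0..n-1, from any n pairs (k_i, p(k_i))."""
    ks = [k for k, _ in points][:n]
    vs = [v for _, v in points][:n]
    recovered = []
    for j in range(n):
        s = 0
        for ki, vi in zip(ks, vs):
            num, den = 1, 1
            for kl in ks:
                if kl != ki:
                    num = num * (j - kl) % P
                    den = den * (ki - kl) % P
            s = (s + vi * num * pow(den, -1, P)) % P   # v_i * L_i(j)
        recovered.append(s)
    return recovered

coeffs = [7, 13, 42, 99]                   # defines p(x), N = 4
shares = share(coeffs, 8)                  # M = 8 participants
survivors = list(enumerate(shares))[3:7]   # any 4 of the 8 shares survive
assert recover(survivors, 4) == shares[:4] # the secret S = {p(0), ..., p(3)}
print("recovered:", recover(survivors, 4))
```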

zkSNARKs

A Zero-Knowledge Succinct Non-Interactive Argument of Knowledge is a cryptographic primitive that lets a Prover convince a Verifier that it knows a secret witness $y$ such that $P(x, y)$ holds for a public polynomially-computable predicate $P$ and a public instance value $x$.

They are heavily used to provide privacy for the data a blockchain works with. In this architecture, a smart contract on the blockchain only holds a commitment to its state, and clients initiate transactions asking to update that commitment, providing a zkSNARK proof that the transition was done correctly. This way, the state held by the smart contract (or some parts of that state) can remain private, while still ensuring that state transitions happen according to some rules.

Another use case for zkSNARKs is CPU scaling of blockchains. zkSNARKs allow verifying the proof faster than the computation of predicate $P$ would take. This way, if verifying a state transition requires too many resources, we can use recursion and only verify the final result on the blockchain.

We use zkSNARKs in this solution for both purposes. In the following description, we often leave the use of zkSNARKs implicit.

Polynomial Commitment Schemes

Polynomial Commitment Schemes are commitment schemes where one can commit to a polynomial of bounded degree and later reveal individual evaluations of that polynomial, while proving that the degree of the committed polynomial does not exceed the bound.

Selection of Polynomial Commitment Scheme

We propose using FRI because it is not additively homomorphic, which makes it more suitable for our case. The use of additively homomorphic commitment schemes could lead to attacks where malicious nodes use MPC to compute proofs of data availability without storing all the data.

To build a proof of a random opening of the polynomial commitment, the prover has to keep all the data; other nodes cannot help it compute this proof with an MPC procedure.

Space-time Tradeoff and Plotting

For proof-of-space-time mining, we need to build a plot, which is an array of high-entropy data such that computing any one element of this array without storing the whole array is a hard problem.

The approach to building plots is described in AACKPR2017.

To build the plot, let's define

$$f_1(x)=h(x),$$

$$f_{i+1}(x) = h(x, x_1),$$

where

$$|f_i(x)+f_i(x_1)| < s_0, \qquad x_1 = 0 \mod s_1,$$

and $h$ is a hash function.

In AACKPR2017 it is shown that the space-time tradeoff for $f_n$ takes the form

$$S^n T = O(N^n).$$

If $n$ is big enough, it is optimal for a server to store all data.

To perform a space-time proof, the node receives a random challenge $c$ and should find an $s_0$-close preimage $x_c$ of $f_n$:

$$|f_n(x_c)-c| < s_0,$$ and also provide all computations of $f_n(x_c)$.

Proof complexity grows as $O(2^n)$, so in practice it is useful to build proofs for $k=7$ or $k=8$ (the Chia proof-of-space construction). That means that if the node stores half as much data, it should compute 128 or 256 times more hashes to provide the proof.

Proofs with bigger $k$ could be used inside zkSNARKs.
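A toy transcription of the recurrence above (this is not the actual AACKPR2017/Chia plotter; the domain size, $s_0$, $s_1$ and the interpretation of the closeness condition modulo $2^{32}$ are simplifying assumptions for illustration only):

```python
import hashlib
from functools import lru_cache

DOMAIN = 1 << 10   # toy domain for x
S0 = 1 << 25       # closeness threshold s_0
S1 = 4             # x_1 must satisfy x_1 = 0 mod s_1

def h(*args):
    data = b"|".join(str(a).encode() for a in args)
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

@lru_cache(maxsize=None)
def f(i, x):
    """f_1(x) = h(x); f_{i+1}(x) = h(x, x_1) for a matching x_1."""
    if i == 1:
        return h(x)
    for x1 in range(0, DOMAIN, S1):
        if (f(i - 1, x) + f(i - 1, x1)) % (1 << 32) < S0:
            return h(x, x1)
    return h(x, DOMAIN)   # no match in the toy domain; degenerate fallback

# A prover that stores the table {x: f_n(x)} answers a challenge c by looking
# up an s_0-close preimage; recomputing f_n on the fly costs many extra hashes.
table = {x: f(3, x) for x in range(DOMAIN)}
print(f"sample plot entry: f_3(123) = {table[123]:#010x}")
```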

Architecture

In this section, we first give a high-level overview of the proposed L1-L3 architecture. Then we describe the commissioning and decommissioning of L3 pools. Finally, we discuss plotting, the mechanism nodes use to prove that they have enough space to store the shards.

Overview

Consider a 4-level model of the sharded storage network illustrated below.

*Figure: Architecture*

At the first level, we have the L1 blockchain. The L2 rollup publishes state-to-state transition proofs and the root hash of the state on the L1 blockchain.

We do not need to publish the block data: we are describing sharded storage, so all data will be safely stored on the nodes, and the zk proof contains the proof of data availability.

At the second level, we have the L2 rollup. It checks proofs of space-time for new nodes, adds them to the list of active nodes, removes inactive nodes, and performs mixing of nodes between pools to prevent potential attacks. Also, state-to-state transition proofs for L3 rollups are published here.

At the third level, we have the L3 rollup. Sharding means that we convert the data into $n$ shards such that any $k\leq n$ of them are enough to restore the data. The L3 rollup is responsible for consistency between all nodes. Also, users rent space at the L3 rollup using their payment bridges. The L3 rollup aggregates proofs of data availability using function interpolation at random points for data blocks.

The L3 rollups run their own consensus protocols (e.g. using Proof of Authority), and the members of the consensus are the nodes of the corresponding pool. They can perform their operations and maintain their state without synchronizing every step with the L1 blockchain, referring to it only when they explicitly choose to (e.g. to save their state hash).

Users and smart contracts can rent space for tokens from the L3 rollup. So, the set of all L3 rollups works as a very big decentralized HDD with CRUD operations on the sectors of this disk.

At the fourth level, we have storage nodes. The nodes are part of the consensus for the corresponding pool. Also, the nodes store the data and provide proofs of data availability. All node space should be filled with special plots, as in the Chia Network, but with some differences making them more suitable for our case and ZK-friendly.

Commissioning and Decommissioning of L3 pools

Nodes can join and leave the pool at any time. The L2 rollup is responsible for commissioning and decommissioning L3 pools depending on the number of unallocated nodes and the amount of data in the pools.

Commissioning

*Figure: Commissioning*

Commissioning of the pool is a simple process: the L2 rollup selects a random set of nodes from existing nodes in other pools and replaces them with unallocated nodes. When the number of nodes in a new pool reaches a level with enough security, the pool is commissioned and can accept new files for storage.

Decommissioning

*Figure: Decommissioning*

Decommissioning is a more complex procedure. At first, the L2 rollup selects two pools with a low percentage of rented space (both should be $<50\%$ full). Then it moves all data from one pool to the other. After that, the nodes of the empty pool are considered unallocated and the pool is removed from the list of active pools.

This solution works well if the data blocks are distributed among pools unequally, i.e. as many pools as possible are fully filled. This means that fewer pools and fewer nodes are in use, and the resources of those nodes are utilized to the fullest.

This is similar to the disk defragmentation problem: the placement of data on the disk impacts performance, and we would like to distribute it in a way that is more efficient. The difference is that our disk is very big and decentralized, and the user renting space freely chooses the pool to put her data in (using L2 consensus to assign pools for each file upload would not scale well). Therefore, we address this problem by designing a special economic model to incentivize users to utilize the pools already in use to their fullest before touching fresh ones. We assign each pool a fee rate that depends on the percentage of currently rented space. The Economic Model section describes this in more detail.

When a pool gets decommissioned, the nodes move the data to a new pool without any confirmation from the users owning that data. The whole procedure needs no interaction from the user, she can be offline the whole time. Later, when the user comes online and wants to retrieve her data, she can look at L2 records (L2 has the records since it decided to decommission the pool), figure out what pool currently stores her data and retrieve it from there.

Polynomial Representation of the Data

Let's consider $F_i$ as an $N$-sized array of data we need to store. We can represent it as a table $F_{ij}$ with $M$ rows and $K$ columns, $N=M \cdot K$.

Then we can represent the table as a bivariate polynomial $F(x,y)$ of degree $M-1$ in $x$ and degree $K-1$ in $y$.

It can be noted that $F(x,y_0)$ represents a linear combination of the columns of the table. To distribute the data to $K_1$ nodes, we can evaluate the polynomial at $K_1$ values of $y$ and distribute the values to the nodes.

We can verify the following polynomial equation using the polynomial commitment scheme:

$$F(x,x^M)-F(x,y_0) = (x^M-y_0) \cdot Q(x),$$

where $Q(x)$ is a quotient polynomial.
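This identity can be checked numerically for a toy table (a sketch over a small prime field; $M$, $K$ and the field size are illustrative):

```python
import random

P = 65537       # toy prime field
M, K = 4, 3     # F_ij with M rows and K columns; F(x, y) = sum F_ij x^i y^j
F = [[random.randrange(P) for _ in range(K)] for _ in range(M)]

def poly_sub(a, b):
    out = [0] * max(len(a), len(b))
    for i, ai in enumerate(a):
        out[i] = ai
    for i, bi in enumerate(b):
        out[i] = (out[i] - bi) % P
    return out

def poly_divmod(a, d):
    """Long division of polynomial a by d over GF(P): (quotient, remainder)."""
    a, q = a[:], [0] * (len(a) - len(d) + 1)
    inv_lead = pow(d[-1], -1, P)
    for i in range(len(q) - 1, -1, -1):
        q[i] = a[i + len(d) - 1] * inv_lead % P
        for j, dj in enumerate(d):
            a[i + j] = (a[i + j] - q[i] * dj) % P
    return q, a

# F(x, x^M): the coefficient F_ij lands on x^(i + M*j).
F_x_xM = [0] * (M * K)
for i in range(M):
    for j in range(K):
        F_x_xM[i + M * j] = (F_x_xM[i + M * j] + F[i][j]) % P

# F(x, y0): a univariate polynomial in x for a fixed y0.
y0 = random.randrange(P)
F_x_y0 = [sum(F[i][j] * pow(y0, j, P) for j in range(K)) % P for i in range(M)]

divisor = [(-y0) % P] + [0] * (M - 1) + [1]            # x^M - y0
Q, rem = poly_divmod(poly_sub(F_x_xM, F_x_y0), divisor)
print("remainder is zero:", all(r == 0 for r in rem))  # expected: True
```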

Plotting

To prevent spam from malicious nodes that do not have enough space, we should implement an efficient mechanism allowing nodes to prove that they have enough space to store the data.

We use the technique described in the Space-Time Tradeoff section to achieve this.

State of the Node and Data Availability Mining

Each data sector of the node could be represented as a polynomial. The node can store all polynomial commitments inside the Merkle tree. Then proof of data availability could be computed as a set of random openings of the polynomial commitments at random points. Challenge values for the openings and commitment selection could be derived from the timestamps of the blocks of the L2 rollup. The proofs of data availability can be compressed using recursive zkSNARKs.

Cryptographic Analysis

Security Model

In our security model, we assume that at least 50% of the nodes on the network are honest, and the L1-L2 consensus is secure, i.e. the L1 and L2 layers of our network are uncorrupted. The only thing an adversary is allowed to do is spawn malicious nodes (no more than 50% of the network). The malicious nodes need not follow the prescribed protocol and can deviate from it as chosen by the adversary.

Honest nodes are assumed not to deviate from the protocol unless that lets them earn more (of the L1 or L2 tokens that users pay for rented space) than honest behavior would.

Statistical Security Analysis

Let's denote by $p$ the fraction of honest nodes in the network. So, if the total number of nodes is $N$, $pN$ are honest and $(1-p)N$ are malicious. If shards are distributed to nodes at random, $p$ is also the probability that a given node is honest. Then, if we have $n$ shards with threshold $k$, the secret cannot be restored exactly when strictly fewer than $k$ shards are stored by honest nodes. This probability is given by the following binomial distribution:

$$\mathbf{P}(p,n,k) = \sum_{i=0}^{k-1} \binom{n}{i} p^i (1-p)^{n-i},$$

where

$$\binom{n}{i} = \frac{n!}{i!(n-i)!}$$

is a binomial coefficient.

For $0.05 < p < 0.95$, $n>30$, $np>5$, $n(1-p)>5$, we can use the normal approximation of the binomial distribution (source).

$$\mathbf{P}(p,n,k) \approx \frac{1}{2} \left[1 + \mathrm{erf}\left(\frac{k-1/2-np}{\sqrt{2np(1-p)}}\right)\right].$$

The bits of statistical security of this solution could be defined as follows:

$$\mathbf{S}(p,n,k) = -\log_2 \mathbf{P}(p,n,k).$$

Then we can calculate the statistical security for different values of $p$, $n$, and $k$. For example, if $p=1/2$, $n=1024$, $k=256$, then $\mathbf{S}(1/2,1024,256) = 190$ bits of security.
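A sketch for evaluating these formulas (both the exact binomial tail and the normal approximation quoted above):

```python
import math
from fractions import Fraction

def security_exact(p, n, k):
    p = Fraction(p)
    tail = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
    return -math.log2(tail)

def security_normal(p, n, k):
    z = (k - 0.5 - n * p) / math.sqrt(2 * n * p * (1 - p))
    # 0.5 * (1 + erf(z)) == 0.5 * erfc(-z); erfc avoids underflow for z << 0
    return -math.log2(0.5 * math.erfc(-z))

print("normal approx:", security_normal(0.5, 1024, 256))  # ~190 bits, as above
print("exact tail:   ", security_exact(0.5, 1024, 256))   # somewhat higher; the
# normal approximation is conservative this far out in the tail
```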

Complexity Leveling and Protection Against Centralized Supernodes

Let's consider the following two attack vectors:

  1. Files have different entropy. Low-entropy files are easier to store, but the reward is the same. That means that malicious nodes can generate large low-entropy files and store them on small disk drives. To prevent this attack, we need to level the complexity of the data.

  2. Also, the nodes can collude and store all the data on one node. If we use $k$-of-$n$ sharding, this one supernode can store only the source data, which has the same complexity as storing only $k$ shards. This is not safe. To prevent this attack, we need the complexity to be the same for the decentralized and centralized cases, so the nodes gain no profit from centralization.

All these attacks could be prevented with the following approach:

Each node generates a high-entropy plot and commits to a function $G$, which is very close to this plot. This fact can be verified with random openings of the polynomial:

$$G(x_i) = \text{plot}(x_i)$$

If we perform enough random openings, we can be sure that the entropy of $G$ is high enough.

The seed of the plot should be derived from the commitment of the shard. Then the node can store the sum of the shard and plot and provide proof of data availability for this sum to receive the reward.

$$F'(x) = F(x, y_0) + G(x)$$

So, the minimal storage complexity for all nodes and for one malicious supernode is the same, and complexity leveling is achieved: it is equally hard to store an array of zeros and an array of random values.

If $G$ is not exactly equal to the plot, it does not matter. When the network recovers the data, the honest node can restore the initial data by itself or send a deterministic script describing how to do it.

Dynamic Nodes Mixing against Malicious Actors

If sharding were static over time, we would only need to select the nodes for each pool once. However, the uptime of nodes is not infinite, and as honest nodes go offline (for natural reasons), the adversary could use this to concentrate its malicious nodes in a given pool. If the malicious nodes in a given pool reach a critical amount, they can cause a DoS and lose the data. To prevent this problem, we need to mix the nodes in the pools periodically. The mixing should be done in such a way that the malicious nodes cannot predict where the shards of the data will be stored.

*Figure: Mixings*

Let's consider $n$ as the number of nodes in a pool.

Each time an honest node leaves the pool, the network performs the following:

  1. Select a random node (from an unallocated or another pool), and move it to the current pool. If the selected node was previously in another pool, move a random unallocated node in place of it.

  2. Perform “mixing” $m$ times: select a random node from the current pool and swap it with a random node from outside of this pool (from unallocated or another pool).

During step 1, the number of malicious nodes in the pool will go up by $1$ with probability $p$. But then, each mixing will probabilistically balance the number of malicious nodes inside the pool with the number outside. The current pool will also be impacted by steps 1-2 being triggered in the other pools when some node leaves that pool; we assume that the node leaving the pool is honest (and the attacker is waiting until a lot of honest nodes leave the pool).

In our security model, the best strategy for the adversary is this:

  • keep the malicious nodes in the pool online,
  • wait for some honest node of that pool to go offline.

We can describe the evolution of the pool as a Markov process. To protect from this strategy, the network performs $m$ mixings. We can find the equilibrium distribution for this process and find the probability that fewer than $k$ nodes in the pool are honest.
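A sketch of this Markov model under simplifying assumptions (the rest of the network keeps a fixed malicious fraction $1-p$, every leave event removes an honest node, and the parameters are small toy values rather than the $p=1/2$, $k=64$, $n=512$, $m=3$ example below):

```python
import numpy as np

n, m, p = 64, 3, 0.5   # pool size, mixings per leave event, honest fraction
k = 8                  # pool fails if fewer than k of its nodes are honest

def mix_matrix():
    """One swap: a random pool node is exchanged with a random outside node."""
    T = np.zeros((n + 1, n + 1))        # state j = malicious nodes in the pool
    for j in range(n + 1):
        down = (j / n) * p              # malicious out, honest in
        up = (1 - j / n) * (1 - p)      # honest out, malicious in
        T[j, max(j - 1, 0)] += down
        T[j, min(j + 1, n)] += up
        T[j, j] += 1 - down - up
    return T

def leave_matrix():
    """An honest node leaves and is replaced by a random node."""
    T = np.zeros((n + 1, n + 1))
    for j in range(n + 1):
        T[j, min(j + 1, n)] += 1 - p    # the replacement is malicious
        T[j, j] += p                    # the replacement is honest
    return T

step = leave_matrix() @ np.linalg.matrix_power(mix_matrix(), m)
dist = np.full(n + 1, 1.0 / (n + 1))
for _ in range(20000):                  # power iteration to the equilibrium
    dist = dist @ step
fail = dist[n - k + 1:].sum()           # fewer than k honest <=> more than n-k malicious
print(f"P(fewer than {k} honest) ~ {fail:.3e} (~{-np.log2(fail):.1f} bits)")
```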

Soundness

For example, if $p=1/2$, $k=64$, $n=512$, $m=3$, then this solution achieves $115$ bits of statistical security.

Economic Model

All nodes receive the same reward for storing data or maintaining free space, which have the same complexity due to complexity leveling.

The first source of rewards is token emission with a Bitcoin-like formula. Rewards are distributed to the nodes using proof of space-time mining, like in the Chia Network.

Another source of rewards is the fee for space rent.

The fee should depend on:

  1. Percentage of rented space in the whole network. The more space is rented, the higher the fee. The dependency should be hyperbolic, so the fee becomes very high when the network is almost full. Then, if somebody wants to rent all the free space, the fee will grow very fast.

    $$\phi = O\left(\Psi^{-1}\right),$$

    where $\Psi$ is the fraction of free space in the whole network and $\phi$ is the fee.

  2. Percentage of rented space. If more space is rented, the multiplier is exponentially growing over time. If less space is rented, the multiplier is exponentially decreasing over time.

    $$\mathbf{d} \phi = \phi\cdot(\Psi - \beta)\cdot \gamma\, \mathbf{d}t.$$

    The solution of this equation is

    $$\phi = O\left(\exp\left(\gamma\int\limits_0^t (\Psi(t)-\beta)\, \mathbf{d}t\right)\right).$$

  3. Percentage of rented space of the pool. The more space is rented, the lower the fee. The dependency should be linear.

    $$\phi = O\left(\alpha - \psi\right),$$

    where $\psi$ is the fraction of free space in the current pool.

The resulting formula takes the form:

$$\phi = K \cdot \frac{\alpha - \psi}{\Psi} \cdot \exp\left(\gamma \int\limits_0^t (\Psi(t)-\beta)\, \mathbf{d}t\right)$$
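A discretized sketch of how this fee could evolve (all constants and the utilization trajectory here are hypothetical placeholders, not proposed values):

```python
import math

K, alpha, beta, gamma = 1.0, 1.2, 0.3, 0.05
dt = 1.0

def fee(psi, Psi_history):
    """psi: free fraction in this pool; Psi_history: network free fraction over time."""
    integral = sum((Psi - beta) * dt for Psi in Psi_history)
    return K * (alpha - psi) / Psi_history[-1] * math.exp(gamma * integral)

history = [0.8 - 0.01 * t for t in range(30)]   # network free space shrinking
print(f"fee now: {fee(psi=0.5, Psi_history=history):.3f}")

# The 1/Psi factor makes the fee blow up as free space in the network runs out,
# while the exponential factor drifts up or down depending on whether the free
# fraction stays above or below the target beta.
```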

Also, to make the network more stable, we propose the following mechanism:

  1. Each node instantly receives only part of the reward. The rest of the reward is locked for some time. If the node goes offline, the locked reward is burned.

  2. Prolonging the rent of the existing space should be cheaper than renting a new space by some discount factor.

During the setup of the network, a significant part of rewards should be generated by the mining. Then, when the network is stable, the fee for space rent should be the main source of rewards.

Evaluation

Comparison with Replication

Replication is a special case of sharding with threshold $k=1$. We compute the soundness for replication and sharding with different blowup factors and different levels of security and compare the results.

| Blowup | $p$  | $k=128$ | $k=64$ | $k=32$ | $k=1$ |
|--------|------|---------|--------|--------|-------|
| 4      | 0.25 | 0       | 0      | 0      | 1.3   |
| 8      | 0.25 | 0.1     | 0.2    | 0.3    | 2.1   |
| 16     | 0.25 | 15.6    | 9.2    | 5.8    | 3.1   |
| 32     | 0.25 | 90.8    | 47.5   | 25.7   | 4.4   |
| 64     | 0.25 | 274.1   | 139.6  | 72.2   | 6.5   |
| 4      | 0.38 | 1.4     | 1.3    | 1.3    | 2.3   |
| 8      | 0.38 | 59.6    | 31.8   | 17.8   | 3.8   |
| 16     | 0.38 | 261.9   | 133.7  | 69.4   | 6.3   |
| 32     | 0.38 | 731.5   | 369.1  | 187.6  | 10.8  |
| 64     | 0.38 | 1717.1  | 862.3  | 434.6  | 19.1  |
| 4      | 0.5  | 35.1    | 19.4   | 11.4   | 3.5   |
| 8      | 0.5  | 224.7   | 115.2  | 60.2   | 6     |
| 16     | 0.5  | 701.1   | 354    | 180.3  | 10.7  |
| 32     | 0.5  | 1729.5  | 868.8  | 438.2  | 19.6  |
| 64     | 0.5  | 3845    | 1926.9 | 967.7  | 36.9  |

From the modeling, we can observe:

  1. The blowup factor for replication is much higher than for sharding.
  2. The blowup factor for sharding grows more slowly than for replication as the required security grows.
  3. The blowup factor depends on the sharding threshold $k$: the higher the threshold, the lower the blowup factor.

Conclusion

This article has presented a novel sharded storage solution leveraging blockchain technology to tackle the significant challenge of scaling storage for vast data volumes, reaching beyond petabytes. By integrating Shamir's Secret Sharing and Fast Fourier Transform, we have developed a framework that not only surpasses the limitations of current replication-based methods but also seamlessly integrates with zkSNARK-friendly environments. This enables the secure and efficient storage of data with the cost-effectiveness of Web2 solutions while maintaining the robust security features characteristic of Web3 applications.

Our proposed architecture redefines the concept of data storage and retrieval within blockchain networks, offering a scalable, fault-tolerant solution that significantly reduces the necessity for on-chain data storage. This advancement allows for the transformation of validiums into rollups, thereby enhancing their utility and efficiency. The economic model underpinning our solution ensures a fair and incentivized participation of nodes, thereby promoting a healthy and dynamic ecosystem conducive to the long-term sustainability of the network.

Comparative analyses have highlighted the superiority of our sharded storage solution over traditional replication methods, demonstrating a significant reduction in the blowup factor required to achieve comparable levels of security. Furthermore, our approach to dynamic node mixing and space-time tradeoffs introduces a robust mechanism against potential malicious activities, ensuring the integrity and availability of data within the network.

The theoretical framework presented herein lays a solid foundation for future research and development in the field of blockchain storage solutions. It opens new avenues for the application of blockchain technology in areas previously constrained by storage limitations, such as large-scale data backups, content delivery networks, and decentralized applications requiring extensive data storage capabilities.

In conclusion, the Blockchain Sharded Storage solution represents a significant leap forward in the quest for scalable, secure, and cost-effective data storage on the blockchain. It addresses the critical challenges faced by current blockchain infrastructures, offering a viable pathway towards the realization of truly decentralized, efficient, and secure data storage systems. As we continue to explore and refine this technology, it is poised to become a cornerstone in the development of next-generation blockchain applications, furthering the integration of blockchain technology into mainstream use cases and marking a pivotal moment in the evolution of decentralized data storage.

Future Directions

In the future, we plan to do a more thorough economic modeling of this solution as well as do cryptographic security analysis in one of the established formal frameworks.

Minimal streaming zkVM with quasilinear prover complexity on the thin client and the same memory complexity as native execution


This is a crosspost with zkresear.ch/t/minimal-streaming-zkvm-with-quasilinear-prover-complexity-on-the-thin-client-and-the-same-memory-complexity-as-native-execution.

Introduction

Here we present a sketch draft of a minimal zkVM architecture using the industry improvements of 2023. The main issue of zkVMs is the cost of proving. If you need privacy, you cannot delegate the whole proving to a 3rd party, so you need to do all of the proof, or part of it, yourself, ideally on your mobile device or in a browser. This environment is very limited in resources, so the main goal of a zkVM is to reduce the cost of proving as much as possible.

The most costly parts of a zkVM are lookups and permutations. Here we will show how to use an approach similar to EFG2023 to build a zkVM with FRI polynomial commitments, log-derivative permutation arguments, quasilinear prover complexity on the thin client, and the same memory complexity as native execution.

Preliminaries

Let's consider a simple register-based VM with a set of registers $\mathbf{PC}, \mathbf{R}_0, \ldots, \mathbf{R}_{L-1}$ and memory addressed from $0$ to $M-1$, where $\mathbf{PC}$ is the program counter, used for control flow, and $\mathbf{R}_i$ are general-purpose registers.

The source code is a set

$$\mathbb{Code} = \left\{(\mathbf{PC}, \mathbf{Instruction})\right\},$$

where $\mathbf{PC}$ is unique and contains values from $0$ to $N-1$.

The execution trace is a set

$$\mathbb{ETrace} = \left\{(\mathbf{CLK}, \mathbf{PC}, \mathbf{Instruction}, \mathbf{R}_0, \ldots, \mathbf{R}_{L-1})\right\},$$

where $\mathbf{CLK}$ is unique and contains values from $0$ to $T-1$.

The memory trace is a multiset (like a set, but with possible multiple same elements)

$$\mathbb{MTrace} = \left\{(\mathbf{CLK}, \mathbf{Address}, \mathbf{Value})\right\} \cup \left\{\mathbf{NULL}\right\},$$

where $\mathbf{Address}$ is a memory address and $\mathbf{Value}$ is the value stored at this address. $(\mathbf{CLK}, \mathbf{Address}, \mathbf{Value})$ is not unique, because the same address could be read multiple times by the same opcode execution. If the opcode writes to memory, the $\mathbf{CLK}$ for the value should correspond to the next opcode execution reading the address, or to the end of the program. $\mathbf{NULL}$ is a special element, used if the opcode optionally does not interact with memory. Also, let's set $\mathbf{CLK}=\mathbf{End}$ at the end of the program.

We need to prove that the VM with given $\mathbf{InitialState}$ and $\mathbb{InitialMemory}$ will produce $\mathbf{FinalState}$ and $\mathbb{FinalMemory}$ after executing $\mathbb{Code}$.

Let's consider $\mathbf{InitialState}$ and $\mathbf{FinalState}$ as elements of $\mathbb{ETrace}$, and $\mathbb{InitialMemory}$ and $\mathbb{FinalMemory}$ as subsets (not submultisets: $\mathbf{Address}$ should be unique) of $\mathbb{MTrace}$ with fixed $\mathbf{CLK}$ ($0$ for $\mathbb{InitialMemory}$ and $\mathbf{End}$ for $\mathbb{FinalMemory}$).

Proving

Correctness of instruction set

For each $\mathbb{ETrace}$ element, we should prove that its $(\mathbf{PC}, \mathbf{Instruction})$ pair exists in $\mathbb{Code}$. As part of the zkSNARK protocol, this can be done as a lookup, where $\mathbb{Code}$ is the lookup table.

Correctness of initial and final states

We should prove that all addresses used in $\mathbb{InitialMemory}$ are unique. Also, we should prove that the $\mathbf{Instruction}$ in $\mathbf{FinalState}$ is $\mathbf{HALT}$.

Correctness of execution

For simplicity, we will describe proving all opcodes in the same circuit. Later we will see that this is not necessary, but it is easier to understand.

Let's assume an opcode can touch up to $P$ memory addresses. Then we can prove the following state transition for each $\mathbb{ETrace}$ element excluding the last one (because there is no next element for it).

$$\mathbf{InputState}_{\mathbf{CLK}=i},\ \mathbf{InputMemCell}_{0,\, \mathbf{CLK} \leq i}, \ldots, \mathbf{InputMemCell}_{P-1,\, \mathbf{CLK}\leq i} \xrightarrow{\mathbf{Instruction}_{\mathbf{CLK}=i}} \mathbf{OutputState}_{\mathbf{CLK}=i+1\ \text{or}\ \mathbf{End}},\ \mathbf{OutputMemCell}_{0,\, \mathbf{CLK} = i\ \text{or}\ \mathbf{End}}, \ldots, \mathbf{OutputMemCell}_{P-1,\, \mathbf{CLK}=i\ \text{or}\ \mathbf{End}}$$

$\mathbf{InputState}$ and $\mathbf{OutputState}$ are elements of $\mathbb{ETrace}$; $\mathbf{InputMemCell}$ and $\mathbf{OutputMemCell}$ are elements of $\mathbb{MTrace}$.

To prove the correctness of execution we need to prove the following statements:

$$\left\{\mathbf{InputState}\right\} \cup \left\{\mathbf{FinalState}\right\} = \left\{\mathbf{OutputState}\right\} \cup \left\{\mathbf{InitialState}\right\}$$

$$\bigcup_i \left\{\mathbf{InputMemCell}_i\right\} \cup \mathbb{FinalMemory} = \bigcup_i \left\{\mathbf{OutputMemCell}_i\right\} \cup \mathbb{InitialMemory}$$

Splitting the circuit

The equations over sets that verify VM execution are additive. So, we can split the circuit into smaller parts, proving each part separately. The part for the initial and final states remains the same; in the set and multiset equations we simply take unions over the chunks instead of over a single circuit.

$$\bigcup_{\text{chunks}} \left\{\mathbf{InputState}\right\} \cup \left\{\mathbf{FinalState}\right\} = \bigcup_{\text{chunks}} \left\{\mathbf{OutputState}\right\} \cup \left\{\mathbf{InitialState}\right\}$$

$$\bigcup_{\text{chunks}} \bigcup_i \left\{\mathbf{InputMemCell}_i\right\} \cup \mathbb{FinalMemory} = \bigcup_{\text{chunks}} \bigcup_i \left\{\mathbf{OutputMemCell}_i\right\} \cup \mathbb{InitialMemory}$$

Log-derivative permutation arguments allow us to easily prove equality of multisets $\mathbb{A}$ and $\mathbb{B}$ by proving the equality of the rational expressions $\sum 1/(a_i - x) = \sum 1/(b_i - x)$ at a random point $x$. This expression is also additive, so we can easily rewrite our multiset equations as polynomial equations.
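A small sketch of this check over a toy prime field (field size and values are illustrative):

```python
import random
from collections import Counter

# Multisets A and B are equal iff sum 1/(a - x) == sum 1/(b - x) as rational
# functions; evaluating both sides at one random field element x gives a
# probabilistic equality test with soundness error on the order of |A|/P.

P = 2**61 - 1

def logderiv_sum(multiset, x):
    return sum(pow((a - x) % P, -1, P) for a in multiset) % P

A = [3, 3, 7, 42, 42, 42]
B = [42, 7, 3, 42, 3, 42]      # same multiset, different order
C = [3, 7, 42, 42, 42, 42]     # a different multiset

x = random.randrange(1000, P)  # avoid hitting the small elements themselves
print(logderiv_sum(A, x) == logderiv_sum(B, x))   # True
print(logderiv_sum(A, x) == logderiv_sum(C, x))   # False with high probability
print(Counter(A) == Counter(B), Counter(A) == Counter(C))  # sanity check
```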

We can split the execution circuit for the following purposes:

  1. Proving each opcode separately allows us to reduce the complexity of the circuit
  2. Splitting the execution trace into small chunks allows us to reduce prover complexity, because FRI complexity is $O(n \log n)$, where $n$ is the size of the circuit.

It is important to note that chunks need not be contiguous segments of the execution trace; they can be sparse. Only the union of the chunks has to cover the whole execution trace. This fact leads us to a streaming proving strategy.

Streaming proving strategy

We will run the VM two times.

The first time, we produce the execution trace and split it into separate pools for each circuit. When a pool is full, we compute the hash of the evaluation-domain representation of the polynomials for this pool (without running the full FRI protocol) and forget all data for this pool except the hashes. After executing the whole program, we have a set of hashes, which we use to compute the challenge for the multiset expressions.

On the second run, we produce the execution trace the same way, but this time we build FRI proofs for each chunk of the trace.

Using the same challenge for all chunks allows us to aggregate the multiset expressions into a single equation and verify that the chunks cover the whole execution and memory traces.

As a result, we obtain multiple small proofs, which can be aggregated into a single proof either on the local machine or by a cloud prover.
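The following Python sketch illustrates the two-pass structure on a toy trace; the SHA-256 pool hashing, the toy field, and the chunk size are illustrative assumptions, and the per-chunk partial log-derivative sums stand in for the real chunk proofs (in the actual protocol each chunk would also ship with a FRI proof):

```python
import hashlib
import random

P = 2**61 - 1   # toy field modulus
CHUNK = 4       # rows per pool

def run_vm(seed):
    """Toy 'VM': yields (consumed, produced) field elements whose multisets are
    equal over the whole run, but not inside any single chunk."""
    rng = random.Random(seed)
    consumed = [rng.randrange(P) for _ in range(16)]
    produced = consumed[:]
    rng.shuffle(produced)
    yield from zip(consumed, produced)

def chunks(rows, size):
    buf = []
    for row in rows:
        buf.append(row)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf

# Pass 1: hash every full pool, keep only the hashes, derive the challenge.
hashes = [hashlib.sha256(repr(ch).encode()).digest() for ch in chunks(run_vm(42), CHUNK)]
x = int.from_bytes(hashlib.sha256(b"".join(hashes)).digest(), "big") % P

# Pass 2: re-execute deterministically; emit one partial log-derivative sum per chunk.
partials = []
for ch in chunks(run_vm(42), CHUNK):
    s = 0
    for consumed, produced in ch:
        s = (s + pow((consumed - x) % P, -1, P) - pow((produced - x) % P, -1, P)) % P
    partials.append(s)

# Aggregation with the shared challenge: the partial sums cancel only when the
# chunks jointly cover a multiset-balanced trace (up to negligible probability).
assert sum(partials) % P == 0
print(f"aggregated multiset check passed over {len(partials)} chunks")
```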

Conclusion

This approach allows us to reduce the resource requirements of the prover, delegate part of the proving to an untrusted cloud prover, and run the zkVM on mobile devices and in browsers.

Running Sangria final proof in shielded mode on untrusted 3rd party prover

· 4 min read

This is a crosspost with zkresear.ch/t/running-sangria-final-proof-in-shielded-mode-on-untrusted-3rd-party-prover.

Introduction

Sangria is a folding protocol for the Plonk prover. In the original model, the prover works iteratively and merges a new execution trace with an execution trace accumulator.

Here we will show how to build a special high-entropy execution trace. After merging it with the accumulator, the resulting execution trace can be handed to an untrusted prover without leaking any data.

This approach allows us to perform the linear-complexity part of the work on a thin client and offload the heavy computation to a server without data leaks.

Original protocol

Accumulated instance and witness:

$$U'_n := (\mathbf{X}'_n, u'_n, \overline{W}'_n, \overline{E}'_n),$$

$$W'_n := (\mathbf{W}'_n, \mathbf{e}'_n, r'_{Wn}, r'_{En})$$

Iteration instance and witness:

$$U_n := (\mathbf{X}_n, u_n, \overline{W}_n, \overline{E}_n),$$

$$W_n := (\mathbf{W}_n, \mathbf{e}_n, r_{Wn}, r_{En})$$

Where $\overline{W} = \text{Com}(\text{pp}_W, \mathbf{W}, r_W),\ \overline{E} = \text{Com}(\text{pp}_W, \mathbf{e}, r_E)$.

We use the relaxed Plonk gate equation:

$$C(\mathbf{a}, \mathbf{b}, \mathbf{c}, u, \mathbf{e}) = \mathbf{a}\mathbf{b}\mathbf{q_M} + \mathbf{q_C} u^2 + \left(\mathbf{a}\mathbf{q_L} + \mathbf{c}\mathbf{q_O} + \mathbf{b}\mathbf{q_R}\right) u + \mathbf{e}$$

  1. Prover sends to Verifier $\overline{T}_n = \text{Com}(\text{pp}_W, \mathbf{t}_n, r_{Tn})$,

where $\mathbf{t}_n = 2\mathbf{q_C} u'_n u_n + \left(\mathbf{a}_n \mathbf{b}'_n + \mathbf{a}'_n \mathbf{b}_n\right)\mathbf{q_M} + \left(\mathbf{a}_n \mathbf{q_L} + \mathbf{c}_n \mathbf{q_O} + \mathbf{b}_n \mathbf{q_R}\right) u'_n + \left(\mathbf{a}'_n \mathbf{q_L} + \mathbf{c}'_n \mathbf{q_O} + \mathbf{b}'_n \mathbf{q_R}\right) u_n$

  2. Verifier sends to prover a random $r$

  3. Prover and Verifier output the folded instance

$$U'_{n+1} = (\mathbf{X}'_{n+1}, u'_{n+1}, \overline{W}'_{n+1}, \overline{E}'_{n+1}),$$

where

$$\mathbf{X}'_{n+1} = \mathbf{X}'_n + r \mathbf{X}_n,$$

$$u'_{n+1} = u'_n + r u_n,$$

$$\overline{W}'_{n+1} = \overline{W}'_n + r \overline{W}_n,$$

$$\overline{E}'_{n+1} = \overline{E}'_n + r^2 \overline{E}_n - r \overline{T}_n.$$

  4. Prover outputs the folded witness

$$W'_{n+1} = (\mathbf{W}'_{n+1}, \mathbf{e}'_{n+1}, r'_{W\,n+1}, r'_{E\,n+1}),$$

where

$$\mathbf{W}'_{n+1} = \mathbf{W}'_n + r \mathbf{W}_n,$$

$$\mathbf{e}'_{n+1} = \mathbf{e}'_n + r^2 \mathbf{e}_n - r \mathbf{t}_n,$$

$$r'_{W\,n+1} = r'_{W\,n} + r r_{Wn},$$

$$r'_{E\,n+1} = r'_{E\,n} + r^2 r_{En} - r r_{Tn}.$$

We can check that $C(\mathbf{a}'_{n+1}, \mathbf{b}'_{n+1}, \mathbf{c}'_{n+1}, u'_{n+1}, \mathbf{e}'_{n+1}) = C(\mathbf{a}'_n, \mathbf{b}'_n, \mathbf{c}'_n, u'_n, \mathbf{e}'_n) + r^2 C(\mathbf{a}_n, \mathbf{b}_n, \mathbf{c}_n, u_n, \mathbf{e}_n)$.
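As a numerical sanity check of this identity, here is a small Python sketch over a prime field with a single gate (scalar wires); the modulus and the random inputs are arbitrary choices for the example:

```python
import random

P = 2**61 - 1                                        # example prime modulus
qL, qR, qO, qM, qC = (random.randrange(P) for _ in range(5))

def C(a, b, c, u, e):
    """Relaxed Plonk gate for a single row."""
    return (a*b*qM + qC*u*u + (a*qL + c*qO + b*qR)*u + e) % P

# Accumulator row (index 1, 'primed') and iteration row (index 2); the identity
# holds for arbitrary values, they do not have to satisfy the gate.
a1, b1, c1, u1, e1 = (random.randrange(P) for _ in range(5))
a2, b2, c2, u2, e2 = (random.randrange(P) for _ in range(5))
r = random.randrange(P)

# Cross term t_n from step 1 of the protocol.
t = (2*qC*u1*u2 + (a2*b1 + a1*b2)*qM
     + (a2*qL + c2*qO + b2*qR)*u1 + (a1*qL + c1*qO + b1*qR)*u2) % P

# Folded row.
a3, b3, c3, u3 = ((x + r*y) % P for x, y in ((a1, a2), (b1, b2), (c1, c2), (u1, u2)))
e3 = (e1 + r*r*e2 - r*t) % P

assert C(a3, b3, c3, u3, e3) == (C(a1, b1, c1, u1, e1) + r*r*C(a2, b2, c2, u2, e2)) % P
print("folding identity verified")
```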

Sangria zero-knowledge protocol for untrusted 3rd party prover

Instead of proving the execution trace ourselves after the last step, we can merge it with a random execution trace and send the result to a 3rd party prover. The zero-knowledge property means that the 3rd party prover cannot learn anything about the initial execution trace.

Let's replace the final proving protocol with an additional round, where the client mixes the state with a specially generated random trace and sends the result to the untrusted 3rd party prover.

We will prove that this prover cannot learn anything about the initial execution trace.

Let's consider $(U'_{n+1}, W'_{n+1})$ as the final state of the protocol, $(U'_n, W'_n)$ as the initial state of the protocol, and $(U_n, W_n)$ as the random state.

For any possible initial state $(U', W')$ we try to roll back the protocol on the 3rd party prover's side and find the corresponding merged state $(U, W)$:

$$\mathbf{W} = \frac{\mathbf{W}'_{n+1} - \mathbf{W}'}{r},$$

$$r_W = \frac{r'_{W\,n+1} - r'_{W}}{r},$$

$$\mathbf{t} = \mathbf{t}(W, W'),$$

$$\mathbf{e} = \frac{\mathbf{e}'_{n+1} - \mathbf{e}' + r \mathbf{t}}{r^2},$$

$$r_E = \frac{r'_{E\,n+1} - r'_{E} + r r_{Tn}}{r^2},$$

$$u = \frac{u'_{n+1} - u'}{r}$$

Assuming that

$$C(\mathbf{a}'_{n+1}, \mathbf{b}'_{n+1}, \mathbf{c}'_{n+1}, u'_{n+1}, \mathbf{e}'_{n+1}) = 0,$$

$$C(\mathbf{a}', \mathbf{b}', \mathbf{c}', u', \mathbf{e}') = 0,$$

and substituting the variables, we get

$$C(\mathbf{a}, \mathbf{b}, \mathbf{c}, u, \mathbf{e}) = 0.$$

This means that for the execution trace handed to the 3rd party prover and any possible initial execution trace, there exists a random execution trace whose folding with that initial trace produces exactly the handed trace.

Treating $\mathbf{a}_n, \mathbf{b}_n, \mathbf{c}_n$ as independent random variables, we conclude that all of the initial information is hidden.
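This rollback can also be checked numerically. The sketch below (same toy scalar gate and field as the previous example; the way gate-satisfying rows are generated here is just an illustrative choice) folds a real accumulator with a random satisfying trace, then rolls the fold back against an arbitrary candidate initial trace:

```python
import random

P = 2**61 - 1
inv = lambda z: pow(z % P, -1, P)
qL, qR, qO, qM, qC = (random.randrange(P) for _ in range(5))

def C(a, b, c, u, e):
    return (a*b*qM + qC*u*u + (a*qL + c*qO + b*qR)*u + e) % P

def t_cross(acc, it):
    """Cross term t(acc, it); acc plays the primed (accumulator) role."""
    a1, b1, c1, u1, _ = acc
    a2, b2, c2, u2, _ = it
    return (2*qC*u1*u2 + (a2*b1 + a1*b2)*qM
            + (a2*qL + c2*qO + b2*qR)*u1 + (a1*qL + c1*qO + b1*qR)*u2) % P

def satisfying_row():
    """Random (a, b, c, u, e) with C(a, b, c, u, e) = 0: pick e to cancel the rest."""
    a, b, c, u = (random.randrange(P) for _ in range(4))
    e = (-(a*b*qM + qC*u*u + (a*qL + c*qO + b*qR)*u)) % P
    return (a, b, c, u, e)

# Client side: fold the real accumulator with a random high-entropy trace.
acc, rnd = satisfying_row(), satisfying_row()
r = random.randrange(1, P)
t = t_cross(acc, rnd)
folded = tuple((x + r*y) % P for x, y in zip(acc[:4], rnd[:4])) \
         + ((acc[4] + r*r*rnd[4] - r*t) % P,)
assert C(*folded) == 0

# 3rd party prover side: pick ANY candidate initial trace and roll the fold back.
cand = satisfying_row()
a_, b_, c_, u_ = ((f - x) * inv(r) % P for f, x in zip(folded[:4], cand[:4]))
t_ = t_cross(cand, (a_, b_, c_, u_, 0))
e_ = (folded[4] - cand[4] + r*t_) * inv(r*r) % P
assert C(a_, b_, c_, u_, e_) == 0   # a consistent random trace always exists
print("rollback succeeded: the fold is consistent with an arbitrary initial trace")
```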

UPD: all intermediate computations are available on sage math here: snjax/sangria-delegated-zk

Fast Fourier Inspired Folding for Sangria

· 4 min read

This is a crosspost with zkresear.ch/t/fast-fourier-inspired-sangria.

Introduction

Sangria is a folding protocol for the Plonk prover. In the original model, the prover works iteratively and merges a new execution trace with an execution trace accumulator.

Here we will show how to build an optimized folding process that requires only 2 (or even 1) scalar multiplications per folding step on the verifier side.

Original Protocol

Accumulated instance and witness:

$$U'_n := (\mathbf{X}'_n, u'_n, \overline{W}'_n, \overline{E}'_n),\quad W'_n := (\mathbf{W}'_n, \mathbf{e}'_n, r'_{Wn}, r'_{En})$$

Iteration instance and witness:

$$U_n := (\mathbf{X}_n, u_n, \overline{W}_n, \overline{E}_n),\quad W_n := (\mathbf{W}_n, \mathbf{e}_n, r_{Wn}, r_{En})$$

Where $\overline{W} = \text{Com}(\text{pp}_W, \mathbf{W}, r_W),\ \overline{E} = \text{Com}(\text{pp}_W, \mathbf{e}, r_E)$.

We use a relaxed Plonk gate equation: $C(\mathbf{a}, \mathbf{b}, \mathbf{c}, u, \mathbf{e}) = \mathbf{a}\mathbf{b}\mathbf{q_M} + \mathbf{q_C} u^2 + \left(\mathbf{a}\mathbf{q_L} + \mathbf{c}\mathbf{q_O} + \mathbf{b}\mathbf{q_R}\right) u + \mathbf{e}$

  1. Prover sends to Verifier $\overline{T}_n = \text{Com}(\text{pp}_W, \mathbf{t}_n, r_{Tn})$, where $\mathbf{t}_n = 2\mathbf{q_C} u'_n u_n + \left(\mathbf{a}_n \mathbf{b}'_n + \mathbf{a}'_n \mathbf{b}_n\right)\mathbf{q_M} + \left(\mathbf{a}_n \mathbf{q_L} + \mathbf{c}_n \mathbf{q_O} + \mathbf{b}_n \mathbf{q_R}\right) u'_n + \left(\mathbf{a}'_n \mathbf{q_L} + \mathbf{c}'_n \mathbf{q_O} + \mathbf{b}'_n \mathbf{q_R}\right) u_n$
  2. Verifier sends to prover a random $r$
  3. Prover and Verifier output the folded instance $U'_{n+1} = (\mathbf{X}'_{n+1}, u'_{n+1}, \overline{W}'_{n+1}, \overline{E}'_{n+1})$, where $\mathbf{X}'_{n+1} = \mathbf{X}'_n + r \mathbf{X}_n$, $u'_{n+1} = u'_n + r u_n$, $\overline{W}'_{n+1} = \overline{W}'_n + r \overline{W}_n$, $\overline{E}'_{n+1} = \overline{E}'_n + r^2 \overline{E}_n - r \overline{T}_n$.
  4. Prover outputs the folded witness $W'_{n+1} = (\mathbf{W}'_{n+1}, \mathbf{e}'_{n+1}, r'_{W\,n+1}, r'_{E\,n+1})$, where $\mathbf{W}'_{n+1} = \mathbf{W}'_n + r \mathbf{W}_n$, $\mathbf{e}'_{n+1} = \mathbf{e}'_n + r^2 \mathbf{e}_n - r \mathbf{t}_n$, $r'_{W\,n+1} = r'_{W\,n} + r r_{Wn}$, $r'_{E\,n+1} = r'_{E\,n} + r^2 r_{En} - r r_{Tn}$.

We can check that $C(\mathbf{a}'_{n+1}, \mathbf{b}'_{n+1}, \mathbf{c}'_{n+1}, u'_{n+1}, \mathbf{e}'_{n+1}) = C(\mathbf{a}'_n, \mathbf{b}'_n, \mathbf{c}'_n, u'_n, \mathbf{e}'_n) + r^2 C(\mathbf{a}_n, \mathbf{b}_n, \mathbf{c}_n, u_n, \mathbf{e}_n)$.

Fast-Fourier Inspired Approach

We see that most operations on the verifier side are linear, so we can use the approach from GW21.

Let's define the following functions:

$$f_L(X) = a(X^4) + X b(X^4) + X^2 c(X^4) + X^3 e(X^4),$$

$$f_R(X) = a(X^4) + X b(X^4) + X^2 c(X^4) - X^3 t(X^4),$$

$$\epsilon_R(X) = X^3 e(X^4)$$

where $a, b, c, e, t$ are polynomials corresponding to the vectors $\mathbf{a}, \mathbf{b}, \mathbf{c}, \mathbf{e}, \mathbf{t}$. It is important that the field has a multiplicative subgroup of order $4$. If we need more columns, we can use the same approach with bigger subgroups.

Then we can rewrite the witness part of the folding procedure as follows:

  1. Prover computes $\mathbf{t}$ and sends to verifier $[f_{R\,n}]$, $[\epsilon_{R\,n}]$
  2. Verifier sends to prover a random $r$
  3. Prover and Verifier output the folded instance

$$[f'_{L\,n+1}] = [f'_{L\,n}] + r [f_{R\,n}] + r^2 [\epsilon_{R\,n}]$$

  4. Prover outputs the folded witness

$$f'_{L\,n+1} = f'_{L\,n} + r f_{R\,n} + r^2 \epsilon_{R\,n}.$$

For the final check we should open $f_L$ at the points $x, x\sqrt{-1}, -x, -x\sqrt{-1}$, where $x$ is random, and recover $a(x^4), b(x^4), c(x^4), e(x^4)$.
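Recovering the four column evaluations from these openings is a size-4 inverse DFT over the 4th roots of unity. A minimal Python sketch of just this recovery step (the small field $p = 257$ with $\sqrt{-1} = 16$, the polynomial degrees, and the names are illustrative choices):

```python
import random

P = 257            # small prime with P ≡ 1 (mod 4), so sqrt(-1) exists
W = 16             # 16^2 = 256 ≡ -1 (mod 257): a primitive 4th root of unity
inv = lambda z: pow(z % P, -1, P)

def poly_eval(coeffs, x):
    return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P

# Random column polynomials a, b, c, e in coefficient form.
cols = [[random.randrange(P) for _ in range(5)] for _ in range(4)]

def f_L(X):
    """f_L(X) = a(X^4) + X*b(X^4) + X^2*c(X^4) + X^3*e(X^4)."""
    return sum(pow(X, j, P) * poly_eval(cols[j], pow(X, 4, P)) for j in range(4)) % P

x = random.randrange(1, P)
openings = [f_L(pow(W, k, P) * x % P) for k in range(4)]   # f_L at x, ωx, ω²x, ω³x
x4 = pow(x, 4, P)

# Size-4 inverse DFT: (1/4) Σ_k ω^{-jk} f_L(ω^k x) = x^j * col_j(x^4).
for j, name in enumerate("abce"):
    s = sum(pow(inv(W), j * k, P) * openings[k] for k in range(4)) % P
    recovered = s * inv(4) % P * inv(pow(x, j, P)) % P     # divide out 4 and x^j
    assert recovered == poly_eval(cols[j], x4)
    print(f"{name}(x^4) recovered from the four openings")
```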

It is important to note that the complexity of the folding process is still linear: we don't need an explicit representation of $f_L(X)$ in the prover-side folding process:

$$[f_L(x)] = \left[\sum_{i=0}^{n-1} \left(a_i \lambda_i(x^4) + b_i x \lambda_i(x^4) + c_i x^2 \lambda_i(x^4) + e_i x^3 \lambda_i(x^4)\right)\right] = \sum_{i=0}^{n-1} \left(a_i [\lambda_i(x^4)] + b_i [x \lambda_i(x^4)] + c_i [x^2 \lambda_i(x^4)] + e_i [x^3 \lambda_i(x^4)]\right).$$

The proposed method requires only 2 scalar multiplications per folding step instead of 5 or more, at the cost of a 4 times bigger CRS.

UPD: In the case of IVC, when the 2nd instance is an original (non-relaxed) Plonk instance, $\epsilon_R(X) = 0$ and we need only one scalar multiplication per folding step.