Scaling Bitcoin workshop : Montreal 2015

Validation costs

Validation costs and incentives

Eric Lombrozo (CodeShark)

How many of you have read the Satoshi whitepaper? It's a good paper. As we have discovered, it has some serious limitations that we think can be fixed, but they are serious challenges for scaling. Among these: it was really built as a single codebase which was meant to be run in its entirety. All machines were going to participate equally in this p2p network. In a homogeneous network, pretty much every node runs the same software and offers the same services, and this isn't really how the internet works. We have devices that don't necessarily have the same capabilities as others, and in general we have a very heterogeneous network. So a handheld networked device is probably not a good place to do mining if it only has intermittent internet access. At the same time, a point-of-sale device in a data center is probably not useful.

There's a section in the whitepaper covering these kinds of use cases, but they weren't really thought out. Then of course there is an infamous one: pooled mining was not entirely accounted for. Once you get mining pools, a large number of nodes no longer participate equally in the selection of transactions that get into blocks. And then there are more points of centralization, and perhaps cartels form. It amplifies errors. If any of these mining pool operators are running faulty software, it really quickly gets amplified, whereas in a homogeneous network it would probably not cause a network-wide failure.

Economies of scale create centralization pressures. Miners want to reduce latency as much as possible, which creates pressure to put all this equipment in one location. They want to have special connections to each other, which are no longer the p2p links of the homogeneous network.

And of course there was the crazy phrase "1 cpu = 1 vote" which is physically impossible. Also, the idea was that honesty would be more profitable than subversion or dishonesty. Most of the CPUs are not being used for mining, and there are incentives still for using CPU power for other kinds of attacks on the network, such as transaction spam I guess.

Simplified payment verification (SPV) appears to be a last minute idea for dealing with a situation where most devices are not mining anymore. The idea was to have a client-server architecture where the servers were miners and the clients basically get short proofs that their transactions are valid in a way that is cheap to verify on relatively small devices.

There are a few problems with this, and with fraud proofs. There are certain things that we can't verify using SPV. We cannot fully verify that blocks are actually valid, and we cannot verify transactions that we did not ask for. We can't verify whether an output has been spent: if an output has been spent there is a spend proof, but there's no unspent proof. We need better cryptographic commitment structures. We want to prove these things and construct short efficient proofs, and make it possible for small devices to participate in the network without having to trust anyone.

Bitcoin has a global state: the chain with the most cumulative proof-of-work difficulty. Specific systems might have a smaller view into that, covering only what they are interested in. A cryptographic commitment structure allows you to verify what's yours without downloading everything from everyone else. It has a root hash that allows you to compare your state against someone else's; their view of the state might be different from yours.

One of the key features of bitcoin is permissionless updates, anyone can broadcast a transaction, any miner can try to mine a block, you don't need any passwords or anything like that to actually enter anything into the blockchain. This is a key feature of blockchains. This is what makes them useful. They are censorship resistant. If you remove this property, you get a really inefficient database. So this is key.

You need to be able to make sure that what you have is correct and follows the consensus rules. We use merkle trees for this. They allow you to verify the state of the system without having all of the data. You just need the pieces you want, plus a little bit extra to construct the hashes. This scales logarithmically. In Satoshi's SPV, every block has a root hash for the transactions in that block, and then you can just present the transactions that you are interested in. The SPV proofs are able to demonstrate that a transaction is in that block, but they can't tell you whether someone withheld a transaction from you.
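To make the merkle branch idea concrete, here's a minimal sketch in Python. The double-SHA256 and left/right concatenation order follow Bitcoin's merkle tree; the function and variable names are my own, and a real SPV client would also handle odd tree levels and endianness details omitted here.

```python
import hashlib

def dsha256(data: bytes) -> bytes:
    """Bitcoin's double-SHA256 hash."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def verify_merkle_branch(txid: bytes, branch: list, index: int,
                         merkle_root: bytes) -> bool:
    """Walk from a leaf up to the root, hashing with each sibling.
    The bits of `index` say whether our node is the left or right
    child at each level. The proof is O(log n) hashes for n leaves."""
    h = txid
    for sibling in branch:
        if index & 1:
            h = dsha256(sibling + h)  # we are the right child
        else:
            h = dsha256(h + sibling)  # we are the left child
        index >>= 1
    return h == merkle_root
```

Note that this only proves inclusion: a matching root tells you the transaction is in the block, but a peer that simply never sends you a transaction leaves nothing to check.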

Another big problem with this approach is that you really need to scan the whole blockchain to be able to know where the first transaction occurred. I want to know the time the first output got paid. What's the first block I need to download? You basically need to go all the way back to the beginning and ask for these proofs.

So there are structural inefficiencies. Then there are certain things we cannot prove. There's this whole architecture difference, a change in topology that was not accounted for. It's not impossible to do this, but the original whitepaper does not consider network architectures other than a homogeneous p2p mesh network. And there's also a huge privacy issue: if you are querying these things from other machines, those other machines will be learning about the private information that you have.

There's a huge issue that has not been spoken about nearly enough: incentives. Right now we're relying on altruism, or on nodes that have indirect incentives, to run full validation nodes and provide these proofs to other nodes in order to keep the network in operation. I think that one of the things that made bitcoin succeed in the beginning was that incentives were aligned. People had incentives to run network nodes, and that grew the network considerably. It wasn't that you could use bitcoin at retailers or on exchanges; that came later. The reason it was successful was because you got paid for contributing resources to the network. We're going to have to revive this if we want bitcoin to continue to grow. That's something that we have over payment processors; they are not going to get people to voluntarily donate their computers to the network, and yet bitcoin did that. That's the magic of bitcoin. We need to fix the incentives.

Several different commitment structures have been proposed, with a lot of different tradeoffs, but these tradeoffs are small compared to doing it versus not doing it. What are the best commitment structures? One example is radix trees, which allow you to quickly query for any particular piece of data you want because the data itself is the key. This would allow you to query for transaction outputs that you're interested in, and still verify them by combining this with the merkle tree idea. You would be able to have all the transaction outputs, or any other kind of data that you want to query, quickly accessible, and see that it is part of the whole global structure.
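As a rough sketch of the "data is the key" idea, here's a toy merkleized radix trie in Python. This is my own simplified illustration, not any proposed structure: a real design would compress paths (Patricia-style) and pick a serialization carefully, but it shows how a root hash can commit to every key/value pair while lookups just walk the bytes of the key.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class TrieNode:
    """Toy merkleized radix trie: each key byte selects a child branch,
    and every node's hash commits to its value and its children's hashes.
    Two tries holding the same key/value pairs always share a root hash."""

    def __init__(self):
        self.children = {}   # next key byte -> child TrieNode
        self.value = None    # payload stored at this exact key, if any

    def insert(self, key: bytes, value: bytes):
        if not key:
            self.value = value
            return
        child = self.children.setdefault(key[0], TrieNode())
        child.insert(key[1:], value)

    def root_hash(self) -> bytes:
        # Commit to the value plus each child's hash, keyed by branch byte,
        # in sorted order so the hash is independent of insertion order.
        parts = [self.value or b""]
        for branch in sorted(self.children):
            parts.append(bytes([branch]) + self.children[branch].root_hash())
        return H(b"|".join(parts))
```

A proof that some output exists (or doesn't) under a given root would be the sibling hashes along the key's path, analogous to a merkle branch.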

We can "flip the chain": you can query for your transaction outputs, query for the committed state that you're interested in, make sure it's part of the blockchain, have much more efficient queries, and prove much more than we're able to right now.

You might not want to verify everything. Not every node needs to do full validation. There could be partial validation. You can check limits, you can make sure that it doesn't go beyond the max block size, there are output max limits, there are conservation limits to check for false minting or false inflation or double spending; we want to make sure the coins actually exist and haven't been spent before, we need to make sure that the crypto stuff works out and that transactions are properly signed and so on. We could check these things if we had commitment structures that allowed it.
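One of those checks, conservation, is easy to state as code. This is just an illustrative sketch of the rule (input amounts must cover output amounts, with the difference being the fee); real validation also has to prove the inputs exist and are unspent, which is exactly what the commitment structures are for.

```python
def check_conservation(input_amounts, output_amounts):
    """Conservation check: a transaction may not mint new coins.
    Returns the implied fee (inputs minus outputs) if valid."""
    total_in = sum(input_amounts)
    total_out = sum(output_amounts)
    if total_out > total_in:
        raise ValueError("transaction creates coins out of thin air")
    return total_in - total_out
```

A partial validator could run cheap checks like this one while relying on proofs from others for the expensive parts.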

We could also use merkle sum trees. These allow you to verify the total over all inputs and all outputs without having all of the details. You can just check that your particular outputs are included in that structure without having to download the entire structure.
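Here's a minimal merkle sum tree sketch in Python, with names and serialization of my own choosing. Each internal node commits to both children's hashes and their amounts, so the root pins down the grand total; a branch proof verifies that your leaf's amount is counted in that total.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf(amount: int, data: bytes):
    """A leaf is a (hash, amount) pair committing to both."""
    return H(data + amount.to_bytes(8, "big")), amount

def node(left, right):
    """Internal nodes hash both children's (hash, amount) pairs and
    carry the sum upward, so the root commits to the total."""
    lh, ls = left
    rh, rs = right
    h = H(lh + ls.to_bytes(8, "big") + rh + rs.to_bytes(8, "big"))
    return h, ls + rs

def verify_sum_branch(leaf_pair, branch, index, root):
    """Recompute the root from a leaf and its sibling (hash, sum) pairs;
    a forged amount anywhere changes the root hash."""
    h = leaf_pair
    for sibling in branch:
        h = node(sibling, h) if index & 1 else node(h, sibling)
        index >>= 1
    return h == root
```

The point is that you can check your output's inclusion and the claimed total with log-sized proofs, without seeing anyone else's amounts in full.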

The second thing is that we don't really have good querying mechanisms for the SPV stuff. We have a mechanism where you can use bloom filters to ask other nodes to give us filtered blocks, but this reintroduces the whole client-server architecture, and we really have no solid model for how to approach this. Obviously, if you make a query with personal info, like what you're interested in, the servers are going to learn this. With bloom filters there's an inherent tradeoff: at one extreme you have a lot of privacy, which means you download everything, and at the other extreme the other node knows exactly what you're interested in. Bloom filters have a false positive rate, which makes them a little more private; you might not care about some of the transactions that the other side feeds you, but this has some weaknesses unfortunately. It's a tradeoff we haven't been able to find a way around yet.

One idea is to commit to the bloom filters in the blocks so that when you download a block, you can check if there are any transactions of interest to you in the block, not necessarily from the same peer. You could query from many different peers who may not be colluding with each other, and that would give you some more privacy.
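For reference, a bloom filter itself is only a few lines. This toy version (my own parameter choices, not the BIP 37 wire format) shows the key property behind both the privacy tradeoff and the committed-filter idea: membership tests can yield false positives but never false negatives, so a client checking a per-block filter never misses its own transactions.

```python
import hashlib

class BloomFilter:
    """Toy bloom filter: k hash positions per item over a fixed bit array."""

    def __init__(self, size_bits: int = 1024, n_hashes: int = 3):
        self.size = size_bits
        self.n = n_hashes
        self.bits = 0  # bit array packed into one int

    def _positions(self, item: bytes):
        # Derive n positions by hashing the item with a per-hash prefix.
        for i in range(self.n):
            d = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(d[:4], "big") % self.size

    def add(self, item: bytes):
        for p in self._positions(item):
            self.bits |= 1 << p

    def maybe_contains(self, item: bytes) -> bool:
        # No false negatives; false positives grow as the filter fills up.
        return all(self.bits & (1 << p) for p in self._positions(item))
```

With the filter committed in the block, you could run `maybe_contains` on your own txids locally and only fetch full data for blocks that match, from whichever peers you like.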

And as for incentives: verifiers have an incentive to make sure they are not being ripped off, but not many people want to run a full node. This is not desirable; we want everyone to be able to verify everything for themselves. As for provers, we're relying on altruism or on people who have indirect incentives to run these nodes. They are not being paid for this. As the number of these nodes shrinks, things get more prone to censorship and there are more single points of failure. This is a huge problem that needs to be addressed.

Off-chain micropayments might be able to address this. You can have ways of querying stuff and paying small amounts in efficient ways to get that data and to incentivize people to run the provers and the nodes that actually construct these SPV proofs.

Cheap to cooperate, expensive to attack. We are looking to get around obstacles that are caused by skewed incentives; but if we create artificial incentives using externalities, then we might have to add hack after hack after hack, and it sort of kills the incentive for people to grow the network decentrally. We need the growth to be organic. It needs to be cheap to cooperate. There needs to be a way to offer resources and do resource bidding; those who want those services need a way to find them, and then there needs to be a way to pay for those services through micropayments.

There shouldn't be a need for everyone to validate everything, because full validation will probably not scale. We will either have to shard this or use probabilistic checking schemes; we're going to have to sacrifice something to make it possible for there to be thousands and thousands, or millions, of transactions per second. It's not just resources; there's a tragedy-of-the-commons element. As long as one person in the network is an honest node, sure, that's great, but who's going to do that? And the fewer there are, the more we run into the problems we talked about. We can delegate risk and have other people watch the blockchain and alert you; these mechanisms are being worked on, and I think they have promise. It means you can have a small device and not have to run full validation; if you can do this in a decentralized, trustless way, I think this solves a lot of problems.

We need a scalable commitment structure. We need a consensus layer that can verify global transactions in a way that is easy to verify. I would like to see a lot more attention put into this; if we're going to talk about scalability, commitment structures are where the most gains are to be had.