Re: Protocol Buffers for Bitcoin

There has been a discussion going on elsewhere about using protocol buffers for bitcoin. To summarise the advantages:

-> Small encoding
-> Very fast
-> Implementations in loads of languages (so writing new clients becomes a lot simpler)
-> Forwards compatible (indeed, this is most of the point of protocol buffers)
-> Extremely simple to use in code

So initially I would suggest storing the wallet file using protocol buffers. This isn’t a breaking change, and it immediately makes the wallet file easier for other programs to parse. Eventually I would hope that bitcoin could use protocol buffers for networking.

Some people have been suggesting that protocol buffers might be larger than the custom written packet layout. I suspect that actually it would be *smaller* due to some of the clever encoding used in protocol buffers. To resolve this, I think a test is in order: I shall encode a wallet file/network packet using protocol buffers and compare its size with the same data in the current scheme. However, I have no idea what’s in a packet. What data is stored in a packet, and in what format?

The reason I didn’t use protocol buffers or boost serialization is that they looked too complex to make absolutely airtight and secure.  Their code is too large to read and be sure that there’s no way to form an input that would do something unexpected.

I hate reinventing the wheel and only resorted to writing my own serialization routines reluctantly.  The serialization format we have is as dead simple and flat as possible.  There is no extra freedom in the way the input stream is formed.  At each point, the next field in the data structure is expected.  The only choices given are those that the receiver is expecting.  There is versioning so upgrades are possible.
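
To illustrate what "flat, with no extra freedom" means, here is a minimal sketch in Python. It is not the actual Bitcoin code: the record layout (a 4-byte version, an 8-byte amount, a 32-byte hash) and the names read_exact and parse_record are hypothetical. The point is only that the reader asks for each field in a fixed order and rejects anything short or extra.

```python
import io
import struct


def read_exact(stream, n):
    """Read exactly n bytes, or fail. There are no optional fields."""
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("truncated input")
    return data


def parse_record(payload):
    """Parse a hypothetical flat record: uint32 version, int64 amount, 32-byte hash."""
    stream = io.BytesIO(payload)
    # Each field is read in the one order the receiver expects.
    (version,) = struct.unpack("<I", read_exact(stream, 4))
    (amount,) = struct.unpack("<q", read_exact(stream, 8))
    digest = read_exact(stream, 32)
    # Anything left over is an error rather than something to skip.
    if stream.read(1):
        raise ValueError("trailing bytes")
    return version, amount, digest


payload = struct.pack("<Iq", 1, 5000000000) + bytes(32)
print(parse_record(payload))
```

A reader like this has no tags or wire types to dispatch on, so the only inputs it accepts are the ones it was already expecting.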

CAddress is about the only object with significant reserved space in it.  (about 7 bytes for flags and 12 bytes for possible future IPv6 expansion)
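
As a rough sketch of where that reserved space sits, the following Python packs an address record under an assumed layout: an 8-byte little-endian services field (of which only a low bit is in use, leaving about 7 bytes spare), 12 reserved bytes ahead of the 4-byte IPv4 address, and a 2-byte big-endian port. The function name pack_address and the exact byte values of the reserved prefix are assumptions for illustration, not taken from this post.

```python
import socket
import struct

# Assumed reserved prefix: 12 bytes that, together with the 4-byte IPv4
# address, would leave room for a full 16-byte IPv6 address later.
RESERVED = bytes(10) + b"\xff\xff"


def pack_address(services, ipv4, port):
    """Pack a hypothetical CAddress-like record (services, reserved, IPv4, port)."""
    return (
        struct.pack("<Q", services)   # 8 bytes of flags, mostly unused
        + RESERVED                    # 12 bytes reserved for IPv6 expansion
        + socket.inet_aton(ipv4)      # 4 bytes of IPv4 address
        + struct.pack(">H", port)     # 2 bytes, network byte order
    )


packed = pack_address(1, "127.0.0.1", 8333)
print(len(packed))  # 8 + 12 + 4 + 2 = 26 bytes
```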

The larger things we have like blocks and transactions can’t be optimized much more for size.  The bulk of their data is hashes and keys and signatures, which are incompressible.  The serialization overhead is very small, usually 1 byte for size fields.
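
The "usually 1 byte" figure can be sketched with a variable-length size prefix. The Python below assumes the commonly documented rule (values under 253 fit in a single byte; larger values use a marker byte followed by a 2-, 4-, or 8-byte little-endian integer); the function name encode_compact_size is hypothetical and this is an illustration, not the client's code.

```python
import struct


def encode_compact_size(n):
    """Encode a length prefix; small values cost a single byte."""
    if n < 0:
        raise ValueError("size must be non-negative")
    if n < 253:
        return struct.pack("<B", n)            # 1 byte total
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)  # 3 bytes total
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)  # 5 bytes total
    return b"\xff" + struct.pack("<Q", n)      # 9 bytes total


# A signature or key is well under 253 bytes, so its prefix is one byte.
for size in (33, 72, 253, 70000):
    print(size, len(encode_compact_size(size)))
```

Since most fields in a transaction are short, the prefix overhead is small next to the hashes, keys and signatures themselves.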

On Gavin’s idea about an existing P2P broadcast infrastructure, I doubt one exists.  There are few P2P systems that only need broadcast.  There are some libraries like Chord that try to provide a distributed hash table infrastructure, but that’s a huge, difficult problem that we don’t need or want.  Those libraries are also much harder to install than our own software.
