Weekly musing #4 - Labor Day 2017

1900s Labor Day parade, Toronto, Canada

Bitcoin Unlimited

Quite a week! Too much excitement!

Memory Exhaustion attack, BU release 1.0.1.4

What began as a gradual decline of BU nodes on the network turned into a sharp decline on 24 April.

Cross-checking with others on Slack and examination of logs followed. An initial diagnosis of the attack was soon made; fortunately, BU developer Peter Tschipper had already been working on a thinblocks patch that was believed to mitigate this kind of attack. His patch turned out to be effective and provided a good first line of defense for the BU network. Nodes were upgraded within hours, and the patch was reviewed and improved for several hours until it was deemed fit for inclusion in a release.

I will leave an account of the further details to the official BU Incident Report.

By way of this incident, I had the opportunity to revisit my Gitian build machine, as I wanted to check whether I could participate in verifying the formal release build. I reconfigured a VM that I had previously set up for Gitian builds of other projects (Classic, BTCfork). This VM had given me some trouble with Gitian builds of Classic, where it produced binaries with different checksums. I took this chance to see whether a similar problem would occur when doing a Gitian build for BU.

With instructions provided by Andrea Suisani, I was able to reproduce the Linux release build results. Unfortunately, I noticed that my VM is very slow compared to the fast builders used by others, so in the time it took them to build all platforms, I only managed to build the Linux binaries. A contributing factor may have been that this was the first time I did the BU build there, so it had to build all the dependencies first. Maybe next time will be quicker.

Of course, if you're going for a faster build, you may want to set up the host system on a native machine and not bother with a separate VM. Maybe in a while I'll get a faster machine for these kinds of builds and tests.

I'm going to provide a link to the instructions on how to set up a VM or physical system for Bitcoin, and pass on the additional instructions provided to me specifically for BU:

sudo apt-get install git apt-cacher-ng python-vm-builder ruby qemu-utils
git clone https://github.com/BitcoinUnlimited/BitcoinUnlimited.git BU
cd BU
git checkout release
cd ..
git clone git@github.com:devrandom/gitian-builder.git
cd gitian-builder
./bin/make-base-vm --arch amd64 --suite trusty    # this needs to be done only once
# for the --commit flag you need to specify a branch or a tag.
# You can also add --num-make JOBS and --memory MEM (memory to allocate in MiB)
./bin/gbuild --url bitcoin=https://github.com/BitcoinUnlimited/BitcoinUnlimited.git --commit bitcoin=release ../BU/contrib/gitian-descriptors/gitian-linux.yml

Hopefully these will help someone else achieve a working Gitian build too. BU needs more people who can do Gitian builds, so that it can respond quickly with hotfix releases.

Testing for unwanted XthinBlocks code execution

After it became clear that the vulnerability exploited by the attack was again in thinblocks code, I ran a check to gain confidence that BU's --use-thinblocks=0 switch was effective. I instrumented the code with assert(0) statements in as many relevant thinblocks code sections as I could find. These would force a crash if those sections were entered. This is an easy check that anyone can do for a new feature; it does not require in-depth understanding of how the feature works - just find all references and apply some common sense while booby-trapping the code.
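
As an aside, the technique can be sketched in a few lines. This is my own Python illustration, not BU's actual code (the real instrumentation was assert(0) statements in the C++ sources, and names like handle_thinblock are made up):

USE_THINBLOCKS = False  # imagined stand-in for use-thinblocks=0

def handle_thinblock(msg):
    # Booby-trap: this must be unreachable while thinblocks is disabled.
    assert False, "thinblocks code executed although the feature is disabled"

def handle_xpedited(msg):
    # The kind of bug this check catches: a path that calls thinblocks code
    # without consulting the switch, so the trap fires on an expedited message.
    handle_thinblock(msg)

def handle_message(msg_type, msg):
    if msg_type == "xthinblock" and USE_THINBLOCKS:
        handle_thinblock(msg)   # correctly guarded path
    elif msg_type == "xpedited":
        handle_xpedited(msg)    # unguarded path - the booby-trap will fire here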

I found one case where thinblocks code was still executed as part of the Xpedited functionality, even though I had configured use-thinblocks=0 on my node. This was reported and fixed by Peter Tschipper in pull request 486.

There was general agreement that Xpedited, which depends on thinblocks, should not be enabled if thinblocks is disabled.

Personally, I would prefer every new major feature to have a clear switch to turn it on or off. Even if features are good and useful, I would consider it safer to deploy them disabled by default until they have been tested for a sufficient time, and only then change the default to enabled. On a software development level this is more effort, but it makes it much easier for most users to get back to safe operation if there is a problem with a new feature. Power users may have the ability to run multiple versions of the software at once, but this is probably not the case for the majority of users. Maybe, as Gavin Andresen said, we need to get to know our customers a little more. It should be sufficient to raise a PR (or a BUIP, if one wants to force a vote on the issue) to change the default on/off setting for a feature; then members can vote, and the non-member public can also add their opinions to the discussion.

An arborist pruning a tree near the Statue of Liberty, New York.

Pruning

The pruning feature introduced in Bitcoin Core 0.11 was designed with 1MB blocks in mind and needs some adaptation for bigger blocks. I raised issue 473 to address this. The adaptation will be needed once we obtain bigger blocks; otherwise, pruned nodes on machines with strictly limited space might run over their configured soft limits, or even hit hard quotas and run out of disk space completely.

In it, I propose a fix that would make the minimum number of blocks to retain a configuration item, while keeping the existing prune=<size> parameter. The user would be warned if the pruned size is exceeded, and could choose between keeping fewer blocks or increasing the space allotted for pruning.
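
As a rough back-of-the-envelope illustration of why this matters (my own sketch, not code from issue 473; 288 is, if I recall correctly, the minimum number of recent blocks Core's pruning retains):

def prune_target_sufficient(prune_target_mib, min_blocks_to_keep, avg_block_size_mib):
    # Can the configured prune target still hold the minimum set of recent blocks?
    needed_mib = min_blocks_to_keep * avg_block_size_mib
    return prune_target_mib >= needed_mib

# A target sized with 1MB blocks in mind is comfortable...
print(prune_target_sufficient(550, 288, 1))   # True  - 288 MiB needed
# ...but gets blown past once average blocks are several MB in size.
print(prune_target_sufficient(550, 288, 8))   # False - 2304 MiB needed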

Jameson Lopp kindly pointed out a related consideration - that the AD (Acceptance Depth) parameter could influence the minimum number of blocks that need to be retained in order to perform a re-organization safely.

To start development, I experimented with the pruning.py regression test. Unfortunately, I found this test, which is normally disabled due to its long runtime and high disk usage (>4GB), to be highly unreliable. It did not pass on the release branch without rolling back to a certain commit, and on the dev branch it went into an endless loop (confirmed by other testers too).

The endless loop behavior led me to open a feature request for loop detection and abort in the test framework (PR 480).
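
For illustration, this is the kind of guard I have in mind - a minimal sketch, not the contents of PR 480, and wait_until / LoopTimeout are names I made up:

import time

class LoopTimeout(Exception):
    pass

def wait_until(predicate, timeout=60, poll_interval=0.5):
    # Poll until the predicate holds, but abort instead of spinning forever.
    deadline = time.time() + timeout
    while not predicate():
        if time.time() > deadline:
            raise LoopTimeout("test loop exceeded %s seconds - aborting" % timeout)
        time.sleep(poll_interval)

# e.g. wait_until(lambda: node.getblockcount() >= 995, timeout=300)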

I opened a separate issue to collect data about the pruning test status in various environments, and several contributors assisted with testing. The issue is still in progress - so far I have found one environment where the test passes (if rolled back to a 'healthy' commit). It seems that there are issues related to the p2p networking which may be interfering and causing client bans and timeouts. The work on the pruning test will definitely involve fixes to the client, and I'm working closely with other developers on that.

Log privacy considerations

A GitHub user raised an issue about privacy concerns related to IP address logging in the BU client. This is definitely a consideration close to my heart.

Normally, the logips=0 option should prevent any unnecessary output of IP addresses to the log file and debug console. What exactly constitutes a need to override this, if anything, remains an open question.

Most error messages provide very little information if the network details are not included, and traditionally such messages seem to be unguarded by the logips flag.

There are, however, cases - such as the 'BAN THRESHOLD EXCEEDED' messages - where IP logging should definitely be applied only if explicitly requested.

I wonder whether it would not be simpler to apply the policy that, if logips=0, an IP address is never logged. We may need to output a special abbreviated trace in such cases, if the notification is still important.
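
To make the strict policy concrete, here is a hypothetical sketch (not the BU client's actual logging code):

LOG_IPS = False   # imagined stand-in for logips=0
_peer_ids = {}    # per-session mapping from address to anonymous label

def peer_label(ip_address):
    if LOG_IPS:
        return ip_address
    # Never emit the address itself; a stable per-session tag still lets
    # repeated events from the same peer be correlated in the log.
    if ip_address not in _peer_ids:
        _peer_ids[ip_address] = "peer-%d" % (len(_peer_ids) + 1)
    return _peer_ids[ip_address]

def log_ban_threshold_exceeded(ip_address):
    print("BAN THRESHOLD EXCEEDED for %s" % peer_label(ip_address))

log_ban_threshold_exceeded("203.0.113.7")   # logs "... for peer-1"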

To ban or not to ban (peers)

In the followup to the memory exhaustion attack, Gregory Maxwell raised Issue 485 on the BU Github:

Short-id collisions allow attackers to cause arbitrary bans of thinblock peers

Subsequent discussions on BU Slack highlighted a need for further improvements to the thinblock subsystem. I think BU has a good handle on the issues, and is more aware now that improvements in this area have to be made and assessed very carefully.

One point that is becoming clearer:

If the response is left up to the developer to decide, it can range from "ignore" to "ban immediately", with variations of gradual banning in between those extremes. Choosing too harsh a penalty (e.g. an immediate ban) can result in further problems.

While it would be great to have a consistent policy that software developers could apply to solve every problem around misbehaving peers, I think that is wishful thinking: attackers are very creative, and the network is, in any case, a constantly changing sea.

The key lies, I think, in viewing this as policy and giving the node operators the necessary tools to defend their nodes, by making the response more configurable.

I think we should make all Denial-of-Service defense responses tweakable, and have a user menu (and RPC interface) for adjusting them.

Of course, developers still need to think about what a sensible response level for a new protective measure would be. But they could deliver a parameter file with all the defaults, and let the user easily override the configuration to suit their needs.
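
A sketch of what such a parameter-driven policy could look like (offence names and values are hypothetical, not existing BU options):

DEFAULT_PENALTIES = {
    # offence             : (score increment, ban threshold)
    "invalid-thinblock"   : (20, 100),
    "short-id-collision"  : (0, 100),    # e.g. never punish honest collisions
    "oversized-message"   : (100, 100),  # immediate ban
}

def apply_penalty(peer, offence, penalties=DEFAULT_PENALTIES):
    # Operators could override individual entries in a config file, so a
    # change of policy would not require a new software release.
    increment, threshold = penalties[offence]
    peer["score"] = peer.get("score", 0) + increment
    peer["banned"] = peer["score"] >= threshold

peer = {"addr": "198.51.100.4"}
apply_penalty(peer, "invalid-thinblock")
print(peer)   # {'addr': '198.51.100.4', 'score': 20, 'banned': False}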

If a response is found unsuitable, there is no need to immediately issue a corrective software release - it would be sufficient to publish an advisory notice recommending to change a certain parameter in response to a network threat.

The 1MB fork transaction

Last Sunday, someone (/u/1MBforKTR1gAqRLkNbQg) posted a very interesting thread on Reddit containing a link to a very large transaction which, if mined, would create a block > 1MB and thereby cause a chain fork.

This proposal soon turned up as a change request to BU to allow such transactions to be pushed (p2p) over the network. On the same day, it was also submitted to Core, who rejected it without much discussion.

I replied to the BU issue and PR, stating why I did not think such a change was needed (a miner wishing to create a >1MB fork chain could already mine such a transaction). However, I remained open to arguments for or against it.

For some reason, the poster who raised the issue did not respond with more rationale on BU's issue tracker after my enquiry. I was a little disappointed to find a Reddit update post from them which stated that 'No one else reached out'. We did reach out and try to discuss their pull request!

Unfortunately, there has been little discussion on the PR so far, with everyone attending to other urgent matters. I presume this will result in the issue being closed if the poster does not engage further.

As was also pointed out, the poster holds the keys to funds donated to the addresses in that transaction. I like the improvements suggested by /u/edmundedgar in the comment thread on the original proposal.

In my opinion, the ability to separate coins by using the 'faucet' feature of this transaction is one of the more interesting aspects of this proposal. Perhaps something like it could be leveraged by an exchange or miner willing to produce such a block. I would definitely be more relaxed about the keys for such a transaction being in known hands.

The road to CompactBlocks

This week's other issues left no time for me to start a deeper investigation of CompactBlocks. Other devs have expressed interest in an experimental branch for it, and there is positive support for it expressed in BUIP051. This makes me optimistic that it can be put up for a vote and perhaps even reach agreement on implementation.

BTCfork

After the recent merge from the upstream BU 'dev' branch, there were some minor leftover issues manifesting as test failures. These were mainly related to build system changes and the move from Python 2 to Python 3. Those have been fixed, and happily the upstream BU changes did not break any of the fork-related functionality - all tests pass. Fortunately we did have good unit tests in the arith.py module, which helped when troubleshooting an integer division problem that broke the mvf-bu-retarget.py test (wrong expected difficulty after reset). The problem turned out to be in the helper procedure that emulates the C++ difficulty adjustment algorithm: on Python 3, scripts need to use floor division (//), otherwise a floating-point division is performed. This resulted in a slightly different difficulty value, and for a while I was worried that the upstream merge might have subtly changed something about the fork trigger and the resulting difficulty reset.
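
The difference in a nutshell (a minimal illustration with made-up numbers, not the actual values from the test):

# Python 2: 7 / 2 == 3   (two ints divide to an int)
# Python 3: 7 / 2 == 3.5 (true division); 7 // 2 == 3 (floor division)
print(7 / 2, 7 // 2)      # 3.5 3

# When emulating C++ integer arithmetic on 256-bit difficulty targets,
# "/" silently produces a float and loses precision, while "//" keeps
# exact integer semantics.
target = 2**220 + 1
print(target / 3)         # float approximation
print(target // 3)        # exact integer result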

A remaining problem is that one Travis test fails because it checks the code formatting (using clang-format). This currently fails for some of the code which MVF modified, so those source files will need to be reformatted. I'm hoping to get around to that in a day or two, as soon as I have a native clang-format on my old development box. I've decided to build LLVM / Clang from source there. I was shocked to find that the LLVM + Clang (+ extra tools) build took up 40GB of disk space, and at times I had to reduce the number of parallel build threads to just one because the build was exhausting my memory! I think I would recommend just installing clang through your package manager :-)

Before I merge from upstream again, I'll await some stabilization on BU's 'dev' branch after the recent memory exhaustion attack fixes. There is still much work ongoing to polish those fixes and make BU's custom thinblock and expedited code as robust as possible.

Footnotes

[1] Labor Day image in Public Domain via Wikimedia Commons
https://commons.wikimedia.org/wiki/File%3A1900s_Toronto_LabourDay_Parade.jpg

[2] Pruning image in Public Domain via Wikimedia Commons
Photo under license of National Park Service, United States Government, Frederick Law Olmsted National Historic Site.
https://commons.wikimedia.org/wiki/File%3AArborist_pruning_Statue_of_Liberty.jpg