Welcome to freetrader's weekly musings (new)
Introduction
I've decided to write up a little weekly retrospective. My goal is to share some thoughts, notes and tips for those who wish to delve into Bitcoin development or support the current efforts of the community to address the on-chain scaling problems.
Since this is the first of these episodes, I'll touch on some other topics as well which I think might interest some of you who've been following things in the 'forking' realm.
What's happening with BTCfork?
I must say I've neglected BTCfork a little in the last two months while devoting more time to BU. My last stint on BTCfork was to get Travis CI properly working on the project's MVF repos, which it is now, and starting to set up automated consistency checks for the requirements and design specifications (which should later be extended to cover the code as well - more about this later).
Travis CI turned out to be quite easy to set up, based on the good work of the base projects (Bitcoin Unlimited and Core). The only snag to watch out for when setting up Travis on a Bitcoin fork project is that one will need to contact Travis support to allow the project, because such projects are automatically identified as doing Bitcoin mining (during their automated tests). Obviously CI providers don't want their resources to be used up by people doing Bitcoin mining.
Travis support staff were awesome and resolved the issue right away. I can only recommend the service to anyone.
The next step in MVF-BU will be to rebase on latest BU release. I am waiting for a next regular release (more bugfixes after the recent exploit hotfixes).
Why not fork right now?!
As the recent black hat exploits against BU 1.0.1 version showed, bugs are easily introduced, and there are no doubt sleeping ones that we do not know about yet, but attackers might.
Firstly, it is necessary that we upgrade MVF-BU to the most recent BU patches, otherwise it can be trivially attacked right now. I'll probably end up manually rebasing the MVF changes on latest BU instead of trying to merge from upstream, as when I looked at the merge set it was more than 400 files with many collisions.
Secondly, there's still quite a bit of work to be done on the MVF regarding clean network separation.
It may well be that the big-blocks front (BU, Classic, XT, probably others too like Infinity) soon gets enough miner support for a majority fork.
In that case, I am most likely going to resume my work on BTCfork only after assisting wherever I can with BU. They have done a great job at attracting more support from the broad community, and they are really putting a lot of effort into improving the quality of their code - a process which I'm grateful and happy to be able to participate in.
Code quality improvements in Bitcoin Unlimited
I'm going to mention a few things I've seen that give me reason to be very optimistic about code quality in BU and neighboring projects.
Firstly, BU has embarked on a mission of cleaning up the C++ code by enforcing automated style checks, using clang-format and some custom tools to sensibly arrange comments.
This is being trial-run on the files which have been introduced by BU, and will probably be either rolled out incrementally as further files are modified, or applied to the whole codebase in a 'flag day' run.
Either way, a long-hoped dream of developers will come true as the code comes out looking much more readable, and can be kept that way by the CI checks and more tools provided to developers.
BU lead dev Andrew Stone has also declared that commented code sections introduced by BU can be cleanup up now. This process has started in earnest [1].
Secondly, BU is in the process of examining the code - both its own additions and the existing base - for more lurking bugs. I can't reveal much more, but to say that I'm satisfied that the project is fully aware of good practice and starting to implement it carefully.
This also applies to code reviews on Github.
I spent several hours reviewing PRs in BU, and I'm very happy to see other devs like Tom Zander assist with excellent code review inputs.
In some cases, developers have found that automated testing is very difficult to apply to their change using the existing frameworks for unit tests and 'qa' (regression) tests.
A proposal would be that rather than silently merging PRs which are affected by this "difficulty to test", the developers should try to explain (on Github) why the code changes are not readily testable by the existing automated frameworks. This would open the discussion and examination. In some cases, maybe a better solution could be found either by adapting the change, or by enhancing a framework.
If there is absolutely no direct solution, then there is a still a remaining option to test such changes:
- write up a manual test script which a tester can perform.
This is not very unusual in legacy applications which were not developed from the ground up with automated testing in mind.
For example, the bitcoin-qt graphical user interface (GUI) is not automatically tested at all. It would be very helpful if there were documented test procedures for such areas where manual testing is still the only solution.
Right now, BU has started using Github PR threads to record when we do manual test steps and what results are obtained by reviewers/testers who try to verify by testing. But long-term, this is also not optimal, because the test knowledge remains buried in some PRs and will be difficult to find again.
Much better will be if projects can start to
keep manual test procedures in version control
together with the code, for example in a
doc/manual_tests
folder.
The test procedures need to be specific enough for the most part so that a tester can treat the software as a black box - just set it up as instructed, perform the listed steps and check for some expected outputs.
On previous projects I worked on, such manual tests were automated over time as test frameworks became mature enough.
A side benefit of committing test procedures and test results to a repository is that the evidence of testing becomes public.
It strikes me as obvious that a lot of testing is done on the Bitcoin codebase, but much of it is not adequately (publicly) recorded, and thus becomes 'lost' to others. In fact, others may even doubt that it has taken place. And they should doubt that is has been tested adequately and correctly if the evidence is not public.
Improving Python code quality
There is quite a bit of Python code in the client.
In BU it is ~100 files, with about 11,5 KLOC in the
qa/
folder alone. Most of this code has not been
run through the standard tools used to keep Python
code clean, like flake8
and pylint
.
Or if it has, it was a long time ago, because it
has deteriorated. It would be helpful to clean all
that code up too, not just the C++ code.
Such Python cleanup is also quite easy to put in automated tools and scripts to help developers. Style checks can be enforced through CI as well.
As an example of cleanup, I have applied some basic tools on a test which was modified by Bitcoin Classic.
I am planning to write a helper script which will make it easier to run such cleanup on all Python scripts.
On improving specifications in Bitcoin, part 1
Before writing software, I prefer to think about the problem first, and come up with a set of requirements, and design changes. In my experience, the time invested in this more than pays for itself.
Specification can be done even when faced with a legacy application to which you intend to make incremental changes. An example would be the set of 'forking' changes made in the MVFs to their baseline clients (BU and Core).
In the Bitcoin world, currently we are used to BIPs and BUIPs providing specifications of changes.
So far, these specifications are relatively informal, if we look at them from an engineering point of view. They currently span a gamut of levels without clear layering. You will find bits of C++ code in BIPs, when pseudocode would be more useful and intelligible to readers who are not necessarily C++ programmers.
Good specifications must cover several levels, from the birds-eye view of the end user, to the software design decisions. It is not practical to expect all of this from a single document like a BIP.
But these different levels of specification are needed for various parties to easily obtain and digest the information they need about working with and maintaining the system. So we can see that the Bitcoin project has quite a large gap to fill in addressing the need for a good specification.
Good requirements [2] in particular help everyone to:
- record and identify what is the expected behavior of the system
- assess whether its actual behavior is in line with expectation, or whether the system needs corrections or enhancement
- reduce communicative friction when talking about particular aspects of its behavior
Requirements should be clear and identifiable. Where necessary, they can introduce special, well-defined terms and it is better if they use formal language [3] to rule out ambiguity.
They need to be easily referenced by other documents related to a system, e.g. design documents, test documents, even bug reports.
Usually, high-level requirements are translated into low-level requirements (e.g. user requirements are refined into system requirement, which in turn can be refined into even lower-level software requirements).
The software requirements in turn are what drives the design of the software, which leads to implemented code.
If you want to have reasonable assurance that the code has been implemented to meet the requirements, it helps to be able to trace the work done - from the requirements, through the design, to the actual code code - and back. This traceability helps with controlling quality, because it makes it easier to check that the code covers the features it is supposed to, and does not do things for which no-one asked.
At the end of the day, maintaining software is a difficult and expensive task whose effort can grow substantially as the size of the codebase increases. Nobody wants to maintain features that no-one uses.
In the BTCfork context I decided right away that we want to have a better requirements and design specification. This requires a bit of tooling to ensure consistency between the requirements and design, and what ends up in the code. I've begun to write such a tool, spec_checker which can cross-check the textual requirements and design documents automatically when changes are made to them.
In conjunction with a CI system, this tool can be used to report when changes are made that leave the specification (and later: the software too) in an inconsistent state.
To see an example of its usage in conjunction with Travis,
you can look at how it is enabled in this commit
on the bitcoinfork-collaborative-spec
repo.
Of course, BTCfork's MVF specifications are still incomplete, and the tool does not yet check whether the requirements are properly traced down to the code level.
It's a first step in that direction - more steps are needed. This will be the first of many 'instalments' on the topic of specifications, what we can improve in Bitcoin, and how we can get to a proper specification of the protocol - a huge topic and endeavour for sure. But it can be done.
[1] https://github.com/BitcoinUnlimited/BitcoinUnlimited/pull/423
[2] https://www.win.tue.nl/~wstomv/edu/2ip30/references/smart-requirements.pdf