Tuesday, September 12, 2017

Visualizing pull request merge cadence

This summer I had the pleasure to mentor Martin Nečas, a talented student from Střední průmyslová škola Brno, Purkyňova (a high school in Brno, CZ).
Martin has so many interests and wishes to learn everything that he can, and delivers on doing so, what makes me appreciate his eagerness for knowledge.

His summer internship project at Red Hat was to enable developers to visualize the time it takes to merge Pull Requests in GitHub repositories. In particular, we intended to use that to understand the healthiness of the merge processes in OpenShift-related repositories, and to serve as a guide to gauge the effects of changes to the said processes.


It all started back in early May, and in the first weeks Martin took the time to learn about OpenShift, containers, Linux and Python. He has been using Fedora on his laptop since then, and I know he really enjoys it, specially when he discovers new ways of doing things in the command line.

Due to my time constraints, it was only in June 1 when we started discussing his project in more detail. He was quick to grasp the idea and kept coming back with working prototypes. At first he was sold on learning Go, but then decided to stick with Python and Django and leverage his pre-existing skills.

His open source project GithubGraphs can be seen live at http://martin.codingkwoon.com, it is a web application that communicates with GitHub through its GraphQL API to fetch data about PRs and stores it locally to generate interactive visualizations. It is worth noting Martin's ability to play with the GitHub v3 REST API, and then, once he realized the downsides of using it, learning about GraphQL and using it effectively to improve the performance of the project.

This project also let Martin write much more JavaScript code than he has ever written before! And he did it well. Through several iterations, collecting feedback from different sources, he got to a good looking and functional design, well done!

In August, Martin reached out to OpenShift developers to announce his project publicly and get feedback. The reception was positive. Now I wish the original goals of the project will be fulfilled.

As we wrap up our collaboration, Martin is looking for new projects and more exciting times at Red Hat. Good luck!

Martin's GitHub profile: https://github.com/ocasek
Martin's blog: http://mnecas.blogspot.cz/

Thursday, April 6, 2017

Go types and assignability

Yesterday, during our weekly coding dojo, we had a bit of a puzzling moment when we realized one of the rules of assignability in Go.

A coding dojo is a suitable place for these "aha" moments, because we have freedom to try things and take the time to understand details that we could possibly otherwise let go and simply never come across.

On the assignability of values with identical underlying types

When writing table tests in Go, it is a common idiom to have a slice of structs to hold data for each test case. Our struct had a field of a custom type, a tic-tac-toe Game, and the underlying type was a slice of slice of ints ([][]int).

We experimented with different ways of writing a composite literal to create a value holding all test cases including the Game field.

This was what we eventually committed after the dojo: tictactoe_test.go#L6-L18. However, a single Git commit does not capture all the alternatives we tried.

This Go Playground snippet shows how we can use Game and [][]int interchangeably in the context of an assignment, and why we can do it I highlight in this quote from The Go Programming Language Specification:
A value x is assignable to a variable of type T ("x is assignable to T") in any of these cases:
  • x's type is identical to T.
  • x's type V and T have identical underlying types and at least one of V or T is not a named type.
  • T is an interface type and x implements T.
  • x is a bidirectional channel value, T is a channel type, x's type V and T have identical element types, and at least one of V or T is not a named type.
  • x is the predeclared identifier nil and T is a pointer, function, slice, map, channel, or interface type.
  • x is an untyped constant representable by a value of type T.
Game and [][]int have identical underlying types ([][]int), as follows from the definition of types:
Each type T has an underlying type: If T is one of the predeclared boolean, numeric, or string types, or a type literal, the corresponding underlying type is T itself. Otherwise, T's underlying type is the underlying type of the type to which T refers in its type declaration.
And [][]int is not a named type, thus satisfying the assignability rule.

The rule applies recursively!

Yes, it also works if we have intermediate named types, for example if we would define Game as a slice of Row.

What does not work

It is as well interesting to explore what is not allowed by the assignability rules.

Two named types with the same underlying type are NOT directly assignable

We need at least one of the types involved in an assignment to be unnamed, see how this snippet failed to compile.

Note, however, that an explicit conversion is possible.

I think this is reasonable, because two named types can have completely different method sets, and we don't want implicit type conversions... well, the reason why I said assignability was "puzzling" at the very beginning is that it would be probably okay if a [][]int literal was not assignable to a Game. I suspect the reason that assignment is possible has to do with either convenience or some implication for the type system I have not thought of.

So that's the summary of one of the things we learned in the dojo yesterday. Did you know about it? Was it useful?

Monday, March 20, 2017

Indentation styles

pro-tip for indentation style

Style 1: bad

                                                                 "v" + openshift_image_tag))
Suppose during a refactor we rename docker_or_rpm_images -> images, result:
                                                                 "v" + openshift_image_tag))
Bad indentation... this style is painful to maintain.

Style 2: better

            self.qualified_docker_images(self.image_from_base_name(image_base_name), "v" + openshift_image_tag))
Suppose during a refactor we rename docker_or_rpm_images -> images, result:
            self.qualified_docker_images(self.image_from_base_name(image_base_name), "v" + openshift_image_tag))
We still have a long and apparently complicated line, but this time the indentation stays consistent, no need to update adjacent lines to the one that was automatically changed with the rename.
For more details:

(post content extracted from a PR I was reviewing recently)

Monday, January 30, 2017

Do not squash commits mindlessly!

I've always wanted to write about my frustration with projects and code reviewers that insist on mindlessly squashing all commits of a pull request before merge, instead of accepting well factored commits.

Because I've just come across a blog post from Matthew Garrett about the same topic, I'll cite him:


(...) When you're crafting commits for merge, think about your commit history as a textbook. Start with the building blocks of your feature and make them one commit. Build your functionality on top of them in another. Tie that functionality into the core project and make another commit. Add client support. Add docs. Include your tests. Allow someone to follow the growth of your feature over time, with each commit being a chapter of that story. (...) 

It's terrible when we go see the history of a certain line of code just to discover it was changed last together with other 20,000 lines, in a commit with a poor message, in a pull request with empty description...

Does TDD work?! Baby steps?!

I believe one good quality of thinkers, engineers, programmers, you name it, is skepticism.
Not taking things for granted allows one to dig deeper, try to understand pros and cons, dive into a detailed level of understanding.

TDD is old enough that you can find online (and physical books even) a myriad of related material, both favoring it and bashing it. I'd say that questioning it is great, whether you are in favor, against or indifferent towards it.

I'd like to share some links I've (re)visited recently:


The first one is a question, "why does TDD work?", or perhaps it should've been "does TDD work?!?"

I sympathize with this comment (emphasis mine):

TDD is, in my opinion, mainly about ways to make sure you can check parts rather than a big 'all or nothing' at the end. But the adagium of TDD, 'build a test first' is not meant to be 'make a test before you think about what you want to accomplish'. Because thinking of a test IS part of designing. Specifying what you want that exact part to do, is designing. Before you ever start to type, you have already done some designing. (In that way I think the term 'test driven design' is misleadingly implying a oneway path, where it is really a feedback loop).

Good design can come either through picking tests intelligently, or by any other technique... how do you learn and discover good designs? You either learn from existing designs, or you try things out.
I think the former is more effective, and immensely prepares you for the latter.

To a mathematician's mind, I'm speaking to you now, though I see programming are much more than maths, as programming projects are also related to arts, social science, communication, and more...
The way how I learned many topics in maths (and physics, etc): I was taught some theory, and then I did hundreds, perhaps thousands, of exercises. Often an exercise builds up on ideas you've learned in previous challenges. Often a problem is hard enough that you need to be presented with a solution... then you understand it, it makes the problem seem much easier than it originally was. And them you feel empowered to solve similarly tough exercises, and to innovate when faced with the unseen.

What if TDD doesn't work as in this quote from Peter Norvig:

“Well, the first thing you do is write a test that says I get the right answer at the end,” and then you run it and see that it fails, and then you say, “What do I need next?”—that doesn’t seem like the right way to design something to me. It seems like only if it was so simple that the solution was preordained would that make sense. I think you have to think about it first. You have to say, “What are the pieces? How can I write tests for pieces until I know what some of them are?” And then, once you’ve done that, then it is good discipline to have tests for each of those pieces and to understand well how they interact with each other and the boundary cases and so on. Those should all have tests. But I don’t think you drive the whole design by saying, “This test has failed.”

Indeed. Writing arbitrary tests and making them pass don't take you anywhere.

The second link I share is a good commented example of how it goes when the order of tests in unfavorable and we get stuck. One needs to think. You need to build a baggage, a toolbox, and it takes time ;-)

At all times, let's keep in mind that who drives the show is YOU (by writing tests or anything else).

Like other techniques, TDD doesn't always work, for it depend on individual and collective "talent" to get things done at a certain quality level.

(...) how come it does not always work?

Because testing requires a VERY different mindset than building does. Not every one is able to switch back and from, in fact some people will not be able to build proper tests simply because they cannot set their mind to destroy their creation. This will yield projects with too few tests or tests just enough to reach a target metrics (code coverage comes to mind). They will happy path tests and exception tests but will forget about the corner cases and boundary conditions.

Others will just rely on tests forgoing design partly or altogether. (...)

One positive thing about practicing collectively in a coding dojo is the ability to criticize design decisions, and to share that "baggage" on things that work, and things that doesn't, most often being able to show why it does or doesn't. And, looking from a different perspective, there's space to learn things one would not otherwise think in isolation.

Baby steps

This is notably quite long already... but closing with the third link, here goes something about the "size" of baby steps, something that people being introduced to TDD and baby steps are often asking, when they start to think that one ought to write senseless code when you know an obvious implementation. No, you do not need to write mindless code nor are you supposed to do that.

If you know how to do something, if it is obvious for you and you are comfortable, go ahead and do it. Stay in the flow. Now, when your instincts fail, there is the "fake until you make it" technique to keep you up in the game. Faking endlessly and thoughtlessly will not magically solve the problem for you, but gives you time to observe.
When overconfidence fails you, you can learn to know when to step back.

A whiteboard or a piece a paper, or an interactive interpreter, all might be equally good tools for fostering thinking.

We practice TDD and Baby Steps in coding dojos, though those are not magical techniques that solve all problems. The breath of domains we write programs for is so vast that no single technique could possibly be a silver bullet. Those two are certainly not sufficient in the toolbox of one aspiring to be a good programmer.

What are other useful techniques for you?