Sunday, 30 August 2020

Surprising Words In Luke 1:16-17

This is another interesting, perhaps slightly weird, verse that is often read and glossed over. Luke writes of a prophecy delivered to Zechariah about his yet-to-be-born son, John the Baptist:

"And he will turn many of the children of Israel to the Lord their God, and he will go before him in the spirit and power of Elijah, to turn the hearts of the fathers to the children, and the disobedient to the wisdom of the just, to make ready for the Lord a people prepared."
Turn people back to God, return the disobedient to wisdom, prepare people to receive the Messiah ... these are unsurprising prophetic priorities. But turn the hearts of fathers to their children? Why was that a priority for God in that era?

Understanding this depends on knowing that it's actually a quote from Malachi 4:6:

"He [returning Elijah] will turn the hearts of the parents to their children, and the hearts of the children to their parents; or else I will come and strike the land with total destruction."
Quoting that here makes sense because John the Baptist is later identified as acting in the spirit of Elijah, fulfilling this prophecy. Thus John will not just turn the hearts of fathers to their children, but also those of children to their fathers, i.e. strengthen familial love in general. Still, it's very interesting that while Biblical teaching often emphasizes the duty of children to honor their parents (e.g. the fifth Commandment), Luke has instead chosen to emphasize the duty of parents to love their children.

I think it's also interesting that Luke highlights familial love here when the rest of the Gospels often seem to give it short shrift. Jesus sometimes leaves his family waiting, he tells his followers they are his mother and brothers, and his disciples leave their families to follow him. Luke's choice here helps keep those events in perspective.

Sunday, 23 August 2020

What's So Amazing About Mark 10:32

I've probably read this sentence twenty times without really thinking about it:

They were on their way up to Jerusalem, with Jesus leading the way, and the disciples were astonished, while those who followed were afraid.

Sandwiched between Jesus explaining how hard it is for the rich to enter the kingdom of God and predicting his own death, this sentence is easy to gloss over as a mere connective, but it raises interesting questions. Why mention that Jesus was leading the way — didn't he always lead the way? Why were the disciples astonished? Why were those who followed afraid?

Fortunately in this case the context suggests satisfactory answers. Jesus is nearing Jerusalem, the seat of civil and religious powers who are hostile to him and his message. Everyone can sense an impending confrontation that will either end in his triumph, or if history is any guide, much more likely his death. (Spoiler! It will be both.) No wonder his followers are afraid. No wonder the disciples are amazed that Jesus is deliberately pressing on towards this confrontation — and perhaps that amazement gives them confidence to be less fearful than the other followers. We can't know exactly what Jesus was thinking at this moment, but it's easy to imagine him being afraid but driven onward by his Messianic mission.

With that in mind, this simple sentence paints an extraordinary psychological scene — a worthy subject for a painting or a short drama. Too bad those aren't my skills.

I expect to keep finding new treasures no matter how many times I read the Bible.

Saturday, 8 August 2020

Scaling Debuginfo For Zero-Cost Abstractions

In yesterday's post I raised the issue that Rust and C++ promise zero-cost abstractions, but with the standard build configurations people use today, you have to choose between "zero-cost abstractions" and "fast and debuggable builds". Reflecting on this a bit more, I realized that the DWARF debuginfo format used with ELF binaries inherently conflicts with the goal of having both zero-cost abstractions and fast debuggable builds, because of the way it handles inline functions (and possibly for other reasons).

Suppose we have a non-inline function F calling non-inline function G, at K sites, each time via a stack of N trivial inline wrapper functions I1 ... IN. Zero-cost abstraction means this compiles to the same code as F calling G directly K times. However, correct DWARF debuginfo for F must contain K×N explicit DW_TAG_inlined_subroutine entries, one for each instance of an inlined function :-(. Tracking and then emitting all this debuginfo will necessarily slow down the build, and make it slower as you add more layers of abstraction. Maybe this isn't (yet) a problem in practice compared to the cost of optimizations, but it is a fundamental limitation. Fixing it would probably require changing the debuginfo format so that, at least in the "minimal optimizations for fast debuggable builds" mode, the debuginfo for an inline function can be emitted just once and then parameterized for each call site where it is inlined. I don't have a clear idea of how that would work! This probably means the format and emission of debuginfo need to be carefully co-designed with the optimizer to achieve fast debuggable builds with zero-cost abstractions.
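To make the scenario concrete, here's a hypothetical sketch (the names f, g, i1 ... i3 are illustrative, not from any real codebase). With optimizations on, f compiles to two direct calls to g, but correct DWARF debuginfo for f must still describe every inlined wrapper instance:

```rust
// Hypothetical sketch: F calls G through a stack of trivial inline
// wrappers. A zero-cost build compiles f() to direct calls to g(),
// but correct DWARF debuginfo must record one
// DW_TAG_inlined_subroutine entry per wrapper per call site --
// here K = 2 call sites times N = 3 wrappers = 6 entries.

#[inline(never)]
fn g(x: u32) -> u32 {
    x + 1
}

#[inline(always)]
fn i3(x: u32) -> u32 {
    g(x)
}

#[inline(always)]
fn i2(x: u32) -> u32 {
    i3(x)
}

#[inline(always)]
fn i1(x: u32) -> u32 {
    i2(x)
}

fn f() -> u32 {
    // Two call sites, each through three inlined wrappers.
    i1(10) + i1(20)
}

fn main() {
    assert_eq!(f(), 32); // (10 + 1) + (20 + 1)
}
```

The debuginfo cost grows with K×N even though the generated machine code is identical to calling g directly.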

Friday, 7 August 2020

What Is The Minimal Set Of Optimizations Needed For Zero-Cost Abstraction?

A compelling feature of Rust and C++ is "zero-cost abstractions". You can write "high level" code, e.g. using iterators, that compiles down to the same machine code as the low-level code that you'd write by hand. You can add layers of abstraction, e.g. wrapping a primitive value in a struct and providing a specialized API for it, without adding run-time overhead. However, "zero-cost" only applies if you enable an adequate set of compiler optimizations. Unfortunately, enabling those optimizations slows down compilation and, with current compilers, trashes a lot of debug information, making it very difficult to debug with those binaries. Since the Rust standard library (and increasingly the C++ standard library) makes heavy use of "zero-cost" abstractions, using non-optimized builds for the sake of better debugging and build times creates binaries that are many times slower than release builds, often untenably slow. So the question is: how can we get fast builds and quality debuginfo while keeping zero-cost abstractions?

An obvious approach is to limit the set of enabled compiler optimizations to the minimum necessary to achieve zero-cost abstraction, and hope that this produces acceptable build speed and debuginfo quality. But what is that set? If it's "all optimizations", then this is no help. I guess it depends on the exact set of "zero-cost abstraction" patterns we want to support. Here are some optimizations that I think need to be on the list:

  • Inlining: Most or all abstractions depend on inlining functions to achieve zero-cost, because they encapsulate code into functions that you'd otherwise write yourself. So, we must have aggressive inlining.
  • Copy propagation: After inlining functions, there will be chains of assignments through parameters and results that will need to be shortened using copy propagation.
  • Limited dead store elimination and temporary elimination: After copy propagation, a lot of temporary values aren't needed. We shouldn't store their values or allocate space for them.
  • Scalar replacement: Many abstractions package up variables into structs and encapsulate operations on the struct into functions. It seems likely that the optimizer needs to be able to undo that, taking structs apart and treating primitive components like any other scalar variable. This would typically be needed to make the above optimizations work on those components.
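To illustrate how these four optimizations have to cooperate, here's a hypothetical newtype-wrapper example (the Meters type and distance function are made up for illustration). Inlining exposes the method bodies, scalar replacement takes the struct apart, and copy propagation plus dead-store elimination remove the leftover temporaries:

```rust
// Hypothetical example of the abstraction pattern discussed above:
// a newtype struct wrapping a primitive, with small accessor and
// arithmetic methods. Under the minimal optimization set, the body
// of distance() should reduce to a bare `a + b` on u64 values.

struct Meters(u64);

impl Meters {
    fn new(n: u64) -> Meters {
        Meters(n)
    }
    fn get(&self) -> u64 {
        self.0
    }
    fn add(&self, other: &Meters) -> Meters {
        Meters(self.0 + other.0)
    }
}

fn distance(a: u64, b: u64) -> u64 {
    // Inlining exposes new/add/get; scalar replacement dissolves
    // the Meters structs; copy propagation and dead-store
    // elimination clean up the temporaries.
    let a = Meters::new(a);
    let b = Meters::new(b);
    a.add(&b).get()
}

fn main() {
    assert_eq!(distance(3, 4), 7);
}
```

Without scalar replacement, the temporaries a, b, and the result of add would each occupy a stack slot, so even perfect inlining would leave overhead behind.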

Here are some optimizations that I think don't need to be on the list:

  • Register allocation: Unoptimized debug builds typically do almost no register allocation: local variables live in stack slots. Thus, if unnecessary temporaries are eliminated (see above) storing the surviving variables on the stack is not penalizing abstraction.
  • Vectorization: Unoptimized debug builds typically don't do any vectorization, so again we can just keep on not doing it without penalizing abstraction.
  • Tail calls: I can't think of a Rust or C++ abstraction that relies on tail-call optimization to be zero-cost.

Here's an optimization I'm not sure about:

  • Common subexpression elimination: Are there common abstractions that we think should be zero-cost in Rust and C++ that require CSE? I can't think of any off the top of my head.

It would be very interesting to dive into this further and actually configure the Rust compiler with a promising set of optimizations and evaluate the results.

A somewhat orthogonal approach to the whole problem is to simply improve the debuginfo quality and build speed of optimized builds. The former is mostly a lot of engineering and architectural work to preserve debuginfo through various optimization passes. There are issues where some optimizations (e.g. register allocation) cause variable values to be unavailable at program points where they're technically still in scope — but good record-and-replay debuggers like rr or better still omniscient debuggers like Pernosco can get around that. Build speed is really the harder problem: doing lots of optimizations, especially with all the bookkeeping to preserve debuginfo, is inherently expensive.

I wonder what a compiler backend would look like if you designed it from scratch with the goal of good debuginfo and the fastest possible builds while enabling zero-cost abstractions.