Tuesday, 2 October 2018

The Costs Of Programming Language Fragmentation

People keep inventing new programming languages. I'm surprised by how many brand-new languages are adopted by more than just their creators, despite the network effects that would seem to discourage such adoption. Good! Innovation and progress in programming languages depend on such adoption. However, let's not forget that fragmentation of programming languages reduces the sum of those beneficial network effects.

One example is library ecosystems. Every new language needs a set of libraries for commonly used functionality. Some of those libraries can be bindings to existing libraries in other languages, but it's common for new languages to trigger reimplementation of, e.g., container data structures, HTTP clients, and random number generators. If the new language did not exist, that effort could have been spent on improving existing libraries or some other useful endeavour.

Another example is community support. Every new language needs an online community (IRC, StackOverflow, etc) for developers to help one another with questions. Fragmenting users across communities makes it harder for people to find answers.

Obviously the efforts needed to implement and maintain languages and runtimes themselves represents a cost, since focusing efforts on a smaller number of languages would normally mean better results.

I understand the appeal of creating new programming languages from scratch; like other green-field development, the lure of freedom from other people's decisions is hard to resist. I understand that people's time is their own to spend. However, I hope people consider carefully the social costs of creating a new programming language especially if it becomes popular, and understand that in some cases creating a popular new language could actually be irresponsible.

Tuesday, 25 September 2018

More Realistic Goals For C++ Lifetimes 1.0

Over two years ago I wrote about the C++ Lifetimes proposal and some of my concerns about it. Just recently, version 1.0 was released with a blog post by Herb Sutter.

Comparing the two versions shows many important changes. The new version is much clearer and more worked-out, but there are also significant material changes. In particular the goal has changed dramatically. Consider the "Goal" section of version (emphasis original)

Goal: Eliminate leaks and dangling for */&/iterators/views/ranges
We want freedom from leaks and dangling – not only for raw pointers and references, but all generalized Pointers such as iterators—while staying true to C++ and being adoptable:
1. We cannot tolerate leaks (failure to free) or dangling (use-after-free). For example, a safe std:: library must prevent dangling uses such as auto& bad = vec[0]; vec.push_back(); bad = 42;.
Version 1.0 doesn't have a "Goal" section, but its introduction says
This paper defines the Lifetime profile of the C++ Core Guidelines. It shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time.
The new goal is much more modest, I think much more reasonable, and highly desirable! (Partly because "modern C++" has introduced some extremely dangerous new idioms.)

The limited scope of this proposal becomes concrete when you consider its definition of "Owner". An Owner can own at most one type of data and it has to behave much like a container or smart pointer. For example, consider a data structure owning two types of data:

class X {
    X() : a(new int(0)), b(new char(0)) {}
    int* get_a() { return &*a; }
    char* get_b() { return &*b; }
    unique_ptr<int> a;
    unique_ptr<char> b;
This structure cannot be an Owner. It is also not an Aggregate (a struct/class with public fields whose fields are treated as separate variables for the purposes of analysis). It has to be a Value. The analysis has no way to refer to data owned by Values; as far as I can tell, there is no way to specify or infer accurate lifetimes for the return values of get_a and get_b, and apparently in this case the analysis defaults to conservative assumptions that do not warn. (The full example linked above has a trivial dangling pointer with no warnings.) I think this is the right approach, given the goal is to catch some common errors involving misuse of pointers, references and standard library features. However, people need to understand that code free of C++ Lifetime warnings can still easily cause memory corruption. (This vindicates the title of my previous blog post to some extent; insofar as C++ Lifetimes was intended to create a safe subset of C++, that promise has not eventuated.)

The new version has much more emphasis on annotation. The old version barely mentioned the existence of a [[lifetime]] annotation; the new version describes it and shows more examples. It's now clear you can use [[lifetime]] to group function parameters and into lifetime-equivalence classes, and you can also annotate return values and output parameters.

The new version comes with a partial Clang implementation, available on godbolt.org. Unfortunately that implementation seems to be very partial. For example the following buggy program is accepted without warnings:

int& f(int& a) {
    return a;
int& hello() {
    int x = 0;
    return f(x);
It's pretty clear from the spec that this should report a warning, and the corresponding program using pointers does produce a warning. OTOH there are some trivial false positives I don't understand:
int* hello(int*& a) {
    return a;
:2:5: warning: returning a dangling Pointer [-Wlifetime]
    return a;
:1:12: note: it was never initialized here
int* hello(int*& a) {
The state of this implementation makes it unreliable as a guide to how this proposal will work in practice, IMHO.

Monday, 17 September 2018

The Danger Of GMail's "Smart Replies"

At first I was annoyed by GMail's "Smart Reply" buttons because they represent a temptation to delegate (more) shaping of my human interactions to Google's AI ... a temptation that, for some reason known only to Google, can be disabled in the GMail mobile app but not the desktop Web client. I do not want the words I use to communicate, or the words others use to communicate to me, to be shaped by the suggestions of an algorithm that is most likely opaque even to its masters, let alone a mere consumer like me.

I just realized, though, that they're potentially a lot worse than that. I got an email suggesting I take an action, and the suggested "smart replies" are:

  • Sounds like a good idea.
  • I like that idea.
  • Yes, I agree.
But ... what if I don't agree? Does showing me only positive responses actually prime my brain to make me more likely to agree? Is it possible to tweak the wording of an email to ensure the algorithm produces responses of a particular type? (Probably.) More importantly, did anyone at Google actually consider and study such effects before rolling out this feature? Or did the team just roll out the feature, collect the bonus, and move on? If they did study it, are the results public and what were they? Wouldn't it be wise to require this kind of study and disclosure before subtly interfering with the cognitive processes of hundreds of millions of people?

For now I'm switching back to GMail Classic, and when (I assume) Google forces the new UI on me anyway, the path of least resistance will be to use a Firefox extension to block the Smart Reply buttons (yay Web!). Of course hundreds of millions of people will unwittingly submit to Google's reckless mental meddling.

Tuesday, 4 September 2018

"Crazy Rich Asians"

Pretty good movie. A few observations... (Spoilers!)

I don't know what the ultra-rich really get up to, but for me the most absurd part of the movie was the MJ scene. Eleanor's early hand was trash; there was no way she she could have amassed the pungs-and-bamboos winning hand she did, not with Rachel also collecting bamboos.

Maybe I misunderstood everything, but didn't the Astrid-Michael subplot undermine the main plot by proving Eleanor was right all along? Michael and Astrid set aside their different backgrounds and family disapproval to marry (presumably for love), but Michael couldn't cope with the pressure and ruins their marriage ... just like Eleanor fears will happen with Rachel. Main plot: true love wins! Subplot: ... er no it doesn't.

The entire movie screams "FIRST WORLD PROBLEMS". In particular the idea that a man like Michael could not simply be grateful for his situation is marginally plausible but nearly unforgivable.

My source tells me the actors' Cantonese was pretty bad.

I'd watch Michelle Yeoh read the phone book.

Saturday, 1 September 2018

Rangitoto Fog

Visiting Rangitoto is one of my favourite things to do in Auckland. Catch the 9:15am ferry from the downtown terminal, arrive on the island just before 10am, walk up to the top, see the incredible views over Auckland and the Hauraki Gulf, and then back down via the lava caves and easily make the 12:45pm ferry getting you back to the city by 1:30pm. You've experienced a unique 600-year-old island with extraordinary geology, flora and fauna, and had a good walk, in four hours.

Today was extra-special. Very thick fog blanketed the harbour and inner Gulf, and the ferry proceeded very slowly to the island; the trip that normally takes 30 minutes took 75. We passed a number of becalmed yachts that apparently were supposed to be racing, but instead were drifting aimlessly through the fog. It was surreal. Once we finally reached the island and headed inland, we almost immediately left the fog, but the fog left behind spiderwebs sparkling with dew and rocks steaming in the sun. From Rangitoto's summit we could still see large fog banks covering Waiheke Island, Motuihe Island, and much of the inner Gulf. It was wonderful!

Friday, 24 August 2018

Long Live The Desktop Computer

Eight years ago I bought a Dell Studio XPS 8100 desktop for a home computer at a moderate price (NZD 3,100). I've just replaced a failing 1TB hard drive with a 500GB SSD, but other than that I've done no upgrades. What's interesting to me is that it's still a perfectly good machine: quad-core i7, 12GB RAM, NVIDIA GPU with 2GB VRAM. Everything I do, this machine could still do well, including software development for work. I guess if I wanted to play the latest AAA game titles or use a 4K monitor on it, I'd be unhappy, but I can't think of anything else I'd even consider doing that would be a problem, and those could be addressed by upgrading the video card. If this machine doesn't fail catastrophically I can see us continuing to use it for many more years. (I run Linux on it; the situation might be different if it was Windows.)

This is interesting because up until 2010 I'd been in the habit of upgrading computers at least every five years because they would improve dramatically over that time in ways that mattered to me. That stopped happening. It hasn't entirely stopped for everyone — Mozilla developers are getting new desktops with double-digit numbers of cores to speed up Firefox builds — but I run my heavy-duty workloads in the cloud now, because really big machines aren't efficiently utilized by a single developer. I guess the economics of utilization and colocation will making cloud-based heavy lifting (not necessarily public clouds) increasingly prevalent over time.

One of the implications is that declining desktop sales don't necessarily mean declining desktop usage. I think they must at least partly reflect longer upgrade cycles.

Another implication is that component reliability for desktops is becoming more important. It doesn't really matter if parts wear out after five years, if you're going to replace the whole machine before then anyway. If the expected lifespan of a machine is fifteen years, it's worth buying more reliable parts.

Another implication is longevity bottlenecks might shift to relatively minor features like what types of USB ports your machine has. I guess some of this can be alleviated by upgrades and dongles but it's worth thinking about.

Friday, 17 August 2018

ASAN And LSAN Work In rr

AddressSanitizer has worked in rr for a while. I just found that LeakSanitizer wasn't working and landed a fix for that. This means you can record an ASAN build and if there's an ASAN error, or LSAN finds a leak, you can replay it in rr knowing the exact addresses of the data that leaked — along with the usual rr goodness of reverse execution, watchpoints, etc. Well, hopefully. Report an issue if you find more problems.

Interestingly, LSAN doesn't work under gdb, but it does work under rr! LSAN uses the ptrace() API to examine threads when it looks for leaks, and it can't ptrace a thread that gdb is already ptracing (the ptrace design deeply relies on there being only one ptracer per thread). rr uses ptrace too, but when one rr tracee thread tries to ptrace another rr tracee thread, rr emulates the ptrace calls so that they work as if rr wasn't present.