Wednesday 21 July 2010
Coding Style As A Failure Of Language Design
Variance in coding style is a huge problem. Reading code where the style varies all over the place is painful. Moving code from one place to another and having to restyle it is awful. Constantly adjusting the style in which you're writing code to conform to the local style of the project, module, file, function or line you're modifying is tiring.
Therefore projects adopt style rules to encourage and enforce a uniform style for the project's code. However, these rules still have to be learned, and adherence to them checked and corrected, usually by humans. This takes a lot of time and effort, and imperfect enforcement means code style consistency gradually decays over time. And even if it were not so, code moving between projects looks out of place because style rules are rarely identical between projects --- unless you reformat it all, in which case you damage the relationship with the original code.
I see this as a failure of language design. Languages already make rules about syntax that are somewhat arbitrary. Projects imposing additional syntax restrictions indicate that the language did not constrain the syntax enough; if the language syntax were sufficiently constrained, projects would not feel the need to do it. Syntax would be uniform within and across projects, and developers would not need to learn multiple variants of the same language. More syntactic restrictions would be checked and enforced by the compiler, reducing the need for human (or even tool-assisted) review. IDE assistance could be more precise.
Two major counter-arguments arise. People will argue that coding style is a personal preference and therefore diversity should be allowed. This is true if you only participate in particularly small projects, but if you work in a large project then --- unless you are exceptionally fortunate --- you will have to deal with a coding style that is not your preference, no matter what. (Maciej Stachowiak once said that willingness to subjugate one's personal preferences to a project's preferences is a useful barometer of character, and I agree!)
A more interesting counter-argument is that many coding style rules aren't sufficiently formalized so as to be machine-checkable, and might even be very difficult to formalize at all. This is true; for example, line-breaking rules or variable naming rules might be very difficult to formalize. So I relax my thesis to claim that at least those rules which can be formalized should be baked into the language.
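To make "can be formalized" concrete, here is a minimal sketch of one rule that is easy to check mechanically: indentation must use spaces only, in multiples of a fixed width. The function name `check_indentation` and the default width of 4 are assumptions chosen for illustration, not taken from any particular language or style guide.

```python
def check_indentation(source: str, width: int = 4) -> list[tuple[int, str]]:
    """Return (line_number, message) pairs for lines that break the rule.

    Hypothetical rule: indentation is spaces only, in multiples of `width`.
    """
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no indentation information
        # Leading run of spaces and tabs on this line.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            problems.append((number, "tab in indentation"))
        elif len(indent) % width != 0:
            problems.append((number, f"indent is not a multiple of {width}"))
    return problems


sample = "if x:\n    y = 1\n\tz = 2\n   w = 3\n"
for number, message in check_indentation(sample):
    print(f"line {number}: {message}")
```

A compiler that baked this rule in would report these as errors rather than leaving them to review. Naming rules and "break the line where it reads best" are much harder to state this precisely.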
(Figuring out exactly which rules can be formalized, and exploring alternative syntax designs that maximize automatic style checkability while still being nice syntax, sound like fun research! Programming language syntax is one of those areas that I think has been greatly under-researched, especially from the HCI point of view.)
Comments
> rules aren't sufficiently formalized so as to be machine-
> checkable, and might even be very difficult to formalize at all.
I think that something like StyleCop (stylecop.codeplex.com) shows that for some languages - even C heritage languages - this can be done.
I can't find an HTML formatted StyleCop rules list - only a .chm list :-( But trust me the list is extensive.
Pretty much any formalized coding style I've worked with requires that you lay out your code pretty much exactly how Python tells you to, without mixing tabs and spaces randomly. So it's requiring you to write code how you're going to have to write it anyway, and you're near-guaranteed to be able to follow the style of any code you read by other people... and yet it's one of the first complaints you hear about the language.
And sorry for hitting tab-enter too early, so an unfinished comment (above) posted itself.
But coding style may sometimes vary according to the underlying "logic style" of one or another file. I've seen HTML (in spam mail, usually ;-) ) where each level of tag nesting got one additional 8-column hard tab, and the result was ugly. Depending on the structure of the page, 4, 2, or even 1 space per indent level is sometimes better.
Pretty much everyone, every project and every book out there uses the style suggested by PEP 8.
Often complemented by the Google Python Style Guide if someone wants even more details.
Maybe someone working on a new experimental language could try the idea out :)
What about the gofmt-style solution? Seems like a good practical approach to the problem.
I guess really the problem is that it's all just plain text until it hits the compiler/interpreter.
Perhaps something like working on an editor-formatted view (to your preference) over a style-independent source (something like an AST maybe). Then perhaps the only rules could be naming conventions, or at least not things regarding whitespace.
But people like being able to use whatever editor they like (vi, emacs, notepad) to code in.
We've already got various code reformatting tools in IDEs or standalone; perhaps they could be improved and integrated with a compiler like clang as a project policy, and allow other tools like diff and merge to work at a style-independent level.
How do you manage exceptions to the rules, e.g. a per-line character limit? Set a soft limit of say 80 chars and a hard limit of 100 to prevent things being awkwardly reformatted for the sake of a couple of characters.
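A soft/hard limit like that is itself easy to formalize. As a rough sketch (the function name `check_line_lengths` is made up, and 80/100 are just the numbers suggested above), lines over the soft limit produce warnings and only lines over the hard limit are rejected:

```python
def check_line_lengths(source: str, soft: int = 80, hard: int = 100):
    """Return (warnings, errors) as lists of 1-based line numbers.

    Lines longer than `hard` are errors; lines longer than `soft`
    but within `hard` are only warnings.
    """
    warnings, errors = [], []
    for number, line in enumerate(source.splitlines(), start=1):
        length = len(line)
        if length > hard:
            errors.append(number)
        elif length > soft:
            warnings.append(number)
    return warnings, errors


sample = "\n".join(["a" * 80, "b" * 81, "c" * 100, "d" * 101])
warnings, errors = check_line_lengths(sample)
print("warnings:", warnings, "errors:", errors)
```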
If it's a new language and the source can be stored in a machine-friendly text format then surely things get a lot simpler, but would anyone use it?
But you'd think this would be a good option to have, at least, no? Kind of like the old -tt option, but much more comprehensive. I wonder if anyone's suggested it before. Some syntax like "import __strict__", perhaps.
"an API is not about programming, data structures, or algorithms—an API is a user interface, just as much as a GUI. The user at the using end of the API is a programmer—that is, a human being. Even though we tend to think of APIs as machine interfaces, they are not: they are human-machine interfaces."
http://cacm.acm.org/magazines/2009/5/24646-api-design-matters/fulltext
Aryeh: yes, I'm sure it would rub some people the wrong way. But those people would probably not get along with project style guides either. See my quote from Maciej. I think you'd have a better chance of pulling this off in a new language since with existing languages people are used to coding any way they want.
Callum, Paul: "structured editing", where the source code is no longer plain text, was tried extensively in the 70s and 80s (see for example CMU's Gandalf project). Those efforts failed. I think there are very good reasons for preferring plain text: you can save "bad" intermediate states, and you can continue using existing editors, and version control systems, and bazillions of other tools that process text.
voracity: gofmt sounds like a step in the right direction but I think a slightly harder line is warranted. If you require that gofmt be run before checking in and before presenting code for review, why not before you run the compiler as well?
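To illustrate the "harder line", here is a toy sketch in Python rather than Go. Everything in it is an assumption for illustration: `canonical_format` stands in for a real formatter (it only strips trailing whitespace and trailing blank lines), and `compile_strict` and `StyleError` are made-up names. The point is only that the compile step refuses any source the formatter would change.

```python
class StyleError(SyntaxError):
    """Raised when source is not in canonical form (hypothetical)."""


def canonical_format(source: str) -> str:
    """Toy formatter: strip trailing whitespace, end with one newline."""
    lines = [line.rstrip() for line in source.splitlines()]
    while lines and not lines[-1]:
        lines.pop()  # drop trailing blank lines
    return "\n".join(lines) + "\n"


def compile_strict(source: str, filename: str = "<strict>"):
    """Compile only if the formatter would leave the source untouched."""
    if source != canonical_format(source):
        raise StyleError(f"{filename}: source is not in canonical format")
    return compile(source, filename, "exec")


namespace = {}
exec(compile_strict("x = 1\n"), namespace)
print(namespace["x"])

try:
    compile_strict("x = 1   \n")  # trailing spaces
except StyleError as error:
    print("rejected:", error)
```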
Havvy: personally I'd rather focus my creativity on what the program does rather than on where I put my curly brackets. By analogy, I'm much more productive writing plain text than in a WYSIWYG editor like Word, since in Word there's the temptation to fiddle with formatting instead of just write.
I have a feeling it might be more realistic to make the feature optional. You could either have an option in the compiler or some kind of strict keyword that would enforce a certain style for projects that want it, and still let other people choose their own styles if they wanted to for their own projects.
I recall that, just a bit over a decade ago, some Mac based Pascal environment (sorry, I don't remember the name) had the auto-formatting pretty well done. So it's probably possible, at least, even with an introductory language - it relieves the new programmer of the need to figure out coding style.
--
The problem with forcing coding style is that sometimes you need to bend the rules for clarity; for example, most of the time various projects tend to try to go for a certain maximum line width (80, 120, whatever), but often it also comes with a caveat that just a bit over is fine too.
It seems clear to me that good programming style is simply a matter of taste and experience, and can't be encoded in a syntactic constraint.
But there are lots of rules in project style guides, for example about indentation, or brace placement, that can be.
Python is the closest we've got so far, but it needs to go further. The ":" at the end of an if/for should be removed... it's implied by the indentation on the following line. As others have said they need to choose from tabs or spaces too. Tabs feel more semantically correct but, in my experience, spaces don't get as mangled in large projects.
@ROC: you say that structured editing is a failed experiment that's no longer used; but it's just a matter of perspective. Almost everyone uses syntax highlighting and that's just a limited form of structured editing. IDEs do various source code transforms & refactorings for you, including partial layouting - that's structured editing that again almost every IDE supports and most people use. Auto-complete or intellisense are simply variants of structured editing that most people appreciate. Code folding is common in IDEs and definitely in XML editing tools - that's structured editing.
These things just need to be taken further; having a comprehensible language serialization (i.e. source code) is important, but that doesn't mean the view needs to be the fixed width terminal-style character matrix it is now.
It's not an all-or-nothing feature; and it's pretty clear which direction the arrow of time is pointing - and a good thing too. How much work would it be to have a large project where each contributor views and edits with his own spacing, indent, newline & bracketing style yet behind the scenes a code-formatter ensures a standardized format is actually exchanged via source control? That's possible with some hassle today...
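A rough sketch of the "behind the scenes" part, using Python's standard `ast` module (`ast.unparse` needs Python 3.9 or later). The `normalize` helper is a made-up name, and this is only an approximation of a real formatter: round-tripping through the AST discards comments, so it shows the idea rather than something you'd hook into source control as-is.

```python
import ast


def normalize(source: str) -> str:
    """Re-emit source in one fixed layout by round-tripping through the AST.

    Caveat: comments and the author's own layout are lost in the round trip.
    """
    return ast.unparse(ast.parse(source)) + "\n"


# Two contributors write the same function in their own styles.
alice = "def add(a,b):\n  return a+b\n"
bob = "def add( a, b ):\n\n        return (a + b)\n"

# What gets exchanged via source control is identical for both.
assert normalize(alice) == normalize(bob)
print(normalize(alice), end="")
```

Going the other way, from the stored form back to each contributor's preferred view, is the part that takes the hassle.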
http://www.artima.com/weblogs/viewpost.jsp?thread=74230
Came across it some time ago when reading Spolsky's "Best Software Writing".