Eyes Above The Waves

Robert O'Callahan. Christian. Repatriate Kiwi. Hacker.

Monday 27 August 2007


This is a hot topic in New Zealand right now, and an interesting relevant rant about OOXML technical issues just appeared on Slashdot, so I feel like getting my 2c in...

Rod Drury's blog has an interesting comment:

The antagonists seem to change between the spec being either

a) Too long and detailed for people to be able to implement or;

b) Too vague and lacking in specific detail for people to implement.

It's entirely possible for both of these to be true at the same time --- when the thing you're trying to specify is much too complicated, so that a specification with all the necessary details is much too large to be correctly implemented by anyone. That is exactly the situation OOXML is in; it wants to specify every detail of the behaviour of Microsoft Office, millions of lines of code accumulated over 20 years.

People are worried about the cost of migrating documents from Word format to ODF. I don't think anyone should be advocating such a migration; legacy documents, if converted at all, should simply be printed to PDF for archival. The real issue is what format new documents should be produced in. OOXML is clearly a horrible format from a purely technical point of view --- not even its defenders seem to be challenging this --- so there's a strong argument that you do not want to be producing new documents in OOXML if you have a choice. I think having OOXML stamped by ISO as a de jure standard would send the wrong message about that. On the other hand, making it a standard does not help at all with the problem of preserving legacy documents.

The overall most cost-effective and future-proof solution IMHO is for ODF to be the single de jure office document standard, for Microsoft to ship ODF read/write capability in its products, and for people who care about the longevity and openness of their documents to make ODF the default for new documents, while continuing to work on old documents in their existing formats. Presumably Microsoft opposes this approach because it would ever-so-slowly weaken people's dependence on their products.

Someone might wonder whether support for the WHATWG's efforts to standardize "Web-compatible" HTML is consistent with opposition to Microsoft's efforts to standardize "Word-compatible" OOXML. It is, because there are actually many differences. The main difference is that Ian and his gang aren't just specifying "whatever IE does"; they typically look at what a number of implementations do, and choose the behaviour that makes the most sense while remaining "Web compatible" (which is often, but certainly not always, what IE does). This is made easier because HTML and its related technologies originated in the standards world, whereas Microsoft's Office formats have always been wholly their own monstrous babies. (Another major difference, I suspect, is that conversion of HTML documents is a lot less feasible due to greater reliance on embedded scripting to give meaning to pages.)


I worry that sentiments like "everybody should use ODF going forward" will hurt further development of office suites, because you can kind of end up limited by mandated formats.
A complex file format can be seen as a specification for the program that writes it -- you must implement every feature mentioned in the standard to be useful, and you can't really implement any additional features, because the format limits the way you can represent them in your files.
This means that you end up writing a file format based on the lowest common denominator of current programs, and then it can be difficult to move things forward from there.
I realize that a common objection to that argument will be that 'feature bloat' isn't a good thing, and that a format that discourages it isn't a problem, but it's not just about adding more and more incremental features. Consider Apple's recent Numbers spreadsheet. It uses a fundamentally different model than Excel does (multiple tables arranged on a canvas instead of a single infinite grid), and really could not have been implemented if it was required to produce only Excel or ODF-compatible files.
So I'd claim that you do get some freedom-from-Microsoft benefit by mandating formats like ODF, but you also risk choking off innovation in ways that people might not like much long-term.
It also bothers me that so many open source advocates in this fight are talking out of both sides of their mouths about MS Office compatibility. Witness this comment from the post you linked:
"It [Open XML] is hardly an open standard when parts of the standard require knowledge of the workings of proprietary software."
followed very quickly by:
"Indeed, OpenOffice.org handles older .doc files better than current versions of Word."
So on the one hand, requiring knowledge of the workings of proprietary software is a fatal flaw, but on the other hand, open suites handle compatibility with that proprietary software better than MS does.
You can't really have it both ways.
Robert O'Callahan
> I worry that sentiments like "everybody should
> use ODF going forward" will hurt further
> development of office suites, because you can
> kind of end up limited by mandated formats.
That's a good point. I don't think that's the right sentiment. I think people have to decide for themselves whether longevity, interoperability and standards matter more to them than cool innovative features; the answer should sometimes be "no", and in that case, arguments about standards and ODF do not apply.
> So on the one hand, requiring knowledge of the
> workings of proprietary software is a fatal
> flaw, but on the other hand, open suites handle
> compatibility with that proprietary software
> better than MS does.
> You can't really have it both ways.
Sure you can: reverse engineering proprietary formats is a necessary evil for a real-world office suite, but an unnecessary and much greater evil when you make it part of a standard. One of the major reasons to have a standard is to avoid this reverse engineering requirement.
(I don't believe that line about OpenOffice handling old .doc files better, but then I have little experience in this area.)
Hmm, should have collared you for our little NZOSS team, Rob.
Nat, at issue was not really whether ODF is the be-all and end-all of document standards... it was whether OOXML was a good enough standard to be ISO accredited. It isn't and, in all likelihood, never will be.
So, given the above, what would be the best outcome? Best would be for MS to recognise that it can and should contribute to ODF, particularly in critical areas where that specification is weak.
I have used (my understanding of) Rob's example of the HTML standardisation process with regards to innovation.
As I understand it the cutting edge "new stuff" that he is busy churning out is openly documented by a group of like minded folks. As this new stuff gets proven these documents work their way up to standards quality for eventual inclusion in the specification. That's why ODF is at 1.4 but ISO has only accredited 1.1 (from memory).
Chris Jay
Personally I think both OOXML and ODF are on the wrong side of history. Office suites are still built for the days of "personal computing", when data was stuck on C: drives and collaboration meant picking up the phone.
Let's not design a horseless carriage, let's design a car. For at least 80% of cases, HTML is far better than either OOXML or ODF, because of collaboration, hosting, search engines, the hyperlink, widgets, customization, the mobile internet, and new business models. And that advantage is increasing every day.
It is all the same: Open XML or ODF migration. The backwards argument is bogus. OO.org import of the legacy format is better than Microsoft's.
Microsoft can adapt ODF and is prepared to do that. But instead they push their own broken format in order to weaken ODF adoption.
Alan Trick
Chris: HTML is a great language, but it was designed for a different sort of document than ODF is designed for. HTML isn't good at being represented in a WYSIWYG interface; it's too flexible and too lenient. Also, ODF is XML, so it can take advantage of things like SVG and MathML.
There are rich-text editors that use HTML natively (Google Docs), those millions of HTML editors for websites, etc. However, they all produce some bastardized version of HTML. Most of them get really funny on you if you try to use them on a document with proper CSS and such. Also, they are either missing a lot of useful formatting features or are too complicated for most regular users.
And by the way, there's nothing on your list that couldn't work fine with ODF.