Friday, 26 November 2010

Measuring FPS

It seems standard practice for Web authors to measure performance in FPS (frames per second). There are two large problems with this.

First, FPS is meaningless above 60 or so, because you normally can't get more frames than that onto the screen. A well-designed software stack will actually prevent FPS from rising above 60 --- those extra frames are wasting power. More than 60 FPS means you have a bug (or you are doing very specialized full-screen gaming --- i.e. not a Web browser). I discussed this in more detail here.

Second, Web authors normally measure FPS as "how many times did my event handler run per second". But what they actually should be measuring is how many different frames the browser was able to paint to the screen per second, and that's a very different thing, since a setTimeout handler (for example) can run multiple times between browser repaints. It's understandable that Web authors would make such a mistake, since browsers haven't provided a way to measure the real FPS. However, in Firefox 4 we have added an API for this: mozPaintCount. Use it!

I think authors need to get into the habit of measuring FPS using mozPaintCount or something similar on other browsers and determining first whether their app is able to reach 50-60FPS. If it can, then further performance improvements can only be measured by increasing the complexity of the scene until FPS starts dropping off. Microsoft's FishIE Tank is the right approach. (I don't know how they try to measure FPS though; they're probably not using mozPaintCount, so it's probably not very accurate.)
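
As a rough sketch of what measuring painted frames could look like, the helper below derives FPS from two readings of a paint counter. The name `makeFpsMeter` and its injected `readPaintCount` and `now` parameters are hypothetical, chosen so the arithmetic can run outside a browser. In Firefox 4 the counter would presumably be `window.mozPaintCount`.

```javascript
// Sketch: estimate painted frames per second from a paint counter.
// `readPaintCount` and `now` are injected (hypothetical parameters) so the
// logic does not depend on any particular browser.
function makeFpsMeter(readPaintCount, now) {
  let lastCount = readPaintCount();
  let lastTime = now();

  // Returns the frames painted per second since the previous call.
  return function sample() {
    const count = readPaintCount();
    const time = now();
    const elapsedMs = time - lastTime;
    const fps = elapsedMs > 0 ? ((count - lastCount) * 1000) / elapsedMs : 0;
    lastCount = count;
    lastTime = time;
    return fps;
  };
}

// Assumed browser usage (Firefox-specific counter):
//   const sample = makeFpsMeter(() => window.mozPaintCount, Date.now);
//   setInterval(() => console.log(sample().toFixed(1) + " FPS"), 1000);
```

Counting paints rather than handler invocations is the whole point: a timer callback that fires three times between repaints still contributes only one frame here.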

Most Unusual Rugby Commentary

This is very weird. French philosopher Bernard-Henri Lévy writes about the recent rugby match between Ireland and the All Blacks ... in exactly the style you'd expect from a French philosopher. It's cool yet slightly annoying ... again, as you'd expect from a French philosopher!

Wednesday, 24 November 2010

Herald Interview

Adam Gifford from the Herald gave me a phone interview a couple of weekends ago, and his column is now online. I like it --- thanks Adam!

Tuesday, 23 November 2010

Brain Drain Vs Foreign Invasion

Apparently a lot of people in the USA are upset about the H-1B visa program, especially as it applies to "IT" workers.

I've always found it ironic that at the same time Americans complain about foreigners stealing US jobs, people in the originating countries complain about the "brain drain" of talent moving to the US. Can both groups be right? Would everyone be better off if talent stayed at home?

I tend to think not. I suspect the complainants on the US side undervalue the contributions of foreign workers. If they successfully shut down visa programs, jobs will simply be outsourced to where the workers are. If outsourcing is throttled, entire companies will move. In any event the whole US economy will suffer.

Personally, I think a reduced inflow of talent to the USA would be a good outcome. "Brain drain" effects are destabilizing; they create vicious cycles in the originating countries, and don't deliver commensurate benefits to the destination country.

A confounding issue here is that "IT" and "high tech" are not synonymous. A lot of H-1B jobs are IT drudgery; changes there will not affect genuine innovation or national competitiveness. Making it difficult for PhD graduates to stay is a different story, but I suspect this important distinction will be lost in the battle.

Wednesday, 10 November 2010

Clinton Bedogni Prize

Last night at the New Zealand Open Source Awards in Wellington I was honoured to be presented with the Clinton Bedogni Prize for Open Systems. The prize is funded by an endowment from the Bedogni family to commemorate their son, who had a passion for Linux and open source.

I explained to the audience that I love making software, and I'm actually amazed that people pay me to do it, and especially to then give the results away to the world! To get a prize on top of that is incredible :-). I went on to thank my wife for all the time I spent hacking on Gecko, especially during the first five years when it was strictly a hobby. Then I thanked the partners and family members of open source contributors everywhere for the same reason.

Tuesday, 9 November 2010

Eclipse Breakthrough

I've been using Eclipse for quite a while now as my code editor. A few years ago I tried using the "smart" CDT features on Gecko but they didn't work very well. Now, five years later (how time flies!) I am very pleased to report that they work much much better! I'm sure the software has improved (Eclipse 3.6.1, CDT 7), but my laptop now having four CPU cores and 8GB of memory has probably helped too :-).

I create an "empty" C++ project with the source directory pointing to an existing hg repository. I turn off the automatic build options, but enable all the indexing features. I turn off the "scalability" option for the C++ editor --- setting the number-of-lines threshold to 100,000 doesn't seem to cause any problems for me. In the project properties, I add resource filters to exclude everything under the .hg and object directories. In the "Paths And Symbols" page of the project properties, I add all the symbol definitions from the generated mozilla-config.h file of my debug build. This probably could be automated but I did it manually. Then I let the indexer run to completion. It doesn't take very long --- minutes, not hours.

With all that done, many features seem to mostly work, at least in layout and gfx code:

  • Ctrl-click on an identifier jumps to its definition
  • Ctrl-click on an identifier at its definition jumps to the declaration if there's a separate declaration
  • Hovering over an identifier shows its definition
  • Opening a "Type Hierarchy" shows the superclasses and subclasses of the selected class (or the class of the method containing the caret)
  • Opening a "Call Hierarchy" shows the callers/callees of the selected method
  • Opening "Quick Type Hierarchy" on a virtual method declaration shows the classes implementing that method
  • Opening "References / Project" shows all references to the selected declaration
  • Context-sensitive autocomplete often works.

All very useful! And all very smooth and fast, on my laptop anyway. Major kudos to the Eclipse team; this should really improve my productivity.

Of course, it's not perfect. For example, autocomplete after "foo->", where 'foo' is an nsCOMPtr or nsRefPtr, doesn't seem to work. However, the features work well enough that I'm getting into the habit of using them and expecting them to work.

Tuesday, 2 November 2010

Implementing A High-Performance Emulator In Javascript Using Run-Time Code Generation

For a while I've been thinking about exploiting fast browser JITs and JS "eval()" to build really fast emulators and other language runtimes. Tonight I was feeling jumpy so I went ahead and hacked something up.

I started by defining a really simple imaginary CPU instruction set, writing a simple program for that instruction set, and implementing a trivial interpreter to execute it. It executes 100M instructions in about 7.5 seconds (Firefox opt trunk on my laptop), which is already pretty good! But we can do a lot better.

My idea is to write a trace compiler in Javascript for the emulated CPU. It's actually very easy. When we need to execute CPU instructions starting at a given PC address, we build up a string containing the Javascript code that needs to be executed for each instruction starting at the given address. I terminate the trace after any branch instruction. I then wrap the code string up into a function and call eval() on it to get a real Javascript function I can execute to get the effect of the trace. From there the browser JIT will compile my function into real native machine code, which I can call over and over again, every time control reaches that address. We're effectively translating from the emulated CPU all the way down to bare-metal x86, and in just a couple of hundred lines of JS! (Ok, plus the JS engine :-).) This very hacked-together compiler runs 100M instructions in about 1.2 seconds. That's a 6x speedup over the interpreter, but more importantly it means I'm getting over 80MIPS of performance, in the browser! Insane!!!
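
The trace-building step can be sketched roughly as follows. This is not the code from the demo: the instruction set (`li`, `add`, `addi`, `bnz`, `halt`) and the helper names `emit`, `compileTrace`, and `run` are all invented for illustration. It uses `new Function` rather than `eval()` to turn the generated string into a callable function. Each trace ends at the first branch or halt and returns the next PC (or -1 to stop).

```javascript
// Hypothetical toy program: sums 1000 + 999 + ... + 1 into r0.
const program = [
  { op: "li", d: 0, imm: 0 },           // pc 0: r0 = 0 (accumulator)
  { op: "li", d: 1, imm: 1000 },        // pc 1: r1 = 1000 (loop counter)
  { op: "add", d: 0, a: 0, b: 1 },      // pc 2: r0 = r0 + r1 (loop head)
  { op: "addi", d: 1, a: 1, imm: -1 },  // pc 3: r1 = r1 - 1
  { op: "bnz", r: 1, target: 2 },       // pc 4: if (r1 != 0) goto 2
  { op: "halt" },                       // pc 5: stop
];

// Append the JS source for one instruction; return true if it ends the trace.
function emit(insn, pc, lines) {
  switch (insn.op) {
    case "li":
      lines.push(`regs[${insn.d}] = ${insn.imm};`);
      return false;
    case "add":
      lines.push(`regs[${insn.d}] = (regs[${insn.a}] + regs[${insn.b}]) | 0;`);
      return false;
    case "addi":
      lines.push(`regs[${insn.d}] = (regs[${insn.a}] + ${insn.imm}) | 0;`);
      return false;
    case "bnz":
      lines.push(`return regs[${insn.r}] !== 0 ? ${insn.target} : ${pc + 1};`);
      return true;
    case "halt":
      lines.push("return -1;");
      return true;
    default:
      throw new Error("unknown op " + insn.op);
  }
}

// Build one trace starting at startPc and turn it into a real JS function.
// Assumes every path through the program reaches a branch or halt.
function compileTrace(program, startPc) {
  const lines = [];
  let pc = startPc;
  while (!emit(program[pc], pc, lines)) pc++;
  return new Function("regs", lines.join("\n"));
}

// Dispatch loop: compile each trace once, then reuse it on every visit.
function run(program) {
  const regs = [0, 0, 0, 0];
  const traces = new Map();
  let pc = 0;
  while (pc !== -1) {
    let trace = traces.get(pc);
    if (!trace) {
      trace = compileTrace(program, pc);
      traces.set(pc, trace);
    }
    pc = trace(regs);
  }
  return regs;
}
```

Calling `run(program)` compiles three traces (starting at PCs 0, 2, and 5) and executes the one at PC 2 repeatedly, which is the case the browser JIT gets to optimize.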

The compiler is super-simple and it could probably be improved in many ways to get even better performance. The key optimization I implemented was to deal with the "cycle counter" my imaginary CPU has --- essentially an emulated interrupt timer that counts down once per instruction executed. I generate two versions of each trace --- one with cycle counter checks at each instruction, and another with all the checks removed and a simple guard at the top of the trace to ensure that the cycle counter won't fire before we reach the end of the trace; if the guard fails, we take the slow check-on-every-instruction path. The demo page shows the Javascript code that the compiler emits to do this.
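
One way to picture the guard is the sketch below. It makes several assumptions that are not from the demo: the helper name `compileGuardedTrace`, a `cpu` object with `regs`, `cycles`, and `interrupted` fields, and the convention that the last statement of a trace assigns `next`. For brevity it puts both versions of the trace into a single generated function, which is a simplification of the two-version scheme described above.

```javascript
// Hypothetical helper: `stmts` holds the JS source for each instruction in a
// trace (the last one must assign `next`), and `startPc` is the trace address.
function compileGuardedTrace(stmts, startPc) {
  const n = stmts.length;
  const src = ["const regs = cpu.regs;", "let next;"];

  // Fast version: a single guard proves the counter cannot reach zero
  // before the trace ends, so no per-instruction checks are needed.
  src.push(`if (cpu.cycles >= ${n}) {`);
  src.push(`  cpu.cycles -= ${n};`);
  for (const s of stmts) src.push("  " + s);
  src.push("  return next;");
  src.push("}");

  // Slow version: check the counter before every instruction and stop at
  // the first instruction that is not allowed to run.
  stmts.forEach((s, i) => {
    src.push(`if (cpu.cycles === 0) { cpu.interrupted = true; return ${startPc + i}; }`);
    src.push("cpu.cycles--;");
    src.push(s);
  });
  src.push("return next;");

  return new Function("cpu", src.join("\n"));
}
```

When the slow version stops early it returns the PC of the first instruction that did not run, so the dispatcher can service the emulated interrupt and resume from there.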

Of course, there isn't anything special about JS here, you could do the same thing with Java or .NET bytecodes, or in other languages with "eval". Still, since there are a lot of browser-based emulators out there, it would be cool to see people try this technique. It's hard to predict how well it would do in a real emulator, but I have hopes. Real CPU instruction sets involve more complex instruction decoding than my toy, and the trace compiler is able to hide that.

Update: I thought of a couple more simple optimizations overnight, plus Jesse suggested one, so I implemented them all in about 15 minutes. Developing compilers with an edit/reload cycle is fun!

  • Jesse suggested using "new Function(code)" instead of eval(). Total time now 1.05 seconds.
  • I wrapped the body of each function in "while (true)". Then, wherever the trace sets the new PC to the PC at the start of the trace (i.e., it's a simple loop), we can replace that with "continue;" to actually loop inside our trace function without returning to the trace dispatch loop. Total time now 0.8 seconds.
  • I turned the "memory" and "regs" arrays into JS typed arrays (using feature detection to fall back for older browsers). Total time now 0.65 seconds.
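
The last two optimizations might look something like this in outline. The helper names `makeWordArray` and `wrapLoopingTrace`, and the choice of `Int32Array`, are assumptions made for the example; the cycle-counter handling is left out to keep it short.

```javascript
// Feature-detect typed arrays, falling back to a plain zero-filled array.
function makeWordArray(length) {
  if (typeof Int32Array !== "undefined") return new Int32Array(length);
  const fallback = new Array(length);
  for (let i = 0; i < length; i++) fallback[i] = 0;
  return fallback;
}

// Wrap a trace body in "while (true)". An exit that targets the trace's own
// start becomes "continue;", so a simple loop never leaves the function.
function wrapLoopingTrace(bodyLines, startPc, condExpr, takenPc, fallthroughPc) {
  const jump = (pc) => (pc === startPc ? "continue;" : `return ${pc};`);
  const src = [
    "while (true) {",
    ...bodyLines.map((line) => "  " + line),
    `  if (${condExpr}) { ${jump(takenPc)} } else { ${jump(fallthroughPc)} }`,
    "}",
  ].join("\n");
  return new Function("regs", src);
}

// Example: a countdown loop at PC 2 that branches back to itself.
const regs = makeWordArray(4);
regs[1] = 1000;
const loop = wrapLoopingTrace(
  ["regs[0] = (regs[0] + regs[1]) | 0;", "regs[1] = (regs[1] - 1) | 0;"],
  2,                // address of the trace
  "regs[1] !== 0",  // branch condition
  2,                // taken target: the trace itself, so emit "continue;"
  5                 // fallthrough target: return to the dispatch loop
);
const nextPc = loop(regs);
```

Here the whole loop runs inside one call to `loop`, and the dispatch loop is only re-entered when the branch finally falls through.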

So performance about doubled, now we're at 150MIPS. That's enough for one day!

(For perspective, the maximum clock of my laptop CPU is about 3GHz, so we're executing one emulated instruction every 20 native processor cycles. That's amazing!)

Oh, and just for kicks, totally unscientific benchmarks of other browsers. IE9 beta 1: interpreter 7.3 seconds, trace compiler 2.8 seconds. Chrome 7.0.517.41: interpreter 6.7 seconds, trace compiler 1.6 seconds. Neither of those browsers supports typed arrays. I should note that typed arrays sped up the interpreter a lot in Firefox too; it now takes about 3.3 seconds.