Or: Why XMir is slower than X, and how we'll fix it
XMir has had a fair bit of testing now, and that has turned up plenty of bugs and plenty of missing functionality.
One of the bugs that people have noticed is a 10-20% performance drop over raw X. This is really several bits of missing functionality - we're doing a lot more work than we need to be. Oddly enough, people have also been mentioning that it feels "smoother" - which might be placebo, or unrelated updates, or might be to do with something in the Mir/XMir stack. It's hard to tell; it's hard to measure "smoother". We're not faster, but faster is not the same as smoother.
Currently we do a lot of work to get rendering from an X client onto the screen, most of which we can make unnecessary.
The simple bit
The simple part is composite bypass support for Mir. Most of the time unity-system-compositor does not need to do any compositing - there's just a single full-screen XMir window, and Mir just needs to flip that to the display. Bypass cuts out an unnecessary fullscreen blit, and it is in progress.
The complicated part is in XMir itself
The fundamental problem is the mismatch between rendering models - X wants the contents of buffers to be persistent; Mir has a GLish new-buffer-each-frame model. This means that each time XMir gets a new buffer from Mir it needs to blit the previous frame onto it first, and can't simply render straight to Mir's buffer. Now, we can (but don't yet) reduce the size of this blit by tracking what's changed since XMir last saw the buffer - and a lot of the time that's going to be a lot smaller than fullscreen - but there's still some overhead¹.
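To make "tracking what's changed since XMir last saw the buffer" a bit more concrete, here is a minimal sketch in Python. It is not XMir code: `Rect`, `DamageTracker`, and the buffer ids are hypothetical names made up for illustration, and it assumes a coarse bounding box is good enough where a real implementation would more likely use a proper region type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; x2 and y2 are exclusive. Hypothetical helper type."""
    x1: int
    y1: int
    x2: int
    y2: int

    def union(self, other):
        # Bounding box of both rectangles (deliberately coarse, but simple).
        return Rect(min(self.x1, other.x1), min(self.y1, other.y1),
                    max(self.x2, other.x2), max(self.y2, other.y2))


class DamageTracker:
    """For each buffer, remember the area that has changed since we last filled it."""

    def __init__(self, buffer_ids, screen):
        # A buffer we have never seen holds nothing useful, so all of it is stale.
        self.stale = {b: screen for b in buffer_ids}

    def add_damage(self, rect):
        # X rendered into `rect`, so every buffer is now out of date there.
        for b, old in self.stale.items():
            self.stale[b] = rect if old is None else old.union(rect)

    def acquire(self, buffer_id):
        # Return the area to blit from the previous frame (None means nothing),
        # and mark the buffer as up to date.
        region = self.stale[buffer_id]
        self.stale[buffer_id] = None
        return region
```

With two buffers in rotation, the first `acquire` of each buffer still costs a fullscreen blit, but after that a buffer only needs the union of the damage from the frames it missed.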
Fortunately, there's a way around this. GLX matches Mir's buffer semantics nicely - each time a client calls SwapBuffers it gets a shiny new backbuffer to render into. So, rather like Compiz's unredirect-fullscreen-windows option, if we've got a fullscreen² GLX window we can hand the buffer received from Mir directly to the client and avoid the copy.
Even better, this doesn't apply only to fullscreen games - GNOME Shell, KWin, and Unity are all fullscreen GLX applications.
As always, there are interesting complications - applications can draw on their GL window with X calls, and applications can try to be fancy and only update a part of their frontbuffer rather than calling SwapBuffers; in either case we can't bypass. Unity does neither, but Shell and KWin might.
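Put together, the conditions for handing Mir's buffer straight to a GLX client look roughly like the predicate below. This is only a sketch of the decision as described here: `WindowState` and `can_bypass` are hypothetical names, and how XMir would actually detect X drawing or partial frontbuffer updates is not something this post specifies.

```python
from dataclasses import dataclass


@dataclass
class WindowState:
    """Hypothetical summary of what we know about a client window."""
    is_glx: bool
    width: int
    height: int
    drawn_with_x_calls: bool    # client used X rendering on its GL window
    partial_front_update: bool  # client updated part of the frontbuffer instead of swapping


def can_bypass(win, mir_width, mir_height):
    """True if the Mir buffer could be handed directly to the client."""
    if not win.is_glx:
        return False
    # The window must exactly cover the Mir buffer (today: fullscreen).
    if (win.width, win.height) != (mir_width, mir_height):
        return False
    # Either of these means the buffer contents must persist, so we have to copy.
    if win.drawn_with_x_calls or win.partial_front_update:
        return False
    return True
```

A client like Unity, which does neither of the awkward things, would pass; a client that touches its window with X calls falls back to the copy path.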
Enter the cursor
In addition to the two unnecessary fullscreen blits - X root window to Mir buffer, Mir buffer to framebuffer - XMir currently uses X's software cursor code. This causes two problems. Firstly, it means we're doing X11 drawing on top of whatever's underneath, so we can't do the SwapBuffers trick. Secondly, it causes a software fallback whenever you move the cursor, making the driver download the root window into CPU-accessible memory, do some CPU twiddling, and then upload it again to GPU memory. This is bad, but not terrible, for Intel chips, where the GPU and CPU share the same memory but with different caches and layouts. It's terrible for cards with discrete memory. Both of these problems go away once we support setting the HW cursor image in Mir.
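For a feel of why the software fallback hurts, here is a back-of-the-envelope calculation. It is an illustration, not a measurement: the function names are made up, and it assumes the whole root window is transferred once in each direction per cursor update at 4 bytes per pixel, which a real driver may or may not do.

```python
def bytes_per_fullscreen_copy(width, height, bytes_per_pixel=4):
    """Size in bytes of one full copy of the root window."""
    return width * height * bytes_per_pixel


def software_cursor_traffic(width, height, cursor_updates_per_second, bytes_per_pixel=4):
    """Bytes moved per second: one download plus one upload per cursor update."""
    per_copy = bytes_per_fullscreen_copy(width, height, bytes_per_pixel)
    return 2 * per_copy * cursor_updates_per_second
```

On an assumed 1920x1080 screen, one copy is 8,294,400 bytes, so 60 cursor updates a second would move 995,328,000 bytes per second between GPU and CPU memory, just to draw a pointer.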
Once those three pieces land there shouldn't be a meaningful performance difference between XMir-on-Mir and X-on-the-hardware.
¹: If we implemented a single-buffer scheme in Mir we could get rid of this entirely at the cost of either losing vsync or blocking X rendering until vsync. That's probably not a good tradeoff.
²: Technically, if we've got a GLX client whose size matches that of the underlying Mir buffer. For the moment, that means "fullscreen", but when we do rootless XMir for 14.04 all windows will be backed by a Mir buffer of the same size.