Skip navigation

batchbuffer at 0x08ac0000:
0x08ac0000: HEAD 0xff808698: UNKNOWN
0x08ac0004: 0xff808698: UNKNOWN
0x08ac0008: 0xff808698: UNKNOWN
0x08ac000c: 0xff808698: UNKNOWN
0x08ac0010: 0xff808698: UNKNOWN
0x08ac0014: 0xff808698: UNKNOWN
0x08ac0018: 0xff808698: UNKNOWN
0x08ac001c: 0xff808698: UNKNOWN
0x08ac0020: 0xff808698: UNKNOWN
0x08ac0024: 0xff808698: UNKNOWN
0x08ac0028: 0x00484c58: MI_NOOP

Last instruction on ringbuffer:
0x007da9f8: 0x18800080: MI_BATCH_BUFFER_START
0x007da9fc: 0x08ac0001: dword 1
0x007daa00: 0x02000004: MI_FLUSH
0x007daa04: 0x00000000: MI_NOOP

We tried to execute an image surface? How did that happen? Time to go shopping.

So how do we diagnose bugs in the drivers when they all result in either a system crash or just a frozen display?

If it is a GPU hang (check dmesg or Xorg.log for a telltale error message), then the debugging process is gradually getting easier. First the kernel detects that the GPU has stopped processing data. It does this with a simple watch dog that polls the GPU for any activity since the last check. If the active head pointer has not advanced in the last two checks, we declare it hung and attempt to reset the GPU. Currently the reset procedure only works for the more recent chips, i965 and above, but in principle we might be able to recover at least the i915 generation as well. Importantly, just before we reset the GPU state, we record what instructions the GPU was executing at the time of the hang and some associated state. At present, this involves scanning the ringbuffer and list of known buffers for the last batch buffer executed and for wherever the GPU ended up. Those buffers are then copied, preserving them for later inspection, and the reset procedure is begun.

Some time afterwards it becomes apparent that the system is misbehaving. Unfortunately this has a tendency to be a frozen X server or even a crash, though the drivers are beginning to grow a little more resilient to their own bugs. Once noticed, we can copy the error report from /sys/kernel/debug/dri/0/i915_error_state (you might have to run “mount -tdebugfs none /sys/kernel/debug” to gain access to the debug information exported by your kernel, your distribution may even automate this by listening for the whimpers from the kernel) and begin a diagnosis.

Lets turn to a real example that I am currently puzzling over.

Running the i915_error_state through intel_error_decode (from the intel-gpu-tools package) prints:

Time: 1278512451 s 982711 us
PCI ID: 0x27a2
EIR: 0x00000000
PGTBL_ER: 0x00000000
INSTPM: 0x00000000
IPEIR: 0x00000000
IPEHR: 0x02000011
INSTDONE: 0x7ffffa41
busy: Bypass FIFO
busy: Color calculator
busy: Map filter
ACTHD: 0x0de88144
seqno: 0x0004d8fe
Buffers [40]:
0e33e000 16384 00000060 00000000 0004d8ff dirty purgeable
09747000 4096 00000003 00000000 0004d8ff
0de88000 16384 00000048 00000000 0004d8ff dirty purgeable
03dc4000 4096 00000006 00000000 0004d900 dirty purgeable
0e20a000 393216 00000002 00000000 0004d900 dirty purgeable
0e3a0000 16384 00000006 00000000 0004d900 dirty purgeable
01800000 8388608 00000002 00000000 0004d900 P X dirty (name: 64) (fence: 2)
01000000 8388608 00000006 00000000 0004d901 P X dirty (name: 1) (fence: 0)
0e360000 16384 00000060 00000000 0004d901 dirty purgeable
03dc5000 4096 00000002 00000002 0004d901 dirty purgeable
0e374000 16384 00000048 00000000 0004d901 dirty purgeable
03e00000 4096 00000002 00000002 0004d902 dirty
03dff000 4096 00000044 00000000 0004d902 dirty
07a74000 16384 00000060 00000000 0004d902 dirty purgeable
0e358000 16384 00000048 00000000 0004d902 dirty purgeable
0e335000 4096 00000045 00000000 0004d903 dirty
0e336000 4096 00000045 00000000 0004d903 dirty
03dc6000 4096 00000006 00000002 0004d903 dirty
0e35c000 16384 00000060 00000000 0004d903 dirty purgeable
03dc7000 229376 00000002 00000002 0004d903 dirty
0e3e3000 16384 00000048 00000000 0004d903 dirty purgeable
097ce000 229376 00000002 00000002 0004d904 dirty
0e37c000 16384 00000060 00000000 0004d904 dirty
0e388000 16384 00000048 00000000 0004d904 dirty
04800000 8388608 00000002 00000002 0004d905 P X dirty (name: 3) (fence: 6)
04000000 8388608 00000005 00000000 0004d905 X (name: 10)
07b81000 32768 00000060 00000000 0004d905 dirty purgeable
03548000 327680 00000006 00000000 0004d905 dirty (name: 8)
02193000 163840 00000006 00000000 0004d905 dirty (name: 12)
021bb000 163840 00000006 00000000 0004d905 dirty (name: 15)
03598000 196608 00000006 00000000 0004d905 dirty (name: 18)
0a000000 8388608 00000005 00000000 0004d905 X (name: 19)
0be87000 327680 00000006 00000000 0004d905 dirty (name: 63)
0be4f000 114688 00000006 00000000 0004d905 dirty (name: 79)
0be6b000 114688 00000006 00000000 0004d905 dirty (name: 82)
0f000000 8388608 00000005 00000000 0004d905 X (name: 86)
0fe86000 327680 00000006 00000000 0004d905 dirty (name: 49)
0fe36000 163840 00000006 00000000 0004d905 dirty (name: 52)
0fe5e000 163840 00000006 00000000 0004d905 dirty (name: 55)
07bdf000 16384 00000048 00000000 0004d905 dirty purgeable
batchbuffer at 0x0de88000:
0x0de88000: 0x66014140: 3DSTATE_ANTI_ALIASING
0x0de88004: 0x6ba008a1: 3DSTATE_INDEPENDENT_ALPHA_BLEND
0x0de88008: 0x7d990000: 3DSTATE_DEFAULT_DIFFUSE
0x0de8800c: 0x00000000: dword 1
0x0de88010: 0x7d9a0000: 3DSTATE_DEFAULT_SPECULAR
0x0de88014: 0x00000000: dword 1
0x0de88018: 0x7d980000: 3DSTATE_DEFAULT_Z
0x0de8801c: 0x00000000: dword 1
0x0de88020: 0x76fac688: 3DSTATE_COORD_SET_BINDINGS
0x0de88024: 0x6700a770: 3DSTATE_RASTERIZATION_RULES
0x0de88028: 0x6db3ffff: 3DSTATE_MODES_4
0x0de8802c: 0x7d040382: 3DSTATE_LOAD_STATE_IMMEDIATE_1
0x0de88030: 0x00000000: S3
0x0de88034: 0x009020c0: S4
0x0de88038: 0x00000000: S5
0x0de8803c: 0x7c800002: 3DSTATE_SCISSOR_ENABLE
0x0de88040: 0x7d810001: 3DSTATE_SCISSOR_RECTANGLE
0x0de88044: 0x00000000: dword 1
0x0de88048: 0x00000000: dword 2
0x0de8804c: 0x7c880002: 3DSTATE_DEPTH_SUBRECTANGLE_DISABLE
0x0de88050: 0x7d070000: 3DSTATE_LOAD_INDIRECT
0x0de88054: 0x00000000: MI_NOOP
0x0de88058: 0x7d830000: 3DSTATE_SPAN_STIPPLE
0x0de8805c: 0x00000000: dword 1
0x0de88060: 0x68000002: 3DSTATE_BACKFACE_STENCIL_OPS
0x0de88064: 0x00000000: MI_NOOP
0x0de88068: 0x7d000003: 3DSTATE_MAP_STATE
0x0de8806c: 0x00000001: mask
0x0de88070: 0x01000000: map 0 MS2
0x0de88074: 0x63f3fd92: map 0 MS3 [width=1280, height=800, tiling=X]
0x0de88078: 0xffe00000: map 0 MS4 [pitch=8192]
0x0de8807c: 0x7d010003: 3DSTATE_SAMPLER_STATE
0x0de88080: 0x00000001: mask
0x0de88084: 0x00000000: sampler 0 SS2
0x0de88088: 0x00004820: sampler 0 SS3
0x0de8808c: 0x00000000: sampler 0 SS4
0x0de88090: 0x7d8e0001: 3DSTATE_BUFFER_INFO
0x0de88094: 0x03000040: dword 1
0x0de88098: 0x03dc5000: dword 2
0x0de8809c: 0x7d850000: 3DSTATE_DEST_BUFFER_VARIABLES
0x0de880a0: 0x00880300: a8r8g8b8 format, early Z disabled
0x0de880a4: 0x7d800003: 3DSTATE_DRAWING_RECTANGLE
0x0de880a8: 0x00000000: dword 1
0x0de880ac: 0x00000000: dword 2
0x0de880b0: 0x00000000: dword 3
0x0de880b4: 0x00000000: dword 4
0x0de880b8: 0x7d040441: 3DSTATE_LOAD_STATE_IMMEDIATE_1
0x0de880bc: 0xfffffff0: S2
0x0de880c0: 0x00008214: S6
0x0de880c4: 0x7d050008: 3DSTATE_PIXEL_SHADER_PROGRAM
0x0de880c8: 0x19083c00: PS000: DCL T0.xyzw
0x0de880cc: 0x00000000: PS000
0x0de880d0: 0x00000000: PS000
0x0de880d4: 0x19180000: PS001: DCL S0 2D
0x0de880d8: 0x00000000: PS001
0x0de880dc: 0x00000000: PS001
0x0de880e0: 0x15200000: PS002: TEXLD oC, S0, T0
0x0de880e4: 0x01000000: PS002
0x0de880e8: 0x00000000: PS002
0x0de880ec: 0x7d040031: 3DSTATE_LOAD_STATE_IMMEDIATE_1
0x0de880f0: 0x0e33e000: S0
0x0de880f4: 0x04040000: S1
0x0de880f8: 0x7f9c0003: 3DPRIMITIVE sequential indirect RECTLIST, 3 starting from 0
0x0de880fc: 0x00000000: start
0x0de88100: 0x54f00006: XY_SRC_COPY_BLT (rgb enabled, alpha enabled, src tile 0, dst tile 0)
0x0de88104: 0x03cc0080: format 8888, dst pitch 128, clipping disabled
0x0de88108: 0x00000000: dst (0,0)
0x0de8810c: 0x00160016: dst (22,22)
0x0de88110: 0x03dc6000: dst offset 0x03dc6000
0x0de88114: 0x00090396: src (918,9)
0x0de88118: 0x00000040: src pitch 64
0x0de8811c: 0x03dc5000: src offset 0x03dc5000
0x0de88120: 0x54f00006: XY_SRC_COPY_BLT (rgb enabled, alpha enabled, src tile 0, dst tile 0)
0x0de88124: 0x03cc0080: format 8888, dst pitch 128, clipping disabled
0x0de88128: 0x00000000: dst (0,0)
0x0de8812c: 0x00160016: dst (22,22)
0x0de88130: 0x03dc6000: dst offset 0x03dc6000
0x0de88134: 0x00000000: src (0,0)
0x0de88138: 0x00000080: src pitch 128
0x0de8813c: 0x09747000: src offset 0x09747000
0x0de88140: 0x02000011: MI_FLUSH
0x0de88144: HEAD 0x05000000: MI_BATCH_BUFFER_END

The real question is what exactly that tells us and do we get any closer to finding the bug?
As can be seen, not a lot happens in this batch buffer. In fact just 3 operations. One copy using the 3D pipeline, with a lot of preamble, and two copies using the 2D blitter. If you look even closer, you will go mad.

Operation 1: 3D copy.
Destination: linear buffer 03dc5000, size 4096 bytes, pitch 64 bytes, drawing rectangle (0,0), (1, 1)
Source: X tiled buffer 01000000 size 8388608 bytes, pitch 8192 bytes, 1280×800 pixels. This looks like X’s front buffer.
What is actually copied to the 1×1 pixel is a mystery as we do not follow auxiliary buffers yet and so can not decode the vertices used. Presumably it is something a little more complicated than a mere blit as a texture sampler is required – though this may just be something as trivial as ensuring the alpha channel is initialized.

Operation 2: 2D copy.
Destination: linear buffer 03dc6000, size 4096 bytes, pitch 128 bytes.
Source: linear buffer 03dc5000, size 4096 bytes, pitch 64. From above we know its size to be 1×1 pixels.
Copy: (918, 9), (940, 31) to (0, 0), (22, 22).

Operation 3: 2D copy.
Destination: linear buffer 03dc6000, size 4096 bytes, pitch 128 bytes.
Source: linear buffer 09747000, size 4096 bytes, pitch 128 bytes.
Copy: (0, 0), (22, 22) to (0, 0), (22, 22).

So which operation appears entirely superfluous and yet accessed data far beyond the end of its buffer? That’s right the second copy. For a 1×1 surface, attempting to copy a 22×22 region is a bit strange, doing it 900 pixels outside the surface is just adding insult to injury. Not to mention that the user did not even want the data that we spent so much time fetching!

But where does that leave me? The impossible happened and now I have evidence to prove it! Oh, and a loathing for graphics libraries…

gstreamer + cairo == awesome

For the last few days, I’ve been listening to the gstreamer devs get excited about cairo, simply because it is awesome! Well actually because over the last couple of years the drivers have developed sufficiently that the abstraction offered by cairo solves several of the problems that the gstreamer developers have had to previously hacked around. This means that they get to dump all of their multi-threaded decoding and rendering problems on us, which makes them very happy.

In return for the extra work, we promise to make it fast as well. At the base we will be [eventually] including a jit into pixman, so that we can generate optimised code at runtime for the more obscure colorspace conversions without bloating the library by compiling thousands upon thousands of pre-computed, even hand-rolled, fast paths. But pixman is not the interface we want to present to applications, so we have an abstraction layer over that which allows applications to target GPUs, DSPs or the CPU with nary a care in the word – this is cairo. From the discussions we had, it emerged that to resolve many of the locking issues and general thread-safety concerns within gstreamer, required exposing some of the support cairo uses internally to reach its own thread-safety guarantees. However, not all backends are created equal and we will also need to fixes to the underlying libraries. For X11, this means we need to switch to xcb and for GL we need a new GLX extension to permit more efficient multi-threading.

Beyond that Benjamin is pressing for an aggressive timeline, cairo-1.10 in January so that we can make the next round of stable distributions. As my current development branch has over 40k lines of changes, with much more work required, this is quite a challenging schedule. Compared to that adding the extra interfaces and fixing the inevitable bugs is a tiny amount of work!

Just an update incorporating suggestions into the presentation of the graphs – the principal change being to trim outliers from the relative performance graph. Let me know what you think.
cairo-perf-chart-absolute
cairo-perf-chart-relative

[graphs uploaded outside of wordpess to avoid rescaling]

As nobody appreciated the relative factors as a performance metric, here are the absolute times for tiny.
cairo-perf-chart
cairo-perf-chart-relative

image xlib xvfb gl drm
evolution 69.9 74.8 186.4 146.3 103.6
firefox-planet-gnome 217.5 170.8 295.1 359.9 100.1
firefox-talos-gfx 342.1 193.5 305.1 645.3 48.9
firefox-talos-svg 375.3 684.8 621.1 709.7 876.1
gnome-system-monitor 39.6 149.6 64.5 348.8 19.3
gvim 167.8 90.5 295.9 262.0 148.9
poppler 41.5 69.8 50.4 138.9 14.4
swfdec-giant-steps 20.7 25.2 28.1 54.7 13.3
swfdec-youtube 41.3 39.5 52.4 46.0 11.8
vim 97.5 115.6 93.9 160.3 20.8

Times are in seconds.

[image] 1.9.2-564-g1b24626.image.tiny
[xlib] 1.9.2-564-g1b24626.xlib.tiny
[xvfb] 1.9.2-564-g1b24626.xvfb.tiny
[gl] 1.9.2-564-g1b24626.gl.tiny
[drm] 1.9.2-564-g1b24626.drm.tiny

My first hands-on with an i965. Running from a live image as I haven’t got a spare hard-drive to dual-boot yet. Aside from that and that cairo-drm on an i965 is completely unaccelerated (so the overhead of this backend is entirely down to buffer management). Caveat lector.
cairo-perf-chart

image drm xlib
evolution 0.0 -497.9 -594.1
firefox-planet-gnome 0.0 -157.0 -98.7
firefox-talos-gfx 0.0 -6.1 -139.7
firefox-talos-svg 0.0 -12.0 -105.4
gnome-system-monitor 0.0 -7.9 -9.4
gnome-terminal-vim 0.0 -11.2 -579.6
gvim 0.0 -658.0 -583.7
poppler 0.0 -0.2 -579.0
swfdec-giant-steps 0.0 -13.1 -169.3
swfdec-youtube 0.0 -4.5 -39.1

[image] 1.9.2-457-g3bc00af.image.karmic-alpha4
[drm] 1.9.2-457-g3bc00af.drm.karmic-alpha4
[xlib] 1.9.2-457-g3bc00af.xlib.karmic-alpha4

Hmm, I think there is room for improvement here.

Or, “why we don’t use glitz anymore”. Adding a cairo-glitz run on tiny (an i915):
cairo-perf-chart

image xlib drm gl glitz
epiphany-20090810 0.0 -106.5 -19.8 -235.8 -741.1
evolution-20090607 0.0 -100.3 -130.6 -440.6 -4150.4
evolution-20090618 0.0 -61.9 -69.4 -380.3
firefox-20090601 0.0 -103.5
firefox-periodic-table 0.0 -92.3 20.6 -228.6 -927.3
firefox-talos-gfx-20090702 0.0 8.3 381.6 -207.1 -230.5
firefox-world-map 0.0 -186.2 27.0 -51.1 -259.6
gnome-terminal-20090601 0.0 -29.7 207.9 -287.3 -262.0
gnome-terminal-20090728 0.0 60.0 406.7 -70.5 -129.4
poppler-20090811 0.0 -70.2 132.4 -344.2 -322.4
poppler-bug-12266 0.0 -101.6 59.0 -146.5
swfdec-fill-rate 0.0 -112.9 36.3 30.1
swfdec-fill-rate-2xaa 0.0 -69.3 345.1 45.7 -5498.1
swfdec-fill-rate-4xaa 0.0 -244.4 0.6 -3.2
swfdec-giant-steps 0.0 -41.7 -66.6 -260.8 -275.6
swfdec-youtube 0.0 12.1 192.9 7.6 -81.4

[image] 1.9.2-505-g2e9cad3.tiny
[xlib] 1.9.2-505-g2e9cad3.xlib.tiny
[drm] 1.9.2-525-g8c7de80.drm.tiny
[gl] 1.9.2-525-g8c7de80.gl.tiny
[glitz] 1.9.2-564-g1b24626.glitz.tiny

And some performance results for a really old machine, this must be almost 5 years old. 😉
cairo-perf-chart

image xlib
evolution 0.0 -179.1
firefox-planet-gnome 0.0 -67.9
firefox-talos-gfx 0.0 9.1
firefox-talos-svg 0.0 -376.1
gnome-system-monitor 0.0 -112.3
gvim 0.0 140.1
poppler 0.0 -84.3
swfdec-giant-steps 0.0 -151.3
swfdec-youtube 0.0 -88.6
vim 0.0 -12.1

[image] 1.9.2-560-g221285f.image.inspired
[xlib] 1.9.2-560-g221285f.xlib.inspired

Another snapshot of performance this time from minime (a G4 mini-mac with a Radeon 200 GPU):
cairo-perf-chart

image xlib gl
epiphany-20090810 0.0 -94.1 -1777.1
evolution-20090607 0.0 -704.5 -4270.2
evolution-20090618 0.0 -146.2 -3470.6
firefox-20090601 0.0 -142.3 -1326.4
firefox-periodic-table 0.0 -26.2 -819.2
firefox-talos-gfx-20090702 0.0 32.7 -358.7
firefox-talos-svg-20090702 0.0 -49.7 -280.5
firefox-world-map 0.0 -107.9 -1138.3
gnome-terminal-20090601 0.0 20.9 -1374.8
gnome-terminal-20090728 0.0 39.8 -1941.1
poppler-20090811 0.0 -10.7 -1192.6
poppler-bug-12266 0.0 -30.3 -416.5
swfdec-fill-rate 0.0 -50.2 -843.3
swfdec-fill-rate-2xaa 0.0 -45.7 -1569.2
swfdec-fill-rate-4xaa 0.0 -84.0 -1724.1
swfdec-giant-steps 0.0 -18.3 -3232.6
swfdec-youtube 0.0 -14.0 -1100.5

[image] 1.9.2-507-g0136989.image.minime
[xlib] 1.9.2-507-g0136989.xlib.minime
[gl] 1.9.2-507-g0136989.gl.minime

Oh dear, the performance is truly dire.

This time comparing different backends using the ‘fast’ wip/stroke-to-path branch:
cairo-perf-chart

image xlib drm gl
epiphany-20090810 0.0 -106.5 -19.8 -235.8
evolution-20090607 0.0 -100.3 -130.6 -440.6
evolution-20090618 0.0 -61.9 -69.4 -380.3
firefox-20090601 0.0 -103.5
firefox-periodic-table 0.0 -92.3 20.6 -228.6
firefox-talos-gfx-20090702 0.0 8.3 381.6 -207.1
firefox-world-map 0.0 -186.2 27.0 -51.1
gnome-terminal-20090601 0.0 -29.7 207.9 -287.3
gnome-terminal-20090728 0.0 60.0 406.7 -70.5
poppler-20090811 0.0 -70.2 132.4 -344.2
poppler-bug-12266 0.0 -101.6 59.0
swfdec-fill-rate 0.0 -112.9 36.3 30.1
swfdec-fill-rate-2xaa 0.0 -69.3 345.1 45.7
swfdec-fill-rate-4xaa 0.0 -244.4 0.6 -3.2
swfdec-giant-steps 0.0 -41.7 -66.6 -260.8
swfdec-youtube 0.0 12.1 192.9 7.6

[image] 1.9.2-505-g2e9cad3.tiny
[xlib] 1.9.2-505-g2e9cad3.xlib.tiny
[drm] 1.9.2-525-g8c7de80.drm.tiny
[gl] 1.9.2-525-g8c7de80.gl.tiny

As always there is more work to do.

Design a site like this with WordPress.com
Get started