{"id":2214,"date":"2022-03-01T05:04:25","date_gmt":"2022-03-01T05:04:25","guid":{"rendered":"https:\/\/cml-a.com\/content\/?p=2214"},"modified":"2022-03-01T05:04:25","modified_gmt":"2022-03-01T05:04:25","slug":"is-setpixel-still-slow","status":"publish","type":"post","link":"https:\/\/cml-a.com\/content\/2022\/03\/01\/is-setpixel-still-slow\/","title":{"rendered":"Is SetPixel still slow?"},"content":{"rendered":"\n<p>Answer: mostly, yes. Explanation below.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Part 1: Yes or No<\/h2>\n\n\n\n<p>Remember GDI? Say you're using GDI and Win32, and you want to draw some graphics to a window. What to do. You read the documentation and see what looks like the most obvious thing: \"<a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/api\/wingdi\/nf-wingdi-setpixel\">SetPixel<\/a>\". Sounds good. Takes an x and y and a color. What more could you want? Super easy to use.<\/p>\n\n\n\n<p>But then, you see a bunch of cautionary notes. \"It's slow.\" \"It's inefficient.\" \"Don't do it.\" <\/p>\n\n\n\n<p>Don't do it? <\/p>\n\n\n\n<p>Well. All these cautionary notes you see are from days of yore:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Computers are faster now. Both CPU and GPU. Take an early CS algorithms class, experiment with solutions. You\u2019ll see sometimes the biggest optimization you can do is to get a faster computer.<\/li><li>An earlier Windows graphics driver model. Say, <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3darticles\/graphics-apis-in-windows-vista#background\">XPDM not WDDM<\/a>. WDDM means all hardware-accelerated graphics communicate through a \u201cDirect3D-centric driver model\u201d, and yes that includes GDI. Changes in driver model can impose changes in performance characteristics.<\/li><li>Different Windows presentation model. That's something this API is set up to negotiate with, so it could affect performance too. Nowadays you're probably using DWM. 
<a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/dwm\/dwm-overview\">DWM was introduced with Windows Vista<\/a>.<\/li><\/ul>\n\n\n\n<p>The date stamps make you skeptical. Is that old advice still true?<\/p>\n\n\n\n<p>As a personal aside, I've definitely seen performance advice from people on dev forums that is super outdated, and people get misled into following it anyway. For example, for C++ code, the advice to \"manually turn your giant switch case into a jump table\". I see jump tables in my generated code after compilation... The advice was outdated because of how much compilers have improved. I've noticed a tendency to trust performance advice \"just in case\", without testing to see if it matters.<\/p>\n\n\n\n<p>Let us run some tests to see if SetPixel is still slow.<\/p>\n\n\n\n<p>I wrote a benchmark program to compare<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/api\/wingdi\/nf-wingdi-setpixel\">SetPixel<\/a>, plotting each pixel of a window sequentially one by one, against<\/li><li><a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/api\/wingdi\/nf-wingdi-setdibits\">SetDIBits<\/a>, where all pixels of a window are set from memory at once.<\/li><\/ul>\n\n\n\n<p>In each case the target is a top-level window, and the two modes are compared at the same window sizes. Each mode effectively clears the window. The window is cleared to a different color each time, so you have some confidence it\u2019s actually working.<\/p>\n\n\n\n<p>Timing uses good old <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/api\/profileapi\/nf-profileapi-queryperformancecounter\">QPC<\/a>. For the sizes of timespans involved, it was not necessary to get something more accurate. 
The timed interval includes all the GDI commands needed to see the clear on the target, so for SetDIBits that includes one extra <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/api\/wingdi\/nf-wingdi-bitblt\">BitBlt<\/a> from a memory bitmap to the target to keep things fair.<\/p>\n\n\n\n<p>The source code of this benchmark is <a href=\"https:\/\/github.com\/clandrew\/setpixelsperf\">here<\/a>. <\/p>\n\n\n\n<p>Here are the results<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"\"><tbody><tr><td>\n  <strong>Width<\/strong>\n  <\/td><td>\n  <strong>Height<\/strong>\n  <\/td><td>\n  <strong>Pixel Count<\/strong>\n  <\/td><td>\n  <strong>SetPixel<\/strong>\n  <\/td><td>\n  <strong>SetDIBits<\/strong>\n  <\/td><\/tr><tr><td>\n  1000\n  <\/td><td>\n  1000\n  <\/td><td>\n  1000000\n  <\/td><td>\n  4.96194\n  <\/td><td>\n  0.0048658\n  <\/td><\/tr><tr><td>\n  950\n  <\/td><td>\n  950\n  <\/td><td>\n  902500\n  <\/td><td>\n  4.7488\n  <\/td><td>\n  0.0042761\n  <\/td><\/tr><tr><td>\n  900\n  <\/td><td>\n  900\n  <\/td><td>\n  810000\n  <\/td><td>\n  4.22436\n  <\/td><td>\n  0.0038637\n  <\/td><\/tr><tr><td>\n  850\n  <\/td><td>\n  850\n  <\/td><td>\n  722500\n  <\/td><td>\n  3.71547\n  <\/td><td>\n  0.0034435\n  <\/td><\/tr><tr><td>\n  800\n  <\/td><td>\n  800\n  <\/td><td>\n  640000\n  <\/td><td>\n  3.34327\n  <\/td><td>\n  0.0030824\n  <\/td><\/tr><tr><td>\n  750\n  <\/td><td>\n  750\n  <\/td><td>\n  562500\n  <\/td><td>\n  2.92991\n  <\/td><td>\n  0.0026711\n  <\/td><\/tr><tr><td>\n  700\n  <\/td><td>\n  700\n  <\/td><td>\n  490000\n  <\/td><td>\n  2.56865\n  <\/td><td>\n  0.0023415\n  <\/td><\/tr><tr><td>\n  650\n  <\/td><td>\n  650\n  <\/td><td>\n  422500\n  <\/td><td>\n  2.21742\n  <\/td><td>\n  0.0022196\n  <\/td><\/tr><tr><td>\n  600\n  <\/td><td>\n  600\n  <\/td><td>\n  360000\n  <\/td><td>\n  1.83416\n  <\/td><td>\n  0.0017374\n  <\/td><\/tr><tr><td>\n  550\n  <\/td><td>\n  550\n  <\/td><td>\n  302500\n  <\/td><td>\n  1.57133\n  
<\/td><td>\n  0.0015125\n  <\/td><\/tr><tr><td>\n  500\n  <\/td><td>\n  500\n  <\/td><td>\n  250000\n  <\/td><td>\n  1.29894\n  <\/td><td>\n  0.001311\n  <\/td><\/tr><tr><td>\n  450\n  <\/td><td>\n  450\n  <\/td><td>\n  202500\n  <\/td><td>\n  1.05838\n  <\/td><td>\n  0.0010062\n  <\/td><\/tr><tr><td>\n  400\n  <\/td><td>\n  400\n  <\/td><td>\n  160000\n  <\/td><td>\n  0.826351\n  <\/td><td>\n  0.0009907\n  <\/td><\/tr><tr><td>\n  350\n  <\/td><td>\n  350\n  <\/td><td>\n  122500\n  <\/td><td>\n  0.641522\n  <\/td><td>\n  0.0006527\n  <\/td><\/tr><tr><td>\n  300\n  <\/td><td>\n  300\n  <\/td><td>\n  90000\n  <\/td><td>\n  0.467687\n  <\/td><td>\n  0.0004657\n  <\/td><\/tr><tr><td>\n  250\n  <\/td><td>\n  250\n  <\/td><td>\n  62500\n  <\/td><td>\n  0.327808\n  <\/td><td>\n  0.0003364\n  <\/td><\/tr><tr><td>\n  200\n  <\/td><td>\n  200\n  <\/td><td>\n  40000\n  <\/td><td>\n  0.21523\n  <\/td><td>\n  0.0002422\n  <\/td><\/tr><tr><td>\n  150\n  <\/td><td>\n  150\n  <\/td><td>\n  22500\n  <\/td><td>\n  0.118702\n  <\/td><td>\n  0.0001515\n  <\/td><\/tr><tr><td>\n  100\n  <\/td><td>\n  100\n  <\/td><td>\n  10000\n  <\/td><td>\n  0.0542065\n  <\/td><td>\n  9.37E-05\n  <\/td><\/tr><tr><td>\n  75\n  <\/td><td>\n  75\n  <\/td><td>\n  5625\n  <\/td><td>\n  0.0315026\n  <\/td><td>\n  0.000122\n  <\/td><\/tr><tr><td>\n  50\n  <\/td><td>\n  50\n  <\/td><td>\n  2500\n  <\/td><td>\n  0.0143235\n  <\/td><td>\n  6.17E-05\n  <\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Viewed as a graph:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/cml-a.com\/content\/wp-content\/uploads\/2022\/03\/image.png\" alt=\"\" class=\"wp-image-2215\" width=\"731\" height=\"656\"\/><\/figure>\n\n\n\n<p> Conclusion: yeah, SetDIBits is <strong>still<\/strong> way faster than SetPixel in general, in all cases.<\/p>\n\n\n\n<p>For small numbers of pixels, the difference doesn't matter\nas much. 
For setting lots of pixels, the difference is a lot: at 1000x1000, SetPixel took roughly a thousand times as long as SetDIBits (4.96194 versus 0.0048658).<\/p>\n\n\n\n<p>I tested this on an Intel Core i7-10700K, with both an NVIDIA\nGeForce 1070 and WARP, all with similar results.<\/p>\n\n\n\n<p>So the old advice is still true. Don't use SetPixel, especially if you\u2019re setting a lot of pixels. Use something else like SetDIBits instead.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Part 2: Why<\/h2>\n\n\n\n<p>My benchmark told me that it\u2019s still slow, but the next question I had was \u2018why\u2019. I took a closer look and did some more thinking about why it could be.<\/p>\n\n\n\n<p>It's not one reason. There are multiple reasons.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. There's no DDI for SetPixel.<\/h3>\n\n\n\n<p>You can take a look through the public documentation for display device driver interfaces (DDIs), and see what\u2019s there. Or, take a stab at it and use the Windows Driver Kit and the provided documentation to write a display driver yourself. Either way, you\u2019ll see various blit-related functions in winddi.h. For example, <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/api\/winddi\/nf-winddi-drvbitblt\">DrvBitBlt<\/a>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>BOOL DrvBitBlt(\n  [in, out]      SURFOBJ  *psoTrg,\n  [in, optional] SURFOBJ  *psoSrc,\n  [in, optional] SURFOBJ  *psoMask,\n  [in]           CLIPOBJ  *pco,\n  [in, optional] XLATEOBJ *pxlo,\n  [in]           RECTL    *prclTrg,\n  [in, optional] POINTL   *pptlSrc,\n  [in, optional] POINTL   *pptlMask,\n  [in, optional] BRUSHOBJ *pbo,\n  [in, optional] POINTL   *pptlBrush,\n  [in]           ROP4     rop4\n);<\/code><\/pre>\n\n\n\n<p>That said, you may also notice what\u2019s <strong>not<\/strong> there. In particular, there\u2019s no DDI for SetPixel. Nothing simple like that, which takes an x, y, and color. 
It\u2019s important to relate this to the diagrams in the \u201cGraphics APIs in Windows\u201d <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3darticles\/graphics-apis-in-windows-vista#background\">article<\/a>, which show that GDI talks to the driver for both XPDM and WDDM. Together, these mean that every time you call SetPixel, what the driver sees is actually far richer than an x, y, and color. It would get told about a brush, a mask, a clip. It\u2019s easy to imagine a cost to formulating all of those, since you don\u2019t specify them at the API level and the API is structured so they can be arbitrary.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2. Cost of talking to the presentation model<\/h3>\n\n\n\n<p>There\u2019s a maybe-interesting experiment you can do. Write a\nWin32 application with your usual WM_PAINT handler. Run the application. Hide\nthe window behind other windows, then reveal it once again. Does your paint\nhandler get called? To repaint the newly-revealed area? <strong>No<\/strong>, normally it\ndoesn\u2019t. <\/p>\n\n\n\n<p>So what that must logically mean is that Windows kept some\nkind of buffer, or copy of your window contents somewhere. Seems like a good\nidea if you think about it. Would you really want moving windows around to be\nexecuting everyone\u2019s paint handlers all the time, including yours? Probably\nnot. It\u2019s the good old perf-memory tradeoff in favor of perf, and it seems\nworth it.<\/p>\n\n\n\n<p>Given that you\u2019re drawing to an intermediate buffer, there\u2019s\nstill an extra step needed in copying this intermediate buffer to the final\ntarget. Which parts should be copied, and when? It seems wasteful to be copying\neverything all the time. To know what needs to get re-copied, logically there\nhas to be some notion of an <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/gdi\/the-update-region\">\u201cupdate\u201d\nregion<\/a>, or a \u201cdirty\u201d region. 
<\/p>\n\n\n\n<p>If you\u2019re an application, you might even want to\naggressively optimize and only paint the update region. Can you do that? At\nleast at one point, yes you could. The update region gets communicated to the application through WM_PAINT; see the\narticle \u201c<a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/gdi\/redrawing-in-the-update-region\">Redrawing\nin the Update Region<\/a>\u201d. There\u2019s a code example of clipping accordingly. Now,\nwhen I tried things out in my application I noticed that PAINTSTRUCT::rcPaint\nis always the full window, even in response to a small region invalidated with\nInvalidateRect, but the idea is at least formalized in the API there. <\/p>\n\n\n\n<p>Still, there\u2019s a cost to dealing with update regions. If you\nchange one pixel, that one-pixel area is part of the update region. Change the\npixel next to it, the region needs to be updated again. And so on. Could we\nhave gotten away with having a bigger, coarser update region? Maybe. You just\nnever know that at the time.<\/p>\n\n\n\n<p>If you had some way of pre-declaring which regions of the window you\u2019re going to change, (e.g., through a different API like BitBlt), then you wouldn\u2019t have this problem.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3. Advancements in the presentation model help, but not enough<\/h3>\n\n\n\n<p>In Windows, there is <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/dwm\/dwm-overview\">DWM<\/a>,\nthe Desktop Window Manager. This shipped with Windows Vista and brought about\nall kinds of performance improvements and opportunities for visual enhancements. <\/p>\n\n\n\n<p>Like the documentation says, DWM makes it possible to remove a\nlevel of indirection (copying) when drawing the contents of windows. <\/p>\n\n\n\n<p>But it doesn\u2019t negate the fact that there still is tracking of update regions, and all the costs associated with that.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4. 
Advancements in driver model help, but not enough<\/h3>\n\n\n\n<p>DWM and Direct3D, as components that talk to the driver\nthrough the 3D stack, have a notion of \u201cframes\u201d and a particular time at which\nwork is \u201cflushed\u201d to the GPU device. <\/p>\n\n\n\n<p>By contrast, GDI doesn\u2019t have the concept of \u201cframes\u201d or flushing anything. Closest thing would be the release of the GDI device context, but that\u2019s not strictly treated as a sign to flush. You can see it yourself in how your Win32 GDI applications are structured. You draw in response to <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/gdi\/wm-paint\">WM_PAINT<\/a>. Yes there is <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/api\/winuser\/nf-winuser-endpaint\">EndPaint<\/a>, but EndPaint doesn\u2019t flush your stuff. Try it if you want- comment out EndPaint. I tried it just to check and everything still works without it. <\/p>\n\n\n\n<p>Since there isn\u2019t a strict notion of \u201cflushing to the device\u201d, SetPixel pixels have to be dispatched basically immediately rather than batched up. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">5. 3D acceleration helps, but not enough<\/h3>\n\n\n\n<p>Nowadays, GDI blits are indeed 3D accelerated. <\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>They <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows-hardware\/drivers\/display\/display-hardware-acceleration-slider\">were<\/a>, <\/li><li>then they <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct2d\/comparing-direct2d-and-gdi#availability-of-hardware-acceleration\">weren\u2019t for Vista<\/a>, <\/li><li>then they <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows-hardware\/drivers\/display\/gdi-hardware-acceleration\">were again<\/a>.<\/li><\/ul>\n\n\n\n<p>I noticed this firsthand, too. 
Very lazy way to check: in the \u201cPerformance\u201d tab in Task Manager when I was testing my application, I saw little blips in the 3D queue. These coincided with activity in the SetPixel micro-benchmark.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/cml-a.com\/content\/wp-content\/uploads\/2022\/03\/image-1.png\" alt=\"\" class=\"wp-image-2216\"\/><\/figure>\n\n\n\n<p>Again, very lazy check. Good to know we are still\naccelerating these 2D blits, even as the graphics stack has advanced to a point\nof making 3D graphics a first-class citizen. Hardware acceleration is great for\na lot of things, like copying large amounts of memory around at once, applying\ncompression or decompression, or manipulating data in other ways that lend\nthemselves to data-parallelism. <\/p>\n\n\n\n<p>Unfortunately, literally none of that helps this scenario.\nParallelism? How? At a given time, the driver doesn\u2019t know if you\u2019re done\nplotting or what you will plot next or where. And it can\u2019t buffer up the\noperations and execute them together, because it, like Windows, doesn\u2019t know\nwhen you\u2019re done. Maybe it could use some heuristic.<\/p>\n\n\n\n<p>But that brings this to the punchline: even if the driver had psychic powers, could see into the future to know exactly what the application is going to do, and did an absolutely perfect job of coalescing neighboring blits together, that wouldn\u2019t negate any of the above costs, especially 1. and 2. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Even in the current year, don\u2019t use SetPixel for more than a handful of pixels. There are reasons to believe the <strong>sources of the<\/strong> <strong>bottlenecks<\/strong> have changed over 30 years, yet the <strong>result<\/strong> is the same. 
It\u2019s slow and the old advice is still true.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Epilogue: some fantasy world<\/h2>\n\n\n\n<p>This post was about how things are. But, what could be? What would it take for SetPixel not to be slow? The most tempting way to think about this is to flatten or punch holes through the software stack. That works, even if it feels like a cop-out.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Answer: mostly, yes. Explanation below. Part 1: Yes or No Remember GDI? Say you&#8217;re using GDI and Win32, and you want to draw some graphics to a window. What to do. You read the documentation and see what looks like the most obvious thing: &#8220;SetPixel&#8221;. Sounds good. Takes an x and y and a color. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[178,222,224],"class_list":["post-2214","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-software-development-project","tag-win32","tag-windows"],"_links":{"self":[{"href":"https:\/\/cml-a.com\/content\/wp-json\/wp\/v2\/posts\/2214","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cml-a.com\/content\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cml-a.com\/content\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cml-a.com\/content\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cml-a.com\/content\/wp-json\/wp\/v2\/comments?post=2214"}],"version-history":[{"count":0,"href":"https:\/\/cml-a.com\/content\/wp-json\/wp\/v2\/posts\/2214\/revisions"}],"wp:attachment":[{"href":"https:\/\/cml-a.com\/content\/wp-json\/wp\/v2\/media?parent=2214"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cml-a.com\/content\/wp-json\/wp\/v2\/categories?post=2214"},{"taxonomy":"post_tag","embeddab
le":true,"href":"https:\/\/cml-a.com\/content\/wp-json\/wp\/v2\/tags?post=2214"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}