C# How to Improve Efficiency in Direct2D Drawing - c#
Good morning,
I have been teaching myself a bit of Direct2D programming in C#, utilizing native wrappers that are available (currently using d2dSharp, but have also tried SharpDX). I'm running into problems with efficiency, though, where the basic drawing Direct2D drawing methods are taking approximately 250 ms to draw 45,000 basic polygons. The performance I am seeing is on par, or even slower than, Windows GDI+. I'm hoping that someone can take a look at what I've done and propose a way(s) that I can dramatically improve the time it takes to draw.
The background to this is that I have a personal project in which I am developing a basic but functional CAD interface capable of performing a variety of tasks, including 2D finite element analysis. In order to make it at all useful, the interface needs to be able to display tens-of-thousands of primitive elements (polygons, circles, rectangles, points, arcs, etc.).
I initially wrote the drawing methods using Windows GDI+ (System.Drawing), and performance is pretty good until I reach about 3,000 elements on screen at any given time. The screen must be updated any time the user pans, zooms, draws new elements, deletes elements, moves, rotates, etc. Now, in order to improve efficiency, I utilize a quad tree data structure to store my elements, and I only draw elements that actually fall within the bounds of the control window. This helped significantly when zoomed in, but obviously, when fully zoomed out and displaying all elements, it makes no difference. I also use a timer and tick events to update the screen at the refresh rate (60 Hz), so I'm not trying to update thousands of times per second or on every mouse event.
This is my first time programming with DirectX and Direct2D, so I'm definitely learning here. That being said, I've spent days reviewing tutorials, examples, and forums, and could not find much that helped. I've tried a dozen different methods of drawing, pre-processing, multi-threading, etc. My code is below
Code to Loop Through and Draw Elements
List<IDrawingElement> elementsInBounds = GetElementsInDraftingWindow();
_d2dContainer.Target.BeginDraw();
_d2dContainer.Target.Clear(ColorD2D.FromKnown(Colors.White, 1));
if (elementsInBounds.Count > 0)
{
Stopwatch watch = new Stopwatch();
watch.Start();
#region Using Drawing Element DrawDX Method
foreach (IDrawingElement elem in elementsInBounds)
{
elem.DrawDX(ref _d2dContainer.Target, ref _d2dContainer.Factory, ZeroPoint, DrawingScale, _selectedElementBrush, _selectedElementPointBrush);
}
#endregion
watch.Stop();
double drawingTime = watch.ElapsedMilliseconds;
Console.WriteLine("DirectX drawing time = " + drawingTime);
watch.Reset();
watch.Start();
Matrix3x2 scale = Matrix3x2.Scale(new SizeFD2D((float)DrawingScale, (float)DrawingScale), new PointFD2D(0, 0));
Matrix3x2 translate = Matrix3x2.Translation((float)ZeroPoint.X, (float)ZeroPoint.Y);
_d2dContainer.Target.Transform = scale * translate;
watch.Stop();
double transformTime = watch.ElapsedMilliseconds;
Console.WriteLine("DirectX transform time = " + transformTime);
}
DrawDX Function for Polygon
public override void DrawDX(ref WindowRenderTarget rt, ref Direct2DFactory fac, Point zeroPoint, double drawingScale, SolidColorBrush selectedLineBrush, SolidColorBrush selectedPointBrush)
{
if (_pathGeometry == null)
{
CreatePathGeometry(ref fac);
}
float brushWidth = (float)(Layer.Width / (drawingScale));
brushWidth = (float)(brushWidth * 2);
if (Selected == false)
{
rt.DrawGeometry(Layer.Direct2DBrush, brushWidth, _pathGeometry);
//Note that _pathGeometry is a PathGeometry
}
else
{
rt.DrawGeometry(selectedLineBrush, brushWidth, _pathGeometry);
}
}
Code to Create Direct2D Factory & Render Target
private void CreateD2DResources(float dpiX, float dpiY)
{
Factory = Direct2DFactory.CreateFactory(FactoryType.SingleThreaded, DebugLevel.None, FactoryVersion.Auto);
RenderTargetProperties props = new RenderTargetProperties(
RenderTargetType.Default, new PixelFormat(DxgiFormat.B8G8R8A8_UNORM,
AlphaMode.Premultiplied), dpiX, dpiY, RenderTargetUsage.None, FeatureLevel.Default);
Target = Factory.CreateWindowRenderTarget(_targetPanel, PresentOptions.None, props);
Target.AntialiasMode = AntialiasMode.Aliased;
if (_selectionBoxLeftStrokeStyle != null)
{
_selectionBoxLeftStrokeStyle.Dispose();
}
_selectionBoxLeftStrokeStyle = Factory.CreateStrokeStyle(new StrokeStyleProperties1(LineCapStyle.Flat,
LineCapStyle.Flat, LineCapStyle.Flat, LineJoin.Bevel, 10, DashStyle.Dash, 0, StrokeTransformType.Normal), null);
}
I create a Direct2D factory and render target once and keep references to them at all times (that way I'm not recreating each time). I also create all of the brushes when the drawing layer (which describes color, width, etc.) is created. As such, I am not creating a new brush every time I draw, simply referencing a brush that already exists. Same with the geometry, as can be seen in the second code-snippet. I create the geometry once, and only update the geometry if the element itself is moved or rotated. Otherwise, I simply apply a transform to the render target after drawing.
Based on my stopwatches, the time taken to loop through and call the elem.DrawDX methods takes about 225-250 ms (for 45,000 polygons). The time taken to apply the transform is 0-1 ms, so it appears that the bottleneck is in the RenderTarget.DrawGeometry() function.
I've done the same tests with RenderTarget.DrawEllipse() or RenderTarget.DrawRectangle(), as I've read that using DrawGeometry is slower than DrawRectangle or DrawEllipse as the rectangle / ellipse geometry is known beforehand. However, in all of my tests, it hasn't mattered which draw function I use, the time for the same number of elements is always about equal.
I've tried building a multi-threaded Direct2D factory and running the draw functions through tasks, but that is much slower (about two times slower). The Direct2D methods appear to be utilizing my graphics card (hardware accelerated is enabled), as when I monitor my graphics card usage, it spikes when the screen is updating (my laptop has an NVIDIA Quadro mobile graphics card).
Apologies for the long-winded post. I hope this was enough background and description of things I've tried. Thanks in advance for any help!
Edit #1
So changed the code from iterating over a list using foreach to iterating over an array using for and that cut the drawing time down by half! I hadn't realized how much slower lists were than arrays (I knew there was some performance advantage, but didn't realize this much!). It still, however, takes 125 ms to draw. This is much better, but still not smooth. Any other suggestions?
Direct2D can be used with P/Invoke
See the sample "VB Direct2D Pixel Perfect Collision"
from https://social.msdn.microsoft.com/Forums/en-US/cea42526-4b82-454d-9d79-2e1d94083552/collisions?forum=vbgeneral
the animation is perfect, even done in VB
Related
Efficient way to draw circles
I'm drawing circles in the following way: for each pixel px { if (isInside(px)) px.color = white else px.color = black } bool isInside(pixel p) { for each circle cir { if (PixelInsideCircle(p, cir)) return true } return false } bool PixelInsideCircle(pixel p, circle cir) { float x = p.pos.x - cir.pos.x, y = p.pos.y - cir.pos.y; return x*x + y*y <= cir.radius*cir.radius } Here's the result: There are around 50 circles. Any way to optimize it? I'm using unity3d. I'm filling the RenderTexture using compute shader and directly drew (Graphics.Blit) to the camera. I'm drawing only circles and I want to increase the circles from 50 to 1000. I've tried to use aabb and kd tree but could not figure out how to correctly implement it, using tree only worsen the performance. Thought to use intersection test for every column but not sure if it's a good idea. I'm making this for android and ios. Any help ?
I do not code in/with unity/C#/DirectX but if you insist on filling by pixels see Is there a more efficient way of texturing a circle? for some ideas on easing the math ... I would not use compute shaders but render QUADS (AABB) for each circle instead using Vertex+Fragment shaders. As next step I would try to use Geometry shader that emits triangle fan around your circle (so the ratio between filled and empty space is better) this also require just center and radius instead of AABB so you can use POINTS instead of QUADS see: rendering cubics in GLSL Its doing similar things (but its in GLSL). Also I noted you have: return (p.pos.x - cir.pos.x)^2 + (p.pos.y - cir.pos.y)^2 - (cir.radius)^2 <= 0 try to change it to: return (p.pos.x - cir.pos.x)^2 + (p.pos.y - cir.pos.y)^2 <= (cir.radius)^2 its one less operation. Also (cir.radius)^2 should be passed to Fragment from Vertex (or Geometry) so it does not need to be computed on per pixel basis
Using compute shaders and checking distances is probably the faster way. In the worst case, for 1000 circles, it would execute "PixelInsideCircle" 1000 times and in the best case just once per pixel. When a pixel is found inside a circle it leaves the loop and return white. This is faster than any other solution on CPU (quadtree) + GPU (compute shaders). Let your GPU run everything in a single loop per pixel. Only the pixel number (width * height) * circles will affect the performance. You could go for a smaller texture (50~99%) and upscale on the Blit, its even better for mobile since screens are smaller. Also other solutions using meshes, circle textures would be bad as mobile GPUs are memory bandwidth bound, passing more commands and data around is worse than calculating on the GPU itself. You can try replacing your "PixelInsideCircle" with HLSL: distance() or length(), (they'll probaly have a internal Sqrt though), but since they are internal functions, maybe they are faster. Just test it. Do you run this once like a map generator or its run every frame ?
Unexplainable performance issues with BitmapSource in WPF
I have in my application a 3D world and data for this 3D world. The UI around the application is done with WPF and so far it seems to be working ok. But now I am implementing the following functionality: If you click on the terrain in the 3D view it will show the textures used in this chunk of terrain in a WPF control. The image data of the textures is compressed (S3TC) and I handle creation of BGRA8 data in a separate thread. Once its ready I'm using the main windows dispatcher to do the WPF related tasks. Now to show you this in code: foreach (var pair in loadTasks) { var img = pair.Item2; var loadInfo = TextureLoader.LoadToArgbImage(pair.Item1); if (loadInfo == null) continue; EditorWindowController.Instance.WindowDispatcher.BeginInvoke(new Action(img => { var watch = Stopwatch.StartNew(); var source = BitmapSource.Create(loadInfo.Width, loadInfo.Height, 96, 96, PixelFormats.Bgra32, null, loadInfo.Layers[0], loadInfo.Width * 4); watch.Stop(); img.Source = source; Log.Debug(watch.ElapsedMilliseconds); })); } While I cant argue with the visual output there is a weird performance issue. As you can see I have added a stopwatch to check where the time is consumed and I found the culprit: BitmapSource.Create. Typically I have 5-6 elemets in loadTasks and the images are 256x256 pixels. Interestingly now the first invocation shows 280-285ms for BitmapSource.Create. The next 4-5 all are below 1ms. This consistently happens every time I click the terrain and the loop is started. The only way to avoid the penalty in the first element is to click on the terrain constantly but as soon as I don't click the terrain (and therefore do not invoke the code above) for 1-2 seconds the next call to BitmapSource.Create gets the 280ms penalty again. Since anything above 5ms is far beyond any reasonable or acceptable time to create 256x256 bitmap (my S3TC decompression does all 10(!) mip layers in less than 2 ms) I guess there has to be something else going on here? FYI: All properties of loadInfo are static properties and do not perform any calculations you cant see in the code.
Speeding up emguCV methods
I have a question about speeding up a couple of emguCV calls. Currently I have a capture card that takes in a camera at 1920x1080#30Hz. Using directshow with a sample grabber I capture each frame and display it on a form. Now I have written an image stabilizer but the fastest I can run it is about 20Hz. The first thing I do in my stabilizer is scale the 1920x1080 down to 640x480 beacuse it makes the feature track much faster. Then I use goodFeaturesToTrack previousFrameGray.GoodFeaturesToTrack(sampleSize, sampleQuality, minimumDistance, blockSize)[0]); which takes about 12-15ms. The next thing I do is an optical flow calculation using this OpticalFlow.PyrLK(previousFrameGray, frame_gray, prev_corner.ToArray(), new Size(15, 15), 5, new MCvTermCriteria(5), out temp, out status, out err); and that takes about 15-18ms. The last time consuming method I call is the warpAffine function Image<Bgr, byte> warped_frame = frame.WarpAffine(T, interpMethod, Emgu.CV.CvEnum.WARP.CV_WARP_DEFAULT, new Bgr(BackgroundColor)); this takes about 10-12ms. The rest of the calculations, image scaling and what not take a total of around 7-8ms. So the total time for a frame calculation is about 48ms or about 21Hz. Somehow I need to get the total time under 33ms. So now for my questions. First: If I switch to using the GPU for goodFeatures and opticalFlow will that provide the nessesary increase in speed if any? Second: Are there any other methods besides using the GPU that could speed up these calculations?
Well I finally converted the functions to their GPU counterparts and got the speed increase I was looking for. I went from 48ms down to 22ms.
Speeding Up Image Handling
This is a follow on to my question How To Handle Image as Background to CAD Application I applied the resizing/resampling code but it is not making any difference. I am sure I do not know enough about GDI+ etc so please excuse me if I seem muddled. I am using a third party graphics library (Piccolo). I do not know enough to be sure what it is doing under the hood other than it evenually wraps GDI+. My test is to rotate the display at different zoom levels - this is the process that causes the worst performance hit. I know I am rotating the camera view. At zoom levels up to 1.0 there is no performance degradation and rotation is smooth using the mouse wheel. The image has to be scaled to the CAD units of 1m per pixel at a zoom level of 1.0. I have resized/ resampled the image to match that. I have tried different ways to speed this up based on the code given me in the last question: public static Bitmap ResampleImage(Image img, Size size) { using (logger.VerboseCall()) { var bmp = new Bitmap(size.Width, size.Height, PixelFormat.Format32bppPArgb); using (var gr = Graphics.FromImage(bmp)) { gr.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.Low; gr.CompositingQuality = System.Drawing.Drawing2D.CompositingQuality.HighSpeed; gr.SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.HighSpeed; gr.DrawImage(img, new Rectangle(Point.Empty, size)); } return bmp; } } I guess this speeds up the resample but as far as I can tell has no effect on the performance when trying to rotate the display at high zoom levels. User a performance profiler (ANTS) I am able to find the code that is causing the performance hit: protected override void Paint(PPaintContext paintContext) { using (PUtil.logger.DebugCall()) { try { if (Image != null) { RectangleF b = Bounds; Graphics g = paintContext.Graphics; g.DrawImage(image, b); } } catch (Exception ex) { PUtil.logger.Error(string.Format("{0}\r\n{1}", ex.Message, ex.StackTrace)); //----catch GDI OOM exceptions } } } The performance hit is entirely in g.DrawImage(image, b); Bounds is the bounds of the image of course. The catch block is there to catch GDI+ OOM exceptions which seem worse at high zoom levels also. The number of times this is called seems to increase as the zoom level increases.... There is another hit in the code painting the camera view but I have not enough information to explain that yet except that this seems to paint all the layers attached to the camera - and all the objects on them I assume - when when the cameras view matrix and clip are applied to the paintContext (whatever that means). So is there some other call to g.DrawImage(image, b); that I could use? Or am I at the mercy of the graphics engine? Unfortunately it is so embedded that it would be very hard to change for me Thanks again
I think you you use,if I'm not mistake, PImageNode object form Piccolo. The quantity of calls to that method could increase because Piccolo engine traces "real" drawing area on the user screen, based on zoom level (kind of Culling) and draws only the nodes which are Visible ones. If you have a lot of PImageNode objects on your scene and make ZoomOut it will increase the quantity of PImageNode objects need to be drawn, so the calls to that method. What about the performance: 1) Try to use SetStyle(ControlStyles.DoubleBuffer,true); of the PCanvas (if it's not yet setted up) 2) look here CodeProject Regards.
How many (low poly) models can XNA handle?
I'm aware that the following is a vague question, but I'm hitting performance problems that I did not anticipate in XNA. I have a low poly model (It has 18 faces and 14 vertices) that I'm trying to draw to the screen a (high!) number of times. I get over 60 FPS (on a decent machine) until I draw this model 5000+ times. Am I asking too much here? I'd very much like to double or triple that number (10-15k) at least. My code for actually drawing the models is given below. I have tried to eliminate as much computation from the draw cycle as possible, is there more I can squeeze from it, or better alternatives all together? Note: tile.Offset is computed once during initialisation, not every cycle. foreach (var tile in Tiles) { var myModel = tile.Model; Matrix[] transforms = new Matrix[myModel.Bones.Count]; myModel.CopyAbsoluteBoneTransformsTo(transforms); foreach (ModelMesh mesh in myModel.Meshes) { foreach (BasicEffect effect in mesh.Effects) { // effect.EnableDefaultLighting(); effect.World = transforms[mesh.ParentBone.Index] * Matrix.CreateTranslation(tile.Offset); effect.View = CameraManager.ViewMatrix; effect.Projection = CameraManager.ProjectionMatrix; } mesh.Draw(); } }
You're quite clearly hitting the batch limit. See this presentation and this answer and this answer for details. Put simply: there is a limit to how many draw calls you can submit to the GPU each second. The batch limit is a CPU-based limit, so you'll probably see that your CPU gets pegged once you get to your 5000+ models. Worse still, when your game is doing other calculations, it will reduce the CPU time available to submit those batches. (And it's important to note that, conversely, you are almost certainly not hitting GPU limits. No need to worry about mesh complexity yet.) There are a number of ways to reduce your batch count. Frustrum culling is one. Probably the best one to persue in your case is Geometry Instancing, this lets you draw multiple models in a single batch. Here is an XNA sample that does this. Better still, if it's static geometry, can you simply bake it all into one or a few big meshes?
As with any performance problem there are limits where a particular approach works. You need to measure and see where problems are. The best option is to use profiler but even basic measurements like looking at CPU load may show what bottlencks you have. As a first investiagtion step I'd recommend to remove all computations (like matrix multiplications) and see you get improvments - this would mean that CPU is still doing more work than GPU. Make sure you are not doing measurements on debug build - it could make application significantly slower if it is CPU bound. Side note: GPU works the best when you send large operations relatively infrequently. Your code does more or less opposite - send huge number of very small drawing requests. You should be able to batch your primitives and get better performance. There are samples around how to render large number of simple objects (including ones in DirectX SDK), searching for "gpu rendering crowds" can give you starting point.