Parallel Image Processing with C# 5.0 - c#

I'm trying to do some image processing with C# using the same old GDI techniques: iterating through every pixel with a nested for-loop, then using the GetPixel and SetPixel methods on that (Bitmap) image.
I have already got the same results with the pointers approach (using unsafe context) but I'm trying now to do the old-school Get/Set-Pixel Methods to play with my Bitmaps ...
Bitmap ToGrayscale(Bitmap source)
{
    for (int y = 0; y < source.Height; y++)
    {
        for (int x = 0; x < source.Width; x++)
        {
            Color current = source.GetPixel(x, y);
            int avg = (current.R + current.B + current.G) / 3;
            Color output = Color.FromArgb(avg, avg, avg);
            source.SetPixel(x, y, output);
        }
    }
    return source;
}
Considering performance, the code above takes far too long to finish, stressing the user out while he waits for his 1800x1600 image to finish processing.
So I thought I could use the technique we use when working with HLSL: running a separate function for each pixel (the pixel shader engine, as I was taught, copies the function returning the float4 (Color) thousands of times on the GPU to do the processing in parallel).
So I tried to run a separate Task (function) for each pixel, putting these Task variables into a List and then 'await'ing List.ToArray(). But that failed, as every new Task is 'awaited' until it finishes before the next one runs.
I wanted to call a new Task for each pixel to run this :
Color current = source.GetPixel(x, y);
int avg = (current.R + current.B + current.G) / 3;
Color output = Color.FromArgb(avg, avg, avg);
source.SetPixel(x, y, output);
At the end of the day I got myself asynchronous, non-blocking code, but not parallel code ...
Any suggestions guys?

GetPixel and SetPixel are likely the main bottleneck here.
Instead of trying to parallelize this, I would recommend using Bitmap.LockBits to handle the parsing much more efficiently.
That being said, you can parallelize your current version via:
Bitmap ToGrayscale(Bitmap source)
{
    Parallel.For(0, source.Height, y =>
    {
        for (int x = 0; x < source.Width; x++)
        {
            Color current = source.GetPixel(x, y);
            int avg = (current.R + current.B + current.G) / 3;
            Color output = Color.FromArgb(avg, avg, avg);
            source.SetPixel(x, y, output);
        }
    });
    return source;
}
However, the Bitmap class is not thread safe, so this will likely cause issues.
A better approach would be to use LockBits, then parallelize working on the raw data directly (as above). Note that I'm only parallelizing the outer loop (on purpose) as this will prevent over saturation of the cores with work items.
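As a rough sketch of the LockBits route, assuming the image can be locked as 32bpp ARGB (the class and method names here are made up for illustration): the bitmap is locked once, its rows are copied into a managed byte array, only the outer loop over rows is parallelized, and the result is copied back. No thread touches the Bitmap object itself while the parallel loop runs.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

static class GrayscaleSketch
{
    // Averages B, G and R of every pixel in a tightly packed BGRA buffer.
    // Rows are processed in parallel; each row is handled by one work item.
    public static void GrayscaleRows(byte[] bgra, int width, int height)
    {
        int rowBytes = width * 4;
        Parallel.For(0, height, y =>
        {
            int rowStart = y * rowBytes;
            for (int x = 0; x < width; x++)
            {
                int i = rowStart + x * 4;   // byte order in memory: B, G, R, A
                byte avg = (byte)((bgra[i] + bgra[i + 1] + bgra[i + 2]) / 3);
                bgra[i] = bgra[i + 1] = bgra[i + 2] = avg;
            }
        });
    }

    public static Bitmap ToGrayscale(Bitmap source)
    {
        var rect = new Rectangle(0, 0, source.Width, source.Height);
        // Assumption: the image can be locked as 32bpp ARGB (4 bytes per pixel).
        BitmapData data = source.LockBits(rect, ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
        try
        {
            int rowBytes = data.Width * 4;
            byte[] pixels = new byte[rowBytes * data.Height];

            // Copy row by row so row padding or a negative stride is handled.
            for (int y = 0; y < data.Height; y++)
                Marshal.Copy(IntPtr.Add(data.Scan0, y * data.Stride), pixels, y * rowBytes, rowBytes);

            GrayscaleRows(pixels, data.Width, data.Height);

            // Write the processed rows back into the locked bitmap memory.
            for (int y = 0; y < data.Height; y++)
                Marshal.Copy(pixels, y * rowBytes, IntPtr.Add(data.Scan0, y * data.Stride), rowBytes);
        }
        finally
        {
            source.UnlockBits(data);
        }
        return source;
    }
}
```

The two extra copies cost some memory bandwidth; with unsafe code the loop could work on the locked memory directly instead.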

Using tasks will just set up multiple threads on the CPU - it won't use the graphics processor. Also, I'm pretty sure that the Bitmap objects you are working with are not thread safe, so you won't be able to use multiple threads to access them anyway.
If all you are trying to do is convert an image to grayscale, I would look at built-in functionality first. In general, something built into the .NET framework can use lower level 'unsafe' code to do things more efficiently than would be possible otherwise, without being unsafe. Try How to: Convert an Image to Greyscale.
If you really want to use multiple threads for your custom bitmap processing, I think you will have to make a byte array, modify it in a multithreaded way, then create a bitmap object at the end from the byte array. Take a look at https://stackoverflow.com/a/15290190/1453269 for some pointers on how to do that.

A good way to parallelize your work is not to dispatch a task per pixel, but to dispatch as many threads as your processor cores.
You also say you are able to manipulate your pixels through pointers, so if you take that route, here is another important piece of advice: have each thread work on neighboring pixels.
A valid scenario would be thread 1 working with the first 25% pixels, thread 2 with the next 25% and so on until thread 4.
The above is very important to avoid False Sharing, in which you are effectively dismissing your cache's services, making your algorithm a lot slower.
Other than this, you could probably work with your graphics card, but that is totally out of my league.
EDIT: As noted by Panagiotis in the comments, a task may not correlate to a thread, and as such you have to be cautious about what API you'll use to parallelize your work and how you will do it.
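A minimal sketch of that partitioning idea, assuming the pixels have already been copied into a tightly packed BGRA byte array (for example via LockBits): the rows are split into roughly one contiguous block per core, so each work item reads and writes only its own stretch of memory. The class and method names are hypothetical, and the scheduler still decides which thread runs which block.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class ChunkedGrayscaleSketch
{
    // Converts a tightly packed BGRA buffer to grayscale, handing each
    // work item one contiguous block of rows.
    public static void GrayscaleInChunks(byte[] bgra, int width, int height)
    {
        if (width <= 0 || height <= 0) return;

        int rowBytes = width * 4;
        int workers = Environment.ProcessorCount;
        // Ceiling division: about one block of rows per core.
        int rowsPerChunk = (height + workers - 1) / workers;

        var ranges = Partitioner.Create(0, height, rowsPerChunk);
        Parallel.ForEach(ranges, range =>
        {
            // range.Item1 is inclusive, range.Item2 is exclusive.
            for (int y = range.Item1; y < range.Item2; y++)
            {
                int rowStart = y * rowBytes;
                for (int x = 0; x < width; x++)
                {
                    int i = rowStart + x * 4;   // B, G, R, A
                    byte avg = (byte)((bgra[i] + bgra[i + 1] + bgra[i + 2]) / 3);
                    bgra[i] = bgra[i + 1] = bgra[i + 2] = avg;
                }
            }
        });
    }
}
```

On a four-core machine and a 1600-row image this yields four blocks of 400 rows, matching the 25% split described above.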

Related

How to parallelize method for creating a List and track each element position/index in C#

I have a method which creates 1000 surfaces in a loop and stores them in a list:
List<Surface> surfaces = new List<Surface>();
for (int i = 0; i < 1000; i++)
{
    Surface surface = builder.buildSurface(length, width, new Position(x, y, z));
    surfaces.Add(surface);
    x += 2; y += 4; z++; // each surface is shifted every iteration
}
Once all surfaces have been generated, I can output them in any specific order, for example in matrix form A[i][j].
How can I parallelize the creation of surfaces and track position/index? I think each surface creation can be executed in parallel, but I want to know its index so I can output according to surface position.
Is it possible to create a ThreadPool which will generate 1000 surfaces in parallel and store them in a ConcurrentDictionary<index, Surface>() so I can output them in a specific order?
P.S. I tried to split the method in half with Parallel.Invoke:
Parallel.Invoke(
() => builder.buildLeft(),
() => builder.buildRight());
So the execution time was reduced almost 2 times, but I want to utilize all CPU cores for such a time-consuming task.
You can use PLINQ (Parallel LINQ) as follows:
var surfaces = ParallelEnumerable.Range(0, 1000)
    .AsOrdered()
    .Select(i => builder.buildSurface(
        length, width,
        new Position(i * 2, i * 4, i)))
    .ToList();
This sample assumes that x, y, z values are connected with item index i as x=i*2, y=i*4, z=i. If you need some different approach of calculation of x, y, z then you may need to prepare collection with such a data separately before calling PLINQ.
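For the case where the positions cannot be derived from the index, a sketch of the "prepare the data first" variant could look like this. Position, Surface and BuildSurface are hypothetical stand-ins for the types in the question, and the shifting rule is the one from the original loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's types.
sealed class Position
{
    public readonly int X, Y, Z;
    public Position(int x, int y, int z) { X = x; Y = y; Z = z; }
}

sealed class Surface
{
    public readonly Position Origin;
    public Surface(Position origin) { Origin = origin; }
}

static class PrecomputedPositionsSketch
{
    // Stand-in for builder.buildSurface(length, width, position).
    static Surface BuildSurface(int length, int width, Position p)
    {
        return new Surface(p);
    }

    public static List<Surface> BuildAll(int count, int length, int width)
    {
        // Step 1: compute the positions sequentially, with whatever rule is needed.
        var positions = new List<Position>(count);
        int x = 0, y = 0, z = 0;
        for (int i = 0; i < count; i++)
        {
            positions.Add(new Position(x, y, z));
            x += 2; y += 4; z++;
        }

        // Step 2: build the surfaces in parallel; AsOrdered keeps the input order,
        // so result[i] belongs to positions[i].
        return positions.AsParallel()
                        .AsOrdered()
                        .Select(p => BuildSurface(length, width, p))
                        .ToList();
    }
}
```

Computing the positions is cheap compared to building a surface, so keeping that step sequential should not cost much.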
Please note the AsOrdered() call above: it tells PLINQ to preserve order when building the output list, but this ordering is not free. So, if you do not really need a List of items, and any storage keyed by index is OK, you can try unordered PLINQ with a Dictionary instead of a List:
var surfaces = ParallelEnumerable.Range(0, 1000)
    .ToDictionary(i => i, i => builder.buildSurface(
        length, width,
        new Position(i * 2, i * 4, i)));
I do not know, which variant will be faster, but you can perform tests yourself before making decision.
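One way to make sure the expensive call runs in the parallel part of the query, whatever ToDictionary does with its selectors, is to project index/result pairs in a Select first and leave only the cheap key/value extraction to ToDictionary. This is a sketch with a hypothetical helper name, not a measured recommendation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class IndexedBuildSketch
{
    // Runs build(i) for every i in [0, count) and keys each result by its index.
    public static Dictionary<int, T> BuildIndexed<T>(int count, Func<int, T> build)
    {
        return ParallelEnumerable.Range(0, count)
            // The expensive call sits inside Select, a parallel query operator.
            .Select(i => new KeyValuePair<int, T>(i, build(i)))
            // Only cheap key/value extraction is left for ToDictionary.
            .ToDictionary(pair => pair.Key, pair => pair.Value);
    }
}
```

With the question's types it would be called as IndexedBuildSketch.BuildIndexed(1000, i => builder.buildSurface(length, width, new Position(i * 2, i * 4, i))).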

C# How to Improve Efficiency in Direct2D Drawing

Good morning,
I have been teaching myself a bit of Direct2D programming in C#, utilizing native wrappers that are available (currently using d2dSharp, but have also tried SharpDX). I'm running into problems with efficiency, though: the basic Direct2D drawing methods are taking approximately 250 ms to draw 45,000 basic polygons. The performance I am seeing is on par with, or even slower than, Windows GDI+. I'm hoping that someone can take a look at what I've done and propose ways to dramatically improve the time it takes to draw.
The background to this is that I have a personal project in which I am developing a basic but functional CAD interface capable of performing a variety of tasks, including 2D finite element analysis. In order to make it at all useful, the interface needs to be able to display tens-of-thousands of primitive elements (polygons, circles, rectangles, points, arcs, etc.).
I initially wrote the drawing methods using Windows GDI+ (System.Drawing), and performance is pretty good until I reach about 3,000 elements on screen at any given time. The screen must be updated any time the user pans, zooms, draws new elements, deletes elements, moves, rotates, etc. Now, in order to improve efficiency, I utilize a quad tree data structure to store my elements, and I only draw elements that actually fall within the bounds of the control window. This helped significantly when zoomed in, but obviously, when fully zoomed out and displaying all elements, it makes no difference. I also use a timer and tick events to update the screen at the refresh rate (60 Hz), so I'm not trying to update thousands of times per second or on every mouse event.
This is my first time programming with DirectX and Direct2D, so I'm definitely learning here. That being said, I've spent days reviewing tutorials, examples, and forums, and could not find much that helped. I've tried a dozen different methods of drawing, pre-processing, multi-threading, etc. My code is below.
Code to Loop Through and Draw Elements
List<IDrawingElement> elementsInBounds = GetElementsInDraftingWindow();
_d2dContainer.Target.BeginDraw();
_d2dContainer.Target.Clear(ColorD2D.FromKnown(Colors.White, 1));
if (elementsInBounds.Count > 0)
{
    Stopwatch watch = new Stopwatch();
    watch.Start();
    #region Using Drawing Element DrawDX Method
    foreach (IDrawingElement elem in elementsInBounds)
    {
        elem.DrawDX(ref _d2dContainer.Target, ref _d2dContainer.Factory, ZeroPoint, DrawingScale, _selectedElementBrush, _selectedElementPointBrush);
    }
    #endregion
    watch.Stop();
    double drawingTime = watch.ElapsedMilliseconds;
    Console.WriteLine("DirectX drawing time = " + drawingTime);
    watch.Reset();
    watch.Start();
    Matrix3x2 scale = Matrix3x2.Scale(new SizeFD2D((float)DrawingScale, (float)DrawingScale), new PointFD2D(0, 0));
    Matrix3x2 translate = Matrix3x2.Translation((float)ZeroPoint.X, (float)ZeroPoint.Y);
    _d2dContainer.Target.Transform = scale * translate;
    watch.Stop();
    double transformTime = watch.ElapsedMilliseconds;
    Console.WriteLine("DirectX transform time = " + transformTime);
}
DrawDX Function for Polygon
public override void DrawDX(ref WindowRenderTarget rt, ref Direct2DFactory fac, Point zeroPoint, double drawingScale, SolidColorBrush selectedLineBrush, SolidColorBrush selectedPointBrush)
{
    if (_pathGeometry == null)
    {
        CreatePathGeometry(ref fac);
    }
    float brushWidth = (float)(Layer.Width / (drawingScale));
    brushWidth = (float)(brushWidth * 2);
    if (Selected == false)
    {
        rt.DrawGeometry(Layer.Direct2DBrush, brushWidth, _pathGeometry);
        //Note that _pathGeometry is a PathGeometry
    }
    else
    {
        rt.DrawGeometry(selectedLineBrush, brushWidth, _pathGeometry);
    }
}
Code to Create Direct2D Factory & Render Target
private void CreateD2DResources(float dpiX, float dpiY)
{
    Factory = Direct2DFactory.CreateFactory(FactoryType.SingleThreaded, DebugLevel.None, FactoryVersion.Auto);
    RenderTargetProperties props = new RenderTargetProperties(
        RenderTargetType.Default, new PixelFormat(DxgiFormat.B8G8R8A8_UNORM,
        AlphaMode.Premultiplied), dpiX, dpiY, RenderTargetUsage.None, FeatureLevel.Default);
    Target = Factory.CreateWindowRenderTarget(_targetPanel, PresentOptions.None, props);
    Target.AntialiasMode = AntialiasMode.Aliased;
    if (_selectionBoxLeftStrokeStyle != null)
    {
        _selectionBoxLeftStrokeStyle.Dispose();
    }
    _selectionBoxLeftStrokeStyle = Factory.CreateStrokeStyle(new StrokeStyleProperties1(LineCapStyle.Flat,
        LineCapStyle.Flat, LineCapStyle.Flat, LineJoin.Bevel, 10, DashStyle.Dash, 0, StrokeTransformType.Normal), null);
}
I create a Direct2D factory and render target once and keep references to them at all times (that way I'm not recreating each time). I also create all of the brushes when the drawing layer (which describes color, width, etc.) is created. As such, I am not creating a new brush every time I draw, simply referencing a brush that already exists. Same with the geometry, as can be seen in the second code-snippet. I create the geometry once, and only update the geometry if the element itself is moved or rotated. Otherwise, I simply apply a transform to the render target after drawing.
Based on my stopwatches, looping through and calling the elem.DrawDX methods takes about 225-250 ms (for 45,000 polygons). Applying the transform takes 0-1 ms, so it appears that the bottleneck is in the RenderTarget.DrawGeometry() function.
I've done the same tests with RenderTarget.DrawEllipse() or RenderTarget.DrawRectangle(), as I've read that using DrawGeometry is slower than DrawRectangle or DrawEllipse as the rectangle / ellipse geometry is known beforehand. However, in all of my tests, it hasn't mattered which draw function I use, the time for the same number of elements is always about equal.
I've tried building a multi-threaded Direct2D factory and running the draw functions through tasks, but that is much slower (about two times slower). The Direct2D methods appear to be utilizing my graphics card (hardware acceleration is enabled), as when I monitor my graphics card usage, it spikes when the screen is updating (my laptop has an NVIDIA Quadro mobile graphics card).
Apologies for the long-winded post. I hope this was enough background and description of things I've tried. Thanks in advance for any help!
Edit #1
So I changed the code from iterating over a list using foreach to iterating over an array using for, and that cut the drawing time down by half! I hadn't realized how much slower lists were than arrays (I knew there was some performance advantage, but didn't realize this much!). It still, however, takes 125 ms to draw. This is much better, but still not smooth. Any other suggestions?
Direct2D can also be used directly through P/Invoke.
See the sample "VB Direct2D Pixel Perfect Collision" from https://social.msdn.microsoft.com/Forums/en-US/cea42526-4b82-454d-9d79-2e1d94083552/collisions?forum=vbgeneral
The animation there is perfect, even though it is done in VB.

Image format conversion is going abnormally slow

I am receiving an image from an external system in the form of a sequence of BGR values followed by an empty byte. The sequence looks sort of like...
[B,G,R,0,B,G,R,0,...,B,G,R,0] where each BGR0 is a single pixel in an image.
I need this in a .NET Bitmap so I can perform some manipulations on it and have come up with the following function to do so:
private Bitmap fillBitmap(byte[] data, int width, int height)
{
    Bitmap map = new Bitmap(width, height);
    for (int i = 0; i < data.Length; i += 4)
    {
        int y = ((i / 4) / width);
        int x = ((i / 4) - (y * width));
        int b = data[i];
        int g = data[i + 1];
        int r = data[i + 2];
        Color pixel = Color.FromArgb(r, g, b);
        map.SetPixel(x, y, pixel);
    }
    return map;
}
This would normally be ok except that most of my images are 1920x1200... so I have a loop that's iterating over 2 million times. Even then that wouldn't be so bad as 2 million iterations shouldn't be very taxing on any modern processor.
But for some reason, this loop can take upwards of 5-15 seconds to run on a pretty beefy Xeon on my server. It would be trivial to parallelize the loop but I suspect there is an even better way of going about this. Any help would be greatly appreciated!
The description of the Bitmap.LockBits Method says,
You can change the color of an image with the SetPixel method, although the LockBits method offers better performance for large-scale changes.
An alternative, I'd guess, might be to use the Bitmap(Stream) Constructor, after you create a Stream which matches the file format of a bitmap.
Check some FastBitmap implementation that will help you set the array from which you will be able to generate the image.
A call to GetPixel or SetPixel makes an interop call to native functions. Each time you call one of these functions, the Bitmap image is locked, the corresponding pixel bytes are read or modified, and then the image is unlocked again. You can imagine that performing this repeatedly is highly inefficient.
As others have suggested use the LockBits method, although you will have to use unsafe code I imagine. Now if this is allowed, use it. It allows direct access to the Bitmap's pixels in an unmanaged memory buffer.
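As a sketch of the LockBits route for this particular input, and one that gets by without unsafe code by using Marshal.Copy: the B, G, R, 0 byte order of the incoming data happens to match the in-memory layout of PixelFormat.Format32bppRgb, so each row can be block-copied into the locked bitmap. The class name is made up for illustration, and the sketch assumes the buffer holds exactly width * height pixels.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

static class BitmapFillSketch
{
    // data holds width * height pixels as B, G, R, 0, which matches the byte
    // layout of Format32bppRgb (the fourth byte is ignored by that format).
    public static Bitmap FillBitmap(byte[] data, int width, int height)
    {
        int rowBytes = width * 4;
        if (data.Length < rowBytes * height)
            throw new ArgumentException("Buffer is too small for the given dimensions.", "data");

        var map = new Bitmap(width, height, PixelFormat.Format32bppRgb);
        BitmapData bits = map.LockBits(new Rectangle(0, 0, width, height),
                                       ImageLockMode.WriteOnly, PixelFormat.Format32bppRgb);
        try
        {
            // One block copy per row; bits.Stride is the distance in bytes
            // between the starts of two consecutive rows in the bitmap.
            for (int y = 0; y < height; y++)
                Marshal.Copy(data, y * rowBytes, IntPtr.Add(bits.Scan0, y * bits.Stride), rowBytes);
        }
        finally
        {
            map.UnlockBits(bits);
        }
        return map;
    }
}
```

For a 1920x1200 image this replaces about 2.3 million SetPixel calls with 1200 row copies.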

Fast comparison of two Bitmap objects on a pixel per pixel basis

I am currently implementing a method that accepts two bitmap objects. We can assume that said objects are of equal dimensions etc. The method returns a list of pixel changes (stored in a self-made object). This is being developed in an iterative manner, so the current implementation is a basic one: simply work through each pixel and compare it to its counterpart. This method for generating changes is slower than acceptable (500 ms or so), so I am looking for a faster process.
Ideas that have crossed my mind are to break down the image into strips and run each comparison on a new thread or to compare zones of the screen as objects first then only examine in detail as required.
current code for your understanding...
for (int x = 0; x < screenShotBMP.Width; x++)
{
    for (int y = 0; y < screenShotBMP.Height; y++)
    {
        if (screenShotBMP.GetPixel(x, y) != _PreviousFrame.GetPixel(x, y))
        {
            _pixelChanges.Add(new PixelChangeJob(screenShotBMP.GetPixel(x, y), x, y));
        }
    }
}
As you will deduce from the code, the concept of the class in question is to take a screenshot and generate a list of pixel changes from the previously taken screenshot.
You should definitely look at the Lockbits method of manipulating bitmap data.
It is orders of magnitude faster than GetPixel/SetPixel.
EDIT:
Check this link for some code (albeit in VB, but you should get the drift) that almost does what you want. It is simply checking two bitmaps for equality and returning true or false. You could change the function so each pixel check adds to your _pixelChanges list if necessary, and return this list instead of a boolean.
Also, it may be faster if you swap round the iterator loops. i.e. have the inner loop iterating over X, and the outer loop iterating over Y.
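To illustrate both points, here is a sketch that compares two frames as raw 32-bit pixel arrays with the outer loop over rows and the inner loop over columns, so memory is walked sequentially. It assumes both frames were already copied out of their bitmaps (for instance with LockBits and Marshal.Copy into an int[]); PixelChange is a hypothetical stand-in for the question's PixelChangeJob.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's PixelChangeJob.
struct PixelChange
{
    public int X, Y, Argb;
}

static class FrameDiffSketch
{
    // current and previous hold one 32-bit ARGB value per pixel, row by row.
    public static List<PixelChange> FindChanges(int[] current, int[] previous, int width, int height)
    {
        if (current.Length != width * height || previous.Length != current.Length)
            throw new ArgumentException("Both buffers must hold width * height pixels.");

        var changes = new List<PixelChange>();
        for (int y = 0; y < height; y++)        // outer loop over rows
        {
            int rowStart = y * width;
            for (int x = 0; x < width; x++)     // inner loop walks memory sequentially
            {
                int i = rowStart + x;
                if (current[i] != previous[i])
                    changes.Add(new PixelChange { X = x, Y = y, Argb = current[i] });
            }
        }
        return changes;
    }
}
```

Comparing whole ints also means one comparison per pixel instead of one per color channel.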
Use BitBlt with the XOR option. It should be much faster.

Is there any way to do Image Quantization safely and with no Marshalling?

I'm currently using Brendan Tompkins' ImageQuantization DLL.
http://codebetter.com/blogs/brendan.tompkins/archive/2007/06/14/gif-image-color-quantizer-now-with-safe-goodness.aspx
But it doesn't run in medium trust in asp.net.
Does anyone know of an image quantization library that does run in medium trust?
Update
I don't care if the solution is slow. I just need something that works.
You should be able to replace the code using Marshal with explicit reading of the underlying stream via something like BinaryReader. This may be slower since you must read the stream entirely into your managed memory or seek into it rather than relying on the copy already in unmanaged memory being quickly accessible but is fundamentally your only option.
You simply cannot go spelunking into unmanaged memory from a medium trust context, even if only performing read operations.
Having looked at the linked code there's a reason you're not allowed to do this sort of thing. For starters he's ignoring the 64/32bit aspect of the IntPtr!
The underlying BitmapData class he's using is utterly predicated on having unfettered read access to arbitrary memory; this is never happening under medium trust.
A significant rewrite of his base functionality will be required to either use Bitmaps directly (with the slow GetPixel calls) or read the data via conventional stream APIs, dropping it into one or more arrays and then parsing it yourself. Neither of these is likely to be pleasant. The former will be much slower (I would expect an order of magnitude, due to the high overhead per pixel read); the latter less slow (though still slower), but it takes much more effort in terms of rewriting the low-level parsing of the image data.
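To give a feel for the stream route, here is a minimal sketch that reads the basic header fields of a BMP stream with BinaryReader, using only managed reads. It assumes the image bytes are available as a BMP with a BITMAPINFOHEADER (40 bytes or larger); the class name is hypothetical, and parsing the pixel rows themselves would be additional work.

```csharp
using System;
using System.IO;

static class BmpHeaderSketch
{
    // Reads width, height and bits per pixel from a BMP stream
    // without Marshal or pointers.
    public static void ReadHeader(Stream stream, out int width, out int height, out int bitsPerPixel)
    {
        var reader = new BinaryReader(stream);   // little-endian, as BMP requires
        if (reader.ReadByte() != (byte)'B' || reader.ReadByte() != (byte)'M')
            throw new InvalidDataException("Not a BMP stream.");

        reader.ReadInt32();                      // file size
        reader.ReadInt32();                      // reserved
        reader.ReadInt32();                      // offset of the pixel data
        int headerSize = reader.ReadInt32();     // size of the info header
        if (headerSize < 40)
            throw new NotSupportedException("Only BITMAPINFOHEADER or later is handled.");

        width = reader.ReadInt32();
        height = reader.ReadInt32();             // negative means top-down rows
        reader.ReadInt16();                      // color planes
        bitsPerPixel = reader.ReadInt16();
    }
}
```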
Here's a rough guide to what you need to change based on the current code:
from Quantizer.cs
public Bitmap Quantize(Image source)
{
    // Get the size of the source image
    int height = source.Height;
    int width = source.Width;
    // And construct a rectangle from these dimensions
    Rectangle bounds = new Rectangle(0, 0, width, height);
    // First off take a 32bpp copy of the image
    Bitmap copy = new Bitmap(width, height, PixelFormat.Format32bppArgb);
    // And construct an 8bpp version
    Bitmap output = new Bitmap(width, height, PixelFormat.Format8bppIndexed);
    // Now lock the bitmap into memory
    using (Graphics g = Graphics.FromImage(copy))
    {
        g.PageUnit = GraphicsUnit.Pixel;
        // Draw the source image onto the copy bitmap,
        // which will effect a widening as appropriate.
        g.DrawImage(source, bounds);
    }
    //!! BEGIN CHANGES - no locking here
    //!! simply use copy not a pointer to it
    //!! you could also simply write directly to a buffer then make the final image in one go but I don't bother here
    // Call the FirstPass function if not a single pass algorithm.
    // For something like an octree quantizer, this will run through
    // all image pixels, build a data structure, and create a palette.
    if (!_singlePass)
        FirstPass(copy, width, height);
    // Then set the color palette on the output bitmap. I'm passing in the current palette
    // as there's no way to construct a new, empty palette.
    output.Palette = GetPalette(output.Palette);
    // Then call the second pass which actually does the conversion
    SecondPass(copy, output, width, height, bounds);
    //!! END CHANGES
    // Last but not least, return the output bitmap
    return output;
}
//!! Completely changed, note that I assume all the code is changed to just use Color rather than Color32
protected virtual void FirstPass(Bitmap source, int width, int height)
{
    // Loop through each row
    for (int row = 0; row < height; row++)
    {
        // And loop through each column
        for (int col = 0; col < width; col++)
        {
            // Now I have the pixel, call the FirstPassQuantize function...
            InitialQuantizePixel(source.GetPixel(col, row));
        }
    }
}
You would need to do roughly the same in the other functions.
This removes any need for Color32, the Bitmap class will deal with all that for you.
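Bitmap.SetPixel() is the obvious candidate for the second pass, but be aware that it throws an InvalidOperationException for indexed pixel formats such as the Format8bppIndexed output above, so that pass needs more thought. Note that this is the easiest way to port things but absolutely not the fastest way to do it within a medium trust environment.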
