I am currently implementing a method that accepts two bitmap objects, which we can assume have equal dimensions. The method returns a list of pixel changes (stored in a self-made object). This is being developed iteratively, so the current implementation is a basic one: simply walk through each pixel and compare it to its counterpart. This approach is slower than acceptable (500 ms or so), so I am looking for a faster process.
Ideas that have crossed my mind are to break the image into strips and run each comparison on a new thread, or to compare zones of the screen as objects first and only examine them in detail as required.
Current code for your understanding:
for (int x = 0; x < screenShotBMP.Width; x++)
{
    for (int y = 0; y < screenShotBMP.Height; y++)
    {
        if (screenShotBMP.GetPixel(x, y) != _PreviousFrame.GetPixel(x, y))
        {
            _pixelChanges.Add(new PixelChangeJob(screenShotBMP.GetPixel(x, y), x, y));
        }
    }
}
As you will deduce from the code, the concept of the class in question is to take a screenshot and generate a list of pixel changes from the previously taken screenshot.
You should definitely look at the LockBits method of manipulating bitmap data.
It is orders of magnitude faster than GetPixel/SetPixel.
EDIT:
Check this link for some code (albeit in VB, but you should get the drift) that almost does what you want. It is simply checking two bitmaps for equality and returning true or false. You could change the function so each pixel check adds to your _pixelChanges list if necessary, and return this list instead of a boolean.
Also, it may be faster if you swap the iteration loops around, i.e. have the inner loop iterating over X and the outer loop iterating over Y, so that pixels are read in the order they are laid out in memory.
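As a sketch of what a LockBits version of the diff in the question might look like (assumptions: both bitmaps are 32bpp ARGB and equal in size, PixelChangeJob is the asker's own type, and the project allows unsafe code):

```csharp
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;

List<PixelChangeJob> Diff(Bitmap current, Bitmap previous)
{
    var changes = new List<PixelChangeJob>();
    var rect = new Rectangle(0, 0, current.Width, current.Height);
    BitmapData curData = current.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
    BitmapData prevData = previous.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
    try
    {
        unsafe
        {
            for (int y = 0; y < rect.Height; y++)
            {
                // Stride is the width of one row in bytes, including any padding.
                int* curRow = (int*)((byte*)curData.Scan0 + y * curData.Stride);
                int* prevRow = (int*)((byte*)prevData.Scan0 + y * prevData.Stride);
                for (int x = 0; x < rect.Width; x++)
                {
                    // Each int holds one ARGB pixel, so a single compare replaces GetPixel.
                    if (curRow[x] != prevRow[x])
                        changes.Add(new PixelChangeJob(Color.FromArgb(curRow[x]), x, y));
                }
            }
        }
    }
    finally
    {
        current.UnlockBits(curData);
        previous.UnlockBits(prevData);
    }
    return changes;
}
```

Note that this iterates Y in the outer loop and X in the inner loop, which matches the row-major layout of the locked buffer.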
Use BitBlt with the XOR raster operation (SRCINVERT): any non-zero pixel in the result marks a change. It should be much faster.
I have a method which creates 1000 surfaces in a loop and stores them in a list:
List<Surface> surfaces = new List<Surface>();
for (int i = 0; i < 1000; i++)
{
    Surface surface = builder.buildSurface(length, width, new Position(x, y, z));
    surfaces.Add(surface);
    x += 2; y += 4; z++; // each surface is shifted every iteration
}
Once all surfaces are generated, I can output them in any specific order, for example in matrix form A[i][j].
How can I parallelize the creation of surfaces and still track each position/index? I think each surface creation can be executed in parallel, but I want to know its index so I can output it according to the surface position.
Is it possible to create a ThreadPool which will generate 1000 surfaces in parallel and store them in a ConcurrentDictionary<index, Surface>() so I can output them in a specific order?
P.S. I tried to split the method in half with Parallel.Invoke:
Parallel.Invoke(
    () => builder.buildLeft(),
    () => builder.buildRight());
This roughly halved the execution time, but I want to utilize all CPU cores for such a time-consuming task.
You can use PLINQ (Parallel LINQ) as follows:
var surfaces = ParallelEnumerable.Range(0, 1000)
    .AsOrdered()
    .Select(i => builder.buildSurface(
        length, width,
        new Position(i * 2, i * 4, i)))
    .ToList();
This sample assumes that the x, y, z values are derived from the item index i as x = i*2, y = i*4, z = i. If you need to calculate x, y, z differently, you may have to prepare a collection with that data separately before calling PLINQ.
Please note the AsOrdered() call above - it tells PLINQ to preserve order when building the output list, but this preservation is not free. So, if you do not strictly need a List of items, and any storage with a specified index/key is acceptable, you can try unordered PLINQ with a Dictionary instead of a List:
var surfaces = ParallelEnumerable.Range(0, 1000)
    .ToDictionary(i => i, i => builder.buildSurface(
        length, width,
        new Position(i * 2, i * 4, i)));
I do not know which variant will be faster, but you can benchmark both yourself before making a decision.
I'll do my best to explain the problem, which first requires an explanation of the project. I apologize ahead of time if it's a bit scattered; I have ADHD, which has made communicating on this platform massively difficult, so please bear with me.
The project generates a 3D Worley noise map using my own unoptimized algorithm which, at the moment, requires iterating through every index of the 3D array and calculating every element's value on the CPU, one at a time. Also, this will be in 3D world space.
Additionally, I needed to write my own class for the randomized points or "nodes", because this program iteratively moves these nodes in pseudorandom directions, and each node is associated with a procedure for calculating map index values, assigned via an integer between 1 and 6. After each iteration, the map is regenerated. Without the nodes, this can't work.
Code for the Nodes on repl.it
Obviously, this is extremely slow, and I've come to the conclusion that I need to implement multithreading and compute shaders. Still, I'm faced with a massive problem: I've no idea how to use HLSL or compute shaders, and I cannot, for the life of me, find any resources on HLSL for C#/Java/Python programmers that would help me wrap my head around anything. ANY resources explaining HLSL at a basic level would be enormously helpful.
Now, for the specific problem of this question: I have no idea how to start. I have one vague idea of an approach that is derived from my ignorance about multithreading. I could use 32 individual 32x32 RWStructuredTexture2D<float> arrays that I stack after calling my shader to create a 3D texture; however, to do this, I need to be able to pass my nodes to the shader, and every use of compute shaders I've seen only has one parameter, uint3 id : SV_DispatchThreadID, which makes no sense to me. I've briefly considered making a struct for the nodes in my shader, but I still have no idea how to get that information to my shader.
For the actual question: how do I throw nodes at this and then get 32 32x32 float arrays out of it?
Here's some pseudocode for the in-betweens:
// somehow set this up to have 32 different threads
// make an HLSL equivalent of a float[,]
#params NodeSet nodes and z coordinate
#return float[32,32]
// NodeSet is just a collection of Nodes that has some convenience methods.
float[,] CSMain(#params) {
    for (int x = 0; x < 32; x++)
        for (int y = 0; y < 32; y++)
            // set value of element
    return floatArr;
}
Second question: should I even be using Compute Shaders for this problem?
You should use 32 x 32 x 32 threads in total and output to a 1D buffer of length 32 x 32 x 32, indexed by the thread id. Note that Direct3D caps a single thread group at 1024 threads, so split the work into groups, e.g. [numthreads(8, 8, 8)] dispatched as 4 x 4 x 4 groups.
Pseudocode:
#pragma kernel PseudoCode

StructuredBuffer<NodeData> nodeData;    // nodes passed in from C#
RWStructuredBuffer<float> outputBuffer; // 32*32*32 results
int nodecount;

[numthreads(8, 8, 8)] // dispatched as 4 x 4 x 4 groups = 32 x 32 x 32 threads
void PseudoCode (uint3 id : SV_DispatchThreadID) {
    float3 pos = id; // or id - float3(15.5, 15.5, 15.5) to center, etc.
    int outputIndex = id.x * 32 * 32 + id.y * 32 + id.z;
    for (int i = 0; i < nodecount; i++)
        outputBuffer[outputIndex] += GetInfluence(nodeData[i], pos);
    outputBuffer[outputIndex] = NormalizeOutput(outputBuffer[outputIndex]);
}
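On the C# side (in Unity), the nodes go in via a ComputeBuffer bound to the shader's node buffer. A rough sketch - the NodeData layout, the buffer/variable names ("nodeData", "outputBuffer", "nodecount"), and a shader compiled with [numthreads(8,8,8)] are all assumptions that must match your actual shader:

```csharp
using UnityEngine;

public struct NodeData
{
    public Vector3 position; // node position in world space
    public int mode;         // which of the 6 calculation procedures to use
}

public static class WorleyDispatcher
{
    public static float[] Run(ComputeShader shader, NodeData[] nodes)
    {
        int kernel = shader.FindKernel("PseudoCode");

        // One NodeData = 3 floats + 1 int = 16 bytes per element.
        var nodeBuffer = new ComputeBuffer(nodes.Length, sizeof(float) * 3 + sizeof(int));
        nodeBuffer.SetData(nodes);

        var outputBuffer = new ComputeBuffer(32 * 32 * 32, sizeof(float));

        shader.SetBuffer(kernel, "nodeData", nodeBuffer);
        shader.SetBuffer(kernel, "outputBuffer", outputBuffer);
        shader.SetInt("nodecount", nodes.Length);

        // With [numthreads(8,8,8)], 4 x 4 x 4 groups cover all 32 x 32 x 32 points.
        shader.Dispatch(kernel, 4, 4, 4);

        var result = new float[32 * 32 * 32];
        outputBuffer.GetData(result); // blocks until the GPU has finished

        nodeBuffer.Release();
        outputBuffer.Release();
        return result;
    }
}
```

Index [x*32*32 + y*32 + z] of the result then corresponds to map position (x, y, z), so the 32 stacked 32x32 slices fall straight out of the flat array.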
I have calculated that the current Mandelbrot iterates 208,200 times. But if I use a break to control the iterations, the output looks like it came from a printer that ran out of ink halfway through, so I am obviously not doing it correctly. Does anyone know how iteration controls should be implemented?
int iterations = 0;
for (x = 0; x < x1; x++)
{
    for (y = 0; y < y1; y++)
    {
        // PAINT CONTROLS HERE
        if (iterations > 200000)
        {
            break;
        }
        iterations++;
    }
}
You need to change the values of y1 and x1 to control the "depth" of your Mandelbrot set.
By breaking at a certain number of iterations, you've gone "deep" for a while (because x1 and y1 are large) and then just stop part way through.
It's not clear what you're asking. But taking the two most obvious interpretations of "iterations":
1) You mean to reduce the maximum iterations per-pixel. I wouldn't say this affects the "smoothness" of the resulting image, but "smooth" is not a well-defined technical term in the first place, so maybe this is what you mean. It's certainly more consistent with how the Mandelbrot set is visualized.
If this is the meaning you intend, then in your per-pixel loop (which you did not include in your code example), you need to reset the iteration count to 0 for each pixel, and then stop iterating if and when you hit the maximum you've chosen. Pixels where you hit the maximum before the iterated value escapes are considered to be in the set.
Typically this maximum would be at least 100 or so, which is enough to give you the basic shape of the set. For fine detail at high zoom factors, it can reach tens or hundreds of thousands of iterations.
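For reference, a minimal per-pixel escape-time loop looks like this (a sketch; the EscapeTime name and the complex-coordinate parameters are mine, not from the question):

```csharp
// Classic Mandelbrot escape-time iteration for one pixel.
// (cr, ci) is the complex coordinate this pixel maps to.
static int EscapeTime(double cr, double ci, int maxIterations)
{
    double zr = 0.0, zi = 0.0;
    int iterations = 0; // reset for every pixel
    while (zr * zr + zi * zi <= 4.0 && iterations < maxIterations)
    {
        double tmp = zr * zr - zi * zi + cr; // z = z^2 + c
        zi = 2.0 * zr * zi + ci;
        zr = tmp;
        iterations++;
    }
    // Hitting maxIterations means the point is treated as inside the set.
    return iterations;
}
```

The returned count is what you map to a color; points that return maxIterations are painted as the set's interior.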
2) You mean to reduce the number of pixels you've actually computed. To me, this affects the "smoothness" of the image, because the resulting image is essentially lower-resolution.
If this is what you mean, then you need to either change the pixel width and height of the computed image (i.e. make x1 and y1 smaller), or change the X and Y step sizes in your loop and then fill in the image with larger rectangles of the correct color.
Without a better code example, it's impossible to offer more specific advice.
I'm trying to do some image processing in C# using the same old GDI technique: iterating through every pixel with a nested for-loop, then using the GetPixel and SetPixel methods on the (Bitmap) image.
I have already achieved the same results with the pointer approach (using an unsafe context), but now I'm trying the old-school Get/SetPixel methods to play with my bitmaps...
Bitmap ToGrayscale(Bitmap source)
{
    for (int y = 0; y < source.Height; y++)
    {
        for (int x = 0; x < source.Width; x++)
        {
            Color current = source.GetPixel(x, y);
            int avg = (current.R + current.B + current.G) / 3;
            Color output = Color.FromArgb(avg, avg, avg);
            source.SetPixel(x, y, output);
        }
    }
    return source;
}
Considering the performance of the code above... it takes just too long to finish, stressing the user out while waiting for his 1800x1600 image to be processed.
So I thought I could use the technique we use when working with HLSL, running a separate function for each pixel (the pixel shader engine, as I was taught, copies the function returning the float4 (Color) thousands of times on the GPU to do the processing in parallel).
So I tried to run a separate Task (function) for each pixel, putting these Task variables into a List and then awaiting List.ToArray(). But I failed at that, as every new Task 'awaits' to be finished before the next one runs.
I wanted to call a new Task for each pixel to run this :
Color current = source.GetPixel(x, y);
int avg = (current.R + current.B + current.G) / 3;
Color output = Color.FromArgb(avg, avg, avg);
source.SetPixel(x, y, output);
At the end of the day I got myself async non-blocking code, but not parallel code...
Any suggestions guys?
GetPixel and SetPixel are likely the main bottleneck here.
Instead of trying to parallelize this, I would recommend using Bitmap.LockBits to access the pixel data far more efficiently.
That being said, you can parallelize your current version via:
Bitmap ToGrayscale(Bitmap source)
{
    Parallel.For(0, source.Height, y =>
    {
        for (int x = 0; x < source.Width; x++)
        {
            Color current = source.GetPixel(x, y);
            int avg = (current.R + current.B + current.G) / 3;
            Color output = Color.FromArgb(avg, avg, avg);
            source.SetPixel(x, y, output);
        }
    });
    return source;
}
However, the Bitmap class is not thread safe, so this will likely cause issues.
A better approach would be to use LockBits, then parallelize the work on the raw data directly (as above). Note that I'm only parallelizing the outer loop (on purpose), as this prevents oversaturating the cores with work items.
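A sketch of that combination (assuming a 32bpp ARGB bitmap; copying through a managed byte[] avoids unsafe code at the cost of one copy each way):

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

Bitmap ToGrayscale(Bitmap source)
{
    var rect = new Rectangle(0, 0, source.Width, source.Height);
    BitmapData data = source.LockBits(rect, ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
    try
    {
        int bytes = data.Stride * data.Height;
        byte[] pixels = new byte[bytes];
        Marshal.Copy(data.Scan0, pixels, 0, bytes);

        // Each row is independent, so it is safe to parallelize over rows
        // of the copied buffer (no Bitmap calls inside the parallel loop).
        Parallel.For(0, data.Height, y =>
        {
            int row = y * data.Stride;
            for (int x = 0; x < data.Width; x++)
            {
                int i = row + x * 4; // BGRA byte layout
                int avg = (pixels[i] + pixels[i + 1] + pixels[i + 2]) / 3;
                pixels[i] = pixels[i + 1] = pixels[i + 2] = (byte)avg;
            }
        });

        Marshal.Copy(pixels, data.Scan0, 0, bytes);
    }
    finally
    {
        source.UnlockBits(data);
    }
    return source;
}
```

The threads never touch the Bitmap object itself, only disjoint slices of the byte array, which sidesteps the thread-safety problem entirely.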
Using tasks will just set up multiple threads on the CPU - it won't use the graphics processor. Also, I'm pretty sure that the Bitmap objects you are working with are not thread safe, so you won't be able to use multiple threads to access them anyway.
If all you are trying to do is convert an image to grayscale, I would look at built-in functionality first. In general, something built into the .NET framework can use lower level 'unsafe' code to do things more efficiently than would be possible otherwise, without being unsafe. Try How to: Convert an Image to Greyscale.
If you really want to use multiple threads for your custom bitmap processing, I think you will have to make a byte array, modify it in a multithreaded way, then create a bitmap object at the end from the byte array. Take a look at https://stackoverflow.com/a/15290190/1453269 for some pointers on how to do that.
A good way to parallelize your work is not to dispatch a task per pixel, but to dispatch as many threads as your processor cores.
You also say you are able to manipulate your pixels through pointers, so if you take that route, here goes another important advice: Have each thread work on neighboring pixels.
A valid scenario would be thread 1 working with the first 25% pixels, thread 2 with the next 25% and so on until thread 4.
The above is very important to avoid false sharing, which effectively defeats your CPU cache and makes your algorithm a lot slower.
Other than this, you could probably work with your graphics card, but that is totally out of my league.
EDIT: As noted by Panagiotis in the comments, a task may not correlate to a thread, and as such you have to be cautious about what API you'll use to parallelize your work and how you will do it.
I basically have a grid, let's say 100 x 100, which is filled with URLs from a photo collection. Some of these are duplicates, as I may only have 50 photos, but I want to repeat them to make sure the 100 x 100 grid is filled.
I randomly fill the grid with the URLs and then display them, which is fine. The problem is that sometimes photos with the same URL end up placed together, either on the x axis, the y axis, or both.
How can I fill the grid so that images with the same URL are as far apart as possible, thus preventing two of the same photos from appearing next to each other?
Any help appreciated
Mike
If you really want "as far apart as possible" then (1) I bet you're out of luck and (2) if that were achievable it would probably produce not-very-random-looking results. But if all you want is "somewhat far apart", it's not so bad. Here are a few things you can do.
(1) Classify grid positions according to the parity of their x,y coordinates: that is, whether they're odd and even. Divide the photos into four roughly-equal-sized batches. Now select from different batches according to the parity of the coordinates. The following code (which is a bit too "clever"; sorry) does this, modulo bugs and typos.
System.Random rng = new System.Random();
for (int x = 0; x < nx; ++x) {
    for (int y = 0; y < ny; ++y) {
        int k = ((x & 1) << 1) + (y & 1); // 0..3
        int n_photos_in_batch = (n_photos + 3 - k) >> 2;
        int photo_number = (rng.Next(0, n_photos_in_batch) << 2) + k; // Next's upper bound is exclusive
        // use this photo
    }
}
Downsides: doesn't do anything to move copies of a photo any further away from one another than one step. Reduces randomness somewhat since all copies of any given photo will be in a fixed subset of positions; in some contexts this may be visible and look rather silly.
Variations: we're basically covering the grid with 2x2 tiles, and restricting the range of photos allowed to occur in each tile. You could use larger tiles, or differently-shaped tiles, or arrange them differently. For instance, if you say k = ((x&1)<<1) ^ (y&3) you get 2x2 tiles arranged in a kinda-hexagonal pattern, which is actually probably better than the version above.
(2) Loop over positions in your grid (raster order will do, though there might be better alternatives) and for each one choose a photo that (a) doesn't already occur too near to the position you're looking at and (b) is otherwise random. The following code (again, modulo bugs and typos) does something like this, though for large grids you might want to make it more efficient.
System.Random rng = new System.Random();
int radius = MAX_RADIUS; // preferably not too big, so that the search isn't too slow
while ((2 * radius + 1) * (2 * radius + 1) >= n_photos) --radius; // gratuitously inefficient!
for (int x = 0; x < nx; ++x) {
    for (int y = 0; y < ny; ++y) {
        // which photos already appear too near to here?
        System.Collections.BitArray unseen = new System.Collections.BitArray(n_photos, true);
        for (int x1 = x - radius; x1 <= x + radius; ++x1) {
            for (int y1 = y - radius; y1 <= y + radius; ++y1) {
                if (0 <= x1 && x1 < nx && 0 <= y1 && y1 < ny && (y1 < y || (y1 == y && x1 < x))) {
                    unseen[photos[x1, y1]] = false;
                }
            }
        }
        // now choose a random one of them
        int n_unseen = 0;
        for (int i = 0; i < n_photos; ++i) if (unseen[i]) ++n_unseen;
        System.Diagnostics.Debug.Assert(n_unseen > 0, "no photos available");
        int j = rng.Next(0, n_unseen); // upper bound is exclusive
        for (int i = 0; i < n_photos; ++i) {
            if (unseen[i]) {
                if (j == 0) { photos[x, y] = i; break; }
                --j;
            }
        }
    }
}
Notes: This is much more expensive than option 1. The validity check on x1,y1 is gratuitously inefficient here, of course. So is the choice of radius. The obvious more-efficient versions of these, however, may break down if you adopt some of the variations I'm about to list. This code, as it stands, won't do anything to keep photos apart if there are fewer than 9. The choice of radius is actually completely bogus, for the grid-traversal order I've used, because there are never more than 2r^2+2r "excluded" positions; again, that may change if you traverse the grid in a different order. Etc.
Variations: there's no real reason why the region you search over should be square. Circular might well be better, for instance. You could, with some extra work, construct a region that always has exactly as many points in it as you have photos (though if you do that you'll get a mostly-periodic pattern of photos, so it's better to be a bit less aggressive). It might also be better to process the grid entries in a different order - e.g., spiralling out from the centre.
(3) Option 2 above will keep photos unique within a certain range (about as large as it can be given how many different photos you have) but not care about keeping copies further away apart from that. You could, instead, decide how bad it is having two identical photos at any given distance and then choose photos to minimize total badness. This will be even more expensive than option 2. I shan't bother giving sample code; you can probably work out how you might do it.
[EDITED to add ...]
(4) Here's a cute variation on the theme of (1). It will work best when the grid is square and its size is a power of 2, but you can adapt it to work more generally. It takes time only proportional to the size of your grid, however many photos you have. For each position (x,y): Throw away all but the bottom k bits of the coordinates, for some k. Bit-reverse them and interleave the bits, giving a number m from 0 to 2^(2k)-1. Choose k so that this is somewhere on the order of, say, n_photos/4. Now, at position (x,y) you'll put photo number round(n_photos*m/2^(2k) + smallish_random_number). There are a few details I'll leave for you to fill in :-).
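A sketch of the bit-twiddling part of (4) - bit-reversing the low k bits of each coordinate and interleaving them - with the rounding and the smallish random offset left as the details the answer alludes to (the helper names are mine):

```csharp
// Reverse the low k bits of v, e.g. ReverseBits(0b001, 3) == 0b100.
static int ReverseBits(int v, int k)
{
    int r = 0;
    for (int i = 0; i < k; i++)
    {
        r = (r << 1) | (v & 1);
        v >>= 1;
    }
    return r;
}

// Interleave the low k bits of a and b into one 2k-bit number,
// alternating a-bit, b-bit from the top down.
static int Interleave(int a, int b, int k)
{
    int m = 0;
    for (int i = k - 1; i >= 0; i--)
        m = (m << 2) | (((a >> i) & 1) << 1) | ((b >> i) & 1);
    return m;
}

// The m in the recipe for position (x, y): 0 .. 2^(2k)-1.
static int MixedIndex(int x, int y, int k)
{
    int mask = (1 << k) - 1;
    return Interleave(ReverseBits(x & mask, k), ReverseBits(y & mask, k), k);
}
```

Because of the bit reversal, neighbouring (x, y) positions land far apart in m, which is what pushes copies of a photo away from one another.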
The fastest way is something like this:
You have an array of n image URLs and an x*y grid.
Find the central cell of the grid.
Randomly extract image URLs from the array and place each URL around the central cell (putting the first URL in the centre itself).
Do this until all grid cells are filled or the array runs out of URLs.
If every URL has been used, take URLs from the concentric circles you have made, following from the central cell out to the circle with the biggest radius, and randomly place them around that biggest circle.
This algorithm works as-is if you have enough URLs that fewer than two full disks are drawn on the grid.
You can modify it for other cases if you follow the rule that the URLs from one pass must fill as big a circle as they can.
What you want is a space-filling curve, for example a Hilbert curve. It fills your grid with one continuous line in which consecutive cells are always adjacent. Because the nature of an SFC is to recursively fill the space while preserving locality, you can exploit this and place the pictures along the line. If you don't want the same picture in the direct neighbourhood, you can run a depth-first search along the SFC from each node to eliminate nearby copies.
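A sketch of the standard distance-to-coordinate conversion for a Hilbert curve (adapted from the well-known iterative algorithm; n is the grid side length, a power of 2), which lets you walk the line cell by cell:

```csharp
// Map distance d along the Hilbert curve to an (x, y) cell on an n x n grid.
static (int x, int y) HilbertD2XY(int n, int d)
{
    int x = 0, y = 0, t = d;
    for (int s = 1; s < n; s *= 2)
    {
        int rx = 1 & (t / 2);
        int ry = 1 & (t ^ rx);
        // rotate the current quadrant if needed
        if (ry == 0)
        {
            if (rx == 1) { x = s - 1 - x; y = s - 1 - y; }
            (x, y) = (y, x);
        }
        x += s * rx;
        y += s * ry;
        t /= 4;
    }
    return (x, y);
}
```

Placing photo i % n_photos at HilbertD2XY(n, i) for i = 0 .. n*n-1 spaces the copies of each photo evenly along the curve, and the curve's locality keeps consecutive placements adjacent on the grid.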