I'm stuck trying to save an array of System.Drawing.Bitmap objects, each bitmap to a separate file.
I have an array "survey" that stores several Lists of double.
For each List I want to create a bitmap and then save it as a .bmp file.
The line raport[i].Save(Path.Combine(myfilepath, nets[i] + ".bmp")); throws a TypeInitializationException, and I don't know why.
nets is a dictionary (int, string) holding the expected file names.
public void save_results()
{
System.Drawing.Bitmap[] raport = new System.Drawing.Bitmap[survey.Length];
for (int i = 0; i < survey.Length; i++)
{
raport[i] = new System.Drawing.Bitmap(survey[i].Count, 1000);
for (int x = 0; x < survey[i].Count; x++)
for (int y = 0; y < 1000; y++)
raport[i].SetPixel(x, y, Color.FromArgb(255, 255, 255));
for (int x = 0; x < survey[i].Count; x++)
raport[i].SetPixel(x, (int)(1000 - Math.Floor(survey[i][x] * 1000) >= 1000 ? 999 : 1000 - Math.Floor(survey[i][x] * 1000)), Color.FromArgb(0, 0, 0));
raport[i].Save(Path.Combine(myfilepath, nets[i] + ".bmp"));
}
}
In the end, the problem was caused by the variable "myfilepath".
The variable was built up from several partial file paths, and all of those strings should have been static:
public static string mydoc= Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments);
public static string myfilepath_p = Path.Combine(mydoc, "Demeter");
public static string myfilepath= Path.Combine(myfilepath_p, "regresja_liniowa");
Originally, only the final variable used in the code above was static, which caused the error.
The rest of the code worked fine.
I'm new to CUDA and trying to figure out how to pass a 2D array to the kernel.
I have the following working code for a 1-dimensional array:
class Program
{
static void Main(string[] args)
{
int N = 10;
int deviceID = 0;
CudaContext ctx = new CudaContext(deviceID);
CudaKernel kernel = ctx.LoadKernel(@"doubleIt.ptx", "DoubleIt");
kernel.GridDimensions = (N + 255) / 256;
kernel.BlockDimensions = Math.Min(N,256);
// Allocate input vectors h_A in host memory
float[] h_A = new float[N];
// Initialize input vectors h_A
for (int i = 0; i < N; i++)
{
h_A[i] = i;
}
// Allocate vectors in device memory and copy vectors from host memory to device memory
CudaDeviceVariable<float> d_A = h_A;
CudaDeviceVariable<float> d_C = new CudaDeviceVariable<float>(N);
// Invoke kernel
kernel.Run(d_A.DevicePointer, d_C.DevicePointer, N);
// Copy result from device memory to host memory
float[] h_C = d_C;
// h_C contains the result in host memory
}
}
with the following kernel code:
__global__ void DoubleIt(const float* A, float* C, int N)
{
int i = blockDim.x * blockIdx.x + threadIdx.x;
if (i < N)
C[i] = A[i] * 2;
}
As I said, everything works fine, but I want to work with a 2D array as follows:
// Allocate input vectors h_A in host memory
int W = 10;
float[][] h_A = new float[N][];
// Initialize input vectors h_A
for (int i = 0; i < N; i++)
{
h_A[i] = new float[W];
for (int j = 0; j < W; j++)
{
h_A[i][j] = i*W+j;
}
}
I need the whole 2nd dimension to be handled by the same thread, so kernel.BlockDimensions must stay 1-dimensional and each kernel thread needs to get a 1D array with 10 elements.
So my bottom-line question is: how should I copy this 2D array to the device, and how do I use it in the kernel? (In this example there should be a total of 10 threads.)
Short answer: you shouldn't do it...
Long answer: Jagged arrays are difficult to handle in general. Instead of one contiguous segment of memory for your data, you have many small ones scattered around in memory. What happens when you copy the data to the GPU? With one large contiguous segment, you call the cudaMemcpy/CopyToDevice functions and copy the entire block at once. But just as you allocate a jagged array in a for loop, you would have to copy your data line by line into a CudaDeviceVariable<CUdeviceptr>, where each entry points to a CudaDeviceVariable<float>. In parallel, you maintain a host array CudaDeviceVariable<float>[] that manages your CUdeviceptrs on the host side. Copying data is already quite slow in general; doing it this way is probably a real performance killer...
To conclude: if you can, use flattened arrays and index the entries with y * DimX + x. Even better on the GPU side, use pitched memory, where the allocation is done so that each line starts at a "good" (aligned) address; the index then becomes y * Pitch + x (simplified). The 2D copy methods in CUDA are made for these pitched allocations, where each line is padded with some additional bytes.
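As a rough illustration of the flattened layout, here is a minimal host-side sketch that turns the jagged array from the question into one row-major buffer. The helper name Flatten is made up for this example, and the code assumes every row has the same length; it is written as C# 9 top-level code so it runs on its own.

```csharp
using System;

// Flatten a rectangular jagged array into one contiguous row-major buffer.
// Element x of row y ends up at index y * dimX + x.
float[] Flatten(float[][] rows, int dimX)
{
    float[] flat = new float[rows.Length * dimX];
    for (int y = 0; y < rows.Length; y++)
    {
        if (rows[y].Length != dimX)
            throw new ArgumentException("Row " + y + " does not have dimX elements.");
        Array.Copy(rows[y], 0, flat, y * dimX, dimX);
    }
    return flat;
}

// Build the 10 x 10 input from the question and flatten it.
int N = 10, W = 10;
float[][] h_A = new float[N][];
for (int i = 0; i < N; i++)
{
    h_A[i] = new float[W];
    for (int j = 0; j < W; j++)
        h_A[i][j] = i * W + j;
}
float[] h_flat = Flatten(h_A, W);
Console.WriteLine(h_flat[3 * W + 4]); // element 4 of row 3, prints 34
```

The flat buffer can then be copied to the device in one go, the same way the 1D array is copied in the question, and a kernel thread responsible for row i would read its elements as A[i * W + j].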
For completeness: C# also has 2-dimensional arrays like float[,]. You can use these on the host side instead of flattened 1D arrays. But I wouldn't recommend it, as the ISO standard for .NET does not guarantee that the internal memory is actually contiguous, an assumption that managedCuda must make in order to use these arrays. The current .NET Framework doesn't do anything unusual internally, but who knows whether it will stay that way...
This would implement the jagged array copy:
float[][] data_h;
CudaDeviceVariable<CUdeviceptr> data_d;
CUdeviceptr[] ptrsToData_h; //represents data_d on host side
CudaDeviceVariable<float>[] arrayOfarray_d; //Array of CudaDeviceVariables to manage memory, source for pointers in ptrsToData_h.
int sizeX = 512;
int sizeY = 256;
data_h = new float[sizeX][];
arrayOfarray_d = new CudaDeviceVariable<float>[sizeX];
data_d = new CudaDeviceVariable<CUdeviceptr>(sizeX);
ptrsToData_h = new CUdeviceptr[sizeX];
for (int x = 0; x < sizeX; x++)
{
data_h[x] = new float[sizeY];
arrayOfarray_d[x] = new CudaDeviceVariable<float>(sizeY);
ptrsToData_h[x] = arrayOfarray_d[x].DevicePointer;
//ToDo: init data on host...
}
//Copy the pointers once:
data_d.CopyToDevice(ptrsToData_h);
//Copy data:
for (int x = 0; x < sizeX; x++)
{
arrayOfarray_d[x].CopyToDevice(data_h[x]);
}
//Call a kernel:
kernel.Run(data_d.DevicePointer /*, other parameters*/);
//kernel in *.cu file:
//__global__ void kernel(float** data_d, ...)
This is a sample for CudaPitchedDeviceVariable:
int dimX = 512;
int dimY = 512;
float[] array_host = new float[dimX * dimY];
CudaPitchedDeviceVariable<float> arrayPitched_d = new CudaPitchedDeviceVariable<float>(dimX, dimY);
for (int y = 0; y < dimY; y++)
{
for (int x = 0; x < dimX; x++)
{
array_host[y * dimX + x] = x * y;
}
}
arrayPitched_d.CopyToDevice(array_host);
kernel.Run(arrayPitched_d.DevicePointer, arrayPitched_d.Pitch, dimX, dimY);
//Corresponding kernel:
extern "C"
__global__ void kernel(float* data, size_t pitch, int dimX, int dimY)
{
int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;
if (x >= dimX || y >= dimY)
return;
//pointer arithmetic: add y*pitch to char* pointer as pitch is given in bytes,
//which gives the start of line y. Convert to float* and add x, to get the
//value at entry x of line y:
float value = *(((float*)((char*)data + y * pitch)) + x);
*(((float*)((char*)data + y * pitch)) + x) = value + 1;
//Or simpler if you don't like pointers:
float* line = (float*)((char*)data + y * pitch);
float value2 = line[x];
}
This is the method in question:
Color[][] ChopUpTiles()
{
int numTilesPerRow = terrainTiles.width / tileResolution;
int numRows = terrainTiles.height / tileResolution;
Color[][] tiles = new Color[numTilesPerRow * numRows][];
for (int y = 0; y < numRows; y++)
{
for (int x = 0; x < numTilesPerRow; x++)
{
tiles[y * numTilesPerRow + x] = terrainTiles.GetPixels(x * tileResolution , y * tileResolution, tileResolution, tileResolution);
}
}
return tiles;
}
It's a pretty basic function, and it works as long as the tileset in question has only one row. If it has more than a single row, it freaks out. Suddenly, "tiles[1]" no longer returns tile 1. Instead, it returns... tile 15. I have no idea why it's acting this way, or where the math is wrong. Can someone spot what's going on?
Don't you mean tiles[y][numTilesPerRow + x] or tiles[y][x], or something along those lines? I don't know exactly what you are trying to do, but you are retrieving an entire row, not a single tile.
Also, I think Color[][] tiles = new Color[numTilesPerRow * numRows][]; should be Color[][] tiles = new Color[numRows][numTilesPerRow]; or am I wrong?
Basically, you have a multi-dimensional array, yet you are treating it as a single-dimensional array.
I'm using the NAudio AsioOut object to pass data from the input buffer to my delayProc() function and then to the output buffer.
delayProc() needs a float[] buffer, which I can get using e.GetAsInterleavedSamples(). The problem is that I then need to convert the result back to a multidimensional IntPtr; for this I'm using the AsioSampleConvertor class.
When I try to apply the effect, I get an AccessViolationException in the code of the AsioSampleConvertor class.
So I think the problem is in the conversion from float[] to IntPtr[].
Here is some code:
OnAudioAvailable()
floatIn = new float[e.SamplesPerBuffer * e.InputBuffers.Length];//*2
e.GetAsInterleavedSamples(floatIn);
floatOut = delayProc(floatIn, e.SamplesPerBuffer * e.InputBuffers.Length, 1.5f);
//conversion from float[] to IntPtr[L][R]
Outp = Marshal.AllocHGlobal(sizeof(float)*floatOut.Length);
Marshal.Copy(floatOut, 0, Outp, floatOut.Length);
NAudio.Wave.Asio.ASIOSampleConvertor.ConvertorFloatToInt2Channels(Outp, e.OutputBuffers, e.InputBuffers.Length, floatOut.Length);
delayProc()
private float[] delayProc(float[] sourceBuffer, int sampleCount, float delay)
{
if (OldBuf == null)
{
OldBuf = new float[sampleCount];
}
float[] BufDly = new float[(int)(sampleCount * delay)];
int delayLength = (int)(BufDly.Length - (BufDly.Length / delay));
for (int j = sampleCount - delayLength; j < sampleCount; j++)
for (int i = 0; i < delayLength; i++)
BufDly[i] = OldBuf[j];
for (int j = 0; j < sampleCount; j++)
for (int i = delayLength; i < BufDly.Length; i++)
BufDly[i] = sourceBuffer[j];
for (int i = 0; i < sampleCount; i++)
OldBuf[i] = sourceBuffer[i];
return BufDly;
}
AsioSampleConvertor
public static void ConvertorFloatToInt2Channels(IntPtr inputInterleavedBuffer, IntPtr[] asioOutputBuffers, int nbChannels, int nbSamples)
{
unsafe
{
float* inputSamples = (float*)inputInterleavedBuffer;
int* leftSamples = (int*)asioOutputBuffers[0];
int* rightSamples = (int*)asioOutputBuffers[1];
for (int i = 0; i < nbSamples; i++)
{
*leftSamples++ = clampToInt(inputSamples[0]);
*rightSamples++ = clampToInt(inputSamples[1]);
inputSamples += 2;
}
}
}
ClampToInt()
private static int clampToInt(double sampleValue)
{
sampleValue = (sampleValue < -1.0) ? -1.0 : (sampleValue > 1.0) ? 1.0 : sampleValue;
return (int)(sampleValue * 2147483647.0);
}
If you need some other code, just ask me.
When you call ConvertorFloatToInt2Channels you are passing in the total number of samples across all channels, then trying to read that many pairs of samples. So you are trying to read twice as many samples from your input buffer as are actually there. Using unsafe code you are trying to address well past the end of the allocated block, which results in the access violation you are getting.
Change the for loop in your ConvertorFloatToInt2Channels method to read:
for (int i = 0; i < nbSamples; i += 2)
This will stop your code from trying to read double the number of items actually present in the source memory block.
Incidentally, why are you messing around with allocating global memory and using unsafe code here? Why not process them as managed arrays? Processing the data itself isn't much slower, and you save on all the overheads of copying data to and from unmanaged memory.
Try this:
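public static void FloatMonoToIntStereo(float[] samples, int[] leftChannel, int[] rightChannel)
{
// samples is interleaved (L, R, L, R, ...) and assumed to lie within [-1.0, 1.0]
for (int i = 0, j = 0; i + 1 < samples.Length; i += 2, j++)
{
// Multiply in double: as a float, Int32.MaxValue rounds up to 2^31 and the cast would overflow
leftChannel[j] = (int)((double)samples[i] * Int32.MaxValue);
rightChannel[j] = (int)((double)samples[i + 1] * Int32.MaxValue);
}
}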
On my machine that processes around 12 million samples per second, converting the samples to integer and splitting the channels. About half that speed if I allocate the buffers for every set of results. About half again when I write that to use unsafe code, AllocHGlobal etc.
Never assume that unsafe code is faster.
I have a two-dimensional array with a fixed size of 50 elements. I need to ask the user for some values and insert them into the array. The problem is: how do I make sure I'm not overwriting anything that's already in there?
There will already be some content in the array when I start the program, but I don't know how much. How can I find the next available ID in the array, to insert my content there without overwriting anything that could be already in there?
I tried using array.GetUpperBound and array.GetLength, however they return fixed values no matter how many elements are already in the array.
I have to use an array, I can't use lists or anything like that.
What can I do to find out the next "free" position in my array?
Thank you very much for helping.
Well, if you are using an array, every element holds a default value until you assign something else. For example, if you have a two-dimensional int array like this:
var arr = new int[2, 3];
arr[1,2] will be equal to 0, which is the default value for int. Anyway, you can define an extension method that finds the first available position in a two-dimensional array like this:
public static class MyExtensions
{
public static void FindAvailablePosition<T>(this T[,] source, out int x, out int y)
{
for (int i = 0; i < source.GetLength(0); i++)
{
for (int j = 0; j < source.GetLength(1); j++)
{
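// EqualityComparer also handles null elements, so reference types do not throw here
if (System.Collections.Generic.EqualityComparer<T>.Default.Equals(source[i, j], default(T)))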
{
x = i;
y = j;
return;
}
}
}
x = -1;
y = -1;
}
}
And you can use it like this:
var arr = new int[2, 3];
arr[0, 0] = 12; // for example
int x, y;
arr.FindAvailablePosition(out x,out y);
// now x = 0, y = 1
I need some help optimising my CCL (connected-component labelling) implementation. I use it to detect black areas in an image. On a 2000x2000 image it takes 11 seconds, which is quite a lot. I need to reduce the running time as much as possible. I would also be glad to know whether there is another algorithm that does the same thing faster. Here is my code:
//The method returns a dictionary, where the key is the label
//and the list contains all the pixels with that label
public Dictionary<short, LinkedList<Point>> ProcessCCL()
{
Color backgroundColor = this.image.Palette.Entries[1];
//Matrix to store pixels' labels
short[,] labels = new short[this.image.Width, this.image.Height];
//I particularly don't like how I store the label equality table
//But I don't know how else can I store it
//I use LinkedList to add and remove items faster
Dictionary<short, LinkedList<short>> equalityTable = new Dictionary<short, LinkedList<short>>();
//Current label
short currentKey = 1;
for (int x = 1; x < this.bitmap.Width; x++)
{
for (int y = 1; y < this.bitmap.Height; y++)
{
if (!GetPixelColor(x, y).Equals(backgroundColor))
{
//Minimum label of the neighbours' labels
short label = Math.Min(labels[x - 1, y], labels[x, y - 1]);
//If there are no neighbours
if (label == 0)
{
//Create a new unique label
labels[x, y] = currentKey;
equalityTable.Add(currentKey, new LinkedList<short>());
equalityTable[currentKey].AddFirst(currentKey);
currentKey++;
}
else
{
labels[x, y] = label;
short west = labels[x - 1, y], north = labels[x, y - 1];
//A little trick:
//Because of those "ifs" the lowest label value
//will always be the first in the list
//but I'm afraid that because of them
//the running time also increases
if (!equalityTable[label].Contains(west))
if (west < equalityTable[label].First.Value)
equalityTable[label].AddFirst(west);
if (!equalityTable[label].Contains(north))
if (north < equalityTable[label].First.Value)
equalityTable[label].AddFirst(north);
}
}
}
}
//This dictionary will be returned as the result
//I'm not proud of using dictionary here too, I guess there
//is a better way to store the result
Dictionary<short, LinkedList<Point>> result = new Dictionary<short, LinkedList<Point>>();
//I define the variable outside the loops in order
//to reuse the memory address
short cellValue;
for (int x = 0; x < this.bitmap.Width; x++)
{
for (int y = 0; y < this.bitmap.Height; y++)
{
cellValue = labels[x, y];
//If the pixel is not a background
if (cellValue != 0)
{
//Take the minimum value from the label equality table
short value = equalityTable[cellValue].First.Value;
//I'd like to get rid of these lines
if (!result.ContainsKey(value))
result.Add(value, new LinkedList<Point>());
result[value].AddLast(new Point(x, y));
}
}
}
return result;
}
Thanks in advance!
You could split your picture into multiple sub-pictures, process them in parallel, and then merge the results.
Pass 1: 4 tasks, each processing a 1000x1000 sub-picture
Pass 2: 2 tasks, each processing 2 of the sub-pictures from pass 1
Pass 3: 1 task, processing the result of pass 2
For C# I recommend the Task Parallel Library (TPL), which makes it easy to define tasks that depend on and wait for each other. The following CodeProject article gives you a basic introduction to the TPL: The Basics of Task Parallelism via C#.
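A minimal sketch of the first pass with the TPL might look like this. It assumes the image has already been reduced to a bool[,] foreground mask, and CountForeground is only a hypothetical stand-in for the real per-sub-picture labelling; merging labels across the seams between sub-pictures is the hard part and is not shown.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for labelling one sub-picture: it only counts the
// foreground pixels in rows [rowStart, rowEnd).
int CountForeground(bool[,] pixels, int rowStart, int rowEnd)
{
    int count = 0;
    for (int row = rowStart; row < rowEnd; row++)
        for (int col = 0; col < pixels.GetLength(1); col++)
            if (pixels[row, col]) count++;
    return count;
}

bool[,] image = new bool[8, 8];
image[1, 1] = true;
image[6, 2] = true;
image[7, 7] = true;

int bandCount = 4;
int rowsPerBand = image.GetLength(0) / bandCount;
Task<int>[] tasks = new Task<int>[bandCount];
for (int band = 0; band < bandCount; band++)
{
    // Copy the bounds into locals so each lambda captures its own values.
    int start = band * rowsPerBand;
    int end = start + rowsPerBand;
    tasks[band] = Task.Run(() => CountForeground(image, start, end));
}

Task.WaitAll(tasks); // pass 1 is complete once every band has finished

int total = 0;
foreach (Task<int> task in tasks)
    total += task.Result; // merge step (here just a sum)
Console.WriteLine(total); // prints 3
```

Horizontal bands are used here instead of square tiles only to keep the sketch short; the task structure is the same either way.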
I would process one scan line at a time, keeping track of the beginning and end of each run of black pixels.
Then I would, on each scan line, compare it to the runs on the previous line. If there is a run on the current line that does not overlap a run on the previous line, it represents a new blob. If there is a run on the previous line that overlaps a run on the current line, it gets the same blob label as the previous. etc. etc. You get the idea.
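To make the run idea a bit more concrete, here is a small sketch, assuming each scan line has already been converted to a bool[] where true means "black". The names FindRuns and Overlaps are invented for this example, and the overlap test assumes 4-connectivity.

```csharp
using System;
using System.Collections.Generic;

// Collect the runs of foreground ("black") pixels on one scan line.
// Each run is reported as (Start, End) with both ends inclusive.
List<(int Start, int End)> FindRuns(bool[] line)
{
    var runs = new List<(int Start, int End)>();
    int runStart = -1; // -1 means "not inside a run"
    for (int col = 0; col < line.Length; col++)
    {
        if (line[col] && runStart < 0)
        {
            runStart = col;                    // a run begins
        }
        else if (!line[col] && runStart >= 0)
        {
            runs.Add((runStart, col - 1));     // the run ended at the previous pixel
            runStart = -1;
        }
    }
    if (runStart >= 0)
        runs.Add((runStart, line.Length - 1)); // the run touches the right edge
    return runs;
}

// Two runs on adjacent lines belong to the same blob if their column ranges overlap.
bool Overlaps((int Start, int End) a, (int Start, int End) b)
{
    return a.Start <= b.End && b.Start <= a.End;
}

bool[] previous = { false, true, true, false, false, true };
bool[] current = { true, true, false, false, false, false };
List<(int Start, int End)> prevRuns = FindRuns(previous); // (1,2) and (5,5)
List<(int Start, int End)> currRuns = FindRuns(current);  // (0,1)
Console.WriteLine(Overlaps(prevRuns[0], currRuns[0])); // True: both cover column 1
Console.WriteLine(Overlaps(prevRuns[1], currRuns[0])); // False: no shared column
```

A run on the current line that overlaps nothing on the previous line would start a new blob; one that overlaps several previous runs means those blobs have to be recorded as equivalent.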
I would try not to use dictionaries and such.
In my experience, randomly halting the program shows that those things may make programming incrementally easier, but they can exact a serious performance cost due to new-ing.
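One dictionary-free way to store the label equality table is a plain array used as a union-find (disjoint-set) structure. This is a different technique from the one in the question, sketched here only as an illustration; the fixed capacity of 16 and the function names are assumptions for the demo.

```csharp
using System;

// Array-based equivalence table (union-find). parent[label] points to another
// label in the same set; a root satisfies parent[label] == label.
int[] parent = new int[16]; // capacity is an assumption for this demo
int labelCount = 1;         // label 0 is reserved for background

int NewLabel()
{
    parent[labelCount] = labelCount;
    return labelCount++;
}

int Find(int label)
{
    while (parent[label] != label)
    {
        parent[label] = parent[parent[label]]; // path halving keeps chains short
        label = parent[label];
    }
    return label;
}

void Union(int a, int b)
{
    int rootA = Find(a), rootB = Find(b);
    if (rootA == rootB) return;
    // Keep the smaller label as the representative, as the question's code does.
    if (rootA < rootB) parent[rootB] = rootA;
    else parent[rootA] = rootB;
}

int first = NewLabel();  // 1
int second = NewLabel(); // 2
int third = NewLabel();  // 3
Union(second, third);    // labels 2 and 3 touch
Union(first, third);     // labels 1 and 3 touch, so all three are one blob
Console.WriteLine(Find(second)); // prints 1
```

In the first pass you would call Union(west, north) whenever both neighbours are labelled, and in the second pass replace each pixel's label with Find(label). No per-label list is allocated, which avoids most of the new-ing mentioned above.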
The problem is GetPixelColor(x, y): it takes a very long time to access the image data.
The SetPixel/GetPixel functions are terribly slow in C#, so if you need to use them a lot, you should use Bitmap.LockBits instead.
private void ProcessUsingLockbits(Bitmap ProcessedBitmap)
{
BitmapData bitmapData = ProcessedBitmap.LockBits(new Rectangle(0, 0, ProcessedBitmap.Width, ProcessedBitmap.Height), ImageLockMode.ReadWrite, ProcessedBitmap.PixelFormat);
int BytesPerPixel = System.Drawing.Bitmap.GetPixelFormatSize(ProcessedBitmap.PixelFormat) / 8;
int ByteCount = bitmapData.Stride * ProcessedBitmap.Height;
byte[] Pixels = new byte[ByteCount];
IntPtr PtrFirstPixel = bitmapData.Scan0;
Marshal.Copy(PtrFirstPixel, Pixels, 0, Pixels.Length);
int HeightInPixels = bitmapData.Height;
int WidthInBytes = bitmapData.Width * BytesPerPixel;
for (int y = 0; y < HeightInPixels; y++)
{
int CurrentLine = y * bitmapData.Stride;
for (int x = 0; x < WidthInBytes; x = x + BytesPerPixel)
{
int OldBlue = Pixels[CurrentLine + x];
int OldGreen = Pixels[CurrentLine + x + 1];
int OldRed = Pixels[CurrentLine + x + 2];
// Transform blue and clip to 255:
Pixels[CurrentLine + x] = (byte)((OldBlue + BlueMagnitudeToAdd > 255) ? 255 : OldBlue + BlueMagnitudeToAdd);
// Transform green and clip to 255:
Pixels[CurrentLine + x + 1] = (byte)((OldGreen + GreenMagnitudeToAdd > 255) ? 255 : OldGreen + GreenMagnitudeToAdd);
// Transform red and clip to 255:
Pixels[CurrentLine + x + 2] = (byte)((OldRed + RedMagnitudeToAdd > 255) ? 255 : OldRed + RedMagnitudeToAdd);
}
}
// Copy modified bytes back:
Marshal.Copy(Pixels, 0, PtrFirstPixel, Pixels.Length);
ProcessedBitmap.UnlockBits(bitmapData);
}
That is the basic code for accessing pixel data.
I also wrote a function that transforms the bitmap into a 2D matrix, which is easier to manipulate (but a little slower):
private void bitmap_to_matrix()
{
unsafe
{
bitmapData = ProcessedBitmap.LockBits(new Rectangle(0, 0, ProcessedBitmap.Width, ProcessedBitmap.Height), ImageLockMode.ReadWrite, ProcessedBitmap.PixelFormat);
int BytesPerPixel = System.Drawing.Bitmap.GetPixelFormatSize(ProcessedBitmap.PixelFormat) / 8;
int HeightInPixels = ProcessedBitmap.Height;
int WidthInPixels = ProcessedBitmap.Width;
int WidthInBytes = ProcessedBitmap.Width * BytesPerPixel;
byte* PtrFirstPixel = (byte*)bitmapData.Scan0;
Parallel.For(0, HeightInPixels, y =>
{
byte* CurrentLine = PtrFirstPixel + (y * bitmapData.Stride);
for (int x = 0; x < WidthInBytes; x = x + BytesPerPixel)
{
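// Conversion to grey level
double rst = CurrentLine[x] * 0.0721 + CurrentLine[x + 1] * 0.7154 + CurrentLine[x + 2] * 0.2125;
// Fill the grey matrix; divide by BytesPerPixel (not a hard-coded 3) to get the pixel column
TG[x / BytesPerPixel, y] = (int)rst;
}
});
// Release the lock taken by LockBits
ProcessedBitmap.UnlockBits(bitmapData);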
}
}
The code comes from the article "High performance SystemDrawingBitmap".
Thanks to the author for the really good work!
Hope this helps!