Incorrectly including cloo in C#? - c#

I'm trying to get this demo to build, but I get this error.
I've tried this with Mono and Visual Studio 2010 and hit the same problem.
The error occurs on this line:
program.Build(null, null, null, IntPtr.Zero);
EDIT
C#
using System;
using Cloo;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.IO;
namespace ClooTest
{
class MainClass
{
public static void Main (string[] args)
{
// pick first platform
ComputePlatform platform = ComputePlatform.Platforms[0];
// create context with all gpu devices
ComputeContext context = new ComputeContext(ComputeDeviceTypes.Gpu,
new ComputeContextPropertyList(platform), null, IntPtr.Zero);
// create a command queue with first gpu found
ComputeCommandQueue queue = new ComputeCommandQueue
(
context,
context.Devices[0],
ComputeCommandQueueFlags.None
);
// load opencl source
StreamReader streamReader = new StreamReader("kernels.cl");
string clSource = streamReader.ReadToEnd();
streamReader.Close();
// create program with opencl source
ComputeProgram program = new ComputeProgram(context, clSource);
// compile opencl source
program.Build(null, null, null, IntPtr.Zero);
// load chosen kernel from program
ComputeKernel kernel = program.CreateKernel("helloWorld");
// create a five-integer array and store its length
int[] message = new int[] { 1, 2, 3, 4, 5 };
int messageSize = message.Length;
// allocate a memory buffer with the message (the int array)
ComputeBuffer<int> messageBuffer = new ComputeBuffer<int>(context,
ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.UseHostPointer, message);
kernel.SetMemoryArgument(0, messageBuffer); // set the integer array
kernel.SetValueArgument(1, messageSize); // set the array size
// execute kernel
queue.ExecuteTask(kernel, null);
// wait for completion
queue.Finish();
}
}
}
OpenCL
kernel void helloWorld(global read_only int* message, int messageSize) {
for (int i = 0; i < messageSize; i++) {
printf("%d", message[i]);
}
}
EDIT

Yeah, printf probably isn't very well supported in OpenCL kernels. I would suggest doing your "Hello world" with some simple number crunching instead. Maybe something like:
kernel void IncrementNumber(global float4 *celldata_in, global float4 *celldata_out) {
int index = get_global_id(0);
float4 a = celldata_in[index];
a.w = a.w + 1;
celldata_out[index] = a;
}

Related

How to parse binary data transmitted over GPIB with IVI library

I am working with a Fluke 8588 and communicating with it using Ivi.Visa.Interop. I am trying to use the digitizer function to capture a large number of samples of a 5 V, 1 kHz sine wave. To improve the transfer time of the data, the manual mentions a setting for a binary packed data format. It provides 2-byte and 4-byte packing.
This is the smallest example I could put together:
using System;
using System.Threading;
using Ivi.Visa.Interop;
namespace Example
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Initializing Equipment");
int timeOut = 3000;
string resourceName = "GPIB0::1::INSTR";
ResourceManager rm = new ResourceManager();
FormattedIO488 fluke8588 = new FormattedIO488
{
IO = (IMessage)rm.Open(resourceName, AccessMode.NO_LOCK, timeOut)
};
Console.WriteLine("Starting Setup");
fluke8588.WriteString("FORMAT:DATA PACKED,4");
fluke8588.WriteString("TRIGGER:COUNT 100000");
Console.WriteLine("Initiate Readings");
fluke8588.WriteString("INITIATE:IMMEDIATE");
Thread.Sleep(3000);
Console.WriteLine("Readings Complete");
Console.WriteLine("Fetching Reading");
fluke8588.WriteString("FETCH?");
string response = fluke8588.ReadString();
Byte[] bytes = System.Text.Encoding.ASCII.GetBytes(response);
fluke8588.WriteString("FORMAT:DATA:SCALE?");
double scale = Convert.ToDouble(fluke8588.ReadString());
int parityMask = 0x8;
for (int i = 0; i < 100000; i += 4)
{
int raw = (int)((bytes[i] << 24) | (bytes[i + 1] << 16) | (bytes[i + 2] << 8) | (bytes[i + 3]));
int parity = (parityMask & bytes[i]) == parityMask ? -1 : 1;
int number = raw;
if (parity == -1)
{
number = ~raw * parity;
}
Console.WriteLine(number * scale);
}
Console.Read();
}
}
}
The resulting data looks like this:
I performed the steps "manually" using a tool called NI MAX. I get a header followed by the ten 4-byte integers, ending with a newline character. The negative integers are in two's complement, which was not specified in the manual, but I was able to determine it once I had enough samples.
TRIGGER:COUNT was only set to 10 at the time this image was taken.
How can I get this result in C#?
I found that I was using the wrong encoding. Changing from System.Text.Encoding.ASCII.GetBytes(response) to
System.Text.Encoding encoding = System.Text.Encoding.GetEncoding(1252);
Byte[] bytes = encoding.GetBytes(response);
got the desired result.
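The reason the encoding matters: ASCII is a 7-bit encoding, so .NET replaces every character above 0x7F with '?' (0x3F) when converting back to bytes, which corrupts any sample whose bytes have the high bit set. The sketch below illustrates this and then decodes the payload as big-endian two's complement integers. It rests on assumptions that are not in the original post: it uses ISO-8859-1 rather than code page 1252 because that encoding maps all 256 byte values one-to-one, it assumes the string carries exactly one character per received byte, and the class and method names (PackedDataDemo, ToBytes, DecodeInt32BigEndian) are hypothetical. Header and trailing newline handling is left out.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

public static class PackedDataDemo
{
    // Converts a string that carries raw bytes (one char per byte) back to bytes
    // with the given encoding. With Encoding.ASCII every char above 0x7F
    // comes back as 0x3F ('?'); a full 8-bit encoding preserves it.
    public static byte[] ToBytes(string response, Encoding encoding)
    {
        return encoding.GetBytes(response);
    }

    // Decodes consecutive 4-byte big-endian two's complement integers.
    // Trailing bytes that do not fill a whole integer are ignored.
    public static int[] DecodeInt32BigEndian(byte[] payload)
    {
        int[] values = new int[payload.Length / 4];
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = BinaryPrimitives.ReadInt32BigEndian(payload.AsSpan(i * 4, 4));
        }
        return values;
    }
}
```

For example, calling ToBytes with the string "\u00FF\u0041" gives the bytes 3F 41 under Encoding.ASCII but FF 41 under Encoding.GetEncoding("ISO-8859-1"), and DecodeInt32BigEndian on the bytes FF FF FF FE 00 00 01 02 yields -2 and 258.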
That said, I also learned that FormattedIO488 offers an alternative to ReadString for binary data: FormattedIO488.ReadIEEEBlock(IEEEBinaryType.BinaryType_I4) returns an array of integers and requires no bit twiddling. This is the solution I would suggest:
using System;
using System.Linq;
using Ivi.Visa.Interop;
using System.Threading;
using System.Collections.Generic;
namespace Example
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Initializing Equipment");
int timeOut = 3000;
string resourceName = "GPIB0::1::INSTR";
ResourceManager rm = new ResourceManager();
FormattedIO488 fluke8588 = new FormattedIO488
{
IO = (IMessage)rm.Open(resourceName, AccessMode.NO_LOCK, timeOut)
};
Console.WriteLine("Starting Setup");
fluke8588.WriteString("FORMAT:DATA PACKED,4");
fluke8588.WriteString("TRIGGER:COUNT 100000");
Console.WriteLine("Initiate Readings");
fluke8588.WriteString("INITIATE:IMMEDIATE");
Thread.Sleep(3000);
Console.WriteLine("Readings Complete");
Console.WriteLine("Fetching Reading");
fluke8588.WriteString("FETCH?");
List<int> response = new List<int>(fluke8588.ReadIEEEBlock(IEEEBinaryType.BinaryType_I4));
fluke8588.WriteString("FORMAT:DATA:SCALE?");
double scale = Convert.ToDouble(fluke8588.ReadString());
foreach (var value in response.Select(i => i * scale).ToList())
{
Console.WriteLine(value);
}
Console.Read();
}
}
}

Read binary objects from a file in C# written out by a C++ program

I am trying to read objects from very large files containing padded structs that were written by a C++ process. I was following an example that memory-maps the large file and tries to deserialize the data into an object, but I can now see that it won't work this way.
How can I extract all the objects from the files to use in C#? I'm probably way off, but I've provided the code. Each object has an 8-byte milliseconds member followed by 21 16-bit integers, which needs 6 bytes of padding to align to an 8-byte boundary.
[Serializable]
unsafe public struct DataStruct
{
public UInt64 milliseconds;
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 21)]
public fixed Int16 data[21];
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
public fixed Int16 padding[3];
};
[Serializable]
public class DataArray
{
public DataStruct[] samples;
}
public static class Helper
{
public static Int16[] GetData(this DataStruct data)
{
unsafe
{
Int16[] output = new Int16[21];
for (int index = 0; index < 21; ++index)
output[index] = data.data[index];
return output;
}
}
}
class FileThreadSupport
{
struct DataFileInfo
{
public string path;
public UInt64 start;
public UInt64 stop;
public UInt64 elements;
};
// Create our epoch timestamp
private static readonly DateTime epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
// Output TCP client
private Support.AsyncTcpClient output;
// Directory which contains our data
private string replay_directory;
// Files to be read from
private DataFileInfo[] file_infos;
// Current timestamp of when the process was started
UInt64 process_start = 0;
// Object from current file
DataArray current_file_data;
// Offset into current files
UInt64 current_file_index = 0;
// Offset into current files
UInt64 current_file_offset = 0;
// Run flag
bool run = true;
public FileThreadSupport(ref Support.AsyncTcpClient output, ref Engine.A.Information info, ref Support.Configuration configuration)
{
// Set our output directory
replay_directory = configuration.getString("replay_directory");
if (replay_directory.Length == 0)
{
Console.WriteLine("Configuration does not provide a replay directory");
return;
}
// Check the directory for playable files
if(!loadDataDirectory(replay_directory))
{
Console.WriteLine("Replay directory {0} did not have any valid files", replay_directory);
}
// Set the output TCP client
this.output = output;
}
private bool loadDataDirectory(string directory)
{
string[] files = Directory.GetFiles(directory, "*.*", SearchOption.TopDirectoryOnly);
file_infos = new DataFileInfo[files.Length];
int index = 0;
foreach (string file in files)
{
string[] parts = file.Split('\\');
string name = parts.Last();
parts = name.Split('.');
if (parts.Length != 2)
continue;
UInt64 start, stop = 0;
if (!UInt64.TryParse(parts[0], out start) || !UInt64.TryParse(parts[1], out stop))
continue;
long size = new System.IO.FileInfo(file).Length;
// Add to our file info array
file_infos[index] = new DataFileInfo
{
path = file,
start = start,
stop = stop,
elements = (ulong)(new System.IO.FileInfo(file).Length / 56
/*System.Runtime.InteropServices.Marshal.SizeOf(typeof(DataStruct))*/)
};
++index;
}
// Sort the array
Array.Sort(file_infos, delegate (DataFileInfo x, DataFileInfo y) { return x.start.CompareTo(y.start); });
// Return whether or not there were files found
return (files.Length > 0);
}
public void start()
{
process_start = (ulong)DateTime.Now.ToUniversalTime().Subtract(epoch).TotalMilliseconds;
UInt64 num_samples = 0;
while(run)
{
// Get our samples and add it to the sample
DataStruct[] result = getData(100);
Engine.A.A message = new Engine.A.A();
for (int i = 0; i < result.Length; ++i)
{
Engine.A.Data sample = new Engine.A.Data();
sample.Time = process_start + num_samples * 4;
Int16[] signal_data = Helper.GetData(result[i]);
for(int e = 0; e < signal_data.Length; ++e)
sample.Value[e] = signal_data[e];
message.Signal.Add(sample);
++num_samples;
}
// Send out the websocket
this.output.SendAsync(message.ToByteArray());
// Sleep 100 milliseconds
Thread.Sleep(100);
}
}
public void stop()
{
run = false;
}
private DataStruct[] getData(UInt64 milliseconds)
{
if (file_infos.Length == 0)
return new DataStruct[0];
if (current_file_data == null)
{
current_file_data = ReadObjectFromMMF(file_infos[current_file_index].path) as DataArray;
if(current_file_data.samples.Length == 0)
return new DataStruct[0];
}
UInt64 elements_to_read = (UInt64) milliseconds / 4;
DataStruct[] result = new DataStruct[elements_to_read];
Array.Copy(current_file_data.samples, (int)current_file_offset, result, 0, (int) Math.Min(elements_to_read, file_infos[current_file_index].elements - current_file_offset));
while((UInt64) result.Length != elements_to_read)
{
current_file_index = (current_file_index + 1) % (ulong) file_infos.Length;
current_file_data = ReadObjectFromMMF(file_infos[current_file_index].path) as DataArray;
if (current_file_data.samples.Length == 0)
return new DataStruct[0];
current_file_offset = 0;
Array.Copy(current_file_data.samples, (int)current_file_offset, result, result.Length, (int)Math.Min(elements_to_read, file_infos[current_file_index].elements - current_file_offset));
}
return result;
}
private object ByteArrayToObject(byte[] buffer)
{
BinaryFormatter binaryFormatter = new BinaryFormatter(); // Create new BinaryFormatter
MemoryStream memoryStream = new MemoryStream(buffer); // Convert buffer to memorystream
return binaryFormatter.Deserialize(memoryStream); // Deserialize stream to an object
}
private object ReadObjectFromMMF(string file)
{
// Get a handle to an existing memory mapped file
using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(file, FileMode.Open))
{
// Create a view accessor from which to read the data
using (MemoryMappedViewAccessor mmfReader = mmf.CreateViewAccessor())
{
// Create a data buffer and read entire MMF view into buffer
byte[] buffer = new byte[mmfReader.Capacity];
mmfReader.ReadArray<byte>(0, buffer, 0, buffer.Length);
// Convert the buffer to a .NET object
return ByteArrayToObject(buffer);
}
}
}
Well, for one thing, you're not using that memory-mapped file well at all: you're just reading it sequentially into a buffer, which is both needlessly inefficient and much slower than simply opening the file and reading it normally. The selling point of memory-mapped files is repeated random access and random updates backed by the OS's virtual memory paging.
And you definitely don't need to read the entire file into memory, since your data is so strongly structured. You know exactly how many bytes to read for a record: Marshal.SizeOf<DataStruct>().
Then you need to get rid of all that serialization noise. BinaryFormatter expects its own .NET serialization format, not raw structs written by C++. Again, your data is strongly typed, so just read it. Get rid of those fixed arrays and use regular arrays; you're already telling the marshaller how to read them with the MarshalAs attributes (good). That also gets rid of the helper function that just copies an array for no apparent reason.
Your reading loop is very simple: read the correct number of bytes for one entry, use Marshal.PtrToStructure to convert it to a readable structure, and add it to a list to return at the end. Bonus points if you can use .NET Core and Unsafe.As or MemoryMarshal.Cast.
Edit: and don't use object returns. You know exactly what you're returning, so write it down.
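The reading loop described above could look roughly like the following. This is a minimal sketch, not a drop-in replacement: the names DataRecord, RecordReader, ReadAll and ReadFully are hypothetical, the layout (8-byte timestamp, 21 shorts, 3 shorts of padding, 56 bytes in total) is taken from the question, and it assumes the C++ writer and the C# reader share the same byte order.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
public struct DataRecord
{
    public ulong Milliseconds;

    // Regular arrays; the marshaller sizes them from SizeConst.
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 21)]
    public short[] Data;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public short[] Padding;
}

public static class RecordReader
{
    // Reads whole records until the stream ends; a trailing partial record is dropped.
    public static List<DataRecord> ReadAll(Stream stream)
    {
        int recordSize = Marshal.SizeOf<DataRecord>();
        byte[] buffer = new byte[recordSize];
        var records = new List<DataRecord>();

        // Pin the buffer once so its address stays valid for PtrToStructure.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            while (ReadFully(stream, buffer) == recordSize)
            {
                records.Add(Marshal.PtrToStructure<DataRecord>(handle.AddrOfPinnedObject()));
            }
        }
        finally
        {
            handle.Free();
        }
        return records;
    }

    // Stream.Read may return fewer bytes than requested, so loop until the buffer is full.
    private static int ReadFully(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0) break;
            total += read;
        }
        return total;
    }
}
```

It would be used with an ordinary file stream, for example by passing File.OpenRead(path) to RecordReader.ReadAll inside a using block. For very large files, returning records in batches instead of one list would keep memory use bounded.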

How do I implement matrix multiplication using opencl in C#

Can someone please guide me on how to perform matrix multiplication on the GPU from C# using OpenCL?
I have looked at opencl example here:
https://www.codeproject.com/Articles/1116907/How-to-Use-Your-GPU-in-NET
But I am not sure how to proceed for matrix multiplication.
Yes, as doqtor says, you need to flatten the matrices into 1D arrays. Here is an example that passes more arguments to the kernel:
class Program
{
static string CalculateKernel
{
get
{
return @"
kernel void Calc(global int* m1, global int* m2, int size)
{
for(int i = 0; i < size; i++)
{
printf("" %d / %d\n"",m1[i],m2[i] );
}
}";
}
}
static void Main(string[] args)
{
int[] r1 = new int[]
{1, 2, 3, 4};
int[] r2 = new int[]
{4, 3, 2, 1};
int rowSize = r1.Length;
// pick first platform
ComputePlatform platform = ComputePlatform.Platforms[0];
// create context with all gpu devices
ComputeContext context = new ComputeContext(ComputeDeviceTypes.Gpu,
new ComputeContextPropertyList(platform), null, IntPtr.Zero);
// create a command queue with first gpu found
ComputeCommandQueue queue = new ComputeCommandQueue(context,
context.Devices[0], ComputeCommandQueueFlags.None);
// load opencl source and
// create program with opencl source
ComputeProgram program = new ComputeProgram(context, CalculateKernel);
// compile opencl source
program.Build(null, null, null, IntPtr.Zero);
// load chosen kernel from program
ComputeKernel kernel = program.CreateKernel("Calc");
// allocate a memory buffer with the message (the int array)
ComputeBuffer<int> row1Buffer = new ComputeBuffer<int>(context,
ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.UseHostPointer, r1);
// allocate a memory buffer with the message (the int array)
ComputeBuffer<int> row2Buffer = new ComputeBuffer<int>(context,
ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.UseHostPointer, r2);
kernel.SetMemoryArgument(0, row1Buffer); // set the integer array
kernel.SetMemoryArgument(1, row2Buffer); // set the integer array
kernel.SetValueArgument(2, rowSize); // set the array size
// execute kernel
queue.ExecuteTask(kernel, null);
// wait for completion
queue.Finish();
Console.WriteLine("Finished");
Console.ReadKey();
}
}
Another sample, this time reading the result back from the GPU buffer:
class Program
{
static string CalculateKernel
{
get
{
// you could put your matrix algorithm here and return the result in array m3
return @"
kernel void Calc(global int* m1, global int* m2, int size, global int* m3)
{
for(int i = 0; i < size; i++)
{
int val = m2[i];
printf("" %d / %d\n"",m1[i],m2[i] );
m3[i] = val * 4;
}
}";
}
}
static void Main(string[] args)
{
int[] r1 = new int[]
{8, 2, 3, 4};
int[] r2 = new int[]
{4, 3, 2, 5};
int[] r3 = new int[4];
int rowSize = r1.Length;
// pick first platform
ComputePlatform platform = ComputePlatform.Platforms[0];
// create context with all gpu devices
ComputeContext context = new ComputeContext(ComputeDeviceTypes.Gpu,
new ComputeContextPropertyList(platform), null, IntPtr.Zero);
// create a command queue with first gpu found
ComputeCommandQueue queue = new ComputeCommandQueue(context,
context.Devices[0], ComputeCommandQueueFlags.None);
// load opencl source and
// create program with opencl source
ComputeProgram program = new ComputeProgram(context, CalculateKernel);
// compile opencl source
program.Build(null, null, null, IntPtr.Zero);
// load chosen kernel from program
ComputeKernel kernel = program.CreateKernel("Calc");
// allocate a memory buffer with the message (the int array)
ComputeBuffer<int> row1Buffer = new ComputeBuffer<int>(context,
ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.UseHostPointer, r1);
// allocate a memory buffer with the message (the int array)
ComputeBuffer<int> row2Buffer = new ComputeBuffer<int>(context,
ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.UseHostPointer, r2);
// allocate the output buffer; the kernel writes to it, so it must not be read-only
ComputeBuffer<int> resultBuffer = new ComputeBuffer<int>(context,
ComputeMemoryFlags.WriteOnly | ComputeMemoryFlags.UseHostPointer, new int[4]);
kernel.SetMemoryArgument(0, row1Buffer); // set the integer array
kernel.SetMemoryArgument(1, row2Buffer); // set the integer array
kernel.SetValueArgument(2, rowSize); // set the array size
kernel.SetMemoryArgument(3, resultBuffer); // set the integer array
// execute kernel
queue.ExecuteTask(kernel, null);
// wait for completion
queue.Finish();
GCHandle arrCHandle = GCHandle.Alloc(r3, GCHandleType.Pinned);
queue.Read<int>(resultBuffer, true, 0, r3.Length, arrCHandle.AddrOfPinnedObject(), null);
Console.WriteLine("display result from gpu buffer:");
for (int i = 0; i<r3.Length;i++)
Console.WriteLine(r3[i]);
arrCHandle.Free();
row1Buffer.Dispose();
row2Buffer.Dispose();
kernel.Dispose();
program.Dispose();
queue.Dispose();
context.Dispose();
Console.WriteLine("Finished");
Console.ReadKey();
}
}
You just need to adapt the kernel program to compute the product of two matrices.
Output of the last program:
8 / 4
2 / 3
3 / 2
4 / 5
display result from gpu buffer:
16
12
8
20
Finished
Flattening a 2D array to 1D is really easy. Take this sample:
int[,] twoD = { { 1, 2,3 }, { 3, 4,5 } };
int[] oneD = twoD.Cast<int>().ToArray();
For the reverse conversion (1D -> 2D), see this link.
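To make the flattening concrete, here is a small CPU-only sketch of the index arithmetic a matrix multiplication kernel would use on flattened arrays. It is not taken from the answers above: the class FlatMatrix and its methods are hypothetical names, and the layout is assumed to be row-major, which is what Cast<int>() produces for a C# int[,]. The body of the two outer loops in Multiply is roughly what one work item would compute if the row and column came from get_global_id(0) and get_global_id(1).

```csharp
using System;
using System.Linq;

public static class FlatMatrix
{
    // Row-major flatten: element (row, col) lands at index row * cols + col.
    public static int[] Flatten(int[,] matrix)
    {
        return matrix.Cast<int>().ToArray();
    }

    // Reverse conversion: copy the flat data back into a rows x cols array.
    public static int[,] Unflatten(int[] flat, int rows, int cols)
    {
        if (flat.Length != rows * cols)
            throw new ArgumentException("flat.Length must equal rows * cols");
        var result = new int[rows, cols];
        Buffer.BlockCopy(flat, 0, result, 0, flat.Length * sizeof(int));
        return result;
    }

    // CPU reference: C (n x p) = A (n x m) * B (m x p), all stored flattened.
    public static int[] Multiply(int[] a, int[] b, int n, int m, int p)
    {
        var c = new int[n * p];
        for (int row = 0; row < n; row++)
        {
            for (int col = 0; col < p; col++)
            {
                int sum = 0;
                for (int k = 0; k < m; k++)
                {
                    sum += a[row * m + k] * b[k * p + col];
                }
                c[row * p + col] = sum;
            }
        }
        return c;
    }
}
```

A CPU reference like this is also handy for checking the GPU result element by element once the kernel is written.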
I found a very good reference for using OpenCL with .NET.
The site is well structured and very useful. It also includes a matrix multiplication case study:
OpenCL Tutorial

Copy array to struct array as fast as possible in C#

I am working with Unity 4.5, grabbing images as byte arrays (each byte represents a channel, so each pixel takes 4 bytes: RGBA) and displaying them on a texture by converting the array to a Color32 array with this loop:
img = new Color32[byteArray.Length / nChannels]; //nChannels being 4
for (int i=0; i< img.Length; i++) {
img[i].r = byteArray[i*nChannels];
img[i].g = byteArray[i*nChannels+1];
img[i].b = byteArray[i*nChannels+2];
img[i].a = byteArray[i*nChannels+3];
}
Then, it is applied to the texture using:
tex.SetPixels32(img);
However, this slows down the application significantly (this loop is executed on every single frame), and I would like to know if there is any other way to speed up the copying process. I've found some people (Fast copy of Color32[] array to byte[] array) using the Marshal.Copy functions in order to do the reverse process (Color32 to byte array), but I have not been able to make it work to copy a byte array to a Color32 array. Does anybody know a faster way?
Thank you in advance!
Yes, Marshal.Copy is the way to go. I've answered a similar question here.
Here's a generic method to copy from struct[] to byte[] and vice versa
private static byte[] ToByteArray<T>(T[] source) where T : struct
{
GCHandle handle = GCHandle.Alloc(source, GCHandleType.Pinned);
try
{
IntPtr pointer = handle.AddrOfPinnedObject();
byte[] destination = new byte[source.Length * Marshal.SizeOf(typeof(T))];
Marshal.Copy(pointer, destination, 0, destination.Length);
return destination;
}
finally
{
if (handle.IsAllocated)
handle.Free();
}
}
private static T[] FromByteArray<T>(byte[] source) where T : struct
{
T[] destination = new T[source.Length / Marshal.SizeOf(typeof(T))];
GCHandle handle = GCHandle.Alloc(destination, GCHandleType.Pinned);
try
{
IntPtr pointer = handle.AddrOfPinnedObject();
Marshal.Copy(source, 0, pointer, source.Length);
return destination;
}
finally
{
if (handle.IsAllocated)
handle.Free();
}
}
Use it as:
[StructLayout(LayoutKind.Sequential)]
public struct Demo
{
public double X;
public double Y;
}
private static void Main()
{
Demo[] array = new Demo[2];
array[0] = new Demo { X = 5.6, Y = 6.6 };
array[1] = new Demo { X = 7.6, Y = 8.6 };
byte[] bytes = ToByteArray(array);
Demo[] array2 = FromByteArray<Demo>(bytes);
}
This code requires the /unsafe compiler switch but should be fast. I think you should benchmark these answers...
var bytes = new byte[] { 1, 2, 3, 4 };
var colors = MemCopyUtils.ByteArrayToColor32Array(bytes);
public class MemCopyUtils
{
unsafe delegate void MemCpyDelegate(byte* dst, byte* src, int len);
static MemCpyDelegate MemCpy;
static MemCopyUtils()
{
InitMemCpy();
}
static void InitMemCpy()
{
var mi = typeof(Buffer).GetMethod(
name: "Memcpy",
bindingAttr: BindingFlags.NonPublic | BindingFlags.Static,
binder: null,
types: new Type[] { typeof(byte*), typeof(byte*), typeof(int) },
modifiers: null);
MemCpy = (MemCpyDelegate)Delegate.CreateDelegate(typeof(MemCpyDelegate), mi);
}
public unsafe static Color32[] ByteArrayToColor32Array(byte[] bytes)
{
Color32[] colors = new Color32[bytes.Length / sizeof(Color32)];
fixed (void* tempC = &colors[0])
fixed (byte* pBytes = bytes)
{
byte* pColors = (byte*)tempC;
MemCpy(pColors, pBytes, bytes.Length);
}
return colors;
}
}
Using Parallel.For may give you a significant performance increase.
img = new Color32[byteArray.Length / nChannels]; //nChannels being 4
Parallel.For(0, img.Length, i =>
{
img[i].r = byteArray[i*nChannels];
img[i].g = byteArray[i*nChannels+1];
img[i].b = byteArray[i*nChannels+2];
img[i].a = byteArray[i*nChannels+3];
});
Example on MSDN
I haven't profiled it, but using fixed to ensure your memory doesn't get moved around and to remove bounds checks on array accesses might provide some benefit:
img = new Color32[byteArray.Length / nChannels]; //nChannels being 4
fixed (byte* ba = byteArray)
{
fixed (Color32* c = img)
{
byte* byteArrayPtr = ba;
Color32* colorPtr = c;
for (int i = 0; i < img.Length; i++)
{
(*colorPtr).r = *byteArrayPtr++;
(*colorPtr).g = *byteArrayPtr++;
(*colorPtr).b = *byteArrayPtr++;
(*colorPtr).a = *byteArrayPtr++;
colorPtr++;
}
}
}
It might not provide much more benefit on 64-bit systems; I believe the bounds checking is more highly optimized there. Also, this is an unsafe operation, so take care.
public Color32[] GetColorArray(byte[] myByte)
{
if (myByte.Length % nChannels != 0)
throw new ArgumentException("Length must be a multiple of nChannels");
var colors = new Color32[myByte.Length / nChannels];
for (var i = 0; i < myByte.Length; i += nChannels)
{
colors[i / nChannels] = new Color32(
(byte)(myByte[i] & 0xF8),
(byte)(((myByte[i] & 7) << 5) | ((myByte[i + 1] & 0xE0) >> 3)),
(byte)((myByte[i + 1] & 0x1F) << 3),
(byte)1);
}
return colors;
}
Worked about 30-50 times faster than just i++. The "extras" is just styling. This code is doing, in one "line", in the for loop, what you're declaring in 4 lines plus it is much quicker. Cheers :)
Referenced + Referenced code: Here

Cloo OpenCL c# Problem

I am trying to get a simple Cloo program to run but it is not working, can anyone tell me why?
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using Cloo;
using System.Runtime.InteropServices;
namespace HelloWorld
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
ComputeContextPropertyList cpl = new ComputeContextPropertyList(ComputePlatform.Platforms[0]);
ComputeContext context = new ComputeContext(ComputeDeviceTypes.Gpu, cpl, null, IntPtr.Zero);
string kernelSource = @"
kernel void VectorAdd(
global read_only float* a,
global read_only float* b,
global write_only float* c )
{
int index = get_global_id(0);
c[index] = a[index] + b[index];
}
";
long count = 20;
float[] arrA = new float[count];
float[] arrB = new float[count];
float[] arrC = new float[count];
Random rand = new Random();
for (int i = 0; i < count; i++)
{
arrA[i] = (float)(rand.NextDouble() * 100);
arrB[i] = (float)(rand.NextDouble() * 100);
}
ComputeBuffer<float> a = new ComputeBuffer<float>(context, ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer, arrA);
ComputeBuffer<float> b = new ComputeBuffer<float>(context, ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer, arrB);
ComputeBuffer<float> c = new ComputeBuffer<float>(context, ComputeMemoryFlags.WriteOnly, arrC.Length);
ComputeProgram program = new ComputeProgram(context, new string[] { kernelSource });
program.Build(null, null, null, IntPtr.Zero);
ComputeKernel kernel = program.CreateKernel("VectorAdd");
kernel.SetMemoryArgument(0, a);
kernel.SetMemoryArgument(1, b);
kernel.SetMemoryArgument(2, c);
ComputeCommandQueue commands = new ComputeCommandQueue(context, context.Devices[0], ComputeCommandQueueFlags.None);
ComputeEventList events = new ComputeEventList();
commands.Execute(kernel, null, new long[] { count }, null, events);
arrC = new float[count];
GCHandle arrCHandle = GCHandle.Alloc(arrC, GCHandleType.Pinned);
commands.Read(c, false, 0, count, arrCHandle.AddrOfPinnedObject(), events);
commands.Finish();
arrCHandle.Free();
for (int i = 0; i < count; i++)
richTextBox1.Text += "{" + arrA[i] + "} + {" + arrB[i] + "} = {" + arrC[i] + "} \n";
}
}
}
This is the error the program gives me:
An unhandled exception of type 'Cloo.InvalidBinaryComputeException' occurred in Cloo.dll
Additional information: OpenCL error code detected: InvalidBinary.
This looks identical to the VectorAddTest that is distributed with Cloo.
Are you using ATI Stream? If yes, this may be a problem related to this: http://sourceforge.net/tracker/?func=detail&aid=2946105&group_id=290412&atid=1229014
Clootils works around this problem by allocating a console. Check Clootils/Program.cs for implementation details.
Change the kernel name - I think the name VectorAdd conflicts with something.
