Speed up nested loops and bitwise operations with Alea GPU - c#

I'm trying to use Alea to speed up a program I'm working on but I need some help.
What I need to do is a lot of bitcount and bitwise operations on values stored in two arrays.
For each element of my first array I have to do a bitwise & with each element of my second array, then count the bits set to 1 in the result.
If that count is greater than or equal to a certain value, I need to exit the inner for loop and go to the next element of my first array.
The first array is usually a big one, with millions of elements; the second one usually has fewer than 200,000 elements.
To run all these operations in parallel, I wrote this code:
[GpuManaged]
private long[] Check(long[] arr1, long[] arr2, int limit)
{
Gpu.FreeAllImplicitMemory(true);
var gpu = Gpu.Default;
long[] result = new long[arr1.Length];
gpu.For(0, arr1.Length, i =>
{
bool found = false;
long b = arr1[i];
for (int i2 = 0; i2 < arr2.Length; i2++)
{
if (LibDevice.__nv_popcll(b & arr2[i2]) >= limit)
{
found = true;
break;
}
}
if (!found)
{
result[i] = b;
}
});
return result;
}
This works as expected, but it is only a little faster than my parallel version running on a quad-core CPU.
I'm certainly missing something here; this is my very first attempt at writing GPU code.
By the way, my GPU is an NVIDIA GeForce GT 740M.
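For reference, here is a sequential sketch of the filter described above, written in C++ rather than C#. It assumes 64-bit unsigned values, uses std::bitset<64>::count() in place of LibDevice.__nv_popcll, and the name filter_sequential is hypothetical. It is meant as a correctness baseline to compare GPU output against, not as the asker's code.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference: result[i] keeps arr1[i] only if no element of
// arr2 shares at least `limit` set bits with it; otherwise result[i] stays 0.
std::vector<std::uint64_t> filter_sequential(const std::vector<std::uint64_t>& arr1,
                                             const std::vector<std::uint64_t>& arr2,
                                             int limit) {
    assert(limit >= 1 && limit <= 64); // assumption: a meaningful bit-count threshold
    std::vector<std::uint64_t> result(arr1.size(), 0);
    for (std::size_t i = 0; i < arr1.size(); ++i) {
        bool found = false;
        for (std::uint64_t b : arr2) {
            // Count the bits set in the AND, like __nv_popcll(a & b).
            if (static_cast<int>(std::bitset<64>(arr1[i] & b).count()) >= limit) {
                found = true;
                break; // early exit: move on to the next arr1 element
            }
        }
        if (!found) result[i] = arr1[i];
    }
    return result;
}
```

The early break is what makes the inner loop's cost data-dependent, which is also why GPU threads in the same warp can diverge on this workload.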
EDIT
The following code is 2x faster than the previous one, at least on my PC. Many thanks to Michael Randall for pointing me in the right direction.
private static int[] CheckWithKernel(Gpu gpu, int[] arr1, int[] arr2, int limit)
{
var lp = new LaunchParam(16, 256);
var result = new int[arr1.Length];
try
{
using (var dArr1 = gpu.AllocateDevice(arr1))
using (var dArr2 = gpu.AllocateDevice(arr2))
using (var dResult = gpu.AllocateDevice<int>(arr1.Length))
{
gpu.Launch(Kernel, lp, arr1.Length, arr2.Length, dArr1.Ptr, dArr2.Ptr, dResult.Ptr, limit);
Gpu.Copy(dResult, result);
return result;
}
}
finally
{
Gpu.Free(arr1);
Gpu.Free(arr2);
Gpu.Free(result);
}
}
private static void Kernel(int a1, int a2, deviceptr<int> arr1, deviceptr<int> arr2, deviceptr<int> arr3, int limit)
{
var iinit = blockIdx.x * blockDim.x + threadIdx.x;
var istep = gridDim.x * blockDim.x;
for (var i = iinit; i < a1; i += istep)
{
bool found = false;
int b = arr1[i];
for (var j = 0; j < a2; j++)
{
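// b is an int here: zero-extend the AND result before the 64-bit popcount,
// otherwise a negative value is sign-extended and gains 32 extra set bits.
if (LibDevice.__nv_popcll((long)(uint)(b & arr2[j])) >= limit)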
{
found = true;
break;
}
}
if (!found)
{
arr3[i] = b;
}
}
}
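The kernel above uses a grid-stride loop: each thread starts at blockIdx.x * blockDim.x + threadIdx.x and advances by gridDim.x * blockDim.x, so the fixed launch of 16 blocks of 256 threads covers an array of any length. The following C++ sketch only simulates that index assignment on the CPU (grid_stride_visits is a hypothetical name, not Alea API) to show that every element is visited exactly once:

```cpp
#include <cassert>
#include <vector>

// Simulates the grid-stride indexing used by Kernel: returns how many times
// each index in [0, n) would be visited by grid_dim * block_dim threads.
std::vector<int> grid_stride_visits(int n, int grid_dim, int block_dim) {
    assert(n >= 0 && grid_dim > 0 && block_dim > 0); // a zero step would never advance
    std::vector<int> visits(n, 0);
    const int step = grid_dim * block_dim;               // istep
    for (int block = 0; block < grid_dim; ++block) {
        for (int thread = 0; thread < block_dim; ++thread) {
            const int start = block * block_dim + thread; // iinit
            for (int i = start; i < n; i += step) {
                ++visits[i];
            }
        }
    }
    return visits;
}
```

Because the step equals the total thread count, the threads partition the index space; with more threads than elements, the surplus threads simply skip the loop.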

Update
It seems pinning won't work with GCHandle.Alloc().
However, the point of this answer is that you will get a much greater performance gain from working directly with device memory.
http://www.aleagpu.com/release/3_0_3/doc/advanced_features_csharp.html
Directly Working with Device Memory
Device memory provides even more flexibility as it also allows all
kind of pointer arithmetics. Device memory is allocated with
Memory<T> Gpu.AllocateDevice<T>(int length)
Memory<T> Gpu.AllocateDevice<T>(T[] array)
The first overload creates a device memory object for the specified
type T and length on the selected GPU. The second one allocates
storage on the GPU and copies the .NET array into it. Both return a
Memory<T> object, which implements IDisposable and can therefore
support the using syntax which ensures proper disposal once the
Memory<T> object goes out of scope. A Memory<T> object has properties
to determine the length, the GPU or the device on which it lives. The
Memory<T>.Ptr property returns a deviceptr<T>, which can be used in
GPU code to access the actual data or to perform pointer arithmetics.
The following example illustrates a simple use case of device
pointers. The kernel only operates on part of the data, defined by an
offset.
using (var dArg1 = gpu.AllocateDevice(arg1))
using (var dArg2 = gpu.AllocateDevice(arg2))
using (var dOutput = gpu.AllocateDevice<int>(Length/2))
{
// pointer arithmetics to access subset of data
gpu.Launch(Kernel, lp, dOutput.Length, dOutput.Ptr, dArg1.Ptr + Length/2, dArg2.Ptr + Length / 2);
var result = dOutput.ToArray();
var expected = arg1.Skip(Length/2).Zip(arg2.Skip(Length/2), (x, y) => x + y);
Assert.That(result, Is.EqualTo(expected));
}
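Outside Alea, the offset trick is plain pointer arithmetic. A minimal C++ sketch, assuming a simple element-wise addition kernel like the one in the documentation example (add_arrays and add_second_halves are hypothetical names): the "kernel" only ever sees pointers advanced by half the length, just like dArg1.Ptr + Length/2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the kernel: out[i] = a[i] + b[i] for i in [0, n).
void add_arrays(std::size_t n, int* out, const int* a, const int* b) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Adds only the second halves of arg1 and arg2, mirroring
// dArg1.Ptr + Length/2 and dArg2.Ptr + Length/2 in the example above.
std::vector<int> add_second_halves(const std::vector<int>& arg1, const std::vector<int>& arg2) {
    assert(arg1.size() == arg2.size());
    const std::size_t half = arg1.size() / 2;
    std::vector<int> output(half);
    add_arrays(output.size(), output.data(), arg1.data() + half, arg2.data() + half);
    return output;
}
```

As in the C# example, the output length (Length/2) bounds the work, so for an odd length the last input element is not processed.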
Original Answer
Setting aside the logic itself, and how relevant this is to GPU code, you could complement your Parallel routine and possibly speed things up by pinning your arrays in memory with GCHandle.Alloc() and the GCHandleType.Pinned flag, and then using direct pointer access (if you can run unsafe code).
Notes
You will take a hit from pinning the memory; however, for large arrays you can gain a lot of performance from direct access.
You will have to enable "Allow unsafe code" in the project's Build properties.
This is obviously untested and just an example.
You could use fixed, however the Parallel lambda makes it fiddlier.
Example
private unsafe long[] Check(long[] arr1, long[] arr2, int limit)
{
Gpu.FreeAllImplicitMemory(true);
var gpu = Gpu.Default;
var result = new long[arr1.Length];
// Create some pinned memory
var resultHandle = GCHandle.Alloc(result, GCHandleType.Pinned);
var arr2Handle = GCHandle.Alloc(arr2, GCHandleType.Pinned);
var arr1Handle = GCHandle.Alloc(arr1, GCHandleType.Pinned);
// Get the addresses (the arrays hold longs, so the pointers must be long*)
var resultPtr = (long*)resultHandle.AddrOfPinnedObject().ToPointer();
var arr2Ptr = (long*)arr2Handle.AddrOfPinnedObject().ToPointer();
var arr1Ptr = (long*)arr1Handle.AddrOfPinnedObject().ToPointer();
// I hate nasty lambda statements. I always find local methods easier to read.
void Workload(int i)
{
var found = false;
var b = *(arr1Ptr + i);
for (var j = 0; j < arr2.Length; j++)
{
if (LibDevice.__nv_popcll(b & *(arr2Ptr + j)) >= limit)
{
found = true;
break;
}
}
if (!found)
{
*(resultPtr + i) = b;
}
}
try
{
gpu.For(0, arr1.Length, i => Workload(i));
}
finally
{
// Make sure we free resources
arr1Handle.Free();
arr2Handle.Free();
resultHandle.Free();
}
return result;
}
Additional Resources
GCHandle.Alloc Method (Object)
A new GCHandle that protects the object from garbage collection. This
GCHandle must be released with Free when it is no longer needed.
GCHandleType Enumeration
Pinned : This handle type is similar to Normal, but allows the address of the pinned object to be taken. This prevents the garbage
collector from moving the object and hence undermines the efficiency
of the garbage collector. Use the Free method to free the allocated
handle as soon as possible.
Unsafe Code and Pointers (C# Programming Guide)
In the common language runtime (CLR), unsafe code is referred to as
unverifiable code. Unsafe code in C# is not necessarily dangerous; it
is just code whose safety cannot be verified by the CLR. The CLR will
therefore only execute unsafe code if it is in a fully trusted
assembly. If you use unsafe code, it is your responsibility to ensure
that your code does not introduce security risks or pointer errors.
A note: there has since been an update. This page:
http://www.aleagpu.com/release/3_0_3/doc/advanced_features_csharp.html
is now this one:
http://www.aleagpu.com/release/3_0_4/doc/advanced_features_csharp.html
Some of the samples and information have changed or moved in release 3.0.4.

Related

Can Interlocked CompareExchange be used correctly in this multithreaded round-robin implementation?

I need to round-robin some calls between N different connections because of some rate limits in a multithreaded context. I've decided to implement this functionality using a list and a "counter," which is supposed to "jump by one" between instances on each call.
I'll illustrate this concept with a minimal example (using a class called A to stand in for the connections)
class A
{
public A()
{
var newIndex = Interlocked.Increment(ref index);
ID = newIndex.ToString();
}
private static int index;
public string ID;
}
static int crt = 0;
static List<A> Items = Enumerable.Range(1, 15).Select(i => new A()).ToList();
static int itemsCount = Items.Count;
static A GetInstance()
{
var newIndex = Interlocked.Increment(ref crt);
var instance = Items[newIndex % itemsCount];
//Console.WriteLine($"{DateTime.Now.Ticks}, {Guid.NewGuid()}, Got instance: {instance.ID}");
return instance;
}
static void Test()
{
var sw = Stopwatch.StartNew();
var tasks = Enumerable.Range(1, 1000000).Select(i => Task.Run(GetInstance)).ToArray();
Task.WaitAll(tasks);
}
This works as expected in that it ensures that calls are round-robin-ed between the connections. I will probably stick to this implementation in the "real" code (with a long instead of an int for the counter)
However, even if it is unlikely to reach int.MaxValue in my use case, I wondered if there is a way to "safely overflow" the counter.
I know that "%" in C# is "Remainder" rather than "Modulus," which would mean that some ?: gymnastics would be required to always return positives, which I want to avoid.
So what I wanted to come up with is something like this instead:
static A GetInstance()
{
var newIndex = Interlocked.Increment(ref crt);
Interlocked.CompareExchange(ref crt, 0, itemsCount); //?? the return value is the original value, how to know if it succeeded
var instance = Items[newIndex];
//Console.WriteLine($"{DateTime.Now.Ticks}, {Guid.NewGuid()}, Got instance: {instance.ID}");
return instance;
}
What I am expecting is that Interlocked.CompareExchange(ref crt, 0, itemsCount) would be "won" by only one thread, setting the counter back to 0 once it reaches the number of connections available. However, I don't know how to use this in this context.
Can CompareExchange or another mechanism in Interlocked be used here?
You could probably do something like this:
static int crt = -1;
static readonly IReadOnlyList<A> Items = Enumerable.Range(1, 15).Select(i => new A()).ToList();
static readonly int itemsCount = Items.Count;
static readonly int maxItemCount = itemsCount * 100;
static A GetInstance()
{
int newIndex;
while (true)
{
newIndex = Interlocked.Increment(ref crt);
if (newIndex >= maxItemCount)
{
while (newIndex >= maxItemCount && Interlocked.CompareExchange(ref crt, -1, newIndex) != newIndex)
{
// There is an implicit memory barrier caused by the Interlocked.CompareExchange around the
// next line
// See for example https://afana.me/archive/2015/07/10/memory-barriers-in-dot-net.aspx/
// A full memory barrier is the strongest and interesting one. At least all of the following generate a full memory barrier implicitly:
// Interlocked class methods
newIndex = crt;
}
continue;
}
break;
}
var instance = Items[newIndex % itemsCount];
//Console.WriteLine($"{DateTime.Now.Ticks}, {Guid.NewGuid()}, Got instance: {instance.ID}");
return instance;
}
But to tell the truth, I'm not completely sure it is correct (it should be), explaining it is hard, and if anyone touches it in any way it will break.
The basic idea is to have a "low" ceiling for crt (we don't want to overflow, that would break everything, so we want to stay very, very far from int.MaxValue; alternatively you could use uint).
The maximum possible value is:
maxItemCount = (int.MaxValue - MaximumNumberOfThreads) / itemsCount * itemsCount;
The / itemsCount * itemsCount rounds the ceiling down to a multiple of itemsCount, so that every round is complete and the items stay equally distributed. In the example I use a much lower number (itemsCount * 100), because lowering the ceiling only makes the reset happen more often, and the reset isn't slow enough to truly matter. (It depends on what you are doing on the threads: if they are very small, CPU-only work items, the reset is relatively slow; otherwise it isn't.)
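The ceiling formula is plain integer arithmetic. A small C++ check (max_item_count is a hypothetical name, and the thread bound passed to it is an arbitrary assumption) shows that it yields a multiple of itemsCount with at least max_threads of headroom below int.MaxValue:

```cpp
#include <cassert>
#include <climits>

// maxItemCount = (int.MaxValue - MaximumNumberOfThreads) / itemsCount * itemsCount
// Integer division rounds down, so the result is the largest multiple of
// items_count that still leaves room for max_threads extra increments.
int max_item_count(int items_count, int max_threads) {
    assert(items_count > 0 && max_threads >= 0);
    return (INT_MAX - max_threads) / items_count * items_count;
}
```

For itemsCount = 15 and 1,000 threads this gives 2,147,482,635, which leaves 1,012 increments of headroom below int.MaxValue.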
Then, when we exceed this ceiling, we try to move the counter back to -1 (our starting point). At the same time, other threads could Interlocked.Increment it and race with this reset. Thanks to Interlocked.CompareExchange, only one thread can successfully reset the counter, while the other racing threads immediately see the reset and stop their attempts.
The if block can also be rewritten as:
if (newIndex >= maxItemCount)
{
int newIndex2;
while (newIndex >= maxItemCount && (newIndex2 = Interlocked.CompareExchange(ref crt, -1, newIndex)) != newIndex)
{
// If the Interlocked.CompareExchange is successful, the while will end and so we won't be here;
// if it fails, newIndex2 is the current value of crt
newIndex = newIndex2;
}
continue;
}
No, the Interlocked class offers no mechanism that would allow you to restore an Int32 value back to zero in case it overflows. The reason is that two threads could invoke the var newIndex = Interlocked.Increment(ref crt); statement concurrently, in which case both would overflow the counter, and then neither would succeed in updating the value back to zero. This functionality is just beyond the capabilities of the Interlocked class. To make such complex operations atomic you'll need to use some other synchronization mechanism, like a lock.
Update: xanatos's answer proves that the above statement is wrong. It is also proven wrong by the answers to this 9-year-old question. Below are two implementations of an InterlockedRoundRobinIncrement method. The first is a simplified version of this answer by Alex Sorokoletov:
public static int InterlockedRoundRobinIncrement(ref int location, int modulo)
{
// Arguments validation omitted (the modulo should be a positive number)
uint current = unchecked((uint)Interlocked.Increment(ref location));
return (int)(current % modulo);
}
This implementation is very efficient, but it has the drawback that the backing int value is not directly usable, since it cycles through the whole range of the Int32 type (including negative values). The usable information comes from the return value of the method itself, which is guaranteed to be in the range [0..modulo - 1]. If you want to read the current value without incrementing it, you need another similar method that does the same int -> uint -> int conversion:
public static int InterlockedRoundRobinRead(ref int location, int modulo)
{
uint current = unchecked((uint)Volatile.Read(ref location));
return (int)(current % modulo);
}
It also has the drawback that once every 4,294,967,296 increments, and unless the modulo is a power of 2, it returns a 0 value prematurely, before having reached the modulo - 1 value. In other words the rollover logic is technically flawed. This may or may not be a big issue, depending on the application.
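The premature zero is easy to reproduce by starting the counter just below the wrap point. This is a C++ analogue, not the C# method itself: std::atomic<std::uint32_t> stands in for the int reinterpreted as uint, and round_robin_next is a hypothetical name. Unsigned atomic arithmetic in C++ wraps modulo 2^32, like the unchecked uint cast.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// C++ analogue of the uint-cast approach: the unsigned counter wraps
// modulo 2^32 instead of overflowing, and the result is reduced by modulo.
std::uint32_t round_robin_next(std::atomic<std::uint32_t>& counter, std::uint32_t modulo) {
    assert(modulo > 0);
    // fetch_add returns the previous value; + 1 mirrors Interlocked.Increment,
    // which returns the incremented value (wrapping to 0 after UINT32_MAX).
    const std::uint32_t current = static_cast<std::uint32_t>(counter.fetch_add(1u) + 1u);
    return current % modulo;
}
```

Starting from 0 with modulo 3 it yields 1, 2, 0, and so on. After storing 4294967294, the next two calls both return 0: 4294967295 is divisible by 3, and the wrap then restarts the sequence at 0.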
The second implementation is a modified version of xanatos's algorithm:
public static int InterlockedRoundRobinIncrement(ref int location, int modulo)
{
// Arguments validation omitted (the modulo should be a positive number)
while (true)
{
int current = Interlocked.Increment(ref location);
if (current >= 0 && current < modulo) return current;
// Overflow. Try to zero the number.
while (true)
{
int current2 = Interlocked.CompareExchange(ref location, 0, current);
if (current2 == current) return 0; // Success
current = current2;
if (current >= 0 && current < modulo)
{
break; // Another thread zeroed the number. Retry increment.
}
}
}
}
This is slightly less efficient (especially for small modulo values), because once in a while an Interlocked.Increment operation results in an out-of-range value, which is rejected, and the operation is repeated. It does have the advantage, though, that the backing int value remains in the [0..modulo - 1] range, except for some very brief time spans during some of this method's invocations.
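A C++ analogue of the second implementation, assuming std::atomic<int> in place of the ref int (the function name is hypothetical). One difference worth noting: compare_exchange_strong reports success as a bool and, on failure, writes the observed value back into expected, which replaces the current2 comparison.

```cpp
#include <atomic>
#include <cassert>

// C++ analogue of the CompareExchange-based version: the counter stays in
// [0, modulo) except briefly, and the thread that wins the reset returns 0.
int round_robin_increment(std::atomic<int>& location, int modulo) {
    assert(modulo > 0);
    while (true) {
        int current = location.fetch_add(1) + 1; // like Interlocked.Increment
        if (current >= 0 && current < modulo) return current;
        // Out of range: try to reset the counter to 0.
        while (true) {
            int expected = current;
            if (location.compare_exchange_strong(expected, 0)) return 0; // we won the reset
            current = expected; // on failure, expected holds the value actually observed
            if (current >= 0 && current < modulo) break; // another thread reset it: retry
        }
    }
}
```

Called repeatedly on a counter that starts at 0 with modulo 3, it returns 1, 2, 0, 1, and so on, and the counter is back at 0 right after the reset.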
An alternative to using CompareExchange would be to simply let the values overflow.
I have tested this and could not prove it wrong (so far), but of course that does not mean that it isn't.
//this incurs some cost, but "should" ensure that the int range
// is mapped to the uint range (int.MinValue is mapped to 0)
static ulong toPositive(int i) => (ulong)((long)i - int.MinValue);
static A GetInstance()
{
//this seems to overflow safely without unchecked
var newCounter = Interlocked.Increment(ref crt);
//convert the counter to a list index, that is, map the signed counter
//to the unsigned range and take it modulo itemsCount
var newIndex = (int)(toPositive(newCounter) % (ulong)itemsCount);
var instance = Items[newIndex];
//Console.WriteLine($"{DateTime.Now.Ticks}, Got instance: {instance.ID}");
return instance;
}
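Assuming the intent stated in the toPositive comment (int.MinValue maps to 0), the mapping is just i - int.MinValue evaluated in a wider type. A C++ sketch with hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// Shifts a signed 32-bit counter into [0, 2^32) so that INT32_MIN maps to 0
// and INT32_MAX maps to 2^32 - 1; the wrap from max to min stays continuous.
std::uint64_t to_positive(std::int32_t i) {
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(i) - INT32_MIN);
}

// Reduces the shifted counter to a list index in [0, items_count).
int index_for(std::int32_t counter, int items_count) {
    assert(items_count > 0);
    return static_cast<int>(to_positive(counter) % static_cast<std::uint64_t>(items_count));
}
```

The same premature-zero caveat as the uint approach applies once per 2^32 increments, unless itemsCount is a power of 2.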
PS: As for the XY-problem part of my question: at a friend's suggestion, I am currently investigating using a LinkedList or something similar (with locks) to achieve the same purpose.

Unsafe.As from byte array to ulong array

I'm currently looking at porting my MetroHash implementation to use C# 7 features, as several parts might profit from ref locals to improve performance.
The hash does the calculations on a ulong[4] array, but the result is a 16 byte array. Currently I'm copying the ulong array to the result byte buffer, but this takes a bit of time.
So I'm wondering if System.Runtime.CompilerServices.Unsafe is safe to use here:
var result = new byte[16];
ulong[] state = Unsafe.As<byte[], ulong[]>(ref result);
ref var firstState = ref state[0];
ref var secondState = ref state[1];
ulong thirdState = 0;
ulong fourthState = 0;
The above code snippet means that I'm using the result buffer also for parts of my state calculations and not only for the final output.
My unit tests pass, and according to BenchmarkDotNet skipping the block copy would result in a 20% performance increase, which is enough for me to want to find out whether it is correct to use it.
In current .NET terms, this would be a good fit for Span<T>:
Span<byte> result = new byte[16];
Span<ulong> state = MemoryMarshal.Cast<byte, ulong>(result);
This enforces lengths etc, while having good JIT behaviour and not requiring unsafe. You can even stackalloc the original buffer (from C# 7.2 onwards):
Span<byte> result = stackalloc byte[16];
Span<ulong> state = MemoryMarshal.Cast<byte, ulong>(result);
Note that Span<T> gets the length change right (a 16-byte Span<byte> becomes a Span<ulong> with a Length of 2); it is also trivial to cast into a Span<Vector<T>> if you want to use SIMD for hardware acceleration.
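In C++, reading a byte array through a std::uint64_t pointer violates strict aliasing, so the usual portable counterpart of this reinterpretation is std::memcpy. This is a sketch under that assumption, with hypothetical names and the 16-byte size taken from the question. Note that it copies (compilers typically lower a 16-byte memcpy to a couple of register moves) rather than creating a zero-copy view like Span<T>.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// 16 bytes of hash output viewed as 64-bit words: 16 / 8 == 2 elements,
// the same length change MemoryMarshal.Cast<byte, ulong> performs.
constexpr std::size_t kStateBytes = 16;
constexpr std::size_t kStateWords = kStateBytes / sizeof(std::uint64_t);

using ByteBuffer = std::array<unsigned char, kStateBytes>;
using StateWords = std::array<std::uint64_t, kStateWords>;

// Copy the bytes into word-typed storage without an aliasing violation.
StateWords load_state(const ByteBuffer& bytes) {
    StateWords state{};
    assert(sizeof(state) == bytes.size()); // both views cover the same 16 bytes
    std::memcpy(state.data(), bytes.data(), bytes.size());
    return state;
}

// Copy the words back into the byte buffer that is returned as the hash.
ByteBuffer store_state(const StateWords& state) {
    ByteBuffer bytes{};
    assert(sizeof(state) == bytes.size());
    std::memcpy(bytes.data(), state.data(), bytes.size());
    return bytes;
}
```

The byte order inside each word follows the platform's endianness, just as it does with the Span cast in C#.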
What you're doing seems fine, just be careful because there's nothing to stop you from doing this:
byte[] x = new byte[16];
long[] y = Unsafe.As<byte[], long[]>(ref x);
Console.WriteLine(y.Length); // still 16
for (int i = 0; i < y.Length; i++)
Console.WriteLine(y[i]); // reads random memory from your program, could cause crash
C# supports "fixed buffers", here's the kind of thing we can do:
public unsafe struct Bytes
{
public fixed byte bytes[16];
}
then
public unsafe static Bytes Convert (long[] longs)
{
fixed (long * longs_ptr = longs)
return *((Bytes*)(longs_ptr));
}
Try it. (1D arrays of primitive types in C# are always stored as a contiguous block of memory which is why taking the address of the (managed) arrays is fine).
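You could also return the pointer for more speed, but be careful: the array is only pinned inside the fixed block, so once Convert returns, the garbage collector is free to move longs and the pointer may dangle. Only do this if the array is kept pinned by other means (for example with GCHandle.Alloc and GCHandleType.Pinned) or lives in unmanaged memory: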
public unsafe static Bytes * Convert (long[] longs)
{
fixed (long * longs_ptr = longs)
return ((Bytes*)(longs_ptr));
}
and manipulate/access the bytes as you want.
var s = Convert(longs);
var b = s->bytes[0];

Marshalling C array in C# - Simple HelloWorld

Building on my marshalling hello-world question, I'm running into issues marshalling an array allocated in C to C#. I've spent hours researching where I might be going wrong, but everything I've tried ends up with errors such as AccessViolationException.
The function that handles creating an array in C is below.
__declspec(dllexport) int __cdecl import_csv(char *path, struct human ***persons, int *numPersons)
{
int res;
FILE *csv;
char line[1024];
struct human **humans;
csv = fopen(path, "r");
if (csv == NULL) {
return errno;
}
*numPersons = 0; // init to sane value
/*
* All I'm trying to do for now is get more than one working.
* Starting with 2 seems reasonable. My test CSV file only has 2 lines.
*/
humans = calloc(2, sizeof(struct human *));
if (humans == NULL)
return ENOMEM;
while (fgets(line, 1024, csv)) {
char *tmp = strdup(line);
struct human *person;
humans[*numPersons] = calloc(1, sizeof(*person));
person = humans[*numPersons]; // easier to work with
if (person == NULL) {
return ENOMEM;
}
person->contact = calloc(1, sizeof(*(person->contact)));
if (person->contact == NULL) {
return ENOMEM;
}
res = parse_human(line, person);
if (res != 0) {
return res;
}
(*numPersons)++;
}
(*persons) = humans;
fclose(csv);
return 0;
}
The C# code:
IntPtr humansPtr = IntPtr.Zero;
int numHumans = 0;
HelloLibrary.import_csv(args[0], ref humansPtr, ref numHumans);
HelloLibrary.human[] humans = new HelloLibrary.human[numHumans];
IntPtr[] ptrs = new IntPtr[numHumans];
IntPtr aIndex = (IntPtr)Marshal.PtrToStructure(humansPtr, typeof(IntPtr));
// Populate the array of IntPtr
for (int i = 0; i < numHumans; i++)
{
ptrs[i] = new IntPtr(aIndex.ToInt64() +
(Marshal.SizeOf(typeof(IntPtr)) * i));
}
// Marshal the array of human structs
for (int i = 0; i < numHumans; i++)
{
humans[i] = (HelloLibrary.human)Marshal.PtrToStructure(
ptrs[i],
typeof(HelloLibrary.human));
}
// Use the marshalled data
foreach (HelloLibrary.human human in humans)
{
Console.WriteLine("first:'{0}'", human.first);
Console.WriteLine("last:'{0}'", human.last);
HelloLibrary.contact_info contact = (HelloLibrary.contact_info)Marshal.
PtrToStructure(human.contact, typeof(HelloLibrary.contact_info));
Console.WriteLine("cell:'{0}'", contact.cell);
Console.WriteLine("home:'{0}'", contact.home);
}
The first human struct gets marshalled fine, but I get access violation exceptions after the first one. I feel like I'm missing something about marshalling structs that contain struct pointers. I hope it's a simple mistake I'm overlooking. Do you see anything wrong with this code?
See this GitHub gist for full source.
// Populate the array of IntPtr
This is where you went wrong. You are getting back a pointer to an array of pointers. You got the first one right, by actually reading the pointer value from the array. But then your for() loop went wrong: it just adds 4 (or 8) to the first pointer value instead of reading the following pointers from the array. Fix:
IntPtr[] ptrs = new IntPtr[numHumans];
// Populate the array of IntPtr
for (int i = 0; i < numHumans; i++)
{
ptrs[i] = (IntPtr)Marshal.PtrToStructure(humansPtr, typeof(IntPtr));
humansPtr = new IntPtr(humansPtr.ToInt64() + IntPtr.Size);
}
Or much more cleanly since marshaling arrays of simple types is already supported:
IntPtr[] ptrs = new IntPtr[numHumans];
Marshal.Copy(humansPtr, ptrs, 0, numHumans);
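The same mistake is easy to see in C++ terms. In this sketch (the simplified human struct and both function names are hypothetical), the correct version reads each slot of the pointer array, while the buggy version walks through the first struct in pointer-sized steps, which is what the original loop did starting from aIndex:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for the C struct in the question.
struct human {
    const char* first;
    const char* last;
};

// Correct: read each element pointer out of the array of pointers
// (what Marshal.Copy(humansPtr, ptrs, 0, numHumans) does).
std::vector<const human*> read_pointer_array(const human* const* persons, int num_persons) {
    assert(num_persons >= 0);
    std::vector<const human*> ptrs(static_cast<std::size_t>(num_persons));
    for (int i = 0; i < num_persons; ++i) {
        ptrs[i] = persons[i];
    }
    return ptrs;
}

// The bug: start at the FIRST struct's address and step by pointer size,
// which lands inside that struct instead of on the next one.
const void* buggy_address(const human* const* persons, int i) {
    return reinterpret_cast<const char*>(persons[0]) + static_cast<std::size_t>(i) * sizeof(void*);
}
```

Only the first address is correct in the buggy version (i = 0), which matches the symptom of the first struct marshalling fine.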
I found the bug by using the Debug + Windows + Memory + Memory 1. Put humansPtr in the Address field, switched to 4-byte integer view and observed that the C code was doing it correctly. Then quickly found out that ptrs[] did not contain the values I saw in the Memory window.
I'm not sure why you are writing code like this, other than as a mental exercise. It is not the correct way to go about it; for example, you are completely ignoring the need to release the memory again, which is very nontrivial. Parsing CSV files in C# is quite simple and just as fast as doing it in C, because the work is I/O-bound, not CPU-bound. You'll easily avoid these almost-impossible-to-debug bugs and get lots of help from the .NET Framework.

What is the advantage of using unsafe vs safe C# code?

unsafe static void SquarePtrParam (int* p)
{
*p *= *p;
}
VS
static void SquarePtrParam (ref int p)
{
p *= p;
}
Safe code can run in any situation where you can run C# code (Silverlight, shared-hosting ASP.NET, XNA, SQL Server, etc.), while unsafe code requires elevated trust. This means you can run your code in more places and with fewer restrictions.
Also, it's safe, meaning you don't have to worry about doing something wrong and crashing your process.
Your example is not a good one: the JIT compiler already generates code like that, because under the hood references are pointers too. Passing by reference needed to be that fast, or managed code would never have been competitive.
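The same point can be seen in C++, where both forms exist side by side. In this sketch (names adapted from the question), compilers typically emit the same machine code for both functions, since a reference is usually implemented as a pointer; that is common behaviour, not a language guarantee. The practical difference is that the reference cannot be null:

```cpp
#include <cassert>

// Pointer version: the caller passes an address; nothing stops a null one.
void square_ptr_param(int* p) {
    assert(p != nullptr);
    *p *= *p;
}

// Reference version: same effect on the caller's int, but it must be bound
// to a real object and cannot be re-seated.
void square_ref_param(int& p) {
    p *= p;
}
```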
The garbage-collected heap is pretty incompatible with pointers: you have to pin objects to make it possible to create a pointer to them. Without pinning, the garbage collector could move the object and your code would randomly fail, destroying the heap's integrity. Pinning has a non-zero cost, both for the operation itself and for the loss of efficiency you suffer when a garbage collection happens while an object is pinned, an effect that can last well after you unpin.
Pointers are highly effective when accessing unmanaged memory. The canonical example is image processing that requires accessing the pixels of a bitmap. They are also a way to quickly access pinned arrays with all the safety interlocks removed; array index checking isn't free when you don't simply iterate the array from start to end.
There's only one reason for using unsafe code: Raw performance.
Using unsafe code, you can use C++-like pointers without much checking by the runtime. No checks means you are on your own, but there's less overhead.
I've only seen it used in practice for speeding up image/bitmap manipulation. But you could also use it for in-place string manipulation (yes, making strings mutable! A bad idea anyway, unless you are building your own StringBuilder). Other uses include matrix calculations and other heavy mathematics, and probably interfacing with the OS and some hacking.
A perfect example is described in J. Richter's book "CLR via C#", 3rd edition, Ch. 16:
The following C# code demonstrates three techniques (safe, jagged, and unsafe), for accessing a two-dimensional array:
using System;
using System.Diagnostics;
public static class Program {
private const Int32 c_numElements = 10000;
public static void Main() {
const Int32 testCount = 10;
Stopwatch sw;
// Declare a two-dimensional array
Int32[,] a2Dim = new Int32[c_numElements, c_numElements];
// Declare a two-dimensional array as a jagged array (a vector of vectors)
Int32[][] aJagged = new Int32[c_numElements][];
for (Int32 x = 0; x < c_numElements; x++)
aJagged[x] = new Int32[c_numElements];
// 1: Access all elements of the array using the usual, safe technique
sw = Stopwatch.StartNew();
for (Int32 test = 0; test < testCount; test++)
Safe2DimArrayAccess(a2Dim);
Console.WriteLine("{0}: Safe2DimArrayAccess", sw.Elapsed);
// 2: Access all elements of the array using the jagged array technique
sw = Stopwatch.StartNew();
for (Int32 test = 0; test < testCount; test++)
SafeJaggedArrayAccess(aJagged);
Console.WriteLine("{0}: SafeJaggedArrayAccess", sw.Elapsed);
// 3: Access all elements of the array using the unsafe technique
sw = Stopwatch.StartNew();
for (Int32 test = 0; test < testCount; test++)
Unsafe2DimArrayAccess(a2Dim);
Console.WriteLine("{0}: Unsafe2DimArrayAccess", sw.Elapsed);
Console.ReadLine();
}
private static Int32 Safe2DimArrayAccess(Int32[,] a) {
Int32 sum = 0;
for (Int32 x = 0; x < c_numElements; x++) {
for (Int32 y = 0; y < c_numElements; y++) {
sum += a[x, y];
}
}
return sum;
}
private static Int32 SafeJaggedArrayAccess(Int32[][] a) {
Int32 sum = 0;
for (Int32 x = 0; x < c_numElements; x++) {
for (Int32 y = 0; y < c_numElements; y++) {
sum += a[x][y];
}
}
return sum;
}
private static unsafe Int32 Unsafe2DimArrayAccess(Int32[,] a) {
Int32 sum = 0;
fixed (Int32* pi = a) {
for (Int32 x = 0; x < c_numElements; x++) {
Int32 baseOfDim = x * c_numElements;
for (Int32 y = 0; y < c_numElements; y++) {
sum += pi[baseOfDim + y];
}
}
}
return sum;
}
}
The Unsafe2DimArrayAccess method is marked with the unsafe modifier, which is required
to use C#’s fixed statement. To compile this code, you’ll have to specify the /unsafe switch when invoking the C# compiler or check the “Allow Unsafe Code” check box on the Build tab of the Project Properties pane in Microsoft Visual Studio.
When I run this program on my machine, I get the following output:
00:00:02.0017692: Safe2DimArrayAccess
00:00:01.5197844: SafeJaggedArrayAccess
00:00:01.7343436: Unsafe2DimArrayAccess
As you can see, the safe two-dimensional array access technique is the slowest. The safe jagged array access technique takes a little less time to complete than the safe two-dimensional array access technique. However, you should note that creating the jagged array is more time-consuming than creating the multi-dimensional array because creating the jagged array requires an object to be allocated on the heap for each dimension, causing the garbage collector to kick in periodically. So there is a trade-off: If you need to create a lot of “multidimensional arrays” and you intend to access the elements infrequently, it is quicker to create a multi-dimensional array. If you need to create the “multi-dimensional array” just once, and you access its elements frequently, a jagged array will give you better performance. Certainly, in most applications, the latter scenario is more common.
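The index arithmetic in Unsafe2DimArrayAccess works because a rectangular array is one contiguous row-major block, while a jagged array is a set of separately allocated rows. A C++ sketch of the two layouts (function names are hypothetical; it illustrates the indexing only, not the timings above):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major flat storage, like Int32[,]: element (x, y) lives at x * n + y,
// the same baseOfDim + y index the unsafe version computes.
long long sum_flat(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    long long sum = 0;
    for (std::size_t x = 0; x < n; ++x) {
        const std::size_t base_of_dim = x * n;
        for (std::size_t y = 0; y < n; ++y) sum += data[base_of_dim + y];
    }
    return sum;
}

// "Jagged" storage, like Int32[][]: one separately allocated row per x.
long long sum_jagged(const std::vector<std::vector<int>>& rows) {
    long long sum = 0;
    for (const auto& row : rows)
        for (int value : row) sum += value;
    return sum;
}
```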
I don't think there is an advantage to using unsafe code in the example you've given. I've only really used unsafe code when I've needed to interact with unmanaged code, for example when calling non-COM DLL interfaces.

What's the best way to do a backwards loop in C/C#/C++?

I need to move backwards through an array, so I have code like this:
for (int i = myArray.Length - 1; i >= 0; i--)
{
// Do something
myArray[i] = 42;
}
Is there a better way of doing this?
Update: I was hoping that maybe C# had some built-in mechanism for this like:
foreachbackwards (int i in myArray)
{
// so easy
}
While admittedly a bit obscure, I would say that the most typographically pleasing way of doing this is
for (int i = myArray.Length; i --> 0; )
{
//do something
}
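The --> is not an operator; it is the postfix decrement followed by >, so the condition tests the old value and the body sees Length - 1 down to 0. A C++ sketch (hypothetical function name) that records the visiting order, using std::size_t to show that the idiom also works with unsigned indices:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// "i --> 0" parses as "i-- > 0": compare the old value, then decrement.
// With an unsigned index the final wrap-around happens only after the
// comparison has already failed, so the loop still terminates.
std::vector<std::size_t> backward_indices(std::size_t n) {
    std::vector<std::size_t> order;
    for (std::size_t i = n; i-- > 0; ) {
        order.push_back(i);
    }
    assert(order.size() == n);
    return order;
}
```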
In C++ you basically have the choice between iterating using iterators or indices.
Depending on whether you have a plain array, or a std::vector, you use different techniques.
Using std::vector
Using iterators
C++ allows you to do this using std::reverse_iterator:
for(std::vector<T>::reverse_iterator it = v.rbegin(); it != v.rend(); ++it) {
/* std::cout << *it; ... */
}
Using indices
The unsigned integral type returned by `std::vector::size` is *not* always `std::size_t`. It can be greater or less. This is crucial for the loop to work.
for(std::vector<int>::size_type i = someVector.size() - 1;
i != (std::vector<int>::size_type) -1; i--) {
/* std::cout << someVector[i]; ... */
}
It works because unsigned integral types use arithmetic modulo 2^BIT_SIZE. Thus, if you convert -N to an unsigned type, you end up at (2 ^ BIT_SIZE) - N.
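A C++ sketch of the sentinel technique (the function name is hypothetical). It relies on exactly the modular conversion described above, which the static_assert spells out, and it also handles an empty vector, because size() - 1 then already equals the sentinel:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Decrementing an unsigned 0 wraps to the maximum value, which is exactly
// what (size_type)-1 converts to, so that value can serve as the end marker.
std::vector<int> reverse_with_sentinel(const std::vector<int>& v) {
    using size_type = std::vector<int>::size_type;
    static_assert(static_cast<size_type>(-1) == std::numeric_limits<size_type>::max(),
                  "conversion of -1 to an unsigned type is modular");
    std::vector<int> out;
    for (size_type i = v.size() - 1; i != static_cast<size_type>(-1); i--) {
        out.push_back(v[i]);
    }
    assert(out.size() == v.size());
    return out;
}
```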
Using Arrays
Using iterators
We are using `std::reverse_iterator` to do the iterating.
for(std::reverse_iterator<element_type*> it(a + sizeof a / sizeof *a), itb(a);
it != itb;
++it) {
/* std::cout << *it; .... */
}
Using indices
We can safely use `std::size_t` here, as opposed to above, since `sizeof` always returns `std::size_t` by definition.
for(std::size_t i = (sizeof a / sizeof *a) - 1; i != (std::size_t) -1; i--) {
/* std::cout << a[i]; ... */
}
Avoiding pitfalls with sizeof applied to pointers
Actually the above way of determining the size of an array sucks. If a is actually a pointer instead of an array (which happens quite often, and beginners will confuse it), it will silently fail. A better way is to use the following, which will fail at compile time, if given a pointer:
template<typename T, std::size_t N> char (& array_size(T(&)[N]) )[N];
It works by deducing the size N of the passed array, and then declaring the function to return a reference to an array of char with that same size. sizeof(char) is defined to be 1, so the returned array type has a sizeof of N * 1, which is exactly what we are looking for, with only compile-time evaluation and zero runtime overhead.
Instead of doing
(sizeof a / sizeof *a)
Change your code so that it now does
(sizeof array_size(a))
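A compilable form of the helper, with a compile-time check of its result (the sample array is hypothetical):

```cpp
#include <cstddef>

// Declared but never defined: it only appears inside sizeof, which does not
// evaluate its operand. Passing a pointer fails to compile, because a pointer
// cannot bind to the T (&)[N] parameter.
template <typename T, std::size_t N>
char (&array_size(T (&)[N]))[N];

// Hypothetical sample: the element count is available at compile time.
constexpr int sample[] = {1, 2, 3, 4, 5};
static_assert(sizeof array_size(sample) == 5, "five elements");
```

For a multi-dimensional array such as double d[3][2], T deduces to double[2] and the result is the outer dimension, 3.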
I would always prefer clear code over 'typographically pleasing' code.
Thus, I would always use:
for (int i = myArray.Length - 1; i >= 0; i--)
{
// Do something ...
}
You can consider it as the standard way to loop backwards.
Just my two cents...
In C#, using Visual Studio 2005 or later, type 'forr' and hit [TAB] [TAB]. This will expand to a for loop that goes backwards through a collection.
It's so easy to get wrong (at least for me), that I thought putting this snippet in would be a good idea.
That said, I like Array.Reverse() / Enumerable.Reverse() and then iterate forwards better - they more clearly state intent.
In C# using Linq:
foreach(var item in myArray.Reverse())
{
// do something
}
That's definitely the best way for any array whose length is a signed integral type. For arrays whose lengths are an unsigned integral type (e.g. a std::vector in C++), you need to modify the end condition slightly:
for(size_t i = myArray.size() - 1; i != (size_t)-1; i--)
// blah
If you just said i >= 0, this is always true for an unsigned integer, so the loop will be an infinite loop.
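A minimal C++ sketch of that end condition, including the empty-vector case. `backwards` is a hypothetical helper that just copies the elements in reverse order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: reverse traversal of a std::vector with an unsigned index.
// "i >= 0" would always be true for size_t, so the loop instead stops when
// i wraps past zero to (size_t)-1.
std::vector<int> backwards(const std::vector<int>& v) {
    std::vector<int> out;
    for (std::size_t i = v.size() - 1; i != static_cast<std::size_t>(-1); i--) {
        out.push_back(v[i]);
    }
    return out;
}
```

For an empty vector, `v.size() - 1` itself wraps to `(size_t)-1`, so the condition is false immediately and `v[i]` is never touched.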
Looks good to me. If the indexer was unsigned (uint etc), you might have to take that into account. Call me lazy, but in that (unsigned) case, I might just use a counter-variable:
uint pos = arr.Length;
for(uint i = 0; i < arr.Length ; i++)
{
arr[--pos] = 42;
}
(actually, even here you'd need to be careful of cases like arr.Length = uint.MaxValue... maybe a != somewhere... of course, that is a very unlikely case!)
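The counter-variable approach carries over to C++ with `std::size_t`. In this sketch, the hypothetical `fill_from_back` writes the counter value so the order is visible; the forward counter never has to go below zero, and `pos` is only decremented while it is still positive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: forward counter k, separate position pos counting
// down from the end, so no unsigned value ever needs to go below zero.
void fill_from_back(std::vector<int>& arr) {
    std::size_t pos = arr.size();
    for (std::size_t k = 0; k < arr.size(); k++) {
        arr[--pos] = static_cast<int>(k);  // last element gets 0, first gets n-1
    }
}
```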
The best way to do that in C++ is probably to use iterator (or better, range) adaptors, which will lazily transform the sequence as it is being traversed.
Basically,
vector<value_type> range;
foreach(value_type v, range | reversed)
cout << v;
Displays the range "range" (here it's empty, but I'm fairly sure you can add elements yourself) in reverse order.
Of course, simply iterating the range is not much use, but passing that new range to algorithms and such is pretty cool.
This mechanism can also be used for much more powerful compositions:
range | transformed(f) | filtered(p) | reversed
Will lazily compute the range "range", where function "f" is applied to all elements, elements for which "p" is not true are removed, and finally the resulting range is reversed.
Pipe syntax is the most readable IMO, given it's infix.
The Boost.Range library update pending review implements this, but it's pretty simple to do it yourself also. It's even more cool with a lambda DSEL to generate the function f and the predicate p in-line.
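As a rough idea of the do-it-yourself route, here is a minimal C++14 sketch of a lazy `range | reversed` adaptor. The names `reversed_view`, `reversed_tag` and `reversed` are hypothetical, this is not Boost.Range's implementation, and it only covers `reversed`, not `transformed` or `filtered`. (C++20's `<ranges>` library also ships standard adaptors such as `std::views::reverse`, `std::views::transform` and `std::views::filter`.)

```cpp
#include <cassert>
#include <vector>

// Hypothetical lazy view: no elements are copied; begin()/end() simply
// forward to the container's reverse iterators.
template <typename Container>
struct reversed_view {
    Container& c;
    auto begin() const { return c.rbegin(); }
    auto end() const { return c.rend(); }
};

struct reversed_tag {};
constexpr reversed_tag reversed{};

// Only lvalue containers are accepted, so the view cannot dangle on a temporary.
template <typename Container>
reversed_view<Container> operator|(Container& c, reversed_tag) {
    return {c};
}
```

With `std::vector<int> v{1, 2, 3};`, `for (int x : v | reversed)` visits 3, 2, 1 without copying `v`.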
In C I like to do this:
int i = myArray.Length;
while (i--) {
myArray[i] = 42;
}
C# example added by MusiGenesis:
{int i = myArray.Length; while (i-- > 0)
{
myArray[i] = 42;
}}
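The same idiom works in C++ even with an unsigned index, because `i--` yields the old value before decrementing. In this sketch, `fill_backwards` is a hypothetical helper that also records the visiting order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: "while (i--)" tests the old value of i, then
// decrements it, so the body sees n-1, ..., 0. When i is 0 the test fails
// and i wraps to (size_t)-1, but it is never used after that.
std::vector<std::size_t> fill_backwards(std::vector<int>& arr, int value) {
    std::vector<std::size_t> order;  // visiting order, recorded for illustration
    std::size_t i = arr.size();
    while (i--) {
        arr[i] = value;
        order.push_back(i);
    }
    return order;
}
```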
I prefer a while loop. It's clearer to me than decrementing i in the condition of a for loop.
int i = arrayLength;
while(i)
{
i--;
//do something with array[i]
}
I do this:
if (list.Count > 0)
for (size_t i = list.Count - 1; ; i--)
{
//do your thing
if (i == 0) //for preventing unsigned wrap
break;
}
But for some reason, Visual Studio 2019 gets angry and warns me about an "ill-defined loop" or something like that; it doesn't trust me.
Edit: you can drop "i >= 0" from "for (size_t i = list.Count - 1; i >= 0; i--)". For an unsigned type it is always true, so it is unnecessary; the break handles termination.
I'm going to try answering my own question here, but I don't really like this, either:
for (int i = 0; i < myArray.Length; i++)
{
int iBackwards = myArray.Length - 1 - i; // ugh
myArray[iBackwards] = 666;
}
I'd use the code in the original question, but if you really wanted to use foreach and have an integer index in C#:
foreach (int i in Enumerable.Range(0, myArray.Length).Reverse())
{
myArray[i] = 42;
}
// this is how I always do it
for (i = n; --i >= 0;){
...
}
For C++:
As mentioned by others, when possible (i.e. when you only need one element at a time) it is strongly preferable to use iterators to both be explicit and avoid common pitfalls. Modern C++ has a more concise syntax for that with auto:
std::vector<int> vec = {1,2,3,4};
for (auto it = vec.rbegin(); it != vec.rend(); ++it) {
std::cout<<*it<<" ";
}
prints 4 3 2 1.
You can also modify the value during the loop:
std::vector<int> vec = {1,2,3,4};
for (auto it = vec.rbegin(); it != vec.rend(); ++it) {
*it = *it + 10;
std::cout<<*it<<" ";
}
leading to 14 13 12 11 being printed and {11, 12, 13, 14} being in the std::vector afterwards.
If you don't plan on modifying the value during the loop, you should make sure that you get an error when you try to do that by accident, similarly to how one might write for(const auto& element : vec). This is possible like this:
std::vector<int> vec = {1,2,3,4};
for (auto it = vec.crbegin(); it != vec.crend(); ++it) { // used crbegin()/crend() here...
*it = *it + 10; // ... so that this is a compile-time error
std::cout<<*it<<" ";
}
The compiler error in this case for me is:
/tmp/main.cpp:20:9: error: assignment of read-only location ‘it.std::reverse_iterator<__gnu_cxx::__normal_iterator<const int*, std::vector<int> > >::operator*()’
20 | *it = *it + 10;
| ~~~~^~~~~~~~~~
Also note that you should make sure not to use different iterator types together:
std::vector<int> vec = {1,2,3,4};
for (auto it = vec.rbegin(); it != vec.end(); ++it) { // mixed rbegin() and end()
std::cout<<*it<<" ";
}
leads to the verbose error:
/tmp/main.cpp: In function ‘int main()’:
/tmp/main.cpp:19:33: error: no match for ‘operator!=’ (operand types are ‘std::reverse_iterator<__gnu_cxx::__normal_iterator<int*, std::vector<int> > >’ and ‘std::vector<int>::iterator’ {aka ‘__gnu_cxx::__normal_iterator<int*, std::vector<int> >’})
19 | for (auto it = vec.rbegin(); it != vec.end(); ++it) {
| ~~ ^~ ~~~~~~~~~
| | |
| | std::vector<int>::iterator {aka __gnu_cxx::__normal_iterator<int*, std::vector<int> >}
| std::reverse_iterator<__gnu_cxx::__normal_iterator<int*, std::vector<int> > >
If you have C-style arrays whose size is part of the type (e.g. declared on the stack), you can do things like this:
int vec[] = {1,2,3,4};
for (auto it = std::crbegin(vec); it != std::crend(vec); ++it) {
std::cout<<*it<<" ";
}
If you really need the index, consider the following options:
Check the range, then work with signed values, e.g.:
void loop_reverse(std::vector<int>& vec) {
if (vec.size() > static_cast<size_t>(std::numeric_limits<int>::max())) {
throw std::invalid_argument("Input too large");
}
const int sz = static_cast<int>(vec.size());
for(int i=sz-1; i >= 0; --i) {
// do something with i
}
}
Work with unsigned values, be careful, and add comments, e.g.:
void loop_reverse2(std::vector<int>& vec) {
for(size_t i=vec.size(); i-- > 0;) { // reverse indices from N-1 to 0
// do something with i
}
}
Calculate the actual index separately, e.g.:
void loop_reverse3(std::vector<int>& vec) {
for(size_t offset=0; offset < vec.size(); ++offset) {
const size_t i = vec.size()-1-offset; // reverse indices from N-1 to 0
// do something with i
}
}
If you use C++ and want to use size_t, not int,
for (size_t i = yourVector.size(); i--;) {
// i is the index.
}
(Note that -1 is interpreted as a large positive number if it's size_t, thus a typical for-loop such as for (int i = yourVector.size()-1; i>=0; --i) doesn't work if size_t is used instead of int.)
Not that it matters after 13+ years, but just for educational purposes and a bit of trivial learning:
The original code was:
for (int i = myArray.Length - 1; i >= 0; i--)
{
// Do something
myArray[i] = 42;
}
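You might think you don't need to test 'i' for being greater than or equal to zero, since the loop only needs a 'false' result to terminate, and 'i' is (implicitly) false once it hits zero. So you might try testing only 'i' itself:
for (int i = myArray.Length - 1; i; i--)
{
// Caution: skips myArray[0] in C/C++, and does not compile in C#
myArray[i] = 42;
}
However, this is not equivalent. The condition is checked before every iteration, so when 'i' reaches zero the loop exits without running the body, and the element at index 0 is never processed. In C# it does not even compile, because an int cannot be used as a bool condition. If you want the condition to test only the counter, decrement inside it instead: for (int i = myArray.Length; i-- > 0; ). It is still interesting to understand the mechanics of what is going on inside the for() loop.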
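A quick C++ sketch that counts how many iterations each loop form performs makes the difference visible. The three function names are hypothetical.

```cpp
#include <cassert>

// Testing plain "i": stops before the body runs with i == 0.
// Requires n > 0 (for n == 0, i would start at -1 and keep going).
int visits_truthy_test(int n) {
    int count = 0;
    for (int i = n - 1; i; i--) count++;       // visits n-1 .. 1
    return count;
}

// The original form: visits every index, including 0.
int visits_ge_zero_test(int n) {
    int count = 0;
    for (int i = n - 1; i >= 0; i--) count++;  // visits n-1 .. 0
    return count;
}

// Post-decrement in the condition: visits every index, also safe for n == 0.
int visits_postdecrement_test(int n) {
    int count = 0;
    for (int i = n; i-- > 0;) count++;         // visits n-1 .. 0
    return count;
}
```

For n = 1000, the first form runs 999 times and the other two run 1000 times.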
NOTE: This post ended up being far more detailed than intended, and therefore off topic; I apologize.
That being said, my peers read it and believe it is valuable 'somewhere'. This thread is not the place. I would appreciate your feedback on where this should go (I am new to the site).
Anyway, this is the C# version for .NET 3.5, which works on any collection type using the defined semantics. It optimizes for reuse rather than for performance or CPU-cycle minimization, which is the sensible default in most common development scenarios, although that never seems to be what happens in the real world (premature optimization).
Extension method that works over any collection type and takes an action delegate expecting a single value of the element type, executed for each item in reverse order.
Requires .NET 3.5 (System.Linq); the method must be declared in a static class:
public static void PerformOverReversed<T>(this IEnumerable<T> sequenceToReverse, Action<T> doForEachReversed)
{
foreach (var contextItem in sequenceToReverse.Reverse())
doForEachReversed(contextItem);
}
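For comparison, a rough C++14 counterpart of `PerformOverReversed`; `perform_over_reversed` is a hypothetical name. C++ containers and built-in arrays expose reverse iterators directly, so no intermediate buffer is needed, whereas a general IEnumerable<T> can only be walked forwards.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: apply an action to each element of any range that
// supports std::rbegin/std::rend (containers, built-in arrays), last to first.
template <typename Range, typename Action>
void perform_over_reversed(const Range& range, Action action) {
    for (auto it = std::rbegin(range); it != std::rend(range); ++it) {
        action(*it);
    }
}
```

For example, passing a lambda that appends to a second vector collects the elements of the first one last to first.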
Using an older .NET version, or want to understand LINQ internals better? Read on. Or not.
ASSUMPTION: In the .NET type system, the Array type implements the non-generic IEnumerable interface (not the generic IEnumerable<T>, only IEnumerable).
That is all you need to iterate from beginning to end; however, you want to move in the opposite direction. Because the non-generic IEnumerable hands out each element as an 'object', any element type is valid.
CRITICAL MEASURE: We assume that being able to process any sequence in reverse order is 'better' than only being able to do it on integers.
Solution a for .NET CLR 2.0-3.0:
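Description: We will accept any instance implementing IEnumerable, with the mandate that each item it contains is of the same type. So if we receive an array, the entire array contains instances of type X. Note that the code below does not enforce this: ArrayList accepts any object, so items of a different type are passed through unchanged, and only a null argument is rejected.
A static service class (private constructor, static methods only):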
public class ReverserService
{
private ReverserService() { }
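/// <summary>
/// Returns the items of enumerableInstance in reverse order, using yield return.
/// Because this is an iterator method, nothing (not even the null check below)
/// runs until the caller starts enumerating the result.
/// </summary>
/// <param name="enumerableInstance">The sequence to reverse.</param>
/// <returns>The items of the sequence, last to first.</returns>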
public static IEnumerable ToReveresed(IEnumerable enumerableInstance)
{
if (enumerableInstance == null)
{
throw new ArgumentNullException("enumerableInstance");
}
// First we need to move forward and create a temp
// copy of a type that allows us to move backwards.
// We can use ArrayList for this as the concrete
// type.
IList reversedEnumerable = new ArrayList();
IEnumerator tempEnumerator = enumerableInstance.GetEnumerator();
while (tempEnumerator.MoveNext())
{
reversedEnumerable.Add(tempEnumerator.Current);
}
// Now we do the standard reverse over this using yield to return
// the result
// NOTE: This is an immutable result by design. That is
// a design goal for this simple question as well as most other set related
// requirements, which is why Linq results are immutable for example
// In fact this is foundational code to understand Linq
for (var i = reversedEnumerable.Count - 1; i >= 0; i--)
{
yield return reversedEnumerable[i];
}
}
}
public static class ExtensionMethods
{
public static IEnumerable ToReveresed(this IEnumerable enumerableInstance)
{
return ReverserService.ToReveresed(enumerableInstance);
}
}
[TestFixture]
public class Testing123
{
/// <summary>
/// .NET 1.1 CLR
/// </summary>
[Test]
public void Tester_fornet_1_dot_1()
{
const int initialSize = 1000;
// Create the baseline data
int[] myArray = new int[initialSize];
for (var i = 0; i < initialSize; i++)
{
myArray[i] = i + 1;
}
IEnumerable _revered = ReverserService.ToReveresed(myArray);
Assert.IsTrue(TestAndGetResult(_revered).Equals(1000));
}
[Test]
public void tester_why_this_is_good()
{
ArrayList names = new ArrayList();
names.Add("Jim");
names.Add("Bob");
names.Add("Eric");
names.Add("Sam");
IEnumerable _revered = ReverserService.ToReveresed(names);
Assert.IsTrue(TestAndGetResult(_revered).Equals("Sam"));
}
[Test]
public void tester_extension_method()
{
// Extension Methods No Linq (Linq does this for you as I will show)
var enumerableOfInt = Enumerable.Range(1, 1000);
// Use Extension Method - which simply wraps older clr code
IEnumerable _revered = enumerableOfInt.ToReveresed();
Assert.IsTrue(TestAndGetResult(_revered).Equals(1000));
}
[Test]
public void tester_linq_3_dot_5_clr()
{
// Same data, but reversed with LINQ instead of ReverserService
IEnumerable enumerableOfInt = Enumerable.Range(1, 1000);
// Reverse is a LINQ extension method on IEnumerable<T>.
// Note you must cast the non-generic IEnumerable first, using OfType or Cast
IEnumerable _revered = enumerableOfInt.Cast<int>().Reverse();
Assert.IsTrue(TestAndGetResult(_revered).Equals(1000));
}
[Test]
public void tester_final_and_recommended_colution()
{
var enumerableOfInt = Enumerable.Range(1, 1000);
enumerableOfInt.PerformOverReversed(i => Debug.WriteLine(i));
}
private static object TestAndGetResult(IEnumerable enumerableIn)
{
// IEnumerable x = ReverserService.ToReveresed(names);
Assert.IsTrue(enumerableIn != null);
IEnumerator _test = enumerableIn.GetEnumerator();
// Move to first
Assert.IsTrue(_test.MoveNext());
return _test.Current;
}
}
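The buffering idea behind ReverserService translates to C++ as well. This sketch (`to_reversed` is a hypothetical name) accepts single-pass input iterators such as `std::istream_iterator`, which cannot be walked backwards. Unlike the C# iterator method, it is eager: it builds and returns the reversed buffer immediately instead of deferring all work until enumeration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: copy a forward-only sequence into a buffer in one
// pass, then reverse the buffer in place and return it.
template <typename InputIt>
std::vector<typename std::iterator_traits<InputIt>::value_type>
to_reversed(InputIt first, InputIt last) {
    std::vector<typename std::iterator_traits<InputIt>::value_type> buffer(first, last);
    std::reverse(buffer.begin(), buffer.end());
    return buffer;
}
```

Mirroring `tester_why_this_is_good`, reading "Jim Bob Eric Sam" through `std::istream_iterator<std::string>` (hence `<sstream>` and `<string>`) and reversing it puts "Sam" first.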
