Do C# collections care about cache friendliness?

I've been running a lot of tests comparing an array of structs with an array of classes and a list of classes. Here's the test I've been running:
struct AStruct {
public int val;
}
class AClass {
public int val;
}
static void TestCacheCoherence()
{
int num = 10000;
int iterations = 1000;
int padding = 64;
List<Object> paddingL = new List<Object>();
AStruct[] structArray = new AStruct[num];
AClass[] classArray = new AClass[num];
List<AClass> classList = new List<AClass>();
for(int i=0;i<num;i++){
classArray[i] = new AClass();
if(padding >0) paddingL.Add(new byte[padding]);
}
for (int i = 0; i < num; i++)
{
classList.Add(new AClass());
if (padding > 0) paddingL.Add(new byte[padding]);
}
Console.WriteLine("\n");
stopwatch("StructArray", iterations, () =>
{
for (int i = 0; i < num; i++)
{
structArray[i].val *= 3;
}
});
stopwatch("ClassArray ", iterations, () =>
{
for (int i = 0; i < num; i++)
{
classArray[i].val *= 3;
}
});
stopwatch("ClassList ", iterations, () =>
{
for (int i = 0; i < num; i++)
{
classList[i].val *= 3;
}
});
}
static Stopwatch watch = new Stopwatch();
public static long stopwatch(string msg, int iterations, Action c)
{
watch.Restart();
for (int i = 0; i < iterations; i++)
{
c();
}
watch.Stop();
Console.WriteLine(msg +": " + watch.ElapsedTicks);
return watch.ElapsedTicks;
}
I'm running this in release mode with the following:
Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2); // Use only the second core
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
Thread.CurrentThread.Priority = ThreadPriority.Highest;
RESULTS:
With padding=0 I get:
StructArray: 21517
ClassArray: 42637
ClassList: 80679
With padding=64 I get:
StructArray: 21871
ClassArray: 82139
ClassList: 105309
With padding=128 I get:
StructArray: 21694
ClassArray: 76455
ClassList: 107330
I am a bit confused by these results, since I was expecting the difference to be bigger.
After all, the structures are tiny and are laid out one after the other in memory, while the classes are separated by up to 128 bytes of garbage.
Does this mean that I shouldn't even worry about cache friendliness? Or is my test flawed?

There are a number of things going on here. The first is that your tests don't take garbage collection into account: it is quite possible that the arrays are being collected during the loop over the list (because the arrays are no longer used while you are iterating the list, they are eligible for collection).
The second is that List<T> is backed by an array anyway. The only extra read overhead is the additional function calls needed to go through the List<T> indexer.
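A rough way to keep garbage collection out of the measurement is sketched below: force a collection before the timed region and keep the data reachable until timing ends. The untimed warm-up call is an extra assumption that goes beyond the points above, and `TimingSketch` and `Measure` are made-up names. `GC.Collect`, `GC.WaitForPendingFinalizers` and `GC.KeepAlive` are standard .NET calls. Treat it as a sketch, not a replacement for a proper benchmarking tool.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Hypothetical helper: times `action` after forcing a full collection,
    // and keeps `keepAlive` reachable until the timed region has ended.
    public static long Measure(int iterations, Action action, object keepAlive)
    {
        action(); // untimed warm-up call so JIT compilation is not measured

        // Collect now so pending garbage is less likely to trigger a GC mid-measurement.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();

        // Prevents the runtime from treating the data as dead before this point.
        GC.KeepAlive(keepAlive);
        return watch.ElapsedTicks;
    }

    public static void Main()
    {
        int[] data = new int[10000];
        long ticks = Measure(1000, () =>
        {
            for (int i = 0; i < data.Length; i++)
            {
                data[i] += 1;
            }
        }, data);
        Console.WriteLine("IntArray: " + ticks);
    }
}
```

Passing every collection under test as `keepAlive` (for example wrapped in one object array) keeps all of them alive across all three timed loops.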

Related

Why is bubble sort running faster than selection sort?

I am working on a project that compares the time bubble sort and selection sort take. I made two separate programs and combined them into one, and now bubble sort is running much faster than selection sort. I checked to make sure that the code wasn't just giving me 0s because of some conversion error and was running as intended. I am using Stopwatch from System.Diagnostics to measure the time. I also checked that the machine was not the problem: I ran it on Replit and got similar results.
{
class Program
{
public static int s1 = 0;
public static int s2 = 0;
static decimal bubblesort(int[] arr1)
{
int n = arr1.Length;
var sw1 = Stopwatch.StartNew();
for (int i = 0; i < n - 1; i++)
{
for (int j = 0; j < n - i - 1; j++)
{
if (arr1[j] > arr1[j + 1])
{
int tmp = arr1[j];
// swap arr1[j] and arr1[j + 1]
arr1[j] = arr1[j + 1];
arr1[j + 1] = tmp;
s1++;
}
}
}
sw1.Stop();
// Console.WriteLine(sw1.ElapsedMilliseconds);
decimal a = Convert.ToDecimal(sw1.ElapsedMilliseconds);
return a;
}
static decimal selectionsort(int[] arr2)
{
int n = arr2.Length;
var sw1 = Stopwatch.StartNew();
// for (int e = 0; e < 1000; e++)
// {
for (int x = 0; x < arr2.Length - 1; x++)
{
int minPos = x;
for (int y = x + 1; y < arr2.Length; y++)
{
if (arr2[y] < arr2[minPos])
minPos = y;
}
if (x != minPos && minPos < arr2.Length)
{
int temp = arr2[minPos];
arr2[minPos] = arr2[x];
arr2[x] = temp;
s2++;
}
}
// }
sw1.Stop();
// Console.WriteLine(sw1.ElapsedMilliseconds);
decimal a = Convert.ToDecimal(sw1.ElapsedMilliseconds);
return a;
}
static void Main(string[] args)
{
Console.WriteLine("Enter the size of n");
int n = Convert.ToInt32(Console.ReadLine());
Random rnd = new System.Random();
decimal bs = 0M;
decimal ss = 0M;
int s = 0;
int[] arr1 = new int[n];
int tx = 1000; //tx is a variable that I can use to adjust sample size
decimal tm = Convert.ToDecimal(tx);
for (int i = 0; i < tx; i++)
{
for (int a = 0; a < n; a++)
{
arr1[a] = rnd.Next(0, 1000000);
}
ss += selectionsort(arr1);
bs += bubblesort(arr1);
}
bs = bs / tm;
ss = ss / tm;
Console.WriteLine("Bubble Sort took " + bs + " milliseconds");
Console.WriteLine("Selection Sort took " + ss + " milliseconds");
}
}
}
What is going on? What is causing bubble sort to be fast or what is slowing down Selection sort? How can I fix this?
I found that the problem was that Selection Sort was looping 1000 times per method run in addition to the 1000 runs for sample size, causing it to perform significantly worse than bubble sort. Thank you all for the help, and thank you TheGeneral for showing me the benchmarking tools. Also, the array that was given as a parameter was a copy instead of a reference, as running through the loop manually showed me that the bubble sort was doing its job and not sorting an already sorted array.

To solve your initial problem you just need to copy your arrays. You can do this easily with ToArray():
Creates an array from an IEnumerable<T>.
ss += selectionsort(arr1.ToArray());
bs += bubblesort(arr1.ToArray());
However let's learn how to do a more reliable benchmark with BenchmarkDotNet:
BenchmarkDotNet Nuget
Official Documentation
Given
public class Sort
{
public static void BubbleSort(int[] arr1)
{
int n = arr1.Length;
for (int i = 0; i < n - 1; i++)
{
for (int j = 0; j < n - i - 1; j++)
{
if (arr1[j] > arr1[j + 1])
{
int tmp = arr1[j];
// swap arr1[j] and arr1[j + 1]
arr1[j] = arr1[j + 1];
arr1[j + 1] = tmp;
}
}
}
}
public static void SelectionSort(int[] arr2)
{
int n = arr2.Length;
for (int x = 0; x < arr2.Length - 1; x++)
{
int minPos = x;
for (int y = x + 1; y < arr2.Length; y++)
{
if (arr2[y] < arr2[minPos])
minPos = y;
}
if (x != minPos && minPos < arr2.Length)
{
int temp = arr2[minPos];
arr2[minPos] = arr2[x];
arr2[x] = temp;
}
}
}
}
Benchmark code
[SimpleJob(RuntimeMoniker.Net50)]
[MemoryDiagnoser()]
public class SortBenchmark
{
private int[] data;
[Params(100, 1000)]
public int N;
[GlobalSetup]
public void Setup()
{
var r = new Random(42);
data = Enumerable
.Repeat(0, N)
.Select(i => r.Next(0, N))
.ToArray();
}
[Benchmark]
public void Bubble() => Sort.BubbleSort(data.ToArray());
[Benchmark]
public void Selection() => Sort.SelectionSort(data.ToArray());
}
Usage
static void Main(string[] args)
{
BenchmarkRunner.Run<SortBenchmark>();
}
Results
| Method    | N    | Mean       | Error     | StdDev    |
| Bubble    | 100  | 8.553 us   | 0.0753 us | 0.0704 us |
| Selection | 100  | 4.757 us   | 0.0247 us | 0.0231 us |
| Bubble    | 1000 | 657.760 us | 7.2581 us | 6.7893 us |
| Selection | 1000 | 300.395 us | 2.3302 us | 2.1796 us |
Summary
What have we learnt? Your bubble sort code is slower ¯\_(ツ)_/¯

It looks like you're passing the already sorted array into Bubble Sort. Because arrays are reference types, the sort you run first modifies the very same array that is later passed into bubble sort.
Make a second array and pass the second array into bubble sort.
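A minimal sketch of that suggestion, assuming each sort should receive its own copy of the unsorted input. `CopyDemo` and `SortInPlace` are made-up names, and `Array.Sort` merely stands in for either hand-written sort:

```csharp
using System;

public static class CopyDemo
{
    // Stand-in for any in-place sort; it mutates the array it is given.
    public static void SortInPlace(int[] values)
    {
        Array.Sort(values);
    }

    public static void Main()
    {
        int[] original = { 5, 3, 9, 1 };

        // Clone() returns a shallow copy, which for int[] is a fully independent array.
        int[] forSelection = (int[])original.Clone();
        int[] forBubble = (int[])original.Clone();

        SortInPlace(forSelection);

        // Only the array that was sorted has changed.
        Console.WriteLine(string.Join(",", forSelection)); // 1,3,5,9
        Console.WriteLine(string.Join(",", forBubble));    // 5,3,9,1
        Console.WriteLine(string.Join(",", original));     // 5,3,9,1
    }
}
```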

Sort by selection in C#

I am a complete beginner in programming. I am trying to implement selection sort. Everything seems to be OK, with one caveat: the new array is only filled up to index 24. I can't understand what the problem is.
int[] Fillin(int[] mass)
{
Random r = new Random();
for(int i = 0; i < mass.Length; i++)
{
mass[i] = r.Next(1, 101);
}
return mass;
}
int SearchSmall(int[] mass)
{
int smallest = mass[0];
int small_index = 0;
for(int i = 1; i < mass.Length; i++)
{
if (mass[i] < smallest)
{
smallest = mass[i];
small_index = i;
}
}
return small_index;
}
int[] Remove(int[] massiv,int remind)
{
List<int> tmp = new List<int>(massiv);
tmp.RemoveAt(remind);
massiv = tmp.ToArray();
return massiv;
}
public int[] SortMass(int[] mass)
{
mass = Fillin(mass);
Print(mass);
Console.WriteLine("________________________________");
int[] newmass = new int[mass.Length];
int small;
for(int i = 0; i < mass.Length; i++)
{
small = SearchSmall(mass);
newmass[i] = mass[small];
mass = Remove(mass, small);
}
return newmass;
}

I think your main issue is that when you remove an element in the Remove function, mass gets shorter, so the main loop for (int i = 0; i < mass.Length; i++) stops before it has handled all elements of the initial array. A simple (and ugly) way to fix that would be not to remove the elements but to assign a very high value:
public static int[] Remove(int[] massiv, int remind)
{
massiv[remind] = 999999;
return massiv;
}
Or, as Legacy suggested, simply replace mass.Length with newmass.Length in the main loop.
As some others have mentioned, this is not the best way to sort an array, but it is an interesting exercise.
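The second suggestion can be sketched as follows, assuming the input array is already filled. The loop bound is `newmass.Length`, which stays fixed while `mass` shrinks. `SelectionBySearch` is a made-up class name, and `Fillin` and `Print` from the question are left out to keep the sketch short:

```csharp
using System;
using System.Collections.Generic;

public static class SelectionBySearch
{
    // Returns the index of the smallest element.
    static int SearchSmall(int[] mass)
    {
        int smallIndex = 0;
        for (int i = 1; i < mass.Length; i++)
        {
            if (mass[i] < mass[smallIndex])
            {
                smallIndex = i;
            }
        }
        return smallIndex;
    }

    // Returns a new array without the element at `index`.
    static int[] Remove(int[] mass, int index)
    {
        var tmp = new List<int>(mass);
        tmp.RemoveAt(index);
        return tmp.ToArray();
    }

    public static int[] SortMass(int[] mass)
    {
        int[] newmass = new int[mass.Length];
        // newmass.Length never changes, unlike mass.Length, which drops by one per pass.
        for (int i = 0; i < newmass.Length; i++)
        {
            int small = SearchSmall(mass);
            newmass[i] = mass[small];
            mass = Remove(mass, small);
        }
        return newmass;
    }

    public static void Main()
    {
        int[] sorted = SortMass(new[] { 42, 7, 19, 7, 3 });
        Console.WriteLine(string.Join(" ", sorted)); // 3 7 7 19 42
    }
}
```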

How can I fill array with unique random numbers with for-loop&if-statement?

I'm trying to fill a one-dimensional array with random BUT unique numbers (no two numbers should be the same). I guess I have a logical error in the second for loop, but I can't get it right.
P.S. I'm not looking for a more "complex" solution: all I know at this time is while, for, if.
P.P.S. I know that it's a real beginner's problem, and I'm sorry for this kind of question.
int[] x = new int[10];
for (int i = 0; i < x.Length; i++)
{
x[i] = r.Next(9);
for (int j = 0; j <i; j++)
{
if (x[i] == x[j]) break;
}
}
for (int i = 0; i < x.Length; i++)
{
Console.WriteLine(x[i]);
}

Here is a solution with your code.
int[] x = new int[10];
for (int i = 0; i < x.Length;)
{
bool stop = false;
x[i] = r.Next(10); // Next(9) yields only 0..8, and nine distinct values can never fill ten unique slots
for (int j = 0; j <i; j++)
{
if (x[i] == x[j]) {
stop = true;
break;
}
}
if (!stop)
i++;
}
for (int i = 0; i < x.Length; i++)
{
Console.WriteLine(x[i]);
}

A simple trace of the posted code reveals some of the issues. To be specific, on the line…
if (x[i] == x[j]) break;
if the random number is already in the array, breaking out of the j loop only ends the comparison. The outer loop then moves on to the next i, so the duplicate that was just stored in x[i] stays there and is never replaced.
The outer i loop is obviously looping through the x int array, this is pretty clear and looks ok. However, the second inner loop can’t really be a for loop… and here’s why… basically you need to find a random int, then loop through the existing ints to see if it already exists. Given this, in theory you could grab the same random number “many” times over before getting a unique one. Therefore, in this scenario… you really have NO idea how many times you will loop around before you find this unique number.
With that said, it may help to “break” your problem down. I am guessing a “method” that returns a “unique” int compared to the existing ints in the x array, may come in handy. Create an endless while loop, inside this loop, we would grab a random number, then loop through the “existing” ints. If the random number is not a duplicate, then we can simply return this value. This is all this method does and it may look something like below.
private static int GetNextInt(Random r, int[] x, int numberOfRandsFound) {
int currentRand;
bool itemAlreadyExist = false;
while (true) {
currentRand = r.Next(RandomNumberSize);
itemAlreadyExist = false;
for (int i = 0; i < numberOfRandsFound; i++) {
if (x[i] == currentRand) {
itemAlreadyExist = true;
break;
}
}
if (!itemAlreadyExist) {
return currentRand;
}
}
}
NOTE: Here would be a good time to describe a possible endless loop in this code…
Currently, the range of random numbers and the size of the array are the same. However, if the array size is larger than the random number range, then the code above will NEVER exit. For example, if the x array is set to size 11 and the random number range is left at 10, then you will never be able to set the x[10] item, since ALL possible random numbers are already used. I hope that makes sense.
Once we have the method above… the rest should be fairly straight forward.
static int DataSize;
static int RandomNumberSize;
static void Main(string[] args) {
Random random = new Random();
DataSize = 10;
RandomNumberSize = 10;
int numberOfRandsFound = 0;
int[] ArrayOfInts = new int[DataSize];
int currentRand;
for (int i = 0; i < ArrayOfInts.Length; i++) {
currentRand = GetNextInt(random, ArrayOfInts, numberOfRandsFound);
ArrayOfInts[i] = currentRand;
numberOfRandsFound++;
}
for (int i = 0; i < ArrayOfInts.Length; i++) {
Console.WriteLine(ArrayOfInts[i]);
}
Console.ReadKey();
}
Lastly, as others have mentioned, this is much easier with a List<int>…
static int DataSize;
static int RandomNumberSize;
static void Main(string[] args) {
Random random = new Random();
DataSize = 10;
RandomNumberSize = 10;
List<int> listOfInts = new List<int>();
bool stillWorking = true;
int currentRand;
while (stillWorking) {
currentRand = random.Next(RandomNumberSize);
if (!listOfInts.Contains(currentRand)) {
listOfInts.Add(currentRand);
if (listOfInts.Count == DataSize)
stillWorking = false;
}
}
for (int i = 0; i < listOfInts.Count; i++) {
Console.WriteLine(i + " - " + listOfInts[i]);
}
Console.ReadKey();
}
Hope this helps ;-)
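The endless-loop caveat in this answer can be turned into an explicit check. The sketch below is one possible guard, not part of the answer's code: `UniqueRandoms.Fill` is a hypothetical helper that refuses to start when the requested count exceeds the number of distinct values available.

```csharp
using System;

public static class UniqueRandoms
{
    // Hypothetical helper: returns `count` distinct values drawn from [0, range).
    public static int[] Fill(Random r, int count, int range)
    {
        if (count > range)
        {
            // With fewer distinct values than slots, the retry loop could never finish.
            throw new ArgumentException("count must not exceed range");
        }

        int[] result = new int[count];
        int found = 0;
        while (found < count)
        {
            int candidate = r.Next(range);
            bool duplicate = false;
            for (int i = 0; i < found; i++)
            {
                if (result[i] == candidate)
                {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate)
            {
                result[found] = candidate;
                found++;
            }
        }
        return result;
    }

    public static void Main()
    {
        int[] values = Fill(new Random(), 10, 10);
        Console.WriteLine(string.Join(" ", values));
    }
}
```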

The typical solution is to generate the entire potential set in sequence (in this case an array with values from 0 to 9). Then shuffle the sequence.
private static Random rng = new Random();
public static void Shuffle(int[] items)
{
int n = items.Length;
while (n > 1) {
n--;
int k = rng.Next(n + 1);
int temp = items[k];
items[k] = items[n];
items[n] = temp;
}
}
static void Main(string[] args)
{
int[] x = new int[10];
for(int i = 0; i<x.Length; i++)
{
x[i] = i;
}
Shuffle(x);
for(int i = 0; i < x.Length; i++)
{
Console.WriteLine(x[i]);
}
}
//alternate version of Main()
static void Main(string[] args)
{
var x = Enumerable.Range(0,10).ToArray();
Shuffle(x);
Console.WriteLine(String.Join("\n", x));
}

You can simply do this:
private void AddUniqueNumber()
{
Random r = new Random();
List<int> uniqueList = new List<int>();
int num = 0, count = 10;
while (uniqueList.Count < count) // a fixed-count for loop would skip duplicates and end up with fewer than count values
{
num = r.Next(count);
if (!uniqueList.Contains(num))
uniqueList.Add(num);
}
}
Or:
int[] x = new int[10];
Random r1 = new Random();
int num = 0;
for (int i = 0; i < x.Length; i++)
{
num = r1.Next(10);
x[num] = num;
}

C# built in queue faster than my own [closed]

I tried to make a queue that would be as fast as possible, and I planned to do so by taking out many features and assuming that everything is known from the beginning. That means I will never try to add more elements than the array was allocated for.
Even though I only implemented what I need, I lose to the built in queue when I get over (~2000) read and write operations.
I got curious what it is that makes the built in queue faster than my own that is built to the bare bone?
As you can see the queue is based on a circular array so I don't have to move any elements. I also just write over the data instead of creating a new node to save some time. (Even though in my test it didn't make any big differences.)
class Queue<T> {
private class Node {
public T data;
public Node(T data) {
this.data = data;
}
public Node() {
}
}
Node[] nodes;
int current;
int emptySpot;
public Queue(int size) {
nodes = new Node[size];
for (int i = 0; i < size; i++) {
nodes[i] = new Node();
}
this.current = 0;
this.emptySpot = 0;
}
public void Enqueue(T value){
nodes[emptySpot].data = value;
emptySpot++;
if (emptySpot >= nodes.Length) {
emptySpot = 0;
}
}
public T Dequeue() {
int ret = current;
current++;
if (current >= nodes.Length) {
current = 0;
}
return nodes[ret].data;
}
}
My testing code is done with the built in stop watch and everything is written out in ticks.
static void Main(string[] args) {
MinimalCollections.Queue<char> queue = new MinimalCollections.Queue<char>(5500);
Queue<char> CQueue = new Queue<char>(5500);
Stopwatch sw = new Stopwatch();
sw.Start();
for (int y = 0; y < 4; y++) {
for (int i = 0; i < 5500; i++) {
queue.Enqueue('f');
}
for (int i = 0; i < 5500; i++) {
queue.Dequeue();
}
}
sw.Stop();
Console.WriteLine("My queue method ticks is = {0}", sw.ElapsedTicks);
sw.Reset();
sw.Start();
for (int y = 0; y < 4; y++) {
for (int i = 0; i < 5500; i++) {
CQueue.Enqueue('f');
}
for (int i = 0; i < 5500; i++) {
CQueue.Dequeue();
}
}
sw.Stop();
Console.WriteLine("C# queue method ticks is = {0}", sw.ElapsedTicks);
Console.ReadKey();
}
The output is:
My queue method ticks is = 2416
C# queue method ticks is = 2320

One obvious overhead that I can see is the introduction of Node objects. This will be especially noticeable when you're actually using this as a Queue of value types such as char, because the built in implementation isn't wrapping the values into a reference type.
Here is how I would change your implementation:
class Queue<T>
{
T[] nodes;
int current;
int emptySpot;
public Queue(int size)
{
nodes = new T[size];
this.current = 0;
this.emptySpot = 0;
}
public void Enqueue(T value)
{
nodes[emptySpot] = value;
emptySpot++;
if (emptySpot >= nodes.Length)
{
emptySpot = 0;
}
}
public T Dequeue()
{
int ret = current;
current++;
if (current >= nodes.Length)
{
current = 0;
}
return nodes[ret];
}
}
This seems to fare much better (Release build, x64, win 8.1):
My queue method ticks is = 582
C# queue method ticks is = 2166

Can you clear caches and optimizations in C#?

I'm trying to test what built-in collections perform best under certain applications, such as intersection. To do so, I built the following test:
private static void Main(string[] args)
{
LoadTest<HashSet<object>>();
ClearEverythingHere(); // <<-- what can go here?
LoadTest<LinkedList<object>>();
Console.ReadKey(true);
}
private static void LoadTest<T>() where T : ICollection<object>, new()
{
const int n = 1 << 16;
const int c = 1 << 3;
var objs = new object[n << 1];
for (int i = 0; i < n << 1; i++)
objs[i] = new object();
var array = new T[c];
var r = new Random(123);
for (int s = 0; s < c; s++)
{
array[s] = new T();
for (int i = 0; i < n; i++)
array[s].Add(objs[r.Next(n << 1)]);
}
var sw = Stopwatch.StartNew();
IEnumerable<object> final = array[0];
for (int s = 1; s < c; s++)
final = final.Intersect(array[s]);
sw.Stop();
Console.WriteLine("Ticks elapsed: {0}", sw.ElapsedTicks);
}
If I uncomment both test methods from Main, the second test always completes much faster than the first, no matter which order I test the structures. Generally, the first intersection runs in a few hundred ticks, and the second finishes in less than ten. I would have thought having the tests in completely separate scopes would have prevented at least some of the (what I'm presuming is) caching that leads to such different results.
Is there an easy way to reset the application so that I don't have to worry about caching or optimizing for testing? I would like to be able to run one test, print the results, clear it out, and run another test? Yes, I could comment and uncomment, or possibly spawn two separate applications, but that's a lot of work for simple console tests.
Edit: I've modified the tests as per the suggestions in the answers.
private static void Main(string[] args)
{
const int n = 1 << 17;
const int c = 1 << 4;
var objs = new Item[n << 1];
for (int i = 0; i < (n << 1); i++)
objs[i] = new Item(i);
var items = new Item[c][];
var hash = new HashSet<Item>[c];
var list = new LinkedList<Item>[c];
var r = new Random();
for (int s = 0; s < c; s++)
{
items[s] = new Item[n];
for (int i = 0; i < n; i++)
items[s][i] = objs[r.Next(n << 1)];
hash[s] = new HashSet<Item>(items[s]);
list[s] = new LinkedList<Item>(items[s]);
}
Stopwatch stopwatch = Stopwatch.StartNew();
HashSet<Item> fHash = hash[0];
for (int s = 1; s < hash.Length; s++)
fHash.IntersectWith(hash[s]);
stopwatch.Stop();
Console.WriteLine("Intersecting values: {0}", fHash.Count);
Console.WriteLine("Ticks elapsed: {0}", stopwatch.ElapsedTicks);
stopwatch = Stopwatch.StartNew();
IEnumerable<Item> iEnum = list[0];
for (int s = 1; s < list.Length; s++)
iEnum = iEnum.Intersect(list[s]);
Item[] array = iEnum.ToArray();
stopwatch.Stop();
Console.WriteLine("Intersecting values: {0}", array.Length);
Console.WriteLine("Ticks elapsed: {0}", stopwatch.ElapsedTicks);
Console.ReadKey(true);
}
[DebuggerDisplay("Value = {_value}")]
private class Item
{
private readonly int _value;
public Item(int value)
{
_value = value;
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj))
return false;
if (ReferenceEquals(this, obj))
return true;
if (obj.GetType() != typeof(Item))
return false;
return _value == ((Item)obj)._value; // compare values; calling Equals(obj) here would recurse forever
}
public override int GetHashCode()
{
return _value;
}
public override string ToString()
{
return _value.ToString();
}
}
This solved most of my problems. (And if you're wondering, HashSet.IntersectWith appears much faster than IEnumerable.Intersect.)

There are a few errors in your code.
Intersect is a LINQ function, so it is lazily evaluated. That means it gets executed only when the data is accessed. This can be done either by looping over the data or by calling ToList or ToArray on the enumerable. Once you add this, you get a different result.
Testing must always be done on the same data. Try creating your data outside your test method and passing it in as a parameter.
The first pass through the code is usually not representative, because of JIT compilation and similar one-time costs.
Try creating your own object and overriding Equals and GetHashCode. As it stands, testing with plain object instances might not be a correct test.
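The first point can be seen in a small sketch. Building the `Intersect` query does no work; the cost appears only when the result is enumerated, here by `ToArray`. `LazyDemo`, `CountingSource` and the `ItemsRead` counter are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    public static int ItemsRead;

    // Yields the numbers 0..count-1 and counts how many were actually pulled.
    public static IEnumerable<int> CountingSource(int count)
    {
        for (int i = 0; i < count; i++)
        {
            ItemsRead++;
            yield return i;
        }
    }

    public static void Main()
    {
        IEnumerable<int> query = CountingSource(5).Intersect(new[] { 1, 3, 8 });
        Console.WriteLine(ItemsRead); // 0: nothing has been enumerated yet

        int[] result = query.ToArray();
        Console.WriteLine(ItemsRead);                // 5: the source was read in full
        Console.WriteLine(string.Join(",", result)); // 1,3
    }
}
```

A stopwatch stopped before the `ToArray` call would therefore measure only the construction of the query, which explains intersections that appear to finish in a handful of ticks.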
