An easy way of iterating over all unique pairs in a collection - c#

I have a C# 3 HashSet object containing several elements. I want to check something between every pair of them, without repeating pairs [(a,b) = (b,a)] and without pairing an element with itself.
I thought about switching to some sort of List, so I can pair each element with all of the elements that follow it. Is there a way of doing something like that with a general, unordered collection? Or an IQueryable?

This is easier when you can access the elements by index. Although there is an ElementAt<> extension method, it has to enumerate a HashSet from the start on every call, so it is probably faster to call .ToList() first.
Once you have an indexable list:
for (int i = 0; i < list.Count - 1; i++)
    for (int j = i + 1; j < list.Count; j++)
        Foo(list[i], list[j]);
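The nested loop above can also be wrapped in a reusable helper. The following is only a sketch: the names PairExtensions and UniquePairs are made up for illustration, and KeyValuePair is used as the pair type because it is available in C# 3.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairExtensions
{
    // Yields every unordered pair exactly once: (a,b) but never (b,a) or (a,a).
    // The source is copied to a list first so it can be indexed, which makes
    // this usable with HashSet<T> and other unordered collections.
    public static IEnumerable<KeyValuePair<T, T>> UniquePairs<T>(this IEnumerable<T> source)
    {
        List<T> list = source.ToList();
        for (int i = 0; i < list.Count - 1; i++)
            for (int j = i + 1; j < list.Count; j++)
                yield return new KeyValuePair<T, T>(list[i], list[j]);
    }
}
```

With this in place, a call such as foreach (var pair in set.UniquePairs()) Foo(pair.Key, pair.Value); visits n(n-1)/2 pairs for a set of n elements.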

How about using the Distinct method that takes an IEqualityComparer?

Related

Compare and "Equalize" Collections

Let's suppose we have a List collection A and an int array B. Now we need to see, independent of order, which elements from the array B are present in collection A. Then we add the elements that are missing and delete the elements that are not found in array B.
I have done this using the code below :
for (int i = 0; i < A.Count; i++)
{
    for (int k = 0; k < B.Length; k++)
    {
        if (A[i] == B[k]) goto Found;
    }
    A.RemoveAt(i);
    Found: continue;
}
for (int i = 0; i < B.Length; i++)
{
    for (int k = 0; k < A.Count; k++)
    {
        if (A[k] == B[i]) goto Found;
    }
    A.Add(B[i]);
    Found: continue;
}
Is there a faster way to achieve the same result? Notice that I cannot just delete A and create a new one in accordance with B because this is just a simplified example.
It's futile: in the end you will get collection B all over again. Just create collection A based on array B. Simple as that!
The very short (and fairly fast) version would be
A.Clear();
A.AddRange(B);
but perhaps you don't really want that either. You can shorten your code a bit when using the Contains method, though:
for (int i = A.Count - 1; i >= 0; i--) {   // start at the last valid index
    if (!B.Contains(A[i])) {               // Contains on an array needs "using System.Linq;"
        A.RemoveAt(i);
    }
}
foreach (var item in B) {
    if (!A.Contains(item)) {
        A.Add(item);
    }
}
The first loop cannot be a foreach loop because A is modified while it is being iterated over. It also runs backwards to ensure that every item is looked at.
However, this has quadratic runtime (more precisely: O(|A| · |B|)) and can get slow rather quickly (pun not intended) with large lists. For better runtime (albeit higher memory requirements) you can use HashSets for the Contains tests, which takes only O(|A| + |B|) time at the cost of O(|A| + |B|) additional memory.
This is a quite long-winded way of getting to the point, though: If you don't care about order of your items, then it seems like your lists are more like sets. In that case, a set data structure makes more sense because it can do those operations efficiently. And you apparently don't care about element order, because you're just adding missing items at the end anyway.
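As a rough sketch of the HashSet idea mentioned above (the names ListSync and SyncWith are invented for this example, and it assumes A is a List<int> and B an int[] as in the question):

```csharp
using System.Collections.Generic;

public static class ListSync
{
    // Removes from a everything that is not in b, then appends the values
    // of b that a does not contain yet. Values that a keeps stay in place.
    public static void SyncWith(List<int> a, int[] b)
    {
        var wanted = new HashSet<int>(b);        // O(|B|) to build, O(1) lookups
        a.RemoveAll(x => !wanted.Contains(x));   // single pass over a

        var present = new HashSet<int>(a);
        foreach (int item in b)
            if (present.Add(item))               // Add returns false if already present
                a.Add(item);
    }
}
```

Both passes do a constant amount of work per element on average, which is where the O(|A| + |B|) bound comes from.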
I think using LINQ should be fast:
A.RemoveAll(tmp => !B.Contains(tmp));
A.AddRange(B.Where(tmp => !A.Contains(tmp)));
EDIT: as pointed out by Joey this is still only O(|A| · |B|).
OK I will give you some more details. The example I set above was oversimplified. What I actually have is an XML file which is loaded on an XElement. There are child nodes with specific attributes mapping precisely to the properties of a custom type in my application. Every child node creates an instance of the aforementioned type.
For purposes of extensibility, if I need to add a new property to the custom type, I want all records in the XML file to be updated with the new attribute with an empty value. And if I delete a property, I want the opposite. So here, I must check the collection of Attributes towards the PropertyInfo[] of the custom type. I cannot delete all of the Attributes and recreate them again because their values will be lost as well.
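One possible shape for that attribute check, as a sketch only: AttributeSync and SyncAttributes are hypothetical names, and it assumes attribute names match property names exactly and that new attributes should start as empty strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AttributeSync
{
    // Removes attributes that have no matching property and adds an empty
    // attribute for every property that has none. Existing values are kept.
    public static void SyncAttributes(XElement element, Type type)
    {
        var wanted = new HashSet<string>(type.GetProperties().Select(p => p.Name));

        // ToList() takes a snapshot, so removing attributes while looping is safe.
        foreach (XAttribute attr in element.Attributes().ToList())
            if (!attr.IsNamespaceDeclaration && !wanted.Contains(attr.Name.LocalName))
                attr.Remove();

        foreach (string name in wanted)
            if (element.Attribute(name) == null)
                element.SetAttributeValue(name, "");
    }
}
```

Calling this once per child node updates the whole file without touching the values of attributes that still have a property.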
In my approach, I could see that some checks were done twice, and because I am really a novice, I thought that maybe this was a common problem with a standard solution that I could not think of. Thank you all for the nice replies.

How to re-order data in memory to optimize cache access?

I want to shuffle a big dataset (of type List<Record>), then iterate over it many times. Typically, shuffling a list only shuffles the references, not the data. My algorithm's performance suffers tremendously (3x) because of frequent cache misses. I can do a deep copy of the shuffled data to make it cache friendly. However, that would double the memory usage.
Is there a more memory-efficient way to shuffle or re-order data so that the shuffled data is cache friendly?
Option 1:
Make Record a struct so the List<Record> holds contiguous data in memory.
Then either sort it directly, or (if the records are large) instead of sorting the list directly, make an array of indices (initially just {0, 1, ..., n - 1}) and then sort the indices by making the comparator compare the elements they refer to. Finally if you need the sorted array you can copy the elements in the shuffled order by looking at the indices.
Note that this may be more cache-unfriendly than directly sorting the structs, but at least it'll be a single pass through the data, so it is more likely to be faster, depending on the struct size. You can't really avoid it if the struct is large, so if you're not sure whether Record is large, you'll have to try both approaches and see whether sorting the records directly is more efficient.
If you can't change the type, then your only solution is to somehow make them contiguous in memory. The only realistic way of doing that is to perform an initial garbage collection, then allocate them in order, and keep your fingers crossed hoping that the runtime will allocate them contiguously. I can't think of any other way that could work if you can't make it a struct.
If you think another garbage collection run in the middle might mess up the order, you can try making a second array of GCHandle with pinned references to these objects. I don't recommend this, but it might be your only solution at that point.
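To illustrate the struct approach from Option 1 for the shuffling case in the question, here is a minimal sketch. The Record fields are placeholders, and a Fisher-Yates shuffle is used in place of the sort described above; both are assumptions for the example.

```csharp
using System;

// Hypothetical record type. As a struct, its data is stored inline in the array.
public struct Record
{
    public int Id;
    public double Value;
}

public static class InPlaceShuffle
{
    // Fisher-Yates shuffle that moves the structs themselves, so the shuffled
    // order is also the memory order. No second copy of the data is made.
    public static void Shuffle(Record[] records, Random rnd)
    {
        for (int i = records.Length - 1; i > 0; i--)
        {
            int j = rnd.Next(i + 1);   // 0 <= j <= i
            Record tmp = records[i];
            records[i] = records[j];
            records[j] = tmp;
        }
    }
}
```

Each swap copies whole structs, so this trades some work during the shuffle for sequential memory access in the later iterations. Whether that wins depends on the struct size, as noted above.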
Option 2:
Are you really using the entire record for sorting? That's unlikely. If not, then just extract the portion of each record that is relevant, sort those, and then re-shuffle the original data.
It is better not to touch the List. Instead, create an accessor method for your list. First, create an array of the n indices in a random order, e.g. something like var arr = [2, 5, .., n-1, 0];
Then you create an access method:
Record get(List<Record> list, int i) {
    return list[arr[i]];
}
By doing so the list remains untouched, but you get a random Record at every index.
Edit: to create a random order array:
int[] arr = new int[n];
// Fill the array with the valid list indices 0 to n-1:
for (int i = 0; i < arr.Length; i++)
    arr[i] = i;
// Fisher-Yates shuffle: swap each position with a random position at or
// after it, which gives an unbiased uniform random permutation:
Random rnd = new Random();
for (int i = 0; i < arr.Length - 1; i++) {
    int j = rnd.Next(i, arr.Length);
    int temp = arr[i];
    arr[i] = arr[j];
    arr[j] = temp;
}

For Each - Inverted Order

I have a List<Object>. I want to iterate over this list, but I want the order to be inverted, so when I use the object, it will start from the last to the first. How can I do it, minimal code?
Thanks.
P.S. I'm using C#, WinForms.
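Use the extension method Enumerable.Reverse<T>. This iterates through the list in reverse order and leaves the original list intact. The AsEnumerable() call makes sure the LINQ extension is chosen instead of List<T>.Reverse(), which reverses the list in place.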
foreach(var item in list.AsEnumerable().Reverse())
{
}
Reverse, however, traverses the list and buffers your items in reverse order when iteration starts. In 90% of cases this is fine, because it is still an O(n) operation, but if you want to avoid this buffer, just use a plain old for loop:
for(int i = list.Count - 1; i >= 0; i--) { }
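If the foreach syntax is preferred, the same backwards for loop can be hidden behind an iterator. This is a sketch with invented names (ReverseExtensions, Backwards); it assumes the list is not modified while it is being iterated.

```csharp
using System.Collections.Generic;

public static class ReverseExtensions
{
    // Walks the list from the last index to the first without copying it.
    public static IEnumerable<T> Backwards<T>(this IList<T> list)
    {
        for (int i = list.Count - 1; i >= 0; i--)
            yield return list[i];
    }
}
```

Usage would look like foreach (var item in list.Backwards()) { }.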

C# shift two dimension array fast method

I have a 2D string array in C# and I need to shift that array to the left in one dimension. How can I do that in an efficient way?
I don't want to use nested for loops, and I want an algorithm in O(n), not O(n^2).
for (int i = 50; i < 300; i++)
{
    for (int j = 0; j < 300; j++)
    {
        numbers[i - 50, j] = numbers[i, j];
    }
}
If you want to shift large amounts of data around quickly, use Array.Copy rather than a loop that copies individual elements.
If you swap to a byte array and use Array.Copy or Buffer.BlockCopy you will probably improve the performance a bit more (but if you have to convert to/from character arrays you may lose everything you've gained).
(Edit, now that you've posted example code:) If you use references to the array rows then you may be able to shift the references rather than having to move the data itself. And you can still shift the references using Array.Copy.
But if you change your approach so you don't need to shift the data, you'll gain considerably better performance - not doing the work at all if you can avoid it is always faster! Chances are you can wrap the data in an accessor layer that keeps track of how much the data has been shifted and modifies your indexes to return the data you are after. (This will slightly slow down access to the data, but saves you shifting the data, so may result in a net win - depending on how much you access relative to how much you shift)
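For the Array.Copy suggestion, a sketch along these lines should work for the rectangular array in the question. GridShift and ShiftRowsUp are hypothetical names. It relies on Array.Copy treating a multidimensional array as one long row-major sequence and on its documented handling of overlapping ranges within the same array.

```csharp
using System;

public static class GridShift
{
    // Moves every row up by `rows` positions in a single block copy.
    // The last `rows` rows keep their old contents, as in the nested loop.
    public static void ShiftRowsUp(string[,] grid, int rows)
    {
        int total = grid.GetLength(0);
        int cols = grid.GetLength(1);
        // The source starts at row `rows`, the destination at row 0.
        Array.Copy(grid, rows * cols, grid, 0, (total - rows) * cols);
    }
}
```

The question's loop would then correspond to GridShift.ShiftRowsUp(numbers, 50). This still touches every moved element, but does so in one library call instead of two nested loops.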
The most efficient way would be to not shift it at all, but instead change how you access the array. For example, keep an offset that tells you where in the dimension the first column is.
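A minimal sketch of such an offset-based accessor follows. The class name ShiftedGrid is invented, shifts are assumed to be non-negative, and unlike the copying loop it wraps around, so rows shifted off the top reappear at the bottom.

```csharp
public sealed class ShiftedGrid
{
    private readonly string[,] data;
    private int offset;   // logical row 0 lives at physical row `offset`

    public ShiftedGrid(string[,] data)
    {
        this.data = data;
    }

    // O(1): only the offset changes, no element is moved.
    public void ShiftUp(int rows)
    {
        offset = (offset + rows) % data.GetLength(0);
    }

    // Translates a logical row index into the physical one on every access.
    public string this[int row, int col]
    {
        get { return data[(row + offset) % data.GetLength(0), col]; }
        set { data[(row + offset) % data.GetLength(0), col] = value; }
    }
}
```

Every read and write pays for one addition and one modulo, which is the access cost mentioned above in exchange for a constant-time shift.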

Replace a substring in each element of a string array?

Hey, I have an array of strings and I want to replace a certain substring in each of those elements. Is there an easy way to do that besides iterating the array explicitly?
Thanks :-)
Ultimately, anything you do is going to do exactly that anyway. A simple for loop should be fine. There are pretty solutions involving lambdas, such as Array.ConvertAll / Enumerable.Select, but tbh it isn't necessary:
for(int i = 0 ; i < arr.Length ; i++) arr[i] = arr[i].Replace("foo","bar");
(The for loop has the most efficient handling for arrays, and foreach isn't an option because you cannot assign to the iteration variable.)
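For completeness, the Array.ConvertAll variant mentioned above might look like the following sketch. StringArrayReplace and ReplaceAll are made-up names. Note that it returns a new array rather than changing the original in place.

```csharp
using System;

public static class StringArrayReplace
{
    // Builds a new array in which every element has oldValue replaced by newValue.
    public static string[] ReplaceAll(string[] source, string oldValue, string newValue)
    {
        return Array.ConvertAll(source, s => s.Replace(oldValue, newValue));
    }
}
```

For example, arr = StringArrayReplace.ReplaceAll(arr, "foo", "bar"); has the same effect as the for loop, at the cost of one extra array allocation.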
You could iterate the array implicitly:
arrayOfStrings = arrayOfStrings.Select(s => s.Replace("abc", "xyz")).ToArray();
