Compare and "Equalize" Collections - C#

Let's suppose we have a List collection A and an int array B. We need to determine, independent of order, which elements from array B are present in collection A, add the elements that are missing, and delete the elements of A that are not found in array B.
I have done this using the code below:
for (int i = 0; i < A.Count; i++)
{
    for (int k = 0; k < B.Length; k++)
    {
        if (A[i] == B[k]) goto Found;
    }
    A.RemoveAt(i);
    Found: continue;
}
for (int i = 0; i < B.Length; i++)
{
    for (int k = 0; k < A.Count; k++)
    {
        if (A[k] == B[i]) goto Found;
    }
    A.Add(B[i]);
    Found: continue;
}
Is there a faster way to achieve the same result? Notice that I cannot just delete A and create a new one in accordance with B because this is just a simplified example.

It's futile: in the end you will get collection B all over again. Just create collection A based on array B. Simple as that!

The very short (and fairly fast) version would be
A.Clear();
A.AddRange(B);
but perhaps you don't really want that either. You can shorten your code a bit when using the Contains method, though:
for (int i = A.Count - 1; i >= 0; i--) {
    if (!B.Contains(A[i])) {
        A.RemoveAt(i);
    }
}
foreach (var item in B) {
    if (!A.Contains(item)) {
        A.Add(item);
    }
}
The first loop cannot be a foreach loop because A is modified while it is being iterated over. It also runs backwards so that removing an item does not shift the positions of the items that have not been examined yet.
However, this has quadratic runtime (more precisely: O(|A| · |B|)) and can get slow rather quickly (pun not intended) with large lists. For better runtime (albeit higher memory requirements) you may want to use HashSets for the Contains tests, requiring only O(|A| + |B|) runtime at the cost of O(|A| + |B|) additional memory.
This is a quite long-winded way of getting to the point, though: If you don't care about order of your items, then it seems like your lists are more like sets. In that case, a set data structure makes more sense because it can do those operations efficiently. And you apparently don't care about element order, because you're just adding missing items at the end anyway.
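A minimal sketch of the set-based approach, assuming int elements as in the question. Note that, as the first answer pointed out, intersecting and then unioning with B leaves you with exactly B's distinct elements; the sketch mainly shows how cheap the two operations are on a HashSet:

```csharp
using System;
using System.Collections.Generic;

static class SetSync
{
    // "Equalize" A against B with set operations; each operation is linear overall
    // because HashSet membership checks are O(1).
    public static HashSet<int> Sync(IEnumerable<int> a, int[] b)
    {
        var set = new HashSet<int>(a);
        set.IntersectWith(b); // drop elements not present in B
        set.UnionWith(b);     // add elements of B that are missing
        return set;
    }

    static void Main()
    {
        var result = Sync(new List<int> { 1, 2, 3, 4 }, new[] { 3, 4, 5, 6 });
        Console.WriteLine(result.SetEquals(new[] { 3, 4, 5, 6 })); // True
    }
}
```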

I think using LINQ should be fast:
A.RemoveAll(tmp => !B.Contains(tmp));
A.AddRange(B.Where(tmp => !A.Contains(tmp)));
EDIT: as pointed out by Joey this is still only O(|A| · |B|).
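A sketch of the same two-line shape with linear-time lookups, building a HashSet from each side first (this assumes B contains no duplicates; with duplicates the second set would need updating as you add):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LinqSync
{
    public static List<int> Sync(List<int> a, int[] b)
    {
        var setB = new HashSet<int>(b);              // built once: O(|B|)
        a.RemoveAll(x => !setB.Contains(x));         // O(|A|), O(1) per lookup

        var setA = new HashSet<int>(a);              // built once: O(|A|)
        a.AddRange(b.Where(x => !setA.Contains(x))); // O(|B|), O(1) per lookup
        return a;
    }

    static void Main()
    {
        var a = new List<int> { 1, 2, 3, 4 };
        Console.WriteLine(string.Join(",", Sync(a, new[] { 3, 4, 5, 6 }))); // 3,4,5,6
    }
}
```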

OK, I will give you some more details. The example I gave above was oversimplified. What I actually have is an XML file which is loaded into an XElement. There are child nodes with specific attributes mapping precisely to the properties of a custom type in my application. Every child node creates an instance of the aforementioned type.
For purposes of extensibility, if I need to add a new property to the custom type, I want all records in the XML file to be updated with the new attribute with an empty value. And if I delete a property, I want the opposite. So here, I must check the collection of Attributes against the PropertyInfo[] of the custom type. I cannot delete all of the Attributes and recreate them, because their values would be lost as well.
In my approach, I could see that some checks were done twice, and because I am a real novice, I thought this might be a common situation that is dealt with in a way I could not think of. Thank you all for the nice replies.
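A sketch of that attribute sync, with hypothetical names (the `Record` element and `MyType` with its `Name`/`Color` properties stand in for the real ones). Only attributes that no longer match a property are removed, so existing values survive:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Xml.Linq;

// Hypothetical custom type; Name and Color stand in for the real properties.
class MyType { public string Name { get; set; } public string Color { get; set; } }

static class AttributeSync
{
    // Equalize an element's attributes against a type's public properties,
    // keeping the values of attributes that still match a property.
    public static void Sync(XElement record, Type type)
    {
        var propNames = new HashSet<string>(
            type.GetProperties(BindingFlags.Public | BindingFlags.Instance)
                .Select(p => p.Name));

        // Remove attributes that no longer map to a property.
        foreach (var attr in record.Attributes()
                                   .Where(a => !propNames.Contains(a.Name.LocalName))
                                   .ToList())
            attr.Remove();

        // Add an empty attribute for every new property.
        foreach (var name in propNames)
            if (record.Attribute(name) == null)
                record.SetAttributeValue(name, "");
    }

    static void Main()
    {
        var record = XElement.Parse("<Record Name=\"a\" Obsolete=\"x\" />");
        Sync(record, typeof(MyType));
        Console.WriteLine(record); // <Record Name="a" Color="" />
    }
}
```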

Related

Implementing a Quick sort Algorithm C#

I am trying to implement and construct a Quick Sort algorithm from scratch. I have been browsing different postings, and this seems like a popular topic; I'm just not entirely sure how to interpret what other people have done with it to get mine working.
public static void QuickSort(int[] A)
{
    int i, j, pivot, counter, temp;
    counter = 1; // Sets initial counter to allow loop to execute
    while (counter != 0)
    {
        i = -1;      // Left bound
        j = 0;       // Right bound
        counter = 0; // Count of transpositions per cycle set to zero
        Random rand = new Random();
        pivot = rand.Next(0, A.Length - 1); // Random pivot index generated
        for (int x = 0; x < A.Length; x++)  // Executes contents for each item in the array
        {
            if (A[pivot] > A[j]) // Checks if pivot is greater than right bound
            {
                i++;             // Left bound incremented
                temp = A[j];
                A[j] = A[i];
                A[i] = temp;     // Swaps left and right bound values
                j++;             // Right bound incremented
                counter++;       // Increments number of transpositions for this cycle
            }
            else j++;            // Else right bound is incremented
        }
        // Here's where it gets sketchy
        temp = A[i + 1];
        A[i + 1] = A[pivot]; // Pivot value is placed at index 1 + left bound
        for (int x = (i + 2); x < A.Length; x++) // Shifts the remaining values over one position to the end of the array (not functional)
        {
            temp = A[x];
            A[x + 1] = temp;
        }
    }
}
As you can see, I've sort of hit a wall with the shifting portion of the algorithm, and I'm honestly not sure how to continue without just copying someone's solution from the web.
First step -- Actually understand what is happening in a Quicksort...
Basic Quicksort Algorithm
From your collection of items, select one (called the pivot). (How it is chosen is theoretically irrelevant but may matter on a practical level)
Separate the items in your collection into two new collections: Items less than the pivot, and Items greater than the pivot.
Sort the two new collections separately (how is irrelevant -- most implementations call themselves recursively).
Join the now sorted lower collection, the pivot, and the sorted upper collection.
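The steps above can be sketched directly as a simple, not-in-place version (easier to follow than the in-place variant; it uses the middle element as the pivot):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class QuickSortSketch
{
    // Not-in-place quicksort: pick a pivot, partition into less/equal/greater,
    // sort the two sides recursively, then concatenate.
    public static List<int> Sort(List<int> items)
    {
        if (items.Count <= 1) return items;

        int pivot = items[items.Count / 2]; // middle element as pivot
        var less    = items.Where(x => x < pivot).ToList();
        var equal   = items.Where(x => x == pivot).ToList();
        var greater = items.Where(x => x > pivot).ToList();

        return Sort(less).Concat(equal).Concat(Sort(greater)).ToList();
    }

    static void Main()
    {
        var sorted = Sort(new List<int> { 5, 3, 8, 1, 9, 2 });
        Console.WriteLine(string.Join(",", sorted)); // 1,2,3,5,8,9
    }
}
```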
Now, this particular implementation is a bit bizarre. It's trying to do it in place, which is OK, but it doesn't seem to ever move the pivot, which means it's going to be overwritten (and therefore change in the middle of the partitioning)
A few nit-picky things.
Don't create a new Random object in the middle of the loop. Create it once outside the loop, and reuse it. There are several reasons for this: 1) it takes a (relatively) long time, and 2) it's seeded with the time in milliseconds, so two created within a millisecond of each other will produce the same sequence -- here you'll always get the same pivot.
Actually, it's best not to use a random pivot at all. This was probably inspired by a basic misunderstanding of the algorithm. Originally, it was proposed that, since it didn't matter which item was picked as the pivot, you might as well pick the first item, because that was easiest. But then it was discovered that an already sorted collection has worst-case time with that pivot. So, naturally, they went completely the other way and decided to go with a random pivot. That's dumb. For any given collection, there exists a pivot that causes worst-case time. By using a random pivot, you increase your chance of hitting it by accident. The best choice: for an indexable collection (like an array), go with the item physically in the middle of the collection. It'll give you the best-case time for an already sorted collection, and its worst case is a pathological ordering that you're unlikely to hit upon. For a non-indexable collection (like a linked list -- betcha' didn't know you could Quicksort a linked list), you pretty much have to go with the first item, so be careful when you use it.
If, the first time through the loop, A[0] is less than the pivot, then i is equal to j, so you swap A[0] with itself.
You should implement IComparer.
This interface is used in conjunction with the Array.Sort and
Array.BinarySearch methods. It provides a way to customize the sort
order of a collection. See the Compare method for notes on parameters
and return value. The default implementation of this interface is the
Comparer class. For the generic version of this interface, see
System.Collections.Generic.IComparer<T>.
using System;
using System.Collections;

public class Reverse : IComparer
{
    int IComparer.Compare(Object x, Object y) =>
        new CaseInsensitiveComparer().Compare(y, x);
}
The above class is an example; then you would:
IComparer reverse = new Reverse();
Array.Sort(A, reverse);
Simply pass your comparer to the Sort call, and the collection will come out in reverse order. You can go even more in depth with greater control and actually sort based on specific model properties and so on.
I would look into Sort and the IComparer above. More detail about IComparer here.
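For the generic version, a minimal sketch (the names are illustrative, not from the question):

```csharp
using System;
using System.Collections.Generic;

// Illustrative comparer: reverses the default ordering of ints.
public class DescendingComparer : IComparer<int>
{
    public int Compare(int x, int y) => y.CompareTo(x); // swapped arguments => descending
}

static class ComparerExample
{
    static void Main()
    {
        var numbers = new List<int> { 3, 1, 2 };
        numbers.Sort(new DescendingComparer()); // List<T>.Sort accepts an IComparer<T>
        Console.WriteLine(string.Join(",", numbers)); // 3,2,1
    }
}
```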

Random access on .NET lists is slow, but what if I always reference the first element?

I know that in general, .NET Lists are not good for random access. I've always been told that an array would be best for that. I have a program that needs to continually (like more than a billion times) access the first element of a .NET list, and I am wondering if this will slow anything down, or it won't matter because it's the first element in the list. I'm also doing a lot of other things like adding and removing items from the list as I go along, but the List is never empty.
I'm using F#, but I think this applies to any .NET language (I am using .NET Lists, not F# Lists). My list is about 100 elements long.
In F#, the .NET list (System.Collections.Generic.List) is aptly aliased as ResizeArray, which leaves little doubt as to what to expect. It's an array that can resize itself, and not really a list in the CS-classroom understanding of the term. Any performance differences between it and a simple array most likely come from the fact that the compiler can be more aggressive about optimizing array usage.
Back to your question. If you only access the first element of a list, it doesn't matter what you choose. Both a ResizeArray and a list (using F# lingo) have O(1) access to the first element (head).
A list would be a preferable choice if your other operations also work on the head element, i.e. you only add elements at the head. If you want to append elements to the end of the list, or mutate elements that are already in it, you'd get better mileage out of a ResizeArray.
That said, a ResizeArray in idiomatic F# code is a rare sight. The usual approach favors (and doesn't suffer from using) immutable data structures, so seeing one would usually be a minor red flag for me.
There is not much difference between the performance of random access for an array and a list. Here's a test on my machine.
var list = Enumerable.Range(1, 100).ToList();
var array = Enumerable.Range(1, 100).ToArray();
int total = 0;
var sw = Stopwatch.StartNew();
for (int i = 0; i < 1000000000; i++) {
    total ^= list[0];
}
Console.WriteLine("Time for list: {0}", sw.Elapsed);
sw.Restart();
for (int i = 0; i < 1000000000; i++) {
    total ^= array[0];
}
Console.WriteLine("Time for array: {0}", sw.Elapsed);
This produces this output:
Time for list: 00:00:05.2002620
Time for array: 00:00:03.0159816
If you know you have a fixed size list, it makes sense to use an array, otherwise, there's not much cost to the list. (see update)
Update!
I found some pretty significant new information. After executing the script in release mode, the story changes quite a bit.
Time for list: 00:00:02.3048339
Time for array: 00:00:00.0805705
In this case, the performance of the array totally dominates the list. I'm pretty surprised, but the numbers don't lie.
Go with the array.

For Each - Inverted Order

I have a List<Object>. I want to iterate over this list in inverted order, so that when I use each object, it goes from the last element to the first. How can I do this with minimal code?
Thanks.
P.S. I'm using C#, WinForms.
Use the extension method Enumerable.Reverse<T>. This will iterate through the list in reverse order and leave the original list intact.
foreach(var item in list.AsEnumerable().Reverse())
{
}
Reverse, however, traverses the list and caches your items in reverse order when iteration starts. In 90% of the cases this is fine, because it's still an O(n) operation, but if you want to avoid this cache, just use a plain old for loop:
for(int i = list.Count - 1; i >= 0; i--) { }

C# Fastest way to find the index of item in a list

I want to know what is the fastest way to find the index of an item in a list. The reason I want to know is because I am making an XNA renderer, but I started getting out-of-memory exceptions on larger models when I only used a vertex buffer, so I have now implemented an index buffer system. My problem is that I now have to continuously scan a list containing all my Vector3s for the index of the one that I want placed next in my index buffer. I am currently scanning for indices like this:
for (int i = 1; i < circleVect.Length; i++)
{
    indices.Add(vertices.FindIndex(v3 => v3 == circleVect[i]));
    indices.Add(vertices.FindIndex(v3 => v3 == circleVect[i - 1]));
    indices.Add(vertices.FindIndex(v3 => v3 == middle));
}
This works fine except for the fact that it is rather slow. It takes almost 1 second for one cylinder to be calculated and I have more than 70 000 of them in my large model. I have therefore been seeing a loading screen for more than 20 minutes while loading my larger model and it still is not completed. This is unfortunately simply unacceptable. If I try loading my smaller model it takes more than 5 minutes whereas the unindexed loader took merely a second or so.
I have absolutely no formal training in C# and even less in XNA so I know this is probably a very inefficient way of calculating the indices so I would therefore appreciate it if any of you could assist me in creating a more efficient importer.
PS. If necessary I can change the list to an array but this would be a last resort option because it could potentially put strain on the system memory (causing an exception) and would mean quite a bit of coding for me.
In your code you are looping through your vertices three times, once for each FindIndex line. Instead, use one loop and check all three conditions in one pass.
Your main performance killer is searching in a List of 70 000 items: each FindIndex call scans up to 70 000 elements.
This will work a lot faster than searching in a list.
// Build a dictionary mapping each vertex to its index once: O(n).
var vertexIndex = new Dictionary<Vector3, int>();
for (int i = 0; i < vertices.Count; i++)
    vertexIndex[vertices[i]] = i;

// Each lookup is now O(1) instead of a linear scan.
for (int i = 1; i < circleVect.Length; i++)
{
    indices.Add(vertexIndex[circleVect[i]]);
    indices.Add(vertexIndex[circleVect[i - 1]]);
    indices.Add(vertexIndex[middle]);
}
Edit: if searching in the list takes seconds, searching in a dictionary will take only a tiny fraction of that, since each lookup is a constant-time hash lookup instead of a linear scan.
for (int i = 1; i < circleVect.Length; i++)
{
    for (int j = 0; j < vertices.Count; j++)
    {
        var v3 = vertices[j];
        if (v3 == circleVect[i] || v3 == circleVect[i - 1] || v3 == middle)
            indices.Add(j);
    }
}
You may want to consider adding logic to stop the inner loop from checking after all three of the indexes have been found.
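That early exit might look like the sketch below. It is self-contained for illustration, so plain ints stand in for the question's Vector3 vertices, and the variable names (vertices, circleVect, middle) mirror the question's:

```csharp
using System;
using System.Collections.Generic;

static class EarlyExitSketch
{
    // ints stand in for the Vector3 vertices of the question.
    public static List<int> FindIndices(List<int> vertices, int[] circleVect, int middle)
    {
        var indices = new List<int>();
        for (int i = 1; i < circleVect.Length; i++)
        {
            int found = 0; // stop scanning once all three targets for this step are found
            for (int j = 0; j < vertices.Count && found < 3; j++)
            {
                int v3 = vertices[j];
                if (v3 == circleVect[i] || v3 == circleVect[i - 1] || v3 == middle)
                {
                    indices.Add(j);
                    found++;
                }
            }
        }
        return indices;
    }

    static void Main()
    {
        var result = FindIndices(new List<int> { 10, 20, 30, 40 }, new[] { 10, 20 }, 30);
        Console.WriteLine(string.Join(",", result)); // 0,1,2
    }
}
```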
Thanks for all the answers, but none of them really worked or appeared to be faster than what I had. I ended up having to create a structure where I could calculate the needed list index based on a given reference point.

An easy way of iterating over all unique pairs in a collection

I have a C# 3 HashSet object containing several elements. I want to check something between every pair of them, without repeating [(a,b) = (b,a)], and without pairing an element with itself.
I thought about switching to some sort of List, so I can pair each element with all of its following elements. Is there a way of doing something like that with a general, unordered Collection? Or IQueryable?
For this, it would be easier if you can access the elements by index, and although there is an ElementAt<> extension, it is probably faster to call .ToList() first.
When you have an addressable list:
for (int i = 0; i < list.Count - 1; i++)
    for (int j = i + 1; j < list.Count; j++)
        Foo(list[i], list[j]);
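The same pairing can also be written as a LINQ query over the set directly (a sketch; `UniquePairs` is an illustrative helper, and whatever per-pair check you run takes the place of printing):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PairsExample
{
    // Every element paired with each later element: n*(n-1)/2 pairs,
    // no (a,b)/(b,a) repeats and no element paired with itself.
    public static List<(int, int)> UniquePairs(IEnumerable<int> source)
    {
        var list = source.ToList(); // fix an arbitrary order so indices exist
        return list.SelectMany((a, i) => list.Skip(i + 1).Select(b => (a, b))).ToList();
    }

    static void Main()
    {
        foreach (var (a, b) in UniquePairs(new HashSet<int> { 1, 2, 3 }))
            Console.WriteLine($"({a}, {b})"); // three pairs for three elements
    }
}
```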
How about using the Distinct method that takes an IEqualityComparer?
