efficiently compare two BitArrays of the same length - c#

How would I do this? I am trying to count when both arrays have the same value of TRUE/1 at the same index. As you can see, my code has multiple bitarrays and is looping through each one and comparing them with a comparisonArray with another loop. It doesn't seem to be very efficient and I need it to be.
foreach (var bitArrayTuple in bitarryList) {
    for (int i = 0; i < arrayLength; i++)
        if (bitArrayTuple.Item2[i] && comparisonArray[i])
            bitArrayTuple.Item1++;
}
where Item1 is the count and Item2 is a bitarray.

// Note: Xor modifies ba1 in place and returns it; pass a copy if ba1 must stay unchanged.
bool equals = ba1.Xor(ba2).OfType<bool>().All(e => !e);

There's not much of a way to do this, because BitArray doesn't let its internal array leak, and because .NET doesn't have the C++ equivalent of const to prevent external modification. You might want to just create your own class from scratch, or, if you feel like hacking, use reflection to get the private field inside the BitArray.

Would this work?
http://msdn.microsoft.com/en-us/library/system.collections.bitarray.and%28v=VS.90%29.aspx
It's like the single & operator in C.
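To make the And suggestion concrete for the counting case in the question, here is a minimal sketch. The class and method names are made up for illustration, and it assumes both arrays have the same length. It ANDs a copy of the first array with the second (And changes the array it is called on), copies the result into an int[], and counts the set bits 32 at a time. The last word is masked because the unused high bits are not guaranteed to be zero on every runtime.

```csharp
using System;
using System.Collections;

static class BitArrayCounting
{
    // Hypothetical helper: counts the indices at which both arrays are true.
    public static int CountCommonSetBits(BitArray a, BitArray b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Arrays must have the same length.");

        // And() mutates its receiver, so work on a copy of a.
        BitArray both = new BitArray(a).And(b);

        // One int per 32 bits, rounded up (long arithmetic avoids overflow).
        var words = new int[(int)(((long)both.Length + 31) / 32)];
        both.CopyTo(words, 0);

        // Clear any unused bits in the last word before counting.
        int remainder = both.Length % 32;
        if (remainder != 0)
            words[words.Length - 1] &= (1 << remainder) - 1;

        int count = 0;
        foreach (int word in words)
        {
            uint w = unchecked((uint)word);
            while (w != 0)
            {
                w &= w - 1; // clears the lowest set bit
                count++;
            }
        }
        return count;
    }
}
```

In the question's loop this would replace the inner for loop with a single call per tuple. Whether it is faster for your array sizes is worth measuring, since it allocates a copy and an int[] on every call.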

Depending on the number of elements, BitVector32 may be usable. That would simply be an Int32 comparison.
If not possible, you will need to get hold of the int[] located on the m_array private field of each BitArray. Then compare the int[] of each (which is a comparison of 32 bits at a time).
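For the BitVector32 case, a small sketch of what the single Int32 operation could look like when you want the count of common set bits rather than plain equality. The class and method names are hypothetical, and this only applies when 32 bits are enough.

```csharp
using System;
using System.Collections.Specialized;

static class BitVector32Counting
{
    // Hypothetical helper: counts bits that are set in both vectors.
    public static int CountCommonSetBits(BitVector32 a, BitVector32 b)
    {
        // Data exposes the underlying Int32, so one AND covers all 32 bits.
        uint w = unchecked((uint)(a.Data & b.Data));
        int count = 0;
        while (w != 0)
        {
            w &= w - 1; // clears the lowest set bit
            count++;
        }
        return count;
    }
}
```

Plain equality is then just a.Data == b.Data.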

I realize this is an old thread, but I've recently run into a need for this myself and have performed some benchmarks in order to determine which method is fastest:
Firstly, at the moment we can't use BitArray.Clone() because of a known bug in Microsoft's code that will not allow cloning of arrays that are larger than int.MaxValue / 32. We will need to avoid this method until they have fixed the bug.
With that in mind I have run benchmarks against 5 different implementations, all using the largest BitArray I could construct (size of int.MaxValue) with alternating bits. I have run the tests with equal and not equal arrays and the resulting speed rankings are the same. Here are the implementations:
Implementation 1: Convert each BitArray into a byte[] and compare the arrays using the CompareTo() method.
Implementation 2: Convert each BitArray into a byte[] and compare each set of bytes using an XOR operator (^).
Implementation 3: Convert each BitArray into an int[] and compare the arrays using the CompareTo() method.
Implementation 4: Convert each BitArray into an int[] and compare each set of ints using an XOR operator (^).
Implementation 5: Use a for loop to iterate over each set of bool values and compare.
The winner surprised me: Implementation 3.
I would have expected Implementation 4 to be the fastest, but as it turns out 3 is significantly faster.
In terms of speed, here are the implementations ranked fastest first:
Implementation 3
Implementation 4
Implementation 2
Implementation 1
Implementation 5
Here's my code for implementation 3:
// Note: because this extension method is named Equals, a call written as
// first.Equals(second) binds to Object.Equals(object) instead.
// Call it statically or give it a different name.
public static bool Equals(this BitArray first, BitArray second)
{
    // Short-circuit if the arrays are not equal in size
    if (first.Length != second.Length)
        return false;
    // Convert the arrays to int[]s
    int[] firstInts = new int[(int)Math.Ceiling((decimal)first.Count / 32)];
    first.CopyTo(firstInts, 0);
    int[] secondInts = new int[(int)Math.Ceiling((decimal)second.Count / 32)];
    second.CopyTo(secondInts, 0);
    // Look for differences
    bool areDifferent = false;
    for (int i = 0; i < firstInts.Length && !areDifferent; i++)
        areDifferent = firstInts[i] != secondInts[i];
    return !areDifferent;
}

Related

C# sort array of structs

I am running a simulation, part of which requires sorting an array of pairs of values.
When I used Array.Sort(v1, v2), it sorted the two arrays based on the first one, and the whole simulation took roughly 9 ms.
But I need to sort by the first value and then by the second, so I created an array of structs. See my code below.
private struct ValueWithWeight : IComparable<ValueWithWeight>
{
    public double Value;
    public double Weight;
    public int CompareTo(ValueWithWeight other)
    {
        int cmp = this.Value.CompareTo(other.Value);
        if (cmp != 0)
            return cmp;
        else
            return this.Weight.CompareTo(other.Weight);
    }
}
void Usage()
{
    ValueWithWeight[] data = FillData();
    Array.Sort(data);
}
Now it takes roughly 27 ms. Is there a better way to sort?
Since you want to optimize this as far as possible, please consider the following:
Array.Sort runs over your array and performs comparisons. In your case there will be no boxing, since you implemented the generic interface on the structure.
Array.Sort swaps elements while sorting. Swapping is internally a memmove, and your structure takes at least 16 bytes. You can try to reduce the impact by storing your double values in a class. An array of class instances holds only references, each of which occupies IntPtr.Size bytes, so each swap copies fewer bytes.
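A minimal sketch of the class-based variant suggested above. ValueWithWeightRef is a made-up name, and the comparison logic is copied from the struct in the question. This is only something to benchmark, not a guaranteed win: swaps get cheaper, but every comparison now has to follow a reference, and each element becomes a separate heap allocation.

```csharp
using System;

// Hypothetical reference-type version of the question's ValueWithWeight struct.
sealed class ValueWithWeightRef : IComparable<ValueWithWeightRef>
{
    public double Value;
    public double Weight;

    public int CompareTo(ValueWithWeightRef other)
    {
        // By convention, any instance sorts after null.
        if (other == null) return 1;

        // Order by Value first, then by Weight.
        int cmp = Value.CompareTo(other.Value);
        return cmp != 0 ? cmp : Weight.CompareTo(other.Weight);
    }
}
```

Usage stays the same as before: fill a ValueWithWeightRef[] and call Array.Sort(data).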

Parallel For Loop

I am trying to use the parallel for loop in .NET Framework 4.0. However, I noticed that I am missing some elements in the result set.
I have a snippet of code as below. lhs.ListData is a list of nullable doubles and rhs.ListData is a list of nullable doubles.
int recordCount = lhs.ListData.Count > rhs.ListData.Count ? rhs.ListData.Count : lhs.ListData.Count;
List<double?> listResult = new List<double?>(recordCount);
var rangePartitioner = Partitioner.Create(0, recordCount);
Parallel.ForEach(rangePartitioner, range =>
{
    for (int index = range.Item1; index < range.Item2; index++)
    {
        double? result = lhs.ListData[index] * rhs.ListData[index];
        listResult.Add(result);
    }
});
lhs.ListData has a length of 7964 and rhs.ListData has a length of 7962. When I perform the "*" operation, listResult has only 7867 elements. There are null elements in both input lists.
I am not sure what is happening during the execution. Is there any reason why I am seeing fewer elements in the result set? Please advise.
The correct way to do this is to use LINQ's IEnumerable.AsParallel() extension. It does all of the partitioning for you, and everything in PLINQ is inherently thread-safe. There is another LINQ extension called Zip that zips together two collections into one, based on a function that you give it. However, this isn't exactly what you need, as it only goes to the length of the shorter of the two lists, not the longer. It would probably be easiest to use Zip anyway, but first expand the shorter of the two lists to the length of the longer one by padding it with null at the end of the list.
IEnumerable<double?> lhs, rhs; // Assume these are filled with your numbers.
double?[] result = System.Linq.Enumerable.Zip(lhs, rhs, (a, b) => a * b).AsParallel().ToArray();
Here's the MSDN page on Zip:
http://msdn.microsoft.com/en-us/library/dd267698%28VS.100%29.aspx
That's probably because the operations on a List<T> (e.g. Add) are not thread safe - your results may vary. As a workaround you could use a lock, but that would very much reduce performance.
It looks like you just want each item in the result list to be the product of the items at the corresponding index in the two input lists, how about this instead using PLINQ:
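// AsOrdered() keeps elements paired by index; without it PLINQ does not guarantee the pairing.
var listResult = lhs.AsParallel().AsOrdered()
    .Zip(rhs.AsParallel().AsOrdered(), (a, b) => a * b)
    .ToList();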
Not sure why you chose parallelism here, I would benchmark if this is even necessary - is this truly the bottleneck in your application?
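You are using List<double?> to store the results, but its Add method is not thread safe.
You can store each result at an explicit index instead of calling Add. This needs a pre-sized collection such as an array (new double?[recordCount]), because a new List<T> has Count 0 even when it is created with a capacity: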
listResult[index] = result;
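Putting the index-based suggestion together with the question's partitioner, a sketch could look like the following. The class name, method name, and the IList parameters are assumptions made for illustration; the loop body is the one from the question. Each index is written by exactly one thread, so no lock is needed, and the inputs are only read.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ParallelMultiply
{
    // Hypothetical helper: multiplies element-wise up to the shorter list's length.
    public static double?[] Multiply(IList<double?> lhs, IList<double?> rhs)
    {
        int recordCount = Math.Min(lhs.Count, rhs.Count);

        // A pre-sized array: every slot exists before the loop starts.
        var result = new double?[recordCount];

        // Partitioner.Create(0, 0) throws, so handle the empty case first.
        if (recordCount == 0)
            return result;

        Parallel.ForEach(Partitioner.Create(0, recordCount), range =>
        {
            for (int index = range.Item1; index < range.Item2; index++)
                result[index] = lhs[index] * rhs[index]; // null if either side is null
        });

        return result;
    }
}
```

With the question's data this would be called as ParallelMultiply.Multiply(lhs.ListData, rhs.ListData). For a few thousand multiplications it is still worth checking that the parallel version beats a plain loop.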

Bug in Array.IStructuralEquatable.GetHashCode?

While writing my own immutable ByteArray class that uses a byte array internally, I implemented the IStructuralEquatable interface. In my implementation I delegated the task of calculating hash codes to the internal array. While testing it, to my great surprise, I found that my two different arrays had the same structural hash code, i.e. they returned the same value from GetHashCode. To reproduce:
IStructuralEquatable array11 = new int[] { 1, 1 };
IStructuralEquatable array12 = new int[] { 1, 2 };
IStructuralEquatable array22 = new int[] { 2, 2 };
var comparer = EqualityComparer<int>.Default;
Console.WriteLine(array11.GetHashCode(comparer)); // 32
Console.WriteLine(array12.GetHashCode(comparer)); // 32
Console.WriteLine(array22.GetHashCode(comparer)); // 64
IStructuralEquatable is quite new and unknown, but I read somewhere that it can be used to compare the contents of collections and arrays. Am I wrong, or is my .Net wrong?
Note that I am not talking about Object.GetHashCode!
Edit:
So, I am apparently wrong as unequal objects may have equal hash codes. But isn't GetHashCode returning a somewhat randomly distributed set of values a requirement? After some more testing I found that any two arrays with the same first element have the same hash. I still think this is strange behavior.
What you have described is not a bug. GetHashCode() does not guarantee unique hashes for nonequal objects.
From MSDN:
If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.
EDIT
While the MSFT .NET implementation of GetHashCode() for Array.IStructuralEquatable obeys the principles in the above MSDN documentation, it appears that the authors did not implement it as intended.
Here is the code from "Array.cs":
int IStructuralEquatable.GetHashCode(IEqualityComparer comparer) {
    if (comparer == null)
        throw new ArgumentNullException("comparer");
    Contract.EndContractBlock();
    int ret = 0;
    for (int i = (this.Length >= 8 ? this.Length - 8 : 0); i < this.Length; i++) {
        ret = CombineHashCodes(ret, comparer.GetHashCode(GetValue(0)));
    }
    return ret;
}
Notice in particular this line:
ret = CombineHashCodes(ret, comparer.GetHashCode(GetValue(0)));
Unless I am mistaken, that 0 was intended to be i. Because of this, GetHashCode() returns the same value for any two arrays that have the same first element and the same number of loop iterations, min(n, 8), where n is the length of the array. This isn't wrong (it doesn't violate the documentation), but it is clearly not as good as it would be if 0 were replaced with i. Also, there would be no reason to loop if the code were only going to use a single value from the array.
This bug has been fixed, at least as of .NET 4.6.2. You can see it through Reference Source.
ret = CombineHashCodes(ret, comparer.GetHashCode(GetValue(i)));
GetHashCode does not return unique values for instances that are not equal. However, instances that are equal will always return the same hash code.
To quote from Object.GetHashCode method:
If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.
Your observations do not conflict with the documentation and there is no bug in the implementation.
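The contract quoted above can be tried out with the built-in structural comparer. This is a small sketch with made-up class and method names. It only shows the one-way guarantee: arrays with equal contents must produce equal hash codes, while arrays with different contents may or may not collide depending on the runtime version.

```csharp
using System;
using System.Collections;

static class StructuralHashDemo
{
    // Compares array contents element by element.
    public static bool SameContents(int[] a, int[] b)
    {
        return StructuralComparisons.StructuralEqualityComparer.Equals(a, b);
    }

    // Hash code derived from the array's contents, not its reference.
    public static int ContentHash(int[] a)
    {
        return StructuralComparisons.StructuralEqualityComparer.GetHashCode(a);
    }
}
```

For two separate arrays holding { 1, 2, 3 }, SameContents returns true and ContentHash returns the same value for both. For { 1, 2 } and { 1, 1 }, SameContents returns false, and nothing in the contract says the two hashes must differ.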

How to compare arrays in C#? [duplicate]

Possible Duplicate:
Easiest way to compare arrays in C#
How can I compare two arrays in C#?
I use the following code, but its result is false. I was expecting it to be true.
Array.Equals(childe1,grandFatherNode);
You can use Enumerable.SequenceEqual() from System.Linq to compare the contents of the arrays:
bool isEqual = Enumerable.SequenceEqual(target1, target2);
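A short sketch contrasting the two kinds of comparison. The class and method names are invented for the example; object.Equals and SequenceEqual are the real calls being compared.

```csharp
using System;
using System.Linq;

static class ArrayCompareDemo
{
    // What Array.Equals(a, b) actually resolves to: Object.Equals,
    // which for arrays is true only for the same instance (or two nulls).
    public static bool SameReference(int[] a, int[] b)
    {
        return object.Equals(a, b);
    }

    // Element-by-element comparison, in order, including the length.
    public static bool SameContents(int[] a, int[] b)
    {
        return a.SequenceEqual(b);
    }
}
```

For two separately created arrays that both hold { 1, 2, 3 }, SameReference returns false and SameContents returns true.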
You're comparing the object references, and they are not the same. You need to compare the array contents.
.NET2 solution
One option is to iterate through the array elements and call Equals() on each pair. Remember that the element type needs to override Equals() if elements that are not the same object reference should still compare as equal.
An alternative is using this generic method to compare two generic arrays:
static bool ArraysEqual<T>(T[] a1, T[] a2)
{
    if (ReferenceEquals(a1, a2))
        return true;
    if (a1 == null || a2 == null)
        return false;
    if (a1.Length != a2.Length)
        return false;
    var comparer = EqualityComparer<T>.Default;
    for (int i = 0; i < a1.Length; i++)
    {
        if (!comparer.Equals(a1[i], a2[i])) return false;
    }
    return true;
}
.NET 3.5 or higher solution
Or use SequenceEqual if Linq is available for you (.NET Framework >= 3.5)
There is no static Equals method in the Array class, so what you are using is actually Object.Equals, which determines if the two object references point to the same object.
If you want to check whether the arrays contain the same items in the same order, you can use the SequenceEqual extension method:
childe1.SequenceEqual(grandFatherNode)
Edit:
To use SequenceEqual with multidimensional arrays, you can use an extension to enumerate them. Here is an extension to enumerate a two-dimensional array:
public static IEnumerable<T> Flatten<T>(this T[,] items) {
    for (int i = 0; i < items.GetLength(0); i++)
        for (int j = 0; j < items.GetLength(1); j++)
            yield return items[i, j];
}
Usage:
childe1.Flatten().SequenceEqual(grandFatherNode.Flatten())
If your array has more dimensions than two, you would need an extension that supports that number of dimensions. If the number of dimensions varies, you would need a bit more complex code to loop a variable number of dimensions.
You would of course first make sure that the number of dimensions and the size of the dimensions of the arrays match, before comparing the contents of the arrays.
Edit 2:
Turns out that you can use the OfType<T> method to flatten an array, as RobertS pointed out. Naturally that only works if all the items can actually be cast to the same type, but that is usually the case if you can compare them anyway. Example:
childe1.OfType<Person>().SequenceEqual(grandFatherNode.OfType<Person>())
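Combining the dimension check mentioned above with the flattening idea, a sketch for two-dimensional arrays could look like this. The class and method names are hypothetical. It uses Cast<T> rather than OfType<T>, which enumerates the elements the same way but does not silently skip null elements or elements of another type.

```csharp
using System;
using System.Linq;

static class Array2DCompare
{
    // Hypothetical helper: true if both arrays have the same shape and contents.
    public static bool ContentEquals<T>(T[,] a, T[,] b)
    {
        // Same total element count is not enough: a 2x2 and a 1x4 array differ.
        if (a.GetLength(0) != b.GetLength(0) || a.GetLength(1) != b.GetLength(1))
            return false;

        // A multidimensional array enumerates all of its elements in row-major order.
        return a.Cast<T>().SequenceEqual(b.Cast<T>());
    }
}
```

Null arguments are not handled here; add checks if either array can be null.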
Array.Equals is comparing the references, not their contents:
Currently, when you compare two arrays with the == operator, we are really using the System.Object's == operator, which only compares the instances. (i.e. this uses reference equality, so it will only be true if both arrays point to the exact same instance)
Source
If you want to compare the contents of the arrays, you need to loop through the arrays and compare the elements.
The same blog post has examples of how to do this. The basic implementation is:
public static bool ArrayEquals<T>(T[] a, T[] b)
{
    if (a.Length != b.Length)
    {
        return false;
    }
    for (int i = 0; i < a.Length; i++)
    {
        if (!a[i].Equals(b[i]))
        {
            return false;
        }
    }
    return true;
}
Though this will have performance issues: with an unconstrained T, the call goes to Equals(object), so value-type elements are boxed on every comparison. Adding a constraint:
public static bool ArrayEquals<T>(T[] a, T[] b) where T: IEquatable<T>
will improve things but will mean the code only works with types that implement IEquatable<T>.
Using EqualityComparer<T>.Default's Equals method instead of calling Equals on the elements themselves will also improve performance without requiring the type to implement IEquatable<T>. In this case the body of the method becomes:
EqualityComparer<T> comparer = EqualityComparer<T>.Default;
for (int i = 0; i < a.Length; i++)
{
    if (!comparer.Equals(a[i], b[i]))
    {
        return false;
    }
}
The Equals method does a reference comparison - if the arrays are different objects, this will indeed return false.
To check if the arrays contain identical values (and in the same order), you will need to iterate over them and test equality on each.
Array.Equals() appears to only test for the same instance.
There doesn't appear to be a method that compares the values but it would be very easy to write.
Just compare the lengths, if not equal, return false. Otherwise, loop through each value in the array and determine if they match.

What is an equivalent of Delphi FillChar in C#?

What is the C# equivalent of Delphi's FillChar?
I'm assuming you want to fill a byte array with zeros (as that's what FillChar is mostly used for in Delphi).
.NET is guaranteed to initialize all the values in a byte array to zero on creation, so generally FillChar in .NET isn't necessary.
So saying:
byte[] buffer = new byte[1024];
will create a buffer of 1024 zero bytes.
If you need to zero the bytes after the buffer has been used, you could consider just discarding your byte array and declaring a new one (that's if you don't mind having the GC work a bit harder cleaning up after you).
If I understand FillChar correctly, it sets all elements of an array to the same value, yes?
In which case, unless the value is 0, you probably have to loop:
for (int i = 0; i < arr.Length; i++) {
    arr[i] = value;
}
For setting the values to the type's 0, there is Array.Clear
Obviously, with the loop answer you can stick this code in a utility method if you need... for example, as an extension method:
public static void FillChar<T>(this T[] arr, T value) {...}
Then you can use:
int[] data = {1,2,3,4,5};
//...
data.FillChar(7);
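The method body is left out above. One way to fill it in, simply wrapping the loop shown earlier, might be the following sketch. The containing class name is made up, and the null check is an addition.

```csharp
using System;

static class ArrayFillExtensions
{
    // Sets every element of arr to value, like the loop shown earlier.
    public static void FillChar<T>(this T[] arr, T value)
    {
        if (arr == null)
            throw new ArgumentNullException("arr");

        for (int i = 0; i < arr.Length; i++)
            arr[i] = value;
    }
}
```

On newer runtimes (.NET Core 2.0 and later) there is also a built-in Array.Fill(array, value) that does the same job.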
If you absolutely must have block operations, then Buffer.BlockCopy can be used to blit data between array locations - for example, you could write the first chunk, then blit it a few times to fill the bulk of the array.
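A sketch of the blitting idea for a byte array. The names are invented, and the doubling strategy (copy the already-filled prefix onto the next stretch, so the filled region doubles each pass) is one possible reading of "blit it a few times", not something the answer spells out. Whether it beats the plain loop would need measuring.

```csharp
using System;

static class BlockFill
{
    // Hypothetical helper: fills buffer with value using Buffer.BlockCopy.
    public static void FillBytes(byte[] buffer, byte value)
    {
        if (buffer.Length == 0)
            return;

        // Seed the first byte, then repeatedly copy the filled prefix forward.
        buffer[0] = value;
        int filled = 1;
        while (filled < buffer.Length)
        {
            // Copy at most as many bytes as are already filled, so the
            // source and destination ranges never overlap.
            int count = Math.Min(filled, buffer.Length - filled);
            Buffer.BlockCopy(buffer, 0, buffer, filled, count);
            filled += count;
        }
    }
}
```

Buffer.BlockCopy counts in bytes, so for arrays of wider primitive types the offsets and counts would have to be scaled by the element size.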
Try this in C#:
String text = "hello";
text.PadRight(10, 'h').ToCharArray();
