I am looking to refactor a C# method into a C function in an attempt to gain some speed, and then call the resulting C DLL from C# so my program can still use that functionality.
Currently the C# method takes a list of integers and returns a list of lists of integers. The method calculates the power set of the integers, so an input of 3 ints would produce the following output (at this stage the values of the ints are not important, as they are used as internal weighting values):
1
2
3
1,2
1,3
2,3
1,2,3
Where each line represents a list of integers. The output indicates the index (with an offset of 1) into the input list, not the value. So 1,2 indicates that the elements at index 0 and 1 are members of that subset of the power set.
I am unfamiliar with C, so what are my best options for data structures that will allow the C# code to access the returned data?
Thanks in advance
Update
Thank you all for your comments so far. Here is a bit of a background to the nature of the problem.
The iterative method for calculating the power set of a set is fairly straightforward: two loops and a bit of bit manipulation is all there is to it, really. It just gets called... a lot (in fact billions of times, if the size of the set is big enough).
My thoughts around using C (or C++, as people have pointed out) are that it gives more scope for performance tuning. A direct port may not offer any increase, but it opens the way for more involved methods to squeeze out a bit more speed. Even a small increase per iteration would equate to a measurable overall improvement.
My idea was to port a direct version and then work to increase its speed, refactoring it over time (with help from everyone here at SO).
Update 2
Another fair point from jalf: I don't have to use a list or equivalent. If there is a better way, I am open to suggestions. The only reason for the list was that each set of results is not the same size.
The code so far...
public List<List<int>> powerset(List<int> currentGroupList)
{
_currentGroupList = currentGroupList;
int max;
int count;
//Count the objects in the group
count = _currentGroupList.Count;
max = (int)Math.Pow(2, count);
//outer loop
for (int i = 0; i < max; i++)
{
_currentSetList = new List<int>();
//inner loop
for (int j = 0; j < count; j++)
{
if ((i & (1 << j)) == 0)
{
_currentSetList.Add(_currentGroupList.ElementAt(j));
}
}
outputList.Add(_currentSetList);
}
return outputList;
}
As you can see, not a lot to it. It just goes round and round a lot!
I accept that the creating and building of lists may not be the most efficient way, but I need some way of providing the results back in a manageable way.
Update 3
Thanks for all the input and implementation work. Just to clarify a couple of points raised: I don't need the output to be in 'natural order', and I am also not that interested in the empty set being returned.
hughdbrown's implementation is interesting, but I think I will need to store the results (or at least a subset of them) at some point. It sounds like memory limitations will apply long before running time becomes a real issue.
Partly because of this, I think I can get away with using bytes instead of integers, giving more potential storage.
The question really is, then: have we reached the maximum speed for this calculation in C#? Does the option of unmanaged code provide any more scope? I know that in many respects the answer is somewhat academic: even if we halved the running time, it would only allow an extra value or so in the original set.
Also, be sure that moving to C/C++ is really what you need to do for speed to begin with. Instrument the original C# method (standalone, executed via unit tests), instrument the new C/C++ method (again, standalone via unit tests) and see what the real world difference is.
The reason I bring this up is that I fear it may be a Pyrrhic victory -- using Smokey Bacon's advice you get your list class, and you're in "faster" C++, but there's still a cost to calling into that DLL: bouncing out of the runtime with P/Invoke or COM interop carries a fairly substantial performance cost.
Be sure you're getting your "money's worth" out of that jump before you do it.
Update based on the OP's Update
If you're calling this loop repeatedly, you need to absolutely make sure that the entire loop logic is encapsulated in a single interop call -- otherwise the overhead of marshalling (as others here have mentioned) will definitely kill you.
I do think, given the description of the problem, that the issue isn't that C#/.NET is "slower" than C, but more likely that the code needs to be optimized. As another poster here mentioned, you can use pointers in C# to seriously boost performance in this kind of loop, without the need for marshalling. I'd look into that first, before jumping into a complex interop world, for this scenario.
If you are looking to use C for a performance gain, most likely you are planning to do so through the use of pointers. C# does allow for use of pointers, using the unsafe keyword. Have you considered that?
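For illustration, here is a minimal sketch (my code, not the poster's) of the kind of unsafe pointer loop meant here; it assumes the project is compiled with /unsafe:

// Walks an int[] through a pinned pointer, avoiding per-element bounds checks.
static unsafe long SumUnsafe(int[] values)
{
    long total = 0;
    fixed (int* p = values)          // pin the array so the GC cannot move it
    {
        int* cur = p;
        int* end = p + values.Length;
        while (cur < end)
            total += *cur++;
    }
    return total;
}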
Also, how will you be calling this code? Will it be called often (e.g. in a loop)? If so, marshalling the data back and forth may more than offset any performance gains.
Follow Up
Take a look at Native code without sacrificing .NET performance for some interop options. There are ways to interop without too much of a performance loss, but those interops can only happen with the simplest of data types.
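As a hedged sketch of what "simplest of data types" means in practice, a P/Invoke signature built around blittable arrays rather than List<List<int>> might look like this (the DLL name and export are hypothetical, purely for illustration):

using System;
using System.Runtime.InteropServices;

static class NativePowerSet
{
    // Hypothetical native export: int arrays and ints are blittable, so no
    // per-element marshalling is needed. Each output element is a bit mask
    // describing one subset. Assumes input.Length is small enough for 1 << Length.
    [DllImport("powerset.dll", CallingConvention = CallingConvention.Cdecl)]
    private static extern int ComputePowerSet(int[] input, int inputLength,
                                              int[] outputMasks, int outputCapacity);

    public static int[] Compute(int[] input)
    {
        var masks = new int[1 << input.Length];
        int written = ComputePowerSet(input, input.Length, masks, masks.Length);
        Array.Resize(ref masks, written);
        return masks;
    }
}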
Though I still think that you should investigate speeding up your code using straight .NET.
Follow Up 2
Also, may I suggest that if you have your heart set on mixing native code and managed code, you create your library using C++/CLI. Below is a simple example. Note that I am not a C++/CLI guy, and this code doesn't do anything useful... it's just meant to show how easily you can mix native and managed code.
#include "stdafx.h"
using namespace System;
System::Collections::Generic::List<int> ^MyAlgorithm(System::Collections::Generic::List<int> ^sourceList);
int main(array<System::String ^> ^args)
{
System::Collections::Generic::List<int> ^intList = gcnew System::Collections::Generic::List<int>();
intList->Add(1);
intList->Add(2);
intList->Add(3);
intList->Add(4);
intList->Add(5);
Console::WriteLine("Before Call");
for each(int i in intList)
{
Console::WriteLine(i);
}
System::Collections::Generic::List<int> ^modifiedList = MyAlgorithm(intList);
Console::WriteLine("After Call");
for each(int i in modifiedList)
{
Console::WriteLine(i);
}
}
System::Collections::Generic::List<int> ^MyAlgorithm(System::Collections::Generic::List<int> ^sourceList)
{
int* nativeInts = new int[sourceList->Count];
int nativeIntArraySize = sourceList->Count;
//Managed to Native
for(int i=0; i<sourceList->Count; i++)
{
nativeInts[i] = sourceList[i];
}
//Do Something to native ints
for(int i=0; i<nativeIntArraySize; i++)
{
nativeInts[i]++;
}
//Native to Managed
System::Collections::Generic::List<int> ^returnList = gcnew System::Collections::Generic::List<int>();
for(int i=0; i<nativeIntArraySize; i++)
{
returnList->Add(nativeInts[i]);
}
//Free the native buffer (it was allocated with new[]) to avoid a leak
delete[] nativeInts;
return returnList;
}
What makes you think you'll gain speed by calling into C code? C isn't magically faster than C#. It can be, of course, but it can also easily be slower (and buggier). Especially when you factor in the p/invoke calls into native code, it's far from certain that this approach will speed up anything.
In any case, C doesn't have anything like List. It has raw arrays and pointers (and you could argue that int** is more or less equivalent), but you're probably better off using C++, which does have equivalent data structures -- in particular, std::vector.
There is no simple way to expose this data to C#, however, since it will be scattered pretty much randomly (each list is a pointer to some dynamically allocated memory somewhere).
However, I suspect the biggest performance improvement comes from improving the algorithm in C#.
Edit:
I can see several things in your algorithm that seem suboptimal. Constructing a list of lists isn't free. Perhaps you can create a single list and use different offsets to represent each sublist. Or perhaps using 'yield return' and IEnumerable instead of explicitly constructing lists might be faster.
Have you profiled your code, found out where the time is being spent?
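Since the question's output is already expressed as indices, one compact alternative along these lines (my sketch, not jalf's code; assumes using System.Collections.Generic) is to represent each subset as a single bit mask instead of a List<int>; the loop counter itself is the subset:

// Yields every non-empty subset of a set with 'count' items (count <= 31 assumed)
// as a bit mask: bit j set means the item at index j belongs to the subset.
static IEnumerable<int> SubsetMasks(int count)
{
    int total = 1 << count;
    for (int mask = 1; mask < total; mask++)   // start at 1 to skip the empty set
        yield return mask;
}

// Decode a mask into indices only when a subset is actually consumed.
static IEnumerable<int> IndicesOf(int mask)
{
    for (int j = 0; mask != 0; j++, mask >>= 1)
        if ((mask & 1) != 0)
            yield return j;
}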
This returns one set of a powerset at a time. It is based on python code here. It works for powersets of over 32 elements. If you need fewer than 32, you can change long to int. It is pretty fast -- faster than my previous algorithm and faster than (my modified to use yield return version of) P Daddy's code.
static class PowerSet4<T>
{
static public IEnumerable<IList<T>> powerset(T[] currentGroupList)
{
int count = currentGroupList.Length;
Dictionary<long, T> powerToIndex = new Dictionary<long, T>();
long mask = 1L;
for (int i = 0; i < count; i++)
{
powerToIndex[mask] = currentGroupList[i];
mask <<= 1;
}
Dictionary<long, T> result = new Dictionary<long, T>();
yield return result.Values.ToArray();
long max = 1L << count;
for (long i = 1L; i < max; i++)
{
long key = i & -i;
if (result.ContainsKey(key))
result.Remove(key);
else
result[key] = powerToIndex[key];
yield return result.Values.ToArray();
}
}
}
You can download all the fastest versions I have tested here.
I really think that using yield return is the change that makes calculating large powersets possible. Allocating large amounts of memory upfront increases runtime dramatically and causes algorithms to fail for lack of memory very early on. Original Poster should figure out how many sets of a powerset he needs at once. Holding all of them is not really an option with >24 elements.
I'm also going to put in a vote for tuning-up your C#, particularly by going to 'unsafe' code and losing what might be a lot of bounds-checking overhead.
Even though it's 'unsafe', it's no less 'safe' than C/C++, and it's dramatically easier to get right.
Below is a C# algorithm that should be much faster (and use less memory) than the algorithm you posted. It doesn't use the neat binary trick yours uses, and as a result, the code is a good bit longer. It has a few more for loops than yours, and might take a time or two stepping through it with the debugger to fully grok it. But it's actually a simpler approach, once you understand what it's doing.
As a bonus, the returned sets are in a more "natural" order. It would return subsets of the set {1 2 3} in the same order you listed them in your question. That wasn't a focus, but is a side effect of the algorithm used.
In my tests, I found this algorithm to be approximately 4 times faster than the algorithm you posted for a large set of 22 items (which was as large as I could go on my machine without excessive disk-thrashing skewing the results too much). One run of yours took about 15.5 seconds, and mine took about 3.6 seconds.
For smaller lists, the difference is less pronounced. For a set of only 10 items, yours ran 10,000 times in about 7.8 seconds, and mine took about 3.2 seconds. For sets with 5 or fewer items, they run close to the same time. With many iterations, yours runs a little faster.
Anyway, here's the code. Sorry it's so long; I tried to make sure I commented it well.
/*
* Made it static, because it shouldn't really use or modify state data.
* Making it static also saves a tiny bit of call time, because it doesn't
* have to receive an extra "this" pointer. Also, accessing a local
* parameter is a tiny bit faster than accessing a class member, because
* dereferencing the "this" pointer is not free.
*
* Made it generic so that the same code can handle sets of any type.
*/
static IList<IList<T>> PowerSet<T>(IList<T> set){
if(set == null)
throw new ArgumentNullException("set");
/*
* Caveat:
* If set.Count > 30, this function pukes all over itself without so
* much as wiping up afterwards. Even for 30 elements, though, the
* result set is about 68 GB (if "set" is comprised of ints). 24 or
* 25 elements is a practical limit for current hardware.
*/
int setSize = set.Count;
int subsetCount = 1 << setSize; // MUCH faster than (int)Math.Pow(2, setSize)
T[][] rtn = new T[subsetCount][];
/*
* We don't really need dynamic list allocation. We can calculate
* in advance the number of subsets ("subsetCount" above), and
* the size of each subset (0 through setSize). The performance
* of List<> is pretty horrible when the initial size is not
* guessed well.
*/
int subsetIndex = 0;
for(int subsetSize = 0; subsetSize <= setSize; subsetSize++){
/*
* The "indices" array below is part of how we implement the
* "natural" ordering of the subsets. For a subset of size 3,
* for example, we initialize the indices array with {0, 1, 2};
* Later, we'll increment each index until we reach setSize,
* then carry over to the next index. So, assuming a set size
* of 5, the second iteration will have indices {0, 1, 3}, the
* third will have {0, 1, 4}, and the fifth will involve a carry,
* so we'll have {0, 2, 3}.
*/
int[] indices = new int[subsetSize];
for(int i = 1; i < subsetSize; i++)
indices[i] = i;
/*
* Now we'll iterate over all the subsets we need to make for the
* current subset size. The number of subsets of a given size
* is easily determined with combination (nCr). In other words,
* if I have 5 items in my set and I want all subsets of size 3,
* I need 5-pick-3, or 5C3 = 5! / 3!(5 - 3)! = 10.
*/
for(int i = Combination(setSize, subsetSize); i > 0; i--){
/*
* Copy the items from the input set according to the
* indices we've already set up. Alternatively, if you
* just wanted the indices in your output, you could
* just dup the index array here (but make sure you dup!
* Otherwise the setup step at the bottom of this for
* loop will mess up your output list! You'll also want
* to change the function's return type to
* IList<IList<int>> in that case.
*/
T[] subset = new T[subsetSize];
for(int j = 0; j < subsetSize; j++)
subset[j] = set[indices[j]];
/* Add the subset to the return */
rtn[subsetIndex++] = subset;
/*
* Set up indices for next subset. This looks a lot
* messier than it is. It simply increments the
* right-most index until it overflows, then carries
* over left as far as it needs to. I've made the
* logic as fast as I could, which is why it's hairy-
* looking. Note that the inner for loop won't
* actually run as long as a carry isn't required,
* and will run at most once in any case. The outer
* loop will go through as few iterations as required.
*
* You may notice that this logic doesn't check the
* end case (when the left-most digit overflows). It
* doesn't need to, since the loop up above won't
* execute again in that case, anyway. There's no
* reason to waste time checking that here.
*/
for(int j = subsetSize - 1; j >= 0; j--)
if(++indices[j] <= setSize - subsetSize + j){
for(int k = j + 1; k < subsetSize; k++)
indices[k] = indices[k - 1] + 1;
break;
}
}
}
return rtn;
}
static int Combination(int n, int r){
if(r == 0 || r == n)
return 1;
/*
* The formula for combination is:
*
* n!
* ----------
* r!(n - r)!
*
* We'll actually use a slightly modified version here. The above
* formula forces us to calculate (n - r)! twice. Instead, we only
* multiply for the numerator the factors of n! that aren't canceled
* out by (n - r)! in the denominator.
*/
/*
* nCr == nC(n - r)
* We can use this fact to reduce the number of multiplications we
* perform, as well as the incidence of overflow, where r > n / 2
*/
if(r > n / 2) /* We DO want integer truncation here (7 / 2 = 3) */
r = n - r;
/*
* I originally used all integer math below, with some complicated
* logic and another function to handle cases where the intermediate
* results overflowed a 32-bit int. It was pretty ugly. In later
* testing, I found that the more generalized double-precision
* floating-point approach was actually *faster*, so there was no
* need for the ugly code. But if you want to see a giant WTF, look
* at the edit history for this post!
*/
double denominator = Factorial(r);
double numerator = n;
while(--r > 0)
numerator *= --n;
return (int)(numerator / denominator + 0.1/* Deal with rounding errors. */);
}
/*
* The archetypical factorial implementation is recursive, and is perhaps
* the most often used demonstration of recursion in text books and other
* materials. It's unfortunate, however, that few texts point out that
* it's nearly as simple to write an iterative factorial function that
* will perform better (although tail-end recursion, if implemented by
* the compiler, will help to close the gap).
*/
static double Factorial(int x){
/*
* An all-purpose factorial function would handle negative numbers
* correctly - the result should be Sign(x) * Factorial(Abs(x)) -
* but since we don't need that functionality, we're better off
* saving the few extra clock cycles it would take.
*/
/*
* I originally used all integer math below, but found that the
* double-precision floating-point version is not only more
* general, but also *faster*!
*/
if(x < 2)
return 1;
double rtn = x;
while(--x > 1)
rtn *= x;
return rtn;
}
Your list of results does not match the results your code would produce. In particular, you do not show generating the empty set.
If I were producing powersets that could have a few billion subsets, then generating each subset separately rather than all at once might cut down on your memory requirements, improving your code's speed. How about this:
static class PowerSet<T>
{
static long[] mask = { 1L << 0, 1L << 1, 1L << 2, 1L << 3,
1L << 4, 1L << 5, 1L << 6, 1L << 7,
1L << 8, 1L << 9, 1L << 10, 1L << 11,
1L << 12, 1L << 13, 1L << 14, 1L << 15,
1L << 16, 1L << 17, 1L << 18, 1L << 19,
1L << 20, 1L << 21, 1L << 22, 1L << 23,
1L << 24, 1L << 25, 1L << 26, 1L << 27,
1L << 28, 1L << 29, 1L << 30, 1L << 31};
static public IEnumerable<IList<T>> powerset(T[] currentGroupList)
{
int count = currentGroupList.Length;
long max = 1L << count;
for (long iter = 0; iter < max; iter++)
{
T[] list = new T[count];
int k = 0, m = -1;
for (long i = iter; i != 0; i &= (i - 1))
{
while ((mask[++m] & i) == 0)
;
list[k++] = currentGroupList[m];
}
yield return list;
}
}
}
Then your client code looks like this:
static void Main(string[] args)
{
int[] intList = { 1, 2, 3, 4 };
foreach (IList<int> set in PowerSet<int>.powerset(intList))
{
foreach (int i in set)
Console.Write("{0} ", i);
Console.WriteLine();
}
}
I'll even throw in a bit-twiddling algorithm with generic type arguments for free. For added speed, you can wrap the powerset() inner loop in an unsafe block; it doesn't make much difference.
On my machine, this code is slightly slower than the OP's code until the sets are 16 or larger. However, all times to 16 elements are less than 0.15 seconds. At 23 elements, it runs in 64% of the time. The original algorithm does not run on my machine for 24 or more elements -- it runs out of memory.
This code takes 12 seconds to generate the power set for the numbers 1 to 24, omitting screen I/O time. That's 16 million-ish in 12 seconds, or about 1400K per second. For a billion (which is what you quoted earlier), that would be about 760 seconds. How long do you think this should take?
Does it have to be C, or is C++ an option too? If C++, you can just use the list type from the STL. Otherwise, you'll have to implement your own list -- look up linked lists or dynamically sized arrays for pointers on how to do this.
I concur with the "optimize .NET first" opinion. It's the most painless. I imagine that if you wrote some unmanaged .NET code using C# pointers, it'd be identical to C execution, except for the VM overhead.
P Daddy:
You could change your Combination() code to this:
static long Combination(long n, long r)
{
r = (r > n - r) ? (n - r) : r;
if (r == 0)
return 1;
long result = 1;
long k = 1;
while (r-- > 0)
{
result *= n--;
result /= k++;
}
return result;
}
This will reduce the multiplications and the chance of overflow to a minimum.
Related
I have a byte[] byteArray, usually with byteArray.Length = 1-3.
I need to decompose the array into bits, take some of the bits (for example, bits 5-17), and convert those bits to an Int32.
I tried to do this:
private static IEnumerable<bool> GetBitsStartingFromLSB(byte b)
{
for (int i = 0; i < 8; i++)
{
yield return (b % 2 != 0);
b = (byte)(b >> 1);
}
}
public static Int32 Bits2Int(ref byte[] source, int offset, int length)
{
List<bool> bools = source.SelectMany(GetBitsStartingFromLSB).ToList();
bools = bools.GetRange(offset, length);
bools.AddRange(Enumerable.Repeat(false, 32-length).ToList() );
int[] array = new int[1];
(new BitArray(bools.ToArray())).CopyTo(array, 0);
return array[0];
}
But this method is too slow, and I have to call it very often.
How can I do this more efficiently?
Thanks a lot! Now I do this:
public static byte[] GetPartOfByteArray( byte[] source, int offset, int length)
{
byte[] retBytes = new byte[length];
Buffer.BlockCopy(source, offset, retBytes, 0, length);
return retBytes;
}
public static Int32 Bits2Int(byte[] source, int offset, int length)
{
if (source.Length > 4)
{
source = GetPartOfByteArray(source, offset / 8, (source.Length - offset / 8 > 3 ? 4 : source.Length - offset / 8));
offset -= 8 * (offset / 8);
}
byte[] intBytes = new byte[4];
source.CopyTo(intBytes, 0);
Int32 full = BitConverter.ToInt32(intBytes, 0);
Int32 mask = (1 << length) - 1;
return (full >> offset) & mask;
}
And it works very fast!
If you're after "fast", then ultimately you need to do this with bit logic, not LINQ etc. I'm not going to write actual code, but you'd need to:
use your offset with / 8 and % 8 to find the starting byte and the bit-offset inside that byte
compose however many bytes you need - quite possibly up to 5 if you are after a 32-bit number (because of the possibility of an offset) - for example into a long, in whichever endianness (presumably big-endian?) you expect
use right-shift (>>) on the composed value to drop however-many bits you need to apply the bit-offset (i.e. value >>= offset % 8;)
mask out any bits you don't want; for example value &= ~(-1L << length); (the -1 gives you all-ones; the << length creates length zeros at the right hand edge, and the ~ swaps all zeros for ones and ones for zeros, so you now have length ones at the right hand edge)
if the value is signed, you'll need to think about how you want negatives to be handled, especially if you aren't always reading 32 bits (a rough sketch of these steps follows below)
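A rough sketch of those steps (my code, not the answer's; the helper name is made up, and I've used the little-endian byte order that matches the original poster's BitConverter-based code rather than big-endian):

// Extract 'length' bits (length <= 32) starting at bit 'offset' from a byte array.
public static int ExtractBits(byte[] source, int offset, int length)
{
    int byteIndex = offset / 8;          // starting byte
    int bitOffset = offset % 8;          // bit offset inside that byte
    long composed = 0;
    // Compose up to 5 bytes (enough for 32 bits plus the bit offset), little-endian.
    for (int i = 0; i < 5 && byteIndex + i < source.Length; i++)
        composed |= (long)source[byteIndex + i] << (8 * i);
    composed >>= bitOffset;              // drop the bit offset
    composed &= ~(-1L << length);        // keep only 'length' bits
    return (int)composed;
}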
First of all, you're asking for optimization. But the only things you've said are:
too slow
need to call it often
There's no information on:
how slow is too slow? Have you measured the current code? Have you estimated how fast you need it to be?
how often is "often"?
how large are the source byte arrays?
etc.
Optimization can be done in a multitude of ways. When asking for optimization, everything is important. For example, if source byte[] is 1 or 2 bytes long (yeah, may be ridiculous, but you didn't tell us), and if it rarely changes, then you could get very nice results by caching results. And so on.
So, no solutions from me, just a list of possible performance problems:
private static IEnumerable<bool> GetBitsStartingFromLSB(byte b) // A
{
for (int i = 0; i < 8; i++)
{
yield return (b % 2 != 0); // A
b = (byte)(b >> 1);
}
}
public static Int32 Bits2Int(ref byte[] source, int offset, int length)
{
List<bool> bools = source.SelectMany(GetBitsStartingFromLSB).ToList(); //A,B
bools = bools.GetRange(offset, length); //B
bools.AddRange(Enumerable.Repeat(false, 32-length).ToList() ); //C
int[] array = new int[1]; //D
(new BitArray(bools.ToArray())).CopyTo(array, 0); //D
return array[0]; //D
}
A: LINQ is fun, but not fast unless done carefully. For each input byte, it takes 1 byte, splits it into 8 bools, and passes them around wrapped in a compiler-generated IEnumerable object *). Note that it all needs to be cleaned up later, too. You'd probably get better performance simply returning a new bool[8] or even a BitArray(size=8).
*) conceptually. In fact yield return is lazy, so it's not 8 value objects + 1 reference object, but just one enumerable that generates the items. But then, you're doing .ToList() in (B), so writing it that way isn't that far from the truth.
A2: the 8 is hardcoded. Once you drop that pretty IEnumerable and change it to a constant-sized array-like thing, you can preallocate that array and pass it via parameter to GetBitsStartingFromLSB to further reduce the amount of temporary objects created and later thrown away. And since SelectMany visits items one-by-one without ever going back, that preallocated array can be reused.
B: Converts the whole source array to a stream of bools, then converts that to a List. Then it discards the whole list except for a small offset-length range of it. Why convert to a list at all? It's just another pack of objects wasted, and the internal data is copied as well, since bool is a value type. You could have taken the range directly from the IEnumerable with .Skip(X).Take(Y).
C: Padding a list of bools to have 32 items. AddRange/Repeat is fun, but Repeat has to return an IEnumerable -- again another object created and thrown away. You're padding the list with false. Drop the list idea and make it a bool[32], or a BitArray(32); they start out all false automatically, since that's the default value of a bool. Iterate over the bits from range A+B and write them into that array by index. Those written will have their value, those unwritten will stay false. Job done, no objects wasted.
C2: connect preallocating 32-item array with A+A2. GetBitsStartingFromLSB doesn't need to return anything, it may get a buffer to be filled via parameter. And that buffer doesn't need to be 8-item buffer. You may pass the whole 32-item final array, and pass an offset so that function knows exactly where to write. Even less objects wasted.
D: Finally, all that work to return the selected bits as an integer. A new temporary array is created and wasted, and a new BitArray is effectively created and wasted too. Note that earlier you were already doing a manual bit-shift conversion from int to bits in GetBitsStartingFromLSB, so why not just write a similar method that does some shifts to go from bits back to int? If you knew the order of the bits there, you know it here as well. No need for the array and the BitArray; a little code wiggling, and you save on those allocations and data copies again.
I have no idea how much time/space/etc. this will save for you; these are just a few points that stand out at first glance, without modifying your original idea for the code too much and without doing it all via math and bit shifts in one go. I see Marc Gravell has already written you some hints, too. If you have time to spare, I suggest you try mine first, one by one, and see how (and whether at all!) each change affects performance -- just to see. Then you'll probably scrap it all and try the new "do it all via math and bit shifts in one go" version with Marc's hints.
Currently I'm working on a solution for a prime-number calculator/checker. The algorithm is already working and very efficient (0.359 seconds for the first 9012330 primes). Here is a part of the upper region where everything is declared:
const uint anz = 50000000;
uint a = 3, b = 4, c = 3, d = 13, e = 12, f = 13, g = 28, h = 32;
bool[,] prim = new bool[8, anz / 10];
uint max = 3 * (uint)(anz / (Math.Log(anz) - 1.08366));
uint[] p = new uint[max];
Now I wanted to go to the next level and use ulongs instead of uints to cover a larger range (you can see that already), which is where I ran into my problem: the bool array.
As everybody knows, a bool takes up a whole byte, which eats a lot of memory when creating a large array. So I'm searching for a more resource-friendly way to do this.
My first idea was a bit array -- not a byte array! -- to store the bools, but I haven't figured out how to do that yet. So if someone has ever done something like this, I would appreciate any kind of tips and solutions. Thanks in advance :)
You can use BitArray collection:
http://msdn.microsoft.com/en-us/library/system.collections.bitarray(v=vs.110).aspx
MSDN Description:
Manages a compact array of bit values, which are represented as Booleans, where true indicates that the bit is on (1) and false indicates the bit is off (0).
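A minimal sketch of using it for the sieve case in the question (my illustration; the method and variable names are made up):

using System.Collections;

// One bit per number instead of one byte per bool. composite[n] == false
// means n was never crossed out, i.e. n is prime for n >= 2.
static BitArray Sieve(int limit)
{
    var composite = new BitArray(limit + 1);   // all bits start out false
    for (int p = 2; (long)p * p <= limit; p++)
        if (!composite[p])
            for (long m = (long)p * p; m <= limit; m += p)
                composite[(int)m] = true;
    return composite;
}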
You can (and should) use well tested and well known libraries.
But if you're looking to learn something (as it seems to be the case) you can do it yourself.
Another reason you may want to use a custom bit array is to use the hard drive to store the array, which comes in handy when calculating primes. To do this you'd need to further split addr, for example lowest 3 bits for the mask, next 28 bits for 256MB of in-memory storage, and from there on - a file name for a buffer file.
Yet another reason for custom bit array is to compress the memory use when specifically searching for primes. After all more than half of your bits will be 'false' because the numbers corresponding to them would be even, so in fact you can both speed up your calculation AND reduce memory requirements if you don't even store the even bits. You can do that by changing the way addr is interpreted. Further more you can also exclude numbers divisible by 3 (only 2 out of every 6 numbers has a chance of being prime) thus reducing memory requirements by 60% compared to plain bit array.
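As a sketch of that even-skipping mapping (my own illustration, not the answer's code): store a bit only for odd numbers from 3 upwards, so the number n lives at bit (n - 3) / 2 and the buffer is half the size.

// Index of the bit representing the odd number n (n >= 3, n odd).
static long BitIndexForOdd(long n) { return (n - 3) >> 1; }

// Read that bit out of a plain byte buffer.
static bool IsMarked(byte[] buffer, long n)
{
    long idx = BitIndexForOdd(n);
    return (buffer[idx >> 3] & (1 << (int)(idx & 7))) != 0;
}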
Notice the use of shift and logical operators to make the code a bit more efficient.
byte mask = (byte)(1 << (int)(addr & 7)); for example can be written as
byte mask = (byte)(1 << (int)(addr % 8));
and addr >> 3 can be written as addr / 8
Testing shift/logical operators vs division shows 2.6s vs 4.8s in favor of shift/logical for 200000000 operations.
Here's the code:
void Main()
{
var barr = new BitArray(10);
barr[4] = true;
Console.WriteLine("Is it "+barr[4]);
Console.WriteLine("Is it Not "+barr[5]);
}
public class BitArray{
private readonly byte[] _buffer;
public bool this[long addr]{
get{
byte mask = (byte)(1 << (int)(addr & 7));
byte val = _buffer[(int)(addr >> 3)];
bool bit = (val & mask) == mask;
return bit;
}
set{
byte mask = (byte)(1 << (int)(addr & 7));
int offs = (int)(addr >> 3);
if (value)
_buffer[offs] = (byte)(_buffer[offs] | mask); // set the bit
else
_buffer[offs] = (byte)(_buffer[offs] & ~mask); // clear the bit
}
}
public BitArray(long size){
_buffer = new byte[size/8 + 1]; // define a byte buffer sized to hold 8 bools per byte. The spare +1 is to avoid dealing with rounding.
}
}
Given a large number, e.g. 9223372036854775807 (Int64.MaxValue), what is the quickest way to sum the digits?
Currently I am ToStringing and reparsing each char into an int:
num.ToString().Sum(c => int.Parse(new String(new char[] { c })));
Which is surely horrifically inefficient. Any suggestions?
And finally, how would you make this work with BigInteger?
Thanks
Well, another option is:
int sum = 0;
while (value != 0)
{
int remainder;
value = Math.DivRem(value, 10, out remainder);
sum += remainder;
}
BigInteger has a DivRem method as well, so you could use the same approach.
Note that I've seen DivRem not be as fast as doing the same arithmetic "manually", so if you're really interested in speed, you might want to consider that.
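For reference, the "manual" version meant here is just one division per digit with the remainder recovered by multiply-and-subtract (a sketch, assuming a non-negative input):

static int DigitSum(long value)
{
    int sum = 0;
    while (value != 0)
    {
        long q = value / 10;           // one division...
        sum += (int)(value - q * 10);  // ...and the remainder via multiply-subtract
        value = q;
    }
    return sum;
}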
Also consider a lookup table with (say) 1000 elements precomputed with the sums:
int sum = 0;
while (value != 0)
{
int remainder;
value = Math.DivRem(value, 1000, out remainder);
sum += lookupTable[remainder];
}
That would mean fewer iterations, but each iteration has an added array access...
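Building that lookup table is cheap and only needs to happen once; a sketch:

// lookupTable[i] holds the sum of the decimal digits of i, for 0 <= i <= 999.
static readonly int[] lookupTable = BuildLookupTable();

static int[] BuildLookupTable()
{
    var table = new int[1000];
    for (int i = 1; i < 1000; i++)
        table[i] = i % 10 + table[i / 10];   // reuse the already-computed prefix
    return table;
}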
Nobody has discussed the BigInteger version. For that I'd look at 10^1, 10^2, 10^4, 10^8 and so on until you find the last 10^(2^n) that is less than your value. Take your number div and mod 10^(2^n) to come up with 2 smaller values. Wash, rinse, and repeat recursively. (You should keep your iterated squares of 10 in an array, and in the recursive part pass along the information about the next power to use.)
With a BigInteger with k digits, dividing by 10 is O(k). Therefore finding the sum of the digits with the naive algorithm is O(k^2).
I don't know what C# uses internally, but the non-naive algorithms out there for multiplying or dividing a k-bit by a k-bit integer all work in time O(k^1.6) or better (most are much, much better, but have an overhead that makes them worse for "small big integers"). In that case preparing your initial list of powers and splitting once takes time O(k^1.6). This gives you 2 problems of size O((k/2)^1.6) = 2^(-0.6) O(k^1.6). At the next level you have 4 problems of size O((k/4)^1.6) for another 2^(-1.2) O(k^1.6) of work. Add up all of the terms and the powers of 2 turn into a geometric series converging to a constant, so the total work is O(k^1.6).
This is a definite win, and the win will be very, very evident if you're working with numbers in the many thousands of digits.
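A hedged sketch of that divide-and-conquer scheme (my code, not a tested library routine), using System.Numerics.BigInteger and an array of iterated squares of 10:

using System.Collections.Generic;
using System.Numerics;

static class BigDigitSum
{
    public static long Sum(BigInteger value)
    {
        if (value.Sign < 0) value = -value;
        // powers[i] = 10^(2^i); extend while the next square still fits under value.
        var powers = new List<BigInteger> { 10 };
        while (powers[powers.Count - 1] * powers[powers.Count - 1] <= value)
            powers.Add(powers[powers.Count - 1] * powers[powers.Count - 1]);
        return Sum(value, powers, powers.Count - 1);
    }

    private static long Sum(BigInteger value, List<BigInteger> powers, int level)
    {
        if (value.IsZero) return 0;
        if (level < 0) return (long)value;   // down to a single decimal digit
        BigInteger high = BigInteger.DivRem(value, powers[level], out BigInteger low);
        return Sum(high, powers, level - 1) + Sum(low, powers, level - 1);
    }
}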
Yes, it's probably somewhat inefficient. I'd probably just repeatedly divide by 10, adding together the remainders each time.
The first rule of performance optimization: don't divide when you can multiply instead. The following function will take four-digit numbers 0-9999 and do what you ask. The intermediate calculations are larger than 16 bits. We multiply the number by 10/10000 (i.e. 1/1000) and take the result as a Q16 fixed-point number. Digits are then extracted by multiplying by 10 and taking the integer part.
#define TEN_OVER_10000 ((1<<25)/1000 +1) // .001 Q25
int sum_digits(unsigned int n)
{
int c;
int sum = 0;
n = (n * TEN_OVER_10000)>>9; // n*10/10000 Q16
for (c=0;c<4;c++)
{
printf("Digit: %d\n", n>>16);
sum += n>>16;
n = (n & 0xffff) * 10; // next digit
}
return sum;
}
This can be extended to larger sizes, but it's tricky. You need to ensure that the rounding in the fixed-point calculation always works correctly. I also used 4-digit numbers so the intermediate result of the fixed-point multiply would not overflow.
Int64 BigNumber = 9223372036854775807;
String BigNumberStr = BigNumber.ToString();
int Sum = 0;
foreach (Char c in BigNumberStr)
Sum += (byte)c;
// 48 is ascii value of zero
// remove in one step rather than in the loop
Sum -= 48 * BigNumberStr.Length;
Instead of int.Parse, why not subtract '0' from each digit character to get its actual value?
Remember, '9' - '0' = 9, so you should be able to do this in O(k), where k is the length of the number. The subtraction is just one operation, so it should not slow things down.
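In other words, something like this (assuming the string contains only decimal digits):

static int DigitSum(string digits)
{
    int sum = 0;
    foreach (char c in digits)
        sum += c - '0';   // '0'..'9' are contiguous, so this is the digit's value
    return sum;
}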
(btw. This refers to 32 bit OS)
SOME UPDATES:
This is definitely an alignment issue
Sometimes the alignment (for whatever reason?) is so bad that access to the double is more than 50x slower than its fastest access.
Running the code on a 64-bit machine reduces the issue, but I think it was still alternating between two timings (and I could get similar results by changing the double to a float on a 32-bit machine).
Running the code under mono exhibits no issue -- Microsoft, any chance you can copy something from those Novell guys???
Is there a way to memory-align the allocation of classes in C#?
The following demonstrates (I think!) the badness of not having doubles aligned correctly. It does some simple math on a double stored in a class, timing each run, and running 5 timed runs on the variable before allocating a new one and starting over.
Basically the results look like you either have a fast, medium or slow memory position (on my ancient processor, these end up at around 40, 80 or 120 ms per run).
I have tried playing with StructLayoutAttribute, but have had no joy - maybe something else is going on?
class Sample
{
class Variable { public double Value; }
static void Main()
{
const int COUNT = 10000000;
while (true)
{
var x = new Variable();
for (int inner = 0; inner < 5; ++inner)
{
// move allocation here to allocate more often so more probably to get 50x slowdown problem
var stopwatch = Stopwatch.StartNew();
var total = 0.0;
for (int i = 1; i <= COUNT; ++i)
{
x.Value = i;
total += x.Value;
}
if (Math.Abs(total - 50000005000000.0) > 1)
throw new ApplicationException(total.ToString());
Console.Write("{0}, ", stopwatch.ElapsedMilliseconds);
}
Console.WriteLine();
}
}
}
So I see lots of web pages about alignment of structs for interop, so what about alignment of classes?
(Or are my assumptions wrong, and there is another issue with the above?)
Thanks,
Paul.
Interesting look into the gears that run the machine. I have a bit of a problem explaining why there are multiple distinct values (I got 4) when a double can only be aligned two ways. I think alignment to the CPU cache line plays a role as well, although that only adds up to 3 possible timings.
Well, there's nothing you can do about it; the CLR only promises alignment for 4-byte values, so that atomic updates on 32-bit machines are guaranteed. This is not just an issue with managed code -- C/C++ has this problem too. Looks like the chip makers need to solve this one.
If it is critical, then you could allocate unmanaged memory with Marshal.AllocCoTaskMem() and use an unsafe pointer that you can align just right. The same kind of thing you'd have to do if you allocate memory for code that uses SIMD instructions -- they require 16-byte alignment. Consider it a desperation move, though.
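A rough sketch of that desperation move (my code, not a library API): over-allocate with Marshal.AllocCoTaskMem and round the pointer up to a 16-byte boundary before using it. Compile with /unsafe.

using System;
using System.Runtime.InteropServices;

unsafe class AlignedDoubleBuffer : IDisposable
{
    private readonly IntPtr _raw;
    public readonly double* Aligned;   // 16-byte aligned view into the raw block

    public AlignedDoubleBuffer(int count)
    {
        _raw = Marshal.AllocCoTaskMem(count * sizeof(double) + 15);
        Aligned = (double*)((_raw.ToInt64() + 15) & ~15L);   // round up to 16 bytes
    }

    public void Dispose()
    {
        Marshal.FreeCoTaskMem(_raw);
    }
}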
To prove the concept of misalignment of objects on the heap in .NET, you can run the following code and you'll see that it now always runs fast. Please don't shoot me, it's just a PoC, but if you are really concerned about performance you might consider using it ;)
public static class AlignedNew
{
public static T New<T>() where T : new()
{
LinkedList<T> candidates = new LinkedList<T>();
IntPtr pointer = IntPtr.Zero;
bool continue_ = true;
int size = Marshal.SizeOf(typeof(T)) % 8;
while( continue_ )
{
if (size == 0)
{
object gap = new object();
}
candidates.AddLast(new T());
GCHandle handle = GCHandle.Alloc(candidates.Last.Value, GCHandleType.Pinned);
pointer = handle.AddrOfPinnedObject();
continue_ = (pointer.ToInt64() % 8) != 0 || (pointer.ToInt64() % 64) == 24;
handle.Free();
if (!continue_)
return candidates.Last.Value;
}
return default(T);
}
}
class Program
{
[StructLayoutAttribute(LayoutKind.Sequential)]
public class Variable
{
public double Value;
}
static void Main()
{
const int COUNT = 10000000;
while (true)
{
var x = AlignedNew.New<Variable>();
for (int inner = 0; inner < 5; ++inner)
{
var stopwatch = Stopwatch.StartNew();
var total = 0.0;
for (int i = 1; i <= COUNT; ++i)
{
x.Value = i;
total += x.Value;
}
if (Math.Abs(total - 50000005000000.0) > 1)
throw new ApplicationException(total.ToString());
Console.Write("{0}, ", stopwatch.ElapsedMilliseconds);
}
Console.WriteLine();
}
}
}
Maybe the StructLayoutAttribute is what you are looking for?
Using a struct instead of a class makes the time constant. Also consider using StructLayoutAttribute -- it lets you specify the exact memory layout of a structure. For classes, I do not think you have any guarantees about how they are laid out in memory.
It will be correctly aligned, otherwise you'd get alignment exceptions on x64. I don't know what your snippet shows, but I wouldn't conclude anything about alignment from it.
You don't have any control over how .NET lays out your class in memory.
As others have said the StructLayoutAttribute can be used to force a specific memory layout for a struct BUT note that the purpose of this is for C/C++ interop, not for trying to fine-tune the performance of your .NET app.
If you're worried about memory alignment issues then C# is probably the wrong choice of language.
EDIT - Broke out WinDbg and looked at the heap running the code above on 32-bit Vista and .NET 2.0.
Note: I don't get the variation in timings shown above.
0:003> !dumpheap -type Sample+Variable
Address MT Size
01dc2fec 003f3c48 16
01dc54a4 003f3c48 16
01dc58b0 003f3c48 16
01dc5cbc 003f3c48 16
01dc60c8 003f3c48 16
01dc64d4 003f3c48 16
01dc68e0 003f3c48 16
01dc6cd8 003f3c48 16
01dc70e4 003f3c48 16
01dc74f0 003f3c48 16
01dc78e4 003f3c48 16
01dc7cf0 003f3c48 16
01dc80fc 003f3c48 16
01dc8508 003f3c48 16
01dc8914 003f3c48 16
01dc8d20 003f3c48 16
01dc912c 003f3c48 16
01dc9538 003f3c48 16
total 18 objects
Statistics:
MT Count TotalSize Class Name
003f3c48 18 288 TestConsoleApplication.Sample+Variable
Total 18 objects
0:003> !do 01dc9538
Name: TestConsoleApplication.Sample+Variable
MethodTable: 003f3c48
EEClass: 003f15d0
Size: 16(0x10) bytes
(D:\testcode\TestConsoleApplication\bin\Debug\TestConsoleApplication.exe)
Fields:
MT Field Offset Type VT Attr Value Name
6f5746e4 4000001 4 System.Double 1 instance 1655149.000000 Value
This seems to suggest to me that the classes' allocation addresses are aligned, unless I'm reading this wrong?
Can people recommend quick and simple ways to combine the hash codes of two objects. I am not too worried about collisions since I have a Hash Table which will handle that efficiently I just want something that generates a code quickly as possible.
Reading around SO and the web there seem to be a few main candidates:
XORing
XORing with Prime Multiplication
Simple numeric operations like multiplication/division (with overflow checking or wrapping around)
Building a string and then using the String class's GetHashCode method
What would people recommend and why?
I would personally avoid XOR - it means that any two equal values will result in 0 - so hash(1, 1) == hash(2, 2) == hash(3, 3) etc. Also hash(5, 0) == hash(0, 5) etc which may come up occasionally. I have deliberately used it for set hashing - if you want to hash a sequence of items and you don't care about the ordering, it's nice.
I usually use:
unchecked
{
int hash = 17;
hash = hash * 31 + firstField.GetHashCode();
hash = hash * 31 + secondField.GetHashCode();
return hash;
}
That's the form that Josh Bloch suggests in Effective Java. Last time I answered a similar question I managed to find an article where this was discussed in detail - IIRC, no-one really knows why it works well, but it does. It's also easy to remember, easy to implement, and easy to extend to any number of fields.
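As an aside, the order-insensitive set-hashing use of XOR mentioned at the top would look something like this (a sketch; assumes using System.Collections.Generic):

// Equal sets hash equally regardless of enumeration order, because XOR is
// commutative. Only appropriate when ordering genuinely doesn't matter.
static int SetHash<T>(IEnumerable<T> items)
{
    int hash = 0;
    foreach (var item in items)
        hash ^= item == null ? 0 : item.GetHashCode();
    return hash;
}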
If you are using .NET Core 2.1 or later or .NET Framework 4.6.1 or later, consider using the System.HashCode struct to help with producing composite hash codes. It has two modes of operation: Add and Combine.
An example using Combine, which is usually simpler and works for up to eight items:
public override int GetHashCode()
{
return HashCode.Combine(object1, object2);
}
An example of using Add:
public override int GetHashCode()
{
var hash = new HashCode();
hash.Add(this.object1);
hash.Add(this.object2);
return hash.ToHashCode();
}
Pros:
Part of .NET itself, as of .NET Core 2.1/.NET Standard 2.1 (though, see con below)
For .NET Framework 4.6.1 and later, the Microsoft.Bcl.HashCode NuGet package can be used to backport this type.
Looks to have good performance and mixing characteristics, based on the work the author and the reviewers did before merging this into the corefx repo
Handles nulls automatically
Overloads that take IEqualityComparer instances
Cons:
Not available on .NET Framework before .NET 4.6.1. HashCode is part of .NET Standard 2.1. As of September 2019, the .NET team has no plans to support .NET Standard 2.1 on the .NET Framework, as .NET Core/.NET 5 is the future of .NET.
General purpose, so it won't handle super-specific cases as well as hand-crafted code
While the template outlined in Jon Skeet's answer works well in general as a hash function family, the choice of the constants is important and the seed of 17 and factor of 31 as noted in the answer do not work well at all for common use cases. In most use cases, the hashed values are much closer to zero than int.MaxValue, and the number of items being jointly hashed are a few dozen or less.
For hashing an integer tuple {x, y} where -1000 <= x <= 1000 and -1000 <= y <= 1000, it has an abysmal collision rate of almost 98.5%. For example, {1, 0} -> {0, 31}, {1, 1} -> {0, 32}, etc. If we expand the coverage to also include n-tuples where 3 <= n <= 25, it does less terrible with a collision rate of about 38%. But we can do much better.
public static int CustomHash(int seed, int factor, params int[] vals)
{
int hash = seed;
foreach (int i in vals)
{
hash = (hash * factor) + i;
}
return hash;
}
I wrote a Monte Carlo sampling search loop that tested the method above with various values for seed and factor over various random n-tuples of random integers i. Allowed ranges were 2 <= n <= 25 (where n was random but biased toward the lower end of the range) and -1000 <= i <= 1000. At least 12 million unique collision tests were performed for each seed and factor pair.
After about 7 hours running, the best pair found (where the seed and factor were both limited to 4 digits or less) was: seed = 1009, factor = 9176, with a collision rate of 0.1131%. In the 5- and 6-digit areas, even better options exist. But I selected the top 4-digit performer for brevity, and it performs quite well in all common int and char hashing scenarios. It also seems to work fine with integers of much greater magnitudes.
It is worth noting that "being prime" did not seem to be a general prerequisite for good performance as a seed and/or factor although it likely helps. 1009 noted above is in fact prime, but 9176 is not. I explicitly tested variations on this where I changed factor to various primes near 9176 (while leaving seed = 1009) and they all performed worse than the above solution.
Lastly, I also compared against the generic ReSharper recommendation function family of hash = (hash * factor) ^ i; and the original CustomHash() as noted above seriously outperforms it. The ReSharper XOR style seems to have collision rates in the 20-30% range for common use case assumptions and should not be used in my opinion.
Use the hash-combining logic built into tuples. The example uses C# 7 value tuples.
(field1, field2).GetHashCode();
I presume that .NET Framework team did a decent job in testing their System.String.GetHashCode() implementation, so I would use it:
// System.String.GetHashCode(): http://referencesource.microsoft.com/#mscorlib/system/string.cs,0a17bbac4851d0d4
// System.Web.Util.StringUtil.GetStringHashCode(System.String): http://referencesource.microsoft.com/#System.Web/Util/StringUtil.cs,c97063570b4e791a
public static int CombineHashCodes(IEnumerable<int> hashCodes)
{
int hash1 = (5381 << 16) + 5381;
int hash2 = hash1;
int i = 0;
foreach (var hashCode in hashCodes)
{
if (i % 2 == 0)
hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ hashCode;
else
hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ hashCode;
++i;
}
return hash1 + (hash2 * 1566083941);
}
Another implementation is from System.Web.Util.HashCodeCombiner.CombineHashCodes(System.Int32, System.Int32) and System.Array.CombineHashCodes(System.Int32, System.Int32) methods. This one is simpler, but probably doesn't have such a good distribution as the method above:
// System.Web.Util.HashCodeCombiner.CombineHashCodes(System.Int32, System.Int32): http://referencesource.microsoft.com/#System.Web/Util/HashCodeCombiner.cs,21fb74ad8bb43f6b
// System.Array.CombineHashCodes(System.Int32, System.Int32): http://referencesource.microsoft.com/#mscorlib/system/array.cs,87d117c8cc772cca
public static int CombineHashCodes(IEnumerable<int> hashCodes)
{
int hash = 5381;
foreach (var hashCode in hashCodes)
hash = ((hash << 5) + hash) ^ hashCode;
return hash;
}
This is a repackaging of Special Sauce's brilliantly researched solution.
It makes use of Value Tuples (ITuple).
This allows defaults for the parameters seed and factor.
public static int CombineHashes(this ITuple tupled, int seed=1009, int factor=9176)
{
var hash = seed;
for (var i = 0; i < tupled.Length; i++)
{
unchecked
{
hash = hash * factor + tupled[i].GetHashCode();
}
}
return hash;
}
Usage:
var hash1 = ("Foo", "Bar", 42).CombineHashes();
var hash2 = ("Jon", "Skeet", "Constants").CombineHashes(seed=17, factor=31);
If your input hashes are the same size, evenly distributed and not related to each other then an XOR should be OK. Plus it's fast.
The situation I'm suggesting this for is where you want to do
H = hash(A) ^ hash(B); // A and B are different types, so there's no way A == B.
of course, if A and B can be expected to hash to the same value with a reasonable (non-negligible) probability, then you should not use XOR in this way.
If you're looking for speed and don't have too many collisions, then XOR is fastest. To prevent a clustering around zero, you could do something like this:
finalHash = hash1 ^ hash2;
return finalHash != 0 ? finalHash : hash1;
Of course, some prototyping ought to give you an idea of performance and clustering.
Assuming you have a relevant ToString() implementation (one in which your different fields appear), I would just return its hash code:
this.ToString().GetHashCode();
This is not very fast, but it should avoid collisions quite well.
I would recommend using the built-in hash functions in System.Security.Cryptography rather than rolling your own.