Join the HashCode magic - c#

I started testing a hash function for the uniqueness of the hash codes generated by my algorithm, and I wrote the following test class to see when the same hash code is generated.
class Program
{
static void Main(string[] args)
{
var hashes = new List<int>();
for (int i = 0; i < 100000; i++)
{
var vol = new Volume();
var code = vol.GetHashCode();
if (!hashes.Contains(code))
{
hashes.Add(code);
}
else
{
Console.WriteLine("Same hash code generated on the {0} retry", hashes.Count());
}
}
}
}
public class Volume
{
public Guid DriverId = Guid.NewGuid();
public Guid ComputerId = Guid.NewGuid();
public int Size;
public ulong VersionNumber;
public int HashCode;
public static ulong CurDriverEpochNumber;
public static Random RandomF = new Random();
public Volume()
{
Size = RandomF.Next(1000000, 1200000);
CurDriverEpochNumber ++;
VersionNumber = CurDriverEpochNumber;
HashCode = GetHashCodeInternal();
}
public int GetHashCodeInternal()
{
unchecked
{
var one = DriverId.GetHashCode() + ComputerId.GetHashCode() * 22;
var two = (ulong)Size + VersionNumber;
var result = one ^ (int)two;
return result;
}
}
}
The GUID fields DriverId and ComputerId and the int Size are random.
I assumed that at some point the same hash code would be generated, which would break work with big collections. The magic is that the retry number at which the duplicate
hash code is generated is the same! I ran the sample code several times and got nearly the same result: the first run hit a duplicate on retry 10170, the second on 7628, the third on 7628,
and again and again on 7628. Sometimes I got slightly different results, but in most cases it was 7628.
I have no explanation for this.
Is it an error in the .NET random generator, or what?
Thanks all. Now it is clear there was a bug in my code (Matthew Watson). I had to call GetHashCodeInternal() and not GetHashCode(). The best uniqueness results came from this version of GetHashCodeInternal():
public int GetHashCodeInternal()
{
unchecked
{
var one = DriverId.GetHashCode() + ComputerId.GetHashCode();
var two = ((ulong)Size) + VersionNumber;
var result = one ^ (int)two << 32;
return result;
}
}
But it still produces a duplicate code at around 140 000 objects... I think that is not good, because we have collections of around 10 000...
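The numbers above are roughly what the birthday bound predicts for any 32-bit hash. The following is a minimal sketch, assuming the hash codes are uniformly distributed over 2^32 values (an idealization, not a property of the code above); the class and method names are hypothetical. Under that assumption the first collision is expected, on average, after roughly 80 000 values, the chance of at least one collision among 10 000 values is about 1.2%, and among 140 000 values it is about 90%.

```csharp
using System;

public static class Birthday
{
    // Approximate probability of at least one collision among n values
    // drawn uniformly from `buckets` possible hash codes (assumption: uniform hashes).
    // Uses the standard approximation p = 1 - exp(-n(n-1) / (2 * buckets)).
    public static double CollisionProbability(long n, double buckets = 4294967296.0)
    {
        return 1.0 - Math.Exp(-(double)n * (n - 1) / (2.0 * buckets));
    }
}
```

For example, Birthday.CollisionProbability(10000) and Birthday.CollisionProbability(140000) give the two figures quoted above. A collection that relies on Equals as well as GetHashCode keeps working when codes collide; only lookups get slower.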

If you change your Console.WriteLine() to also print Volume.Size like so:
Console.WriteLine("Same hash code generated on the {0} retry ({1})", hashes.Count, vol.Size);
you will see that although hashes.Count is always the same for the first collision, vol.Size is usually different.
This seems to rule out the random number generator causing this issue - it looks like some strange property of GetHashCodeInternal().
Closer inspection reveals that you are calling the wrong hash code function.
This line: var code = vol.GetHashCode();
Should be: var code = vol.HashCode;
Try that instead! At the moment you are calling the default .NET GetHashCode(), which is not doing what you want at all.

You will need to create a single random number generator and pass it in for reuse. If new instances are created too close together, the same seed is used, and hence the same sequence of numbers comes out.
Your results will only seem random at the points where the seed is generated from the next tick or second of the clock. So that is just incidental, really.

Related

Distinct() not working, behaving differently in .NET 5 and 4.7.2

I'm trying to trim duplicates in a list for a test app, and I'm using Distinct(). I've implemented IEquatable on my object for the default comparer. When I went to run it, it ended up trimming nothing at all from my list. This happened enough times that I started to experiment and dig in a bit more.
What I've found is that something went seriously wrong with my code, and I'm not sure where. In the below code I generate 1000 items with IDs between 0 and 9. When I call Distinct() on my list, I expect to get a list of around 10 items.
What I get are wildly different results based on the version of .NET, both of which are pretty much dead wrong.
In .NET 5 the list isn't filtered at all. 1000 items before the call, 1000 items after.
In .NET 4.7.2, the list is filtered down to some number less than 10 -- usually around 2 or so in my attempts. Given that there are 1000 items to be looked at, all with IDs in a 10-number range, I should get 10 pretty much every time.
So my question is, what's going on here? I'm pretty sure I have a bug in my code, but I also can't explain the discrepancy between the two versions of .NET.
Here's the code, a pair of Fiddles, and output.
Object Definition:
public class Book : IEquatable<Book>
{
public int Id
{
get;
set;
}
public string Name
{
get;
set;
}
public bool Equals(Book other)
{
return Id == other.Id;
}
public override int GetHashCode()
{
return Name.GetHashCode() ^ Id.GetHashCode();
}
}
Main Program:
public class Program
{
public static void Main()
{
var list = new List<Book>();
for (int i = 0; i < 1000; i++)
{
list.Add(GenerateRandomItem());
}
Console.WriteLine("List count: " + list.Count);
list = list.Distinct().ToList();
Console.WriteLine("Final list count: " + list.Count);
foreach (var item in list) { Console.WriteLine("Id: " + item.Id); }
}
private static Book GenerateRandomItem()
{
var rng = new Random();
return new Book { Id = rng.Next(0, 10), Name = GenerateString(rng) };
}
private static string GenerateString(Random rng)
{
var result = string.Empty;
for (int i = 0; i < 10; i++)
{
result += (char)rng.Next(65, 91);
}
return result;
}
}
Fiddle .NET 5
Output (due to Random, will differ each run):
List count: 1000
Final list count: 1000
Id: 1
Id: 0
Id: 2
Id: 3
Id: 4
... continues for all 1000 items ...
Fiddle .NET 4.7.2
Output (due to Random, will differ each run):
List count: 1000
Final list count: 1
Id: 8
Two problems here:
1. Your GetHashCode doesn't match your Equals. It should be only:
public override int GetHashCode()
{
return Id.GetHashCode();
}
2. Also note that every time you call new Random(), it is seeded from the clock. This means that in a tight loop you get the same value many times. You should keep a single Random instance, as follows:
public static void Main()
{
var rng = new Random();
var list = new List<Book>();
for (int i = 0; i < 1000; i++)
{
list.Add(GenerateRandomItem(rng));
}
Console.WriteLine("List count: " + list.Count);
list = list.Distinct().ToList();
Console.WriteLine("Final list count: " + list.Count);
foreach (var item in list) { Console.WriteLine("Id: " + item.Id); }
}
private static Book GenerateRandomItem(Random rng)
{
return new Book { Id = rng.Next(0, 10), Name = GenerateString(rng) };
}
Make these two changes, and you'll see both .NET 5 and .NET 4.7.2 work exactly the same.
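The first point can be checked in isolation with a small sketch. It assumes a hypothetical BookById class (not the asker's Book) whose GetHashCode uses only the field that Equals compares, and it fills the list deterministically instead of using Random, so the result does not depend on the runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class for illustration: equality and hash code both use Id only.
public class BookById : IEquatable<BookById>
{
    public int Id { get; set; }
    public string Name { get; set; }

    public bool Equals(BookById other) => other != null && Id == other.Id;
    public override bool Equals(object obj) => Equals(obj as BookById);
    public override int GetHashCode() => Id.GetHashCode();
}

public static class DistinctDemo
{
    // Builds 1000 books whose IDs cycle through 0..9 and whose names all differ,
    // then counts how many survive Distinct().
    public static int CountDistinct()
    {
        var books = new List<BookById>();
        for (int i = 0; i < 1000; i++)
        {
            books.Add(new BookById { Id = i % 10, Name = "Book" + i });
        }
        return books.Distinct().Count();
    }
}
```

Because the names differ but are ignored by both Equals and GetHashCode, Distinct() collapses the list to one entry per Id.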
I'm pretty sure that the reason can be found in the fact that you always create new Random instances very quickly in the loop. The .NET behavior of the parameterless constructor is that it takes the current system time as seed. So if you call it quickly you will get the same seed again which means you will produce the same results over and over.
Instead you should pass the Random instance from Main to the GenerateRandomItem method.
This article indicates that there was an (undocumented) breaking change in .NET Core, so maybe this is causing the difference:
https://github.com/dotnet/dotnet-api-docs/issues/3764
The behaviour of the System.Random class in .NET Framework is that the
parameterless constructor takes the seed value from the current system
time (Environment.TickCount). This has been documented here:
https://learn.microsoft.com/en-us/dotnet/api/system.random?view=netframework-4.8#instantiating-the-random-number-generator.
This behaviour leads to the well-known issue: several random
generators created in quick succession produce the same value
sequences.
The behaviour has changed in .NET Core: now the initial seed value is
randomized as well, so several random generator's instances produce
different sequences even if created in quick succession
A simple test with LinqPad 5(.NET Framework 4.6/4.7/4.8) vs LinqPad 5(.Net Core 3/ .NET 5) showed that it's the reason:
void Main()
{
List<int> list = new List<int>();
for(int i = 0; i < 100; i++)
{
list.Add(RandomNumber());
}
var numLookup = list.ToLookup(i => i).OrderByDescending(x => x.Count());
}
int RandomNumber()
{
return new Random().Next(1, 100);
}
LinqPad 5(.NET Framework 4.6/4.7/4.8):
Always the same number
LinqPad 5(.Net Core 3/ .NET 5):
Different results even if using the parameterless constructor
As CloudWindMoonSun correctly pointed out your GetHashCode looks wrong, but it has nothing to do with this issue.
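Use the built-in HashCode struct, and have Equals compare the same fields that GetHashCode combines:
public class Book : IEquatable<Book>
{
    public int Id { get; set; }
    public string Name { get; set; }
    public bool Equals(Book other)
    {
        // Compare the fields themselves; equal hash codes alone do not prove equality.
        return other != null && Id == other.Id && Name == other.Name;
    }
    public override int GetHashCode()
    {
        return HashCode.Combine(Id, Name);
    }
}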

c# Generate Random number passing long as seed instead of int32

I want to generate a random number in C# by passing a long as the seed instead of an Int32, because I need to pass phone numbers or account numbers.
https://learn.microsoft.com/en-us/dotnet/api/system.random.-ctor?view=netframework-4.8#System_Random__ctor_System_Int32_
Please suggest any reliable NuGet package which does this or any implementation who has already done something like this.
I need to pass the complete PhoneNumber as the seed which I'm able to do in python but not with C# and my code stack is all in C#
using System;
public class Program
{
public static void Main()
{
int seed = 0123456789;
Random random = new Random(seed);
double result = random.NextDouble();
Console.WriteLine(result);
}
}
Some insights on my requirements and what I'm trying to achieve:
1) We're doing this for A/B testing and to do data analysis on the experience of two services.
2) When a request comes in with a phoneNumber, we compare random.NextDouble() against a preset percentage to determine whether to send the request to service A or service B.
3) For example, let's say the request comes in and falls under >0.5; then we direct the request to service A, and the next time a request with the same phone number comes in it will again be >0.5 and go to service A, since the seed is a unique hash of phoneNumber.
The GetHashCode() method belongs to the Object class; it has nothing to do with random number generation. Please read here (https://learn.microsoft.com/en-us/dotnet/api/system.object.gethashcode?view=netframework-4.8). The documentation clearly states that collisions are possible, especially if the input is consistent.
The method HashAlgorithm.ComputeHash (documented here - https://learn.microsoft.com/en-us/dotnet/api/system.security.cryptography.hashalgorithm.computehash?view=netframework-4.8) calculates the hash for a given value, but it is deterministic, i.e. if the input is the same, the generated output is also the same. I assume this is not the desired output. Here is the sample code I tried:
static void Main(string[] args)
{
Console.WriteLine("Hello World!");
while (true)
{
Console.WriteLine("Enter a 9 digit+ number to calculate hash");
var val = Console.ReadLine();
long target = 0;
bool result = long.TryParse(val,out target);
if (result)
{
var calculatedHash = OutputHash(target);
Console.WriteLine("Calculated hash is : " + calculatedHash);
}
else
{
Console.WriteLine("Incorrect input. Please try again.");
}
}
}
public static string OutputHash(long number)
{
string source = Convert.ToString(number);
string hash;
using (SHA256 sha256Hash = SHA256.Create())
{
hash = GetHash(sha256Hash, source);
Console.WriteLine($"The SHA256 hash of {source} is: {hash}.");
Console.WriteLine("Verifying the hash...");
if (VerifyHash(sha256Hash, source, hash))
{
Console.WriteLine("The hashes are the same.");
}
else
{
Console.WriteLine("The hashes are not same.");
}
}
return hash;
}
private static string GetHash(HashAlgorithm hashAlgorithm, string input)
{
// Convert the input string to a byte array and compute the hash.
byte[] data = hashAlgorithm.ComputeHash(Encoding.UTF8.GetBytes(input));
// Create a new Stringbuilder to collect the bytes
// and create a string.
var sBuilder = new StringBuilder();
// Loop through each byte of the hashed data
// and format each one as a hexadecimal string.
for (int i = 0; i < data.Length; i++)
{
sBuilder.Append(data[i].ToString("x2"));
}
// Return the hexadecimal string.
return sBuilder.ToString();
}
// Verify a hash against a string.
private static bool VerifyHash(HashAlgorithm hashAlgorithm, string input, string hash)
{
// Hash the input.
var hashOfInput = GetHash(hashAlgorithm, input);
// Create a StringComparer an compare the hashes.
StringComparer comparer = StringComparer.OrdinalIgnoreCase;
return comparer.Compare(hashOfInput, hash) == 0;
}
I agree with @Knoop's comment above that you might end up with the same integer mapping to multiple long input values.
If you are looking for a 'pure' random number generator with a long value as seed, you don't have a choice but to go for third-party libraries (or implement your own custom algorithm). However, rather than getting into such complexities, a simple
Guid g = Guid.NewGuid();
should do the trick (https://learn.microsoft.com/en-us/dotnet/api/system.guid.newguid?view=netframework-4.8).
The documentation (https://learn.microsoft.com/en-gb/windows/win32/api/combaseapi/nf-combaseapi-cocreateguid?redirectedfrom=MSDN) says that even this can end up having collisions, but the chances are minimal.
Finally, this sounds like potential duplicate of .NET unique object identifier
Take the hash of the phone number, e.g.:
var phoneNumber = 123456789L;
var seed = phoneNumber.GetHashCode();
This means that for the same phoneNumber you will get the same sequence. It also means that some phone numbers will produce identical sequences, but the chance of that is slim. The hash might also differ between .NET runtimes, as commented, but you might not care.
Not sure why you want to, but there are reasons, e.g. test code.
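For the A/B routing described in the question, one option is to skip Random entirely and derive the value in [0, 1) directly from a stable hash of the phone number. The following is a minimal sketch under that assumption; the StableBucket class, its method names, and the "A"/"B" labels are hypothetical. It swaps the seeded-Random approach for a SHA-256 digest, which gives the same bytes for the same input on any .NET runtime, whereas GetHashCode results are not guaranteed to be stable across runtimes.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class StableBucket
{
    // Maps a phone number to a repeatable value in [0, 1).
    public static double ToUnitInterval(string phoneNumber)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(phoneNumber));
            // First 8 bytes of the digest; byte order follows the machine's endianness.
            ulong bits = BitConverter.ToUInt64(digest, 0);
            // Keep the top 53 bits so the value is represented exactly as a double.
            return (bits >> 11) / (double)(1UL << 53);
        }
    }

    // Routes to "A" when the value falls below shareForA (e.g. 0.5), else to "B".
    public static string Route(string phoneNumber, double shareForA)
    {
        return ToUnitInterval(phoneNumber) < shareForA ? "A" : "B";
    }
}
```

Passing the phone number as a string keeps leading zeros and avoids the Int32 range limit altogether. Two different numbers can still land in the same bucket, which is expected for a percentage split.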

Create Unique Hashcode for the permutation of two Order Ids

I have a collection which is a permutation of two unique orders, where OrderId is unique. Thus it contains the Order1 (Id = 1) and Order2 (Id = 2) as both 12 and 21. Now while processing a routing algorithm, few conditions are checked and while a combination is included in the final result, then its reverse has to be ignored and needn't be considered for processing. Now since the Id is an integer, I have created a following logic:
private static int GetPairKey(int firstOrderId, int secondOrderId)
{
var orderCombinationType = (firstOrderId < secondOrderId)
? new {max = secondOrderId, min = firstOrderId}
: new { max = firstOrderId, min = secondOrderId };
return (orderCombinationType.min.GetHashCode() ^ orderCombinationType.max.GetHashCode());
}
In the logic, I create a Dictionary<int,int>, where key is created using the method GetPairKey shown above, where I ensure that out of given combination they are arranged correctly, so that I get the same Hashcode, which can be inserted and checked for an entry in a Dictionary, while its value is dummy and its ignored.
However, the above logic seems to have a flaw and doesn't work as expected in all cases. What am I doing wrong here? Should I try something different to create the hash code? Is something like Tuple.Create(minOrderId, maxOrderId).GetHashCode() a better choice? Please suggest.
The following is the relevant code usage:
foreach (var pair in localSavingPairs)
{
var firstOrder = pair.FirstOrder;
var secondOrder = pair.SecondOrder;
if (processedOrderDictionary.ContainsKey(GetPairKey(firstOrder.Id, secondOrder.Id))) continue;
Adding to the Dictionary, is the following code:
processedOrderDictionary.Add(GetPairKey(firstOrder.Id, secondOrder.Id), 0); here the value 0 is dummy and is not used
You need a value that can uniquely represent every possible value.
That is different to a hash-code.
You could uniquely represent each value with a long or with a class or struct that contains all of the appropriate values. Since after a certain total size using long won't work any more, let's look at the other approach, which is more flexible and more extensible:
public class KeyPair : IEquatable<KeyPair>
{
public int Min { get; private set; }
public int Max { get; private set; }
public KeyPair(int first, int second)
{
if (first < second)
{
Min = first;
Max = second;
}
else
{
Min = second;
Max = first;
}
}
public bool Equals(KeyPair other)
{
return other != null && other.Min == Min && other.Max == Max;
}
public override bool Equals(object other)
{
return Equals(other as KeyPair);
}
public override int GetHashCode()
{
return unchecked(Max * 31 + Min);
}
}
Now, the GetHashCode() here will not be unique, but the KeyPair itself will be. Ideally the hashcodes will be very different to each other to better distribute these objects, but doing much better than the above depends on information about the actual values that will be seen in practice.
The dictionary will use that to find the item, but it will also use Equals to pick between those where the hash code is the same.
(You can experiment with this by having a version for which GetHashCode() always just returns 0. It will have very poor performance because collisions hurt performance and this will always collide, but it will still work).
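The long-based alternative mentioned at the start of this answer can be sketched as follows, assuming both IDs are 32-bit ints as in the question. The PairKey class and Make method are hypothetical names. Each unordered pair maps to exactly one long, so the key itself is unique and not just its hash.

```csharp
using System;
using System.Collections.Generic;

public static class PairKey
{
    // Packs an unordered pair of ints into one long: the smaller ID goes in the
    // high 32 bits and the larger in the low 32 bits.
    public static long Make(int firstOrderId, int secondOrderId)
    {
        int min = Math.Min(firstOrderId, secondOrderId);
        int max = Math.Max(firstOrderId, secondOrderId);
        // The uint cast prevents sign extension from overwriting the high bits.
        return ((long)min << 32) | (uint)max;
    }
}
```

With this key, a HashSet<long> can replace the Dictionary<int, int> with its dummy value: processed.Add(PairKey.Make(a, b)) returns false when the pair, in either order, was already seen.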
First, 42.GetHashCode() returns 42. Second, 1 ^ 2 is identical to 2 ^ 1, so there's really no point in sorting numbers. Third, your "hash" function is very weak and produces a lot of collisions, which is why you're observing the flaws.
There are two options I can think of right now:
1. Use a slightly "stronger" hash function.
2. Replace your Dictionary<int, int> with a Dictionary<string, int> whose keys are your two sorted numbers separated by whatever character you prefer, e.g. 56-6472.
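The three observations above can be seen in a small sketch; XorDemo and XorKey are hypothetical names that mirror the XOR step of GetPairKey from the question.

```csharp
using System;

public static class XorDemo
{
    // Same combination step as in the question: XOR of the two int hash codes.
    // For an int, GetHashCode() returns the value itself.
    public static int XorKey(int a, int b)
    {
        return a.GetHashCode() ^ b.GetHashCode();
    }
}
```

XorKey(1, 2), XorKey(2, 1), XorKey(0, 3) and XorKey(5, 6) all return 3, so unrelated order pairs share a dictionary key and are wrongly treated as already processed.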
Given that XOR is commutative (so (a ^ b) will always be the same as (b ^ a)) it seems to me that your ordering is misguided... I'd just
(new {firstOrderId, secondOrderId}).GetHashCode()
.Net will fix you up a good well-distributed hashing implementation for anonymous types.

Unique number generation within a textbox

I am new to C#. I want to automatically generate a unique number inside a text box which I can use as a reference number on a form that does asset registration. This reference number will be used as a unique identifier for each registered asset and will also be given to the asset owner for reference.
To do this, you can use a Guid (globally unique identifier). The chance that the value of the new Guid will be all zeros or equal to any other Guid is very low.
public static void Main()
{
Guid g = Guid.NewGuid();
Console.WriteLine(g);
}
You can find more about this here:
http://msdn.microsoft.com/en-us/library/system.guid.newguid(v=vs.110).aspx
Have you considered using GUIDs? They are pretty easy to generate and reasonably unique.
// This code example demonstrates the Guid.NewGuid() method.
using System;
class Sample
{
public static void Main()
{
Guid g;
// Create and display the value of two GUIDs.
g = Guid.NewGuid();
Console.WriteLine(g);
Console.WriteLine(Guid.NewGuid());
}
}
/*
This code example produces the following results:
0f8fad5b-d9cb-469f-a165-70867728950e
7c9e6679-7425-40de-944b-e07fc1f90ae7
*/
You can use a Guid.
Guid temp;
temp = Guid.NewGuid();
textBox1.Text = temp.ToString().Replace("-", "");
But be aware: truly unique number generation is impossible.
There are other ways, like the Random class.
You can use TimeStamp along with the new GUID.
string uniqueKey = string.Concat(DateTime.Now.ToString("yyyyMMddHHmmssf"), Guid.NewGuid().ToString());
If you really need a number instead of a string as the unique key, you can use only the timestamp with the following strategy, which keeps it unique at any given time: take a lock to ensure that no two threads run your code at the same time, and call Thread.Sleep to ensure that two consecutive calls get distinct times at tenth-of-a-second resolution.
static readonly object lockerObject = new object();
static string GetUniqueKey()
{
    lock (lockerObject)
    {
        string key = DateTime.Now.ToString("yyyyMMddHHmmssf");
        // Sleep before releasing the lock so the next caller sees a later tenth of a second.
        Thread.Sleep(100);
        return key;
    }
}
Or I found a way to do it without a timestamp, as follows.
public long GetUniqueKey()
{
byte[] buffer = Guid.NewGuid().ToByteArray();
return BitConverter.ToInt64(buffer, 0);
}

Random Number Generation - Same Number returned [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicates:
c# - getting the same random number repeatedly
Random number generator not working the way I had planned (C#)
I have a method that builds a queue of ints:
public Queue<int> generateTrainingInts(int count = 60)
{
Queue<int> retval = new Queue<int>();
for (int i = 0; i < count; i++)
{
retval.Enqueue(JE_Rand.rInt(2001, 100));
}
return retval;
}
JE_Rand.rInt() is just a function that delegates to a function of the Random class:
public static int rInt(int exclUB, int incLB = 0)
{
Random rand = new Random(DateTime.Now.Millisecond);
int t = rand.Next(incLB, exclUB);
rand = null;
return t;
}
But when I call generateTrainingInts, the same number is enqueued each time. However, if I change rInt to use a static instance of the Random class, instead of a local instance (with function scope as it is defined above), then it appears to work correctly (enqueue random integers). Does anybody know why this happens?
Edit:
Dear Answerers who didn't read my question thoroughly,
Like some of you pointed out, I am looking for a good explanation of why this happens. I am not looking for a solution to the same-number-generated problem, because I already fixed that like I said above. Thanks for your enthusiasm though :) I really just want to understand things like this, because my first implementation made more sense conceptually to me.
You need to keep the same Random object. Put it outside your static method as a static member
private static Random rand = new Random();
public static int rInt(int exclUB, int incLB = 0)
{
int t = rand.Next(incLB, exclUB);
return t;
}
Edit
The reason is the finite resolution of the clock used to initialize Random. Subsequent initializations of Random will get the same starting position in the random sequence. When reusing the same Random the next value in the random sequence is always generated.
Try out the following code and I think you'll see why:
void PrintNowAHundredTimes()
{
for (int i = 0; i < 100; ++i)
{
Console.WriteLine(DateTime.Now);
}
}
The Random objects are getting the same seed over and over. This is because the granularity of the system time returned by DateTime.Now is, quite simply, finite. On my machine for example the value only changes every ~15 ms. So consecutive calls within that time period return the same time.
And as I suspect you already know, two Random objects initialized with the same seed value will generate identical random sequences. (That's why it's called pseudorandom, technically.)
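The point about identical seeds can be checked directly. This is a minimal sketch; SeedDemo and SameSequence are hypothetical names. It creates two Random objects from one seed and compares their outputs.

```csharp
using System;

public static class SeedDemo
{
    // Returns true if two Random instances built from the same seed
    // produce the same first `count` values.
    public static bool SameSequence(int seed, int count)
    {
        var first = new Random(seed);
        var second = new Random(seed);
        for (int i = 0; i < count; i++)
        {
            if (first.Next() != second.Next())
            {
                return false;
            }
        }
        return true;
    }
}
```

SameSequence returns true for any seed, which is what happens in rInt when DateTime.Now.Millisecond has not changed between calls: every new Random starts at the same position and its first Next() yields the same number.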
You should also be aware that even if it made sense to instantiate a new Random object locally within your method, setting it to null would still serve no purpose (once the method exits there will be no more references to the object anyway, so it will be garbage collected regardless).
public class JE_Rand
{
private static Random rand= new Random(DateTime.Now.Millisecond);
public static int rInt(int exclUB, int incLB = 0)
{
int t = rand.Next(incLB, exclUB);
return t;
}
}
