In the application, when special types of objects are created, I need to generate a unique ID for each of them. The objects are created through a factory and are quite likely to be created in a 'bulk' operation. I realize that the framework's Random is not so 'random' after all, so I tried appending the timestamp as follows:
private string GenerateUniqueId()
{
Random randomValue = new Random();
return DateTime.Now.Ticks.ToString() + randomValue.Next().ToString();
}
Unfortunately, even this does not work. For objects that are created in rapid succession, I generate the same Unique Id :-(
Currently, I am implementing it in a crude way as follows:
private string GenerateUniqueId()
{
Random randomValue = new Random();
int value = randomValue.Next();
Debug.WriteLine(value.ToString());
Thread.Sleep(100);
return DateTime.Now.Ticks.ToString() + value.ToString();
}
Since this is not a very large application, I think a simple and quick technique would suffice instead of implementing an elaborate algorithm.
Please suggest.
A GUID is probably what you're looking for:
private string GenerateUniqueId()
{
return Guid.NewGuid().ToString("N");
}
If you want a smaller, more manageable ID then you could use something like this:
private string GenerateUniqueId()
{
using (var rng = new RNGCryptoServiceProvider())
{
// change the size of the array depending on your requirements
var rndBytes = new byte[8];
rng.GetBytes(rndBytes);
return BitConverter.ToString(rndBytes).Replace("-", "");
}
}
Note: This will only give you a 64-bit number in comparison to the GUID's 128 bits, so there'll be more chance of a collision. Probably not an issue in the real world though. If it is an issue then you could increase the size of the byte array to generate a larger id.
Assuming you do not want a GUID, the first option would be a static field and Interlocked:
private static long lastId = 0;
private static long GetNextId() {
return Interlocked.Increment(ref lastId);
}
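If you want something based on time ticks, remember the last value; if the new tick is not greater than it, increment the last value manually and save that, otherwise just save the new tick:
private static long lastTick = 0;
private static readonly object idGenLock = new object();
private static long GetNextId() {
lock (idGenLock) {
long tick = DateTime.UtcNow.Ticks;
// Comparing with <= (not ==) also covers a clock that has not yet passed a
// previously incremented value, or one that has moved backwards.
if (tick <= lastTick) {
tick = lastTick + 1;
}
lastTick = tick;
return tick;
}
}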
(Neither of these approaches will be good with multiple processes.)
In your comments, Codex, you say you use the unique ID as a file name. There is a specific function for generating cryptographically strong random file names: Path.GetRandomFileName().
Because the names come from a cryptographically strong random generator, collisions are very unlikely even in batch operations, although, like any random value, they are not strictly guaranteed to be unique. The format is a little horrible, as it's optimised for filenames, but it may work for other references as well.
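If the ID only needs to work as a file name or a short reference, a minimal sketch is to drop the dot from Path.GetRandomFileName(), whose result has the 8.3 form (eight characters, a dot, three characters) and which does not create a file. FileNameIds and NewId are hypothetical names:

```csharp
using System;
using System.IO;

public static class FileNameIds
{
    // Path.GetRandomFileName() returns a random 8.3-style name without creating
    // a file; removing the dot leaves an 11-character lowercase alphanumeric id.
    public static string NewId()
    {
        return Path.GetRandomFileName().Replace(".", "");
    }
}
```

Usage: `string id = FileNameIds.NewId();`. The ids are random rather than sequential, so uniqueness is probabilistic, as noted above.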
Why can't your factory (which is presumably single-threaded) generate sequential unique integers? If you expected Random() to work, why not Guid.NewGuid()?
If you're going to resort to coding your own UUID-generator, make sure you salt the generator.
I suggest you check out the open source package ossp-uuid, which is an ISO-C API and CLI for generating Universally Unique Identifiers.
Related
C#: Generate a random number passing a long as the seed instead of an Int32; I need to pass phone numbers or account numbers
https://learn.microsoft.com/en-us/dotnet/api/system.random.-ctor?view=netframework-4.8#System_Random__ctor_System_Int32_
Please suggest any reliable NuGet package that does this, or any implementation by someone who has already done something like this.
I need to pass the complete PhoneNumber as the seed, which I'm able to do in Python but not in C#, and my code stack is all in C#.
using System;
public class Program
{
public static void Main()
{
int seed = 0123456789;
Random random = new Random(seed);
double result = random.NextDouble();
Console.WriteLine(result);
}
}
Some insights on my requirements and what I'm trying to achieve:
1) We're doing this for A/B testing and to do data analysis on the experience of two services.
2) When a request comes in with a phoneNumber, we compare random.NextDouble() against a preset percentage to determine whether to send the request to service A or service B.
3) For example, let's say a request comes in and its value is > 0.5; then we direct the request to service A. The next time a request with the same phone number comes in, the value will again be > 0.5 and the request goes to service A, since the seed is a unique hash of the phoneNumber.
The method GetHashCode() belongs to the Object class; it has nothing to do with random number generation. Please read here (https://learn.microsoft.com/en-us/dotnet/api/system.object.gethashcode?view=netframework-4.8). The documentation clearly states that collisions are possible, i.e. different inputs can produce the same hash code.
The method HashAlgorithm.ComputeHash (documented here - https://learn.microsoft.com/en-us/dotnet/api/system.security.cryptography.hashalgorithm.computehash?view=netframework-4.8) calculates the hash for a given value, but it is deterministic: if the input is the same, the generated output is also the same. Obviously this is not the desired output (I assume). I have attached the sample code I tried to generate this.
using System;
using System.Security.Cryptography;
using System.Text;
static void Main(string[] args)
{
Console.WriteLine("Hello World!");
while (true)
{
Console.WriteLine("Enter a 9 digit+ number to calculate hash");
var val = Console.ReadLine();
long target = 0;
bool result = long.TryParse(val,out target);
if (result)
{
var calculatedHash = OutputHash(target);
Console.WriteLine("Calculated hash is : " + calculatedHash);
}
else
{
Console.WriteLine("Incorrect input. Please try again.");
}
}
}
public static string OutputHash(long number)
{
string source = Convert.ToString(number);
string hash;
using (SHA256 sha256Hash = SHA256.Create())
{
hash = GetHash(sha256Hash, source);
Console.WriteLine($"The SHA256 hash of {source} is: {hash}.");
Console.WriteLine("Verifying the hash...");
if (VerifyHash(sha256Hash, source, hash))
{
Console.WriteLine("The hashes are the same.");
}
else
{
Console.WriteLine("The hashes are not same.");
}
}
return hash;
}
private static string GetHash(HashAlgorithm hashAlgorithm, string input)
{
// Convert the input string to a byte array and compute the hash.
byte[] data = hashAlgorithm.ComputeHash(Encoding.UTF8.GetBytes(input));
// Create a new StringBuilder to collect the bytes
// and create a string.
var sBuilder = new StringBuilder();
// Loop through each byte of the hashed data
// and format each one as a hexadecimal string.
for (int i = 0; i < data.Length; i++)
{
sBuilder.Append(data[i].ToString("x2"));
}
// Return the hexadecimal string.
return sBuilder.ToString();
}
// Verify a hash against a string.
private static bool VerifyHash(HashAlgorithm hashAlgorithm, string input, string hash)
{
// Hash the input.
var hashOfInput = GetHash(hashAlgorithm, input);
// Create a StringComparer and compare the hashes.
StringComparer comparer = StringComparer.OrdinalIgnoreCase;
return comparer.Compare(hashOfInput, hash) == 0;
}
I agree with @Knoop's comment above that you might end up with the same integer mapping to multiple long input values.
If you are looking for a 'pure' random number generator with a long value as the seed, you have no choice but to go for third-party libraries (or implement your own custom algorithm). However, rather than getting into such complexities, a simple
Guid g = Guid.NewGuid();
should do the trick (https://learn.microsoft.com/en-us/dotnet/api/system.guid.newguid?view=netframework-4.8).
The documentation (https://learn.microsoft.com/en-gb/windows/win32/api/combaseapi/nf-combaseapi-cocreateguid?redirectedfrom=MSDN) says that even this can end up having collisions, but the chances are very small.
Finally, this sounds like a potential duplicate of .NET unique object identifier.
Take the hash of the phone number, e.g.:
var phoneNumber = 123456789L;
var seed = phoneNumber.GetHashCode();
This means that for the same phoneNumber you will get the same sequence. It also means that some phone numbers will give identical sequences, but the chance of that is slim. And, as commented, it might differ between .NET runtimes, but you might not care.
I'm not sure why you want to do this, but there are reasons, e.g. test code.
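If the A/B split must be reproducible across processes and .NET versions, one option (a swapped-in technique, not what the answers above propose) is to skip Random altogether and map a SHA-256 hash of the phone number directly to a value in [0, 1). AbBucket and StableFraction are hypothetical names:

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public static class AbBucket
{
    // Hypothetical helper: maps a phone number to a value in [0, 1) that is the
    // same in every process, because SHA-256 output is fixed by the standard.
    public static double StableFraction(long phoneNumber)
    {
        byte[] input = Encoding.UTF8.GetBytes(phoneNumber.ToString(CultureInfo.InvariantCulture));
        byte[] hash;
        using (SHA256 sha = SHA256.Create())
        {
            hash = sha.ComputeHash(input);
        }

        // Build a 64-bit integer from the first 8 bytes (big-endian, so the
        // result does not depend on the machine's byte order).
        ulong value = 0;
        for (int i = 0; i < 8; i++)
        {
            value = (value << 8) | hash[i];
        }

        // Keep the top 53 bits, which a double represents exactly, and scale to [0, 1).
        return (value >> 11) / (double)(1UL << 53);
    }
}
```

Usage: `bool toServiceA = AbBucket.StableFraction(phoneNumber) > 0.5;`. Because the value comes straight from the hash, the same phone number always lands in the same bucket without depending on Random's seeding algorithm.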
Background
I am converting media files to a new format and need a way of knowing whether I've already converted a file during the current run.
My solution
To hash each file and store the hash in an array. Each time I go to convert a file I hash it and check the hash against the hashes stored in the array.
Problem
My logic doesn't seem able to detect when I've already seen a file and I end up converting the same file multiple times.
Code
//Byte array of already processed files
private static readonly List<byte[]> Bytelist = new List<byte[]>();
public static bool DoCheck(string file)
{
FileInfo info = new FileInfo(file);
while (FrmMain.IsFileLocked(info)) //Make sure file is finished being copied/moved
{
Thread.Sleep(500);
}
//Get byte sig of file and if seen before dont process
byte[] myFileData = File.ReadAllBytes(file);
byte[] myHash = MD5.Create().ComputeHash(myFileData);
if (Bytelist.Count != 0)
{
foreach (var item in Bytelist)
{
//If seen before ignore
if (myHash == item)
{
return true;
}
}
}
Bytelist.Add(myHash);
return false;
}
Question
Is there a more efficient way of achieving my end goal? What am I doing wrong?
There are multiple questions, I'm going to answer the first one:
Is there a more efficient way of achieving my end goal?
TL;DR yes.
You're only storing and comparing hashes, and computing a hash means reading and processing the entire file, which is a really expensive operation. You can do cheaper checks before calculating the hash:
Is the file size the same as that of a processed file? If not, the file is new; if it is, go to the next check.
Are the first bunch of bytes the same? If not, the file is new; if they are, go to the next check.
At this point you have to check the hashes (MD5).
Of course, you will have to store the size, the first X bytes and the hash for each processed file.
In addition, the same MD5 doesn't guarantee that the files are identical, so you might want to take an extra step to check whether they really are the same. This might be overkill, though: it depends on how heavy the cost of reprocessing a file is, and it might be more important to avoid calculating expensive hashes.
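The staged checks (size, then first bytes, then MD5) can be sketched as follows. ProcessedFileCache, CheckAndAdd and the 4096-byte prefix are hypothetical choices, and the sketch assumes previously recorded files are still on disk, because their hashes are only computed lazily, when a new file matches on size and prefix:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public sealed class ProcessedFileCache
{
    private const int PrefixLength = 4096; // hypothetical "first bytes" window

    private sealed class Entry
    {
        public string FilePath;
        public byte[] Prefix;
        public byte[] Hash; // null until it is first needed
    }

    // Files grouped by size, so only same-size files are ever compared further.
    private readonly Dictionary<long, List<Entry>> bySize = new Dictionary<long, List<Entry>>();

    // Returns true if a file with the same size, prefix and MD5 was recorded
    // before; otherwise records this file and returns false.
    public bool CheckAndAdd(string path)
    {
        long size = new FileInfo(path).Length;
        byte[] prefix = ReadPrefix(path);
        byte[] hash = null;

        if (bySize.TryGetValue(size, out List<Entry> sameSize))
        {
            foreach (Entry e in sameSize)
            {
                if (!e.Prefix.SequenceEqual(prefix)) continue; // cheap check failed
                hash = hash ?? ComputeMd5(path);               // expensive check
                e.Hash = e.Hash ?? ComputeMd5(e.FilePath);
                if (e.Hash.SequenceEqual(hash)) return true;
            }
        }
        else
        {
            sameSize = new List<Entry>();
            bySize[size] = sameSize;
        }

        sameSize.Add(new Entry { FilePath = path, Prefix = prefix, Hash = hash });
        return false;
    }

    private static byte[] ReadPrefix(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            var buffer = new byte[PrefixLength];
            int total = 0, read;
            while (total < buffer.Length && (read = fs.Read(buffer, total, buffer.Length - total)) > 0)
            {
                total += read;
            }
            Array.Resize(ref buffer, total); // shorter files yield a shorter prefix
            return buffer;
        }
    }

    private static byte[] ComputeMd5(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream fs = File.OpenRead(path))
        {
            return md5.ComputeHash(fs);
        }
    }
}
```

Hashing through a FileStream also avoids loading the whole media file into memory with File.ReadAllBytes.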
EDIT: As for the second question, your check is likely to fail because you are comparing the references of two byte arrays, which will never be equal since you create a new array every time. You need a sequence-equality comparison between the byte[] arrays (or convert the hash to a string and compare strings instead):
var exists = Bytelist.Any(hash => hash.SequenceEqual(myHash));
Are you sure the new file format doesn't add extra metadata to the content, like a last-modified date or attributes that change?
Also, if you are converting to a known format, there should be a way to use a file signature to tell whether a file is already in this format; if it is your own format, add some extra signature bytes to identify it.
Don't forget that with your approach, if your app is closed and opened again, it will reprocess all files.
One last point regarding the code: I prefer not to store byte arrays, but if you do, it's better to use a HashSet instead of a List, since lookups are O(1). Note that a HashSet<byte[]> compares arrays by reference by default, so it needs a content-based IEqualityComparer<byte[]> (or you can store the hashes as strings).
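Because byte[] uses reference equality, a HashSet<byte[]> only finds a hash stored in the very same array instance. A minimal content-based comparer could look like this; ByteArrayComparer is a hypothetical name, not a framework type:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Compares byte arrays by content instead of by reference, so a
// HashSet<byte[]> can find a hash that was computed into a new array.
public sealed class ByteArrayComparer : IEqualityComparer<byte[]>
{
    public bool Equals(byte[] x, byte[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y); // also returns false for different lengths
    }

    public int GetHashCode(byte[] obj)
    {
        // Combine every byte so that equal contents give equal hash codes.
        unchecked
        {
            int hash = 17;
            foreach (byte b in obj)
            {
                hash = hash * 31 + b;
            }
            return hash;
        }
    }
}
```

Usage: `var seen = new HashSet<byte[]>(new ByteArrayComparer());` and then `if (!seen.Add(myHash)) { /* already processed */ }`, since Add returns false when an equal element is already present.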
There's a lot of room for improvement with regard to efficiency, effectiveness and style, but this isn't CodeReview.SE, so I'll try to stick to the problem at hand:
You're checking whether two byte arrays are equivalent by using the == operator. But that only performs reference equality testing, i.e. it tests whether the two variables point to the same instance, the very same array. That, of course, won't work here.
There are many ways to do it, starting with a simple foreach loop over the arrays (probably with an optimization that checks the length first) or using Enumerable.SequenceEqual, as you can find in this answer here.
Better yet, convert your hash's byte[] to a string (any string; Convert.ToBase64String would be a good choice) and store that in your Bytelist cache (which should be a HashSet, not a List). Strings are optimized for this sort of comparison, and you won't run into the "reference equality" problem.
So a sample solution would be this:
private static readonly HashSet<string> _computedHashes = new HashSet<string>();
public static bool DoCheck(string file)
{
// ... (rest of the original method, e.g. waiting until the file is unlocked)
//Get byte sig of file and if seen before dont process
byte[] myFileData = File.ReadAllBytes(file);
byte[] myHash = MD5.Create().ComputeHash(myFileData);
string hashString = Convert.ToBase64String(myHash);
return _computedHashes.Contains(hashString);
}
Presumably, you'll add the hash to the _computedHashes set after you've done the conversion.
You have to compare the byte arrays item by item:
foreach (var item in Bytelist)
{
//If seen before ignore
if (myHash.Length == item.Length)
{
bool isequal = true;
for (int i = 0; i < myHash.Length; i++)
{
if (myHash[i] != item[i])
{
isequal = false;
break; // no need to compare the remaining bytes
}
}
if (isequal)
{
return true;
}
}
}
Is there a way to generate a unique alphanumeric key (12 characters) to be used in URLs in C#? I have a set of strings that are unique from each other, but I cannot use them directly as they might change, and then the URL would break. I have a couple of approaches:
a) Use the primary key of the database table row that corresponds to the above set of strings, but this seems like a security issue, as it would expose the DB structure.
b) Use a Guid, but then again it is not dependent on the data.
Any help will be appreciated.
Short Answer: No.
What you're trying is not possible. You would have to keep track of the ids that you've already created. This is what a database does with index columns that increment. I also understand that URL shortening tools take new keys from a pool of generated unique ones.
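The "keep track of the ids you've already created" approach can be sketched as below: generate random 12-character keys and retry whenever a key was already issued. UrlKeyIssuer and NextKey are hypothetical names, and the in-memory HashSet is an assumption standing in for what would normally be a database table with a unique index:

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public sealed class UrlKeyIssuer
{
    // Same 30-character alphabet as the seeded-Random answer in this thread.
    private const string Alphabet = "2345679abcdefghjkmnpqrstuvwxyz";

    // Stand-in for persistent storage of every key handed out so far.
    private readonly HashSet<string> issued = new HashSet<string>();

    public string NextKey(int length = 12)
    {
        while (true)
        {
            string key = RandomKey(length);
            if (issued.Add(key)) // Add returns false if the key was already issued
            {
                return key;
            }
        }
    }

    private static string RandomKey(int length)
    {
        // Largest multiple of the alphabet size that fits in a byte; bytes at or
        // above it are rejected so that every character is equally likely.
        int limit = 256 - 256 % Alphabet.Length;
        var chars = new char[length];
        var buffer = new byte[1];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        {
            int i = 0;
            while (i < length)
            {
                rng.GetBytes(buffer);
                if (buffer[0] >= limit)
                {
                    continue;
                }
                chars[i++] = Alphabet[buffer[0] % Alphabet.Length];
            }
        }
        return new string(chars);
    }
}
```

The key is then stored next to the row it identifies, so it stays fixed even if the underlying strings change; uniqueness comes from the lookup, not from the random generator.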
All that being said, something like this DotNetFiddle might work and so might some of the other answers.
In the fiddle, the first example hashes the primary key. It is the full hash that makes finding two inputs with the same value computationally infeasible; because we use only a substring of the hash, uniqueness is not guaranteed, but it may be close.
Here is what MSDN has to say about hash uniqueness.
A cryptographic hash function has the property that it is computationally infeasible to find two distinct inputs that hash to the same value.
In the second example, we're using the time. Since time keeps increasing, the result should be unique as far as I know, as long as no two keys are generated within the same millisecond (note also that the format has no year component, so values repeat after a year). This will work if you can rely on the time being accurate. But if you're going to rely on an external resource like the server time, then maybe you should be using an auto-incrementing index in a database table or a simple flat file.
using System;
using System.Text;
using System.Security.Cryptography;
public class Program
{
public static void Main()
{
UseAHash();
UseTime();
}
public static void UseAHash()
{
var primaryKey = 123345;
HashAlgorithm algorithm = SHA1.Create();
var hash = algorithm.ComputeHash(Encoding.UTF8.GetBytes(primaryKey.ToString()));
StringBuilder sb = new StringBuilder();
for (var i = 0; i < 6; ++i)
{
sb.Append(hash[i].ToString("X2"));
}
Console.WriteLine(sb);
}
public static void UseTime()
{
StringBuilder builder = new StringBuilder();
// use UTC to avoid the daylight-saving to standard time change.
var now = DateTime.UtcNow;
builder.Append(now.DayOfYear.ToString("D3"));
builder.Append(now.Hour.ToString("D2"));
builder.Append(now.Minute.ToString("D2"));
builder.Append(now.Second.ToString("D2"));
builder.Append(now.Millisecond.ToString("D3"));
Console.WriteLine("Length: " + builder.Length);
Console.WriteLine("Result: " + builder);
}
}
You can use the key from the database to seed a random generator, and use that to create a key:
int id = 42;
string chars = "2345679abcdefghjkmnpqrstuvwxyz";
Random rnd = new Random(id);
string key = new String(Enumerable.Range(0, 12).Select(n => chars[rnd.Next(chars.Length)]).ToArray());
Note: This is not guaranteed to be unique. I tested the values from 1 to 10000000 though, and there are no duplicates there.
Simple. Create a new GUID, associate it with an entity from the database, then add it to a database table.
public class FooGuid
{
[Key] public Guid Url { get; set; }
public Foo Foo { get; set; }
}
Guid urlpart = ...
Foo foo = dbContext.FooGuids
.Where(f => f.Url == urlpart)
.Select(f => f.Foo)
.Single();
I am new to C#. I want to automatically generate a unique number inside a text box, which I can use as a reference number for a form that does asset registration. This reference number will be used as a unique identifier for each registered asset and will also be given to the asset owner for reference's sake.
To do this, you can use a Guid (globally unique identifier). The chance that the value of a new Guid will be all zeros or equal to any other Guid is very low.
public static void Main()
{
Guid g = Guid.NewGuid();
Console.WriteLine(g);
}
You can find more about this here:
http://msdn.microsoft.com/en-us/library/system.guid.newguid(v=vs.110).aspx
Have you considered using GUIDs? They are pretty easy to generate and reasonably unique.
// This code example demonstrates the Guid.NewGuid() method.
using System;
class Sample
{
public static void Main()
{
Guid g;
// Create and display the value of two GUIDs.
g = Guid.NewGuid();
Console.WriteLine(g);
Console.WriteLine(Guid.NewGuid());
}
}
/*
This code example produces the following results:
0f8fad5b-d9cb-469f-a165-70867728950e
7c9e6679-7425-40de-944b-e07fc1f90ae7
*/
You can use a Guid.
Guid temp;
temp = Guid.NewGuid();
textBox1.Text = temp.ToString().Replace("-", "");
But be aware: truly unique number generation is impossible.
There are other ways, like the Random class.
You can use a timestamp along with a new GUID.
string uniqueKey = string.Concat(DateTime.Now.ToString("yyyyMMddHHmmssf"), Guid.NewGuid().ToString());
If you really need a number instead of a string as a unique key, you can use the timestamp alone with the following strategy, which makes it unique for any given time: the lock ensures that no two threads run the code at the same time, and Thread.Sleep ensures that successive calls get distinct times at a tenth-of-a-second resolution (within a single process).
static readonly object lockerObject = new object();
static string GetUniqueKey()
{
lock (lockerObject)
{
string key = DateTime.Now.ToString("yyyyMMddHHmmssf");
// Sleep inside the lock before returning, so the next caller sees a later tenth of a second.
Thread.Sleep(100);
return key;
}
}
Or, I found a way to do it without a timestamp, from here, as follows:
public long GetUniqueKey()
{
byte[] buffer = Guid.NewGuid().ToByteArray();
return BitConverter.ToInt64(buffer, 0);
}
I've decided to implement a caching facade in one of our applications - the purpose is to eventually reduce the network overhead and limit the amount of db hits. We are using Castle.Windsor as our IoC Container and we have decided to go with Interceptors to add the caching functionality on top of our services layer using the System.Runtime.Caching namespace.
At this moment I can't exactly figure out what's the best approach for constructing the cache key. The goal is to make a distinction between different methods and also include passed argument values - meaning that these two method calls should be cached under two different keys:
IEnumerable<MyObject> GetMyObjectByParam(56); // key1
IEnumerable<MyObject> GetMyObjectByParam(23); // key2
For now I can see two possible implementations:
Option 1:
assembly | class | method return type | method name | argument types | argument hash codes
"MyAssembly.MyClass IEnumerable<MyObject> GetMyObjectByParam(long) { 56 }";
Option 2:
MD5 or SHA-256 computed hash based on the method's fully-qualified name and passed argument values
string key = Convert.ToBase64String(SHA256.Create().ComputeHash(Encoding.UTF8.GetBytes(name + args)));
I'm thinking about the first option as the second one requires more processing time - on the other hand the second option enforces exactly the same 'length' of all generated keys.
Is it safe to assume that the first option will generate a unique key for methods using complex argument types? Or maybe there is a completely different way of doing this?
Help and opinions will be highly appreciated!
Based on some very useful links that I've found here and here I've decided to implement it more-or-less like this:
public sealed class CacheKey : IEquatable<CacheKey>
{
    private readonly Type reflectedType;
    private readonly Type returnType;
    private readonly string name;
    private readonly Type[] parameterTypes;
    private readonly object[] arguments;

    public CacheKey(Type reflectedType, Type returnType, string name,
        Type[] parameterTypes, object[] arguments)
    {
        // check for null, incorrect values etc.
        this.reflectedType = reflectedType;
        this.returnType = returnType;
        this.name = name;
        this.parameterTypes = parameterTypes;
        this.arguments = arguments;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as CacheKey);
    }

    public bool Equals(CacheKey other)
    {
        if (other == null)
        {
            return false;
        }
        // Arrays of different lengths can never match (and would otherwise throw).
        if (parameterTypes.Length != other.parameterTypes.Length ||
            arguments.Length != other.arguments.Length)
        {
            return false;
        }
        for (int i = 0; i < parameterTypes.Length; i++)
        {
            if (!parameterTypes[i].Equals(other.parameterTypes[i]))
            {
                return false;
            }
        }
        for (int i = 0; i < arguments.Length; i++)
        {
            // object.Equals(a, b) also handles null arguments.
            if (!object.Equals(arguments[i], other.arguments[i]))
            {
                return false;
            }
        }
        return reflectedType.Equals(other.reflectedType) &&
            returnType.Equals(other.returnType) &&
            name.Equals(other.name);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + reflectedType.GetHashCode();
            hash = hash * 31 + returnType.GetHashCode();
            hash = hash * 31 + name.GetHashCode();
            for (int i = 0; i < parameterTypes.Length; i++)
            {
                hash = hash * 31 + parameterTypes[i].GetHashCode();
            }
            for (int i = 0; i < arguments.Length; i++)
            {
                // Null arguments contribute 0 to the hash.
                hash = hash * 31 + (arguments[i]?.GetHashCode() ?? 0);
            }
            return hash;
        }
    }
}
Basically it's just a general idea - the above code can be easily rewritten to a more generic version with one collection of Fields - the same rules would have to be applied on each element of the collection. I can share the full code.
An option you seem to have skipped is using .NET's built-in GetHashCode() function for the string. I'm fairly certain this is what goes on behind the scenes in a C# Dictionary with a String as the <TKey> (I mention that because you've tagged the question with dictionary). I'm not sure how the .NET Dictionary class relates to Castle.Windsor or the System.Runtime.Caching interface you mention.
The reason you wouldn't want to use GetHashCode as a hash key is that Microsoft explicitly disclaims its output, which may change between versions without warning (e.g. to provide a more unique or faster function). If this cache will live strictly in memory, then this is not a concern, because upgrading the .NET framework would necessitate a restart of your application, wiping the cache. (On .NET Core and later, string hash codes are also randomized per process, which likewise only matters if the keys outlive the process.)
To clarify, just using the concatenated string (Option 1) should be sufficiently unique. It looks like you've added everything possible to uniquely qualify your methods.
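A minimal sketch of building such an Option 1 key from a MethodInfo and the argument values, as an interceptor might receive them. CacheKeys and Build are hypothetical names, and the sketch assumes every argument's ToString() reflects its value: complex types that only inherit object.ToString() would all render as their type name, and culture-sensitive values (dates, decimals) or arguments containing the separators could also make keys ambiguous.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Text;

public static class CacheKeys
{
    // Builds "declaring type|return type|method name|parameter types|argument values".
    public static string Build(MethodInfo method, object[] args)
    {
        var sb = new StringBuilder();
        sb.Append(method.DeclaringType?.FullName).Append('|');
        sb.Append(method.ReturnType.FullName).Append('|');
        sb.Append(method.Name).Append('|');
        sb.Append(string.Join(",", method.GetParameters().Select(p => p.ParameterType.FullName)));
        sb.Append('|');
        // Relies on ToString() of each argument; null is written as a marker.
        sb.Append(string.Join(",", args.Select(a => a == null ? "<null>" : a.ToString())));
        return sb.ToString();
    }
}
```

For example, for `Math.Abs(int)` called with 56, the key is `System.Math|System.Int32|Abs|System.Int32|56`, and a call with 23 gets a different key, mirroring the key1/key2 requirement in the question.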
If you end up feeding the string of an MD5 or SHA256 hash into a dictionary key, the program would probably rehash the string behind the scenes anyway. It's been a while since I read about the inner workings of the Dictionary class. If you leave it as a Dictionary<String, IEnumerable<MyObject>> (as opposed to calling GetHashCode() on the strings yourself and using the int return value as the key), then the dictionary should handle hash code collisions itself.
Also note that (at least according to a benchmark program run on my machine), MD5 is around 10% faster than SHA1 and twice as fast as SHA256. String.GetHashCode() is around 20 times faster than MD5 (it's not cryptographically secure). Tests were taken for the total time to compute the hashes for the same 100,000 randomly generated strings of length between 32 and 1024 characters. But regardless of the exact numbers, using a cryptographically secure hash function as a key will only slow down your program.
I can post the source code for my comparisons if you like.