How can I get the numeric representation of a string in C#? To be clear, I do not want the address of the pointer, and I do not want to parse an int from a string; I want the numeric representation of the value of the string.
The reason I want this is because I am trying to generate a hash code based on a file path (path) and a number (line). I essentially want to do this:
String path;
int line;
public override int GetHashCode() {
return line ^ (int)path;
}
I'm open to suggestions for a better method, but because I'm overriding the Equals() method for the type I'm creating (to check that both objects' path and line are the same), I need to reflect that in the override of GetHashCode.
Edit: Obviously this method is bad; that has been pointed out to me and I get that. The answer below is perfect. However, it does not entirely answer my question. I am still curious whether there is a simple way to get an integer representation of the value of a string. I know that I could iterate through the string, append the binary representation of each char to a StringBuilder and convert that string to an int, but is there a cleaner way?
Edit 2: I'm aware that this is a strange and very limited question. Converting this way limits the size of the string to 2 chars (two 16-bit chars = one 32-bit int), but it was the concept I was getting at, not the practicality. Essentially, the method works, regardless of how obscure and useless it may be.
If all you want is a hash code, why not get the hash code of the string too? Every object in .NET has a GetHashCode() method:
public override int GetHashCode() {
return line ^ path.GetHashCode();
}
For the purposes of GetHashCode, you should absolutely call GetHashCode. However, to answer the question as asked (after clarification in comments) here are two options, returning BigInteger (as otherwise you'd only get two characters in before probably overflowing):
static BigInteger ConvertToBigInteger(string input)
{
byte[] bytes = Encoding.BigEndianUnicode.GetBytes(input);
// BigInteger constructor expects a little-endian byte array
Array.Reverse(bytes);
return new BigInteger(bytes);
}
static BigInteger ConvertToBigInteger(string input)
{
BigInteger sum = 0;
foreach (char c in input)
{
sum = (sum << 16) + (int) c;
}
return sum;
}
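(These two approaches give the same result as long as the first character is below U+8000. For a higher first character the top bit of the byte array is set, so the BigInteger constructor in the first approach reads it as a sign bit and returns a negative number. The first is more efficient, but the second is probably easier to understand.)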
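As a rough check of the two approaches, here is a minimal sketch that puts both side by side. The class name StringNumber and the method names FromBytes and FromShifts are made up for this example. FromBytes also appends one zero byte, which is not in the code above; it is an assumption that an unsigned result is wanted, and it stops BigInteger from treating a leading character of U+8000 or above as a sign bit.

```csharp
using System;
using System.Numerics;
using System.Text;

public static class StringNumber
{
    // Byte-array route: UTF-16 big-endian bytes, reversed into the
    // little-endian order that the BigInteger(byte[]) constructor expects.
    public static BigInteger FromBytes(string input)
    {
        byte[] bigEndian = Encoding.BigEndianUnicode.GetBytes(input);
        // One extra element, left at zero, ends up as the most significant
        // byte and keeps the value non-negative (assumption: an unsigned
        // result is wanted).
        byte[] littleEndian = new byte[bigEndian.Length + 1];
        for (int i = 0; i < bigEndian.Length; i++)
        {
            littleEndian[i] = bigEndian[bigEndian.Length - 1 - i];
        }
        return new BigInteger(littleEndian);
    }

    // Arithmetic route: shift the running total by 16 bits per character.
    public static BigInteger FromShifts(string input)
    {
        BigInteger sum = BigInteger.Zero;
        foreach (char c in input)
        {
            sum = (sum << 16) + (int)c;
        }
        return sum;
    }
}
```

For "AB" both methods return 4259906 (0x00410042, that is 65 * 65536 + 66), and for the single character "\u8000" both return 32768.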
I have been given some C# code which defines a private string method, but honestly I am not sure what it is doing. I need to convert it to VB for my project, and wondered if someone might take a moment to explain it and possibly provide a conversion?
private string GetChecksum(StringBuilder buf)
{
// calculate checksum of message
uint sum = 0;
for (int i = 0; i < buf.Length; i++)
{
sum += (char)buf[i];
}
return string.Format("{0:X04}", sum);
}
The part with private string ... is the method declaration. C#'s
Accessibility ReturnType MethodName(Type paramName)
translates to
Accessibility Function MethodName(paramName As Type) As ReturnType
Private Function GetChecksum(buf As StringBuilder) As String
'calculate checksum of message
Dim sum As UInteger = 0
For i As Integer = 0 To buf.Length - 1
sum += CUInt(AscW(buf(i))) ' VB cannot add a Char to a number directly; AscW returns its code value
Next
Return String.Format("{0:X04}", sum)
End Function
What the function does is add up the character codes (UTF-16 values, which match ASCII for ASCII text) of each character in the string, accumulating them in a 4-byte unsigned integer without overflow checking, and return the result as a string: the hexadecimal representation of the sum, zero-padded to at least 4 digits.
A checksum is used to detect data errors; if two strings yield different checksums then they cannot be equal. Two strings that give the same checksum, however, are not necessarily equal, so it cannot be used to verify equality.
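To make the last point concrete, here is a small sketch of the same summing logic. The class name ChecksumDemo is made up for this example, and it formats with "X4" (hex, padded to four digits) rather than the original format string. Because addition ignores order, any two strings with the same characters in a different order produce the same checksum.

```csharp
using System;
using System.Text;

public static class ChecksumDemo
{
    // Sums the UTF-16 code of every character into a 32-bit unsigned
    // integer and formats the total as at least four hex digits.
    public static string GetChecksum(StringBuilder buf)
    {
        uint sum = 0;
        for (int i = 0; i < buf.Length; i++)
        {
            sum += buf[i];
        }
        return sum.ToString("X4");
    }
}
```

GetChecksum(new StringBuilder("ab")) and GetChecksum(new StringBuilder("ba")) both return "00C3" (97 + 98 = 195), even though the strings differ.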
I have an integer value. I want to convert it to the Base 64 value. I tried the following code.
byte[] b = BitConverter.GetBytes(123);
string str = Convert.ToBase64String(b);
Console.WriteLine(str);
It gives the output "ewAAAA==", which is 8 characters long.
I convert the same value to base 16 as follows:
int decvalue = 123;
string hex = decvalue.ToString("X");
Console.WriteLine(hex);
The output of the previous code is 7B.
If we do this by hand, the results are the same. Why do they differ? How can I get the equivalent value in base 64 as well? (I found the above base 64 conversion on the internet.)
The question is rather unclear... "How is it differ?" - well, in many different ways:
one is base-16, the other is base-64 (hence they are fundamentally different anyway)
one is doing an arithmetic representation; one is a byte serialization format - very different
one is using little-endian arithmetic (assuming a standard CPU), the other is using big-endian arithmetic
To get a comparable base-64 result, you probably need to code it manually (since Convert only supports base-2, base-8, base-10 and base-16 for arithmetic conversions). Perhaps (note: not optimized):
static void Main()
{
string b64 = ConvertToBase64Arithmetic(123);
}
// uint because I don't care to worry about sign
static string ConvertToBase64Arithmetic(uint i)
{
const string alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
StringBuilder sb = new StringBuilder();
do
{
sb.Insert(0, alphabet[(int)(i % 64)]);
i = i / 64;
} while (i != 0);
return sb.ToString();
}
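To see the two views of 123 next to each other, here is a minimal sketch. The class name Base64Views and the method names Arithmetic and Serialized are made up for this example; Arithmetic repeats the logic of ConvertToBase64Arithmetic, and Serialized is just a thin wrapper over Convert.ToBase64String.

```csharp
using System;
using System.Text;

public static class Base64Views
{
    const string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Arithmetic view: repeatedly divide by 64, most significant digit first.
    public static string Arithmetic(uint value)
    {
        var sb = new StringBuilder();
        do
        {
            sb.Insert(0, Alphabet[(int)(value % 64)]);
            value /= 64;
        } while (value != 0);
        return sb.ToString();
    }

    // Serialization view: encodes whatever bytes it is given, 3 bytes per
    // 4 output characters, with '=' padding at the end.
    public static string Serialized(byte[] bytes)
    {
        return Convert.ToBase64String(bytes);
    }
}
```

Arithmetic(123) returns "B7" (1 * 64 + 59), the base-64 counterpart of hex 7B. Serialized(new byte[] { 0x7B, 0, 0, 0 }) returns "ewAAAA==", because it encodes all four bytes of the little-endian int, including the three zero bytes.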
I want to make a list of pointers to locations that contains a certain value in the process memory of another process. The value can be a short, int, long, string, bool or something else.
My idea is to use generics for this. I have one problem with it: how can I tell the compiler which type it needs to convert the byte array to?
This is what I made:
public List<IntPtr> ScanProccessFor<T>(T ItemToScanFor)
{
List<IntPtr> Output = new List<IntPtr>();
IntPtr StartOffset = SelectedProcess.MainModule.BaseAddress;
int ScanSize = SelectedProcess.MainModule.ModuleMemorySize;
for (int i = 0; i < ScanSize; i++)
if (ReadMemory(SelectedProcess, StartOffset + i, (UInt16)Marshal.SizeOf(ItemToScanFor)) == ItemToScanFor)
Output.Insert(Output.Count,StartOffset + i);
return Output;
}
How can I tell the compiler that it needs to convert the byte[] to type T?
Your question is a little bit confusing, but I'll try to answer what I can.
Instead of taking a generic type, I would probably write a method that takes an instance of an interface like IConvertableToByteArray or something.
public interface IConvertableToByteArray
{
    byte[] ToByteArray();
}
Then, if you needed to allow a specific type to be compatible with that method, you could make an encapsulating class:
public class IntConvertableToByteArray : IConvertableToByteArray
{
    public int Value { get; set; }
    public byte[] ToByteArray()
    {
        // Insert your own logic here; for example, the raw bytes
        // in the machine's native byte order:
        return BitConverter.GetBytes(Value);
    }
}
You could use Marshal.StructureToPtr to get an unmanaged representation of the structure (which has to be a 'simple' structure). You might need to special case strings though.
You should also think about the alignment constraints on what you are searching for -- advancing through memory 1 byte at a time will be very slow and wasteful if the item must be 4 or 8 byte aligned.
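The two suggestions above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class MemoryScanSketch and its methods ToBytes and FindAll are made-up names, the scan runs over a byte[] that has already been read from the target process (reading it is not shown), and ToBytes is only meant for simple blittable values such as int or long, not for strings, bool or char.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class MemoryScanSketch
{
    // Copies the unmanaged representation of a simple value into a byte[]
    // via Marshal.StructureToPtr. Assumes T is a blittable type.
    public static byte[] ToBytes<T>(T value) where T : struct
    {
        int size = Marshal.SizeOf(typeof(T));
        byte[] bytes = new byte[size];
        IntPtr buffer = Marshal.AllocHGlobal(size);
        try
        {
            Marshal.StructureToPtr(value, buffer, false);
            Marshal.Copy(buffer, bytes, 0, size);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
        return bytes;
    }

    // Returns every offset in haystack where needle occurs, checking only
    // offsets that are multiples of alignment (1 checks every byte).
    public static List<int> FindAll(byte[] haystack, byte[] needle, int alignment)
    {
        var offsets = new List<int>();
        if (needle.Length == 0 || alignment < 1)
        {
            return offsets;
        }
        for (int i = 0; i + needle.Length <= haystack.Length; i += alignment)
        {
            bool match = true;
            for (int j = 0; j < needle.Length; j++)
            {
                if (haystack[i + j] != needle[j])
                {
                    match = false;
                    break;
                }
            }
            if (match)
            {
                offsets.Add(i);
            }
        }
        return offsets;
    }
}
```

A caller would convert the value once, for example byte[] needle = MemoryScanSketch.ToBytes(123), and then call FindAll(buffer, needle, 4) to test only 4-byte-aligned positions instead of every byte.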
I am having a problem with hash collisions using short strings in .NET4.
EDIT: I am using the built-in string hashing function in .NET.
I'm implementing a cache using objects that store the direction of a conversion like this
public class MyClass
{
private string _from;
private string _to;
// More code here....
public MyClass(string from, string to)
{
this._from = from;
this._to = to;
}
public override int GetHashCode()
{
return string.Concat(this._from, this._to).GetHashCode();
}
public bool Equals(MyClass other)
{
return this.To == other.To && this.From == other.From;
}
public override bool Equals(object obj)
{
if (obj == null) return false;
if (this.GetType() != obj.GetType()) return false;
return Equals(obj as MyClass);
}
}
This is direction dependent and the from and to are represented by short strings like "AAB" and "ABA".
I am getting sporadic hash collisions with these small strings. I have tried something simple like adding a salt (did not work).
The problem is that too many of my small strings, like "AABABA", collide with their reverse, like "ABAAAB". (Note that these are not real examples; I have no idea if AAB and ABA actually cause collisions!)
I have also gone heavy duty and implemented MD5 (which works, but is MUCH slower).
I have also implemented the suggestion from Jon Skeet here:
Should I use a concatenation of my string fields as a hash code?
This works but I don't know how dependable it is with my various 3-character strings.
How can I improve and stabilize the hashing of small strings without adding too much overhead like MD5?
EDIT: In response to a few of the answers posted... the cache is implemented using concurrent dictionaries keyed on MyClass as stubbed out above. If I replace the GetHashCode in the code above with something simple like Jon Skeet's code from the link I posted:
int hash = 17;
hash = hash * 23 + this._from.GetHashCode();
hash = hash * 23 + this._to.GetHashCode();
return hash;
Everything functions as expected.
It's also worth noting that in this particular use-case the cache is not used in a multi-threaded environment so there is no race condition.
EDIT: I should also note that this misbehavior is platform dependent. It works as intended on my fully updated Win7x64 machine but does not behave properly on a non-updated Win7x64 machine. I don't know the extent of the missing updates, but I know it doesn't have Win7 SP1, so I assume a framework SP or update may be missing as well.
EDIT: As suggested, my issue was not caused by a problem with the hashing function. I had an elusive race condition, which is why it worked on some computers but not others, and also why a "slower" hashing method made things work properly. The answer I selected was the most useful in understanding why my problem was not hash collisions in the dictionary.
Are you sure that collisions are what causes the problems? When you say
I finally discovered what was causing this bug
do you mean some slowness in your code, or something else? If it is something else, I'm curious what kind of problem it is, because any hash function (except "perfect" hash functions on limited domains) will produce collisions.
I put together a quick piece of code to check for collisions among 3-character strings, and it doesn't report a single collision for them. You see what I mean? It looks like the built-in hash algorithm is not so bad.
Dictionary<int, bool> set = new Dictionary<int, bool>();
char[] buffer = new char[3];
int count = 0;
for (int c1 = (int)'A'; c1 <= (int)'z'; c1++)
{
buffer[0] = (char)c1;
for (int c2 = (int)'A'; c2 <= (int)'z'; c2++)
{
buffer[1] = (char)c2;
for (int c3 = (int)'A'; c3 <= (int)'z'; c3++)
{
buffer[2] = (char)c3;
string str = new string(buffer);
count++;
int hash = str.GetHashCode();
if (set.ContainsKey(hash))
{
Console.WriteLine("Collision for {0}", str);
}
set[hash] = false;
}
}
}
Console.WriteLine("Generated {0} of {1} hashes", set.Count, count);
While you could pick almost any well-known hash function (as David mentioned), or even choose a "perfect" hash since it looks like your domain is limited (like a minimal perfect hash), it would be great to understand whether the source of the problems is really collisions.
Update
What I want to say is that the .NET built-in hash function for strings is not so bad. It doesn't give so many collisions that you would need to write your own algorithm in regular scenarios. And this doesn't depend on the length of the strings. If you have a lot of 6-symbol strings, that doesn't imply that your chances of seeing a collision are higher than with 1000-symbol strings. This is one of the basic properties of hash functions.
And again, another question is what kind of problems you experience because of collisions. All built-in hashtables and dictionaries support collision resolution. So I would say all you can see is just... probably some slowness. Is this your problem?
As for your code
return string.Concat(this._from, this._to).GetHashCode();
This can cause problems, because on every hash code calculation you create a new string. Maybe this is what causes your issues?
int hash = 17;
hash = hash * 23 + this._from.GetHashCode();
hash = hash * 23 + this._to.GetHashCode();
return hash;
This would be a much better approach, simply because you don't create new objects on the heap. Actually that is one of the main points of this approach: getting a good hash code for an object with a complex "key" without creating new objects. So if you don't have a single-value key, this should work for you. BTW, this is not a new hash function; it is just a way to combine existing hash values without compromising the main properties of hash functions.
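Put together as a complete key type, the combining approach might look like the sketch below. The class name ConversionKey is made up for this example (it stands in for MyClass from the question), and the unchecked block is an addition so that the multiplication wraps around instead of throwing if the project is compiled with overflow checking.

```csharp
using System;

public sealed class ConversionKey : IEquatable<ConversionKey>
{
    private readonly string _from;
    private readonly string _to;

    public ConversionKey(string from, string to)
    {
        if (from == null) throw new ArgumentNullException("from");
        if (to == null) throw new ArgumentNullException("to");
        _from = from;
        _to = to;
    }

    public bool Equals(ConversionKey other)
    {
        return !ReferenceEquals(other, null)
            && _from == other._from
            && _to == other._to;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as ConversionKey);
    }

    // Combines the two existing string hashes without allocating a new
    // string; overflow is expected and simply wraps around.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + _from.GetHashCode();
            hash = hash * 23 + _to.GetHashCode();
            return hash;
        }
    }
}
```

Two keys built from the same pair of strings compare equal and return the same hash code, so they land on the same dictionary entry, while new ConversionKey("foot", "ar") and new ConversionKey("foo", "tar") are not equal.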
Any common hash function should be suitable for this purpose. If you're getting collisions on short strings like that, I'd say you're using an unusually bad hash function. You can use Jenkins or Knuth's with no issues.
Here's a very simple hash function that should be adequate. (Implemented in C, but should easily port to any similar language.)
unsigned int hash(const char *it)
{
unsigned hval=0;
while(*it!=0)
{
hval+=*it++;
hval+=(hval<<10);
hval^=(hval>>6);
hval+=(hval<<3);
hval^=(hval>>11);
hval+=(hval<<15);
}
return hval;
}
Note that if you want to trim the bits of the output of this function, you must use the least significant bits. You can also use mod to reduce the output range. The last character of the string tends to only affect the low-order bits. If you need a more even distribution, change return hval; to return hval * 2654435761U;.
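For a C# caller, a rough port of the function above could look like this. It is a sketch, not a drop-in replacement: the names SimpleHash, Hash and Bucket are made up, it feeds in UTF-16 char values where the C version reads bytes (so results only match for plain ASCII text), and Bucket assumes bucketCount is greater than zero.

```csharp
public static class SimpleHash
{
    // Same mixing steps as the C function, applied once per character.
    public static uint Hash(string text)
    {
        unchecked
        {
            uint hval = 0;
            foreach (char c in text)
            {
                hval += c;
                hval += hval << 10;
                hval ^= hval >> 6;
                hval += hval << 3;
                hval ^= hval >> 11;
                hval += hval << 15;
            }
            return hval;
        }
    }

    // Reduces the hash to a bucket index with mod, as suggested above.
    public static uint Bucket(string text, uint bucketCount)
    {
        return Hash(text) % bucketCount;
    }
}
```

Hash("") returns 0 because the loop never runs, and Bucket("abc", 16) always returns a value between 0 and 15.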
Update:
public override int GetHashCode()
{
return string.Concat(this._from, this._to).GetHashCode();
}
This is broken. It treats from="foot",to="ar" as the same as from="foo",to="tar". Since your Equals function doesn't consider those equal, your hash function should not. Possible fixes include:
1) Form the string from,"XXX",to and hash that. (This assumes the string "XXX" almost never appears in your input strings.)
2) Combine the hash of 'from' with the hash of 'to'. You'll have to use a clever combining function. For example, XOR or sum will cause from="foo",to="bar" to hash the same as from="bar",to="foo". Unfortunately, choosing the right combining function is not easy without knowing the internals of the hashing function. You can try:
int hc1=from.GetHashCode();
int hc2=to.GetHashCode();
return (hc1<<7)^(hc2>>25)^(hc1>>21)^(hc2<<11);
I'm familiar with the System.Numerics.BigInteger class, but in my app, I'm only ever dealing with positive integers. Negative integers are an error case, and it'd be nice if there was an unsigned equivalent of the BigInteger type so I could remove all of these checks. Does one exist?
There's nothing in the framework, no. I would try to centralize the checks in as small a public API as possible, and then treat the data as valid for the rest of the time - just as you would for something like null checking. Of course, you still need to be careful if you perform any operations which could create a negative value (e.g. subtracting one from another).
You may be able to make the code slightly neater by creating an extension method, e.g.
public static void ThrowIfNegative(this BigInteger value, string name)
{
if (value.Sign < 0)
{
throw new ArgumentOutOfRangeException(name);
}
}
... and use it like this:
input.ThrowIfNegative("input");
You could potentially create your own UBigInteger struct which contained a BigInteger, and perform operations between different values by using the BigInteger implementation and checks, but I suspect that would be quite a lot of work for relatively little benefit, and may have performance implications if you're using it for a lot of calculations.
Well, let's have a look at a simple example
uint a=1;
uint b=2;
uint c=a-b;
Console.WriteLine(c);
gives you the output 4294967295 (=2^32-1).
But what if you had an unsigned BigInteger with similar behaviour?
UBigInteger a = new UBigInteger(1);
UBigInteger b = new UBigInteger(2);
UBigInteger c = a - b;
Console.WriteLine(c.ToString());
What should that be? Of course, from what you wrote one can assume you would expect some kind of exception in this case, but such behaviour would not be consistent with uint. It is better to introduce the checks for < 0 where you need them, for example in the way Jon Skeet suggested.
If you only use a reasonably small subset of the BigInteger API, writing your own wrapper type is easy, if laborious. Here is some sample code to demonstrate that it needn't be that big an operation:
public struct UnsignedBigInteger
{
private BigInteger value;
private UnsignedBigInteger(BigInteger n) { value = n; }
public UnsignedBigInteger(uint n) { value = new BigInteger(n); }
// ... other constructors ...
public static UnsignedBigInteger operator+(UnsignedBigInteger lhs, UnsignedBigInteger rhs)
{
return new UnsignedBigInteger(lhs.value + rhs.value);
}
public static UnsignedBigInteger operator-(UnsignedBigInteger lhs, UnsignedBigInteger rhs)
{
var result = lhs.value - rhs.value;
if (result < BigInteger.Zero) throw new InvalidOperationException("value out of range");
return new UnsignedBigInteger(result);
}
// ... other operators ...
}
If negative values present such a problem, is it possible to eliminate them somehow, so that bad data (the negative values) doesn't even reach your logic? This would eliminate the checks altogether. Could you post a short snippet of what you are doing?
There isn't any support in the framework to declare BigInteger as unsigned. However, you could create a static method to check if the number is negative or not.
public static void ValidateBigIntForUnsigned(BigInteger bigInteger)
{
if(bigInteger.Sign < 0)
throw new Exception("Only unsigned numbers are allowed!");
}