Is this an efficient way to parse function parameters? - c#

So I am new to C#, but one thing I already like coming from other higher level languages is the ability to do bitwise operations in (close to) C. I have a bunch of functions where some or all parameters are optional, and I like switches, so I built a function that converts boolean arrays to unsigned Shorts which allows me to basically Mux a boolean array to a single value for the switch:
namespace firstAsp.Helpers{
public class argMux{
public static ushort ba2ushort (bool[] parms){
//initialize position and output
ushort result = 0;
int i = parms.Length-1;
foreach (bool b in parms){
if (b)//put a one in byte at position of b
//bitwise or with position
result |= (ushort)(1<<i);
i--;
}
return result;
}
}
}
Here is an example use case:
public IActionResult Cheese(string fname,string lname)
{
bool[] tf = {fname!=null,lname!=null};
switch(argMux.ba2ushort(tf)){
case 3:
#ViewData["Data"]=$"Hello, {fname} {lname}";
break;
case 2:
#ViewData["Data"]=$"Hello, {fname}";
break;
case 1:
#ViewData["Data"]=$"Hello, Dr. {lname}";
break;
case 0:
#ViewData["Data"]="Hello, Dr. CheeseBurger";
break;
}
return View();
}
My question is, Is this an efficient way to do this, or is there a way that is superior? I am aiming for simplicity of use which this definitely delivers for me, but I would also like it to be efficient code that is fast at runtime. Any pointers? Is this a stupid way to do this? Any and all feedback is welcome, you can even call me an idiot if you believe it, I'm not too sensitive. Thanks!

This is all bad.
The correct way to write your method is to use none of this:
public IActionResult Cheese(string firstName, string lastName)
{
#ViewData["Data"]=$"Hello, {firstName ?? "Dr."} {lastName ?? "Cheeseburger"}";
return View();
}
one thing I already like coming from other higher level languages is the ability to do bitwise operations in (close to) C.
If you are twiddling bits to solve a high level business problem, you are probably doing something wrong. Solve high-level business problems with high-level business code.
Also, if you are using unsigned types in C#, odds are good you are doing something wrong. Unsigned types are there for interoperability with unmanaged code. It is very rare to use a ushort or a uint or a ulong for anything else in C#. Quantities which are logically unsigned, like the length of an array, are always represented as signed quantities.
C# has many features which are designed to ensure that people who have COM libraries can continue to use their libraries, and so that people who need the raw, unchecked performance of pointer arithmetic can do so when it is reasonably safe. Don't mistake the existence of those low-level programming features as evidence that C# is typically used as a low-level programming language. Write your code so that it reads clearly as an implementation of a business workflow.
The business of your code is rendering a name as a string, so it should clearly read as rendering a name as a string. If I asked you to write down a name on a piece of paper, the first thing you'd do would not be to make a bit array to help you, so it shouldn't be here either.
Now, there might be some cases where this sort of thing is sensible, and in those cases you should use an enum rather than treating a short as a bit field:
[Flags]
enum Permissions
{
None = 0x00,
Read = 0x01,
Write = 0x02,
ReadWrite = 0x03,
Delete = 0x04,
ReadDelete = 0x05,
WriteDelete = 0x06,
ReadWriteDelete = 0x07
}
...
static Permissions GetPermission(bool read, bool write, bool delete) {
var p1 = read ? Permissions.Read : Permissions.None;
var p2 = write ? Permissions.Write : Permissions.None;
var p3 = delete ? Permissions.Delete : Permissions.None;
return p1 | p2 | p3;
}
And now you have a convenient
switch(p)
{
case Permissions.None: ...
case Permissions.Read: ...
case Permissions.Write: ...
case Permissions.ReadWrite: ...
But notice here that we keep everything in the business domain. What are we doing? Checking permissions. So what does the code look like? Like it is checking permissions. Not twiddling a bunch of bits and then switching on an integer.

Related

Exclude extra private field in struct with LayoutKind.Explicit from being part of the structure layout

Let's say we have one structure :
[StructLayout(LayoutKind.Explicit, Size=8)] // using System.Runtime.InteropServices;
public struct AirportHeader {
[FieldOffset(0)]
[MarshalAs(UnmanagedType.I4)]
public int Ident; // a 4 bytes ASCII : "FIMP" { 0x46, 0x49, 0x4D, 0x50 }
[FieldOffset(4)]
[MarshalAs(UnmanagedType.I4)]
public int Offset;
}
What I want to have : Both direct access to type string and int values, for the field Ident in this structure, without breaking the 8 bytes size of the structure, nor having to compute a string value each time from the int value.
The field Ident in that structure as int is interesting because I can fast compare with other idents if they match, other idents may come from datas that are unrelated to this structure, but are in the same int format.
Question : Is there a way to define a field that is not part of the struture layout ? Like :
[StructLayout(LayoutKind.Explicit, Size=8)]
public struct AirportHeader {
[FieldOffset(0)]
[MarshalAs(UnmanagedType.I4)]
public int Ident; // a 4 bytes ASCII : "FIMP" { 0x46, 0x49, 0x4D, 0x50 }
[FieldOffset(4)]
[MarshalAs(UnmanagedType.I4)]
public int Offset;
[NoOffset()] // <- is there something I can do the like of this
string _identStr;
public string IdentStr {
get { // EDIT ! missed the getter on this property
if (string.IsNullOrEmpty(_identStr)) _identStr =
System.Text.Encoding.ASCII.GetString(Ident.GetBytes());
// do the above only once. May use an extra private bool field to go faster.
return _identStr;
}
}
}
PS : I use pointers ('*' and '&', unsafe) because I need to deal with endianness (Local system, binary files/file format, network) and fast type conversions, fast arrays filling. I also use many flavours of Marshal methods (fixing structures on byte arrays), and a little of PInvoke and COM interop. Too bad some assemblies I'm dealing with doesn't have their dotNet counterpart yet.
TL;DR; For details only
The question is all it is about, I just don't know the answer. The following should answer most questions like "other approaches", or "why not do this instead", but could be ignored as the answer would be straightforward. Anyway, I preemptively put everything so it's clear from the start what am I trying to do. :)
Options/Workaround I'm currently using (or thinking of using) :
Create a getter (not a field) that computes the string value each time :
public string IdentStr {
get { return System.Text.Encoding.ASCII.GetString(Ident.GetBytes()); }
// where GetBytes() is an extension method that converts an int to byte[]
}
This approach, while doing the job, performs poorly : The GUI displays aircraft from a database of default flights, and injects other flights from the network with a refresh rate of one second (I should increase that to 5 seconds). I have around 1200 flights within a area, relating to 2400 airports (departure and arrival), meaning I have 2400 calls to the above code each second to display the ident in a DataGrid.
Create another struct (or class), which only purpose is to manage
data on GUI side, when not reading/writing to a stream or file. That means, read
the data with the explicit layout struct. Create another struct with
the string version of the field. Work with GUI. That will perform
better on an overall point of view, but, in the process of defining
structures for the game binaries, I'm already at 143 structures of
the kind (just with older versions of the game datas; there are a bunch I didn't write yet, and I plan to add structures for the newest datas types). ATM, more than half of them require one or more extra
fields to be of meaningful use. It's okay if I were the only one to use the assembly, but
other users will probably get lost with AirportHeader,
AirportHeaderEx, AirportEntry, AirportEntryEx,
AirportCoords, AirportCoordsEx.... I would avoid doing that.
Optimize option 1 to make computations perform faster (thanks to SO,
there are a bunch of ideas to look for - currently working on the idea). For the Ident field, I
guess I could use pointers (and I will). Already doing it for fields I must display in little endian and read/write in big
endian. There are other values, like 4x4 grid informations that are
packed in a single Int64 (ulong), that needs bit shifting to
expose the actual values. Same for GUIDs or objects pitch/bank/yaw.
Try to take advantage of overlapping fields (on study). That would work for GUIDs. Perhaps it may work for the Ident example, if MarshalAs can constrain the
value to an ASCII string. Then I just need to specify the same
FieldOffset, '0' in this case. But I'm unsure setting the field
value (entry.FieldStr = "FMEP";) actually uses the Marshal constrain on the managed code side. My undestanding is it will store the string in Unicode on managed side (?).
Furthermore, that wouldn't work for packed bits (bytes that contains
several values, or consecutive bytes hosting values that have to be
bit shifted). I believe it is impossible to specify value position, length and format
at bit level.
Why bother ? context :
I'm defining a bunch of structures to parse binary datas from array of bytes (IO.File.ReadAllBytes) or streams, and write them back, datas related to a game. Application logic should use the structures to quickly access and manipulate the datas on demand. Assembly expected capabilities is read, validate, edit, create and write, outside the scope of the game (addon building, control) and inside the scope of the game (API, live modding or monitoring). Other purpose is to understand the content of binaries (hex) and make use of that understanding to build what's missing in the game.
The purpose of the assembly is to provide a ready to use basis components for a c# addon contributor (I don't plan to make the code portable). Creating applications for the game or processing addon from source to compilation into game binaries. It's nice to have a class that loads the entire content of a file in memory, but some context require you to not do that, and only retrieve from the file what is necessary, hence the choice of the struct pattern.
I need to figure out the trust and legal issues (copyrighted data) but that's outside the scope of the main concern. If that matter, Microsoft did provide over the years public freely accessible SDKs exposing binaries structures on previous versions of the game, for the purpose of what I'm doing (I'm not the first and probably not the last to do so). Though, I wouldn't dare to expose undocumented binaries (for the latest game datas for instance), nor facilitate a copyright breach on copyrighted materials/binaries.
I'm just asking confirmation if there is a way or not to have private fields not being part of the structure layout. Naive belief ATM is "that's impossible, but there are workarounds". It's just that my c# experience is pretty sparce, so maybe I'm wrong, why I ask. Thanks !
As suggested, there are several ways to get the job done. Here are the getters/setters I came up with within the structure. I'll measure how each code performs on various scenarios later. The dict approach is very seducing as on many scenarios, I would need a directly accessible global database of (59000) airports with runways and parking spots (not just the Ident), but a fast check between struct fields is also interesting.
public string IdentStr_Marshal {
get {
var output = "";
GCHandle pinnedHandle; // CS0165 for me (-> c# v5)
try { // Fast if no exception, (very) slow if exception thrown
pinnedHandle = GCHandle.Alloc(this, GCHandleType.Pinned);
IntPtr structPtr = pinnedHandle.AddrOfPinnedObject();
output = Marshal.PtrToStringAnsi(structPtr, 4);
// Cannot use UTF8 because the assembly should work in Framework v4.5
} finally { if (pinnedHandle.IsAllocated) pinnedHandle.Free(); }
return output;
}
set {
value.PadRight(4); // Must fill the blanks - initial while loop replaced (Charlieface's)
IntPtr intValuePtr = IntPtr.Zero;
// Cannot use UTF8 because some users are on Win7 with FlightSim 2004
try { // Put a try as a matter of habit, but not convinced it's gonna throw.
intValuePtr = Marshal.StringToHGlobalAnsi(value);
Ident = Marshal.ReadInt32(intValuePtr, 0).BinaryConvertToUInt32(); // Extension method to convert type.
} finally { Marshal.FreeHGlobal(intValuePtr); // freeing the right pointer }
}
}
public unsafe string IdentStr_Pointer {
get {
string output = "";
fixed (UInt32* ident = &Ident) { // Fixing the field
sbyte* bytes = (sbyte*)ident;
output = new string(bytes, 0, 4, System.Text.Encoding.ASCII); // Encoding added (#Charlieface)
}
return output;
}
set {
// value must not exceed a length of 4 and must be in Ansi [A-Z,0-9,whitespace 0x20].
// value validation at this point occurs outside the structure.
fixed (UInt32* ident = &Ident) { // Fixing the field
byte* bytes = (byte*)ident;
byte[] asciiArr = System.Text.Encoding.ASCII.GetBytes(value);
if (asciiArr.Length >= 4) // (asciiArr.Length == 4) would also work
for (Int32 i = 0; i < 4; i++) bytes[i] = asciiArr[i];
else {
for (Int32 i = 0; i < asciiArr.Length; i++) bytes[i] = asciiArr[i];
for (Int32 i = asciiArr.Length; i < 4; i++) bytes[i] = 0x20;
}
}
}
}
static Dictionary<UInt32, string> ps_dict = new Dictionary<UInt32, string>();
public string IdentStr_StaticDict {
get {
string output; // logic update with TryGetValue (#Charlieface)
if (ps_dict.TryGetValue(Ident, out output)) return output;
output = System.Text.Encoding.ASCII.GetString(Ident.ToBytes(EndiannessType.LittleEndian));
ps_dict.Add(Ident, output);
return output;
}
set { // input can be "FMEE", "DME" or "DK". length of 2 characters is the minimum.
var bytes = new byte[4]; // Need to convert value to a 4 byte array
byte[] asciiArr = System.Text.Encoding.ASCII.GetBytes(value); // should be 4 bytes or less
// Put the valid ASCII codes in the array.
if (asciiArr.Length >= 4) // (asciiArr.Length == 4) would also work
for (Int32 i = 0; i < 4; i++) bytes[i] = asciiArr[i];
else {
for (Int32 i = 0; i < asciiArr.Length; i++) bytes[i] = asciiArr[i];
for (Int32 i = asciiArr.Length; i < 4; i++) bytes[i] = 0x20;
}
Ident = BitConverter.ToUInt32(bytes, 0); // Set structure int value
if (!ps_dict.ContainsKey(Ident)) // Add if missing
ps_dict.Add(Ident, System.Text.Encoding.ASCII.GetString(bytes));
}
}
As mentioned by others, it is not possible to exclude a field from a struct for marshalling.
You also cannot use a pointer as a string in most places.
If the number of different possible strings is relatively small (and it probably will be, given it's only 4 characters), then you could use a static Dictionary<int, string> as a kind of string-interning mechanism.
Then you write a property to add/retrieve the real string.
Note that dictionary access is O(1), and hashing an int just returns itself, so it will be very, very fast, but will take up some memory.
[StructLayout(LayoutKind.Explicit, Size=8)]
public struct AirportHeader
{
[FieldOffset(0)]
[MarshalAs(UnmanagedType.I4)]
public int Ident; // a 4 bytes ASCII : "FIMP" { 0x46, 0x49, 0x4D, 0x50 }
[FieldOffset(4)]
[MarshalAs(UnmanagedType.I4)]
public int Offset;
static Dictionary<int, string> _identStrings = new Dictionary<int, string>();
public string IdentStr =>
_identStrings.TryGetValue(Ident, out var ret) ? ret :
(_identStrings[Ident] = Encoding.ASCII.GetString(Ident.GetBytes());
}
This is not possible because a structure must contain all of its values ​​in a specific order. Usually this order is controlled by the CLR itself. If you want to change the order of the data order, you can use the StructLayout. However, you cannot exclude a field or that data would simply not exist in memory.
Instead of a string (which is a reference type) you can use a pointer to point directly to that string and use that in your structure in combination with the StructLayout. To get this string value, you can use a get-only property that reads directly from unmanaged memory.

How to use properties with an integral overflow (byte)?

I am fairly new to programming and C#, and I am creating a game using C# 9.0 in which all instances of Entity have certain stats. I want to be able to change their private data fields using properties, though I'm not entirely sure how properties work. I know they are useful in encapsulation as getters and setters.
Context:
I am trying to optimize code and decrease memory usage where possible
The byte field str should be variable (through events, training, etc.), but have a "ceiling" and "floor"
If dog.str = 253, then dog.Str += 5; should result in dog.str being 255
If dog.str = 2, then dog.Str -= 5; should result in dog.str being 0
private byte str;
public short Str
{
get => str;
set
{
if (value > byte.MaxValue) str = byte.MaxValue; //Pos Overflow
else if (value < byte.MinValue) str = byte.MinValue; //Neg Overflow
else str = (byte)value;
}
}
Questions:
Since the property is of datatype Short, does it create a new private backing field that consumes memory? Or is value/Str{set;} just a local variable that later disappears?
Does the property public float StrMod {get => (float)(str*Effects.Power);} create a backing field? Would it be better to just create a method like public float getStrMod() instead?
Is this code optimal for what I'm trying to achieve? Is there some better way to do this, considering the following?
If for some reason the Short overflowed (unlikely in this scenario, but there may be a similar situation), then I would end up with the same problem. However, if extra memory allocation isn't an issue, then I could use an int.
The {get;} will return a Short, which may or may not be an issue.
Question 1:
No it doesn't, its backing field is str.
Question 2:
Profile your code first instead of making random changes in hope to reduce memory usage.
"Premature optimization is the root of all evil", do you really have such issues at this point ?
Personally I'd use int and use same type for property and backing field for simplicity.
This would avoid wrapping such as assigning 32768 which would then result as -32768 for short.
Side note, don't think that using byte necessarily results in 1 byte, if you have tight packing requirements then you need to look at StructLayoutAttribute.Pack.
Other than that I see nothing wrong with your code, just get it to work first then optimize it!
Here's how I'd write your code, maybe you'll get some ideas from it:
class Testing
{
private int _value;
public int Value
{
get => _value;
set => _value = Clamp(value, byte.MinValue, byte.MaxValue);
}
private static int Clamp(int value, int min, int max)
{
return Math.Max(min, Math.Min(max, value));
}
}
EDIT:
Different scenarios:
class Testing
{
private int _value1;
public int Value1 // backing field is _value1
{
get => _value1;
set => _value1 = value;
}
public int Value2 { get; set; } // adds a backing field
public int Value3 { get; } // adds a backing field
public int Value4 => 42; // no backing field
}
As you might have guessed, properties are syntactic sugar for methods, they can do 'whatever' under the hood compared to a field which can only be assigned a value to.
Also, one difference with a method is that you can browse its value in the debugger, that's handy.
Suggested reading:
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/properties
Finally, properties are expected to return quickly, else write a method, and possibly async if it's going to take a while (advantage to method in this case as properties can't be async).
#aybe answer covers main thing about you question. I would like to add additional info to your 2nd question. You should consider on which platform you write application. There is a word term:
In computing, a word is the natural unit of data used by a particular
processor design. A word is a fixed-sized piece of data handled as a
unit by the instruction set or the hardware of the processor. The
number of bits in a word (the word size, word width, or word length)
is an important characteristic of any specific processor design or
computer architecture.
If processor has 64 bit word, then every variable which type is less than 64 bits will still occupy 64 bits in memory. Keep in mind that variable of given type will be handled as given type and size in memory doesn't impact range, overflow or underflow - arithmetic will be processed for given type.
In short - if you have 64-bit desktop processor and you will use only short variables, then you will not observe any memory savings in comparison to declaring int variables.

Unsafe pointer manipulation

I'm trying to write a CPU emulator in C#. The machine's object looks like this:
class Machine
{
short a,b,c,d; //these are registers.
short[] ram=new short[0x10000]; //RAM organised as 65536 16-bit words
public void tick() { ... } //one instruction is processed
}
When I execute an instruction, I have a switch statement which decides what the result of the instruction will be stored in (either a register or a word of RAM)
I want to be able to do this:
short* resultContainer;
if (destination == register)
{
switch (resultSymbol) //this is really an opcode, made a char for clarity
{
case 'a': resultContainer=&a;
case 'b': resultContainer=&b;
//etc
}
}
else
{
//must be a place in RAM
resultContainer = &RAM[location];
}
then, when I've performed the instruction, I can simply store the result like:
*resultContainer = result;
I've been trying to figure out how to do this without upsetting C#.
How do I accomplish this using unsafe{} and fixed(){ } and perhaps other things I'm not aware of?
This is how I would implement it:
class Machine
{
short a, b, c, d;
short[] ram = new short[0x10000];
enum AddressType
{
Register,
DirectMemory,
}
// Gives the address for an operand or for the result.
// `addressType`and `addrCode` are extracted from instruction opcode
// `regPointers` and `ramPointer` are fixed pointers to registers and RAM.
private unsafe short* GetAddress(AddressType addressType, short addrCode, short*[] regPointers, short* ramPointer)
{
switch (addressType)
{
case AddressType.Register:
return regPointers[addrCode]; //just an example implementation
case AddressType.DirectMemory:
return ramPointer + addrCode; //just an example implementation
default:
return null;
}
}
public unsafe void tick()
{
fixed (short* ap = &a, bp = &b, cp = &c, dp = &d, ramPointer = ram)
{
short*[] regPointers = new short*[] { ap, bp, cp, dp };
short* pOperand1, pOperand2, pResult;
AddressType operand1AddrType, operand2AddrType, resultAddrType;
short operand1AddrCode, operand2AddrCode, resultAddrCode;
// ... decipher the instruction and extract the address types and codes
pOperand1 = GetAddress(operand1AddrType, operand1AddrCode, regPointers, ramPointer);
pOperand2 = GetAddress(operand2AddrType, operand2AddrCode, regPointers, ramPointer);
pResult = GetAddress(resultAddrType, resultAddrCode, regPointers, ramPointer);
// execute the instruction, using `*pOperand1` and `*pOperand2`, storing the result in `*pResult`.
}
}
}
To get the address of the registers and RAM array, you have to use the fixed statement. Also you can only use the acquired pointers in the fixed block. So you have to pass around the pointers.
What if we look at the problem from another view? Execute the instruction and store the result in a variable (result), and then decide on where you should put the result. Doesn't work?
A good option, one I had planned for DevKit but not yet implemented, is to generate your emulation code. It's a time-honoured, tried and tested solution. For every opcode/register combination, or some efficient subset, generate the code for ADD A,B and SUB X,Y separately. You can use a bitmask to grab the specific case you're looking for and simply execute the correct code. Compiler should be able to use a jump table under the hood, and there's no lookups, no conditionals, should be pretty efficient.

How to improve hashing for short strings to avoid collisions?

I am having a problem with hash collisions using short strings in .NET4.
EDIT: I am using the built-in string hashing function in .NET.
I'm implementing a cache using objects that store the direction of a conversion like this
public class MyClass
{
private string _from;
private string _to;
// More code here....
public MyClass(string from, string to)
{
this._from = from;
this._to = to;
}
public override int GetHashCode()
{
return string.Concat(this._from, this._to).GetHashCode();
}
public bool Equals(MyClass other)
{
return this.To == other.To && this.From == other.From;
}
public override bool Equals(object obj)
{
if (obj == null) return false;
if (this.GetType() != obj.GetType()) return false;
return Equals(obj as MyClass);
}
}
This is direction dependent and the from and to are represented by short strings like "AAB" and "ABA".
I am getting sparse hash collisions with these small strings, I have tried something simple like adding a salt (did not work).
The problem is that too many of my small strings like "AABABA" collides its hash with the reverse of "ABAAAB" (Note that these are not real examples, I have no idea if AAB and ABA actually cause collisions!)
and I have gone heavy duty like implementing MD5 (which works, but is MUCH slower)
I have also implemented the suggestion from Jon Skeet here:
Should I use a concatenation of my string fields as a hash code?
This works but I don't know how dependable it is with my various 3-character strings.
How can I improve and stabilize the hashing of small strings without adding too much overhead like MD5?
EDIT: In response to a few of the answers posted... the cache is implemented using concurrent dictionaries keyed from MyClass as stubbed out above. If I replace the GetHashCode in the code above with something simple like #JonSkeet 's code from the link I posted:
int hash = 17;
hash = hash * 23 + this._from.GetHashCode();
hash = hash * 23 + this._to.GetHashCode();
return hash;
Everything functions as expected.
It's also worth noting that in this particular use-case the cache is not used in a multi-threaded environment so there is no race condition.
EDIT: I should also note that this misbehavior is platform dependant. It works as intended on my fully updated Win7x64 machine but does not behave properly on a non-updated Win7x64 machine. I don't know the extend of what updates are missing but I know it doesn't have Win7 SP1... so I would assume there may also be a framework SP or update it's missing as well.
EDIT: As susggested, my issue was not caused by a problem with the hashing function. I had an elusive race condition, which is why it worked on some computers but not others and also why a "slower" hashing method made things work properly. The answer I selected was the most useful in understanding why my problem was not hash collisions in the dictionary.
Are you sure that collisions are who causes problems? When you say
I finally discovered what was causing this bug
You mean some slowness of your code or something else? If not I'm curious what kind of problem is that? Because any hash function (except "perfect" hash functions on limited domains) would cause collisions.
I put a quick piece of code to check for collisions for 3-letter words. And this code doesn't report a single collision for them. You see what I mean? Looks like buid-in hash algorithm is not so bad.
Dictionary<int, bool> set = new Dictionary<int, bool>();
char[] buffer = new char[3];
int count = 0;
for (int c1 = (int)'A'; c1 <= (int)'z'; c1++)
{
buffer[0] = (char)c1;
for (int c2 = (int)'A'; c2 <= (int)'z'; c2++)
{
buffer[1] = (char)c2;
for (int c3 = (int)'A'; c3 <= (int)'z'; c3++)
{
buffer[2] = (char)c3;
string str = new string(buffer);
count++;
int hash = str.GetHashCode();
if (set.ContainsKey(hash))
{
Console.WriteLine("Collision for {0}", str);
}
set[hash] = false;
}
}
}
Console.WriteLine("Generated {0} of {1} hashes", set.Count, count);
While you could pick almost any of well-known hash functions (as David mentioned) or even choose a "perfect" hash since it looks like your domain is limited (like minimum perfect hash)... It would be great to understand if the source of problems are really collisions.
Update
What I want to say is that .NET build-in hash function for string is not so bad. It doesn't give so many collisions that you would need to write your own algorithm in regular scenarios. And this doesn't depend on the lenght of strings. If you have a lot of 6-symbol strings that doesn't imply that your chances to see a collision are highier than with 1000-symbol strings. This is one of the basic properties of hash functions.
And again, another question is what kind of problems do you experience because of collisions? All build-in hashtables and dictionaries support collision resolution. So I would say all you can see is just... probably some slowness. Is this your problem?
As for your code
return string.Concat(this._from, this._to).GetHashCode();
This can cause problems. Because on every hash code calculation you create a new string. Maybe this is what causes your issues?
int hash = 17;
hash = hash * 23 + this._from.GetHashCode();
hash = hash * 23 + this._to.GetHashCode();
return hash;
This would be much better approach - just because you don't create new objects on the heap. Actually it's one of the main points of this approach - get a good hash code of an object with a complex "key" without creating new objects. So if you don't have a single value key then this should work for you. BTW, this is not a new hash function, this is just a way to combine existing hash values without compromising main properties of hash functions.
Any common hash function should be suitable for this purpose. If you're getting collisions on short strings like that, I'd say you're using an unusually bad hash function. You can use Jenkins or Knuth's with no issues.
Here's a very simple hash function that should be adequate. (Implemented in C, but should easily port to any similar language.)
unsigned int hash(const char *it)
{
unsigned hval=0;
while(*it!=0)
{
hval+=*it++;
hval+=(hval<<10);
hval^=(hval>>6);
hval+=(hval<<3);
hval^=(hval>>11);
hval+=(hval<<15);
}
return hval;
}
Note that if you want to trim the bits of the output of this function, you must use the least significant bits. You can also use mod to reduce the output range. The last character of the string tends to only affect the low-order bits. If you need a more even distribution, change return hval; to return hval * 2654435761U;.
Update:
public override int GetHashCode()
{
return string.Concat(this._from, this._to).GetHashCode();
}
This is broken. It treats from="foot",to="ar" as the same as from="foo",to="tar". Since your Equals function doesn't consider those equal, your hash function should not. Possible fixes include:
1) Form the string from,"XXX",to and hash that. (This assumes the string "XXX" almost never appears in your input strings.
2) Combine the hash of 'from' with the hash of 'to'. You'll have to use a clever combining function. For example, XOR or sum will cause from="foo",to="bar" to hash the same as from="bar",to="foo". Unfortunately, choosing the right combining function is not easy without knowing the internals of the hashing function. You can try:
int hc1=from.GetHashCode();
int hc2=to.GetHashCode();
return (hc1<<7)^(hc2>>25)^(hc1>>21)^(hc2<<11);

Best approach to programming highly complex business/math rules

I have to take a piece of data, and apply a large number of possible variables to it. I really don't like the idea of using a gigantic set of if statements, so i'm looking for help in an approach to simplify, and make it easier to maintain.
As an example:
if (isSoccer)
val = soccerBaseVal;
else if (isFootball)
val = footballBaseVal;
.... // 20 different sports
if (isMale)
val += 1;
else
val += 5;
switch(dayOfWeek)
{
case DayOfWeek.Monday:
val += 12;
...
}
etc.. etc.. etc.. with possibly in the range of 100-200 different tests and formula variations.
This just seems like a maintenance nightmare. Any suggestions?
EDIT:
To further add to the problem, many variables are only used in certain situations, so it's more than just a fixed set of logic with different values. The logic itself has to change based on conditions, possibly conditions applied from previous variables (if val > threshold, for instance).
So yes, i agree about using lookups for many of the values, but I also have to have variable logic.
A common way to avoid large switching structures is to put the information into data structures. Create an enumeration SportType and a Dictionary<SportType, Int32> containing the associated values. The you can simply write val += sportTypeScoreMap[sportType] and you are done.
Variations of this pattern will help you in many similar situations.
public enum SportType
{
Soccer, Football, ...
}
public sealed class Foo
{
private static readonly IDictionary<SportType, Int32> sportTypeScoreMap =
new Dictionary<SportType, Int32>
{
{ Soccer, 30 },
{ Football, 20 },
...
}
private static readonly IDictionary<DayOfWeek, Int32> dayOfWeekScoreMap =
new Dictionary<DayOfWeek, Int32>
{
{ DayOfWeek.Monday, 12 },
{ DayOfWeek.Tuesday, 20 },
...
}
public Int32 GetScore(SportType sportType, DayOfWeek dayOfWeek)
{
return Foo.sportTypeScoreMap[sportType]
+ Foo.dayOfWeekScoreMap[dayOfWeek];
}
}
Use either a switch statement or filter function.
By filter function, I mean something like:
func filter(var object, var value)
{
if(object == value)
object = valueDictionary['value'];
}
Then apply the filter with:
filter(theObject, soccer)
filter(theObject, football)
Note that the filter works much better using a dictionary, but it is not required.
Cribbing from The Pragmatic Programmer, you could use a DSL to encapsulate the rules and write a process engine. For your presented problem, a solution might look like:
MATCH{
Soccer soccerBaseVal
IsMale 5
!IsMale 1
}
SWITCH{
Monday 12
Tuesday 13
}
Then match everything in the first col of MATCH, and the first item in each SWITCH you come to. You can make whatever syntax you feel like, then just write a bit of script to cram that into code (or use Xtext because it looks pretty cool).
Here are a few ideas:
1 Use lookup tables:
var val = 0;
SportType sportType = GetSportType();
val += sportvalues[sportType];
You can load the table from the database.
2 Use the factory pattern:
var val = 0;
val += SportFactory.Create(sportType).CalculateValue();
The Dynamic Factory Pattern is useful in situations were new (sport) types are added frequently to the code. This pattern uses reflection to prevent the factory class (or any global configuration) from being changed. It allows you to simply add a new class to your code.
Of course the use of an dynamic factory, or even a factory can be overkill in your situation. You're the only one who can tell.
As a first step I would probably break up each logical processing area into its own method: (May not be the best names as a first pass)
EnforceSportRules
ProcessSportDetails
EnforceGenderRules
Next, depending on how complex the rules are, I may break each section into its own class and have them processed by a main class (like a factory).
GenderRules
GenderContext
I have nothing special to offer you than to first recommend not to just leave it as a big block-- break it into sections, make comment dividers between important parts.
Another suggestion is if you are going to have many very short tests as in your example, break from convention and put the val incrementors on the same line as the evaluatation and indent so they align with eachother.
if (isSoccer) val = soccerBaseVal;
if (isMale) val += 1;
else val += 5;
switch(dayOfWeek){
case DayOfWeek.Monday: val += 12;
...
}
Excess whitespace can make those hundred things into several hundred lines, making vertical scrolling excessive and difficult to get an overall view of the thing.
If you are really just adding values in this sort, I would either create an enumeration with defined indices that correspond to stored values in an array. Then you can do something like this:
enum Sport
{
football = 0,
soccer = 1,
//...
}
int sportValues[] = {
/* footballValue */,
/* soccerValue */,
/* ...Values */
};
int ApplyRules(Sport sport, /* other params */)
{
int value = startingValue;
value += sportValues[(int)sport];
value += /* other rules in same fashion */;
}
Consider implementing the Strategy Pattern which utilizes inheritance/polymorphism to make managing individual functions sane. By seperating each function into its own dedicated class you can forego the nightmare of having miles-long case blocks or if statements.
Not sure if C# supports it yet (or ever will) but VB.NET integrates XML Comment CompletionList directives into intellisense, which--when combined with the Strategy Pattern--can give you the ease of use of an Enum with the open-ended extensibility of OO.

Categories