I've been playing with Script#, and I was wondering how C# numbers are converted to JavaScript. I wrote this little bit of code:
int a = 3 / 2;
and looked at the relevant bit of compiled JavaScript:
var $0=3/2;
In C#, the result of 3 / 2 assigned to an int is 1, but in JavaScript, which has only one number type, the result is 1.5.
Because of this disparity between the C# and JavaScript behaviour, and since the compiled code doesn't seem to compensate for it, should I assume that my numeric calculations written in C# might behave incorrectly when compiled to JavaScript?
Should I assume that my numeric calculations written in C# might behave incorrectly when compiled to JavaScript?
Yes.
Like you said, "the compiled code doesn't seem to compensate for it" - though for the case you mention, where a was declared as an int, it would be easy enough to compensate by using var $0 = Math.floor(3/2);. But if you don't control how the "compiler" works, you're in a pickle. (You could correct the JavaScript manually, but you'd have to do that every time you regenerated it. Yuck.)
Note also that you are likely to have problems with decimal numbers too, due to the way JavaScript represents fractional values in binary floating point. Most people are surprised the first time they find out that JavaScript will tell you that 0.4 * 3 works out to be 1.2000000000000002. For more details see one of the many other questions on this issue, e.g., How to deal with floating point number precision in JavaScript?. (C#'s double type behaves the same way, so maybe this issue won't be such a surprise. Still, it can be a trap for new players...)
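For illustration, here's a minimal plain-C# sketch (not Script#-specific; the class and variable names are made up for the example) showing both the integer-division truncation and the floating-point surprise side by side:

using System;

class NumberDemo
{
    static void Main()
    {
        int a = 3 / 2;                         // C# integer division truncates: 1
        double js = 3.0 / 2.0;                 // what JavaScript's single number type gives: 1.5
        int compensated = (int)Math.Floor(js); // the Math.floor compensation mentioned above: 1

        double d = 0.4 * 3;                    // binary floating point, the same trap as in JavaScript
        Console.WriteLine(compensated);        // 1
        Console.WriteLine(d.ToString("R"));    // 1.2000000000000002
    }
}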
So this may be obvious but i have recently inherited some legacy code and scattered around the code are array indexes like this
someArray(&H7D0)
I get that this "&H7D0" is the index, but how do I go about changing it to a real number, as I am converting the code to C#?
The code is a mess and it's not obvious what it might be.
This is a hexadecimal number; VB's &H prefix is the equivalent of C#'s 0x. The C# equivalent would be someArray[0x7D0].
Both are equivalent to the decimal number 2000, so you could also just write the index as 2000, a literal that reads the same in both languages.
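A quick C# sketch confirming the equivalence (the variable names are illustrative only):

int fromLiteral = 0x7D0;                    // hex literal, value 2000
int fromText = Convert.ToInt32("7D0", 16);  // parsing just the digits after &H / 0x
Console.WriteLine(fromLiteral);             // 2000
Console.WriteLine(fromLiteral == fromText); // True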
Is there a generally accepted best approach to coding complex math? For example:
double someNumber = .123 + .456 * Math.Pow(Math.E, .789 * Math.Pow((homeIndex + .22), .012));
Is this a point where hard-coding the numbers is okay? Or should each number have a constant associated with it? Or is there even another way, like storing the calculations in config and invoking them somehow?
There will be a lot of code like this, and I'm trying to keep it maintainable.
Note: The example shown above is just one line. There would be tens or hundreds of these lines of code. And not only could the numbers change, but the formula could as well.
Generally, there are two kinds of constants: ones with meaning to the implementation, and ones with meaning to the business logic.
It is OK to hard-code the constants of the first kind: they are private to understanding your algorithm. For example, if you are using a ternary search and need to divide the interval in three parts, dividing by a hard-coded 3 is the right approach.
Constants with meaning outside the code of your program, on the other hand, should not be hard-coded: giving them explicit names gives someone who maintains your code after you leave the company a non-zero chance of making correct modifications without having to rewrite things from scratch or e-mail you for help.
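As a rough sketch of what that could look like for the formula in the question (the constant names here are invented, since the question doesn't say what the numbers actually mean):

// Hypothetical names - replace with whatever the business meaning really is.
const double BaseOffset = 0.123;
const double GrowthScale = 0.456;
const double GrowthRate = 0.789;
const double IndexOffset = 0.22;
const double IndexExponent = 0.012;

double homeIndex = 1.0; // input from the question
double someNumber = BaseOffset
    + GrowthScale * Math.Pow(Math.E, GrowthRate * Math.Pow(homeIndex + IndexOffset, IndexExponent));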
"Is it okay"? Sure. As far as I know, there's no paramilitary police force rounding up those who sin against the one true faith of programming. (Yet.).
Is it wise?
Well, there are all sorts of ways of deciding that - performance, scalability, extensibility, maintainability etc.
On the maintainability scale, this is pure evil. It makes extensibility very hard; performance and scalability are probably not a huge concern.
If you left behind a single method with loads of lines similar to the above, your successor would have no chance maintaining the code. He'd be right to recommend a rewrite.
If you broke it down like
public float CalculateTax(Person person)
{
    float taxFreeAmount = CalcTaxFreeAmount(person);
    float taxableAmount = CalcTaxableAmount(person, taxFreeAmount);
    float taxAmount = CalcTaxAmount(person, taxableAmount);
    return taxAmount;
}
and each of the inner methods is a few lines long, but you left some hardcoded values in there - well, not brilliant, but not terrible.
However, if some of those hardcoded values are likely to change over time (like the tax rate), leaving them as hardcoded values is not okay. It's awful.
The best advice I can give is:
Spend an afternoon with Resharper, and use its automatic refactoring tools.
Assume the guy picking this up from you is an axe-wielding maniac who knows where you live.
I usually ask myself whether I could maintain and fix the code at 3 AM, sleep-deprived, six months after writing it. It has served me well. Looking at your formula, I'm not sure I could.
Ages ago I worked in the insurance industry. Some of my colleagues were tasked with converting the actuarial formulas into code, first FORTRAN and later C. Mathematical and programming skills varied from colleague to colleague. What I learned from reviewing their code was the following:
document the actual formula in code; without it, years later you'll have trouble remembering what it was. External documentation goes missing, becomes dated or simply may not be accessible.
break the formula into discrete components that can be documented, reused and tested.
use constants to document equations; magic numbers have very little context and often require existing knowledge for other developers to understand.
rely on the compiler to optimize code where possible. A good compiler will inline methods, reduce duplication and optimize the code for the particular architecture. In some cases it may duplicate portions of the formula for better performance.
That said, there are times where hard-coding just simplifies things, especially if those values are well understood within a particular context. For example, dividing (or multiplying) something by 100 or 1000 because you're converting a value to dollars. Another one is multiplying something by 3600 when you'd like to convert hours to seconds. Their meaning is often implied by the greater context. The following doesn't say much about the magic number 100:
public static double a(double b, double c)
{
return (b - c) * 100;
}
but the following may give you a better hint:
public static double calculateAmountInCents(double amountDue, double amountPaid)
{
return (amountDue - amountPaid) * 100;
}
As the above comment states, this is far from complex.
You can however store the magic numbers in constants/app.config values, so as to make it easier for the next developer to maintain your code.
When storing such constants, make sure to explain to the next developer (read: yourself in 1 month) what your thoughts were, and what they need to keep in mind.
Also explain what the actual calculation is for and what it is doing.
Do not leave it in-line like this.
Use a constant so you can reuse it, find it easily and change it easily, and so the code is easier to maintain when someone comes looking at it for the first time.
You can do a config if it can/should be customized. What is the impact of a customer altering the value(s)? Sometimes it is best to not give them that option. They could change it on their own then blame you when things don't work. Then again, maybe they have it in flux more often than your release schedules.
It's worth noting that the JIT compiler in the CLR (not the C# compiler itself) will generally inline small one-line methods, so if you can extract certain formulas into one-liners you can usually extract them as methods without any performance loss.
EDIT:
Constants and such more or less depend on the team and the quantity of use. Obviously if you're using the same hard-coded number more than once, make it a constant. However, if you're writing a formula that likely only you will ever edit (small team), then hard-coding the values is fine. It all depends on your team's views on documentation and maintenance.
If the calculation in your line explains something for the next developer then you can leave it; otherwise it's better to have a calculated constant value in your code or configuration files.
I found one line in production code which was like:
int interval = 1 * 60 * 60 * 1000;
Without any comment, it wasn't hard to see that the original developer meant 1 hour in milliseconds; it would have been much harder with a bare value of 3600000.
IMO, leaving the calculation written out may be better for scenarios like that.
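If you want a descriptive name as well as the self-documenting arithmetic, one option (just a sketch; the name is illustrative) is:

const int MillisecondsPerHour = 1 * 60 * 60 * 1000;
int interval = MillisecondsPerHour;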
Names can be added for documentation purposes. The amount of documentation needed depends largely on the purpose.
Consider following code:
float e = m * 8.98755179e16;
And contrast it with the following one:
const float c = 299792458;
float e = m * c * c;
Even though the variable names are not very 'descriptive' in the latter, you'll have a much better idea what the code is doing than in the first one - arguably there is no need to rename c to speedOfLight, m to mass and e to energy, as the names are explanatory in their domains.
const float speedOfLight = 299792458;
float energy = mass * speedOfLight * speedOfLight;
I would argue that the second piece of code is the clearest one - especially if the programmer can expect to find STR in the code (an LHC simulator or something similar). To sum up - you need to find an optimal point. The more verbose the code, the more context you provide - which might both help to understand the meaning (what e and c are vs. "we do something with mass and the speed of light") and obscure the big picture (we square c and multiply by m vs. needing to scan the whole line to get the equation).
Most constants have some deeper meaning and/or established notation, so I would consider at least naming them by convention (c for the speed of light, R for the gas constant, sPerH for seconds in an hour). If the notation is not clear, longer names should be used (sPerH in a class named Date or Time is probably fine, while it is not in Paginator). The really obvious constants could be hardcoded (say, division by 2 when calculating the new array length in merge sort).
Okay, so I'm trying to make a basic malware scanner in C#. My question is: say I have the hex signature for a particular bit of code.
For example
{
System.IO.File.Delete(@"C:\Users\Public\DeleteTest\test.txt");
}
//Which will have a hex of 53797374656d2e494f2e46696c652e44656c657465284022433a5c55736572735c5075626c69635c44656c657465546573745c746573742e74787422293b
Gets Changed to -
{
System.IO.File.Delete(@"C:\Users\Public\DeleteTest\notatest.txt");
}
//Which will have a hex of 53797374656d2e494f2e46696c652e44656c657465284022433a5c55736572735c5075626c69635c44656c657465546573745c6e6f7461746573742e74787422293b
Keep in mind these bits will be within the entire hex of the program. How could I go about taking my base signature and looking for partial matches that, say, have a 90% match and therefore get flagged?
I would do a wildcard, but that wouldn't work for slightly more complex things where it might be coded slightly differently but the majority would be the same. So is there a way I can do a percent match for a substring? I was looking into the Levenshtein distance but I don't see how I'd apply it to this given scenario.
Thanks in advance for any input
Using an edit distance would be fine. You can take two strings and calculate the edit distance, which will be an integer value denoting how many operations are needed to take one string to the other. You set your own threshold based off that number.
For example, you may statically set that if the distance is less than five edits, the change is relevant.
You could also take the length of the string you are comparing and take a percentage of that. Your example is 36 characters long, so (int)(input.Length * 0.88m) would be a valid threshold.
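For reference, here is a minimal C# sketch of the standard dynamic-programming Levenshtein distance, together with the percentage-style threshold described above (the method name and the 90% figure are just for illustration):

static int EditDistance(string s, string t)
{
    int[,] d = new int[s.Length + 1, t.Length + 1];
    for (int i = 0; i <= s.Length; i++) d[i, 0] = i;
    for (int j = 0; j <= t.Length; j++) d[0, j] = j;

    for (int i = 1; i <= s.Length; i++)
        for (int j = 1; j <= t.Length; j++)
        {
            int cost = s[i - 1] == t[j - 1] ? 0 : 1;
            d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1,   // deletion
                                        d[i, j - 1] + 1),  // insertion
                               d[i - 1, j - 1] + cost);    // substitution
        }
    return d[s.Length, t.Length];
}

// Usage sketch: flag a candidate if at least ~90% of the signature survives.
// int distance = EditDistance(signatureHex, candidateHex);
// bool flagged = distance <= (int)(signatureHex.Length * 0.10);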
First, your program bits should match EXACTLY or else it has been modified or is corrupt. Generally, you will store an MD5 hash of the original binary and check the MD5 against new versions to see if they are 'the same enough' (MD5 can't guarantee a 100% match).
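A hedged sketch of that exact-match check using the framework's hashing API (the file path here is hypothetical):

using (var md5 = System.Security.Cryptography.MD5.Create())
using (var stream = System.IO.File.OpenRead(@"C:\some\binary.exe")) // hypothetical path
{
    byte[] hash = md5.ComputeHash(stream);
    string hex = BitConverter.ToString(hash).Replace("-", "");
    // Compare 'hex' against the stored hash of the known-good binary.
}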
Beyond this, in order to detect malware in a random binary, you must know what sort of patterns to look for. For example, if I know a piece of malware injects code with some binary XYZ, I will look for XYZ in the bits of the executable. Patterns get much more complex than that, of course, as the malware bits can be spread out in chunks. What is more interesting is that some viruses are self-morphing. This means that each time it runs, it modifies itself, meaning the scanner does not know an exact pattern to find. In these cases, the scanner must know the types of derivatives that can be produced and look for all of them.
In terms of finding a % match, this operation is very time consuming unless you have constraints. By comparing 2 strings, you cannot tell which pieces were removed, added, or replaced. For instance, if I have a starting string 'ABCD', is 'AABCDD' a 100% match or less since content has been added? What about 'ABCDABCD'; here it matches twice. How about 'AXBXCXD'? What about 'CDAB'?
There are many DIFF tools in existence that can tell you what pieces of a file have been changed (which can lead to a %). Unfortunately, none of them are perfect because of the issues that I described above. You will find that you have false negatives, false positives, etc. This may be 'good enough' for you.
Before you can identify a specific algorithm that will work for you, you will have to decide what the restrictions of your search will be. Otherwise, your scan will be NP-hard, which leads to unreasonable running times (your scanner may run all day just to check one file).
I suggest you look into Levenshtein distance and Damerau-Levenshtein distance.
The former tells you how many add/delete operations are needed to turn one string into another; and the latter tells you how many add/delete/replace operations are needed to turn one string into another.
I use these quite a lot when writing programs where users can search for things, but they may not know the exact spelling.
There are code examples on both articles.
I'm trying to read-in a bunch of unsigned integers from a configuration file into a class. These numbers may be specified in either base-10 (eg: 1234) or in base-16 (eg: 0xAB31). Therefore looking for the strtoul equivalent in C# 2.0.
More specifically, I'm interested in a C# function which mimics the behaviour of this function when the argument indicating the base or radix is passed in as zero. (Under C++, strtoul will attempt to 'guess' the base or radix based on the first couple of characters in the string and then proceed to convert the number accordingly.)
Currently I'm manually checking the first two characters (using string.Substring() method) of the string and then calling Convert.ToUInt32(hex, 10) or Convert.ToUInt32(hex, 16) as needed.
I'm sure that there has to be a better way to deal with this problem, hence this post. More elegant ideas/solutions or work-arounds would be a great help.
Well, you don't need to use Substring unless it's in hex, but it sounds like you're basically doing it the right way:
return text.StartsWith("0x") ? Convert.ToUInt32(text.Substring(2), 16)
: Convert.ToUInt32(text, 10);
Obviously this will create an extra object for the Substring call, and you could write your own hex parsing code to cope with this - but unless you've actually run into performance problems with this approach, I'd keep it simple.
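If you want it wrapped up, a small helper along those lines might look like this (a sketch; the name ParseUInt32Auto is made up):

static uint ParseUInt32Auto(string text)
{
    text = text.Trim();
    if (text.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
        return Convert.ToUInt32(text.Substring(2), 16); // hex, e.g. 0xAB31
    return Convert.ToUInt32(text, 10);                  // decimal, e.g. 1234
}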
I'm solving problems in Project Euler. Most of the problems are solved using
big numbers that exceed ulong,
Ex : ulong number = 81237146123746237846293567465365862854736263874623654728568263582;
very sensitive decimal numbers with significant digits over 30
Ex : decimal dec =
0.3242342543573894756936576474978265726385428569234753964340653;
arrays that must have index values that exceed the biggest int value.
Ex : bool[] items = new
bool[213192471235494658346583465340673475263842864836];
I found a library called IntX to handle these big numbers. But I wonder how I can solve these problems with basic .NET types?
Thanks for the replies!
Well, for the third item there you really don't want to use an array, since it needs to be allocated that big as well.
Let me rephrase that.
By the time you can afford, and get access to, that much memory, the big-number problem will be solved!
To answer your last question there: there is no way you can solve this using only basic types, unless you do what the makers of IntX did and implement big-number support yourself.
Might I suggest you try a different programming language for the Euler problems? I've had better luck with Python, since it has support for big numbers out of the box and integrated into everything else. Well, except for that array - you really can't do that in any language these days.
Maybe this could give you ideas to how to solve part of your problem:
http://www.codeproject.com/csharp/BigInteger.asp
Wikipedia also has a good article about arbitrary-precision arithmetic, and in that article there is a link to CodePlex and W3b.sine, which is an arbitrary-precision real number C# library.
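As a side note, newer versions of .NET (4.0 and later, so after this question's time frame) ship System.Numerics.BigInteger, which covers the first kind of problem directly. A tiny sketch:

using System;
using System.Numerics; // requires a reference to System.Numerics.dll

class BigIntegerDemo
{
    static void Main()
    {
        BigInteger number = BigInteger.Parse(
            "81237146123746237846293567465365862854736263874623654728568263582");
        Console.WriteLine(number * 2); // arbitrary-precision arithmetic, no overflow
    }
}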
Well, I suggest you take a look at this other answer to see how I solved the big-numbers problem. Basically, you need to represent numbers in another way ...
Most of the problems are solved using
big numbers that exceed ulong,
very sensitive decimal numbers with significant digits over 30
arrays that must have index values that exceed the biggest int value.
Most of the problems are designed to fit within 64 bit longs. There are one or two which require bigger integers, but not many. None I've seen require decimal numbers with more than 30 digits, and none require arrays larger than a few thousand entries.
Remember that the correct solutions to the problems should run in a few seconds at most, and populating an array of 213192471235494658346583465340673475263842864836 bits will take 10^30 years.
Another option might be to use the BigInt type that is available in F#: http://cs.hubfs.net/forums/thread/887.aspx