I'm doing a code review and I've noticed that the developer has done this:
UserSession.LocationId = CheckInteger(elementValue);
with this wrapper
private int CheckInteger(string elementValue)
{
int outNumber;
int.TryParse(elementValue, out outNumber);
return outNumber;
}
I can't see that this brings very much to the party. Should I push back, or just let sleeping dogs lie? I don't think there's any particular company policy that covers this.
Are they sleeping dogs or wolves? This code effectively discards non-integer values and returns 0 for failed cases.
This may or may not be appropriate, but the name certainly is misleading - no check is performed at all. Errors are simply discarded. It's almost equivalent to hiding exceptions.
This can hurt you when invalid values aren't detected and 0s are set in unexpected places.
Such a method should at least be renamed to GetIntegerOrDefault so it's explicit that it will return a default value on error.
You should also ensure that 0 is a valid value for LocationId or at least an expected default. Otherwise you may encounter some very hard to diagnose bugs.
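For example, a minimal sketch of what that rename could look like (the explicit default parameter is just an illustration, not part of the original code):

// The name and the optional default make the fallback behaviour explicit,
// instead of silently producing 0 on bad input.
private int GetIntegerOrDefault(string elementValue, int defaultValue = 0)
{
    return int.TryParse(elementValue, out int result) ? result : defaultValue;
}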
I've encountered bugs in a Stock Exchange application that reported portfolio performance ratios of -100%, because someone 10 levels below decided to return 0 on invalid exchange rate values, and nobody remembered 5 years later.
Given that you could just, in this case, write int.TryParse(elementValue, out UserSession.LocationId) (assuming LocationId is a field, since properties can't be passed as out arguments), I would say this is not a good idea.
Now, if you needed to add logging to it or deal with special cases/errors in a consistent way, then sure go for it.
Related
I'm fairly new to programming and started learning C#, and this is my first project. I'm struggling to figure out why a strange and seemingly random issue is occurring. This is a fairly simple trading application. Basically, it connects to a websocket stream, receives live price data from the exchange, then evaluates the price in real time and performs some actions. The price is updated hundreds of times per second and everything operates without issues, and then all of a sudden I will get a price value that is thousands of dollars off the actual price that was sent from the exchange. I finally caught this occurring in real time. The app had been running for 11 hours or so without issue, then the bad value came through.
Here is the code in question:
public static decimal CurrentPrice;
// ...
if (BitmexTickerStreamIsConnected)
{
bitmexApiSocketService.Subscribe(BitmetSocketSubscriptions.CreateInstrumentSubsription(
message =>
{
foreach (var instrumentDto in message.Data)
{
if (instrumentDto.Symbol == "XBTUSD")
{
BitmexTickerStreamLastMessageReceived = DateTime.Now;
decimal LastPrice = instrumentDto.LastPrice.HasValue ? Convert.ToDecimal(instrumentDto.LastPrice) : CurrentPrice;
CurrentPrice = LastPrice;
}
}
}));
}
These are the values from the debug after a breakpoint was hit further down:
instrumentDto.LastPrice = 7769.5
LastPrice = 7769.5
CurrentPrice = 776.9
The issue is that CurrentPrice seems, for some reason, to be shifting the decimal to the left by one place. The values coming in from the websocket are fine; it's just when CurrentPrice is set to LastPrice that the issue happens.
I have no idea why this is happening and it seems to be totally random.
Anyone have any idea why this might be happening or how?
Thank you for your help!
There are two common causes:
Market data, due to how fast it's updated, will occasionally give you straight up bad data depending on the provider. If you're consuming directly from an exchange you need to code for this. Some providers (like OPRA) will filter or mark bad ticks for you.
If you see this issue consistently, it's due to things like tick size or scale. Some exchanges do this differently, but effectively you need to multiply certain price fields by a certain scale. Consult the data provider documentation for details.
If this is seen very rarely, you likely just got a bad price. Yes, this absolutely will happen on occasion and you need to be prepared for it, unless you want to become the next Knight Capital.
In all handlers I've written (or contributed to) there's a "sanity check" to see if the data is good. Depending on what you're trying to accomplish, just dropping the bad tick is fine.
Another solution that I've commonly used is alternate streams of data (usually called "A" and "B" streams or similar). If you get a bad tick on one stream, use the other.
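As a rough sketch of the sanity-check idea (the method name and the 10% threshold are purely illustrative):

// Reject any tick that is implausibly far from the last accepted price.
private const decimal MaxDeviation = 0.10m; // a 10% jump in one tick is treated as bad data

private static bool IsPlausible(decimal lastGoodPrice, decimal newPrice)
{
    if (lastGoodPrice == 0m) return true; // nothing to compare against yet
    decimal change = Math.Abs(newPrice - lastGoodPrice) / lastGoodPrice;
    return change <= MaxDeviation;
}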
That said this is not directly related to the programming language, but at the core it's handling quirks with the API/data.
Edit
Also beware of threading issues here. Be sure CurrentPrice isn't updated by multiple threads at once. decimal is a 128-bit base-10 floating point type, which is larger than the current word size (32 or 64 bits), so reads and writes of it are not atomic.
You may need to synchronize the reads and writes to it which you can do in a variety of ways. The above information still applies, though.
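For example, a minimal sketch of one way to do that, assuming a plain lock is acceptable at this update rate:

// Assignments to a shared decimal field are not atomic;
// guarding reads and writes with a lock avoids torn values.
private static readonly object PriceLock = new object();
private static decimal currentPrice;

public static decimal CurrentPrice
{
    get { lock (PriceLock) { return currentPrice; } }
    set { lock (PriceLock) { currentPrice = value; } }
}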
Is there a generally accepted best approach to coding complex math? For example:
double someNumber = .123 + .456 * Math.Pow(Math.E, .789 * Math.Pow((homeIndex + .22), .012));
Is this a point where hard-coding the numbers is okay? Or should each number have a constant associated with it? Or is there even another way, like storing the calculations in config and invoking them somehow?
There will be a lot of code like this, and I'm trying to keep it maintainable.
Note: The example shown above is just one line. There would be tens or hundreds of these lines of code. And not only could the numbers change, but the formula could as well.
Generally, there are two kinds of constants - ones with the meaning to the implementation, and ones with the meaning to the business logic.
It is OK to hard-code the constants of the first kind: they are private to understanding your algorithm. For example, if you are using a ternary search and need to divide the interval in three parts, dividing by a hard-coded 3 is the right approach.
Constants with the meaning outside the code of your program, on the other hand, should not be hard-coded: giving them explicit names gives someone who maintains your code after you leave the company non-zero chances of making correct modifications without having to rewrite things from scratch or e-mailing you for help.
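As a small sketch of the distinction (the VAT rate and the names here are invented for illustration):

public static class PriceCalculator
{
    // Business constant: its meaning lives outside the code, so name it.
    private const decimal StandardVatRate = 0.20m;

    public static decimal AddVat(decimal netAmount)
    {
        return netAmount * (1m + StandardVatRate);
    }

    // Implementation constant: dividing the interval in three is intrinsic to a
    // ternary search, so hard-coding the 3 is fine.
    public static (double Lower, double Upper) ThirdPoints(double low, double high)
    {
        double step = (high - low) / 3;
        return (low + step, high - step);
    }
}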
"Is it okay"? Sure. As far as I know, there's no paramilitary police force rounding up those who sin against the one true faith of programming. (Yet.).
Is it wise?
Well, there are all sorts of ways of deciding that - performance, scalability, extensibility, maintainability etc.
On the maintainability scale, this is pure evil. It makes extensibility very hard; performance and scalability are probably not a huge concern.
If you left behind a single method with loads of lines similar to the above, your successor would have no chance of maintaining the code. He'd be right to recommend a rewrite.
If you broke it down like
public float calculateTax(Person person)
{
    float taxFreeAmount = calcTaxFreeAmount(person);
    float taxableAmount = calcTaxableAmount(person, taxFreeAmount);
    float taxAmount = calcTaxAmount(person, taxableAmount);
    return taxAmount;
}
and each of the inner methods is a few lines long, but you left some hardcoded values in there - well, not brilliant, but not terrible.
However, if some of those hardcoded values are likely to change over time (like the tax rate), leaving them as hardcoded values is not okay. It's awful.
The best advice I can give is:
Spend an afternoon with Resharper, and use its automatic refactoring tools.
Assume the guy picking this up from you is an axe-wielding maniac who knows where you live.
I usually ask myself whether I can maintain and fix the code at 3 AM being sleep deprived six months after writing the code. It has served me well. Looking at your formula, I'm not sure I can.
Ages ago I worked in the insurance industry. Some of my colleagues were tasked to convert the actuarial formulas into code, first FORTRAN and later C. Mathematical and programming skills varied from colleague to colleague. What I learned was the following reviewing their code:
document the actual formula in code; without it, years later you'll have trouble remembering the actual formula. External documentation goes missing, becomes dated, or simply may not be accessible.
break the formula into discrete components that can be documented, reused and tested.
use constants to document equations; magic numbers have very little context and often require existing knowledge for other developers to understand (a short sketch follows this list).
rely on the compiler to optimize code where possible. A good compiler will inline methods, reduce duplication and optimize the code for the particular architecture. In some cases it may duplicate portions of the formula for better performance.
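Applied to the formula from the question, a sketch might look like this (the constant names are invented, since the real meaning of the coefficients isn't given):

// result = BaseOffset + Scale * e^(GrowthRate * (homeIndex + IndexShift)^Curvature)
private const double BaseOffset = 0.123;
private const double Scale = 0.456;
private const double GrowthRate = 0.789;
private const double IndexShift = 0.22;
private const double Curvature = 0.012;

public static double CalculateSomeNumber(double homeIndex)
{
    double exponent = GrowthRate * Math.Pow(homeIndex + IndexShift, Curvature);
    return BaseOffset + Scale * Math.Exp(exponent);
}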
That said, there are times where hard coding just simplifies things, especially if those values are well understood within a particular context. For example, dividing (or multiplying) something by 100 or 1000 because you're converting a value to dollars. Another one is to multiply something by 3600 when you'd like to convert hours to seconds. Their meaning is often implied from the greater context. The following doesn't say much about the magic number 100:
public static double a(double b, double c)
{
return (b - c) * 100;
}
but the following may give you a better hint:
public static double calculateAmountInCents(double amountDue, double amountPaid)
{
return (amountDue - amountPaid) * 100;
}
As the above comment states, this is far from complex.
You can, however, store the magic numbers in constants/app.config values, so as to make it easier for the next developer to maintain your code.
When storing such constants, make sure to explain to the next developer (read: yourself in 1 month) what your thoughts were, and what they need to keep in mind.
Also explain what the actual calculation is for and what it is doing.
Do not leave it in-line like this.
Use a constant so you can reuse it, easily find it, and easily change it; it also makes things easier to maintain when someone comes looking at your code for the first time.
You can do a config if it can/should be customized. What is the impact of a customer altering the value(s)? Sometimes it is best to not give them that option. They could change it on their own then blame you when things don't work. Then again, maybe they have it in flux more often than your release schedules.
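If you do go the app.config route, a hedged sketch might look like this (the key name "GrowthExponent" is made up for illustration):

// app.config: <appSettings><add key="GrowthExponent" value="0.012" /></appSettings>
using System.Configuration;
using System.Globalization;

double growthExponent = double.Parse(
    ConfigurationManager.AppSettings["GrowthExponent"],
    CultureInfo.InvariantCulture);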
It's worth noting that the JIT compiler (part of the CLR, not the C# compiler itself) will generally inline small one-line methods, so if you can extract certain formulas into one-liners you can extract them as methods without any performance loss.
EDIT:
Constants and such more or less depend on the team and the quantity of use. Obviously, if you're using the same hard-coded number more than once, make it a constant. However, if you're writing a formula that only you are likely to ever edit (small team), then hard-coding the values is fine. It all depends on your team's views on documentation and maintenance.
If the calculation in your line explains something for the next developer then you can leave it; otherwise it's better to have a calculated constant value in your code or configuration files.
I found one line in production code which was like:
int interval = 1 * 60 * 60 * 1000;
Without any comment, it wasn't hard to tell that the original developer meant 1 hour in milliseconds, which is much clearer than seeing the raw value 3600000.
IMO, leaving the calculation spelled out like that may be better for scenarios like this.
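In the same spirit, the framework can also spell the intent out directly (just an illustrative alternative, not from the original code):

// Same value, but "one hour in milliseconds" is stated explicitly.
int interval = (int)TimeSpan.FromHours(1).TotalMilliseconds;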
Names can be added for documentation purposes. The amount of documentation needed depends largely on the purpose.
Consider following code:
float e = m * 8.98755179e16;
And contrast it with the following one:
const float c = 299792458;
float e = m * c * c;
Even though the variable names are not very 'descriptive' in the latter, you'll have a much better idea of what the code is doing than in the first one - arguably there is no need to rename c to speedOfLight, m to mass and e to energy, as the names are explanatory in their domains.
const float speedOfLight = 299792458;
float energy = mass * speedOfLight * speedOfLight;
I would argue that the second version is the clearest one - especially if the programmer can expect to find STR (special relativity) in the code (an LHC simulator or something similar). To sum up - you need to find an optimal point. The more verbose the code, the more context you provide - which might both help to understand the meaning (what are e and c vs. we do something with mass and the speed of light) and obscure the big picture (we square c and multiply by m vs. needing to scan the whole line to get the equation).
Most constants have some deeper meaning and/or established notation, so I would consider at least naming them by convention (c for the speed of light, R for the gas constant, sPerH for seconds in an hour). If the notation is not clear, longer names should be used (sPerH in a class named Date or Time is probably fine, while it is not in Paginator). The really obvious constants could be hardcoded (say, division by 2 when calculating the new array length in merge sort).
Okay, so I'm trying to make a basic malware scanner in C#. My question is: say I have the hex signature for a particular bit of code.
For example
{
System.IO.File.Delete(@"C:\Users\Public\DeleteTest\test.txt");
}
//Which will have a hex of 53797374656d2e494f2e46696c652e44656c657465284022433a5c55736572735c5075626c69635c44656c657465546573745c746573742e74787422293b
Gets Changed to -
{
System.IO.File.Delete(@"C:\Users\Public\DeleteTest\notatest.txt");
}
//Which will have a hex of 53797374656d2e494f2e46696c652e44656c657465284022433a5c55736572735c5075626c69635c44656c657465546573745c6e6f7461746573742e74787422293b
Keep in mind these bits will be within the entire hex of the program. How could I go about taking my base signature and looking for partial matches that, say, have a 90% match and therefore get flagged?
I would do a wildcard, but that wouldn't work for slightly more complex things where it might be coded slightly differently but the majority would be the same. So is there a way I can do a percent match for a substring? I was looking into the Levenshtein distance but I don't see how I'd apply it in this scenario.
Thanks in advance for any input
Using an edit distance would be fine. You can take two strings and calculate the edit distance, which will be an integer value denoting how many operations are needed to take one string to the other. You set your own threshold based off that number.
For example, you may statically set that if the distance is less than five edits, the change is relevant.
You could also take the length of the string you are comparing and take a percentage of that. Your example is 36 characters long, so (int)(input.Length * 0.88m) would be a valid threshold.
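A minimal sketch of that approach (the method names and the 90% figure are just for illustration):

// Classic dynamic-programming Levenshtein distance, then a similarity check.
static int LevenshteinDistance(string a, string b)
{
    var d = new int[a.Length + 1, b.Length + 1];
    for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
    for (int j = 0; j <= b.Length; j++) d[0, j] = j;

    for (int i = 1; i <= a.Length; i++)
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                               d[i - 1, j - 1] + cost);
        }

    return d[a.Length, b.Length];
}

static bool IsCloseMatch(string signature, string candidate, double minSimilarity = 0.9)
{
    int maxLength = Math.Max(signature.Length, candidate.Length);
    if (maxLength == 0) return true; // two empty strings trivially match
    int distance = LevenshteinDistance(signature, candidate);
    return 1.0 - (double)distance / maxLength >= minSimilarity;
}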
First, your program bits should match EXACTLY, or else it has been modified or is corrupt. Generally, you will store an MD5 hash of the original binary and check the MD5 against new versions to see if they are 'the same enough' (MD5 can't guarantee a 100% match).
Beyond this, in order to detect malware in a random binary, you must know what sort of patterns to look for. For example, if I know a piece of malware injects code with some binary XYZ, I will look for XYZ in the bits of the executable. Patterns get much more complex than that, of course, as the malware bits can be spread out in chunks. What is more interesting is that some viruses are self-morphing. This means that each time it runs, it modifies itself, meaning the scanner does not know an exact pattern to find. In these cases, the scanner must know what types of derivatives can be produced and look for all of them.
In terms of finding a % match, this operation is very time consuming unless you have constraints. By comparing 2 strings, you cannot tell which pieces were removed, added, or replaced. For instance, if I have a starting string 'ABCD', is 'AABCDD' a 100% match or less since content has been added? What about 'ABCDABCD'; here it matches twice. How about 'AXBXCXD'? What about 'CDAB'?
There are many DIFF tools in existence that can tell you what pieces of a file have been changed (which can lead to a %). Unfortunately, none of them are perfect because of the issues that I described above. You will find that you have false negatives, false positives, etc. This may be 'good enough' for you.
Before you can identify a specific algorithm that will work for you, you will have to decide what the restrictions of your search will be. Otherwise, your scan will be NP-hard, which leads to unreasonable running times (your scanner may run all day just to check one file).
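For the simplest case (an exact, contiguous signature), a naive sketch would be something like the following; the names are made up, and real scanners use much faster search algorithms:

static bool ContainsSignature(byte[] program, byte[] signature)
{
    // Slide the signature over the program bytes and compare at each offset.
    for (int i = 0; i + signature.Length <= program.Length; i++)
    {
        int j = 0;
        while (j < signature.Length && program[i + j] == signature[j]) j++;
        if (j == signature.Length) return true;
    }
    return false;
}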
I suggest you look into Levenshtein distance and Damerau-Levenshtein distance.
The former tells you how many insert, delete, and substitute operations are needed to turn one string into another; the latter additionally counts transpositions of adjacent characters.
I use these quite a lot when writing programs where users can search for things, but they may not know the exact spelling.
There are code examples in both articles.
A standard enumeration in System.Windows.Forms:
[Flags]
public enum DragDropEffects
{
Scroll = -2147483648,
//
// Summary:
// The combination of the System.Windows.DragDropEffects.Copy, System.Windows.Forms.DragDropEffects.Link,
// System.Windows.Forms.DragDropEffects.Move, and System.Windows.Forms.DragDropEffects.Scroll
// effects.
All = -2147483645,
None = 0,
Copy = 1,
Move = 2,
Link = 4,
}
Quite a strange value for Scroll, don't you think?
As I understand it, these values all come from "the old times" of COM\OLE DROPEFFECT... But why were they chosen so in the first place? Did the author try to reserve the interval between 8 and 0x80000000 for something? Is it useful somehow, or is there an interesting story behind it, or is it just another long-lived illustration of the YAGNI principle?
It is a status flag, separate from the principal drop effects (Copy/Move/Link). Apart from leaving room for future drop effects, picking the high bit allows a trick like checking if the value is negative. Same kind of idea as an HRESULT or the GetAsyncKeyState return value.
Yes, it looks like an "interesting" hack of some sort. Common sense would suggest using 8, but maybe there's some Windows-version-related reason why 8 couldn't be used, and so the author used -2147483648 (0x80000000) instead. It's not that unusual a number - whoever wrote it is just starting with a binary '1' from the most significant end rather than the least significant end.
Perhaps scrolling was regarded as belonging to some other group of drag/drop effects than copy/move/link, and so the author wanted to place it at the other end of the word, along with any other future, similarly different effects.
Maybe there's some awful piece of logic somewhere to test to see if a DragDropEffects variable is greater than zero (intending to mean "anything that isn't none"), and Scroll should not fall in that range?
Bit of a mystery. At the very least you'd think they would put the constant in as hex, to show it's not just some totally random number.
This allows a quick check for > 0 in order to know whether Copy, Move, or Link are being invoked. It excludes None as well as Scroll.
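A small illustration of that check, using the values from the enum definition above:

DragDropEffects copyOrMove = DragDropEffects.Copy | DragDropEffects.Move;
DragDropEffects nothing = DragDropEffects.None;
DragDropEffects scrolling = DragDropEffects.Scroll;

Console.WriteLine(copyOrMove > 0); // True  - only the low "real effect" bits are set
Console.WriteLine(nothing > 0);    // False - no effect at all
Console.WriteLine(scrolling > 0);  // False - the sign bit makes the value negative

Note that combining Scroll with Copy/Move/Link also yields a negative value, which is exactly the caveat raised above.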
After writing code that can be boiled down to the following:
var size = -1;
var arr = new byte[size];
I was surprised that it threw an OverflowException. The docs for OverflowException state:
The exception that is thrown when an arithmetic, casting, or conversion operation in a checked context results in an overflow.
I couldn't see how providing a negative size for an array length fits into the description given for this exception, so I delved a little deeper and found that this is indeed the specified behaviour:
The computed values for the dimension lengths are validated as follows. If one or more of the values are less than zero, a System.OverflowException is thrown and no further steps are executed.
I wonder why OverflowException was chosen. It's pretty misleading if you ask me. It cost me at least 5 minutes of investigation (not counting my musings here). Can anyone shed any light on this (to my thinking) peculiar design decision?
This is almost certainly an optimization. The .NET Framework code is pretty religious about checking arguments to let the programmer fall into the pit of success. But that doesn't come for free. The cost is fairly minuscule; many class methods take lots more machine cycles than is spent on the checking.
But arrays are special. They are the very core data structure in the framework. Almost every collection class is built on top of them. Any overhead put in the Array class directly impacts the efficiency of a lot of code that sits on top of it. Avoiding the check is okay; it gets implicitly checked anyway when the internal code needs to cast the value to unsigned. And it is very rare that it trips, so checking it twice is not quite worth the better exception message.
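A quick way to see that implicit unsigned check in action (purely illustrative):

int size = -1;
try
{
    var arr = new byte[size]; // throws OverflowException, not ArgumentOutOfRangeException
}
catch (OverflowException)
{
    // Same outcome as an explicit checked conversion of the length:
    // uint length = checked((uint)size); // would also throw OverflowException
    Console.WriteLine("Negative array length rejected as an overflow.");
}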
OverflowException, in the documentation, basically defines an overflow as something that:
produces a result that is outside the range of the data type
In this case, negative values are outside of the valid range for an array size (or really, any size).
I could see the argument that ArgumentOutOfRangeException might be, in some ways, better - however, there is no argument involved in an array definition (as it's not a method), so it, too, would not be a perfect choice.
It might be because the size is treated as an unsigned int. -1 is stored in two's complement, which, when looked at as an unsigned int, is the maximum positive integer that can be stored. If this number is bigger than the possible size of an array, it will overflow.
Warning: this is pure speculation.