I'm fairly new to programming and started learning C# and this is my first project. I'm struggling to figure out why strange and seemingly random issue is occurring. This is a fairly simple trading application. Basically it connects to a websocket stream and receives live price data from the exchange and then evaluates the price in real time and performs some actions. The price is updated hundreds of times per second and operates without issues and then all of a sudden, I will get a price value that is thousands of dollars off the actual price that was sent from the exchange. I finally caught this occurring in real time. The app had been running for 11 hours or so without issue, then the bad value came through.
Here is the code in question:
public static decimal CurrentPrice;
// ...
if (BitmexTickerStreamIsConnected)
{
bitmexApiSocketService.Subscribe(BitmetSocketSubscriptions.CreateInstrumentSubsription(
message =>
{
foreach (var instrumentDto in message.Data)
{
if (instrumentDto.Symbol == "XBTUSD")
{
BitmexTickerStreamLastMessageReceived = DateTime.Now;
decimal LastPrice = instrumentDto.LastPrice.HasValue ? Convert.ToDecimal(instrumentDto.LastPrice) : CurrentPrice;
CurrentPrice = LastPrice;
}
}
}));
}
These are the values from the debug after a breakpoint was hit further down:
instrumentDto.LastPrice = 7769.5
LastPrice = 7769.5
CurrentPrice = 776.9
The issue is that CurrentPrice seems to be for some reason shifting the decimal to the left by one place. The values coming in from the websocket are fine, its just when CurrentPrice is set to LastPrice that the issue happens.
I have no idea why this is happening and seems to be totally random.
Anyone have any idea why this might be happening or how?
Thank you for your help!
There's two common causes:
Market data, due to how fast it's updated, will occasionally give
you straight up bad data depending on the provider. If you're
consuming directly from an exchange you need to code for this. Some
providers (like OPRA) will filter or mark bad ticks for you.
If you see this issue consistently it's due to things like tick size
or scale. Some exchanges do this differently, but effectively you
need to multiply certain price fields by a certain scale. Consult
with the data provider documentation for details.
If this is seen very rarely, you likely just got a bad price. Yes, this absolutely will happen on occassion and you need to be prepared for it, unless you want to become the next Knight Capital.
In all handlers I've written (or contributed to) there's a "sanity check" to see if the data is good. Depending on what you're trying to accomplish, just dropping the bad tick is fine.
Another solution that I've commonly used is alternate streams of data (usually called "A" and "B" streams or similar). If you get a bad tick on one stream, use the other.
That said this is not directly related to the programming language, but at the core it's handling quirks with the API/data.
Edit
Also beware of threading issues here. Be sure CurrentPrice isn't updated by multiple threads at once. decimal is 128-bit base 10 floating point, and that's larger than word size currently (32 or 64 bits).
You may need to synchronize the reads and writes to it which you can do in a variety of ways. The above information still applies, though.
Related
Very simple question. Which would test faster? This:
var myString = "goodTimes";
if (myString.StartsWith("g"))
{
// do stuff
}
Or this:
var myString = "goodTimes";
if (myString[0] == 'g')
{
// do stuff
}
They do different things - they behave differently for zero length strings, in particular. Other than that, hypothetically the myString[0] should be marginally faster (it does less), but: whether this actually matters is hugely contextual. In most cases, it won't, and you'll have spent more time asking yourself the question than it will ever save. If you're in a scenario where it matters, you'll also know that you need to benchmark with actual realistic data to have a good answer. And only you can do that, with your own particular data.
They would both test at almost the exact same time, for all practical purposes.
The second one might be several clock cycles faster, but who cares.
If you were to write the most successful app ever, downloaded by millions of users, and ran on millions of devices every day, and if your app was executing the above code once a second on each installation, the total number of seconds you would save for all your users combined would never exceed the total amount of time we just spent discussing this.
I'm doing a code review and I've noticed that the developer has done this:
UserSession.LocationId = CheckInteger(elementValue);
with this wrapper
private int CheckInteger(string elementValue)
{
int outNumber;
int.TryParse(elementValue, out outNumber);
return outNumber;
}
I can't see that this brings very much to the party. Should I push back, or just let sleeping dogs lie? I don't think there's any particular company policy that covers this.
Are they sleeping dogs or wolves? This code effectively discards non-integer values and returns 0 for failed cases.
This may or may not be appropriate, but the name certainly is misleading - no check is performed at all. Errors are simply discarded. It's almost equivalent to hiding exceptions.
This can hurt you when invalid values aren't detected and 0s are set in unexpected places.
Such a method should at least be renamed to GetIntegerOrDefault so it's explicit that it will return a default value on error.
You should also ensure that 0 is a valid value for LocationId or at least an expected default. Otherwise you may encounter some very hard to diagnose bugs.
I've encountered bugs in a Stock Exchange application that send portfolio performance ratios of -100%, because someone 10 levels bellow decided to return 0 on invalid exchange rate values, and nobody remembered 5 years later.
Given that you could just, in this case, write int.TryParse(elementValue, out UserSession.LocationId) I would say this is not a good idea.
Now, if you needed to add logging to it or deal with special cases/errors in a consistent way, then sure go for it.
I'm getting the following error while streaming data:
Google.ApisGoogle.Apis.Requests.RequestError
Internal Error [500]
Errors [
Message[Internal Error] Location[ - ] Reason[internalError] Domain[global]
]
My code:
public bool InsertAll(BigqueryService s, String datasetId, String tableId, List<TableDataInsertAllRequest.RowsData> data)
{
try
{
TabledataResource t = s.Tabledata;
TableDataInsertAllRequest req = new TableDataInsertAllRequest()
{
Kind = "bigquery#tableDataInsertAllRequest",
Rows = data
};
TableDataInsertAllResponse response = t.InsertAll(req, projectId, datasetId, tableId).Execute();
if (response.InsertErrors != null)
{
return true;
}
}
catch (Exception e)
{
throw e;
}
return false;
}
I'm streaming data constantly and many times a day I have this error. How can I fix this?
We seen several problems:
the request randomly fails with type 'Backend error'
the request randomly fails with type 'Connection error'
the request randomly fails with type 'timeout' (watch out here, as only some rows are failing and not the whole payload)
some other error messages are non descriptive, and they are so vague that they don't help you, just retry.
we see hundreds of such failures each day, so they are pretty much constant, and not related to Cloud health.
For all these we opened cases in paid Google Enterprise Support, but unfortunately they didn't resolved it. It seams the recommended option to take is an exponential-backoff with retry, even the support told to do so. Also the failure rate fits the 99.9% uptime we have in the SLA, so there is no reason for objection.
There's something to keep in mind in regards to the SLA, it's a very strictly defined structure, the details are here. The 99.9% is uptime not directly translated into fail rate. What this means is that if BQ has a 30 minute downtime one month, and then you do 10,000 inserts within that period but didn't do any inserts in other times of the month, it will cause the numbers to be skewered. This is why we suggest a exponential backoff algorithm. The SLA is explicitly based on uptime and not error rate, but logically the two correlates closely if you do streaming inserts throughout the month at different times with backoff-retry setup. Technically, you should experience on average about 1/1000 failed insert if you are doing inserts through out the month if you have setup the proper retry mechanism.
You can check out this chart about your project health:
https://console.developers.google.com/project/YOUR-APP-ID/apiui/apiview/bigquery?tabId=usage&duration=P1D
About times. Since streaming has a limited payload size, see Quota policy it's easier to talk about times, as the payload is limited in the same way to both of us, but I will mention other side effects too.
We measure between 1200-2500 ms for each streaming request, and this was consistent over the last month as you can see in the chart.
The approach you've chosen if takes hours that means it does not scale, and won't scale. You need to rethink the approach with async processes that can retry.
Processing in background IO bound or cpu bound tasks is now a common practice in most web applications. There's plenty of software to help build background jobs, some based on a messaging system like Beanstalkd.
Basically, you needed to distribute insert jobs across a closed network, to prioritize them, and consume(run) them. Well, that's exactly what Beanstalkd provides.
Beanstalkd gives the possibility to organize jobs in tubes, each tube corresponding to a job type.
You need an API/producer which can put jobs on a tube, let's say a json representation of the row. This was a killer feature for our use case. So we have an API which gets the rows, and places them on tube, this takes just a few milliseconds, so you could achieve fast response time.
On the other part, you have now a bunch of jobs on some tubes. You need an agent. An agent/consumer can reserve a job.
It helps you also with job management and retries: When a job is successfully processed, a consumer can delete the job from the tube. In the case of failure, the consumer can bury the job. This job will not be pushed back to the tube, but will be available for further inspection.
A consumer can release a job, Beanstalkd will push this job back in the tube, and make it available for another client.
Beanstalkd clients can be found in most common languages, a web interface can be useful for debugging.
Is there a generally accepted best approach to coding complex math? For example:
double someNumber = .123 + .456 * Math.Pow(Math.E, .789 * Math.Pow((homeIndex + .22), .012));
Is this a point where hard-coding the numbers is okay? Or should each number have a constant associated with it? Or is there even another way, like storing the calculations in config and invoking them somehow?
There will be a lot of code like this, and I'm trying to keep it maintainable.
Note: The example shown above is just one line. There would be tens or hundreds of these lines of code. And not only could the numbers change, but the formula could as well.
Generally, there are two kinds of constants - ones with the meaning to the implementation, and ones with the meaning to the business logic.
It is OK to hard-code the constants of the first kind: they are private to understanding your algorithm. For example, if you are using a ternary search and need to divide the interval in three parts, dividing by a hard-coded 3 is the right approach.
Constants with the meaning outside the code of your program, on the other hand, should not be hard-coded: giving them explicit names gives someone who maintains your code after you leave the company non-zero chances of making correct modifications without having to rewrite things from scratch or e-mailing you for help.
"Is it okay"? Sure. As far as I know, there's no paramilitary police force rounding up those who sin against the one true faith of programming. (Yet.).
Is it wise?
Well, there are all sorts of ways of deciding that - performance, scalability, extensibility, maintainability etc.
On the maintainability scale, this is pure evil. It make extensibility very hard; performance and scalability are probably not a huge concern.
If you left behind a single method with loads of lines similar to the above, your successor would have no chance maintaining the code. He'd be right to recommend a rewrite.
If you broke it down like
public float calculateTax(person)
float taxFreeAmount = calcTaxFreeAmount(person)
float taxableAmount = calcTaxableAmount(person, taxFreeAmount)
float taxAmount = calcTaxAmount(person, taxableAmount)
return taxAmount
end
and each of the inner methods is a few lines long, but you left some hardcoded values in there - well, not brilliant, but not terrible.
However, if some of those hardcoded values are likely to change over time (like the tax rate), leaving them as hardcoded values is not okay. It's awful.
The best advice I can give is:
Spend an afternoon with Resharper, and use its automatic refactoring tools.
Assume the guy picking this up from you is an axe-wielding maniac who knows where you live.
I usually ask myself whether I can maintain and fix the code at 3 AM being sleep deprived six months after writing the code. It has served me well. Looking at your formula, I'm not sure I can.
Ages ago I worked in the insurance industry. Some of my colleagues were tasked to convert the actuarial formulas into code, first FORTRAN and later C. Mathematical and programming skills varied from colleague to colleague. What I learned was the following reviewing their code:
document the actual formula in code; without it, years later you'll have trouble remember the actual formula. External documentation goes missing, become dated or simply may not be accessible.
break the formula into discrete components that can be documented, reused and tested.
use constants to document equations; magic numbers have very little context and often require existing knowledge for other developers to understand.
rely on the compiler to optimize code where possible. A good compiler will inline methods, reduce duplication and optimize the code for the particular architecture. In some cases it may duplicate portions of the formula for better performance.
That said, there are times where hard coding just simplify things, especially if those values are well understood within a particular context. For example, dividing (or multiplying) something by 100 or 1000 because you're converting a value to dollars. Another one is to multiply something by 3600 when you'd like to convert hours to seconds. Their meaning is often implied from the greater context. The following doesn't say much about magic number 100:
public static double a(double b, double c)
{
return (b - c) * 100;
}
but the following may give you a better hint:
public static double calculateAmountInCents(double amountDue, double amountPaid)
{
return (amountDue - amountPaid) * 100;
}
As the above comment states, this is far from complex.
You can however store the Magic numbers in constants/app.config values, so as to make it easier for the next developer to maitain your code.
When storing such constants, make sure to explain to the next developer (read yourself in 1 month) what your thoughts were, and what they need to keep in mind.
Also ewxplain what the actual calculation is for and what it is doing.
Do not leave in-line like this.
Constant so you can reuse, easily find, easily change and provides for better maintaining when someone comes looking at your code for the first time.
You can do a config if it can/should be customized. What is the impact of a customer altering the value(s)? Sometimes it is best to not give them that option. They could change it on their own then blame you when things don't work. Then again, maybe they have it in flux more often than your release schedules.
Its worth noting that the C# compiler (or is it the CLR) will automatically inline 1 line methods so if you can extract certain formulas into one liners you can just extract them as methods without any performance loss.
EDIT:
Constants and such more or less depends on the team and the quantity of use. Obviously if you're using the same hard-coded number more than once, constant it. However if you're writing a formula that its likely only you will ever edit (small team) then hard coding the values is fine. It all depends on your teams views on documentation and maintenance.
If the calculation in your line explains something for the next developer then you can leave it, otherwise its better to have calculated constant value in your code or configuration files.
I found one line in production code which was like:
int interval = 1 * 60 * 60 * 1000;
Without any comment, it wasn't hard that the original developer meant 1 hour in milliseconds, rather than seeing a value of 3600000.
IMO May be leaving out calculations is better for scenarios like that.
Names can be added for documentation purposes. The amount of documentation needed depends largely on the purpose.
Consider following code:
float e = m * 8.98755179e16;
And contrast it with the following one:
const float c = 299792458;
float e = m * c * c;
Even though the variable names are not very 'descriptive' in the latter you'll have much better idea what the code is doing the the first one - arguably there is no need to rename the c to speedOfLight, m to mass and e to energy as the names are explanatory in their domains.
const float speedOfLight = 299792458;
float energy = mass * speedOfLight * speedOfLight;
I would argue that the second code is the clearest one - especially if programmer can expect to find STR in the code (LHC simulator or something similar). To sum up - you need to find an optimal point. The more verbose code the more context you provide - which might both help to understand the meaning (what is e and c vs. we do something with mass and speed of light) and obscure the big picture (we square c and multiply by m vs. need of scanning whole line to get equation).
Most constants have some deeper meening and/or established notation so I would consider at least naming it by the convention (c for speed of light, R for gas constant, sPerH for seconds in hour). If notation is not clear the longer names should be used (sPerH in class named Date or Time is probably fine while it is not in Paginator). The really obvious constants could be hardcoded (say - division by 2 in calculating new array length in merge sort).
I have two question:
1) I need some expert view in terms of witting code which will be Performance and Memory Consumption wise sound enough.
2) Performance and Memory Consumption wise how good/bad is following piece of code and why ???
Need to increment the counter that could go maximum by 100 and writing code like this:
Some Sample Code is as follows:
for(int i=0;i=100;i++)
{
Some Code
}
for(long i=0;i=1000;i++)
{
Some Code
}
how good is to use Int16 or anything else instead of int, long if the requirement is same.
Need to increment the counter that could go maximum by 100 and writing code like this:
Options given:
for(int i=0;i=100;i++)
for(long i=0;i=1000;i++)
EDIT: As noted, neither of these would even actually compile, due to the middle expression being an assignment rather than an expression of type bool.
This demonstrates a hugely important point: get your code working before you make it fast. Your two loops don't do the same thing - one has an upper bound of 1000, the other has an upper bound of 100. If you have to choose between "fast" and "correct", you almost always want to pick "correct". (There are exceptions to this, of course - but that's usually in terms of absolute correctness of results across large amounts of data, not code correctness.)
Changing between the variable types here is unlikely to make any measurable difference. That's often the case with micro-optimizations. When it comes to performance, architecture is usually much more important than in-method optimizations - and it's also a lot harder to change later on. In general, you should:
Write the cleanest code you can, using types that represent your data most correctly and simply
Determine reasonable performance requirements
Measure your clean implementation
If it doesn't perform well enough, use profiling etc to work out how to improve it
DateTime dtStart = DateTime.Now;
for(int i=0;i=10000;i++)
{
Some Code
}
response.write ((DateTime.Now - dtStart).TotalMilliseconds.ToString());
same way for Long as well and you can know which one is better... ;)
When you are doing things that require a number representing iterations, or the quantity of something, you should always use int unless you have a good semantic reason to use a different type (ie data can never be negative, or it could be bigger than 2^31). Additionally, Worrying about this sort of nano-optimization concern will basically never matter when writing c# code.
That being said, if you are wondering about the differences between things like this (incrementing a 4 byte register versus incrementing 8 bytes), you can always cosult Mr. Agner's wonderful instruction tables.
On an Amd64 machine, incrementing long takes the same amount of time as incrementing int.**
On a 32 bit x86 machine, incrementing int will take less time.
** The same is true for almost all logic and math operations, as long as the value is not both memory bound and unaligned. In .NET a long will always be aligned, so the two will always be the same.