So, long story short: this is obviously valid code. Working in Unity3D, it runs fine on PC. Build for Android and I get a NullReferenceException when writing past the specific pinned address, at the line marked NRE in the code block below.
public static unsafe byte[] SerialiseFloat(byte header, float val)
{
    byte[] data = new byte[5];
    fixed (byte* b_ptr = data)
    {
        *b_ptr = header;
        *((float*)(b_ptr + 1)) = val; // NRE
        return data;
    }
}
And not to be offhand, but since it seems to come up in every single question about unsafe C#: yes, I need to be using it. No, it's not that important for this particular instance; I know this is an unnecessary use of an unsafe context in this example. This is from my old serialisation lib: generated contracts hand over to these methods to serialise members. It was just a quick put-together, and these things need to be inlined so I'm not working with multiple arrays.
Anyhow, the point stands: NRE at the line commented above. Does anybody know how this is even possible, and how I might begin to fix it? It happens consistently; the array is clearly allocated and pinned. Memory analysis shows it is being referenced correctly and everything is laid out as expected. I've reproduced this about 15 times now.
Thanks to GSerg for his comment pointing me towards this question. I now know about ARM's alignment constraints, which come into play when working with floating-point values. I'll simply force my structure packing to strict alignment now that I'm aware of the cause. Thanks for the heads up: being new to mobile dev and ARM, I had absolutely no idea about this and would definitely have lost some sleep over it, so a huge thanks from me!
EDIT: Just an update to confirm that everything is working properly after running a few serialisation tests, with my code generator now preprocessing generated contracts on Android to force floating-point fields to always align to 4-byte memory addresses.
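For anyone else who hits this, here is a minimal sketch of an alignment-safe variant (illustrative only, not the generated contract code): copying the float byte-wise via BitConverter and Buffer.BlockCopy sidesteps the unaligned 4-byte store that faults on ARM.

// Illustrative sketch: avoids the unaligned float store by copying bytes
// instead of casting the pointer. Requires using System;
public static byte[] SerialiseFloatUnalignedSafe(byte header, float val)
{
    byte[] data = new byte[5];
    data[0] = header;
    Buffer.BlockCopy(BitConverter.GetBytes(val), 0, data, 1, 4);
    return data;
}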
I want to process many integers in a class, so I gathered them into an int* array.
int*[] pp = new int*[]{&aaa,&bbb,&ccc};
However, the compiler rejected the code above with the following error:
> You can only take the address of an unfixed expression inside of a fixed statement initializer
I know I can change the code above to avoid this error; however, bear in mind that ddd and eee will join the array in the future.
public enum E {
    aaa,
    bbb,
    ccc,
    _count
}
for (int i = 0; i < (int)E._count; i++)
gg[(int)E.bbb]
Dictionary<string, int> ppp = new Dictionary<string, int>();
ppp["aaa"] = ppp.Count;
ppp["bbb"] = ppp.Count;
ppp["ccc"] = ppp.Count;
gg[ppp["bbb"]]
These solutions work, but they make the code longer and increase execution time.
I would also accept an unofficial patch to the compiler, or a new unofficial C# compiler, but I have not seen one available for download in many years; it seems very difficult for us to get one.
Are there better ways so that:
I do not need to keep count of the elements of the array ppp myself.
If the code becomes longer, it is only by a few characters.
The execution time does not increase much.
Adding ddd and eee to the array takes only one or two statements per new member.
The .NET runtime is a managed execution environment which (among other things) provides garbage collection. The .NET garbage collector (GC) not only manages the allocation and release of memory, but also transparently moves objects around the "managed heap", blocking the rest of your code while it does so.
It also compacts (defragments) the memory by moving longer lived objects together, and even "promoting" them into different parts of the heap, called generations, to avoid checking their status too often.
There is a bunch of memory being copied all the time without your program even realizing it. Since garbage collection is an operation that can happen at any time during the execution of your program, any pointer-related
("unsafe") operations must be done within a small scope, by telling the runtime to "pin" the objects using the fixed keyword. This prevents the GC from moving them, but only for a while.
Using pointers and unsafe code in C# is not only less safe, but also not very idiomatic for managed languages in general. If you come from a C background, you may feel at home with these constructs, but C# has a completely different philosophy: your job as a C# programmer should be to write reliable, readable and maintainable code, and only then think about squeezing out a couple of CPU cycles. You can use pointers from time to time in small functions doing some very specific, time-critical work, but even then it is your duty to profile before making such optimizations. Even the most experienced programmers often fail at predicting bottlenecks before profiling.
Finally, regarding your actual code:
I don't see why you think this:
int*[] pp = new int*[] {&aaa, &bbb, &ccc};
would be any more performant than this:
int[] pp = new int[] {aaa, bbb, ccc};
On a 32-bit machine, an int and a pointer are of the same size. On a 64-bit machine, a pointer is even bigger.
Consider replacing these plain ints with a class of your own which will provide some context and additional functionality/data to each of these values. Create a new question describing the actual problem you are trying to solve (you can also use Code Review for such questions) and you will benefit from much better suggestions.
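For what it's worth, here is a rough, hypothetical sketch of what "a class of your own" might look like; none of these names come from your code.

// Hypothetical wrapper: the related counters live together, with room for
// context and behaviour, instead of raw ints addressed by position.
public class Counters
{
    public int Aaa;
    public int Bbb;
    public int Ccc;
    // Adding ddd or eee later is just one more field here.
}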
I am trying to optimize my code and was running VS performance monitor on it.
It shows that a simple assignment of a float takes up a major chunk of computing power. I don't understand how that is possible.
Here is the code for TagData:
public class TagData
{
    public int tf;
    public float tf_idf;
}
So all I am really doing is:
float tag_tfidf = td.tf_idf;
I am confused.
I'll post another theory: it might be the cache miss of the first access to members of td. A memory load takes 100-200 cycles which in this case seems to amount to about 1/3 of the total duration of the method.
Points to test this theory:
Is your data set big? I bet it is.
Are you accessing the TagData instances in random memory order? I bet they are not sequential in memory, which renders the CPU's memory prefetcher ineffective.
Add a new line int dummy = td.tf; before the expensive line. This new line will now be the most expensive one because it triggers the cache miss. Find some way to do a dummy load that the JIT does not optimize out; for example, add all td.tf values to a local and pass that value to GC.KeepAlive at the end of the method. That should keep the memory load in the JIT-emitted x86.
I might be wrong but, contrary to the other theories so far, mine is testable.
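A rough sketch of that test, assuming (as in the other answers) that the loop walks term.tags; the names here are assumptions, not the OP's code.

// Force a load of td.tf before the "expensive" line; if the theory is right,
// the cache miss moves to the dummy load and td.tf_idf becomes cheap.
int dummySum = 0;
foreach (TagData td in term.tags)
{
    dummySum += td.tf;              // first touch of the object: cache miss here
    float tag_tfidf = td.tf_idf;    // same cache line, should now be fast
    // ... rest of the loop body ...
}
GC.KeepAlive(dummySum);             // keeps the JIT from removing the dummy loads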
Try making TagData a struct. That will make all items of term.tags sequential in memory and give you a nice performance boost.
Are you using LINQ? If so, LINQ uses lazy enumeration so the first time you access the value you pulled out, it's going to be painful.
If you are using LINQ, call ToList() after your query to only pay the price once.
It also looks like your data structure is suboptimal, but since I don't have access to your source (and probably couldn't help even if I did :) ), I can't tell you what would be better.
EDIT: As commenters have pointed out, LINQ may not be to blame; however, my suggestion was based on the fact that both foreach statements use IEnumerable. The TagData assignment is a reference to the item in the IEnumerable's collection (which may or may not have been enumerated yet). The first access of actual data is the line that pulls the property from the object. The first time this happens, it may be executing the entire LINQ statement, and since profiling uses the average, it may be off. The same can be said for tagScores (which I'm guessing is database-backed), whose first access is really slow and then speeds up. I wasn't pointing out the solution, just a possible problem given my understanding of IEnumerable.
See http://odetocode.com/blogs/scott/archive/2008/10/01/lazy-linq-and-enumerable-objects.aspx
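To illustrate the ToList() suggestion, a minimal sketch (the query and member names are assumptions, not the OP's code):

// Requires using System.Linq; materialize the (possibly lazy, possibly
// database-backed) query once, so later accesses are plain field reads.
var tags = term.tags.ToList();
foreach (var td in tags)
{
    float tag_tfidf = td.tf_idf;
    // ...
}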
As we can see, the line right after the suspicious one takes only 0.6, i.e.:
float tag_tfidf = td.tf_idf;//29.6
string tagName =...;//0.6
I suspect this is caused by the excessive number of calls. Also note that float is a value type, meaning it is copied by value; so every time you assign it, the runtime creates a new float (Single) and initializes it by copying the value from td.tf_idf, which takes significant time.
You can see that string tagName = ...; doesn't take much time because it is copied by reference.
Edit: As the comments pointed out, I may be wrong about that; this might also be a bug in the profiler. Try re-profiling and see if that makes any difference.
Currently I am working on a project where I need to bring GBs of data onto the client machine to do some tasks, and the task needs the whole data set as it does some analysis on the data and helps in the decision-making process.
So the question is: what are the best practices and suitable approaches to manage that much data in memory without hampering the performance of the client machine and the application?
Note: at application load time we can spend time bringing data from the database to the client machine; that's totally acceptable in our case. But once the data is loaded into the application at start-up, performance is very important.
This is a little hard to answer without a problem statement, i.e. what problems you are currently facing, but the following is just some thoughts, based on some recent experiences we had in a similar scenario. It is, however, a lot of work to change to this type of model - so it also depends how much you can invest trying to "fix" it, and I can make no promise that "your problems" are the same as "our problems", if you see what I mean. So don't get cross if the following approach doesn't work for you!
Loading that much data into memory is always going to have some impact, however, I think I see what you are doing...
When loading that much data naively, you are going to have many (millions?) of objects and a similar-or-greater number of references. You're obviously going to want to be using x64, so the references will add up, but in terms of performance the biggest problem is going to be garbage collection. You have a lot of objects that can never be collected, but the GC knows that you're using a ton of memory and is going to try anyway, periodically. This is something I looked at in more detail here; in the graph from that investigation, the "spikes" are all GC killing performance.
For this scenario (a huge amount of data loaded, never released), we switched to using structs, i.e. loading the data into:
struct Foo {
    private readonly int id;
    private readonly double value;
    public Foo(int id, double value) {
        this.id = id;
        this.value = value;
    }
    public int Id { get { return id; } }
    public double Value { get { return value; } }
}
and stored those directly in arrays (not lists):
Foo[] foos = ...
The significance of that is: because some of these structs are quite big, we didn't want them being copied lots of times on the stack, but with an array you can do:
private void SomeMethod(ref Foo foo) {
    if (foo.Value == ...) { blah blah blah }
}
// call ^^^
int index = 17;
SomeMethod(ref foos[index]);
Note that we've passed the item directly - it was never copied; foo.Value is looking directly inside the array. The tricky bit starts when you need relationships between objects: you can't store a reference to a Foo, because it is a struct living in an array, not an object on the heap. What you can do, though, is store the index into the array. For example:
struct Customer {
    ... more not shown
    public int FooIndex { get { return fooIndex; } }
}
Not quite as convenient as customer.Foo, but the following works nicely:
Foo foo = foos[customer.FooIndex];
// or, when passing to a method, SomeMethod(ref foos[customer.FooIndex]);
Key points:
we're now using half the size for "references" (an int is 4 bytes; a reference on x64 is 8 bytes)
we don't have several-million object headers in memory
we don't have a huge object graph for GC to look at; only a small number of arrays that GC can look at incredibly quickly
but it is a little less convenient to work with, and needs some initial processing when loading
Additional notes:
strings are a killer: if you have millions of strings, that is problematic. At a minimum, if you have strings that are repeated, do some custom interning (not string.Intern, that would be bad) so that you only have one instance of each repeated value, rather than 800,000 strings with the same contents; a rough sketch follows after these notes.
if you have repeated data of finite length, rather than sub-lists/arrays you might consider a fixed-size buffer ("fixed array"); this requires unsafe code, but avoids another myriad of objects and references.
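The custom-interning idea sketched (not string.Intern, just an ordinary dictionary; the names are illustrative):

// Requires using System.Collections.Generic;
// Each distinct string value is stored once; repeats return the pooled instance.
static readonly Dictionary<string, string> stringPool = new Dictionary<string, string>();

static string Pool(string value)
{
    string existing;
    if (stringPool.TryGetValue(value, out existing))
        return existing;
    stringPool.Add(value, value);
    return value;
}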
As an additional footnote, with that volume of data, you should think very seriously about your serialization protocols, i.e. how you're sending the data down the wire. I would strongly suggest staying far away from things like XmlSerializer, DataContractSerializer or BinaryFormatter. If you want pointers on this subject, let me know.
I got an OutOfMemoryException earlier and couldn't figure out what it was for; it made no sense at all. I dug around in my code and suddenly remembered that somewhere I had forgotten to check for null, and in this particular case the value was (and should be) exactly that. That shouldn't cause an OutOfMemoryException in my opinion, but I fixed it anyway, of course. And when I did, the exception didn't appear anymore!
So I removed the check again and studied the exception I got some more. It turns out it had an InnerException of type NullReferenceException and a stack trace which of course made a lot more sense.
But why did I get an OutOfMemoryException? This has never happened to me before... it makes no sense to me...
I would love to give some more context, but I can't really say much without uploading the whole project, which I can't (and which you wouldn't want to read through anyway :p). But the specific place where it happened looks like this:
{
    foreach (var exportParameter in exportParameters)
    {
        // Copy to local
        var ep = exportParameter;
        // Load stored values from db
        ...
    }
    int i = 1;
    exportParameters
        .OrderBy(ø => ø.Sequence)
        .ForEach(ø => { if (!ø.Locked) ø.Sequence = i++; });
}
The fix was to put an if(exportParameters != null) before the code block. exportParameters is a List<ExportParameter>, except in the failing case in which it was null.
You might be facing the problem that Constrained Execution Regions are designed to prevent - that is, the JITting of some code that your catch clause relies on is causing the out of memory condition.
(In response to svish's comment, this is the first link when googling the phrase: http://msdn.microsoft.com/en-us/library/ms228973.aspx)
Aside from the obvious reasons for getting an OOMException, you can also get it if you still have memory available, just not a big enough contiguous chunk for what is being requested. If you're getting it reliably and relatively near startup, you're probably accidentally requesting more memory than you intend to (e.g. requesting a very large array). Can you post a bit of your code, or at least describe your allocation pattern?
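As a purely illustrative example of the "not a big enough chunk" case (the size here is made up):

// Even with plenty of total memory free, a single allocation this large can
// fail if no contiguous block is available (especially in a 32-bit process).
byte[] buffer = new byte[800 * 1024 * 1024];   // may throw OutOfMemoryException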
I have inherited some code that uses the ref keyword extensively and unnecessarily. The original developer apparently feared objects would be cloned like primitive types if ref was not used, and did not bother to research the issue before writing 50k+ lines of code.
This, combined with other bad coding practices, has created some situations that are absurdly dangerous on the surface. For example:
Customer person = NextInLine();
//person is Alice
person.DataBackend.ChangeAddress(ref person, newAddress);
//person could now be Bob, Eve, or null
Could you imagine walking into a store to change your address, and walking out as an entirely different person?
Scary, but in practice the use of ref in this application seems harmlessly superfluous. I am having trouble justifying the extensive amount of time it would take to clean it up. To help sell the idea, I pose the following question:
How else can unnecessary use of ref be destructive?
I am especially concerned with maintenance. Plausible answers with examples are preferred.
You are also welcome to argue clean-up is not necessary.
I would say the biggest danger is if the parameter were set to null inside the function for some reason:
public void MakeNull(ref Customer person)
{
    // random code
    person = null;
    return;
}
Now, you're not just a different person, you've been erased from existence altogether!
As long as whoever is developing this application understands that:
By default, object references are passed by value.
and:
With the ref keyword, object references are passed by reference.
If the code works as expected now and your developers understand the difference, it's probably not worth the effort it's going to take to remove them all.
I'll just add the worst use of the ref keyword I've ever seen, the method looked something like this:
public bool DoAction(ref Exception exception) {...}
Yup, you had to declare and pass an Exception reference in order to call the method, then check the method's return value to see whether an exception had been caught and returned via the ref parameter.
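To make the awkwardness concrete, calling that signature forced something like this on every caller (reconstructed for illustration, not the original code):

// The caller must pre-declare an Exception just to receive a possible error,
// then inspect both the bool result and the ref parameter.
Exception error = null;
bool ok = DoAction(ref error);
if (!ok && error != null)
{
    // handle the exception that came back through the ref parameter
}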
Can you work out why the originator of the code thought they needed the parameter to be ref? Was it because they did update it and later removed that functionality, or was it simply because they didn't understand C# at the time?
If you think the clean-up is worth it, then go ahead with it - particularly if you have the time now. You might not be in a position to fix it properly when a real issue does arise, as it will most likely be an urgent bug fix and you won't have the time to do a proper job.
It is quite common in C# to modify the values of arguments inside methods, since they are normally passed by value rather than by ref. This applies to both reference and value types; with a ref parameter, setting the reference to null, for instance, changes the caller's original reference. This can lead to very strange and painful bugs when other developers work "as usual". Creating recursive methods with ref arguments is also a no-go.
Apart from this, there are restrictions on what you can pass by ref. For instance, you cannot pass constant values, readonly fields, properties, etc., so a lot of helper variables are required when calling methods with ref arguments.
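For example (hypothetical names), a property cannot be passed by ref, so a temporary is needed and the result has to be copied back:

// customer.Address is a property, so it cannot go straight into a ref slot.
Address tmp = customer.Address;   // copy the value out
Normalize(ref tmp);               // hypothetical method taking ref Address
customer.Address = tmp;           // write the (possibly modified) value back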
Last but not least, performance is likely to suffer, since ref requires more indirection (a ref is just a reference which needs to be resolved on every access) and may also keep objects alive longer, since the references do not go out of scope as quickly.
To me, smells like a C++ developer making unwarranted assumptions.
I'd be wary of making wholesale changes to something that works. (I'm assuming it works because you don't comment about it being broken, just about it being dangerous).
The last thing you want to do is to break something subtle and have to spend a week tracking down the problem.
I suggest you clean up as you go - one method at a time.
Find a method that uses ref where you're sure it isn't required.
Change the method signature and fix up the calls.
Test.
Repeat.
While the specific problems you have may be more severe than most cases, your situation is pretty common - having a large code base that doesn't comply with our current understanding of the "right way" to do things.
Wholesale "upgrades" often run into difficulties. Refactoring as you go - bringing things up to spec as you work on them - is much safer.
There's precedent here in the building industry. For example, electrical wiring in older (say, 19th century) buildings doesn't need to be touched unless there's a problem. When there is a problem, though, the new work has to be completed to modern standards.
I would try to fix it. Just perform a solution-wide string replacement with a regex and check the unit tests after that. I am aware that this might break the code, but how often do you use ref? Almost never, right? And given that the developer did not know how it works, I consider the chance that it is used somewhere the way it should be even smaller. And if the code breaks - well, roll back...
How else can unnecessary use of ref be destructive?
The other answers deal with semantic issues which is definitely the most important thing. Good code is self-documenting, and when I give a ref parameter, I assume it will change. If it doesn't, the API failed to be self-documenting.
But for fun, how about we look at another aspect -- performance?
void ChangeAddress(ref Customer person, Address address)
{
    person.Address = address;
}
Here person is a reference to a reference, so there will be some indirection introduced whenever you access it. Let's look at some assembly this might translate into:
mov eax, [person] ; load the reference to person.
mov [eax+Address], address ; assign address to person.Address.
Now the non-ref version:
void ChangeAddress(Customer person, Address address)
{
    person.Address = address;
}
There's no indirection here, so we can get rid of one read:
mov [person+Address], address ; assign address to person.Address.
In practice, one hopes that .NET caches [person] in the ref version, amortizing the indirection cost over multiple accesses. It probably won't actually be a 50% drop in instruction count outside of trivial methods like the ones here.