I'm developing a 2D overhead shooter game using C# and XNA. I have a class that I'll call "bullet" and need to update many of these instances every fraction of a second.
My first way to do this was to have a generic List of bullets and simply remove and add new bullets as needed. But in doing so the GC kicks in often and my game had some periodic jerky lag. (A lot of code is cut out here; I just wanted to show a simple snippet.)
if (triggerButton)
{
    bullets.Add(new bullet());
}
if (bulletDestroyed)
{
    bullets.Remove(bullet);
}
My second and current attempt is to have a separate generic Stack of bullets that I push to when I'm done with a bullet, and pop off a bullet when I need a new one if there's anything in the stack. If there's nothing in the stack then I add a new bullet to the list. It seems to cut the jerky lag but then again, sometimes there's still some jerky lag springing up (though I don't know if it's related).
if (triggerButton)
{
    if (bulletStack.Count > 0)
    {
        bullet temp = bulletStack.Pop();
        temp.resetPosition();
        bullets.Add(temp);
    }
    else
    {
        bullets.Add(new bullet());
    }
}
if (bulletDestroyed)
{
    bulletStack.Push(bullet);
    bullets.Remove(bullet);
}
So, I know premature optimization is the root of all evil, but this was very noticeable inefficiency that I could catch early (and this was before even having to worry about enemy bullets filling the screen). So my questions are: Will pushing unused objects to a stack invoke the garbage collection? Will the references be kept alive, or are objects still being destroyed? Is there a better way to handle updating many different objects? For instance, am I getting too fancy? Would it be fine to just iterate through the list and find an unused bullet that way?
There are a lot of issues here, and it's tricky to tell.
First off, is bullet a struct or a class? If bullet is a class, any time you construct one and then unroot it (let it go out of scope or set it to null), you're going to be adding something the GC needs to collect.
If you're going to be making many of these, and updating them every frame, you may want to consider using a List<bullet> with bullet being a struct, and the List being pre-allocated (create it with a capacity large enough to hold all of your bullets, so its internal array isn't recreated as you call List.Add). This will help dramatically with the GC pressure.
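For illustration, a minimal sketch of that idea, assuming XNA's Vector2; the names Bullet, BulletSystem and MaxBullets are placeholders rather than anything from the original code:
using System.Collections.Generic;
using Microsoft.Xna.Framework;

// A small struct keeps bullet data inline in the List's backing array,
// so updating bullets never touches the GC.
public struct Bullet
{
    public Vector2 Position;
    public Vector2 Velocity;
    public bool Active;               // reuse slots instead of adding/removing
}

public class BulletSystem
{
    private const int MaxBullets = 256;                           // assumed peak count
    private readonly List<Bullet> bullets = new List<Bullet>(MaxBullets);

    public BulletSystem()
    {
        // Fill the list up front so nothing is allocated during gameplay.
        for (int i = 0; i < MaxBullets; i++)
            bullets.Add(new Bullet());
    }

    public void Update(float dt)
    {
        for (int i = 0; i < bullets.Count; i++)
        {
            if (!bullets[i].Active) continue;
            Bullet b = bullets[i];        // structs are copied out of the list...
            b.Position += b.Velocity * dt;
            bullets[i] = b;               // ...and must be written back
        }
    }
}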
Also, just because I need to rant:
So, I know premature optimization is the root of all evil, but this was very noticeable inefficiency
Never, ever, be afraid to optimize a routine that you know is causing problems. If you're seeing a performance issue (ie: your lags), this is no longer premature optimization. Yes, you don't want to be optimizing every line of code, but you do need to optimize code, especially when you see a real performance issue. Optimizing it as soon as you see it's a problem is much easier than trying to optimize it later, as any design changes required will be much more easily implemented before you've added a lot of other code that uses your bullet class.
You may find the flyweight design pattern useful. There need be only one bullet object, but multiple flyweights may specify different positions and velocities for it. The flyweights can be stored in a preallocated array (say, 100) and flagged as active or not.
That should eliminate garbage collection completely, and may reduce the space needed to track each bullet's mutable properties.
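A rough sketch of that arrangement, assuming XNA types; the field names and the fixed pool size of 100 are my own placeholders:
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

// Intrinsic state (texture, damage) lives once in the flyweight; extrinsic state
// (position, velocity) lives in a pre-allocated array of structs flagged active/inactive.
public struct BulletState
{
    public Vector2 Position;
    public Vector2 Velocity;
    public bool Active;
}

public class BulletFlyweight
{
    public Texture2D Texture;   // shared by every bullet
    public int Damage;

    private readonly BulletState[] states = new BulletState[100];

    public void Spawn(Vector2 position, Vector2 velocity)
    {
        for (int i = 0; i < states.Length; i++)
        {
            if (states[i].Active) continue;
            states[i].Active = true;      // arrays of structs allow in-place writes
            states[i].Position = position;
            states[i].Velocity = velocity;
            return;
        }
        // Pool is full: the bullet is simply dropped; no allocation ever happens here.
    }

    public void Update(float dt)
    {
        for (int i = 0; i < states.Length; i++)
            if (states[i].Active)
                states[i].Position += states[i].Velocity * dt;
    }
}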
I will admit that I don't have any experience in this per se, but I would consider using a traditional array. Initialize the array to a size that is more than you need and would be the theoretical maximum number of bullets, say 100. Then, starting at 0, assign the bullets at the beginning of the array, leaving the last element as a null. So if you had four active bullets your array would look like:
0 B
1 B
2 B
3 B
4 null
...
99 null
The benefit is that the array is always allocated, so you are not dealing with the overhead of a more complex data structure. This is actually fairly similar to how C-style strings work, since they are a char[] with a null terminator.
Might be worth a shot. One downside: you'll have to do some manual manipulation when removing a bullet, probably moving everything after that bullet up a slot. But you are just moving references at that point, so I don't think it would have a high penalty like allocating memory or a GC pass.
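A rough sketch of that removal step, assuming a plain bullet[] array plus a separate count of live entries (both names are made up for the example):
// Slide later entries down one slot and restore the trailing null.
// Only references move; nothing is allocated, so the GC stays quiet.
void RemoveBullet(bullet[] bullets, ref int liveCount, int index)
{
    for (int i = index; i < liveCount - 1; i++)
        bullets[i] = bullets[i + 1];

    bullets[liveCount - 1] = null;
    liveCount--;
}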
You are correct in assuming that keeping the unused Bullets in a Stack prevents them from being garbage collected.
As for the cause of the Lag, have you tried any profiling tools? Just to find where the problem is.
Your stack based solution is pretty close to a class I wrote to generically do this sort of resource pooling:
http://codecube.net/2010/01/xna-resource-pool/
You mentioned that this makes the problem mostly go away, but it still crops up here and there. What's happening is that with this stack/queue based method of pooling, the system will reach a point of stability once you are no longer requesting more new objects than the pool can supply. But if the requests go higher than your previous max # of requested items, it will cause you to have to create a new instance to service the request (thus invoking GC from time to time).
One way you can side-step this is to go through and pre-allocate as many instances as you think you might need at the peak. That way, you won't have any new allocations (at least from the pooled objects), and the GC won't be triggered :-)
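A minimal sketch of that pre-allocation, reusing the bulletStack from the question; the peak count is just a number you would tune for your game:
// Call once while loading (e.g. from LoadContent) so "new bullet()" never runs mid-game.
void PrewarmBulletPool(int peakBullets)
{
    for (int i = 0; i < peakBullets; i++)
        bulletStack.Push(new bullet());
}

// e.g. PrewarmBulletPool(500);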
List actually has a built-in capacity to prevent an allocation on every add/remove. Once you exceed the capacity, it allocates more (I think it doubles every time). The problem may be more on remove than add. Add will just drop the item into the first open spot, which is tracked by the size. To remove, the list has to be condensed to fill in the now-empty slot. If you are always removing from the front of the list, then every element needs to slide down.
A Stack still uses an array as its internal storage mechanism. So you are still bound by the add/remove properties of an array.
To make the array work, you need to create all the bullets up front with an Active property on each. When you need a new one, flip the Active flag to true and set all of the new bullet's properties. Once the bullet is done, flip the Active flag back to false.
If you want to eliminate the need to iterate the whole list (which could be very large depending on what you allow) for each repaint, you could implement a doubly linked list within the array. When a new bullet is needed, ask the array for the first available free entry. Go to the last active bullet (tracked in a variable) and store the new bullet's array position in its next-active-bullet property. When it is time to remove a bullet, go to the previous bullet and point its next-active property at the removed bullet's next-active.
// I am using public fields for demonstration. You will want to make them properties.
public class Bullet {
    public bool Active;
    public int thisPosition;        // this bullet's index within the list
    public int PrevBullet = -1;     // index of the previous active bullet, -1 if none
    public int NextBullet = -1;     // index of the next active bullet, -1 if none
    public List<Bullet> list;       // the pre-allocated list that owns this bullet

    public void Activate(Bullet lastBullet) {
        this.Active = true;
        this.PrevBullet = lastBullet.thisPosition;
        this.NextBullet = -1;
        list[this.PrevBullet].NextBullet = this.thisPosition;
    }

    public void Deactivate() {
        this.Active = false;
        // Unlink this bullet, guarding against the ends of the chain.
        if (PrevBullet != -1) list[PrevBullet].NextBullet = this.NextBullet;
        if (NextBullet != -1) list[NextBullet].PrevBullet = this.PrevBullet;
        this.PrevBullet = -1;
        this.NextBullet = -1;
    }
}
That way, you have a prebuilt array with all the needed bullets, but the paint only hits the bullets that are active, regardless of their position within the array. You just need to maintain a link to the first active bullet to start the paint, and to the last active bullet so you know where to link in the next one.
Now you are just worried about the memory to hold the entire list instead of when the GC is going to clean up.
In my game I can iterate using either a list of game objects or tags, but I'd like to know which is the most efficient way.
Does keeping a list use more memory, or does Unity require a lot of resources to do a search by tag?
public List<City> _Citys = new List<City>();
or
foreach(GameObject go in GameObject.FindGameObjectsWithTag("City"))
You're better off using a List of City objects and doing a standard for loop to iterate over the City objects. The List simply holds references to the City objects, so the impact on memory should be minimal - you could use a GameObject[] array instead of a List (which is what FindGameObjectsWithTag returns).
It's better for performance to use a populated List/Array rather than searching by Tags and of course you're explicitly pointing to an object rather than using 'magic' strings -- if you change the tag name later on then the FindGameObjectsWithTag method will silently break, as it will no longer find any objects.
Also, avoid using a foreach loop in Unity as this unfortunately creates a lot of garbage (the garbage collector in Unity isn't great, so it's best to create as little garbage as possible); instead, just use a standard for loop:
Replace the “foreach” loops with simple “for” loops. For some reason, every iteration of every “foreach” loop generated 24 Bytes of garbage memory. A simple loop iterating 10 times left 240 Bytes of memory ready to be collected which was just unacceptable
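For example, a minimal sketch of the cached-list version, iterating the _Citys list from the question with a plain for loop:
// No allocations and no tag lookup per frame; just index into the cached list.
for (int i = 0; i < _Citys.Count; i++)
{
    City city = _Citys[i];
    // ... work with city ...
}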
EDIT: As mentioned in pid's answer - measure. You can use the built-in Unity profiler to inspect memory usage: http://docs.unity3d.com/Manual/ProfilerMemory.html
Per Microsoft's .NET API design guidelines, verbs such as Find* or Count* denote active code, while terms such as Length stand for stored values that require no significant computation.
Now, whether the Unity3D folks respected those guidelines is a matter of debate, but from the name of the method I can already tell that it has a cost and should not be taken too lightly.
On the other side, your question is about performance, not correctness. Both ways are correct per se, but one is supposed to have better performance.
So, the main rule of refactoring for performance is: MEASURE.
It depends on memory allocation and garbage collection; it is impossible to tell which really is faster without measuring.
So the best advice I could give you is pretty general. Whenever you feel the need to enhance performance of code you have to actually measure what you are about to improve, before and after.
Your code examples are 2 distinctly different things. One is instantiating a list, and one is enumerating over an IEnumerable returned from a function call.
I assume you mean the difference between iterating over your declared list vs. iterating over the return value from GameObject.FindGameObjectsWithTag(), in which case:
Storing a List as a member variable in your class, populating it once and then iterating over it several times is more efficient than calling GameObject.FindGameObjectsWithTag several times.
This is because you keep your List and the references to the objects in your list at all times, without having to repopulate it.
GameObject.FindGameObjectsWithTag will search your entire object hierarchy and compile a list of all the objects it finds that match your search criteria. This is done every time you call the function, so there is additional overhead even if the number of objects it finds is the same, as it still searches your hierarchy.
To be honest, you could just cache the results of GameObject.FindGameObjectsWithTag in a List once, provided the number of objects returned will not change (that is, you are not instantiating or destroying any of those objects).
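A minimal sketch of that caching, done once in Awake; the CityManager name is hypothetical:
using System.Collections.Generic;
using UnityEngine;

public class CityManager : MonoBehaviour
{
    private List<GameObject> cities;

    void Awake()
    {
        // Search the hierarchy once and keep the references.
        cities = new List<GameObject>(GameObject.FindGameObjectsWithTag("City"));
    }

    void Update()
    {
        for (int i = 0; i < cities.Count; i++)
        {
            GameObject city = cities[i];
            // ... work with city ...
        }
    }
}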
I am trying to optimize my code and was running VS performance monitor on it.
It shows that a simple assignment of a float takes up a major chunk of computing power?? I don't understand how that is possible.
Here is the code for TagData:
public class TagData
{
    public int tf;
    public float tf_idf;
}
So all I am really doing is:
float tag_tfidf = td.tf_idf;
I am confused.
I'll post another theory: it might be the cache miss of the first access to members of td. A memory load takes 100-200 cycles which in this case seems to amount to about 1/3 of the total duration of the method.
Points to test this theory:
Is your data set big? I bet it is.
Are you accessing the TagData objects in random memory order? I bet they are not sequential in memory. This defeats the CPU's memory prefetcher.
Add a new line int dummy = td.tf; before the expensive line. This new line will now be the most expensive line because it will trigger the cache miss. Find some way to do a dummy load operation that the JIT does not optimize out. Maybe add all td.tf values to a local and pass that value to GC.KeepAlive at the end of the method. That should keep the memory load in the JIT-emitted x86.
I might be wrong but contrary to the other theories so far mine is testable.
Try making TagData a struct. That will make all items of term.tags sequential in memory and give you a nice performance boost.
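A minimal sketch of that change (same two fields as the original class):
// As a struct, TagData values are stored inline in their containing array,
// so iterating over term.tags walks contiguous memory instead of chasing references.
public struct TagData
{
    public int tf;
    public float tf_idf;
}
Keep in mind that a struct is copied on assignment, so any code that mutated a shared TagData instance through a reference would need to write the modified value back into the collection.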
Are you using LINQ? If so, LINQ uses lazy enumeration so the first time you access the value you pulled out, it's going to be painful.
If you are using LINQ, call ToList() after your query to only pay the price once.
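For example (the query shown is hypothetical; the point is only the ToList() call at the end):
// Materialize the query once so later loops iterate the concrete list,
// not the lazy query. (Requires System.Linq.)
var tags = term.tags.Where(t => t.tf > 0).ToList();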
It also looks like your data structure is suboptimal, but since I don't have access to your source (and probably couldn't help even if I did :) ), I can't tell you what would be better.
EDIT: As commenters have pointed out, LINQ may not be to blame; however my question is based on the fact that both foreach statements are using IEnumerable. The TagData assignment is a pointer to the item in the collection of the IEnumerable (which may or may not have been enumerated yet). The first access of legitimate data is the line that pulls the property from the object. The first time this happens, it may be executing the entire LINQ statement and since profiling uses the average, it may be off. The same can be said for tagScores (which I'm guessing is database backed) whose first access is really slow and then speeds up. I wasn't pointing out the solution just a possible problem given my understanding of IEnumerable.
See http://odetocode.com/blogs/scott/archive/2008/10/01/lazy-linq-and-enumerable-objects.aspx
As we can see, the line next to the suspicious one takes only 0.6, i.e.:
float tag_tfidf = td.tf_idf;//29.6
string tagName =...;//0.6
I suspect this is caused by the excessive number of calls. Also note that float is a value type, meaning it is copied by value. So every time you assign it, the runtime creates a new float (Single) value and initializes it by copying the value from td.tf_idf, which adds up over many calls.
You can see that string tagName = ...; doesn't take much, because strings are copied by reference.
Edit: As the comments pointed out, I may be wrong about that; this might also be a bug in the profiler. Try re-profiling and see if that makes any difference.
I'm making a game and need to store a fair bit of animation data. For each frame I have about 15 values to store. My initial idea was to have a list of "frame" objects that contain those values. Then I thought I could also just have separate lists for each of the values and skip the objects altogether. The data will be loaded once from an XML file when the game is started.
I'm just looking for advice here, is either approach at all better (speed, memory usage, ease of use, etc) than the other?
Sorry if this is a dumb question, still pretty new and couldn't find any info on stuff like this.
(PS: this is a 2D game with sprites, so 1 frame != 1 frame of screen time. I estimate somewhere around 500-1000 frames total)
If the animation data is not changing, you could use a struct instead of a class, which combines the "namespacing" of objects with the "value-typeness" of primitives. That will make all the values of a single frame reside in the same space, and save you some memory and page faults.
Just make sure that the size of your arrays doesn't get you into the large object heap (LOH) if you intend to allocate and deallocate them often.
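A rough sketch of what a frame struct might look like here; the field names and the Animation class are invented, since the original post only says there are about 15 values per frame:
// One frame of animation data. As a struct, a Frame[] is a single contiguous
// block of memory with no per-frame object headers or GC tracking.
public struct Frame
{
    public float X, Y;
    public float Rotation;
    public float ScaleX, ScaleY;
    public int SpriteIndex;
    // ... roughly 15 small values in total ...
}

public class Animation
{
    // 500-1000 frames of plain data stays far below the ~85,000-byte LOH threshold.
    public Frame[] Frames = new Frame[1000];
}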
You are creating an animation, so you should keep in mind a few basic things about .NET:
A struct is created on the stack (or inline in its container), so it is faster to instantiate (and destroy) a struct than a class.
On the other hand, every time you assign a structure or pass it to a function, it gets copied.
You can still pass a struct by reference simply by specifying ref in the function parameter list.
So it is better to keep these things in mind.
I've got a few global arrays I use in a simple WinForms game. The arrays are initialized when a new game starts. When a player is in the middle of the game (the arrays are filled with data) he clicks on the StartNewGame() button (restarts the game). What to do next?
Is it ok to reinitialize the whole array for the new game or should I just set every array item to null and use the already initialized array (which would be slower)?
I mean is it okay to do something like this?
MyClass[,] gameObjects;

public Form1()
{
    StartNewGame();
    // game flow... simplified here... normally divided into functions and events
    StartNewGame();
    // other game flow
}

public void StartNewGame()
{
    gameObjects = new MyClass[10,10];
    // some work with gameObjects
}
This almost entirely depends upon MyClass, specifically how many data members it contains, how much processing its constructor (and its members' constructors) requires, and whether it is a relatively simple operation to (re)set an object of this class to its "initialized" state. A more objective answer can be obtained through benchmarking.
From your question, I understand that there are not that many arrays - in that case I would say reinitialize the whole array.
In cases where setup involves a lot of work and can take, say, 30 seconds, you might do a cleanup instead of reinitializing everything.
If you choose to set the items to null, you can get some ugly exceptions, so I think you should reset the objects inside the array rather than set them to null.
If there are only 100 elements as in your example, then there shouldn't really be a noticeable performance hit.
If you reinitialize the array, you will perform n constructions for n objects. The garbage collector will come clean up the old array and de-allocate those old n objects at some later time. (So you have n allocations upfront, and n deallocations by the GC).
If you set each reference in the array to null, the garbage collector will still do the same amount of work and come clean up those n objects at some later time. The only difference is you're not deallocating the array here, but that single deallocation is negligible.
From my point of view, the best way to achieve performance in this case is to not reallocate the objects at all, but to use the same ones. Add a valid bit to mark whether or not an object is valid (in use), and to reinitialize you simply set all the valid bits to false. In a similar fashion, programs don't go through and write 0's to all your memory when it's not in use. They just leave it as garbage and overwrite data as necessary.
But again, if your number of objects isn't going into the thousands, I'd say you really won't notice the performance hit.
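A minimal sketch of that reuse approach against the question's 10x10 array; the Valid field is an assumption about MyClass, and the array and its objects are assumed to have been created once at startup:
public void StartNewGame()
{
    // Keep the same array and the same objects; just mark every slot unused.
    for (int x = 0; x < gameObjects.GetLength(0); x++)
    {
        for (int y = 0; y < gameObjects.GetLength(1); y++)
        {
            gameObjects[x, y].Valid = false;   // assumed "valid bit" on MyClass
        }
    }
}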
gameObjects = new MyClass[10,10];
... is the way to go. This is definitely faster than looping through the array and setting the items to null. It is also simpler to code and to understand. But both variants are very fast anyway, unless you have tens of millions of entries! '[10, 10]' is very small, so forget about performance and do what seems more appropriate and more understandable to you. Clean code is more important than performance in most cases.
EDIT: Problem wasn't related to the question. It was indeed something wrong with my code, and actually, it was so simple that I don't want to put it on the internet. Thanks anyway.
I read in roughly 550k Active directory records and store them in a List, the class being a simple wrapper for an AD user. I then split the list of ADRecords into four lists, each containing a quarter of the total. After I do this, I read in about 400k records from a database, known as EDR records, into a DataTable. I take the four quarters of my list and spawn four threads, passing each one of the four quarters. I have to match the AD records to the EDR records using email right now, but we plan to add more things to match on later.
I have a foreach on the list of AD records, and inside of that, I have to run a for loop on the EDR records to check each one, because if an AD record matches more than one EDR record, then that isn't a direct match, and should not be treated as a direct match.
My problem: by the time I get to the foreach on the list, my ADRecords list only has about 130 records in it, but right after I pull them all in, I Console.WriteLine the count and it's 544k.
I am starting to think that even though I haven't set the list to null to be collected later, C# or Windows or something is actually taking my list away to make room for the EDR records because I haven't used the list in a while. The database that I have to use to read EDR records is a linked server, so it takes about 10 minutes to read them all in, so my list is actually idle for 10 minutes, but it's never set to null.
Any ideas?
//splitting list and passing in values to threads.
List<ADRecord> adRecords = GetAllADRecords();
for (int i = 0; i < adRecords.Count/4; i++)
{
    firstQuarter.Add(adRecords[i]);
}
for (int i = adRecords.Count/4; i < adRecords.Count/2; i++)
{
    secondQuarter.Add(adRecords[i]);
}
for (int i = adRecords.Count/2; i < (adRecords.Count/4)*3; i++)
{
    thirdQuarter.Add(adRecords[i]);
}
for (int i = (adRecords.Count/4)*3; i < adRecords.Count; i++)
{
    fourthQuarter.Add(adRecords[i]);
}
DataTable edrRecordsTable = GetAllEDRRecords();
DataRow[] edrRecords = edrRecordsTable.Select("Email_Address is not null and Email_Address <> ''", "Email_Address");
Dictionary<string, int> letterPlaces = FindLetterPlaces(edrRecords);
Thread one = new Thread(delegate() { ProcessMatches(firstQuarter, edrRecords, letterPlaces); });
Thread two = new Thread(delegate() { ProcessMatches(secondQuarter, edrRecords, letterPlaces); });
Thread three = new Thread(delegate() { ProcessMatches(thirdQuarter, edrRecords, letterPlaces); });
Thread four = new Thread(delegate() { ProcessMatches(fourthQuarter, edrRecords, letterPlaces); });
one.Start();
two.Start();
three.Start();
four.Start();
In ProcessMatches, there is a foreach on the List of ADRecords passed in. The first line in the foreach is AdRecordsProcessed++; which is a global static int, and the program finishes with it at 130 instead of the 544k.
The variable is never set to null and is still in scope? If so, it shouldn't be collected and idle time isn't your problem.
First issue I see is:
AdRecordsProcessed++;
Are you locking that global variable before updating it? If not, and depending on how fast the records are processed, it's going to be lower than you expect.
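If it stays a shared static int, the usual fix is an atomic increment rather than a plain ++:
using System.Threading;

// ++ on a shared int is a read-modify-write and can lose updates across threads;
// Interlocked.Increment performs the increment atomically.
Interlocked.Increment(ref AdRecordsProcessed);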
Try running it from a single thread (i.e. pass in adRecords instead of firstQuarter and don't start the other threads.) Does it work as expected with 1 thread?
Firstly, you don't set a list to null. What you might do is set every reference to a list to null (or to another list), or all such references might simply fall out of scope. This may seem like a nitpick point, but if you are having to examine what is happening to your data it's time to be nitpicky on such things.
Secondly, getting the GC to deallocate something that has a live reference is pretty hard to do. You can fake it with a WeakReference<>, or think you've found it when you hit a bug in a finaliser (because the reference isn't actually live, and even then it's a matter of the finaliser trying to deal with a finalised rather than deallocated object). Bugs can happen everywhere, but finding a way to make the GC deallocate something that is live is highly unlikely.
The GC will be likely do two things with your list:
It is quite likely to compact the memory used by it, which will move its component items around.
It is quite likely to promote it to a higher generation.
Neither of these is going to produce any change you will detect unless you actually look for it (obviously you'll notice a change in generation if you keep calling GC.GetGeneration(), but aside from that you aren't really going to).
The memory used could also be paged out, but it will be paged back in when you go to use the objects. Again, no effect you will notice.
Finally, if the GC did deallocate something, you wouldn't have a reduced number of items, you'd have a crash, because if objects just got deallocated the system will still try to use the supposedly live references to them.
So, while the GC or the OS may do something to make room for your other object, it isn't something observable in code, and it does not stop the object from being available and in the same programmatic state.
Something else is the problem.
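To illustrate the point about live references, a small sketch (WeakReference<T> requires .NET 4.5 or later): a strongly referenced list survives a forced collection, while only a weakly referenced one can actually go away:
using System;
using System.Collections.Generic;

class Demo
{
    static void Main()
    {
        var strong = new List<int> { 1, 2, 3 };
        var weak = new WeakReference<List<int>>(new List<int> { 4, 5, 6 });

        GC.Collect();
        GC.WaitForPendingFinalizers();

        Console.WriteLine(strong.Count);   // always 3; a reachable list is never collected
        List<int> target;
        Console.WriteLine(weak.TryGetTarget(out target) ? "alive" : "collected");
    }
}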
Is there a reason you have to get all the data all at once? If you break the data up into chunks it should be more manageable. All I know is having to get into GC stuff is a little smelly. Best to look at refactoring your code.
The garbage collector will not collect anything that is still reachable from:
A global (static) variable
An object held by a static object
A local variable
Any variable referenceable by a method on the call stack
So if you can reference it from your code, there is no possibility that the garbage collector collected it. No way, no how.
In order for the collector to collect it, all references to it must have gone away. And if you can see it, that's most definitely not the case.