I have a List with a large number of elements in it. I need to create a copy of this list to perform operations on without altering the original list. However, the operations typically access only a small proportion of the elements, so it is inefficient to copy the entire thing when most of it will go unused. Is there a simple way to create an object which is a clone of a list, but only clones elements when they are accessed? I have looked into the Lazy<T> class, which seems to be what I want, but I don't know how to apply it in this situation.
I want to be able to do something like this:
LazyListCopy<SomeType> lazyCopy = LazyListCopy.Copy(myList); // No elements have been copied at this point
DoSomethingWith(lazyCopy[34]); // 35th element has now been copied
And this:
foreach (SomeType listElement in LazyListCopy.Copy(myOtherList))
{
if (!Check(listElement)) // Object corresponding to listElement has been cloned
break;
}
I don't mind if the solution isn't generic enough to handle Lists of any type; I would be fine with it being specific to one or two classes I've created.
Preferably this would be a deep copy rather than a shallow copy, but a shallow copy would still be useful, and I would appreciate examples of that if it is shorter/simpler too.
Sounds like you want to end up with your original list plus a sparse collection of overrides.
Why not create a dictionary for the overrides, keyed on the index into the original list? You can then manually add values from your original list as they are needed.
You could wrap this functionality up into a class that wraps IList<T> if it's something you're going to use often.
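For example, a rough sketch along those lines. LazyListCopy here is an invented name, and the caller supplies a clone delegate (pass a deep-clone method for a deep copy, or x => x for a cheap shallow variant):

using System;
using System.Collections;
using System.Collections.Generic;

// Wraps the original list and clones elements only on first access,
// storing the clones as sparse overrides keyed by index.
public class LazyListCopy<T> : IEnumerable<T>
{
    private readonly IList<T> _original;
    private readonly Dictionary<int, T> _overrides = new Dictionary<int, T>();
    private readonly Func<T, T> _clone; // caller-supplied cloning strategy

    public LazyListCopy(IList<T> original, Func<T, T> clone)
    {
        _original = original;
        _clone = clone;
    }

    public int Count => _original.Count;

    public T this[int index]
    {
        get
        {
            T copy;
            if (!_overrides.TryGetValue(index, out copy))
            {
                copy = _clone(_original[index]); // clone on first access
                _overrides[index] = copy;
            }
            return copy;
        }
        set { _overrides[index] = value; }
    }

    public IEnumerator<T> GetEnumerator()
    {
        // Clones lazily as enumeration proceeds, so breaking out of a
        // foreach early leaves the remaining elements uncloned.
        for (int i = 0; i < Count; i++)
            yield return this[i];
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

Usage would look much like the examples in the question, with DeepClone standing in for whatever copy method SomeType provides:

var lazyCopy = new LazyListCopy<SomeType>(myList, x => x.DeepClone()); // nothing copied yet
DoSomethingWith(lazyCopy[34]); // 35th element cloned on first access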
I want to make a backup List for Undo/Redo,
but the objects in the backup List change after I modify the objects in the original List.
How can I deal with this problem? "const" and "in" don't seem to work.
private List<GeminiFileStruct> BackUpForUndoRedo(List<GeminiFileStruct> gfl,
    ToolStripMenuItem tm)
{
    // Note: this is only a shallow copy. The new list is a separate
    // list, but it holds references to the same objects as gfl.
    var li = (from i in gfl select i).ToList();
    tm.Enabled = true;
    return li;
}
Sorry, it used to be a struct. That caused some problems, so I changed it to a class.
Can a struct have get/set properties???
I'm new to C#.
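(As to the side question: yes, a struct can have get/set properties. A tiny illustration, with invented members since the real GeminiFileStruct isn't shown; note that because a struct is a value type, copying one into another list copies its values rather than a reference.)

// Structs support auto-properties just like classes.
// These members are made up for illustration.
public struct GeminiFileStruct
{
    public string Name { get; set; }
    public long Size { get; set; }
}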
While Bernoulli IT describes the actual problem with using a shallow copy, I wanted to provide some more background on undo/redo. There are two main approaches to undo/redo:
Memento pattern. Before a change is made to an object, a memento is created that can be used to restore the state of that object. This can be applied to the whole application: before any change, the application state is serialized, just as it would be if the user saved to a file, and that serialized state can later be restored, just like loading a file. This assumes there is a save-to-file function and that it captures the whole application state. Note that serialization/deserialization implicitly creates a deep copy.
Command pattern. Each change is performed by a command object that knows how to reverse the change (see the sketch below). A downside is that it can be complicated to make sure all actions generate these objects correctly.
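A minimal sketch of the command approach, assuming nothing about the original application (ICommand, UndoManager and the two stacks are illustrative names, not an established API):

using System.Collections.Generic;

// Every user action is wrapped in a command that can reverse itself.
public interface ICommand
{
    void Execute();
    void Undo();
}

public class UndoManager
{
    private readonly Stack<ICommand> _undo = new Stack<ICommand>();
    private readonly Stack<ICommand> _redo = new Stack<ICommand>();

    public void Do(ICommand command)
    {
        command.Execute();
        _undo.Push(command);
        _redo.Clear(); // a new action invalidates the redo history
    }

    public void Undo()
    {
        if (_undo.Count == 0) return;
        var command = _undo.Pop();
        command.Undo();
        _redo.Push(command);
    }

    public void Redo()
    {
        if (_redo.Count == 0) return;
        var command = _redo.Pop();
        command.Execute();
        _undo.Push(command);
    }
}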
What you need is a so-called deep copy of the list:
Items in the backup list will be clones of the items in the original list: fresh new instances with identical properties.
Not a shallow copy:
A backup list with "just" references to the items in the original list. Changes made to item A via the backup list will show up in item A in the original list, because both lists reference the same object 😉.
Have a look at this SO post or any of these web pages: tutorial 1, tutorial 2.
Deep copying is not a trivial programming technique as you will discover. But under the right assumptions in the right context it can be done safely.
Note
As @Llama points out, a deep copy of a list of structs is automatically obtained when doing new List<TStruct>(originalListWithStructs). A struct is a value type and behaves differently from a reference type.
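Since GeminiFileStruct is now a class, one common way to get a deep copy is a serialization round trip. A sketch, assuming the class has public, serializable properties (System.Text.Json here, but any serializer works):

using System.Collections.Generic;
using System.Text.Json;

// Serializing and deserializing produces fresh instances all the way
// down, i.e. an implicit deep copy of the list and its items.
static List<GeminiFileStruct> DeepCopy(List<GeminiFileStruct> source)
{
    string json = JsonSerializer.Serialize(source);
    return JsonSerializer.Deserialize<List<GeminiFileStruct>>(json);
}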
My application's "document" is really just a BindingSource that has a List<>. Since you cannot directly serialize a BindingSource, when it comes time to save, I serialize the List<>.
When it comes time to load, I'd like to deserialize straight into the BindingSource, but I do not think you can (the .List member is read-only). So I have to deserialize into a temporary list and then Add each of the items to the BindingSource's List.
This means that by the time I'm done there are two copies in memory, one of which will be cleaned up, but if the files get large that's going to be a problem.
Is there a more direct approach I should be taking, or some way to deserialize directly into the List that the BindingSource holds?
Rather than serializing the list I tried individually serializing the elements in the list, but the resulting disk file was much larger.
If the size is big enough that holding it twice is an issue, then probably holding it once is an issue too. However, if you are interested, protobuf-net has a DeserializeItems method that returns a non-buffered (fully streaming) IEnumerable<T>, i.e. no list. So you could:
foreach (var item in Serializer.DeserializeItems<SomeType>(stream, PrefixStyle.Base128, 1))
    bindingSource.Add(item);
I can also pretty-much guarantee that it'll be smaller on disk and faster to process too.
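For reference, the write side has to use matching length prefixes for DeserializeItems to stream the items back. A sketch, with SomeType standing in for the actual element type (it would need a [ProtoContract] mapping):

using System.Collections.Generic;
using System.IO;
using ProtoBuf;

// Writes each item with a length prefix so it can later be streamed
// back one at a time by Serializer.DeserializeItems<SomeType>.
static void SaveItems(string path, IEnumerable<SomeType> items)
{
    using (var stream = File.Create(path))
    {
        foreach (var item in items)
            Serializer.SerializeWithLengthPrefix(stream, item, PrefixStyle.Base128, 1);
    }
}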
I wonder, though, if "virtual mode" might be a better option for a large list.
You could use a Queue (Enqueue/Dequeue) to transport the items from one list to another as the data arrives.
A group of related data, like a list of parts, can be handled either using arrays (an array of Part) or using a collection. I understand that when arrays are used, insertion, deletion and some other operations have a performance impact compared with collections. Does this mean that arrays are not used internally by the collections? If so, what data structure is used for collections like List, Collection, etc.?
How are the collections handled internally?
List<T> uses an internal array. Removing/inserting items near the beginning of the list will be more expensive than doing the same near the end of the list, since the entire contents of the internal array need to be shifted in one direction. Also, once you try to add an item when the internal array is full, a new, bigger array will be constructed, the contents copied, and the old array discarded.
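You can watch that growth from the outside through the Capacity property. A small demo; the exact growth factor is an implementation detail, but current .NET typically doubles from an initial capacity of 4:

using System;
using System.Collections.Generic;

var list = new List<int>();
int lastCapacity = list.Capacity;
for (int i = 0; i < 100; i++)
{
    list.Add(i);
    if (list.Capacity != lastCapacity)
    {
        // Each jump means a new, larger array was allocated and the
        // old contents were copied across.
        lastCapacity = list.Capacity;
        Console.WriteLine($"Count = {list.Count}, Capacity = {lastCapacity}");
    }
}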
The Collection<T> class, when used with the parameterless constructor, uses a List<T> internally. So performance-wise they will be identical, with the exception of overhead caused by wrapping. (Essentially one more level of indirection, which is going to be negligible in most scenarios.)
LinkedList<T> is, as its name implies, a linked list. This will sacrifice iteration speed for insertion/removal speed. Since iterating means traversing pointers-to-pointers-to-pointers ad infinitum, this is going to take more work overall. Aside from the pointer traversal, two nodes may not be allocated anywhere near each other, reducing the effectiveness of CPU RAM caches.
However, the amount of time required to insert or remove a node is constant, since it requires the same number of operations no matter the state of the list. (This does not take into account any work that must be done to actually locate the item to remove, or to traverse the list to find the insertion point!)
If your primary concern with your collection is testing if something is in the collection, you might consider a HashSet<T> instead. Addition of items to the set will be relatively fast, somewhere between insertion into a list and a linked list. Removal of items will again be relatively fast. But the real gain is in lookup time -- testing if a HashSet<T> contains an item does not require iterating the entire list. On average it will perform faster than any list or linked list structure.
However, a HashSet<T> cannot contain equivalent items. If part of your requirements is that two items that are considered equal (by an Object.Equals(Object) overload, or by implementing IEquatable<T>) coexist independently in the collection, then you simply cannot use a HashSet<T>. Also, HashSet<T> does not guarantee insertion order, so you also can't use a HashSet<T> if maintaining some sort of ordering is important.
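As a rough illustration of the lookup difference (absolute timings vary by machine; this is only meant to show the shape of the cost):

using System;
using System.Collections.Generic;
using System.Diagnostics;

// List<T>.Contains scans the array linearly, while HashSet<T>.Contains
// hashes straight to a bucket.
var list = new List<int>();
var set = new HashSet<int>();
for (int i = 0; i < 1_000_000; i++) { list.Add(i); set.Add(i); }

var sw = Stopwatch.StartNew();
bool inList = list.Contains(999_999);   // O(n): walks the whole array
Console.WriteLine($"List<T>.Contains: {inList} in {sw.Elapsed}");

sw.Restart();
bool inSet = set.Contains(999_999);     // O(1) on average: one hash probe
Console.WriteLine($"HashSet<T>.Contains: {inSet} in {sw.Elapsed}");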
There are two basic ways to implement a simple collection:
contiguous array
linked list
Contiguous arrays have performance disadvantages for the operations you mentioned because the memory space of the collection is either preallocated or allocated based on the contents of the collection. Thus deletion or insertion requires moving many array elements to keep the entire collection contiguous and in the proper order.
Linked lists remove these issues because the items in the collection do not need to be stored in memory contiguously. Instead each element contains a reference to one or more of the other elements. Thus, when an insertion is made, the item in question is created anywhere in memory and only the references on one or two of the elements already in the collection need to be modified.
For example:
LinkedList<object> c = new LinkedList<object>(); // a linked list
object[] a = new object[] { }; // a contiguous array
This is simplified of course. The internals of LinkedList<> are doubtless more complex than a simple singly or doubly linked list, but that is the basic structure.
I think that some collection classes might use arrays internally as well as linked lists or something similar. The benefit of using collections from the System.Collections namespace instead of arrays, is that you do not need to spend any extra time writing code to perform update operations.
Arrays will always be more lightweight, and if you know some very good search algorithms, then you might even be able to use them more efficiently, but most of the time you can avoid reinventing the wheel by using classes from System.Collections. These classes are meant to help the programmer avoid writing code that has already been written and tuned hundreds of times, so it is unlikely that you'll get a significant performance boost by manipulating arrays yourself.
When you need a static collection that doesn't require much adding, removing or editing, then perhaps it is a good time to use an array, since they don't require the extra memory that collections do.
I was recently profiling an application trying to work out why certain operations were extremely slow. One of the classes in my application is a collection based on LinkedList. Here's a basic outline, showing just a couple of methods and some fluff removed:
public class LinkInfoCollection : PropertyNotificationObject, IEnumerable<LinkInfo>
{
    private LinkedList<LinkInfo> _items;

    public LinkInfoCollection()
    {
        _items = new LinkedList<LinkInfo>();
    }

    public void Add(LinkInfo item)
    {
        _items.AddLast(item);
    }

    public LinkInfo this[Guid id]
    {
        get { return _items.SingleOrDefault(i => i.Id == id); }
    }
}
The collection is used to store hyperlinks (represented by the LinkInfo class) in a single list. However, each hyperlink also has a list of hyperlinks which point to it, and a list of hyperlinks which it points to. Basically, it's a navigation map of a website. As this means you can have infinite recursion when links point back to each other, I implemented this as a linked list; as I understand it, this means that for every hyperlink, no matter how many times it is referenced by another hyperlink, there is only ever one copy of the object.
The ID property in the above example is a GUID.
With that long-winded description out of the way, my problem is simple: according to the profiler, when constructing this map for a fairly small website, the indexer referred to above is called no less than 27,906 times, which is an extraordinary amount. I still need to work out whether it really needs to be called that many times, but at the same time, I would like to know if there's a more efficient way of writing the indexer, as this is the primary bottleneck identified by the profiler (also assuming it isn't lying!). I still need the linked-list behaviour, as I certainly don't want more than one copy of these hyperlinks floating around killing my memory, but I also need to be able to access them by a unique key.
Does anyone have any advice on improving the performance of this indexer? I also have another indexer which uses a URI rather than a GUID, but that one is less problematic, as the building of incoming/outgoing links is done by GUID.
Thanks,
Richard Moss
You should use a Dictionary<Guid, LinkInfo>.
You don't need to use LinkedList in order to have only one copy of each LinkInfo in memory. Remember that LinkInfo is a managed reference type, and so you can place it in any collection, and it'll just be a reference to the object that gets placed in the list, not a copy of the object itself.
That said, I'd implement the LinkInfo class as containing two lists of Guids: one for the things this links to, one for the things linking to this. I'd have just one Dictionary<Guid, LinkInfo> to store all the links. Dictionary is a very fast lookup, I think that'll help with your performance.
The fact that this[] is getting called 27,000 times doesn't seem like a big deal to me, but what's making it show up in your profiler is probably the SingleOrDefault call on the LinkedList. Linked lists are best for situations where you need fast insertions & removals, particularly in the middle of the list. For quick lookups, which is probably more important here, let the Dictionary do its work with hash tables.
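To make that concrete, here is a sketch of the collection rebuilt around a dictionary (the PropertyNotificationObject base class is omitted, and LinkInfo.Id is assumed to be the Guid key from the post):

using System;
using System.Collections;
using System.Collections.Generic;

public class LinkInfoCollection : IEnumerable<LinkInfo>
{
    private readonly Dictionary<Guid, LinkInfo> _items =
        new Dictionary<Guid, LinkInfo>();

    public void Add(LinkInfo item)
    {
        _items.Add(item.Id, item);
    }

    // O(1) on average, versus the O(n) SingleOrDefault scan.
    public LinkInfo this[Guid id]
    {
        get
        {
            LinkInfo item;
            _items.TryGetValue(id, out item);
            return item; // null when not found, like SingleOrDefault
        }
    }

    public IEnumerator<LinkInfo> GetEnumerator() => _items.Values.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}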
Let's say I have a class Collection which holds a list of Items.
public class Collection
{
    private List<Item> MyList;
    //...
}
I have several instances of this Collection class which all have different MyLists but share some Items.
For example: There are 10 Items, Collection1 references Items 1-4, Collection2 has Items 2-8 and Collection3 4,7,8 and 10 on its List.
I implemented this as follows: I have one global List which holds all available Items. Before I create a new Collection, I check whether the Items I need are already in this list; if not, I create the Item and add it to the global List (and to the Collection, of course).
The problem I see is that those Items will never be released - even if all Collections are gone, the memory they consume is still not freed because the global list still references them.
Is this something I need to worry about? If so, what should I do? I thought of adding a counter to the global list to see when an Item is not needed anymore and remove its reference.
Edit:
It is in fact a design problem, I think. I will discard the idea of a global list and instead loop through all Collections and see if they have the needed Item already.
If the global list needs references to the items then you can't realistically free them. Do you actually need references to the items in the global list? When should you logically be able to remove items from the global list?
You could consider using weak references in the global list, and periodically pruning the WeakReference values themselves if their referents have been collected.
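A rough sketch of that idea, using WeakReference<T> (Item is the class from the question; Register, Find and Prune are made-up names):

using System;
using System.Collections.Generic;

// The registry holds weak references, so items stay collectable once
// every Collection that owns them is gone.
public class ItemRegistry
{
    private readonly List<WeakReference<Item>> _refs =
        new List<WeakReference<Item>>();

    public void Register(Item item) => _refs.Add(new WeakReference<Item>(item));

    public Item Find(Predicate<Item> match)
    {
        foreach (var weakRef in _refs)
        {
            Item item;
            if (weakRef.TryGetTarget(out item) && match(item))
                return item;
        }
        return null; // not found, or already collected
    }

    // Call periodically (e.g. after deleting a Collection) to drop
    // entries whose targets have been garbage collected.
    public void Prune()
    {
        _refs.RemoveAll(weakRef => { Item item; return !weakRef.TryGetTarget(out item); });
    }
}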
It looks like a bit of a design problem; do you really need the global list?
Apart from the weak references that Jon mentions, you could also periodically rebuild the global list (for example after deleting a collection), or only build it dynamically when you need it and release it again.
You'll have to decide which method is most appropriate, we don't have enough context here.