I have been working on a few tests with this awesome control by Phillip Piper, but I have some questions that I cannot get answered (in the cookbook, the source code examples and so on...).
What is the real difference between the FastOlv and VirtualOlv on loading and working with large lists?
Imagine for testing purposes only that:
I have a List that is shown on both FastOlv and VirtualOlv.
This List has 1,000,000 documents (loaded in memory), and the List is not editable (no adding, removing or changing documents).
I have loaded this list into both OLVs and the performance is the same; I mean, the load time (and, for example, scrolling from top to bottom of these OLVs) is the same.
What is the real benefit of using a VirtualOlv over a FastOlv?
Is it all in the implementation of the IVirtualListDataSource, which can be tailored to perform better for a particular situation? Can you share some examples?
Thank you for your insights on this.
Krs
VirtualObjectListView is an abstract base class that can be used to implement your own virtual list, with the contents coming from wherever your data is stored.
FastObjectListView is an implementation of VirtualObjectListView such that it acts like a normal ObjectListView only much faster for large lists.
Good question. You may want to take a look at the source code comments. The author isn't very specific, but reading through the comments and code helps to get an idea about the differences.
Actually, FastObjectListView is derived from VirtualObjectListView, but it uses a FastObjectListDataSource as its VirtualListDataSource instead of the default VirtualListVersion1DataSource used by VirtualObjectListView. Both are derived from AbstractVirtualListDataSource, but FastObjectListDataSource overrides/implements many more functions, including searching and sorting, for example.
Essentially, it looks like FastObjectListView extends VirtualObjectListView by implementing the functionality needed to behave like the ordinary ObjectListView. It also implements a GroupingStrategy, which you could also add manually on any VirtualListDataSource.
So it seems that, as long as you just display a large number of items in the list, there is no performance difference to be expected, since FastObjectListView actually is a VirtualObjectListView with added functionality that can optionally be used.
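To make the difference concrete, here is a minimal sketch of a custom data source for a VirtualObjectListView, subclassing the library's AbstractVirtualListDataSource (exact member signatures may vary between versions; DocumentStore, Document and their members are hypothetical stand-ins for wherever your data actually lives):

using BrightIdeasSoftware;

// A custom virtual data source: the list control never holds the million
// documents itself; it only asks "how many rows?" and "give me row n".
public class DocumentDataSource : AbstractVirtualListDataSource
{
    private readonly DocumentStore store; // hypothetical storage backend

    public DocumentDataSource(VirtualObjectListView listView, DocumentStore store)
        : base(listView)
    {
        this.store = store;
    }

    // Called lazily as rows scroll into view.
    public override object GetNthObject(int n)
    {
        return this.store.GetDocumentAt(n);
    }

    public override int GetObjectCount()
    {
        return this.store.Count;
    }

    public override int GetObjectIndex(object model)
    {
        return this.store.IndexOf((Document)model);
    }
}

The practical benefit appears when the data does not sit in memory: GetNthObject could page rows in from a database or a file on demand, whereas FastObjectListView's data source works over an in-memory object list. With everything already loaded, as in your test, the two behave the same.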
I have a private List<Experience> experiences field that tracks generic experiences and experience-specific information. I am using JSON Serialize and Deserialize to save and load my list. When the application starts, the List populates itself with the currently saved information automatically, and when a new experience is added to the list, it saves the new list to file.
A concern I would like to get ahead of is that nothing stops the user from, at any point, doing something like experiences = new List<Experience>(); and then adding new experiences to it. Saving this would result in the loss of all previous data, as right now the file is overwritten with each save. In an ideal world this wouldn't happen, but I would like to figure out how to structure my code to guard against it. Essentially, I want to disallow removing items from the List, or setting it to a new list, after it has been populated from a load.
I have toyed with the idea of just appending the newest addition to the file, but I also want to cover the case where you change properties of an existing item in the List, and given that the list will never be all that large of a file, I thought overwriting would be the simplest approach as the cost isn't a concern.
Any help in figuring out the best approach is greatly appreciated.
Edit* Looked into the repository pattern https://www.infoworld.com/article/3107186/application-development/how-to-implement-the-repository-design-pattern-in-c.html and this seems like a potential approach.
I'm making an assumption that your user in this case is a code-level consumer of your API and that they'll be using the results inside the same memory stack, which is making you concerned about reference mutation.
In this situation, I'd return a copy of the list rather than the list itself on read operations, and on writes allow only add and remove, as maccettura recommends in the comments. You could keep the references to the items in the list intact if you want the consumer to be able to mutate them, but I'd think carefully about whether that's appropriate for your use case, and consider instead requiring the consumer to call an update function (which could be the same as your add function, à la HTTP PUT).
Sometimes, when you want to highlight that your collection should not be modified, exposing it as an IEnumerable instead of a List may be enough; but if you are writing a serious API, something like the repository pattern seems to be a good solution.
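As a rough sketch of that idea (ExperienceRepository and Save are hypothetical names, not from your code), you could hide the list behind a small repository-style class that only exposes reads and adds:

using System.Collections.Generic;

public class ExperienceRepository
{
    // readonly: nobody, including this class, can swap in a new list.
    private readonly List<Experience> experiences = new List<Experience>();

    // Consumers can enumerate but cannot clear, remove or replace.
    public IReadOnlyList<Experience> Experiences
    {
        get { return experiences.AsReadOnly(); }
    }

    public void Add(Experience experience)
    {
        experiences.Add(experience);
        Save(); // persist after every write, as in the original design
    }

    private void Save()
    {
        // JSON-serialize 'experiences' to file, overwriting as before.
    }
}

Because the field is readonly and never exposed directly, the experiences = new List<Experience>() accident described in the question becomes impossible outside this class.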
Is there a way to check for the size of a class in C#?
My reason for asking is:
I have a routine that stores a class's data in a file, and a different routine that loads this object (class) from that same file. Each attribute is stored in a specific order, and if you change this class you have to be reminded that these export/import routines need changing.
An example in C++ (no matter how clumsy or bad programming this might be) would be the following:
#define PERSON_CLASS_SIZE 8

class Person
{
    char *firstName;
};

...

bool ExportPerson(Person p)
{
    if (sizeof(Person) != PERSON_CLASS_SIZE)
    {
        CatastrophicAlert("You have changed the Person class and not fixed this export routine!");
        return false;
    }
    // ... export each field in the expected order ...
    return true;
}
Thus before compile time you need to know the size of Person, and modify the export/import routines with this size accordingly.
Is there a way to do something similar to this in C#, or are there other ways of "making sure" a different developer changes the import/export routines if they change a class?
... Apart from the obvious "just comment this in the class, this guarantees that a developer never screws things up" answer.
Thanks in advance.
Each attribute is stored in a specific order, and if you change this class you have to be reminded that these export/import routines need changing.
It sounds like you're writing your own serialization mechanism. If that's the case, you should probably include some sort of "fingerprint" of the expected properties in the right order, and validate that at read time. You can then include the current fingerprint in a unit test, which will then fail if a property is added. The appropriate action can then be taken (e.g. migrating existing data) and the unit test updated.
Just checking the size of the class certainly wouldn't find all errors - if you added one property and deleted one of the same size in the same change, you could break data without noticing it.
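A hedged sketch of that fingerprint idea (the Person shape and the expected string are illustrative only): reflect over the type's instance fields, build a string, and pin it in a unit test.

using System;
using System.Linq;
using System.Reflection;

public static class TypeFingerprint
{
    // Builds a string like "firstName:System.String|age:System.Int32".
    // MetadataToken ordering usually follows declaration order, but
    // treat that as an assumption and verify it for your runtime.
    public static string Of(Type type)
    {
        var fields = type
            .GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
            .OrderBy(f => f.MetadataToken);
        return string.Join("|", fields.Select(f => f.Name + ":" + f.FieldType.FullName));
    }
}

// In a unit test: fails as soon as someone adds, removes or retypes a
// field, reminding them to update the export/import routines.
// Assert.AreEqual("firstName:System.String|age:System.Int32",
//                 TypeFingerprint.Of(typeof(Person)));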
Apart from the fact that this is probably not the best way to achieve what you need, I think the fastest way is to use Cecil. You can get the IL body of the entire class.
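If you do go the Cecil route, a rough sketch might look like this (the assembly path and type name are placeholders); it reads the compiled assembly and lists the fields of the class so you can compare or hash them against what the export routine expects:

using System;
using Mono.Cecil;

class FingerprintTool
{
    static void Main()
    {
        // Load the compiled assembly (path is a placeholder).
        AssemblyDefinition assembly = AssemblyDefinition.ReadAssembly("MyApp.exe");

        // Find the class whose layout the export routine depends on.
        TypeDefinition person = assembly.MainModule.GetType("MyApp.Person");

        // Enumerate its fields; hash or diff this list to detect changes.
        foreach (FieldDefinition field in person.Fields)
            Console.WriteLine(field.FieldType.FullName + " " + field.Name);
    }
}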
I am writing a log file decoder which should be capable of reading many different file structures. My question is how best to represent this data. I am using C#, but am new to OOP.
An example:
The log files have a range of sensor values. One sensor reading can be called A, another B. Obviously, there are many more than 2 entry types.
In different log files, they could be stored either as ABABABABAB or AAAAABBBBB.
I was thinking of describing this as blocks of entries. So in the first case, a block would be 'AB', with 5 blocks. In the second case, the first block is 'A', read 5 times. This is followed by a block of 'B', read 5 times.
This is quite a simplification (there are actually 40 different types of log file, each with up to 40 sensor values in a block). No log has more than 300 blocks.
At the moment, I store all of this in a datatable. I have a column for each entry, with a property for how many to read. If this is set to -1, it continues to the next column in the block. If not, it assumes that it has reached the end of the block.
This all seems quite clumsy. Can anyone suggest a better way of doing this?
I think you should first start here, and then here to learn a little bit about what object oriented programming is. Don't worry about your current problem while learning about OOP.
As you are learning about OO concepts, you should begin to understand that code is not data, and data is not code. It does not matter how you represent your data from an OOP standpoint. You can write OO code to consume your data, or you could write procedural code to consume your data; that part is irrelevant to the format of the data.
So then, getting back to your question:
My question is how best to represent this data
It depends on your needs. What is writing the log file? Do you have control over the writer and the reader? If I did, I would rely on the built-in serialization methods to minimize the amount of code I need to write. Is the log file going to be really long? If so, the "datatable" approach you described is usually better. If the log file isn't going to be huge in file size, XML is really easy to work with.
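For the built-in serialization route, a minimal sketch with XmlSerializer (the LogEntry shape here is hypothetical, just to show the mechanics):

using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class LogEntry
{
    public string SensorName { get; set; }
    public double Value { get; set; }
}

public static class LogFile
{
    static readonly XmlSerializer serializer = new XmlSerializer(typeof(List<LogEntry>));

    public static void Save(string path, List<LogEntry> entries)
    {
        using (var stream = File.Create(path))
            serializer.Serialize(stream, entries);
    }

    public static List<LogEntry> Load(string path)
    {
        using (var stream = File.OpenRead(path))
            return (List<LogEntry>)serializer.Deserialize(stream);
    }
}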
Very basic and straightforward:
Define an interface IEntry with properties like string EntryBlock and int Count
Define a class which represents an entry and implements IEntry
Code which does binary serialization should be aware of interfaces; for instance, it should refer to IEnumerable<IEntry>
The Entry class could override ToString() to return something like [ABAB-2], if that would be helpful during serialization
The IEntry interface could provide a method void CreateFromRawString(string rawDataFromLog) if that would be helpful; decide for yourself
If you want more info, please share the code you are using for serialization/deserialization
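Putting those points together, a minimal sketch of the suggested shape (the parsing logic is a guess at the format; adjust it to your real one):

using System.Collections.Generic;

public interface IEntry
{
    string EntryBlock { get; }
    int Count { get; }
    void CreateFromRawString(string rawDataFromLog);
}

public class Entry : IEntry
{
    public string EntryBlock { get; private set; }
    public int Count { get; private set; }

    public void CreateFromRawString(string rawDataFromLog)
    {
        // Format-specific; e.g. "ABAB-2" -> block "ABAB" read 2 times.
        var parts = rawDataFromLog.Split('-');
        EntryBlock = parts[0];
        Count = int.Parse(parts[1]);
    }

    // e.g. "[ABAB-2]", as suggested above.
    public override string ToString()
    {
        return "[" + EntryBlock + "-" + Count + "]";
    }
}

// Serialization code should then depend on the interface:
// void Serialize(IEnumerable<IEntry> entries) { ... }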
In addition to what Bob has offered, I highly recommend Head First Design Patterns as a gentle, but robust introduction to OO for a C# programmer. The samples are in Java, which translate easily to C#.
As for OOP, you want to learn SOLID.
I would suggest you build this using Test Driven Development.
Start small, with a simple fragment of your log data, and write a test like the following (you'll find a better way to do this with experience and can adapt it to your situation):
[Test]
public void ReadSequence_FiveA_ReturnsProperList()
{
    // Arrange
    string sequenceStub = "AAAAA";

    // Act
    MyFileDecoder decoder = new MyFileDecoder();
    List<string> results = decoder.ReadSequence(sequenceStub);

    // Assert
    Assert.AreEqual(5, results.Count);
    Assert.AreEqual("A", results[0]);
}
That test code snippet is just a starting point, and I've tried to be rather verbose in the assertions. You can come up with more creative ways over time. The point is to start small. Once this test passes, add another test where you mix "AB" and change your decoder to handle this properly. Eventually, you'll have a large set of tests that handle your different formats. Using TDD, you'll be on the path to using SOLID properly. Whenever you find something you can't test, you should review the rules and see if you can't make it simpler and inject dependencies.
Eventually you'll get into mocking. For example, you might find that you'd rather INJECT the ability for your MyFileDecoder class to have a dependency that will read your log file. In that case, you would create a mock object and pass that into the constructor and set the mock to return the sequenceStub when a method is called.
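As a hedged sketch of that injection idea (ILogReader, StubLogReader and this shape of MyFileDecoder are hypothetical; note the constructor now receives the reader instead of ReadSequence taking a string):

using System.Collections.Generic;

// The decoder depends on this abstraction instead of the file system.
public interface ILogReader
{
    string ReadAll();
}

public class MyFileDecoder
{
    private readonly ILogReader reader;

    public MyFileDecoder(ILogReader reader)
    {
        this.reader = reader;
    }

    public List<string> ReadSequence()
    {
        var results = new List<string>();
        foreach (char c in reader.ReadAll())
            results.Add(c.ToString()); // naive one-char-per-entry decoding
        return results;
    }
}

// A hand-rolled stub for tests; a mocking framework could generate this.
public class StubLogReader : ILogReader
{
    private readonly string stub;
    public StubLogReader(string stub) { this.stub = stub; }
    public string ReadAll() { return stub; }
}

In the test, you would then construct the decoder with new MyFileDecoder(new StubLogReader("AAAAA")) and never touch the real file.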
I was recently profiling an application trying to work out why certain operations were extremely slow. One of the classes in my application is a collection based on LinkedList. Here's a basic outline, showing just a couple of methods and some fluff removed:
public class LinkInfoCollection : PropertyNotificationObject, IEnumerable<LinkInfo>
{
    private LinkedList<LinkInfo> _items;

    public LinkInfoCollection()
    {
        _items = new LinkedList<LinkInfo>();
    }

    public void Add(LinkInfo item)
    {
        _items.AddLast(item);
    }

    public LinkInfo this[Guid id]
    {
        get { return _items.SingleOrDefault(i => i.Id == id); }
    }
}
The collection is used to store hyperlinks (represented by the LinkInfo class) in a single list. However, each hyperlink also has a list of hyperlinks which point to it, and a list of hyperlinks which it points to. Basically, it's a navigation map of a website. As this means you can have infinite recursion when links point back to each other, I implemented this as a linked list; as I understand it, this means that for every hyperlink, no matter how many times it is referenced by another hyperlink, there is only ever one copy of the object.
The ID property in the above example is a GUID.
With that long-winded description out of the way, my problem is simple: according to the profiler, when constructing this map for a fairly small website, the indexer referred to above is called no fewer than 27,906 times, which is an extraordinary amount. I still need to work out whether it really needs to be called that many times, but at the same time I would like to know if there's a more efficient way of implementing the indexer, as this is the primary bottleneck identified by the profiler (also assuming it isn't lying!). I still need the linked-list behaviour, as I certainly don't want more than one copy of these hyperlinks floating around killing my memory, but I also need to be able to access them by a unique key.
Does anyone have any advice to offer on improving the performance of this indexer? I also have another indexer which uses a URI rather than a GUID, but this is less problematic, as building the incoming/outgoing links is done by GUID.
Thanks,
Richard Moss
You should use a Dictionary<Guid, LinkInfo>.
You don't need to use LinkedList in order to have only one copy of each LinkInfo in memory. Remember that LinkInfo is a managed reference type, and so you can place it in any collection, and it'll just be a reference to the object that gets placed in the list, not a copy of the object itself.
That said, I'd implement the LinkInfo class as containing two lists of Guids: one for the things this links to, one for the things linking to this. I'd have just one Dictionary<Guid, LinkInfo> to store all the links. Dictionary is a very fast lookup, I think that'll help with your performance.
The fact that this[] is getting called 27,000 times doesn't seem like a big deal to me, but what's making it show up in your profiler is probably the SingleOrDefault call on the LinkedList. Linked lists are best for situations where you need fast insertions & removals, particularly in the middle of the list. For quick lookups, which is probably more important here, let the Dictionary do its work with hash tables.
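A minimal sketch of the Dictionary-backed version of the collection above (same public surface, only the storage changes):

using System;
using System.Collections;
using System.Collections.Generic;

public class LinkInfoCollection : PropertyNotificationObject, IEnumerable<LinkInfo>
{
    private readonly Dictionary<Guid, LinkInfo> _items = new Dictionary<Guid, LinkInfo>();

    public void Add(LinkInfo item)
    {
        _items.Add(item.Id, item);
    }

    // O(1) hash lookup instead of an O(n) scan on every call.
    public LinkInfo this[Guid id]
    {
        get
        {
            LinkInfo item;
            return _items.TryGetValue(id, out item) ? item : null;
        }
    }

    public IEnumerator<LinkInfo> GetEnumerator()
    {
        return _items.Values.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

Each LinkInfo is still stored exactly once; the dictionary holds references, not copies, so the memory characteristics you wanted from the LinkedList are preserved.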
I'm working on a class library and have opted for a design route that makes implementation and thread safety slightly easier; however, I'm wondering if there might be a better approach.
A brief background is that I have a multi-threaded heuristic algorithm within a class library that, once set up with a scenario, should attempt to solve it. I obviously want it to be thread safe, so that if someone makes a change to anything while it is solving, it doesn't cause crashes or errors.
The current approach I've got is that for a class A, I create a number of InternalA instances for each A instance. InternalA has many of the important properties of the A class, but is internal and inaccessible outside the library.
The downside of this is that if I wish to extend the decision-making logic (or actually let someone do this from outside the library), then I need to change the code within InternalA (or provide some sort of delegate function).
Does this sound like the right approach?
It's hard to really say from just that, but I can say that if you can make everything immutable, your life will be a lot easier. Look at how functional languages approach immutable data structures and collections. The less shared mutable data you have, the simpler the threading will be.
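For instance, a hedged sketch of an immutable scenario type, so a running solver can never observe a change (the names here are illustrative):

public sealed class Scenario
{
    private readonly double stock;

    public Scenario(double stock)
    {
        this.stock = stock;
    }

    public double Stock
    {
        get { return stock; }
    }

    // "Changing" the stock yields a new instance; threads holding the
    // old reference never see it mutate, so reads need no locking.
    public Scenario WithStock(double newStock)
    {
        return new Scenario(newStock);
    }
}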
Why not create a generic class that accepts two member delegates (e.g. Lock/Unlock)? That way you could provide (see the sketch after this list):
Thread-safe implementation (the implementation can use Monitor.Enter/Exit inside)
System-wide safe implementation (using a Mutex)
Unsafe, but fast implementation (using empty implementations)
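A minimal sketch of that idea (the interface and class names are my own, not from any library):

using System.Threading;

public interface ILockStrategy
{
    void Lock();
    void Unlock();
}

// Thread-safe within one process.
public class MonitorLock : ILockStrategy
{
    private readonly object gate = new object();
    public void Lock() { Monitor.Enter(gate); }
    public void Unlock() { Monitor.Exit(gate); }
}

// Safe across processes, at a higher cost.
public class MutexLock : ILockStrategy
{
    private readonly Mutex mutex = new Mutex();
    public void Lock() { mutex.WaitOne(); }
    public void Unlock() { mutex.ReleaseMutex(); }
}

// No safety, maximum speed.
public class NoLock : ILockStrategy
{
    public void Lock() { }
    public void Unlock() { }
}

// The generic wrapper takes whichever strategy the caller chooses.
public class Guarded<T>
{
    private readonly ILockStrategy strategy;
    private T value;

    public Guarded(T value, ILockStrategy strategy)
    {
        this.value = value;
        this.strategy = strategy;
    }

    public T Get()
    {
        strategy.Lock();
        try { return value; }
        finally { strategy.Unlock(); }
    }

    public void Set(T newValue)
    {
        strategy.Lock();
        try { value = newValue; }
        finally { strategy.Unlock(); }
    }
}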
Another way I have had some success with is using interfaces to achieve functional separation. The cost of this approach is that you end up with some fields 'repeated', because each interface requires total separation from the other interfaces' fields.
In my case, I had two threads that needed to pass over a set of data that is potentially large and needs as little garbage collection as possible; i.e. I only want to pass change information from the first stage to the second, and then have the first process the next work unit.
This was achieved by using change buffers to pass changes from one interface to the next.
This allows one thread to work away at one interface, make all its changes, and then publish a struct containing the changes that the other interface (thread) needs to apply prior to its work.
By doing this you have a double buffer (thread 1 produces a change report whilst thread 2 consumes the last report); a rough sketch follows below. If you add more interfaces (and threads), it looks like pulses of work moving through the threads.
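A sketch of the change-buffer hand-off (names are illustrative; the text above says struct, but a reference type keeps the atomic exchange simple, and a real version would add signalling, e.g. an AutoResetEvent, so the consumer doesn't have to poll):

using System.Collections.Generic;
using System.Threading;

// One unit of change information passed from stage 1 to stage 2.
public class ChangeReport
{
    public List<string> Changes = new List<string>();
}

public class ChangeBuffer
{
    private ChangeReport pending;

    // Stage 1: publish a finished report, then start the next work unit.
    public void Publish(ChangeReport report)
    {
        Interlocked.Exchange(ref pending, report);
    }

    // Stage 2: take the latest report, or null if none has arrived yet.
    public ChangeReport Take()
    {
        return Interlocked.Exchange(ref pending, null);
    }
}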
This was based on my research and I have no doubt that there are better methods available now.
My aim when coming up with this, however, was to avoid the need for locks in the vast majority of code by designing out race conditions. The other major consideration was garbage-collection performance, which may not be an issue for you.
This approach is all good until you need complex interactions between threads; then you find that you start forcing the layout of your buffer structures for reuse, to get around inheritance, which in turn has an upkeep overhead.
A little more information on the problem to help...
The heuristic I'm using solves TSP-like problems. Right at the start of each calculation, all the aspects that form the problem (the salesman / the places to visit) are cloned so they aren't affected across threads.
This means each thread can change data (such as the stock left on a salesman, etc.), as a number of values change during the calculation as things progress. What I'd quite like to do is allow checks such as HasSufficientStock() (for a simple example) to be overridden by a developer using the library.
Unfortunately, at present, to add further protection across threads and to make some simpler/lightweight classes, I convert them to these internal classes, and those are what are actually used and cloned.
For example:

class A
{
    public double Stock { get; private set; }

    // Processing and cloning actually work on these InternalA instances.
    internal InternalA ConvertToInternal()
    {
        return new InternalA { Stock = this.Stock };
    }
}

internal class InternalA : ICloneable
{
    public double Stock { get; set; }

    public bool HasSufficientStock()
    {
        return Stock > 0; // simplified example check
    }

    public object Clone()
    {
        return new InternalA { Stock = this.Stock };
    }
}