How do I avoid 'Binding to Large CLR Objects'? - c#

This guide on optimizing DataBinding says:
There is a significant performance impact when you data bind to a single CLR object with thousands of properties. You can minimize this impact by dividing the single object into multiple CLR objects with fewer properties.
What does this mean? I am still trying to get familiar with DataBinding, but my analogy here is that properties are like SQL table fields, and objects are rows. This advice then translates to "to avoid problems with a large number of fields, use fewer fields and create more rows". As this doesn't make any sense to me, is my understanding of databinding completely askew?
Does this advice actually apply? I am unsure whether it is specific to .NET 4/WPF, while I am using .NET 3.5 and a custom WinForms-based control library (DevExpress).
As an aside: am I correct in thinking DataBinding uses reflection when using an IList-style data source?
This is not just an academic question. I am currently trying to speed up loading an XtraGridView (DevExpress control) with ~100,000 objects with 50 or so properties each.

This advice then translates to "to avoid problems with a large number of fields, use fewer fields and create more rows"
I think it should translate to "use fewer fields and create smaller tables" (i.e. tables with fewer fields). And the original advice should read "[...] dividing the single class into multiple classes with fewer properties". As you correctly noted, it wouldn't make sense to create more "rows"...
Anyway, if you do have a class that exposes hundreds or thousands of properties, you have a far more serious problem than binding performance... This is a serious design flaw that you should fix after reading up on some OO principles.
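To make the "divide into smaller classes" idea concrete, here is a minimal sketch. The classes `CustomerRecord`, `CustomerAddress`, `CustomerBilling` and the helper `BindingInfo` are hypothetical, not from the original posts. It assumes the binder discovers columns through `TypeDescriptor.GetProperties` on the item type, which reports only top-level properties, so grouping related fields into child objects shrinks what each bound row exposes.

```csharp
using System.ComponentModel;

// Hypothetical example: related fields grouped into small classes
// instead of one class with dozens of flat properties.
public class CustomerAddress
{
    public string Street { get; set; }
    public string City { get; set; }
    public string PostalCode { get; set; }
}

public class CustomerBilling
{
    public string AccountNumber { get; set; }
    public decimal CreditLimit { get; set; }
}

// The object handed to the grid now has four top-level properties.
// TypeDescriptor does not flatten the nested Address/Billing objects.
public class CustomerRecord
{
    public int Id { get; set; }
    public string Name { get; set; }
    public CustomerAddress Address { get; set; }
    public CustomerBilling Billing { get; set; }
}

public static class BindingInfo
{
    // Number of properties a TypeDescriptor-based binder would discover on T.
    public static int BindablePropertyCount<T>()
    {
        return TypeDescriptor.GetProperties(typeof(T)).Count;
    }
}
```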
Does this advice actually apply? I am unsure whether it is specific to .NET 4/WPF, while I am using .NET 3.5 and a custom WinForms-based control library (DevExpress)
Well, the page you mentioned is about WPF, but I think the idea of binding to smaller objects applies to WinForms too (because the more properties that need to be watched, the slower binding will be).
As an aside: am I correct in thinking DataBinding uses reflection when using an IList-style data source?
You're partially correct... it actually uses TypeDescriptor, which in turn uses reflection to examine regular CLR objects. But this mechanism is much more flexible than reflection: a type can implement ICustomTypeDescriptor to provide its own description, list of members, etc. (DataRowView, the item type you get when binding to a DataTable, is one example of such a type).
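A rough sketch of the per-item work a `TypeDescriptor`-based binder does: get the `PropertyDescriptor`s for the item and call `GetValue` on each. This is a simplification under assumption, not the actual WinForms or DevExpress binding code; `Product` and `DescriptorDemo` are hypothetical. The `GetProperties(object)` overload is the one that honours `ICustomTypeDescriptor` when the item implements it.

```csharp
using System.Collections.Generic;
using System.ComponentModel;

// Hypothetical item type.
public class Product
{
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public static class DescriptorDemo
{
    // Looks up PropertyDescriptors via TypeDescriptor (not raw reflection)
    // and asks each descriptor for the item's value, roughly as a grid does per cell.
    public static Dictionary<string, object> ReadAll(object item)
    {
        var values = new Dictionary<string, object>();
        foreach (PropertyDescriptor pd in TypeDescriptor.GetProperties(item))
        {
            values[pd.Name] = pd.GetValue(item);
        }
        return values;
    }
}
```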

You are solving the wrong problem. It will take a typical user well over a week to find what she is looking for when she has 5 million fields (100,000 rows × 50 properties) to search through. The speed of your UI becomes irrelevant. Only a machine can do a better job of finding the data.
You've got one. Help the user narrow down what she is searching for by letting her enter search terms, so that the total query result doesn't contain more than, say, a hundred rows. The database engine helps you make that fast, and it automatically solves your grid performance problem.
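A minimal sketch of that advice, assuming a hypothetical `Order` type and a prefix search on customer name: apply the user's term and cap the rows before anything is bound to the grid. As written it filters in memory over an `IEnumerable<Order>`; to have the database engine do the work, the same `Where`/`Take` calls would need to be composed on an `IQueryable<Order>` (e.g. a LINQ to SQL table) so the provider can translate them, and not every string overload is translatable by every provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type.
public class Order
{
    public int Id { get; set; }
    public string CustomerName { get; set; }
}

public static class OrderSearch
{
    // Upper bound on rows handed to the grid (the "say, a hundred rows" from the answer).
    public const int MaxRows = 100;

    // In-memory version: filter by the user's term first, then cap the result size.
    public static List<Order> Search(IEnumerable<Order> source, string term)
    {
        return source
            .Where(o => o.CustomerName.StartsWith(term, StringComparison.OrdinalIgnoreCase))
            .Take(MaxRows)
            .ToList();
    }
}
```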

Related

Which serialization to use for my c# objects to save them in a SQL database

I'm looking for some advice; there may be no hard and fast answer, but any thoughts are appreciated. The application is very typical. It's C# code, currently built with VS2010. Among other things, this code provides data from a model to end users. The model itself is based on some known data at fixed points (a "cube"), various parameters, and settings. In standard programming practice the user accesses the data via public "get" functions, which in turn rely on private member variables such as the known data and the settings. So far, so standard. Now I want to save the class providing this data to the users into an SQL database, primarily so all users can share the same data (or, more precisely, the same model-generated data).
Of course I could just take each member variable of the class, write it into the database, and reinstantiate the class from those values. But I don't want to miss out on all the goodies .NET and C# have to offer. So what I'm thinking of doing is serializing the object and using LINQ to SQL to squirt it into the database. The LINQ to SQL part is straightforward, but I'm a total newbie when it comes to serialization and I'm a bit concerned/confused about it. The obvious thing seems to be to format the object as XML and write it into the database as a column in the table with the SQL data type "xml". But that leaves me with a few questions.
To my understanding, the standard XmlSerializer only writes the public members of the class into the XML. That looks like a non-starter to me, since my class design is predicated on keeping much of the class private (writing classes with all public members is outside my experience; who does that?). This has led me to consider the DataContractSerializer, in which you can opt members in for serialization. However, it seems to have some WCF dependencies, and I'm wondering what the potential drawbacks of using it are. Additionally there is the SoapFormatter, which seems to be more prevalent in web applications, and there is also JSON. I'm not really considering these, but maybe I should? Maybe there are better ways of solving the problem? This is perhaps a bit verbose, but hopefully the context helps people shoot me some advice.
I have had requirements similar to yours and have done quite a bit of research in this area. I did a number of proof-of-concept projects using XML serialization, JSON, BinaryFormatter and, not to forget, some home-grown hacks. I had almost decided to go with JSON (JSON.NET), until I found protobuf-net! It is fast, compact, version tolerant, supports inheritance, and is easy to adopt without many changes to your code. Highly recommended.
If you store an object as XML, it will be very hard to use from the database. For example, if you store customer objects as XML, how would you write the following?
select * from Customers where LastName = 'Obama'
It can be done, but it's not easy.
How to map objects to a database is a subject of some controversy. Frameworks that are easy to get started with can become overly complex in the application's later life. Since most applications spend more time in maintenance than in initial development, I'd use the simplest model that works.
Plain ADO.NET or Dapper are good contenders. You'll write a bit more boilerplate code, but the decrease in complexity more than makes up for that.
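To give a feel for the "bit more boilerplate" of plain ADO.NET, here is a sketch of hand-mapping rows from an `IDataReader` into objects. `CustomerRow` and `CustomerReader` are hypothetical names; in a real application the reader would come from something like `SqlCommand.ExecuteReader()`, while the check below feeds it from an in-memory `DataTable` via `CreateDataReader()` so it runs without a database.

```csharp
using System.Collections.Generic;
using System.Data;

// Hypothetical row type.
public class CustomerRow
{
    public int Id { get; set; }
    public string LastName { get; set; }
}

public static class CustomerReader
{
    // Hand-written mapping: look up column ordinals once, then copy each row into an object.
    public static List<CustomerRow> ReadCustomers(IDataReader reader)
    {
        int idOrdinal = reader.GetOrdinal("Id");
        int lastNameOrdinal = reader.GetOrdinal("LastName");
        var result = new List<CustomerRow>();
        while (reader.Read())
        {
            result.Add(new CustomerRow
            {
                Id = reader.GetInt32(idOrdinal),
                // Database NULLs have to be handled explicitly.
                LastName = reader.IsDBNull(lastNameOrdinal) ? null : reader.GetString(lastNameOrdinal)
            });
        }
        return result;
    }
}
```

The trade-off is visible: every column is mapped by hand, but there is no framework behaviour to learn or debug.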

Build separate pages or not

I have about 20 grid views that I have to create. All of them are pretty standard across the board: just take an IEnumerable<T> and display it in a grid view, that's it.
I would prefer to create one aspx page and have the grid view generated dynamically using ITemplate. And I guess for the data source I would use IEnumerable<object>.
Are there significant performance considerations between doing it the way I'd like and building the 20 or more grid views on separate aspx pages?
An example of a concern I have is taking a List<T> and casting it to IEnumerable<object>.
Build just the one and performance-test it. It will be easier to apply lessons learned.
If the output is long, turn response buffering off to improve time to first byte.
Having one generic view page is preferable where possible, and it sounds like it is possible in your case.
Secondly, there is no performance hit going from List to IEnumerable, as IEnumerable is an interface that List already implements.
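A small sketch of that point, with a hypothetical `Person` type and `ConversionDemo` helper. Since C# 4/.NET 4, `IEnumerable<T>` is covariant, so a `List<Person>` converts to `IEnumerable<object>` by reference, with no copy. The question is about .NET 3.5-era code, where there is no variance and `Cast<object>()` is the usual route; there it wraps the list lazily rather than copying it.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type.
public class Person
{
    public string Name { get; set; }
}

public static class ConversionDemo
{
    // C# 4 / .NET 4+: plain reference conversion thanks to covariance; no copy is made.
    public static IEnumerable<object> AsObjects(List<Person> people)
    {
        return people;
    }

    // .NET 3.5 style: Cast<object>() yields each element as it is enumerated.
    // (On .NET 4+ Cast simply returns the list itself, since the conversion already exists.)
    public static IEnumerable<object> AsObjectsLegacy(List<Person> people)
    {
        return people.Cast<object>();
    }
}
```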
However, you will take a performance hit building a List if you don't already have one. It is much better to pass the IEnumerable from your LINQ statements directly, as it is only realized when enumerated. That can be a major benefit with long lists and when sorting or filtering (because you can modify the IEnumerable before it is realized).
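A sketch of the deferred-execution behaviour described above, using a hypothetical counting source (`DeferredDemo.Numbers`). Composing `Where`/`Take` pulls nothing from the source; only enumeration (here `ToList()`) does, and `Take` stops pulling once it has enough items.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Hypothetical counter: how many items have actually been pulled from the source.
    public static int ItemsProduced;

    // A lazy source that counts every item it hands out.
    public static IEnumerable<int> Numbers(int count)
    {
        for (int i = 0; i < count; i++)
        {
            ItemsProduced++;
            yield return i;
        }
    }

    // Only composes the query; nothing is enumerated until the caller iterates it.
    public static IEnumerable<int> FirstEvens(int howMany)
    {
        return Numbers(1000000).Where(n => n % 2 == 0).Take(howMany);
    }
}
```

Even though the source could produce a million items, materializing the first five even numbers only pulls a handful of them.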
As with anything related to performance, build it and profile it to see whether performance is an issue. No amount of opinion, however well informed, is a substitute for profiling; only optimise when necessary, and always avoid premature optimisation.
20 grid views, OK.
Just make sure you disable the ViewState of the controls that do not require it.
That will considerably reduce your page size and in turn reduce the page load time.
If there is nothing custom and you only have to show default data for all 20 tables/lists, then I think you should use one page.

Definition of C# data structures and algorithms

This may be a silly question (with MSDN and all), but maybe some of you will be able to help me sift through amazing amounts of information.
I need to know the specifics of the implementations of common data structures and algorithms in C#. That is, for example, I need to know, say, how Linked Lists are handled and represented, how they and their methods are defined.
Is there a good centralized source of documentation for this (with code), or should I just reconstruct it? Have you ever had to know the specifics of these things to decide what to use?
Regards, and thanks.
Scott Mitchell has a great 6-part article that covers many .NET data structures:
An Extensive Examination of Data Structures
For an algorithmic overview of data structures, I suggest reading the algorithms textbook "Introduction to Algorithms" by Cormen et al.
For details on each .NET data structure the MSDN page on that specific class is good.
When none of these answers your question, Reflector is always there. You can use it to dig through the actual source and see things for yourself.
If you really want to learn it, try making your own.
Googling for linked lists will give you a lot of hits and sample code to go off of. Wikipedia will also be a good resource.
It depends on the language. Most languages now ship with the very basics pre-built, but that doesn't mean their implementations are the same. An object with the same name can be completely different: LinkedList in C# is not the same as LinkedList in Java or a linked list in C++. Even the string libraries differ. C#, for instance, is known to create a new String object every time a string gets a new value (strings are immutable)... this is something you learn quickly when it slows your program to a crawl the first time you do heavy substring work in C#.
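A short sketch of the string point, with a hypothetical `StringDemo` helper: `System.String` is immutable, so repeated `+=` in a loop allocates a new string each time and recopies the contents, whereas `StringBuilder` appends into a buffer and creates one string at the end.

```csharp
using System.Text;

public static class StringDemo
{
    // Each += allocates a new string and copies the old contents,
    // so n iterations copy on the order of n^2 / 2 characters in total.
    public static string ConcatNaive(int n)
    {
        string s = "";
        for (int i = 0; i < n; i++)
        {
            s += "x";
        }
        return s;
    }

    // StringBuilder appends into a growable buffer and materializes one string.
    public static string ConcatWithBuilder(int n)
    {
        var sb = new StringBuilder(n);
        for (int i = 0; i < n; i++)
        {
            sb.Append('x');
        }
        return sb.ToString();
    }

    // "Modifying" a string returns a different object; the original is unchanged.
    public static bool ModifyingCreatesNewObject()
    {
        string original = "abc";
        string changed = original.ToUpper();
        return !object.ReferenceEquals(original, changed) && original == "abc";
    }
}
```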
So the answer to your question is massively complicated, because I don't know quite what you're after. If you're just going to teach a class what generic versions of these algorithms and data structures look like, you can present them without getting into the problems I mentioned above. You'll just need to select a particular kind of implementation, then look it up and read about it. For a LinkedList, for example, you need to be able to instantiate the list, destroy the list, copy the list, add to the list somewhere (usually at the front/back), remove from the list, etc. You could get fancy and add as many methods as you want.
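As an illustration of the operations listed above, here is a minimal singly linked list. `SimpleLinkedList<T>` is a hypothetical teaching type, not how `System.Collections.Generic.LinkedList<T>` is implemented (that class is doubly linked); "destroy" becomes `Clear()` because the garbage collector reclaims unreachable nodes.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical teaching implementation of a singly linked list.
public class SimpleLinkedList<T> : IEnumerable<T>
{
    private class Node
    {
        public T Value;
        public Node Next;
        public Node(T value) { Value = value; }
    }

    private Node head;
    private Node tail;

    public int Count { get; private set; }

    public void AddFirst(T value)
    {
        var node = new Node(value) { Next = head };
        head = node;
        if (tail == null) tail = node;
        Count++;
    }

    public void AddLast(T value)
    {
        var node = new Node(value);
        if (tail == null) { head = tail = node; }
        else { tail.Next = node; tail = node; }
        Count++;
    }

    // Unlinks the first node whose value equals 'value'; returns false if none matches.
    public bool Remove(T value)
    {
        var comparer = EqualityComparer<T>.Default;
        Node previous = null;
        for (Node current = head; current != null; previous = current, current = current.Next)
        {
            if (!comparer.Equals(current.Value, value)) continue;
            if (previous == null) head = current.Next; else previous.Next = current.Next;
            if (current == tail) tail = previous;
            Count--;
            return true;
        }
        return false;
    }

    // Shallow copy: new nodes, same element references.
    public SimpleLinkedList<T> Copy()
    {
        var copy = new SimpleLinkedList<T>();
        foreach (T item in this) copy.AddLast(item);
        return copy;
    }

    // Dropping the references is enough; the GC reclaims the nodes.
    public void Clear()
    {
        head = tail = null;
        Count = 0;
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (Node n = head; n != null; n = n.Next) yield return n.Value;
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```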

C# creating a fixed size hashtable

I want to be able to create a fixed-size hashmap of, say, 100 buckets, and if I need to store over 100 items then collisions and overwriting will just have to happen. The Hashtable class has an IsFixedSize property; however, it is read-only.
Am I thinking about this completely wrongly, or is there a solution to this?
Collections in the .NET Framework don't allow for a lot of fine-tuning, although you might find one efficient enough for your needs. Try some viable ones out before optimizing.
If you don't roll your own, you might find a third-party alternative with more fine-grained controls. For example, see The C5 Generic Collection Library for C# and CLI as a possible start, and check the various Hash* classes on its documentation page.
If you decide to roll your own, you'll want to implement some of the standard interfaces for collections, lists, enumeration, etc., so that your class works as expected with C# foreach and other language and .NET features.
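If you do roll your own, the behaviour described in the question (a fixed number of buckets, collisions simply overwrite) resembles a direct-mapped cache. The sketch below is a hypothetical `FixedSizeHashMap<TKey, TValue>` with one slot per bucket and no chaining or resizing; for brevity it skips the standard collection interfaces mentioned above.

```csharp
using System.Collections.Generic;

// Hypothetical fixed-capacity map: one slot per bucket; a key that lands in an
// occupied bucket overwrites whatever was there (no chaining, no resizing).
public class FixedSizeHashMap<TKey, TValue>
{
    private readonly bool[] occupied;
    private readonly TKey[] keys;
    private readonly TValue[] values;
    private readonly IEqualityComparer<TKey> comparer = EqualityComparer<TKey>.Default;

    public FixedSizeHashMap(int bucketCount)
    {
        occupied = new bool[bucketCount];
        keys = new TKey[bucketCount];
        values = new TValue[bucketCount];
    }

    public int BucketCount { get { return keys.Length; } }

    private int BucketOf(TKey key)
    {
        // Clear the sign bit so negative hash codes still map to a valid index.
        return (comparer.GetHashCode(key) & 0x7FFFFFFF) % keys.Length;
    }

    public void Put(TKey key, TValue value)
    {
        int b = BucketOf(key);
        keys[b] = key;
        values[b] = value;
        occupied[b] = true;
    }

    // Succeeds only if the bucket still holds this exact key (it may have been overwritten).
    public bool TryGetValue(TKey key, out TValue value)
    {
        int b = BucketOf(key);
        if (occupied[b] && comparer.Equals(keys[b], key))
        {
            value = values[b];
            return true;
        }
        value = default(TValue);
        return false;
    }
}
```

For `int` keys the hash code is the value itself, so with 100 buckets keys 5 and 105 collide and the later one wins.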
You might also take an efficient C++ implementation, if you have one, and use it from C#/.NET. It might take a bit of finagling, but there are answers on SO about how to accomplish this kind of thing.

C# Is holding data in a delimited string bad practice?

Is it bad practice to have a string like
"name=Gina;position= HouseMatriarch;id=1234" to hold state data in an application? I know that I could just as well have a struct, class or hashtable to hold this info.
Is it acceptable practice to hold delimited key/value pairs in a database field, just for cases where the data type is not known at design time?
Thanks... I am just trying to ensure that I maintain good design practices.
Yes, holding your data in a string like "name=Gina;position= HouseMatriarch;id=1234" is very bad practice. This data should be stored in structs or objects, because data held in a string is hard to access, validate and process. It will take you much more time to write the code that parses your string to get at your data than to just use the language structures of C#.
I would also advise against storing key/value pairs in database fields if the other option is just adding columns for those fields. If you don't know the type of your data at design time, you are probably not doing the design right. How will you be able to build an application when you don't know what data types your fields will have to hold? Or perhaps you should elaborate on the context of the application to make the intent clearer. It is not all black and white :-)
Well, if your application only passes this data between other systems, I don't see any problems in treating it as a string. However, if you need to use the data, I would definitely introduce the necessary types to handle that.
I think you will find your application easier to maintain if you make a struct or class to hold the data and then add a custom property that returns (and sets) the string you have been using. The getter formats the fields into the string format you already use, and the setter does the reverse (takes the string and fills the fields). This way you maintain maximum compatibility with your old algorithms.
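A minimal sketch of that suggestion, assuming a hypothetical `StaffState` class and the keys `name`, `position` and `id` from the question: the typed properties are the real data, and `Delimited` formats or parses the legacy string on demand. Like the original format, this naive version does not escape `;` or `=` inside values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical strongly typed holder for the "name=...;position=...;id=..." data.
public class StaffState
{
    public string Name { get; set; }
    public string Position { get; set; }
    public int Id { get; set; }

    // Compatibility property: reads and writes the legacy delimited format.
    // Naive: ';' or '=' inside a value would corrupt the result.
    public string Delimited
    {
        get { return "name=" + Name + ";position=" + Position + ";id=" + Id; }
        set
        {
            var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (string part in value.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
            {
                int eq = part.IndexOf('=');
                if (eq > 0) fields[part.Substring(0, eq).Trim()] = part.Substring(eq + 1).Trim();
            }
            string s;
            Name = fields.TryGetValue("name", out s) ? s : null;
            Position = fields.TryGetValue("position", out s) ? s : null;
            Id = fields.TryGetValue("id", out s) ? int.Parse(s) : 0;
        }
    }
}
```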
Well, one immediate problem with that approach is embedded delimiter characters. Given your example, what would happen if the user entered their name as follows:
Pet;er
or
Pe=;ter
or
pe;Name=Yeoi;
I am not sure what state data it is you are trying to hold, and without any context it's hard to make valid suggestions. Perhaps a first step would be to replace this with a key value pair, at least that negates the problem mentioned above and means you don't have to parse strings regularly.
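To illustrate the escaping issue raised above: a naive split on `;` silently truncates a name like `Pet;er`, while escaping delimiters with a backslash keeps the round trip intact. `KeyValueCodec` and its backslash-escaping scheme are a hypothetical sketch, not a standard format.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical codec for "key=value;key=value" strings.
public static class KeyValueCodec
{
    // Naive parser: split on ';', then on the first '='. Breaks on embedded delimiters.
    public static Dictionary<string, string> ParseNaive(string s)
    {
        var result = new Dictionary<string, string>();
        foreach (string part in s.Split(';'))
        {
            int eq = part.IndexOf('=');
            if (eq > 0) result[part.Substring(0, eq)] = part.Substring(eq + 1);
        }
        return result;
    }

    // Escapes the escape character first, then the two delimiters.
    public static string Escape(string value)
    {
        return value.Replace("\\", "\\\\").Replace(";", "\\;").Replace("=", "\\=");
    }

    public static string Format(IEnumerable<KeyValuePair<string, string>> pairs)
    {
        var sb = new StringBuilder();
        foreach (var pair in pairs)
        {
            if (sb.Length > 0) sb.Append(';');
            sb.Append(Escape(pair.Key)).Append('=').Append(Escape(pair.Value));
        }
        return sb.ToString();
    }

    // Escape-aware parser: a backslash makes the next character literal.
    public static Dictionary<string, string> Parse(string s)
    {
        var result = new Dictionary<string, string>();
        var key = new StringBuilder();
        var value = new StringBuilder();
        bool inValue = false;
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            StringBuilder target = inValue ? value : key;
            if (c == '\\' && i + 1 < s.Length) { target.Append(s[++i]); }
            else if (c == '=' && !inValue) { inValue = true; }
            else if (c == ';') { AddPair(result, key, value); inValue = false; }
            else { target.Append(c); }
        }
        AddPair(result, key, value);
        return result;
    }

    private static void AddPair(Dictionary<string, string> result, StringBuilder key, StringBuilder value)
    {
        if (key.Length > 0) result[key.ToString()] = value.ToString();
        key.Clear();
        value.Clear();
    }
}
```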
I try not to keep data in any string-based format. But I have encountered several situations in which it was not possible to know in advance what the structure of the data would be (e.g. the customer/end user could dynamically add fields).
In contrast to your approach, we decided to store the data in XML; in your case this would look something like this:
<user id="1234">
<name>Gina</name>
<position>HouseMatriarch</position>
</user>
This gives you the following advantages:
The classes to work with the data (read/write) are already available in the framework (e.g. XmlDocument or XML serialization)
You can easily exchange the data with other systems (if/when required)
You can store the data in a file
You can store the data in a database column (xml data type). You can even query that column when using SQL Server (although I'd try to avoid storing data in XML that has to be queried)
Using XML allows you to add fields to your data at any time
Update: I'm not sure why my answer was downvoted so much; maybe it is because of the bad example. So let me make it clear: I would not use XML for properties such as the ID/primary key of a user, or for standard properties like "name", "email", etc. But for "extended/dynamic" properties (as described above) I still think this is an easy and elegant solution.
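A sketch of the extended-properties idea, swapping in LINQ to XML (`XElement`, available since .NET 3.5) instead of the `XmlDocument` the answer mentions. `ExtendedProperties` is a hypothetical helper, and dynamic field names are assumed to be valid XML element names. `XElement` escapes special characters in values automatically, which sidesteps the delimiter problem.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper: the id stays ordinary typed data; only the
// dynamic, user-defined fields go into child elements.
public static class ExtendedProperties
{
    public static XElement ToXml(int userId, IDictionary<string, string> dynamicFields)
    {
        var root = new XElement("user", new XAttribute("id", userId));
        foreach (var field in dynamicFields)
        {
            // Assumes field.Key is a valid XML element name.
            root.Add(new XElement(field.Key, field.Value));
        }
        return root;
    }

    // Returns null when the field has never been set for this user.
    public static string GetField(XElement user, string name)
    {
        XElement element = user.Element(name);
        return element == null ? null : element.Value;
    }
}
```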
If you want to store structured data in a string I think you should use a standard notation such as JSON.
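A minimal sketch of the JSON route using `System.Text.Json`, which assumes .NET Core 3.0 or later (in the .NET 3.5/4 era Json.NET's `JsonConvert.SerializeObject` was the common choice). `JsonState` is a hypothetical helper; the serializer takes care of quoting and escaping, so values containing `;` or `=` need no special handling.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper around System.Text.Json for a flat set of string fields.
public static class JsonState
{
    public static string ToJson(Dictionary<string, string> fields)
    {
        return JsonSerializer.Serialize(fields);
    }

    public static Dictionary<string, string> FromJson(string json)
    {
        return JsonSerializer.Deserialize<Dictionary<string, string>>(json);
    }
}
```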
It's bad practice because of the amount of effort you have to put into constructing the strings and parsing them later. There are other, more robust ways of serialising data for passing between systems.
For core business data, suitably designed classes will be far simpler to maintain, and with all the properties strongly typed, you'll know early on when you mis-type a property name.
As for key-value pairs, I'd say they're sometimes OK, sometimes not. If there are a lot of possible attributes but each record actually has only a few of them, then it can be perfectly all right to use KVPs. Sebastian Dietz's alternative of having a separate column for each field would result in a lot of empty fields in that case. It would also mean extra work altering the table every time you needed a new one.
None of the answers has mentioned normalization yet, so I thought I would. When database fields are involved, one of the key principles of normalization is that each field in a table only represents one thing. Delimited fields violate that principle.
One of the guys at Red Gate Software posted this article along those lines that you may find useful.
Well, it just means that it is less searchable and indexable than a hashtable would be. It would also require a bit of processing to get it into a state where it could easily be used by other bits of code. For example, a bit of code that queries the id in that data would be something horrible such as:
if(dataStringThing.Substring(26, 4) == SomeIdInStringFormat)
So yes, in most cases it is bad practice. However, there are other cases: this might be a default format that you need to retain the data in, or performance might mean that you should only parse it as and when required. In those cases it may not be a bad thing.
If you have reasons to keep it in that format, I would suggest transforming it into a class that separates the fields, and also giving that class a ToString() implementation that restores the original format if you need it. If performance is a concern, modify the class so that it parses the source into its fields only the first time those fields are accessed.
To reiterate, nothing in isolation is necessarily bad practice. Bad practices are context-dependent.
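A sketch of the lazy variant described above, assuming a hypothetical read-only `LazyDelimitedRecord`: it keeps the original string, parses it on first field access via `Lazy<T>` (.NET 4+, thread-safe by default), and `ToString()` returns the untouched original. A writable version would need `ToString()` to reformat the fields instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical read-only wrapper around a legacy "key=value;key=value" string.
public class LazyDelimitedRecord
{
    private readonly string raw;
    private readonly Lazy<Dictionary<string, string>> fields;

    public LazyDelimitedRecord(string raw)
    {
        this.raw = raw;
        // Parsing is deferred until a field is first requested.
        fields = new Lazy<Dictionary<string, string>>(() => Parse(this.raw));
    }

    public bool IsParsed { get { return fields.IsValueCreated; } }

    // Returns null for keys that are not present.
    public string this[string key]
    {
        get
        {
            string value;
            return fields.Value.TryGetValue(key, out value) ? value : null;
        }
    }

    // Gives back the original text, so code relying on the old format keeps working.
    public override string ToString() { return raw; }

    private static Dictionary<string, string> Parse(string s)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string part in s.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = part.IndexOf('=');
            if (eq > 0) result[part.Substring(0, eq).Trim()] = part.Substring(eq + 1).Trim();
        }
        return result;
    }
}
```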
It (hopefully) obviously shouldn't be a normal choice. But there are cases where it's useful or necessary.
I can't think of any cases that wouldn't involve it being part of a communications protocol with some external service (e.g. a database connection string), so you're probably stuck with the format.
If you have a choice in the format (perhaps you are writing both sides of a system which can only communicate using strings), then at least choose something structured and well known. Examples have been given elsewhere, but the prime ones are naturally going to be XML or JSON. CSV or some other delimited format may be useful in very simple cases (such as the database connection string), but pay special attention to escaping delimiter characters (as the "Bobby Tables" joke, already referenced in another comment, nicely illustrates; google for him if you are not familiar with that one).
Your mention of a database suggests that this may be where the focus is. Are you trying to serialise application objects? (there are other ways of doing that). As another poster said, this may be a sign of a design that needs rethinking. But if you do need to store unknown datatypes in a DB, then XML may be an appropriate choice - especially if your DB supports XML fields. It's a bit of a minefield, though, so make sure you are familiar with how they work first.
I think it is not that bad when you are using a StringList for managing your string,
especially when the structure of, e.g., a configuration string (or configuration database field) must be flexible.
But normally you should not do this, because of these disadvantages.
It all depends on what you're trying to accomplish.
If you need a hierarchical format of data or lots of fields that preserve data type, then no... a parsed string is a bad idea.
However, if you just need to transmit a string across a service and byte-conservation is important, then a Tag-Data pair may be exactly what you need.
If you do use a parsed string, it's important to be able to get at the data inside and manage it quickly. If you want an example TDP (tag-data pair) class, I posted one today to my website:
http://www.jerryandcheryl.net/jspot/2009/01/tag-data-pairs.html
I hope that helps.
I suggest considering these usage factors.
If you are processing the data within your own code, then you can use whatever data structures you wish. However, you may have issues developing your own implementation of a complex data structure, so consider using a pre-built one instead. Many come with whatever programming platform you may be using, while many more are documented in various books, articles, and discussions both printed and online. If you properly isolate your work from others, then you can safely do whatever you want.
On the other hand, if you need to share that data with others, then it deserves the most careful consideration. If you must share the data with an API, or via a storage mechanism (database, file, etc.), or via some transport (sockets, HTTP, etc.), then you should be thinking of others first and foremost. If you wish success and respect from your efforts, then you need to pay attention to standards, conventions and cost. Thankfully, practically any such use that you can imagine has been done before, so you can leverage others' efforts.
In a database, consider how others (and yourself) will be inserting, updating, deleting, and selecting the data. For example, using XML in a database makes all these steps unnecessarily hard and expensive compared to the alternatives. Pay attention to database normalization--learn it if you are not familiar already.
If you are dealing with text, pay attention to character encodings and make them explicit.
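A small sketch of why the encoding should be explicit, with a hypothetical `EncodingDemo` helper: the same text yields different bytes under UTF-8 and UTF-16, and decoding with the wrong encoding (here ISO-8859-1) quietly produces mojibake rather than an error.

```csharp
using System.Text;

// Hypothetical helper contrasting explicit encodings.
public static class EncodingDemo
{
    // UTF-8 without a byte-order mark.
    public static byte[] ToUtf8(string text) { return new UTF8Encoding(false).GetBytes(text); }

    // UTF-16 little-endian (Encoding.Unicode); GetBytes never adds a BOM.
    public static byte[] ToUtf16(string text) { return Encoding.Unicode.GetBytes(text); }

    public static string FromUtf8(byte[] bytes) { return new UTF8Encoding(false).GetString(bytes); }

    // Decoding UTF-8 bytes as Latin-1 does not fail; it silently returns the wrong text.
    public static string FromLatin1(byte[] bytes) { return Encoding.GetEncoding("ISO-8859-1").GetString(bytes); }
}
```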
If there is an existing standard or convention for what you are doing, honor it. If there is a compelling reason to deviate, then accept the burden of justifying it, explaining it, and making it easy for others to accommodate your choices.
If you control both sides of a communication/transport medium, feel free to optimize. If you don't, err on the side of interoperability. Remember that a primary difference between the two scenarios is the level of self-description embedded with the data: interoperability has lots, optimization drops it based on shared assumptions. Text-rich data is more understandable, but binary is faster.
Think about your audience.
