Is it advisable to use generics for large amounts of data?

I have, say, thousands of Customer records that I have to show on a web form. I also have a CustomerEntity class with 10 properties. When I fetch the data with a DataReader and convert it into a List<CustomerEntity>, I have to loop through the data twice.
So is it advisable to use generics in such a scenario? If yes, how will my application's performance be affected?
For example:
My CustomerEntity class has CustomerId and CustomerName properties, and I am getting 100 records from the Customer table.
To prepare the list I have written the following code:
while (dr.Read())
{
    // create a new CustomerEntity object
    // code for getting the properties of CustomerEntity
    for (var index = 0; index < MyProperties.Count; index++)
    {
        // assumes column order matches property order
        MyProperties[index].SetValue(CustEntityObject, dr.GetValue(index), null);
    }
    // add the CustEntity object to the List<CustomerEntity>
}
How can I avoid these two loops? Is there any other mechanism?

I'm not really sure how generics ties into data-volume; they are unrelated concepts... it also isn't clear to me why this requires you to read everything twice. But yes: generics are fine when used in volume (why wouldn't they be?). But of course, the best way to find a problem is profiling (either server performance or bandwidth - perhaps more the latter in this case).
Of course the better approach is: don't show thousands of records on a web form; what is the user going to do with that? Use paging, searching, filtering, ajax, etc - every trick imaginable - but don't send thousands of records to the client.
Re the updated question: the loop for setting properties isn't necessarily bad. This is an entirely appropriate inner loop. Before doing anything, profile to see if this is actually a problem. I suspect that sheer bandwidth (between server and client, or server and database) is the biggest issue. If you can prove that this loop is a problem, there are things you can do to optimise:
- switch to using PropertyDescriptor (rather than PropertyInfo), and use HyperDescriptor to make it a lot faster
- write code with DynamicMethod to do the job - requires some understanding of IL, but very fast
- write a .NET 3.5 / LINQ Expression to do the same and use .Compile() - like the second point, but (IMO) a bit easier
I can add examples for the first and third bullets; I don't really want to write an example for the second, simply because I wouldn't write that code myself that way any more (I'd use the 3rd option where available, else the 1st).
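For what it's worth, the third option could look roughly like the sketch below. This is only an illustration: SetterFactory is a made-up helper name, and the CustomerEntity shape is assumed from the question. It builds a delegate equivalent to (target, value) => target.Property = (TProp)value once per property, so the per-row cost is a delegate call rather than reflection.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Assumed shape of the entity from the question.
public class CustomerEntity
{
    public int CustomerId { get; set; }
    public string CustomerName { get; set; }
}

// Hypothetical helper; not part of any library.
public static class SetterFactory
{
    // Compiles a setter delegate for one property.
    // Assumes the property has a public setter.
    public static Action<T, object> Create<T>(PropertyInfo property)
    {
        ParameterExpression target = Expression.Parameter(typeof(T), "target");
        ParameterExpression value = Expression.Parameter(typeof(object), "value");

        // target.set_Property((TProp)value)
        MethodCallExpression call = Expression.Call(
            target,
            property.GetSetMethod(),
            Expression.Convert(value, property.PropertyType));

        return Expression.Lambda<Action<T, object>>(call, target, value).Compile();
    }
}
```

You would create the delegates once (outside the while (dr.Read()) loop), keep them in an array, and call setters[index](entity, dr.GetValue(index)) per column. Note the cast is strict: a DBNull value, or a boxed value whose type differs from the property type, will throw, so real code needs to handle those cases.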

It is very difficult to say what the performance will be, but consider these things:
- Generics provide type safety.
- If you're going to display 10,000 records on the page, your application will probably be unusable. If the records are being paged, consider returning only those records that are actually needed for the page you are on.
- You shouldn't need to loop through the data twice. What are you doing with the data?
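As a rough sketch of reading the data in a single pass, assuming the two properties mentioned in the question and column names that match them (CustomerMapper is a made-up name), you can map each row explicitly instead of looping over properties:

```csharp
using System.Collections.Generic;
using System.Data;

// Assumed shape of the entity from the question.
public class CustomerEntity
{
    public int CustomerId { get; set; }
    public string CustomerName { get; set; }
}

// Hypothetical helper; column names are assumptions.
public static class CustomerMapper
{
    public static List<CustomerEntity> ReadAll(IDataReader reader)
    {
        var customers = new List<CustomerEntity>();

        // Resolve column positions once, not once per row.
        int idOrdinal = reader.GetOrdinal("CustomerId");
        int nameOrdinal = reader.GetOrdinal("CustomerName");

        while (reader.Read())
        {
            customers.Add(new CustomerEntity
            {
                CustomerId = reader.GetInt32(idOrdinal),
                // Database NULL becomes a null string.
                CustomerName = reader.IsDBNull(nameOrdinal) ? null : reader.GetString(nameOrdinal)
            });
        }

        return customers;
    }
}
```

This is more typing than a reflection loop when there are many properties, but there is only one loop over the rows and no reflection at all.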

Related

Mass Update a property on multiple records inside a dictionary (VB.NET / C#)

I have a Dictionary (of Long, Class), where Class has multiple properties (assume we have a property called Updated as Boolean).
I want to update this (Updated) property to (True) at once for let's say all Odd key records (or based on any specific rule). What is the best way to do so?
My thoughts are to use Linq to fetch those records then (for each) them, but is there any better way to do so like doing a mass update where a condition happens (like what we do in the database)?
An example of my approach is below. Appreciate it if there is a better way to do such an update...
Thanks
Dim ReturnedObjs = From Obj In Dictionary Where Obj.Key Mod 2 = 1
For Each item As KeyValuePair(Of Long, Class) In ReturnedObjs
    item.Value.Updated = True
Next

First, this sounds like an obvious case for the speed rant:
https://ericlippert.com/2012/12/17/performance-rant/
Second:
The best way is to keep this in the database. You are not going to beat the speed of a DB query with indexes designed for quick matching by transferring the data over the network twice (once to get it, once to return it) and doubling the search load (once to get all the odd ones, once to update all the ones you just changed). My standing advice is to always keep as much work as possible on the DB side. Your client code will never be able to beat it.
Third:
If you do need to use client side processing:
A lot of my answer depends on details of the implementation: how the JIT and general compiler optimisations work, etc.
foreach works on enumerators, not collections. But if you feed a collection to foreach, an enumerator is created implicitly. Enumerators have two relevant properties:
- If the collection changes, the enumerator becomes invalid. Most people learn about enumerators because they ran into this issue.
- It adds an extra function call and a set of checks to every access of the collection, so it will be a slowdown. How much is hard to say, as the optimisations and the JIT are pretty good.
So you probably want to use a for loop instead.
If you could turn the Dictionary into a collection where the primary key is used as the index, it might be a bit faster. But that has the danger of running into a lot of "dry spells" (gaps) in the data, so it depends a lot on your source data.
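For reference, here is a minimal C# sketch of the approach from the question, assuming the dictionary values are instances of a class (Record and MassUpdate are made-up names). Setting a property on a value does not change the dictionary itself, so the enumerator stays valid during the loop; whether the loop is fast enough is something to measure rather than guess.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical value type standing in for "Class" in the question.
public class Record
{
    public bool Updated { get; set; }
}

public static class MassUpdate
{
    // Marks every record with an odd key as updated; returns how many were touched.
    public static int MarkOddKeysUpdated(Dictionary<long, Record> records)
    {
        int count = 0;

        // "% 2 != 0" also catches negative odd keys, where "% 2" yields -1.
        foreach (KeyValuePair<long, Record> pair in records.Where(p => p.Key % 2 != 0))
        {
            // Mutates the referenced object, not the dictionary structure.
            pair.Value.Updated = true;
            count++;
        }

        return count;
    }
}
```

This only works as written because Record is a reference type; with a struct value you would have to write the modified copy back into the dictionary, which does count as changing the collection.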

Is it safe to write dBCommand.AddParameter even though I'm not going to use it in a query?

Scenario: my query variable is dynamic; there are 4 possible values for it depending on the report type (_reportType). That means there are 4 different queries, and some of them don't have #STAFF in the WHERE condition. So my question is: is it safe to just leave my
dBCommand.AddParameter("#STAFF", staff)
there, or should I include an if/else condition just to be safe?
Like this:
if (_reportType == 1)
{
    dBCommand.AddParameter("#STAFF", staff);
}
else if (_reportType == 2)
{
    //code
}
else if (_reportType == 3)
{
    //code
}
else
{
    //Don't add dBCommand.AddParameter("#STAFF", staff);
}
Is it safe to just leave AddParameter("#STAFF", staff) in place even though I'm not going to use it in the query?
For example, I'm going to write
dBCommand.Initialize(string.Format(query, "RetailTable"), batch);
dBCommand.AddParameter("#STAFF", staff);
But the query value doesn't have #STAFF in the WHERE condition

It should generally be ok to specify unused parameters, aside from the minor overhead of sending the value to the server. The exception is if you execute DDL queries that have a restriction of being the only statement in the batch (e.g. CREATE VIEW). Those would fail due to the parameter.

There are 2 glaring bad practices in your approach:
1. Generating dynamic query within the code.
This approach has many drawbacks and possible security loopholes. You should almost always avoid doing that.
Please go through the following links to understand this more:
https://codingsight.com/dynamic-sql-vs-stored-procedure/
https://www.ecanarys.com/Blogs/ArticleID/112/SQL-injection-attack-and-prevention-using-stored-procedure
2. Trying to use a generic WHERE clause that fits all your variations.
This approach is a disaster in waiting, regardless of whether the query is written in your application code or in a stored procedure.
This is an ugly code smell and a maintenance nightmare.
No developer can ever be 100% sure that no change will be required during the lifespan of the application, for the simple reason that the client WILL need enhancements on a regular basis.
So, even if this approach works for you for a short period of time, it will backfire.
Assume that, over time, a few more filter parameters are added due to new requirements. Now imagine how your code would look and the problems it could cause if they are not handled properly, especially when YOU are not the one making those changes. Scary, right?
Always write code that is not only easy to read and understand, but also easy to enhance and maintain, regardless of who is writing the code.
So, IMHO, you should add those if-else conditions or use switch-case blocks to safeguard yourself and your client. It may look like overkill at the start, but it will surely pay off in the future.
Hope this helps!
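To illustrate the switch-case idea, here is a small sketch. ReportParameters is a made-up helper, and the assumption that only report type 1 filters on staff is taken from the question's if/else example, not from any real schema. It keeps the "which report needs which parameter" decision in one place:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the mapping of report types to parameters is an assumption.
public static class ReportParameters
{
    public static Dictionary<string, object> Build(int reportType, string staff)
    {
        var parameters = new Dictionary<string, object>();

        switch (reportType)
        {
            case 1:
                // Only this report's query references #STAFF.
                parameters.Add("#STAFF", staff);
                break;
            case 2:
            case 3:
            case 4:
                // These queries take no staff filter.
                break;
            default:
                // Fail loudly when a new report type is added but not handled here.
                throw new ArgumentOutOfRangeException(nameof(reportType));
        }

        return parameters;
    }
}
```

The calling code can then loop over the returned pairs and pass each one to dBCommand.AddParameter, so a new filter only has to be added in this one method.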

new objects added during long loop

We currently have a production application that runs as a windows service. Many times this application will end up in a loop that can take several hours to complete. We are using Entity Framework for .net 4.0 for our data access.
I'm looking for confirmation that if we load new data into the system, after this loop is initialized, it will not result in items being added to the loop itself. When the loop is initialized we are looking for data "as of" that moment. Although I'm relatively certain that this will work exactly like using ADO and doing a loop on the data (the loop only cycles through data that was present at the time of initialization), I am looking for confirmation for co-workers.
Thanks in advance for your help.
//update : here's some sample code in c# - question is the same, will the enumeration change if new items are added to the table that EF is querying?
IEnumerable<myobject> myobjects = (from o in db.theobjects where o.id == myID select o);
foreach (myobject obj in myobjects)
{
    //perform action on obj here
}

It depends on your precise implementation.
Once a query has been executed against the database, the results of the query will not change (assuming you aren't using lazy loading). To ensure this you can dispose of the context after retrieving the query results--this effectively "cuts the cord" between the retrieved data and the database.
Lazy loading can result in a mix of "initial" and "new" data; however once the data has been retrieved it will become a fixed snapshot and not susceptible to updates.
You mention this is a long running process; which implies that there may be a very large amount of data involved. If you aren't able to fully retrieve all data to be processed (due to memory limitations, or other bottlenecks) then you likely can't ensure that you are working against the original data. The results are not fixed until a query is executed, and any updates prior to query execution will appear in results.

I think your best bet is to change the logic of your application so that, when the "loop" logic is determining whether it should do another iteration or exit, you take the opportunity to load the newly added items into the list. See the pseudocode below:
var repo = new Repository();
while (repo.HasMoreItemsToProcess())
{
    var entity = repo.GetNextItem();
}
Let me know if this makes sense.
Let me know if this makes sense.

The easiest way to ensure that this happens - if the data itself isn't too big - is to convert the data you retrieve from the database to a List<>, e.g., something like this (pulled at random from my current project):
var sessionIds = room.Sessions.Select(s => s.SessionId).ToList();
And then iterate through the list, not through the IEnumerable<> that would otherwise be returned. Converting it to a list triggers the enumeration, and then throws all the results into memory.
If there's too much data to fit into memory, and you need to stick with an IEnumerable<>, then the answer to your question depends on various database and connection settings.
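The difference between a deferred query and a materialized list can be seen with plain LINQ to Objects. This is only an in-memory analogy (SnapshotDemo is a made-up name, and a List<int> stands in for the table); with EF the query is sent to the database when enumeration starts, but the idea that ToList() pins down the results is the same:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    // Returns { count seen by the deferred query, count seen by the snapshot }.
    public static int[] Run()
    {
        var source = new List<int> { 1, 2, 3 };

        // Deferred: nothing is evaluated yet.
        IEnumerable<int> lazy = source.Where(n => n > 0);

        // Materialized: the query runs now and the results are copied.
        List<int> snapshot = lazy.ToList();

        // "New data" arrives after the snapshot was taken.
        source.Add(4);

        // The deferred query re-runs against the current source; the snapshot does not.
        return new[] { lazy.Count(), snapshot.Count };
    }
}
```

Here the deferred query sees four items while the snapshot still holds three, which is why iterating over the list is the safer choice when the set must not change mid-loop.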

I'd take a snapshot of the IDs to be processed -- quickly and as a transaction -- then work through that list in the fashion you're doing today.
In addition to accomplishing the goal of not changing the sample mid-stream, this also gives you the ability to extend your solution to track status on each item as it's processed. For a long-running process, this can be very helpful for progress reporting, restart / retry capabilities, etc.

Is it better/quicker to return a count than to return an object?

I'm creating a repository in EF4. For one of the methods a password and username are used to verify a user. The method returns a count of users, so a 0 means they don't exist and a 1 means they do. Would it make much of a difference if I just returned a user object and checked it for null?

Technically the most efficient way would probably be to use the Any() extension method. If you return an object there is the cost of filling that object. If you return a count, then there is the cost of going through every record (after the where clause has been applied) and counting them. Any() should use Exists in sql, and therefore, SQL server can stop as soon as it finds the first record.
Ultimately though, I agree with the others, this isn't a place you want to start optimizing right away. Donald Knuth probably has the best quote about this:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil".
For instance, let's say you have this method return a bool and you use the Any() method. Later in the request, you might need to pull the user object out of the database (this could be something you end up doing a lot). Now, by optimizing early, you've actually increased the number of calls to the database.
HTH
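A minimal sketch of the two shapes discussed above, assuming a repository over an IQueryable<User> (User, UserRepository and the property names are made up for illustration; against EF the IQueryable would come from the context, and the SQL each call produces depends on the provider):

```csharp
using System.Linq;

// Hypothetical entity; property names are assumptions.
public class User
{
    public string UserName { get; set; }
    public string PasswordHash { get; set; }
}

public class UserRepository
{
    private readonly IQueryable<User> _users;

    public UserRepository(IQueryable<User> users)
    {
        _users = users;
    }

    // Existence check only: no entity is materialized.
    public bool Exists(string userName, string passwordHash)
    {
        return _users.Any(u => u.UserName == userName && u.PasswordHash == passwordHash);
    }

    // Returns the user, or null when there is no match.
    public User Find(string userName, string passwordHash)
    {
        return _users.FirstOrDefault(u => u.UserName == userName && u.PasswordHash == passwordHash);
    }
}
```

If the caller needs the user's details right after the check, Find saves a second round trip; if it only needs a yes/no answer, Exists avoids loading the object.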

Well, the option with Any is going to be better, because EF has a high cost of materialization and change tracking for an object, and if that object happens to have a lot of properties, you should definitely consider using Any.

The second version would be better - in terms of design. In terms of microefficiency it shouldn't matter

Hoakey if you want to know the truth. The difference between the two methods is going to be so negligible that it won't matter which one you choose. Choose whichever method makes your code easier to understand and read. Oftentimes people get worried about performance in all the wrong places.
I agree with Armen, return the object and check for null. Very simple and is easy to understand what is going on.

If you don't need any data from the 'user' table after you verify that a valid user/password combo exists, then either method will work (and performance won't matter).
On the other hand, if you plan on making a second call to get the user details once you have verified a valid username/password, then clearly returning the object in the first place (and checking for null to verify existence) is the more efficient strategy, in my opinion.

Cache lookup performance

We have a big winforms C# application, that's basically a frontend for some databases (CRUD stuff) and I'm trying to implement some in memory cache for business objects.
Something like:
List<Customer> customerCache; // Loaded during app. startup
I've already created some code to keep the cache up to date with the database. This code runs on a separate thread all the time and is working really well.
My problem is that, depending on the size of the cache, it's faster to do a 'select * from customers where id = x' in the database than to loop through the cache with a foreach (foreach Customer cmr in customerCache) to find that specific object...
Is there a way to search for specific objects in my cache really fast? I was going to try some algorithm or change the type of my collection, but I would appreciate hearing your suggestions.
Please note that we have several 'List xxxCache' and everything is fast (for small N, of course). But when the number of cached items grows (> 3000 normally) it's faster to read from the database.
What's the best way to loop through my cached items to find a specific one? All business items inherit from a common ancestor and have an 'ID' property (integer, unique).
Sorry for my bad english, it's not my primary language.
Best regards,
Greetings from Brazil.

Use Dictionary<int, Customer> instead. It supports O(1) lookup based on a key. In this case, key would be Customer.Id.
You might also want to look into other pre-built database caching solutions for .Net.
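A minimal sketch of such a cache, assuming a Customer class with an integer Id as described in the question (CustomerCache and its method names are made up):

```csharp
using System.Collections.Generic;

// Assumed shape of the business object from the question.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical cache wrapper keyed by Customer.Id.
public class CustomerCache
{
    private readonly Dictionary<int, Customer> _byId = new Dictionary<int, Customer>();

    // Inserts a new customer or overwrites the cached copy with the same Id.
    public void AddOrReplace(Customer customer)
    {
        _byId[customer.Id] = customer;
    }

    // Hash lookup; no loop over the cached items.
    public bool TryGet(int id, out Customer customer)
    {
        return _byId.TryGetValue(id, out customer);
    }
}
```

Since the question mentions a background thread that keeps the cache in sync, note that Dictionary<TKey, TValue> is not safe for concurrent writes; you would need a lock around access, or ConcurrentDictionary on .NET 4 and later.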

Instead of using a List<T> object, why not use a:
KeyValuePair
Dictionary is the correct object to use (KeyValuePair is what a dictionary holds a collection of **facepalm**)

Use as many dictionaries as the number of indexes you need.
Dictionary<int, Customer> customersById;       // keyed by ID
Dictionary<string, Customer> customersByName;  // keyed by name
// or, if the name is not unique:
Dictionary<string, List<Customer>> customersByName;
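As a sketch of how those indexes could be built from an existing list with LINQ (CustomerIndexes is a made-up name and the Customer shape is assumed from the question), ToDictionary covers the unique key and ToLookup covers the non-unique one:

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the business object from the question.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical helper for building one index per search key.
public static class CustomerIndexes
{
    // Unique key: throws if two customers share an Id.
    public static Dictionary<int, Customer> ById(IEnumerable<Customer> customers)
    {
        return customers.ToDictionary(c => c.Id);
    }

    // Non-unique key: each name maps to all customers with that name.
    public static ILookup<string, Customer> ByName(IEnumerable<Customer> customers)
    {
        return customers.ToLookup(c => c.Name);
    }
}
```

An ILookup is read-only once built and returns an empty sequence for a missing key, so if the cache must be updated in place, the Dictionary<string, List<Customer>> form is the one to keep.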

We have a similar case in a web forms application.
We use the MS Enterprise Library Caching block.
It is easy to implement and use.
The only thing you need to focus on is the cache key (string type):
cache.Add(key, value)
cache.GetData(key)
