How do I process a large amount of data in ASP.NET - C#?

I have a web project in ASP.NET/C#/MySQL where there can be up to some 30,000 rows of data to process.
It's a reporting tool, and I have to show statistics like counts and sums at several levels.
I want to know which would be the better way to go about this.
I can filter my data down to a limited set of columns through which I can query.
Now, is it a good approach to load the data (all rows) into my application on load, and whenever the user queries, filter that data, do my calculations in code, and show my statistics?
Or should I have a stored procedure do all my calculations, and every time the user queries, call the stored procedure and get my statistics?
Thanks

Databases are optimized for this kind of data manipulation, and since you reduce network load as well, I would vote for the second option.
Another possibility is to consider some kind of OLAP solution where transactional data is already consolidated into smaller chunks of data which in turn can be easily queried.

I would definitely go with the second option. You are talking about a web application: if you want to get all the data at load time, then you must store it somewhere to preserve it across postbacks. If you choose to store it in session state, you will end up consuming web server memory, since you have more than one user accessing your site.
If you store it in view state, then you will end up with a very big client response, which will make your page very slow to load on the client side and will increase network traffic.
Option 2 is the best, because stored procedures are precompiled, which means they are much better in terms of performance. You will also reduce network traffic.
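For what it's worth, here is a minimal sketch of the stored-procedure route from C#. The procedure name, its parameter, and the result shape are assumptions rather than your actual schema, and it assumes the MySql.Data ADO.NET provider:

using System.Data;
using MySql.Data.MySqlClient;

// Minimal sketch: call a (hypothetical) stored procedure that returns one
// pre-aggregated row per level instead of shipping 30,000 raw rows to the app.
public static DataTable GetReportStatistics(string connectionString, string userFilter)
{
    var result = new DataTable();
    using (var conn = new MySqlConnection(connectionString))
    using (var cmd = new MySqlCommand("sp_report_statistics", conn))   // hypothetical procedure name
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@p_filter", userFilter);          // hypothetical filter parameter
        using (var adapter = new MySqlDataAdapter(cmd))
        {
            adapter.Fill(result);   // result now holds only the aggregated counts/sums
        }
    }
    return result;
}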

Related

How to use a local application database

I have an application in which such a large amount of data is loaded at the beginning that the waiting time for the users is no longer justifiable.
Initially, only the data needed to fill a listbox explorer is loaded; this listbox serves as a browser for loading the remaining information when an item is selected. So much for the data model.
I now intend to maintain a local data source and only update the data that the user selects, but I have to deal with the question of whether I should keep the finished model objects or the raw data.
Has anyone played around with the different approaches and can say (or link to) which is the best approach in terms of maintenance and performance? I work with .NET.

Is it a good practice to put big Lists&lt;T&gt; in ASP.NET MVC sessions?

Hello guys, I'm really confused about which solution to use for my project.
Well, I have a big List retrieved from my database (more than 1,000 results from a query with large clauses, searching across more than 3 tables with more than 3,000,000 rows), and I don't want to run this query twice without changes, because more than 300 users can run this big query at the same time. So I decided to use session to hold each user's query results, but I really don't know if it's a good practice.
My teammate told me it's better to run the big query on every user post, because it's not a good practice to put big Lists inside sessions: a lot of users holding large Lists in session will cost our server more than running this query many times.
So, is it a good practice to put big Lists in ASP.NET MVC sessions?
[EDIT]
Every user can have different results; they're not the same for all users.
[EDIT 2]
I need to show all the results of the query at the same time, so I can't paginate it.
Firstly, Bryan Crosby's remark is a very good one. Also, is the user really going to need to view 1,000 items at a time?
Have you considered paging your data?
If, however, you decide that you must have that huge result set, then how about this:
If I understand you correctly, this query is identical for all 300 users.
If that's the case, the proper place for this result set is not Session but the application's Cache.
This is because Session is per-user, while Cache is per-application (meaning shared between all users).
So if you store your items in the Cache, once the first user has retrieved them from storage, they'll be available to all subsequent users.
A few things to keep in mind, though:
1. Since the Cache is common to all users, you must synchronize your access to it.
2. You need to set an expiry period (a cache item has this option natively), so that those thousands of items won't live in your application's memory forever.
3. If those items can change, you need to invalidate the cache when they do, so the user doesn't see stale data.
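A minimal sketch of points 1 and 2 above (synchronized loading plus an absolute expiry) using ASP.NET's application cache; the ten-minute window and key naming are arbitrary choices, not requirements:

using System;
using System.Collections.Generic;
using System.Web;
using System.Web.Caching;

public static class SharedResultCache
{
    private static readonly object CacheLock = new object();

    // Keep a shared result set in the application-wide Cache with an absolute
    // expiry, synchronizing the initial load so only one request hits the database.
    public static List<T> GetShared<T>(string cacheKey, Func<List<T>> loadFromDatabase)
    {
        var items = HttpRuntime.Cache[cacheKey] as List<T>;
        if (items == null)
        {
            lock (CacheLock)
            {
                // Re-check inside the lock: another request may have loaded it already.
                items = HttpRuntime.Cache[cacheKey] as List<T>;
                if (items == null)
                {
                    items = loadFromDatabase();
                    HttpRuntime.Cache.Insert(
                        cacheKey,
                        items,
                        null,                            // no cache dependency
                        DateTime.UtcNow.AddMinutes(10),  // absolute expiry: tune to how stale the data may be
                        Cache.NoSlidingExpiration);
                }
            }
        }
        return items;
    }
}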
Good luck.
Generally, no, it is not a good practice to store large sets of data in the session...
The problem with keeping "large" sets of information per user is the cost of holding and moving that data: in-process session state sits in web server memory for every active user, and if the data ends up in view state it is re-transmitted to the client on every form post-back (view state is stored in a hidden field on the form and can balloon in size once encoded as web-safe text), slowing down the user experience. Generally, you should avoid putting large amounts of data in session or view state if you can.
In cases where the user has to view "large" sets of information, it's possible to create session-like stores or caches to keep the info in server memory and then just expose a session key in the session state; tie the server-cached item for that session to the session key and you're set to serve it again whenever it's needed.
Something like (pseudocode)
Dictionary<Guid, DataSet> serverCache = new Dictionary<Guid, DataSet>();
Application.Add("DataCache", serverCache);                     // application-wide store, shared across sessions
// Add the user's session key and their locally cached data here
serverCache.Add(GetUserSessionGuid(), LoadData());             // GetUserSessionGuid/LoadData are placeholders
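Retrieval on a later request is then roughly the reverse (still pseudocode; GetUserSessionGuid and LoadData are the same placeholders as above):

// Look the user's data up by their session GUID; reload it if it was never cached or has been evicted.
var serverCache = (Dictionary<Guid, DataSet>)Application["DataCache"];
DataSet userData;
if (!serverCache.TryGetValue(GetUserSessionGuid(), out userData))
{
    userData = LoadData();
    serverCache[GetUserSessionGuid()] = userData;   // a real app should synchronize access to this dictionary
}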
Also +1 to the post about paging this data - now that you have it in the server cache - you can handle paging easily.
All that said, keeping this data in a cache for some fixed time might eat up your server memory pretty quickly (usually "cheaply" solved with more memory... but still).
A good database and front-end pairing should be optimized to handle the traffic load for the data. As suggested, do some metrics to find out if it's even an issue. I would suggest designing the database queries to allow for paging of data, so each view on the form by each user is further limited.
Ten queries per user, one page at a time, each returning 100 rows, for 1,000 users (100,000 rows at a time, with one query per user per second) is much more tolerable for the database than one query per user, all at once, returning all 1,000 rows, for 1,000 users (1,000,000 rows at a time).
I wouldn't put anything that big into session if there's any way you can avoid it. If I did have to store it in session, I wouldn't store the List object. Convert the List to an array, and store the array if you must store it in session.
Session["mylist"] = list.ToArray();
Reality check: you have toy data.
1,000 results are nothing, and tables with 3 million rows are nothing. They weren't significant even 10 years ago; today my mobile phone handles that without breaking a sweat.
Simple as that.
THAT SAID: it also goes the other way. 1,000 items are a joke memory-wise (unless they are images), so they MAY be stored in session. Unless you run a ton of users, it may be worth just storing them in memory; there is a trade-off, but for most intranet-type applications, for example, this is doable.
My main problem with that is that one session state is possibly shared across multiple browser windows (tabs), and the number of times I have been annoyed by a programmer storing something in the session that broke the site for me when using 2-3 tabs at the same time is higher than zero. I would be careful with that, for example with someone using two tabs for different searches to compare the lists.

Pagination and Data buffering in Windows Application using C# 2005

Requirement
.NET Windows application using C# interacts with Oracle DB for retrieving and saving data
Issue
With a huge volume of data, performance is slow and memory usage is high, because the application displays the entire data set on screen. Response time is high due to the database call and client-side data processing.
Proposed Solution
Using pagination (from the Oracle DB) to display partial data on screen, the application's response time will be faster; however, it will make a DB call for each page.
We are looking at a solution that gets the first page of data from the DB and starts the application, after which a background job fetches the rest of the data from the DB into a local XML store. Then, for the next page, the data will be loaded from the XML instead of making a DB call.
Is this design possible?
Is synchronization possible between local XML DB and the Oracle DB?
Personally, I am not sure you really want to go that far, as synchronization and overall disk I/O could be very "interesting" at best.
Typically, what I have found to work well in the past, if you REALLY must have "pre-fetched" records for more of the result set, is to cache, say, the next 2 and previous 2 pages in memory. That way the user's transition is smooth, and after you navigate to a page, a background thread goes out and pre-fetches the next page so that you already have it.
Otherwise, if you do what you are talking about, you are only deferring the performance impacts and introducing data synchronization and other issues.
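A rough sketch of that "current page plus neighbours" cache is below. The page window of ±2, the DataTable page shape, and FetchPageFromDatabase are all assumptions; the real implementation would run the paginated Oracle query there:

using System;
using System.Collections.Generic;
using System.Data;
using System.Threading;

public class PageCache
{
    private readonly Dictionary<int, DataTable> _pages = new Dictionary<int, DataTable>();
    private readonly object _lock = new object();

    public DataTable GetPage(int pageIndex)
    {
        DataTable current = GetOrFetch(pageIndex);

        // Pre-fetch the two pages on either side on a worker thread so the next
        // navigation is served from memory rather than a fresh database round trip.
        ThreadPool.QueueUserWorkItem(delegate
        {
            for (int offset = -2; offset <= 2; offset++)
            {
                int neighbour = pageIndex + offset;
                if (neighbour >= 0)
                    GetOrFetch(neighbour);
            }
            // A real implementation would also evict pages far from the current
            // index to keep memory bounded.
        });

        return current;
    }

    private DataTable GetOrFetch(int pageIndex)
    {
        // For brevity the fetch happens inside the lock; a real implementation
        // would fetch outside it so the UI thread never waits on a prefetch.
        lock (_lock)
        {
            DataTable page;
            if (!_pages.TryGetValue(pageIndex, out page))
            {
                page = FetchPageFromDatabase(pageIndex);   // placeholder for the paginated Oracle query
                _pages[pageIndex] = page;
            }
            return page;
        }
    }

    private DataTable FetchPageFromDatabase(int pageIndex)
    {
        throw new NotImplementedException("Run the paginated query (e.g. ROW_NUMBER-based) here.");
    }
}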

Reducing roundtrips to the server/database

When writing ASP.NET pages, what signs do you look for that your page is making too many roundtrips to a database or server?
(This is a general question but I say ASP.NET as the majority of my coding is on the web side of things).
How much is too much? The €1M question! Profile. Then profile. If your app is spending most of its time doing data access, you have a problem (and should look at a sql trace). If it is spending most of its time drawing the UI, then (assuming your view isn't doing data access) you should probably look elsewhere first...
Round trips are more relevant to latency than the total quantity of data being moved, so it really does make sense to optimize for them. The usual way is to use stored procedures that do multiple steps, perhaps even returning multiple result sets.
What I do is look at the ASP.NET performance counters and the SQL performance counters. To get an accurate measurement, you must ensure that there is no random noise activity on the SQL Server (i.e., no import batches unrelated to the web site running).
The relevant counters I look at are:
SQL Statistics/Batch Requests/sec: This indicates exactly how many Transact-SQL batches the server receives. It can, in most cases, be equated 1:1 with the number of round trips from the web site to SQL.
Databases/Transactions/sec: This counter is instanced per database, so I can quickly see in which database there is "activity". This way I can correlate the web site's data round trips (i.e., my app logic requests, which go to the app database) with the ASP.NET session state and user stuff (which goes to the ASP session DB or tempdb).
Databases/Write Transactions/sec: This I correlate with the counter above (Transactions/sec) so I can get a feel for the read-to-write ratio the site is doing.
ASP.NET Applications/Requests/sec: With this counter I can get the number of requests/sec the site is seeing. Correlated with the number of SQL Batch Requests/sec it gives a good indication of the average number of round-trips per request.
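For reference, the same counters can be sampled from code with System.Diagnostics. The category names below assume a default SQL Server instance (a named instance uses MSSQL$InstanceName categories), and the ASP.NET category may carry a version suffix such as "ASP.NET Apps v4.0.30319" on your machine:

using System;
using System.Diagnostics;
using System.Threading;

class CounterSnapshot
{
    static void Main()
    {
        using (var batches = new PerformanceCounter("SQLServer:SQL Statistics", "Batch Requests/sec"))
        using (var requests = new PerformanceCounter("ASP.NET Applications", "Requests/Sec", "__Total__"))
        {
            // Rate counters need two samples; the first NextValue() always returns 0.
            batches.NextValue();
            requests.NextValue();
            Thread.Sleep(1000);

            float batchesPerSec = batches.NextValue();
            float requestsPerSec = requests.NextValue();
            Console.WriteLine("SQL batches/sec: {0:F1}, ASP.NET requests/sec: {1:F1}, batches per request: {2:F1}",
                batchesPerSec, requestsPerSec, requestsPerSec > 0 ? batchesPerSec / requestsPerSec : 0);
        }
    }
}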
The next thing to measure is usually trying to get a feel for where the time is spent in the request. On my own projects I make abundant use of performance counters that I publish myself, so it's really easy to measure. But I'm not always so lucky as to be cleaning up only my own mess... Profiling is usually not an option for me, because most of the time I troubleshoot live production systems that I cannot instrument.
My approach is to try to sort out the SQL side of things first, since it's easy to find the relevant statistics for execution times in SQL: SQL Server keeps a nice aggregated statistic ready to look at in sys.dm_exec_query_stats, and I can also use Profiler to measure execution duration in real time. With some analysis of the numbers collected, and knowing the normal request pattern of the most visited pages, you can give a pretty good estimate of the total time spent in SQL per web request. If this time adds up to nearly all the time it takes to serve the page, then you have your answer.
And to answer the original question title: to reduce the number of round trips, you make fewer requests. Seriously. First, cache what is appropriate to cache; I guess that is obvious. Second, reduce the complexity: don't display unnecessary data on each page, cache and display stale data when you can get away with it, and hide details on secondary navigation panels.
If you feel that the problem is the number of round-trips per se as opposed to the number of requests (ie. you would benefit tremendously from batching multiple requests in one round-trip), then you should somehow measure that the round-trip overhead is what's killing you. With connection pooling on a normal network connection this is usually not the most important factor.
And finally you should look if everything that can be done in sets is done in sets. If you have some half-brained ORM that retrieves objects one at a time from an ID keyset, get rid of it.
I know that this may sound repetitive, but client-server round trips depend on how much program logic is located on each side of the connection.
The first thing to check is validation: you always have to validate and sanitize your input on the server side, but that does not mean you cannot also do it on the client side, reducing round trips that are used only to check input.
Second: what can you do on the client side to reduce server-side load? There are calculations you can check or perform on the client. There is also AJAX, which can be used to reload only the part of the page that changes.
Third: can you delegate work to another server? If your server is overloaded, why not use web services or simply delegate some of the logic to another server?
As Mark wrote: how much is too much? It is up to you and your budget.
When writing ASP.NET pages, what signs do you look for that your page is making too many roundtrips to a database or server?
Of course it all depends, and you have to profile. However, here are some indicators; they do not necessarily mean there is a problem, but they often will point to one:
Page is taking a very long time to render locally.
Read this question: Slow response-time cheat sheet; in particular, this link.
To render the page you need more than 30 round trips. I pulled that number out of my hat, but assuming a round trip is taking about 3.5ms then 30 round trips will kick you over the 100ms guideline (before any other kind of processing).
All the queries involved in rendering the page are heavily optimized and take no longer than a millisecond or two to execute, and there are no CPU-heavy operations that run every time you render the page, yet the page is still slow.
Data access is abstracted away and not cached in any way. If, for example, GetCustomer calls the DAL, which in turn issues a query, and your page asks for 100 Customer objects that are not retrieved in a batch, you are probably in trouble.
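To make that last point concrete, here is a sketch of the batched alternative: one parameterized IN query instead of 100 GetCustomer round trips. The table and column names are made up for illustration:

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

// One round trip for the whole set instead of one query per customer ID.
public static List<string> GetCustomerNames(string connectionString, IList<int> customerIds)
{
    var names = new List<string>();
    if (customerIds.Count == 0) return names;

    // Build "@id0, @id1, ..." and bind each value as a parameter.
    string placeholders = string.Join(", ", customerIds.Select((id, i) => "@id" + i));
    string sql = "SELECT Name FROM Customers WHERE Id IN (" + placeholders + ")";   // made-up table/column

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        for (int i = 0; i < customerIds.Count; i++)
            cmd.Parameters.AddWithValue("@id" + i, customerIds[i]);

        conn.Open();
        using (var reader = cmd.ExecuteReader())
            while (reader.Read())
                names.Add(reader.GetString(0));
    }
    return names;
}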

Returning Large Results Via a Webservice

I'm working on a web service at the moment, and there is the potential that the returned results could be quite large (> 5 MB).
It's perfectly valid for this set of data to be this large and the web service can be called either sync or async, but I'm wondering what people's thoughts are on the following:
If the connection is lost, the entire result set will have to be regenerated and sent again. Is there any way I can do any sort of "resume" if the connection is lost or reset?
Is sending a result set this large even appropriate? Would it be better to implement some sort of "paging" where the resultset is generated and stored on the server and the client can then download chunks of the resultset in smaller amounts and re-assemble the set at their end?
I have seen all three approaches, paged, store and retrieve, and massive push.
I think the solution to your problem depends to some extent on why your result set is so large and how it is generated. Do your results grow over time, are they calculated all at once and then pushed, do you want to stream them back as soon as you have them?
Paging Approach
In my experience, using a paging approach is appropriate when the client needs quick access to reasonably sized chunks of the result set similar to pages in search results. Considerations here are overall chattiness of your protocol, caching of the entire result set between client page requests, and/or the processing time it takes to generate a page of results.
Store and retrieve
Store and retrieve is useful when the results are not random access and the result set grows in size as the query is processed. Issues to consider here are complexity for clients and if you can provide the user with partial results or if you need to calculate all results before returning anything to the client (think sorting of results from distributed search engines).
Massive Push
The massive push approach is almost certainly flawed. Even if the client needs all of the information and it needs to be pushed in a monolithic result set, I would recommend taking the approach of WS-ReliableMessaging (either directly or through your own simplified version) and chunking your results. By doing this you
ensure that the pieces reach the client
can discard the chunk as soon as you get a receipt from the client
can reduce the possible issues with memory consumption from having to retain 5MB of XML, DOM, or whatever in memory (assuming that you aren't processing the results in a streaming manner) on the server and client sides.
Like others have said though, don't do anything until you know your result set size, how it is generated, and overall performance to be actual issues.
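A much simpler, hand-rolled cousin of that chunking idea (not WS-ReliableMessaging itself) is to expose the result as explicitly requested, acknowledged chunks. The contract below is purely hypothetical and only meant to show the shape of the exchange:

using System.ServiceModel;

// The client asks for the result in fixed-size chunks and acknowledges each one,
// so the server can discard delivered pieces and a dropped connection only costs
// the current chunk rather than the whole multi-megabyte result.
[ServiceContract]
public interface IReportService
{
    [OperationContract]
    string StartQuery(string criteria);                        // returns an identifier for the prepared result set

    [OperationContract]
    byte[] GetChunk(string queryId, int chunkIndex);           // next piece of the serialized result

    [OperationContract]
    void AcknowledgeChunk(string queryId, int chunkIndex);     // lets the server free that piece
}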
There's no hard law against 5 MB as a result set size. Over 400 MB can be hard to send.
You'll automatically get async handlers (since you're using .NET).
implement some sort of "paging" where the resultset is generated and stored on the server and the client can then download chunks of the resultset in smaller amounts and re-assemble the set at their end
That's already happening for you -- it's called tcp/ip ;-) Re-implementing that could be overkill.
Similarly --
entire resultset will have to be regenerated and sent again
If it's MS SQL, for example, that is generating most of the result set, then re-generating it will take advantage of some implicit caching in SQL Server, and subsequent generations will be quicker.
To some extent you can get away with not worrying about these problems, until they surface as 'real' problems -- because the platform(s) you're using take care of a lot of the performance bottlenecks for you.
I somewhat disagree with secretGeek's comment:
That's already happening for you -- it's called tcp/ip ;-) Re-implementing that could be overkill.
There are times when you may want to do just this, but really only from a UI perspective. If you implement some way to either stream the data to the client (via something like a pushlets mechanism), or chunk it into pages as you suggest, you can then load some really small subset on the client and then slowly build up the UI with the full amount of data.
This makes for a slicker, speedier UI (from the user's perspective), but you have to evaluate if the extra effort will be worthwhile... because I don't think it will be an insignificant amount of work.
So it sounds like you'd be interested in a solution that adds "starting record number" and "final record number" parameters to your web method (or "page number" and "results per page").
This shouldn't be too hard if the backing store is SQL Server (or even MySQL), as they have built-in support for row numbering.
With this approach you should be able to avoid doing any session management on the server, avoid any explicit caching of the result set, and just rely on the backing store's caching to keep your life simple.
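A sketch of that web method shape, assuming an ASMX-style service against SQL Server 2012+ (which supports OFFSET/FETCH; MySQL would use LIMIT/OFFSET instead). The table, column, and connection string are placeholders:

using System.Data;
using System.Data.SqlClient;
using System.Web.Services;

public class ReportService : WebService
{
    // Paged web method: the client passes a page number and page size,
    // and the database does the row numbering via OFFSET/FETCH.
    [WebMethod]
    public DataTable GetResultsPage(int pageNumber, int pageSize)
    {
        const string sql =
            "SELECT * FROM Results ORDER BY Id " +
            "OFFSET @skip ROWS FETCH NEXT @take ROWS ONLY";    // placeholder table/column names

        var table = new DataTable("ResultsPage");              // a name is required for XML serialization
        using (var conn = new SqlConnection("...connection string..."))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@skip", (pageNumber - 1) * pageSize);
            cmd.Parameters.AddWithValue("@take", pageSize);
            using (var adapter = new SqlDataAdapter(cmd))
                adapter.Fill(table);                           // Fill opens and closes the connection itself
        }
        return table;
    }
}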
