Returning Large Results Via a Webservice

Returning Large Results Via a Webservice - c#

I'm working on a web service at the moment and there is the potential that the returned results could be quite large ( > 5mb).
It's perfectly valid for this set of data to be this large and the web service can be called either sync or async, but I'm wondering what people's thoughts are on the following:
If the connection is lost, the
entire resultset will have to be
regenerated and sent again. Is there
any way I can do any sort of
"resume" if the connection is lost
or reset?
Is sending a result set this large even appropriate? Would it be better to implement some sort of "paging" where the resultset is generated and stored on the server and the client can then download chunks of the resultset in smaller amounts and re-assemble the set at their end?

I have seen all three approaches, paged, store and retrieve, and massive push.
I think the solution to your problem depends to some extent on why your result set is so large and how it is generated. Do your results grow over time, are they calculated all at once and then pushed, do you want to stream them back as soon as you have them?
Paging Approach
In my experience, using a paging approach is appropriate when the client needs quick access to reasonably sized chunks of the result set similar to pages in search results. Considerations here are overall chattiness of your protocol, caching of the entire result set between client page requests, and/or the processing time it takes to generate a page of results.
Store and retrieve
Store and retrieve is useful when the results are not random access and the result set grows in size as the query is processed. Issues to consider here are complexity for clients and if you can provide the user with partial results or if you need to calculate all results before returning anything to the client (think sorting of results from distributed search engines).
Massive Push
The massive push approach is almost certainly flawed. Even if the client needs all of the information and it needs to be pushed in a monolithic result set, I would recommend taking the approach of WS-ReliableMessaging (either directly or through your own simplified version) and chunking your results. By doing this you
ensure that the pieces reach the client
can discard the chunk as soon as you get a receipt from the client
can reduce the possible issues with memory consumption from having to retain 5MB of XML, DOM, or whatever in memory (assuming that you aren't processing the results in a streaming manner) on the server and client sides.
Like others have said though, don't do anything until you know your result set size, how it is generated, and overall performance to be actual issues.

There's no hard law against 5 Mb as a result set size. Over 400 Mb can be hard to send.
You'll automatically get async handlers (since you're using .net)
implement some sort of "paging" where
the resultset is generated and stored
on the server and the client can then
download chunks of the resultset in
smaller amounts and re-assemble the
set at their end
That's already happening for you -- it's called tcp/ip ;-) Re-implementing that could be overkill.
Similarly --
entire resultset will have to be
regenerated and sent again
If it's MS-SQL, for example that is generating most of the resultset -- then re-generating it will take advantage of some implicit cacheing in SQL Server and the subsequent generations will be quicker.
To some extent you can get away with not worrying about these problems, until they surface as 'real' problems -- because the platform(s) you're using take care of a lot of the performance bottlenecks for you.

I somewhat disagree with secretGeek's comment:
That's already happening for you -- it's called tcp/ip ;-) Re-implementing that could be overkill.
There are times when you may want to do just this, but really only from a UI perspective. If you implement some way to either stream the data to the client (via something like a pushlets mechanism), or chunk it into pages as you suggest, you can then load some really small subset on the client and then slowly build up the UI with the full amount of data.
This makes for a slicker, speedier UI (from the user's perspective), but you have to evaluate if the extra effort will be worthwhile... because I don't think it will be an insignificant amount of work.

So it sounds like you'd be interested in a solution that adds 'starting record number' and 'final record number' parameter to your web method. (or 'page number' and 'results per page')
This shouldn't be too hard if the backing store is sql server (or even mysql) as they have built in support for row numbering.
Despite this you should be able to avoid doing any session management on the server, avoid any explicit caching of the result set, and just rely on the backing store's caching to keep your life simple.

Related

How to Minimize Data Transfer Out From Azure Query in C# .NET

I have a small table(23 rows, 2 int columns), just a basic user-activity monitor. The first column represents user id. The second column holds a value that should be unique to every user, but I must alert the users if two values are the same. I'm using an Azure Sql database to hold this table, and Linq to Sql in C# to run the query.
The problem: Microsoft will bill me based on data transferred out of their data-centers. I would like have all of my users to be aware of the current state of this table at all times, second by second, and keep data-transfer under 5 GB per month. I'm thinking along the lines of a Linq-To-Sql expression such as
UserActivity.Where(x => x.Val == myVal).Count() > 1;
But this would download the table to the client, which cannot happen. Should I be implementing a Linq solution? Or would SqlDataReader download less metadata from the server? Am I taking the right approach by using a database at all? Gimme thoughts!

If it is data transfer you are worried about you need to do your processing on the server and return only the results. A SQLDataReader solution can return a smaller, already processed set of data to minimise the traffic.

A couple thoughts here:
First, I strongly encourage you to profile the SQL generated by your LINQ-to-SQL queries. There are several tools available for this, here's one at random (I have no particular preference or affiliation):
LINQ Profiler from Devart
Your prior experience with LINQ query inefficiency notwithstanding, the LINQ sample you quote in your question isn't particularly complex so I would expect you could make it or similar work efficiently, given a good feedback mechanism like the tool above or similar.
Second, you don't explicitly mention whether your query client is running in Azure or outside, but I gather from your concern about data egress costs that its running outside Azure. So the data egress costs are going to be query results using the TDS protocol (low-level protocol for SQL Server), which is pretty efficient. Some quick back-of-the-napkin math shows that you should be fine to stay below your monthly 5 GB limit:
23 users
10 hours/day
30 days/month (less if only weekdays)
3600 requests/hour/user
32 bits of raw data per response
= about 95 MB of raw response data per month
Even if you assume 10x overhead of TDS for header metadata, etc. (and if my math is right :-) ) then you've still got plenty of room underneath 5 GB. The point isn't that you should stop thinking about it and assume it's fine... but don't assume it isn't fine, either. In fact, don't assume anything. Test, and measure, and make an informed choice. I suspect you'll find a way to stay well under 5 GB without much trouble, even with LINQ.
One other thought... perhaps you could consider running your query inside Azure, and weigh the cost of that vs. the cost of data egress under the "query running outside Azure" scenario? This could (for example) take the form of a small Azure Web Job that runs the query every second and notifies the 23 users if the count goes above 1.
Azure Web Jobs
In essence, you wouldn't notify them if the condition is false, only when it's true. As for the notification mechanism, there are various cloud-friendly options:
Azure mobile push notifications
SMS messaging
SignalR notifications
The key here is to determine whether its more cost-effective and in line with any bigger-picture technology or business goals to have each user issue the query continuously, or to use some separate process in Azure to notify users asynchronously if the "trigger condition" is met.
Best of luck!

Retrieve all images at once or individually from SQL Server

I have a number of images that are stored as VARBINARY(MAX) (using FileStream) in a database. I'm looking to retrieve about 10 images or so at a time.
The prescribed, most common way using ASP.net is to use an HTTP handler and hit the database for each individual image. Seems fine, but is a bit slow at times.
Is it best to download all images for a given page at the same time in one big data chunk? Or should I try to grab each individually? Best practice?

Probably best to do them individually on a domain that doesn't have cookies set, or make sure your handler will work with multiple simultaneous requests. That way you can stream multiple results from the DB at the same time, and stream multiple images from your webserver as it gets them.

Well,
I think many people would have different opinions, and reasons about what the best practice is for them, but in reality, it all depends on hardware, software, data structure, and if the data is normalized.
In general, the SQL server likes SET operations better, meaning, the loops in general are slower.But, loops are safer for IOPs related issues, and they are better at causing less locks.
I am not sure which object mapper, or built in SQL library you are using( I have a feeling you may be using LINQ after you built a SQL class), but it also depends on the library you are using, and I would definitely recommend dapper.
I think reading them all at once would be faster, and here is why;
- If it is as you say, and you hit the database each time for the image, then that would add the delay of reconnecting to the database, so the latency will occur. But when there is one connection, the data retrieval is straight and your connection is open at that moment without requiring further session authentication.
I would recommend downloading them all at once, and informing the end user with a download screen during the process of that. Also for retrieving data, this link is very helpful I believe : https://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx
Depending on the features of your server, and edition, you could definitely use different features.

Resources for learning how to handle heavy traffic asp.net mvc site?

how much traffic is heavy traffic? what are the best resources for learning about heavy traffic web site development?.. like what are the approaches?

There are a lot of principles that apply to any web site, irrelevant of the underlying stack:
use HTTP caching facilities. For one there is the user agent cache. Second, the entire web backbone is full of proxies that can cache your requests, so use this to full advantage. A request that does even land on your server will add 0 to your load, you can't optimize better than that :)
corollary to the point above, use CDNs (Content Delivery Network, like CloudFront) for your static content. CSS, JPG, JS, static HTML and many more pages can be served from a CDN, thus saving the web server from a HTTP request.
second corollary to the first point: add expiration caching hints to your dynamic content. Even a short cache lifetime like 10 seconds will save a lot of hits that will be instead served from all the proxies sitting between the client and the server.
Minimize the number of HTTP requests. Seems basic, but is probably the best overlooked optimization available. In fact, Yahoo best practices puts this as the topmost optimization, see Best Practices for Speeding Up Your Web Site. Here is their bets practices list:
Minimize HTTP Requests
Use a Content Delivery Network
Add an Expires or a Cache-Control Header
Gzip Components
... (the list is quite long actually, just read the link above)
Now after you eliminated as much as possible from the superfluous hits, you still left with optimizing whatever requests actually hit your server. Once your ASP code starts to run, everything will pale in comparison with the database requests:
reduce number of DB calls per page. The best optimization possible is, obviously, not to make the request to the DB at all to start with. Some say 4 reads and 1 write per page are the most a high load server should handle, other say one DB call per page, still other say 10 calls per page is OK. The point is that fewer is always better than more, and writes are significantly more costly than reads. Review your UI design, perhaps that hit count in the corner of the page that nobody sees doesn't need to be that accurate...
Make sure every single DB request you send to the SQL server is optimized. Look at each and every query plan, make sure you have proper covering indexes in place, make sure you don't do any table scan, review your clustered index design strategy, review all your IO load, storage design, etc etc. Really, there is no short cut you can take her, you have to analyze and optimize the heck out of your database, it will be your chocking point.
eliminate contention. Don't have readers wait for writers. For your stack, SNAPSHOT ISOLATION is a must.
cache results. And usually this is were the cookie crumbles. Designing a good cache is actually quite hard to pull off. I would recommend you watch the Facebook SOCC keynote: Building Facebook: Performance at Massive Scale. Somewhere at slide 47 they show how a typical internal Facebook API looks like:
.
cache_get (
$ids,
'cache_function',
$cache_params,
'db_function',
$db_params);
Everything is requested from a cache, and if not found, requested from their MySQL back end. You probably won't start with 60000 servers thought :)
On the SQL Server stack the best caching strategy is one based on Query Notifications. You can almost mix it with LINQ...

I will define heavy traffic as traffic which triggers resource intensive work. Meaning, if one web request triggers multiple sql calls, or they all calculate pi with a lot of decimals, then it is heavy.
If you are returning static html, then your bandwidth is more of an issue than what a good server today can handle (more or less).
The principles are the same no matter if you use MVC or not when it comes to optimize for speed.
Having a decoupled architecture
makes it easier to scale by adding
more servers etc
Use a repository
pattern for data retrieval (makes
adding a cache easier)
Cache data
which is expensive to query
Data to
be written could be written thru a
cache, so that the client don't have
to wait for the actual database
commit
There's probably more ground rules as well. Maybe you can you say something about the architecture of your application, and how much load you need to plan for?

MSDN has some resources on this. This particular article is out of date, but is a start.
I would suggest also not limiting yourself to reading about the MVC stack: many principles are cross-platform.

Approach for caching data from data logger

Greetings,
I've been working on a C#.NET app that interacts with a data logger. The user can query and obtain logs for a specified time period, and view plots of the data. Typically a new data log is created every minute and stores a measurement for a few parameters. To get meaningful information out of the logger, a reasonable number of logs need to be acquired - data for at least a few days. The hardware interface is a UART to USB module on the device, which restricts transfers to a maximum of about 30 logs/second. This becomes quite slow when reading in the data acquired over a number of days/weeks.
What I would like to do is improve the perceived performance for the user. I realize that with the hardware speed limitation the user will have to wait for the full download cycle at least the first time they acquire a larger set of data. My goal is to cache all data seen by the app, so that it can be obtained faster if ever requested again. The approach I have been considering is to use a light database, like SqlServerCe, that can store the data logs as they are received. I am then hoping to first search the cache prior to querying a device for logs. The cache would be updated with any logs obtained by the request that were not already cached.
Finally my question - would you consider this to be a good approach? Are there any better alternatives you can think of? I've tried to search SO and Google for reinforcement of the idea, but I mostly run into discussions of web request/content caching.
Thanks for any feedback!

Seems like a very reasonable approach. Personally I'd go with SQL CE for storage, make sure you index the column holding the datetime of the record, then use TableDirect on the index for getting and inserting data so it's blazing fast. Since your data is already chronological there's no need to get any slow SQL query processor involved, just seek to the date (or the end) and roll forward with a SqlCeResultSet. You'll end up being speed limited only by I/O. I profiled doing really, really similar stuff on a project and found TableDirect with SQLCE was just as fast as a flat binary file.

I think you're on the right track wanting to store it locally in some queryable form.
I'd strongly recommend SQLite. There's a .NET class here.

Reducing roundtrips to the server/database

When writing ASP.NET pages, what signs do you look for that your page is making too many roundtrips to a database or server?
(This is a general question but I say ASP.NET as the majority of my coding is on the web side of things).

How much is too much? The €1M question! Profile. Then profile. If your app is spending most of its time doing data access, you have a problem (and should look at a sql trace). If it is spending most of its time drawing the UI, then (assuming your view isn't doing data access) you should probably look elsewhere first...

Round trips are more relevant to latency than the total quantity of data being moved, so it really does make sense to optimize for them. The usual way is to use stored procedures that do multiple steps, perhaps even returning multiple result sets.

What I do is I look at the ASP performance counters and SQL performance counters. To get an accurate measurement you must ensure that there is no random noise activity on the SQL Server (ie. import batches running unrelated to the web site).
The relevant counters I look at are:
SQL Statistics/Batch requests/sec: This indicates exactly how many Transact-SQL batches the server receives. It can be, in most cases, equated 1:1 with the number of round trips from the web site to SQL.
Databases/Transaction/sec: this counter is instanced per database, so I can quickly see in which database there is 'activity'. This way I can correlate the web site data roundtrips (ie. my app logic requests, goes to app database) and the ASP session state and user stuff (goes to Asp session db or tempdb)
Databases/Write Transaction/sec: This I correlate with the counters above (transaction per second) so I can get a feel of the read-to-write ratio the site is doing.
ASP.NET Applications/Requests/sec: With this counter I can get the number of requests/sec the site is seeing. Correlated with the number of SQL Batch Requests/sec it gives a good indication of the average number of round-trips per request.
The next thing to measure is usually trying to get a feel for where is the time spent in the request. On my own project, I use abundantly performance counters I publish myself so is really easy to measure. But I'm not always so lucky as to clean up only my own mess... Profiling is usually not an option for me because I most times troubleshoot live production systems I cannot instrument.
My approach is to try to sort out the SQL side of things first, since it's easy to find the relevant statistics for execution times in SQL: SQL keeps a nice aggregated statistic ready to look at in sys.dm_exec_query_stats. I can also use Profiler to measure execution duration in real time. With some analysis of these numbers collected, knowing the normal request pattern of the most visited pages, you can give a pretty good estimate of the total time spent in SQL per web request. If this times adds up to nearly all the time it takes a request to serve the page, then you have your answer.
And to answer the original question title: to reduce the number of round-trips, you make fewer requests. Seriously. First, caching what is appropriate to cache I guess is obvious. Second you reduce the complexity: don't display unnecessary data on each page, you cache and display stale data when you can get away with it, you hide details on secondary navigation panels.
If you feel that the problem is the number of round-trips per se as opposed to the number of requests (ie. you would benefit tremendously from batching multiple requests in one round-trip), then you should somehow measure that the round-trip overhead is what's killing you. With connection pooling on a normal network connection this is usually not the most important factor.
And finally you should look if everything that can be done in sets is done in sets. If you have some half-brained ORM that retrieves objects one at a time from an ID keyset, get rid of it.

I know that this may sound reiterative, but client server round trips depends of how many program logic is located at any side of the connection.
First thing to check is validation: You have to validate and sanitize your input at server side always, but it does not means that you cannot do it too at client side too reducing a round trips that are been used only too check input.
At second: What can you do at client side to reduce server side overload? There are calculations that you can check or make at client side. There is also AJAX that can be used to load only a percentage of the page that is changing.
At third: Can you delegate work to another server? If your server is too loaded, why not to use web services or simply delegate some side of the logic to another server?
As Mark wrote: ¿How is too much? It is is up to you and your budget.

When writing ASP.NET pages, what signs
do you look for that your page is
making too many roundtrips to a
database or server?
Of course it all depends and you have to profile. However, here are some indicators, they do not mean there is a problem, but often will indicate
Page is taking a very long time to render locally.
Read this question: Slow response-time cheat sheet , In particular this link
To render the page you need more than 30 round trips. I pulled that number out of my hat, but assuming a round trip is taking about 3.5ms then 30 round trips will kick you over the 100ms guideline (before any other kind of processing).
All the queries involved in rendering the page are heavily optimized and do not take longer than a millisecond or two to execute. There are no operations that require lots of CPU cycles that execute every time you render the page.
Data access is abstracted away and not cached in any kind of way. If, for example, GetCustomer will call the DAL which in turn issues a query and your page is asking for 100 Customer objects which are not retrieved in a batch, you are probably in trouble.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.