I have to monitor some data during the day (basically a log stored in an Oracle table), and it has to be as close to real time as possible. So what I need is:
A good way to get incremental data from the table (replication packets?); and
A good way to display it on both a web page and a C# GUI monitor.
To explain why there are two displays: they're aimed at different users, based on role. An embedded question: any advice on a middle-layer filtering method?
Rgs,
Arthur
Wouldn't you simply have an auto-refreshing (meta tag) ASP.NET page that does a SELECT straight from your log table? Refreshing, say, every 5 seconds wouldn't put undue load on your Oracle server, I'd think, especially with only 2 users (I assume the GUI version is conceptually similar).
Marking as answered; only today did I finish something that worked well.
All the data I need comes from a legacy query, ugly and slow. It uses a dozen tables, and I need to keep it updated just because of two of them (the others change only three or four times a day).
The solution was to refactor (and exhaustively test) the query so I could divide the tables into three groups:
Tables that don't change at all;
Tables that rarely change over the day;
Tables that change frequently.
This way I was able to query the high-frequency tables every second and keep the others updated at a reasonable interval. Merging the data was not an issue when done programmatically.
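A minimal C# sketch of that split, assuming hypothetical row types (LogRow, StatusInfo, DisplayRow) and a 15-minute refresh interval chosen only for illustration: the rarely changing lookup data lives in a small TimedCache that reloads only when stale, while the rows polled every second are merged against it in memory. In the real application the loader delegate would run the query against the slow tables.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shapes: LogRow comes from a frequently changing table,
// StatusInfo from a lookup table that changes a few times a day.
public record LogRow(long Id, int StatusCode, string Message);
public record StatusInfo(int Code, string Description);
public record DisplayRow(long Id, string Status, string Message);

// Holds slow-changing data and reloads it only once it is older than maxAge.
// Not thread-safe: assumes a single polling loop calls Get.
public sealed class TimedCache<T>
{
    private readonly Func<T> _load;      // e.g. a query against the rarely changing tables
    private readonly TimeSpan _maxAge;
    private T _value = default!;
    private DateTime? _loadedAt;

    public TimedCache(Func<T> load, TimeSpan maxAge)
    {
        _load = load;
        _maxAge = maxAge;
    }

    public T Get(DateTime now)
    {
        if (_loadedAt is null || now - _loadedAt.Value >= _maxAge)
        {
            _value = _load();
            _loadedAt = now;
        }
        return _value;
    }
}

public static class LogMerger
{
    // Joins the rows polled every second with the cached lookup data in memory,
    // instead of re-running the full multi-table query.
    public static List<DisplayRow> Merge(IEnumerable<LogRow> freshRows,
                                         IReadOnlyDictionary<int, StatusInfo> statuses) =>
        freshRows
            .Select(r => new DisplayRow(
                r.Id,
                statuses.TryGetValue(r.StatusCode, out var s) ? s.Description : "unknown",
                r.Message))
            .ToList();
}
```

A timer in the GUI (or the page refresh) would call Get with the current time on each tick, so the lookup query runs a few times a day while only the two fast tables are queried every second.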
Thanks for the help,
I'm trying to build a product catalog application in ASP.NET and C# that will allow a user to select product attributes from a series of drop-down menus, with a list of relevant products appearing in a gridview.
On page load, the options for each of the drop-downs are queried from the database, as well as the entire product catalog for the gridview. Currently this catalog stands at over 6000 items, but we're looking at perhaps five or six times that when the application goes live.
The query that pulls this catalog runs in less than a second when executed in SQL Server Management Studio, but takes upwards of ten seconds to render on the web page. We've refined the query as much as we know how: pulling only the columns that will show in our GridView (as opposed to SELECT * FROM ...) and adding the WITH (NOLOCK) table hint so the query reads data without waiting for updates, but it's still too slow.
I've looked into SqlCacheDependency, but all the directions I can find assume I'm using a SqlDataSource object. I can't do this because every time the user makes a selection from the menu, a new query is constructed and sent to the database to refine the list of displayed products.
I'm out of my depth here, so I'm hoping someone can offer some insight. Please let me know if you need further information, and I'll update as I can.
EDIT: FYI, paging is not an option here. The people I'm building this for are standing firm on that point. The best I can do is wrap the gridview in a div with overflow: auto set in the CSS.
The tables I'm dealing with aren't going to update more than once every few months, if that; is there any way to cache this information client-side and work with it that way?
Most of your solution will come in a few forms (none of which have to do with a Gridview):
Good indexes. Create good indexes on the tables this data is pulled from; good indexes are defined as:
Indexes that store as little information as is actually needed to display the product. The less data each index row stores, the more rows fit on each 8 KB page in SQL Server.
Covering indexes: your SQL query should match exactly what you need (not SELECT *), and your index should be built to cover that query (hence the name 'covering index').
Good table structure: this goes along with the index. The fewer joins needed to pull the information, the faster you can pull it.
Paging. You shouldn't ever pull all 6000+ objects at once: what user can view 6000 objects at once? Even if a theoretical superhuman could process that much data, that's never going to be your median use case. Pull 50 or so at a time (if you really even need that many), or structure your site so that you're always pulling what's relevant to the user instead of everything (keep in mind this is not a trivial problem to solve).
The beautiful part of paging is that your clients don't even need to know you've implemented paging. One such technique is called "Infinite Scrolling". With it, you can go ahead and fetch the next N rows while the customer is scrolling to them.
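Fetching "the next N rows" can be done with offset paging (Skip/Take) or, as sketched here, with keyset ("seek") pagination: the client sends the last Id it already has, and the server returns the next batch after it. This is a swapped-in technique, not something the answer prescribes; Product, its Id key, and the batch size of 50 are assumptions. Because it works on IQueryable, an ORM can translate it to a WHERE Id > @lastId ORDER BY Id query that an index on Id can serve directly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record Product(int Id, string Name);

public static class Scrolling
{
    // Keyset ("seek") pagination: return up to batchSize rows whose key is greater than
    // the last key the client already holds (null means "start from the beginning").
    public static List<Product> NextBatch(IQueryable<Product> source, int? lastSeenId, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        IQueryable<Product> query = source;
        if (lastSeenId is int last)
            query = query.Where(p => p.Id > last);

        // Ordering by the same key used in the filter keeps batches contiguous.
        return query.OrderBy(p => p.Id).Take(batchSize).ToList();
    }
}
```

Each scroll event would pass the Id of the last row currently rendered; unlike Skip, the cost of a batch does not grow with how far the user has scrolled.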
If, as you're saying, paging really is not an option (although I really doubt it; please explain why you think it isn't, and I'm pretty sure someone will find a solution), there's really no way to speed up this kind of operation.
As you noticed, it's not the query that's taking long, it's the data transfer. Copying the data from one memory space (SQL Server) to another (your application) is not that fast, and displaying this data is orders of magnitude slower.
Edit: why are your clients "firm on that point"? Why do they think it's not possible otherwise? Why do they think it's the best solution?
There are many third-party options for showing a large set of data in a grid.
Try jQuery/JavaScript grids with AJAX calls. They will help you render a large number of rows on the client. You can even use a cache so you don't query the database many times.
These are good grids that will help you show thousands of rows in a web browser:
http://www.trirand.com/blog/
https://github.com/mleibman/SlickGrid
http://demos.telerik.com/aspnet-ajax/grid/examples/overview/defaultcs.aspx
http://w2ui.com/web/blog/7/JavaScript-Grid-with-One-Million-Records
I hope it helps.
You can load all the rows into a DataTable on the client using a background thread when the application (web page) starts. Then use only the DataTable to populate your grids, etc., so you do not have to hit SQL again until you need to read or write different data. (All the other answers cover the other options.)
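A sketch of that idea, assuming the cache lives in the application process and that loadFromDatabase (hypothetical) stands in for the real catalog query: loading starts on a background thread via Task.Run, and later requests filter and sort the cached DataTable through a DataView instead of querying SQL again. DataTable is only documented as safe for concurrent reads, so view creation is serialized with a lock.

```csharp
using System;
using System.Data;
using System.Threading.Tasks;

public sealed class CatalogCache
{
    private readonly Task<DataTable> _loading;
    private readonly object _gate = new object();

    // Starts loading immediately on a thread-pool thread; loadFromDatabase is a placeholder.
    public CatalogCache(Func<DataTable> loadFromDatabase)
    {
        _loading = Task.Run(loadFromDatabase);
    }

    // Filters and sorts the cached rows in memory and returns a detached copy,
    // so SQL is not queried again for each change of filter or sort order.
    public async Task<DataTable> QueryAsync(string rowFilter, string sort)
    {
        DataTable table = await _loading;
        lock (_gate)
        {
            using var view = new DataView(table, rowFilter, sort, DataViewRowState.CurrentRows);
            return view.ToTable();
        }
    }
}
```

The rowFilter and sort strings use DataView expression syntax (for example "Price < 10" and "Price ASC"), so they should be built from known column names rather than raw user input.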
I have around 1000 rows of data. On the ASPX page, whenever the user clicks the sort button, it will basically sort the result according to a specific column.
I propose sorting the result in the SQL query, which is much easier with just an ORDER BY clause.
However, my manager insisted that I store the result in an array and then sort the data within the array, because he thinks that calling the database every time the user clicks the sort button will hurt performance.
Just out of curiosity - Does it really matter?
Also, if we disregard the number of rows, performance wise, which of these methods is actually more efficient?
Well, there are three options:
Sort in the SQL
Sort server-side, in your ASP code
Sort client-side, in your Javascript
There's little reason to go with (2), I'd say. It's meat and drink to a database to sort as it returns data: that's what a database is designed to do.
But there's a strong case for (3) if you want to have a button that the user can click. This means it's all done client-side, so you have no need to send anything to the web server. If you have only a few rows (and 1000 is really very few these days), it'll feel much faster, because you won't have to wait for sending the request and getting a response.
Realistically, if you've got so many things that Javascript is too slow as a sorting mechanism, you've got too many things to display them all anyway.
In short, if this is a one-off thing for displaying the initial page, and you don't want the user to have to interact with the page and sort on different columns etc., then go with (1). But if the user is going to want to sort things after the page has loaded, then (3) is your friend.
Short Answer
Ah... screw it: there's no short answer to a question like this.
Longer Answer
The best solution depends on a lot of factors. The question is somewhat vague, but for the sake of simplicity let's assume that the 1000 rows are stored in the database and are being retrieved by the client.
Now, a few things to get out of the way:
Performance can mean a variety of things in a variety of situations.
Sorting is (relatively) expensive, no matter where you do it.
Sorting is least expensive when done in the database, as the database already has all the necessary data and is optimized for these operations.
Posting a question on SO to "prove your manager wrong" is a bad idea. (The question could easily have been asked without mentioning the manager.)
Your manager believes that you should send all the data to the client and do all the processing there. This idea has some merit. With a reasonably sized dataset, processing on the client will almost always be faster than making a round trip to the server. Here's the caveat: you have to get all of that data to the client first, and that can be a very expensive operation. 1000 rows is already a big payload to send to a client. If your dataset grows much larger, then you would be crazy to send all of it at once, particularly if the user really only needs a few rows. In that case you'll have to do some form of paging on the server side, sending chunks of data as the user requests them, usually 10 or 20 rows at a time. Once you start paging on the server, your sorting decision is made for you: you have no choice but to do your sorting there. How else would you know which rows to send?
For most "line-of-business" apps your query processing belongs in the database. My generalized recommendation: by all means do your sorting and paging in the database, then return the requested data to the client as a JSON object. Please don't regenerate the entire web page just to update the data in the grid. (I've made this mistake and it's embarrassing.) There are several JavaScript libraries dedicated solely to rendering grids from AJAX data. If this method is executed properly your page will be incredibly responsive and your database will do what it does best.
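A minimal sketch of the "sort and page in the database, return JSON" recommendation, assuming a hypothetical Row shape and an IQueryable source; with an ORM, the OrderBy/Skip/Take calls are translated into the provider's SQL for ordering and paging. Sort columns come from a fixed switch rather than raw client input, and System.Text.Json serializes only the requested page for a client-side grid.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public record Row(int Id, string Name, decimal Price);

public static class GridEndpoint
{
    // Maps a client-supplied column name onto a fixed set of typed sort keys;
    // unknown names fall back to Id, so raw input never reaches the query text.
    private static IOrderedQueryable<Row> ApplySort(IQueryable<Row> q, string column, bool descending) =>
        ((column ?? "").ToLowerInvariant(), descending) switch
        {
            ("name", false) => q.OrderBy(r => r.Name),
            ("name", true) => q.OrderByDescending(r => r.Name),
            ("price", false) => q.OrderBy(r => r.Price),
            ("price", true) => q.OrderByDescending(r => r.Price),
            (_, false) => q.OrderBy(r => r.Id),
            (_, true) => q.OrderByDescending(r => r.Id),
        };

    // Sorts and pages inside the query, then serializes only that page as JSON.
    public static string GetPageJson(IQueryable<Row> source, string sortColumn, bool descending,
                                     int page, int pageSize)
    {
        if (page < 1) throw new ArgumentOutOfRangeException(nameof(page));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        List<Row> rows = ApplySort(source, sortColumn, descending)
            .ThenBy(r => r.Id)                 // tie-breaker keeps pages stable
            .Skip((page - 1) * pageSize)
            .Take(pageSize)
            .ToList();
        return JsonSerializer.Serialize(rows);
    }
}
```

Only the JSON for one page goes back over the AJAX call; the grid library on the page re-renders from it without regenerating the whole page.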
We had a problem similar to this at my last employer. We had to return large sets of data efficiently, quickly and consistently into a DataGridView object.
The solution they came up with was to have a set of filters the user could use to narrow down the query results, and to cap the number of rows returned at 500. Sorting was then done by the program on an array of those objects.
The reasons behind this were:
Most people will not process that many rows; they are usually looking for a specific item (hence the filters).
Sorting on the client side did save the server a bunch of time, especially when there was the potential for thousands of people to be querying the data at the same time.
Performance of the GUI object itself started to become an issue at some point (reason for limiting the returns)
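The approach described above (filters plus a 500-row cap on the server, then sorting on the client's own copy) might look like this in C#. Order, its columns, and the Fetch method standing in for the filtered server query are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record Order(int Id, string Customer, DateTime Placed);

public static class ClientGrid
{
    public const int MaxRows = 500;

    // Stand-in for the server query: apply the user's filters and cap the result at MaxRows.
    public static Order[] Fetch(IEnumerable<Order> source, Func<Order, bool> filter) =>
        source.Where(filter).Take(MaxRows).ToArray();

    // Re-sorts the already loaded rows by the clicked column without another round trip.
    public static Order[] SortLocally(IReadOnlyList<Order> loaded, string column, bool descending)
    {
        Func<Order, IComparable> key = column switch
        {
            "Customer" => o => o.Customer,
            "Placed" => o => o.Placed,
            _ => o => o.Id,
        };
        return (descending ? loaded.OrderByDescending(key) : loaded.OrderBy(key)).ToArray();
    }
}
```

Because the cap bounds the array size, each header click sorts at most 500 objects locally, which keeps both the server and the grid control's workload predictable.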
I hope that helps you a bit.
From both a data-modeling perspective and an application-architecture-pattern perspective, it's "best practice" to put sorting/filtering into the "controller" portion of the MVC pattern. That is directly opposed to the answer above that several people have already voted for.
The answer to the question is really: "It depends"
If the application stays only one table, no joins, and a low number of rows, then sorting in JavaScript on the client is likely going to win performance tests.
However, since it's already ASPX, you may be preparing for your data/model to expand. Once there are more tables and joins, and if the UI includes a data grid where the choice of sort column changes on a per-client basis, then maybe the middle tier should be handling the sorting for your application.
I suggest reviewing Tom Dykstra's classic Contoso University ASP.NET example, which has been updated with Entity Framework and MVC 5. It includes a section on sorting, filtering and paging. This example shows the value of proper MVC architecture and the ease of implementing sorting/filtering on multiple columns.
Remember, applications change (read: "grow") over time so plan for it using an architecture pattern such as MVC.
I'm looking for a design solution for a pattern that I am going to have to repeat quite a lot throughout a website I am designing. It is going to be ASP.NET MVC front-end, with C# WCF web services connecting using NHibernate to SQL database.
It's a social networking site, so imagine Facebook here to get a conceptual idea. What I'm looking for is an efficient and performant way to return paginated results of large datasets. For example, a user may have 150 emails. I want to return them 10 at a time depending on what page they're on, returning only the 10 that relate to that page rather than loading all 150 items into memory and displaying 10 at a time, as I think the user experience would be better with a slightly longer delay when changing pages in exchange for a faster initial load. After all, when do you look at emails that are 6 months old? In the usual case you only care about the first page of results anyway. Similarly, a user may have had a number of interactions since their last login (e.g. your notifications feed on Facebook), but again I only want to load N results at a time. In this case, rather than having pages, you would click a "Display more" button, which would fetch the next N results and display them with another "Display more" link, and so forth: you can keep clicking until you reach the end of the dataset. I imagine both would use the same design, though, as they are technically both paginated results, just with different UI output and flow.
Can anyone offer some advice on a good design to use for this, bearing in mind my data retrieval is using NHibernate Queryables or Enumerables? Would I want to be loading all data from DB in one hit, then using an iterator pattern to only return N rows from the service layer, keeping the rest of the list held in memory on the server open in the user's session context, so that if I made another call to retrieve the next N rows, it would be held in place and keep returning N rows until the iterator finished? Or would it be best to simply retrieve N rows from the database and return those, holding nothing in session context? I can see how to return the top 10 results from a Queryable as:
var results = (from email in emails where email.UserId == userId select email).Take(10);
But I'm not sure how efficient this is; is this the fastest way of doing it? Furthermore, I don't see how to start at a certain position: this will always return only the first 10, not the second 10, third 10, etc.
So I'm a bit unsure what the best way to proceed is and was hoping for some pointers and advice from people who have done something similar. Bearing in mind with my website performance is going to be of the essence so the user experience needs to be pretty sharp and interactive with refreshing new results. Basically, if you were trying to simulate a Facebook news feed/wall, how would you implement it with the above architecture?
Thanks!
You can use Skip in combination with Take:
// Add an orderby clause to the query so each page is stable between requests.
var results = (from email in emails
               where email.UserId == userId
               select email)
              .Skip((currentPage - 1) * 10)
              .Take(10);
About the web service: You really should make it a stateless web service. You could use the ASP.NET Web API for this. This enables you to build a RESTful web service.
Would I want to be loading all data from DB in one hit...
Definitely not, you only want to pull down the records you need, not the ones you may need.
...using an iterator pattern to only return N rows from the service layer, keeping the rest of the list held in memory on the server open in the user's session context...
Scalability goes right out the window with that idea.
...or would it be best to simply retrieve N rows from the database and return those, holding nothing in session context?
Now you're starting to get on the right track...
In general, you want to let the database do as much of the querying as possible, i.e. you don't want to hit the database and then have to further query the results (however, that's not always avoidable). In other words, you want to delegate most, if not all, of the heavy lifting to the database.
You mentioned you are using NHibernate, which is a pretty powerful ORM. The good news is that it does a lot of the work for you in terms of query optimization, data caching, etc. Like most ORMs nowadays, NHibernate uses deferred execution with its queries, so just watch out for things like hitting the database too early, and choose carefully when to eager-load data instead of performing multiple queries. There is a lot to learn with NHibernate; if you haven't already, it's worth taking the time to read up on it before diving in. It will save you a lot of hassle in the long run.
Bearing in mind with my website performance is going to be of the essence so the user experience needs to be pretty sharp and interactive with refreshing new results
In terms of performance (I assume you mean page load speed), you would want to ajaxify your site, i.e. load what needs to be loaded with the page, pull the rest in the background, and update the page dynamically. To achieve the "refreshing new results" part, you need to look at polling the server and pulling down new data. I am pretty sure Facebook uses a technique called long polling, which essentially keeps a request open with the server for a set amount of time so the data appears to arrive "instantly". Polling is a different ball game altogether, though: it's about striking a balance between server load and how "fresh" the data needs to be. That's something you would need to decide yourself, and the answer usually depends on the type of data versus the hardware capabilities of the server.
There are some links about it (like this) out there, but I liked this guy's approach. I don't know if I'd use his PagedQueryable, but his IPageable, IPagedEnumerable and PagedEnumerable are really interesting. Besides, his project introduction page may give you some ideas on how to roll your own pagination.
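For rolling your own, here is a hedged sketch of a small paged-result type with hypothetical names (PagedResult, ToPage); it is not the API of the project mentioned above. It takes an IOrderedQueryable so the caller must supply an ordering, issues one COUNT and one page query, and keeps nothing in session between requests. HasMore can drive a "Display more" link, while Page and PageCount suit numbered pages.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical paged-result type; one instance describes one page of an ordered query.
public sealed record PagedResult<T>(IReadOnlyList<T> Items, int Page, int PageSize, int TotalCount)
{
    public int PageCount => (TotalCount + PageSize - 1) / PageSize;
    public bool HasMore => Page < PageCount;   // e.g. show or hide a "Display more" link
}

public static class PagingExtensions
{
    // Requiring IOrderedQueryable forces callers to choose an ordering before paging.
    // Runs one COUNT query and one page query; nothing is kept in session state.
    public static PagedResult<T> ToPage<T>(this IOrderedQueryable<T> query, int page, int pageSize)
    {
        if (page < 1) throw new ArgumentOutOfRangeException(nameof(page));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        int total = query.Count();
        List<T> items = query.Skip((page - 1) * pageSize).Take(pageSize).ToList();
        return new PagedResult<T>(items, page, pageSize, total);
    }
}
```

For the 150-email example, ordering newest first and calling ToPage(1, 10) returns the 10 newest emails with PageCount 15, and the same call with page 2 serves the next "Display more" click.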
We have a table in our database which has around 2,500,000 rows (around 3GB). Is it technically possible to view the data in this table in a Silverlight application which queries this data using WCF? Potentially, I see issues with the maximum buffer size and timeout errors. We may need the entire data to be used for visualization purposes.
Please guide me if there is a practical solution to this problem.
Moving 3GB to a client is not going to work.
for visualization purposes.
Better prepare the visualization server-side. That will be slow enough.
Generally in this sort of situation if you need to view individual records then you would use a paging strategy. So your call to WCF would be for a page worth of records and you would display those records and the user would click on a next / previous button or some such.
As for the visualisation you should look to perform some transformation / reduction on the server as 2.5 million records is akin to displaying one data point per pixel on your screen.
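One common reduction is min/max/mean bucketing, sketched here as an assumption rather than something the answer specifies: the server collapses the series into one bucket per horizontal pixel (1024 in the usage below), keeping min and max so spikes stay visible, and only the buckets are sent to the client. The input is an in-memory array; in practice the aggregation could just as well run in the database.

```csharp
using System;
using System.Collections.Generic;

public record Bucket(double Min, double Max, double Mean, int Count);

public static class Downsampler
{
    // Reduces a long series to at most bucketCount buckets (e.g. one per horizontal pixel),
    // keeping min, max and mean for each so outliers are not averaged away.
    public static Bucket[] Reduce(IReadOnlyList<double> values, int bucketCount)
    {
        if (bucketCount < 1) throw new ArgumentOutOfRangeException(nameof(bucketCount));

        int n = values.Count;
        int buckets = Math.Min(bucketCount, n);   // never create empty buckets
        var result = new Bucket[buckets];

        for (int b = 0; b < buckets; b++)
        {
            // Integer bounds spread the n values as evenly as possible over the buckets.
            int start = (int)((long)b * n / buckets);
            int end = (int)((long)(b + 1) * n / buckets);

            double min = double.MaxValue, max = double.MinValue, sum = 0;
            for (int i = start; i < end; i++)
            {
                double v = values[i];
                if (v < min) min = v;
                if (v > max) max = v;
                sum += v;
            }
            result[b] = new Bucket(min, max, sum / (end - start), end - start);
        }
        return result;
    }
}
```

With 2.5 million values and 1024 buckets, each bucket summarizes about 2,441 values, so the client receives roughly a thousand small records instead of millions of rows.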
First of all, have a look here.
Transferring 3GB of data from disk to disk can take quite a few minutes, let alone across the network. I think you have bigger fish to fry; the WCF limitation is irrelevant here.
So let's assume that after a few minutes or hours you got the data across the wire: where do you store it? Your Silverlight app, if running inside the browser, cannot grow to 3GB (even on a 64-bit machine), and even if it could, it would not make any sense, especially since that amount of data will take a lot more space once transformed into objects.
Here is what I would do:
Get the server to provide snapshots/views of the data that is useful, e.g. providing summary, OLAP cubes, ...
For each record, provide minimum data required.
If you need detail on each record, do that in a separate call
Well, I believe, and suggest, that you're not going to show 2.5 million rows in the same listing.
If you develop good paging of the data and the way you query the data is optimal, I don't see a problem with WCF.
I agree that querying data through a WCF interface is less efficient than a standalone solution with direct access to the infrastructure, but if you need to host some business logic and data for N clients to access in an SOA solution, or it's a client-server solution, you'll need to be sure that your queries are efficient.
Suggestions:
Use an OR/M. NHibernate will be your best choice, since it has a lot of ways of tweaking performance, and paging is made easy by its LINQ provider and the QueryOver API in NHibernate 3.0. This product has a very interesting caching scheme, and it'll let your application efficiently visualize your 2.5-million-row database.
Do caching. NHibernate may help you in this area, but think about that and, depending on the client technology (Web, Windows...), you'll find good options for caching presentation views (ASP.NET output caching, for example).
Think about how you're going to serialize objects in WCF: SOAP or JSON? Maybe you would be interested in JSON, because the serialized objects are compact, which saves network traffic.
If you have questions, just leave a comment!
OK, plenty of users have covered what you could do here technically; let me ask what sense this makes at all, because it looks like someone didn't think it through.
2.5 million rows make no sense in a grid. Zero. Showing 80 rows per page (a wide screen rotated 90 degrees), that would be 31,250 pages worth of data. You cannot even scroll to a specific page. Ignoring load times, even IF (!) you load all that, it just makes no sense to have this amount in a grid. Filter it down, then load what you need page by page. But the key here is to force the user to filter BEFORE even thinking about a grid. And once you have them, let's not even get into the performance of the grid.
To show you how bad this is: forget the grid. If you assign ONE PIXEL to every data item, you need more than three screens of 1024*768 pixels (786,432 pixels each) to show the data. That is one pixel per item.
So, at the end of the day, even IF you managed to get this working (which is impossible), you would end up with a nonsensical, unusable application.
I am working on a Sometimes Connected CRUD application that will be used primarily by teams (2-4) of social workers and nurses to track patient information in the form of a plan. The application is a re-imagining of an ASP.NET app that was created before my time. There are approximately 200 tables across 4 databases. The web app version relied heavily on SPs, but since this version is a WinForms app that will point to a local DB, I see no reason to continue with SPs. Also of note, I had planned to use Merge Replication to handle the syncing portion, and there seem to be some issues with those two together.
I am trying to decide what approach to use for the DAL. I originally planned to use LINQ to SQL, but I have read tidbits stating that it doesn't work in a Sometimes Connected setting. I have therefore been reading about and experimenting with numerous solutions: SubSonic, NHibernate, Entity Framework. This is a relatively simple application, and due to a "looming" version 3 redesign this effort can be borderline "throwaway." The emphasis here is on getting a desktop version up and running ASAP.
What I am asking here is for anyone with experience using any of these technologies (or one I didn't list) to lend me their hard-earned wisdom. What is, in your opinion, the best approach for me to pursue? Any other insights on creating this kind of app? I am really struggling with the DAL portion of this program.
Thank you!
If the stored procedures do what you want them to, I would have to say I'm dubious that you will get benefits by throwing them away and reimplementing them. Moreover, it shouldn't matter if you use stored procedures or LINQ to SQL style data access when it comes time to replicate your data back to the master database, so worrying about which DAL you use seems to be a red herring.
The tricky part about sometimes connected applications is coming up with a good conflict resolution system. My suggestions:
Always use RowGuids as your primary keys to tables. Merge replication works best if you always have new records uniquely keyed.
Realize that merge replication can only do so much: it is great for bringing new data in disparate systems together. It can even figure out one sided updates. It can't magically determine that your new record and my new record are actually the same nor can it really deal with changes on both sides without human intervention or priority rules.
Because of this, you will need "matching" rules to resolve records that are claiming to be new, but actually aren't. Note that this is a fuzzy step: rarely can you rely on a unique key to actually be entered exactly the same on both sides and without error. This means giving weighted matches where many of your indicators are the same or similar.
The user interface for resolving conflicts and matching up "new" records with the original needs to be easy to operate. I use something that looks similar to the classic three way merge that many source control systems use: Record A, Record B, Merged Record. They can default the Merged Record to A or B by clicking a header button, and can select each field by clicking against them as well. Finally, Merged Records fields are open for edit, because sometimes you need to take parts of the address (say) from A and B.
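The weighted matching step could start out like the C# sketch below. PatientRecord, its fields, the weights, and the 0.7 threshold are all hypothetical and would need tuning against real data; rows that share a RowGuid are skipped because replication already pairs those, and the output is a list of candidate pairs for a person to resolve in the merge UI, not an automatic merge.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record PatientRecord(Guid RowGuid, string LastName, string FirstName, DateTime? BirthDate, string Phone);

public static class RecordMatcher
{
    // Weighted score in [0, 1]: each field that agrees contributes its weight.
    // The weights are illustrative placeholders, not tuned values.
    public static double Score(PatientRecord a, PatientRecord b)
    {
        double score = 0, total = 0;
        void Field(double weight, bool matches) { total += weight; if (matches) score += weight; }

        Field(0.35, Same(a.LastName, b.LastName));
        Field(0.20, Same(a.FirstName, b.FirstName));
        Field(0.30, a.BirthDate.HasValue && a.BirthDate == b.BirthDate);
        Field(0.15, Digits(a.Phone).Length > 0 && Digits(a.Phone) == Digits(b.Phone));
        return score / total;
    }

    // Pairs that look like the same person despite different RowGuids, for a human to review.
    public static List<(PatientRecord Local, PatientRecord Remote, double Score)> Candidates(
        IEnumerable<PatientRecord> local, IEnumerable<PatientRecord> remote, double threshold = 0.7)
    {
        var result = new List<(PatientRecord Local, PatientRecord Remote, double Score)>();
        foreach (var l in local)
            foreach (var r in remote)
            {
                if (l.RowGuid == r.RowGuid) continue;   // same row: replication already pairs it
                double s = Score(l, r);
                if (s >= threshold) result.Add((l, r, s));
            }
        return result;
    }

    // Ignores case and surrounding whitespace; two blank values never count as a match.
    private static bool Same(string x, string y) =>
        !string.IsNullOrWhiteSpace(x) &&
        string.Equals(x.Trim(), y?.Trim(), StringComparison.OrdinalIgnoreCase);

    // Keeps only digits so "(555) 010-2000" and "555-010-2000" compare equal.
    private static string Digits(string s) => new string((s ?? "").Where(char.IsDigit).ToArray());
}
```

The nested loop is quadratic, which is tolerable for the handful of "new" records a field team creates between syncs; a larger data set would need blocking (for example comparing only records with the same birth date).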
None of this should affect your data access layer in the slightest: this is all either lower level (merge replication, provided by the database itself) or higher level (conflict resolution, provided by your business rules for resolution) than your DAL.
If you can install a DB system locally, go with something you feel familiar with. The greatest problem, I think, will be the syncing and merging part. You must think through several possibilities, e.g. someone changed something that someone else deleted on the server. Who decides?
I've never used the Sync Framework myself, just read an article, but it may give you a solid foundation to build on. But whichever way you go with data access, the solution to the business logic will probably have a much wider impact...
There is a sample app called issueVision Microsoft put out back in 2004.
http://windowsclient.net/downloads/folders/starterkits/entry1268.aspx
Found link on old thread in joelonsoftware.com. http://discuss.joelonsoftware.com/default.asp?joel.3.25830.10
Other ideas...
What about mobile broadband? A couple 3G cellular cards will work tomorrow and your app will need no changes sans large pages/graphics.
Use an Excel spreadsheet in the field, with DTS or SSIS importing the data into the application, while a "better" solution is created.
Good luck!
If by SPs you mean stored procedures, I'm not sure I understand your reasoning for trying to move away from them, considering that they're fast, proven, and already written for you (i.e. tested).
Surely, if you're making an app that will mimic the original, there are definite merits to keeping as much of the original (working) codebase as possible - the least of which is speed.
I'd try installing a local copy of the db, and then pushing all affected records since the last connected period to the master db when it does get connected.
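A sketch of "push all records changed since the last connected period", assuming each local row carries a ModifiedUtc timestamp that every write updates (PlanRow, OutboundSync, and the push delegate are hypothetical). The high-water mark only advances after a successful push, so an attempt made while offline is simply retried later. It does nothing about conflicts; those still need resolution rules like the ones discussed in the other answers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record PlanRow(Guid RowGuid, string Notes, DateTime ModifiedUtc);

public sealed class OutboundSync
{
    // High-water mark: the newest ModifiedUtc successfully pushed to the master DB.
    public DateTime LastSyncUtc { get; private set; } = DateTime.MinValue;

    // Collects rows changed since the last successful sync, hands them to push
    // (a stand-in for the real upload), and advances the mark only if push succeeds.
    public int PushChanges(IEnumerable<PlanRow> localRows, Func<IReadOnlyList<PlanRow>, bool> push)
    {
        var changed = localRows.Where(r => r.ModifiedUtc > LastSyncUtc)
                               .OrderBy(r => r.ModifiedUtc)
                               .ToList();
        if (changed.Count == 0) return 0;
        if (!push(changed)) return 0;          // keep the old mark so these rows are retried
        LastSyncUtc = changed[changed.Count - 1].ModifiedUtc;
        return changed.Count;
    }
}
```

In a real application LastSyncUtc would be persisted locally so the mark survives restarts between connected periods.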