Dear All,
I have a project which manages a lot of data; sometimes I have to show almost 1 million rows. I have two options to solve this and I want displaying the data to be as fast as possible: which technology is the better choice, Devart or NHibernate?
I'm using PostgreSQL as the database and want to show the data as fast as possible.
Regards
I can hardly imagine that you really want to show 1 million rows at once.
Even if you have one big table with a million rows, you will probably show them in a form or on a page which allows filtering and/or paging, so your users will only see a few rows at a time.
So I think what you really want is to select, let's say, 50 or 100 rows at a time from your big table with a million rows.
For that, you can use ADO.NET or any ORM you want. They all do basically the same thing; it's just a matter of personal preference, and there's no notable performance difference when used with this amount of data.
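For example, a rough sketch with plain ADO.NET against PostgreSQL, assuming the Npgsql provider; the table and column names are just placeholders:

// Loads one page of rows; only pageSize rows ever cross the wire.
using Npgsql;
using System.Collections.Generic;

public static List<string> LoadPage(string connectionString, int pageIndex, int pageSize)
{
    var rows = new List<string>();
    using (var connection = new NpgsqlConnection(connectionString))
    {
        connection.Open();
        using (var command = new NpgsqlCommand(
            "SELECT name FROM orders ORDER BY id LIMIT @limit OFFSET @offset", connection))
        {
            command.Parameters.AddWithValue("limit", pageSize);
            command.Parameters.AddWithValue("offset", pageIndex * pageSize);
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    rows.Add(reader.GetString(0));
                }
            }
        }
    }
    return rows;
}

An NHibernate query with SetFirstResult/SetMaxResults ends up issuing essentially the same LIMIT/OFFSET SQL, which is why the choice of data access technology matters little here.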
If you really want to load the whole million rows at once, well... you will get performance problems anyway, no matter what data access technology you use. Even with ADO.NET and a DataReader.
And even if performance were not an issue... it still makes no sense to me.
What do your users do with a million rows of data, all shown at once? They can't see them all at the same time anyway.
If you are going to show a million rows, then an ORM is not the right choice.
Related
I'm trying to build a product catalog application in ASP.NET and C# that will allow a user to select product attributes from a series of drop-down menus, with a list of relevant products appearing in a gridview.
On page load, the options for each of the drop-downs are queried from the database, as well as the entire product catalog for the gridview. Currently this catalog stands at over 6000 items, but we're looking at perhaps five or six times that when the application goes live.
The query that pulls this catalog runs in less than a second when executed in SQL Server Management Studio, but takes upwards of ten seconds to render on the web page. We've refined the query as much as we know how: pulling only the columns that will show in our gridview (as opposed to saying select * from ...) and adding the with (nolock) command to the query to pull data without waiting for updates, but it's still too slow.
I've looked into SqlCacheDependency, but all the directions I can find assume I'm using a SqlDataSource object. I can't do this because every time the user makes a selection from the menu, a new query is constructed and sent to the database to refine the list of displayed products.
I'm out of my depth here, so I'm hoping someone can offer some insight. Please let me know if you need further information, and I'll update as I can.
EDIT: FYI, paging is not an option here. The people I'm building this for are standing firm on that point. The best I can do is wrap the gridview in a div with overflow: auto set in the CSS.
The tables I'm dealing with aren't going to update more than once every few months, if that; is there any way to cache this information client-side and work with it that way?
Most of your solution will come in a few forms (none of which have to do with a Gridview):
Good indexes. Create good indexes for the tables that pull this data; good indexes are defined as:
Indexes that store as little information as is actually needed to display the product. The less data stored per row, the more rows fit on each 8 KB page in SQL Server.
Covering indexes: your SQL query should select exactly what you need (not SELECT *) and your index should be built to cover that query (hence the name 'covering index'; see the sketch after this list).
Good table structure: this goes along with the index. The fewer joins needed to pull the information, the faster you can pull it.
Paging. You shouldn't ever pull all 6000+ objects at once; what user can view 6000 objects at once? Even if a theoretical superhuman could process that much data, that's never going to be your median use case. Pull 50 or so at a time (if you really even need that many), or structure your site so that you're always pulling what's relevant to the user instead of everything (keep in mind this is not a trivial problem to solve).
The beautiful part of paging is that your clients don't even need to know you've implemented paging. One such technique is called "Infinite Scrolling". With it, you can go ahead and fetch the next N rows while the customer is scrolling to them.
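For example, a rough sketch of such a paged query with ROW_NUMBER (SQL Server 2005+), selecting only the columns the grid shows, which also ties in with the covering-index point above; all table, column and method names are placeholders:

using System.Data;
using System.Data.SqlClient;

public static DataTable GetProductPage(string connectionString, int pageIndex, int pageSize)
{
    const string sql = @"
        SELECT ProductId, ProductName, Price
        FROM (
            SELECT ProductId, ProductName, Price,
                   ROW_NUMBER() OVER (ORDER BY ProductName) AS RowNum
            FROM Products
        ) AS Paged
        WHERE RowNum BETWEEN @start AND @end;";

    var page = new DataTable();
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(sql, connection))
    {
        command.Parameters.AddWithValue("@start", pageIndex * pageSize + 1);
        command.Parameters.AddWithValue("@end", (pageIndex + 1) * pageSize);
        using (var adapter = new SqlDataAdapter(command))
        {
            adapter.Fill(page);   // only pageSize rows come back
        }
    }
    return page;
}

An infinite-scrolling grid just calls this with the next pageIndex each time the user nears the bottom.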
If, as you say, paging really is not an option (although I really doubt it; please explain why you think that, and I'm pretty sure someone will find a solution), there's really no way to speed up this kind of operation.
As you noticed, it's not the query that's taking long, it's the data transfer. Copying the data from one memory space (SQL Server) to another (your application) is not that fast, and displaying this data is orders of magnitude slower.
Edit: why are your clients "firm on that point"? Why do they think it's not possible otherwise? Why do they think it's the best solution?
There are many options for showing a large set of data in a grid, including third-party components.
Try using jQuery/JavaScript grids with AJAX calls. They will help you render a large number of rows on the client. You can even use caching so that you don't query the database many times.
These are good grids that will help you show thousands of rows in a web browser:
http://www.trirand.com/blog/
https://github.com/mleibman/SlickGrid
http://demos.telerik.com/aspnet-ajax/grid/examples/overview/defaultcs.aspx
http://w2ui.com/web/blog/7/JavaScript-Grid-with-One-Million-Records
I hope it helps.
You can load all the rows into a DataTable using a background thread when the application (web page) starts, then use only that DataTable to populate your grids etc., so you do not have to hit SQL Server again until you need to read/write different data. (All the other answers cover the other options.)
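A minimal sketch of that idea, assuming .NET 4.5's Task.Run (on older frameworks a ThreadPool work item does the same job); the query, names and cache field are placeholders:

using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

public static class CatalogCache
{
    public static DataTable Products;   // read by the grids once loaded

    // Call once, e.g. from Application_Start, so no page request blocks on the load.
    public static Task LoadAsync(string connectionString)
    {
        return Task.Run(() =>
        {
            var table = new DataTable();
            using (var connection = new SqlConnection(connectionString))
            using (var adapter = new SqlDataAdapter(
                "SELECT ProductId, ProductName, Price FROM Products", connection))
            {
                adapter.Fill(table);
            }
            Products = table;   // swap in the fully loaded table
        });
    }
}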
Currently I am creating a C# application which has to read a lot of data (over 2,000,000 records) from an existing database and compare it with a lot of other data (also about 2,000,000 records) which do not exist in the database. These comparisons will mostly be String comparisons. The amount of data will grow much bigger and therefore I need to know which solution will result in the best performance.
I have already searched the internet and came up with two solutions:
Solution 1
The application will execute a single query (SELECT column_name FROM table_name, for example) and store all the data in a DataTable. The application will then compare all the stored data with the input, and if there is a comparison it will be written to the database.
Pros:
The query will only be executed once. After that, I can use the stored data multiple times for all incoming records.
Cons:
As the database grows bigger, so will my RAM usage. Currently I have to work with 1 GB (I know, tough life) and I'm afraid it won't fit if I practically download the whole content of the database into it.
Processing all the data will take lots and lots of time.
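For reference, a rough sketch of what I mean by Solution 1; everything except the query itself is placeholder code:

using System.Data;
using System.Data.SqlClient;

public static DataTable LoadAllValues(string connectionString)
{
    var table = new DataTable();
    using (var connection = new SqlConnection(connectionString))
    using (var adapter = new SqlDataAdapter("SELECT column_name FROM table_name", connection))
    {
        adapter.Fill(table);   // executed only once
    }
    return table;
}

public static bool ExistsInTable(DataTable table, string value)
{
    foreach (DataRow row in table.Rows)
    {
        if (string.Equals(value, row[0] as string))
        {
            return true;   // matching record found, so the input gets written to the database
        }
    }
    return false;
}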
Solution 2
The application will execute a specific query for every record, for example
SELECT column_name FROM table_name WHERE value_name = value
and will then check whether the DataTable has any rows, something like
if (dataTable.Rows.Count > 0) { /* etc. */ }
If it has records, I can conclude there are matching records and I can write to the database.
Pros:
Probably a lot less usage of RAM since I will only get specific data.
Processing goes a lot faster.
Cons:
I will have to execute a lot of queries. If you are interested in numbers, it will probably be around 5 queries per record. With 2,000,000 records, that would be 10,000,000 queries.
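And for completeness, a rough sketch of Solution 2 with the query parameterized; again, everything except the query is placeholder code:

using System.Data;
using System.Data.SqlClient;

public static bool RecordExists(string connectionString, string value)
{
    var dataTable = new DataTable();
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "SELECT column_name FROM table_name WHERE value_name = @value", connection))
    {
        command.Parameters.AddWithValue("@value", value);
        using (var adapter = new SqlDataAdapter(command))
        {
            adapter.Fill(dataTable);
        }
    }
    return dataTable.Rows.Count > 0;   // matching record(s) found
}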
My question is, what would be the smartest option, given that I have limited RAM?
Any other suggestions are welcome as well, of course.
If you have SQL Server available to you, this seems like a job directly suited to SQL Server Integration Services. You might consider using that tool instead of building your own. It depends on your exact business needs, but in general, wouldn't data merging like this be a batch/unattended or tool-based operation?
You might be able to code it to run faster than SSIS, but I'd give it a try just to see if it's acceptable to you, and save yourself the cost of the custom development.
I need to compare particular content of 2 SQL tables located in different servers: Table1 and Table2.
I want to compare each row from Table1 against the whole content of Table2.
The comparison logic is kind of complicated, so I want to apply logic that I will write in C#; I don't want to do the comparison in the SQL query itself.
My concern is that the size of the data I will work on will be around 200 MB.
I was thinking to load the data into a DataTable by using ADO.Net and do the comparison on the memory.
What would you recommend? Is there already a pattern-like approach for comparing massive amounts of data?
200 MB should not be a problem. A .NET application can handle much more than that at once.
But even so, I would probably use a forward-only data reader for Table 1, just because there's no good reason not to, and that should reduce the amount of memory required. You can keep table 2 in memory with whatever structure you are accustomed to.
You can use two SqlDataReaders. They only hold one row in memory at a time, are forward-only, and are extremely efficient. After getting a row back from each reader you can then compare the values. Here is an example.
See MSDN.
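A rough sketch of the idea, assuming both queries return rows ordered by the same (string) key; all table, column and method names are placeholders:

using System;
using System.Data.SqlClient;

public static void CompareTables(string connectionString1, string connectionString2)
{
    using (var connection1 = new SqlConnection(connectionString1))
    using (var connection2 = new SqlConnection(connectionString2))
    {
        connection1.Open();
        connection2.Open();

        using (var command1 = new SqlCommand("SELECT KeyColumn, Payload FROM Table1 ORDER BY KeyColumn", connection1))
        using (var command2 = new SqlCommand("SELECT KeyColumn, Payload FROM Table2 ORDER BY KeyColumn", connection2))
        using (var reader1 = command1.ExecuteReader())
        using (var reader2 = command2.ExecuteReader())
        {
            // Only one row per reader is ever held in memory.
            // Simplified: assumes the two result sets line up row for row; a later answer
            // in this thread sketches a merge-style walk that handles mismatches.
            while (reader1.Read() && reader2.Read())
            {
                string key1 = reader1.GetString(0);
                string key2 = reader2.GetString(0);
                if (key1 == key2 && reader1.GetString(1) != reader2.GetString(1))
                {
                    Console.WriteLine("Row {0} differs between the two tables.", key1);
                }
                // The custom C# comparison logic from the question would go here instead.
            }
        }
    }
}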
The most scalable solution is to create SQLCLR functions to execute the comparisons you want.
You should probably avoid a row-by-row comparison at all costs. The network latency and delays due to round-tripping will result in extremely slow execution.
A quick-and-dirty solution is to extract the data to local files and then do the comparison there, as you will pay the network tax only once. Unfortunately, you lose the speedup provided by database indexes and query optimizations.
A similar solution is to load all the data once in memory and then use indexing structures like dictionaries to provide additional speedup. This is probably doable as your data can fit in memory. You still pay the network tax only once but gain from faster execution.
The most scalable solution is to create SQLCLR code to create one or more functions that will perform the comparisons you want. This way you avoid the network tax altogether, avoid creating and optimizing your own structures in memory and can take advantage of indexes and optimizations.
These solutions may not be applicable, depending on the actual logic of the comparisons you are doing. The first two rely on the data being sorted correctly.
1) Binary search. You can find the matching row in table 2 without scanning through all of table 2 by using a binary search; this will significantly reduce the number of comparisons.
2) If you are looking for overlaps/matches/missing rows between the two tables, you can sort both tables in the same order. Then you can loop through the two tables simultaneously, keeping a pointer to the current row of each table. If table 1 is "ahead" of table 2, you only increment the table 2 pointer until they are either equal or table 2 is ahead. Then, once table 2 is ahead, you start incrementing table 1 until it is ahead, and so on. In this way you only have to loop through each record from each table one time, and you are guaranteed that there were no matches that you missed (see the sketch after this list).
If the current rows of table 1 and table 2 match, then that is a match. While table 1 is ahead, every row you pass in table 2 is "missing" from table 1, and vice versa.
This solution would also work if you only need to take some action if the rows are in a certain range of each other or something.
3) If you have to actually do some action for every row in table 2 for every row in table 1, then it's just two nested loops, and there is not much you can do to optimize that other than making the comparison/work as efficient as possible. You could possibly multi-thread it, though, depending on what the work is and where your bottleneck is.
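Here is a minimal sketch of option 2; the keys are simplified to strings already loaded into sorted lists, so adapt the row type and comparison to your actual data:

using System;
using System.Collections.Generic;

public static void MergeCompare(List<string> table1Keys, List<string> table2Keys)
{
    // Both lists must already be sorted with the same (ordinal) comparer.
    int i = 0, j = 0;
    while (i < table1Keys.Count && j < table2Keys.Count)
    {
        int order = string.CompareOrdinal(table1Keys[i], table2Keys[j]);
        if (order == 0)
        {
            Console.WriteLine("Match: " + table1Keys[i]);
            i++; j++;
        }
        else if (order < 0)
        {
            // Table 1 is behind: this row is missing from table 2.
            Console.WriteLine("Missing from table 2: " + table1Keys[i]);
            i++;
        }
        else
        {
            // Table 2 is behind: this row is missing from table 1.
            Console.WriteLine("Missing from table 1: " + table2Keys[j]);
            j++;
        }
    }
    // Anything left over in either list is missing from the other table.
    for (; i < table1Keys.Count; i++) Console.WriteLine("Missing from table 2: " + table1Keys[i]);
    for (; j < table2Keys.Count; j++) Console.WriteLine("Missing from table 1: " + table2Keys[j]);
}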
Can you stage the data to the same database using a quick ETL/SSIS job? This would allow you to do set operations, which might be easier to deal with. If not, I would agree with the recommendations for a forward-only data reader with one table in memory.
A couple years ago I wrote a db table comparison tool, which is now an open-source project called Data Comparisons.
You can check out the source code if you want. There is a massive optimization you can make when the two tables you're comparing are on the same physical server, because you can write a SQL query to take care of this. I called this the "Quick compare" method in Data Comparisons, and it's available whenever you're sharing the same connection string for both sides of the comparison.
When they're on two different servers, however, you have no choice but to pull the data into memory and compare the rows there. Using SqlDataReaders would work. However, it's complicated when you must know exactly what's different (what rows are missing from table A or table B, what rows are different, etc). For that reason my method was to use DataTables, which are slower but at least they provide you with the necessary functionality.
Building this tool was a learning process for me. There are probably opportunities for optimization with the in-memory comparison. For example, loading the data into a Dictionary and doing your comparisons off of primary keys with Linq would probably be faster. You could even try Parallel Linq and see if that helps. And as Jeffrey L Whitledge mentioned, you might as well use a SqlDataReader for one of the tables while the other is stored in memory.
I am developing an application with Fluent NHibernate/NHibernate 3/SQLite. I have run into a very specific problem that I need help with.
I have a product database and a batch database. Products number around 100k, but batches are around the 11-million mark as of now. When given a product, I need to fill a ComboBox with its batches. As I do not want to load all the batches at once because of memory constraints, I am loading them directly from the database when the product is selected. But the problem is that SQLite (or maybe the combination of SQLite & NHibernate) is a little slow at this: it normally takes around 3+ seconds to retrieve the batches for a particular product. Although it might not seem like a slow scenario, I want to know whether I can improve this time. I need sub-second results to make order entry a smooth experience.
The details:
New products and batches are imported periodically (bi-monthly).
Nothing in the already persisted products or batches ever changes (no updates).
Storing products is not an issue. Batches are the main culprit.
Product Ids are long
Batch Ids are string
Batches contain 3 fields: rate, mrp (both decimal) & expiry (DateTime).
The requirements:
The data has to be stored in a file-based solution. I cannot use a client-server approach.
Storage time is not important. Search & retrieval time is.
I am open to storing the batch database using any other persistence model.
I am open to using anything like Lucene, a NoSQL database (like Redis), or an OODB, provided it is based on a single-file storage implementation.
Please suggest what I can use for fast object retrieval.
Thanks.
You need to profile, or at least narrow things down, to find out where those 3+ seconds are going.
Is it the database fetching?
Try running the same queries in a SQLite browser. Do the queries take 3+ seconds there too? Then you might need to do something with the database, like adding some good indexes.
Is it the filling of the combobox?
What if you only fill the first value in the combobox and throw away the others? Does that speed up performance? If so, you might try BeginUpdate and EndUpdate.
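For example, a small sketch assuming a WinForms ComboBox (control and variable names are placeholders); BeginUpdate suppresses repainting until all the items have been added:

// Inside the form:
private void FillBatchComboBox(System.Windows.Forms.ComboBox batchComboBox,
                               System.Collections.Generic.IEnumerable<string> batches)
{
    batchComboBox.BeginUpdate();
    try
    {
        batchComboBox.Items.Clear();
        foreach (var batch in batches)
        {
            batchComboBox.Items.Add(batch);
        }
    }
    finally
    {
        batchComboBox.EndUpdate();   // the control repaints only once, here
    }
}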
Are the 3+ seconds spent elsewhere? If so, find out where.
This may seem like a silly question, but I figured I'd double-check before proceeding to alternatives or other optimizations: is there an index (or, hopefully, a primary key) on the Batch Id column in your Batch table? Without indexes, those kinds of searches will be painfully slow.
For fast object retrieval, a key/value store is definitely a viable alternative. I'm not sure I would necessarily recommend Redis in this situation, since your batches database may be a little too large to fit into memory; although it also persists to disk, it's generally better suited to a dataset that fits entirely into memory.
My personal favourite would be mongodb - but overall the best thing to do would be to take your batches data, load it into a couple of different nosql dbs and see what kind of read performance you're getting and pick the one that suits the data best. Mongo's quite fast and easy to work with - and you could probably ditch the nhibernate layer for such a simple data structure.
There is a daemon that needs to run locally, but depending on the size of the DB it will be a single file (or a few files if it has to allocate more space). Again, ensure there is an index on your batch id column to ensure quick lookups.
3 seconds to load ~100 records from the database? That is slow. You should examine the generated sql and create an index that will improve the query's performance.
In particular, the ProductId column in the Batches table should be indexed.
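For example, assuming System.Data.SQLite sits underneath NHibernate, something like this one-off command would create the index (the exact table and column names depend on your mapping):

using System.Data.SQLite;

public static void EnsureBatchProductIndex(string connectionString)
{
    using (var connection = new SQLiteConnection(connectionString))
    using (var command = new SQLiteCommand(
        "CREATE INDEX IF NOT EXISTS IX_Batches_ProductId ON Batches (ProductId);", connection))
    {
        connection.Open();
        command.ExecuteNonQuery();
    }
}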
What is the best way to retrieve records from the database?
Currently we are grabbing all of them, caching them, and binding them to our GridView control. We incorporate paging using this control.
So what would be better: retrieving all the records like we are currently doing, or retrieving just the records needed using an index and row count?
That kind of depends on how much data you are talking about. A few dozen to a few hundred rows and your current solution will likely suffice. Start getting into several hundred to thousands and you may want to look into paging with the new features in SQL Server 2005, like ROW_NUMBER and ROWCOUNT.
Here's a small run through on it:
http://www.asp.net/LEARN/data-access/tutorial-25-cs.aspx
There are several ways to do it but this should get you started at least on considering what you should do.
You could even consider just capping how many records are returned by using the TOP syntax, if of course you are using SQL Server. We have done that before and informed users to refine their search if the maximum result count was reached.
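A rough sketch of that capping idea (all names are placeholders): ask for one row more than the cap so you know when to show the "refine your search" message:

using System.Data;
using System.Data.SqlClient;

public static DataTable Search(string connectionString, string searchTerm, int maxResults, out bool truncated)
{
    var results = new DataTable();
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "SELECT TOP (@top) ProductId, ProductName FROM Products WHERE ProductName LIKE @term + '%'", connection))
    {
        command.Parameters.AddWithValue("@top", maxResults + 1);
        command.Parameters.AddWithValue("@term", searchTerm);
        using (var adapter = new SqlDataAdapter(command))
        {
            adapter.Fill(results);
        }
    }

    truncated = results.Rows.Count > maxResults;
    if (truncated)
    {
        results.Rows.RemoveAt(results.Rows.Count - 1);   // drop the extra sentinel row
    }
    return results;
}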
You could throw together a quick test using the above SQL 2005 functionality to see how your performance does and decide from there.
Like klabranche said, it depends on the number of rows you're talking about. For up to a couple of hundred, your approach is probably fine.
If you're talking about thousands, one option is using the ASP.NET ObjectDataSource. It lets you specify separate methods for getting the row count and the actual rows for the current page:
http://msdn.microsoft.com/en-us/library/system.web.ui.webcontrols.objectdatasource.aspx
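A hedged sketch of the kind of data-access class the ObjectDataSource can page against; the class, connection string and column names are placeholders, and the markup would wire it up roughly with TypeName="ProductRepository", EnablePaging="true", SelectMethod="GetProducts" and SelectCountMethod="GetProductCount":

using System.Configuration;
using System.Data;
using System.Data.SqlClient;

public class ProductRepository
{
    private static string ConnectionString
    {
        get { return ConfigurationManager.ConnectionStrings["Catalog"].ConnectionString; }
    }

    // With EnablePaging="true", the ObjectDataSource passes these two parameters automatically
    // (their default names are startRowIndex and maximumRows).
    public DataTable GetProducts(int startRowIndex, int maximumRows)
    {
        const string sql = @"
            SELECT ProductId, ProductName
            FROM (
                SELECT ProductId, ProductName,
                       ROW_NUMBER() OVER (ORDER BY ProductName) AS RowNum
                FROM Products
            ) AS Paged
            WHERE RowNum BETWEEN @start + 1 AND @start + @rows;";

        var page = new DataTable();
        using (var connection = new SqlConnection(ConnectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@start", startRowIndex);
            command.Parameters.AddWithValue("@rows", maximumRows);
            using (var adapter = new SqlDataAdapter(command))
            {
                adapter.Fill(page);
            }
        }
        return page;
    }

    public int GetProductCount()
    {
        using (var connection = new SqlConnection(ConnectionString))
        using (var command = new SqlCommand("SELECT COUNT(*) FROM Products", connection))
        {
            connection.Open();
            return (int)command.ExecuteScalar();
        }
    }
}

That way only one page of rows is ever pulled per postback, and the GridView's pager still works as usual.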