OK, the story so far: I have a DataTable of about 10,000 rows or so, with about 150 columns per row, so more or less 150,000 cells in this DataTable. I have all the updating working fine,
but the updating is slow.
I need to iterate through a list of procedures and then update cells in the table depending on the procedure. When I am completely finished updating, about 75% - 80% of all the cells will have changed.
I am searching the table using a primary key index assigned to an INT value.
DataTable.Rows.Find() seems a little quicker.
DataTable.Select(expression) is almost the same, with only a little difference.
Are there any ideas that might speed this up? Changing 80,000 - 120,000 cells can take minutes.
Any ideas would be great, thanks.
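For reference, the two lookup styles I'm comparing look more or less like this ("ID" and "Amount" are stand-ins for my real column names):

    using System.Data;

    class CellUpdater
    {
        static void UpdateByFind(DataTable table, int key, object newValue)
        {
            // Rows.Find uses the primary-key index directly.
            DataRow row = table.Rows.Find(key);
            if (row != null)
                row["Amount"] = newValue;
        }

        static void UpdateBySelect(DataTable table, int key, object newValue)
        {
            // Select parses and evaluates a filter expression on every call.
            DataRow[] matches = table.Select("ID = " + key);
            if (matches.Length > 0)
                matches[0]["Amount"] = newValue;
        }
    }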
A study in the March 2005 issue of ASP.Net Pro magazine compared various approaches involving DataTables, DataViews and DataReaders. Their findings were that the fastest approach depended upon the number of records involved.
For 50 records or less, by far the fastest search method was a For..Next loop on the DataTable's DataRowCollection. That approach was followed by DataRowCollection.Find. Many times slower were re-retrieving the data with a DataReader, using DataView.RowFilter, and worst of all using DataTable.Select.
For 500 - 5,000 records, the fastest search was with DataRowCollection.Find, followed closely by DataTable.Select. The worst by far for this range of records were DataView.RowFilter and DataView.FindRows.
For 50,000 records, the fastest search was accomplished with DataRowCollection.Find. In a close second place was re-retrieving the records with a DataReader. The worst by far for this category were DataView.RowFilter and DataView.FindRows.
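If you want to check how that ranking holds up for your own data, a rough Stopwatch harness along these lines is easy to run (the schema below is just a stand-in):

    using System;
    using System.Data;
    using System.Diagnostics;

    class FindVsSelectTiming
    {
        static void Main()
        {
            // Build a sample table with an INT primary key, roughly like the one described.
            var table = new DataTable();
            table.Columns.Add("ID", typeof(int));
            table.Columns.Add("Value", typeof(string));
            table.PrimaryKey = new[] { table.Columns["ID"] };
            for (int i = 0; i < 10000; i++)
                table.Rows.Add(i, "row " + i);

            var sw = Stopwatch.StartNew();
            for (int i = 0; i < 10000; i++)
                table.Rows.Find(i);                 // primary-key index lookup
            Console.WriteLine("Find:   " + sw.ElapsedMilliseconds + " ms");

            sw.Restart();
            for (int i = 0; i < 10000; i++)
                table.Select("ID = " + i);          // filter-expression lookup
            Console.WriteLine("Select: " + sw.ElapsedMilliseconds + " ms");
        }
    }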
This may be a dumb question, but I wanted to be sure. I am creating a WinForms app and using a C# OleDbConnection to connect to an MS Access database. Right now, I am using "SELECT * FROM table_name" and looping through each row to see if it is the row with the criteria I want, then breaking out of the loop if it is. I wonder if performance would be improved if I used something like "SELECT * FROM table_name WHERE id=something" - so basically using a WHERE clause instead of looping through every row?
The best way to validate the performance of anything is to test. Otherwise, a lot of assumptions are made about what is the best versus the reality of performance.
With that said, using a WHERE clause will be better than retrieving the data and then filtering via a loop essentially 100% of the time. This is for a few different reasons, but ultimately you are filtering the rows at the database before they are retrieved, versus retrieving every row and then filtering them out afterwards. Relational data should be dealt with according to set logic, which is how a WHERE clause works: it operates on the data set. The loop is not set logic; it compares each individual row, expensively, and discards those that don't meet the criteria.
Don’t take my word for it though. Try it out. Especially try it out when your app has a lot of data in the table.
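For example, a parameterized WHERE query through OleDb might look roughly like this (the connection string, table and column names are placeholders):

    using System.Data.OleDb;

    class WhereClauseLookup
    {
        static bool RowExists(string connectionString, int id)
        {
            using (var conn = new OleDbConnection(connectionString))
            using (var cmd = new OleDbCommand(
                "SELECT * FROM table_name WHERE id = ?", conn))
            {
                // OLE DB uses positional parameters; the name here is only a label.
                cmd.Parameters.AddWithValue("@id", id);
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    // Only the matching row crosses the wire; no client-side loop needed.
                    return reader.Read();
                }
            }
        }
    }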
Yes, of course.
Say you have an Access database file shared on a folder, and you deploy your .NET desktop application to each workstation.
And furthermore, say the table has 1 million rows.
If you do this:
SELECT * FROM tblInvoice WHERE InvoiceNumber = 123245
Then ONLY one row is pulled down the network pipe - and this holds true EVEN if the table has 1 million rows. Traversing and pulling 1 million rows is going to take a HUGE amount of time, but if you add criteria to your SELECT, then in this case it would be about 1 million times faster to pull one row as opposed to the whole table.
And what if this is or was multi-user? Then again, even over a network, ONLY the ONE record that meets your criteria will be pulled. The only requirement for this one-row pull over the network? The Access data engine needs a usable index on the criteria column. By default the PK column (ID) always has that index - so no worries there. But if, as per the above, we are pulling invoice numbers from a table, then having an index on that column (InvoiceNumber) is required for the data engine to pull only one row. If no index can be used, then behind the scenes all rows are pulled until a match occurs - and over a network this means significant amounts of data will be pulled without that index (or, if local, pulled from the file on disk).
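If InvoiceNumber doesn't already have an index, a one-time piece of DDL can add it; here is a sketch through OleDb (the index name and connection string are assumptions):

    using System.Data.OleDb;

    class InvoiceIndexSetup
    {
        static void EnsureIndex(string connectionString)
        {
            using (var conn = new OleDbConnection(connectionString))
            using (var cmd = new OleDbCommand(
                // One-time DDL so the data engine can seek on InvoiceNumber instead of scanning.
                "CREATE INDEX idxInvoiceNumber ON tblInvoice (InvoiceNumber)", conn))
            {
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }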
I know there have been many discussions about what is fast when we talk about fetching data from a DB. But still, my problem is that I need to load a huge amount of data (between 20,000 and 5 million rows, depending on the conditions the user executes) in order to export it all to Excel worksheets from a DataGridView.
I know I could set up query pagination in the DataGridView, and I have already tested that (which works nicely and fast, by the way), but the underlying problem of exporting to Excel still exists, so I need to fetch everything. Currently what I have done is load the data on a different thread, so that the UI doesn't freeze while fetching.
The query in the DB is as optimized as it gets - unfortunately it's not a simple one, since it even uses DB links with joins to produce the final result set. From the DB side there is nothing else we can do.
I have already tested everything I could think of - working with the DataAdapter.Fill and DataReader.Load methods, with DataSets, DataTables and even List<T>. I did all of this to see the difference in performance (speed vs. memory usage).
The results were actually a bit surprising to me - with all the talk about DataReader.Load's speed over DataAdapter.Fill, and filling a List<T> over a DataTable, I came to the conclusion that in my case the usual claims don't quite apply:
1.) Filling a List<T> was slower than filling a DataTable, but with lower memory consumption;
2.) DataReader.Load vs. DataAdapter.Fill makes almost no difference at all - sometimes the first wins, sometimes not, and the differences are a couple of seconds.
Anyway, nothing I tried has produced the desired results, since loading about 2 million rows takes around 5 minutes (in the best case).
Is there anything I have missed that beats all these methods when it comes to filling data in terms of speed? Maybe some caching?
Thanks in advance for any advice!
P.S.: I would be satisfied with just a plain answer on where to start; that's why I haven't included my code.
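For reference, the bare-bones shape of the Fill vs. Load comparison described above looks like this (the SqlClient provider, connection string and query are placeholder assumptions; swap in the actual provider and query):

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Diagnostics;

    class FillVsLoadTiming
    {
        static void Main()
        {
            const string connStr = "...";          // placeholder connection string
            const string sql = "SELECT ...";       // placeholder query

            var sw = Stopwatch.StartNew();
            var adapterTable = new DataTable();
            using (var conn = new SqlConnection(connStr))
            using (var adapter = new SqlDataAdapter(sql, conn))
            {
                adapter.Fill(adapterTable);        // DataAdapter.Fill path
            }
            Console.WriteLine("Fill: " + sw.Elapsed);

            sw.Restart();
            var readerTable = new DataTable();
            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand(sql, conn))
            {
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                    readerTable.Load(reader);      // DataReader.Load path
            }
            Console.WriteLine("Load: " + sw.Elapsed);
        }
    }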
I have two huge data tables with 300 columns and 100,000 rows each. I want to compare them cell by cell and show the result in a third data table: if the cells match, show 1 in the result; if they don't match, show 0. I used a for loop, but it was very slow and took a lot of time. Can anyone help, please?
You can follow the links below:
http://canlu.blogspot.in/2009/05/how-to-compare-two-datatables-in-adonet.html
https://www.dotnetperls.com/datatable-compare-rows
The only possible solution is looping, but the two links above show some built-in collections that may ease the looping and improve performance.
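For what it's worth, a minimal sketch of that comparison loop, assuming both tables share the same schema and row order:

    using System.Data;

    class TableComparer
    {
        // Returns a table of the same shape containing 1 where cells match, 0 where they differ.
        static DataTable Compare(DataTable t1, DataTable t2)
        {
            var result = new DataTable();
            foreach (DataColumn col in t1.Columns)
                result.Columns.Add(col.ColumnName, typeof(int));

            for (int r = 0; r < t1.Rows.Count; r++)
            {
                var resultRow = result.NewRow();
                object[] a = t1.Rows[r].ItemArray;   // one array fetch per row is cheaper
                object[] b = t2.Rows[r].ItemArray;   // than indexing cell by cell
                for (int c = 0; c < a.Length; c++)
                    resultRow[c] = Equals(a[c], b[c]) ? 1 : 0;
                result.Rows.Add(resultRow);
            }
            return result;
        }
    }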
First of all, you need to provide some code and a sample of the expected result.
If you have a table with 300 columns, I think you have broken some fundamental database normalization design rule.
If you want the result as t1.c1 = t2.c2, etc., you could try to perform this in a query with a join, which would be more performant than looping through every column of every row.
I am creating an application where I have n rows (in reality it will be 1 million+ rows) and need to know which you think is better to implement.
The goal is that I need to iterate through each row and, for each of those rows, do some cool things... however, there are two ways to do this.
First way:
SELECT * FROM table and load them all into a list...
Second way:
SELECT ... LIMIT 1 and load one row, do the work on that row, then redo the SELECT ... LIMIT 1 each time.
Which way is better, taking into consideration that performance is important, memory is no problem, and constant lookups are not very expensive?
Just load a batch of 1,000 records at a time, process them, then load the next 1,000. This will reduce the number of connections being set up and torn down and the number of queries sent to the database server.
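A sketch of that batching pattern, using keyset paging on an id column (MySQL-style LIMIT is assumed to match the SELECT ... LIMIT syntax in the question; the table, columns and connection string are placeholders):

    using MySql.Data.MySqlClient;

    class BatchProcessor
    {
        static void ProcessInBatches(string connectionString)
        {
            long lastId = 0;
            bool more = true;
            using (var conn = new MySqlConnection(connectionString))
            {
                conn.Open();
                while (more)
                {
                    more = false;
                    using (var cmd = new MySqlCommand(
                        "SELECT id, payload FROM table_name WHERE id > @lastId ORDER BY id LIMIT 1000",
                        conn))
                    {
                        cmd.Parameters.AddWithValue("@lastId", lastId);
                        using (var reader = cmd.ExecuteReader())
                        {
                            while (reader.Read())
                            {
                                more = true;
                                lastId = reader.GetInt64(0);   // assumes a numeric id column
                                // ... do the "cool things" with this row here ...
                            }
                        }
                    }
                }
            }
        }
    }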
Nightly, I need to fill a SQL Server 2005 table from an ODBC source with over 8 million records. Currently I am using an insert from a linked server with syntax similar to this:
INSERT INTO SQLStagingTable SELECT * FROM OPENQUERY(ODBCSource, 'SELECT * FROM SourceTable')
This is really inefficient and takes hours to run. I'm in the middle of coding a solution using SqlBulkCopy, similar to the code found in this question.
The code in that question first populates a DataTable in memory and then passes that DataTable to SqlBulkCopy's WriteToServer method.
What should I do if the populated DataTable uses more memory than is available on the machine it is running on (a server with 16 GB of memory in my case)?
I've thought about using the overloaded OdbcDataAdapter.Fill method, which allows you to fill only the records from x to n (where x is the start index and n is the number of records to fill). However, that could turn out to be an even slower solution than what I currently have, since it would mean re-running the SELECT statement on the source a number of times.
What should I do? Just populate the whole thing at once and let the OS manage the memory? Should I populate it in chunks? Is there another solution I haven't thought of?
The easiest way would be to use ExecuteReader() against your ODBC data source and pass the IDataReader to the WriteToServer(IDataReader) overload.
Most data reader implementations keep only a very small portion of the total results in memory.
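A rough sketch of that approach (the ODBC connection string, source query and destination table name are assumptions based on the question):

    using System.Data.Odbc;
    using System.Data.SqlClient;

    class OdbcToSqlBulkLoad
    {
        static void Copy(string odbcConnStr, string sqlConnStr)
        {
            using (var source = new OdbcConnection(odbcConnStr))
            using (var cmd = new OdbcCommand("SELECT * FROM SourceTable", source))
            {
                source.Open();
                using (var reader = cmd.ExecuteReader())
                using (var bulk = new SqlBulkCopy(sqlConnStr))
                {
                    bulk.DestinationTableName = "SQLStagingTable";
                    bulk.BatchSize = 10000;       // commit in chunks rather than one huge batch
                    bulk.BulkCopyTimeout = 0;     // no timeout for a long-running nightly load
                    // Rows stream from the reader straight into SQL Server;
                    // the full 8 million rows never sit in memory at once.
                    bulk.WriteToServer(reader);
                }
            }
        }
    }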
SSIS performs well and is very tweakable. In my experience 8 million rows is not out of its league. One of my larger ETLs pulls in 24 million rows a day and does major conversions and dimensional data warehouse manipulations.
If you have indexes on the destination table, you might consider disabling them until the records have been inserted.
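If you try that, one way is to disable a nonclustered index before the load and rebuild it afterwards; here is a sketch with hypothetical index and table names:

    using System.Data.SqlClient;

    class IndexToggle
    {
        static void Run(string sqlConnStr, string sql)
        {
            using (var conn = new SqlConnection(sqlConnStr))
            using (var cmd = new SqlCommand(sql, conn))
            {
                conn.Open();
                cmd.CommandTimeout = 0;   // rebuilds on large tables can take a while
                cmd.ExecuteNonQuery();
            }
        }

        static void LoadWithIndexDisabled(string sqlConnStr)
        {
            // Disable a (hypothetical) nonclustered index, bulk load, then rebuild it.
            Run(sqlConnStr, "ALTER INDEX IX_SQLStagingTable_SomeColumn ON dbo.SQLStagingTable DISABLE");
            // ... perform the SqlBulkCopy load here ...
            Run(sqlConnStr, "ALTER INDEX IX_SQLStagingTable_SomeColumn ON dbo.SQLStagingTable REBUILD");
        }
    }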