I am loading a large result set of about 3 million rows (an ADODB recordset) into a DataTable. It's taking too long to even load the result set into the DataTable. I want to find a way of extracting only part of the result set and then loading it into a DataTable. Alternatively, is there a way to read the recordset directly instead of loading it into a DataTable first and then reading it?
This is the code I use to fill my DataTable -
OleDbDataAdapter oleDA = new OleDbDataAdapter();
DataTable dt = new DataTable();
oleDA.Fill(dt, myADODBRecordset);
Here are some options to consider:
Get only the rows and columns you really need to work with.
Get some data and let the user ask for the next set of rows when they want to (see the paging sketch after this list).
Write optimized SQL queries.
Don't use a DataTable unless you have to, because it contains more metadata than other list-type objects.
Consider using a managed .NET provider.
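As a sketch of the second option (paging), the Fill overload that takes a start record and a maximum record count can load one page at a time. The query, table name, and connection string below are placeholders, and this assumes you can run a SELECT against the source yourself instead of going through the ADODB recordset:
using System.Data;
using System.Data.OleDb;

static DataTable LoadPage(string connectionString, int startRecord, int pageSize)
{
    using (var connection = new OleDbConnection(connectionString))
    using (var adapter = new OleDbDataAdapter("SELECT Id, Name FROM MyLargeTable", connection))
    {
        var ds = new DataSet();
        // Fill only pageSize rows starting at startRecord. The provider still reads and
        // discards the skipped rows, so a WHERE/TOP clause in the query is better still.
        adapter.Fill(ds, startRecord, pageSize, "MyLargeTable");
        return ds.Tables["MyLargeTable"];
    }
}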
Why load so much data into memory? A large data operation consumes a lot of resources, so optimize your code, or use EF.
Related
I have a table with millions of rows of data, and I would like to know which is the best way to query my data: using .ExecuteReader() or using a DataSet.
Using SqlDataReader like this:
myReader = cmd.ExecuteReader();
And then filling a list with the results.
Or using DataSet
using (SqlDataAdapter da = new SqlDataAdapter(cmd))
{
    da.Fill(ds);
}
Which is the best method?
The two objects are meant to be used in fundamentally different contexts.
A DataReader instance returned by ExecuteReader doesn't return anything until you loop over it with the Read() method. It is a connected object that keeps a pointer to the current record on the backend database. You read the contents of the record using the various GetXXXXX methods provided by the reader, or simply using the indexer. When you are done with the current record you move on to the next one with Read(). There is no way to go back or jump to record N + 100.
A DataSet, instead, is a disconnected object. It internally uses a DataReader to fill its local memory buffer with all the records returned by the command text query. It is handy if you need to work randomly on the returned data, display it on screen, or print it. But of course, waiting for millions of records to come back through the internal reader could be time consuming, and the local memory consumption will probably kill your process before the end.
So, which is the best? Neither. If you have millions of records in your table, you need to put an appropriate WHERE condition in place to reduce the number of records returned. That said, it depends on what you need to do with the returned data: to display it in a grid you could probably use a DataSet, while a DataReader is better if you need to process the records one by one.
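A minimal sketch of the two styles (assuming System.Data, System.Data.SqlClient, and System.Collections.Generic are imported, conn is an open SqlConnection, and the Customers query is a placeholder that is already filtered by a WHERE clause):
// Connected, forward-only: one record in memory at a time.
var names = new List<string>();
using (var cmd = new SqlCommand("SELECT Name FROM Customers WHERE Region = @r", conn))
{
    cmd.Parameters.AddWithValue("@r", "West");
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
            names.Add(reader.GetString(0));
    }
}

// Disconnected: all rows buffered locally, handy for binding to a grid or printing.
var ds = new DataSet();
using (var cmd = new SqlCommand("SELECT Name FROM Customers WHERE Region = @r", conn))
using (var da = new SqlDataAdapter(cmd))
{
    da.Fill(ds);
}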
The question is what you want to fill:
If you want to fill a DataSet/DataTable use DataAdapter.Fill(ds)
If you want to fill a list/array use a DataReader and a loop
The DataAdapter also uses a DataReader behind the scenes, but it loops over all records. You could add different logic to read only part of the result set, as sketched below.
"I have a table with million of rows": You should almost never need to return so many records. So don't filter in memory but in the database.
Both are good methods, but if you use a SqlDataReader you must close it. Otherwise you will not be able to execute any other query on the same connection until the SqlDataReader is closed.
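A short sketch of why that matters (conn is assumed to be a single open SqlConnection without MARS enabled, and the Orders table is a placeholder):
var reader = new SqlCommand("SELECT Id FROM Orders", conn).ExecuteReader();
// Executing another command on conn at this point throws an InvalidOperationException
// along the lines of "There is already an open DataReader associated with this Command
// which must be closed first."
reader.Close();   // or, better, wrap the reader in a using block
var count = new SqlCommand("SELECT COUNT(*) FROM Orders", conn).ExecuteScalar();   // fine now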
I have a group of data sets and they are all editable. What I need is a function that copies all of the information in the data tables and saves it back into the data set or as a separate XML file.
Ultimately, a DataTable is updatable. I can write
ds.Tables[0].Rows[0][0] = "Test"
(where ds is a DataSet) and that will update row 1, column 1 of the first DataTable within the DataSet. What I do from there with it is my choice, but the change is there in the in-memory copy of the DataSet.
If you're populating from a database via a DataAdapter, you can then call Update() to commit the changes back to the database, but if you're not in that scenario, then your change will remain in memory until you either dispose of the DataSet/s or go back off to the source to fetch them again.
I think the bottom line here is that you don't need a function to update DataTables, as they are inherently capable of being updated.
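To make that concrete, a hedged sketch of both routes (the table name and conn are placeholders, and the adapter route assumes the DataSet was originally filled from that table):
ds.Tables[0].Rows[0][0] = "Test";   // the in-memory edit from above

// Option 1: push the change back to the database through a DataAdapter.
using (var adapter = new SqlDataAdapter("SELECT * FROM MyTable", conn))
using (var builder = new SqlCommandBuilder(adapter))   // generates the UPDATE command
{
    adapter.Update(ds, ds.Tables[0].TableName);
}

// Option 2: save the edited DataSet as a separate XML file, schema included.
ds.WriteXml("MyDataSet.xml", XmlWriteMode.WriteSchema);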
I have a data source with 1.4+ million rows in it, and growing.
We make the users add filters to cut down the data returned, but you are still looking at roughly 43,000 to 100,000 rows at a time.
Before anyone says that no one can look at that many rows anyway: they are exported to an Excel workbook for calculations based on them.
I am loading the result into the GridView from the CSV file that is returned, as follows:
Object result = URIService.data;
CSVReader csvReader = new CSVReader(result);
DataTable dataTable = csvReader.CreateDataTable(true, true);
if (dataTable != null)
{
    gridView1.BeginUpdate();
    gridView1.DataSource = dataTable;
    gridView1.DataBind();
    gridView1.EndUpdate();
}
else
{
    return;
}
CSVReader is a CSV Parser.
My question is, is this the best and most efficient way to load a large data set to a gridview?
EDIT: Would using a list for the rows or something other than a data table be better?
I think there is only one way to load a large data set into a grid view, and it is the one you are using right now. If you want better performance, though, I highly recommend using pagination so that only a chunk of data is loaded on each page, which will decrease the loading time (a sketch of the wire-up follows the links below).
http://sivanandareddyg.blogspot.com/2011/11/efficient-server-side-paging-with.html
http://www.codeproject.com/Articles/125541/Effective-Paging-with-GridView-Control-in-ASP-NET
https://web.archive.org/web/20211020140032/https://www.4guysfromrolla.com/articles/031506-1.aspx
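A hedged sketch of that wire-up, assuming an ASP.NET WebForms GridView declared with AllowPaging="true" and PageSize="100", where LoadCsvDataTable() is a placeholder wrapping the CSVReader call from the question:
protected void gridView1_PageIndexChanging(object sender, GridViewPageEventArgs e)
{
    gridView1.PageIndex = e.NewPageIndex;
    gridView1.DataSource = LoadCsvDataTable();   // ideally cached between postbacks, not re-parsed
    gridView1.DataBind();
}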
Did you try using a buffered renderer?
In the case of SQL Server, use the SqlBulkCopy class to copy large amounts of data at the highest speed.
I need to copy large resultset from one database and save it to another database.
Stored procedures are used for both fetching and storing because there is some logic involved during saving.
I'm trying to find an efficient solution; there is no way I can hold the whole dataset in memory, and I would like to minimize the number of round trips.
Data is read from the source table with:
var reader = fetchCommand.ExecuteReader();
while (reader.Read()){...}
Is there a way to feed this data to another SqlCommand without loading the whole dataset into a DataTable, but also without inserting rows one by one?
Both the source and target databases are MS SQL Server 2008, and they are on different servers. Use of SSIS or linked servers is not an option.
EDIT:
It appears it's possible to stream rows into a stored procedure using table-valued parameters. I will investigate this approach as well.
UPDATE:
Yes, it's possible to stream data from command.ExecuteReader() to another command like this:
var reader = selectCommand.ExecuteReader();
insertCommand.Parameters.Add(
    new SqlParameter("@data", reader) { SqlDbType = SqlDbType.Structured }
);
insertCommand.ExecuteNonQuery();
Where insertCommand is a stored procedure with a table-valued parameter @data.
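A slightly fuller sketch of the same pattern; the stored procedure dbo.SaveRows, the table type dbo.MyRowType, the column list, and the two connections are assumptions for illustration:
using (var selectCommand = new SqlCommand("SELECT Id, Name FROM dbo.SourceTable", sourceConnection))
using (var reader = selectCommand.ExecuteReader())
using (var insertCommand = new SqlCommand("dbo.SaveRows", targetConnection))
{
    insertCommand.CommandType = CommandType.StoredProcedure;
    var tvp = insertCommand.Parameters.Add(
        new SqlParameter("@data", reader) { SqlDbType = SqlDbType.Structured });
    tvp.TypeName = "dbo.MyRowType";   // the table type declared for the proc's @data parameter
    insertCommand.ExecuteNonQuery();  // rows are streamed from the reader as they are consumed
}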
You need SqlBulkCopy. You can just use it like this:
using (var reader = fetchCommand.ExecuteReader())
using (var bulkCopy = new SqlBulkCopy(myOtherDatabaseConnection))
{
    bulkCopy.DestinationTableName = "...";
    bulkCopy.ColumnMappings.Add(...);   // ColumnMappings is read-only; add one mapping per column
    bulkCopy.WriteToServer(reader);
}
There is also a property to set the batch size. Something like 1000 rows might give you the best trade-off between memory usage and speed.
Although this doesn't let you pipe it into a stored procedure, the best approach might be to copy the data to a temporary table and then run a bulk update command on the server to copy the data into its final location. This is usually far faster than executing lots of separate statements for each row.
You can use SqlBulkCopy with a data reader, which does roughly what you are asking (non-buffered, etc.); however, this won't be calling stored procedures to insert. If you want that, perhaps use SqlBulkCopy to push the data into a second table (same structure), then, at the DB server, loop over the rows calling the sproc locally. That way, latency ceases to be an issue (as the loop runs entirely at the DB server).
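One possible wiring of that two-step idea (the staging table dbo.Staging_Rows and the server-side proc dbo.ProcessStagedRows are illustrative names, and targetConnection stands in for an open connection to the destination database):
using (var reader = fetchCommand.ExecuteReader())
using (var bulkCopy = new SqlBulkCopy(targetConnection))
{
    bulkCopy.DestinationTableName = "dbo.Staging_Rows";
    bulkCopy.WriteToServer(reader);            // stream straight from the reader, nothing buffered
}

// Then let the server apply the per-row saving logic close to the data.
using (var process = new SqlCommand("dbo.ProcessStagedRows", targetConnection))
{
    process.CommandType = CommandType.StoredProcedure;
    process.CommandTimeout = 0;                // the server-side loop may take a while
    process.ExecuteNonQuery();
}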
C# with .net 2.0 with a SQL server 2005 DB backend.
I've a bunch of XML files which contain data along the lines of the following, the structure varies a little but is more or less as follows:
<TankAdvisory>
<WarningType name="Tank Overflow">
<ValidIn>All current tanks</ValidIn>
<Warning>Tank is close to capacity</Warning>
<IssueTime Issue-time="2011-02-11T10:00:00" />
<ValidFrom ValidFrom-time="2011-01-11T13:00:00" />
<ValidTo ValidTo-time="2011-01-11T14:00:00" />
</WarningType>
</TankAdvisory>
I have a single DB table that has all the above fields ready to be filled.
When I use the following method of reading the data from the XML file:
DataSet reportData = new DataSet();
reportData.ReadXml("../File.xml");
It successfully populates the Dataset but with multiple tables. So when I come to use SQLBulkCopy I can either save just one table this way:
sbc.WriteToServer(reportData.Tables[0]);
Or, if I loop through all the tables in the DataSet and add each of them, a new row is added in the database for each table, when in actuality they should all be stored in one row.
Then of course there's also the issue of column mappings; I'm thinking that maybe SQLBulkCopy is the wrong way of doing this.
What I need to do is find a quick way of getting the data from that XML file into the Database under the relevant columns in the DB.
OK, so the original question is a little old, but I have just come across a way to resolve this issue.
All you need to do is loop through all the DataTables in your DataSet and add their columns to the one DataTable that has all of the columns of the table in your DB, like so...
DataTable dataTable = reportData.Tables[0];
// Second DataTable
DataTable dtSecond = reportData.Tables[1];
foreach (DataColumn myCol in dtSecond.Columns)
{
    sbc.ColumnMappings.Add(myCol.ColumnName, myCol.ColumnName);
    dataTable.Columns.Add(myCol.ColumnName);
    dataTable.Rows[0][myCol.ColumnName] = dtSecond.Rows[0][myCol];
}
// Finally perform the BulkCopy
sbc.WriteToServer(dataTable);
If the tables contain more than one row, copy every row across instead of just the first (this assumes the first table already holds at least as many rows as the second):
foreach (DataColumn myCol in dtSecond.Columns)
{
    dataTable.Columns.Add(myCol.ColumnName);
    for (int intRowcnt = 0; intRowcnt <= dtSecond.Rows.Count - 1; intRowcnt++)
    {
        dataTable.Rows[intRowcnt][myCol.ColumnName] = dtSecond.Rows[intRowcnt][myCol];
    }
}
SqlBulkCopy is for many inserts. It's perfect for those cases where you would otherwise generate a lot of INSERT statements and juggle the limit on the total number of parameters per batch. The thing about the SqlBulkCopy class, though, is that it's cranky: unless you fully specify all column mappings for the data set it will throw an exception.
I'm assuming that your data is quite manageable since you're reading it into a DataSet. If you were to have even larger data sets, you could lift chunks into memory and then flush them to the database piece by piece. But if everything fits in one go, it's as simple as that.
SqlBulkCopy is the fastest way to put data into the database. Just set up column mappings for all the columns; otherwise it won't work.
Why reinvent the wheel? Use SSIS. Read with an XML Source, transform with one of the many transformations, then load it with an OLE DB Destination into the SQL Server table. You will never beat SSIS in terms of runtime, speed to deploy the solution, maintenance, error handling, etc.