This question is about finding a more efficient way to solve a simple problem. I have two DataTables with the same structure (i.e. the columns have the same names and ordinals). Call them DataTable A and DataTable B, and assume both have 100 rows. I want to copy all the rows of DataTable B into DataTable A without removing any rows from A, so that A ends up with 200 rows. I did it as shown below.
for (int i = 0; i < B.Rows.Count; i++)   // Count, not Count - 1, or the last row is skipped
{
    DataRow dr = B.Rows[i];
    // A row can belong to only one table, so add a copy of its values.
    A.Rows.Add(dr.ItemArray);
}
The issue is that I do not want to loop. Is there a direct way to copy the whole 100 rows at once, without looping? Is there a function that takes the set of rows you want to copy?
As far as I know, there is no other way of copying multiple rows from one DataTable to another than iterating through all the rows. In fact, there is an MSDN article telling you how to copy rows between DataTables, and it uses an iteration loop:
https://support.microsoft.com/en-gb/kb/305346
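That article's approach boils down to DataTable.ImportRow, which copies a row's values (and row state) into the target table while the original stays attached to its own table. A minimal sketch, using A and B from the question:

foreach (DataRow dr in B.Rows)
{
    // ImportRow copies the row into A; the original remains in B.
    A.ImportRow(dr);
}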
There is a problem with your simple approach: it doesn't handle primary key violations. Try BeginLoadData, LoadDataRow and EndLoadData instead; this should be more efficient. Call BeginLoadData and EndLoadData only once, outside the loop.
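A minimal sketch of that pattern, again copying B into A:

A.BeginLoadData();                      // suspends notifications, index maintenance and constraints
foreach (DataRow dr in B.Rows)
{
    A.LoadDataRow(dr.ItemArray, true);  // true = AcceptChanges on the loaded row
}
A.EndLoadData();                        // constraints are re-checked once, here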
If you just need a new independent DataTable instance to work with and do not need to append rows to an existing DataTable, then the DataView.ToTable() method is very convenient.
https://msdn.microsoft.com/en-us/library/a8ycds2f(v=vs.110).aspx
It creates a separate copy with the same schema and content.
DataTable objTableB = objTableA.DefaultView.ToTable();
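If you only need a subset, you can filter or sort the view before copying; RowFilter and Sort are standard DataView properties (the column names here are made up):

objTableA.DefaultView.RowFilter = "Status = 'Active'";
objTableA.DefaultView.Sort = "Name ASC";
DataTable objTableB = objTableA.DefaultView.ToTable();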
What is the best way to compare two DataTables? I have two DataTables with the same data; if a value changes in DataTable 2, I want to find the difference from DataTable 1 without checking row by row, for both time and memory reasons.
If you have two DataTable objects and want to check for differences between them, then you'll probably have to use a loop.
If you want to react to changes in the object, you can use the RowChanged event.
If you'd like to get the changes made since the table was loaded or AcceptChanges() was last called, use the GetChanges() method (see the sketch below).
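A minimal sketch of the last two options (the handler body is illustrative):

// React to edits as they happen.
table.RowChanged += (sender, e) =>
    Console.WriteLine("Row changed, action: " + e.Action);

// Or collect everything modified since the table was loaded
// or AcceptChanges() was last called.
DataTable changes = table.GetChanges();  // null when there are no changes
if (changes != null)
{
    // inspect or process the changed rows here
}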
A .Except() might be useful here. It produces the set difference of two sequences. Given two DataTables, mtbl holding the master data and dtbl holding the detail (new) data you're comparing against it:
var differences = dtbl.AsEnumerable()
    .Except(mtbl.AsEnumerable(), DataRowComparer.Default)
    .ToList();   // materialize once so Any() and CopyToDataTable() don't enumerate the query twice
return differences.Any() ? differences.CopyToDataTable() : new DataTable();
I have a DataTable object that I need to fill based on data stored in a stream of columns, i.e. the stream initially contains the schema of the DataTable, followed by the values that should go into it, organised by column.
At present, I'm taking the rather naive approach of:
Create enough empty rows to hold all the data values.
Fill those rows cell by cell.
The result is a per-cell iteration, which is not especially quick, to say the least.
That is:
// Create rows first...
// Then populate...
foreach (var col in table.Columns.Cast<DataColumn>())
{
List<object> values = GetValuesfromStream(theStream);
// Actual method has some DBNull checking here, but should
// be immaterial to any solution.
for (var i=0; i<values.Count; i++)
table.Rows[i][col] = values[i];
}
My guess is that the backing DataStorage items for each column aren't expanding when the rows are added, but only as values are assigned in each column, though I'm far from certain. Any tips for loading this kind of data?
NB that loading all lists first and then reading in by row is probably not sensible - this approach is being taken in the first place to mitigate potential out of memory exceptions that tend to result when serializing huge DataTable objects, so grabbing a clone of the entire data grid and reading it in would probably just move the problem elsewhere. There's definitely enough memory for the original table and another column of values, but there probably isn't for two copies of the DataTable.
Whilst I haven't found a way to avoid iterating cells, as per the comments above, I've found that writing to DataRow items that have already been added to the table turns out to be a bad idea, and was responsible for the vast majority of the slowdown I observed.
The final approach I used ended up looking something like this:
List<DataRow> rows = null;
// Start population...
var cols = table.Columns.Cast<DataColumn>().Where(c => string.IsNullOrEmpty(c.Expression));
foreach (var col in cols)
{
List<object> values = GetValuesfromStream(theStream);
// Create rows first if required.
if (rows == null)
{
rows = new List<DataRow>();
for (var i=0; i<values.Count; i++)
rows.Add(table.NewRow());
}
// Actual method has some DBNull checking here, but should
// be immaterial to any solution.
for (var i=0; i<values.Count; i++)
rows[i][col] = values[i];
}
rows.ForEach(r => table.Rows.Add(r));
This approach addresses two problems:
If you try to add an empty DataRow to a table that has null-restrictions or similar, then you'll get an error. This approach ensures all the data is there before it's added, which should address most such issues (although I haven't had need to check how it works with auto-incrementing PK columns).
Where expressions are involved, they are evaluated whenever the row state changes for a row that has been added to a table. Consequently, where before all expressions were recalculated every time a value was added to a cell (expensive and pointless), now all of the calculation takes place just once, after all the base data has been added.
There may of course be other complications with writing to a table that I've not yet encountered because the tables I am making use of don't use those features of the DataTable class/model. But for simple cases, this works well.
I've got a DataTable that I need to sort and place into another DataTable. On the face of it this is easy, as below:
DataTable sortme = getdata();
DataTable sorted = sortme.Select("col1 = 'something'", "sortbyme ASC").CopyToDataTable();
However, I've found that as soon as I pass the DataRow array created by Select() to CopyToDataTable(), the new DataTable is no longer sorted by sortbyme.
How do I fix this without a loop that pushes each DataRow into a DataTable? And what is causing the sort order to be lost?
There is already a question asked and answered about how to sort a DataTable's rows; please have a look at it, it might help you:
Sorting rows in a data table
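If you'd rather not go through Select() at all, one way to keep the order is to filter and sort via a DataView and let ToTable() build the result; a minimal sketch using the column names from the question:

DataView view = new DataView(sortme)
{
    RowFilter = "col1 = 'something'",
    Sort = "sortbyme ASC"
};
DataTable sorted = view.ToTable();  // rows are emitted in the view's sort order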
This is my first post on Stack Overflow.
I am reading millions of rows from a flat file (comma-delimited), iterating over each row as it is read, and then over each column of the row. The per-column iteration allows user-defined conversions, defaults, removal of special characters, etc. to be applied. The current implementation is very efficient.
The data is read in batches of 20k rows. When I'm processing a read row, I call NewRow() on my in-memory DataTable, then start iterating over each column to scrub its value. I'm trying to do as little work as possible while processing a row's columns.
My problem is this: if the value (text, in this case) read from the flat file is longer than the MaxLength of the targeted DataTable's DataColumn, I receive an exception saying so when I issue the following:
dataTable.Rows.Add(newRow);
Is there a way to tell ADO.NET (or my in-memory DataTable) to truncate the data instead of complaining?
Again, I can easily add logic in the loop to do this check/truncation for me, but those things add up when you're dealing with millions of rows of data.
Something like this should work:
var newRow = dataTable.NewRow();
...
...
// ColumnMaxLength could be read from dataTable.Columns["YourLimitedColumnName"].MaxLength.
if (YourText.Length <= ColumnMaxLength)
{
newRow["YourLimitedColumnName"] = YourText;
}
else
{
newRow["YourLimitedColumnName"] = YourText.Substring(0, ColumnMaxLength);
}
...
...
dataTable.Rows.Add(newRow);
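If you'd rather avoid the branch, the same truncation fits on one line:

newRow["YourLimitedColumnName"] =
    YourText.Substring(0, Math.Min(YourText.Length, ColumnMaxLength));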
I'm creating a HashMap mapping the ID field of each row in a DataTable to the row itself, to improve lookup time for some frequently accessed tables. Now, from time to time, I get a RowNotInTableException:
This row has been removed from a table and does not have any data. BeginEdit() will allow creation of new data in this row.
After looking around the net a bit, it seems that DataRows don't like not being attached to a DataTable. Even though the DataTable stays in memory (I'm not sure whether the DataRows keep a reference to it, but I'm definitely still caching it anyway), is it possible I'm breaking something by keeping those rows isolated in a HashMap? What other reason could there be for this error? This post
RowNotInTableException when accessing second time
discusses a similar problem but there's no solution either.
UPDATE
I'm actually storing DataRowViews if that makes any difference.
A DataRow is always attached to some DataTable. Even if it is removed from the DataTable, the row still holds a reference to the table.
The reason is that the schema of the table (and the data itself) lives in the DataTable, not in the DataRow.
If you want fast lookups without DataTables, use a structure of your own instead of DataRow.
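A minimal sketch of that idea, assuming an integer ID column and copying only the fields the lookups actually need (all names here are illustrative):

sealed class CachedRow
{
    public int Id;
    public string Name;  // copy whichever fields you actually read
}

var cache = table.AsEnumerable().ToDictionary(
    r => r.Field<int>("ID"),
    r => new CachedRow { Id = r.Field<int>("ID"), Name = r.Field<string>("Name") });

// The cache now holds no DataRow references, so a row being deleted
// from the table can no longer cause RowNotInTableException here.
CachedRow hit = cache[42];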