This question might be silly, but I want to understand how the DataTable.Copy() method works. It creates a deep copy of the original DataTable, but does it always maintain the order of the DataRows from the original table? In my testing it always creates the rows in the same order as the original table. Is there any chance that the order of rows in the copied table can differ from the original?
You are correct that DataTable.Copy() returns a deep copy. MSDN does not explicitly guarantee row order, but if the order were different, the result would by definition not be a copy.
If you are curious what the current implementation does, have a look at the Microsoft Reference Source for DataTable.Copy(). You will see that it first performs a Clone() and then copies the rows one by one, so yes, the current implementation preserves row order.
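In rough outline, the implementation does something like the following (a simplified sketch, not the actual framework code, which uses internal helpers; ImportRow approximates the per-row copy):

using System.Data;

static DataTable CopySketch(DataTable source)
{
    DataTable destination = source.Clone();   // copies schema only, no rows
    foreach (DataRow row in source.Rows)      // rows are enumerated in table order
    {
        destination.ImportRow(row);           // preserves values and RowState
    }
    return destination;
}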
I have a DataGridView. It uses a BindingSource, a DataTable in a DataSet, and a TableAdapter to add/change/delete data in a table. It worked fine, but it stopped working when I added a field/column, and I can't figure out what I did or how to fix it.
The user can add a new row at the bottom of the DataGridView, but when he goes to save, the row disappears and is not saved. In addition, if he tries to type a second new row, the first new row disappears.
Existing Rows can be changed and saved back to the database successfully.
I've been asked for code, so here it is. (I've eliminated some error checking done by scanning dtDep.) The point is that after the third line is executed, there are no rows in dtDep, even though a new row had been entered into the DataGridView. If a row had been retrieved, it would be in dtDep and the database table would be updated by the last statement.
this.Validate();
bsBelkDep.EndEdit();                               // commit any pending edit in the grid to the DataTable
DataTable dtDep = dsBelk.Tables["belk_elig_dep"];  // at this point dtDep contains no rows
int n = belk_elig_depTableAdapter.Update(this.dsBelk.belk_elig_dep);
It was a problem with the DataGridView, but I don't know what. I started deleting and re-creating the various objects, and after I re-created the DataGridView, it worked OK. That was a pain because I had to do significant reformatting, but at least it works.
This is a very old question and I have no way of knowing if it was the OP's original problem, but I had the exact same scenario and this is how I resolved it.
For background: I have a WinForms application built using datasets and an Access database. I migrated it to use SQLite and to move away from datasets entirely. To avoid destroying the application completely, I first copied the strongly typed data tables out, tweaked them to account for changes in the schema, and then used PetaPoco to perform the data operations. That worked fine for a single test conversion.
The trouble arose when I wanted to move on and convert all the data tables - I wasn't happy manually writing the logic for converting between typed data rows and POCOs, so I fell back to writing old-school T4 templates to generate the typed DataTable and DataRow classes and the necessary remapping code.
Worked a treat - for editing or removing data. But new rows disappeared on "creation", the binding navigator count didn't increment, and of course, when saving, I didn't detect any rows with a RowState of DataRowState.Added. The grid at start-up was subtly different - a blank value in all columns instead of a negative number in the ID column. In hindsight, that should have been a big clue.
On reverting to the manually extracted typed class, the grid started working again, so it was clearly an error in the new code.
End of background; tl;dr:
The cause of the issue, in my case, was that my Id column didn't have the AutoIncrement property set. As soon as I set it to true (along with setting AutoIncrementSeed and AutoIncrementStep to -1, although neither is required), new rows started being correctly added to the table.
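For reference, the column configuration that fixed it for me looked roughly like this (table and column names are illustrative, not my actual schema):

using System.Data;

var table = new DataTable("MyTable");
var idColumn = new DataColumn("Id", typeof(int))
{
    AutoIncrement = true,       // this was the missing piece
    AutoIncrementSeed = -1,     // optional: new in-memory rows get -1, -2, ...
    AutoIncrementStep = -1,     // ...so they never collide with real database keys
    ReadOnly = true
};
table.Columns.Add(idColumn);
table.PrimaryKey = new[] { idColumn };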
How do I make all columns allow null before adding a new row to the DataTable?
dt.Rows.Add(dt.NewRow());
This line throws an exception:
Column XXX does not allow nulls
How can I fix this problem?
You don't add the row until it's filled and ready to be saved.
DataRow row = dt.NewRow();
// ... fill the row's required columns here, e.g. row["XXX"] = someValue; ...
dt.Rows.Add(row);
Usually the database designer specifies that a column can't be null for a reason. For example, it might be a primary key or a foreign key, or is otherwise mandatory information.
If you are sure that it is OK to provide no data for this column for this particular record, try passing an empty string.
It is usually better to initialize the whole row in memory before sending it to the database (instead of filling it field-by-field when it is already there).
However, if you absolutely must do that, and if your DBMS supports it, you can declare your NOT NULL constraints as deferred, so they are not checked until the transaction commits. Here is an Oracle example.
It's rarely a good solution. Start by redesigning your database.
Consider removing the NOT NULL constraints from all fields that don't genuinely need them.
Also, if a field is obligatory but you still don't want to fill it during row creation, set a default value, either in the database or in the middle layer (ORM or whatever).
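If you control the DataTable schema in code, both suggestions map directly to DataColumn properties. A minimal sketch, assuming made-up column names:

using System.Data;

var dt = new DataTable("Example");
dt.Columns.Add(new DataColumn("Name", typeof(string))
{
    AllowDBNull = true              // this column now accepts nulls
});
dt.Columns.Add(new DataColumn("Status", typeof(string))
{
    AllowDBNull = false,
    DefaultValue = "Pending"        // mandatory, but pre-filled by NewRow()
});

dt.Rows.Add(dt.NewRow());           // succeeds: Status gets its default, Name stays null

Keep in mind this only relaxes the in-memory check; the database's own NOT NULL constraints still apply when you save.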
edit:
However, in this case it looks like you're just trying to pass an empty row to the DB before initializing it with data. That will never work ;-)
Imagine a table and a button to add new rows to the table. On each click to the button, a new row will be inserted at the end of the table. The button event is functioning as follows:
First of all, it picks a reference row to copy.
Whatever controls and text are inside this reference row are copied to a DataTable. Since a DataTable cannot hold controls, I convert them to strings and save them like that.
At the end, the DataTable is stored in a cache.
Finally, on each Page_Init event I re-create the table using the data inside the DataTable. Everything works fine.
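In outline, the pattern looks like this (a simplified sketch with made-up names; the real class is longer):

using System;
using System.Data;

public partial class TablePage : System.Web.UI.Page
{
    private const string CacheKey = "Table1Data";   // one cache key per table on the page

    protected void Page_Init(object sender, EventArgs e)
    {
        // Re-create the table from the cached DataTable on every request.
        var dt = Cache[CacheKey] as DataTable;
        if (dt != null)
        {
            RebuildTable(dt);
        }
    }

    protected void AddRowButton_Click(object sender, EventArgs e)
    {
        var dt = (DataTable)Cache[CacheKey];
        DataRow newRow = dt.NewRow();
        // ... copy the reference row's control values as strings into newRow ...
        dt.Rows.Add(newRow);
        Cache[CacheKey] = dt;       // store the updated table back in the cache
    }

    private void RebuildTable(DataTable dt) { /* build rows and controls */ }
}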
However, I'm curious. Since I have 3 to 5 tables on the page, all of them stored in a different cache entry with a different DataTable, and all of them re-created during the page-cycle events, could this cause any problems in the future? By the way, please note that once the user leaves the page, the cache is deleted.
I did not want to paste the whole code here since it's a bit long and may alienate people from reading the question. But I can give some statistics so that you can make some comments on it.
The class I've written is 118 lines long.
During the process of re-creating the table, there are 3 nested for/foreach loops, but they are not that long (the average iteration count is probably 5 to 10 for each).
And finally, as mentioned above, to re-create the table a datatable that is saved in cache is used.
So, I ask the question again: the code works perfectly, but is building it this way performance-friendly?
It depends completely on the amount of data in the table (number of rows / columns).
If it's small, like pulling down a list of 10 users and their logins and passwords, it will work just fine with no performance issues.
But if this is going to be thousands and thousands of records, this will probably start to have performance issues.
Edit: Write a script to fill the database to the "worst case" expected amount of data, and then see how it performs.
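For instance, an in-memory variant of that test might look like this (a sketch; swap in your real schema and the worst-case row count you expect):

using System;
using System.Data;
using System.Diagnostics;

var dt = new DataTable();
dt.Columns.Add("Id", typeof(int));
dt.Columns.Add("Login", typeof(string));

// Fill to the expected worst-case volume.
for (int i = 0; i < 100000; i++)
{
    dt.Rows.Add(i, "user" + i);
}

var sw = Stopwatch.StartNew();
DataRow[] hits = dt.Select("Login = 'user99999'");
sw.Stop();
Console.WriteLine(hits.Length + " row(s) in " + sw.ElapsedMilliseconds + " ms");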
I am using VSTS 2008 + C# + .NET 3.5 + SQL Server 2008 + ADO.NET. Suppose I load a table from the database into an ADO.NET DataTable, and I have defined a couple of indexes on the database table. My question is: does the ADO.NET DataTable carry related indexes (the same as the indexes I created on the physical database table) to improve the performance of certain operations on the DataTable?
thanks in advance,
George
Actually, George's question is not as "bad" as some people insist it is. (I am more and more convinced that there's no such thing as "a bad question".)
I have a rather big table which I load into memory, in a DataTable object. A lot of processing is done on rows from this table, many times over, on various (and different) subsets which I can easily describe as the "WHERE ..." of SELECT clauses. Now, with this DataTable I can run Select() - a method of the DataTable class - but it is quite inefficient.
In the end, I decided to load the DataTable sorted by specific columns and implemented my own quick search instead of using the Select() method. It proved to be much faster, but of course it works only on those sorted columns. The trouble would have been avoided had DataTables had indexes.
No, but possibly yes.
You can set up your own indices on a DataTable, using a DataView. As you change the table, the DataView will be rebuilt, so the index should always be up to date.
I did some bench tests for my own app. I use a DataTable to approximate a Boost MultiIndexContainer. To create an index on a column called "Author", I initialise the DataTable, and then the DataView...
_dvChangesByAuthor = new DataView(
    _dtChanges,                    // the table to index
    string.Empty,                  // no row filter
    "Author ASC",                  // the sort is what builds the index
    DataViewRowState.CurrentRows);
To then pull data by Author from the table, you use the view's FindRows function...
DataRowView[] dataRowViews = _dvChangesByAuthor.FindRows(author);
List<DataRow> returnRows = new List<DataRow>();
foreach (DataRowView drv in dataRowViews)
{
returnRows.Add(drv.Row);
}
I made a random large DataTable, and ran queries using DataTable.Select(), LINQ to DataSet (with forced execution by exporting to a list) and the above DataView method. The DataView method won easily: LINQ took about 5000 ticks, Select took over 26000 ticks, and DataView took 192 ticks...
LOC=20141121-14:46:32.863,UTC=20141121-14:46:32.863,DELTA=72718,THR=9,DEBUG,LOG=Program,volumeTest() - Running queries for author >TFYN_AUTHOR_047<
LOC=20141121-14:46:32.863,UTC=20141121-14:46:32.863,DELTA=72718,THR=9,DEBUG,LOG=RightsChangeTracker,GetChangesByAuthorUsingLinqToDataset() - Query elapsed time: 2 ms, 4934 ticks; Rows=65
LOC=20141121-14:46:32.879,UTC=20141121-14:46:32.879,DELTA=72733,THR=9,DEBUG,LOG=RightsChangeTracker,GetChangesByAuthorUsingSelect() - Query elapsed time: 11 ms, 26575 ticks; Rows=65
LOC=20141121-14:46:32.879,UTC=20141121-14:46:32.879,DELTA=72733,THR=9,DEBUG,LOG=RightsChangeTracker,GetChangesByAuthorUsingDataview() - Query elapsed time: 0 ms, 192 ticks; Rows=65
So, if you want indices on a DataTable, I would suggest DataView, if you can deal with the fact that the index is re-built when the data changes.
You can create a primary key for the DataTable. Filter operations get a big boost if you are searching in the primary key field.
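A minimal sketch of that approach (column names are illustrative):

using System.Data;

var dt = new DataTable();
var id = dt.Columns.Add("Id", typeof(int));
dt.Columns.Add("Name", typeof(string));
dt.PrimaryKey = new[] { id };       // builds an internal index on Id

dt.Rows.Add(1, "Alice");
dt.Rows.Add(2, "Bob");

DataRow hit = dt.Rows.Find(2);      // indexed lookup; much faster than Select()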
I had the same problem with many queries from a large datatable that are not according to the primary key.
The solution I found was to create a DataView for each index I wanted to use, and then use its Find and FindRows methods to extract the data.
DataView creates an internal index on the DataTable and behaves virtually as an index for this purpose.
In my case I was able to reduce 10,000 queries from 40 seconds to ONE!!!
John above is correct. DataTables are disconnected in-memory structures. They do not map to the physical implementation of the database.
The indexes on disk are used to speed up lookups because you don't have all the rows at hand. If you had to load every row and scan them, it would be slow, so an index makes sense there. In a DataTable you already have all the rows, so a comparison is already fast.
The correct answer here to the implicit question of creating an index on a DataTable is that you can't do that, but you can create one or more DataViews for the DataTable, which according to the doc will create an index based on the sorting the DataView specifies:
DataView constructs an index. An index contains keys built from one or more columns in the table or view. These keys are stored in a structure that enables the DataView to find the row or rows associated with the key values quickly and efficiently. Operations that use the index, such as filtering and sorting, see significant performance increases. The index for a DataView is built both when the DataView is created and when any of the sorting or filtering information is modified. Creating a DataView and then setting the sorting or filtering information later causes the index to be built at least twice: once when the DataView is created, and again when any of the sort or filter properties are modified.
If you need to do a large number of lookups to an in-memory DataTable, it may be the most straightforward and performant to use a DataView with the Find() or FindRows() method to do indexed key lookups. In particular, if you need to do a number of lookups and modifications to the data this would prevent needing to transform your DataTable into another indexed class like a Dictionary and then transforming it back into a DataTable again.
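For a single-key lookup, DataView.Find() returns the index of the first matching row in the view. A sketch, assuming a view sorted on the lookup column:

// Assumes dt is an existing DataTable with an "Author" column.
var view = new DataView(dt, string.Empty, "Author ASC",
                        DataViewRowState.CurrentRows);

int index = view.Find("TFYN_AUTHOR_047");   // returns -1 if not found
if (index >= 0)
{
    DataRow row = view[index].Row;          // back to the underlying DataRow
}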
Others have made the point that a DataSet is not intended to serve as a database system--just a representation of data. If you are working under the impression that a DataSet is a database then you are mistaken and might need to reconsider your implementation.
If you need a client-side database, consider using SQL Compact or SQLite; both are free, redistributable database systems which can be used without requiring separate installations or services. If you need something more full-featured, then SQL Express is the next step up.
To help clarify though, DataSets/Tables are used in .NET development to temporarily hold data as needed. Think of them as the results of a SELECT query against a database; they are roughly similar to CSV files or other forms of tabular data--you can pull data into them from a database, work with the data, and then push the changes back to a database--but they, on their own, are not databases.
If you have a large collection of items which you need to keep in memory for one reason or another, then you might consider building a lightweight DTO (data transfer object; Google it, they're very simple) and loading them into a Hashtable. Hashtables won't give you any form of relational data, but they are very efficient at look-ups.
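A sketch of that idea, using the generic Dictionary rather than the older Hashtable class (all names here are made up):

using System.Collections.Generic;

// A lightweight DTO: just the fields you need, no behaviour.
public class UserDto
{
    public int Id { get; set; }
    public string Login { get; set; }
}

public static class UserLookup
{
    public static void Demo()
    {
        // Keyed lookups are O(1) on average.
        var users = new Dictionary<int, UserDto>
        {
            [1] = new UserDto { Id = 1, Login = "alice" },
            [2] = new UserDto { Id = 2, Login = "bob" }
        };

        if (users.TryGetValue(2, out UserDto user))
        {
            // use user.Login ...
        }
    }
}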
DataTables have a PrimaryKey field that can serve as an index (they are fast already anyway). This field is not copied from the Primary Keys of the database (although that might be nice).
My reading of the docs is that the correct way to achieve this (if needed) is to use AsDataView to produce a DataView (or LinqDataView) that's bound to the underlying table. If your DataTable is invariant then the DataView can be static to avoid redundant re-indexing.
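Something like this (a sketch; requires a reference to System.Data.DataSetExtensions, and assumes dt is an existing DataTable with an "Author" column):

using System.Data;
using System.Linq;

// A LinqDataView over the whole table:
DataView wholeTable = dt.AsDataView();

// Or index only a filtered subset via LINQ to DataSet:
DataView byAuthor = (from row in dt.AsEnumerable()
                     where row.Field<string>("Author") == "some author"
                     select row).AsDataView();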
I am currently investigating Linq to DataSet, and this q was helpful to me, so thanks.
DataTables are indexed if you (the coder) specify one or more DataColumns as the primary key. Internally, ADO.NET uses a red-black tree to form this index, giving log-time lookups. This primary key is not set automatically based on any underlying keying from the data provider.
George,
The answer is no.
Actually, some sort of indexing may be used internally, but only as an implementation detail. For instance, if you create a foreign key constraint, that might be assisted by an index. But it doesn't matter to the developer.