DataTable vs Array performance in c#

DataTable vs Array performance in c# - c#

I am querying data from Database table say(ABCD) and want to store it in some DataStructure, then read it row by row and perform calculations on the values and update the datastructure with the modified values.
Finally once calculation on all the rows is finished, i would update ABCD table with updated values for each row.
Please suggest the data structure to be used which would give best performance.
I am considering the use of 2D array or DataTable.

An array is much lighter than a datatable but has no data structure operations. However while the data table is a heavy object it is much easier to work with

Array
The fastest data structure to read from is an array.
Searching takes time
Very inefficient to insert or remove something from an array
DataTables
Faster searches (DataTable to LINQ is the fastest than DataTable.Select)
LINQ was still a lot faster than using normal DataSet methods (40-50 times faster!)
Fast insertions, removal, and sorting

Related

Sort-Merge two Datasets: which approach is faster?

Greetings Overflowers,
I have two huge DataSets that I need to merge and sort.
Shall I:
Use DataSet.Merge and then DataSet.Sort?
Insert both DataSets into an in-memory database that is indexed by sort columns then query against it?
Insert both DataSets into an in-memory database that is NOT indexed and do the sorting at query time?
Other option; e.g. multi-threaded merge-sort algorithm that takes both DataSets and outputs a merged and sorted DataSet?
Kind regards

Simplifying complexity in for a table object structure

I have an object structure that is mimicking the properties of an excel table. So i have a table object containing properties such as title, header row object and body row objects. Within the header row and each body row object, i have a cell object containing info on each cell per row. I am looking for a more efficient way to store this table structure since in one of my uses for this object, i am printing its structure to screen. Currently, i am doing an O(n^2) complexity for printing each row for each cell:
foreach(var row in Table.Rows){
foreach(var cell in row.Cells){
Console.WriteLine(cell.ToString())
}
}
Is there a more efficient way of storing this structure to avoid the n^2? I ask this because this printing functionality exists in another n^2 loop. Basically i have a list of tables titles and a list of tables. I need to find those tables whose titles are in the title list. Then for each of those tables, i need to print their rows and the cells in each row. Can any part of this operation be optimized by using a different data structure for storage perhaps? Im not sure how exactly they work but i have heard of hashing and dictionary?
Thanks

Since you are looking for tables with specific titles, you could use a dictionary to store the tables by title
Dictionary<string,Table> tablesByTitle = new Dictionary<string,Table>();
tablesByTitle.Add(table.Title, table);
...
table = tablesByTitle["SomeTableTitle"];
This would make finding a table an O(1) operation. Finding n tables would be an O(n) operation.
Printing the tables then of cause depends on the number of rows and columns. There is nothing, which can change that.
UPDATE:
string tablesFromGuiElement = "Employees;Companies;Addresses";
string[] selectedTables = tablesFromGuiElement.Split(';');
foreach (string title in selectedTables) {
Table tbl = tablesByTitle[title];
PrintTable(tbl);
}

There isn't anything more efficient than an N^2 operation for outputting an NxN matrix of values. Worst-case, you will always be doing this.
Now, if instead of storing the values in a multidimensional collection that defines the graphical relationship of rows and columns, you put them in a one-dimensional collection and included the row-column information with each cell, then you would only need to iterate through the cells that had values. Worst-case is still N^2 for a table of N rows and N columns that is fully populated (the one-dimensional array, though linear to enumerate, will have N^2 items), but the best case would be that only one cell in that table is populated (or none are) which would be constant-time.

This answer applies to the, printing the table part, but the question was extended.
for the getting the table part, see the other answer.
No, there is not.
Unless perhaps your values follow some predictable distribution, then you could use a function of x and y and store no data at all, or maybe a seed and a function.
You could cache the print output in a string or StringBuider if you require it multiple times.
If there is enough data I guess you might apply some compression algorithm but I wouldn't say that was simpler or more efficient.

C# - winforms - getting texts of specific column in listview in form of an array

is it possible to get the text values of specific column in a listview in form of array without using a loop function? lets say I have a listview contains 2 columns and 5000 records. what I want is to get all the texts under column 2 only but without using a loop. I know I can accomplish this with a loop but it takes ages if I had 5000+ records..
any ideas?

I feel your pain and I've personally experienced how slow a ListViewItemsCollection can be.
You can access the second column of a ListView without a for loop like this:
string[] column1 = this.listView1.Items.Cast<ListViewItem>().Select(item => item.SubItems[1].Text).ToArray();
but this isn't any faster than a for loop. In fact it's a little bit slower! Other tricks like trying to use ListViewItemsCollection.CopyTo also fail miserably.
A major problem with the ListView is that is pathologically slow when used as a data structure. The property ListView.Items looks like a data structure and you can use it like a data structure but any way you slice it:
data goes in but it doesn't come out fast.
So if you follow alexD's advice you will find that a real data structure will outperform a ListViewItemCollection by orders of magnitude. The moral of the story is if you want to query the data in a ListViewItemCollection fast, you have no alternative than to keep a fast copy outside of the ListView. Practically any data structure will do, e.g.
a string[][]
a string[,]
a List<List<string>>
a List<string[]> or even
a List<Tuple<string, string>>.
These can all store the same data as a two-column ListView and will be instantaneously fast by comparison to using the ListView as a data structure. It is inconvenient but that is the price we must pay for speed.

Don't read the data directly from the text file into the list view...store it in a data structure like an ArrayList which will maintain the order in which you insert elements.

Do ADO.Net DataTables have indexes?

I am using VSTS 2008 + C# + .Net 3.5 + SQL Server 2008 + ADO.Net. If I load a table from a database by using a DataTable of ADO.Net, and in the database table, I defined a couple of indexes on the table. My question is, whether on the ADO.Net DataTable, there is related index (the same as the indexes I created on physical database table) to improve certain operation performance on DataTable?
thanks in advance,
George

Actually George's question is not so "bad" as some people insist it is. (I am more and more convinced that there's no such thing as, "a bad question").
I have a rather big table which I load into the memory, in a DataTable object. A lot of processing is done on lines from this table, a lot of times, on various (and different) subsets which I can easily describe as "WHERE ..." of SELECT clauses. Now with this DataTable I can run Select() - a method of DataTable class - but it is quite inefficient.
In the end, I decided to load the DataTable sorted by specific columns and implemented my own
quick search, instead of using the Select() function. It proved to be much faster, but of course it works only on those sorted columns. The trouble would have been avoided, had a DataTable had indexes.

No, but possibly yes.
You can set up your own indices on a DataTable, using a DataView. As you change the table, the DataView will be rebuilt, so the index should always be up to date.
I did some bench tests for my own app. I use a DataTable to approximate a Boost MultiIndexContainer. To create an index on a column call "Author", I initialise the DataTable, and then the DataView...
_dvChangesByAuthor =
new DataView(
_dtChanges,
string.Empty,
"Author ASC",
DataViewRowState.CurrentRows);
To then pull data by Author from the table, you use the view's FindRows function...
dataRowViews = _dvChangesByAuthor.FindRows(author);
List<DataRow> returnRows = new List<DataRow>();
foreach (DataRowView drv in dataRowViews)
{
returnRows.Add(drv.Row);
}
I made a random large DataTable, and ran queries using DataTable.Select(), Linq-To-DataSet (with forced execution by exporting to list) and the above DataView method. The DataView method won easily. Linq took 5000 ticks, Select took over 26000 ticks, DataView took 192 ticks...
LOC=20141121-14:46:32.863,UTC=20141121-14:46:32.863,DELTA=72718,THR=9,DEBUG,LOG=Program,volumeTest() - Running queries for author >TFYN_AUTHOR_047<
LOC=20141121-14:46:32.863,UTC=20141121-14:46:32.863,DELTA=72718,THR=9,DEBUG,LOG=RightsChangeTracker,GetChangesByAuthorUsingLinqToDataset() - Query elapsed time: 2 ms, 4934 ticks; Rows=65
LOC=20141121-14:46:32.879,UTC=20141121-14:46:32.879,DELTA=72733,THR=9,DEBUG,LOG=RightsChangeTracker,GetChangesByAuthorUsingSelect() - Query elapsed time: 11 ms, 26575 ticks; Rows=65
LOC=20141121-14:46:32.879,UTC=20141121-14:46:32.879,DELTA=72733,THR=9,DEBUG,LOG=RightsChangeTracker,GetChangesByAuthorUsingDataview() - Query elapsed time: 0 ms, 192 ticks; Rows=65
So, if you want indices on a DataTable, I would suggest DataView, if you can deal with the fact that the index is re-built when the data changes.

You can create a primary key for the datatable. Filter operations get a big boost if you are searching in the primary key field. Check out this link: here

I had the same problem with many queries from a large datatable that are not according to the primary key.
The solution I found was to create DataView for each index I wanted to use, and then use it's Find and FindRows methods to extract the data.
DataView creates an internal index on the DataTable and behaves virtually as an index for this purpose.
In my case I was able to reduce 10,000 queries from 40 Seconds to ONE!!!

John above is correct. DataTables are disconnected in memory structures. They do not map to the physical implementation of the database.
The indexes on disk are used to speed up lookups because you don't have all the rows. If you have to load every row and scan them it is slow, so an index makes sense. In a DataTable you already have all the rows, so a comparison is fast already.

The correct answer here to the implicit question of creating an index on a DataTable is that you can't do that, but you can create one or more DataViews for the DataTable, which according to the doc will create an index based on the sorting the DataView specifies:
DataView constructs an index. An index contains keys built from one or more columns in the table or view. These keys are stored in a structure that enables the DataView to find the row or rows associated with the key values quickly and efficiently. Operations that use the index, such as filtering and sorting, see signifcant performance increases. The index for a DataView is built both when the DataView is created and when any of the sorting or filtering information is modified. Creating a DataView and then setting the sorting or filtering information later causes the index to be built at least twice: once when the DataView is created, and again when any of the sort or filter properties are modified.
If you need to do a large number of lookups to an in-memory DataTable, it may be the most straightforward and performant to use a DataView with the Find() or FindRows() method to do indexed key lookups. In particular, if you need to do a number of lookups and modifications to the data this would prevent needing to transform your DataTable into another indexed class like a Dictionary and then transforming it back into a DataTable again.

Others have made the point that a DataSet is not intended to serve as a database system--just a representation of data. If you are working under the impression that a DataSet is a database then you are mistaken and might need to reconsider your implementation.
If you need a client-side database, consider using SQL Compact or SQL Lite, both are free redistributable Database systems which can be used without requiring separate installations or services. If you need something more full-featured the SQL Express is the next step up.
To help clarify though, DataSets/Tables are used in .NET development to temporarily hold data as needed. Think of them as the results of a SELECT query against a database; they are roughly similar to CSV files or other forms of tabular data--you can pull data into them from a database, work with the data, and then push the changes back to a database--but they, on their own, are not databases.
If you have a large collection of items which you need to keep in memory for one reason or another then you might consider building a lightweight DTO (data transfer object, Google it, they're very simple) and loading them into a HashTable. HashTables won't give you any form of relational data, but are very efficient at look-ups.

DataTables have a PrimaryKey field that can serve as an index (they are fast already anyway). This field is not copied from the Primary Keys of the database (although that might be nice).

My reading of the docs is that the correct way to achieve this (if needed) is to use AsDataView to produce a DataView (or LinqDataView) that's bound to the underlying table. If your DataTable is invariant then the DataView can be static to avoid redundant re-indexing.
I am currently investigating Linq to DataSet, and this q was helpful to me, so thanks.

DataTables are indexed if you (the coder) specify one or more DataColumns as the Primary Key. Interally ADO.NET uses a Red-Black tree to form this index giving log-time lookups. This Primary Key is not set automatically based on any underlying keying from the data provider.

George,
The answer is no.
Actually, some sort of indexing may be used internally, but only as an implementation detail. For instance, if you create a foreign key constraint, maybe that's assisted by an index. But it doesn't matter to a developer.

Datatable vs Dataset

I currently use a DataTable to get results from a database which I can use in my code.
However, many example on the web show using a DataSet instead and accessing the table(s) through the collections method.
Is there any advantage, performance wise or otherwise, of using DataSets or DataTables as a storage method for SQL results?

It really depends on the sort of data you're bringing back. Since a DataSet is (in effect) just a collection of DataTable objects, you can return multiple distinct sets of data into a single, and therefore more manageable, object.
Performance-wise, you're more likely to get inefficiency from unoptimized queries than from the "wrong" choice of .NET construct. At least, that's been my experience.

One major difference is that DataSets can hold multiple tables and you can define relationships between those tables.
If you are only returning a single result set though I would think a DataTable would be more optimized. I would think there has to be some overhead (granted small) to offer the functionality a DataSet does and keep track of multiple DataTables.

in 1.x there used to be things DataTables couldn't do which DataSets could (don't remember exactly what). All that was changed in 2.x. My guess is that's why a lot of examples still use DataSets. DataTables should be quicker as they are more lightweight. If you're only pulling a single resultset, its your best choice between the two.

One feature of the DataSet is that if you can call multiple select statements in your stored procedures, the DataSet will have one DataTable for each.

There are some optimizations you can use when filling a DataTable, such as calling BeginLoadData(), inserting the data, then calling EndLoadData(). This turns off some internal behavior within the DataTable, such as index maintenance, etc. See this article for further details.

When you are only dealing with a single table anyway, the biggest practical difference I have found is that DataSet has a "HasChanges" method but DataTable does not. Both have a "GetChanges" however, so you can use that and test for null.

A DataTable object represents tabular data as an in-memory, tabular cache of rows, columns, and constraints.
The DataSet consists of a collection of DataTable objects that you can relate to each other with DataRelation objects.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

DataTable vs Array performance in c# - c#

An array is much lighter than a datatable but has no data structure operations. However while the data table is a heavy object it is much easier to work with

Related

Sort-Merge two Datasets: which approach is faster?

Simplifying complexity in for a table object structure

C# - winforms - getting texts of specific column in listview in form of an array

Do ADO.Net DataTables have indexes?

Datatable vs Dataset

Categories

Resources