DataTable row limit - c#

Morning,
Regarding the following quote, is this limit independent of how many columns there are? (I'm assuming not, but it's not specifically stated anywhere.) If it is linked to the number of columns, how do you calculate that you're not over this limit?
To add rows to a DataTable, you must first use the NewRow method to return a new DataRow object. The NewRow method returns a row with the schema of the DataTable, as it is defined by the table's DataColumnCollection. The maximum number of rows that a DataTable can store is 16,777,216. For more information, see Adding Data to a DataTable.
"Link to where quote was taken from."
Thanks for your help.

I would expect that limit (which is 2^24) to be independent of the number of columns. I expect that a single 32-bit integer is used internally, with 24 bits representing the row count and 8 bits used for flags or something similar.
In practice, 16 million rows is going to take a long time to populate and a lot of memory... if you're in danger of hitting that limit, you should probably be rethinking how you're accessing data to start with.

It is not linked to the number of columns, except insofar as your memory has an upper limit: if you have so many columns that you can't hold the rows in memory, you could get an out-of-memory error first.

You can use DataTable.Rows.Count to check the current row count before adding a new row.
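As a minimal sketch of that check, the snippet below compares the row count against 2^24 (16,777,216, the figure from the quoted documentation) before adding a row. The class name RowLimitSketch and the method TryAddRow are hypothetical helpers made up for illustration; they are not part of ADO.NET.

```csharp
using System;
using System.Data;

public static class RowLimitSketch
{
    // 2^24 = 16,777,216, the documented maximum number of rows.
    public const int MaxRows = 1 << 24;

    // Hypothetical helper: adds the row only while the table is below the limit.
    public static bool TryAddRow(DataTable table, params object[] values)
    {
        if (table.Rows.Count >= MaxRows)
        {
            return false;
        }
        table.Rows.Add(values);
        return true;
    }
}
```

In practice memory is likely to run out long before this check ever returns false.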

Related

How to find the maximum row and column WITHOUT reading all data?

Using C# .Net Google Sheets API.
I am new to the API, so I may have missed it in the docs - but how do you find out the maximum row and column that contain a value without reading all the data in the sheet?
For example, if a sheet contains multiple values and the "last" cell in the sheet with a value is at C139 (no cells in the rows following have a value and no cells in any column after C have a value), then the maximum row would be 139 and the maximum column would be 2 (zero based) or 3 (one based).
I tried sheet.Properties.GridProperties.RowCount -- but that gives the TOTAL number of rows in the sheet (whether the cells have values or not).
Same goes for sheet.Properties.GridProperties.ColumnCount -- gives the TOTAL number of columns in the sheet (whether the cells have values or not).
Any links or ideas are welcome.
I understand that you want to know the last row of data in your sheet. In that case, you can use a simple GET with an open-ended range. For example, assuming that your sheet has only two columns, you can set the range to A1:B. That range covers both columns in full, but the GET only returns values as far as the data goes. At that point you have an array holding your data range, so the index of its last element gives you the last row. If you don't know how many columns your sheet has, widen the range in the same way (e.g. A1:Z). Feel free to ask if anything about this approach is unclear.
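The counting step can be sketched as follows. This assumes the values have already been fetched for a range anchored at A1 and arrive as a list of row lists, with trailing empty rows and trailing empty cells left out, as the answer describes. SheetExtent and Find are hypothetical names, and no Sheets API call is made here.

```csharp
using System;
using System.Collections.Generic;

public static class SheetExtent
{
    // Returns one-based (last row, last column); (0, 0) means no data.
    // Assumes the range started at A1 and trailing empties were omitted.
    public static (int LastRow, int LastColumn) Find(IList<IList<object>> values)
    {
        if (values == null)
        {
            return (0, 0);
        }

        int lastColumn = 0;
        foreach (IList<object> row in values)
        {
            // The longest row determines the last populated column.
            if (row != null && row.Count > lastColumn)
            {
                lastColumn = row.Count;
            }
        }
        return (values.Count, lastColumn);
    }
}
```

Note that this still transfers the cell values; it only avoids guessing the extent up front.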

fastest way to compare two data tables cell by cell

I have two huge data tables, each with 300 columns and 100,000 rows. I want to compare them cell by cell and show the result in a third data table: 1 where the cells match and 0 where they do not. I used a for loop, but it was very slow and took a lot of time. Can anyone help, please?
You can follow the links below:
http://canlu.blogspot.in/2009/05/how-to-compare-two-datatables-in-adonet.html
https://www.dotnetperls.com/datatable-compare-rows
Looping is the only possible solution, but the two links above show some built-in collections that may simplify the loop and improve performance.
First of all, you need to provide some code and a sample of the expected result.
If you have a table with 300 columns, I think you have broken some fundamental database normalization rule.
If you want the result as t1.c1 = t2.c2 ..., you can try to do this in a query with a join, which should perform better than looping through every column of every row.
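For reference, a plain in-memory version of the comparison might look like the sketch below. It assumes both tables have the same number of rows and columns in the same order. It reads each row's ItemArray once and wraps the inserts in BeginLoadData/EndLoadData; whether that is fast enough for 300 x 100,000 cells would need measuring. TableComparer is a hypothetical name.

```csharp
using System;
using System.Data;

public static class TableComparer
{
    // Builds a result table holding 1 where cells are equal and 0 otherwise.
    // Assumes both tables have identical dimensions.
    public static DataTable Compare(DataTable left, DataTable right)
    {
        int columnCount = left.Columns.Count;
        var result = new DataTable("Result");
        for (int c = 0; c < columnCount; c++)
        {
            result.Columns.Add(left.Columns[c].ColumnName, typeof(int));
        }

        // Turn off notifications, index maintenance and constraints while loading.
        result.BeginLoadData();
        for (int r = 0; r < left.Rows.Count; r++)
        {
            // Copy each row's values once instead of indexing cell by cell.
            object[] a = left.Rows[r].ItemArray;
            object[] b = right.Rows[r].ItemArray;
            var flags = new object[columnCount];
            for (int c = 0; c < columnCount; c++)
            {
                flags[c] = object.Equals(a[c], b[c]) ? 1 : 0;
            }
            result.Rows.Add(flags);
        }
        result.EndLoadData();
        return result;
    }
}
```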

Simplifying complexity for a table object structure

I have an object structure that mimics the properties of an Excel table. So I have a table object containing properties such as a title, a header row object and body row objects. Within the header row and each body row object, I have a cell object containing info on each cell in the row. I am looking for a more efficient way to store this table structure, since in one of my uses for this object I am printing its structure to the screen. Currently, I am doing an O(n^2) loop to print each cell of each row:
foreach (var row in Table.Rows) {
    foreach (var cell in row.Cells) {
        Console.WriteLine(cell.ToString());
    }
}
Is there a more efficient way of storing this structure to avoid the n^2? I ask because this printing functionality sits inside another n^2 loop. Basically, I have a list of table titles and a list of tables. I need to find those tables whose titles are in the title list. Then, for each of those tables, I need to print its rows and the cells in each row. Can any part of this operation be optimized, perhaps by using a different data structure for storage? I'm not sure exactly how they work, but I have heard of hashing and dictionaries.
Thanks
Since you are looking for tables with specific titles, you could use a dictionary to store the tables by title:
Dictionary<string,Table> tablesByTitle = new Dictionary<string,Table>();
tablesByTitle.Add(table.Title, table);
...
table = tablesByTitle["SomeTableTitle"];
This would make finding a table an O(1) operation. Finding n tables would be an O(n) operation.
Printing the tables then of course depends on the number of rows and columns. Nothing can change that.
UPDATE:
string tablesFromGuiElement = "Employees;Companies;Addresses";
string[] selectedTables = tablesFromGuiElement.Split(';');
foreach (string title in selectedTables) {
Table tbl = tablesByTitle[title];
PrintTable(tbl);
}
There isn't anything more efficient than an N^2 operation for outputting an NxN matrix of values. Worst-case, you will always be doing this.
Now, if instead of storing the values in a multidimensional collection that defines the graphical relationship of rows and columns, you put them in a one-dimensional collection and included the row-column information with each cell, then you would only need to iterate through the cells that had values. Worst-case is still N^2 for a table of N rows and N columns that is fully populated (the one-dimensional array, though linear to enumerate, will have N^2 items), but the best case would be that only one cell in that table is populated (or none are) which would be constant-time.
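A rough sketch of that one-dimensional layout follows: each populated cell carries its own row and column, so rendering only visits cells that exist. SparseTable and its members are hypothetical names, and the output format is invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public sealed class SparseTable
{
    // Only populated cells are stored, each with its own coordinates.
    private readonly List<(int Row, int Column, string Text)> cells =
        new List<(int Row, int Column, string Text)>();

    public int Count => cells.Count;

    public void Add(int row, int column, string text)
    {
        cells.Add((row, column, text));
    }

    // Work done is proportional to the number of populated cells,
    // not to rows x columns.
    public string Render()
    {
        var sb = new StringBuilder();
        foreach (var cell in cells)
        {
            sb.AppendLine($"[{cell.Row},{cell.Column}] {cell.Text}");
        }
        return sb.ToString();
    }
}
```

A fully populated N x N table still holds N^2 entries, so the worst case is unchanged.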
This answer applies to the part about printing the table, but the question was extended.
For the part about getting the table, see the other answer.
No, there is not.
Unless perhaps your values follow some predictable distribution, then you could use a function of x and y and store no data at all, or maybe a seed and a function.
You could cache the print output in a string or StringBuilder if you require it multiple times.
If there is enough data I guess you might apply some compression algorithm but I wouldn't say that was simpler or more efficient.

Update Cells in an In Memory Data table

OK, the story so far: I have a DataTable with about 10,000 rows and about 150 columns per row, so more or less 150,000 cells in this DataTable. I have all the updating working fine,
but the updating is slow.
I need to iterate through a list of procedures and then update cells in the table depending on the procedure. When I am completely finished updating, about 75% - 80% of all the cells will have changed.
I am searching the table using a primary key index assigned to an INT value.
DataTable.Rows.Find() seems a little quicker.
DataTable.Select(expression) is almost the same, with little difference.
Are there any ideas on how to speed this up? Changing 80,000 - 120,000 cells can take minutes.
Any ideas would be great, thanks.
A study in the March 2005 issue of ASP.Net Pro magazine compared various approaches involving DataTables, DataViews and DataReaders. Their findings were that the fastest approach depended upon the number of records involved.
For 50 records or less, by far the fastest search method was a For..Next loop on the DataTable's DataRowCollection. That approach was followed by DataRowCollection.Find. Many times slower were re-retrieving the data with a DataReader, using DataView.RowFilter, and worst of all using DataTable.Select.
For 500 - 5,000 records, the fastest search was with DataRowCollection.Find, followed closely by DataTable.Select. The worst by far for this range of records were DataView.RowFilter and DataView.FindRows.
For 50,000 records, the fastest search was accomplished with DataRowCollection.Find. In a close second place was re-retrieving the records with a DataReader. The worst by far for this category were DataView.RowFilter and DataView.FindRows.
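For context, a lookup through DataRowCollection.Find requires the table to have a primary key. The sketch below shows that setup with a two-column table. The table layout, the column names "Id" and "Value", and the KeyedUpdate helper are all assumptions made for illustration.

```csharp
using System;
using System.Data;

public static class KeyedUpdate
{
    // Hypothetical two-column table with an integer primary key.
    public static DataTable CreateTable()
    {
        var table = new DataTable("Items");
        DataColumn id = table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Value", typeof(string));
        table.PrimaryKey = new[] { id };
        return table;
    }

    // Finds the row by primary key and updates one cell.
    // Returns false when no row has that key.
    public static bool SetValue(DataTable table, int id, string value)
    {
        DataRow row = table.Rows.Find(id);
        if (row == null)
        {
            return false;
        }
        row["Value"] = value;
        return true;
    }
}
```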

The right data structure to use for an Excel clone

Let's say I'm working on an Excel clone in C#.
My grid is represented as follows:
private struct CellValue
{
private int column;
private int row;
private string text;
}
private List<CellValue> cellValues = new List<CellValue>();
Each time the user adds text, I just package it as a CellValue and add it to cellValues. Given a CellValue, I can determine its row and column in O(1) time, which is great. However, given a column and a row, I need to loop through the entire cellValues list to find which text is in that column and row, which is terribly slow. Also, given a text, I need to loop through the entire thing too. Is there any data structure where I can achieve all 3 tasks in O(1) time?
Updated:
Looking through some of the answers, I don't think I have found one that I like. Can I have the following?
Keep no more than 2 copies of a CellValue, to avoid having to sync them. In the C world I would have made good use of pointers.
Rows and columns can be added dynamically (unlike Excel).
I would opt for a sparse array (a linked list of linked lists) to give maximum flexibility with minimum storage.
In this example, you have a linked list of rows with each element pointing to a linked list of cells in that row (you could reverse the cells and rows depending on your needs).
 |
 V
+-+    +---+              +---+
|1| -> |1.1| -----------> |1.3| -:
+-+    +---+              +---+
 |
 V
+-+                +---+
|7| -------------> |7.2| -:
+-+                +---+
 |
 =
Each row element has the row number in it and each cell element has a pointer to its row element, so that getting the row number from a cell is O(1).
Similarly, each cell element has its column number, making that O(1) as well.
There's no easy way to get O(1) for finding immediately the cell at a given row/column but a sparse array is as fast as it's going to get unless you pre-allocate information for every possible cell so that you can do index lookups on an array - this would be very wasteful in terms of storage.
One thing you could do is make one dimension non-sparse, such as making the columns the primary array (rather than linked list) and limiting them to 1,000 - this would make the column lookup indexed (fast), then a search on the sparse rows.
I don't think you can ever get O(1) for a text lookup simply because text can be duplicated in multiple cells (unlike row/column). I still believe the sparse array will be the fastest way to search for text, unless you maintain a sorted index of all text values in another array (again, that can make it faster but at the expense of copious amounts of memory).
I think you should use one of the indexed collections to make it work reasonably fast; the ideal one is KeyedCollection.
You need to create your own collection by extending this class. This way your object will still contain the row and column (so you will not lose anything), but you will be able to search by them. You will probably have to create a class encapsulating (row, column) and make it the key (so make it immutable and override Equals and GetHashCode).
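A minimal sketch of such a collection is shown below. Instead of a hand-written key class it uses a value tuple (int, int) as the key, which is immutable and already implements equality and hashing; that substitution is a choice made here, not something the answer prescribes. The Cell and CellCollection types are hypothetical.

```csharp
using System;
using System.Collections.ObjectModel;

public sealed class Cell
{
    public Cell(int row, int column, string text)
    {
        Row = row;
        Column = column;
        Text = text;
    }

    // Row and Column form the key, so they are read-only.
    public int Row { get; }
    public int Column { get; }
    public string Text { get; set; }
}

public sealed class CellCollection : KeyedCollection<(int, int), Cell>
{
    // Tells the base class how to derive the key from an item.
    protected override (int, int) GetKeyForItem(Cell item)
    {
        return (item.Row, item.Column);
    }

    // Returns the cell at (row, column), or null if none is stored.
    public Cell Find(int row, int column)
    {
        return Contains((row, column)) ? this[(row, column)] : null;
    }
}
```

Lookup by (row, column) then goes through the collection's internal dictionary rather than a scan of every cell.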
I'd create
Collection<Collection<CellValue>> rowCellValues = new Collection<Collection<CellValue>>();
and
Collection<Collection<CellValue>> columnCellValues = new Collection<Collection<CellValue>>();
The outer collection has one entry for each row or column, indexed by the row or column number, the inner collection has all the cells in that row or column. These collections should be populated as part of the process that creates new CellValue objects.
rowCellValues[newCellValue.Row].Add(newCellValue);
columnCellValues[newCellValue.Column].Add(newCellValue);
This smells of premature optimization.
That said, there are a few features of Excel that are important in choosing a good structure.
First is that Excel uses the cells in a moderately non-linear fashion. The process of resolving formulas involves traversing the spreadsheets in effectively random order. The structure will need a mechanism for cheaply looking up values by random keys and marking them dirty, resolved, or unresolvable due to circular reference. It will also need some way to know when there are no more unresolved cells left, so that it can stop working. Any solution that involves a linked list is probably sub-optimal for this, since it would require a linear scan to get those cells.
Another issue is that Excel displays a range of cells at one time. This may seem trivial, and to a large extent it is, but it will certainly be ideal if the app can pull all of the data needed to draw a range of cells in one shot. Part of this may be keeping track of the display height and width of the rows and columns, so that the display system can iterate over the range until the desired width and height of cells has been collected. The need to iterate in this manner may preclude the use of a hashing strategy for sparse storage of cells.
On top of that, there are some weaknesses of the representational model of spreadsheets that could be addressed much more effectively by taking a slightly different approach.
For example, column aggregates are sort of clunky. A column total is easy enough to implement in Excel, but it has a sort of magic behavior that works most of the time but not all of the time. For instance, if you add a row into the aggregated area, further calculations on that aggregate may continue to work, or not, depending on how you added it. If you copy and insert a row (and replace the values) everything works fine, but if you cut and paste the cells one row down, things don't work out so well.
Given that the data is 2-dimensional, I would have a 2D array to hold it in.
Well, you could store them in three Dictionaries: two Dictionary<int,CellValue> objects for rows and columns, and one Dictionary<string,CellValue> for text. You'd have to keep all three carefully in sync though.
I'm not sure that I wouldn't just go with a big two-dimensional array though...
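One way to keep such dictionaries in sync is to hide them behind a single setter, as in the sketch below. This is a variation on the idea above rather than the exact layout described: it uses one dictionary keyed by (row, column) instead of separate row and column dictionaries, and maps each text to a set of positions because the same text can appear in several cells. Sheet and its members are hypothetical names, and text is assumed to be non-null.

```csharp
using System;
using System.Collections.Generic;

public sealed class Sheet
{
    private readonly Dictionary<(int Row, int Column), string> textByPosition =
        new Dictionary<(int Row, int Column), string>();
    private readonly Dictionary<string, HashSet<(int Row, int Column)>> positionsByText =
        new Dictionary<string, HashSet<(int Row, int Column)>>();

    // The only place that writes, so both indexes always change together.
    public void Set(int row, int column, string text)
    {
        var key = (row, column);
        if (textByPosition.TryGetValue(key, out string old))
        {
            // Remove the position from the old text's entry.
            HashSet<(int Row, int Column)> oldSet = positionsByText[old];
            oldSet.Remove(key);
            if (oldSet.Count == 0)
            {
                positionsByText.Remove(old);
            }
        }

        textByPosition[key] = text;
        if (!positionsByText.TryGetValue(text, out var set))
        {
            set = new HashSet<(int Row, int Column)>();
            positionsByText[text] = set;
        }
        set.Add(key);
    }

    // Returns the text at (row, column), or null if the cell is empty.
    public string Get(int row, int column)
    {
        return textByPosition.TryGetValue((row, column), out string text) ? text : null;
    }

    // Returns every position holding the given text (possibly none).
    public IReadOnlyCollection<(int Row, int Column)> Find(string text)
    {
        return positionsByText.TryGetValue(text, out var set)
            ? (IReadOnlyCollection<(int Row, int Column)>)set
            : Array.Empty<(int Row, int Column)>();
    }
}
```

Both lookups are hash lookups on average, at the cost of storing each position twice.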
If it's an exact clone, then an array-backed list of CellValue[256] arrays. Excel has 256 columns, but a growable number of rows.
If rows and columns can be added "dynamically", then you shouldn't store the row/column as a numeric attribute of the cell, but rather as a reference to a row or column object.
Example:
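// A class rather than a struct, so the lists hold references to the same cell object.
public class CellValue
{
    private List<CellValue> _column;
    private List<CellValue> _row;
    private string text;

    // Assigning a new column moves the cell out of its old column list.
    public List<CellValue> column {
        get { return _column; }
        set {
            if (_column != null) { _column.Remove(this); }
            _column = value;
            if (_column != null) { _column.Add(this); }
        }
    }

    // Assigning a new row moves the cell out of its old row list.
    public List<CellValue> row {
        get { return _row; }
        set {
            if (_row != null) { _row.Remove(this); }
            _row = value;
            if (_row != null) { _row.Add(this); }
        }
    }
}
private List<List<CellValue>> MyRows = new List<List<CellValue>>();
private List<List<CellValue>> MyColumns = new List<List<CellValue>>();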
Each Row and Column object is implemented as a List of the CellValue objects. These are unordered--the order of the cells in a particular Row does not correspond to the Column index, and vice-versa.
Each sheet has a List of Rows and a list of Columns, in order of the sheet (shown above as MyRows and MyColumns).
This will allow you to rearrange and insert new rows and columns without looping through and updating any cells.
Deleting a row should loop through the cells on the row and delete them from their respective columns before deleting the row itself. And vice-versa for columns.
To find a particular Row and Column, find the appropriate Row and Column objects, then find the CellValue that they contain in common.
Example:
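// Requires "using System.Linq;" for Intersect and First.
public CellValue GetCell(int rowIndex, int colIndex) {
    List<CellValue> row = MyRows[rowIndex];
    List<CellValue> col = MyColumns[colIndex];
    // Intersect returns an IEnumerable, so take the first common cell.
    return row.Intersect(col).First();
}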
(I'm a little fuzzy on these Extension methods in .NET 3.5, but this should be in the ballpark.)
If I recall correctly, there was an article about how Visicalc did it, maybe in Byte Magazine in the early 80s. I believe it was a sparse array of some sort. But I think there were links both up-and-down and left-and-right, so that any given cell had a pointer to the cell above it (however many cells away that may be), below it, to the left of it, and to the right of it.
