How do I compare two datasets for equality - c#

I have two datasets each with one data table pulled from different sources and I need to know if there are any differences in the data contained in the data tables. I'm trying to avoid looping and comparing each individual record or column, although there may be no other way. All I need to know is if there is a difference in the data, I do not need to know the details of any difference.
I have tried the code below, but it appears that DataSet.Merge does not update RowState, so DataSet.HasChanges() always returns false. Any help is appreciated:
var currentDataSet = GetSomeData();
var historicalDataSet = GetSomeHistoricalData();
historicalDataSet.Merge(currentDataSet);
if (historicalDataSet.HasChanges()) DoSomeStuff();

I don't know of any built-in support for this and I wouldn't expect it either. So you'll have to do this by yourself in some way.
The most obvious way would be a brute force, table by table and row by row approach.
If you can rely on certain factors being the same (i.e. exactly the same naming, ordering of records, etc.), then you could test whether saving both as XML and comparing the results is an efficient trick.
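Both ideas from this answer can be sketched in a few lines. This is only a minimal illustration, not a built-in feature: the class and method names (DataTableComparer, HaveSameData, HaveSameXml) are hypothetical, and both checks assume the two sources return columns and rows in the same order.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DataTableComparer
{
    // Brute-force check: same columns (name and type, in order)
    // and the same values in every row, in the same row order.
    public static bool HaveSameData(DataTable left, DataTable right)
    {
        if (left.Columns.Count != right.Columns.Count ||
            left.Rows.Count != right.Rows.Count)
        {
            return false;
        }

        for (int c = 0; c < left.Columns.Count; c++)
        {
            if (left.Columns[c].ColumnName != right.Columns[c].ColumnName ||
                left.Columns[c].DataType != right.Columns[c].DataType)
            {
                return false;
            }
        }

        for (int r = 0; r < left.Rows.Count; r++)
        {
            // ItemArray holds boxed values; SequenceEqual compares them with Equals.
            // Caveat: byte[] columns would be compared by reference, not by content.
            if (!left.Rows[r].ItemArray.SequenceEqual(right.Rows[r].ItemArray))
            {
                return false;
            }
        }

        return true;
    }

    // XML check: only meaningful when DataSet names, table names,
    // column order, and row order are identical in both sets.
    public static bool HaveSameXml(DataSet left, DataSet right)
    {
        return string.Equals(left.GetXml(), right.GetXml(), StringComparison.Ordinal);
    }
}
```

With the variables from the question, the call would look like DataTableComparer.HaveSameData(currentDataSet.Tables[0], historicalDataSet.Tables[0]). The row loop stops at the first difference, which fits the requirement of only needing to know whether anything differs.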

Related

Data structure to represent statistical table lookup data

I'm performing some statistical calculations where I need to look up values from various tables dynamically.
I've tried representing the following data as JSON, and querying the relevant value:
using JsonDocument doc = JsonDocument.Parse(json);
JsonElement w_test_table = doc.RootElement;
w_test_table.GetProperty(factor).GetProperty(sampleCount.ToString()).GetDouble();
However this feels like I'm going down a bad path.
I have multiple tables I need to lookup for each calculation, so I'm starting to think there's a better way to do this that I'm unaware of.
My concern with storing this in the DB is that I'd have multiple round trips querying the DB to resolve the value.
The calculations I'm performing are done on a collection of sample sets - multiple sample sets I'm running the stats for, hence multiple times I need to resolve the values from various lookup tables, the input values will differ each time.
Any ideas on how I can represent these kinds of lookup tables in C# would be appreciated.
You could use a Data Table or a Multi-Dimensional array.
However since you may have a specific use case, you may want to have a custom class that holds the data inside using one of the above structures and have methods that abstract your specific logic.
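A custom class along those lines might look like the sketch below. It wraps a two-dimensional array and hides the index lookups behind a single Get method. The class name StatLookupTable, the constructor shape, and the idea of labelling rows by factor and columns by sample count are assumptions taken from the names in the question, not a known design.

```csharp
using System;
using System.Collections.Generic;

public sealed class StatLookupTable
{
    private readonly double[,] _values;
    private readonly Dictionary<string, int> _rowByFactor = new Dictionary<string, int>();
    private readonly Dictionary<int, int> _columnBySampleCount = new Dictionary<int, int>();

    // factors label the rows, sampleCounts label the columns of the value grid.
    public StatLookupTable(string[] factors, int[] sampleCounts, double[,] values)
    {
        if (values.GetLength(0) != factors.Length ||
            values.GetLength(1) != sampleCounts.Length)
        {
            throw new ArgumentException("Value grid does not match the row/column labels.");
        }

        for (int i = 0; i < factors.Length; i++)
        {
            _rowByFactor[factors[i]] = i;
        }

        for (int j = 0; j < sampleCounts.Length; j++)
        {
            _columnBySampleCount[sampleCounts[j]] = j;
        }

        _values = values;
    }

    // Throws KeyNotFoundException when the factor or sample count is not in the table.
    public double Get(string factor, int sampleCount)
    {
        return _values[_rowByFactor[factor], _columnBySampleCount[sampleCount]];
    }
}
```

Each table can be loaded once (from JSON, a file, or the database) and kept in memory, so repeated lookups across many sample sets need no further round trips.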

Use linq to iterate through large DB tables

I have two tables: Foo and Bar. For each row in Foo, I now want to add a row in Bar which references the respective Foo record. Foo will likely contain several millions of records.
Normally this answer would have been perfect: linq to sql - loop through table data and set value. But as it says on the tin, using the following line is not particularly ideal for large tables.
List<User> users = dc.Users.ToList();
Since caching the entire table in a List<> is not going to work, what other options do I have? Is there an elegant way to "page through" the records, for instance? Since I am quite sure that this is a relatively common problem, I think it's likely that there is a best practice for this too. I have not been able to find it, however.
You're talking about several million rows of data, so LINQ is not your friend here.
Consider using a stored procedure or, if you like, DataContext.ExecuteCommand.
Both will result in a huge performance gain.
You can work with predefined batches using .Skip() and .Take() methods. Another thing to consider is using a trigger so that you don't need to worry about the second table at all.
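The .Skip()/.Take() idea can be sketched as a small generic helper. The names QueryBatching and InBatches are hypothetical. The helper assumes the query can be ordered by a stable key (for example the primary key), because paging without a deterministic order can skip or repeat rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class QueryBatching
{
    // Yields the query results one batch at a time instead of loading everything.
    public static IEnumerable<List<T>> InBatches<T, TKey>(
        IQueryable<T> source,
        Expression<Func<T, TKey>> orderBy,
        int batchSize)
    {
        if (batchSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        }

        // A stable ordering is required for Skip/Take paging to be deterministic.
        IQueryable<T> ordered = source.OrderBy(orderBy);

        for (int skip = 0; ; skip += batchSize)
        {
            List<T> batch = ordered.Skip(skip).Take(batchSize).ToList();
            if (batch.Count == 0)
            {
                yield break;
            }

            yield return batch;
        }
    }
}
```

Usage would be something like foreach (var batch in QueryBatching.InBatches(dc.Users, u => u.Id, 1000)) { ... }, where u.Id is an assumed key column. Each batch is a separate query, and deep offsets tend to get slower on large tables, so for several million rows the set-based options mentioned above are still likely to be much faster.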

How to manage a million records?

I really need an expert's help to answer my query.
Here is the scenario:
I'm using a SQL SELECT query to retrieve a million records.
I need to perform sorting and grouping on the resulting records, which I'm storing in a DataTable (in one execution) and looping through to group and sort them.
I know this is childish and not the right way to process it.
How can I manage the million records effectively and apply the grouping and sorting to them?
I really need help here. I have heard of executing the SELECT query batch-wise, but how do I implement the grouping and sorting when we don't have the entire data set in hand?
I cannot use SQL ORDER BY and GROUP BY directly; that's against my requirement.
Here is what i'm doing right now:
I have the following objects, i.e. the column names for grouping and sorting:
List<Group> groupList;
List<Sort> sortList;
DataTable reportData; // Holds all the records from the DB
I'm looping through 'reportData' row by row and matching the current and previous rows for the custom grouping and sorting. I would like to know how the same can be done with batch-wise execution, or whether there is an alternative solution.
I need to perform sorting and grouping on the resultant records which
I'm storing in a datatable (in one execution) and looping through it
for grouping and sorting it.
What for?
Seriously.
Do not pull the data and then try playing smart with a stupid object model behind it (and datasets are not particularly smart, sorry).
Group and sort in your select statement, pull the data already grouped and joined, and be done with it.
A million records was a small amount of data for SQL Server when the original version was released (4.2 it was, a port of Sybase SQL Server) 17 years or so ago. These days it likely fits into the processor's third-level cache and is nothing a proper SQL Server even notices it has just processed.
SQL is particularly good at doing projections, and ever since they introduced MARS (Multiple Active Result Sets) you can even run multiple queries over one connection, which comes in handy here.
So, go back: throw away the dataset and the "I'll try to program a sort algorithm" idea, and create proper SQL statements to pull the data as you need it.
Sounds like you should implement Partition Pruning. Partitioning will allow for a separation of content like you are requesting in order to have faster queries.
If I understood correctly, in your case I would create a temporary database table with the structure I want, designed specifically to cover my grouping.
Then I would select the records from the main tables and insert them into the temporary one, applying all modifications including grouping.
An index matching the order in which you want them sorted should also be applied.
After that, just select from this table, do what you have to do, and finally, if the data is not needed any more, delete the temporary table.
I would choose the above solution because a million records in memory smells like trouble to me...
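For example, assuming reportData is a typed collection (for a DataTable, call reportData.AsEnumerable() and read the columns with Field<T>()):
1. Let's assume that you would like to group them by their DocumentTypeID
var groupByType = reportData.GroupBy(g => g.DocumentTypeID);
2. Sorting alphabetically
var sortAlphabetically = reportData.OrderBy(g => g.DocumentName);
3. Grouping, then sorting the groups by key and the rows inside each group by name
var groupAndSort = reportData.GroupBy(g => g.DocumentTypeID)
.OrderBy(g => g.Key)
.Select(g => new { g.Key, Items = g.OrderBy(d => d.DocumentName) });
4. Sort and group (LINQ to Objects keeps the source order inside each group)
var sortAndGroup = reportData.OrderBy(g => g.DocumentName)
.GroupBy(g => g.DocumentTypeID);
5. Multiple grouping and sorting (composite key)
var multipleGroupAndSort = reportData.GroupBy(g => new { g.DocumentTypeID, g.CreatedOnDate.Month })
.OrderBy(g => g.Key.DocumentTypeID)
.ThenBy(g => g.Key.Month);
and so on and so forth...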
But I would still discourage bringing a million rows into the application. It will cost memory. There are, of course, ways to manage it through stored procedures etc.
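Since reportData in the question is a DataTable rather than a typed collection, the same grouping and sorting can be sketched with LINQ to DataSet. The column names DocumentTypeID and DocumentName and their types are assumptions carried over from the examples above, and the class name ReportGrouping is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class ReportGrouping
{
    // Groups rows by DocumentTypeID (groups ordered by key) and
    // sorts the document names inside each group.
    public static List<KeyValuePair<int, List<string>>> GroupAndSort(DataTable reportData)
    {
        return reportData.AsEnumerable()
            .GroupBy(row => row.Field<int>("DocumentTypeID"))
            .OrderBy(group => group.Key)
            .Select(group => new KeyValuePair<int, List<string>>(
                group.Key,
                group.Select(row => row.Field<string>("DocumentName"))
                     .OrderBy(name => name, StringComparer.Ordinal)
                     .ToList()))
            .ToList();
    }
}
```

This still holds every row in memory, so it only replaces the hand-written row-by-row comparison; it does not remove the memory cost discussed above.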

How should I manage large DataTables?

For reasons that don't make a lot of sense (read: not my decision), I need to keep a large number of rows, about 90,000, in a DataTable, and I do not have the option of using a database.
I need to be able to search the DataTable efficiently to find rows that match some basic criteria. For example, I might be looking at a row that has the value 2 in two specific columns.
What is the best way to do this?
Edit: Please take a look at https://chat.stackoverflow.com/transcript/message/62648#62648 for more details; after I work on this I will try and summarize the extra details from the chat here as well as provide my solution.
You could easily use DataTable.Select()
The solution I ended up using for this painfully awkward and inconvenient situation was to use DataTable.Select(), populate a new DataTable and then use the same operation to select the rows I needed from the refined DataTable.
I think that this solution is clumsy, but then again the constraints on the problem were somewhat unrealistic seeing as I was on a tight schedule as well.
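For the example in the question (rows with the value 2 in two specific columns), a DataTable.Select() call might look like the sketch below. The helper name FindMatches and the column names are hypothetical, and the filter assumes numeric columns whose names do not contain a closing bracket.

```csharp
using System.Data;
using System.Globalization;

public static class DataTableSearch
{
    // Returns the rows where both named columns hold the given value.
    public static DataRow[] FindMatches(
        DataTable table, string firstColumn, string secondColumn, int value)
    {
        // Format the number culture-independently for the filter expression.
        string literal = value.ToString(CultureInfo.InvariantCulture);
        string filter = "[" + firstColumn + "] = " + literal +
                        " AND [" + secondColumn + "] = " + literal;
        return table.Select(filter);
    }
}
```

Both conditions go into one filter expression, so no intermediate DataTable is needed. If the same kind of lookup runs many times, a DataView sorted on those columns and queried with FindRows may be worth measuring as an alternative.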

IMultipleResults: how do I deal with multiple result sets from a stored proc when they don't map to types?

This post on SO answers most of the questions I have (many thanks to Pure.Krome for the thorough response) about how to build a query that returns multiple results. However, in the case I'm working with, the tables that come back depend on how the proc behaves, and I can't change the proc. The results that come back are a set of DataTables that don't map to types at all (for example, the first table is a mishmash of parts of the Customers table and the Orders table; the second table, if present, will be debugging output; then there might be a third table, and so on).
Do I have to do this with a DataSet/DataAdapter etc.? Or is this possible with LINQ?
LINQ to SQL is an ORM (albeit a fairly simple one), with the "O" being (importantly) "object". If you can't predict the layout of the object returned in each grid, then it isn't a good fit for ORM.
Personally I wouldn't jump from LINQ to DataTable (but maybe I'm just biased against DataTable ;-p) - I would use SqlCommand.ExecuteReader and do my own object (etc) mapping. But maybe it might save time to just use a DataSet... YMMV etc.
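The ExecuteReader approach can be sketched as follows. The helper reads every result set from any IDataReader into loosely typed rows, so the shape of each grid does not need to be known in advance. The names ResultSetReader and ReadAll are hypothetical, and mapping rows to dictionaries is just one possible choice of "own mapping".

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ResultSetReader
{
    // Reads all result sets; each row becomes a column-name -> value dictionary.
    // Caveat: if a result set repeats a column name, the last value wins.
    public static List<List<Dictionary<string, object>>> ReadAll(IDataReader reader)
    {
        var resultSets = new List<List<Dictionary<string, object>>>();

        do
        {
            var rows = new List<Dictionary<string, object>>();
            while (reader.Read())
            {
                var row = new Dictionary<string, object>(reader.FieldCount);
                for (int i = 0; i < reader.FieldCount; i++)
                {
                    row[reader.GetName(i)] = reader.GetValue(i);
                }

                rows.Add(row);
            }

            resultSets.Add(rows);
        }
        while (reader.NextResult()); // advance to the next grid, if any

        return resultSets;
    }
}
```

With a real stored procedure the reader would come from SqlCommand.ExecuteReader(); the caller can then inspect how many result sets came back and which columns each one has before deciding how to map them.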
