C#: Compare 2 DataTables and fill the changes into a third DataTable

Well, let's say I've got 2 DataTables to begin with.
The first one (source) contains the data from a database.
The second one also contains data from a database, but these values have to be updated into the first database.
Unfortunately they don't have the same structure.
The source DataTable has some additional columns which the second one does not.
For example:
First DT: ID | Name | Company | Age
Second DT: Name | Company | Age
I want the FIRST DataTable to be updated with the values from the second DataTable IF THERE ARE SOME DIFFERENCES (and only the differences).
Any ideas on how to work that out? Any suggestions about performance, even with very big databases?

If you are working with a big amount of data, I would suggest doing things as close to the DB as possible (if possible, within a stored procedure).
If sticking to .NET is mandatory, these are the options I would consider, given the scenario you describe.
First I would choose how to load the data (in the order in which I would consider the options):
Generate entities (LINQ to SQL)
Use F# type providers
Use ADO.NET directly
After this, I would either:
use .Select and .Except on the IQueryable sources, or
do something similar to http://canlu.blogspot.ro/2009/05/how-to-compare-two-datatables-in-adonet.html, if by some chance I was using ADO.NET
It is rather hard to give a specific, exact answer without more information on the type and amount of data, the hardware and the database type.
Note: whichever solution you choose, keep in mind that it is hard to compare things of different structure, so an extra step is required to add empty columns to the table that is missing them.
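For the ADO.NET route, here is a minimal sketch of the comparison step (untested against a real database; the class and method names are made up for illustration). Note that it swaps the padding idea for the opposite one: both tables are cut down to the columns they share, because `DataRowComparer.Default` compares every column value by position, and an empty padded `ID` column would make every row look different.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class SchemaCompare
{
    // Copies of both tables restricted to their shared columns, in the same order,
    // so that rows can be compared position by position.
    public static (DataTable Left, DataTable Right) ToCommonColumns(DataTable left, DataTable right)
    {
        string[] common = left.Columns.Cast<DataColumn>()
            .Select(c => c.ColumnName)
            .Where(name => right.Columns.Contains(name))
            .ToArray();
        return (new DataView(left).ToTable(false, common),
                new DataView(right).ToTable(false, common));
    }

    // Rows of 'incoming' that have no identical row (on the shared columns) in 'existing'.
    public static List<DataRow> RowsNotIn(DataTable incoming, DataTable existing)
    {
        var (inc, ex) = ToCommonColumns(incoming, existing);
        return inc.AsEnumerable()
            .Except(ex.AsEnumerable(), DataRowComparer.Default)
            .ToList();
    }
}
```

With the tables from the question, `SchemaCompare.RowsNotIn(second, first)` yields only the rows of the second table whose Name/Company/Age combination does not already exist in the first one.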

This code is just for reference; I did not have time to test it. It might require a bit of tweaking. Try something like:
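// needs: using System.Data; using System.Linq; (AsEnumerable/Field<T> come from System.Data.DataSetExtensions)
var a = new DataTable();
a.Columns.Add("ID");
a.Columns.Add("Name");
a.Columns.Add("Company");
a.Columns.Add("Age");
var b = new DataTable();
b.Columns.Add("Name");
b.Columns.Add("Company");
b.Columns.Add("Age");
var destination = a.AsEnumerable();
var localValues = b.AsEnumerable();
// Match rows on Name, then keep only the pairs whose other columns differ.
var diff = destination.Join(localValues,
        dstRow => dstRow.Field<string>("Name"),
        srcRow => srcRow.Field<string>("Name"),
        (dstRow, srcRow) => new { Destination = dstRow, Source = srcRow })
    .Where(combinedView =>
        // Field<string> compares string values; '!=' on the object indexer
        // would compare references and flag equal values as different.
        combinedView.Destination.Field<string>("Age") != combinedView.Source.Field<string>("Age") ||
        combinedView.Destination.Field<string>("Company") != combinedView.Source.Field<string>("Company"));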
Also, I would really move to a proper DB, and maybe improve the data model.
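The join above only finds the differences. A minimal sketch of actually writing them back into the first table follows; it assumes (hypothetically) that Name is non-null and unique in the second table, and `DataTableSync`/`ApplyDifferences` are illustrative names, not an existing API. Only cells that really differ are assigned, so only those rows end up with `RowState == DataRowState.Modified`.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DataTableSync
{
    // Copies the given columns from 'source' into 'destination' for rows matched on "Name",
    // writing a cell only when its value differs. Returns the number of changed rows.
    // Assumption: Name is non-null and unique in 'source' (ToDictionary throws otherwise).
    public static int ApplyDifferences(DataTable destination, DataTable source, params string[] columns)
    {
        Dictionary<string, DataRow> sourceByName = source.AsEnumerable()
            .ToDictionary(r => r.Field<string>("Name"));
        int changedRows = 0;
        foreach (DataRow dst in destination.Rows)
        {
            string name = dst.Field<string>("Name");
            if (name == null || !sourceByName.TryGetValue(name, out DataRow src))
                continue; // no counterpart in the source table
            bool rowChanged = false;
            foreach (string col in columns)
            {
                // object.Equals compares values (and DBNull equals DBNull).
                if (!Equals(dst[col], src[col]))
                {
                    dst[col] = src[col];
                    rowChanged = true;
                }
            }
            if (rowChanged) changedRows++;
        }
        return changedRows;
    }
}
```

If the first table was loaded with a DataAdapter (which calls AcceptChanges by default), `destination.GetChanges()` afterwards returns a new DataTable holding only the modified rows, or null if nothing changed; that is the "third DataTable" from the title, and passing the original table to `DataAdapter.Update` would send only those rows.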

Related

Get min, avg and max values from suppliers table

I have a SQL query that returns a table with 4 columns to my C# project (multiple records for each supplier):
Suppliers | Date before | Date after | Dates diff
What do I need? The minimum, average and maximum days per supplier for all 3 columns (date before, date after and date diff).
What's an elegant way of achieving this? Either C# or SQL are viable options.
Note: I did it in C#. I made a DataTable from the query, then a list with each supplier. Then, for each supplier in the list, I ran through each row in the DataTable, and when the supplier in the list and in the DataTable were the same I did the math and added the results to a new DataTable. Newbie style.
I guess that depends on what you mean by elegant...
To some people that means you should update your SQL query to use MIN(), MAX(), AVG(), etc.
To others that would mean creating a POCO, adding it to an Enumerable of some sort and using LINQ to get what you want.
Personally I believe both are viable and to me the decision is really dependent on too many other factors to mention, but in short, does it make sense to perform these operations locally in your program or to rely on the database to do it for you?
There isn't really a single good answer to be given here.
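A minimal sketch of the POCO + LINQ option, assuming the three day columns can be read as whole-day integers (`SupplierRow`, `Stat` and `SupplierReport` are hypothetical names). A single `GroupBy` replaces the nested "for each supplier, scan every row" loops from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical POCO for one row of the query.
public record SupplierRow(string Supplier, int DaysBefore, int DaysAfter, int DaysDiff);

public record Stat(int Min, double Avg, int Max);

public record SupplierStats(string Supplier, Stat Before, Stat After, Stat Diff);

public static class SupplierReport
{
    // Min/avg/max of one column within a group of rows.
    private static Stat Describe(IEnumerable<SupplierRow> rows, Func<SupplierRow, int> column) =>
        new Stat(rows.Min(column), rows.Average(column), rows.Max(column));

    // Groups the rows by supplier once and summarizes each column per group.
    public static List<SupplierStats> Summarize(IEnumerable<SupplierRow> rows) =>
        rows.GroupBy(r => r.Supplier)
            .Select(g => new SupplierStats(
                g.Key,
                Describe(g, r => r.DaysBefore),
                Describe(g, r => r.DaysAfter),
                Describe(g, r => r.DaysDiff)))
            .ToList();
}
```

The SQL option is the same shape: GROUP BY the supplier column and select MIN/AVG/MAX of each day column.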

Retrieve just some columns using an ORM

I'm using Entity Framework and SQL Server 2008 with the Database First approach.
My problem is:
I have some tables that hold many, many columns (~100), and when I try to retrieve a lot of rows it takes a significant time before it returns the results, even though sometimes I only need 3 or 4 columns from that table.
I spent half a day on Stack Overflow trying to find a way to solve this problem, and I came up with two solutions:
Using stored procedures to retrieve data with the columns I want.
Editing the .edmx (XML) and the .cs files to remove the columns that I won't use.
My problem, again, is:
If I use stored procedures to retrieve the data with the columns that I want, Entity Framework loses its benefit, and I could just as well use ADO.NET and call the stored procedures directly...
I can't take the second solution, because every time I make a change in the database, I'm obliged to regenerate the .edmx file and I lose the changes I made before :'(
Is there a way to do this somehow in Entity Framework? Is that possible?
I know that other ORMs exist like NHibernate or Dapper, but I don't know if they can offer this feature without causing a lot of pain.
You don't have to return every column each time. You can specify which columns you need.
var query = from t in db.Table
select new { t.Column1, t.Column2, t.Column3 };
Normally, if you project the data into a different POCO, EF / L2S etc. will do this automatically:
var slim = from row in db.Customers
select new CustomerViewModel {
Name = row.Name, Id = row.Id };
I would expect that to only read 2 columns.
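Why only 2 columns: on an IQueryable, the lambda passed to Select reaches the provider as an expression tree rather than compiled code, so the provider can see exactly which properties the projection reads. The toy inspector below is a hypothetical illustration of that idea, not EF's actual translation code; `Customer` and its `Address` property are stand-ins for the wide table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical wide entity standing in for a ~100-column table.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
}

public class CustomerViewModel
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class ProjectionInspector
{
    // Lists the members of the source entity that a projection lambda reads.
    public static List<string> ColumnsUsed<TSource, TResult>(Expression<Func<TSource, TResult>> projection)
    {
        var collector = new MemberCollector(projection.Parameters[0]);
        collector.Visit(projection.Body);
        return collector.Members;
    }

    private sealed class MemberCollector : ExpressionVisitor
    {
        private readonly ParameterExpression _row;
        public List<string> Members { get; } = new List<string>();
        public MemberCollector(ParameterExpression row) => _row = row;

        protected override Expression VisitMember(MemberExpression node)
        {
            // Count only members accessed directly on the lambda parameter (row.X).
            if (node.Expression == _row && !Members.Contains(node.Member.Name))
                Members.Add(node.Member.Name);
            return base.VisitMember(node);
        }
    }
}
```

For the `CustomerViewModel` projection above, the inspector reports Name and Id but not Address, which is the information a provider needs to emit a narrow SELECT.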
For tools like Dapper: since you control the SQL, only specify the columns you want; don't use *.
You can create a second project with a code-first DbContext, POCOs and maps that return the subset of columns that you require.
This is a case of cut-and-paste code, but it will get you what you need.
You can just create classes and project the data into them, but I'm not sure you can make updates using this method. You can use anonymous types within a single method, but you'll need actual classes to pass around between methods.
Another option would be to move to Code First development.

LINQ to DataSet Dataclass assignment question

I'm working on a Silverlight project, trying to access a database using LINQ to DataSet and then sending data over to Silverlight via an .ASMX web service.
I've defined my DataSet using the Server Explorer tool (dragging and dropping all the different tables that I'm interested in). The DataSet is able to access the server and database with no issues.
Below is code from one of my Web Methods:
public List<ClassSpecification> getSpecifications()
{
DataSet2TableAdapters.SpecificationTableAdapter Sta = new DataSet2TableAdapters.SpecificationTableAdapter();
return (from Spec in Sta.GetData().AsEnumerable()
select new ClassSpecification()
{
Specification = Spec.Field<String>("Specification"),
SpecificationType = Spec.Field<string>("SpecificationType"),
StatusChange = Spec.Field<DateTime>("StatusChange"),
Spec = Spec.Field<int>("Spec")
}).ToList<ClassSpecification>();
}
I created a "ClassSpecification" data class which is going to contain my data and it has all the table fields as properties.
My question is, is there a quicker way of doing the assignment than what is shown here? There are actually about 10 more fields, and I would imagine that since my DataSet knows my table definition, that I would have a quicker way of doing the assignment than going field by field. I tried just "select new ClassSpecification()).ToList
Any help would be greatly appreciated.
First, I'll recommend testing out different possibilities using LINQPad, which is free and awesome.
I can't quite remember what you can do from the table adapter, but you should be able to use the DataSet to get at the data you want, e.g.
string spec = myDataSet.MyTable[0] // the typed table's indexer returns a typed row; or FindBy... or however you are choosing a row
.Specification;
So you might be able to do
foreach (var row in myDataSet.MyTable) { // enumerate the typed table itself; .Rows yields plain DataRow objects
string spec = row.Specification;
...
}
Or
return (from row in myDataSet.Specification
select new ClassSpecification()
{
Specification = row.Specification,
SpecificationType = row.SpecificationType,
StatusChange = row.StatusChange,
Spec = row.Spec,
}).ToList<ClassSpecification>();
Or even
return myDataSet.Specification.Cast<ClassSpecification>();
The last one will not work, though: Cast<T> only performs reference and unboxing conversions, so it throws an InvalidCastException unless the row type really is a ClassSpecification. Still, you can see that there are several ways to get what you want. Also, in my tests the row is strongly typed, so you shouldn't need to create a new class in which to put the data; you should just be able to use the existing "SpecificationRow" class. (In fact, I believe that this is the anemic domain model anti-pattern.)
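If you do want a separate data class without writing every assignment by hand, a reflection-based copy by column name is one option. This is a minimal sketch, not a library API: `RowMapper` is a made-up name, `ClassSpecification` is a stand-in with only the four fields shown in the question, and it assumes each column's type matches the property's type (PropertyInfo.SetValue throws otherwise).

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Reflection;

// Stand-in for the question's data class (only the four fields shown there).
public class ClassSpecification
{
    public string Specification { get; set; }
    public string SpecificationType { get; set; }
    public DateTime StatusChange { get; set; }
    public int Spec { get; set; }
}

public static class RowMapper
{
    // Creates one T per row and copies each column into the writable property of the same name.
    // Columns without a matching property are ignored; DBNull leaves the property at its default.
    public static List<T> MapRows<T>(DataTable table) where T : new()
    {
        var props = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanWrite && table.Columns.Contains(p.Name))
            .ToList();
        var result = new List<T>(table.Rows.Count);
        foreach (DataRow row in table.Rows)
        {
            var item = new T();
            foreach (var p in props)
            {
                object value = row[p.Name];
                if (!(value is DBNull))
                    p.SetValue(item, value);
            }
            result.Add(item);
        }
        return result;
    }
}
```

In the web method this would be used as `return RowMapper.MapRows<ClassSpecification>(Sta.GetData());`. Reflection is slower than the hand-written initializer, which may matter for very large result sets.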
So are you using a DataSet for lack of a LINQ provider for the database? That is the only reason I would consider this approach. For instance, with LINQ to SQL you can drag the table out and drop it, and then you have instant objects with the shape you want.

How to access Parent's Child in DataColumn Expression

I have a DataSet with 3 DataTables:
dtPerson
dtSalary
dtFriend
Every person has salaries, and every person has one friend.
I've added a column dcHisFriend into dtSalary and would like to display friend name of a person owning specified salary.
So dtPerson has a column NAME, dtSalary has column VALUE and dtFriend has a column NAME.
I've added column dcHisFriend and set Expression to this:
dtSalary.Columns.Add(dcHisFriend);
dcHisFriend.Expression =
"Max(Parent.Child(Persons_Friend).NAME)";
But this obviously does not work.
Could you please tell me how to put the name of the friend of the person owning a salary into the dcHisFriend column of the salary table?
I think there is no way to access an arbitrary other row from a DataColumn's "Expression".
The only way to achieve similar behaviour is to hook the ColumnChanged event on the DataTables that hold the source data and then write the computed value into a regular column (a column without an expression).
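A minimal sketch of that event-based approach. The schema is hypothetical: `Persons_Friend` is the relation name from the question, while `Person_Salary` and the `PersonID` key columns are assumptions. Only edits to a friend's NAME are handled here; new salary rows or a changed PersonID would need `Refresh` to be called as well.

```csharp
using System;
using System.Data;
using System.Linq;

public static class HisFriendColumn
{
    // Hypothetical schema: dtPerson(ID, NAME), dtSalary(PersonID, VALUE, HisFriend), dtFriend(PersonID, NAME).
    public static DataSet CreateSchema()
    {
        var ds = new DataSet();
        DataTable person = ds.Tables.Add("dtPerson");
        person.Columns.Add("ID", typeof(int));
        person.Columns.Add("NAME", typeof(string));
        DataTable salary = ds.Tables.Add("dtSalary");
        salary.Columns.Add("PersonID", typeof(int));
        salary.Columns.Add("VALUE", typeof(decimal));
        salary.Columns.Add("HisFriend", typeof(string)); // regular column, no Expression
        DataTable friend = ds.Tables.Add("dtFriend");
        friend.Columns.Add("PersonID", typeof(int));
        friend.Columns.Add("NAME", typeof(string));
        ds.Relations.Add("Person_Salary", person.Columns["ID"], salary.Columns["PersonID"]);
        ds.Relations.Add("Persons_Friend", person.Columns["ID"], friend.Columns["PersonID"]);
        return ds;
    }

    // Salary -> parent Person -> child Friend; stores the friend's NAME (or DBNull).
    public static void Refresh(DataRow salary)
    {
        DataRow person = salary.GetParentRow("Person_Salary");
        DataRow friend = person?.GetChildRows("Persons_Friend").FirstOrDefault();
        salary["HisFriend"] = friend == null ? (object)DBNull.Value : friend["NAME"];
    }

    // Recomputes HisFriend for the affected salaries whenever a friend's NAME is edited.
    public static void Attach(DataSet ds)
    {
        ds.Tables["dtFriend"].ColumnChanged += (sender, e) =>
        {
            if (e.Column.ColumnName != "NAME") return;
            DataRow person = e.Row.GetParentRow("Persons_Friend");
            if (person == null) return;
            foreach (DataRow salary in person.GetChildRows("Person_Salary"))
                Refresh(salary);
        };
    }
}
```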
There is actually a way to do this, provided the relationships between your tables are one-to-one (though missing rows aren't a huge problem): create two relations rather than one, i.e.
var joinColT1 = table1.Columns["ID"];
var joinColT2 = table2.Columns["FK_IDT1"];
var rel1 = new DataRelation("R1To2", joinColT1, joinColT2, false);
var rel2 = new DataRelation("R2To1", joinColT2, joinColT1, false);
theDataSet.Relations.Add(rel1);
theDataSet.Relations.Add(rel2);
// Add the column you're after
var hisFriend = new DataColumn("HisFriend", typeof(string), "Parent([R2To1]).[HisFriend]");
table1.Columns.Add(hisFriend);
// Add a back-reference to the other table against the friend if you want, too
var hisFriendsSalary = new DataColumn("HisFriendsSalary", typeof(decimal), "Parent([R1To2]).[Salary]");
table2.Columns.Add(hisFriendsSalary);
A couple of notes, though: first, when I was first experimenting with this, I got syntax errors without the square brackets around the relation names in the expression. That might just have been to do with the names I'd used for the relations though.
Secondly, I believe the results of expressions are stored against the rows (they aren't computed "just in time" on access; they're computed when values change, and the results are kept). That means you are storing the data twice by using this approach. Sometimes that's fine, and sometimes it isn't.
Third, you'll note that I'm not using constraints. That's because in my typical use cases I'm not expecting every row to have an analogue in the other table (quite often, that's why there are two tables in the first place!). That may have an impact on performance (I haven't checked dotnetframework.org).
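A self-contained version of the two-relation trick, under assumed names (tables Person/Friend, relations PersonToFriend/FriendToPerson, columns PersonID and NAME are all hypothetical). It uses the documented `Parent(RelationName).ColumnName` form without square brackets, which parses because these names contain no special characters. In this sketch the friend row is added before the person row, so the person's expression already finds its "parent" when it is first evaluated.

```csharp
using System.Data;

public static class OneToOneLookup
{
    // Hypothetical one-to-one pair: Person(ID, Salary) and Friend(PersonID, NAME).
    public static DataSet Build()
    {
        var ds = new DataSet();
        DataTable person = ds.Tables.Add("Person");
        person.Columns.Add("ID", typeof(int));
        person.Columns.Add("Salary", typeof(decimal));
        DataTable friend = ds.Tables.Add("Friend");
        friend.Columns.Add("PersonID", typeof(int));
        friend.Columns.Add("NAME", typeof(string));

        DataColumn personId = person.Columns["ID"];
        DataColumn friendPersonId = friend.Columns["PersonID"];
        // One relation in each direction over the same key pair, without constraints,
        // so each table can treat the other as its "parent" in an expression.
        ds.Relations.Add(new DataRelation("PersonToFriend", personId, friendPersonId, false));
        ds.Relations.Add(new DataRelation("FriendToPerson", friendPersonId, personId, false));

        person.Columns.Add("HisFriend", typeof(string), "Parent(FriendToPerson).NAME");
        friend.Columns.Add("FriendsSalary", typeof(decimal), "Parent(PersonToFriend).Salary");
        return ds;
    }
}
```

After `ds.Tables["Friend"].Rows.Add(1, "Bob")` and `ds.Tables["Person"].Rows.Add(1, 1000m)`, the person row's HisFriend column reads "Bob".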

Programming pattern using typed datasets in VS 2008

I'm currently doing the following to use typed datasets in VS 2008:
Right-click on "App_Code", add a new dataset, and name it tableDS.
Open tableDS, right-click, and add a "table adapter".
In the wizard, choose a predefined connection string and "use SQL statements".
Enter select * from tablename and click Next + Next to finish. (I generate one table adapter for each table in my DB.)
In my code I do the following to get a row of data when I only need one:
cpcDS.tbl_cpcRow tr = (cpcDS.tbl_cpcRow)(new cpcDSTableAdapters.tbl_cpcTableAdapter()).GetData().Select("cpcID = " + cpcID)[0];
I believe this will get the entire table from the database and do the filtering in .NET (i.e. not optimal). Is there any way I can get the table adapter to filter the result set in the database instead (i.e. what I want is to send select * from tbl_cpc where cpcID = 1 to the database)?
And as a side note, I think this is a fairly OK design pattern for getting data from a database in VS 2008. It's fairly easy to code with, read and maintain. But I would like to know if there are any other design patterns out there that are better. I use the datasets for read/update/insert and delete.
A bit of a shift, but you ask about different patterns - how about LINQ? Since you are using VS2008, it is possible (although not guaranteed) that you might also be able to use .NET 3.5.
A LINQ-to-SQL data-context provides much more managed access to data (filtered, etc). Is this an option? I'm not sure I'd go "Entity Framework" at the moment, though (see here).
Edit per request:
to get a row from the data-context, you simply need to specify the "predicate" - in this case, a primary key match:
int id = ... // the primary key we want to look for
using(var ctx = new MydataContext()) {
SomeType record = ctx.SomeTable.Single(x => x.SomeColumn == id);
//... etc
// ctx.SubmitChanges(); // to commit any updates
}
The use of Single above is deliberate - this particular usage [Single(predicate)] allows the data-context to make full use of local in-memory data - i.e. if the predicate is just on the primary key columns, it might not have to touch the database at all if the data-context has already seen that record.
However, LINQ is very flexible; you can also use "query syntax" - for example, a slightly different (list) query:
var myOrders = from row in ctx.Orders
where row.CustomerID == id && row.IsActive
orderby row.OrderDate
select row;
etc
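To show that the two styles are interchangeable, here is a sketch using LINQ to Objects on an in-memory list, so it runs without a database; with LINQ to SQL the same expressions would run against `ctx.Orders` and be translated to SQL. The `Order` record and `OrderQueries` class are hypothetical stand-ins for the answer's Orders table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type mirroring the Orders example.
public record Order(int CustomerID, bool IsActive, DateTime OrderDate);

public static class OrderQueries
{
    // Query syntax, as in the answer (note '==' in the where clause).
    public static List<Order> ActiveOrdersQuery(IEnumerable<Order> orders, int id) =>
        (from row in orders
         where row.CustomerID == id && row.IsActive
         orderby row.OrderDate
         select row).ToList();

    // The same query in method syntax; the compiler turns the version above into these calls.
    public static List<Order> ActiveOrdersMethod(IEnumerable<Order> orders, int id) =>
        orders.Where(row => row.CustomerID == id && row.IsActive)
              .OrderBy(row => row.OrderDate)
              .ToList();
}
```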
There are two potential problems with using typed datasets.
One is testability. It's fairly hard work to set up the objects you want to use in a unit test when using typed datasets.
The other is maintainability. Using typed datasets is typically a symptom of a deeper problem: I'm guessing that all your business rules live outside the datasets, and a fair few of them take datasets as input and output some aggregated values based on them. This leads to business logic leaking all over the place, and though it will all be hunky-dory for the first 6 months, it will start to bite you after a while. Such a use of DataSets is fundamentally non-object-oriented.
That being said, it's perfectly possible to have a sensible architecture using datasets, but it doesn't come naturally. An ORM will be harder to set up initially, but will lend itself nicely to writing maintainable and testable code, so you don't have to look back on the mess you made 6 months from now.
You can add a query with a where clause to the TableAdapter for the table you're interested in.
LINQ is nice, but it's really just shortcut syntax for what the OP is already doing.
Typed Datasets make perfect sense unless your data model is very complex. Then writing your own ORM would be the best choice. I'm a little confused as to why Andreas thinks typed datasets are hard to maintain. The only annoying thing about them is that the insert, update, and delete commands are removed whenever the select command is changed.
Also, the speed advantage of creating a typed dataset versus your own ORM lets you focus on the app itself and not the data access code.
