Join 2 DataTables on dynamic number of columns - c#

I'm trying to join two DataTables on a dynamic number of columns. I've gotten as far as the code below. The problem is the ON clause of the join. How can I make it dynamic, based on how many column names are in the list "joinColumnNames"?
I was thinking I will need to build some sort of expression tree, but I can't find any examples of how to do this with multiple join columns and with the DataRow object which doesn't have properties for each column.
private DataTable Join(List<string> joinColumnNames, DataTable pullX, DataTable pullY)
{
DataTable joinedTable = new DataTable();
// Add all the columns from pullX
foreach (string colName in joinColumnNames)
{
joinedTable.Columns.Add(pullX.Columns[colName]);
}
// Add unique columns from PullY
foreach (DataColumn col in pullY.Columns)
{
if (!joinedTable.Columns.Contains((col.ColumnName)))
{
joinedTable.Columns.Add(col);
}
}
var Join = (from PX in pullX.AsEnumerable()
join PY in pullY.AsEnumerable() on
// This must be dynamic and join on every column mentioned in joinColumnNames
new { A = PX[joinColumnNames[0]], B = PX[joinColumnNames[1]] } equals new { A = PY[joinColumnNames[0]], B = PY[joinColumnNames[1]] }
into Outer
from PY in Outer.DefaultIfEmpty<DataRow>(pullY.NewRow())
select new { PX, PY });
foreach (var item in Join)
{
DataRow newRow = joinedTable.NewRow();
foreach (DataColumn col in joinedTable.Columns)
{
var pullXValue = item.PX.Table.Columns.Contains(col.ColumnName) ? item.PX[col.ColumnName] : string.Empty;
var pullYValue = item.PY.Table.Columns.Contains(col.ColumnName) ? item.PY[col.ColumnName] : string.Empty;
newRow[col.ColumnName] = (pullXValue == null || string.IsNullOrEmpty(pullXValue.ToString())) ? pullYValue : pullXValue;
}
joinedTable.Rows.Add(newRow);
}
return joinedTable;
}
Adding a specific example to show input/output using 3 join columns (Country, Company, and DateId):
Pull X:
Country Company DateId Sales
United States Test1 Ltd 20160722 $25
Canada Test3 Ltd 20160723 $30
Italy Test4 Ltd 20160724 $40
India Test2 Ltd 20160725 $35
Pull Y:
Country Company DateId Downloads
United States Test1 Ltd 20160722 500
Mexico Test2 Ltd 20160723 300
Italy Test4 Ltd 20160724 900
Result:
Country Company DateId Sales Downloads
United States Test1 Ltd 20160722 $25 500
Canada Test3 Ltd 20160723 $30
Mexico Test2 Ltd 20160723 300
Italy Test4 Ltd 20160724 $40 900
India Test2 Ltd 20160725 $35

var Join =
from PX in pullX.AsEnumerable()
join PY in pullY.AsEnumerable()
on string.Join("\0", joinColumnNames.Select(c => PX[c]))
equals string.Join("\0", joinColumnNames.Select(c => PY[c]))
into Outer
from PY in Outer.DefaultIfEmpty<DataRow>(pullY.NewRow())
select new { PX, PY };
Another way is to put both DataTables in a DataSet and use a DataRelation. See:
How To: Use DataRelation to perform a join on two DataTables in a DataSet?
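To make the composite-key idea above concrete, here is a minimal sketch of a left outer join over a dynamic list of columns. The helper name KeyOf and the trimmed sample rows are only illustrative, and it assumes the "\0" separator never occurs inside the column values:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Builds one string key from any number of columns.
// Assumption: "\0" never occurs inside the column values themselves.
string KeyOf(DataRow row, IEnumerable<string> columnNames) =>
    string.Join("\0", columnNames.Select(name => Convert.ToString(row[name])));

var joinColumnNames = new List<string> { "Country", "Company", "DateId" };

var pullX = new DataTable();
foreach (var columnName in new[] { "Country", "Company", "DateId", "Sales" })
    pullX.Columns.Add(columnName, typeof(string));
pullX.Rows.Add("United States", "Test1 Ltd", "20160722", "$25");
pullX.Rows.Add("Canada", "Test3 Ltd", "20160723", "$30");

var pullY = new DataTable();
foreach (var columnName in new[] { "Country", "Company", "DateId", "Downloads" })
    pullY.Columns.Add(columnName, typeof(string));
pullY.Rows.Add("United States", "Test1 Ltd", "20160722", "500");
pullY.Rows.Add("Mexico", "Test2 Ltd", "20160723", "300");

// Left outer join: every pullX row appears once per match, or once with PY == null.
var pairs = (from px in pullX.AsEnumerable()
             join py in pullY.AsEnumerable()
                 on KeyOf(px, joinColumnNames) equals KeyOf(py, joinColumnNames)
                 into outer
             from py in outer.DefaultIfEmpty()
             select new { PX = px, PY = py }).ToList();

foreach (var pair in pairs)
{
    object downloads = pair.PY == null ? "(no match)" : pair.PY["Downloads"];
    Console.WriteLine(pair.PX["Country"] + ": " + downloads);
}
```

Unlike the snippet above, this sketch uses DefaultIfEmpty() without a placeholder row, so unmatched rows carry a null PY that the caller has to check.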

Since you are using LINQ to Objects, there is no need to use expression trees. You can solve your problem with a custom equality comparer.
Create an equality comparer that can compare equality between two DataRow objects based on the values of specific columns. Here is an example:
public class MyEqualityComparer : IEqualityComparer<DataRow>
{
private readonly string[] columnNames;
public MyEqualityComparer(string[] columnNames)
{
this.columnNames = columnNames;
}
public bool Equals(DataRow x, DataRow y)
{
return columnNames.All(cn => x[cn].Equals(y[cn]));
}
public int GetHashCode(DataRow obj)
{
unchecked
{
int hash = 19;
foreach (var value in columnNames.Select(cn => obj[cn]))
{
hash = hash * 31 + value.GetHashCode();
}
return hash;
}
}
}
Then you can use it to make the join like this:
public class TwoRows
{
public DataRow Row1 { get; set; }
public DataRow Row2 { get; set; }
}
private static List<TwoRows> LeftOuterJoin(
List<string> joinColumnNames,
DataTable leftTable,
DataTable rightTable)
{
return leftTable
.AsEnumerable()
.GroupJoin(
rightTable.AsEnumerable(),
l => l,
r => r,
(l, rlist) => new {LeftValue = l, RightValues = rlist},
new MyEqualityComparer(joinColumnNames.ToArray()))
.SelectMany(
x => x.RightValues.DefaultIfEmpty(rightTable.NewRow()),
(x, y) => new TwoRows {Row1 = x.LeftValue, Row2 = y})
.ToList();
}
Please note that I am using method syntax because query syntax has no way to pass a custom equality comparer to the join.
Please note that the method does a left outer join, not a full outer join. Based on the example you provided, you seem to want a full outer join. To do this you need to do two left outer joins (see this answer) and remove the duplicate rows. Here is what the full method would look like:
private static DataTable FullOuterJoin(
List<string> joinColumnNames,
DataTable pullX,
DataTable pullY)
{
var pullYOtherColumns =
pullY.Columns
.Cast<DataColumn>()
.Where(x => !joinColumnNames.Contains(x.ColumnName))
.ToList();
var allColumns =
pullX.Columns
.Cast<DataColumn>()
.Concat(pullYOtherColumns)
.ToArray();
var allColumnsClone =
allColumns
.Select(x => new DataColumn(x.ColumnName, x.DataType))
.ToArray();
DataTable joinedTable = new DataTable();
joinedTable.Columns.AddRange(allColumnsClone);
var first =
LeftOuterJoin(joinColumnNames, pullX, pullY);
var resultRows = new List<DataRow>();
foreach (var item in first)
{
DataRow newRow = joinedTable.NewRow();
foreach (DataColumn col in joinedTable.Columns)
{
var value = pullX.Columns.Contains(col.ColumnName)
? item.Row1[col.ColumnName]
: item.Row2[col.ColumnName];
newRow[col.ColumnName] = value;
}
resultRows.Add(newRow);
}
var second =
LeftOuterJoin(joinColumnNames, pullY, pullX);
foreach (var item in second)
{
DataRow newRow = joinedTable.NewRow();
foreach (DataColumn col in joinedTable.Columns)
{
var value = pullY.Columns.Contains(col.ColumnName)
? item.Row1[col.ColumnName]
: item.Row2[col.ColumnName];
newRow[col.ColumnName] = value;
}
resultRows.Add(newRow);
}
var uniqueRows =
resultRows
.Distinct(
new MyEqualityComparer(
joinedTable.Columns
.Cast<DataColumn>()
.Select(x => x.ColumnName)
.ToArray()));
foreach (var uniqueRow in uniqueRows)
joinedTable.Rows.Add(uniqueRow);
return joinedTable;
}
Please note also how I clone the columns. You cannot use the same column object in two tables.
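As an alternative to two left outer joins followed by Distinct, a full outer join can also be sketched with a lookup keyed on the join columns. This is a different technique from the comparer-based code above, not a restatement of it. The helpers KeyOf and MakeTable and the reduced sample data are hypothetical, it assumes "\0" never occurs in the data, and it assumes the two tables share only the join columns:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Composite key over the join columns (assumes "\0" is not part of any value).
string KeyOf(DataRow row, IEnumerable<string> columnNames) =>
    string.Join("\0", columnNames.Select(name => Convert.ToString(row[name])));

// Hypothetical helper that builds an all-string table for the demo.
DataTable MakeTable(string[] columns, params string[][] rows)
{
    var table = new DataTable();
    foreach (var columnName in columns) table.Columns.Add(columnName, typeof(string));
    foreach (var values in rows) table.Rows.Add(values.Cast<object>().ToArray());
    return table;
}

var keys = new[] { "Country", "DateId" };
var x = MakeTable(new[] { "Country", "DateId", "Sales" },
    new[] { "United States", "20160722", "$25" },
    new[] { "Canada", "20160723", "$30" });
var y = MakeTable(new[] { "Country", "DateId", "Downloads" },
    new[] { "United States", "20160722", "500" },
    new[] { "Mexico", "20160723", "300" });

// Result schema: all columns of x, then the columns of y that x does not have.
var joined = new DataTable();
foreach (DataColumn col in x.Columns) joined.Columns.Add(col.ColumnName, col.DataType);
foreach (DataColumn col in y.Columns)
    if (!joined.Columns.Contains(col.ColumnName)) joined.Columns.Add(col.ColumnName, col.DataType);

var yByKey = y.AsEnumerable().ToLookup(r => KeyOf(r, keys));
var seenKeys = new HashSet<string>();

// Pass 1: every x row, combined with each matching y row (or alone if none).
foreach (DataRow xRow in x.Rows)
{
    var key = KeyOf(xRow, keys);
    seenKeys.Add(key);
    foreach (var yRow in yByKey[key].DefaultIfEmpty())
    {
        var newRow = joined.NewRow();
        foreach (DataColumn col in x.Columns) newRow[col.ColumnName] = xRow[col.ColumnName];
        if (yRow != null)
            foreach (DataColumn col in y.Columns) newRow[col.ColumnName] = yRow[col.ColumnName];
        joined.Rows.Add(newRow);
    }
}

// Pass 2: y rows whose key never appeared in x.
foreach (DataRow yOnly in y.Rows)
{
    if (seenKeys.Contains(KeyOf(yOnly, keys))) continue;
    var newRow = joined.NewRow();
    foreach (DataColumn col in y.Columns) newRow[col.ColumnName] = yOnly[col.ColumnName];
    joined.Rows.Add(newRow);
}

Console.WriteLine(joined.Rows.Count);
```

Columns that have no counterpart on the other side are left as DBNull, so callers can test them with DataRow.IsNull.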

Related

Join 2 DataTables with an AND condition clause in LINQ

Hi, I have 2 DataTables like this:
dt1
id (1,2,3)
name (abc,xyz,def)
num(11,12,13)
dt2
id (1,2,3)
name (abc,xyz,def)
num_from (10,13,11)
num_to (14,14,14)
How can I select the ids whose num is between num_from and num_to using LINQ?
I tried this:
dtres = (from t1 in dt1.AsEnumerable()
join t2 in dt1.AsEnumerable() on t1.Field<string>("ID") equals t2.Field<string>("ID")
where t1["num"]>= t2["num_from"] &&
t1["num"]<= t2["num_to"]
select t1).CopyToDataTable();
Consider the following code.
It produces the result as an IEnumerable<AnonymousType>, not as DataRow objects, so the CopyToDataTable() extension method cannot be applied. Instead, I have provided a custom extension method, ToDataTable, at the bottom of this code. I have included every column in the final result; you can change which ones are selected.
My understanding of your question is that you need a filter such that num in dt1 is between num_from and num_to in dt2:
var resultDataTable =
dt1.AsEnumerable().Join(dt2.AsEnumerable(), t1 => t1["id"], t2 => t2["id"], (t1, t2) => new { t1, t2})
.Where(t => (int.Parse(t.t2["num_from"].ToString()) <= int.Parse(t.t1["num"].ToString()) && int.Parse(t.t2["num_to"].ToString()) >= int.Parse(t.t1["num"].ToString())))
.Select(t => new {
Id1 = t.t1["id"].ToString(),
Name1 = t.t1["name"].ToString(),
Num1 = t.t1["num"].ToString(),
Id2 = t.t2["id"].ToString(),
Name2 = t.t2["name"].ToString(),
Num_From = t.t2["num_from"].ToString(),
Num_To = t.t2["num_to"].ToString()
}
).ToList().ToDataTable();
Extension method to convert a List<T> to a DataTable:
public static class ExtensionDT
{
public static DataTable ToDataTable<T>(this List<T> items)
{
var tb = new DataTable(typeof(T).Name);
PropertyInfo[] props = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);
foreach (var prop in props)
{
tb.Columns.Add(prop.Name, prop.PropertyType);
}
foreach (var item in items)
{
var values = new object[props.Length];
for (var i = 0; i < props.Length; i++)
{
values[i] = props[i].GetValue(item, null);
}
tb.Rows.Add(values);
}
return tb;
}
}
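If only the rows of dt1 are needed, a shorter variant is possible, because a query that selects DataRow objects can use CopyToDataTable() directly. The sketch below assumes that id, num, num_from and num_to are stored as int columns, which the question does not state, and it rebuilds the sample data from the question:

```csharp
using System;
using System.Data;
using System.Linq;

var dt1 = new DataTable();
dt1.Columns.Add("id", typeof(int));
dt1.Columns.Add("name", typeof(string));
dt1.Columns.Add("num", typeof(int));
dt1.Rows.Add(1, "abc", 11);
dt1.Rows.Add(2, "xyz", 12);
dt1.Rows.Add(3, "def", 13);

var dt2 = new DataTable();
dt2.Columns.Add("id", typeof(int));
dt2.Columns.Add("name", typeof(string));
dt2.Columns.Add("num_from", typeof(int));
dt2.Columns.Add("num_to", typeof(int));
dt2.Rows.Add(1, "abc", 10, 14);
dt2.Rows.Add(2, "xyz", 13, 14);
dt2.Rows.Add(3, "def", 11, 14);

// Equality goes in the join clause; the range condition goes in where.
var matches = from t1 in dt1.AsEnumerable()
              join t2 in dt2.AsEnumerable()
                  on t1.Field<int>("id") equals t2.Field<int>("id")
              where t1.Field<int>("num") >= t2.Field<int>("num_from")
                 && t1.Field<int>("num") <= t2.Field<int>("num_to")
              select t1;

// CopyToDataTable throws InvalidOperationException when the sequence is empty,
// so real code may want to check matches.Any() first.
DataTable dtres = matches.CopyToDataTable();

foreach (DataRow row in dtres.Rows)
    Console.WriteLine(row["id"]);
```

With the sample data, ids 1 and 3 pass the filter; id 2 is dropped because 12 is below its num_from of 13.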
A LINQ join clause only supports equality conditions, so a range condition cannot go in the on clause the way it can in MySQL or SQL Server. You can instead write simple helper functions that look up the bounds in dt2 and use them in a where clause:
// Keep the rows of dt1 whose num lies between num_from and num_to for the same id.
var dtres = from t1 in dt1.AsEnumerable()
where Convert.ToInt32(t1["num"]) >= getMyVar1(t1["id"]) && Convert.ToInt32(t1["num"]) <= getMyVar2(t1["id"])
select t1;
// Returns num_from of the dt2 row with the given id (First throws if there is no such row).
public int getMyVar1(object id)
{
var thisvar = dt2.AsEnumerable().First(t2 => t2["id"].Equals(id));
return Convert.ToInt32(thisvar["num_from"]);
}
// Returns num_to of the dt2 row with the given id (First throws if there is no such row).
public int getMyVar2(object id)
{
var thisvar = dt2.AsEnumerable().First(t2 => t2["id"].Equals(id));
return Convert.ToInt32(thisvar["num_to"]);
}
I have tried to simplify my answer to be easier to understand. I hope this helps

How to use Inner Join and then fill a DataSet with result of the join?

In short, I have two tables, Consequents and Atomic propositions:
AtomicP table
ID | Proposition
1 | A
1 | B
1 | C
2 | D
2 | E
Consequent table
ID | Consequent
1 | A
2 | B
All I want to do is implement an inner join that gives me all the values where the ID is the same in both tables, i.e.:
AtomicP Table "A" "B" "C" -> "A" Consequent Table
and then save the result of the inner join in a DataSet, or in another data structure if one is better suited.
Best regards.
Assuming the destination table has the columns id, proposition and consequent:
insert into newtable (id, proposition, consequent) select atomicP.id, atomicP.proposition, consequent.consequent from atomicP inner join consequent on atomicP.id = consequent.id
public class Proposition
{
public int Id;
public string Value;
public Proposition(int id, string value){
Id = id;
Value = value;
}
}
public class Consequent
{
public int Id;
public string Value;
public Consequent(int id, string value){
Id = id;
Value = value;
}
}
var atomicP = new List<Proposition>{
new Proposition(1, "A"),
new Proposition(1, "B"),
new Proposition(1, "C"),
new Proposition(2, "D"),
new Proposition(2, "E"),
};
var consequents = new List<Consequent>{
new Consequent(1, "A"),
new Consequent(2, "B"),
};
// A LINQ join clause uses the equals keyword, not ==.
var query = from proposition in atomicP
join consequent in consequents on proposition.Id equals consequent.Id
select proposition.Value;
List<string> result = query.ToList();
Use this function:
private DataTable JoinDataTables(DataTable t1, DataTable t2, params Func<DataRow, DataRow, bool>[] joinOn)
{
DataTable result = new DataTable();
foreach (DataColumn col in t1.Columns)
{
if (result.Columns[col.ColumnName] == null)
result.Columns.Add(col.ColumnName, col.DataType);
}
foreach (DataColumn col in t2.Columns)
{
if (result.Columns[col.ColumnName] == null)
result.Columns.Add(col.ColumnName, col.DataType);
}
foreach (DataRow row1 in t1.Rows)
{
var joinRows = t2.AsEnumerable().Where(row2 =>
{
foreach (var parameter in joinOn)
{
if (!parameter(row1, row2)) return false;
}
return true;
});
foreach (DataRow fromRow in joinRows)
{
DataRow insertRow = result.NewRow();
foreach (DataColumn col1 in t1.Columns)
{
insertRow[col1.ColumnName] = row1[col1.ColumnName];
}
foreach (DataColumn col2 in t2.Columns)
{
insertRow[col2.ColumnName] = fromRow[col2.ColumnName];
}
result.Rows.Add(insertRow);
}
}
return result;
}
An example of how you might use this:
var test = JoinDataTables(Consequents, Atomic,
(row1, row2) =>
row1.Field<int>("ID") == row2.Field<int>("ID"));
I assume you want to do the join in C# and get a DataTable (this is a bit unclear in the question).
The code snippet below joins two DataTables using LINQ and inserts the result into another table.
DataTable results = new DataTable();
results.Columns.Add("ID", typeof(int));
results.Columns.Add("Proposition", typeof(string));
results.Columns.Add("Consequent", typeof(string));
// The query is lazy: LoadDataRow only runs when the query is enumerated, so force it with ToList().
var result1 = (from arow in AtomicP.AsEnumerable()
join con in Consequent.AsEnumerable()
on arow.Field<int>("ID") equals con.Field<int>("ID")
select results.LoadDataRow(new object[]
{
arow.Field<int>("ID"),
arow.Field<string>("Proposition"),
con.Field<string>("Consequent")
}, false)).ToList();
Now we can access the joined rows by iterating through results.
foreach(DataRow row in results.Rows)
{
foreach(DataColumn column in results.Columns)
{
//Console.WriteLine(row[column]);
}
}

At least one object must implement IComparable. AsEnumerable() and Sum()

I am trying to group by multiple columns on tbl.AsEnumerable().
I want to group by cus, salesman, ppj and curr, while amt_base should be summed up.
Everything else works fine,
but when I use grp.Sum(r => r.Field<decimal>("amt_base")) to sum the amounts, the foreach () throws the error "At least one object must implement IComparable".
var results = from rows in tbl.AsEnumerable()
group rows by new
{
cus = rows["cus"],
salesman = rows["salesman"],
ppj = rows["ppj"],
curr = rows["curr"],
}into grp
orderby grp.Key
select new
{
cus = grp.Key.cus,
nm = grp.First()["nm"],
salesman = grp.Key.salesman,
ppj = grp.Key.ppj,
curr = grp.Key.curr,
amt_base = grp.Sum(r => r.Field<decimal>("amt_base")),
};
DataTable tbl2 = new DataTable();
tbl2.Columns.Add("cus");
tbl2.Columns.Add("nm");
tbl2.Columns.Add("salesman");
tbl2.Columns.Add("ppj");
tbl2.Columns.Add("curr");
tbl2.Columns.Add("amt_base");
decimal tamt_base = 0;
foreach (var item in results)
{
DataRow dr2 = tbl2.NewRow();
dr2["cus"] = item.cus;
dr2["nm"] = item.nm;
dr2["salesman"] = item.salesman;
dr2["ppj"] = item.ppj;
dr2["curr"] = item.curr;
dr2["amt_base"] = Math.Round(item.amt_base, 2, MidpointRounding.AwayFromZero);
tbl2.Rows.Add(dr2);
tamt_base += item.amt_base;
}
It can't determine how to order the groups based on an anonymous type made up of 4 arbitrary columns. Ordering needs to compare each key with another, which is usually done by having your class implement the IComparable interface, but you can't do that with an anonymous type.
Remove this:
orderby grp.Key
If you really need some sort of ordering, try using an individual field:
orderby grp.Key.cus
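If the order should depend on more than one key column, the individual fields can be listed in the orderby clause, since each of them is comparable even though the anonymous key as a whole is not. A minimal sketch, assuming cus and curr are string columns and amt_base is a decimal column (the sample rows are made up for illustration):

```csharp
using System;
using System.Data;
using System.Linq;

var tbl = new DataTable();
tbl.Columns.Add("cus", typeof(string));
tbl.Columns.Add("curr", typeof(string));
tbl.Columns.Add("amt_base", typeof(decimal));
tbl.Rows.Add("B", "USD", 1.5m);
tbl.Rows.Add("A", "USD", 2.0m);
tbl.Rows.Add("A", "USD", 3.25m);
tbl.Rows.Add("A", "EUR", 4.0m);

var results = (from row in tbl.AsEnumerable()
               group row by new
               {
                   cus = row.Field<string>("cus"),
                   curr = row.Field<string>("curr"),
               } into grp
               // Order by the individual key fields, not by the anonymous key itself.
               orderby grp.Key.cus, grp.Key.curr
               select new
               {
                   grp.Key.cus,
                   grp.Key.curr,
                   amt_base = grp.Sum(r => r.Field<decimal>("amt_base")),
               }).ToList();

foreach (var item in results)
    Console.WriteLine(item.cus + " " + item.curr + " " + item.amt_base);
```

Using Field<string> instead of rows["cus"] also gives the key fields a concrete type, so the ordering does not rely on boxed object values.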

Convert Datatable GroupBy Multiple Columns with Sum using Linq

I want to sum the TotalImages column after a GROUP BY, but it's showing me an error.
Can anyone help me find what's going wrong?
Please note that I want to keep this syntax and get a DataTable, not a List. I would be grateful for any help.
Sample Data:-
CountryId | CityId | TotalImages
1 | 1 | 2
1 | 2 | 2
1 | 2 | 3
1 | 3 | 4
2 | 1 | 2
2 | 2 | 2
2 | 2 | 3
2 | 3 | 4
DataTable dt = dt.AsEnumerable()
.GroupBy(r => new { Col1 = r["CountryId"], Col2 = r["CityId"]})
.Select(g => g.Sum(r => r["TotalImages"]).First())
.CopyToDataTable();
You can use this:
DataTable countriesTable = dt.AsEnumerable().GroupBy(x => new { CountryId = x.Field<int>("CountryId"), CityId = x.Field<int>("CityId") })
.Select(x => new Countries
{
CountryId = x.Key.CountryId,
CityId = x.Key.CityId,
TotalSum = x.Sum(z => z.Field<int>("TotalImages"))
}).PropertiesToDataTable<Countries>();
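With the sample data above, this gives one row per (CountryId, CityId) pair, with TotalSum values 2, 5, 4, 2, 5 and 4.
Since CopyToDataTable only works on sequences of DataRow objects, not on anonymous or custom types, I have used an extension method taken from here and modified it accordingly.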
public static DataTable PropertiesToDataTable<T>(this IEnumerable<T> source)
{
DataTable dt = new DataTable();
var props = TypeDescriptor.GetProperties(typeof(T));
foreach (PropertyDescriptor prop in props)
{
DataColumn dc = dt.Columns.Add(prop.Name, prop.PropertyType);
dc.Caption = prop.DisplayName;
dc.ReadOnly = prop.IsReadOnly;
}
foreach (T item in source)
{
DataRow dr = dt.NewRow();
foreach (PropertyDescriptor prop in props)
{
dr[prop.Name] = prop.GetValue(item);
}
dt.Rows.Add(dr);
}
return dt;
}
And, here is the Countries type:-
public class Countries
{
public int CountryId { get; set; }
public int CityId { get; set; }
public int TotalSum { get; set; }
}
You can use any other approach to convert it to a DataTable if you wish.
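One such approach, shown here only as a sketch, skips the helper class and the extension method and fills the result table by hand. It assumes all three columns are int columns and reuses the sample data from the question:

```csharp
using System;
using System.Data;
using System.Linq;

var dt = new DataTable();
dt.Columns.Add("CountryId", typeof(int));
dt.Columns.Add("CityId", typeof(int));
dt.Columns.Add("TotalImages", typeof(int));
dt.Rows.Add(1, 1, 2);
dt.Rows.Add(1, 2, 2);
dt.Rows.Add(1, 2, 3);
dt.Rows.Add(1, 3, 4);
dt.Rows.Add(2, 1, 2);
dt.Rows.Add(2, 2, 2);
dt.Rows.Add(2, 2, 3);
dt.Rows.Add(2, 3, 4);

// Clone() copies the schema (columns and types) but none of the rows.
DataTable result = dt.Clone();

var groups = dt.AsEnumerable()
    .GroupBy(r => new
    {
        CountryId = r.Field<int>("CountryId"),
        CityId = r.Field<int>("CityId"),
    });

// One output row per (CountryId, CityId) pair, with the summed TotalImages.
foreach (var g in groups)
    result.Rows.Add(g.Key.CountryId, g.Key.CityId, g.Sum(r => r.Field<int>("TotalImages")));

foreach (DataRow row in result.Rows)
    Console.WriteLine(row["CountryId"] + " | " + row["CityId"] + " | " + row["TotalImages"]);
```

This keeps the result as a DataTable with the same three column names as the input, which may be closer to what the question asks for than a separate Countries type.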

How to concat only unique columns with data

I am using the method below:
public static DataTable DataTableJoiner(DataTable dt1, DataTable dt2)
{
using (DataTable targetTable = dt1.Clone())
{
var dt2Query = dt2.Columns.OfType<DataColumn>().Select(dc =>
new DataColumn(dc.ColumnName, dc.DataType, dc.Expression,
dc.ColumnMapping));
var dt2FilterQuery = from dc in dt2Query.AsEnumerable()
where targetTable.Columns
.Contains(dc.ColumnName) == false
select dc;
targetTable.Columns.AddRange(dt2FilterQuery.ToArray());
var rowData = from row1 in dt1.AsEnumerable()
join row2 in dt2.AsEnumerable()
on row1.Field<int>("Code") equals
row2.Field<int>("Code")
select row1.ItemArray
.Concat(row2.ItemArray
.Where(r2 =>
row1.ItemArray.Contains(r2) == false)).ToArray();
foreach (object[] values in rowData) targetTable.Rows.Add(values);
return targetTable;
}
}
There is a problem with this line:
select row1.ItemArray.Concat(row2.ItemArray.Where(r2 =>
row1.ItemArray.Contains(r2) == false)).ToArray();
It seems to be saying don't include me if this value (rather than column) already exists.
I am using this method to join two tables together based on a column that both tables share, but I only want the unique columns with data of both tables as a final result.
Any ideas?
I am not sure if I understand your requirement 100%, but this:
row2.ItemArray.Where(r2 => row1.ItemArray.Contains(r2) == false)
will filter out those items that happen to appear in any column of table 1, not just the column you are joining on.
So what I would try to do is filter the item based on the index, using an overload of the Where extension method:
// Get the index of the column we are joining on:
int joinColumnIndex = dt2.Columns.IndexOf("Code");
// Now we can filter out the proper item in the rowData query:
row2.ItemArray.Where((r2,idx) => idx != joinColumnIndex)
...
No, wait. Here:
var dt2FilterQuery = from dc in dt2Query.AsEnumerable()
where targetTable.Columns
.Contains(dc.ColumnName) == false
select dc;
You are filtering out all columns of table 2 whose names also appear in table 1. So what you probably want is this:
public static DataTable DataTableJoiner(DataTable dt1, DataTable dt2)
{
DataTable targetTable = dt1.Clone();
var dt2Query = dt2.Columns.OfType<DataColumn>().Select(dc =>
new DataColumn(dc.ColumnName, dc.DataType, dc.Expression,
dc.ColumnMapping));
var dt2FilterQuery = from dc in dt2Query.AsEnumerable()
where !targetTable.Columns.Contains(dc.ColumnName)
select dc;
var columnsToAdd = dt2FilterQuery.ToArray();
// Materialize the indices once so they are not recomputed for every joined row.
var columnsIndices = columnsToAdd.Select(dc => dt2.Columns.IndexOf(dc.ColumnName)).ToList();
targetTable.Columns.AddRange(columnsToAdd);
var rowData = from row1 in dt1.AsEnumerable()
join row2 in dt2.AsEnumerable()
on row1.Field<int>("Code") equals
row2.Field<int>("Code")
select row1.ItemArray
.Concat(row2.ItemArray
.Where((r2,idx) =>
columnsIndices.Contains(idx))).ToArray();
foreach (object[] values in rowData) targetTable.Rows.Add(values);
return targetTable;
}
Btw. I don't quite understand why you are wrapping the DataTable you return in a using statement. Imho it is kind of pointless to dispose the very object you return to your caller right away...
