How to concat only unique columns with data - c#

I am using the method below:
public static DataTable DataTableJoiner(DataTable dt1, DataTable dt2)
{
using (DataTable targetTable = dt1.Clone())
{
var dt2Query = dt2.Columns.OfType<DataColumn>().Select(dc =>
new DataColumn(dc.ColumnName, dc.DataType, dc.Expression,
dc.ColumnMapping));
var dt2FilterQuery = from dc in dt2Query.AsEnumerable()
where targetTable.Columns
.Contains(dc.ColumnName) == false
select dc;
targetTable.Columns.AddRange(dt2FilterQuery.ToArray());
var rowData = from row1 in dt1.AsEnumerable()
join row2 in dt2.AsEnumerable()
on row1.Field<int>("Code") equals
row2.Field<int>("Code")
select row1.ItemArray
.Concat(row2.ItemArray
.Where(r2 =>
row1.ItemArray.Contains(r2) == false)).ToArray();
foreach (object[] values in rowData) targetTable.Rows.Add(values);
return targetTable;
}
}
There is a problem with this line:
select row1.ItemArray.Concat(row2.ItemArray.Where(r2 =>
row1.ItemArray.Contains(r2) == false)).ToArray();
It seems to be saying don't include me if this value (rather than column) already exists.
I am using this method to join two tables together based on a column that both tables share, but I only want the unique columns with data of both tables as a final result.
Any ideas?

I am not sure if I understand your requirement 100%, but this:
row2.ItemArray.Where(r2 => row1.ItemArray.Contains(r2) == false)
will filter out those items that happen to appear in any column of table 1, not just the column you are joining on.
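To make the problem concrete, here is a minimal sketch with two hypothetical item arrays. The class name, the method name and the sample values are made up for illustration; only the filter expression comes from the question.

```csharp
using System;
using System.Linq;

static class ValueFilterDemo
{
    // Same filter as in the question: keeps only those values of row2
    // that do not occur anywhere in row1, regardless of column.
    public static object[] FilterByValue(object[] row1, object[] row2) =>
        row2.Where(r2 => !row1.Contains(r2)).ToArray();
}
```

For example, FilterByValue(new object[] { 7, "Widget", 25 }, new object[] { 7, 25, "Blue" }) returns only "Blue": the join key 7 is removed as intended, but the unrelated 25 from a different column is removed as well.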
So what I would try to do is filter the item based on the index, using an overload of the Where extension method:
// Get the index of the column we are joining on:
int joinColumnIndex = dt2.Columns.IndexOf("Code");
// Now we can filter out the proper item in the rowData query:
row2.ItemArray.Where((r2,idx) => idx != joinColumnIndex)
...
No, wait. Here:
var dt2FilterQuery = from dc in dt2Query.AsEnumerable()
where targetTable.Columns
.Contains(dc.ColumnName) == false
select dc;
You are filtering out all columns of table 2 whose names also appear in table 1. So what you probably want is this:
public static DataTable DataTableJoiner(DataTable dt1, DataTable dt2)
{
DataTable targetTable = dt1.Clone();
var dt2Query = dt2.Columns.OfType<DataColumn>().Select(dc =>
new DataColumn(dc.ColumnName, dc.DataType, dc.Expression,
dc.ColumnMapping));
var dt2FilterQuery = from dc in dt2Query.AsEnumerable()
where !targetTable.Columns.Contains(dc.ColumnName)
select dc;
var columnsToAdd = dt2FilterQuery.ToArray();
var columnsIndices = columnsToAdd.Select(dc => dt2.Columns.IndexOf(dc.ColumnName));
targetTable.Columns.AddRange(columnsToAdd);
var rowData = from row1 in dt1.AsEnumerable()
join row2 in dt2.AsEnumerable()
on row1.Field<int>("Code") equals
row2.Field<int>("Code")
select row1.ItemArray
.Concat(row2.ItemArray
.Where((r2,idx) =>
columnsIndices.Contains(idx))).ToArray();
foreach (object[] values in rowData) targetTable.Rows.Add(values);
return targetTable;
}
By the way, I don't quite understand why you wrap the DataTable you return in a using statement. It is pointless, and potentially harmful, to dispose the very object you hand back to your caller right away.

Related

How to thoroughly join two datatables using linq without column names

This is probably answered somewhere else, but I haven't found a working solution yet.
I have two datatables and I want to join them into one datatable containing all data from both of them, or at least from the first of them and some columns from the second datatable.
I don't want to list all columns (180 in total) from the first datatable. I have tried, for example, this:
var JoinedResult = from t1 in table1.Rows.Cast<DataRow>()
join t2 in table2.Rows.Cast<DataRow>()
on Convert.ToInt32(t1.Field<string>("ProductID")) equals t2.Field<int>("FuelId")
select t1;
but that gives only the columns from table1. How do I get the columns from table2 into my result too? Finally, I need to add my result to a dataset.
ResultSet.Tables.Add(JoinedResult.CopyToDataTable());
EDIT:
I ended up with this as the solution.
This follows the example given in "Create join with Select All (select *) in linq to datasets":
DataTable dtProduct = dsProduct.Tables[0];
DataTable dtMoistureLimits = ds.Tables[0];
//clone dt1, copies all the columns to newTable
DataTable dtProductWithMoistureLimits = dtProduct.Clone();
//copies all the columns from dt2 to newTable
foreach (DataColumn c in dtMoistureLimits.Columns)
dtProductWithMoistureLimits.Columns.Add(c.ColumnName, c.DataType);
var ProductsJoinedWithMoistureLimits = dtProduct.Rows.Cast<DataRow>()
.Join(dtMoistureLimits.Rows.Cast<DataRow>(),// join table1 and table2
t1 => new { ProductID = t1.Field<int>("ProductID"), DelivererID = t1.Field<int>("DelivererID") },
t2 => new { ProductID = t2.Field<int>("MoistureLimits_ProductID"), DelivererID = t2.Field<int>("MoistureLimits_DelivererID") },
(t1, t2) => // when they match
{ // make a new object
// containing the matching t1 and t2
DataRow row = dtProductWithMoistureLimits.NewRow();
row.ItemArray = t1.ItemArray.Concat(t2.ItemArray).ToArray();
dtProductWithMoistureLimits.Rows.Add(row);
return row;
});
However, dtMoistureLimits does not contain rows for every "ProductID" and "DelivererID" in dtProduct. Currently my solution returns only the matching rows.
How can I improve the solution so that it also returns the rows for which dtMoistureLimits has no data for "ProductID" and "DelivererID"?
Solution using method syntax, without having to mention all columns
var result = table1.Rows.Cast<DataRow>()
.Join(table2.Rows.Cast<DataRow>(), // join table1 and table2
t1 => Convert.ToInt32(t1.Field<string>("ProductID")), // from every t1 get the productId
t2 => t2.Field<int>("FuelId"), // from every t2 get the fuelId
(t1, t2) => new // when they match
{ // make a new object
T1 = t1, // containing the matching t1 and t2
T2 = t2,
});
var JoinedResult = (from t1 in table1.Rows.Cast<DataRow>()
join t2 in table2.Rows.Cast<DataRow>()
on Convert.ToInt32(t1.Field<string>("ProductID")) equals t2.Field<int>("FuelId")
select new { T1 = t1,
T2 = t2.column_name // all columns needed can be listed here
}).ToList();
EDIT:
To convert the above result to a DataTable, use the following method:
DataTable dataTable = new DataTable();
//Get all the properties
PropertyInfo[] Props = JoinedResult.Select(y=>y.T1).First().GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance);
foreach (PropertyInfo prop in Props)
{
//Defining type of data column gives proper data table
var type = (prop.PropertyType.IsGenericType && prop.PropertyType.GetGenericTypeDefinition() == typeof(Nullable<>) ? Nullable.GetUnderlyingType(prop.PropertyType) : prop.PropertyType);
//Setting column names as Property names
dataTable.Columns.Add(prop.Name, type);
}
dataTable.Columns.Add(t2_column_name, t2_column_type);
foreach (var item in JoinedResult)
{
// one extra slot for the T2 column value assigned below
var values = new object[Props.Length + 1];
for (int i = 0; i < Props.Length; i++)
{
//inserting property values to datatable rows
values[i] = Props[i].GetValue(item.T1, null);
}
values[Props.Length] = item.T2;
dataTable.Rows.Add(values);
}

Returning Dictionary<string, List<string>> with Linq from DataTable

I'm new to LINQ. I want to group the rows of a DataTable by the value of a specific column, field1,
into something like Dictionary<string, List<DataRow>>.
I tried this code:
var results = from e in data.AsEnumerable()
group e by e.Field<string>("field1") into g
select new { e.Field<string>("field1"), g.ToList() } ;
I don't want to do this with a foreach statement.
This is how I'm doing it now, and I want the LINQ equivalent:
foreach (DataRow row in data.Rows)
{
string field1Val = row.Field<string>("field1");
if (!sotrtedResult.ContainsKey(field1Val ))
{
sotrtedResult.Add(field1Val , new List<DataRow>() { row });
}
else
{
sotrtedResult[field1Val].Add(row);
}
}
You could try the following snippet:
var results = (from e in data.AsEnumerable()
group e by e.Field<string>("field1") into g
select new
{
field1 = g.Key,
values = g.Select(r=>r).ToList()
}).ToDictionary(x=>x.field1, x=>x.values);
When you group by a field, you can access it in the projection as g.Key.
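The same idea can be sketched in method syntax, skipping the intermediate anonymous type. The class and method names below are hypothetical, and the sketch assumes the grouping column holds non-null strings.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

static class GroupingSketch
{
    // Groups the rows of a table by the string value of one column.
    // Assumes the column contains no nulls, because a null key
    // would make ToDictionary throw an ArgumentNullException.
    public static Dictionary<string, List<DataRow>> GroupByColumn(DataTable data, string columnName) =>
        data.AsEnumerable()
            .GroupBy(row => row.Field<string>(columnName))
            .ToDictionary(g => g.Key, g => g.ToList());
}
```

Calling GroupingSketch.GroupByColumn(data, "field1") would give one dictionary entry per distinct field1 value, each holding the rows of that group.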

Join 2 DataTables on dynamic number of columns

I'm trying to join two DataTables on a dynamic number of columns. I've gotten as far as the code below. The problem is the ON statement of the join. How can I make this dynamic based on how many column names are in the list "joinColumnNames".
I was thinking I will need to build some sort of expression tree, but I can't find any examples of how to do this with multiple join columns and with the DataRow object which doesn't have properties for each column.
private DataTable Join(List<string> joinColumnNames, DataTable pullX, DataTable pullY)
{
DataTable joinedTable = new DataTable();
// Add all the columns from pullX
foreach (string colName in joinColumnNames)
{
joinedTable.Columns.Add(pullX.Columns[colName]);
}
// Add unique columns from PullY
foreach (DataColumn col in pullY.Columns)
{
if (!joinedTable.Columns.Contains((col.ColumnName)))
{
joinedTable.Columns.Add(col);
}
}
var Join = (from PX in pullX.AsEnumerable()
join PY in pullY.AsEnumerable() on
// This must be dynamic and join on every column mentioned in joinColumnNames
new { A = PX[joinColumnNames[0]], B = PX[joinColumnNames[1]] } equals new { A = PY[joinColumnNames[0]], B = PY[joinColumnNames[1]] }
into Outer
from PY in Outer.DefaultIfEmpty<DataRow>(pullY.NewRow())
select new { PX, PY });
foreach (var item in Join)
{
DataRow newRow = joinedTable.NewRow();
foreach (DataColumn col in joinedTable.Columns)
{
var pullXValue = item.PX.Table.Columns.Contains(col.ColumnName) ? item.PX[col.ColumnName] : string.Empty;
var pullYValue = item.PY.Table.Columns.Contains(col.ColumnName) ? item.PY[col.ColumnName] : string.Empty;
newRow[col.ColumnName] = (pullXValue == null || string.IsNullOrEmpty(pullXValue.ToString())) ? pullYValue : pullXValue;
}
joinedTable.Rows.Add(newRow);
}
return joinedTable;
}
Adding a specific example to show input/output using 3 join columns (Country, Company, and DateId):
Pull X:
Country Company DateId Sales
United States Test1 Ltd 20160722 $25
Canada Test3 Ltd 20160723 $30
Italy Test4 Ltd 20160724 $40
India Test2 Ltd 20160725 $35
Pull Y:
Country Company DateId Downloads
United States Test1 Ltd 20160722 500
Mexico Test2 Ltd 20160723 300
Italy Test4 Ltd 20160724 900
Result:
Country Company DateId Sales Downloads
United States Test1 Ltd 20160722 $25 500
Canada Test3 Ltd 20160723 $30
Mexico Test2 Ltd 20160723 300
Italy Test4 Ltd 20160724 $40 900
India Test2 Ltd 20160725 $35
var Join =
from PX in pullX.AsEnumerable()
join PY in pullY.AsEnumerable()
on string.Join("\0", joinColumnNames.Select(c => PX[c]))
equals string.Join("\0", joinColumnNames.Select(c => PY[c]))
into Outer
from PY in Outer.DefaultIfEmpty<DataRow>(pullY.NewRow())
select new { PX, PY };
Another way is to put both DataTables in a DataSet and use a DataRelation; see
"How To: Use DataRelation to perform a join on two DataTables in a DataSet?"
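A rough sketch of the DataRelation approach, under the assumption that both tables already belong to the same DataSet and that every join column exists in both with the same data type. The class name, method name and relation name are hypothetical.

```csharp
using System.Data;
using System.Linq;

static class DataRelationSketch
{
    // Links two tables of one DataSet on a dynamic list of column names.
    public static DataRelation Relate(
        DataSet ds, string parentName, string childName, string[] joinColumnNames)
    {
        DataTable parent = ds.Tables[parentName];
        DataTable child = ds.Tables[childName];

        DataColumn[] parentColumns = joinColumnNames.Select(n => parent.Columns[n]).ToArray();
        DataColumn[] childColumns = joinColumnNames.Select(n => child.Columns[n]).ToArray();

        // createConstraints is false, so no unique or foreign-key
        // constraint is enforced on the data.
        var relation = new DataRelation(
            parentName + "_" + childName, parentColumns, childColumns, false);
        ds.Relations.Add(relation);
        return relation;
    }
}
```

With the relation in place, parentRow.GetChildRows(relation) returns the rows of the child table that match on all join columns; an empty array means the parent row has no counterpart.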
Since you are using LINQ to Objects, there is no need to use expression trees. You can solve your problem with a custom equality comparer.
Create an equality comparer that can compare equality between two DataRow objects based on the values of specific columns. Here is an example:
public class MyEqualityComparer : IEqualityComparer<DataRow>
{
private readonly string[] columnNames;
public MyEqualityComparer(string[] columnNames)
{
this.columnNames = columnNames;
}
public bool Equals(DataRow x, DataRow y)
{
return columnNames.All(cn => x[cn].Equals(y[cn]));
}
public int GetHashCode(DataRow obj)
{
unchecked
{
int hash = 19;
foreach (var value in columnNames.Select(cn => obj[cn]))
{
hash = hash * 31 + value.GetHashCode();
}
return hash;
}
}
}
Then you can use it to make the join like this:
public class TwoRows
{
public DataRow Row1 { get; set; }
public DataRow Row2 { get; set; }
}
private static List<TwoRows> LeftOuterJoin(
List<string> joinColumnNames,
DataTable leftTable,
DataTable rightTable)
{
return leftTable
.AsEnumerable()
.GroupJoin(
rightTable.AsEnumerable(),
l => l,
r => r,
(l, rlist) => new {LeftValue = l, RightValues = rlist},
new MyEqualityComparer(joinColumnNames.ToArray()))
.SelectMany(
x => x.RightValues.DefaultIfEmpty(rightTable.NewRow()),
(x, y) => new TwoRows {Row1 = x.LeftValue, Row2 = y})
.ToList();
}
Please note that I am using method syntax because I don't think that you can use a custom equality comparer otherwise.
Please note that the method does a left outer join, not a full outer join. Based on the example you provided, you seem to want a full outer join. To do this you need to do two left outer joins (see this answer). Here is what the full method would look like:
private static DataTable FullOuterJoin(
List<string> joinColumnNames,
DataTable pullX,
DataTable pullY)
{
var pullYOtherColumns =
pullY.Columns
.Cast<DataColumn>()
.Where(x => !joinColumnNames.Contains(x.ColumnName))
.ToList();
var allColumns =
pullX.Columns
.Cast<DataColumn>()
.Concat(pullYOtherColumns)
.ToArray();
var allColumnsClone =
allColumns
.Select(x => new DataColumn(x.ColumnName, x.DataType))
.ToArray();
DataTable joinedTable = new DataTable();
joinedTable.Columns.AddRange(allColumnsClone);
var first =
LeftOuterJoin(joinColumnNames, pullX, pullY);
var resultRows = new List<DataRow>();
foreach (var item in first)
{
DataRow newRow = joinedTable.NewRow();
foreach (DataColumn col in joinedTable.Columns)
{
var value = pullX.Columns.Contains(col.ColumnName)
? item.Row1[col.ColumnName]
: item.Row2[col.ColumnName];
newRow[col.ColumnName] = value;
}
resultRows.Add(newRow);
}
var second =
LeftOuterJoin(joinColumnNames, pullY, pullX);
foreach (var item in second)
{
DataRow newRow = joinedTable.NewRow();
foreach (DataColumn col in joinedTable.Columns)
{
var value = pullY.Columns.Contains(col.ColumnName)
? item.Row1[col.ColumnName]
: item.Row2[col.ColumnName];
newRow[col.ColumnName] = value;
}
resultRows.Add(newRow);
}
var uniqueRows =
resultRows
.Distinct(
new MyEqualityComparer(
joinedTable.Columns
.Cast<DataColumn>()
.Select(x => x.ColumnName)
.ToArray()));
foreach (var uniqueRow in uniqueRows)
joinedTable.Rows.Add(uniqueRow);
return joinedTable;
}
Please note also how I clone the columns. You cannot use the same column object in two tables.
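The following sketch illustrates that restriction; the class and method names are hypothetical. Adding a DataColumn that already belongs to another table throws an ArgumentException, whereas a fresh DataColumn with the same name and type can be added freely.

```csharp
using System;
using System.Data;
using System.Linq;

static class ColumnCloneSketch
{
    // Copies the name and data type of every source column into new
    // DataColumn objects that do not belong to any table yet.
    public static DataColumn[] CloneColumns(DataTable source) =>
        source.Columns
              .Cast<DataColumn>()
              .Select(c => new DataColumn(c.ColumnName, c.DataType))
              .ToArray();

    // Returns true when the target table rejects a column object
    // that is still owned by the source table.
    public static bool ReusingColumnFails(DataTable source, DataTable target)
    {
        try
        {
            target.Columns.Add(source.Columns[0]);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

Note that this copy carries over only the name and the data type; other settings such as expressions or default values would have to be copied explicitly if they matter.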

At least one object must implement IComparable. AsEnumerable() and Sum()

I am trying to group by multiple columns on tbl.AsEnumerable().
I want to group by cus, salesman, ppj and curr, while amt_base should be summed up.
Everything works fine,
but when I use grp.Sum(r => r.Field<decimal>("amt_base")) to sum the amount, the foreach throws the error "At least one object must implement IComparable."
var results = from rows in tbl.AsEnumerable()
group rows by new
{
cus = rows["cus"],
salesman = rows["salesman"],
ppj = rows["ppj"],
curr = rows["curr"],
}into grp
orderby grp.Key
select new
{
cus = grp.Key.cus,
nm = grp.First()["nm"],
salesman = grp.Key.salesman,
ppj = grp.Key.ppj,
curr = grp.Key.curr,
amt_base = grp.Sum(r => r.Field<decimal>("amt_base")),
};
DataTable tbl2 = new DataTable();
tbl2.Columns.Add("cus");
tbl2.Columns.Add("nm");
tbl2.Columns.Add("salesman");
tbl2.Columns.Add("ppj");
tbl2.Columns.Add("curr");
tbl2.Columns.Add("amt_base");
decimal tamt_base = 0;
foreach (var item in results)
{
DataRow dr2 = tbl2.NewRow();
dr2["cus"] = item.cus;
dr2["nm"] = item.nm;
dr2["salesman"] = item.salesman;
dr2["ppj"] = item.ppj;
dr2["curr"] = item.curr;
dr2["amt_base"] = Math.Round(item.amt_base, 2, MidpointRounding.AwayFromZero);
tbl2.Rows.Add(dr2);
tamt_base += item.amt_base;
}
It can't determine how to order the rows based on an anonymous type composed of four arbitrary columns. It needs to be able to compare each instance with the others, which is usually done by having your class implement the IComparable interface, but you can't do that with an anonymous type.
Remove this:
orderby grp.Key
If you really need some sort of ordering, try using an individual field:
orderby grp.Key.cus
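If ordering by more than one key field is needed, one option is to order by each field in turn. The sketch below uses hypothetical class and method names and only two of the four key columns; it assumes cus and salesman are string columns and amt_base is a decimal column without nulls.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

static class GroupSumSketch
{
    // Groups by two typed key fields, orders by each field in turn,
    // and sums amt_base per group.
    public static List<(string Cus, string Salesman, decimal Total)> Summarize(DataTable tbl) =>
        tbl.AsEnumerable()
           .GroupBy(r => new
           {
               Cus = r.Field<string>("cus"),
               Salesman = r.Field<string>("salesman")
           })
           .OrderBy(g => g.Key.Cus)
           .ThenBy(g => g.Key.Salesman)
           .Select(g => (Cus: g.Key.Cus,
                         Salesman: g.Key.Salesman,
                         Total: g.Sum(r => r.Field<decimal>("amt_base"))))
           .ToList();
}
```

Because each ordering key is a plain string, the default comparer can handle it, and the anonymous grouping key itself never has to be compared.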

Compare two DataTables and select the rows that are not present in second table

I have two DataTables and I want to select the rows from the first one which are not present in second one
For example:
Table A
id column
1 data1
2 data2
3 data3
4 data4
Table B
id column
1 data10
3 data30
I want the result to be:
Table C
id column
2 data2
4 data4
You can use LINQ; in particular, Enumerable.Except helps to find the ids in TableA that are not in TableB:
var idsNotInB = TableA.AsEnumerable().Select(r => r.Field<int>("id"))
.Except(TableB.AsEnumerable().Select(r => r.Field<int>("id")));
DataTable TableC = (from row in TableA.AsEnumerable()
join id in idsNotInB
on row.Field<int>("id") equals id
select row).CopyToDataTable();
You can also use Where but it'll be less efficient:
DataTable TableC = TableA.AsEnumerable()
.Where(ra => !TableB.AsEnumerable()
.Any(rb => rb.Field<int>("id") == ra.Field<int>("id")))
.CopyToDataTable();
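A further variant, sketched here with hypothetical class and method names, collects the ids of the second table into a HashSet first, so that each row of the first table needs only one lookup. It assumes both tables have an int column named "id". It also guards against an empty result, because CopyToDataTable throws an InvalidOperationException when the sequence contains no rows.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

static class AntiJoinSketch
{
    // Returns the rows of tableA whose "id" does not occur in tableB.
    public static DataTable RowsNotIn(DataTable tableA, DataTable tableB)
    {
        var idsInB = new HashSet<int>(
            tableB.AsEnumerable().Select(r => r.Field<int>("id")));

        List<DataRow> rows = tableA.AsEnumerable()
            .Where(r => !idsInB.Contains(r.Field<int>("id")))
            .ToList();

        // Fall back to an empty table with the same schema when
        // nothing is left, instead of letting CopyToDataTable throw.
        return rows.Count > 0 ? rows.CopyToDataTable() : tableA.Clone();
    }
}
```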
I got a solution which works without LINQ:
public DataTable CompareDataTables(DataTable first, DataTable second)
{
first.TableName = "FirstTable";
second.TableName = "SecondTable";
//Create Empty Table
DataTable table = new DataTable("Difference");
try
{
//Must use a Dataset to make use of a DataRelation object
using (DataSet ds = new DataSet())
{
//Add tables
ds.Tables.AddRange(new DataTable[] { first.Copy(), second.Copy() });
//Get Columns for DataRelation
DataColumn[] firstcolumns = new DataColumn[ds.Tables[0].Columns.Count];
for (int i = 0; i < firstcolumns.Length; i++)
{
firstcolumns[i] = ds.Tables[0].Columns[i];
}
DataColumn[] secondcolumns = new DataColumn[ds.Tables[1].Columns.Count];
for (int i = 0; i < secondcolumns.Length; i++)
{
secondcolumns[i] = ds.Tables[1].Columns[i];
}
//Create DataRelation
DataRelation r = new DataRelation(string.Empty, firstcolumns, secondcolumns, false);
ds.Relations.Add(r);
//Create columns for return table
for (int i = 0; i < first.Columns.Count; i++)
{
table.Columns.Add(first.Columns[i].ColumnName, first.Columns[i].DataType);
}
//If First Row not in Second, Add to return table.
table.BeginLoadData();
foreach (DataRow parentrow in ds.Tables[0].Rows)
{
DataRow[] childrows = parentrow.GetChildRows(r);
if (childrows == null || childrows.Length == 0)
table.LoadDataRow(parentrow.ItemArray, true);
}
table.EndLoadData();
}
}
catch (Exception)
{
// Rethrow unchanged; add logging here if needed
throw;
}
return table;
}
For more, visit http://microsoftdotnetsolutions.blogspot.in/2012/12/compare-two-datatables.html
You can use the LINQ Enumerable.Except method to get the difference between two DataTables. Here I use firstDt and secondDt; remember that both tables must have the same structure.
var EntriesNotInB = firstDt.AsEnumerable().Select(r => r.Field<string>("abc")).Except(secondDt.AsEnumerable().Select(r => r.Field<string>("abc")));
if (EntriesNotInB.Any())
{
DataTable dt = (from row in firstDt.AsEnumerable()
join id in EntriesNotInB on row.Field<string>("abc") equals id
select row).CopyToDataTable();
foreach (DataRow row in dt.Rows)
{
/////Place your code to manipulate on datatable Rows
}
}
To read more about the Enumerable.Except method, go to http://msdn.microsoft.com/en-us/library/system.linq.enumerable.except(v=vs.110).aspx
