Pivot table with more columns - c#

I was trying to pivot a list of records when I came across this:
public static DataTable PivotTable<T, TColumn, TRow, TData>(
this System.Collections.Generic.IEnumerable<T> list,
Func<T, TColumn> column,
Expression<Func<T, TRow>> row,
Func<IEnumerable<T>, TData> dataSelector)
{
DataTable table = new DataTable();
var rowName = ((MemberExpression)row.Body).Member.Name;
table.Columns.Add(new DataColumn(rowName));
var columns = list.Select(column).Distinct();
foreach (var col in columns)
table.Columns.Add(new DataColumn(col.ToString()));
var rows = list.GroupBy(row.Compile())
.Select(rowGroup => new
{
Key = rowGroup.Key,
Values = columns.GroupJoin(
rowGroup,
c => c,
r => column(r),
(c, columnGroup) => dataSelector(columnGroup))
});
foreach (var rowVal in rows)
{
var dataRow = table.NewRow();
var items = rowVal.Values.Cast<object>().ToList();
items.Insert(0, rowVal.Key);
dataRow.ItemArray = items.ToArray();
table.Rows.Add(dataRow);
}
return table;
}
This solution comes from this post: http://techbrij.com/pivot-c-array-datatable-convert-column-to-row-linq
My client code looks like this:
var _dataSourceMatrix = // some logic that gets the DataTable records
var _departmentList = // some logic that gets the DataTable records
var _joinDataSourceDepartmentList = from a in _dataSourceMatrix.AsEnumerable()
join b in _departmentList.AsEnumerable()
on a.Field<int>("EntityID") equals b.Field<int>("DepartmentID")
select new { EntityName = b.Field<string>("Name"), Period = a.Field<string>("Period"), Value = a.Field<double>("Value"), EntityID = a.Field<int>("EntityID") };
_dataSourceMatrix = _joinDataSourceDepartmentList.OrderBy(x => x.Period).PivotTable(x => x.Period, x => x.EntityName, x => x.Sum(y => y.Value));
However, this only makes use of three fields: Period, EntityName and Value. I would like to use EntityID as well; it should be one of the keys the records are grouped by, together with EntityName. Is there any way to modify the PivotTable function to do this?
The result I am trying to achieve looks like this:
EntityName  Period1  Period2  Period3  EntityID
A           Value1   Value2   Value3   1
B           Value1   Value2   Value3   2

Related

Include more rows in Pivot

I am using the extension method in below link to pivot my data:
https://techbrij.com/pivot-c-array-datatable-convert-column-to-row-linq
I am including the code from the link just in case somebody finds this question in the future and the link is dead:
public static DataTable ToPivotTable<T, TColumn, TRow, TData>(
this IEnumerable<T> source,
Func<T, TColumn> columnSelector,
Expression<Func<T, TRow>> rowSelector,
Func<IEnumerable<T>, TData> dataSelector)
{
DataTable table = new DataTable();
var rowName = ((MemberExpression)rowSelector.Body).Member.Name;
table.Columns.Add(new DataColumn(rowName));
var columns = source.Select(columnSelector).Distinct();
foreach (var column in columns)
table.Columns.Add(new DataColumn(column.ToString()));
var rows = source.GroupBy(rowSelector.Compile())
.Select(rowGroup => new
{
Key = rowGroup.Key,
Values = columns.GroupJoin(
rowGroup,
c => c,
r => columnSelector(r),
(c, columnGroup) => dataSelector(columnGroup))
});
foreach (var row in rows)
{
var dataRow = table.NewRow();
var items = row.Values.Cast<object>().ToList();
items.Insert(0, row.Key);
dataRow.ItemArray = items.ToArray();
table.Rows.Add(dataRow);
}
return table;
}
Following the example in the link, you get the pivoted data like this:
var pivotTable = data.ToPivotTable(
item => item.Year,
item => item.Product,
items => items.Any() ? items.Sum(x=>x.Sales) : 0);
My question is: how can I include more row fields in this query, so that it also returns, for example, ProductCode? item => new {item.Product, item.ProductCode} does not work.
============== EDIT / 23 OCT 2018 ==============
Assuming my data is this:
With the help of the code above, I can manage to do this:
What I want to achieve is this (extra column: STOCKID, or any other columns as well):
Anonymous types cannot be passed as generic parameters. Try defining your pivot key as a struct:
public struct PivotKey
{
public string Product;
public int ProductCode; // assuming your product codes are integers
}
This way you can take advantage of the struct's default Equals and GetHashCode implementations, which are based on the values of its fields.
Then, define the rowSelector as below:
item => new PivotKey { Product = item.Product, ProductCode = item.ProductCode}
Example: https://dotnetfiddle.net/mXr9sh
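To illustrate why a struct key groups correctly, here is a minimal sketch. The PivotKeyDemo class, its SumByKey method and the Tuple-based input are assumptions made up for the illustration, not part of the original answer; PivotKey is repeated only so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the PivotKey struct above, repeated for self-containment.
public struct PivotKey
{
    public string Product;
    public int ProductCode;
}

// Hypothetical helper, for illustration only.
public static class PivotKeyDemo
{
    // Groups (product, productCode, sales) tuples by the composite key
    // and sums the sales in each group.
    public static Dictionary<PivotKey, int> SumByKey(
        IEnumerable<Tuple<string, int, int>> rows)
    {
        return rows
            .GroupBy(r => new PivotKey { Product = r.Item1, ProductCode = r.Item2 })
            .ToDictionary(g => g.Key, g => g.Sum(r => r.Item3));
    }
}
```

Two keys built from the same Product and ProductCode end up in the same group because ValueType.Equals compares the fields, so no custom comparer is needed.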
The issue seems to be fetching the row names from the expression, since it's only designed to handle one row. That can be fixed by this function:
public static IEnumerable<string> GetMemberNames<T1, T2>(Expression<Func<T1, T2>> expression)
{
var memberExpression = expression.Body as MemberExpression;
if (memberExpression != null)
{
return new[]{ memberExpression.Member.Name };
}
var memberInitExpression = expression.Body as MemberInitExpression;
if (memberInitExpression != null)
{
return memberInitExpression.Bindings.Select(x => x.Member.Name);
}
var newExpression = expression.Body as NewExpression;
if (newExpression != null)
{
return newExpression.Arguments.Select(x => (x as MemberExpression).Member.Name);
}
throw new ArgumentException("expression"); //use: `nameof(expression)` if C#6 or above
}
Once you have this function you can replace these lines:
var rowName = ((MemberExpression)rowSelector.Body).Member.Name;
table.Columns.Add(new DataColumn(rowName));
With this:
var rowNames = GetMemberNames(rowSelector);
rowNames.ToList().ForEach(x => table.Columns.Add(new DataColumn(x)));
One downside of this approach is that the key values are returned concatenated in the first column, and the remaining values shift one column to the right (see the output below), so you'll need to extract the data from the strings.
Resulting DataTable:
(displayed as JSON)
[
{
"StockId": "{ StockId = 65, Name = Milk }",
"Name": "3",
"Branch 1": "1",
"Branch 2": "0",
"Central Branch": null
},
{
"StockId": "{ StockId = 67, Name = Coffee }",
"Name": "0",
"Branch 1": "0",
"Branch 2": "22",
"Central Branch": null
}
]
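One possible way around the concatenated key is to expand the key into one cell per property. The sketch below is a hypothetical variant, not the code from the linked post: ToPivotTableExpanded and PivotSketch are made-up names, the row selector is a plain Func instead of an Expression, and it assumes the key is an anonymous type (or another type whose public properties are the key fields). It relies on reflection returning the properties in declaration order, which is the usual behaviour but not a documented guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class PivotSketch
{
    // Hypothetical variant: every public property of the row key becomes
    // its own leading column instead of one concatenated string.
    public static DataTable ToPivotTableExpanded<T, TColumn, TRow, TData>(
        this IEnumerable<T> source,
        Func<T, TColumn> columnSelector,
        Func<T, TRow> rowSelector,
        Func<IEnumerable<T>, TData> dataSelector)
    {
        var table = new DataTable();

        // One column per key property (assumes TRow exposes its fields as properties).
        var keyProps = typeof(TRow).GetProperties();
        foreach (var prop in keyProps)
            table.Columns.Add(new DataColumn(prop.Name));

        // One column per distinct pivot value.
        var columns = source.Select(columnSelector).Distinct().ToList();
        foreach (var column in columns)
            table.Columns.Add(new DataColumn(column.ToString()));

        var comparer = EqualityComparer<TColumn>.Default;
        foreach (var rowGroup in source.GroupBy(rowSelector))
        {
            // Key property values first, then one aggregated cell per pivot column.
            var items = keyProps.Select(p => p.GetValue(rowGroup.Key, null)).ToList();
            foreach (var column in columns)
            {
                var cell = rowGroup.Where(r => comparer.Equals(columnSelector(r), column));
                items.Add(dataSelector(cell));
            }
            table.Rows.Add(items.ToArray());
        }
        return table;
    }
}
```

Called with item => new { item.StockId, item.Name } as the row selector, this should put StockId and Name in separate columns, followed by one column per branch. Note that the source sequence is enumerated more than once.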
Full Code Listing
using System;
using System.Data;
using System.Linq;
using System.Linq.Expressions;
using System.Collections.Generic;
using Newtonsoft.Json; //just for displaying output
public class Program
{
public static void Main()
{
var data = new[] {
new { StockId = 65, Name = "Milk", Branch = 23, BranchName = "Branch 1", Stock = 3 },
new { StockId = 65, Name = "Milk", Branch = 24, BranchName = "Branch 2", Stock = 1 },
new { StockId = 67, Name = "Coffee", Branch = 22, BranchName = "Central Branch", Stock = 22 }
};
var pivotTable = data.ToPivotTable(
item => item.BranchName,
item => new {item.StockId, item.Name},
items => items.Any() ? items.Sum(x=>x.Stock) : 0);
//easy way to view our pivotTable if using linqPad or similar
//Console.WriteLine(pivotTable);
//if not using linqPad, convert to JSON for easy display
Console.WriteLine(JsonConvert.SerializeObject(pivotTable, Formatting.Indented));
}
}
public static class PivotExtensions
{
public static DataTable ToPivotTable<T, TColumn, TRow, TData>(
this IEnumerable<T> source,
Func<T, TColumn> columnSelector,
Expression<Func<T, TRow>> rowSelector,
Func<IEnumerable<T>, TData> dataSelector)
{
DataTable table = new DataTable();
//foreach (var row in rowSelector()
var rowNames = GetMemberNames(rowSelector);
rowNames.ToList().ForEach(x => table.Columns.Add(new DataColumn(x)));
var columns = source.Select(columnSelector).Distinct();
foreach (var column in columns)
table.Columns.Add(new DataColumn(column.ToString()));
var rows = source.GroupBy(rowSelector.Compile())
.Select(rowGroup => new
{
Key = rowGroup.Key,
Values = columns.GroupJoin(
rowGroup,
c => c,
r => columnSelector(r),
(c, columnGroup) => dataSelector(columnGroup))
});
foreach (var row in rows)
{
var dataRow = table.NewRow();
var items = row.Values.Cast<object>().ToList();
items.Insert(0, row.Key);
dataRow.ItemArray = items.ToArray();
table.Rows.Add(dataRow);
}
return table;
}
public static IEnumerable<string> GetMemberNames<T1, T2>(Expression<Func<T1, T2>> expression)
{
var memberExpression = expression.Body as MemberExpression;
if (memberExpression != null)
{
return new[]{ memberExpression.Member.Name };
}
var memberInitExpression = expression.Body as MemberInitExpression;
if (memberInitExpression != null)
{
return memberInitExpression.Bindings.Select(x => x.Member.Name);
}
var newExpression = expression.Body as NewExpression;
if (newExpression != null)
{
return newExpression.Arguments.Select(x => (x as MemberExpression).Member.Name);
}
throw new ArgumentException("expression"); //use: `nameof(expression)` if C#6 or above
}
}

Join 2 DataTables on dynamic number of columns

I'm trying to join two DataTables on a dynamic number of columns. I've gotten as far as the code below. The problem is the ON clause of the join: how can I make it dynamic, based on how many column names are in the list "joinColumnNames"?
I was thinking I will need to build some sort of expression tree, but I can't find any examples of how to do this with multiple join columns and with the DataRow object which doesn't have properties for each column.
private DataTable Join(List<string> joinColumnNames, DataTable pullX, DataTable pullY)
{
DataTable joinedTable = new DataTable();
// Add all the columns from pullX
foreach (string colName in joinColumnNames)
{
joinedTable.Columns.Add(pullX.Columns[colName]);
}
// Add unique columns from PullY
foreach (DataColumn col in pullY.Columns)
{
if (!joinedTable.Columns.Contains((col.ColumnName)))
{
joinedTable.Columns.Add(col);
}
}
var Join = (from PX in pullX.AsEnumerable()
join PY in pullY.AsEnumerable() on
// This must be dynamic and join on every column mentioned in joinColumnNames
new { A = PX[joinColumnNames[0]], B = PX[joinColumnNames[1]] } equals new { A = PY[joinColumnNames[0]], B = PY[joinColumnNames[1]] }
into Outer
from PY in Outer.DefaultIfEmpty<DataRow>(pullY.NewRow())
select new { PX, PY });
foreach (var item in Join)
{
DataRow newRow = joinedTable.NewRow();
foreach (DataColumn col in joinedTable.Columns)
{
var pullXValue = item.PX.Table.Columns.Contains(col.ColumnName) ? item.PX[col.ColumnName] : string.Empty;
var pullYValue = item.PY.Table.Columns.Contains(col.ColumnName) ? item.PY[col.ColumnName] : string.Empty;
newRow[col.ColumnName] = (pullXValue == null || string.IsNullOrEmpty(pullXValue.ToString())) ? pullYValue : pullXValue;
}
joinedTable.Rows.Add(newRow);
}
return joinedTable;
}
Adding a specific example to show input/output using 3 join columns (Country, Company, and DateId):
Pull X:
Country        Company    DateId    Sales
United States  Test1 Ltd  20160722  $25
Canada         Test3 Ltd  20160723  $30
Italy          Test4 Ltd  20160724  $40
India          Test2 Ltd  20160725  $35
Pull Y:
Country        Company    DateId    Downloads
United States  Test1 Ltd  20160722  500
Mexico         Test2 Ltd  20160723  300
Italy          Test4 Ltd  20160724  900
Result:
Country        Company    DateId    Sales  Downloads
United States  Test1 Ltd  20160722  $25    500
Canada         Test3 Ltd  20160723  $30
Mexico         Test2 Ltd  20160723         300
Italy          Test4 Ltd  20160724  $40    900
India          Test2 Ltd  20160725  $35
var Join =
from PX in pullX.AsEnumerable()
join PY in pullY.AsEnumerable()
on string.Join("\0", joinColumnNames.Select(c => PX[c]))
equals string.Join("\0", joinColumnNames.Select(c => PY[c]))
into Outer
from PY in Outer.DefaultIfEmpty<DataRow>(pullY.NewRow())
select new { PX, PY };
Another way is to put both DataTables in a DataSet and use a DataRelation. See:
How To: Use DataRelation to perform a join on two DataTables in a DataSet?
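A rough sketch of the DataRelation idea with a dynamic list of join columns follows. RelationJoinSketch, MatchRows and the relation name "XtoY" are hypothetical names chosen for the illustration. The sketch assumes neither table already belongs to a DataSet, that the two tables have different (or empty) table names, and that the join columns have matching data types in both tables.

```csharp
using System;
using System.Data;
using System.Linq;

public static class RelationJoinSketch
{
    // For each row of pullX, returns the rows of pullY that match on all
    // of the given join columns (an empty array when there is no match).
    public static DataRow[][] MatchRows(
        DataTable pullX, DataTable pullY, string[] joinColumnNames)
    {
        // Side effect: both tables become members of this DataSet.
        var set = new DataSet();
        set.Tables.Add(pullX);
        set.Tables.Add(pullY);

        var parentColumns = joinColumnNames.Select(c => pullX.Columns[c]).ToArray();
        var childColumns = joinColumnNames.Select(c => pullY.Columns[c]).ToArray();

        // createConstraints: false, so no unique or foreign-key constraint
        // is enforced on the existing data.
        var relation = new DataRelation("XtoY", parentColumns, childColumns, false);
        set.Relations.Add(relation);

        return pullX.Rows.Cast<DataRow>()
            .Select(x => x.GetChildRows(relation))
            .ToArray();
    }
}
```

This only covers the lookup from pullX to pullY; building the joined output table, and picking up the rows that exist only in pullY, would still have to be done separately.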
Since you are using LINQ to Objects, there is no need to use expression trees. You can solve your problem with a custom equality comparer.
Create an equality comparer that can compare equality between two DataRow objects based on the values of specific columns. Here is an example:
public class MyEqualityComparer : IEqualityComparer<DataRow>
{
private readonly string[] columnNames;
public MyEqualityComparer(string[] columnNames)
{
this.columnNames = columnNames;
}
public bool Equals(DataRow x, DataRow y)
{
return columnNames.All(cn => x[cn].Equals(y[cn]));
}
public int GetHashCode(DataRow obj)
{
unchecked
{
int hash = 19;
foreach (var value in columnNames.Select(cn => obj[cn]))
{
hash = hash * 31 + value.GetHashCode();
}
return hash;
}
}
}
Then you can use it to make the join like this:
public class TwoRows
{
public DataRow Row1 { get; set; }
public DataRow Row2 { get; set; }
}
private static List<TwoRows> LeftOuterJoin(
List<string> joinColumnNames,
DataTable leftTable,
DataTable rightTable)
{
return leftTable
.AsEnumerable()
.GroupJoin(
rightTable.AsEnumerable(),
l => l,
r => r,
(l, rlist) => new {LeftValue = l, RightValues = rlist},
new MyEqualityComparer(joinColumnNames.ToArray()))
.SelectMany(
x => x.RightValues.DefaultIfEmpty(rightTable.NewRow()),
(x, y) => new TwoRows {Row1 = x.LeftValue, Row2 = y})
.ToList();
}
Please note that I am using method syntax because query syntax does not let you pass a custom equality comparer to a join.
Please note also that this method does a left outer join, not a full outer join. Based on the example you provided, you seem to want a full outer join. To get one you need to do two left outer joins (see this answer). Here is what the full method would look like:
private static DataTable FullOuterJoin(
List<string> joinColumnNames,
DataTable pullX,
DataTable pullY)
{
var pullYOtherColumns =
pullY.Columns
.Cast<DataColumn>()
.Where(x => !joinColumnNames.Contains(x.ColumnName))
.ToList();
var allColumns =
pullX.Columns
.Cast<DataColumn>()
.Concat(pullYOtherColumns)
.ToArray();
var allColumnsClone =
allColumns
.Select(x => new DataColumn(x.ColumnName, x.DataType))
.ToArray();
DataTable joinedTable = new DataTable();
joinedTable.Columns.AddRange(allColumnsClone);
var first =
LeftOuterJoin(joinColumnNames, pullX, pullY);
var resultRows = new List<DataRow>();
foreach (var item in first)
{
DataRow newRow = joinedTable.NewRow();
foreach (DataColumn col in joinedTable.Columns)
{
var value = pullX.Columns.Contains(col.ColumnName)
? item.Row1[col.ColumnName]
: item.Row2[col.ColumnName];
newRow[col.ColumnName] = value;
}
resultRows.Add(newRow);
}
var second =
LeftOuterJoin(joinColumnNames, pullY, pullX);
foreach (var item in second)
{
DataRow newRow = joinedTable.NewRow();
foreach (DataColumn col in joinedTable.Columns)
{
var value = pullY.Columns.Contains(col.ColumnName)
? item.Row1[col.ColumnName]
: item.Row2[col.ColumnName];
newRow[col.ColumnName] = value;
}
resultRows.Add(newRow);
}
var uniqueRows =
resultRows
.Distinct(
new MyEqualityComparer(
joinedTable.Columns
.Cast<DataColumn>()
.Select(x => x.ColumnName)
.ToArray()));
foreach (var uniqueRow in uniqueRows)
joinedTable.Rows.Add(uniqueRow);
return joinedTable;
}
Please note also how I clone the columns. You cannot use the same column object in two tables.

C# Linq rows to column

I would like to turn a LINQ result from rows into columns. The field names can be changed by the user, so the function needs to be dynamic.
Sample data:
ID: 331 FieldName: "BusinessCategory" FieldContents: "Regulatory"
ID: 331 FieldName: "PriorityGroup" FieldContents: "Must Do"
ID: 332 FieldName: "BusinessCategory" FieldContents: "Financial"
ID: 332 FieldName: "PriorityGroup" FieldContents: "Should Do"
Turn it into (sample end output):
ID   BusinessCategory  PriorityGroup
331  Regulatory        Must Do
332  Financial         Should Do
Here is the code block that extracts the field names and contents from the database.
public static IEnumerable<InitProjectValues1> GetProgramInitiativeAttributesPart1(int id)
{
using (dpm db = new dpm())
{
string partit = (string)HttpContext.Current.Session["sitePart"];
var configrefs = from c in (
from e in db.Metrics
join j in db.ProgramLink on e.ProjectRef equals j.LinkedProject
where (j.ProjectRef == id) && e.PartitNo == partit
select new
{
FieldName = e.FieldName,
FieldContents = e.MetricValue,
ProjectRef = e.ProjectRef,
})
select new InitProjectValues1
{
ProjectRef = c.ProjectRef,
FieldName = c.FieldName,
FieldContents = c.FieldContents,
}; // somewhere here would be the code to convert this into a single row per ProjectRef number.
return configrefs.ToList();
}
}
Here is the data model.
public class InitProjectValues1
{
public int? ProjectRef { get; set; }
public string FieldName { get; set; }
public string FieldContents { get; set; }
}
I really don't know where to go from here, and I'm hoping someone can provide guidance or sample code.
The kind of operation you need is called a pivot. You are effectively rotating the table around a unique ProjectRef and changing the rows into columns.
You could try the following, which uses a dynamic object; you need one to generate the columns dynamically.
var configrefs = from c in (
from e in db.Metrics
join j in db.ProgramLink on e.ProjectRef equals j.LinkedProject
where (j.ProjectRef == id) && e.PartitNo == partit
select new
{
FieldName = e.FieldName,
FieldContents = e.MetricValue,
ProjectRef = e.ProjectRef,
}).ToArray();
return configrefs.ToPivotArray(
i => i.FieldName,
i => i.ProjectRef,
items => items.Any() ? items.FirstOrDefault().FieldContents : null);
Private method to get dynamic object:
private static dynamic GetAnonymousObject(IEnumerable<string> columns, IEnumerable<object> values)
{
IDictionary<string, object> eo = new ExpandoObject() as IDictionary<string, object>;
int i;
for (i = 0; i < columns.Count(); i++)
{
eo.Add(columns.ElementAt<string>(i), values.ElementAt<object>(i));
}
return eo;
}
And the extension method
public static dynamic[] ToPivotArray<T, TColumn, TRow, TData>(
this IEnumerable<T> source,
Func<T, TColumn> columnSelector,
Expression<Func<T, TRow>> rowSelector,
Func<IEnumerable<T>, TData> dataSelector)
{
var arr = new List<object>();
var cols = new List<string>();
String rowName = ((MemberExpression)rowSelector.Body).Member.Name;
var columns = source.Select(columnSelector).Distinct();
cols =(new []{ rowName}).Concat(columns.Select(x=>x.ToString())).ToList();
var rows = source.GroupBy(rowSelector.Compile())
.Select(rowGroup => new
{
Key = rowGroup.Key,
Values = columns.GroupJoin(
rowGroup,
c => c,
r => columnSelector(r),
(c, columnGroup) => dataSelector(columnGroup))
}).ToArray();
foreach (var row in rows)
{
var items = row.Values.Cast<object>().ToList();
items.Insert(0, row.Key);
var obj = GetAnonymousObject(cols, items);
arr.Add(obj);
}
return arr.ToArray();
}
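The objects returned by ToPivotArray are ExpandoObject instances, so a caller can read them either through dynamic member access or as a dictionary. A small sketch of both follows; ExpandoDemo, MakeRow and Describe are hypothetical names, and MakeRow simply mirrors what GetAnonymousObject does.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

public static class ExpandoDemo
{
    // Builds one pivoted row the same way GetAnonymousObject does.
    public static dynamic MakeRow(IList<string> columns, IList<object> values)
    {
        IDictionary<string, object> eo = new ExpandoObject();
        for (int i = 0; i < columns.Count; i++)
            eo.Add(columns[i], values[i]);
        return eo;
    }

    public static string Describe(dynamic row)
    {
        // Member access works when the column name is a valid C# identifier.
        string category = row.BusinessCategory;

        // Dictionary access works for any column name, including names
        // with spaces that cannot be written as a member.
        var dict = (IDictionary<string, object>)row;
        return dict["ProjectRef"] + ": " + category;
    }
}
```

Because the field names are user-changeable, the dictionary view is probably the safer way to consume the result: a name that is not a valid identifier cannot be reached through row.SomeName.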
I modified the ToPivotArray extension to handle multiple row fields (using an anonymous type as the row selector):
public static dynamic[] ToPivotArrayNew<T, TColumn, TRow, TData>(
this IEnumerable<T> source,
Func<T, TColumn> columnSelector,
Expression<Func<T, TRow>> rowSelector,
Func<IEnumerable<T>, TData> dataSelector)
{
var arr = new List<object>();
var cols = new List<string>();
List<string> rowNames = new List<string>();
bool isObjectSelector = false;
if (rowSelector.Body.GetType() == typeof(MemberExpression))
{
rowNames.Add(((MemberExpression)rowSelector.Body).Member.Name);
}
else if (rowSelector.Body.GetType() == typeof(NewExpression))
{
isObjectSelector = true;
((NewExpression)rowSelector.Body).Members.ToList().ForEach(m => rowNames.Add(m.Name));
}
var columns = source.Select(columnSelector).Distinct();
cols = rowNames.ToArray().Concat(columns.Select(x => x.ToString())).ToList();
var rows = source.GroupBy(rowSelector.Compile())
.Select(rowGroup => new
{
Key = rowGroup.Key,
Values = columns.GroupJoin(
rowGroup,
c => c,
r => columnSelector(r),
(c, columnGroup) => dataSelector(columnGroup))
}).ToArray();
foreach (var row in rows)
{
var items = row.Values.Cast<object>().ToList();
if (isObjectSelector)
{
for (int i = 0; i < rowNames.Count(); i++)
{
items.Insert(i, row.Key.GetType().GetProperty(rowNames[i]).GetValue(row.Key));
}
}
else
{
items.Insert(0, row.Key);
}
var obj = GetAnonymousObject(cols, items);
arr.Add(obj);
}
return arr.ToArray();
}

Remove rows with same column value from DataTable and add corresponding values

I have a DataTable with multiple columns. If the value of a certain column repeats, I need to remove the duplicate rows and add up their quantities. For example, the following DataTable
ITEM QTY
------------
1 20
2 10
2 10
3 20
would become:
ITEM QTY
-----------
1 20
2 20
3 20
This is what I did:
var table = dt.AsEnumerable()
.GroupBy(row => row.Field<int>("ITEM"))
.Select(group => group.First())
.CopyToDataTable();
It removes the extra row but doesn't add up the quantities. So please help me in this regard.
You can use Sum. You just have to find the duplicate rows first:
var dupGroups = dt.AsEnumerable()
.GroupBy(row => row.Field<int>("ITEM"))
.Where(g => g.Count() > 1);
Now you can use them to get the sum and to remove the redundant rows from the table.
foreach (var group in dupGroups)
{
DataRow first = group.First();
int sum = group.Sum(r => r.Field<int>("QTY"));
first.SetField("QTY", sum);
foreach (DataRow row in group.Skip(1))
dt.Rows.Remove(row);
}
Or in one query which creates a new DataTable.
DataTable newTable = dt.AsEnumerable()
.GroupBy(row => row.Field<int>("ITEM"))
.Select(g =>
{
DataRow first = g.First();
if (g.Count() > 1)
{
int sum = g.Sum(r => r.Field<int>("QTY"));
first.SetField("QTY", sum);
}
return first;
})
.CopyToDataTable();
However, even the second approach modifies the original table, which might be undesirable given that you use CopyToDataTable to create a new DataTable. To avoid that, clone the original table (DataTable newTable = dt.Clone();) to get an empty table with the same schema, then use NewRow + ItemArray.Clone() or table.ImportRow to create real copies of the rows without modifying the original data.
See: C# simple way to copy or clone a DataRow?
Edit: Here is an example of how you can create a clone without touching the original table:
DataTable newTable = dt.Clone();
var itemGroups = dt.AsEnumerable()
.GroupBy(row => row.Field<int>("ITEM"));
foreach (var group in itemGroups)
{
DataRow first = group.First();
if (group.Count() == 1)
newTable.ImportRow(first);
else
{
DataRow clone = newTable.Rows.Add((object[])first.ItemArray.Clone());
int qtySum = group.Sum(r => r.Field<int>("QTY"));
clone.SetField("QTY", qtySum);
}
}
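The same idea can be wrapped in a reusable helper. This is only a sketch: DataTableSums and SumDuplicates are hypothetical names, the key and quantity column names are passed in as parameters, and the quantity column is assumed to hold values that Convert.ToInt32 accepts.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DataTableSums
{
    // Returns a new table with one row per distinct key value; the sum
    // column holds the group total. The original table is left untouched.
    public static DataTable SumDuplicates(
        DataTable dt, string keyColumn, string sumColumn)
    {
        DataTable result = dt.Clone(); // same schema, no rows
        foreach (var group in dt.AsEnumerable().GroupBy(r => r[keyColumn]))
        {
            // Copy the first row of the group, then overwrite its quantity.
            result.ImportRow(group.First());
            DataRow copy = result.Rows[result.Rows.Count - 1];
            copy[sumColumn] = group.Sum(r => Convert.ToInt32(r[sumColumn]));
        }
        return result;
    }
}
```

For the ITEM/QTY table from the question, DataTableSums.SumDuplicates(dt, "ITEM", "QTY") should give three rows, with the two rows for item 2 merged into one. Any other columns keep the values of the first row in each group.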
var table = dt.AsEnumerable()
.GroupBy(row => row.Field<int>("ITEM"))
.Select(group => {
var row = group.First();
row["QTY"] = group.Sum(x => x.Field<int>("QTY"));
return row;
}).CopyToDataTable();
This won't change your original DataTable:
var table = dt.Copy().AsEnumerable()
.GroupBy(row=>row["ITEM"])
.Select(g=> {
DataRow dr = g.First();
dr.SetField("QTY", g.Sum(x=>x.Field<int>("QTY")));
return dr;
})
.CopyToDataTable();

How to concat only unique columns with data

I am using the method below:
public static DataTable DataTableJoiner(DataTable dt1, DataTable dt2)
{
using (DataTable targetTable = dt1.Clone())
{
var dt2Query = dt2.Columns.OfType<DataColumn>().Select(dc =>
new DataColumn(dc.ColumnName, dc.DataType, dc.Expression,
dc.ColumnMapping));
var dt2FilterQuery = from dc in dt2Query.AsEnumerable()
where targetTable.Columns
.Contains(dc.ColumnName) == false
select dc;
targetTable.Columns.AddRange(dt2FilterQuery.ToArray());
var rowData = from row1 in dt1.AsEnumerable()
join row2 in dt2.AsEnumerable()
on row1.Field<int>("Code") equals
row2.Field<int>("Code")
select row1.ItemArray
.Concat(row2.ItemArray
.Where(r2 =>
row1.ItemArray.Contains(r2) == false)).ToArray();
foreach (object[] values in rowData) targetTable.Rows.Add(values);
return targetTable;
}
}
There is a problem with this line:
select row1.ItemArray.Concat(row2.ItemArray.Where(r2 =>
row1.ItemArray.Contains(r2) == false)).ToArray();
It seems to be saying don't include me if this value (rather than column) already exists.
I am using this method to join two tables together based on a column that both tables share, but I only want the unique columns with data of both tables as a final result.
Any ideas?
I am not sure if I understand your requirement 100%, but this:
row2.ItemArray.Where(r2 => row1.ItemArray.Contains(r2) == false)
will filter out those items that happen to appear in any column of table 1, not just the column you are joining on.
So what I would try to do is filter the item based on the index, using an overload of the Where extension method:
// Get the index of the column we are joining on:
int joinColumnIndex = dt2.Columns.IndexOf("Code");
// Now we can filter out the proper item in the rowData query:
row2.ItemArray.Where((r2,idx) => idx != joinColumnIndex)
...
No, wait. Here:
var dt2FilterQuery = from dc in dt2Query.AsEnumerable()
where targetTable.Columns
.Contains(dc.ColumnName) == false
select dc;
You are filtering out all columns of table 2 whose names also appear in table 1. So what you probably want is this:
public static DataTable DataTableJoiner(DataTable dt1, DataTable dt2)
{
DataTable targetTable = dt1.Clone();
var dt2Query = dt2.Columns.OfType<DataColumn>().Select(dc =>
new DataColumn(dc.ColumnName, dc.DataType, dc.Expression,
dc.ColumnMapping));
var dt2FilterQuery = from dc in dt2Query.AsEnumerable()
where !targetTable.Columns.Contains(dc.ColumnName)
select dc;
var columnsToAdd = dt2FilterQuery.ToArray();
var columnsIndices = columnsToAdd.Select(dc => dt2.Columns.IndexOf(dc.ColumnName));
targetTable.Columns.AddRange(columnsToAdd);
var rowData = from row1 in dt1.AsEnumerable()
join row2 in dt2.AsEnumerable()
on row1.Field<int>("Code") equals
row2.Field<int>("Code")
select row1.ItemArray
.Concat(row2.ItemArray
.Where((r2,idx) =>
columnsIndices.Contains(idx))).ToArray();
foreach (object[] values in rowData) targetTable.Rows.Add(values);
return targetTable;
}
Btw. I don't quite understand why you are wrapping the DataTable you return in a using statement. Imho it is kind of pointless to dispose the very object you return to your caller right away...
