Linq Extension Methods and Error Handling - c#

I have the following C# code...
// We're essentially pivoting the data, using LINQ's GroupBy.
var pivotedOperands = Operands.GroupBy(o => new { outputid = (Guid)o[DETAILS_OUTPUTID], unitid = o[DETAILS_UNITID] })
.Select(g => new
{
PivotKey = g.Key,
c1 = g.Where(x => (int)x[DETAILS_OV_SEQUENCEID] == 1).Sum(x => double.Parse(x[DETAILS_VALUE].ToString())),
r1 = g.Where(x => (int)x[DETAILS_OV_SEQUENCEID] == 2).Sum(x => double.Parse(x[DETAILS_VALUE].ToString())),
a1 = g.Where(x => (int)x[DETAILS_OV_SEQUENCEID] == 3).Sum(x => double.Parse(x[DETAILS_VALUE].ToString()))
});
It takes the data in Operands (a List<DataRow>) and uses the GroupBy() extension method to pivot the data. Essentially, c1, r1 and a1 are values from different DataRow objects with sequence IDs of 1, 2 and 3 respectively. (I can go into more detail on that if it becomes necessary, but I don't think it will.)
Sometimes the value for c1 might be empty. (It's not supposed to be, but bugs have happened further upstream in the process from time to time.) If c1 is not a numeric value, the double.Parse() call will throw an exception. That's fine. Here's my problem: if the Operands object contains, for example, 9 rows that will be pivoted into 3 rows, and one of those nine values is not numeric, is it possible to determine which DataRow object raised the exception?
Example:
If Operands contains the following values for SequenceID and Value...
OutputID UnitID SequenceID Value
A 1 1 '0'
A 1 2 '0'
A 1 3 '0'
A 2 1 ''
A 2 2 '0'
A 2 3 '0'
B 1 1 '0'
B 1 2 '0'
B 1 3 '0'
...then we will get an "Input string was not in a correct format" exception when it tries to process the empty string through the double.Parse() method for the 4th row of my data set. I want to raise a friendly exception that tells the users which row is the problem, not just that there was a problem somewhere in this set of data. Is it possible to identify exactly what caused the exception?
If you create a new C# console application in Visual Studio, add a using directive for System.Data (plus System.Linq and System.Collections.Generic if your template lacks them), and paste the following code into the Main method, you will be able to reproduce my problem.
// Create a DataTable so that we can easily create new DataRows to add to our List.
DataTable dt = new DataTable();
DataColumn col = new DataColumn();
col.DataType = System.Type.GetType("System.String");
col.ColumnName = "OutputID";
dt.Columns.Add(col);
col = new DataColumn();
col.DataType = System.Type.GetType("System.Int32");
col.ColumnName = "UnitID";
dt.Columns.Add(col);
col = new DataColumn();
col.DataType = System.Type.GetType("System.Int32");
col.ColumnName = "SequenceID";
dt.Columns.Add(col);
col = new DataColumn();
col.DataType = System.Type.GetType("System.String");
col.ColumnName = "Value";
dt.Columns.Add(col);
// Create the List and add our sample data
List<DataRow> Operands = new List<DataRow>();
DataRow dr = dt.NewRow();
dr["OutputID"] = "A";
dr["UnitID"] = "1";
dr["SequenceID"] = 1;
dr["Value"] = "0";
Operands.Add(dr);
dr = dt.NewRow();
dr["OutputID"] = "A";
dr["UnitID"] = "1";
dr["SequenceID"] = 2;
dr["Value"] = "0";
Operands.Add(dr);
dr = dt.NewRow();
dr["OutputID"] = "A";
dr["UnitID"] = "1";
dr["SequenceID"] = 3;
dr["Value"] = "0";
Operands.Add(dr);
dr = dt.NewRow();
dr["OutputID"] = "A";
dr["UnitID"] = "2";
dr["SequenceID"] = 1;
dr["Value"] = ""; // This should cause an error.
Operands.Add(dr);
dr = dt.NewRow();
dr["OutputID"] = "A";
dr["UnitID"] = "2";
dr["SequenceID"] = 2;
dr["Value"] = "0";
Operands.Add(dr);
dr = dt.NewRow();
dr["OutputID"] = "A";
dr["UnitID"] = "2";
dr["SequenceID"] = 3;
dr["Value"] = "0";
Operands.Add(dr);
dr = dt.NewRow();
dr["OutputID"] = "B";
dr["UnitID"] = "1";
dr["SequenceID"] = 1;
dr["Value"] = "0";
Operands.Add(dr);
dr = dt.NewRow();
dr["OutputID"] = "B";
dr["UnitID"] = "1";
dr["SequenceID"] = 2;
dr["Value"] = "0";
Operands.Add(dr);
dr = dt.NewRow();
dr["OutputID"] = "B";
dr["UnitID"] = "1";
dr["SequenceID"] = 3;
dr["Value"] = "0";
Operands.Add(dr);
// Now pivot the data
try
{
var pivotedOperands = Operands.GroupBy(o => new { outputid = o[0], unitid = o[1] })
.Select(g => new
{
PivotKey = g.Key,
c1 = g.Where(x => (int)x[2] == 1).Sum(x => double.Parse(x[3].ToString())),
r1 = g.Where(x => (int)x[2] == 2).Sum(x => double.Parse(x[3].ToString())),
a1 = g.Where(x => (int)x[2] == 3).Sum(x => double.Parse(x[3].ToString()))
});
foreach (var o in pivotedOperands)
{
Console.WriteLine(string.Format("c1 = {0}; r1 = {1}; a1 = {2}", o.c1, o.r1, o.a1));
}
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
Console.WriteLine("Done.");
Console.ReadLine();

Depending on how you want the information surfaced, you can either change the type of your results to account for the possibility of failure, or you can capture contextual information about the exception and throw a new exception with more information in it.
In either case, don't be afraid to use helper methods. For example, suppose you got rid of the repetitive code in your selector by creating a method like this:
string GetSumOrErrorMessage(int idToMatch, IEnumerable<DataRow> dataRows)
{
try
{
var sum = dataRows.Where(x => (int)x[2] == idToMatch).Sum(x => double.Parse(x[3].ToString()));
return sum.ToString();
}
catch (Exception)
{
return "Error happened here"; // or something more specific
}
}
Now you can change your query like this:
var pivotedOperands = Operands.GroupBy(o => new { outputid = o[0], unitid = o[1] })
.Select(g => new
{
PivotKey = g.Key,
c1 = GetSumOrErrorMessage(1, g),
r1 = GetSumOrErrorMessage(2, g),
a1 = GetSumOrErrorMessage(3, g)
});
And your output turns into:
c1 = 0; r1 = 0; a1 = 0
c1 = Error happened here; r1 = 0; a1 = 0
c1 = 0; r1 = 0; a1 = 0
If you like this pattern, rather than just returning a string you may want to look into specialized monadic types that can help with this. For example, you could create a class that holds a generic value when an action succeeds, or an error message when it doesn't. You can create a variety of extension methods and helpers to make this easier to deal with, similar to how my CallMeMaybe library lets you attempt to parse a value but just returns an empty Maybe<> if parsing fails (e.g. Maybe.From(x[3].ToString()).ParseInt64().Select(i => i.ToString()).Else("Error happened here")).
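As a minimal sketch of the kind of type described above, here is a hypothetical Result<T> written only for illustration (it is not the CallMeMaybe API). It carries either a parsed value or an error message, so a failed parse never throws:

```csharp
using System;

// Hypothetical minimal "result" type: holds either a value or an error message.
// Illustration only; not the API of CallMeMaybe or any other library.
public sealed class Result<T>
{
    public bool IsSuccess { get; }
    public T Value { get; }
    public string Error { get; }

    private Result(bool isSuccess, T value, string error)
    {
        IsSuccess = isSuccess;
        Value = value;
        Error = error;
    }

    public static Result<T> Success(T value) => new Result<T>(true, value, null);
    public static Result<T> Failure(string error) => new Result<T>(false, default(T), error);

    // Collapses the result to one value: onSuccess for a value, onFailure for an error.
    public TOut Match<TOut>(Func<T, TOut> onSuccess, Func<string, TOut> onFailure) =>
        IsSuccess ? onSuccess(Value) : onFailure(Error);
}

public static class ResultHelpers
{
    // Parses a double without throwing; a failure records the offending text.
    public static Result<double> TryParseDouble(string text) =>
        double.TryParse(text, out double d)
            ? Result<double>.Success(d)
            : Result<double>.Failure($"'{text}' is not a number");
}
```

Each pivot cell could then hold a Result<double> instead of a string, leaving it to the caller to decide how failures are displayed.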
Alternatively, if you actually want to halt when you get bad input, but still want to know where the bad input was, you can catch and throw:
double GetSum(int idToMatch, IGrouping<object, DataRow> dataRows)
{
try
{
return dataRows.Where(x => (int)x[2] == idToMatch).Sum(x => double.Parse(x[3].ToString()));
}
catch (Exception e)
{
throw new Exception($"Failure when matching {idToMatch} with group {dataRows.Key}", e);
}
}
...
var pivotedOperands = Operands.GroupBy(o => new { outputid = o[0], unitid = o[1] })
.Select(g => new
{
PivotKey = g.Key,
c1 = GetSum(1, g),
r1 = GetSum(2, g),
a1 = GetSum(3, g)
});
Output:
c1 = 0; r1 = 0; a1 = 0
Failure when matching 1 with group { outputid = A, unitid = 2 }
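The group key and sequence ID narrow the problem down to one cell of the pivot. If you want the message to name the exact row, one option is to check each value with double.TryParse inside the Sum selector and throw with that row's key columns. A minimal sketch, assuming the column names from the question's repro table (OutputID, UnitID, SequenceID, Value); PivotHelpers and SumOrThrow are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class PivotHelpers
{
    // Sums the "Value" column of the rows whose "SequenceID" equals sequenceId.
    // If a value is not numeric, throws a FormatException naming that exact row.
    // Column names are assumed from the question's repro table.
    public static double SumOrThrow(IEnumerable<DataRow> rows, int sequenceId)
    {
        return rows
            .Where(r => (int)r["SequenceID"] == sequenceId)
            .Sum(r =>
            {
                string text = r["Value"].ToString();
                if (double.TryParse(text, out double value))
                    return value;

                throw new FormatException(
                    $"Value '{text}' is not numeric in row OutputID={r["OutputID"]}, " +
                    $"UnitID={r["UnitID"]}, SequenceID={r["SequenceID"]}.");
            });
    }
}
```

In the query, c1 = PivotHelpers.SumOrThrow(g, 1) (and likewise for r1 and a1) would take the place of the GetSum calls.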

You could try using TryParse to avoid the exception: if TryParse returns false, default to zero (0). Note that this silently treats a bad value as zero instead of reporting which row it came from.
.Sum(x => {
double value = 0;
return double.TryParse(x[DETAILS_VALUE].ToString(), out value) ? value : 0;
})
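TryParse can also serve the original goal of reporting bad rows: run a validation pass over Operands before pivoting and collect every row whose value does not parse. A sketch, assuming the repro's column names; OperandValidation, FindNonNumericRows and Describe are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class OperandValidation
{
    // Returns every row whose "Value" cannot be parsed as a double,
    // so all bad rows can be reported before the pivot runs.
    public static List<DataRow> FindNonNumericRows(IEnumerable<DataRow> rows) =>
        rows.Where(r => !double.TryParse(r["Value"].ToString(), out _)).ToList();

    // Builds one user-facing line per offending row.
    public static string Describe(IEnumerable<DataRow> badRows) =>
        string.Join(Environment.NewLine, badRows.Select(r =>
            $"OutputID={r["OutputID"]}, UnitID={r["UnitID"]}, SequenceID={r["SequenceID"]}: '{r["Value"]}' is not a number"));
}
```

If FindNonNumericRows returns any rows, you can throw a friendly exception built from Describe(...) before running the GroupBy, and the message lists every bad row rather than only the first one hit.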

Related

Get DataRow if column names in a string array have matching values in a string array

I'm trying to get a DataRow from the dtResult DataTable where the columns named in the [colName] list hold the matching values from the [grbByValue] list. My goal in the code below is for [test1] and [test2] to return a DataRow from dtResult that is the same as [update] (which is hard-coded). But I have issues with both: test1 has an error that I don't know how to fix, and test2 returns null.
rule is a DataTable that looks like this:
All the below logic is run for each row of rule.
dtResult is also a DataTable that looks like this:
EDITED CODE
string[] grpby = { "ageband","gender","code"};
List<string> grbByValue = new List<string>() { "1","85+","1","1010"};
DataTable dtResult = new DataTable();
DataColumn dc = dtResult.Columns.Add("id", typeof(int));
dc.AutoIncrement = true;
dc.AutoIncrementSeed = 1;
dc.AutoIncrementStep = 1;
dtResult.Columns.Add("DataSourceID");
dtResult.Columns["DataSourceID"].DefaultValue = "1";
dtResult.Columns.Add("RuleID");
dtResult.Columns.Add("GroupBy0");
dtResult.Columns.Add("GroupBy1");
dtResult.Columns.Add("GroupBy2");
dtResult.Columns.Add("GroupBy3");
dtResult.Columns.Add("GroupBy4");
dtResult.Columns.Add("GroupBy5");
dtResult.Columns.Add("Result", typeof(decimal));
dtResult.Columns["Result"].DefaultValue = 0.00;
var colName = (from a in dtResult.Columns.Cast<DataColumn>()
where a.ColumnName.ToString().StartsWith("GroupBy")
select a.ColumnName).OrderBy(x => x).ToList();
colName.Insert(0, "RuleID");
colName = colName.GetRange(0, grbByValue.Count);
//comment/UNCOMMENT below to test [test1]
//DataRow z = dtResult.NewRow();
//for (int i = 0; i < grbByValue.Count; i++)
//{
// z[colName[i]] = grbByValue[i];
//}
//dtResult.Rows.Add(z.ItemArray);
var distDtResult = dtResult.DefaultView.ToTable(true, colName.ToArray());
bool exist = false;
DataRow update = null;
foreach (DataRow dr in distDtResult.Rows)
{
var row = dr.ItemArray.ToList();
exist = row.SequenceEqual(grbByValue);
if (exist == true)
{
//var test1 = (from t1 in distDtResult.AsEnumerable().Where(r => r.ItemArray == dr.ItemArray)
// join t2 in (from m in dtResult.AsEnumerable()
// select new
// {
// //ideally the below column list will be derived from [colName] dynamically
// RuleID = m.Field<string>("RuleID"),
// GroupBy0 = m.Field<string>("GroupBy0"),
// GroupBy1 = m.Field<string>("GroupBy1"),
// GroupBy2 = m.Field<string>("GroupBy2")
// }) on t1.ItemArray equals t2.ItemArray
// select new
// {
// t2
// }).FirstOrDefault();
update = dtResult.AsEnumerable().Where(r =>
r.Field<int>("id") == 1 &&
r.Field<string>("DataSourceID") == "1" &&
r.Field<string>("RuleID") == "1" &&
r.Field<string>("GroupBy0") == "85+" &&
r.Field<string>("GroupBy1") == "1" &&
r.Field<string>("GroupBy2") == "1010").FirstOrDefault();
break;
}
}
if (exist == false)
{
DataRow a = dtResult.NewRow();
for (int i = 0; i < grbByValue.Count; i++)
{
a[colName[i]] = grbByValue[i];
}
dtResult.Rows.Add(a.ItemArray);
var test2 = dtResult.AsEnumerable().Where(r => r.ItemArray.Equals(a.ItemArray)).FirstOrDefault();
update = dtResult.AsEnumerable().Where(r =>
r.Field<int>("id") == 1 &&
r.Field<string>("DataSourceID") == "1" &&
r.Field<string>("RuleID") == "1" &&
r.Field<string>("GroupBy0") == "85+" &&
r.Field<string>("GroupBy1") == "1" &&
r.Field<string>("GroupBy2") == "1010").FirstOrDefault();
}
This might be a good starting point, at least to better ask questions and move towards an answer.
string[] colName = { "RuleID", "GroupBy0", "GroupBy1", "GroupBy2" };
// "All the below logic is run for each row of rule"
// this goes through each row of the rule DataTable
foreach (DataRow rule in ruleTable.Rows)
{
// This is going to be equivalent to the grpby variable you specified
var groupRules = rule.Field<string>("GroupBy").Split('|');
// Some sort of mapping may need to go here to go from "ageband" to "GroupBy0", "gender" to "GroupBy1", etc.
foreach(DataRow row in dtResult.Rows)
{
DataTable distDtResult = dtResult.DefaultView.ToTable(true, colName);
var updateTEST = from dr in distDtResult.AsEnumerable()
where dr.Field<string>("RuleID") == rule["RuleID"].ToString()
&& dr.Field<string>("GroupBy0") == row["GroupBy0"].ToString() // ageband
&& dr.Field<string>("GroupBy1") == row["GroupBy1"].ToString() // gender
&& dr.Field<string>("GroupBy2") == row["GroupBy2"].ToString() // code
// more
select dr;
}
}
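On why test2 returns null: DataRow.ItemArray builds a new object[] on every call, so r.ItemArray.Equals(a.ItemArray) compares two different array references and is never true. A sketch of a value-based lookup driven by the colName and grbByValue lists, with hypothetical helper names and values compared as strings (an assumption that fits the string columns in the question):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class RowLookup
{
    // Returns the first row whose columns named in columnNames hold the
    // corresponding entries of values (compared as strings), or null if none match.
    public static DataRow FindRowByColumns(DataTable table, IList<string> columnNames, IList<string> values)
    {
        if (columnNames.Count != values.Count)
            throw new ArgumentException("columnNames and values must have the same length.");

        return table.Rows.Cast<DataRow>().FirstOrDefault(row =>
            Enumerable.Range(0, columnNames.Count)
                      .All(i => Convert.ToString(row[columnNames[i]]) == values[i]));
    }

    // ItemArray.Equals compares array references; SequenceEqual compares the values.
    public static bool HasSameValues(DataRow x, DataRow y) =>
        x.ItemArray.SequenceEqual(y.ItemArray);
}
```

Because the column list is passed in, it can be derived from colName dynamically, which is what the commented-out test1 query was aiming for.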

Row already belongs to another table error when trying to add rows?

I tried the solution from this question: "This Row already belongs to another table" error when trying to add rows?
I have a DataTable that contains 597 columns and 20 rows, and I'm trying to export the data to Excel. However, Excel (the .xls format) has a maximum of 256 columns, so I need to divide the source data into 3 DataTables to make the export work.
Below is the code I have written.
var dtmasterdata = data.Tables[name];
for (int j = 1; j < datatableNumberCount; j++)
{
DataTable dt2 = new DataTable();
dt2.TableName = "Master_" + j;
dt2 = dtmasterdata.Copy();
foreach (DataColumn col in dtmasterdata.Columns)
{
DataColumn dtcol = new DataColumn();
dtcol = col;
dt2.Columns.Add(dtcol.ColumnName, dtcol.DataType);
}
for (int k = 0; k < dtmasterdata.Rows.Count; k++)
{
DataRow dr = dt2.NewRow();
dr = dtmasterdata.Rows[k];
dt2.ImportRow(dtmasterdata.Rows[k]);
//dt2.Rows.Add(dr.ItemArray);
}
After that, I need to delete a few columns, as below, so that I end up with 3 DataTables:
foreach (DataColumn col in dtmasterdata.Columns)
{
if (j == 1)
{
// condition 1
if (col.Ordinal >= 255)
{
dt2.Columns.RemoveAt(col.Ordinal);
}
}
if (j == 2)
{
// condition 2.
if (col.Ordinal < 255 || col.Ordinal >= 510)
{
dt2.Columns.RemoveAt(col.Ordinal);
}
}
if (j == 3)
{
// condition 3.
if (col.Ordinal <= 510 || col.Ordinal >= 765)
{
dt2.Columns.Add(col);
}
}
}
int worksheetNumber = 1;
string worksheetNameWithNumber = "Master Data";
if (worksheetNumber > 1)
worksheetNameWithNumber = String.Format("{0}_{1}", ws1, worksheetNumber.ToString());
Infragistics.Excel.Worksheet worksheet = wb.Worksheets.Add(worksheetNameWithNumber);
Infragistics.WebUI.UltraWebGrid.UltraWebGrid masterData1 = new Infragistics.WebUI.UltraWebGrid.UltraWebGrid("masterDataGrid");
masterData1.Browser = Infragistics.WebUI.UltraWebGrid.BrowserLevel.UpLevel;
masterData1.DataSource = dt2;
masterData1.DataMember = "Master_" + j;
masterData1.DisplayLayout.HeaderStyleDefault.Font.Bold = true;
masterData1.DisplayLayout.HeaderStyleDefault.Font.Name = "Arial";
masterData1.DisplayLayout.HeaderStyleDefault.Font.Size = FontUnit.Parse("10px");
masterData1.DisplayLayout.HeaderStyleDefault.BackColor = System.Drawing.Color.LightGray;
masterData1.DisplayLayout.RowStyleDefault.Font.Name = "Arial";
masterData1.DisplayLayout.RowStyleDefault.Font.Size = FontUnit.Parse("10px");
Infragistics.WebUI.UltraWebGrid.UltraGridBand masterBand1 = new Infragistics.WebUI.UltraWebGrid.UltraGridBand();
masterData1.Bands.Add(masterBand1);
dgResults.Controls.Add(masterData1);
masterData1.DataBind();
wb.ActiveWorksheet = worksheet;
this.ugWebGridExporter.Export(masterData1, worksheet);
worksheetNumber++;
The error occurs because you are trying to add a column that already belongs to your source DataTable to another DataTable:
dt2.Columns.Add(col);
You can't take the DataColumn objects of one DataTable and add them to another; each DataColumn can belong to only one table.
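A minimal sketch of that failure and the usual workaround, which is to create a new DataColumn with the same name and data type rather than reusing the source table's column object (ColumnCopy and AddColumnCopy are hypothetical names):

```csharp
using System.Data;

public static class ColumnCopy
{
    // target.Columns.Add(sourceColumn) throws when sourceColumn already belongs
    // to another DataTable; a fresh DataColumn with the same name and type does not.
    public static void AddColumnCopy(DataTable target, DataColumn sourceColumn)
    {
        target.Columns.Add(new DataColumn(sourceColumn.ColumnName, sourceColumn.DataType));
    }
}
```

This copies only the column definition, not the row data.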
I've a solution to this, which involves cloning the source data and removing what you don't need.
First, make 3 clones of the source DataTable. Below is an example where I create my own source table with 597 columns. Notice that Clone() copies only the table's structure, not its data!
var source597ColsTable = new DataTable("Source");
for (var i = 0; i <= 596; i++)
{
source597ColsTable.Columns.Add(new DataColumn("Column" + i , typeof(string)));
}
DataRow newRow = source597ColsTable.NewRow();
source597ColsTable.Rows.Add(newRow);
var cols0To199Table = source597ColsTable.Clone();
var cols200To399Table = source597ColsTable.Clone();
var cols400To596Table = source597ColsTable.Clone();
Next, copy all the rows from the source table into the clones. Below is a simple function to do so:
private DataTable CopyRowsFromSource(DataTable sourceTable, DataTable destinationTable)
{
foreach (DataRow row in sourceTable.Rows)
{
destinationTable.Rows.Add(row.ItemArray);
}
return destinationTable;
}
Then call this function for each of your tables.
cols0To199Table = CopyRowsFromSource(source597ColsTable, cols0To199Table);
cols200To399Table = CopyRowsFromSource(source597ColsTable, cols200To399Table);
cols400To596Table = CopyRowsFromSource(source597ColsTable, cols400To596Table);
Finally, remove the unwanted columns from each DataTable to produce the split.
private DataTable RemoveColumns(DataTable table, int startCol, int endCol)
{
var colsToRemove = new List<DataColumn>();
for (var colCount = startCol; colCount <= endCol; colCount++)
{
colsToRemove.Add(table.Columns[colCount]);
}
foreach (DataColumn col in colsToRemove)
{
table.Columns.Remove(col);
}
return table;
}
Then call it again for each cloned table:
cols0To199Table = RemoveColumns(cols0To199Table, 200, 596);
cols200To399Table = RemoveColumns(cols200To399Table, 0, 199);
// After the call above, the original columns 200-596 sit at indices 0-396,
// so indices 200-396 are the original columns 400-596.
cols200To399Table = RemoveColumns(cols200To399Table, 200, 396);
cols400To596Table = RemoveColumns(cols400To596Table, 0, 399);
After running this, you will have 3 datatables, columns 0-199, 200-399 and 400-596.
Hope that helps.
I'm not sure I've fully understood your code, but to copy a subset of columns to another DataTable there is a very simple method in the DataView class named ToTable, where you can list the columns you want in the new table. As an added bonus, this method also copies the data in the 20 rows of your original table.
So the only difficulty is building the list of columns to pass to the method.
You can do this with LINQ over the DataColumn collection:
string[] firstCols = dtmasterdata.Columns
.Cast<DataColumn>()
.Take(255)
.Select(x => x.ColumnName).ToArray();
string[] secondCols = dtmasterdata.Columns
.Cast<DataColumn>()
.Skip(255)
.Take(255)
.Select(x => x.ColumnName).ToArray();
string[] thirdCols = dtmasterdata.Columns
.Cast<DataColumn>()
.Skip(510)
.Select(x => x.ColumnName).ToArray();
DataTable t1 = dtmasterdata.DefaultView.ToTable("Master_1", false, firstCols);
DataTable t2 = dtmasterdata.DefaultView.ToTable("Master_2", false, secondCols);
DataTable t3 = dtmasterdata.DefaultView.ToTable("Master_3", false, thirdCols);
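The same ToTable approach generalizes to any number of pieces. A sketch that splits a table into chunks of at most maxColumns columns (TableSplitter and SplitByColumns are hypothetical names; the Master_N naming follows the question):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class TableSplitter
{
    // Splits a wide DataTable into tables of at most maxColumns columns each,
    // using DataView.ToTable to copy both the selected columns and the row data.
    public static List<DataTable> SplitByColumns(DataTable source, int maxColumns)
    {
        if (maxColumns <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxColumns));

        string[] allNames = source.Columns.Cast<DataColumn>()
                                          .Select(c => c.ColumnName)
                                          .ToArray();
        var pieces = new List<DataTable>();
        for (int start = 0, part = 1; start < allNames.Length; start += maxColumns, part++)
        {
            string[] names = allNames.Skip(start).Take(maxColumns).ToArray();
            pieces.Add(source.DefaultView.ToTable("Master_" + part, false, names));
        }
        return pieces;
    }
}
```

DefaultView applies any RowFilter or Sort set on the view, so the pieces reflect the view's current state. With 597 columns and maxColumns = 255 you get pieces of 255, 255 and 87 columns.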

DropDownCheckBoxes doesn't bind with DataSet values

I'm using the DropDownCheckBoxes control from CodePlex. It works fine with the code below:
var t = new string[20];
var currentYear = DateTime.Now.Year;
for (int i = 0; i < t.Length; i++)
t[i] = "Test " + i.ToString();
DropDownCheckBoxes1.DataSource = t;
DropDownCheckBoxes1.DataBind();
But when I use the same logic and get the values from a DataSet, it doesn't work: DropDownCheckBoxes1 is not loaded with any values. Please let me know what is wrong here. I know I could shorten this code and directly assign DropDownCheckBoxes1.DataSource = q.Distinct(), but nothing is working for me.
DataSet ds = GetTheData("Jan 2014");
DataTable dt = ds.Tables[0];
var q = from a in dt.AsEnumerable()
where a.Field<string>("SomeColumn1") == "Jan 2014"
select a.Field<string>("SomeColumn2");
var s = q.Distinct().ToList();
var years = new string[s.Count];
for (int i = 0; i < s.Count; i++)
years[i] = s[i];
DropDownCheckBoxes1.DataSource = years;
DropDownCheckBoxes1.DataBind();
Try this:
DataSet ds = GetTheData("Jan 2014");
DataTable dt = ds.Tables[0];
var q = dt.AsEnumerable().Where(a => a.Field<String>("SomeColumn1") == "Jan 2014")
.Select(a => a.Field<String>("SomeColumn2"));
DropDownCheckBoxes1.DataSource = q;
DropDownCheckBoxes1.DataBind();

Update two columns in a DataTable using LINQ

I want to update two columns of a DataTable in a single line using a LINQ query. Currently I am using the following two lines to do this:
oldSP.Select(string.Format("[itemGuid] = '{0}'", itemGuid)).ToList<DataRow>().ForEach(r => r["startdate"] = stDate);
oldSP.Select(string.Format("[itemGuid] = '{0}'", itemGuid)).ToList<DataRow>().ForEach(r => r["enddate"] = enDate);
How can I do this in one line, using one Select?
You can do it in one 'line': just pass an appropriate action delegate to the ForEach method:
oldSP.Select(string.Format("[itemGuid] = '{0}'", itemGuid))
.ToList<DataRow>()
.ForEach(r => {
r["startdate"] = stDate;
r["enddate"] = enDate;
});
You can also use LINQ to DataSet (it looks more readable to me than a one-liner):
var rowsToUpdate =
oldSP.AsEnumerable().Where(r => r.Field<string>("itemGuid") == itemGuid);
foreach(var row in rowsToUpdate)
{
row.SetField("startdate", stDate);
row.SetField("enddate", enDate);
}
Use curly braces to perform two or more operations:
oldSP.Select(string.Format("[itemGuid] = '{0}'", itemGuid))
.ToList<DataRow>()
.ForEach(r => { r["enddate"] = enDate; r["startdate"] = stDate; });
But for code readability I would use old-fashioned foreach loop.
Try this:
oldSP.Select(string.Format("[itemGuid] = '{0}'", itemGuid)).ToList<DataRow>()
.ForEach(r => { r["startdate"] = stDate; r["enddate"] = enDate; });
I didn't like any of the examples I saw on the web, so here's my example:
DataTable dt = new DataTable();
dt.Columns.Add("Year");
dt.Columns.Add("Month");
dt.Columns.Add("Views");
for (int year = 2011; year < 2015; year++)
{
for (int month = 1; month < 13; month++)
{
DataRow newRow = dt.NewRow();
newRow[0] = year;
newRow[1] = month;
newRow[2] = 0;
dt.Rows.Add(newRow);
}
}
dataGridView1.DataSource = dt;
//if using Lambda
//var test = dt.AsEnumerable().Where(x => x.Field<string>("Year") == "2013" && x.Field<string>("Month") == "2").ToList();
var test = (from x in dt.AsEnumerable()
where x.Field<string>("Year") == "2013"
where x.Field<string>("Month") == "2"
select x).ToList();
test[0][0] = "2015";
dt.AcceptChanges();
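//to write the changes back to a database, call your DataAdapter's Update(dt) instead of dt.AcceptChanges()
//(DataTable has no SubmitChanges(); that method belongs to a LINQ to SQL DataContext)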

Remove all rows in duplication (different from distinct row selection)

How can I remove every duplicated row from a DataTable, based on the values of two columns? Unfortunately, I can't find the equivalent LINQ query. (I don't want distinct values either: every row that has a duplicate must be removed.) The table below explains my problem.
I want to delete every row that is duplicated on COLUMN_A and COLUMN_B:
COLUMN_A COLUMN_B COLUMN_C COLUMN_D.....
A B
C D
E F
G H
A B
E F
EXPECTED OUTPUT:
COLUMN_A COLUMN_B COLUMN_C COLUMN_D.....
C D
G H
Please help
var rowsToDelete = dataTable.AsEnumerable()
.GroupBy(r => new{A=r["COLUMN_A"],B=r["COLUMN_B"]})
.Where(g => g.Count() > 1)
.SelectMany(g=>g)
.ToList();
foreach (var row in rowsToDelete)
{
dataTable.Rows.Remove(row);
}
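You can try this sample (note that it keeps the first occurrence of each duplicated row and removes only the later copies, so it produces distinct rows rather than removing every duplicate):
Link: http://geekswithblogs.net/ajohns/archive/2004/06/24/7191.aspx
Code: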
private static void RemoveDuplicates(DataTable tbl,
DataColumn[] keyColumns)
{
int rowNdx = 0;
while(rowNdx < tbl.Rows.Count-1)
{
DataRow[] dups = FindDups(tbl, rowNdx, keyColumns);
if(dups.Length>0)
{
foreach(DataRow dup in dups)
{
tbl.Rows.Remove(dup);
}
}
else
{
rowNdx++;
}
}
}
private static DataRow[] FindDups(DataTable tbl,
int sourceNdx,
DataColumn[] keyColumns)
{
ArrayList retVal = new ArrayList();
DataRow sourceRow = tbl.Rows[sourceNdx];
for(int i=sourceNdx + 1; i<tbl.Rows.Count; i++)
{
DataRow targetRow = tbl.Rows[i];
if(IsDup(sourceRow, targetRow, keyColumns))
{
retVal.Add(targetRow);
}
}
return (DataRow[]) retVal.ToArray(typeof(DataRow));
}
private static bool IsDup(DataRow sourceRow,
DataRow targetRow,
DataColumn[] keyColumns)
{
bool retVal = true;
foreach(DataColumn column in keyColumns)
{
retVal = retVal && sourceRow[column].Equals(targetRow[column]);
if(!retVal) break;
}
return retVal;
}
Test
// create an example datatable with duplicate rows
DataTable tbl = new DataTable();
tbl.Columns.Add("ColumnA");
tbl.Columns.Add("ColumnB");
tbl.Columns.Add("ColumnC");
for(int i = 0; i<10; i++)
{
DataRow nr = tbl.NewRow();
nr["ColumnA"] = "A" + i.ToString();
nr["ColumnB"] = "B" + i.ToString();
nr["ColumnC"] = "C" + i.ToString();
tbl.Rows.Add(nr);
// duplicate
nr = tbl.NewRow();
nr["ColumnA"] = "A" + i.ToString();
nr["ColumnB"] = "B" + i.ToString();
nr["ColumnC"] = "C" + i.ToString();
tbl.Rows.Add(nr);
}
PrintRows(tbl); // show table with duplicates (PrintRows is a display helper not included in the sample)
//Create an array of DataColumns to compare
//If these columns all match we consider the
//rows duplicate.
DataColumn[] keyColumns =
new DataColumn[]{tbl.Columns["ColumnA"],
tbl.Columns["ColumnB"]};
//remove the duplicates
RemoveDuplicates(tbl, keyColumns);
