I have a DataTable which contains many duplicate rows. I need to filter those rows based upon multiple columns to get distinct rows in the resulting DataTable.
Barcode Itemid PacktypeId
1 100 1
1 100 2
1 100 3
1 100 1
1 100 3
I need only the rows with PackTypeID 1, 2, 3; the remaining 4th and 5th rows should be removed.
I have tried two methods, but neither turned out well.
The DataTable contains more than 10 columns, but the unique key columns are "Barcode", "ItemID", "PackTypeID".
Method-1:
dt_Barcode = dt_Barcode.DefaultView.ToTable(true, "Barcode", "ItemID", "PackTypeID");
The above method filters the rows, but it returns only the 3 listed columns; I need all 10 column values.
Method-2:
List<string> keyColumns = new List<string>();
keyColumns.Add("Barcode");
keyColumns.Add("ItemID");
keyColumns.Add("PackTypeID");
void RemoveDuplicates(DataTable table, List<string> keyColumns)
{
var uniqueness = new HashSet<string>();
StringBuilder sb = new StringBuilder();
int rowIndex = 0;
DataRow row;
DataRowCollection rows = table.Rows;
int i = rows.Count;
while (rowIndex < i)
{
row = rows[rowIndex];
sb.Length = 0;
foreach (string colname in keyColumns)
{
sb.Append(row[colname]);
sb.Append("|");
}
if (uniqueness.Contains(sb.ToString()))
{
rows.Remove(row);
}
else
{
uniqueness.Add(sb.ToString());
rowIndex++;
}
}
}
The above method throws an exception like "There is no row at position 5".
Method-3:
Instead of the above 2 methods, I found this LINQ approach very useful:
dt_Barcode = dt_Barcode.AsEnumerable()
    .GroupBy(r => new { ItemID = r.Field<Int64>("ItemID"), PacktypeId = r.Field<Int32>("PackTypeID") })
    .Select(g => g.First())
    .CopyToDataTable();
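This works, but as written the GroupBy key covers only ItemID and PackTypeID and leaves out Barcode. A variant that includes all three key columns might look like this (a sketch; the Int64 type for Barcode is an assumption and may need to match the real schema):
dt_Barcode = dt_Barcode.AsEnumerable()
    .GroupBy(r => new
    {
        Barcode = r.Field<Int64>("Barcode"),      // assumed column type
        ItemID = r.Field<Int64>("ItemID"),
        PackTypeID = r.Field<Int32>("PackTypeID")
    })
    .Select(g => g.First())                       // keeps the first full row, all columns included
    .CopyToDataTable();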
It happens because you remove rows: the bound i is captured once from rows.Count and never updated, so rowIndex eventually indexes past the shrunken collection.
If you want to preserve the same algorithm, instead of using while (rowIndex < i), iterate backwards with this form of loop:
for (var rowIndex = rows.Count - 1; rowIndex >= 0; rowIndex--)
{
...
if (uniqueness.Contains(sb.ToString()))
{
// the for loop's own decrement is enough after a Remove: deleting the row at
// rowIndex only shifts rows above it, which have already been visited
rows.Remove(row);
}
...
}
Note that iterating backwards keeps the last occurrence of each duplicate key rather than the first.
public void RemoveDuplicatesFromDataTable(ref DataTable table, List<string> keyColumns)
{
Dictionary<string, string> uniquenessDict = new Dictionary<string, string>(table.Rows.Count);
StringBuilder stringBuilder = null;
int rowIndex = 0;
DataRow row;
DataRowCollection rows = table.Rows;
string error = string.Empty;
try
{
while (rowIndex < rows.Count)
{
row = rows[rowIndex];
stringBuilder = new StringBuilder();
foreach (string colname in keyColumns)
{
try
{
if (row[colname].ToString() != string.Empty)
{
// ToString avoids an InvalidCastException on non-string columns; the separator
// prevents key collisions such as "ab"+"c" matching "a"+"bc"
stringBuilder.Append(row[colname].ToString());
stringBuilder.Append("|");
}
else
{
//If it comes here, means one of the keys are blank
error += "One of the key values is blank.";
}
}
catch (Exception ss)
{
error += "Error " + ss.Message + ".";
}
}
if (uniquenessDict.ContainsKey(stringBuilder.ToString()))
{
rows.Remove(row);
}
else
{
uniquenessDict.Add(stringBuilder.ToString(), string.Empty); // must match the key checked in ContainsKey above
rowIndex++;
}
}
}
catch (Exception ex)
{
error = "Failed - " + ex.Message;
}
if(error != string.Empty)
Show(error);
}
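A call site for the method above might look like this (a sketch; dt_Barcode and the key columns are taken from the question):
List<string> keyColumns = new List<string> { "Barcode", "ItemID", "PackTypeID" };
RemoveDuplicatesFromDataTable(ref dt_Barcode, keyColumns);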
This Row already belongs to another table error when trying to add rows?
I have a DataTable that contains 597 columns and 20 rows, and I am trying to export the data to Excel. However, Excel has a maximum column count of 256, so I need to divide the source data into 3 DataTables to make the export work.
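For reference, the number of target tables follows from the column count; a quick sketch of the arithmetic (the 255-column chunk size is an assumption that stays under Excel's 256-column limit):
int columnsPerTable = 255;                                            // assumption
int tableCount = (int)Math.Ceiling(597 / (double)columnsPerTable);    // 597 columns -> 3 tables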
Below is the code I have written.
var dtmasterdata = data.Tables[name];
for (int j = 1; j < datatableNumberCount; j++)
{
DataTable dt2 = new DataTable();
dt2.TableName = "Master_" + j;
dt2 = dtmasterdata.Copy();
foreach (DataColumn col in dtmasterdata.Columns)
{
DataColumn dtcol = new DataColumn();
dtcol = col;
dt2.Columns.Add(dtcol.ColumnName, dtcol.DataType);
}
for (int k = 0; k < dtmasterdata.Rows.Count; k++)
{
DataRow dr = dt2.NewRow();
dr = dtmasterdata.Rows[k];
dt2.ImportRow(dtmasterdata.Rows[k]);
//dt2.Rows.Add(dr.ItemArray);
}
After that I need to delete a few columns, as below, to create the 3 DataTables:
foreach (DataColumn col in dtmasterdata.Columns)
{
if (j == 1)
{
// condition 1
if (col.Ordinal >= 255)
{
dt2.Columns.RemoveAt(col.Ordinal);
}
}
if (j == 2)
{
// condition 2.
if (col.Ordinal < 255 || col.Ordinal >= 510)
{
dt2.Columns.RemoveAt(col.Ordinal);
}
}
if (j == 3)
{
// condition 3.
if (col.Ordinal <= 510 || col.Ordinal >= 765)
{
dt2.Columns.Add(col);
}
}
}
int worksheetNumber = 1;
string worksheetNameWithNumber = "Master Data";
if (worksheetNumber > 1)
worksheetNameWithNumber = String.Format("{0}_{1}", ws1, worksheetNumber.ToString());
Infragistics.Excel.Worksheet worksheet = wb.Worksheets.Add(worksheetNameWithNumber);
Infragistics.WebUI.UltraWebGrid.UltraWebGrid masterData1 = new Infragistics.WebUI.UltraWebGrid.UltraWebGrid("masterDataGrid");
masterData1.Browser = Infragistics.WebUI.UltraWebGrid.BrowserLevel.UpLevel;
masterData1.DataSource = dt2;
masterData1.DataMember = "Master_" + j;
masterData1.DisplayLayout.HeaderStyleDefault.Font.Bold = true;
masterData1.DisplayLayout.HeaderStyleDefault.Font.Name = "Arial";
masterData1.DisplayLayout.HeaderStyleDefault.Font.Size = FontUnit.Parse("10px");
masterData1.DisplayLayout.HeaderStyleDefault.BackColor = System.Drawing.Color.LightGray;
masterData1.DisplayLayout.RowStyleDefault.Font.Name = "Arial";
masterData1.DisplayLayout.RowStyleDefault.Font.Size = FontUnit.Parse("10px");
Infragistics.WebUI.UltraWebGrid.UltraGridBand masterBand1 = new Infragistics.WebUI.UltraWebGrid.UltraGridBand();
masterData1.Bands.Add(masterBand1);
dgResults.Controls.Add(masterData1);
masterData1.DataBind();
wb.ActiveWorksheet = worksheet;
this.ugWebGridExporter.Export(masterData1, worksheet);
worksheetNumber++;
Your error occurs because you are trying to add a column that already belongs to your source DataTable to another DataTable:
dt2.Columns.Add(col);
You can't just iterate through the columns of a datatable and add them to another.
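If the goal is only to reuse a column definition in a second table, a minimal pattern (a sketch, assuming dt2 starts out as an empty DataTable rather than a Copy of the source) is to create a fresh DataColumn with the same name and type:
foreach (DataColumn col in dtmasterdata.Columns)
{
    // copy the definition, not the DataColumn instance itself, so the new
    // column does not already belong to the source table
    dt2.Columns.Add(new DataColumn(col.ColumnName, col.DataType));
}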
I have a solution to this, which involves cloning the source table and removing what you don't need.
First, make 3 clones of the DataTable you need. Below is an example where I create my own source table with 597 columns. Notice that Clone only copies the table structure, no data!
var source597ColsTable = new DataTable("Source");
for (var i = 0; i <= 596; i++)
{
source597ColsTable.Columns.Add(new DataColumn("Column" + i , typeof(string)));
}
DataRow newRow = source597ColsTable.NewRow();
source597ColsTable.Rows.Add(newRow);
var cols0To199Table = source597ColsTable.Clone();
var cols200To399Table = source597ColsTable.Clone();
var cols400To596Table = source597ColsTable.Clone();
Next, copy all the rows from the source table into the clones. Below is a simple function to do so:
private DataTable CopyRowsFromSource(DataTable sourceTable, DataTable destinationTable)
{
foreach (DataRow row in sourceTable.Rows)
{
destinationTable.Rows.Add(row.ItemArray);
}
return destinationTable;
}
Then call this function for each of your tables.
cols0To199Table = CopyRowsFromSource(source597ColsTable, cols0To199Table);
cols200To399Table = CopyRowsFromSource(source597ColsTable, cols200To399Table);
cols400To596Table = CopyRowsFromSource(source597ColsTable, cols400To596Table);
Finally, remove all the columns from the datatables to give you your split.
private DataTable RemoveColumns(DataTable table, int startCol, int endCol)
{
var colsToRemove = new List<DataColumn>();
for (var colCount = startCol; colCount <= endCol; colCount++)
{
colsToRemove.Add(table.Columns[colCount]);
}
foreach (DataColumn col in colsToRemove)
{
table.Columns.Remove(col);
}
return table;
}
Then call it again for each cloned table:
cols0To199Table = RemoveColumns(cols0To199Table, 200, 596);
cols200To399Table = RemoveColumns(cols200To399Table, 0, 199);
cols200To399Table = RemoveColumns(cols200To399Table, 200, 396);
cols400To596Table = RemoveColumns(cols400To596Table, 0, 399);
After running this, you will have 3 datatables, columns 0-199, 200-399 and 400-596.
Hope that helps.
I am not sure I have fully understood all of your code, but to copy a subset of columns to another DataTable there is a very simple method in the DataView class named ToTable, where you can list the columns you want in the new table. As an added bonus, this method also copies the data from the 20 rows of your original table.
So the only difficulty is passing these column lists to the method. You can proceed this way using LINQ over the DataColumn collection:
string[] firstCols = dtmasterdata.Columns
.Cast<DataColumn>()
.Take(255)
.Select(x => x.ColumnName).ToArray();
string[] secondCols = dtmasterdata.Columns
.Cast<DataColumn>()
.Skip(255)
.Take(255)
.Select(x => x.ColumnName).ToArray();
string[] thirdCols = dtmasterdata.Columns
.Cast<DataColumn>()
.Skip(510)
.Select(x => x.ColumnName).ToArray();
DataTable t1 = dtmasterdata.DefaultView.ToTable("Master_1", false, firstCols);
DataTable t2 = dtmasterdata.DefaultView.ToTable("Master_2", false, secondCols);
DataTable t3 = dtmasterdata.DefaultView.ToTable("Master_3", false, thirdCols);
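If the column count changes, the same idea generalizes into a loop (a sketch; the 255-column chunk size and the chunkTables list are assumptions, not part of the original code):
var chunkTables = new List<DataTable>();
int chunk = 255;                                   // assumed max columns per exported table
for (int start = 0; start < dtmasterdata.Columns.Count; start += chunk)
{
    string[] cols = dtmasterdata.Columns
        .Cast<DataColumn>()
        .Skip(start)
        .Take(chunk)
        .Select(c => c.ColumnName)
        .ToArray();
    chunkTables.Add(dtmasterdata.DefaultView.ToTable(
        "Master_" + (start / chunk + 1), false, cols));
}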
We have a C# app that populates tables on worksheets within an Excel document.
The tables must be populated in the order the rows are returned from the database.
The object DataFileColData is defined as a List and contains the result set rows. For testing purposes, I'm only using [0] of the List.
Code segment #1 below doesn't work: row order is not preserved, and the end result has the data displayed out of order even though the numbers themselves are listed in order:
if (DataFileColData[0].Count() > 0)
{
ConcurrentDictionary<int, DataRow> theRows = new ConcurrentDictionary<int, DataRow>(9, DataFileColData[0].Count());
Parallel.For(0, DataFileColData[0].Count(), i =>
{
// go through each column
int c = 0;
try
{
foreach (var Col in DataFileColData)
{
var cell = Col[i];
if (cell != null)
{
if (cell.GetType().Name == "JArray") //If Jarray then table compression was used not column compression
{
if (theRows.TryAdd(i, Dt.NewRow()))
theRows[i].ItemArray = JsonConvert.DeserializeObject<object[]>(Col[i].ToString());
}
else
{
if (theRows.TryAdd(i, Dt.NewRow()))
theRows[i][c] = cell;
}
}
c++;
}
} //try
catch (Exception e)
{
throw new Exception("Exception thrown in \"PublicMethods.cs | RenderExcelFile\" while in foreach loop over DataFileColData: " + e.ToString());
}
} //for
); //parallel
//Add the rows to the datatable in their original order
//(might have gotten skewed from the parallel.for loop)
for (int x = 0; x < theRows.Count; x++)
Dt.Rows.Add(theRows[x]);
//Set the name so it appears nicely in the Excel Name Box dropdown instead of "table1", "table2", etc etc.
Dt.TableName = ExcelTableSpec.TableTitle + " " + r.TableID;
}
Code segment #2 below does work, with the row order and the data associated with each row preserved:
if (DataFileColData[0].Count() > 0)
{
DataRow[] theRows = new DataRow[DataFileColData[0].Count()];
Parallel.For(0, DataFileColData[0].Count(), i =>
{
DataRow Rw = Dt.NewRow();
// go through each column
int c = 0;
try
{
foreach (var Col in DataFileColData)
{
var cell = Col[i];
if (cell != null)
{
if (cell.GetType().Name == "JArray") //If Jarray then table compression was used not column compression
{
lock (theRows)
{
theRows[i] = Dt.NewRow();
theRows[i].ItemArray = JsonConvert.DeserializeObject<object[]>(Col[i].ToString());
}
}
else
{
lock (theRows)
{
theRows[i] = Dt.NewRow();
theRows[i][c] = cell;
}
}
}
c++;
}
} //try
catch (Exception e)
{
throw new Exception("Exception thrown in \"PublicMethods.cs | RenderExcelFile\" while in foreach loop over DataFileColData: " + e.ToString());
}
} //for
); //parallel
//Add the rows to the datatable in their original order
//(might have gotten skewed from the parallel.for loop)
Dt = theRows.CopyToDataTable();
//Set the name so it appears nicely in the Excel Name Box dropdown instead of "table1", "table2", etc etc.
Dt.TableName = ExcelTableSpec.TableTitle + " " + r.TableID;
}
I don't understand why. I didn't think the locking mechanism would be needed because each thread gets its own instance of "i" and a ConcurrentDictionary is supposed to be thread safe.
Would someone be able to explain to me please why the code isn't working the way I think it should?
Thank you!
UPDATED CODE as per @Enigmativity's comments below.
The MSDN documentation isn't quite clear (to me anyway), but NewRow() does appear to modify the DataTable, even though the documentation doesn't say that it does.
New working code below:
if (DataFileColData[0].Count() > 0)
{
DataRow[] theRows = new DataRow[DataFileColData[0].Count()];
Parallel.For(0, DataFileColData[0].Count(), i =>
//for (int i = 0; i < DataFileColData[0].Count(); i++)
{
lock (Dt)
{
theRows[i] = Dt.NewRow();
}
// go through each column
int c = 0;
try
{
foreach (var Col in DataFileColData)
{
var cell = Col[i];
if (cell != null)
{
if (cell.GetType().Name == "JArray") //If Jarray then table compression was used not column compression
{
theRows[i].ItemArray = JsonConvert.DeserializeObject<object[]>(Col[i].ToString());
}
else
{
theRows[i][c] = cell;
}
}
c += 1;
} //foreach
} //try
catch (Exception e)
{
throw new Exception("Exception thrown in \"PublicMethods.cs | RenderExcelFile\" while in foreach loop over DataFileColData: " + e.ToString());
}
} //for
); //parallel
//Add the rows to the datatable in their original order
//(might have gotten skewed from the parallel.for loop)
Dt = theRows.CopyToDataTable();
//Set the name so it appears nicely in the Excel Name Box dropdown instead of "table1", "table2", etc etc.
Dt.TableName = ExcelTableSpec.TableTitle + " " + r.TableID;
//cleanup
if (theRows != null)
Array.Clear(theRows, 0, theRows.Length);
theRows = null;
} //if (DataFileColData[0].Count() > 0)
Please see the documentation for DataTable (MSDN Data Tables).
The key point is:
Thread Safety
This type is safe for multithreaded read operations. You must
synchronize any write operations.
So it's not i or the ConcurrentDictionary that is causing your issues.
I've decompiled the NewRow method and there is a call to NewRow(int record). This code clearly shows write operations.
internal DataRow NewRow(int record)
{
if (-1 == record)
record = this.NewRecord(-1);
this.rowBuilder._record = record;
DataRow row = this.NewRowFromBuilder(this.rowBuilder);
this.recordManager[record] = row;
if (this.dataSet != null)
this.DataSet.OnDataRowCreated(row);
return row;
}
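One way to avoid calling NewRow from worker threads entirely (a sketch, not the code from the question) is to build plain object arrays in parallel and touch the DataTable only afterwards, on a single thread:
object[][] buffers = new object[DataFileColData[0].Count()][];
Parallel.For(0, buffers.Length, i =>
{
    var values = new object[Dt.Columns.Count];
    // ... fill values[c] from DataFileColData as in the original loop ...
    buffers[i] = values;
});
foreach (object[] values in buffers)
    Dt.Rows.Add(values);   // single-threaded writes, order preserved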
I have a DataSet which has duplicate rows, and I want my error message to execute when duplicate rows are present.
Below is my code; please help.
DataSet dsXml = new DataSet();
dsXml.ReadXml(new XmlTextReader(new StringReader(xml)));
Hashtable hTable = new Hashtable();
ArrayList duplicateList = new ArrayList();
foreach (DataRow drow in dsXml.Tables[0].Rows)
{
if (hTable.Contains(drow))
{
duplicateList.Add(drow);
}
else
{
script.Append("alert('Error - There are some Duplicate entries.'); ");
ErrorOcc = true;
if (ErrorOcc)
{
this.ScriptOutput = script + " ValidateBeforeSaving = false;";
this.StayContent = "yes";
return;
}
}
}
Your code is not working because DataRow instances are compared by reference instead of by their field values. You can use a custom comparer:
public class CustomDataRowComparer : IEqualityComparer<DataRow>
{
public bool Equals(DataRow x, DataRow y)
{
if (x.ItemArray.Length != y.ItemArray.Length)
return false;
for (int i = 0; i < x.ItemArray.Length; i++)
if (!x[i].Equals(y[i]))
return false;
return true;
}
public int GetHashCode(DataRow obj)
{
int hash = 17;
foreach (object field in obj.ItemArray)
hash = hash * 19 + field.GetHashCode();
return hash;
}
}
or use the existing DataRowComparer, which compares DataRow objects for equivalence using a value-based comparison:
HashSet<DataRow> set = new HashSet<DataRow>(DataRowComparer.Default);
// or: new HashSet<DataRow>(new CustomDataRowComparer());
foreach (DataRow row in dsXml.Tables[0].Rows)
{
if (!set.Add(row))
{
// duplicate row
}
}
You can also check whether duplicated rows exist with a LINQ to DataSet query:
var duplicatedRowsExist = dsXml.Tables[0].AsEnumerable()
.GroupBy(r => r, DataRowComparer.Default)
.Any(g => g.Count() > 1);
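If you also need the offending rows rather than just a yes/no flag, the same grouping can return them (a sketch):
var duplicateRows = dsXml.Tables[0].AsEnumerable()
    .GroupBy(r => r, DataRowComparer.Default)
    .Where(g => g.Count() > 1)
    .SelectMany(g => g.Skip(1))   // every occurrence after the first
    .ToList();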
You have to compare the content of the rows, not the rows themselves. Something like this should do it:
// AsEnumerable comes from the DataTable itself (DataTableExtensions), not from Rows;
// anonymous type members built from method calls need explicit names
var hasDupes = dsXml.Tables[0]
.AsEnumerable()
.GroupBy(row => new
{
Title = row.Field<string>("Title"),
Address = row.Field<string>("Address"),
State = row.Field<string>("State"),
City = row.Field<string>("City"),
Status = row.Field<int>("Status"),
CreatedBy = row.Field<int>("CreatedBy"),
UpdatedBy = row.Field<int>("UpdatedBy")
})
.Where(g => g.Count() > 1)
.Any();
if (hasDupes)
{
//Show error message
}
I think you have to alter your logic a little. You don't add the row to the hTable, so there are never duplicates. And I guess you have to show the message at the end, else the list will not be complete yet.
As stated by others, you do need Sergey's answer to get the comparison to work. If you have that covered, this code will solve the other logic problems.
foreach (DataRow drow in dsXml.Tables[0].Rows)
{
if (!hTable.Contains(drow))
{
hTable.Add(drow, null); // Hashtable.Add needs a key and a value; the value is unused here
}
else
{
duplicateList.Add(drow);
}
}
script.Append("alert('Error - There are some Duplicate entries.'); ");
ErrorOcc = true;
if (ErrorOcc)
{
this.ScriptOutput = script + " ValidateBeforeSaving = false;";
this.StayContent = "yes";
return;
}
First, you need to define your comparison between rows. It appears that when you create your hTable there is nothing in it, so the hTable.Contains call is always going to return false.
As a side note, you can't just compare a DataRow with another DataRow: the default equality comparison (implemented via IEqualityComparer) effectively boils down to a reference equality check, so none of the rows will be equal to each other.
You can either implement your own IEqualityComparer<DataRow>, or simply write a custom method to check the values of each row.
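Such a custom method can be as simple as building a value-based key per row and tracking it in a HashSet (a sketch; the RowKey helper is illustrative, not an existing API):
// Illustrative helper: a value-based key per row; the separator avoids
// "ab"+"c" colliding with "a"+"bc".
private static string RowKey(DataRow row)
{
    return string.Join("|", row.ItemArray);
}
// Usage idea: add RowKey(drow) to a HashSet<string>; a failed Add marks a duplicate.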
Here is the answer to my question above.
You can check for duplicate rows in the DataSet like this; it is working fine, try it.
DataSet dsXml = new DataSet();
dsXml.ReadXml(new XmlTextReader(new StringReader(xml)));
List<string> duplicateList = new List<string>();
foreach (DataRow drow in dsXml.Tables[0].Rows)
{
string strr = "";
for (int j = 0; j < dsXml.Tables[0].Columns.Count; j++ )
{
strr += drow[j] + "|"; // separator avoids false matches like "ab"+"c" vs "a"+"bc"
}
if (!duplicateList.Contains(strr))
{
duplicateList.Add(strr);
}
else
{
script.Append("alert('Error - There are some Duplicate entries.'); ");
ErrorOcc = true;
if (ErrorOcc)
{
this.ScriptOutput = script + " ValidateBeforeSaving = false;";
this.StayContent = "yes";
return;
}
}
}
I am using C#. I have two DataTables and I want to find the rows of the first DataTable in the second DataTable.
Example:
First data table's data:
1 inam
2 sohan
Second data table's data:
3 ranjan
1 inam
2 sohan
Now I want to know the index of the first two rows of the first DataTable within the second DataTable.
Please help. Any answer or advice is appreciated.
Best Regards
You can use the following extension method, which returns the first index of a "sub-sequence":
// I've used String.Join to get something that is comparable easily
// from the ItemArray that is the object-array of all fields
IEnumerable<string> first = table1.AsEnumerable()
.Select(r => string.Join(",",r.ItemArray)); //
IEnumerable<string> second = table2.AsEnumerable()
.Select(r => string.Join(",", r.ItemArray));
int index = second.IndexOfSequence(first, null); // 1
Here is the extension:
public static int IndexOfSequence<TSource>(this IEnumerable<TSource> input, IEnumerable<TSource> sequence, IEqualityComparer<TSource> comparer)
{
if (input == null) throw new ArgumentNullException("input");
if (sequence == null) throw new ArgumentNullException("sequence");
if (!sequence.Any()) throw new ArgumentException("Sequence must not be empty", "sequence");
if (comparer == null)
{
comparer = EqualityComparer<TSource>.Default;
}
int index = -1;
int firstIndex = -1;
bool found = false;
TSource first = sequence.First();
using (IEnumerator<TSource> enumerator = input.GetEnumerator())
{
using (IEnumerator<TSource> enumerator2 = sequence.GetEnumerator())
{
enumerator2.MoveNext();
while (enumerator.MoveNext())
{
index++;
found = comparer.Equals(enumerator.Current, enumerator2.Current);
if (found && firstIndex == -1) firstIndex = index;
if (found && !enumerator2.MoveNext())
return firstIndex;
}
}
}
return -1;
}
tested with this sample data:
var table1 = new DataTable();
table1.Columns.Add("ID", typeof(int));
table1.Columns.Add("Name");
var table2 = table1.Clone();
table1.Rows.Add(1, "inam");
table1.Rows.Add(2, "Sohan");
table2.Rows.Add(3, "ranjan");
table2.Rows.Add(1, "inam");
table2.Rows.Add(2, "Sohan");
If you don't have much data volume, this might work:
var tableOneIndex = -1;
foreach (DataRow tableOneRow in tableOne.Rows)
{
tableOneIndex++;
// reset for each outer row, otherwise the inner index keeps growing across iterations
var tableTwoIndex = -1;
foreach (DataRow tableTwoRow in tableTwo.Rows)
{
tableTwoIndex++;
if (tableOneRow["name"].ToString() == tableTwoRow["name"].ToString())
{
// Do whatever you wanted to do with the index values
}
}
}
As a simple solution, this should suffice:
// Create and populate data tables
DataTable dataTable1 = new DataTable();
dataTable1.Columns.Add("Name", typeof(string));
DataRow row1 = dataTable1.NewRow();
row1["Name"] = "Inam";
DataRow row2 = dataTable1.NewRow();
row2["Name"] = "Sohan";
dataTable1.Rows.Add(row1);
dataTable1.Rows.Add(row2);
DataTable dataTable2 = new DataTable();
dataTable2.Columns.Add("Name", typeof(string));
DataRow row3 = dataTable2.NewRow();
row3["Name"] = "Ranjan";
DataRow row4 = dataTable2.NewRow();
row4["Name"] = "Inam";
DataRow row5 = dataTable2.NewRow();
row5["Name"] = "Sohan";
dataTable2.Rows.Add(row3);
dataTable2.Rows.Add(row4);
dataTable2.Rows.Add(row5);
// Loop through rows in first table
foreach (DataRow row in dataTable1.Rows)
{
int rowIndexInSecondTable = -1; // -1 indicates "not found"
// Loop through rows in second table
for (int i = 0; i < dataTable2.Rows.Count; i++)
{
// Check if the column values are the same (Equals compares values; == on object compares references)
if (Equals(row["Name"], dataTable2.Rows[i]["Name"]))
{
// Set the current index and break to stop further processing
rowIndexInSecondTable = i;
break;
}
}
// The index of the row in the second table is now stored in the rowIndexInSecondTable variable, use it as needed, for example, writing to the console
Console.WriteLine("Row with name '" + row["Name"] + "' found at index " + rowIndexInSecondTable.ToString());
}
How can I remove EVERY duplicated row in a DataTable, based on the values of two columns being duplicated? Unfortunately, I am unable to find the equivalent LINQ query. (I don't even want distinct values.) The table below should explain my problem.
I want to delete every row that is duplicated based on COLUMN_A and COLUMN_B.
COLUMN_A COLUMN_B COLUMN_C COLUMN_D.....
A B
C D
E F
G H
A B
E F
EXPECTED OUTPUT:
COLUMN_A COLUMN_B COLUMN_C COLUMN_D.....
C D
G H
Please help
var rowsToDelete = dataTable.AsEnumerable()
.GroupBy(r => new{A=r["COLUMN_A"],B=r["COLUMN_B"]})
.Where(g => g.Count() > 1)
.SelectMany(g=>g)
.ToList();
foreach (var row in rowsToDelete)
{
dataTable.Rows.Remove(row);
}
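Equivalently, you can keep only the rows whose key occurs exactly once and rebuild the table in one step (a sketch; note that CopyToDataTable throws if no rows remain):
var dedupedTable = dataTable.AsEnumerable()
    .GroupBy(r => new { A = r["COLUMN_A"], B = r["COLUMN_B"] })
    .Where(g => g.Count() == 1)
    .SelectMany(g => g)
    .CopyToDataTable();   // throws InvalidOperationException when the result is empty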
You can try this sample.
Link: http://geekswithblogs.net/ajohns/archive/2004/06/24/7191.aspx
Code:
private static void RemoveDuplicates(DataTable tbl,
DataColumn[] keyColumns)
{
int rowNdx = 0;
while(rowNdx < tbl.Rows.Count-1)
{
DataRow[] dups = FindDups(tbl, rowNdx, keyColumns);
if(dups.Length>0)
{
foreach(DataRow dup in dups)
{
tbl.Rows.Remove(dup);
}
}
else
{
rowNdx++;
}
}
}
private static DataRow[] FindDups(DataTable tbl,
int sourceNdx,
DataColumn[] keyColumns)
{
ArrayList retVal = new ArrayList();
DataRow sourceRow = tbl.Rows[sourceNdx];
for(int i=sourceNdx + 1; i<tbl.Rows.Count; i++)
{
DataRow targetRow = tbl.Rows[i];
if(IsDup(sourceRow, targetRow, keyColumns))
{
retVal.Add(targetRow);
}
}
return (DataRow[]) retVal.ToArray(typeof(DataRow));
}
private static bool IsDup(DataRow sourceRow,
DataRow targetRow,
DataColumn[] keyColumns)
{
bool retVal = true;
foreach(DataColumn column in keyColumns)
{
retVal = retVal && sourceRow[column].Equals(targetRow[column]);
if(!retVal) break;
}
return retVal;
}
Test
// create an example datatable with duplicate rows
DataTable tbl = new DataTable();
tbl.Columns.Add("ColumnA");
tbl.Columns.Add("ColumnB");
tbl.Columns.Add("ColumnC");
for(int i = 0; i<10; i++)
{
DataRow nr = tbl.NewRow();
nr["ColumnA"] = "A" + i.ToString();
nr["ColumnB"] = "B" + i.ToString();
nr["ColumnC"] = "C" + i.ToString();
tbl.Rows.Add(nr);
// duplicate
nr = tbl.NewRow();
nr["ColumnA"] = "A" + i.ToString();
nr["ColumnB"] = "B" + i.ToString();
nr["ColumnC"] = "C" + i.ToString();
tbl.Rows.Add(nr);
}
PrintRows(tbl); // show table with duplicates
//Create an array of DataColumns to compare
//If these columns all match we consider the
//rows duplicate.
DataColumn[] keyColumns =
new DataColumn[]{tbl.Columns["ColumnA"],
tbl.Columns["ColumnB"]};
//remove the duplicates
RemoveDuplicates(tbl, keyColumns);
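Note that RemoveDuplicates above keeps the first row of each duplicate group, while the question asks to drop every occurrence. A small adjustment (a sketch of the changed branch inside RemoveDuplicates) also removes the source row whenever duplicates were found:
DataRow[] dups = FindDups(tbl, rowNdx, keyColumns);
if (dups.Length > 0)
{
    foreach (DataRow dup in dups)
    {
        tbl.Rows.Remove(dup);
    }
    tbl.Rows.Remove(tbl.Rows[rowNdx]);   // drop the first occurrence as well
}
else
{
    rowNdx++;
}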