C# Method will not strip out duplicates. [duplicate] - c#

What is the best way to remove duplicate entries from a Data Table?

Do dtEmp on your current working DataTable:
DataTable distinctTable = dtEmp.DefaultView.ToTable( /*distinct*/ true);
It's nice.

Remove Duplicates
public DataTable RemoveDuplicateRows(DataTable dTable, string colName)
{
Hashtable hTable = new Hashtable();
ArrayList duplicateList = new ArrayList();
//Add list of all the unique item value to hashtable, which stores combination of key, value pair.
//And add duplicate item value in arraylist.
foreach (DataRow drow in dTable.Rows)
{
if (hTable.Contains(drow[colName]))
duplicateList.Add(drow);
else
hTable.Add(drow[colName], string.Empty);
}
//Removing a list of duplicate items from datatable.
foreach (DataRow dRow in duplicateList)
dTable.Rows.Remove(dRow);
//Datatable which contains unique records will be return as output.
return dTable;
}
Here Links below
http://www.dotnetspider.com/resources/4535-Remove-duplicate-records-from-table.aspx
http://www.dotnetspark.com/kb/94-remove-duplicate-rows-value-from-datatable.aspx
For remove duplicates in column
http://dotnetguts.blogspot.com/2007/02/removing-duplicate-records-from.html

A simple way would be:
var newDt= dt.AsEnumerable()
.GroupBy(x => x.Field<int>("ColumnName"))
.Select(y => y.First())
.CopyToDataTable();

This post is regarding fetching only Distincts rows from Data table on basis of multiple Columns.
Public coid removeDuplicatesRows(DataTable dt)
{
DataTable uniqueCols = dt.DefaultView.ToTable(true, "RNORFQNo", "ManufacturerPartNo", "RNORFQId", "ItemId", "RNONo", "Quantity", "NSNNo", "UOMName", "MOQ", "ItemDescription");
}
You need to call this method and you need to assign value to datatable.
In Above code we have RNORFQNo , PartNo,RFQ id,ItemId, RNONo, QUantity, NSNNO, UOMName,MOQ, and Item Description as Column on which we want distinct values.

Heres a easy and fast way using AsEnumerable().Distinct()
private DataTable RemoveDuplicatesRecords(DataTable dt)
{
//Returns just 5 unique rows
var UniqueRows = dt.AsEnumerable().Distinct(DataRowComparer.Default);
DataTable dt2 = UniqueRows.CopyToDataTable();
return dt2;
}

/* To eliminate Duplicate rows */
private void RemoveDuplicates(DataTable dt)
{
if (dt.Rows.Count > 0)
{
for (int i = dt.Rows.Count - 1; i >= 0; i--)
{
if (i == 0)
{
break;
}
for (int j = i - 1; j >= 0; j--)
{
if (Convert.ToInt32(dt.Rows[i]["ID"]) == Convert.ToInt32(dt.Rows[j]["ID"]) && dt.Rows[i]["Name"].ToString() == dt.Rows[j]["Name"].ToString())
{
dt.Rows[i].Delete();
break;
}
}
}
dt.AcceptChanges();
}
}

There is a simple way using Linq GroupBy Method.
var duplicateValues = dt.AsEnumerable()
.GroupBy(row => row[0])
.Where(group => (group.Count() == 1 || group.Count() > 1))
.Select(g => g.Key);
foreach (var d in duplicateValues)
Console.WriteLine(d);
See more at: http://www.dotnetlines.com/Blogs/tabid/85/EntryId/49/Remove-duplicate-rows-from-a-DataTable-using-LINQ.aspx

Completely distinct rows:
public static DataTable Dictinct(this dt) => dt.DefaultView.ToTable(true);
Distinct by particular row(s) (Note that the columns mentioned in "distinctCulumnNames" will be returned in resulting DataTable):
public static DataTable Dictinct(this dt, params string[] distinctColumnNames) =>
dt.DefaultView.ToTable(true, distinctColumnNames);
Distinct by particular column (preserves all columns in given DataTable):
public static void Distinct(this DataTable dataTable, string distinctColumnName)
{
var distinctResult = new DataTable();
distinctResult.Merge(
.GroupBy(row => row.Field<object>(distinctColumnName))
.Select(group => group.First())
.CopyToDataTable()
);
if (distinctResult.DefaultView.Count < dataTable.DefaultView.Count)
{
dataTable.Clear();
dataTable.Merge(distinctResult);
dataTable.AcceptChanges();
}
}

You can use the DefaultView.ToTable method of a DataTable to do the filtering like this (adapt to C#):
Public Sub RemoveDuplicateRows(ByRef rDataTable As DataTable)
Dim pNewDataTable As DataTable
Dim pCurrentRowCopy As DataRow
Dim pColumnList As New List(Of String)
Dim pColumn As DataColumn
'Build column list
For Each pColumn In rDataTable.Columns
pColumnList.Add(pColumn.ColumnName)
Next
'Filter by all columns
pNewDataTable = rDataTable.DefaultView.ToTable(True, pColumnList.ToArray)
rDataTable = rDataTable.Clone
'Import rows into original table structure
For Each pCurrentRowCopy In pNewDataTable.Rows
rDataTable.ImportRow(pCurrentRowCopy)
Next
End Sub

In order to distinct all datatable columns, you can easily retrieve the names of the columns in a string array
public static DataTable RemoveDuplicateRows(this DataTable dataTable)
{
List<string> columnNames = new List<string>();
foreach (DataColumn col in dataTable.Columns)
{
columnNames.Add(col.ColumnName);
}
return dataTable.DefaultView.ToTable(true, columnNames.Select(c => c.ToString()).ToArray());
}
As you can notice, I thought of using it as an extension to DataTable class

I would prefer this as this is faster than DefaultView.ToTable and foreach loop to remove duplicates. Using this, we can have group by on multiple columns as well.
DataTable distinctDT = (from rows in dt.AsEnumerable()
group rows by new { ColA = rows["ColA"], ColB = rows["ColB"]} into grp
select grp.First()).CopyToDataTable();

Related

Remove duplicate rows from DataTable with condition - C#

I have a datatable as below
StudentID Marks
AAA NULL
AAA 100
BBB 200
I have to remove the row from datatable by checking studentID in a condition that
If there are same studentID then remove row with NULL value and display only student id with value.
If both marks are NULL of that student then show only one row.
Resulted Datatable should be
StudentID Marks
AAA 100
BBB 200
I have tried to remove duplicate rows from above table using below function
public DataTable RemoveDuplicateRows(DataTable dTable, string colName)
{
Hashtable hTable = new Hashtable();
ArrayList duplicateList = new ArrayList();
//Add list of all the unique item value to hashtable, which stores combination of key, value pair.
//And add duplicate item value in arraylist.
foreach (DataRow drow in dTable.Rows)
{
if (hTable.Contains(drow[colName])&& drow["Marks"]==null)
{
duplicateList.Add(drow);
}
else
{
hTable.Add(drow[colName], string.Empty);
}
}
//Removing a list of duplicate items from datatable.
foreach (DataRow dRow in duplicateList)
dTable.Rows.Remove(dRow);
//Datatable which contains unique records will be return as output.
return dTable;
}
DataTable datatabble = new DataTable();
datatabble.Columns.Add("studentid", typeof(string));
datatabble.Columns.Add("marks", typeof(int));
datatabble.Rows.Add("AAA");
datatabble.Rows.Add("AAA",100);
datatabble.Rows.Add("BBB",200);
var duplicates = datatabble.AsEnumerable().GroupBy(r => r[0]).Where(gr => gr.Count() > 1)
.Select(dupl => dupl.Key).ToList();
var result = datatabble.AsEnumerable().Where(x =>
(
(duplicates.Contains(x[0]) && !string.IsNullOrEmpty(x[1].ToString()))
|| !duplicates.Contains(x[0])
)
).ToList();
Output: You can see output with 2 row count with filtered null value.

c# - dynamically generate linq

I need to dynamically generate a query to access certain columns from a datatable.
string cols="row[Id],row[UserId], row[Code]";
var result= (from DataRow row in dt.Rows
select cols);
but this only returns "row[Id],row[UserId], row[Code]". How can I access the values in those columns?
I doubt this problem can be solved elegantly with a linq-based solution. It can be solved pretty easily using a loop and by accessing the column of the DataRow using the Item property.
public IEnumerable<object[]> GetValues(IList<string> columns, DataTable dt) {
foreach (var row in dt.Rows) {
var rowResult = new object[columns.Count];
for (var col = 0; col < columns.Count; col++) {
rowResult[col] = row.Item[columns[col]];
}
yield return rowResult;
}
}
Why not putting it in a dictionary? Dictionary<string, object> the key is the column name and the value is the value of the column.
string[] cols = new string[] { "Id", "UserId", "Code" };
var result = (from DataRow row in dt.Rows
select cols.ToDictionary(c => c, c => row[c]));

Find a string in all DataTable columns

I am trying to find a fast way to find a string in all datatable columns!
Followed is not working as I want to search within all columns value.
string str = "%whatever%";
foreach (DataRow row in dataTable.Rows)
foreach (DataColumn col in row.ItemArray)
if (row[col].ToString() == str) return true;
You can use LINQ. It wouldn't be any faster, because you still need to look at each cell in case the value is not there, but it will fit in a single line:
return dataTable
.Rows
.Cast<DataRow>()
.Any(r => r.ItemArray.Any(c => c.ToString().Contains("whatever")));
For searching for random text and returning an array of rows with at least one cell that has a case-insensitive match, use this:
var text = "whatever";
return dataTable
.Rows
.Cast<DataRow>()
.Where(r => r.ItemArray.Any(
c => c.ToString().IndexOf(text, StringComparison.OrdinalIgnoreCase) > 0
)).ToArray();
If you want to check every row of every column in your Datatable, try this (it works for me!).
DataTable YourTable = new DataTable();
// Fill your DataTable here with whatever you've got.
foreach (DataRow row in YourTable.Rows)
{
foreach (object item in row.ItemArray)
{
//Do what ya gotta do with that information here!
}
}
Don't forget to typecast object item to whatever you need (string, int etc).
I've stepped through with the debugger and it works a charm. I hope this helps, and good luck!
This can be achieved by filtering. Create a (re-usable) filtering string based on all the columns:
bool UseContains = false;
int colCount = MyDataTable.Columns.Count;
string likeStatement = (UseContains) ? " Like '%{0}%'" : " Like '{0}%'";
for (int i = 0; i < colCount; i++)
{
string colName = MyDataTable.Columns[i].ColumnName;
query.Append(string.Concat("Convert(", colName, ", 'System.String')", likeStatement));
if (i != colCount - 1)
query.Append(" OR ");
}
filterString = query.ToString();
Now you can get the rows where one of the columns matches your searchstring:
string currFilter = string.Format(filterString, searchText);
DataRow[] tmpRows = MyDataTable.Select(currFilter, somethingToOrderBy);
You can create a routine of search with an array of strings with the names of the columns, as well:
string[] elems = {"GUID", "CODE", "NAME", "DESCRIPTION"};//Names of the columns
foreach(string column in elems)
{
string expression = string.Format("{0} like '%{1}%'",column,
txtSearch.Text.Trim());//Search Expression
DataRow[] row = data.Select(expression);
if(row.Length > 0) {
// Some code here
} else {
// Other code here
}
}
You can get names of columns by using ColmunName Method. Then, you can search every column in DataTable by using them. For example, follwing code will work.
string str = "whatever";
foreach (DataRow row in dataTable.Rows)
{
foreach (DataColumn column in dataTable.Columns)
{
if (row[column.ColumnName.ToString()].ToString().Contains(str))
{
return true;
}
}
}
You can create a filter expression on the datatable as well. See this MSDN article. Use like in your filter expression.
string filterExp = "Status = 'Active'";
string sortExp = "City";
DataRow[] drarray;
drarray = dataSet1.Customers.Select(filterExp, sortExp, DataViewRowState.CurrentRows);
for (int i=0; i < drarray.Length; i++)
{
listBox1.Items.Add(drarray[i]["City"].ToString());
}

Update a DataTable in C# without using a loop?

Let suppose there are three columns in my DataTable
code
name
color
If I know the code and name, how can I update the color of that specific row whose code and name match my criteria? I want to do this without using Loops!
You can use LINQ:
DataRow dr = datatable.AsEnumerable().Where(r => ((string)r["code"]).Equals(someCode) && ((string)r["name"]).Equals(someName)).First();
dr["color"] = someColor;
Of course I'm assuming all those criteria are strings. You should change the casts to the correct types.
// Use the Select method to find all rows matching the name and code.
DataRow[] rows = myDataTable.Select("name 'nameValue' AND code = 'codeValue');
for(int i = 0; i < rows.Length; i ++)
{
rows[i]["color"] = colorValue;
}
DataTable recTable = new DataTable();
// do stuff to populate table
recTable.Select(string.Format("[code] = '{0}' and [name] = '{1}'", someCode, someName)).ToList<DataRow>().ForEach(r => r["Color"] = colorValue);
With LINQ:
var dataRows = dt.AsEnumerable().Select(c => { c["color"] = c["Code"].ToString() == "1" ? "Red" : "White"; return c; });
dt = dataRows.CopyToDataTable();
You could do:
foreach (DataRow row in datatable.Rows)
{
if(row["code"].ToString() == someCode && row["name"].ToString() == someName)
{
row["color"] = someColor;
}
}

How do I remove duplicates from a datatable altogether based on a column's value?

I have 3 columns in a DataTable
Id Name Count
1 James 4345
2 Kristen 89231
3 James 599
4 Suneel 317113
I need rows 1 and 3 gone, and the new datatable returning only rows 2 and 4. I found a really good related question in the suggestions on SO--this guy. But his solution uses hashtables, and only eliminates row 3, not both 1 and 3. Help!
I tried this Remove duplicates from a datatable..
using System.Data;
using System.Linq;
...
//assuming 'ds' is your DataSet
//and that ds has only one DataTable, therefore that table's index is '0'
DataTable dt = ds.Tables[0];
DataView dv = new DataView(dt);
string cols = string.Empty;
foreach (DataColumn col in dt.Columns)
{
if (!string.IsNullOrEmpty(cols)) cols += ",";
cols += col.ColumnName;
}
dt = dv.ToTable(true, cols.Split(','));
ds.Tables.RemoveAt(0);
ds.Tables.Add(dt);
Following single line of code will avoid the duplicate rows.
ds.Tables["Employee"].DefaultView.ToTable(true,"Name");
ds – Dataset object
dt.DefaultView.ToTable( true, "Name");
dt – DataTable object
How about something like this;
Pseudo code:
Assuming the object has 3 properties: [Id, Name, Value] and called NameObjects and is IEnumerable (List NameObjects;)
var _newNameObjectList = new List<NameObject>();
foreach(var nameObject in NameObjecs)
{
if(_newNameObjectList.Select(x => x.Name == nameObject.Name).ToList().Count > 0)
{
_newNameObjectList.RemoveAll(x => x.Name == nameObject.Name);
continue;
}
else
{
_newNameObjectList.Add(nameObject);
}
}
This should work. This uses the namespace System.Linq;
Okay, so I looked at the blog pointed out to me by Pandiya. In the comments section, a chap called Kevin Morris has posted a solution using a C# Dictionary, which worked for me.
In my main block, I wrote:
string keyColumn = "Website";
RemoveDuplicates(table1, keyColumn);
And my RemoveDuplicates function was defined as:
private void RemoveDuplicates(DataTable table1, string keyColumn)
{
Dictionary<string, string> uniquenessDict = new Dictionary<string, string>(table1.Rows.Count);
StringBuilder sb = null;
int rowIndex = 0;
DataRow row;
DataRowCollection rows = table1.Rows;
while (rowIndex < rows.Count - 1)
{
row = rows[rowIndex];
sb = new StringBuilder();
sb.Append(((string)row[keyColumn]));
if (uniquenessDict.ContainsKey(sb.ToString()))
{
rows.Remove(row);
if (RemoveAllDupes)
{
row = rows[rowIndex - 1];
rows.Remove(row);
}
}
else
{
uniquenessDict.Add(sb.ToString(), string.Empty);
rowIndex++;
}
}
}
If you go to the blog, you will find a more generic function that allows sniffing dupes over multiple columns. I've added a flag--RemoveAllDupes--in case I want to remove all duplicate rows, but this still assumes that the rows are ordered by name, and involves only duplicates and not triplicates, quadruplicates and so on. If anyone can, please update this code to reflect removal of such.

Categories