Remove duplicate rows from DataTable with condition - C#

I have a datatable as below
StudentID Marks
AAA NULL
AAA 100
BBB 200
I need to remove rows from the DataTable based on StudentID, under these conditions:
If several rows share the same StudentID, remove the rows whose Marks value is NULL and keep only the rows that have a value.
If all Marks values for that student are NULL, show only one row.
The resulting DataTable should be:
StudentID Marks
AAA 100
BBB 200
I have tried to remove the duplicate rows from the table above using the function below:
public DataTable RemoveDuplicateRows(DataTable dTable, string colName)
{
Hashtable hTable = new Hashtable();
ArrayList duplicateList = new ArrayList();
//Add list of all the unique item value to hashtable, which stores combination of key, value pair.
//And add duplicate item value in arraylist.
foreach (DataRow drow in dTable.Rows)
{
if (hTable.Contains(drow[colName])&& drow["Marks"]==null)
{
duplicateList.Add(drow);
}
else
{
hTable.Add(drow[colName], string.Empty);
}
}
//Removing a list of duplicate items from datatable.
foreach (DataRow dRow in duplicateList)
dTable.Rows.Remove(dRow);
//Datatable which contains unique records will be return as output.
return dTable;
}

DataTable datatabble = new DataTable();
datatabble.Columns.Add("studentid", typeof(string));
datatabble.Columns.Add("marks", typeof(int));
datatabble.Rows.Add("AAA");
datatabble.Rows.Add("AAA",100);
datatabble.Rows.Add("BBB",200);
// Collect the student IDs that occur more than once.
var duplicates = datatabble.AsEnumerable()
    .GroupBy(r => r[0])
    .Where(gr => gr.Count() > 1)
    .Select(dupl => dupl.Key)
    .ToList();
// Keep every row of a non-duplicated student; for duplicated students keep only rows that have marks.
var result = datatabble.AsEnumerable()
    .Where(x => (duplicates.Contains(x[0]) && !string.IsNullOrEmpty(x[1].ToString()))
                || !duplicates.Contains(x[0]))
    .ToList();
Output: result contains 2 rows; the row with the NULL marks value has been filtered out.
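The question also asks for a single row when every Marks value of a student is NULL, which the filter above does not cover. A minimal sketch of one way to handle both conditions follows. The class and method names (StudentMarksFilter, KeepRowsWithMarks) are hypothetical, and it assumes the key column is of type string and that a student with several non-NULL marks should keep all of those rows.

```csharp
using System;
using System.Data;
using System.Linq;

public static class StudentMarksFilter
{
    // Hypothetical helper: per key, keeps every row that has a value,
    // or exactly one row when all of that key's values are NULL.
    public static DataTable KeepRowsWithMarks(DataTable source, string keyColumn, string valueColumn)
    {
        DataTable result = source.Clone(); // same schema, no rows

        // Assumption: the key column holds strings.
        var groups = source.AsEnumerable().GroupBy(r => r.Field<string>(keyColumn));

        foreach (var group in groups)
        {
            var withValue = group.Where(r => !r.IsNull(valueColumn)).ToList();
            if (withValue.Count > 0)
            {
                foreach (DataRow row in withValue)
                {
                    result.ImportRow(row);
                }
            }
            else
            {
                // Every value is NULL for this key: keep a single row.
                result.ImportRow(group.First());
            }
        }
        return result;
    }
}
```

It could be called as `DataTable filtered = StudentMarksFilter.KeepRowsWithMarks(datatabble, "studentid", "marks");`. Returning a new table rather than deleting rows in place avoids modifying the collection while it is being enumerated.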

Related

DataTable customization with records

I have a DataTable with 10 rows and a ListBox with 8 ListItems: 6 records that are already in the DataTable and 2 new records.
I want to update the DataTable so that those 6 records stay as they are, the remaining 4 records are removed from the DataTable, and the 2 new entries from the ListBox are added to it.
What I tried: I looped over the ListBox items, compared them against the DataTable rows, and built a list of the indexes of the matching records.
string impactedTC;
List<int> index = new List<int>();
// This returns my DataSet having 10 records
DataTable dttable = GetImpactedTestCaseDetailsToUpdateStatus().Tables[0];
for (int i = 0; i < ListBox1.Items.Count; i++)
{
int count = 0;
string dTestCase = ListBox1.Items[i].Text;
foreach (DataRow dtRow in dttable.Rows)
{
impactedTC = dtRow["TestCaseName"].ToString();
if (impactedTC == dTestCase)
{
index.Add(count);
}
count++;
}
}
You can do that using LINQ:
To keep the 6 rows and remove the remaining 4 from the DataTable:
//Assuming the names are DataTable1 and ListBox1.
var rowsToRemove = from r in DataTable1.Rows.Cast<DataRow>()
                   where ListBox1.Items
                       .Cast<ListItem>()
                       .Aggregate(0, (n, li) => li.Text.ToLower() == r.Field<string>("TestCaseName").ToLower() ? n + 1 : n) == 0
                   select r;
To get the new items from the ListBox:
var newItems = from li in ListBox1.Items.Cast<ListItem>()
               where DataTable1.Rows
                   .Cast<DataRow>()
                   .Aggregate(0, (n, r) => r.Field<string>("TestCaseName").ToLower() == li.Text.ToLower() ? n + 1 : n) == 0
               select li;
and finally update the DataTable:
rowsToRemove.ToList().ForEach(r => DataTable1.Rows.Remove(r));
newItems.ToList().ForEach(li => DataTable1.Rows.Add(li.Text)); //or maybe li.Value
Important
You might need to replace li.Text with li.Value in the preceding code, depending on how the ListItem objects are created.
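The same keep/remove/add logic can also be sketched with hash sets, which avoids rescanning the whole ListBox for every row. This is only an illustration under stated assumptions: the helper name (TestCaseTableSync.SyncWithNames) is hypothetical, the column is assumed to be of type string, names are compared case-insensitively as in the queries above, and the other columns of a newly added row are left at their defaults.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class TestCaseTableSync
{
    // Hypothetical helper: makes the values in one column match the given names.
    public static void SyncWithNames(DataTable table, string columnName, IEnumerable<string> names)
    {
        var wanted = new HashSet<string>(names, StringComparer.OrdinalIgnoreCase);
        var existing = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        // Snapshot the rows first: removing rows while enumerating table.Rows throws.
        foreach (DataRow row in table.Rows.Cast<DataRow>().ToList())
        {
            string name = row.Field<string>(columnName);
            if (name != null && wanted.Contains(name))
            {
                existing.Add(name);      // row stays as it is
            }
            else
            {
                table.Rows.Remove(row);  // not in the wanted list
            }
        }

        // Add a row for every wanted name that was not already present.
        foreach (string name in wanted.Where(n => !existing.Contains(n)))
        {
            DataRow newRow = table.NewRow();
            newRow[columnName] = name;
            table.Rows.Add(newRow);
        }
    }
}
```

With the names from the question, a call might look like `TestCaseTableSync.SyncWithNames(dttable, "TestCaseName", ListBox1.Items.Cast<ListItem>().Select(li => li.Text));`.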

C# Method will not strip out duplicates. [duplicate]

What is the best way to remove duplicate entries from a Data Table?
Here dtEmp is your current working DataTable:
DataTable distinctTable = dtEmp.DefaultView.ToTable( /*distinct*/ true);
It's nice.
Remove Duplicates
public DataTable RemoveDuplicateRows(DataTable dTable, string colName)
{
Hashtable hTable = new Hashtable();
ArrayList duplicateList = new ArrayList();
//Add list of all the unique item value to hashtable, which stores combination of key, value pair.
//And add duplicate item value in arraylist.
foreach (DataRow drow in dTable.Rows)
{
if (hTable.Contains(drow[colName]))
duplicateList.Add(drow);
else
hTable.Add(drow[colName], string.Empty);
}
//Removing a list of duplicate items from datatable.
foreach (DataRow dRow in duplicateList)
dTable.Rows.Remove(dRow);
//Datatable which contains unique records will be return as output.
return dTable;
}
Links below:
http://www.dotnetspider.com/resources/4535-Remove-duplicate-records-from-table.aspx
http://www.dotnetspark.com/kb/94-remove-duplicate-rows-value-from-datatable.aspx
To remove duplicates in a column:
http://dotnetguts.blogspot.com/2007/02/removing-duplicate-records-from.html
A simple way would be:
var newDt= dt.AsEnumerable()
.GroupBy(x => x.Field<int>("ColumnName"))
.Select(y => y.First())
.CopyToDataTable();
This post is about fetching only distinct rows from a DataTable on the basis of multiple columns.
public DataTable removeDuplicatesRows(DataTable dt)
{
    // true = return distinct rows only, compared on the listed columns.
    DataTable uniqueCols = dt.DefaultView.ToTable(true, "RNORFQNo", "ManufacturerPartNo", "RNORFQId", "ItemId", "RNONo", "Quantity", "NSNNo", "UOMName", "MOQ", "ItemDescription");
    return uniqueCols;
}
You need to call this method and assign its return value to a DataTable.
In the code above, RNORFQNo, ManufacturerPartNo, RNORFQId, ItemId, RNONo, Quantity, NSNNo, UOMName, MOQ and ItemDescription are the columns on which we want distinct values.
Here's an easy and fast way using AsEnumerable().Distinct():
private DataTable RemoveDuplicatesRecords(DataTable dt)
{
//Returns just 5 unique rows
var UniqueRows = dt.AsEnumerable().Distinct(DataRowComparer.Default);
DataTable dt2 = UniqueRows.CopyToDataTable();
return dt2;
}
/* To eliminate Duplicate rows */
private void RemoveDuplicates(DataTable dt)
{
if (dt.Rows.Count > 0)
{
for (int i = dt.Rows.Count - 1; i >= 0; i--)
{
if (i == 0)
{
break;
}
for (int j = i - 1; j >= 0; j--)
{
if (Convert.ToInt32(dt.Rows[i]["ID"]) == Convert.ToInt32(dt.Rows[j]["ID"]) && dt.Rows[i]["Name"].ToString() == dt.Rows[j]["Name"].ToString())
{
dt.Rows[i].Delete();
break;
}
}
}
dt.AcceptChanges();
}
}
There is a simple way using the LINQ GroupBy method. Grouping on the first column and selecting each group's key yields every value of that column exactly once:
var distinctValues = dt.AsEnumerable()
    .GroupBy(row => row[0])
    .Select(g => g.Key);
foreach (var d in distinctValues)
    Console.WriteLine(d);
See more at: http://www.dotnetlines.com/Blogs/tabid/85/EntryId/49/Remove-duplicate-rows-from-a-DataTable-using-LINQ.aspx
Completely distinct rows:
public static DataTable Distinct(this DataTable dt) => dt.DefaultView.ToTable(true);
Distinct by particular column(s) (note that only the columns named in "distinctColumnNames" are returned in the resulting DataTable):
public static DataTable Distinct(this DataTable dt, params string[] distinctColumnNames) =>
    dt.DefaultView.ToTable(true, distinctColumnNames);
Distinct by particular column (preserves all columns in given DataTable):
public static void Distinct(this DataTable dataTable, string distinctColumnName)
{
    var distinctResult = new DataTable();
    // Keep the first row of each group of rows sharing the same column value.
    distinctResult.Merge(
        dataTable.AsEnumerable()
            .GroupBy(row => row.Field<object>(distinctColumnName))
            .Select(group => group.First())
            .CopyToDataTable()
    );
    // Only rewrite the table if duplicates were actually found.
    if (distinctResult.DefaultView.Count < dataTable.DefaultView.Count)
    {
        dataTable.Clear();
        dataTable.Merge(distinctResult);
        dataTable.AcceptChanges();
    }
}
You can use the DefaultView.ToTable method of a DataTable to do the filtering like this (adapt to C#):
Public Sub RemoveDuplicateRows(ByRef rDataTable As DataTable)
Dim pNewDataTable As DataTable
Dim pCurrentRowCopy As DataRow
Dim pColumnList As New List(Of String)
Dim pColumn As DataColumn
'Build column list
For Each pColumn In rDataTable.Columns
pColumnList.Add(pColumn.ColumnName)
Next
'Filter by all columns
pNewDataTable = rDataTable.DefaultView.ToTable(True, pColumnList.ToArray)
rDataTable = rDataTable.Clone
'Import rows into original table structure
For Each pCurrentRowCopy In pNewDataTable.Rows
rDataTable.ImportRow(pCurrentRowCopy)
Next
End Sub
To make rows distinct across all DataTable columns, you can collect the column names into a string array:
public static DataTable RemoveDuplicateRows(this DataTable dataTable)
{
    List<string> columnNames = new List<string>();
    foreach (DataColumn col in dataTable.Columns)
    {
        columnNames.Add(col.ColumnName);
    }
    return dataTable.DefaultView.ToTable(true, columnNames.ToArray());
}
As you can see, I wrote it as an extension method on the DataTable class.
I prefer this, as it is faster than DefaultView.ToTable or a foreach loop for removing duplicates. It also lets us group by multiple columns.
DataTable distinctDT = (from rows in dt.AsEnumerable()
                        group rows by new { ColA = rows["ColA"], ColB = rows["ColB"] } into grp
                        select grp.First()).CopyToDataTable();

Remove Column from Dataset having same row values

ID Name Value
......................
1 aa 123
2 bb 123
3 cd 123
I want to remove the column "Value", in which all row values are equal to 123, from the DataSet using LINQ.
If you want to delete the whole column when all of its values are the same, use Enumerable.All, for example:
foreach(DataTable dt in ds.Tables)
{
if(dt.Rows.Count > 0 && dt.Columns.Contains("Value") && dt.Columns["Value"].DataType == typeof(int))
{
int firstValue = dt.Rows[0].Field<int>("Value");
if(dt.AsEnumerable().Skip(1).All(r => r.Field<int>("Value") == firstValue))
{
dt.Columns.Remove("Value");
}
}
}
Update: "wanted to find and delete the column were all the values are same in that column."
Then you just have to generalize the above code:
foreach (DataTable dt in ds.Tables)
{
    // Nothing to compare in an empty table; dt.Rows[0] would throw.
    if (dt.Rows.Count == 0) continue;
    List<DataColumn> columnsToDelete = new List<DataColumn>();
    foreach (DataColumn col in dt.Columns)
    {
        object first = dt.Rows[0][col];
        if (dt.AsEnumerable().Skip(1).All(r => r[col].Equals(first)))
        {
            columnsToDelete.Add(col);
        }
    }
    // Remove afterwards: the Columns collection cannot be modified while it is enumerated.
    foreach (DataColumn colToRemove in columnsToDelete)
        dt.Columns.Remove(colToRemove);
}
This returns a list of the DataRows whose Value is not 123:
ds.Tables["tableName"].AsEnumerable().Where(g => g.Field<Int32>("Value") != 123).ToList<DataRow>();
Worked with the solution given by SLaks https://stackoverflow.com/a/1766950/848286
foreach (var column in ds.Tables[0].Columns.Cast<DataColumn>().ToArray())
{
if (ds.Tables[0].AsEnumerable().All(dr => dr.IsNull(column)))
{
ds.Tables[0].Columns.Remove(column);
}
}

How to sum columns in a dataTable?

How can I get a sum for all the columns in a datatable? Say I had the following table. How can I calculate the "total" row? It should be easy to add total row to a datatable.
Columns   hits   uniques   signups, etc...
Rows
1         12     1         23
2         1      0         5
3         6      2         9
total     19     3         37
Update
I ended up with this. It was the only thing I could get to work.
For Each col As DataColumn In TotalsTable.Columns
If col.DataType.Name = "DateTime" Then
count = count + 1
Continue For
End If
Dim colTotal As Double = 0
Dim value As Double
For Each row As DataRow In TotalsTable.Rows
If Double.TryParse(row(col), value) Then
colTotal += Double.Parse(row(col))
End If
Next
totalRow(count) = colTotal
count = count + 1
Next
There is also a way to do this without loops, using the DataTable.Compute method. The following example comes from that page; the code is fairly simple:
private void ComputeBySalesSalesID(DataSet dataSet)
{
// Presumes a DataTable named "Orders" that has a column named "Total."
DataTable table;
table = dataSet.Tables["Orders"];
// Declare an object variable.
object sumObject;
sumObject = table.Compute("Sum(Total)", "EmpID = 5");
}
I must add that if you do not need to filter the results, you can always pass an empty string:
sumObject = table.Compute("Sum(Total)", "");
Try this:
DataTable dt = new DataTable();
int sum = 0;
foreach (DataRow dr in dt.Rows)
{
foreach (DataColumn dc in dt.Columns)
{
sum += (int)dr[dc];
}
}
I doubt that this is what you want but your question is a little bit vague
Dim totalCount As Int32 = DataTable1.Columns.Count * DataTable1.Rows.Count
If all your columns are numeric-columns you might want this:
You could use DataTable.Compute to Sum all values in the column.
Dim totalCount As Double
For Each col As DataColumn In DataTable1.Columns
totalCount += Double.Parse(DataTable1.Compute(String.Format("SUM({0})", col.ColumnName), Nothing).ToString)
Next
Now that you've edited your question and added more information, this should work:
Dim totalRow = DataTable1.NewRow
For Each col As DataColumn In DataTable1.Columns
totalRow(col.ColumnName) = Double.Parse(DataTable1.Compute("SUM(" & col.ColumnName & ")", Nothing).ToString)
Next
DataTable1.Rows.Add(totalRow)
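For readers working in C#, the same idea (one DataTable.Compute call per column, collected into a totals row) might look roughly like the sketch below. The helper name (TotalsRowHelper.AppendTotalsRow) and the list of types treated as numeric are assumptions; non-numeric columns such as dates or labels are skipped and left as NULL in the totals row, and column names are assumed not to contain a closing bracket.

```csharp
using System;
using System.Data;

public static class TotalsRowHelper
{
    // Hypothetical helper: appends one row holding SUM(column) for each numeric column.
    public static DataRow AppendTotalsRow(DataTable table)
    {
        // The new row is not part of table.Rows yet, so it does not affect the sums.
        DataRow totals = table.NewRow();

        foreach (DataColumn col in table.Columns)
        {
            if (!IsNumeric(col.DataType))
            {
                continue; // leave non-numeric cells as NULL
            }
            // Brackets let the expression handle column names containing spaces.
            object sum = table.Compute("SUM([" + col.ColumnName + "])", string.Empty);
            totals[col] = sum; // DBNull when the table has no rows
        }

        table.Rows.Add(totals);
        return totals;
    }

    // Assumption: these are the column types worth summing.
    private static bool IsNumeric(Type type) =>
        type == typeof(short) || type == typeof(int) || type == typeof(long) ||
        type == typeof(float) || type == typeof(double) || type == typeof(decimal);
}
```

A call such as `TotalsRowHelper.AppendTotalsRow(DataTable1);` would add the row in place. Calling it twice would sum the earlier totals row as well, so it is meant to run once per table.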
You can loop through the DataColumn and DataRow collections in your DataTable:
// Sum rows.
foreach (DataRow row in dt.Rows) {
int rowTotal = 0;
foreach (DataColumn col in row.Table.Columns) {
Console.WriteLine(row[col]);
rowTotal += Int32.Parse(row[col].ToString());
}
Console.WriteLine("row total: {0}", rowTotal);
}
// Sum columns.
foreach (DataColumn col in dt.Columns) {
int colTotal = 0;
foreach (DataRow row in col.Table.Rows) {
Console.WriteLine(row[col]);
colTotal += Int32.Parse(row[col].ToString());
}
Console.WriteLine("column total: {0}", colTotal);
}
Beware: The code above does not do any sort of checking before casting an object to an int.
EDIT: add a DataRow displaying the column sums
Try this to create a new row to display your column sums:
DataRow totalsRow = dt.NewRow();
foreach (DataColumn col in dt.Columns) {
int colTotal = 0;
foreach (DataRow row in col.Table.Rows) {
colTotal += Int32.Parse(row[col].ToString());
}
totalsRow[col.ColumnName] = colTotal;
}
dt.Rows.Add(totalsRow);
This approach is fine if any of your DataTable's columns are non-numeric or if you want to inspect the value of each cell as you sum. Otherwise I believe @Tim's response using DataTable.Compute is better.
It's a pity to use .NET and not use collections and lambdas to save time and lines of code.
This is an example of how it works:
Convert yourDataTable to an enumerable, filter it if you want on a "FILTER_ROWS_FIELD" column, and, if you want, group the data by "A_GROUP_BY_FIELD".
Then get the count, the sum, or whatever you wish.
If you want a count and a sum without a group by, don't group the data.
var groupedData = from b in yourDataTable.AsEnumerable().Where(r => r.Field<int>("FILTER_ROWS_FIELD").Equals(9999))
                  group b by b.Field<string>("A_GROUP_BY_FIELD") into g
                  select new
                  {
                      tag = g.Key,
                      count = g.Count(),
                      sum = g.Sum(c => c.Field<double>("rvMoney"))
                  };
// array is assumed to be an object[,] declared elsewhere.
for (int i = 0; i <= dtB.Columns.Count - 1; i++)
{
    array[0, i] = dtB.Compute("SUM([" + dtB.Columns[i].ColumnName + "])", "");
}

How do I remove duplicates from a datatable altogether based on a column's value?

I have 3 columns in a DataTable
Id Name Count
1 James 4345
2 Kristen 89231
3 James 599
4 Suneel 317113
I need rows 1 and 3 gone, and the new datatable returning only rows 2 and 4. I found a really good related question in the suggestions on SO--this guy. But his solution uses hashtables, and only eliminates row 3, not both 1 and 3. Help!
I tried this Remove duplicates from a datatable..
using System.Data;
using System.Linq;
...
//assuming 'ds' is your DataSet
//and that ds has only one DataTable, therefore that table's index is '0'
DataTable dt = ds.Tables[0];
DataView dv = new DataView(dt);
string cols = string.Empty;
foreach (DataColumn col in dt.Columns)
{
if (!string.IsNullOrEmpty(cols)) cols += ",";
cols += col.ColumnName;
}
dt = dv.ToTable(true, cols.Split(','));
ds.Tables.RemoveAt(0);
ds.Tables.Add(dt);
The following single line of code returns a table without the duplicate rows:
ds.Tables["Employee"].DefaultView.ToTable(true, "Name");
where ds is the DataSet object, or
dt.DefaultView.ToTable(true, "Name");
where dt is the DataTable object.
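How about something like this?
Pseudo code:
Assuming the object has 3 properties [Id, Name, Value], is called NameObject, and NameObjects is an IEnumerable of them (List<NameObject> NameObjects;)
var _newNameObjectList = new List<NameObject>();
var _duplicateNames = new HashSet<string>(); // names seen more than once
foreach (var nameObject in NameObjects)
{
    if (_duplicateNames.Contains(nameObject.Name))
    {
        continue; // third or later occurrence: the name was already removed
    }
    if (_newNameObjectList.Any(x => x.Name == nameObject.Name))
    {
        // Second occurrence: drop the row kept earlier and remember the name.
        _newNameObjectList.RemoveAll(x => x.Name == nameObject.Name);
        _duplicateNames.Add(nameObject.Name);
    }
    else
    {
        _newNameObjectList.Add(nameObject);
    }
}
This should work. It uses the System.Linq and System.Collections.Generic namespaces.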
Okay, so I looked at the blog pointed out to me by Pandiya. In the comments section, a chap called Kevin Morris has posted a solution using a C# Dictionary, which worked for me.
In my main block, I wrote:
string keyColumn = "Website";
RemoveDuplicates(table1, keyColumn);
And my RemoveDuplicates function was defined as:
private void RemoveDuplicates(DataTable table1, string keyColumn)
{
Dictionary<string, string> uniquenessDict = new Dictionary<string, string>(table1.Rows.Count);
StringBuilder sb = null;
int rowIndex = 0;
DataRow row;
DataRowCollection rows = table1.Rows;
while (rowIndex < rows.Count - 1)
{
row = rows[rowIndex];
sb = new StringBuilder();
sb.Append(((string)row[keyColumn]));
if (uniquenessDict.ContainsKey(sb.ToString()))
{
rows.Remove(row);
if (RemoveAllDupes)
{
row = rows[rowIndex - 1];
rows.Remove(row);
}
}
else
{
uniquenessDict.Add(sb.ToString(), string.Empty);
rowIndex++;
}
}
}
If you go to the blog, you will find a more generic function that allows sniffing dupes over multiple columns. I've added a flag--RemoveAllDupes--in case I want to remove all duplicate rows, but this still assumes that the rows are ordered by name, and involves only duplicates and not triplicates, quadruplicates and so on. If anyone can, please update this code to reflect removal of such.
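One possible way to drop every row whose key occurs more than once, regardless of row order and of how many times the key repeats, is to group by the key column and keep only the groups of size one. The sketch below is an illustration, not the blog's function; the names (DuplicateKeyFilter, KeepUniqueKeysOnly) are hypothetical, and it returns a new DataTable instead of modifying the original.

```csharp
using System.Data;
using System.Linq;

public static class DuplicateKeyFilter
{
    // Hypothetical helper: returns a new table holding only the rows whose
    // key value occurs exactly once in the source table.
    public static DataTable KeepUniqueKeysOnly(DataTable source, string keyColumn)
    {
        DataTable result = source.Clone(); // same schema, no rows

        var singles = source.AsEnumerable()
            .GroupBy(row => row[keyColumn])     // rows need not be sorted
            .Where(group => group.Count() == 1) // drops duplicates, triplicates, ...
            .Select(group => group.First());

        // ImportRow is used instead of CopyToDataTable, which throws
        // when the sequence contains no rows.
        foreach (DataRow row in singles)
        {
            result.ImportRow(row);
        }
        return result;
    }
}
```

For the table in the question, `DuplicateKeyFilter.KeepUniqueKeysOnly(dt, "Name")` would leave only the Kristen and Suneel rows.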
