Remove duplicate rows from a list based on selected columns?

I have a list of objects of a class that currently has two columns (properties). I want to remove duplicate rows from that list based on specific columns: for example, remove duplicates by the first column only, by the second column only, or by both. I am currently using the following code. Is there a better way to do this? In the future the class will have 20-25 columns, and I would then have to add 20-25 if statements to this function.
public List<ContactTemp> RemoveDupliacacse(List<ContactTemp> ContactTempList, List<string> objcolumn)
{
List<ContactTemp> ContactTempListRemobdup = new List<ContactTemp>();
if (objcolumn.Contains("CITY"))
{
ContactTempListRemobdup = ContactTempList.GroupBy(s => s.City).Select(group => group.First()).ToList();
}
if (objcolumn.Contains("STATE"))
{
ContactTempListRemobdup = ContactTempList.GroupBy(s => s.State).Select(group => group.First()).ToList();
}
return ContactTempListRemobdup;
}

I think your class looks like this:
public class ContactTemp
{
public string CITY { get; set; }
public int STATE { get; set; }
}
The list ContactTempList contains duplicates, and you want to remove the items whose CITY and STATE are both duplicated.
The following returns one item for each distinct combination, like a Distinct (so if you have A, A, B, C it will return A, B, C):
List<ContactTemp> noDups = ContactTempList.GroupBy(d => new {d.CITY,d.STATE} )
.Select(d => d.First())
.ToList();
If you want only the elements that don't have a duplicate (so if you have A, A, B, C it will return B, C):
List<ContactTemp> noDups = ContactTempList.GroupBy(d => new {d.CITY,d.STATE} )
.Where(d => d.Count() == 1)
.Select(d => d.First())
.ToList();

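You can achieve this via reflection while keeping the same signature, i.e. for an arbitrary number of columns. Note that duplicates are removed one column at a time, not by the combination of columns:
// Requires: using System.Reflection;
public List<ContactTemp> RemoveDupliacacse(List<ContactTemp> ContactTempList,
List<string> objcolumn)
{
var type = typeof(ContactTemp);
foreach (var column in objcolumn)
{
// Ignore case so that a column name such as "CITY" matches the City property.
var property = type.GetProperty(column,
BindingFlags.IgnoreCase | BindingFlags.Public | BindingFlags.Instance);
if (property == null)
throw new ArgumentException("Unknown column: " + column);
ContactTempList = ContactTempList.GroupBy(x => property.GetValue(x))
.Select(x => x.First()).ToList();
}
return ContactTempList;
}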
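If the goal is instead to treat the selected columns as one combined key, the same reflection idea can be sketched as below. This is only an illustration under assumptions: the helper name DistinctByColumns, the class ContactDeduplicator, and the choice of string properties are hypothetical, and the key is built by joining the values with a separator that is assumed never to occur in the data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public class ContactTemp
{
    public string City { get; set; }
    public string State { get; set; }
}

public static class ContactDeduplicator
{
    // Hypothetical helper: keeps the first row for each distinct
    // combination of the named columns.
    public static List<ContactTemp> DistinctByColumns(
        IEnumerable<ContactTemp> rows, IEnumerable<string> columns)
    {
        // Resolve each column name once; ignore case so "CITY" finds City.
        var properties = columns
            .Select(name => typeof(ContactTemp).GetProperty(
                name,
                BindingFlags.IgnoreCase | BindingFlags.Public | BindingFlags.Instance)
                ?? throw new ArgumentException("Unknown column: " + name))
            .ToList();

        var seen = new HashSet<string>();
        var result = new List<ContactTemp>();
        foreach (var row in rows)
        {
            // Join all selected values into one key. Assumption: the unit
            // separator character never appears in the data. Null and ""
            // produce the same key here.
            var key = string.Join("\u001F", properties.Select(p => p.GetValue(row)));
            if (seen.Add(key))
                result.Add(row); // first occurrence of this combination
        }
        return result;
    }
}
```

With this sketch, adding a new column to the class needs no new if statement; the caller only passes a different list of column names.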

How about something like this?
public static List<ContactTemp> RemoveDupliacacse(
List<ContactTemp> ContactTempList,
IEnumerable<Func<ContactTemp, object>> columnSelectors)
{
IEnumerable<ContactTemp> ContactTempListRemobdup = ContactTempList;
foreach(var selector in columnSelectors)
{
ContactTempListRemobdup = ContactTempListRemobdup
.GroupBy(s => selector(s))
.Select(group => group.First());
}
return ContactTempListRemobdup.ToList();
}
You can use it like this:
RemoveDupliacacse(list, new List<Func<ContactTemp, object>> {
(ContactTemp contact) => contact.State, (ContactTemp contact) => contact.City });
Note that when you pass multiple column selectors, the method removes duplicates for each column in turn. Consider the following examples:
var list = new List<ContactTemp> {
new ContactTemp { City = "1", State = "1" },
new ContactTemp { City = "1", State = "2" },
new ContactTemp { City = "2", State = "1" },
new ContactTemp { City = "2", State = "2" }
};
foreach (var contact in RemoveDupliacacse(
list,
new List<Func<ContactTemp, object>> {
(ContactTemp contact) => contact.State,
(ContactTemp contact) => contact.City }))
{
Console.WriteLine($"City:{contact.City}, State:{contact.State}");
}
// This will output:
// City:1, State:1
// If you want to check for duplicates of the combination of the selected columns,
// you can do it like this:
foreach (var contact in RemoveDupliacacse(
list,
new List<Func<ContactTemp, object>> {
(ContactTemp contact) => new { contact.State, contact.City } }))
{
Console.WriteLine($"City:{contact.City}, State:{contact.State}");
}
// This will output:
// City:1, State:1
// City:1, State:2
// City:2, State:1
// City:2, State:2

Related

Select all from one List, replace values that exist on another List

A few days ago I asked the same question about SQL, but now it has come up in C# code.
Let's say we have this kind of class for holding id/text pairs:
public class Text {
public int id { get; set; }
public string text { get; set; }
...
}
Now let's populate some data.
ListA gets a lot of data:
List<Text> ListA = new List<Text>{
new () {id = 1, text = "aaa1"},
new () {id = 2, text = "aaa2"},
new () {id = 3, text = "aaa3"},
new () {id = 4, text = "aaa4"},
new () {id = 5, text = "aaa5"},
new () {id = 6, text = "aaa6"},
};
And ListB gets just a little bit of data:
List<Text> ListB = new List<Text>{
new () {id = 4, text = "bbb4"},
new () {id = 5, text = "bbb5"},
};
And this is what we are looking for:
var result = ... // Some Linq or Lambda magic goes here
// and if we do:
foreach(var item in result){
Console.WriteLine(item.id + " : " + item.text);
}
// Result will be:
// 1 : aaa1
// 2 : aaa2
// 3 : aaa3
// 4 : bbb4
// 5 : bbb5
// 6 : aaa6

You can try looking for id within ListB:
var result = ListA
.Select(a => ListB.FirstOrDefault(b => b.id == a.id) ?? a);
Here, for each a within ListA, we try to find the corresponding item within ListB by id (b.id == a.id). If no such item is found, we fall back to the ListA item: ?? a
In .NET 6 you can use the FirstOrDefault overload that takes a default value (we pass a as the default):
var result = ListA
.Select(a => ListB.FirstOrDefault(b => a.id == b.id, a));

It might be more efficient to convert ListB to a Dictionary first:
var dictB = ListB.ToDictionary(x => x.id);
Then you can write:
var result = ListA.Select(x => dictB.TryGetValue(x.id, out var b) ? b : x);
Update: rewritten to take the comment suggestions into account.

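One option is to do a Union operation with a custom EqualityComparer. Because ListB comes first, its items win over ListA items with the same id. If the order is important, you can do an OrderBy at the end.
class TextIdComparer : EqualityComparer<Text> {
public override bool Equals(Text x, Text y) => x.id == y.id;
// GetHashCode must be overridden too and must agree with Equals.
public override int GetHashCode(Text obj) => obj.id.GetHashCode();
}
var result = ListB.Union(ListA, new TextIdComparer()).OrderBy(x => x.id);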

How to use dictionary in c# to compare two lists

Currently, I use a nested for loop over two lists to find matches between them so that I can join on them.
List A contains an ID and some other columns, and list B contains an ID and some other columns. The nested loop compares all the IDs to find the ones that match and then returns the joined results. I now want to understand how to use a dictionary in this case, as that should be a more efficient way to solve this problem.
public IEnumerable<Details> GetDetails(string ID)
{
// there are two lists defined up here
foreach (var item in listA)
{
foreach (var item2 in listB)
{
if (item.ID == item2.ID)
{
item.Name = item2.name;
}
}
}
return results;
}
Instead of this nested loop, which is very inefficient, I want to learn how to use a dictionary to solve this problem.

The dictionaries would use the IDs as keys, so a lookup by ID replaces the inner loop:
Dictionary<string, object> myListA = new Dictionary<string, object>();
Dictionary<string, object> myListB = new Dictionary<string, object>();
public object GetDetails(string ID)
{
// The indexer throws KeyNotFoundException if the ID is missing.
object a = myListA[ID];
object b = myListB[ID];
// Combine them here however you want; a tuple is used as a placeholder.
object c = (a, b);
return c;
}
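To make the dictionary idea concrete for the nested loop in the question, here is a minimal sketch. The class names ItemA, ItemB, and ListJoiner and the method CopyNames are hypothetical stand-ins, since the question does not show the real types; IDs are assumed to be non-null strings.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for the two row types in the question.
public class ItemA { public string ID { get; set; } public string Name { get; set; } }
public class ItemB { public string ID { get; set; } public string name { get; set; } }

public static class ListJoiner
{
    public static void CopyNames(List<ItemA> listA, List<ItemB> listB)
    {
        // Index listB by ID once: one pass over listB.
        var byId = new Dictionary<string, ItemB>();
        foreach (var b in listB)
            byId[b.ID] = b; // on duplicate IDs the last entry wins, as in the nested loop

        // One pass over listA with a constant-time (on average) lookup per item.
        foreach (var a in listA)
        {
            if (byId.TryGetValue(a.ID, out var match))
                a.Name = match.name;
        }
    }
}
```

This does roughly n + m steps instead of n * m comparisons, which is where the gain over the nested loop comes from.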

How about using linq to achieve your actual requirement? Something like:
public IEnumerable<A> GetDetails(int ID)
{
var listA = new List<A>
{
new A(){ ID = 1, Name = 2 },
new A(){ ID = 3, Name = 4 },
new A(){ ID = 5, Name = 6 },
};
var listB = new List<B>
{
new B(){ ID = 1, name = 0 },
new B(){ ID = 3, name = 1 }
};
return listA.Join(listB, k => k.ID, k => k.ID, (item, item2) =>
{
item.Name = item2.name;
return item;
}).Where(w => w.ID == ID);
}

If you just want the common IDs in the two lists, you can achieve that like this:
var commonIds = listA.Select(o => o.ID).Intersect(listB.Select(o => o.ID));

C# filter out duplicates from list of custom objects based on list attribute

I have a List<IPInfo> of custom IPInfo objects. I need to filter out duplicates based on two attributes from this class.
Here is a class:
class IPInfo
{
public String TRADE_DATE;
public String CUSTOMER_NAME;
public List<String> ORIGINAL_IP;
public List<String> LOGON_IP = new List<String>();
}
The List<IPInfo> named fields contains records with the same CUSTOMER_NAME and LOGON_IP. I want to remove them, so that entries with the same CUSTOMER_NAME are guaranteed to have different LOGON_IP values.
I tried LINQ based on other posted answers, but this code does not compile.
private static List<IPInfo> selectFields(ref List<IPInfo> fields)
{
var distinct = fields.GroupBy(x => new { x.CUSTOMER_NAME, x.LOGON_IP })
.Select(y => new IPInfo()
{
TRADE_DATE = y.Key.TRADE_DATE,
CUSTOMER_NAME = y.Key.CUSTOMER_NAME,
ORIGINAL_IP = y.ToList(),
LOGON_IP = y.ToList()
}
).ToList();
return distinct;
}
Please give me some hints.
Trade_date Customer_name Original_IP Logon_IP

Give this a crack:
private static List<IPInfo> selectFields(ref List<IPInfo> fields)
{
var distinct =
fields
.GroupBy(x => new { x.TRADE_DATE, x.CUSTOMER_NAME })
.Select(y => new IPInfo()
{
TRADE_DATE = y.Key.TRADE_DATE,
CUSTOMER_NAME = y.Key.CUSTOMER_NAME,
ORIGINAL_IP = y.SelectMany(x => x.ORIGINAL_IP).Distinct().ToList(),
LOGON_IP = y.SelectMany(x => x.LOGON_IP).Distinct().ToList()
})
.ToList();
return distinct;
}

You can try GroupBy and First (if we have a group of duplicates, we should take the first item only).
Another issue is how to group by a List<T> (LOGON_IP is a list). Assuming that two LOGON_IP lists are equal if and only if they have the same items in the same order, we can
turn LOGON_IP into a string with the help of string.Join; if the order within LOGON_IP doesn't matter, we can use
string.Join(" ", x.LOGON_IP.OrderBy(ip => ip))
Code:
private static List<IPInfo> selectFields(ref List<IPInfo> fields)
{
var distinct = fields
.GroupBy(x => new { x.CUSTOMER_NAME, ips = string.Join(" ", x.LOGON_IP) })
.Select(chunk => chunk.First())
.ToList();
return distinct;
}
Edit: In case we don't want to return duplicates at all (i.e. if item has a duplicate we should remove all its occurrences), let's check Count:
private static List<IPInfo> selectFields(ref List<IPInfo> fields)
{
var distinct = fields
.GroupBy(x => new { x.CUSTOMER_NAME, ips = string.Join(" ", x.LOGON_IP) })
.Where(chunk => chunk.Count() == 1)
.Select(chunk => chunk.First())
.ToList();
return distinct;
}

Group items by the items it holds

Please note: My question contains pseudo code!
In my army I have foot soldiers.
Every soldier is unique: name, strength etc...
All soldiers have inventory. It can be empty.
Inventory can contain: weapons, shields, other items.
I want to group my footsoldiers by their exact inventory.
Very simple example:
I have a collection of:
Weapons: {"AK-47", "Grenade", "Knife"}
Shields: {"Aegis"}
OtherItems: {"KevlarVest"}
Collection of footsoldiers. (Count = 6)
"Joe" : {"AK-47", "Kevlar Vest"}
"Fred" : {"AK-47"}
"John" : {"AK-47", "Grenade"}
"Rambo" : {"Knife"}
"Foo" : {"AK-47"}
"Bar" : {"KevlarVest"}
These are the resulting groups (count=5) : (already in specific order now)
{"AK-47"}
{"AK-47", "Grenade"}
{"AK-47", "Kevlar Vest"}
{"Knife"}
{"KevlarVest"}
I want to sort the groups by: Weapons, then by shields, then by other items in specific order in which they are declared within their collection.
When I open the inventorygroup {"Knife"} I will find a collection with 1 footsoldier named "Rambo".
Please note: I have made this simplified version, in order not to distract you with the complexity of the data at hand. In my business case I am working with ConditionalActionFlags, that may hold Conditions of a certain type.
Hereby I supply a TestMethod that still fails now.
Can you rewrite the GetSoldierGroupings method so that the TestSoldierGroupings method succeeds ?
public class FootSoldier
{
public string Name { get; set; }
public string[] Inventory { get; set; }
}
public class ArrayComparer<T> : IEqualityComparer<T[]>
{
public bool Equals(T[] x, T[] y)
{
return x.SequenceEqual(y);
}
public int GetHashCode(T[] obj)
{
return obj.Aggregate(string.Empty, (s, i) => s + i.GetHashCode(), s => s.GetHashCode());
}
}
[TestMethod]
public void TestSoldierGroupings()
{
//Arrange
var weapons = new[] { "AK-47", "Grenade", "Knife" };
var shields = new[] { "Aegis" };
var otherItems = new[] { "KevlarVest" };
var footSoldiers = new FootSoldier[]
{
new FootSoldier() { Name="Joe" , Inventory= new string[]{ "AK-47", "Kevlar Vest" } },
new FootSoldier() { Name="Fred" , Inventory= new string[]{ "AK-47" } },
new FootSoldier() { Name="John" , Inventory= new string[]{ "AK-47", "Grenade" } },
new FootSoldier() { Name="Rambo" , Inventory= new string[]{ "Knife" } },
new FootSoldier() { Name="Foo" , Inventory= new string[]{ "AK-47" } },
new FootSoldier() { Name="Bar" , Inventory= new string[]{ "Kevlar Vest" } }
};
//Act
var result = GetSoldierGroupings(footSoldiers, weapons, shields, otherItems);
//Assert
Assert.AreEqual(result.Count, 5);
Assert.AreEqual(result.First().Key, new[] { "AK-47" });
Assert.AreEqual(result.First().Value.Count(), 2);
Assert.AreEqual(result.Last().Key, new[] { "Kevlar Vest" });
Assert.AreEqual(result[new[] { "Knife" }].First().Name, "Rambo");
}
public Dictionary<string[], FootSoldier[]> GetSoldierGroupings(FootSoldier[] footSoldiers,
string[] weapons,
string[] shields,
string[] otherItems)
{
//var result = new Dictionary<string[], FootSoldier[]>();
var result = footSoldiers
.GroupBy(fs => fs.Inventory, new ArrayComparer<string>())
.ToDictionary(x => x.Key, x => x.ToArray());
//TODO: the actual sorting.
return result;
}

You need to group your soldiers by a key that combines their items. This can be done with a custom comparer.
Personally, I would keep it simpler and use String.Join with a separator that cannot occur in any weapon, shield, or other item name.
Assuming that a soldier has a property Items which is an array of strings (like ["AK-47", "Kevlar Vest"]), you can do something like this:
var groups = soldiers
.GroupBy(s => String.Join("~~~", s.Items))
.ToDictionary(g => g.First().Items, g => g.ToArray());
This results in a Dictionary where each key is a unique item set and each value is an array of all soldiers having that set.
You can change this code so that it returns an IGrouping, an array of classes or structs, a Dictionary, or whatever else is convenient for you.
I would go for a Dictionary or an array of something like SoldiersItemGroup[] with items and soldiers as properties.
Make sure to choose a join separator that no item name can possibly contain.
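The grouping above does not cover the ordering the question asks for. One possible sketch is shown below, under these assumptions: each item name appears in exactly one category list, the groups are returned as an ordered list (a Dictionary does not guarantee enumeration order), and the names SoldierGrouping and GroupAndSort are hypothetical. Each item gets a rank from its position in the concatenated category lists, and groups are compared by their rank sequences.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class FootSoldier
{
    public string Name { get; set; }
    public string[] Inventory { get; set; }
}

public static class SoldierGrouping
{
    // Hypothetical helper: groups soldiers by exact inventory and orders the
    // groups by the declared order of the items (weapons, shields, others).
    public static List<KeyValuePair<string[], FootSoldier[]>> GroupAndSort(
        IEnumerable<FootSoldier> soldiers, params string[][] orderedCategories)
    {
        // Rank of an item = its position in the concatenated category lists.
        // Assumption: item names are unique across all categories.
        var rank = orderedCategories
            .SelectMany(category => category)
            .Select((item, index) => (item: item, index: index))
            .ToDictionary(p => p.item, p => p.index);

        // Items missing from every category sort last.
        int RankOf(string item) => rank.TryGetValue(item, out var r) ? r : int.MaxValue;

        return soldiers
            .GroupBy(s => string.Join("~~~", s.Inventory))
            .Select(g => new KeyValuePair<string[], FootSoldier[]>(
                g.First().Inventory, g.ToArray()))
            .OrderBy(
                p => p.Key.Select(item => RankOf(item)).OrderBy(r => r).ToArray(),
                Comparer<int[]>.Create(CompareRanks))
            .ToList();
    }

    // Lexicographic comparison of rank sequences; a shorter prefix sorts first.
    private static int CompareRanks(int[] x, int[] y)
    {
        for (int i = 0; i < Math.Min(x.Length, y.Length); i++)
        {
            int c = x[i].CompareTo(y[i]);
            if (c != 0) return c;
        }
        return x.Length.CompareTo(y.Length);
    }
}
```

A call such as SoldierGrouping.GroupAndSort(footSoldiers, weapons, shields, otherItems) would then return the groups already in display order. Note that inventories listing the same items in a different order still form separate groups, because the grouping key is the joined string.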

How do I order the elements in a group by linq query, and pick the first?

I have a set of data that contains a type, a date, and a value.
I want to group by the type, and for each set of values in each group I want to pick the one with the newest date.
Here is some code that works and gives the correct result, but I want to do it all in one linq query rather than in the iteration. Any ideas how I can achieve the same result as this with purely a linq query...?
var mydata = new List<Item> {
new Item { Type = "A", Date = DateTime.Parse("2016/08/11"), Value = 1 },
new Item { Type = "A", Date = DateTime.Parse("2016/08/12"), Value = 2 },
new Item { Type = "B", Date = DateTime.Parse("2016/08/20"), Value = 3 },
new Item { Type = "A", Date = DateTime.Parse("2016/08/09"), Value = 4 },
new Item { Type = "A", Date = DateTime.Parse("2016/08/08"), Value = 5 },
new Item { Type = "C", Date = DateTime.Parse("2016/08/17"), Value = 6 },
new Item { Type = "B", Date = DateTime.Parse("2016/08/30"), Value = 7 },
new Item { Type = "B", Date = DateTime.Parse("2016/08/18"), Value = 8 },
};
var data = mydata.GroupBy(_ => _.Type);
foreach (var thing in data) {
#region
// How can I remove this section and make it part of the group by query above... ?
var subset = thing.OrderByDescending(_ => _.Date);
var top = subset.First();
#endregion
Console.WriteLine($"{thing.Key} {top.Date.ToString("yyyy-MM-dd")} {top.Value}");
}
Where Item is defined as:
public class Item {
public string Type {get;set;}
public DateTime Date {get;set;}
public int Value {get;set;}
}
Expected output:
A 2016-08-12 2
B 2016-08-30 7
C 2016-08-17 6

Use select to get the FirstOrDefault (or First - because of the grouping you won't get a null) ordered descending:
var data = mydata.GroupBy(item => item.Type)
.Select(group => group.OrderByDescending(x => x.Date)
.FirstOrDefault())
.ToList();
Or SelectMany together with Take(1)
var data = mydata.GroupBy(item => item.Type)
.SelectMany(group => group.OrderByDescending(x => x.Date)
.Take(1))
.ToList();

You can Select the first element of the ordered groups:
var topItems = mydata.GroupBy(item => item.Type)
.Select(group => group.OrderByDescending(item => item.Date).First())
.ToList();
topItems is now a List<Item> containing only the top items per Type.
You may also retrieve it as Dictionary<string,Item> mapping the Type strings to the top Item for that Type:
var topItems = mydata.GroupBy(item => item.Type)
.ToDictionary(group => group.Key,
group => group.OrderByDescending(item => item.Date).First());

var data = mydata.GroupBy(
item => item.Type,
(type, items) => items.OrderByDescending(item => item.Date).First());
foreach (var item in data)
{
Console.WriteLine($"{item.Type} {item.Date.ToString("yyyy-MM-dd")} {item.Value}");
}

Group by type, for each group, order it by date descending and pick the first (newest).
mydata.GroupBy(i => i.Type).Select(g => g.OrderByDescending(i => i.Date).First());
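On .NET 6 or later, the sort inside each group can be avoided with Enumerable.MaxBy, which returns the element with the largest key in a single pass. The sketch below assumes the Item class from the question; the class NewestPerType and the method Pick are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public string Type { get; set; }
    public DateTime Date { get; set; }
    public int Value { get; set; }
}

public static class NewestPerType
{
    // Requires .NET 6 or later, where Enumerable.MaxBy was added.
    public static List<Item> Pick(IEnumerable<Item> items) =>
        items.GroupBy(i => i.Type)
             // Groups are never empty, so MaxBy always finds an element here.
             .Select(g => g.MaxBy(i => i.Date)!)
             .ToList();
}
```

GroupBy yields groups in the order their keys first appear, so for the sample data the result is ordered A, B, C.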
