Getting duplicate data based on dynamic key - c#

I have a list of Person objects:
List<PersonData> AllPersons
From this list I want all those person objects that are duplicated based on a certain property.
For example, this code gives all the duplicates based on the Id:
var duplicateKeys = AllPersons.GroupBy(p => p.Id).Select(g => new { g.Key, Count = g.Count() }).Where(x => x.Count > 1).ToList().Select(d => d.Key);
duplicates = AllPersons.Where(p => duplicateKeys.Contains(p.Id)).ToList();
Can the part p.Id be dynamic?
Meaning if the user specifies the unique column in a config file and it's read like so:
string uniqueColumn = "FirstName";
How can the query be composed to add that functionality?
Regards.

You can use Reflection to achieve that:
List<PersonData> AllPersons = new List<PersonData>()
{
new PersonData { Id = 1, FirstName = "Tom" },
new PersonData { Id = 2, FirstName = "Jon" },
new PersonData { Id = 3, FirstName = "Tom" }
};
string uniqueColumn = "FirstName";
var prop = typeof(PersonData).GetProperty(uniqueColumn);
var duplicateKeys = AllPersons.GroupBy(p => prop.GetValue(p, null))
.Select(g => new { g.Key, Count = g.Count() })
.Where(x => x.Count > 1)
.Select(d => d.Key)
.ToList();
var duplicates = AllPersons.Where(p => duplicateKeys.Contains(prop.GetValue(p, null))).ToList();
After the query executes, duplicates has 2 elements, both with FirstName == "Tom".

You might want to look into Dynamic LINQ or PredicateBuilder.
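As a rough sketch of the same idea without calling GetValue for every element, the key selector can be compiled once from an expression tree. This is not Dynamic LINQ or PredicateBuilder themselves, only the plain System.Linq.Expressions API; the names DuplicateFinder, BuildKeySelector and FindDuplicates are hypothetical, and PersonData is assumed to have just the two properties used in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Assumed shape of the question's class; only the members used here.
public class PersonData
{
    public int Id { get; set; }
    public string FirstName { get; set; }
}

// Hypothetical helper, not part of any library.
public static class DuplicateFinder
{
    // Builds and compiles p => (object)p.<propertyName> a single time.
    // Expression.Property throws ArgumentException if the property does not exist.
    public static Func<T, object> BuildKeySelector<T>(string propertyName)
    {
        ParameterExpression p = Expression.Parameter(typeof(T), "p");
        Expression body = Expression.Convert(Expression.Property(p, propertyName), typeof(object));
        return Expression.Lambda<Func<T, object>>(body, p).Compile();
    }

    // Returns every item whose key value occurs more than once.
    public static List<T> FindDuplicates<T>(IEnumerable<T> items, string propertyName)
    {
        Func<T, object> key = BuildKeySelector<T>(propertyName);
        return items.GroupBy(key)
                    .Where(g => g.Count() > 1)
                    .SelectMany(g => g)
                    .ToList();
    }
}
```

Calling DuplicateFinder.FindDuplicates(AllPersons, uniqueColumn) would then replace both queries; note that the result is ordered group by group rather than in the original list order.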

Related

Include count = 0 in linq results

I have a table having TeamName and CurrentStatus fields. I am making a linq query to get for each team and for each status the count of records:
var teamStatusCounts = models.GroupBy(x => new { x.CurrentStatus, x.TeamName })
.Select(g => new { g.Key, Count = g.Count() });
The results of this query returns all the counts except where count is 0. I need to get the rows where there is no record for a specific team and a specific status (where count = 0).
You could have a separate collection for team name and statuses you are expecting and add the missing ones to the result set
// assuming allTeamNamesAndStatuses is a cross join of all 'CurrentStatus' and 'TeamName' values
var teamStatusCounts = models.GroupBy(x => new { x.CurrentStatus, x.TeamName })
.Select(g => new { g.Key, Count = g.Count() })
.ToList();
var missingTeamsAndStatuses = allTeamNamesAndStatuses
.Where(a=>
!teamStatusCounts.Any(b=>
b.Key.CurrentStatus == a.CurrentStatus
&& b.Key.TeamName == a.TeamName))
.Select(a=>new {
Key = new { a.CurrentStatus, a.TeamName },
Count = 0
});
teamStatusCounts.AddRange(missingTeamsAndStatuses);
I've created a fiddle demonstrating the answer as well
I would select the team names and status first:
var teams = models.Select(x => x.TeamName).Distinct().ToList();
var states = models.Select(x => x.CurrentStatus).Distinct().ToList();
You can skip this if you know the list entries already.
Then you can select for each team and each state the number of models:
var teamStatusCounts = teams.SelectMany(team => states.Select(state =>
new
{
TeamName = team,
CurrentStatus = state,
Count = models.Count(model =>
model.TeamName == team && model.CurrentStatus == state)
}));
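The nested Count above rescans models once per team and status pair. A possible variation, sketched here under the assumption that the data fits in memory, counts the existing pairs once and then looks each combination up. TeamRecord, TeamStatusCount and TeamStatusCounter are hypothetical names, and the value-tuple key needs C# 7 or later.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's model type.
public class TeamRecord
{
    public string TeamName { get; set; }
    public string CurrentStatus { get; set; }
}

public class TeamStatusCount
{
    public string TeamName { get; set; }
    public string CurrentStatus { get; set; }
    public int Count { get; set; }
}

public static class TeamStatusCounter
{
    public static List<TeamStatusCount> CountAllCombinations(
        IEnumerable<TeamRecord> models,
        IEnumerable<string> teams,
        IEnumerable<string> states)
    {
        // One pass over models: count every (team, status) pair that actually occurs.
        var counts = models
            .GroupBy(m => (m.TeamName, m.CurrentStatus))
            .ToDictionary(g => g.Key, g => g.Count());

        // Cross join of teams and states; pairs missing from the dictionary get 0.
        return teams.SelectMany(team => states.Select(state =>
        {
            // TryGetValue leaves n at its default of 0 when the pair never occurs.
            counts.TryGetValue((team, state), out int n);
            return new TeamStatusCount { TeamName = team, CurrentStatus = state, Count = n };
        })).ToList();
    }
}
```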

c# Linq List - handle null

In the second list I'm trying to create a relationship to the first; however, if it can't find a match, how do I ignore the row and not add an item?
var clientData = File.ReadAllLines(txtClients.Text)
.Skip(1)
.Select(x => x.Split(','))
.Select(x => new Client()
{
ClientTempId = x[0],
Email = x[1],
FirstName = x[2],
LastName = x[3],
AccountId = accountId
});
var orderData = File.ReadAllLines(txtOrders.Text)
.Skip(1)
.Select(x => x.Split(','))
.Select(x => new Order()
{
OrderTempId = x[0],
ClientId = clientData.FirstOrDefault(c=>c.ClientTempId == x[1]).Id ==string.Empty?"Error here!!":x[1],
// How do I handle errors if the client does not exist or the row is in the wrong format? I don't want to break the code, I just want a list of issues
Name = x[3],
AccountId = accountId
});
You can return null instead and then filter those out:
var orderData = File.ReadAllLines(txtOrders.Text)
.Skip(1)
.Select(x => x.Split(','))
.Select(x =>
{
// do your check here, and return null
if (clientData.FirstOrDefault(c => c.ClientTempId == x[1]) == null)
return null;
// otherwise return the normal Order object
return new Order()
{
OrderTempId = x[0],
ClientId = x[1],
Name = x[3],
AccountId = accountId
};
})
// then filter out null values
.Where(x => x != null);
Once that is covered, as EZI pointed out in the comments, your actual check is quite expensive. You can make it more efficient by turning your clientData into a dictionary:
var clientDataDictionary = clientData.ToDictionary(c => c.ClientTempId);
Then, you can do the lookup above in constant time:
if (!clientDataDictionary.ContainsKey(x[1]))
return null;
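Since the question also asks for a list of issues rather than silently dropped rows, one way to sketch that is a plain loop that collects problems as it goes. This is only an illustration: OrderImporter and Parse are hypothetical names, the Client and Order classes are trimmed to the members used here, and the input is taken as lines already read from the file (header first).

```csharp
using System.Collections.Generic;
using System.Linq;

// Trimmed, assumed versions of the question's classes.
public class Client
{
    public string ClientTempId { get; set; }
}

public class Order
{
    public string OrderTempId { get; set; }
    public string ClientId { get; set; }
    public string Name { get; set; }
}

public static class OrderImporter
{
    // Returns the valid orders plus a human-readable list of skipped rows.
    public static (List<Order> Orders, List<string> Issues) Parse(
        IEnumerable<string> lines, IEnumerable<Client> clients)
    {
        // HashSet gives constant-time membership checks on the client ids.
        var knownClientIds = new HashSet<string>(clients.Select(c => c.ClientTempId));
        var orders = new List<Order>();
        var issues = new List<string>();

        int lineNumber = 1; // the header is line 1
        foreach (string line in lines.Skip(1))
        {
            lineNumber++;
            string[] fields = line.Split(',');

            // The original code reads x[0], x[1] and x[3], so at least 4 fields are needed.
            if (fields.Length < 4)
            {
                issues.Add($"Line {lineNumber}: expected at least 4 fields, got {fields.Length}");
                continue;
            }
            if (!knownClientIds.Contains(fields[1]))
            {
                issues.Add($"Line {lineNumber}: unknown client '{fields[1]}'");
                continue;
            }
            orders.Add(new Order { OrderTempId = fields[0], ClientId = fields[1], Name = fields[3] });
        }
        return (orders, issues);
    }
}
```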

Select Last non null-able item per product?

Let's say I have,
class Product
{
public int Id {get; set;}
public string Name {get; set;}
public int Order {get; set;}
}
and my data have,
products[0] = new Product { Id = 1, Name = "P1", Order = 1 };
products[1] = new Product { Id = 1, Name = "P2", Order = 2 };
products[2] = new Product { Id = 1, Name = null, Order = 3 };
products[3] = new Product { Id = 2, Name = "P3", Order = 4 };
products[4] = new Product { Id = 2, Name = null, Order = 5 };
products[5] = new Product { Id = 2, Name = null, Order = 6 };
What I need is the last(order by Order desc) non-nullable value of Name per Product.Id. So my final output will look like,
items[0] = new { Id = 1, Name = "P2"};
items[1] = new { Id = 2, Name = "P3"};
For Id=1, I have 3 Names (P1, P2, null); the non-null Names are (P1, P2), and the last one is P2.
This should get the last products in order.
var lastOrders = products
.Where(x => x.Name != null) // Remove inapplicable data
.OrderBy(x => x.Order) // Order by the Order
.GroupBy(x => x.Id) // Group the sorted Products
.Select(x => x.Last()); // Get the last products in the groups
var result = products
.GroupBy(p => p.Id)
.Select(g => g.OrderBy(x => x.Order).Last(x => x.Name != null));
this will give you your desired output:
products.GroupBy(p => p.Id)
.Select(g => g.OrderByDescending(gg => gg.Order)
.Where(gg => gg.Name != null)
.Select(gg => new { gg.Id, gg.Name })
.First());
For one group of products (for example, all products sharing an Id), the task can be solved using the following Linq statement.
var Result = products.OrderByDescending(iProduct => iProduct.Order).Where(iProduct => null != iProduct.Name).First();
This requires products to contain at least one item where Name is not null, otherwise an Exception will be thrown. Alternatively,
var Result = products.OrderByDescending(iProduct => iProduct.Order).Where(iProduct => null != iProduct.Name).FirstOrDefault();
will return null if products contains no such item.
Try with:
var expectedProducts = products.Where(p => p.Name != null).OrderByDescending(p => p.Order).GroupBy(p => p.Id).Select(g => g.First());
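The answers above differ mainly in what happens when an Id has no non-null Name at all. A minimal sketch that simply leaves such Ids out, assuming LINQ to Objects and the Product class from the question (ProductQueries and LastNamedPerId are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int Order { get; set; }
}

public static class ProductQueries
{
    // For each Id, returns the product with the highest Order among those with a non-null Name.
    // Ids whose products all have a null Name produce no group and are therefore omitted.
    public static List<Product> LastNamedPerId(IEnumerable<Product> products)
    {
        return products
            .Where(p => p.Name != null)
            .GroupBy(p => p.Id)
            .Select(g => g.OrderByDescending(p => p.Order).First())
            .ToList();
    }
}
```

Filtering before grouping is what makes First() safe here: every group that exists has at least one element.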

Assign values from one list to another using LINQ

Hello, I have a little problem with assigning property values from one list's items to another's. I know I could solve it "the old way" by iterating through both lists etc., but I am looking for a more elegant solution using LINQ.
Let's start with the code ...
class SourceType
{
public int Id;
public string Name;
// other properties
}
class DestinationType
{
public int Id;
public string Name;
// other properties
}
List<SourceType> sourceList = new List<SourceType>();
sourceList.Add(new SourceType { Id = 1, Name = "1111" });
sourceList.Add(new SourceType { Id = 2, Name = "2222" });
sourceList.Add(new SourceType { Id = 3, Name = "3333" });
sourceList.Add(new SourceType { Id = 5, Name = "5555" });
List<DestinationType> destinationList = new List<DestinationType>();
destinationList.Add(new DestinationType { Id = 1, Name = null });
destinationList.Add(new DestinationType { Id = 2, Name = null });
destinationList.Add(new DestinationType { Id = 3, Name = null });
destinationList.Add(new DestinationType { Id = 4, Name = null });
I would like to achieve the following:
destinationList should be filled with Names of corresponding entries (by Id) in sourceList
destinationList should not contain entries that are not present in both lists at once (eg. Id: 4,5 should be eliminated) - something like inner join
I would like to avoid creating new destinationList with updated entries because both lists already exist and are very large,
so no "convert" or "select new".
In the end destinationList should contain:
1 "1111"
2 "2222"
3 "3333"
Is there some kind of elegant (one line Lambda? ;) solution to this using LINQ ?
Any help will be greatly appreciated! Thanks!
I would just build up a dictionary and use that:
Dictionary<int, string> map = sourceList.ToDictionary(x => x.Id, x => x.Name);
foreach (var item in destinationList)
if (map.ContainsKey(item.Id))
item.Name = map[item.Id];
destinationList.RemoveAll(x=> x.Name == null);
Hope this gives your desired result. First join the two lists on the key (Id), and then set the property value from sourceList.
var result = destinationList.Join(sourceList, d => d.Id, s => s.Id, (d, s) =>
{
d.Name = s.Name;
return d;
}).ToList();
Barring the last requirement of "avoid creating new destinationList" this should work
var newList = destinationList.Join(sourceList, d => d.Id, s => s.Id, (d, s) => s);
To take care of "avoid creating new destinationList", below can be used, which is not any different than looping thru whole list, except that it probably is less verbose.
destinationList.ForEach(d => {
var si = sourceList
.Where(s => s.Id == d.Id)
.FirstOrDefault();
d.Name = si != null ? si.Name : "";
});
destinationList.RemoveAll(d => string.IsNullOrEmpty(d.Name));
Frankly, this is the simplest:
var dictionary = sourceList.ToDictionary(x => x.Id, x => x.Name);
foreach(var item in destinationList) {
if(dictionary.ContainsKey(item.Id)) {
item.Name = dictionary[item.Id];
}
}
destinationList = destinationList.Where(x => x.Name != null).ToList();
You could do something ugly with Join but I wouldn't bother.
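If reassigning destinationList to a new list is a concern, the same dictionary idea can be applied in place. The following is a sketch only: ListSync and FillNames are hypothetical names, the two classes are reduced to the fields shown in the question, and ToDictionary assumes Ids in sourceList are unique (it throws on duplicates).

```csharp
using System.Collections.Generic;
using System.Linq;

public class SourceType
{
    public int Id;
    public string Name;
}

public class DestinationType
{
    public int Id;
    public string Name;
}

public static class ListSync
{
    // Copies names into destinationList by Id and removes entries with no match,
    // mutating the existing list instead of building a new one.
    public static void FillNames(List<DestinationType> destinationList, IEnumerable<SourceType> sourceList)
    {
        Dictionary<int, string> namesById = sourceList.ToDictionary(s => s.Id, s => s.Name);

        // Drop unmatched entries first; RemoveAll compacts the list in place.
        destinationList.RemoveAll(d => !namesById.ContainsKey(d.Id));

        // Every remaining entry is guaranteed to have a match.
        foreach (DestinationType d in destinationList)
        {
            d.Name = namesById[d.Id];
        }
    }
}
```

Removing by key rather than by Name == null also keeps a matched entry whose source Name happens to be null.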
I hope this will be useful for you. At the end, destinationList has the correct data, without creating any new list of any kind.
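// Calling Remove inside List.ForEach modifies the list while it is being iterated,
// which can throw InvalidOperationException, so remove unmatched items first and assign afterwards.
destinationList.RemoveAll(d => sourceList.Find(s => s.Id == d.Id) == null);
destinationList.ForEach(x =>
{
x.Name = sourceList.Find(s => s.Id == x.Id).Name;
});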

Group By Multiple Columns

How can I do GroupBy multiple columns in LINQ?
Something similar to this in SQL:
SELECT * FROM <TableName> GROUP BY <Column1>,<Column2>
How can I convert this to LINQ:
QuantityBreakdown
(
MaterialID int,
ProductID int,
Quantity float
)
INSERT INTO #QuantityBreakdown (MaterialID, ProductID, Quantity)
SELECT MaterialID, ProductID, SUM(Quantity)
FROM #Transactions
GROUP BY MaterialID, ProductID
Use an anonymous type.
Eg
group x by new { x.Column1, x.Column2 }
Procedural sample:
.GroupBy(x => new { x.Column1, x.Column2 })
Ok got this as:
var query = (from t in Transactions
group t by new {t.MaterialID, t.ProductID}
into grp
select new
{
grp.Key.MaterialID,
grp.Key.ProductID,
Quantity = grp.Sum(t => t.Quantity)
}).ToList();
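For reference, a self-contained version of that query in method syntax might look like the sketch below. The Transaction and QuantityBreakdown classes are assumptions modelled on the SQL in the question, and Breakdown.Summarize is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed in-memory equivalents of the #Transactions and QuantityBreakdown tables.
public class Transaction
{
    public int MaterialID { get; set; }
    public int ProductID { get; set; }
    public double Quantity { get; set; }
}

public class QuantityBreakdown
{
    public int MaterialID { get; set; }
    public int ProductID { get; set; }
    public double Quantity { get; set; }
}

public static class Breakdown
{
    // Equivalent of: SELECT MaterialID, ProductID, SUM(Quantity) ... GROUP BY MaterialID, ProductID
    public static List<QuantityBreakdown> Summarize(IEnumerable<Transaction> transactions)
    {
        return transactions
            .GroupBy(t => new { t.MaterialID, t.ProductID }) // composite key via anonymous type
            .Select(g => new QuantityBreakdown
            {
                MaterialID = g.Key.MaterialID,
                ProductID = g.Key.ProductID,
                Quantity = g.Sum(t => t.Quantity)
            })
            .ToList();
    }
}
```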
For Group By Multiple Columns, Try this instead...
GroupBy(x=> new { x.Column1, x.Column2 }, (key, group) => new
{
Key1 = key.Column1,
Key2 = key.Column2,
Result = group.ToList()
});
Same way you can add Column3, Column4 etc.
Since C# 7 you can also use value tuples:
group x by (x.Column1, x.Column2)
or
.GroupBy(x => (x.Column1, x.Column2))
C# 7.1 or greater using Tuples and Inferred tuple element names (currently it works only with linq to objects and it is not supported when expression trees are required e.g. someIQueryable.GroupBy(...). Github issue):
// declarative query syntax
var result =
from x in inMemoryTable
group x by (x.Column1, x.Column2) into g
select (g.Key.Column1, g.Key.Column2, QuantitySum: g.Sum(x => x.Quantity));
// or method syntax
var result2 = inMemoryTable.GroupBy(x => (x.Column1, x.Column2))
.Select(g => (g.Key.Column1, g.Key.Column2, QuantitySum: g.Sum(x => x.Quantity)));
C# 3 or greater using anonymous types:
// declarative query syntax
var result3 =
from x in table
group x by new { x.Column1, x.Column2 } into g
select new { g.Key.Column1, g.Key.Column2, QuantitySum = g.Sum(x => x.Quantity) };
// or method syntax
var result4 = table.GroupBy(x => new { x.Column1, x.Column2 })
.Select(g =>
new { g.Key.Column1, g.Key.Column2 , QuantitySum= g.Sum(x => x.Quantity) });
You can also use a Tuple<> for a strongly-typed grouping.
from grouping in list.GroupBy(x => new Tuple<string,string,string>(x.Person.LastName,x.Person.FirstName,x.Person.MiddleName))
select new SummaryItem
{
LastName = grouping.Key.Item1,
FirstName = grouping.Key.Item2,
MiddleName = grouping.Key.Item3,
DayCount = grouping.Count(),
AmountBilled = grouping.Sum(x => x.Rate),
}
Though this question is asking about grouping by class properties, if you want to group by multiple columns against an ADO object (like a DataTable), you have to assign your "new" items to variables:
EnumerableRowCollection<DataRow> ClientProfiles = CurrentProfiles.AsEnumerable()
.Where(x => CheckProfileTypes.Contains(x.Field<object>(ProfileTypeField).ToString()));
// do other stuff, then check for dups...
var Dups = ClientProfiles.AsParallel()
.GroupBy(x => new { InterfaceID = x.Field<object>(InterfaceField).ToString(), ProfileType = x.Field<object>(ProfileTypeField).ToString() })
.Where(z => z.Count() > 1)
.Select(z => z);
var Results= query.GroupBy(f => new { /* add members here */ });
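One thing to note is that the key selector lambda needs to return an anonymous type; you can't use an instance of an ordinary class, because such a class compares by reference unless it overrides Equals and GetHashCode.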
Example:
public class Key
{
public string Prop1 { get; set; }
public string Prop2 { get; set; }
}
This will compile but will generate one key per cycle.
var groupedCycles = cycles.GroupBy(x => new Key
{
Prop1 = x.Column1,
Prop2 = x.Column2
});
If you want to name the key properties and then retrieve them, you can do it like this instead. This will GroupBy correctly and give you the key properties.
var groupedCycles = cycles.GroupBy(x => new
{
Prop1 = x.Column1,
Prop2 = x.Column2
});
foreach (var groupedCycle in groupedCycles)
{
var key = new Key();
key.Prop1 = groupedCycle.Key.Prop1;
key.Prop2 = groupedCycle.Key.Prop2;
}
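If a named key class is preferred over an anonymous type, it has to supply value equality itself. A minimal sketch, assuming LINQ to Objects and a hypothetical Cycle class with the two string columns used above (CycleGrouping is likewise a made-up helper name):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type with the two columns from the example.
public class Cycle
{
    public string Column1 { get; set; }
    public string Column2 { get; set; }
}

// Key with value equality, so GroupBy treats equal property values as the same key.
public sealed class Key : IEquatable<Key>
{
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }

    public bool Equals(Key other)
    {
        return other != null && Prop1 == other.Prop1 && Prop2 == other.Prop2;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Key);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            // Combine both property hashes; null properties contribute 0.
            return ((Prop1?.GetHashCode() ?? 0) * 397) ^ (Prop2?.GetHashCode() ?? 0);
        }
    }
}

public static class CycleGrouping
{
    public static List<IGrouping<Key, Cycle>> GroupByKey(IEnumerable<Cycle> cycles)
    {
        return cycles
            .GroupBy(x => new Key { Prop1 = x.Column1, Prop2 = x.Column2 })
            .ToList();
    }
}
```

Because the hash code depends on settable properties, a key should not be modified after it has been used for grouping.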
group x by new { x.Col1, x.Col2 }
.GroupBy(x => (x.MaterialID, x.ProductID))
.GroupBy(x => x.Column1 + " " + x.Column2)
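One caveat with the concatenated string key is that different column pairs can produce the same string. The sketch below, with hypothetical names and value tuples (C# 7 or later), contrasts it with a tuple key on two rows chosen to collide.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupingKeys
{
    // Groups by the concatenated string, as in the one-liner above.
    public static int CountGroupsByConcat(IEnumerable<(string Column1, string Column2)> rows)
    {
        return rows.GroupBy(r => r.Column1 + " " + r.Column2).Count();
    }

    // Groups by a value tuple, which compares each column separately.
    public static int CountGroupsByTuple(IEnumerable<(string Column1, string Column2)> rows)
    {
        return rows.GroupBy(r => (r.Column1, r.Column2)).Count();
    }
}
```

For the rows ("a b", "c") and ("a", "b c"), both concatenate to "a b c" and land in one group, while the tuple key keeps them apart.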
For VB and anonymous/lambda:
query.GroupBy(Function(x) New With {Key x.Field1, Key x.Field2, Key x.FieldN })
