deduping a list based on a part of a class - c#

I have a List<Person> and want to remove duplicates based on the NPI field.
I am struggling to find anything related to this or to solve it on my own. Below is the Person class. I need to dedupe on the NPI field, but some NPI values are blank, and entries with a blank NPI must not be treated as duplicates of each other or removed from the list.
It doesn't matter which duplicates are removed from the list as long as there are no duplicate NPIs left.
class Person
{
string NPI;
string Address;
public string getNPI()
{
return NPI;
}
public Person(String npi, String address)
{
this.NPI = npi;
this.Address = address;
}
}

If you want to select all the items that have empty NPI fields, along with the unique NPI fields from the rest, you can use a GroupBy clause along with some checks for blank NPI values:
var people = new List<Person>
{
new Person ("dupe", "dupe npi"),
new Person ("dupe", "another dupe npi"),
new Person ("", "empty npi"),
new Person ("", "another empty npi"),
new Person ("unique", "unique npi")
};
var uniqueAndBlank = people
.Where(p => string.IsNullOrEmpty(p.getNPI())) // Get all the empty NPIs
.Concat(people // Concat with other items
.Where(p => !string.IsNullOrEmpty(p.getNPI())) // Whose NPIs are set
.GroupBy(p => p.getNPI()) // Grouped by their NPI fields
.Select(g => g.First())) // Grab the first item
.ToList();
// Result:
// {{ "", "empty npi" },
// { "", "another empty npi" },
// { "dupe", "dupe npi" },
// { "unique", "unique npi" }}
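The same rule (keep every blank NPI, keep only the first of each non-blank NPI) can also be applied in a single pass with a HashSet that remembers the NPIs already seen. This is a minimal sketch rather than part of the original answer: NpiDeduper, DedupeByNpi and the getAddress accessor are hypothetical names added for illustration.

```csharp
using System;
using System.Collections.Generic;

class Person
{
    string NPI;
    string Address;
    public string getNPI() { return NPI; }
    // Hypothetical accessor, added only so the result can be inspected.
    public string getAddress() { return Address; }
    public Person(string npi, string address)
    {
        NPI = npi;
        Address = address;
    }
}

static class NpiDeduper
{
    // Keeps every person whose NPI is blank, and only the first person
    // seen for each non-blank NPI. The original order is preserved.
    public static List<Person> DedupeByNpi(IEnumerable<Person> people)
    {
        var seen = new HashSet<string>();
        var result = new List<Person>();
        foreach (var p in people)
        {
            string npi = p.getNPI();
            // HashSet.Add returns false when the value is already present.
            if (string.IsNullOrEmpty(npi) || seen.Add(npi))
                result.Add(p);
        }
        return result;
    }
}
```

Unlike the GroupBy query above, this keeps the items in their original order instead of moving the blank-NPI entries to the front. Calling NpiDeduper.DedupeByNpi(people) on the sample list should leave four items.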

Multiple condition on same column inside Linq where

How can I write a linq query to match two condition on same column in the table?
Here one person can be assigned to multiple types of work, and this is stored in the PersonWorkTypes table, which contains the details of persons and their work types.
So I need to get the list of persons who have both fulltime and freelance works.
I have tried
people.where(w => w.worktype == "freelance" && w.worktype == "fulltime")
But it returns an empty result.

You can try this
public class Person {
public string Name {get;set;}
public List<PersonWorkType> PersonWorkTypes {get;set;}
}
public class PersonWorkType {
public string Type {get;set;}
}
public static void Main()
{
var people = new List<Person>();
var person = new Person { Name = "Toño", PersonWorkTypes = new List<PersonWorkType>() { new PersonWorkType { Type = "freelance" } } };
var person2 = new Person { Name = "Aldo", PersonWorkTypes = new List<PersonWorkType>() { new PersonWorkType { Type = "freelance" }, new PersonWorkType { Type = "fulltime" } } };
var person3 = new Person { Name = "John", PersonWorkTypes = new List<PersonWorkType>() { new PersonWorkType { Type = "freelance" }, new PersonWorkType { Type = "fulltime" } } };
people.Add(person);
people.Add(person2);
people.Add(person3);
var filter = people.Where(p => p.PersonWorkTypes.Any(t => t.Type == "freelance") && p.PersonWorkTypes.Any(t => t.Type == "fulltime"));
foreach(var item in filter) {
Console.WriteLine(item.Name);
}
}
This returns the people who have both types in PersonWorkTypes.

As already said, the && operator means that BOTH conditions have to be met. Your condition therefore requires worktype to be freelance and fulltime at the same time, which is not possible :)
Most probably you want employees whose work type is freelance OR fulltime, so your condition should be:
people.Where(w=>w.worktype=="freelance" || w.worktype =="fulltime")
Or, if a person can appear more than once in this table, you could do:
people
.Where(w=>w.worktype=="freelance" || w.worktype =="fulltime")
// here I assume that you have name of a person,
// Basically, here I group by person
.GroupBy(p => p.Name)
// Here we check if any person has two entries,
// but you have to be careful here, as if person has two entries
// with worktype freelance or two entries with fulltime, it
// will pass condition as well.
.Where(grp => grp.Count() == 2)
.Select(grp => grp.FirstOrDefault());
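The caveat in the comments above (two rows of the same type also give a count of 2) can be avoided by checking each group for both types explicitly. A minimal sketch, assuming one row per (person, work type) pair; PersonWorkRow, WorkTypeQueries and NamesWithBothTypes are hypothetical names, not taken from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape: one row per (person, work type) pair.
class PersonWorkRow
{
    public string Name { get; set; }
    public string WorkType { get; set; }
}

static class WorkTypeQueries
{
    // Returns the names of people who have at least one "freelance" row
    // and at least one "fulltime" row.
    public static List<string> NamesWithBothTypes(IEnumerable<PersonWorkRow> rows)
    {
        return rows
            .GroupBy(r => r.Name)
            .Where(g => g.Any(r => r.WorkType == "freelance")
                     && g.Any(r => r.WorkType == "fulltime"))
            .Select(g => g.Key)
            .ToList();
    }
}
```

Because each group is tested for both values, a person with two freelance rows and no fulltime row is excluded. This assumes Name uniquely identifies a person; with real data an ID column would be a safer grouping key.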

w.worktype=="freelance"
w.worktype=="fulltime"
These are mutually exclusive, so they can never both be true and satisfy your AND (&&) operator.
I am inferring that you have two (or more) different rows in your table per person, one for each type of work they do. If so, the Where() method checks your list one row at a time and cannot look at two different elements to see whether Alice (for example) has both an entry for "freelance" and an entry for "fulltime". Unfortunately, I can't think of an easy way to do this in a single query, but something like this might work:
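var fulltimeWorkers = people.Where(w=>w.worktype=="fulltime");
// Materialize once so the freelance filter is not re-run for every fulltime worker
var freelanceWorkers = people.Where(w=>w.worktype=="freelance").ToList();
List<Person> peopleWhoDoBoth = new List<Person>();
foreach (var worker in fulltimeWorkers)
{
// Contains relies on Equals, so this only matches if Person equality
// identifies the same person across rows (for example by name or ID)
if (freelanceWorkers.Contains(worker))
peopleWhoDoBoth.Add(worker);
}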
This is probably not the most efficient way possible of doing it, but for small data sets, it shouldn't matter.

Comparing two list of different objects

I have the following lists.
One is a list of Person objects, which have Id and Name properties. The other is a list of People objects, which have Id, Name and Address properties.
List<Person> p1 = new List<Person>();
p1.Add(new Person() { Id = 1, Name = "a" });
p1.Add(new Person() { Id = 2, Name = "b" });
p1.Add(new Person() { Id = 3, Name = "c" });
p1.Add(new Person() { Id = 4, Name = "d" });
List<People> p2 = new List<People>();
p2.Add(new People() { Id = 1, Name = "a", Address=100 });
p2.Add(new People() { Id = 3, Name = "x", Address=101 });
p2.Add(new People() { Id = 4, Name = "y", Address=102 });
p2.Add(new People() { Id = 8, Name = "z", Address=103 });
I want to filter the list, so I used the code below. But it returns a list of Ids; I want a list of People objects with matching Ids.
var filteredList = p2.Select(y => y.Id).Intersect(p1.Select(z => z.Id));

You're better off with Join
var filteredList = p2.Join(p1,
people => people.Id,
person => person.Id,
(people, _) => people)
.ToList();
The method will match items from both lists by the key you provide - Id of the People class and Id of Person class.
For each pair where people.Id == person.Id it applies the selector function (people, _) => people. The function says for each pair of matched people and person just give me the people instance; I don't care about person.

Something like this should do the trick :
var result= p1.Join(p2, person => person.Id, people => people.Id, (person, people) => people);

If your list is large enough, you should use a hashed collection to filter it and improve performance:
var hashedIds = new HashSet<int>(p1.Select(p => p.Id));
var filteredList = p2.Where(p => hashedIds.Contains(p.Id)).ToList();
This is fast because hashed collections such as Dictionary or HashSet perform lookups in close to O(1) time: the hash of an element tells the collection where to look for it. With a List<T>, finding an element means scanning the collection until it is found.
For example, the line: p2.Where(p => p1.Any(x => x.Id == p.Id)).ToList();
has a complexity of O(N × M), or O(N²) for lists of similar size, because .Where and .Any together form nested loops.
Do not just use the simplest answer (and method); use the one that best suits your needs.
Simple performance test against .Join() ...
And the larger the collection, the more difference it makes.

Populating List<Object> using LINQ [closed]

I have a file like this:
City|Country|Phone Number
Name
City|Country|Phone Number
Name
City|Country|Phone Number
Name
and so on...
I have made a class as:
class Person
{
string city, country, phone, name;
}
After reading this big file, I want to make a List and place all the values in their respective fields.
My work so far
List<PersonObj> objectsList = new List<PersonObj>();
string[] peopleFile = System.IO.File.ReadAllLines(fileName);
var oddLines = peopleFile.Where((src, index) => index % 2 != 0);
var evenLines = peopleFile.Where((src, index) => index % 2 == 0);
and it successfully retrieves the address and name lines separately, in evenLines and oddLines respectively.
What I Want
I would like objectsList to be populated using LINQ rather than one item at a time in a loop.
Thanks a lot for your help

This should do what you want
var people = File.ReadLines(fileName)
.Select((l,i) => new { Line = l.Split('|'), LineNo = i })
.GroupBy(x => x.LineNo/2)
.Select(grp => new Person
{
city = grp.First().Line[0],
country = grp.First().Line.Skip(1).FirstOrDefault(),
phone = grp.First().Line.Skip(2).FirstOrDefault(),
name = grp.Last().Line[0]
})
.ToList();
Basically you use the overload of Select that gives you the index and that will allow you to group by the line number so you get groups of the first 2 lines, the next 2 lines and so on. Then just pull from the first or last line in the group (there should only be 2) and the index of the array from doing the split.
Note that this will give incorrect results if the file doesn't match the format, such as the name and city being the same for the last entry if the file has an odd number of lines or the country or phone being null if the odd lines do not have at least two pipe characters.
Also I used File.ReadLines instead of File.ReadAllLines to avoid an unneeded intermediate array being created.

You can do this using Zip. I assume that evenLines contains the cities etc. and oddLines the names.
var persons = oddLines.Zip(
evenLines.Select(line => line.Split('|')),
(name, data) => new Person {name = name, city = data[0], country = data[1], phone = data[2]});
Zip combines each line of oddLines with the corresponding line of evenLines. This second line is split by | and for each combination a new Person object is generated and filled with its data.
Of course there should be a little more error handling as this may throw exceptions if there are values missing in your file.
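One way to add that error handling is to validate the shape of the file before building any objects. The sketch below uses a plain loop rather than Zip so the checks stay easy to read; PersonFileParser, Parse and the property names are hypothetical, and throwing FormatException is just one possible policy.

```csharp
using System;
using System.Collections.Generic;

class Person
{
    public string City { get; set; }
    public string Country { get; set; }
    public string Phone { get; set; }
    public string Name { get; set; }
}

static class PersonFileParser
{
    // Parses lines in pairs: "City|Country|Phone" followed by a name line.
    // Throws FormatException on malformed input instead of returning wrong data.
    public static List<Person> Parse(IReadOnlyList<string> lines)
    {
        if (lines.Count % 2 != 0)
            throw new FormatException("Expected an even number of lines.");

        var people = new List<Person>();
        for (int i = 0; i < lines.Count; i += 2)
        {
            string[] parts = lines[i].Split('|');
            if (parts.Length != 3)
                throw new FormatException(
                    $"Line {i + 1}: expected 3 pipe-separated fields.");

            people.Add(new Person
            {
                City = parts[0],
                Country = parts[1],
                Phone = parts[2],
                Name = lines[i + 1]
            });
        }
        return people;
    }
}
```

It could be called as PersonFileParser.Parse(File.ReadAllLines(fileName)), since a string array can be passed where IReadOnlyList<string> is expected.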

For better separation of concerns, you could first combine the two results in evenLines and oddLines pairwise to create one complete string per person:
// Zip pairs each details line with the name line that follows it
var lines = evenLines.Zip(oddLines, (details, name) => details + "|" + name);
and then use a double LINQ Select:
objectsList = lines.Select(x => x.Split('|'))
.Select(y => new PersonObj() {
city = y[0],
country = y[1],
phone = y[2],
name = y[3],
}).ToList();
The first Select will be used to split each row in the file to string[] with 4 elements and the second Select is used to create PersonObj item from them.
Note that you have to make your fields (city, country, phone, name) public rather than private to do this.

Here is a LINQ way:
class Person
{
public string Name { get; set; }
public string Country { get; set; }
public string City { get; set; }
public string PhoneNumber { get; set; }
}
class Program
{
static void Main(string[] args)
{
string[] lines = File.ReadAllLines("data.txt");
List<Person> people =
lines
.Select((line, index) =>
new
{
Index = index / 2,
RawData = line
}
)
.GroupBy(obj => obj.Index)
.Select(group =>
{
var rawPerson = group.ToArray();
string name = rawPerson[1].RawData;
string[] rawDetails = rawPerson[0].RawData.Split('|');
return
new Person()
{
Name = name,
City = rawDetails[0],
Country = rawDetails[1],
PhoneNumber = rawDetails[2]
};
}
)
.ToList();
}
}

Sorting List<List<MyType>>

I have a 2D list of "NameValuePair"s that I've been trying to order with no luck so far.
NameValuePair is defined as:
public class NameValuePair
{
[DataMember]
public string Name { get; set; }
[DataMember]
public string Value { get; set; }
}
The list is defined as:
List<List<NameValuePair>> outerList = new List<List<NameValuePair>>();
Each list in outer list might have different number of items at different indices but each one has a "Date" item for sure.
e.g.
List<List<NameValuePair>> outerList = new List<List<NameValuePair>>();
List<NameValuePair> innerList = new List<NameValuePair>();
List<NameValuePair> innerList2 = new List<NameValuePair>();
innerList.Add(new NameValuePair { Name = "List1Item1", Value = "someValue" });
innerList.Add(new NameValuePair { Name = "List1Item2", Value = "otherValue" });
innerList.Add(new NameValuePair { Name = "List1ItemN", Value = "anotherValue" });
innerList.Add(new NameValuePair { Name = "Date", Value = "aDateInStringFormat" });
innerList2.Add(new NameValuePair { Name = "List2Item1", Value = "myValue" });
innerList2.Add(new NameValuePair { Name = "Date", Value = "anotherDateInStringFormat" });
innerList2.Add(new NameValuePair { Name = "List2ItemM", Value = "bestValue" });
outerList.Add(innerList);
outerList.Add(innerList2);
I have tried sorting with outerList.Sort(); and outerList.OrderByDescending(x => x.Where(y => y.Name == "Date")).ToList(); with no luck so far.
I also tried implementing IComparable to my NameValuePair type by overloading CompareTo() but couldn't get it working either.
Any suggestions are more than welcome.

Assuming each inner list has exactly one item with the name Date and a properly formatted date Value:
var sorted = outerList.OrderBy(x => DateTime.Parse(x.Single(y => y.Name == "Date").Value))
.ToList();
The Linq query takes the NameValuePair with the Name "Date", converts the Value to a DateTime object and sorts the outer list by this value.
Anyway you should think about creating a class with a DateTime property instead.
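As a rough illustration of that suggestion, the inner list could be wrapped in a small class that parses the date once and exposes it as a DateTime. DatedRecord, RecordSorting and SortByDate are hypothetical names, and the "yyyy-MM-dd" format is an assumption, since the question does not say how the dates are formatted.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

class NameValuePair
{
    public string Name { get; set; }
    public string Value { get; set; }
}

// Hypothetical wrapper: holds the original pairs plus the parsed date.
class DatedRecord
{
    public DateTime Date { get; }
    public List<NameValuePair> Items { get; }

    public DatedRecord(List<NameValuePair> items)
    {
        Items = items;
        // Single throws if there is not exactly one "Date" item.
        string raw = items.Single(p => p.Name == "Date").Value;
        // Assumed format; adjust it to match the real data.
        Date = DateTime.ParseExact(raw, "yyyy-MM-dd", CultureInfo.InvariantCulture);
    }
}

static class RecordSorting
{
    // Wraps each inner list, then orders the wrappers by their parsed date.
    public static List<DatedRecord> SortByDate(IEnumerable<List<NameValuePair>> outerList)
    {
        return outerList
            .Select(inner => new DatedRecord(inner))
            .OrderBy(r => r.Date)
            .ToList();
    }
}
```

With the date stored as a typed property, later code can sort or filter on r.Date directly without searching each list for the "Date" item again.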

I know this has already been answered, but here's a slightly different take which is likely to be more performant.
Firstly, do a single pass of all the lists to extract the date times into a separate sequence:
var keys = outerList.Select(x => DateTime.Parse(x.Single(y => y.Name == "Date").Value));
Then use Zip and that DateTime sequence to sort the outer list by that sequence:
outerList = outerList.Zip(keys, (pairs, date) => new {Pairs = pairs, Date = date})
.OrderByDescending(item => item.Date)
.Select(item => item.Pairs)
.ToList();
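This avoids repeated calls to IEnumerable.Single() and repeated DateTime parsing when a sort compares elements directly, as List<T>.Sort with a comparison delegate would. Note that OrderBy already evaluates its key selector only once per element, so the gain over a plain OrderBy is likely to be small.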

Grouping by property value and writing group members

I need to group the following list by the department value but am having trouble with the LINQ syntax. Here's my list of objects:
var people = new List<Person>
{
new Person { name = "John", department = new List<fields> {new fields { name = "department", value = "IT"}}},
new Person { name = "Sally", department = new List<fields> {new fields { name = "department", value = "IT"}}},
new Person { name = "Bob", department = new List<fields> {new fields { name = "department", value = "Finance"}}},
new Person { name = "Wanda", department = new List<fields> {new fields { name = "department", value = "Finance"}}},
};
I've toyed around with grouping. This is as far as I've got:
var query = from p in people
from field in p.department
where field.name == "department"
group p by field.value into departments
select new
{
Department = departments.Key,
Name = departments
};
So I can iterate over the groups, but I'm not sure how to list the Person names:
foreach (var department in query)
{
Console.WriteLine("Department: {0}", department.Department);
foreach (var foo in department.Department)
{
// ??
}
}
Any ideas on what to do better or how to list the names of the relevant departments?

Ah, should have been:
foreach (Person p in department.Name) Console.WriteLine(p.name);
Thanks for the extra set of eyes, Fyodor!

Your department property seems like an awkward implementation, particularly if you want to group by department. Grouping with a List as your key is going to lead to a ton of complexity, and it's unnecessary since you only care about one element in the List.
Also, you seem to have created the fields class as a way of simulating either dynamic/anonymous types, or just the Dictionary<string, string> class, I can't really tell. I suggest not doing that; C# already has those types baked in, and working around them will just be inefficient and stop you from using Intellisense. Whatever led you to do that, there's probably a better, more C#-ish way. Besides--and this is key--your code looks like you can just forget all that and make department a simple string.
If you have control over the data structure, I'd suggest reorganizing it:
var people = new List<Person> {
new Person { name = "John", department = "IT"},
new Person { name = "Sally", department = "IT"},
new Person { name = "Bob", department = "Finance"},
new Person { name = "Wanda", department = "Finance"},
};
Suddenly, grouping all that becomes simple:
var departments = from p in people
group p by p.department into dept
select dept;
foreach (var dept in departments)
{
Console.WriteLine("Department: {0}", dept.Key);
foreach (var person in dept)
{
Console.WriteLine("Person: {0}", person.name);
}
}
If you must leave the data structure as it is, you could try this:
var departments = from p in people
from field in p.department
where field.name == "department"
group p by field.value into dept
select dept;
That should work with the above nested loop.

The list of persons for each department can be accessed via department.Name. Simply iterate over it:
foreach( var person in department.Name ) Console.WriteLine( person.name );
The value of department.Department, on the other hand, is of type string. This value comes from departments.Key, which in turn comes from field.value - because that's the key that you group by.
The foreach statement over department.Department still compiles fine, because string implements IEnumerable<char>. Consequently, your foo variable is of type char.
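The point about string being enumerable can be checked in isolation. A small sketch, with StringEnumerationDemo and CharsOf as hypothetical names:

```csharp
using System;
using System.Collections.Generic;

static class StringEnumerationDemo
{
    // Collects what a foreach over a string yields: one char per iteration.
    public static List<char> CharsOf(string text)
    {
        var chars = new List<char>();
        foreach (var c in text) // c is inferred as char
            chars.Add(c);
        return chars;
    }
}
```

Passing a department key such as "IT" yields the characters 'I' and 'T', which is why the loop over department.Department compiles but never reaches any Person.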
