AsParallel - Before vs After in Linq Where Clause Performance


I have a List<Boss> collection; every Boss has 2 to 10 assistant staff. I flatten all the employees, including each Boss, into a List<Person>, and then search it for "Raj" using Parallel LINQ. Where should I place AsParallel() to get better performance: before or after the Where clause?
public class Person
{
public int EmpID { get; set; }
public string Name { get; set; }
public string Department { get; set; }
public string Gender { get; set; }
}
void Main()
{
List<Boss> BossList = new List<Boss>()
{
new Boss()
{
EmpID = 101,
Name = "Harry",
Department = "Development",
Gender = "Male",
Employees = new List<Person>()
{
new Person() {EmpID = 102, Name = "Peter", Department = "Development",Gender = "Male"},
new Person() {EmpID = 103, Name = "Emma Watson", Department = "Development",Gender = "Female"},
}
},
new Boss()
{
EmpID = 104,
Name = "Raj",
Department = "Development",
Gender = "Male",
Employees = new List<Person>()
{
new Person() {EmpID = 105, Name = "Kaliya", Department = "Development",Gender = "Male"},
new Person() {EmpID = 103, Name = "Emma Watson", Department = "Development",Gender = "Female"},
}
}
};
List<Person> result = BossList
.SelectMany(x =>
new[] { new Person { Name = x.Name, Department = x.Department, Gender = x.Gender, EmpID = x.EmpID } }
.Concat(x.Employees))
.GroupBy(x => x.EmpID) //Group by employee ID
.Select(g => g.First()) //And select a single instance for each unique employee
.ToList();
List<Person> SelectedResult = new List<Person>();
// AsParallel() - Before Where Clause
SelectedResult = result.AsParallel().Where(m => m.Name.ToLowerInvariant().Contains("Raj".ToLowerInvariant())).ToList();
// AsParallel() - After Where Clause
SelectedResult = result.Where(m => m.Name.ToLowerInvariant().Contains("Raj".ToLowerInvariant())).AsParallel().ToList();
}
Core Source Code:
List<Person> SelectedResult = new List<Person>();
// AsParallel() - Before Where Clause
SelectedResult = result.AsParallel().Where(m => m.Name.ToLowerInvariant().Contains("Raj".ToLowerInvariant())).ToList();
// AsParallel() - After Where Clause
SelectedResult = result.Where(m => m.Name.ToLowerInvariant().Contains("Raj".ToLowerInvariant())).AsParallel().ToList();

Before.
AsParallel lets the query run in parallel across multiple threads, which can improve performance. If you put it after the Where clause, the filtering will be done in series, and only then will anything be parallelized.
Here is some test code:
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
class AsParallelTest
{
static void Main()
{
var query = Enumerable.Range(0, 1000)
.Select(ProjectionExample)
.Where(x => x > 10)
.AsParallel();
Stopwatch stopWatch = Stopwatch.StartNew();
int count = query.Count();
stopWatch.Stop();
Console.WriteLine("Count: {0} in {1}ms", count,
stopWatch.ElapsedMilliseconds);
query = Enumerable.Range(0, 1000)
.AsParallel()
.Select(ProjectionExample)
.Where(x => x > 10);
stopWatch = Stopwatch.StartNew();
count = query.Count();
stopWatch.Stop();
Console.WriteLine("Count: {0} in {1}ms", count,
stopWatch.ElapsedMilliseconds);
}
static int ProjectionExample(int arg)
{
Thread.Sleep(10);
return arg;
}
}
Result:
Count: 989 in 10574ms
Count: 989 in 1409ms
It's obvious that the first query hasn't been parallelized, whereas the second has. If you have only one processor core, the two results should be close. If you have more than two processor cores, the AsParallel call may increase performance even more.
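Applied to the search from the question, the same placement might look like the sketch below. The `PersonSearch` class and its `FindByName` helper are hypothetical names introduced here for illustration, and `AsOrdered()` is an optional addition that only keeps the results in source order. For a list of a few dozen people the overhead of parallelizing will likely outweigh any gain, so measure before adopting it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int EmpID { get; set; }
    public string Name { get; set; }
}

public static class PersonSearch
{
    // Hypothetical helper: AsParallel() comes first, so Where runs as a PLINQ operator.
    public static List<Person> FindByName(IEnumerable<Person> people, string term)
    {
        return people
            .AsParallel()
            .AsOrdered() // optional: keep the source order in the result
            .Where(p => p.Name != null &&
                        p.Name.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }

    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { EmpID = 101, Name = "Harry" },
            new Person { EmpID = 104, Name = "Raj" },
            new Person { EmpID = 105, Name = "Kaliya" },
        };

        foreach (var match in FindByName(people, "raj"))
            Console.WriteLine($"{match.EmpID}: {match.Name}");
    }
}
```

The case-insensitive `IndexOf` overload avoids allocating a lower-cased copy of every name, which the `ToLowerInvariant()` calls in the question do on each comparison.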
Also, you can read this article.

Related

Best average student(s) score (C#, LINQ, without loops)

Can I somehow calculate the average for different items and choose student(s) with best GPA?
public static List<Student> LoadSampleData()
{
List<Student> output = new List<Student>();
output.Add(new Student { ID = 1, FirstName = "Tim", LastName = "Corey ", Patronymic = "Fitzpatrick ", Group = "A", Math = 5, Programming = 5, Informatics = 5});
output.Add(new Student { ID = 2, FirstName = "Joe", LastName = "Smith ", Patronymic = "Mackenzie ", Group = "A", Math = 3, Programming = 3, Informatics = 4});
output.Add(new Student { ID = 3, FirstName = "Ellie", LastName = "Williams ", Patronymic = "", Group = "B", Math = 4, Programming = 5, Informatics = 4});
output.Add(new Student { ID = 4, FirstName = "Joel", LastName = "Miller ", Patronymic = "", Group = "B", Math = 4, Programming = 4, Informatics = 5});
return output;
}
I need it to be calculated roughly according to the following logic: find the average across all subjects for each student, for example student_average(Math+Programming+Informatics), and then find the best score. This should be done without loops or control statements such as for, while, foreach, or if.
public static void BestStudentsAvarage()
{
List<Student> students = ListManager.LoadSampleData();
var StudentAverage =
from student in students
group student by student.ID into studentID
select new
{
ID = studentID.Key,
student_Average = studentID.Average(x => x.(Math+Programming+Informatics))
};
var bestGrade = StudentAverage.Max(gr => gr.student_Average);
var bestIDs_1 = StudentAverage.Where(g => g.student_Average == bestGrade);
var bestID_1 = bestIDs_1.FirstOrDefault();
Console.WriteLine($"\nBest student(s) GPA: {bestID_1.ID} \nScore: {bestID_1.student_Average}");
Console.ReadLine();
}

I think this is what you actually want (divide the sum of the three subjects by 3):
public static List<(Student student, decimal average)> BestStudentsAvarage(List<Student> students)
{
return students
.Select(s => (Student:s,Average:(s.Math+s.Programming+s.Informatics)/3m))
.GroupBy(g => g.Average)
.OrderByDescending(g => g.Key)
.First()
.ToList();
}
List<Student> sample = LoadSampleData();
List<(Student student, decimal average)> bestAvarageStudents = BestStudentsAvarage(sample);
foreach(var x in bestAvarageStudents)
{
Console.WriteLine($"Best student <{x.student.FirstName} {x.student.LastName}> with Average <{x.average}>");
}
With your example it would output: Best student <Tim Corey> with Average <5>

How to get a value from a dictionary in C#

static void Main(string[] args)
{
List<People> people = new List<People>(){
new People(){FirstName = "aaa", LastName = "zzz", Age = 3, Location = "Berlin"},
new People(){FirstName = "aaa", LastName = "yyy", Age = 6, Location = "Paris"},
new People(){FirstName = "bbb", LastName = "zzz", Age = 5, Location = "Texas"},
new People(){FirstName = "bbb", LastName = "yyy", Age = 4, Location = "Sydney"},
new People(){FirstName = "ccc", LastName = "zzz", Age = 2, Location = "Berlin"},
new People(){FirstName = "ccc", LastName = "yyy", Age = 3, Location = "New York"},
new People(){FirstName = "aaa", LastName = "xxx", Age = 2, Location = "Dallas"},
new People(){FirstName = "bbb", LastName = "www", Age = 6, Location = "DC"},
new People(){FirstName = "ccc", LastName = "vvv", Age = 3, Location = "Detroit"},
new People(){FirstName = "ddd", LastName = "uuu", Age = 5, Location = "Gotham"}
};
var dict = people
.GroupBy(x => (x.FirstName, x.LastName))
.ToDictionary(x => x.Key,
x => x.ToList());
/**
how to get a value from dictionary when i just have first name.
i want to get all value from dict where name = "aaa"
**/
}
public class People
{
public string FirstName {get; set;}
public string LastName {get; set;}
public int Age {get; set;}
public string Location {get; set;}
}
Is there a way to get values from the dictionary with just one part of the key? For example, I only have the first name "aaa" and I want to get all people with FirstName "aaa". I can get them with Where, but then there is no point in using a dictionary. Should I use a nested dictionary, or is there another way?

There is no point using the dictionary; I'm not sure what you're trying to achieve by using it. Why not just use the list, as mcjmzn said?
List<People> peopleCohort = people.Where(p => p.FirstName == "aaa").ToList();
I think you might be overthinking it.
Stu.
UPDATE:
Given the following test:
SqlCommand com = new SqlCommand("SELECT Name FROM [Responses_PersonalData]", con);
con.Open();
List<Person> listPeople = new List<Person>();
Dictionary<string, Person> dicPeople = new Dictionary<string, Person>();
using (con)
{
Random rand = new Random();
using (SqlDataReader reader = com.ExecuteReader())
{
if (reader.HasRows)
{
while (reader.Read())
{
//Only use data where we have firstname, surname, approx 49,000 names in db.
string[] name = reader["Name"].ToString().Trim().Split(' ');
if (name.Length == 2)
{
Person person = new Person() { Age = rand.Next(0, 100), FirstName = name[0], LastName = name[1], Location = name[1] };
listPeople.Add(person);
}
}
}
}
}
//Creates approx 100 million people exponentially.
for (int i = 1; i < 12; i++)
listPeople.AddRange(listPeople);
//Group by firstname lastname tuple
var tuppleDicPeople = listPeople
.GroupBy(x => (x.FirstName, x.LastName))
.ToDictionary(x => x.Key,
x => x.ToList());
//Method 1
List<Person> listPeopleCohortResults = listPeople.FindAll(p => p.FirstName == "Dean");
//Method 2
List<Person> dicPeopleCohortResults = tuppleDicPeople.Where(kvp => kvp.Key.FirstName == "Dean").SelectMany(kvp => kvp.Value).ToList();
Findings:
The group by operation is very expensive.
listPeople.FindAll(p => p.FirstName == "Dean"); => 1651ms, returns 32768 results.
tuppleDicPeople.Where(kvp => kvp.Key.FirstName == "Dean").SelectMany(kvp => kvp.Value).ToList(); => 10ms, returns 32768 results.
If you can afford the expense of the group by then your solution is optimal given the limited research I've done.
Stu.
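If the lookups are only ever by first name, one option is to key the grouping on FirstName alone rather than on the (FirstName, LastName) tuple, so a single `TryGetValue` call replaces the scan over the keys. The sketch below is an illustration under that assumption; the `PeopleIndex` class, its `ByFirstName` helper, and the trimmed-down `People` type are hypothetical and not part of the original posts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class People
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class PeopleIndex
{
    // Hypothetical helper: group once by first name, then query by key.
    public static Dictionary<string, List<People>> ByFirstName(IEnumerable<People> people)
    {
        return people
            .GroupBy(p => p.FirstName)
            .ToDictionary(g => g.Key, g => g.ToList());
    }

    public static void Main()
    {
        var people = new List<People>
        {
            new People { FirstName = "aaa", LastName = "zzz" },
            new People { FirstName = "aaa", LastName = "yyy" },
            new People { FirstName = "bbb", LastName = "zzz" },
        };

        var byFirstName = ByFirstName(people);

        // TryGetValue avoids an exception when the name is not present.
        if (byFirstName.TryGetValue("aaa", out var matches))
        {
            foreach (var p in matches)
                Console.WriteLine($"{p.FirstName} {p.LastName}");
        }
    }
}
```

Building the index still costs one pass over the list, so it pays off only when the same collection is queried repeatedly.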

Update list of items in c#

I would like to know if you can suggest an efficient way to update a list of items in C#. Here is a generic example:
If CurrentList is
[ {Id: 154, Name: "George", Salary: 10 000}
{Id: 233, Name: "Alice", Salary: 10 000}]
And NewList is
[ {Id: 154, Name: "George", Salary: 25 000}
{Id: 234, Name: "Bob", Salary: 10 000}]
Then the result should be:
[{Id: 154, Name: "George", Salary: 25 000}
{Id: 234, Name: "Bob", Salary: 10 000} ]
I don't want just to clear the first one and use the values from the second one, but want to update the ones with the same ID, remove the ones that have been deleted and add any new ones.
Thanks in advance.

I would do something like this (for ordinary lists):
// the current list
var currentList = new List<Employee>();
currentList.Add(new Employee { Id = 154, Name = "George", Salary = 10000 });
currentList.Add(new Employee { Id = 233, Name = "Alice", Salary = 10000 });
// new list
var newList = new List<Employee>();
newList.Add(new Employee { Id = 154, Name = "George", Salary = 25000 });
newList.Add(new Employee { Id = 234, Name = "Bob", Salary = 10000 });
// clean up
foreach (var oldEmployee in currentList.ToArray())
if (!newList.Any(item => oldEmployee.Id == item.Id))
currentList.Remove(oldEmployee);
// check if the new item is found within the currentlist.
// If so, update its values; otherwise add the object.
foreach (var newEmployee in newList)
{
var oldEmployee = currentList.FirstOrDefault(item => item.Id == newEmployee.Id);
if (oldEmployee == null)
{
// add
currentList.Add(newEmployee);
}
else
{
// modify
oldEmployee.Name = newEmployee.Name;
oldEmployee.Salary = newEmployee.Salary;
}
}
You can speed it up, using dictionaries, but that's not your question (for now)
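A dictionary-based variant might look like the following sketch. It is one possible way to do it, not the answerer's code: the `EmployeeSync` class and `Sync` method are hypothetical names, and the sketch assumes Ids are unique within each list (`ToDictionary` throws on duplicate keys).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int Salary { get; set; }
}

public static class EmployeeSync
{
    // Hypothetical helper; assumes Ids are unique within each list.
    public static void Sync(List<Employee> currentList, List<Employee> newList)
    {
        // Remove employees that no longer appear in the new list.
        var newIds = new HashSet<int>(newList.Select(e => e.Id));
        currentList.RemoveAll(e => !newIds.Contains(e.Id));

        // Index the survivors by Id so each lookup is a hash lookup, not a scan.
        var currentById = currentList.ToDictionary(e => e.Id);

        foreach (var incoming in newList)
        {
            if (currentById.TryGetValue(incoming.Id, out var existing))
            {
                // Update in place so existing references stay valid.
                existing.Name = incoming.Name;
                existing.Salary = incoming.Salary;
            }
            else
            {
                currentList.Add(incoming);
            }
        }
    }

    public static void Main()
    {
        var currentList = new List<Employee>
        {
            new Employee { Id = 154, Name = "George", Salary = 10000 },
            new Employee { Id = 233, Name = "Alice", Salary = 10000 },
        };
        var newList = new List<Employee>
        {
            new Employee { Id = 154, Name = "George", Salary = 25000 },
            new Employee { Id = 234, Name = "Bob", Salary = 10000 },
        };

        Sync(currentList, newList);

        foreach (var e in currentList)
            Console.WriteLine($"{e.Id} {e.Name} {e.Salary}");
    }
}
```

The nested `Any`/`FirstOrDefault` scans make the list-only version quadratic in the list sizes; with the hash set and dictionary, each membership check is constant time on average.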

You can do it with a for loop and a LINQ expression:
for (int i = 0; i < NewList.Count; i++)
{
var record = CurrentList.FirstOrDefault(item => item.Id == NewList[i].Id);
if (record == null) { CurrentList.Add(NewList[i]); }
else { record.Id = NewList[i].Id; record.Name = NewList[i].Name; record.Salary = NewList[i].Salary; }
}
CurrentList.RemoveAll(item => NewList.FirstOrDefault(item2 => item2.Id == item.Id) == null);

A LINQ-y version wrapped in an extension method; it could be modified to be generic if 'Id' is on an interface of some sort.
The merge Action could potentially be a Merge() method on entity objects such as Employee, but I chose to use a delegate here.
public class Tests
{
[Test]
public void MergeSpike()
{
// the current list
var currentList = new List<Employee>();
currentList.Add(new Employee { Id = 154, Name = "George", Salary = 10000 });
currentList.Add(new Employee { Id = 233, Name = "Alice", Salary = 10000 });
// new list
var newList = new List<Employee>();
newList.Add(new Employee { Id = 154, Name = "George", Salary = 25000 });
newList.Add(new Employee { Id = 234, Name = "Bob", Salary = 30000 });
currentList.Merge(newList, (o, n) =>
{
if(o.Id != n.Id) throw new ArgumentOutOfRangeException("Attempt to merge on mismatched IDs");
o.Name = n.Name;
o.Salary = n.Salary;
});
Assert.That(currentList.Count(), Is.EqualTo(2));
Assert.That(currentList.First(c => c.Id == 154).Salary, Is.EqualTo(25000));
Assert.That(currentList.Any(c => c.Id == 233), Is.False);
Assert.That(currentList.First(c => c.Id == 234).Salary, Is.EqualTo(30000));
}
}
public class Employee
{
public int Id { get; set; }
public string Name { get; set; }
public int Salary { get; set; }
}
public static class EmployeeListExtensions
{
public static void Merge(this List<Employee> currentList, IEnumerable<Employee> newList, Action<Employee, Employee> merge)
{
// Updates
currentList.Where(e => newList.Any(n => n.Id == e.Id))
.ToList().ForEach(e => merge(e, newList.First(n1 => n1.Id == e.Id)));
// Deletes
var remove = currentList.Where(cl => newList.All(nl => cl.Id != nl.Id)).ToList();
currentList.RemoveAll(e => remove.Any(r => r.Id == e.Id));
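// Inserts: add only items whose Id matches no existing entry (All, not Any).
// ToList() snapshots the additions before currentList is modified.
currentList.AddRange(newList.Where(nl => currentList.All(c => c.Id != nl.Id)).ToList());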
}
}

Select top N records after filtering in each group

I am an old hand in .NET but very new to LINQ! After some basic reading I decided to test my skills, and I failed completely! I don't know where I am making a mistake.
I want to select the 2 highest orders for each person, considering only orders where Amount % 100 == 0.
Here is my code.
var crecords = new[] {
new {
Name = "XYZ",
Orders = new[]
{
new { OrderId = 1, Amount = 340 },
new { OrderId = 2, Amount = 100 },
new { OrderId = 3, Amount = 200 }
}
},
new {
Name = "ABC",
Orders = new[]
{
new { OrderId = 11, Amount = 900 },
new { OrderId = 12, Amount = 800 },
new { OrderId = 13, Amount = 700 }
}
}
};
var result = crecords
.OrderBy(record => record.Name)
.ForEach
(
person => person.Orders
.Where(order => order.Amount % 100 == 0)
.OrderByDescending(t => t.Amount)
.Take(2)
);
foreach (var record in result)
{
Console.WriteLine(record.Name);
foreach (var order in record.Orders)
{
Console.WriteLine("-->" + order.Amount.ToString());
}
}
Can anyone take a look and tell me what the correct query would be?
Thanks in advance

Try this query:
var result = crecords.Select(person =>
new
{
Name = person.Name,
Orders = person.Orders.Where(order => order.Amount%100 == 0)
.OrderByDescending(x => x.Amount)
.Take(2)
});
Using your foreach loop to print the resulting IEnumerable, the output of it is:
XYZ
-->200
-->100
ABC
-->900
-->800

This has already been answered, but if you don't want to create new objects and would rather modify your existing crecords in place, the code could look like this instead. You wouldn't be able to use anonymous types as in your example, though, because their properties are read-only, so you would have to create People and Order classes.
private class People
{
public string Name;
public IEnumerable<Order> Orders;
}
private class Order
{
public int OrderId;
public int Amount;
}
public void PrintPeople()
{
IEnumerable<People> crecords = new[] {
new People{
Name = "XYZ",
Orders = new Order[]
{
new Order{ OrderId = 1, Amount = 340 },
new Order{ OrderId = 2, Amount = 100 },
new Order{ OrderId = 3, Amount = 200 }
}
},
new People{
Name = "ABC",
Orders = new Order[]
{
new Order{ OrderId = 11, Amount = 900 },
new Order{ OrderId = 12, Amount = 800 },
new Order{ OrderId = 13, Amount = 700 }
}
}
};
crecords = crecords.OrderBy(record => record.Name);
crecords.ToList().ForEach(
person =>
{
person.Orders = person.Orders
.Where(order => order.Amount%100 == 0)
.OrderByDescending(t => t.Amount)
.Take(2);
}
);
foreach (People record in crecords)
{
Console.WriteLine(record.Name);
foreach (var order in record.Orders)
{
Console.WriteLine("-->" + order.Amount.ToString());
}
}
}

Using List<Person> Distinct() to return 2 values

I have a Person class, with Name and AreaID properties.
public class Person
{
public string Name;
public int AreaID;
// snip
}
I have a List<Person> with the potential for hundreds of Person objects in the list.
e.g., 100 Persons with AreaID = 1 and 100 Persons with AreaID = 2
I want to return distinct list of AreaID's and how many Persons have that AreaID.
For example,
AreaID = 1 Persons = 100
AreaID = 2 Persons = 100

Use the GroupBy method.
var list = ...list of Persons...
var areas = list.GroupBy( p => p.AreaID )
.Select( g => new {
AreaID = g.Key,
Count = g.Count()
});

Looks like you want to group by area ID then:
var groups = from person in persons
group 1 by person.AreaID into area
select new { AreaID = area.Key, Persons = area.Count() };
I'm using "group 1" to indicate that I really don't care about the data within each group - only the count and the key.
This is inefficient in that it has to buffer all the results for the sake of grouping - you may well be able to use Reactive LINQ in .NET 4.0 to do this more efficiently, or you could certainly use Push LINQ if you wanted to. Then again, for relatively small datasets it probably doesn't matter :)

Surprisingly, nobody has suggested overriding Equals and GetHashCode. If you do so, you can do the following:
List<Person> unique = personList.Distinct().ToList();
Or even
var areaGroup = personList.GroupBy(p => p.AreaID);
int area1Count = personList.Count(p => p.AreaID == 1);
This gives you more flexibility, with no need for a useless anonymous class.
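Overriding Equals and GetHashCode, as this answer suggests, might look like the sketch below. It assumes that two Person objects count as equal when both Name and AreaID match, which the answer does not specify; the `DistinctDemo` class is a hypothetical wrapper added only to make the example runnable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name;
    public int AreaID;

    // Assumption: equality means the same Name and the same AreaID.
    public override bool Equals(object obj)
    {
        return obj is Person other && Name == other.Name && AreaID == other.AreaID;
    }

    // Must combine the same fields that Equals compares.
    public override int GetHashCode()
    {
        unchecked
        {
            return ((Name != null ? Name.GetHashCode() : 0) * 397) ^ AreaID;
        }
    }
}

public static class DistinctDemo
{
    public static void Main()
    {
        var personList = new List<Person>
        {
            new Person { Name = "Alex", AreaID = 1 },
            new Person { Name = "Alex", AreaID = 1 }, // duplicate by value
            new Person { Name = "Sam", AreaID = 2 },
        };

        // Distinct() now uses the overridden Equals/GetHashCode.
        List<Person> unique = personList.Distinct().ToList();
        int area1Count = personList.Count(p => p.AreaID == 1);

        Console.WriteLine($"Unique: {unique.Count}, in area 1: {area1Count}");
    }
}
```

Note that this removes duplicate people; it does not by itself produce the per-area counts the question asks for, which still need GroupBy or a Count per area.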

return list.GroupBy(p => p.AreaID)
.Select(g => new { AreaID = g.Key, People = g.Count() });

you could use list.GroupBy(x => x.AreaID);

You can try this:
var groups = from person in list
group person by person.AreaID into areaGroup
select new {
AreaID = areaGroup.Key,
Count = areaGroup.Count()
};

var people = new List<Person>();
var q = from p in people
group p by p.AreaId into g
select new { Id = g.Key, Total = g.Count() };
people.Add(new Person { AreaId = 1, Name = "Alex" });
people.Add(new Person { AreaId = 1, Name = "Alex" });
people.Add(new Person { AreaId = 2, Name = "Alex" });
people.Add(new Person { AreaId = 3, Name = "Alex" });
people.Add(new Person { AreaId = 3, Name = "Alex" });
people.Add(new Person { AreaId = 4, Name = "Alex" });
people.Add(new Person { AreaId = 2, Name = "Alex" });
people.Add(new Person { AreaId = 4, Name = "Alex" });
people.Add(new Person { AreaId = 1, Name = "Alex" });
foreach (var item in q)
{
Console.WriteLine("AreaId: {0}, Total: {1}",item.Id,item.Total);
}

Something like this, perhaps ?
List<Person> persons = new List<Person> ();
persons.Add (new Person (1, "test1"));
persons.Add (new Person (1, "test2"));
persons.Add (new Person (2, "test3"));
var results =
persons.GroupBy (p => p.AreaId);
foreach( var r in results )
{
Console.WriteLine (String.Format ("Area Id: {0} - Number of members: {1}", r.Key, r.Count ()));
}
Console.ReadLine ();

Instead of distinct, use GroupBy, or the more succinct LINQ statement:
var results = from p in PersonList
group p by p.AreaID into g
select new { AreaID=g.Key, Count=g.Count() };
foreach(var item in results)
Console.WriteLine("There were {0} items in Area {1}", item.Count, item.AreaID);

ToLookup() will do what you want.
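A minimal sketch of the ToLookup approach, assuming the Person class from the question; the `AreaCounts` class and `CountByArea` helper are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name;
    public int AreaID;
}

public static class AreaCounts
{
    // Hypothetical helper: build a lookup keyed by AreaID, then count each group.
    public static Dictionary<int, int> CountByArea(IEnumerable<Person> persons)
    {
        ILookup<int, Person> byArea = persons.ToLookup(p => p.AreaID);
        return byArea.ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        var persons = new List<Person>
        {
            new Person { Name = "test1", AreaID = 1 },
            new Person { Name = "test2", AreaID = 1 },
            new Person { Name = "test3", AreaID = 2 },
        };

        foreach (var pair in CountByArea(persons))
            Console.WriteLine($"AreaID = {pair.Key} Persons = {pair.Value}");
    }
}
```

Unlike GroupBy, which is evaluated lazily, ToLookup runs immediately and the resulting lookup can be indexed by key afterwards, for example `byArea[1]` to get everyone in area 1.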
