Parallel.ForEach and IGrouping source item issue

I am trying to parallelize a query with a groupby statement in it. The query is similar to
var colletionByWeek = (
from item in objectCollection
group item by item.WeekStartDate into weekGroups
select weekGroups
).ToList();
If I use Parallel.ForEach with a shared variable, as below, it works fine. But I don't want to use shared variables in a parallel query.
var pSummary=new List<object>();
Parallel.ForEach(colletionByWeek, week =>
{
pSummary.Add(new object()
{
p1 = week.First().someprop,
p2= week.key,
.....
});
}
);
So I changed the parallel statement above to use local variables. But the compiler complains that the source type IEnumerable<IGrouping<DateTime, object>> cannot be converted into System.Collections.Concurrent.OrderablePartitioner<IEnumerable<IGrouping<DateTime, object>>>.
Am I giving the wrong source type, or is the IGrouping type handled differently? Any help would be appreciated. Thanks!
Parallel.ForEach<IEnumerable<IGrouping<DateTime, object>>, IEnumerable<object>>
(spotColletionByWeek,
() => new List<object>(),
(week, loop, summary) =>
{
summary.Add(new object()
{
p1 = week.First().someprop,
p2= week.key,
.....
});
return new List<object>();
},
(finalResult) => pSummary.AddRange(finalResult)
);

The type parameter TSource is the element type, not the collection type. And the second type parameter represents the local storage type, so it should be List<T>, if you want to Add() to it. This should work:
Parallel.ForEach<IGrouping<DateTime, object>, List<object>>
That's assuming you don't actually have objects there, but some specific type.
Although explicit type parameters are not even necessary here. The compiler should be able to infer them.
But there are other problems in the code:
You shouldn't return a new List from the main delegate; return summary instead, otherwise the items you added to it are lost.
The delegate that processes finalResult might be executed concurrently on multiple threads, so you should use locks or a concurrent collection there.
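Putting these points together, the loop could look roughly like the sketch below. Item, SomeProp, the tuple result type and the gate lock object are hypothetical stand-ins (the original code only shows object and elided members), so treat this as an illustration under those assumptions rather than the asker's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class WeeklySummary
{
    // Hypothetical item type standing in for the asker's real class.
    public sealed class Item
    {
        public DateTime WeekStartDate { get; set; }
        public string SomeProp { get; set; } = "";
    }

    public static List<(string P1, DateTime P2)> Summarize(IEnumerable<Item> items)
    {
        List<IGrouping<DateTime, Item>> byWeek =
            items.GroupBy(i => i.WeekStartDate).ToList();

        var summary = new List<(string P1, DateTime P2)>();
        var gate = new object(); // guards 'summary' in the final delegate

        // TSource (IGrouping<DateTime, Item>) and TLocal (the list type)
        // are inferred by the compiler.
        Parallel.ForEach(
            byWeek,
            () => new List<(string P1, DateTime P2)>(),   // one local list per task
            (week, loopState, local) =>
            {
                local.Add((week.First().SomeProp, week.Key));
                return local;                             // hand the same list to the next iteration
            },
            local =>
            {
                // May run concurrently on several threads, so lock.
                lock (gate) { summary.AddRange(local); }
            });

        return summary;
    }

    public static void Main()
    {
        var start = new DateTime(2024, 1, 1);
        var items = Enumerable.Range(0, 20)
            .Select(i => new Item { WeekStartDate = start.AddDays(7 * (i % 4)), SomeProp = "p" + i });

        foreach (var row in Summarize(items).OrderBy(r => r.P2))
            Console.WriteLine($"{row.P2:yyyy-MM-dd} {row.P1}");
    }
}
```

The order in which the local lists are merged is not deterministic, which is why the usage example sorts before printing.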

I'm going to skip the 'Are you sure you even need to optimize this' stage, and assume you have a performance issue which you hope to solve by parallelizing.
First of all, you're not doing yourself any favors trying to use Parallel.ForEach<> for this task. I'm pretty sure you will get a more readable and more efficient result using PLINQ:
var random = new Random();
var weeks = new List<Week>();
for (int i=0; i<1000000; i++)
{
weeks.Add(
new Week {
WeekStartDate = DateTime.Now.Date.AddDays(7 * random.Next(0, 100))
});
}
var parallelCollectionByWeek =
(from item in weeks.AsParallel()
group item by item.WeekStartDate into weekGroups
select new
{
p1 = weekGroups.First().WeekStartDate,
p2 = weekGroups.Key,
}).ToList();
It's worth noting that there is some overhead associated with parallelizing the GroupBy operator, so the benefit will be marginal at best. (Some crude benchmarks hint at a 10-20% speedup.)
Apart from that, the reason you're getting a compile error is that the first type parameter is supposed to be IGrouping<DateTime, object>, not IEnumerable<IGrouping<DateTime, object>>.

Related

implicit type var [duplicate]

Possible Duplicate:
Use of var keyword in C#
Being relatively new to C#, I was wondering what motivation MS had for introducing var, the implicitly typed local variable. The documentation says:
An implicitly typed local variable is strongly typed just as if you
had declared the type yourself, but the compiler determines the type.
Some lines further:
In many cases the use of var is optional and is just a syntactic
convenience
This is all nice but in my mind, this will only cause confusion.
Say you are reviewing this foreach loop.
foreach (var item in custQuery){
// A bunch of code...
}
Instead of reviewing the content and the semantics of the loop, I would lose precious time looking for the item's type!
I would prefer the following instead:
foreach (String item in custQuery){
// A bunch of code...
}
The question is: I read that implicitly typed variables help when dealing with LINQ. Does it really help to use them in other scenarios?

The var keyword was needed when LINQ was introduced, so that the language could create a strongly typed variable for an anonymous type.
Example:
var x = new { Y = 42 };
Now x is a strongly typed variable that has a specific type, but there is no name for that type. The compiler knows what x.Y means, so you don't have to use reflection to get to the data in the object, as you would if you did:
object x = new { Y = 42 };
Now x is of the type object, so you can't use x.Y.
When used with LINQ it can for example look like this:
var x = from item in source select new { X = item.X, Y = item.Y };
The x variable is now an IEnumerable<T> where T is a specific type which doesn't have a name.
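As a small runnable illustration of the point above, the sketch below projects into an anonymous type and then reads its members with full compile-time checking. The VarDemo class, the tuple source and the Sum member are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VarDemo
{
    public static List<string> Describe(IEnumerable<(int X, int Y)> source)
    {
        // The element type of 'query' has no name, so only 'var' can declare it.
        var query = from item in source
                    select new { item.X, Sum = item.X + item.Y };

        // q.X and q.Sum are checked by the compiler; no reflection or casts needed.
        return query.Select(q => $"{q.X}:{q.Sum}").ToList();
    }

    public static void Main()
    {
        foreach (var line in Describe(new[] { (1, 2), (3, 4) }))
            Console.WriteLine(line);
    }
}
```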
Since the var keyword was introduced, it has also been used to make code more readable, and misused to save keystrokes.
An example where it makes the code more readable would be:
var list =
new System.Collections.Generic.List<System.Windows.Forms.Message.Msg>();
instead of:
System.Collections.Generic.List<System.Windows.Forms.Message.Msg> list =
new System.Collections.Generic.List<System.Windows.Forms.Message.Msg>();
This is a good use of the var keyword, as the type already exists in the statement. A case where the keyword can be misused is in a statement like:
var result = SomeMethod();
As the SomeMethod name doesn't give any indication of what type it returns, it's not obvious what the type of the variable will be. In this case you should write out the type rather than using the var keyword.

I think some of the motivation was to allow something like this -
List<int> list = new List<int>();
to be turned into this -
var list = new List<int>();
The second example is shorter and more readable, but still clearly expresses the intent of the code. There are instances when it will be less clear, but in lots of situations you get conciseness with no loss of clarity.

var is really needed for anonymous types, which are used quite a bit in LINQ:
var results =
from item in context.Table
select new {Name=item.Name, id=item.id};
Since the collection is of an anonymous type, it can not be named. It has a real type, but not one with a name before compilation.

Convert anonymous type to array or arraylist. Can it be done

I'm trying to retrieve data from a database using LINQ. I would like to use anonymous types and convert the result to an IList, Array, ArrayList or Collection. The data is used in a third-party object that accepts an IList, ArrayList or collection.
I can't seem to get this to work. I get the following error, "Sequence operators not supported for type 'System.String'"
using (var db = new dbDataContext())
{
var query = from e in db.people
select new
{
Count = e.LastName.Count()
};
Array test;
test = query.ToArray();
}

It's got nothing to do with converting the results to array lists, or even anonymous types. Here's another version of your code which will fail:
using (var db = new dbDataContext())
{
var query = db.people.Select(x => x.LastName.Count());
foreach (int x in query)
{
Console.WriteLine(x);
}
}
That will still fail in the same way - because it's the translation of this bit:
x => x.LastName.Count()
into SQL which is causing problems.
Change it to:
x => x.LastName.Length
and I suspect you'll find it works. Note that this isn't really a C# issue - it's just LINQ to SQL's translation abilities.
I would suggest that you don't use an anonymous type here though - it's pointless. Maybe this isn't your complete code, but in general if you find yourself creating an anonymous type with a single member, ask yourself if it's really doing you any good.

The ArrayList class has a constructor that accepts ICollection.
You should be able to feed it a List version of your LINQ result.
using (var db = new dbDataContext()) {
var query = from e in db.people
select new { Count = e.LastName.Count() };
ArrayList list = new ArrayList(query.ToList());
}
I don't have Visual Studio here (I'm on my Mac), but it might be of help.
(ToArray should suffice as well)
You might need to replace your Count() by Length.
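Here is a sketch of the same idea against an in-memory sequence, since the dbDataContext from the question isn't available. The lastNames parameter is a hypothetical stand-in for db.people, and the projection uses Length as suggested. LINQ to Objects would accept Count() as well; the translation problem only affects LINQ to SQL.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class ArrayListDemo
{
    public static ArrayList NameLengths(IEnumerable<string> lastNames)
    {
        var query = from name in lastNames
                    select new { Count = name.Length };

        // List<T> implements the non-generic ICollection that ArrayList's constructor accepts.
        return new ArrayList(query.ToList());
    }

    public static void Main()
    {
        ArrayList list = NameLengths(new[] { "Smith", "Lee" });
        foreach (object row in list)
            Console.WriteLine(row); // each element is an anonymous-type instance boxed as object
    }
}
```

Once the elements are in an ArrayList they are typed as object, so the consuming code needs reflection or data binding to read the Count member.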

Using Linq to run a method on a collection of objects?

This is a long shot, I know...
Let's say I have a collection
List<MyClass> objects;
and I want to run the same method on every object in the collection, with or without a return value. Before Linq I would have said:
List<ReturnType> results = new List<ReturnType>();
List<int> FormulaResults = new List<int>();
foreach (MyClass obj in objects) {
results.Add(obj.MyMethod());
FormulaResults.Add(ApplyFormula(obj));
}
I would love to be able to do something like this:
List<ReturnType> results = new List<ReturnType>();
results.AddRange(objects.Execute(obj => obj.MyMethod()));
// obviously .Execute() above is fictitious
List<int> FormulaResults = new List<int>();
FormulaResults.AddRange(objects.Execute(obj => ApplyFormula(obj)));
I haven't found anything that will do this. Is there such a thing?
If there's nothing generic like I've posited above, at least maybe there's a way of doing it for the purposes I'm working on now: I have a collection of one object that has a wrapper class:
class WrapperClass {
private WrappedClass wrapped;
public WrapperClass(WrappedClass wc) {
this.wrapped = wc;
}
}
My code has a collection List<WrappedClass> objects and I want to convert that to a List<WrapperClass>. Is there some clever Linq way of doing this, without doing the tedious
List<WrapperClass> results = new List<WrapperClass>();
foreach (WrappedClass obj in objects)
results.Add(new WrapperClass(obj));
Thanks...

Would:
results.AddRange(objects.Select(obj => ApplyFormula(obj)));
do?
or (simpler)
var results = objects.Select(obj => ApplyFormula(obj)).ToList();

I think that the Select() extension method can do what you're looking for:
objects.Select( obj => obj.MyMethod() ).ToList(); // produces List<Result>
objects.Select( obj => ApplyFormula(obj) ).ToList(); // produces List<int>
Same thing for the last case:
objects.Select( obj => new WrapperClass( obj ) ).ToList();
If you have a void method that you want to call, here's a trick you can use with IEnumerable, which doesn't have a ForEach() extension, to get similar behavior without a lot of effort.
objects.Select( obj => { obj.SomeVoidMethod(); return false; } ).Count();
The Select() will produce a sequence of [false] values after invoking SomeVoidMethod() on each [obj] in the objects sequence. Since Select() uses deferred execution, we call the Count() extension to force each element in the sequence to be evaluated. It works quite well when you want something like a ForEach() behavior.

If the method MyMethod that you want to apply returns an object of type T then you can obtain an IEnumerable<T> of the result of the method via:
var results = objects.Select(o => o.MyMethod());
If the method MyMethod that you want to apply has return type void then you can apply the method via:
objects.ForEach(o => o.MyMethod());
This assumes that objects is of generic type List<>. If all you have is an IEnumerable<>, then you can roll your own ForEach extension method, or apply objects.ToList() first and use the above syntax.
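A roll-your-own ForEach extension could look like the minimal sketch below. The class name EnumerableExtensions is an arbitrary choice, and nothing like this ships in System.Linq.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Eagerly runs 'action' on every element; unlike Select, nothing is deferred.
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (action == null) throw new ArgumentNullException(nameof(action));

        foreach (T item in source)
        {
            action(item);
        }
    }
}

public static class ForEachDemo
{
    public static void Main()
    {
        IEnumerable<string> names = new[] { "a", "b", "c" };
        names.ForEach(n => Console.WriteLine(n));
    }
}
```

For a List<T>, the built-in instance method List<T>.ForEach still wins overload resolution, so the extension only kicks in for other IEnumerable<T> sources.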

The C# compiler maps a LINQ select onto the .Select extension method, defined over IEnumerable (or IQueryable which we'll ignore here). Actually, that .Select method is exactly the kind of projection function that you're after.
LBushkin is correct, but you can actually use LINQ syntax as well...
var query = from o in objects
select o.MyMethod();

You can also run a custom method using the marvelous Jon Skeet's morelinq library
For example, if you had a Text property on your MyClass that you needed to change at runtime using a method on the same class:
objects = objects.Pipe<MyClass>(c => c.Text = c.UpdateText()).ToList();
This method will now be executed on every object in your list. I love morelinq!

http://www.hookedonlinq.com/UpdateOperator.ashx has an extended Update method you can use. Or you can use a select statement as posted by others.

How can I convert anonymous type to strong type in LINQ?

I have an array of ListViewItems ( ListViewItem[] ), where I store a SalesOrderMaster object in each ListViewItem.Tag for later reference.
I have some code that currently goes through each ListViewItem, safely casts the .Tag property to a SalesOrderMaster object, and then adds that object to a collection of SalesOrders, but only after checking that the order doesn't already exist in that collection.
The process to compare sales orders is expensive, and I would like to convert this to a LINQ expression for clarity and performance. ( I also have the Parallel Extensions to .NET Framework 3.5 installed so I can use that to further improve LINQ performance)
So without further ado: This is what I have, and then what I want. ( what I want won't compile, so I know I am doing something wrong, but I hope it illustrates the point )
What I have: ( Slow )
foreach (ListViewItem item in e.Argument as ListViewItem[])
{
SalesOrderMaster order = item.Tag as SalesOrderMaster;
if ( order == null )
{
return;
}
if (!All_SalesOrders.Contains(order))
{
All_SalesOrders.Add(order);
}
}
What I want: ( Theory )
List<SalesOrderMaster> orders =
(from item in (e.Argument as ListViewItem[]).AsParallel()
select new { ((SalesOrderMaster)item.Tag) }).Distinct();
EDIT: I know the cast is cheap, I said the "Compare", which in this case translates to the .Contains(order) operation
EDIT: Everyone's answer was awesome! I wish I could mark more than one answer, but in the end I have to pick one.
EDIT : This is what I ended up with:
List<SalesOrderMaster> orders =
(from item in (e.Argument as ListViewItem[]) select (SalesOrderMaster) item.Tag).GroupBy(item => item.Number).Select(x => x.First()).ToList();

I see nobody has addressed your need to convert an anonymous type to a named type explicitly, so here goes... By using "select new { }" you are creating an anonymous type, but you don't need to. You can write your query like this:
List<SalesOrderMaster> orders =
(from item in (e.Argument as ListViewItem[]).AsParallel()
select (SalesOrderMaster)item.Tag)
.Distinct()
.ToList();
Notice that the query selects (SalesOrderMaster)item.Tag without new { }, so it doesn't create an anonymous type. Also note I added ToList() since you want a List<SalesOrderMaster>.
This solves your anonymous type problem. However, I agree with Mark and Guffa that using a parallel query here isn't your best option. To use HashSet<SalesOrderMaster> as Guffa suggested, you can do this:
IEnumerable<SalesOrderMaster> query =
from item in (ListViewItem[])e.Argument
select (SalesOrderMaster)item.Tag;
HashSet<SalesOrderMaster> orders = new HashSet<SalesOrderMaster>(query);
(I avoided using var so the returned types are clear in the examples.)

The part in that code that is expensive is calling the Contains method on the list. As it's an O(n) operation it gets slower the more objects you add to the list.
Just use a HashSet<SalesOrderMaster> for the objects instead of a List<SalesOrderMaster>. The Contains method of the HashSet is an O(1) operation, so your loop will be an O(n) operation instead of an O(n*n) operation.
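A sketch of the loop rewritten around a HashSet follows. Order is a hypothetical stand-in for SalesOrderMaster with equality based on an assumed Number property; HashSet<T> relies on Equals and GetHashCode, so the real class needs suitable implementations for duplicates to be detected. Unlike the original loop, this version skips tags that are not orders instead of returning early.

```csharp
using System;
using System.Collections.Generic;

public static class DistinctOrders
{
    // Hypothetical stand-in for SalesOrderMaster; two orders are equal if their numbers match.
    public sealed class Order : IEquatable<Order>
    {
        public int Number { get; }
        public Order(int number) { Number = number; }

        public bool Equals(Order other) => other != null && other.Number == Number;
        public override bool Equals(object obj) => Equals(obj as Order);
        public override int GetHashCode() => Number.GetHashCode();
    }

    public static HashSet<Order> Collect(IEnumerable<object> tags)
    {
        var orders = new HashSet<Order>();
        foreach (object tag in tags)
        {
            // Add is a no-op (and returns false) when an equal order is already present,
            // so no separate Contains check is needed.
            if (tag is Order order)
            {
                orders.Add(order);
            }
        }
        return orders;
    }

    public static void Main()
    {
        var tags = new object[] { new Order(1), new Order(1), new Order(2), "not an order" };
        Console.WriteLine(Collect(tags).Count);
    }
}
```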

Like Marc Gravell said, you shouldn't access the Tag property from different threads, and the cast is quite cheap, so you have:
var items = (e.Argument as ListViewItem[]).Select(x=>x.Tag)
.OfType<SalesOrderMaster>().ToList();
but then, you want to find distinct items - here you can try using AsParallel:
var orders = items.AsParallel().Distinct();

Union two List in C#

I want to union (merge) two lists into one List that contains the elements of both. This is my code; how can I define a list for this purpose?
if (e.CommandName == "AddtoSelected")
{
List<DetalleCita> lstAux = new List<DetalleCita>();
foreach (GridViewRow row in this.dgvEstudios.Rows)
{
var GridData = GetValues(row);
var GridData2 = GetValues(row);
IList AftList2 = GridData2.Values.Where(r => r != null).ToList();
AftList2.Cast<DetalleCita>();
chkEstudio = dgvEstudios.Rows[index].FindControl("ChkAsignar") as CheckBox;
if (chkEstudio.Checked)
{
IList AftList = GridData.Values.Where(r => r != null).ToList();
lstAux.Add(
new DetalleCita
{
codigoclase = Convert.ToInt32(AftList[0]),
nombreestudio = AftList[1].ToString(),
precioestudio = Convert.ToDouble(AftList[2]),
horacita = dt,
codigoestudio = AftList[4].ToString()
});
}
index++;
//this line to merge
lstAux.ToList().AddRange(AftList2);
}
dgvEstudios.DataSource = lstAux;
dgvEstudios.DataBind();
}
This is inside a RowCommand event.

If you want to add all entries from AftList2 to lstAux you should define AftList2 as IEnumerable<> with elements of type DetalleCita (being IEnumerable<DetalleCita> is enough to be used as parameter of AddRange() on List<DetalleCita>). For example like this:
var AftList2 = GridData2.Values.Where(r => r != null).Cast<DetalleCita>();
And then you can add all its elements to lstAux:
lstAux.AddRange(AftList2);
Clarification:
I think you are misunderstanding what the ToList() extension method does. It creates a new list from an IEnumerable<T>, and the result is not connected to the original IEnumerable<T> it was applied to.
That is why list.ToList().AddRange(...) does nothing useful: you copy list into another list (newly created by ToList()), update that copy, and then throw it away. Because you never store the result (as in list2 = var1.ToList()), the original var1 stays unchanged. If you call ToList(), you almost always want to keep its result.
Also, you don't usually need to convert one list to another list. ToList() is useful when you need a list (List<T>) but have an IEnumerable<T>, which is not indexable when you need fast access by index, or which evaluates lazily when you need all results computed now. Both situations can arise with the result of a LINQ to Objects query, for example: IEnumerable<int> ints = from i in anotherInts where i > 20 select i; -- even if anotherInts is a List<int>, the query result ints cannot be cast to List<int>, because it is not a list but some other implementation of IEnumerable<int>. In that case you can use ToList() to get a list anyway: List<int> ints = (from i in anotherInts where i > 20 select i).ToList();
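The copy semantics of ToList() can be checked with a tiny experiment like this one (the names are illustrative only):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToListDemo
{
    public static (int OriginalCount, int CopyCount) AddToCopy()
    {
        var original = new List<int> { 1, 2 };

        // ToList() builds a brand-new list; 'copy' and 'original' are independent.
        List<int> copy = original.ToList();
        copy.AddRange(new[] { 3, 4 });

        return (original.Count, copy.Count);
    }

    public static void Main()
    {
        var (originalCount, copyCount) = AddToCopy();
        Console.WriteLine($"original: {originalCount}, copy: {copyCount}"); // original: 2, copy: 4
    }
}
```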
UPDATE:
If you really mean union semantics (e.g. for { 1, 2 } and { 1, 3 } the union would be { 1, 2, 3 }, with no duplication of equal elements from the two collections), consider switching to HashSet<T> (it is most likely available to you, since you are using C# 3.0 and I suppose you have a recent .NET Framework), or use the Union() extension method instead of AddRange. I don't think Union() is better than the first solution, and be careful: it works more like ToList(), in that a.Union(b) returns a new collection and does NOT update either a or b.
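To compare the three options side by side, here is a short sketch using plain integers in place of DetalleCita (an assumption made to keep the example self-contained):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnionDemo
{
    public static (List<int> Appended, List<int> Unioned, HashSet<int> Set) Merge(int[] a, int[] b)
    {
        // AddRange appends everything, duplicates included.
        var appended = new List<int>(a);
        appended.AddRange(b);

        // Union returns a new sequence without duplicates; a and b are untouched.
        List<int> unioned = a.Union(b).ToList();

        // HashSet<T>.UnionWith modifies the set in place and ignores duplicates.
        var set = new HashSet<int>(a);
        set.UnionWith(b);

        return (appended, unioned, set);
    }

    public static void Main()
    {
        var result = Merge(new[] { 1, 2 }, new[] { 1, 3 });
        Console.WriteLine(string.Join(", ", result.Appended)); // 1, 2, 1, 3
        Console.WriteLine(string.Join(", ", result.Unioned));  // 1, 2, 3
        Console.WriteLine(result.Set.Count);                   // 3
    }
}
```

For a custom class such as DetalleCita, both Union() and HashSet<T> decide what counts as a duplicate through Equals and GetHashCode (or a supplied IEqualityComparer<T>).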
