I have a LINQ expression that results in an IEnumerable<string> named statements.
I want to route each item to an observer by calling OnNext inside a foreach loop.
Now I have seen a hint about using Reactive Extensions instead of the loop, and the code looks like this.
statements.ToObservable().Subscribe(
s => this.statementObserver.OnNext(new Statement(replyTo, jobId, s)));
// foreach (var s in statements)
// {
// this.statementObserver.OnNext(new Statement(replyTo, jobId, s));
// }
Is this correct, or can I connect my statements directly to the statementObserver?
You can do either of the methods you've suggested, but this might be the more idiomatic way to do it:
statements
.ToObservable()
.Select(s => new Statement(replyTo, jobId, s))
.Subscribe(this.statementObserver);
Do be careful exposing observers like that though. One call to .OnCompleted() and you've killed your object. It's best to pass in the observable and let the class observe it how it likes.
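To illustrate that last point, here is a minimal sketch of a class that accepts an observable and keeps its observer private. Everything in it is hypothetical: StatementConsumer, the simplified one-argument Statement, and EnumerableObservable, which is only a stand-in for Rx's ToObservable() so the sketch runs without the System.Reactive package.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for the Statement class in the question.
public sealed class Statement
{
    public Statement(string text) { Text = text; }
    public string Text { get; }
}

// Hypothetical consumer: callers hand it an observable, and the observer stays private,
// so outside code cannot call OnCompleted() or OnError() on it directly.
public sealed class StatementConsumer
{
    private readonly List<string> received = new List<string>();
    public IReadOnlyList<string> Received => received;

    public IDisposable Observe(IObservable<Statement> source)
    {
        return source.Subscribe(new Handler(this));
    }

    private sealed class Handler : IObserver<Statement>
    {
        private readonly StatementConsumer owner;
        public Handler(StatementConsumer owner) { this.owner = owner; }
        public void OnNext(Statement value) { owner.received.Add(value.Text); }
        public void OnError(Exception error) { /* a real consumer would log or recover here */ }
        public void OnCompleted() { /* ends only this subscription; the consumer stays usable */ }
    }
}

// Minimal stand-in for Rx's ToObservable(): pushes every item, then completes.
public sealed class EnumerableObservable<T> : IObservable<T>
{
    private readonly IEnumerable<T> items;
    public EnumerableObservable(IEnumerable<T> items) { this.items = items; }

    public IDisposable Subscribe(IObserver<T> observer)
    {
        foreach (var item in items) observer.OnNext(item);
        observer.OnCompleted();
        return new NoopDisposable();
    }

    private sealed class NoopDisposable : IDisposable
    {
        public void Dispose() { }
    }
}

public static class Program
{
    public static void Main()
    {
        var consumer = new StatementConsumer();
        var first = new[] { "a", "b" }.Select(s => new Statement(s));
        var second = new[] { "c" }.Select(s => new Statement(s));

        // The first source completing does not stop the consumer from observing a second one.
        consumer.Observe(new EnumerableObservable<Statement>(first));
        consumer.Observe(new EnumerableObservable<Statement>(second));

        Console.WriteLine(string.Join(", ", consumer.Received)); // a, b, c
    }
}
```

With System.Reactive installed, the Observe method could take statements.ToObservable().Select(...) directly and the hand-rolled observable would not be needed.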
Basically I have a procedure like
var results = await Task.WhenAll(
from input in inputs
select Task.Run(async () => await InnerMethodAsync(input))
);
.
.
.
private static async Task<Output> InnerMethodAsync(Input input)
{
var x = await Foo(input);
var y = await Bar(x);
var z = await Baz(y);
return z;
}
and I'm wondering whether there's a fancy way to combine this into a single LINQ query that's like an "async stream" (best way I can describe it).
When you use LINQ, there are generally two parts to it: creation and iteration.
Creation:
var query = list.Select( a => a.Name);
These calls are always synchronous. But this code doesn't do much more than create an object that exposes an IEnumerable. The actual work isn't done till later, due to a pattern called deferred execution.
Iteration:
var results = query.ToList();
This code takes the enumerable and gets the value of each item, which typically involves invoking your callback delegates (in this case, a => a.Name). This is the part that is potentially expensive and could benefit from asynchrony, e.g. if your callback is something like async a => await httpClient.GetByteArrayAsync(a).
So it's the iteration part that we're interested in, if we want to make it async.
The issue here is that ToList() (and most of the other methods that force iteration, like Any() or Last()) are not asynchronous methods, so your callback delegate will be invoked synchronously, and you’ll end up with a list of tasks instead of the data you want.
We can get around that with a piece of code like this:
public static class ExtensionMethods
{
static public async Task<List<T>> ToListAsync<T>(this IEnumerable<Task<T>> This)
{
var tasks = This.ToList(); //Force LINQ to iterate and create all the tasks. Tasks always start when created.
var results = new List<T>(); //Create a list to hold the results (not the tasks)
foreach (var item in tasks)
{
results.Add(await item); //Await the result for each task and add to results list
}
return results;
}
}
With this extension method, we can rewrite your code:
var results = await inputs.Select( async i => await InnerMethodAsync(i) ).ToListAsync();
That should give you the async behavior you're looking for and, unlike your example, it avoids creating thread pool tasks.
Note: If you are using LINQ to Entities, the expensive part (the data retrieval) isn't exposed to you. For LINQ to Entities, you'd want to use the ToListAsync() that comes with Entity Framework instead.
Try it out and see the timings in my demo on DotNetFiddle.
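If you want to check the timing locally, a rough sketch could look like the following. SlowDoubleAsync is a hypothetical stand-in for InnerMethodAsync that just waits 200 ms, and the ToListAsync here is a copy of the extension method idea above, repeated so the sample is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncLinqSketch
{
    // Same idea as the extension method above: start every task, then await each in order.
    public static async Task<List<T>> ToListAsync<T>(this IEnumerable<Task<T>> source)
    {
        var tasks = source.ToList();          // forces iteration, so every task is started here
        var results = new List<T>(tasks.Count);
        foreach (var task in tasks)
        {
            results.Add(await task);          // results keep the order of the inputs
        }
        return results;
    }

    // Hypothetical stand-in for InnerMethodAsync: waits without blocking a thread.
    public static async Task<int> SlowDoubleAsync(int input)
    {
        await Task.Delay(200);
        return input * 2;
    }

    public static Task<List<int>> RunAsync(IEnumerable<int> inputs)
    {
        return inputs.Select(i => SlowDoubleAsync(i)).ToListAsync();
    }

    public static async Task Main()
    {
        var stopwatch = Stopwatch.StartNew();
        var results = await RunAsync(Enumerable.Range(1, 5));
        stopwatch.Stop();

        Console.WriteLine(string.Join(", ", results)); // 2, 4, 6, 8, 10
        Console.WriteLine($"Elapsed: {stopwatch.ElapsedMilliseconds} ms");
    }
}
```

Because all five delays overlap, the elapsed time should be close to one delay rather than five, although the exact figure depends on the machine.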
A rather obvious answer, but you have just used LINQ and async together - you're using LINQ's select to project, and start, a bunch of async Tasks, and then await on the results, which provides an asynchronous parallelism pattern.
Although you've likely just provided a sample, there are a couple of things to note in your code (I've switched to lambda syntax, but the same principles apply):
Since there's basically zero CPU-bound work in each Task before the first await (i.e. no work done before var x = await Foo(input);), there's no real reason to use Task.Run here.
And since there's no work to be done in the lambda after the call to InnerMethodAsync, you don't need to wrap the InnerMethodAsync calls in an async lambda (but be wary of IDisposable).
That is, you can just select the Task returned from InnerMethodAsync and await these with Task.WhenAll.
var tasks = inputs
.Select(input => InnerMethodAsync(input)); // or just .Select(InnerMethodAsync)
var results = await Task.WhenAll(tasks);
More complex patterns are possible with asynchrony and LINQ, but rather than reinventing the wheel, you should have a look at Reactive Extensions and the TPL Dataflow library, which have many building blocks for complex flows.
Try using Microsoft's Reactive Framework. Then you can do this:
IObservable<Output> query =
from input in inputs.ToObservable()
from x in Observable.FromAsync(() => Foo(input))
from y in Observable.FromAsync(() => Bar(x))
from z in Observable.FromAsync(() => Baz(y))
select z;
Output[] results = await query.ToArray();
Simple.
Just NuGet "System.Reactive" and add using System.Reactive.Linq; to your code.
I am trying to get a method invoked for each item in a list while passing that method the list item itself. Basically I can do it the drawn out way but was trying to get it in a concise LINQ statement like so:
var urls = html.DocumentNode.SelectNodes("//a[@href]")
.Select(a => a.Attributes["href"].Value)
.Where(href => !href.StartsWith("mailto:")) // skip emails, find only url links
.ToList();
//.ToList().ForEach(href => getWEbData(href.ToString ()));
foreach (string s in urls) {
getWEbData(s);
}
I could not figure out how to get the .ForEach() into the LINQ shorthand, or whether it's possible.
You can't. LINQ functions are designed to not cause side effects. ForEach is designed to cause side effects. Hence, there is no ForEach LINQ function.
See "foreach" vs "ForEach" by Eric Lippert
Don't try to use ForEach with LINQ. It adds no value and makes the code harder to debug. You can embed the query in the foreach statement like so:
foreach (string s in html.DocumentNode
.SelectNodes("//a[@href]")
.Select(a => a.Attributes["href"].Value)
.Where(href => !href.StartsWith("mailto:")))
{
getWEbData(s);
}
Note that ToList() is unnecessary (whether you do the query inside or outside of the foreach).
You can use foreach with LINQ, but it is better to use a constructor: in the Select call, create a new object of a class that has a parameterized constructor, and do whatever you want inside that constructor. It is one of the easiest and most efficient ways.
There is no LINQ .ForEach method, but you can easily write your own:
public static class IEnumerableExtensions {
public static void ForEach<T>(this IEnumerable<T> pEnumerable, Action<T> pAction) {
foreach (var item in pEnumerable)
pAction(item);
}
}
and then
html
.DocumentNode
.SelectNodes("//a[@href]")
.Select(a => a.Attributes["href"].Value)
.Where(href => !href.StartsWith("mailto:")) // skip emails, find only url links
.ForEach(href => getWEbData(href.ToString ()));
or slightly better (although I think href may already be a string):
...
.Select(href => href.ToString())
.ForEach(getWEbData);
Although, as others have indicated, just because you can doesn't necessarily mean you should, but that was not your question.
I took a second to look at why my app had terrible performance. All I did was pause the debugger twice and I found it.
Is there a practical reason why it runs my code every time? The only way I know to prevent this is to add ToArray() at the end. I guess I need to revise all my code and make sure it returns arrays?
Online demo http://ideone.com/EUfJN
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
class Test
{
static void Main()
{
string[] test = new string[] { "a", "sdj", "bb", "d444"};
var expensivePrint = false;
IEnumerable<int> ls = test.Select(s => { if (expensivePrint) { Console.WriteLine("Doing expensive math"); } return s.Length; });
expensivePrint = true;
foreach (var v in ls)
{
Console.WriteLine(v);
}
Console.WriteLine("If you dont think it does it everytime, lets try it again");
foreach (var v in ls)
{
Console.WriteLine(v);
}
}
}
Output
Doing expensive math
1
Doing expensive math
3
Doing expensive math
2
Doing expensive math
4
If you dont think it does it everytime, lets try it again
Doing expensive math
1
Doing expensive math
3
Doing expensive math
2
Doing expensive math
4
Enumerables evaluate lazily (only when required). Add a .ToList() after the select and it will force evaluation.
LINQ has lazy evaluation methods and Select is one of them.
And the thing is you are using foreach two times and it prints the values two times.
The Select causes the iterator to be... iterated.
If it is expensive to build the result, you can .ToList() the result once, then use that list going forward.
List<int> resultAsList = ls.ToList();
// Use resultAsList in each of the foreach statements
When you are building the query
IEnumerable<int> ls = test.Select(s => { if (expensivePrint) { Console.WriteLine("Doing expensive math"); } return s.Length; });
It does not actually EXECUTE and cache the result as you are apparently expecting. This is called "deferred execution".
It just builds the query. The execution of the query actually takes place when the foreach statement is called on the query.
If you call ToList() or ToArray() or Sum() or Average() or any operator of the kind on your query, it will however execute it IMMEDIATELY.
If you want to keep the result of the query, the best thing to do is to cache it in an array or list by calling ToList() or ToArray(), and then enumerate that list or array rather than the constructed query.
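A small sketch of the difference, counting how often the selector runs when the sequence is enumerated twice. The CachingSketch class and its counter are made up for illustration; the input array is the one from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CachingSketch
{
    // Returns how many times the selector runs when the sequence is enumerated twice.
    public static int CountSelectorCalls(bool cacheWithToList)
    {
        string[] test = { "a", "sdj", "bb", "d444" };
        int calls = 0;

        IEnumerable<int> lengths = test.Select(s => { calls++; return s.Length; });
        if (cacheWithToList)
        {
            lengths = lengths.ToList(); // runs the selector once per item, right here
        }

        foreach (var v in lengths) { }  // first pass
        foreach (var v in lengths) { }  // second pass
        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(CountSelectorCalls(cacheWithToList: false)); // 8
        Console.WriteLine(CountSelectorCalls(cacheWithToList: true));  // 4
    }
}
```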
Please refer to the documentation of Enumerable.Select
This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.
By iterating the result of the Select method, the query is executed. foreach is one way to iterate that result. ToArray is another.
Is there a practical reason why it runs my code everytime?
Yes. If execution were not deferred, more iteration would be performed than necessary. Here, enumeration can stop as soon as Take(5) has its five items:
IEnumerable<string> query = Enumerable.Range(0, 1000000)
.Select(x => x.ToString())
.Where(s => s.Length == 6)
.Take(5);
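To see the saving, here is a hedged sketch that counts how many values are actually converted to strings. The DeferredSketch class, the counter, and the length-3 filter are my own choices for illustration, picked so that matches exist early in the range.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredSketch
{
    // Builds a lazy pipeline, enumerates it once, and reports how many
    // values the Select step had to convert along the way.
    public static (int Conversions, List<string> Items) Run()
    {
        int conversions = 0;

        IEnumerable<string> query = Enumerable.Range(0, 100000)
            .Select(x => { conversions++; return x.ToString(); })
            .Where(s => s.Length == 3)
            .Take(5);

        // Nothing has run yet; ToList() pulls items through the pipeline one at a time.
        List<string> items = query.ToList();
        return (conversions, items);
    }

    public static void Main()
    {
        var (conversions, items) = Run();
        Console.WriteLine(string.Join(", ", items)); // 100, 101, 102, 103, 104
        // Only the values up to the fifth match are converted, not all 100000.
        Console.WriteLine($"Converted {conversions} values");
    }
}
```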
This is LINQ's deferred execution. If you need a concise yet complete explanation, read this:
http://weblogs.asp.net/dixin/archive/2010/03/16/understanding-linq-to-objects-6-deferred-execution.aspx
I would suggest using .ToArray(), which returns an int[] for you and may give better performance.
The reasoning is that an int[] is declared and created at once, whereas a List<T> is built up item by item at runtime.
int[] array = test.Select(s =>
{
if (expensivePrint)
{
Console.WriteLine("Doing expensive math");
}
return s.Length;
}).ToArray();
Consider the requirement to change a data member on one or more properties of an object that is 5 or 6 levels deep.
There are sub-collections that need to be iterated through to get to the property that needs inspection & modification.
Here we're calling a method that cleans the street address of an Employee. Since we're changing data within the loops, the current implementation needs a for loop to prevent the exception:
Cannot assign to "someVariable" because it is a 'foreach iteration variable'
Here's the current algorithm (obfuscated) with nested foreach and a for.
foreach (var emp in company.internalData.Emps)
{
foreach (var addr in emp.privateData.Addresses)
{
int numberAddresses = addr.Items.Length;
for (int i = 0; i < numberAddresses; i++)
{
//transform this street address via a static method
if (addr.Items[i].Type =="StreetAddress")
addr.Items[i].Text = CleanStreetAddressLine(addr.Items[i].Text);
}
}
}
Question:
Can this algorithm be reimplemented using LINQ? The requirement is for the original collection to have its data changed by that static method call.
Update: I was thinking/leaning in the direction of a jQuery/selector type solution. I didn't specifically word this question in that way. I realize that I was over-reaching on that idea (no side-effects). Thanks to everyone! If there is such a way to perform a jQuery-like selector, please let's see it!
foreach(var item in company.internalData.Emps
.SelectMany(emp => emp.privateData.Addresses)
.SelectMany(addr => addr.Items)
.Where(addr => addr.Type == "StreetAddress"))
item.Text = CleanStreetAddressLine(item.Text);
var dirtyAddresses = company.internalData.Emps.SelectMany( x => x.privateData.Addresses )
.SelectMany(y => y.Items)
.Where( z => z.Type == "StreetAddress");
foreach(var addr in dirtyAddresses)
addr.Text = CleanStreetAddressLine(addr.Text);
LINQ is not intended to modify sets of objects. You wouldn't expect a SQL SELECT statement to modify the values of the rows being selected, would you? It helps to remember what LINQ stands for: Language INtegrated Query. Modifying objects within a LINQ query is, IMHO, an anti-pattern.
Stan R.'s answer would be a better solution using a foreach loop, I think.
I don't like mixing "query comprehension" syntax and dotted-method-call syntax in the same statement.
I do like the idea of separating the query from the action. These are semantically distinct, so separating them in code often makes sense.
var addrItemQuery = from emp in company.internalData.Emps
from addr in emp.privateData.Addresses
from addrItem in addr.Items
where addrItem.Type == "StreetAddress"
select addrItem;
foreach (var addrItem in addrItemQuery)
{
addrItem.Text = CleanStreetAddressLine(addrItem.Text);
}
A few style notes about your code; these are personal, so you may not agree:
In general, I avoid abbreviations (Emps, emp, addr)
Inconsistent names are more confusing (addr vs. Addresses): pick one and stick with it
The word "number" is ambiguous. It can either be an identity ("Prisoner number 378 please step forward.") or a count ("the number of sheep in that field is 12."). Since we use both concepts in code a lot, it is valuable to get this clear. I often use "index" for the first one and "count" for the second.
Having the type field be a string is a code smell. If you can make it an enum your code will probably be better off.
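As a rough sketch of that last note, assuming you control the item class: AddressItemType, AddressItem, and the trimming body of CleanStreetAddressLine are all hypothetical here, since the real types are not shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical enum replacing the string-typed Type field.
public enum AddressItemType { StreetAddress, City, PostalCode }

// Hypothetical shape of one address item.
public sealed class AddressItem
{
    public AddressItemType Type { get; set; }
    public string Text { get; set; } = "";
}

public static class EnumSketch
{
    // Placeholder for the question's CleanStreetAddressLine; here it only trims whitespace.
    public static string CleanStreetAddressLine(string line) => line.Trim();

    // The query selects the items; the loop performs the change.
    public static void CleanStreetAddresses(IEnumerable<AddressItem> items)
    {
        foreach (var item in items.Where(i => i.Type == AddressItemType.StreetAddress))
        {
            item.Text = CleanStreetAddressLine(item.Text);
        }
    }

    public static void Main()
    {
        var items = new List<AddressItem>
        {
            new AddressItem { Type = AddressItemType.StreetAddress, Text = "  1 Main St " },
            new AddressItem { Type = AddressItemType.City, Text = " Springfield " },
        };

        CleanStreetAddresses(items);

        Console.WriteLine($"[{items[0].Text}]"); // [1 Main St]
        Console.WriteLine($"[{items[1].Text}]"); // [ Springfield ]
    }
}
```

A misspelled enum member fails at compile time, whereas a misspelled "StreetAddress" string would simply never match.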
Dirty one-liner.
company.internalData.Emps.SelectMany(x => x.privateData.Addresses)
.SelectMany(x => x.Items)
.Where(x => x.Type == "StreetAddress")
.Select(x => { x.Text = CleanStreetAddressLine(x.Text); return x; })
.ToList(); // Select is deferred, so the query must be enumerated for the change to happen
LINQ does not provide the option of having side effects. However, you could do:
company.internalData.Emps.SelectMany(emp => emp.privateData.Addresses).SelectMany(addr => addr.Items).ToList().ForEach(/*either make an anonymous method or refactor your side effect code out to a method on its own*/);
You can do this, but you don't really want to. Several bloggers have talked about the functional nature of LINQ, and if you look at all the Microsoft-supplied LINQ methods, you will find that they don't produce side effects. They produce return values, but they don't change anything else. Search for the arguments over a LINQ ForEach method, and you'll get a good explanation of this concept.
With that in mind, what you probably want is something like this:
var addressItems = company.internalData.Emps.SelectMany(
emp => emp.privateData.Addresses.SelectMany(
addr => addr.Items
)
);
foreach (var item in addressItems)
{
...
}
However, if you do want to do exactly what you asked, then this is the direction you'll need to go:
var addressItems = company.internalData.Emps.SelectMany(
emp => emp.privateData.Addresses.SelectMany(
addr => addr.Items.Select(item =>
{
// Do the stuff
return item;
})
)
);
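One thing to keep in mind with this approach: because Select is deferred, the side effect does not happen until something enumerates the query. A minimal sketch, using a hypothetical Item class and Trim() in place of the real cleaning logic:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type for illustration.
public sealed class Item
{
    public string Text { get; set; } = "";
}

public static class SideEffectSketch
{
    // Returns the first item's text before and after the query is enumerated.
    public static (string Before, string After) Run()
    {
        var items = new List<Item> { new Item { Text = " a " }, new Item { Text = " b " } };

        // Building the query does not run the lambda.
        var query = items.Select(item => { item.Text = item.Text.Trim(); return item; });
        string before = items[0].Text;

        // Enumerating the query (here via ToList) is what applies the change.
        var updated = query.ToList();
        string after = items[0].Text;

        return (before, after);
    }

    public static void Main()
    {
        var (before, after) = Run();
        Console.WriteLine($"[{before}]"); // [ a ]
        Console.WriteLine($"[{after}]");  // [a]
    }
}
```

Enumerating the query a second time would run the lambda again, which is one more reason to keep the change in a plain foreach loop.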
To update the LINQ result using a foreach loop, I first create a local list variable and then perform the update inside the loop. The values are updated this way. Read more here:
How to update value of LINQ results using FOREACH loop
I cloned the list, and this worked on .NET 4.7.2:
List<TrendWords> ListCopy = new List<TrendWords>(sorted);
foreach (var words in stopWords)
{
foreach (var item in ListCopy.Where(w => w.word == words))
{
item.disabled = true;
}
}
I could have sworn that there was an extension method already built for the Queryable class that I just can't find, but maybe I'm thinking of something different.
I'm looking for something along the lines of:
IQueryable<Entity> en = from e in IDB.Entities select e;
en.ForEach(foo => foo.Status = "Complete");
en.ForEach() would essentially perform:
foreach(Entity foo in en){
foo.Status = "Complete";
}
Is this already written? If not, is it possible to write such an extension method, preferably allowing for any LINQ table and any field on that table? Where is a good place to start?
There's nothing in the base class library. Many, many developers have this in their own common library, however, and we have it in MoreLINQ too.
It sort of goes against the spirit of LINQ, in that it's all about side-effects - but it's really useful, so I think pragmatism trumps dogma here.
One thing to note - there's really no point in using a query expression in your example. (It's not entirely redundant, but if you're not exposing the value it doesn't matter.) The select isn't doing anything useful here. You can just do:
IDB.Entities.ForEach(foo => foo.Status = "Complete");
Even if you want to do a single "where" or "select" I'd normally use dot notation:
IDB.Entities.Where(foo => foo.Name == "fred")
.ForEach(foo => foo.Status = "Complete");
Only use query expressions where they actually make the code simpler :)
public static void ForEach<T>(this IEnumerable<T> sequence, Action<T> action)
{
foreach (var item in sequence)
{
action(item);
}
}
There is a ForEach method on List<T>. Roughly something along these lines:
IQueryable<Entity> en = from e in IDB.Entities select e;
en.ToList().ForEach(foo => foo.Status = "Complete");