Parallel.ForEach not setting all values in loop - c#

I am querying a SQL database for some employees.
When I receive these employees, I loop over each one using Parallel.ForEach.
The only reason I'm looping over the employees retrieved from the database is to expand a few of the properties that I do not want to clutter up the database with.
In this example I am attempting to set the Avatar for the current employee in the loop, but only one out of three ever gets set; none of the other employees' Avatars are set to their correct URI. Basically, I'm taking the file name of the avatar and building the full path to the user's folder.
What am I doing wrong, such that each employee's Avatar is not updated with the full path to their directory, the way it is for the one that does get set?
I'm sure I've got the code structured incorrectly. I've looked at the Parallel Tasks window and it does indeed create 4 parallel tasks on 6 threads.
Can someone point out the correct way to structure the code to use Parallel?
Also, if I remove the return await Task.Run(() => ...) from the GetEmployees method, I get an error saying the task cannot finish because some other task finished first.
Parallel is acting as if it is only setting the Avatar for one of the employees.
---Caller
public async static Task<List<uspGetEmployees_Result>> GetEmployess(int professionalID, int startIndex, int pageSize, string where, string equals)
{
var httpCurrent = HttpContext.Current;
return await Task.Run(() =>
{
List<uspGetEmployees_Result> emps = null;
try
{
using (AFCCInc_ComEntities db = new AFCCInc_ComEntities())
{
var tempEmps = db.uspGetEmployees(professionalID, startIndex, pageSize, where, equals);
if (tempEmps != null)
{
emps = tempEmps.ToList<uspGetEmployees_Result>();
Parallel.ForEach<uspGetEmployees_Result>(
emps,
async (e) =>
{
e.Avatar = await Task.Run(() => BuildUserFilePath(e.Avatar, e.UserId, httpCurrent, true));
}
);
};
}
}
catch (SqlException ex)
{
throw ex;
};
return emps;
});
}
--Callee
static string BuildUserFilePath(object fileName, object userProviderKey, HttpContext context, bool resolveForClient = false)
{
return string.Format("{0}/{1}/{2}",
resolveForClient ?
context.Request.Url.AbsoluteUri.Replace(context.Request.Url.PathAndQuery, "") : "~",
_membersFolderPath + AFCCIncSecurity.Folder.GetEncryptNameForSiteMemberFolder(userProviderKey.ToString(), _cryptPassword),
fileName.ToString());
}
----------------------------------------Edit------------------------------------
The final code that I'm using with everyone's help. Thanks so much!
public async static Task<List<uspGetEmployees_Result>> GetEmployess(int professionalID, int startIndex, int pageSize, string where, string equals)
{
var httpCurrent = HttpContext.Current;
List<uspGetEmployees_Result> emps = null;
using (AFCCInc_ComEntities db = new AFCCInc_ComEntities())
{
emps = await Task.Run(() => (db.uspGetEmployees(professionalID, startIndex, pageSize, where, equals) ?? Enumerable.Empty<uspGetEmployees_Result>()).ToList());
if (emps.Count() == 0) { return null; }
int skip = 0;
while (true)
{
// Do parallel processing in "waves".
var tasks = emps
.Skip(skip) // Skip the employees handled in earlier waves, then take the next wave.
.Take(Environment.ProcessorCount)
.Select(e => Task.Run(() => e.Avatar = BuildUserFilePath(e.Avatar, e.UserId, httpCurrent, true))) // No await here - we just want the tasks.
.ToArray();
if (tasks.Length == 0) { break; }
skip += Environment.ProcessorCount;
await Task.WhenAll(tasks);
};
}
return emps;
}

Your definition of BuildUserFilePath and its usage are inconsistent. The definition clearly states that it's a string-returning method, whereas its usage implies that it returns a Task<>.
Parallel.ForEach and async don't mix very well - that's the reason your bug happened in the first place.
Unrelated but still worth noting: your try/catch is redundant as all it does is rethrow the original SqlException (and even that it doesn't do very well because you'll end up losing the stack trace).
Do you really, really want to return null?
public async static Task<List<uspGetEmployees_Result>> GetEmployess(int professionalID, int startIndex, int pageSize, string where, string equals)
{
var httpCurrent = HttpContext.Current;
// Most of these operations are unlikely to be time-consuming,
// so why await the whole thing?
using (AFCCInc_ComEntities db = new AFCCInc_ComEntities())
{
// I don't really know what exactly uspGetEmployees returns
// and, if it's an IEnumerable, whether it yields its elements lazily.
// The fact that it can be null, however, bothers me, so I'll sidestep it.
List<uspGetEmployees_Result> emps = await Task.Run(() =>
(db.uspGetEmployees(professionalID, startIndex, pageSize, where, equals) ?? Enumerable.Empty<uspGetEmployees_Result>()).ToList()
);
// I'm assuming that BuildUserFilePath returns string - no async.
await Task.Run(() =>
{
Parallel.ForEach(emps, e =>
{
// NO async/await within the ForEach delegate body.
e.Avatar = BuildUserFilePath(e.Avatar, e.UserId, httpCurrent, true);
});
});
return emps;
}
}
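To see why Parallel.ForEach and async lambdas don't mix, here is a minimal sketch (the class and method names are made up for illustration and are not from the question). Parallel.ForEach only accepts Action delegates, so an async lambda is compiled as async void: the loop treats an iteration as finished as soon as the lambda reaches its first incomplete await, and it can return before any of the assignments have happened.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelAsyncDemo
{
    // Hypothetical demo: how many iterations have completed their work
    // at the moment Parallel.ForEach returns, when the body is an async lambda.
    public static int CountCompletedAfterAsyncForEach()
    {
        int completed = 0;
        Parallel.ForEach(Enumerable.Range(0, 8), async i =>
        {
            // The lambda returns to Parallel.ForEach at this await (async void),
            // so the loop does not wait for the increment below.
            await Task.Delay(500);
            Interlocked.Increment(ref completed);
        });
        // Usually still 0 here, because the delays have not elapsed yet.
        return Volatile.Read(ref completed);
    }

    // Same loop with a synchronous body: Parallel.ForEach blocks until
    // every iteration has finished.
    public static int CountCompletedAfterSyncForEach()
    {
        int completed = 0;
        Parallel.ForEach(Enumerable.Range(0, 8), i =>
        {
            Thread.Sleep(20);
            Interlocked.Increment(ref completed);
        });
        return completed;
    }
}
```

With the synchronous body all eight iterations are done when the loop returns, which is why the answer above keeps async/await out of the ForEach delegate.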

There seems to be over-use of async and Task.Run() in this code. For example, what are you hoping to achieve from this segment?
Parallel.ForEach<uspGetEmployees_Result>(
emps,
async (e) =>
{
e.Avatar = await Task.Run(() => BuildUserFilePath(e.Avatar, e.UserId, httpCurrent, true));
}
);
You're already using await on the result of the entire method, and you've used a Parallel.ForEach to get parallel execution of items in your loop, so what does the additional use of await Task.Run() get you? The code would certainly be a lot easier to follow without it.
It is not clear to me what you are trying to achieve here. Can you describe what your objectives are for this method?

Related

C# LanguageExt - combine multiple async calls into one grouped call

I have a method that looks up an item asynchronously from a datastore;
class MyThing {}
Task<Try<MyThing>> GetThing(int thingId) {...}
I want to look up multiple items from the datastore, and wrote a new method to do this. I also wrote a helper method that will take multiple Try<T> and combine their results into a single Try<IEnumerable<T>>.
public static class TryExtensions
{
public static Try<IEnumerable<T>> Collapse<T>(this IEnumerable<Try<T>> items)
{
var failures = items.Fails().ToArray();
return failures.Any() ?
Try<IEnumerable<T>>(new AggregateException(failures)) :
Try(items.Select(i => i.Succ(a => a).Fail(Enumerable.Empty<T>())));
}
}
async Task<Try<MyThing[]>> GetThings(IEnumerable<string> ids)
{
var results = new List<Try<MyThing>>();
foreach (var id in ids)
{
var thing = await GetThing(id);
results.Add(thing);
}
return results.Collapse().Map(p => p.ToArray());
}
Another way to do it would be like this;
async Task<Try<MyThing[]>> GetThings(IEnumerable<string> ids)
{
var tasks = ids.Select(async id => await GetThing(id)).ToArray();
await Task.WhenAll(tasks);
return tasks.Select(t => t.Result).Collapse().Map(p => p.ToArray());
}
The problem with this is that all the tasks will run in parallel and I don't want to hammer my datastore with lots of parallel requests. What I really want is to make my code functional, using monadic principles and features of LanguageExt. Does anyone know how to achieve this?
Update
Thanks for the suggestion @MatthewWatson, this is what it looks like with the SemaphoreSlim;
async Task<Try<MyThing[]>> GetThings(IEnumerable<string> ids)
{
var mutex = new SemaphoreSlim(1);
var tasks = ids.Select(async id =>
{
await mutex.WaitAsync();
try { return await GetThing(id); }
finally { mutex.Release(); }
}).ToArray();
await Task.WhenAll(tasks);
return tasks.Select(t => t.Result).Collapse().Map(Enumerable.ToArray);
}
Problem is, this is still not very monadic / functional, and ends up with more lines of code than my original code with a foreach block.
In the "Another way" you almost achieved your goal when you called:
var tasks = ids.Select(async id => await GetThing(id)).ToArray();
Except that the tasks don't run sequentially, so you end up with many queries hitting your datastore at once. That is caused by .ToArray(): as soon as you call it, the sequence is enumerated and every task is created and started (Task.WhenAll then merely waits for them). If you can "tolerate" one foreach, you can make the tasks run sequentially like this:
public static class TaskExtensions
{
public static async Task RunSequentially<T>(this IEnumerable<Task<T>> tasks)
{
foreach (var task in tasks) await task;
}
}
That said, running a "loop" of queries is not good practice in general,
unless it happens in some background service or another special scenario.
Pushing the work to the database engine through
WHERE thingId IN (...) is generally the better option. Even if you
have a large number of thingIds, you can slice them into batches of 10s or 100s to
narrow the WHERE IN footprint.
Back to our RunSequentially, I would like to make it more functional like this for example:
tasks.ToList().ForEach(async task => await task);
But sadly this will still run kinda "Parallel" tasks.
So the final usage should be:
async Task<Try<MyThing[]>> GetThings(IEnumerable<string> ids)
{
var tasks = ids.Select(id => GetThing(id));// remember don't use .ToArray or ToList...
await tasks.RunSequentially();
return tasks.Select(t => t.Result).Collapse().Map(p => p.ToArray());
}
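One caveat with a lazy Select: every enumeration invokes the selector again, so awaiting the sequence in one pass and reading t.Result in a second pass calls GetThing more than once per id. A minimal sketch that avoids this by collecting the results during a single sequential pass follows. SelectSequentiallyAsync is a hypothetical helper name, not part of LanguageExt or the BCL, and it works on plain Task<T> so the result can still be passed to Collapse afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SequentialExtensions
{
    // Hypothetical helper: runs the async selector for one item at a time,
    // in order, and returns the results. Each item is processed exactly once.
    public static async Task<List<TResult>> SelectSequentiallyAsync<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, Task<TResult>> selector)
    {
        var results = new List<TResult>();
        foreach (var item in source)
        {
            // The next call does not start until this one has completed.
            results.Add(await selector(item));
        }
        return results;
    }
}
```

Under these assumptions the usage would be along the lines of (await ids.SelectSequentiallyAsync(GetThing)).Collapse().Map(p => p.ToArray()).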
Another overkill functional solution is to get Lazy in a Queue recursively !!
Instead of GetThing, use a lazy version, GetLazyThing, that returns Lazy<Task<Try<MyThing>>> simply by wrapping GetThing:
new Lazy<Task<Try<MyThing>>>(() => GetThing(id))
Now using couple extensions/functions:
public static async Task RecRunSequentially<T>(this IEnumerable<Lazy<Task<T>>> tasks)
{
var queue = tasks.EnqueueAll();
await RunQueue(queue);
}
public static Queue<T> EnqueueAll<T>(this IEnumerable<T> list)
{
var queue = new Queue<T>();
list.ToList().ForEach(m => queue.Enqueue(m));
return queue;
}
public static async Task RunQueue<T>(Queue<Lazy<Task<T>>> queue)
{
if (queue.Count > 0)
{
var task = queue.Dequeue();
await task.Value; // this unwraps the Lazy object content
await RunQueue(queue);
}
}
Finally:
var lazyTasks = ids.Select(id => GetLazyThing(id));
await lazyTasks.RecRunSequentially();
// Now collapse and map as you like
Update
However if you don't like the fact that EnqueueAll and RunQueue are not "pure", we can take the following approach with the same Lazy trick
public static async Task AwaitSequentially<T>(this Lazy<Task<T>>[] array, int index = 0)
{
if (array == null || index < 0 || index >= array.Length) return; // stop only after the last element has been awaited
await array[index].Value;
await AwaitSequentially(array, index + 1); // ++index is not pure :)
}
Now:
var lazyTasks = ids.Select(id => GetLazyThing(id));
await lazyTasks.ToArray().AwaitSequentially();
// Now collapse and map as you like

Why is my IEnumerable<T>.Where iterator not executing inside a using block while Async call?

I'm sure this is not a Dapper issue however I am finding, in the following snippet, that the predicate supplied to the Where function is never executed.
private async Task<IEnumerable<Product>> GetProducts()
{
using (var connection = await _connectionFactory.Create())
{
var products = await connection.QueryAsync<Product>("select * from Products");
return products.Where(p => p.Active);
}
}
However if I move the operation to outside the using it is executed.
private async Task<IEnumerable<Product>> GetProducts()
{
var products = Enumerable.Empty<Product>();
using (var connection = await _connectionFactory.Create())
{
products = await connection.QueryAsync<Product>("select * from Products");
}
return products.Where(p => p.Active);
}
Is there some sort of deferred execution going on?
In the first example, change the return statement to:
return products.Where(p => p.Active).ToList();
Then it will work as expected.
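As for the question of whether deferred execution is involved: LINQ's Where is deferred, so the predicate runs only when the returned sequence is enumerated, which may be in the caller rather than inside the using block. A minimal sketch, independent of Dapper and using made-up names, that counts predicate calls before and after enumeration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns the number of predicate calls observed before and after
    // the query is enumerated with ToList().
    public static (int Before, int After) CountPredicateCalls()
    {
        int calls = 0;
        var source = new List<int> { 1, 2, 3, 4 };

        // Where only builds the query object; the predicate has not run yet.
        IEnumerable<int> query = source.Where(n => { calls++; return n % 2 == 0; });
        int before = calls;

        // ToList enumerates the query, running the predicate once per element.
        query.ToList();
        return (before, calls);
    }
}
```

If a breakpoint inside the predicate is never hit within GetProducts itself, this is one possible explanation: the filter has not been enumerated yet at the point where the method returns.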
Case 1:
The issue here is that the Where clause applied to the IEnumerable<Product> uses deferred execution, and the deferred query is returned wrapped in the Task, as Task<IEnumerable<Product>>. The predicate only runs when the result is actually enumerated. I'm not sure how you are consuming the Task, or whether there is an issue with wrapping a deferred query in this manner, but the end result is that the predicate does not take effect as expected, even though it is applied to the Dapper result, which is buffered by default (no streaming).
Case 2:
It works in the second case since you are completely getting rid of deferred execution: Enumerable.Empty<Product>() ensures that memory is allocated first, so the predicate is executed the moment it's applied and there's no deferred execution. In fact, the predicate is applied outside the using block anyway.
In the async method, you are disposing the connection with the using block. Most likely, since Dapper allocates memory internally, all the data is sent across, the connection is then disposed, and the predicate is never executed. I have a similar sample, which doesn't rely on a database connection, and it works as expected, so we can deduce that disposing the connection plays a role in the predicate not executing. In the second case the predicate is applied outside the using block, so disposing the connection plays no role and the memory is already allocated.
Sample Code (using LinqPad):
async Task Main()
{
var result = await GetTest();
result.Dump();
}
public async Task<IEnumerable<Test>> GetTest()
{
var value = await GetTestDb();
return value.Where(x => x.Id == 1);
}
public async Task<IEnumerable<Test>> GetTestDb()
{
return await Task.FromResult(
new List<Test>
{
new Test{Id = 1, Name = "M"},
new Test{Id = 2, Name = "S"}
}
);
}
public class Test
{
public int Id { get; set; }
public string Name { get; set; }
}
Result: a single item with Id = 1 and Name = "M".
Your predicate is not actually working as predicate. It is simply a LINQ call.
return products.Where(p => p.Active);
When the above line executes, products has already been filled with all the rows from the table by the QueryAsync call on the earlier line.
The good thing about Dapper is that it gives you complete control over writing the query. So, if you want to filter the records, why not write the query that way?
using(var connection = ....)
{
var param = new DynamicParameters();
param.Add("@Active", 1);
var products = await connection.QueryAsync<Product>("select * from Products where Active = @Active", param);
return products;
}
You should then remove the products.Where line.
About the actual problem you asked about in the question:
I could not reproduce the problem. When I run the following code to read the output in a console application, it returns the expected results.
DbDataReader dbDataReader = new DbDataReader();
IEnumerable<Product> activeProducts = dbDataReader.GetProducts().Result;
Console.WriteLine(activeProducts.Count());
Your method, slightly modified, is below:
public class DbDataReader
{
string connectionString = @"....";
public async Task<IEnumerable<Product>> GetProducts()
{
using(var connection = await GetOpenConnection())
{
var products = await connection.QueryAsync<Product>("select * from Products;WAITFOR DELAY '00:00:05'");
return products.Where(p => p.Active);
}
}
private async Task<SqlConnection> GetOpenConnection()
{
SqlConnection sqlConnection = new SqlConnection(connectionString);
await sqlConnection.OpenAsync();
return sqlConnection;
}
}
Note that I have intentionally delayed the QueryAsync call with WAITFOR.

C# Running many async tasks the same time

I'm kinda new to async tasks.
I've a function that takes student ID and scrapes data from specific university website with the required ID.
private static HttpClient client = new HttpClient();
public static async Task<Student> ParseAsync(string departmentLink, int id, CancellationToken ct)
{
string website = string.Format(departmentLink, id);
try
{
string data;
var stream = await client.GetAsync(website, ct);
using (var reader = new StreamReader(await stream.Content.ReadAsStreamAsync(), Encoding.GetEncoding("windows-1256")))
data = reader.ReadToEnd();
//Parse data here and return Student.
} catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
And it works correctly. Sometimes though I need to run this function for a lot of students so I use the following
for(int i = ids.first; i <= ids.last; i++)
{
tasks[i - ids.first] = ParseStudentData.ParseAsync(entity.Link, i, cts.Token).ContinueWith(t =>
{
Dispatcher.Invoke(() =>
{
listview_students.Items.Add(t.Result);
//Students.Add(t.Result);
//lbl_count.Content = $"{listview_students.Items.Count}/{testerino.Length}";
});
});
}
I'm storing tasks in an array to wait for them later.
This also works fine as long as the student count stays below roughly 600; the exact limit is kind of random.
After that, every student that hasn't been parsed yet throws "A task was cancelled".
Keep in mind that, I never use the cancellation token at all.
I need to run this function on so many students it can reach ~9000 async task altogether. So what's happening?
You are basically creating a denial of service attack on the website when you are queuing up 9000 requests in such a short time frame. Not only is this causing you errors, but it could take down the website. It would be best to limit the number of concurrent requests to a more reasonable value (say 30). While there are probably several ways to do this, one that comes to mind is the following:
private async Task Test()
{
var tasks = new List<Task>();
for (int i = ids.first; i <= ids.last; i++)
{
tasks.Add(/* Do stuff */);
await WaitList(tasks, 30);
}
}
private async Task WaitList(IList<Task> tasks, int maxSize)
{
while (tasks.Count > maxSize)
{
var completed = await Task.WhenAny(tasks).ConfigureAwait(false);
tasks.Remove(completed);
}
}
Other approaches might leverage the producer/consumer pattern using .Net classes such as a BlockingCollection
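Another common way to cap concurrency is a SemaphoreSlim used as an async gate. The following is only a sketch under assumptions: Throttle and RunThrottledAsync are hypothetical names, and the work delegate stands in for a call such as ParseAsync. All tasks are created up front, but each one waits for a free slot before starting its actual work.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Hypothetical helper: runs 'work' for every item, with at most
    // 'maxConcurrency' invocations in flight at any one time.
    public static async Task<TResult[]> RunThrottledAsync<T, TResult>(
        IEnumerable<T> items,
        Func<T, Task<TResult>> work,
        int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                // Wait for a free slot before starting the real work.
                await gate.WaitAsync().ConfigureAwait(false);
                try { return await work(item).ConfigureAwait(false); }
                finally { gate.Release(); } // free the slot even if work throws
            }).ToArray();

            // Results come back in the same order as the input items.
            return await Task.WhenAll(tasks).ConfigureAwait(false);
        }
    }
}
```

For about 9000 students this still allocates one task per id, which is usually acceptable; the limit applies to how many requests are outstanding, not to how many tasks exist.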
This is what I ended up with, based on @erdomke's code:
public static async Task ForEachParallel<T>(
this IEnumerable<T> list,
Func<T, Task> action,
int dop)
{
var tasks = new List<Task>(dop);
foreach (var item in list)
{
tasks.Add(action(item));
while (tasks.Count >= dop)
{
var completed = await Task.WhenAny(tasks).ConfigureAwait(false);
tasks.Remove(completed);
}
}
// Wait for all remaining tasks.
await Task.WhenAll(tasks).ConfigureAwait(false);
}
// usage
await Enumerable
.Range(1, 500)
.ForEachParallel(i => ProcessItem(i), Environment.ProcessorCount);

How to await in a method returning list?

I have a static method that should return a list, but I want to do an await inside the method.
public static List<ContactModel> CreateSampleData()
{
var data = new List<ContactModel>();
StorageFolder musiclibrary = KnownFolders.MusicLibrary;
artists = (await musiclibrary.GetFoldersAsync(CommonFolderQuery.GroupByAlbumArtist)).ToList();
for (var i = 0; i < artists.Count; i++)
{
try
{
data.Add(new ContactModel(artists[i].Name));
}
catch { }
}
return data;
}
When I change it to
public static async Task<List<ContactModel>> CreateSampleData(){//method contents}
I get an error on another page for this code:
Error: Task<List<ContactModel>> doesn't contain a definition for ToAlphaGroups
var items = ContactModel.CreateSampleData();
data = items.ToAlphaGroups(x => x.Name);
You have to await your async method:
var items = await ContactModel.CreateSampleData();
Your method now returns a Task, and that's why you get the error message.
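A minimal, self-contained sketch of the difference, using hypothetical names and Task.Delay as a stand-in for GetFoldersAsync: calling the async method without await gives you the Task itself, and awaiting it gives you the list.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SampleData
{
    // Stand-in for CreateSampleData: an async method that returns a list.
    public static async Task<List<string>> CreateSampleDataAsync()
    {
        await Task.Delay(10); // placeholder for the real asynchronous call
        return new List<string> { "Artist A", "Artist B" };
    }

    // The caller must itself be async in order to await the result.
    public static async Task<int> CountItemsAsync()
    {
        // Without await, this is a Task<List<string>>, not a list.
        Task<List<string>> pending = CreateSampleDataAsync();

        // await unwraps the task and yields the List<string>.
        List<string> items = await pending;
        return items.Count;
    }
}
```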
I don't know whether I should mention this because I agree with the answer of Jan-Patric Ahnen.
But since you said you cannot add await to your code: Task has a property called Result that returns the "result" of the Task.
var items = ContactModel.CreateSampleData().Result;
data = items.ToAlphaGroups(x => x.Name);
A few things before you use Result:
Result blocks the calling thread, if called from the UI thread your app might become unresponsive
You should try to avoid Result at all cost and try to use await since Result can produce unexpected results.

async and await while adding elements to List<T>

I wrote method, which adds elements to the List from many sources. See below:
public static async Task<List<SearchingItem>> GetItemsToSelect()
{
List<SearchingItem> searchingItems = new List<SearchingItem>();
foreach (Place place in await GetPlaces())
{
searchingItems.Add(new SearchingItem() {
IdFromRealModel=place.Id, NameToDisplay=place.FullName,
ExtraInformation=place.Name, TypeOfSearchingItem=TypeOfSearchingItem.PLACE });
}
foreach (Group group in await GetGroups())
{
searchingItems.Add(new SearchingItem()
{
IdFromRealModel = group.Id, NameToDisplay = group.Name,
ExtraInformation = group.TypeName, TypeOfSearchingItem = TypeOfSearchingItem.GROUP
});
}
return searchingItems;
}
I tested this method and it works properly. I suppose that it works properly because the GetPlaces method returns 160 elements and GetGroups returns 3000. But I was wondering whether it will still work if the methods return elements at the same time. Should I lock the list searchingItems?
Thank you for advice.
Your items do not run at the same time: you start GetPlaces(), stop and wait for the GetPlaces() result, then go into the first loop. You then start GetGroups(), stop and wait for the GetGroups() result, then go into the second loop. Your loops are not concurrent, so you have no need to lock while adding to the list.
However, as you may have noticed, your two async methods are also not concurrent. You can easily modify your program to make them so.
public static async Task<List<SearchingItem>> GetItemsToSelect()
{
List<SearchingItem> searchingItems = new List<SearchingItem>();
var getPlacesTask = GetPlaces();
var getGroupsTask = GetGroups();
foreach (Place place in await getPlacesTask)
{
searchingItems.Add(new SearchingItem() {
IdFromRealModel=place.Id, NameToDisplay=place.FullName,
ExtraInformation=place.Name, TypeOfSearchingItem=TypeOfSearchingItem.PLACE });
}
foreach (Group group in await getGroupsTask)
{
searchingItems.Add(new SearchingItem()
{
IdFromRealModel = group.Id, NameToDisplay = group.Name,
ExtraInformation = group.TypeName, TypeOfSearchingItem = TypeOfSearchingItem.GROUP
});
}
return searchingItems;
}
What this will do will start GetPlaces(), start GetGroups(), stop and wait for GetPlaces() result, then go in to the first loop, stop and wait for GetGroups() result, then go in to the second loop.
The two loops are still not concurrent, but your two awaitable methods are, which may give you a small performance boost. I doubt you would get any benefit from making the loops concurrent; they appear to just be building models, and the overhead of making that thread-safe would not be worth it for how little work is being done.
If you really want to make it more parallel (though I doubt you will see much benefit), you can use PLINQ to build your models.
public static async Task<List<SearchingItem>> GetItemsToSelect()
{
var getPlacesTask = GetPlaces();
var getGroupsTask = GetGroups();
var places = await getPlacesTask;
//Just make the initial list from the LINQ object.
List<SearchingItem> searchingItems = places.AsParallel().Select(place=>
new SearchingItem() {
IdFromRealModel=place.Id, NameToDisplay=place.FullName,
ExtraInformation=place.Name, TypeOfSearchingItem=TypeOfSearchingItem.PLACE
}).ToList();
var groups = await getGroupsTask;
//build up a PLINQ IEnumerable
var groupSearchItems = groups.AsParallel().Select(group=>
new SearchingItem()
{
IdFromRealModel = group.Id, NameToDisplay = group.Name,
ExtraInformation = group.TypeName, TypeOfSearchingItem = TypeOfSearchingItem.GROUP
});
//The building of the IEnumerable was parallel but the adding is serial.
searchingItems.AddRange(groupSearchItems);
return searchingItems;
}
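As a variation on the concurrent version above, Task.WhenAll can wait for both fetches together before the list is built. This is only a sketch: GetPlacesAsync and GetGroupsAsync are placeholders that simulate the real calls with Task.Delay, and plain strings stand in for SearchingItem. Only one flow of control ever touches the list, so no lock is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ConcurrentFetchDemo
{
    // Placeholder for GetPlaces(): simulated asynchronous fetch.
    private static async Task<List<string>> GetPlacesAsync()
    {
        await Task.Delay(50);
        return new List<string> { "P1", "P2" };
    }

    // Placeholder for GetGroups(): simulated asynchronous fetch.
    private static async Task<List<string>> GetGroupsAsync()
    {
        await Task.Delay(50);
        return new List<string> { "G1" };
    }

    public static async Task<List<string>> GetItemsAsync()
    {
        // Start both operations before awaiting either one.
        Task<List<string>> placesTask = GetPlacesAsync();
        Task<List<string>> groupsTask = GetGroupsAsync();
        await Task.WhenAll(placesTask, groupsTask);

        // Both tasks are complete here; the adds below run one after another.
        var items = new List<string>();
        items.AddRange(await placesTask);
        items.AddRange(await groupsTask);
        return items;
    }
}
```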
