Task fired again after WaitAll - C#

Using HttpClient.GetAsync, or any other BCL async method, inside a LINQ Select can result in the delegate strangely firing twice.
Here is a unit test case:
[TestMethod]
public void TestTwiceShoot()
{
    List<string> items = new List<string>();
    items.Add("1");
    int k = 0;
    var tasks = items.Select(d =>
    {
        k++;
        var client = new System.Net.Http.HttpClient();
        return client.GetAsync(new Uri("http://testdevserver.ibs.local:8020/prestashop/api/products/1"));
    });
    Task.WaitAll(tasks.ToArray());
    foreach (var r in tasks)
    {
    }
    Assert.AreEqual(1, k);
}
The test fails, since k is 2. Somehow the program runs the GetAsync-firing delegate twice. Why?
If I remove the foreach (var r in tasks), the test passes. Why?
[TestMethod]
public void TestTwiceShoot()
{
    List<string> items = new List<string>();
    items.Add("1");
    int k = 0;
    var tasks = items.Select(d =>
    {
        k++;
        var client = new System.Net.Http.HttpClient();
        return client.GetAsync(new Uri("http://testdevserver.ibs.local:8020/prestashop/api/products/1"));
    });
    Task.WaitAll(tasks.ToArray());
    Assert.AreEqual(1, k);
}
If I use a foreach loop instead of items.Select, the test passes. Why?
[TestMethod]
public void TestTwiceShoot()
{
    List<string> items = new List<string>();
    items.Add("1");
    int k = 0;
    var tasks = new List<Task<System.Net.Http.HttpResponseMessage>>();
    foreach (var item in items)
    {
        k++;
        var client = new System.Net.Http.HttpClient();
        tasks.Add(client.GetAsync(new Uri("http://testdevserver.ibs.local:8020/prestashop/api/products/1")));
    }
    Task.WaitAll(tasks.ToArray());
    foreach (var r in tasks)
    {
    }
    Assert.AreEqual(1, k);
}
Apparently the enumerable returned by items.Select does not play well with the Task objects it returns: as soon as I walk the enumerator again, the delegate gets fired again.
This test passes:
[TestMethod]
public void TestTwiceShoot()
{
    List<string> items = new List<string>();
    items.Add("1");
    int k = 0;
    var tasks = items.Select(d =>
    {
        k++;
        var client = new System.Net.Http.HttpClient();
        return client.GetAsync(new Uri("http://testdevserver.ibs.local:8020/prestashop/api/products/1"));
    });
    var tasksArray = tasks.ToArray();
    Task.WaitAll(tasksArray);
    foreach (var r in tasksArray)
    {
    }
    Assert.AreEqual(1, k);
}
Scott mentioned that the Select delegate may run again when walking the enumerator; however, this test passes:
[TestMethod]
public void TestTwiceShoot()
{
    List<string> items = new List<string>();
    items.Add("1");
    int k = 0;
    var tasks = items.Select(d =>
    {
        k++;
        return int.Parse(d);
    });
    foreach (var r in tasks)
    {
    }
    Assert.AreEqual(1, k);
}
I guess LINQ's Select has some special treatment for Task.
After all, what is the right way to fire multiple async methods in LINQ and then examine the results after WaitAll?

It is because tasks is an IEnumerable<Task>, and each time you enumerate it the .Select() operation is re-run. Currently you run through the query twice: once when you call .ToArray() and once when you pass it to the foreach.
To fix the problem, use .ToArray() like you already do, but move it earlier up the chain.
var tasks = items.Select(d =>
{
    k++;
    var client = new System.Net.Http.HttpClient();
    return client.GetAsync(new Uri("http://testdevserver.ibs.local:8020/prestashop/api/products/1"));
}).ToArray(); // This makes tasks a Task[] instead of an IEnumerable<Task>.
Task.WaitAll(tasks);
foreach (var r in tasks)
{
}
Situations like this are why Microsoft recommends that LINQ statements be free of side effects (like incrementing k): it is hard to tell how many times a statement will be run, especially if the resulting IEnumerable<T> leaves your scope of control by being returned as a result or passed to another function.
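As an aside, on test frameworks that support async test methods you can avoid blocking in WaitAll altogether with Task.WhenAll. A minimal sketch, reusing the question's URL; the test name TestOnceShot and the async MSTest support are assumptions on my part:
[TestMethod]
public async Task TestOnceShot()
{
    var items = new List<string> { "1" };
    int k = 0;
    // Materialize with ToArray so the delegate runs exactly once per item.
    var tasks = items.Select(d =>
    {
        k++;
        var client = new System.Net.Http.HttpClient();
        return client.GetAsync(new Uri("http://testdevserver.ibs.local:8020/prestashop/api/products/1"));
    }).ToArray();
    // WhenAll awaits every task without blocking the test thread.
    var responses = await Task.WhenAll(tasks);
    Assert.AreEqual(1, k);
}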

I think the problem is my misconception about how enumeration works. These tests pass:
[TestMethod]
public void TestTwiceShoot()
{
    List<string> items = new List<string>();
    items.Add("1");
    int k = 0;
    var tasks = items.Select(d =>
    {
        k++;
        return int.Parse(d);
    });
    foreach (var r in tasks)
    {
    }
    foreach (var r in tasks)
    {
    }
    Assert.AreEqual(2, k);
}
[TestMethod]
public void TestTwiceShoot2()
{
    List<string> items = new List<string>();
    items.Add("1");
    int k = 0;
    var tasks = items.Where(d =>
    {
        k++;
        return true;
    });
    foreach (var r in tasks)
    {
    }
    foreach (var r in tasks)
    {
    }
    Assert.AreEqual(2, k);
}
I had thought a LINQ statement returns an IEnumerable object which stores the results of the delegate. Obviously, however, it stores only a reference to the delegate, so each walk of the enumerator triggers the delegate again. Therefore, it is good to use ToArray() or ToList() to get a list of results, like this one:
[TestMethod]
public void TestTwiceShoot2()
{
    List<string> items = new List<string>();
    items.Add("1");
    int k = 0;
    var tasks = items.Where(d =>
    {
        k++;
        return true;
    }).ToList();
    foreach (var r in tasks)
    {
    }
    foreach (var r in tasks)
    {
    }
    Assert.AreEqual(1, k);
}

Related

How to await multiple IAsyncEnumerable

We have code like this:
var intList = new List<int> { 1, 2, 3 };
var asyncEnumerables = intList.Select(Foo);

private async IAsyncEnumerable<int> Foo(int a)
{
    while (true)
    {
        await Task.Delay(5000);
        yield return a;
    }
}
I need to start an await foreach over every entry of asyncEnumerables. The loop iterations should wait for one another, and when every iteration is done I need to collect each iteration's data and process it with another method.
Can I somehow achieve that with the TPL? Otherwise, could you give me some ideas?
What works for me is the Zip function in this repo (line 81). I'm using it like this:
var intList = new List<int> { 1, 2, 3 };
var asyncEnumerables = intList.Select(RunAsyncIterations);
var enumerableToIterate = async_enumerable_dotnet.AsyncEnumerable.Zip(s => s, asyncEnumerables.ToArray());

await foreach (int[] enumerablesConcatenation in enumerableToIterate)
{
    Console.WriteLine(enumerablesConcatenation.Sum()); // Sum returns 6
    await Task.Delay(2000);
}

static async IAsyncEnumerable<int> RunAsyncIterations(int i)
{
    while (true)
        yield return i;
}
Here is a generic method Zip you could use, implemented as an iterator. The cancellationToken is decorated with the EnumeratorCancellation attribute, so that the resulting IAsyncEnumerable is WithCancellation friendly.
using System.Runtime.CompilerServices;

public static async IAsyncEnumerable<TSource[]> Zip<TSource>(
    IEnumerable<IAsyncEnumerable<TSource>> sources,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    var enumerators = sources
        .Select(x => x.GetAsyncEnumerator(cancellationToken))
        .ToArray();
    try
    {
        while (true)
        {
            var array = new TSource[enumerators.Length];
            for (int i = 0; i < enumerators.Length; i++)
            {
                if (!await enumerators[i].MoveNextAsync()) yield break;
                array[i] = enumerators[i].Current;
            }
            yield return array;
        }
    }
    finally
    {
        foreach (var enumerator in enumerators)
        {
            await enumerator.DisposeAsync();
        }
    }
}
Usage example:
await foreach (int[] result in Zip(asyncEnumerables))
{
    Console.WriteLine($"Result: {String.Join(", ", result)}");
}
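Because of the EnumeratorCancellation attribute, cancellation can be hooked up through WithCancellation. A small sketch; the CancellationTokenSource and its 30-second timeout are assumptions for illustration:
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
await foreach (int[] result in Zip(asyncEnumerables).WithCancellation(cts.Token))
{
    Console.WriteLine($"Result: {String.Join(", ", result)}");
}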

Starting tasks inside another task is duplicating my WebRequests

I use the code below to check some PDF files online and return a string accordingly.
The problem is: when I added the second Task.Factory.StartNew(), it started duplicating all requests, though still returning only one answer (as it should).
I need this to be as fast as possible, so I can't waste time sending two requests to the server.
public static void Main(string[] args)
{
    var listT = new List<string>()
    {
        "24006025062"
    };
    var task = listT.Select(x => Task.Factory.StartNew(() => TesteTask(x)));
    Task.WaitAll(task.ToArray(), TimeSpan.FromSeconds(120));
    List<string> results = new List<string>();
    foreach (var result in task)
    {
        results.Add(result.Result);
    }
}

private static string TesteTask(string codCart)
{
    var teste = new Consulta();
    var retorno = string.Empty;
    var session = teste.GetCaptcha();
    for (int i = 0; i < 10; i++)
    {
        session.CaptchaResolvida = QuebraCaptcha(session.CaptchaCodificada).CaptchaResolvida;
        if (session.CaptchaResolvida.Length > 0)
        {
            var links = teste.Consulta(codCart, session).Retorno;
            if (links.Any())
            {
                var tasks = links.Select(x => Task.Factory.StartNew(() => Executa(teste, session, x)));
                Task.WaitAll(tasks.ToArray(), TimeSpan.FromSeconds(120));
                var modelList = from Result in tasks select Result.Result;
                retorno = teste.FinalizaProcesso(modelList.ToList());
                break;
            }
        }
    }
    return retorno;
}

private static string Executa(Consulta teste, Model<Request> session, string link)
{
    var retorno = string.Empty;
    for (int i = 0; i < 10; i++)
    {
        var CaptchaResolvida = QuebraCaptcha(teste.GetCaptchaPdf(session)).CaptchaResolvida;
        if (CaptchaResolvida != null && CaptchaResolvida != string.Empty)
        {
            var status = teste.BaixaPdf(link, CaptchaResolvida, session);
            if (status != string.Empty)
            {
                retorno = status;
                break;
            }
        }
    }
    return retorno;
}
PS: This is my first post on Stack Overflow; if I'm not clear enough, please let me know!
You are getting this behavior because you are iterating twice over the IEnumerable returned by Select. Try this:
public static void Main(string[] args)
{
    var listT = new List<string>()
    {
        "24006025062"
    };
    var task = listT
        .Select(x => Task.Factory.StartNew(() => TesteTask(x)))
        .ToArray();
    Task.WaitAll(task, TimeSpan.FromSeconds(120));
    List<string> results = new List<string>();
    foreach (var result in task)
    {
        results.Add(result.Result);
    }
}
Moving the ToArray() to just after the Select() materializes the results once, instead of re-running the query on each enumeration.
Hope it helps!
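For what it's worth, on C# 7.1 and later the same fan-out can be written without blocking at all. A minimal sketch, not part of the original answer; the 120-second timeout is omitted for brevity:
public static async Task Main(string[] args)
{
    var listT = new List<string> { "24006025062" };
    // Task.Run schedules each TesteTask call on the thread pool;
    // ToArray materializes the query so it is not re-run later.
    var tasks = listT.Select(x => Task.Run(() => TesteTask(x))).ToArray();
    // WhenAll awaits completion without blocking a thread.
    string[] results = await Task.WhenAll(tasks);
}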

Parallel execution issue

I need a little education here with regard to the execution of parallel tasks.
I have created a small fiddle:
https://dotnetfiddle.net/JO2a4m
What I am trying to do is send a few accounts in batches to another method for processing, creating a unit of work (a task) for each batch; but when I execute the tasks, it only executes the last task that was added. This is something I am trying to wrap my head around.
Code:
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class Program
{
    public static void Main()
    {
        var accounts = GenerateAccount();
        var accountsProcess = new List<Account>();
        var taskList = new List<Task>();
        var batch = 4;
        var count = 0;
        foreach (var account in accounts)
        {
            if (count == batch)
            {
                taskList.Add(new Task(() => ProcessAccount(accountsProcess)));
                count = 0;
                accountsProcess.Clear();
            }
            count++;
            accountsProcess.Add(account);
        }
        Parallel.ForEach(taskList, t =>
        {
            t.Start();
        });
        Task.WaitAll(taskList.ToArray());
        if (accountsProcess.Count > 0)
            ProcessAccount(accountsProcess);
    }

    public static List<Account> GenerateAccount()
    {
        var accounts = new List<Account>();
        var first = "First";
        var second = "Second";
        for (int i = 0; i <= 1000; i++)
        {
            var account = new Account();
            account.first = first + i;
            account.second = second + i;
            accounts.Add(account);
        }
        return accounts;
    }

    public static void ProcessAccount(List<Account> accounts)
    {
        Console.WriteLine(accounts.Count);
        foreach (var account in accounts)
        {
            Console.WriteLine(account.first + account.second);
        }
    }
}

public class Account
{
    public string first;
    public string second;
}
foreach (var account in accounts)
{
    if (count == batch)
    {
        taskList.Add(new Task(() => ProcessAccount(accountsProcess)));
        count = 0;
        accountsProcess.Clear();
    }
    count++;
    accountsProcess.Add(account);
}
The issue is that all of the Tasks are sharing the same List<Account> object.
I would suggest changing the code to:
foreach (var account in accounts)
{
    if (count == batch)
    {
        var bob = accountsProcess;
        taskList.Add(new Task(() => ProcessAccount(bob)));
        count = 0;
        accountsProcess = new List<Account>();
    }
    count++;
    accountsProcess.Add(account);
}
By using bob and assigning a new List to accountsProcess, we ensure each Task gets its own List rather than sharing a single one.
Also, consider using MoreLINQ's Batch rather than rolling your own, as sketched below.
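A minimal sketch of that suggestion, assuming the morelinq NuGet package is installed; the variable names mirror the question's code:
using MoreLinq; // provides the Batch extension

foreach (var chunk in accounts.Batch(4))
{
    // ToList snapshots the chunk, so every Task closes over its own list.
    var batchList = chunk.ToList();
    taskList.Add(new Task(() => ProcessAccount(batchList)));
}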

HashSet First() + Remove() performance

I encountered a performance issue today involving HashSet.Remove, and I'm still unclear on what the problem was. The question I'm left with is: why is the Approach2 method significantly faster than the Approach1 method? I'm assuming it's the calls to HashSet.Remove, but the MSDN docs say HashSet.Remove is O(1).
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class HashSetTester
{
    int TestNum = 20000;

    public void Run()
    {
        var hashset2 = CreateTestHashSet();
        var watch2 = new Stopwatch();
        watch2.Start();
        Approach2(hashset2);
        watch2.Stop();

        var hashset1 = CreateTestHashSet();
        var watch1 = new Stopwatch();
        watch1.Start();
        Approach1(hashset1);
        watch1.Stop();

        Console.WriteLine("Approach1 is {0:0.0}x slower than Approach2", watch1.Elapsed.TotalSeconds / watch2.Elapsed.TotalSeconds);
    }

    HashSet<object> CreateTestHashSet()
    {
        var result = new HashSet<object>();
        var rnd = new Random();
        for (int i = 0; i < TestNum; i++)
        {
            result.Add(rnd.Next());
        }
        return result;
    }

    void Approach1(HashSet<object> hashset)
    {
        while (hashset.Any())
        {
            var instance = hashset.First();
            hashset.Remove(instance);
            DoSomething(instance, hashset);
        }
    }

    void Approach2(HashSet<object> hashset)
    {
        var tempItems = new List<object>();
        while (hashset.Any())
        {
            tempItems.Clear();
            tempItems.AddRange(hashset);
            hashset.Clear();
            foreach (var instance in tempItems)
            {
                DoSomething(instance, hashset);
            }
        }
    }

    void DoSomething(object obj, HashSet<object> hashset)
    {
        // In some cases, hashset would be added to here
    }

    public static void Main()
    {
        new HashSetTester().Run();
    }
}
If you need to do something for every entry of the hashset and then just remove the entries, why not snapshot the set with LINQ and clear it instead? (HashSet<T> has no ForEach method, so a plain foreach over a ToList snapshot stands in for the ForEach call here.)
foreach (var item in hashSet.ToList())
{
    DoSomething(item, hashSet);
}
hashSet.Clear();
This will probably give you acceptable performance when dealing with big collections. As for the original "why": Approach1 is likely slow because HashSet<T>'s enumerator scans the internal slot array from the start and Remove only marks slots as free, so each First() call has to skip an ever-growing run of freed slots, making the drain loop effectively O(n²) even though each individual Remove is O(1).
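A small repro sketch of that effect (my addition, not from the original answer), at least against the classic .NET Framework implementation of HashSet<T>:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class HashSetDrainDemo
{
    static void Main()
    {
        var set = new HashSet<int>(Enumerable.Range(0, 50000));
        var sw = Stopwatch.StartNew();
        while (set.Count > 0)
        {
            // Each First() re-scans the slot array from index 0,
            // skipping every slot freed by the previous Remove calls.
            set.Remove(set.First());
        }
        Console.WriteLine($"First()+Remove() drain took {sw.Elapsed}");
    }
}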

C# TPL lock shared object between tasks vs populating it with the results from Task(TResult) tasks

In my use of the Task Parallel Library in .NET 4.0, I am wondering about the best way to merge the results from my parallel tasks. Normally I would take a lock on an object shared between them, but now I wonder whether using the Task<TResult> class and merging at the end is a better solution. Below are the two approaches I am considering.
TPL with lock:
public MyObject DoWork()
{
    var result = new MyObject();
    var resultLock = new object();
    var taskArray = new Task[this._objects.Length];
    for (var i = 0; i < taskArray.Length; i++)
    {
        taskArray[i] = new Task((obj) =>
        {
            var _o = obj as AnObject;
            var tmpResult = _o.DoTaskWork();
            lock (resultLock)
                result.Add(tmpResult);
        }, this._objects[i]);
        taskArray[i].Start(); // the tasks must be started, or WaitAll never returns
    }
    Task.WaitAll(taskArray);
    return result;
}
And TPL with merging at the end:
public MyObject DoWork()
{
    var taskArray = new Task<String>[this._objects.Length];
    for (var i = 0; i < taskArray.Length; i++)
    {
        taskArray[i] = new Task<String>((obj) =>
        {
            var _o = obj as AnObject;
            return _o.DoTaskWork();
        }, this._objects[i]);
        taskArray[i].Start(); // likewise, start before waiting
    }
    Task.WaitAll(taskArray);
    var result = new MyObject();
    for (var i = 0; i < taskArray.Length; i++)
        result.Add(taskArray[i].Result);
    return result;
}
Is there a right or wrong solution, and what is best practice? (Other possible solutions for merging results from parallel tasks are most welcome.)
Using Parallel LINQ (which does all the nasty stuff for you), you could boil this down to a line or so:
var workResults = this._objects.AsParallel().Select(o => o.DoTaskWork());
foreach (var r in workResults)
{
    result.Add(r);
}
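Folded into the question's DoWork shape, a minimal sketch (assuming the same MyObject, AnObject, and _objects members as above):
public MyObject DoWork()
{
    var result = new MyObject();
    // AsParallel runs DoTaskWork on worker threads, but the foreach
    // consumes the results on the calling thread, so no lock is needed.
    foreach (var r in this._objects.AsParallel().Select(o => o.DoTaskWork()))
    {
        result.Add(r);
    }
    return result;
}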
