Can anyone explain the difference between using Task.Run with and without async and await, as in the code below, e.g. await Task.Run(async () => ...)?
public class AsyncRun
{
    public void Entry()
    {
        Test4().Wait();
    }

    private async Task Test4()
    {
        Console.WriteLine($"1 {DateTime.Now}");
        await Task.Run(async () => await Get());
        Console.WriteLine($"2 {DateTime.Now}");
        Console.Read();
        // 1 23/05/2020 07:52:42
        // Get 1 23/05/2020 07:52:42
        // Get 2 23/05/2020 07:52:43
        // 2 23/05/2020 07:52:43
    }

    private Task Test3()
    {
        Console.WriteLine($"1 {DateTime.Now}");
        Task.Run(async () => await Get());
        Console.WriteLine($"2 {DateTime.Now}");
        Console.Read();
        return Task.CompletedTask;
        // 1 23/05/2020 07:47:24
        // 2 23/05/2020 07:47:24
        // Get 1 23/05/2020 07:47:24
        // Get 2 23/05/2020 07:47:25
    }

    private async Task Test2()
    {
        Console.WriteLine($"1 {DateTime.Now}");
        await Task.Run(Get);
        Console.WriteLine($"2 {DateTime.Now}");
        Console.Read();
        // 1 23/05/2020 07:43:24
        // Get 1 23/05/2020 07:43:24
        // Get 2 23/05/2020 07:43:25
        // 2 23/05/2020 07:43:25
    }

    private void Test1()
    {
        Console.WriteLine($"1 {DateTime.Now}");
        Task.Run(Get);
        Console.WriteLine($"2 {DateTime.Now}");
        Console.Read();
        // 1 23/05/2020 07:41:09
        // 2 23/05/2020 07:41:09
        // Get 1 23/05/2020 07:41:09
        // Get 2 23/05/2020 07:41:10
    }

    private Task Get()
    {
        Console.WriteLine($"Get 1 {DateTime.Now}");
        Thread.Sleep(1000);
        Console.WriteLine($"Get 2 {DateTime.Now}");
        return Task.CompletedTask;
    }
}
The difference shows in your Console.WriteLine output.
When you use await Task.Run, you wait for the task you started to finish before continuing execution, so you get the following logs:
// 1 23/05/2020 07:52:42
// Get 1 23/05/2020 07:52:42
// Get 2 23/05/2020 07:52:43
// 2 23/05/2020 07:52:43
When you don't await the task you just started, you are effectively "firing and forgetting" it. Your code continues to execute while the task runs elsewhere, so the logs are:
// 1 23/05/2020 07:47:24
// 2 23/05/2020 07:47:24
// ^ Your test code finished executing, without waiting for the Get task
// Get 1 23/05/2020 07:47:24
// Get 2 23/05/2020 07:47:25
Also, there is no difference between await Task.Run(async () => await Get()) and await Task.Run(Get) in terms of what actually happens. The only difference is that the first form wraps Get in an extra async lambda to produce a Task, while Get already returns a Task, so you can pass it directly.
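To illustrate that equivalence, here is a minimal, self-contained sketch (the Get below is a stand-in for the question's Get method): Task.Run has an overload taking Func&lt;Task&gt; that unwraps the inner task automatically, so awaiting either form behaves the same.

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    public static async Task Main()
    {
        // Both calls unwrap the inner Task via the Task.Run(Func<Task>) overload,
        // so awaiting them behaves identically; the async lambda only adds an
        // extra compiler-generated state machine around Get.
        await Task.Run(async () => await Get());
        await Task.Run(Get);
        Console.WriteLine("both done");
    }

    // Stand-in for the question's Get method.
    static Task Get() => Task.CompletedTask;
}
```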
General notes
The console is not a good environment to learn or test multitasking in any way, shape or form. One big issue is keeping the application alive during multitasking without blocking continuation code; in console apps we have to do that manually. GUIs are the right environment, as they do it almost by accident: the event queue keeps the application alive while still allowing I/O to happen.
Thread.Sleep() should not be used in Get. Use the more agnostic Task.Delay(1000) instead. Using Thread classes here will cause unwanted side effects and even defeat the reason we have await in the first place.
Try it again in a GUI and without Sleep, as that will give you more meaningful results.
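For instance, the question's Get method could be rewritten with Task.Delay, keeping the same console output but releasing the thread during the wait (a minimal sketch):

```csharp
using System;
using System.Threading.Tasks;

class Demo
{
    // The question's Get, rewritten with Task.Delay: the await releases the
    // thread during the one-second wait instead of blocking it.
    public static async Task Get()
    {
        Console.WriteLine($"Get 1 {DateTime.Now}");
        await Task.Delay(1000);
        Console.WriteLine($"Get 2 {DateTime.Now}");
    }

    static async Task Main() => await Get();
}
```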
What is Task
Task is just a construct to help with all forms of multitasking. It is "agnostic" to how it is run. Get() can be executed asynchronously using thread pools, asynchronously using async and await, synchronously by just calling RunSynchronously(), or asynchronously by calling RunSynchronously() from a thread you started manually. And probably a few other ways I cannot remember right now.
Different kinds of Multitasking
Multithreading can be used to implement multitasking. Technically we only need multithreading for CPU-bound work, but for the longest time it was the most expedient way to implement multitasking in general, so we tended to use it. A lot. Especially in cases where it was not necessary.
Even fully understanding why we did it, I think we overused and even abused thread-based multitasking. And especially with GUIs, multithreading causes some issues. Still, it was the easier way, so a lot of things and a lot of examples are still designed around thread-based multitasking.
Only recently did we get async and await as an alternative. async and await are a way of multitasking without resorting to multithreading. We always had, and still have, the option to do the work of those two ourselves, but that is quite code-intensive and error-prone. These two keywords are easy to use and are resolved reliably between the compiler and the runtime. So only now are we starting to go back to forms of thread-less multitasking.
Task.Run:
"Queues the specified work to run on the ThreadPool and returns a task or Task<TResult> handle for that work." - https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.run
So Run takes a task and executes it via a ThreadPool - a form of thread-based multitasking.
Related
When I run a lot of operations in parallel using a SemaphoreSlim for each, their invocations are not as quick as expected.
Here is the code
var sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 50; i++)
{
    int localI = i;
    Task.Run(async () =>
    {
        var semaphore = new SemaphoreSlim(1, 1);
        await semaphore.WaitAsync();
        Thread.Sleep(1000);
        counter++;
        semaphore.Release();
        Debug.WriteLine($"{localI} - {sw.ElapsedMilliseconds}");
    });
}
Thread.Sleep(5000);
And here is the output:
2 - 1015
0 - 1015
1 - 1015
3 - 2053
4 - 2053
5 - 2053
6 - 2120
7 - 3009
8 - 3064
9 - 3066
10 - 3068
11 - 3134
12 - 4011
13 - 4016
14 - 4070
15 - 4071
16 - 4073
17 - 4140
Can somebody explain why they were not invoked approximately in 1 second?
What you are seeing is the limited thread pool injection rate. It has nothing to do with SemaphoreSlim or even async, as all the code posted is actually synchronous.
On your machine, three threads are able to run immediately. The thread pool sees that it has other work to do (47 other items already queued). So it waits for a bit and then injects another thread. The next group of work uses four threads. The thread pool is still "behind", so it waits for a bit and then injects another thread, etc.
The "wait for a bit" part of the description above is the limited thread pool injection rate. The thread pool has to wait for a bit, or else whenever it gets more work, it would immediately create a bunch of threads, which would then be disposed of when the work is done. So to be more efficient and prevent this "thread thrashing", the thread pool waits for a bit before creating new threads.
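One way to see (or work around) the injection rate is to raise the pool's minimum worker-thread count before queuing the burst of work; below is a sketch using ThreadPool.SetMinThreads, where 50 is chosen only to match the question's loop count.

```csharp
using System;
using System.Threading;

class Demo
{
    static void Main()
    {
        // By default the pool's minimum is near the core count, and extra
        // threads are injected slowly (the "wait for a bit" described above).
        // Raising the minimum lets a burst of 50 work items start almost
        // immediately instead of ramping up one thread at a time.
        ThreadPool.GetMinThreads(out int workers, out int io);
        Console.WriteLine($"default min worker threads: {workers}");

        bool ok = ThreadPool.SetMinThreads(50, io); // 50 matches the question's loop
        Console.WriteLine($"raised to 50: {ok}");
    }
}
```

Note this trades away the pool's protection against thread thrashing, so it is a diagnostic tool more than a fix.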
Consider this code run on a CPU with 32 cores:
ParallelOptions po = new ParallelOptions();
po.MaxDegreeOfParallelism = 8;
Parallel.For(0, 4, po, (i) =>
{
    Parallel.For(0, 4, po, (j) =>
    {
        WorkMethod(i, j); // assume a long-running method
    });
});
My question is what is the actual maximum possibly concurrency of WorkMethod(i, j)? Is it 4, 8, or 16?
ParallelOptions.MaxDegreeOfParallelism is not applied globally. If you have enough cores and the scheduler sees fit, you will get a multiplication of the nested MDP values, with each For able to spin up that many tasks (if the workloads are unconstrained).
Consider this example: each of 3 outer tasks can start 3 more inner tasks, each loop limited by the MDP option of 3.
int k = 0;
ParallelOptions po = new ParallelOptions();
po.MaxDegreeOfParallelism = 3;
Parallel.For(0, 10, po, (i) =>
{
    Parallel.For(0, 10, po, (j) =>
    {
        Interlocked.Increment(ref k);
        Console.WriteLine(k);
        Thread.Sleep(2000);
        Interlocked.Decrement(ref k);
    });
    Thread.Sleep(2000);
});
Output
1
2
3
4
7
5
6
8
9
9
5
6
7
9
9
8
8
9
...
If MDP were global you would only see values up to 3, I guess; since it's not, you see 9s.
ParallelOptions.MaxDegreeOfParallelism is not global, it is per parallel loop. And more specifically, it sets the max number of tasks that can run in parallel, not the max number of cores or threads that will run those tasks in parallel.
Some demo tests
Note: I have 4 cores, 8 threads.
What's happening in the code
We're running 2 async methods; each one kicks off nested parallel loops.
We're setting the max degree of parallelism to 2 and a sleep time of 2 seconds to simulate the work each task does.
So, due to setting MaxDegreeOfParallelism to 2, we would expect to reach up to 12 concurrent tasks before the 40 tasks complete (I'm only counting tasks kicked off by the nested parallel loops).
How do I get 12?
2 max concurrent tasks started in the outer loop
+4 max concurrent tasks from inner loop (2 started per task started in outer loop)
that's 6 (per asynchronous task kicked off in Main)
12 total
test code
using System;
using System.Threading;
using System.Threading.Tasks;

namespace forfun
{
    class Program
    {
        static void Main(string[] args)
        {
            var taskRunner = new TaskRunner();
            taskRunner.RunTheseTasks();
            taskRunner.RunTheseTasksToo();
            Console.ReadLine();
        }

        private class TaskRunner
        {
            private int _totalTasks = 0;
            private int _runningTasks = 0;

            public async void RunTheseTasks()
            {
                await Task.Run(() => ProcessThingsInParallel());
            }

            public async void RunTheseTasksToo()
            {
                await Task.Run(() => ProcessThingsInParallel());
            }

            private void ProcessThingsInParallel()
            {
                ParallelOptions po = new ParallelOptions();
                po.MaxDegreeOfParallelism = 2;
                Parallel.For(0, 4, po, (i) =>
                {
                    Interlocked.Increment(ref _totalTasks);
                    Interlocked.Increment(ref _runningTasks);
                    Console.WriteLine($"{_runningTasks} currently running of {_totalTasks} total tasks");
                    Parallel.For(0, 4, po, (j) =>
                    {
                        Interlocked.Increment(ref _totalTasks);
                        Interlocked.Increment(ref _runningTasks);
                        Console.WriteLine($"{_runningTasks} currently running of {_totalTasks} total tasks");
                        WorkMethod(i, j); // assume a long-running method
                        Interlocked.Decrement(ref _runningTasks);
                    });
                    Interlocked.Decrement(ref _runningTasks);
                });
            }

            private static void WorkMethod(int i, int l)
            {
                Thread.Sleep(2000);
            }
        }
    }
}
Spoiler: the output shows that MaxDegreeOfParallelism is not global, is not limited to core or thread count, and specifically caps the number of concurrently running tasks.
output with max set to 2:
1 currently running of 1 total tasks
3 currently running of 3 total tasks
2 currently running of 2 total tasks
4 currently running of 4 total tasks
5 currently running of 5 total tasks
7 currently running of 7 total tasks
[ ... snip ...]
11 currently running of 33 total tasks
12 currently running of 34 total tasks
11 currently running of 35 total tasks
12 currently running of 36 total tasks
11 currently running of 37 total tasks
12 currently running of 38 total tasks
11 currently running of 39 total tasks
12 currently running of 40 total tasks
(output will vary, but each time, the max concurrent should be 12)
output without max set:
1 currently running of 1 total tasks
3 currently running of 3 total tasks
4 currently running of 4 total tasks
2 currently running of 2 total tasks
5 currently running of 5 total tasks
7 currently running of 7 total tasks
[ ... snip ...]
19 currently running of 28 total tasks
19 currently running of 29 total tasks
18 currently running of 30 total tasks
13 currently running of 31 total tasks
13 currently running of 32 total tasks
16 currently running of 35 total tasks
16 currently running of 36 total tasks
14 currently running of 33 total tasks
15 currently running of 34 total tasks
15 currently running of 37 total tasks
16 currently running of 38 total tasks
16 currently running of 39 total tasks
17 currently running of 40 total tasks
notice how without setting the max, we get up to 19 concurrent tasks
- now the 2 second sleep time is limiting the number of tasks that could kick off before others finished
output after increasing sleep time to 12 seconds
1 currently running of 1 total tasks
2 currently running of 2 total tasks
3 currently running of 3 total tasks
4 currently running of 4 total tasks
[ ... snip ...]
26 currently running of 34 total tasks
26 currently running of 35 total tasks
27 currently running of 36 total tasks
28 currently running of 37 total tasks
28 currently running of 38 total tasks
28 currently running of 39 total tasks
28 currently running of 40 total tasks
got up to 28 concurrent tasks
now setting loops to 10 nested in 10 and setting sleep time back to 2 seconds - again no max set
1 currently running of 1 total tasks
3 currently running of 3 total tasks
2 currently running of 2 total tasks
4 currently running of 4 total tasks
[ ... snip ...]
38 currently running of 176 total tasks
38 currently running of 177 total tasks
38 currently running of 178 total tasks
37 currently running of 179 total tasks
38 currently running of 180 total tasks
38 currently running of 181 total tasks
[ ... snip ...]
35 currently running of 216 total tasks
35 currently running of 217 total tasks
32 currently running of 218 total tasks
32 currently running of 219 total tasks
33 currently running of 220 total tasks
got up to 38 concurrent tasks before all 220 finished
More related information
ParallelOptions.MaxDegreeOfParallelism Property
The MaxDegreeOfParallelism property affects the number of concurrent operations run by Parallel method calls that are passed this ParallelOptions instance. A positive property value limits the number of concurrent operations to the set value. If it is -1, there is no limit on the number of concurrently running operations.
By default, For and ForEach will utilize however many threads the underlying scheduler provides, so changing MaxDegreeOfParallelism from the default only limits how many concurrent tasks will be used.
to get the max degree of parallelism, don't set it; rather, allow the TPL and its scheduler to handle it
setting the max degree of parallelism only affects the number of concurrent tasks, not the threads used
the maximum number of concurrent tasks is not equal to the number of threads available--threads can still juggle multiple tasks; and even if your app is using all threads, it is still sharing those threads with the other processes the machine is hosting
Environment.ProcessorCount
Gets the number of processors on the current machine.
What if we say MaxDegreeOfParallelism = Environment.ProcessorCount?
Even setting the max degree of parallelism to Environment.ProcessorCount does not dynamically ensure maximum concurrency regardless of the system your app runs on. Doing this still limits the degree of parallelism, because any given thread can switch between many tasks--so it would just cap the number of concurrent tasks at the number of available threads, and this does not necessarily mean each concurrent task is assigned neatly to a thread in a one-to-one relationship.
I am working on a console application that sends multiple requests to an API using async, tasks and await. I am using a Stopwatch to show the time spent for each request/task, and I noticed that it starts very low (150 ms) and then each subsequent task takes around 100 ms longer.
I think the tasks are running concurrently because the program completes 83 requests/tasks in 8 seconds and when I measure the get request with Chrome it showed around 200ms.
Do you know why the time is increasing as the tasks go? Is there something wrong with the measuring or with my code logic?
Isn't this supposed to be faster? From what I read, WhenAll should run the tasks concurrently, and the overall completion time should be the maximum task time in the list.
public async Task<List<CatalogEvent>> GetEventsAsync(int id)
{
    sw.Restart();
    var request = GetRequest(msCatalogEndpoint);
    request.AddParameter("id", id, ParameterType.UrlSegment);
    List<CatalogEvent> events = new List<CatalogEvent>();
    var response = await client.ExecuteTaskAsync(request).ConfigureAwait(false);
    var catalog = JsonConvert.DeserializeObject<CatalogEndpoint>(response.Content);
    if (!(catalog.catalogEvents is null))
    {
        foreach (var ev in catalog.catalogEvents)
        {
            CatalogEvent catalogEvent = ev.Value;
            catalogEvent.eventName = ev.Key.ToString();
            catalogEvent.titleId = id;
            DateTime dateTime = DateTime.UtcNow;
            catalogEvent.date = dateTime.ToString();
            events.Add(catalogEvent);
        }
    }
    Console.WriteLine($"Task for Id: {id} took {sw.ElapsedMilliseconds} ms and was managed by Thread: {Thread.CurrentThread.ManagedThreadId}");
    return events;
}
I am using RestSharp package to make the requests.
The main method is like this:
static void Main(string[] args)
{
    // this list has 83 ids which I am getting from a database
    List<int> ids = GetIds();

    async Task ProcessEvents()
    {
        IEnumerable<Task<List<CatalogEvent>>> techBriefEvents = ids.Select(id => GetEventsAsync(id));
        await Task.WhenAll(techBriefEvents);
    }

    Task.WhenAll(ProcessEvents());
    Console.ReadKey();
}
This is the output:
Task for TitleId: 142 took 164 ms and was managed by Thread: 8
Task for TitleId: 16 took 349 ms and was managed by Thread: 5
Task for TitleId: 10 took 634 ms and was managed by Thread: 6
Task for TitleId: 215 took 650 ms and was managed by Thread: 5
Task for TitleId: 114 took 826 ms and was managed by Thread: 6
Task for TitleId: 214 took 843 ms and was managed by Thread: 5
Task for TitleId: 56 took 983 ms and was managed by Thread: 6
Task for TitleId: 212 took 1001 ms and was managed by Thread: 5
Task for TitleId: 168 took 1141 ms and was managed by Thread: 6
Task for TitleId: 21 took 1168 ms and was managed by Thread: 5
Task for TitleId: 26 took 1309 ms and was managed by Thread: 6
Task for TitleId: 30 took 1334 ms and was managed by Thread: 5
Task for TitleId: 213 took 1462 ms and was managed by Thread: 6
Task for TitleId: 24 took 1510 ms and was managed by Thread: 5
Task for TitleId: 29 took 1619 ms and was managed by Thread: 6
Task for TitleId: 23 took 1669 ms and was managed by Thread: 5
Task for TitleId: 31 took 1779 ms and was managed by Thread: 6
Task for TitleId: 14 took 1906 ms and was managed by Thread: 5
Task for TitleId: 18 took 1943 ms and was managed by Thread: 6
Task for TitleId: 20 took 2064 ms and was managed by Thread: 6
Task for TitleId: 19 took 2110 ms and was managed by Thread: 6
Task for TitleId: 175 took 2222 ms and was managed by Thread: 8
Task for TitleId: 15 took 2275 ms and was managed by Thread: 6
Task for TitleId: 102 took 2400 ms and was managed by Thread: 8
Task for TitleId: 33 took 2464 ms and was managed by Thread: 8
Task for TitleId: 135 took 2563 ms and was managed by Thread: 5
Task for TitleId: 5 took 2632 ms and was managed by Thread: 8
Task for TitleId: 137 took 2750 ms and was managed by Thread: 5
Task for TitleId: 12 took 2796 ms and was managed by Thread: 8
Task for TitleId: 41 took 2911 ms and was managed by Thread: 5
Task for TitleId: 136 took 2998 ms and was managed by Thread: 8
Task for TitleId: 43 took 3084 ms and was managed by Thread: 5
Task for TitleId: 139 took 3159 ms and was managed by Thread: 8
Task for TitleId: 51 took 3240 ms and was managed by Thread: 5
Task for TitleId: 42 took 3322 ms and was managed by Thread: 5
Task for TitleId: 39 took 3393 ms and was managed by Thread: 5
Task for TitleId: 44 took 3502 ms and was managed by Thread: 8
Task for TitleId: 122 took 3583 ms and was managed by Thread: 5
Task for TitleId: 36 took 3697 ms and was managed by Thread: 8
Task for TitleId: 95 took 3744 ms and was managed by Thread: 5
Task for TitleId: 67 took 3871 ms and was managed by Thread: 8
Task for TitleId: 229 took 3896 ms and was managed by Thread: 5
Task for TitleId: 226 took 4034 ms and was managed by Thread: 8
Task for TitleId: 108 took 4078 ms and was managed by Thread: 5
Task for TitleId: 123 took 4213 ms and was managed by Thread: 8
Task for TitleId: 143 took 4285 ms and was managed by Thread: 5
Task for TitleId: 236 took 4364 ms and was managed by Thread: 8
Task for TitleId: 228 took 4466 ms and was managed by Thread: 5
Task for TitleId: 232 took 4540 ms and was managed by Thread: 6
Task for TitleId: 230 took 4641 ms and was managed by Thread: 5
Task for TitleId: 149 took 4715 ms and was managed by Thread: 6
Task for TitleId: 176 took 4793 ms and was managed by Thread: 5
Task for TitleId: 208 took 4902 ms and was managed by Thread: 6
Task for TitleId: 155 took 4946 ms and was managed by Thread: 5
Task for TitleId: 61 took 5057 ms and was managed by Thread: 6
Task for TitleId: 190 took 5097 ms and was managed by Thread: 5
Task for TitleId: 93 took 5262 ms and was managed by Thread: 5
Task for TitleId: 194 took 5280 ms and was managed by Thread: 5
Task for TitleId: 156 took 5419 ms and was managed by Thread: 6
Task for TitleId: 101 took 5440 ms and was managed by Thread: 5
Task for TitleId: 193 took 5572 ms and was managed by Thread: 6
Task for TitleId: 167 took 5598 ms and was managed by Thread: 5
Task for TitleId: 197 took 5730 ms and was managed by Thread: 6
Task for TitleId: 111 took 5755 ms and was managed by Thread: 5
Task for TitleId: 216 took 5882 ms and was managed by Thread: 6
Task for TitleId: 60 took 5930 ms and was managed by Thread: 5
Task for TitleId: 9 took 6059 ms and was managed by Thread: 5
Task for TitleId: 152 took 6085 ms and was managed by Thread: 5
Task for TitleId: 169 took 6218 ms and was managed by Thread: 6
Task for TitleId: 154 took 6264 ms and was managed by Thread: 5
Task for TitleId: 7 took 6403 ms and was managed by Thread: 6
Task for TitleId: 141 took 6506 ms and was managed by Thread: 5
Task for TitleId: 58 took 6560 ms and was managed by Thread: 6
Task for TitleId: 172 took 6670 ms and was managed by Thread: 5
Task for TitleId: 11 took 6730 ms and was managed by Thread: 6
Task for TitleId: 17 took 6846 ms and was managed by Thread: 5
Task for TitleId: 55 took 6912 ms and was managed by Thread: 6
Task for TitleId: 166 took 7020 ms and was managed by Thread: 5
Task for TitleId: 140 took 7069 ms and was managed by Thread: 6
Task for TitleId: 110 took 7177 ms and was managed by Thread: 5
Task for TitleId: 90 took 7222 ms and was managed by Thread: 6
Task for TitleId: 160 took 7352 ms and was managed by Thread: 5
Task for TitleId: 97 took 7400 ms and was managed by Thread: 6
Task for TitleId: 200 took 7503 ms and was managed by Thread: 5
Task for TitleId: 153 took 7556 ms and was managed by Thread: 6
Task for TitleId: 207 took 7654 ms and was managed by Thread: 5
Task for TitleId: 161 took 7721 ms and was managed by Thread: 6
Task for TitleId: 231 took 7810 ms and was managed by Thread: 5
Task for TitleId: 202 took 7873 ms and was managed by Thread: 6
Task for TitleId: 220 took 8068 ms and was managed by Thread: 6
One obvious misconception is the "and was managed by Thread:" part. Unless ExecuteTaskAsync is very badly implemented, there is no dedicated thread per request.
If the requests are being made to the same host, you might be running into service point manager limitations.
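If it is the per-host connection limit, raising it is a one-liner on .NET Framework, or a handler property on .NET Core 2.1+. A sketch (the value 100 is an arbitrary example, not a recommendation):

```csharp
using System.Net;
using System.Net.Http;

class Demo
{
    static void Main()
    {
        // .NET Framework: the per-host connection limit lives on
        // ServicePointManager (its default has historically been very low
        // for console apps, throttling parallel requests to one host).
        ServicePointManager.DefaultConnectionLimit = 100;

        // .NET Core 2.1+: set the limit on the handler instead.
        using var client = new HttpClient(new SocketsHttpHandler
        {
            MaxConnectionsPerServer = 100 // arbitrary example value
        });
    }
}
```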
I have a webservice which receives multiple requests at the same time. For each request, I need to call another webservice (authentication things). The problem is, if multiple (>20) requests happen at the same time, the response time suddenly gets a lot worse.
I made a sample to demonstrate the problem:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

namespace CallTest
{
    public class Program
    {
        private static readonly HttpClient _httpClient = new HttpClient(new HttpClientHandler { Proxy = null, UseProxy = false });

        static void Main(string[] args)
        {
            ServicePointManager.DefaultConnectionLimit = 100;
            ServicePointManager.Expect100Continue = false;

            // warmup
            CallSomeWebsite().GetAwaiter().GetResult();
            CallSomeWebsite().GetAwaiter().GetResult();

            RunSequentiell().GetAwaiter().GetResult();
            RunParallel().GetAwaiter().GetResult();
        }

        private static async Task RunParallel()
        {
            var tasks = new List<Task>();
            for (var i = 0; i < 300; i++)
            {
                tasks.Add(CallSomeWebsite());
            }
            await Task.WhenAll(tasks);
        }

        private static async Task RunSequentiell()
        {
            for (var i = 0; i < 300; i++)
            {
                await CallSomeWebsite();
            }
        }

        private static async Task CallSomeWebsite()
        {
            var watch = Stopwatch.StartNew();
            using (var result = await _httpClient.GetAsync("http://example.com").ConfigureAwait(false))
            {
                // more work here, like checking success etc.
                Console.WriteLine(watch.ElapsedMilliseconds);
            }
        }
    }
}
Sequential calls are no problem. They take a few milliseconds to finish and the response time is mostly the same.
However, parallel requests start taking longer and longer the more requests are sent. Sometimes a request even takes a few seconds. I tested on .NET Framework 4.6.1 and on .NET Core 2.0 with the same results.
What is even stranger: I traced the HTTP requests with Wireshark and they always take around the same time. But the sample program reports much higher values for parallel requests than Wireshark shows.
How can I get the same performance for parallel requests? Is this a thread pool issue?
This behaviour was fixed in .NET Core 2.1. I think the problem was the underlying Windows WinHTTP handler used by HttpClient.
In .NET Core 2.1, they rewrote the HttpClientHandler (see https://blogs.msdn.microsoft.com/dotnet/2018/04/18/performance-improvements-in-net-core-2-1/#user-content-networking):
In .NET Core 2.1, HttpClientHandler has a new default implementation implemented from scratch entirely in C# on top of the other System.Net libraries, e.g. System.Net.Sockets, System.Net.Security, etc. Not only does this address the aforementioned behavioral issues, it provides a significant boost in performance (the implementation is also exposed publicly as SocketsHttpHandler, which can be used directly instead of via HttpClientHandler in order to configure SocketsHttpHandler-specific properties).
This turned out to remove the bottlenecks mentioned in the question.
On .NET Core 2.0, I get the following numbers (in milliseconds):
Fetching URL 500 times...
Sequentiell Total: 4209, Max: 35, Min: 6, Avg: 8.418
Parallel Total: 822, Max: 338, Min: 7, Avg: 69.126
But on .NET Core 2.1, the individual parallel HTTP requests seem to have improved a lot:
Fetching URL 500 times...
Sequentiell Total: 4020, Max: 40, Min: 6, Avg: 8.040
Parallel Total: 795, Max: 76, Min: 5, Avg: 7.972
In the question's RunParallel() function, a stopwatch is started for all 300 calls in the first second of the program running, and ended when each http request completes.
Therefore these times can't really be compared to the sequential iterations.
For smaller numbers of parallel tasks e.g. 50, if you measure the wall time that the sequential and parallel methods take you should find that the parallel method is faster due to it pipelining as many GetAsync tasks as it can.
That said, when running the code for 300 iterations I did find a repeatable several-second stall when running outside the debugger only:
Debug build, in debugger: Sequential 27.6 seconds, parallel 0.6 seconds
Debug build, without debugger: Sequential 26.8 seconds, parallel 3.2 seconds
[Edit]
There's a similar scenario described in this question; it's possibly not relevant to your problem anyway.
This problem gets worse the more tasks are run, and disappears when:
Swapping the GetAsync work for an equivalent delay
Running against a local server
Slowing the rate of tasks creation / running less concurrent tasks
The watch.ElapsedMilliseconds diagnostic stops for all connections, indicating that all connections are affected by the throttling.
Seems to be some sort of (anti-syn-flood?) throttling in the host or network, that just halts the flow of packets once a certain number of sockets start connecting.
It sounds like for whatever reason, you're hitting a point of diminishing returns at around 20 concurrent Tasks. So, your best option might be to throttle your parallelism. TPL Dataflow is a great library for achieving this. To follow your pattern, add a method like this:
private static Task RunParallelThrottled()
{
    var throttler = new ActionBlock<int>(i => CallSomeWebsite(),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 20 });
    for (var i = 0; i < 300; i++)
    {
        throttler.Post(i);
    }
    throttler.Complete();
    return throttler.Completion;
}
You might need to experiment with MaxDegreeOfParallelism until you find the sweet spot. Note that this is more efficient than doing batches of 20. In that scenario, all 20 in the batch would need to complete before the next batch begins. With TPL Dataflow, as soon as one completes, another is allowed to begin.
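If taking a dependency on System.Threading.Tasks.Dataflow is undesirable, the same throttling can be sketched with SemaphoreSlim. The limit of 20 mirrors the MaxDegreeOfParallelism above; the Func&lt;Task&gt; parameter stands in for the question's CallSomeWebsite method.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class Throttled
{
    // Sketch: start all 300 operations, but let at most 20 run at once.
    public static async Task RunParallelThrottled(Func<Task> callSomeWebsite)
    {
        using var semaphore = new SemaphoreSlim(20); // 20 slots in flight
        var tasks = new List<Task>();
        for (var i = 0; i < 300; i++)
        {
            tasks.Add(RunOne());
        }
        await Task.WhenAll(tasks);

        async Task RunOne()
        {
            await semaphore.WaitAsync();          // wait for a free slot
            try { await callSomeWebsite(); }
            finally { semaphore.Release(); }      // free the slot for the next task
        }
    }
}
```

Like the Dataflow version, a new operation starts as soon as any one completes, rather than in batches of 20.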
The reason you are having issues is that .NET does not resume Tasks in the order they are awaited; an awaited Task is only resumed when the calling function cannot continue execution, and Task is not meant for parallel execution.
If you make a few modifications so that you pass in i to the CallSomeWebsite function and call Console.WriteLine("All loaded"); after you add all the tasks to the list, you will get something like this: (RequestNumber: Time)
All loaded
0: 164
199: 236
299: 312
12: 813
1: 837
9: 870
15: 888
17: 905
5: 912
10: 952
13: 952
16: 961
18: 976
19: 993
3: 1061
2: 1061
Do you notice how every Task is created before any of the times are printed out to the screen? The entire loop of creating Tasks completes before any of the Tasks resume execution after awaiting the network call.
Also, see how request 199 completed before request 1? .NET resumes Tasks in the order it deems best (the real scheduling is surely more complicated, but I am not exactly sure how .NET decides which Task to continue).
One thing that I think you might be confusing is Asynchronous and Parallel. They are not the same, and Task is used for Asynchronous execution. What that means is that all of these tasks are running on the same thread (Probably. .NET can start a new thread for tasks if needed), so they are not running in Parallel. If they were truly Parallel, they would all be running in different threads, and the execution times would not be increasing for each execution.
Updated functions:
private static async Task RunParallel()
{
    var tasks = new List<Task>();
    for (var i = 0; i < 300; i++)
    {
        tasks.Add(CallSomeWebsite(i));
    }
    Console.WriteLine("All loaded");
    await Task.WhenAll(tasks);
}

private static async Task CallSomeWebsite(int i)
{
    var watch = Stopwatch.StartNew();
    using (var result = await _httpClient.GetAsync("https://www.google.com").ConfigureAwait(false))
    {
        // more work here, like checking success etc.
        Console.WriteLine($"{i}: {watch.ElapsedMilliseconds}");
    }
}
As for why the printed times are longer for the asynchronous execution than the synchronous execution: your current method of tracking time does not account for the time spent between execution halting and continuing. That is why the reported execution times keep increasing across the set of completed requests. If you want an accurate time, you will need to subtract the time spent between the await occurring and execution continuing. The issue isn't that it is taking longer; it is that you have an inaccurate reporting method. If you sum the times of all the synchronous calls, the total is actually significantly more than the maximum time of the asynchronous calls:
Sync: 27965
Max Async: 2341
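A less misleading measurement is a single stopwatch around the whole batch, since per-task stopwatches also count time spent queued behind other continuations. A minimal sketch, using Task.Delay as a stand-in for the HTTP calls:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

class TimingDemo
{
    // One stopwatch around the whole batch measures real elapsed wall time;
    // a stopwatch per task also counts the time that task spent waiting to
    // be resumed, which inflates the later numbers.
    public static async Task<long> TimeBatchAsync()
    {
        var sw = Stopwatch.StartNew();
        var tasks = Enumerable.Range(0, 300).Select(_ => Task.Delay(100));
        await Task.WhenAll(tasks);
        return sw.ElapsedMilliseconds;
    }

    static async Task Main()
    {
        // With real concurrency this is roughly 100 ms, not 300 * 100 ms.
        Console.WriteLine($"batch finished in {await TimeBatchAsync()} ms");
    }
}
```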
I am using Thread.Sleep(n) in my project. I heard that Thread.Sleep can cause performance issues, but I am not sure about it.
My requirement is:
Wait for 5 minute increments, up to 30 minutes (6 times the 5 minute
delay). After this, begin to increment by 1 hour and do this 5 times
(additional 5 hours).
Below I have provided my sample code which uses Thread.Sleep(n) in different scenarios:
Thread.Sleep(1000 * 60 * 5); //-------waiting for 5 minutes
var isDownloaded = false;
try
{
var attempt = 0;
while (attempt < 11)
{
isDownloaded = TryDownloading(strPathToDownload, strFileToDownload);
if (isDownloaded)
break;
attempt++;
if (attempt < 6)
Thread.Sleep(1000 * 60 * 5); //--------waiting for 5 minutes
else
{
if (attempt < 11)
Thread.Sleep(1000 * 60 * 60); //-------waiting for 1 hour
else
break;
}
}
}
In the above code I am trying to download a file with a maximum of 11 download attempts. It initially waits 5 minutes and then tries to download the file (the first attempt); if that fails, it makes the next 5 attempts at 5-minute intervals. If the first six attempts all fail, it makes the remaining 5 attempts at 1-hour intervals.
So we decided to use Thread.Sleep for those time delays in our console app.
Does this cause any problems or performance issues?
If Thread.Sleep(n) causes performance issues, what would be a better alternative to it?
Also, finally, does MSDN suggest that Thread.Sleep(n) is harmful or shouldn't be used?
This is absolutely fine. Here are the costs of sleeping:
You keep a thread occupied. This takes a little memory.
Setting up the wait inside of the OS. That is O(1). The duration of the sleep does not matter. It is a small, constant cost.
That's all.
What is bad, though, is busy waiting or doing polling loops because that causes actual CPU usage. Just spending time in a sleep or wait does not create CPU usage.
TL;DR: Use sleep for delays, do not use sleep for polling.
I must say that the aversion against sleeping is sometimes just a trained reflex. Be sure to analyze the concrete use case before you condemn sleeping.
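In an async context, the same schedule can be expressed with Task.Delay, which frees the thread during each wait. A sketch: the Func&lt;bool&gt; stands in for the question's TryDownloading method, and the delay scaler is a hypothetical parameter added only so the schedule can be shortened in tests.

```csharp
using System;
using System.Threading.Tasks;

class Downloader
{
    // Sketch of the question's retry schedule: six attempts spaced 5 minutes
    // apart, then five more spaced 1 hour apart, using non-blocking waits.
    public static async Task<bool> TryDownloadWithRetriesAsync(
        Func<bool> tryDownloading,
        Func<TimeSpan, TimeSpan> scaleDelay = null)
    {
        scaleDelay ??= d => d; // use the real delays unless a scaler is supplied
        for (var attempt = 0; attempt < 11; attempt++)
        {
            var delay = attempt < 6
                ? TimeSpan.FromMinutes(5)  // attempts 1-6: 5-minute spacing
                : TimeSpan.FromHours(1);   // attempts 7-11: 1-hour spacing
            await Task.Delay(scaleDelay(delay)); // frees the thread while waiting
            if (tryDownloading())
                return true;
        }
        return false;
    }
}
```

For a plain console app that does nothing else, Thread.Sleep is fine, as the answer above says; this form only matters when the thread is needed elsewhere.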
Do not use ASP.NET worker process to run long running tasks!
The app pool may be recycled any time and you will lose your sleeping threads.
Consider using Windows Service instead.
You can communicate between web site and windows service using database or messaging.
You'd be better off using a Timer to perform an intermittent action like that, because Thread.Sleep is a blocking call: it keeps the thread occupied and freezes the application.
like this:
in the caller object
    Timer t = new Timer(1000 * 60 * 5); // System.Timers.Timer
    t.Elapsed += t_Elapsed;
    t.Start();
then implement the event handler
    // event timer elapsed implementation
    int count = 0;
    private void t_Elapsed(object sender, ElapsedEventArgs e)
    {
        count++;
        if (count >= 5)
            t.Stop();
        else
        {
            // your code that does the work here
        }
    }