I have a C sample program that I need to convert to C#, but I don't know how.
Here's the sample code:
#include <stdio.h>
#include <unistd.h>
int main(int argc, char **argv)
{
printf("--beginning of program\n");
int counter = 0;
pid_t pid = fork();
if (pid == 0)
{
// child process
int i = 0;
for (; i < 5; ++i)
{
printf("child process: counter=%d\n", ++counter);
}
}
else if (pid > 0)
{
// parent process
int j = 0;
for (; j < 5; ++j)
{
printf("parent process: counter=%d\n", ++counter);
}
}
else
{
// fork failed
printf("fork() failed!\n");
return 1;
}
printf("--end of program--\n");
return 0;
}
The Windows equivalent of fork is using threads, so your code can be written in C# as follows:
new Thread(() =>
{
    for (int i = 0; i < 5; ++i)
        Console.WriteLine("child process, counter=" + i);
}).Start();

for (int i = 0; i < 5; ++i)
    Console.WriteLine("parent process, counter=" + i);
The POSIX fork function creates a copy of the current address space as a new process and returns either the new process id (to the original process that called fork) or 0 (to the newly created process). Execution continues at the same place in both cases. There is no direct equivalent of this in C# or any other language that does not use POSIX-compatible libraries.
In C# we use threads, as Blindy has already pointed out. The significant difference between threads and fork is that with threads you have a single process with a single memory space. That means that when you change a shared variable, it changes for all threads. With a forked process, each process gets its own memory space, and any changes you make in that memory space are not automatically reflected in the other processes.
There are other differences, but that's probably the biggest one. Creating multiple processes has overheads and limitations that don't necessarily apply to multiple threads; not least, if you start a new process just to perform a small operation, that process still carries the entire content of your program. Threads can be simple or complex without much impact on the memory space of their process.
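To make that concrete, here is a minimal sketch (the class and names are illustrative, not from the question): two threads incrementing one static counter see each other's updates, whereas the parent and child in the forked C program above would each count a private copy from 1 to 5.

using System;
using System.Threading;

class SharedCounterDemo
{
    // One counter in one address space: both threads observe every increment.
    // After fork(), parent and child would each have an independent copy.
    static int counter = 0;

    static void Main()
    {
        var child = new Thread(() =>
        {
            for (int i = 0; i < 5; i++)
                Console.WriteLine("child thread: counter={0}", Interlocked.Increment(ref counter));
        });
        child.Start();

        for (int j = 0; j < 5; j++)
            Console.WriteLine("parent thread: counter={0}", Interlocked.Increment(ref counter));

        child.Join();
        Console.WriteLine("final counter: {0}", counter); // 10 here; each forked process would reach 5
    }
}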
I have ~100 text files, 200 MB each, and I need to parse them. The program below loads the files and processes them in parallel. It can create either a Thread per file or a Process per file.
The problem: If I use threads it never uses 100% CPU and takes longer to complete.
THREAD PER FILE
total time: 430 sec
CPU usage 15-20%
CPU frequency 1.2 GHz
PROCESS PER FILE
total time 100 sec
CPU usage 100%
CPU frequency 3.75 GHz
I'm using an E5-1650 v3 hexa-core with HT, so I process 12 files at a time.
How can I achieve 100% CPU utilisation with threads?
The code below does not use the result of the processing, since that does not affect the problem.
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Text;
using System.Threading;
namespace libsvm2tsv
{
class Program
{
static void Main(string[] args)
{
var sw = Stopwatch.StartNew();
switch (args[0])
{
case "-t": LoadAll(args[1], LoadFile); break;
case "-p": LoadAll(args[1], RunChild); break;
case "-f": LoadFile(args[1]); return;
}
Console.WriteLine("ELAPSED: {0} sec.", sw.ElapsedMilliseconds / 1000);
Console.ReadLine();
}
static void LoadAll(string folder, Action<string> algorithm)
{
var sem = new SemaphoreSlim(12);
Directory.EnumerateFiles(folder).ToList().ForEach(f=> {
sem.Wait();
new Thread(() => { try { algorithm(f); } finally { sem.Release(); } }).Start();
});
}
static void RunChild(string file)
{
Process.Start(new ProcessStartInfo
{
FileName = Assembly.GetEntryAssembly().Location,
Arguments = "-f \"" + file + "\"",
UseShellExecute = false,
CreateNoWindow = true
})
.WaitForExit();
}
static void LoadFile(string inFile)
{
using (var ins = File.OpenText(inFile))
while (ins.Peek() >= 0)
ParseLine(ins.ReadLine());
}
static long[] ParseLine(string line)
{
return line
.Split()
.Skip(1)
.Select(r => (long)(double.Parse(r.Split(':')[1]) * 1000))
.Select(r => r < 0 ? -1 : r)
.ToArray();
}
}
}
Finally, I found the bottleneck. I was using string.Split to parse numbers from every line of data, which produces billions of short strings. These strings go on the heap, and since all threads share a single heap, memory allocation is synchronized. Processes each get their own heap, so no synchronization occurs and things run fast. That's the root of the issue. So I rewrote the parsing to use IndexOf rather than Split, and threads started to perform even better than separate processes, just as I expected.
Since .NET has no built-in tool to parse a real number out of a given position inside a string, I used this one: https://codereview.stackexchange.com/questions/75791/optimize-custom-double-parse with a small modification.
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Threading;
using System.Threading.Tasks;
namespace libsvm2tsv
{
class Program
{
static void Main(string[] args)
{
var sw = Stopwatch.StartNew();
switch (args[0])
{
case "-t": LoadAll(args[1], LoadFile); break;
case "-p": LoadAll(args[1], RunChild); break;
case "-f": LoadFile(args[1]); return;
}
Console.WriteLine("ELAPSED: {0} sec.", sw.ElapsedMilliseconds / 1000);
Console.ReadLine();
}
static void LoadAll(string folder, Action<string> algorithm)
{
Parallel.ForEach(
Directory.EnumerateFiles(folder),
new ParallelOptions { MaxDegreeOfParallelism = 12 },
f => algorithm(f));
}
static void RunChild(string file)
{
Process.Start(new ProcessStartInfo
{
FileName = Assembly.GetEntryAssembly().Location,
Arguments = "-f \"" + file + "\"",
UseShellExecute = false,
CreateNoWindow = true
})
.WaitForExit();
}
static void LoadFile(string inFile)
{
using (var ins = File.OpenText(inFile))
while (ins.Peek() >= 0)
ParseLine(ins.ReadLine());
}
static long[] ParseLine(string line)
{
// first, count number of items
var items = 1;
for (var i = 0; i < line.Length; i++)
if (line[i] == ' ') items++;
//allocate memory and parse items
var all = new long[items];
var n = 0;
var index = 0;
while (index < line.Length)
{
var next = line.IndexOf(' ', index);
if (next < 0) next = line.Length;
if (next > index)
{
var v = (long)(parseDouble(line, line.IndexOf(':', index) + 1, next - 1) * 1000);
if (v < 0) v = -1;
all[n++] = v;
}
index = next + 1;
}
return all;
}
private readonly static double[] pow10Cache;
static Program()
{
pow10Cache = new double[309];
double p = 1.0;
for (int i = 0; i < 309; i++)
{
pow10Cache[i] = p;
p /= 10;
}
}
static double parseDouble(string input, int from, int to)
{
long inputLength = to - from + 1;
long digitValue = long.MaxValue;
long output1 = 0;
long output2 = 0;
long sign = 1;
double multiBy = 0.0;
int k;
//integer part
for (k = 0; k < inputLength; ++k)
{
digitValue = input[k + from] - 48; // '0'
if (digitValue >= 0 && digitValue <= 9)
{
output1 = digitValue + (output1 * 10);
}
else if (k == 0 && digitValue == -3 /* '-' */)
{
sign = -1;
}
else if (digitValue == -2 /* '.' */ || digitValue == -4 /* ',' */)
{
break;
}
else
{
return double.NaN;
}
}
//decimal part
if (digitValue == -2 /* '.' */ || digitValue == -4 /* ',' */)
{
multiBy = pow10Cache[inputLength - (++k)];
for (; k < inputLength; ++k)
{
digitValue = input[k + from] - 48; // '0'
if (digitValue >= 0 && digitValue <= 9)
{
output2 = digitValue + (output2 * 10);
}
else
{
return Double.NaN;
}
}
multiBy *= output2;
}
return sign * (output1 + multiBy);
}
}
}
I have ~100 text files 200MB each and I need to parse them.
The fastest way to read or write data from/to a spinning disk is sequentially in order to minimize the time the disk heads need to seek to find data or write it to the specified location. So doing parallel IO to a single disk is going to slow IO rates down - and depending on the actual IO pattern it can slow rates down dramatically. A disk that can handle 100 MB/sec sequentially might only be able to move 20 or 30 kilobytes per second doing parallel reads/writes of small blocks of data.
Were I optimizing such a process, I wouldn't worry about CPU utilization first, I'd optimize IO throughput first. You are IO bound unless you're doing some really CPU-intensive parsing. Once your IO throughput is optimized, if you're getting 100% CPU utilization then you're CPU bound. If your design scales nicely, then you can add CPUs and probably run faster.
To speed up your IO, you first need to minimize disk seeks, especially if you're using consumer-grade, cheap SATA drives. There are multiple ways to do this.
First, the easiest - eliminate the disk heads. Put your data on SSDs. Problem solved without having to write complex, bug-prone optimized code. How much time will it take for you to make this run faster using software? You have to design something, test it, tune it, debug it, and importantly, keep it running and running well. None of that is free. One important cost is the opportunity cost of spending time making things go faster - when you're doing that, you're not solving any other problems. Faster hardware has none of those costs. In this case, buy the SSDs, plug them in, and you're faster.
But if you really want to spend several weeks or longer optimizing your processing software, here's how I'd go about it:
Spread the data over multiple disks. You can't do parallel IO to physical disks quickly as the disk head seeks will kill performance. So do as much of the reading and writing to different disks as possible.
For each disk, have a single reader or writer thread or process that feeds data to a worker pool or writes data provided by that worker pool.
A tunable number of worker threads/processes to do the actual parsing.
That way, you can read the files and write output data all sequentially and without contention on each disk from other IO processes.
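A rough sketch of that layout, assuming one input folder per physical disk (the names here are illustrative, not taken from the question's code): a single reader task keeps each disk sequential while a bounded queue feeds a pool of CPU-bound parsers.

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class OneReaderManyParsers
{
    static void Main(string[] args)
    {
        // Bounded queue: the reader stays ahead, but memory use is capped.
        var lines = new BlockingCollection<string>(boundedCapacity: 10000);

        // Single reader per disk -> sequential IO, no head thrashing.
        var reader = Task.Run(() =>
        {
            foreach (var file in Directory.EnumerateFiles(args[0]))
                foreach (var line in File.ReadLines(file))
                    lines.Add(line); // blocks if the parsers fall behind
            lines.CompleteAdding();
        });

        // Tunable number of CPU-bound parser workers.
        var workers = new Task[Environment.ProcessorCount];
        for (int i = 0; i < workers.Length; i++)
            workers[i] = Task.Run(() =>
            {
                foreach (var line in lines.GetConsumingEnumerable())
                    Parse(line); // CPU-bound work only; no disk access here
            });

        Task.WaitAll(workers);
        reader.Wait();
    }

    static void Parse(string line) { /* parsing logic goes here */ }
}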
I would consider replacing ForEach with Parallel.ForEach and removing the explicit use of Threads. Use MaxDegreeOfParallelism (see https://stackoverflow.com/a/5512363/34092) to limit the number of threads used.
static void LoadAll(string folder, Action<string> algorithm)
{
Parallel.ForEach(Directory.EnumerateFiles(folder), algorithm);
}
As others have stated, IO will probably be the bottleneck in the end, and getting 100% CPU usage is really irrelevant. I feel they are missing something, though: you do get higher throughput with processes than with threads, which means IO is not the only bottleneck. The reason is that the CPU runs at a higher frequency with processes, and you want it running at peak speed whenever it is not waiting for IO! So, how can you do that?
The easiest way is to set the power profile from the power options manually. Edit power options and set both minimum and maximum processor state to 100%. That should do the job.
If you want to do it from your program, have a look at How to Disable Dynamic Frequency Scaling?. There is probably a similar API for .NET without using native code, but I couldn't find it now.
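One pragmatic workaround from managed code, sketched below, is simply to shell out to powercfg. SCHEME_MIN is the built-in alias for the High performance plan, which keeps the minimum and maximum processor state at 100%; requires sufficient privileges.

using System.Diagnostics;

class PowerPlan
{
    static void Main()
    {
        // Activate the built-in "High performance" plan so the CPU is not
        // throttled down between IO waits.
        using (var p = Process.Start(new ProcessStartInfo("powercfg", "/setactive SCHEME_MIN")
        {
            UseShellExecute = false,
            CreateNoWindow = true
        }))
        {
            p.WaitForExit();
        }
    }
}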
What is the best way to implement interprocess communication between applications that are on the same box, both written in C#?
The manager application will send commands such as stop and start to the other applications. It will also monitor the applications and possibly ask them for data.
All applications will be on the same machine running Windows 7.
Is an IPC channel a good choice for this? Are named pipes a better choice? Or sockets?
The applications being managed all have the same name. When they start up, they load a DLL that determines which algorithms are run.
Thanks
Boiling it down, use WCF with named pipes.
Here is a link that has some meat to it regarding exactly this question: What is the best choice for .NET inter-process communication?
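A minimal sketch of that approach (the contract, class, and endpoint names are made up for illustration), hosting a service over NetNamedPipeBinding and calling it through a channel proxy. In reality the host would live in the managed application and the proxy in the manager.

using System;
using System.ServiceModel;

[ServiceContract]
public interface ICommandService
{
    [OperationContract]
    string Execute(string command); // e.g. "stop", "start", "status"
}

public class CommandService : ICommandService
{
    public string Execute(string command) { return "ack: " + command; }
}

class ManagerDemo
{
    static void Main()
    {
        // Managed application side: expose the service on a named pipe.
        var host = new ServiceHost(typeof(CommandService), new Uri("net.pipe://localhost"));
        host.AddServiceEndpoint(typeof(ICommandService), new NetNamedPipeBinding(), "manager");
        host.Open();

        // Manager side: create a proxy and send a command.
        var factory = new ChannelFactory<ICommandService>(
            new NetNamedPipeBinding(),
            new EndpointAddress("net.pipe://localhost/manager"));
        ICommandService proxy = factory.CreateChannel();
        Console.WriteLine(proxy.Execute("stop"));

        ((IClientChannel)proxy).Close();
        factory.Close();
        host.Close();
    }
}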
One way to do it, especially if you want to communicate between different processes spun off from the same application, is to use a MemoryMappedFile. Here is the simplest example: place it into a console app, start 2 instances of it, and in quick succession type w+enter in one and r+enter in the other. Watch. Note: I deliberately de-synchronized the read and write timing so that you can see that sometimes the data changes and other times it does not.
using System;
using System.IO.MemoryMappedFiles;
using System.Threading;

class Program
{
private static MemoryMappedFile _file = MemoryMappedFile.CreateOrOpen("XXX_YYY", 1, MemoryMappedFileAccess.ReadWrite);
private static MemoryMappedViewAccessor _view = _file.CreateViewAccessor();
static void Main(string[] args)
{
askforinput:
Console.WriteLine("R for read, W for write");
string input = Console.ReadLine();
if (string.Equals(input, "r", StringComparison.InvariantCultureIgnoreCase))
StartReading();
else if (string.Equals(input, "w", StringComparison.InvariantCultureIgnoreCase))
StartWriting();
else
goto askforinput;
_view.Dispose();
_file.Dispose();
}
private static void StartReading()
{
for (int i = 0; i < 100; i++)
{
// read whatever the writer last put at offset 0
Console.WriteLine(_view.ReadBoolean(0));
Thread.Sleep(221); // deliberately out of step with the writer's 500 ms
}
}
private static void StartWriting()
{
bool currVal = false;
for (int i = 0; i < 100; i++)
{
currVal = currVal != true;
_view.Write(0, currVal);
Console.WriteLine("Writen: " + currVal.ToString());
Thread.Sleep(500);
}
}
}
Again, you can build a very complex and robust communication layer using memory-mapped files. This is just a simple example.
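As one step toward robustness, here is a sketch reusing the "XXX_YYY" mapping above (the event name is made up): a named EventWaitHandle replaces the timers, so the reader blocks until the writer signals that new data is in the mapped file, and sees each write exactly once. Start one instance with argument w (the writer) and another with no arguments (the reader).

using System;
using System.IO.MemoryMappedFiles;
using System.Threading;

class SignaledMmfDemo
{
    static void Main(string[] args)
    {
        using (var file = MemoryMappedFile.CreateOrOpen("XXX_YYY", 1))
        using (var view = file.CreateViewAccessor())
        using (var dataReady = new EventWaitHandle(false, EventResetMode.AutoReset, "XXX_YYY_ready"))
        {
            if (args.Length > 0 && args[0] == "w")
            {
                for (int i = 0; i < 10; i++)
                {
                    view.Write(0, i % 2 == 0); // toggle the flag
                    dataReady.Set();           // wake the reader
                    Thread.Sleep(500);
                }
            }
            else
            {
                for (int i = 0; i < 10; i++)
                {
                    dataReady.WaitOne();       // block until the writer signals
                    Console.WriteLine(view.ReadBoolean(0));
                }
            }
        }
    }
}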
I'm in the process of updating my web service to use the latest BookSleeve library, 1.3.38. Previously I was using 1.1.0.7
While doing some benchmarking, I noticed that setting hashes in Redis using the new version of BookSleeve is many times slower than the old version. Please consider the following C# benchmarking code:
public void TestRedisHashes()
{
int numItems = 1000; // number of hash items to set in redis
int numFields = 30; // number of fields in each redis hash
RedisConnection redis = new RedisConnection("10.0.0.01", 6379);
redis.Open();
// wait until the connection is open
while (!redis.State.Equals(BookSleeve.RedisConnectionBase.ConnectionState.Open)) { }
Stopwatch timer = new Stopwatch();
timer.Start();
for (int i = 0; i < numItems; i++)
{
string key = "test_" + i.ToString();
for (int j = 0; j < numFields; j++)
{
// set a value for each field in the hash
redis.Hashes.Set(0, key, "field_" + j.ToString(), "testdata");
}
redis.Keys.Expire(0, key, 30); // 30 second ttl
}
timer.Stop();
Console.WriteLine("Elapsed time for hash writes: {0} ms", timer.ElapsedMilliseconds);
}
BookSleeve 1.1.0.7 takes about 20 ms to set 1000 hashes in Redis 2.6, while 1.3.38 takes around 400 ms. That's 20X slower! Every other part of BookSleeve 1.3.38 that I've tested is either as fast as or faster than the old version. I've also tried the same test against Redis 2.4, as well as wrapping everything in a transaction. In both cases I got similar performance.
Has anyone else noticed anything like this? I must be doing something wrong... am I setting hashes correctly using the new version of BookSleeve? Is this the right way to do fire-and-forget commands? I've looked through the unit tests as an example of how to use hashes, but haven't been able to find what I'm doing differently. Is it possible that the newest version is just slower in this case?
To actually test the overall speed you would need to add code that waits for the last of the messages to be processed, for example:
Task last = null;
for (int i = 0; i < numItems; i++)
{
string key = "test_" + i.ToString();
for (int j = 0; j < numFields; j++)
{
// set a value for each field in the hash
redis.Hashes.Set(0, key, "field_" + j.ToString(), "testdata");
}
last = redis.Keys.Expire(0, key, 30); // 30 second ttl
}
redis.Wait(last);
Otherwise all you are timing is how fast the call to Set/Expire is. And in this case, that could matter. You see, in 1.1.0.7, all messages are immediately placed onto a queue, and a separate dedicated writer thread then picks up that message and writes it to the stream. In 1.3.38, the dedicated writer thread is gone (for various reasons). So if the socket is available, the calling thread writes to the underlying stream (if the socket is in use, there is a mechanism to handle that). More importantly, it is possible that in your original test against 1.1.0.7, no useful work has actually happened yet - there is no guarantee that work is anywhere near the socket, etc.
In most scenarios, this does not cause any overhead (and is less overhead when amortized), however: it is possible that in your case you are being impacted by effectively buffer under-run - in 1.1.0.7 you would have filled the buffer really quickly, and the worker thread would have probably always found more waiting messages - so it would not flush the stream until the end; in 1.3.38, it is probably flushing between messages. So: let's fix that:
Task last = null;
redis.SuspendFlush();
try {
for (int i = 0; i < numItems; i++)
{
string key = "test_" + i.ToString();
for (int j = 0; j < numFields; j++)
{
// set a value for each field in the hash
redis.Hashes.Set(0, key, "field_" + j.ToString(), "testdata");
}
last = redis.Keys.Expire(0, key, 30); // 30 second ttl
}
}
finally {
redis.ResumeFlush();
}
redis.Wait(last);
The SuspendFlush() / ResumeFlush() pair is ideal when calling a large batch of operations on a single thread to avoid any additional flushing. To copy the intellisense notes:
//
// Summary:
// Temporarily suspends eager-flushing (flushing if the write-queue becomes
// empty briefly). Buffer-based flushing will still occur when the buffer is full.
// This is useful if you are performing a large number of operations in close
// duration, and want to avoid packet fragmentation. Note that you MUST call
// ResumeFlush at the end of the operation - preferably using Try/Finally so
// that flushing is resumed even upon error. This method is thread-safe; any
// number of callers can suspend/resume flushing concurrently - eager flushing
// will resume fully when all callers have called ResumeFlush.
//
// Remarks:
// Note that some operations (transaction conditions, etc) require flushing
// - this will still occur even if the buffer is only part full.
Note that in most high throughput scenarios there are multiple operations coming in from multiple threads: in those scenarios, any work from concurrent threads will automatically be queued in a way that minimises the number of threads.
I am working on a project that will kick off multiple independent processes. I would like them to be isolated to the point that if one fails unexpectedly, the others will continue on without being impacted. I have tried a POC (pasted below) to test this using AppDomains but it still crashes the entire parent application.
Am I taking the wrong approach? If so, what should I be doing? If not, what am I doing wrong?
class Program
{
static void Main(string[] args)
{
Random rand = new Random();
Thread[] threads = new Thread[15];
for (int i = 0; i < 15; i++)
{
AppDomain domain = AppDomain.CreateDomain("Test" + i);
domain.UnhandledException += new UnhandledExceptionEventHandler(domain_UnhandledException);
Test test = domain.CreateInstanceFromAndUnwrap(Assembly.GetExecutingAssembly().Location, "ConsoleApplication1.Test") as Test;
Thread thread = new Thread(new ParameterizedThreadStart(test.DoSomeStuff));
thread.Start(rand.Next());
Console.WriteLine(String.Format("Thread #{0} has started", i));
threads[i] = thread;
}
for (int i = 0; i < 15; i++)
{
threads[i].Join();
Console.WriteLine(String.Format("Thread #{0} has finished", i));
}
Console.ReadLine();
}
static void domain_UnhandledException(object sender, UnhandledExceptionEventArgs e)
{
Console.WriteLine("UNHANDLED");
}
}
public class Test : MarshalByRefObject
{
public void DoSomeStuff(object state)
{
int loops = (int)state;
for (int i = 0; i < loops; i++)
{
if (i % 300 == 0)
{
// WILL break
Console.WriteLine("Breaking");
int val = i / (i % 300);
}
}
}
}
EDIT
Please note that the "Test" class is extremely simplified. The actual implementation would be extremely complex and very likely have gaps in its exception handling.
You don't need separate AppDomains. All you need to do is catch exceptions in the DoSomeStuff member of the Test class. If each of these threads handles its own exceptions, the rest of your app can continue running, as the sketch below shows.
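A minimal sketch of that, wrapping the loop from the question's Test class in a try/catch:

using System;

public class Test : MarshalByRefObject
{
    public void DoSomeStuff(object state)
    {
        try
        {
            int loops = (int)state;
            for (int i = 0; i < loops; i++)
            {
                if (i % 300 == 0)
                {
                    int val = i / (i % 300); // still throws DivideByZeroException
                }
            }
        }
        catch (Exception ex)
        {
            // The exception dies here, on the thread that raised it,
            // so the parent application keeps running.
            Console.WriteLine("Worker failed: " + ex.Message);
        }
    }
}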
You have to catch the exception in the thread that raised it; there's no way around that. What you need to do next is probably serialize the exception back to the primary appdomain so it is aware of it. After all, it set off the thread to get some kind of job done, and that job wasn't completed. Something ought to be done about that.
What you are emulating is the way that SQL Server and ASP.NET work. They have a very nice execution model. They accept requests to perform work from client machines. If that request bombs, they have the luxury of sending back a "sorry, it didn't work" message back. And shrug it off like it never happened, so nicely supported by appdomains.
Leaving it up to the client machine to deal with that. Not infrequently requiring the assistance of a human btw. Easy peasy, but it wasn't an accident they were designed that way. Truly emulating this execution model also requires finding somebody else to deal with the misery. That's difficult.
I run through millions of records and sometimes I have to debug using Console.WriteLine to see what is going on.
However, Console.WriteLine is very slow, considerably slower than writing to a file.
BUT it is very convenient - does anyone know of a way to speed it up?
If it is just for debugging purposes, you should use Debug.WriteLine instead. It will most likely be a bit faster than Console.WriteLine.
Example
Debug.WriteLine("There was an error processing the data.");
You can use the OutputDebugString API function to send a string to the debugger. It doesn't wait for anything to redraw and this is probably the fastest thing you can get without digging into the low-level stuff too much.
The text you give to this function will go into Visual Studio Output window.
[DllImport("kernel32.dll")]
static extern void OutputDebugString(string lpOutputString);
Then you just call OutputDebugString("Hello world!");
Do something like this:
public static class QueuedConsole
{
private static StringBuilder _sb = new StringBuilder();
private static int _lineCount;
public static void WriteLine(string message)
{
_sb.AppendLine(message);
++_lineCount;
if (_lineCount >= 10)
WriteAll();
}
public static void WriteAll()
{
Console.WriteLine(_sb.ToString());
_lineCount = 0;
_sb.Clear();
}
}
QueuedConsole.WriteLine("This message will not be written directly, but with nine other entries to increase performance.");
//after your operations, end with write all to get the last lines.
QueuedConsole.WriteAll();
Here is another example: Does Console.WriteLine block?
I recently did a benchmark battery for this on .NET 4.8. The tests included many of the proposals mentioned on this page, including Async and blocking variants of both BCL and custom code, and then most of those both with and without dedicated threading, and finally scaled across power-of-2 buffer sizes.
The fastest method, now used in my own projects, buffers 64K of wide (Unicode) characters at a time from .NET directly to the Win32 function WriteConsoleW without copying or even hard-pinning. Remainders larger than 64K, after filling and flushing one buffer, are also sent directly, and in-situ as well. The approach deliberately bypasses the Stream/TextWriter paradigm so it can (obviously enough) provide .NET text that is already Unicode to a (native) Unicode API without all the superfluous memory copying/shuffling and byte[] array allocations required for first "decoding" to a byte stream.
If there is interest (perhaps because the buffering logic is slightly intricate), I can provide the source for the above; it's only about 80 lines. However, my tests determined that there's a simpler way to get nearly the same performance, and since it doesn't require any Win32 calls, I'll show this latter technique instead.
The following is way faster than Console.Write:
using System;
using System.IO;
using System.Text;

public static class FastConsole
{
static readonly BufferedStream str;
static FastConsole()
{
Console.OutputEncoding = Encoding.Unicode; // crucial
// avoid special "ShadowBuffer" for hard-coded size 0x14000 in 'BufferedStream'
str = new BufferedStream(Console.OpenStandardOutput(), 0x15000);
}
public static void WriteLine(String s) => Write(s + "\r\n");
public static void Write(String s)
{
// avoid endless 'GetByteCount' dithering in 'Encoding.Unicode.GetBytes(s)'
var rgb = new byte[s.Length << 1];
Encoding.Unicode.GetBytes(s, 0, s.Length, rgb, 0);
lock (str) // (optional, can omit if appropriate)
str.Write(rgb, 0, rgb.Length);
}
public static void Flush() { lock (str) str.Flush(); }
};
Note that this is a buffered writer, so you must call Flush() when you have no more text to write.
I should also mention that, as shown, technically this code assumes 16-bit Unicode (UCS-2, as opposed to UTF-16) and thus won't properly handle surrogate pairs for characters beyond the Basic Multilingual Plane. The point hardly seems important given the more extreme limitations on console text display in general, but could perhaps still matter for piping/redirection.
Usage:
FastConsole.WriteLine("hello world.");
// etc...
FastConsole.Flush();
On my machine, this gets about 77,000 lines/second (mixed-length) versus only 5,200 lines/sec under identical conditions for normal Console.WriteLine. That's almost a 15x speedup.
These are controlled comparison results only; note that absolute measurements of console output performance are highly variable, depending on the console window settings and runtime conditions, including size, layout, fonts, DWM clipping, etc.
Why Console is slow:
Console output is actually an IO stream that's managed by your operating system. Most IO classes (like FileStream) have async methods but the Console class was never updated so it always blocks the thread when writing.
Console.WriteLine is backed by SyncTextWriter which uses a global lock to prevent multiple threads from writing partial lines. This is a major bottleneck that forces all threads to wait for each other to finish the write.
If the console window is visible on screen then there can be significant slowdown because the window needs to be redrawn before the console output is considered flushed.
Solutions:
Wrap the Console stream with a StreamWriter and then use async methods:
var sw = new StreamWriter(Console.OpenStandardOutput());
await sw.WriteLineAsync("...");
await sw.FlushAsync(); // the writer buffers, so flush (or set AutoFlush) before output appears
You can also set a larger buffer if you need to use sync methods. The call will occasionally block when the buffer gets full and is flushed to the stream.
// set a buffer size
var sw = new StreamWriter(Console.OpenStandardOutput(), Encoding.UTF8, 8192);
// this write call will block when buffer is full
sw.Write("...")
If you want the fastest writes though, you'll need to make your own buffer class that writes to memory and flushes to the console asynchronously in the background using a single thread without locking. The new Channel<T> class in .NET Core 2.1 makes this simple and fast. Plenty of other questions showing that code but comment if you need tips.
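A minimal sketch of that idea (the class name is illustrative; requires the System.Threading.Channels package on .NET Core 2.1+): producers enqueue without blocking, and one background task does all the slow console IO.

using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ChannelConsole
{
    private static readonly Channel<string> _channel =
        Channel.CreateUnbounded<string>(new UnboundedChannelOptions { SingleReader = true });

    // Single consumer: drains the queue and performs the blocking writes.
    private static readonly Task _pump = Task.Run(async () =>
    {
        while (await _channel.Reader.WaitToReadAsync())
            while (_channel.Reader.TryRead(out var line))
                Console.WriteLine(line);
    });

    // Producers never block on console IO; they only enqueue.
    public static void WriteLine(string s) => _channel.Writer.TryWrite(s);

    // Call once at shutdown so the remaining lines are written.
    public static Task CompleteAsync()
    {
        _channel.Writer.Complete();
        return _pump;
    }
}

Usage: call ChannelConsole.WriteLine(...) from any thread, then await ChannelConsole.CompleteAsync() before the process exits.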
A little old thread, and maybe not exactly what the OP is looking for, but I ran into the same question recently when processing audio data in real time.
I compared Console.WriteLine to Debug.WriteLine with this code and used DebugView as a DOS-box alternative. It's just an executable (nothing to install) and can be customized in very neat ways (filters & colors!). It has no problems with tens of thousands of lines and manages memory quite well (I could not find any kind of leak, even after days of logging).
After doing some testing in different environments (e.g.: virtual machine, IDE, background processes running, etc) I made the following observations:
Debug is almost always faster
For small bursts of lines (<1000), it's about 10 times faster
For larger chunks it seems to converge to about 3x
If the Debug output goes to the IDE, Console is faster :-)
If DebugView is not running, Debug gets even faster
For really large amounts of consecutive output (>10000 lines), Debug gets slower and Console stays constant. I presume this is due to the memory Debug has to allocate, which Console does not.
Obviously, it makes a difference whether DebugView is actually "in view" or not, as the many GUI updates have a significant impact on the overall performance of the system, while Console simply blocks, whether visible or not. But it's hard to put numbers on that one...
I did not try multiple threads writing to the Console, as I think this should generally be avoided. I never had (performance) problems when writing to Debug from multiple threads.
If you compile with Release settings, usually all Debug statements are omitted and Trace should produce the same behaviour as Debug.
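For instance, with the default project settings (TRACE defined in both Debug and Release builds, DEBUG only in Debug builds):

using System.Diagnostics;

class TraceVsDebug
{
    static void Main()
    {
        Debug.WriteLine("compiled out of Release builds (needs DEBUG)");
        Trace.WriteLine("kept in Release builds (TRACE is defined by default)");
    }
}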
I used VS2017 & .Net 4.6.1
Sorry for so much code, but I had to tweak it quite a lot to actually measure what I wanted to. If you can spot any problems with the code (biases, etc.), please comment. I would love to get more precise data for real life systems.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
namespace Console_vs_Debug {
class Program {
class Trial {
public string name;
public Action console;
public Action debug;
public List < float > consoleMeasuredTimes = new List < float > ();
public List < float > debugMeasuredTimes = new List < float > ();
}
static Stopwatch sw = new Stopwatch();
private static int repeatLoop = 1000;
private static int iterations = 2;
private static int dummy = 0;
static void Main(string[] args) {
if (args.Length == 2) {
repeatLoop = int.Parse(args[0]);
iterations = int.Parse(args[1]);
}
// do some dummy work
for (int i = 0; i < 100; i++) {
Console.WriteLine("-");
Debug.WriteLine("-");
}
for (int i = 0; i < iterations; i++) {
foreach(Trial trial in trials) {
Thread.Sleep(50);
sw.Restart();
for (int r = 0; r < repeatLoop; r++)
trial.console();
sw.Stop();
trial.consoleMeasuredTimes.Add(sw.ElapsedMilliseconds);
Thread.Sleep(1);
sw.Restart();
for (int r = 0; r < repeatLoop; r++)
trial.debug();
sw.Stop();
trial.debugMeasuredTimes.Add(sw.ElapsedMilliseconds);
}
}
Console.WriteLine("---\r\n");
foreach(Trial trial in trials) {
var consoleAverage = trial.consoleMeasuredTimes.Average();
var debugAverage = trial.debugMeasuredTimes.Average();
Console.WriteLine(trial.name);
Console.WriteLine($ " console: {consoleAverage,11:F4}");
Console.WriteLine($ " debug: {debugAverage,11:F4}");
Console.WriteLine($ "{consoleAverage / debugAverage,32:F2} (console/debug)");
Console.WriteLine();
}
Console.WriteLine("all measurements are in milliseconds");
Console.WriteLine("anykey");
Console.ReadKey();
}
private static List < Trial > trials = new List < Trial > {
new Trial {
name = "constant",
console = delegate {
Console.WriteLine("A static and constant string");
},
debug = delegate {
Debug.WriteLine("A static and constant string");
}
},
new Trial {
name = "dynamic",
console = delegate {
Console.WriteLine("A dynamically built string (number " + dummy++ + ")");
},
debug = delegate {
Debug.WriteLine("A dynamically built string (number " + dummy++ + ")");
}
},
new Trial {
name = "interpolated",
console = delegate {
Console.WriteLine($ "An interpolated string (number {dummy++,6})");
},
debug = delegate {
Debug.WriteLine($ "An interpolated string (number {dummy++,6})");
}
}
};
}
}
Just a little trick I use sometimes: if you remove focus from the console window by opening another window over it, and leave it that way until the run completes, the console won't redraw until you refocus it, speeding things up significantly. Just make sure the buffer is set high enough that you can scroll back through all of the output.
Try using the System.Diagnostics Debug class? You can accomplish the same things as using Console.WriteLine.
You can view the available class methods in the System.Diagnostics.Debug documentation.