I'm in the process of updating my web service to use the latest BookSleeve library, 1.3.38. Previously I was using 1.1.0.7.
While doing some benchmarking, I noticed that setting hashes in Redis using the new version of BookSleeve is many times slower than the old version. Please consider the following C# benchmarking code:
public void TestRedisHashes()
{
    int numItems = 1000;  // number of hash items to set in redis
    int numFields = 30;   // number of fields in each redis hash

    RedisConnection redis = new RedisConnection("10.0.0.01", 6379);
    redis.Open();

    // wait until the connection is open
    while (!redis.State.Equals(BookSleeve.RedisConnectionBase.ConnectionState.Open)) { }

    Stopwatch timer = new Stopwatch();
    timer.Start();

    for (int i = 0; i < numItems; i++)
    {
        string key = "test_" + i.ToString();
        for (int j = 0; j < numFields; j++)
        {
            // set a value for each field in the hash
            redis.Hashes.Set(0, key, "field_" + j.ToString(), "testdata");
        }
        redis.Keys.Expire(0, key, 30);  // 30 second ttl
    }

    timer.Stop();
    Console.WriteLine("Elapsed time for hash writes: {0} ms", timer.ElapsedMilliseconds);
}
BookSleeve 1.1.0.7 takes about 20 ms to set 1000 hashes in Redis 2.6, while 1.3.38 takes around 400 ms. That's 20x slower! Every other part of BookSleeve 1.3.38 that I've tested is either as fast as or faster than the old version. I've also tried the same test against Redis 2.4, as well as wrapping everything in a transaction; in both cases I got similar performance.
Has anyone else noticed anything like this? I must be doing something wrong... am I setting hashes correctly using the new version of BookSleeve? Is this the right way to do fire-and-forget commands? I've looked through the unit tests as an example of how to use hashes, but haven't been able to find what I'm doing differently. Is it possible that the newest version is just slower in this case?
To actually test the overall speed you would need to add code that waits for the last of the messages to be processed, for example:
Task last = null;
for (int i = 0; i < numItems; i++)
{
    string key = "test_" + i.ToString();
    for (int j = 0; j < numFields; j++)
    {
        // set a value for each field in the hash
        redis.Hashes.Set(0, key, "field_" + j.ToString(), "testdata");
    }
    last = redis.Keys.Expire(0, key, 30);  // 30 second ttl
}
redis.Wait(last);
Otherwise all you are timing is how fast the call to Set/Expire is. And in this case, that could matter. You see, in 1.1.0.7, all messages are immediately placed onto a queue, and a separate dedicated writer thread then picks up that message and writes it to the stream. In 1.3.38, the dedicated writer thread is gone (for various reasons). So if the socket is available, the calling thread writes to the underlying stream (if the socket is in use, there is a mechanism to handle that). More importantly, it is possible that in your original test against 1.1.0.7, no useful work has actually happened yet - there is no guarantee that work is anywhere near the socket, etc.
In most scenarios, this does not cause any overhead (and is less overhead when amortized), however: it is possible that in your case you are being impacted by effectively buffer under-run - in 1.1.0.7 you would have filled the buffer really quickly, and the worker thread would have probably always found more waiting messages - so it would not flush the stream until the end; in 1.3.38, it is probably flushing between messages. So: let's fix that:
Task last = null;
redis.SuspendFlush();
try
{
    for (int i = 0; i < numItems; i++)
    {
        string key = "test_" + i.ToString();
        for (int j = 0; j < numFields; j++)
        {
            // set a value for each field in the hash
            redis.Hashes.Set(0, key, "field_" + j.ToString(), "testdata");
        }
        last = redis.Keys.Expire(0, key, 30);  // 30 second ttl
    }
}
finally
{
    redis.ResumeFlush();
}
redis.Wait(last);
redis.Wait(last);
The SuspendFlush() / ResumeFlush() pair is ideal when issuing a large batch of operations from a single thread, to avoid any additional flushing. To copy the IntelliSense notes:
//
// Summary:
// Temporarily suspends eager-flushing (flushing if the write-queue becomes
// empty briefly). Buffer-based flushing will still occur when the data is full.
// This is useful if you are performing a large number of operations in close
// duration, and want to avoid packet fragmentation. Note that you MUST call
// ResumeFlush at the end of the operation - preferably using Try/Finally so
// that flushing is resumed even upon error. This method is thread-safe; any
// number of callers can suspend/resume flushing concurrently - eager flushing
// will resume fully when all callers have called ResumeFlush.
//
// Remarks:
// Note that some operations (transaction conditions, etc) require flushing
// - this will still occur even if the buffer is only part full.
Note that in most high throughput scenarios there are multiple operations coming in from multiple threads: in those scenarios, any work from concurrent threads will automatically be queued in a way that minimises the number of threads.
Related
I am writing an application that needs to write messages to a USB HID device and read responses. For this purpose, I'm using USBHIDDRIVER.dll (https://www.leitner-fischer.com/2007/08/03/hid-usb-driver-library/).
Now it works fine when writing many of the message types - i.e. the short ones.
However, there is one type of message where I have to write a .hex file containing about 70,000 lines. The protocol requires that each line be written individually and sent in a packet containing other information (start byte, end byte, checksum).
However I'm encountering problems with this.
I've tried something like this:
private byte[] _responseBytes;
private ManualResetEvent _readComplete;

public byte[][] WriteMessage(byte[][] message)
{
    List<byte[]> devResponse = new List<byte[]>();
    _readComplete = new ManualResetEvent(false);
    for (int i = 0; i < message.Length; i++)
    {
        var usbHid = new USBInterface("myvid", "mypid");
        usbHid.Connect();
        usbHid.enableUsbBufferEvent(UsbHidReadEvent);
        if (!usbHid.write(message[i]))
        {
            throw new Exception("Write Failed");
        }
        usbHid.startRead();
        if (!_readComplete.WaitOne(10000))
        {
            usbHid.stopRead();
            throw new Exception("Timeout waiting for read");
        }
        usbHid.stopRead();
        _readComplete.Reset();
        devResponse.Add(_responseBytes.ToArray());
        usbHid = null;
    }
    return devResponse.ToArray();
}

private void UsbHidReadEvent(object sender, EventArgs e)
{
    // copy the received bytes before signalling the waiting thread
    _responseBytes = (byte[])((ListWithEvent)sender)[0];
    if (_readComplete != null)
    {
        _readComplete.Set();
    }
}
This appears to work; in Wireshark I can see the messages going back and forth. However, as you can see, it creates an instance of the USBInterface class on every iteration. This seems very clunky, and I can see in Task Manager that it starts to eat up a lot of memory - the current run has it above 1 GB, and eventually it falls over with an OutOfMemory exception. It is also very slow: the current run is not complete after about 15 minutes, although I've seen another application do the same job in less than one minute.
However, if I move the creation and connection of the USBInterface out of the loop as in...
var usbHid = new USBInterface("myvid", "mypid");
usbHid.Connect();
usbHid.enableUsbBufferEvent(UsbHidReadEvent);
for (int i = 0; i < message.Length; i++)
{
    if (!usbHid.write(message[i]))
    {
        throw new Exception("Write Failed");
    }
    usbHid.startRead();
    if (!_readComplete.WaitOne(10000))
    {
        usbHid.stopRead();
        throw new Exception("Timeout waiting for read");
    }
    usbHid.stopRead();
    _readComplete.Reset();
    devResponse.Add(_responseBytes.ToArray());
}
usbHid = null;
... now what happens is that it only allows me to do one write! I write the data and read the response, but when it comes around the loop to write the second message, the application just hangs in the write() function and never returns. (It doesn't even time out.)
What is the correct way to do this kind of thing?
(BTW, I know it's adding a lot of data to that devResponse object, but this is not the source of the issue - if I remove it, it still consumes an awful lot of memory.)
UPDATE
I've found that if I don't enable reading, I can do multiple writes without having to create a new USBInterface object on each iteration. This is an improvement, but I'd still like to be able to read each response. (I can see in Wireshark that the responses are still sent.)
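One thing that might be worth trying - a sketch only, built from just the USBHIDDRIVER calls shown above and not verified against the library - is to connect and start the read loop once, keeping it running across all the writes instead of starting and stopping it per packet:

// Sketch: one USBInterface, one long-lived read loop for all packets.
// Assumes the buffer event keeps firing while startRead() is active;
// the vid/pid strings and the 10 s timeout are carried over from the code above.
var usbHid = new USBInterface("myvid", "mypid");
usbHid.Connect();
usbHid.enableUsbBufferEvent(UsbHidReadEvent);
usbHid.startRead();                 // start reading once, before any writes
try
{
    for (int i = 0; i < message.Length; i++)
    {
        _readComplete.Reset();      // arm the event before the write goes out
        if (!usbHid.write(message[i]))
            throw new Exception("Write Failed");
        if (!_readComplete.WaitOne(10000))
            throw new Exception("Timeout waiting for read");
        devResponse.Add(_responseBytes.ToArray());
    }
}
finally
{
    usbHid.stopRead();              // stop the read loop only after the last response
}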
I'm facing a producer-consumer problem: I have a camera that sends images very quickly, and I have to save them to disk. The images are in the form of ushort[]. The camera always overwrites the same ushort[] variable, so between one acquisition and the next I have to copy the array, and then, when possible, save it to free up the memory for that image. The important thing is not to lose any images from the camera, even if it means increasing the memory used: it is entirely acceptable for the consumer (saving images and freeing memory) to be slower than the producer; what is not acceptable is failing to copy the image into memory in time.
I've written sample code that should simulate the problem:
immage_ushort: the image generated by the camera, which must be copied into the BlockingCollection before the next image arrives.
producerTask: runs a loop that simulates the arrival of an image every time_wait; within this time the producer should copy the image into the BlockingCollection.
consumerTask: works on the BlockingCollection, saving the images to disk and thereby freeing up memory; it doesn't matter if the consumer works more slowly than the producer.
I set a time_wait of 1 millisecond to test performance (in reality the camera will not reach that speed). The times are respected (with a maximum delay of 1-2 ms, which is acceptable) if there is no saving to disk in the code (commenting out image1.ImWrite(file_name)). But with saving to disk enabled, I instead get delays that sometimes exceed 100 ms.
This is my code:
private void Execute_test_producer_consumer1()
{
    // Images are stored as ushort arrays, so we create a BlockingCollection<ushort[]>
    // to hold images when they arrive from the camera
    BlockingCollection<ushort[]> imglist = new BlockingCollection<ushort[]>();
    string lod_date = "";

    /* producerTask simulates a camera that returns an image every time_wait
       milliseconds. The image is copied and inserted in the BlockingCollection
       to be then saved on disk in the consumerTask */
    Task producerTask = Task.Factory.StartNew(() =>
    {
        // Number of images to process
        int num_img = 3000;
        // Time between one image and the next
        long time_wait = 1;
        // Time log variables
        var watch1 = System.Diagnostics.Stopwatch.StartNew();
        long watch_log = 0;
        long delta_time = 0;
        long timer1 = 0;
        List<long> timer_delta_log = new List<long>();
        List<long> timer_delta_log_time = new List<long>();
        int ii = 0;

        Console.WriteLine("-----START producer");
        watch1.Restart();
        // Here I expect every time_wait (or a little more) an image will be
        // inserted into imglist
        while (ii < num_img)
        {
            timer1 = watch1.ElapsedMilliseconds;
            delta_time = timer1 - watch_log;
            if (delta_time >= time_wait || ii == 0)
            {
                // Add image
                imglist.Add((ushort[])immage_ushort.Clone());
                // Inserting data for time log
                timer_delta_log.Add(delta_time);
                timer_delta_log_time.Add(timer1);
                watch_log = timer1;
                ii++;
            }
        }
        imglist.CompleteAdding();
        watch1.Stop();
        lod_date = DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss.fff");
        Console.WriteLine("-----END producer: " + lod_date);

        // We only print images that are not inserted on schedule
        int gg = 0;
        foreach (long timer_delta_log_t in timer_delta_log)
        {
            if (timer_delta_log_t > time_wait)
            {
                Console.WriteLine("-- Image " + (gg + 1) + ", delta: "
                    + timer_delta_log_t + ", time: " + timer_delta_log_time[gg]);
            }
            gg++;
        }
    });

    Task consumerTask = Task.Factory.StartNew(() =>
    {
        string file_name = "";
        int yy = 0;
        // saving images and removing data
        foreach (ushort[] imm in imglist.GetConsumingEnumerable())
        {
            file_name = @"output/" + yy + ".png";
            Mat image1 = new Mat(row, col, MatType.CV_16UC1, imm);
            // By commenting out this line, the timing of the producer is respected
            image1.ImWrite(file_name);
            image1.Dispose();
            yy++;
        }
        imglist.Dispose();
        lod_date = DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss.fff");
        Console.WriteLine("-----END consumer: " + lod_date);
    });
}
I also thought that the BlockingCollection could remain blocked for the entire duration of the foreach, and therefore for the whole time the image is being saved to disk. So I also tried replacing the foreach with this:
while (!imglist.IsCompleted)
{
    ushort[] elem;
    try
    {
        elem = imglist.Take();
    }
    catch (InvalidOperationException)
    {
        break;  // CompleteAdding() won the race against IsCompleted; nothing left to take
    }
    file_name = @"output/" + yy + ".png";
    Mat image1 = new Mat(row, col, MatType.CV_16UC1, elem);
    // By commenting out this line, the timing of the producer is respected
    image1.ImWrite(file_name);
    image1.Dispose();
    yy++;
}
But the result doesn't change.
What am I doing wrong?
You might want to start your tasks with the "LongRunning" option:
LongRunning
Specifies that a task will be a long-running, coarse-grained operation involving fewer, larger components than fine-grained systems. It provides a hint to the TaskScheduler that oversubscription may be warranted. Oversubscription lets you create more threads than the available number of hardware threads. It also provides a hint to the task scheduler that an additional thread might be required for the task so that it does not block the forward progress of other threads or work items on the local thread-pool queue.
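Applied to the code above, that would look something like this (a minimal sketch; the overload taking a CancellationToken and a scheduler is simply the standard way to pass TaskCreationOptions):

// Hint to the scheduler that both loops run for a long time, so each gets a
// dedicated thread instead of tying up (or waiting on) the thread pool.
Task producerTask = Task.Factory.StartNew(() =>
{
    // ... producer loop from above ...
}, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);

Task consumerTask = Task.Factory.StartNew(() =>
{
    // ... consumer loop from above ...
}, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);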
Kindly bear with me for this confusing question. I'm finding it as hard to describe as it is involving and tiresome. Read it and you'll know why.
I've been hounding this issue for over a month now without much progress. I'm using an STM32 (STM32F103C8 mounted on a BluePill board) to communicate with a C# app through an FT232r Serial-USB converter. The complete communication protocol is a bit complex. I'm writing here a simplistic version of the code that explains my problem quite accurately.
STM32 does the following.
In the initial setup,
Serial.begin at 2000000 (yes, it's very high, but I've analyzed it using an oscilloscope and the signal is very healthy; impedance matching and clock jitter are very accurate).
Waits for a command from the C# end to enter the loop
In the loop, it does the following.
TX a byte buffer of length N on the serial port. Packet structure is 0xAA, N bytes, 1 byte checksum.
repeat the loop
And on the C# side (Pseudo code),
new Thread(() => { while (true) { IOTick(); Thread.Sleep(30); } }).Start();
IOTick() is defined as:
{
    while (SerialPortObject.BytesToRead > 1)
    {
        header = read();
        if (header != 0xAA) continue;
        byte[] buffer = new byte[N + 1];
        receivedBytes = readBytes(buffer, N + 1, Timeout = 500ms); // receivedBytes is never less than N + 1 for timeouts greater than 120
        // use the N = 16 bytes; check the Nth byte against the checksum (doesn't take much CPU time),
        // then raise a packet-received software event
    }
}
readBytes is defined as
int readBytes(byte[] buffer, int count, int timeout)
{
    for (int i = 0; i < count; i++)
    {
        var st = DateTime.Now;  // time each byte's wait separately
        var b_ = read(timeout);
        if (b_ == -1)
            return i;
        buffer[i] = (byte)b_;
        timeout -= (int)(DateTime.Now - st).TotalMilliseconds;  // spend down the remaining budget
    }
    return count;
}
int buffer2ReadIndex = 0;
byte[] buffer2 = new byte[0];

int read(int timeout)
{
    DateTime start = DateTime.Now;
    if (buffer2.Length == 0)
    {
        while (SerialPortObject.BytesToRead <= 0)
        {
            if ((DateTime.Now - start).TotalMilliseconds > timeout)
                return -1;
            System.Threading.Thread.Sleep(30);
        }
        buffer2 = new byte[SerialPortObject.BytesToRead];
        SerialPortObject.Read(buffer2, 0, buffer2.Length);
    }
    if (buffer2.Length > 0)
    {
        var b = buffer2[buffer2ReadIndex];
        buffer2ReadIndex++;
        if (buffer2ReadIndex >= buffer2.Length)
        {
            buffer2ReadIndex = 0;
            buffer2 = new byte[0];
        }
        return b;
    }
    return -1;
}
Now, everything works as expected: the packet-received software event is triggered no later than every ~30 ms (the Windows tick time). The problem starts if I have to wait between each packet TX on the STM side. First, I suspected that the I2C I was using for some tasks between packet TXs was causing some hardware or software conflict that corrupted the serial data. But then I noticed that the same thing happens if I merely introduce a delay of 1 millisecond (using Arduino delay()) between packet TXs. Almost 1K packets should be received every second now, yet roughly 1 out of 10 packets following a successfully received header either doesn't arrive completely or arrives with a corrupted checksum, causing the C# app to lose the packet header. Hunting for the next header obviously requires flushing some bytes, losing some packets in the process. Even this wouldn't sound too bad for an app that can afford 5% packet loss; strangely, though, when this anomaly occurs, the packet-received software event waits for more than 1 second after every couple hundred consecutive events.
I'm completely blind here. I even tried it at a 115200 baud rate: it shows the same behavior, with a slightly lower loss ratio. It should be noted that at 9600 baud the issue doesn't happen. This is the only hint I've got right now.
It looks like I've found an answer.
After digging deep into the SerialPort and SerialPort.BaseStream classes, and after doing some document reading and benchmarking, here is what I've observed:
SerialPort.BytesToRead updates are not uniform, and the DataReceived event seems to follow it. When bytes are coming in at ~200 kHz (baud = 2 Mbps), it is updated almost instantaneously (or within 30 ms, worst case). When they are coming in at ~20 kHz or slower (evenly spaced in time using a microcontroller), SerialPort.BytesToRead can take up to 400 ms to update. This happens only after a dozen 30 ms updates.
So, observing this, I can say that SerialPort.BytesToRead is updated on one of two conditions: some amount of time has passed since the data arrived (and this time is not constrained to 30 ms), or the data is coming in too fast.
This is strange behavior. No data is lost while this anomaly is occurring. Unsurprisingly, 0.06% of bytes are lost when working at full bandwidth (200 KB/s at a baud rate of 2 Mbps).
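If you want to sidestep the BytesToRead polling entirely, a commonly suggested alternative (a sketch only, assuming the header/checksum framing logic above stays unchanged; I haven't benchmarked it against this setup) is to read from SerialPort.BaseStream asynchronously, which hands you bytes as the driver delivers them rather than on SerialPort's update schedule:

using System;
using System.IO.Ports;
using System.Threading;
using System.Threading.Tasks;

static class SerialReader
{
    // Pump bytes from the serial driver via BaseStream.ReadAsync,
    // bypassing the BytesToRead/DataReceived layer of SerialPort.
    public static async Task ReadLoopAsync(SerialPort sp, Action<byte[], int> onBytes, CancellationToken ct)
    {
        byte[] buffer = new byte[4096];
        while (!ct.IsCancellationRequested)
        {
            // Completes as soon as the driver delivers bytes, not on SerialPort's polling schedule.
            int n = await sp.BaseStream.ReadAsync(buffer, 0, buffer.Length, ct);
            if (n > 0)
                onBytes(buffer, n);  // feed the header/checksum parser shown above
        }
    }
}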
I am seeing some dead-instance weirdness when running parallelized nested-loop web stress tests using Selenium WebDriver; a simple example being, say, hit 300 unique pages with 100 impressions each.
I'm "successfully" getting 4-8 WebDriver instances going using a ThreadLocal<FirefoxWebDriver> to isolate them per task thread, and MaxDegreeOfParallelism on a ParallelOptions instance to limit the threads. I'm partitioning and parallelizing only the outer loop (the collection of pages), and checking .IsValueCreated on the ThreadLocal<> container at the beginning of each partition's "long running task" method. To facilitate cleanup later, I add each new instance to a ConcurrentDictionary keyed by thread id. The setup looks roughly like the sketch below.
No matter what parallelizing or partitioning strategy I use, the WebDriver instances will occasionally do one of the following:
Launch but never show a URL or run an impression
Launch, run any number of impressions fine, then just sit idle at some point
When either of these happens, the parallel loop eventually seems to notice that a thread isn't doing anything and spawns a new partition. If n is the number of threads allowed, this results in having n productive threads only about 50-60% of the time.
Cleanup still works fine at the end; there may be 2n open browsers or more, but the productive and unproductive ones alike get cleaned up.
Is there a way to monitor for these useless WebDriver instances and a) scavenge them right away, plus b) get the parallel loop to replace the task segment immediately, instead of lagging behind for several minutes as it often does now?
I was having a similar problem. It turns out that WebDriver doesn't have the best method for finding open ports. As described here, it takes a system-wide lock on ports, finds an open port, and then starts the instance. This can starve the other instances that you're trying to start of ports.
I got around this by specifying a random port number directly in the delegate for the ThreadLocal<IWebDriver> like this:
var ports = new List<int>();
var rand = new Random((int)DateTime.Now.Ticks & 0x0000FFFF);
var driver = new ThreadLocal<IWebDriver>(() =>
{
    var profile = new FirefoxProfile();
    // pick a port in the range 7050-7099 that hasn't been handed out yet
    var port = rand.Next(50) + 7050;
    while (ports.Contains(port) && ports.Count != 50) port = rand.Next(50) + 7050;
    profile.Port = port;
    ports.Add(port);
    return new FirefoxDriver(profile);
});
This works pretty consistently for me, although there's the unresolved issue of what happens if you end up using all 50 ports in the list.
Since there is no OnReady event nor an IsReady property, I worked around it by sleeping the thread for several seconds after creating each instance. Doing that seems to give me 100% durable, functioning WebDriver instances.
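In code, that workaround amounts to something like this (a sketch; the delay length is whatever proves reliable on your hardware):

var driver = new ThreadLocal<IWebDriver>(() =>
{
    var d = new FirefoxDriver();
    Thread.Sleep(TimeSpan.FromSeconds(5));  // give the browser time to become responsive
    return d;
});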
Thanks to your suggestion, I've implemented IsReady functionality in my open-source project Webinator. Use that if you want, or use the code outlined below.
I tried instantiating 25 instances, and all of them were functional, so I'm pretty confident in the algorithm at this point (I leverage HtmlAgilityPack to see if elements exist, but I'll skip it for the sake of simplicity here):
public void WaitForReady(IWebDriver driver)
{
var js = #"{ var temp=document.createElement('div'); temp.id='browserReady';" +
#"b=document.getElementsByTagName('body')[0]; b.appendChild(temp); }";
((IJavaScriptExecutor)driver).ExecuteScript(js);
WaitForSuccess(() =>
{
IWebElement element = null;
try
{
element = driver.FindElement(By.Id("browserReady"));
}
catch
{
// element not found
}
return element != null;
},
timeoutInMilliseconds: 10000);
js = #"{var temp=document.getElementById('browserReady');" +
#" temp.parentNode.removeChild(temp);}";
((IJavaScriptExecutor)driver).ExecuteScript(js);
}
private bool WaitForSuccess(Func<bool> action, int timeoutInMilliseconds)
{
if (action == null) return false;
bool success;
const int PollRate = 250;
var maxTries = timeoutInMilliseconds / PollRate;
int tries = 0;
do
{
success = action();
tries++;
if (!success && tries <= maxTries)
{
Thread.Sleep(PollRate);
}
}
while (!success && tries < maxTries);
return success;
}
The assumption is if the browser is responding to javascript functions and is finding elements, then it's probably a reliable instance and ready to be used.
I run through millions of records and sometimes I have to debug using Console.WriteLine to see what is going on.
However, Console.WriteLine is very slow, considerably slower than writing to a file.
BUT it is very convenient - does anyone know of a way to speed it up?
If it is just for debugging purposes you should use Debug.WriteLine instead. This will most likely be a bit faster than using Console.WriteLine.
Example
Debug.WriteLine("There was an error processing the data.");
You can use the OutputDebugString API function to send a string to the debugger. It doesn't wait for anything to redraw and this is probably the fastest thing you can get without digging into the low-level stuff too much.
The text you give to this function will go into Visual Studio Output window.
[DllImport("kernel32.dll")]
static extern void OutputDebugString(string lpOutputString);
Then you just call OutputDebugString("Hello world!");
Do something like this:
public static class QueuedConsole
{
    private static StringBuilder _sb = new StringBuilder();
    private static int _lineCount;

    // members of a static class must themselves be static
    public static void WriteLine(string message)
    {
        _sb.AppendLine(message);
        ++_lineCount;
        if (_lineCount >= 10)
            WriteAll();
    }

    public static void WriteAll()
    {
        Console.WriteLine(_sb.ToString());
        _lineCount = 0;
        _sb.Clear();
    }
}
QueuedConsole.WriteLine("This message will not be written directly, but with nine other entries to increase performance.");
//after your operations, end with write all to get the last lines.
QueuedConsole.WriteAll();
Here is another example: Does Console.WriteLine block?
I recently did a benchmark battery for this on .NET 4.8. The tests included many of the proposals mentioned on this page, including Async and blocking variants of both BCL and custom code, and then most of those both with and without dedicated threading, and finally scaled across power-of-2 buffer sizes.
The fastest method, now used in my own projects, buffers 64K of wide (Unicode) characters at a time from .NET directly to the Win32 function WriteConsoleW without copying or even hard-pinning. Remainders larger than 64K, after filling and flushing one buffer, are also sent directly, and in-situ as well. The approach deliberately bypasses the Stream/TextWriter paradigm so it can (obviously enough) provide .NET text that is already Unicode to a (native) Unicode API without all the superfluous memory copying/shuffling and byte[] array allocations required for first "decoding" to a byte stream.
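For orientation only, the core of that approach is the raw Win32 call shown below (a bare, unbuffered sketch with a hypothetical wrapper name; the author's actual version adds the 64K wide-character buffering described above):

using System;
using System.Runtime.InteropServices;

static class DirectConsole
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
    static extern bool WriteConsoleW(IntPtr hConsoleOutput, string lpBuffer,
        uint nNumberOfCharsToWrite, out uint lpNumberOfCharsWritten, IntPtr lpReserved);

    [DllImport("kernel32.dll")]
    static extern IntPtr GetStdHandle(int nStdHandle);

    const int STD_OUTPUT_HANDLE = -11;

    // Hand .NET's UTF-16 chars straight to the native Unicode API:
    // no byte[] allocation, no Encoding round-trip.
    public static void Write(string s)
    {
        WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), s,
            (uint)s.Length, out uint _, IntPtr.Zero);
    }
}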
If there is interest (perhaps because the buffering logic is slightly intricate), I can provide the source for the above; it's only about 80 lines. However, my tests determined that there's a simpler way to get nearly the same performance, and since it doesn't require any Win32 calls, I'll show this latter technique instead.
The following is way faster than Console.Write:
public static class FastConsole
{
static readonly BufferedStream str;
static FastConsole()
{
Console.OutputEncoding = Encoding.Unicode; // crucial
// avoid special "ShadowBuffer" for hard-coded size 0x14000 in 'BufferedStream'
str = new BufferedStream(Console.OpenStandardOutput(), 0x15000);
}
public static void WriteLine(String s) => Write(s + "\r\n");
public static void Write(String s)
{
// avoid endless 'GetByteCount' dithering in 'Encoding.Unicode.GetBytes(s)'
var rgb = new byte[s.Length << 1];
Encoding.Unicode.GetBytes(s, 0, s.Length, rgb, 0);
lock (str) // (optional, can omit if appropriate)
str.Write(rgb, 0, rgb.Length);
}
public static void Flush() { lock (str) str.Flush(); }
};
Note that this is a buffered writer, so you must call Flush() when you have no more text to write.
I should also mention that, as shown, technically this code assumes 16-bit Unicode (UCS-2, as opposed to UTF-16) and thus won't properly handle 4-byte surrogate pairs for characters beyond the Basic Multilingual Plane. The point hardly seems important given the more extreme limitations on console text display in general, but could perhaps still matter for piping/redirection.
Usage:
FastConsole.WriteLine("hello world.");
// etc...
FastConsole.Flush();
On my machine, this gets about 77,000 lines/second (mixed-length) versus only 5,200 lines/sec under identical conditions for normal Console.WriteLine. That's a factor of almost 15x speedup.
These are controlled comparison results only; note that absolute measurements of console output performance are highly variable, depending on the console window settings and runtime conditions, including size, layout, fonts, DWM clipping, etc.
Why Console is slow:
Console output is actually an IO stream that's managed by your operating system. Most IO classes (like FileStream) have async methods but the Console class was never updated so it always blocks the thread when writing.
Console.WriteLine is backed by SyncTextWriter which uses a global lock to prevent multiple threads from writing partial lines. This is a major bottleneck that forces all threads to wait for each other to finish the write.
If the console window is visible on screen then there can be significant slowdown because the window needs to be redrawn before the console output is considered flushed.
Solutions:
Wrap the Console stream with a StreamWriter and then use async methods:
var sw = new StreamWriter(Console.OpenStandardOutput());
await sw.WriteLineAsync("...");
You can also set a larger buffer if you need to use sync methods. The call will occasionally block when the buffer gets full and is flushed to the stream.
// set a buffer size
var sw = new StreamWriter(Console.OpenStandardOutput(), Encoding.UTF8, 8192);
// this write call will block when buffer is full
sw.Write("...")
If you want the fastest writes, though, you'll need to make your own buffer class that writes to memory and flushes to the console asynchronously in the background, using a single thread without locking. The new Channel<T> class in .NET Core 2.1 makes this simple and fast. Plenty of other questions show that code, but comment if you need tips; a rough sketch follows.
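(A sketch only, assuming .NET Core 2.1+ or the System.Threading.Channels NuGet package; the class name is made up.)

using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ChannelConsole
{
    // Unbounded in-memory queue; writers never block.
    private static readonly Channel<string> _channel = Channel.CreateUnbounded<string>();

    // A single background task drains the queue and owns the console.
    private static readonly Task _pump = Task.Run(async () =>
    {
        while (await _channel.Reader.WaitToReadAsync())
            while (_channel.Reader.TryRead(out var line))
                Console.WriteLine(line);
    });

    public static void WriteLine(string line) => _channel.Writer.TryWrite(line);

    public static Task CompleteAsync()
    {
        _channel.Writer.Complete();  // no more writes; let the pump finish draining
        return _pump;
    }
}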
A little old thread, and maybe not exactly what the OP is looking for, but I ran into the same question recently when processing audio data in real time.
I compared Console.WriteLine to Debug.WriteLine with the code below, using DebugView as a DOS-box alternative. It's only an executable (nothing to install) and can be customized in very neat ways (filters & colors!). It has no problems with tens of thousands of lines and manages memory quite well (I could not find any kind of leak, even after days of logging).
After doing some testing in different environments (e.g.: virtual machine, IDE, background processes running, etc) I made the following observations:
Debug is almost always faster
For small bursts of lines (<1000), it's about 10 times faster
For larger chunks it seems to converge to about 3x
If the Debug output goes to the IDE, Console is faster :-)
If DebugView is not running, Debug gets even faster
For really large amounts of consecutive output (>10000 lines), Debug gets slower and Console stays constant. I presume this is due to the memory that Debug has to allocate and Console does not.
Obviously, it makes a difference whether DebugView is actually "in view" or not, as the many GUI updates have a significant impact on the overall performance of the system, while Console simply hangs, whether visible or not. But it's hard to put numbers on that one...
I did not try multiple threads writing to the Console, as I think this should generally be avoided. I never had (performance) problems when writing to Debug from multiple threads.
If you compile with Release settings, usually all Debug statements are omitted and Trace should produce the same behaviour as Debug.
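In other words (a two-line illustration; Debug.WriteLine is marked [Conditional("DEBUG")] so calls vanish when DEBUG isn't defined, while default Release builds still define TRACE):

Debug.WriteLine("compiled away unless the DEBUG symbol is defined");
Trace.WriteLine("still present in a default Release build, which defines TRACE");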
I used VS2017 & .Net 4.6.1
Sorry for so much code, but I had to tweak it quite a lot to actually measure what I wanted. If you can spot any problems with the code (biases, etc.), please comment. I would love to get more precise data for real-life systems.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
namespace Console_vs_Debug {
class Program {
class Trial {
public string name;
public Action console;
public Action debug;
public List<float> consoleMeasuredTimes = new List<float>();
public List<float> debugMeasuredTimes = new List<float>();
}
static Stopwatch sw = new Stopwatch();
private static int repeatLoop = 1000;
private static int iterations = 2;
private static int dummy = 0;
static void Main(string[] args) {
if (args.Length == 2) {
repeatLoop = int.Parse(args[0]);
iterations = int.Parse(args[1]);
}
// do some dummy work
for (int i = 0; i < 100; i++) {
Console.WriteLine("-");
Debug.WriteLine("-");
}
for (int i = 0; i < iterations; i++) {
foreach(Trial trial in trials) {
Thread.Sleep(50);
sw.Restart();
for (int r = 0; r < repeatLoop; r++)
trial.console();
sw.Stop();
trial.consoleMeasuredTimes.Add(sw.ElapsedMilliseconds);
Thread.Sleep(1);
sw.Restart();
for (int r = 0; r < repeatLoop; r++)
trial.debug();
sw.Stop();
trial.debugMeasuredTimes.Add(sw.ElapsedMilliseconds);
}
}
Console.WriteLine("---\r\n");
foreach(Trial trial in trials) {
var consoleAverage = trial.consoleMeasuredTimes.Average();
var debugAverage = trial.debugMeasuredTimes.Average();
Console.WriteLine(trial.name);
Console.WriteLine($ " console: {consoleAverage,11:F4}");
Console.WriteLine($ " debug: {debugAverage,11:F4}");
Console.WriteLine($ "{consoleAverage / debugAverage,32:F2} (console/debug)");
Console.WriteLine();
}
Console.WriteLine("all measurements are in milliseconds");
Console.WriteLine("anykey");
Console.ReadKey();
}
private static List<Trial> trials = new List<Trial> {
new Trial {
name = "constant",
console = delegate {
Console.WriteLine("A static and constant string");
},
debug = delegate {
Debug.WriteLine("A static and constant string");
}
},
new Trial {
name = "dynamic",
console = delegate {
Console.WriteLine("A dynamically built string (number " + dummy++ + ")");
},
debug = delegate {
Debug.WriteLine("A dynamically built string (number " + dummy++ + ")");
}
},
new Trial {
name = "interpolated",
console = delegate {
Console.WriteLine($ "An interpolated string (number {dummy++,6})");
},
debug = delegate {
Debug.WriteLine($ "An interpolated string (number {dummy++,6})");
}
}
};
}
}
Just a little trick I use sometimes: if you remove focus from the console window by opening another window over it, and leave it that way until the run completes, the console won't redraw until you refocus it, speeding things up significantly. Just make sure you have the buffer set high enough that you can scroll back through all of the output.
Try using the System.Diagnostics Debug class? You can accomplish the same things as using Console.WriteLine.
You can view the available class methods here.