I have ~100 text files, 200 MB each, that I need to parse. The program below loads the files and processes them in parallel, creating either a thread per file or a process per file.
The problem: if I use threads, the program never reaches 100% CPU and takes much longer to complete.
THREAD PER FILE
total time: 430 sec
CPU usage: 15-20%
CPU frequency: 1.2 GHz
PROCESS PER FILE
total time: 100 sec
CPU usage: 100%
CPU frequency: 3.75 GHz
I'm using an E5-1650 v3 hexa-core with HT, so I process 12 files at a time.
How can I achieve 100% CPU utilization with threads?
The code below does not use the result of the processing, since that does not affect the problem.
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Text;
using System.Threading;
namespace libsvm2tsv
{
class Program
{
static void Main(string[] args)
{
var sw = Stopwatch.StartNew();
switch (args[0])
{
case "-t": LoadAll(args[1], LoadFile); break;
case "-p": LoadAll(args[1], RunChild); break;
case "-f": LoadFile(args[1]); return;
}
Console.WriteLine("ELAPSED: {0} sec.", sw.ElapsedMilliseconds / 1000);
Console.ReadLine();
}
static void LoadAll(string folder, Action<string> algorithm)
{
var sem = new SemaphoreSlim(12);
Directory.EnumerateFiles(folder).ToList().ForEach(f=> {
sem.Wait();
new Thread(() => { try { algorithm(f); } finally { sem.Release(); } }).Start();
});
}
static void RunChild(string file)
{
Process.Start(new ProcessStartInfo
{
FileName = Assembly.GetEntryAssembly().Location,
Arguments = "-f \"" + file + "\"",
UseShellExecute = false,
CreateNoWindow = true
})
.WaitForExit();
}
static void LoadFile(string inFile)
{
using (var ins = File.OpenText(inFile))
while (ins.Peek() >= 0)
ParseLine(ins.ReadLine());
}
static long[] ParseLine(string line)
{
return line
.Split()
.Skip(1)
.Select(r => (long)(double.Parse(r.Split(':')[1]) * 1000))
.Select(r => r < 0 ? -1 : r)
.ToArray();
}
}
}
Finally, I've found the bottleneck. I was using string.Split to parse numbers from every line of data, which produces billions of short strings, all allocated on the heap. Since all threads share a single heap, memory allocation is synchronized; since processes have separate heaps, no synchronization occurs and things run fast. That's the root of the issue. So I rewrote the parsing using IndexOf rather than Split, and threads started to perform even better than separate processes, just as I expected.
Since .NET has no built-in tool to parse a real number from a given position inside a string, I used this one, with small modifications: https://codereview.stackexchange.com/questions/75791/optimize-custom-double-parse
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Threading;
using System.Threading.Tasks;
namespace libsvm2tsv
{
class Program
{
static void Main(string[] args)
{
var sw = Stopwatch.StartNew();
switch (args[0])
{
case "-t": LoadAll(args[1], LoadFile); break;
case "-p": LoadAll(args[1], RunChild); break;
case "-f": LoadFile(args[1]); return;
}
Console.WriteLine("ELAPSED: {0} sec.", sw.ElapsedMilliseconds / 1000);
Console.ReadLine();
}
static void LoadAll(string folder, Action<string> algorithm)
{
Parallel.ForEach(
Directory.EnumerateFiles(folder),
new ParallelOptions { MaxDegreeOfParallelism = 12 },
f => algorithm(f));
}
static void RunChild(string file)
{
Process.Start(new ProcessStartInfo
{
FileName = Assembly.GetEntryAssembly().Location,
Arguments = "-f \"" + file + "\"",
UseShellExecute = false,
CreateNoWindow = true
})
.WaitForExit();
}
static void LoadFile(string inFile)
{
using (var ins = File.OpenText(inFile))
while (ins.Peek() >= 0)
ParseLine(ins.ReadLine());
}
static long[] ParseLine(string line)
{
// first, count number of items
var items = 1;
for (var i = 0; i < line.Length; i++)
if (line[i] == ' ') items++;
//allocate memory and parse items
var all = new long[items];
var n = 0;
var index = 0;
while (index < line.Length)
{
var next = line.IndexOf(' ', index);
if (next < 0) next = line.Length;
if (next > index)
{
var v = (long)(parseDouble(line, line.IndexOf(':', index) + 1, next - 1) * 1000);
if (v < 0) v = -1;
all[n++] = v;
}
index = next + 1;
}
return all;
}
private readonly static double[] pow10Cache;
static Program()
{
pow10Cache = new double[309];
double p = 1.0;
for (int i = 0; i < 309; i++)
{
pow10Cache[i] = p;
p /= 10;
}
}
static double parseDouble(string input, int from, int to)
{
long inputLength = to - from + 1;
long digitValue = long.MaxValue;
long output1 = 0;
long output2 = 0;
long sign = 1;
double multiBy = 0.0;
int k;
//integer part
for (k = 0; k < inputLength; ++k)
{
digitValue = input[k + from] - 48; // '0'
if (digitValue >= 0 && digitValue <= 9)
{
output1 = digitValue + (output1 * 10);
}
else if (k == 0 && digitValue == -3 /* '-' */)
{
sign = -1;
}
else if (digitValue == -2 /* '.' */ || digitValue == -4 /* ',' */)
{
break;
}
else
{
return double.NaN;
}
}
//decimal part
if (digitValue == -2 /* '.' */ || digitValue == -4 /* ',' */)
{
multiBy = pow10Cache[inputLength - (++k)];
for (; k < inputLength; ++k)
{
digitValue = input[k + from] - 48; // '0'
if (digitValue >= 0 && digitValue <= 9)
{
output2 = digitValue + (output2 * 10);
}
else
{
return Double.NaN;
}
}
multiBy *= output2;
}
return sign * (output1 + multiBy);
}
}
}
I have ~100 text files, 200 MB each, and I need to parse them.
The fastest way to read or write data from/to a spinning disk is sequentially in order to minimize the time the disk heads need to seek to find data or write it to the specified location. So doing parallel IO to a single disk is going to slow IO rates down - and depending on the actual IO pattern it can slow rates down dramatically. A disk that can handle 100 MB/sec sequentially might only be able to move 20 or 30 kilobytes per second doing parallel reads/writes of small blocks of data.
Were I optimizing such a process, I wouldn't worry about CPU utilization first; I'd optimize IO throughput first. You are IO-bound unless you're doing some really CPU-intensive parsing. Once your IO throughput is optimized, if you're getting 100% CPU utilization then you're CPU-bound. If your design scales nicely, you can then add CPUs and probably run faster.
To speed up your IO, you first need to minimize disk seeks, especially if you're using consumer-grade, cheap SATA drives. There are multiple ways to do this.
First, the easiest - eliminate the disk heads. Put your data on SSDs. Problem solved without having to write complex, bug-prone optimized code. How much time will it take for you to make this run faster using software? You have to design something, test it, tune it, debug it, and importantly, keep it running and running well. None of that is free. One important cost is the opportunity cost of spending time making things go faster - when you're doing that, you're not solving any other problems. Faster hardware has none of those costs. In this case, buy the SSDs, plug them in, and you're faster.
But if you really want to spend several weeks or longer optimizing your processing software, here's how I'd go about it:
Spread the data over multiple disks. You can't do parallel IO to physical disks quickly as the disk head seeks will kill performance. So do as much of the reading and writing to different disks as possible.
For each disk, have a single reader or writer thread or process that feeds data to a worker pool or writes data provided by that worker pool.
A tunable number of worker threads/processes to do the actual parsing.
That way, you can read the files and write the output data sequentially on each disk, without contention from other IO processes.
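A minimal sketch of that layout, assuming one input folder per physical disk; a BlockingCollection does the hand-off and ParseLine is a placeholder for whatever parsing you actually do:
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

static void LoadAllMultiDisk(string[] foldersOnSeparateDisks, int workerCount)
{
    // Bounded queue so fast readers cannot outrun the parsers unboundedly.
    var lines = new BlockingCollection<string>(boundedCapacity: 10000);

    // One reader per disk keeps each disk's IO strictly sequential.
    Task[] readers = foldersOnSeparateDisks
        .Select(folder => Task.Run(() =>
        {
            foreach (var file in Directory.EnumerateFiles(folder))
                foreach (var line in File.ReadLines(file))
                    lines.Add(line);
        }))
        .ToArray();

    // A tunable number of workers do the CPU-bound parsing.
    Task[] workers = Enumerable.Range(0, workerCount)
        .Select(_ => Task.Run(() =>
        {
            foreach (var line in lines.GetConsumingEnumerable())
                ParseLine(line); // placeholder for the actual parsing
        }))
        .ToArray();

    Task.WaitAll(readers);
    lines.CompleteAdding();
    Task.WaitAll(workers);
}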
I would consider replacing ForEach with Parallel.ForEach and removing your explicit use of Threads. Use https://stackoverflow.com/a/5512363/34092 to limit the number of threads used.
static void LoadAll(string folder, Action<string> algorithm)
{
Parallel.ForEach(Directory.EnumerateFiles(folder), algorithm);
}
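The linked answer boils down to passing a ParallelOptions with MaxDegreeOfParallelism; for example, to match the 12 concurrent files from the question:
static void LoadAll(string folder, Action<string> algorithm)
{
    Parallel.ForEach(
        Directory.EnumerateFiles(folder),
        new ParallelOptions { MaxDegreeOfParallelism = 12 }, // limit concurrency
        algorithm);
}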
As others have stated, IO will probably be the bottleneck in the end, and getting 100% CPU usage is really irrelevant. I feel they are missing something, though: you do get higher throughput with processes than with threads, which means IO is not the only bottleneck. The reason is that the CPU runs at a higher frequency with processes, and you want it to run at peak speed when it is not waiting for IO! So, how can you achieve that?
The easiest way is to set the power profile from the power options manually: edit the power options and set both the minimum and maximum processor state to 100%. That should do the job.
If you want to do it from your program, have a look at How to Disable Dynamic Frequency Scaling?. There is probably a similar API for .NET that avoids native code, but I couldn't find it.
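If shelling out is acceptable, one way to script the same power-plan change is via powercfg. A rough sketch; the alias names below are standard on recent Windows, but verify them with "powercfg /aliases", and note this may require elevation:
using System.Diagnostics;

// Pin minimum and maximum processor state to 100% on the active plan (AC power).
static void ForceMaxProcessorState()
{
    foreach (var setting in new[] { "PROCTHROTTLEMIN", "PROCTHROTTLEMAX" })
        Process.Start("powercfg",
            "/setacvalueindex scheme_current sub_processor " + setting + " 100")
            .WaitForExit();
    // Re-apply the current scheme so the new values take effect.
    Process.Start("powercfg", "/setactive scheme_current").WaitForExit();
}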
I'm facing a producer-consumer problem: I have a camera that sends images very quickly, and I have to save them to disk. The images arrive as ushort[], and the camera always overwrites the same ushort[] variable. So between one acquisition and the next I have to copy the array and, when possible, save it in order to free up the memory of that image. The important thing is not to lose any images from the camera, even if it means increasing the memory used: it is entirely acceptable for the consumer (saving images and freeing memory) to be slower than the producer; it is not acceptable, however, to fail to copy the image into memory in time.
I've written sample code that should simulate the problem:
immage_ushort: the image generated by the camera, which must be copied to the BlockingCollection before the next image arrives.
producerTask: has a cycle that simulates the arrival of an image every time_wait; within this time the producer should copy the image into the BlockingCollection.
consumerTask: works on the BlockingCollection, saving the images to disk and thus freeing the memory; it doesn't matter if the consumer works more slowly than the producer.
I set a time_wait of 1 millisecond to test performance (in reality the camera will not reach that speed). The timing is respected (with a maximum delay of 1-2 ms, which is acceptable) if there is no saving to disk in the code (commenting out image1.ImWrite(file_name)). With saving to disk enabled, however, I get delays that sometimes exceed 100 ms.
This is my code:
private void Execute_test_producer_consumer1()
{
//Images are stored as ushort array, so we create a BlockingCollection<ushort[]>
//to keep images when they arrive from camera
BlockingCollection<ushort[]> imglist = new BlockingCollection<ushort[]>();
string lod_date = "";
/*producerTask simulates a camera that returns an image every time_wait
milliseconds. The image is copied and inserted in the BlockingCollection
to be then saved on disk in the consumerTask*/
Task producerTask = Task.Factory.StartNew(() =>
{
//Number of images to process
int num_img = 3000;
//Time between one image and the next
long time_wait = 1;
//Time log variables
var watch1 = System.Diagnostics.Stopwatch.StartNew();
long watch_log = 0;
long delta_time = 0;
long timer1 = 0;
List<long> timer_delta_log = new List<long>();
List<long> timer_delta_log_time = new List<long>();
int ii = 0;
Console.WriteLine("-----START producer");
watch1.Restart();
//Here I expect every wait_time (or a little more) an image will be inserted
//into imglist
while (ii < num_img)
{
timer1 = watch1.ElapsedMilliseconds;
delta_time = timer1 - watch_log;
if (delta_time >= time_wait || ii == 0)
{
//Add image
imglist.Add((ushort[])immage_ushort.Clone());
//Inserting data for time log
timer_delta_log.Add(delta_time);
timer_delta_log_time.Add(timer1);
watch_log = timer1;
ii++;
}
}
imglist.CompleteAdding();
watch1.Stop();
lod_date = DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss.fff");
Console.WriteLine("-----END producer: " + lod_date);
// We only print images that are not inserted on schedule
int gg = 0;
foreach (long timer_delta_log_t in timer_delta_log)
{
if (timer_delta_log_t > time_wait)
{
Console.WriteLine("-- Image " + (gg + 1) + ", delta: "
+ timer_delta_log_t + ", time: " + timer_delta_log_time[gg]);
}
gg++;
}
});
Task consumerTask = Task.Factory.StartNew(() =>
{
string file_name = "";
int yy = 0;
// saving images and removing data
foreach (ushort[] imm in imglist.GetConsumingEnumerable())
{
file_name = @"output/" + yy + ".png";
Mat image1 = new Mat(row, col, MatType.CV_16UC1, imm);
//By commenting on this line, the timing of the producer is respected
image1.ImWrite(file_name);
image1.Dispose();
yy++;
}
imglist.Dispose();
lod_date = DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss.fff");
Console.WriteLine("-----END consumer: " + lod_date);
});
}
I also thought that the BlockingCollection could remain blocked for the entire duration of the foreach, and therefore of saving the image to disk. So I tried replacing the foreach with this:
while(!imglist.IsCompleted)
{
ushort[] elem = imglist.Take();
file_name = @"output/" + yy + ".png";
Mat image1 = new Mat(row, col, MatType.CV_16UC1, elem);
//By commenting on this line, the timing of the producer is respected
image1.ImWrite(file_name);
image1.Dispose();
yy++;
}
But the result doesn't change.
What am I doing wrong?
You might want to start your tasks with the "LongRunning" option:
LongRunning
Specifies that a task will be a long-running, coarse-grained operation involving fewer, larger components than fine-grained systems. It provides a hint to the TaskScheduler that oversubscription may be warranted. Oversubscription lets you create more threads than the available number of hardware threads. It also provides a hint to the task scheduler that an additional thread might be required for the task so that it does not block the forward progress of other threads or work items on the local thread-pool queue.
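Applied to the code in the question, that would look something like this (sketch):
// Hint to the scheduler that these are long-running loops deserving
// dedicated threads rather than thread-pool slots.
Task producerTask = Task.Factory.StartNew(() =>
{
    // ... producer loop from the question ...
}, TaskCreationOptions.LongRunning);

Task consumerTask = Task.Factory.StartNew(() =>
{
    // ... consumer loop from the question ...
}, TaskCreationOptions.LongRunning);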
I have encountered a situation where a simple .NET Fibonacci routine runs slower on a particular set of servers, and the only thing that is obviously different is the CPU.
AMD Opteron Processor 6276 - 11 secs
Intel Xeon CPU E7-4850 - 7 secs
The code is compiled for x86 on .NET Framework 4.0.
- Clock speeds of the two are similar, and in fact PassMark benchmarks give higher scores to the AMD.
- I have tried this on other AMD servers in the farm and the times are consistently slower.
- Even my local i7 machine runs the code faster.
Fibonacci code:
class Program
{
static void Main(string[] args)
{
const int ITERATIONS = 10000;
const int FIBONACCI = 100000;
var watch = new Stopwatch();
watch.Start();
DoFibonnacci(ITERATIONS, FIBONACCI);
watch.Stop();
Console.WriteLine("Total fibonacci time: {0}ms", watch.ElapsedMilliseconds);
Console.ReadLine();
}
private static void DoFibonnacci(int ITERATIONS, int FIBONACCI)
{
for (int i = 0; i < ITERATIONS; i++)
{
Fibonacci(FIBONACCI);
}
}
private static int Fibonacci(int x)
{
var previousValue = -1;
var currentResult = 1;
for (var i = 0; i <= x; ++i)
{
var sum = currentResult + previousValue;
previousValue = currentResult;
currentResult = sum;
}
return currentResult;
}
}
Any ideas on what may be going on?
As we've established in the comments, you can work around this performance hit by pinning the process to a specific processor on the AMD Opteron machines.
Kindled by this not-really-on-topic question, I decided to have a look at possible scenarios where single-core pinning would make such a difference (going from 11 to 7 seconds seems a bit extreme).
The most plausible answer is not that revolutionary:
The AMD Opteron series employs HyperTransport in a so-called NUMA architecture, instead of the traditional FSB you would find on Intel's SMP CPUs (the Xeon 4850 included).
My guess is that this symptom stems from the fact that individual nodes in a NUMA architecture have individual caches, as opposed to the Intel CPU, in which the processor cache is shared.
In other words, when consecutive computations shift between nodes on the Opteron, the cache is flushed, whereas balancing between processors in an SMP architecture like the Xeon 4850 has no such impact, since the cache is shared.
Setting affinity in .NET is pretty easy; just pick a processor (let's take the first one for simplicity):
static void Main(string[] args)
{
Console.WriteLine(Environment.ProcessorCount);
Console.Read();
//An AffinityMask of 0x0001 will make sure the process is always pinned to processor 0
Process thisProcess = Process.GetCurrentProcess();
thisProcess.ProcessorAffinity = (IntPtr)0x0001;
const int ITERATIONS = 10000;
const int FIBONACCI = 100000;
var watch = new Stopwatch();
watch.Start();
DoFibonnacci(ITERATIONS, FIBONACCI);
watch.Stop();
Console.WriteLine("Total fibonacci time: {0}ms", watch.ElapsedMilliseconds);
Console.ReadLine();
}
Although I'm pretty sure this is not very smart in a NUMA environment.
Windows 2008 R2 has some cool native NUMA functionality, and I found a promising CodePlex project with a .NET wrapper for this as well: http://multiproc.codeplex.com/
I'm in no way near qualified to teach you how to utilize this technology, but this should point you in the right direction.
Is it possible to freeze the CPU usage shown in Windows Task Manager? I wish to hold the load at specific values like 20%, 50%, 70%, etc. from my program.
(This is to analyse how much power the PC consumes at different levels of CPU usage.)
Is this possible?
My first naive attempt would be to spawn twice as many threads as cores -- each thread at the highest priority -- and then, within each thread, run a busy loop that does some work. (More threads than cores is to "steal" all the time I can get from other threads in Windows :-)
Using some kind of API to read the CPU load (perhaps WMI or performance counters?), I would then make each thread 'yield' from the busy loop (sleep for a certain amount of time each loop) until I get the approximate load in the feedback cycle.
This cycle would be self-adjusting: too high a load, sleep more; too low a load, sleep less. It's not an exact science, but I think a stable load can be obtained with some tweaking.
But, I have no idea, really :-)
Happy coding.
Also, consider power management -- sometimes it can lock a CPU at a "max %". Then fully load the CPU and it will max out at that limit. (Windows 7, at least, has a built-in feature to do this, depending upon the CPU and chipset -- and there are likely many 3rd-party tools.)
The situation becomes rather confusing with newer CPUs that clock dynamically based on load, temperature, etc.
Here is my attempt at the "naive" approach for .NET 3.5. Make sure to include the System.Management reference.
The CPU utilization as reported by the Task Manager hovers within a few percent of the target -- average seems pretty darn close -- on my system. YMMV, but there is some flexibility for adjustment.
Happy coding (again).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Management;
using System.Threading;
using System.Diagnostics;
namespace CPULoad
{
class Program
{
// What to try to get :-)
static int TargetCpuUtilization = 50;
// An average window too large results in bad harmonics -- keep it small.
static int AverageWindow = 5;
// A somewhat large number gets better results here.
static int ThreadsPerCore = 8;
// WMI is *very slow* compared to a PerformanceCounter.
// It still works, but each cycle is *much* longer and it doesn't
// exhibit as good of characteristics in maintaining a stable load.
// (It also seems to run a few % higher).
static bool UseWMI = false;
// Not sure if this helps -- but just play about :-)
static bool UseQuestionableAverage = true;
static int CoreCount () {
var sys = new ManagementObject("Win32_ComputerSystem.Name=\"" + Environment.MachineName + "\"");
return int.Parse("" + sys["NumberOfLogicalProcessors"]);
}
static Func<int> GetWmiSampler () {
var searcher = new ManagementObjectSearcher(
#"root\CIMV2",
"SELECT PercentProcessorTime FROM Win32_PerfFormattedData_PerfOS_Processor");
return () => {
var allCores = searcher.Get().OfType<ManagementObject>().First();
return int.Parse("" + allCores["PercentProcessorTime"]);
};
}
static Func<int> GetCounterSampler () {
var cpuCounter = new PerformanceCounter {
CategoryName = "Processor",
CounterName = "% Processor Time",
InstanceName = "_Total",
};
return () => {
return (int)cpuCounter.NextValue();
};
}
static Func<LinkedList<int>, int> StandardAverage () {
return (samples) => {
return (int)samples.Average();
};
}
// Bias towards newest samples
static Func<LinkedList<int>, int> QuestionableAverage () {
return (samples) => {
var weight = 4.0;
var sum = 0.0;
var max = 0.0;
foreach (var sample in samples) {
sum += sample * weight;
max += weight;
weight = Math.Min(4, Math.Max(1, weight * 0.8));
}
return (int)(sum / max);
};
}
static void Main (string[] args) {
var threadCount = CoreCount() * ThreadsPerCore;
var threads = new List<Thread>();
for (var i = 0; i < threadCount; i++) {
Console.WriteLine("Starting thread #" + i);
var thread = new Thread(() => {
Loader(
UseWMI ? GetWmiSampler() : GetCounterSampler(),
UseQuestionableAverage ? QuestionableAverage() : StandardAverage());
});
thread.IsBackground = true;
thread.Priority = ThreadPriority.Highest;
thread.Start();
threads.Add(thread);
}
Console.ReadKey();
Console.WriteLine("Fin!");
}
static void Loader (Func<int> nextSample, Func<LinkedList<int>, int> average) {
Random r = new Random();
long cycleCount = 0;
int cycleLength = 10;
int sleepDuration = 15;
int temp = 0;
var samples = new LinkedList<int>(new[] { 50 });
long totalSample = 0;
while (true) {
cycleCount++;
var busyLoops = cycleLength * 1000;
for (int i = 0; i < busyLoops; i++) {
// Do some work
temp = (int)(temp * Math.PI);
}
// Take a break
Thread.Sleep(sleepDuration);
{
// Add new sample
// This seems to work best when *after* the sleep/yield
var sample = nextSample();
if (samples.Count >= AverageWindow) {
samples.RemoveLast();
}
samples.AddFirst(sample);
totalSample += sample;
}
var avg = average(samples);
// should converge to 0
var conv = Math.Abs(TargetCpuUtilization - (int)(totalSample / cycleCount));
Console.WriteLine(string.Format("avg:{0:d2} conv:{1:d2} sleep:{2:d2} cycle-length:{3}",
avg, conv, sleepDuration, cycleLength));
// Manipulating both the sleep duration and work duration seems
// to have the best effect. We don't change both at the same
// time as that skews one with the other.
// Favor the cycle-length adjustment.
if (r.NextDouble() < 0.05) {
sleepDuration += (avg < TargetCpuUtilization) ? -1 : 1;
// Don't let sleep duration get unbounded upwards or it
// can cause badly-oscillating behavior.
sleepDuration = (int)Math.Min(24, Math.Max(0, sleepDuration));
} else {
cycleLength += (avg < TargetCpuUtilization) ? 1 : -1;
cycleLength = (int)Math.Max(5, cycleLength);
}
}
}
}
}
While Windows is a preemptive operating system, code that runs in kernel mode -- such as drivers -- is preempted far less. While not doable in C# AFAIK, this should yield a method of stricter load control than the above, but it also brings a good bit more complexity (and the ability to crash the entire system :-)
There is also Process.PriorityClass, but setting it to anything but normal yielded less consistent behavior for me.
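For completeness, setting it looks like this (BelowNormal is just one example value):
using System.Diagnostics;

// Lower (or raise) the scheduling priority of the whole process.
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.BelowNormal;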
I don't know if you can do that, but you can change the priority of the executing thread via its Priority property. You would set it like this:
Thread.CurrentThread.Priority = ThreadPriority.Lowest;
Also, I don't think you really want to cap it. If the machine is otherwise idle, you'd like it to get on with the task, right? ThreadPriority helps communicate this to the scheduler.
Reference: How to restrict the CPU usage a C# program takes?
I run through millions of records and sometimes I have to debug using Console.WriteLine to see what is going on.
However, Console.WriteLine is very slow, considerably slower than writing to a file.
BUT it is very convenient - does anyone know of a way to speed it up?
If it is just for debugging purposes you should use Debug.WriteLine instead. This will most likely be a bit faster than using Console.WriteLine.
Example
Debug.WriteLine("There was an error processing the data.");
You can use the OutputDebugString API function to send a string to the debugger. It doesn't wait for anything to redraw and this is probably the fastest thing you can get without digging into the low-level stuff too much.
The text you give to this function will go into Visual Studio Output window.
[DllImport("kernel32.dll")]
static extern void OutputDebugString(string lpOutputString);
Then you just call OutputDebugString("Hello world!");
Do something like this:
public static class QueuedConsole
{
private static StringBuilder _sb = new StringBuilder();
private static int _lineCount;
public static void WriteLine(string message)
{
_sb.AppendLine(message);
++_lineCount;
if (_lineCount >= 10)
WriteAll();
}
public static void WriteAll()
{
Console.WriteLine(_sb.ToString());
_lineCount = 0;
_sb.Clear();
}
}
QueuedConsole.WriteLine("This message will not be written directly, but with nine other entries to increase performance.");
//after your operations, end with write all to get the last lines.
QueuedConsole.WriteAll();
Here is another example: Does Console.WriteLine block?
I recently did a benchmark battery for this on .NET 4.8. The tests included many of the proposals mentioned on this page, including Async and blocking variants of both BCL and custom code, and then most of those both with and without dedicated threading, and finally scaled across power-of-2 buffer sizes.
The fastest method, now used in my own projects, buffers 64K of wide (Unicode) characters at a time from .NET directly to the Win32 function WriteConsoleW without copying or even hard-pinning. Remainders larger than 64K, after filling and flushing one buffer, are also sent directly, and in-situ as well. The approach deliberately bypasses the Stream/TextWriter paradigm so it can (obviously enough) provide .NET text that is already Unicode to a (native) Unicode API without all the superfluous memory copying/shuffling and byte[] array allocations required for first "decoding" to a byte stream.
If there is interest (perhaps because the buffering logic is slightly intricate), I can provide the source for the above; it's only about 80 lines. However, my tests determined that there's a simpler way to get nearly the same performance, and since it doesn't require any Win32 calls, I'll show this latter technique instead.
The following is way faster than Console.Write:
public static class FastConsole
{
static readonly BufferedStream str;
static FastConsole()
{
Console.OutputEncoding = Encoding.Unicode; // crucial
// avoid special "ShadowBuffer" for hard-coded size 0x14000 in 'BufferedStream'
str = new BufferedStream(Console.OpenStandardOutput(), 0x15000);
}
public static void WriteLine(String s) => Write(s + "\r\n");
public static void Write(String s)
{
// avoid endless 'GetByteCount' dithering in 'Encoding.Unicode.GetBytes(s)'
var rgb = new byte[s.Length << 1];
Encoding.Unicode.GetBytes(s, 0, s.Length, rgb, 0);
lock (str) // (optional, can omit if appropriate)
str.Write(rgb, 0, rgb.Length);
}
public static void Flush() { lock (str) str.Flush(); }
};
Note that this is a buffered writer, so you must call Flush() when you have no more text to write.
I should also mention that, as shown, this code technically assumes 16-bit Unicode (UCS-2, as opposed to UTF-16) and thus won't properly handle 4-byte surrogate pairs for characters beyond the Basic Multilingual Plane. The point hardly seems important given the more extreme limitations on console text display in general, but it could perhaps still matter for piping/redirection.
Usage:
FastConsole.WriteLine("hello world.");
// etc...
FastConsole.Flush();
On my machine, this gets about 77,000 lines/second (mixed-length) versus only 5,200 lines/sec under identical conditions for normal Console.WriteLine. That's a factor of almost 15x speedup.
These are controlled comparison results only; note that absolute measurements of console output performance are highly variable, depending on the console window settings and runtime conditions, including size, layout, fonts, DWM clipping, etc.
Why Console is slow:
Console output is actually an IO stream that's managed by your operating system. Most IO classes (like FileStream) have async methods but the Console class was never updated so it always blocks the thread when writing.
Console.WriteLine is backed by SyncTextWriter which uses a global lock to prevent multiple threads from writing partial lines. This is a major bottleneck that forces all threads to wait for each other to finish the write.
If the console window is visible on screen then there can be significant slowdown because the window needs to be redrawn before the console output is considered flushed.
Solutions:
Wrap the Console stream with a StreamWriter and then use async methods:
var sw = new StreamWriter(Console.OpenStandardOutput());
await sw.WriteLineAsync("...");
You can also set a larger buffer if you need to use sync methods. The call will occasionally block when the buffer gets full and is flushed to the stream.
// set a buffer size
var sw = new StreamWriter(Console.OpenStandardOutput(), Encoding.UTF8, 8192);
// this write call will block when buffer is full
sw.Write("...");
If you want the fastest writes though, you'll need to make your own buffer class that writes to memory and flushes to the console asynchronously in the background using a single thread without locking. The new Channel<T> class in .NET Core 2.1 makes this simple and fast. Plenty of other questions showing that code but comment if you need tips.
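A minimal sketch of that idea; the class name and the unbounded buffering policy are my own choices, not prescribed by anything:
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ChannelConsole
{
    // Unbounded: producers never block; a single background reader owns the console.
    private static readonly Channel<string> _channel =
        Channel.CreateUnbounded<string>(new UnboundedChannelOptions { SingleReader = true });

    static ChannelConsole()
    {
        Task.Run(async () =>
        {
            var reader = _channel.Reader;
            // Drain the channel and do the actual (slow) console IO on one thread.
            while (await reader.WaitToReadAsync())
                while (reader.TryRead(out var line))
                    Console.WriteLine(line);
        });
    }

    public static void WriteLine(string message) => _channel.Writer.TryWrite(message);
}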
A little old thread and maybe not exactly what the OP is looking for, but I ran into the same question recently, when processing audio data in real time.
I compared Console.WriteLine to Debug.WriteLine with this code and used DebugView as a dos box alternative. It's only an executable (nothing to install) and can be customized in very neat ways (filters & colors!). It has no problems with tens of thousands of lines and manages the memory quite well (I could not find any kind of leak, even after days of logging).
After doing some testing in different environments (e.g.: virtual machine, IDE, background processes running, etc) I made the following observations:
Debug is almost always faster
For small bursts of lines (<1000), it's about 10 times faster
For larger chunks it seems to converge to about 3x
If the Debug output goes to the IDE, Console is faster :-)
If DebugView is not running, Debug gets even faster
For really large amounts of consecutive output (>10000 lines), Debug gets slower and Console stays constant. I presume this is due to the memory that Debug has to allocate and Console does not.
Obviously, it makes a difference whether DebugView is actually "in view" or not, as the many GUI updates have a significant impact on the overall performance of the system, while Console simply hangs, visible or not. But it's hard to put numbers on that one...
I did not try multiple threads writing to the Console, as I think that should generally be avoided. I never had (performance) problems when writing to Debug from multiple threads.
If you compile with Release settings, usually all Debug statements are omitted and Trace should produce the same behaviour as Debug.
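For instance:
// Debug.WriteLine is marked [Conditional("DEBUG")] and is compiled away in a
// default Release build; Trace.WriteLine is [Conditional("TRACE")], which is
// defined in both configurations by default.
Debug.WriteLine("only in Debug builds");
Trace.WriteLine("also in Release builds");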
I used VS2017 & .Net 4.6.1
Sorry for so much code, but I had to tweak it quite a lot to actually measure what I wanted to. If you can spot any problems with the code (biases, etc.), please comment. I would love to get more precise data for real life systems.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
namespace Console_vs_Debug {
class Program {
class Trial {
public string name;
public Action console;
public Action debug;
public List<float> consoleMeasuredTimes = new List<float>();
public List<float> debugMeasuredTimes = new List<float>();
}
static Stopwatch sw = new Stopwatch();
private static int repeatLoop = 1000;
private static int iterations = 2;
private static int dummy = 0;
static void Main(string[] args) {
if (args.Length == 2) {
repeatLoop = int.Parse(args[0]);
iterations = int.Parse(args[1]);
}
// do some dummy work
for (int i = 0; i < 100; i++) {
Console.WriteLine("-");
Debug.WriteLine("-");
}
for (int i = 0; i < iterations; i++) {
foreach(Trial trial in trials) {
Thread.Sleep(50);
sw.Restart();
for (int r = 0; r < repeatLoop; r++)
trial.console();
sw.Stop();
trial.consoleMeasuredTimes.Add(sw.ElapsedMilliseconds);
Thread.Sleep(1);
sw.Restart();
for (int r = 0; r < repeatLoop; r++)
trial.debug();
sw.Stop();
trial.debugMeasuredTimes.Add(sw.ElapsedMilliseconds);
}
}
Console.WriteLine("---\r\n");
foreach(Trial trial in trials) {
var consoleAverage = trial.consoleMeasuredTimes.Average();
var debugAverage = trial.debugMeasuredTimes.Average();
Console.WriteLine(trial.name);
Console.WriteLine($" console: {consoleAverage,11:F4}");
Console.WriteLine($" debug: {debugAverage,11:F4}");
Console.WriteLine($"{consoleAverage / debugAverage,32:F2} (console/debug)");
Console.WriteLine();
}
Console.WriteLine("all measurements are in milliseconds");
Console.WriteLine("anykey");
Console.ReadKey();
}
private static List<Trial> trials = new List<Trial> {
new Trial {
name = "constant",
console = delegate {
Console.WriteLine("A static and constant string");
},
debug = delegate {
Debug.WriteLine("A static and constant string");
}
},
new Trial {
name = "dynamic",
console = delegate {
Console.WriteLine("A dynamically built string (number " + dummy++ + ")");
},
debug = delegate {
Debug.WriteLine("A dynamically built string (number " + dummy++ + ")");
}
},
new Trial {
name = "interpolated",
console = delegate {
Console.WriteLine($"An interpolated string (number {dummy++,6})");
},
debug = delegate {
Debug.WriteLine($"An interpolated string (number {dummy++,6})");
}
}
};
}
}
Just a little trick I use sometimes: If you remove focus from the Console window by opening another window over it, and leave it until it completes, it won't redraw the window until you refocus, speeding it up significantly. Just make sure you have the buffer set up high enough that you can scroll back through all of the output.
Try using the System.Diagnostics Debug class? You can accomplish the same things as using Console.WriteLine.
You can view the available class methods here.
As a feature in the application which I'm developing, I need to show the total estimated time left to upload/download a file to/from the server.
How would it be possible to get the download/upload speed between the client machine and the server?
I think that if I can get the speed, then I can calculate the time like this:
for example, for a 200 MB file: 200 × 1024 KB = 204,800 KB, and
204,800 KB / speed in KB/s = "x" seconds.
The upload/download speed is not a static property of a server; it depends on your specific connection and may also vary over time. Most applications I've seen estimate it over a short time window: they start downloading/uploading and measure the amount of data moved over, let's say, 10 seconds. This is then taken as the current transfer speed and used to calculate the remaining time (e.g. 2500 KB / 10 s -> 250 KB/s). The window is then moved along and the figure recalculated continuously to keep the estimate close to the current speed.
Although this is a quite basic approach, it will serve well in most cases.
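A minimal sketch of that moving-window estimator (the 10-second window and all names are arbitrary choices):
using System;
using System.Collections.Generic;
using System.Linq;

class SpeedEstimator
{
    private readonly Queue<(DateTime time, long bytes)> _samples = new Queue<(DateTime, long)>();
    private readonly TimeSpan _window = TimeSpan.FromSeconds(10);

    // Call this periodically with the cumulative byte count transferred so far.
    public void Report(long totalBytesSoFar)
    {
        _samples.Enqueue((DateTime.UtcNow, totalBytesSoFar));
        // Drop samples older than the window, but keep at least two.
        while (_samples.Count > 2 && _samples.Peek().time < DateTime.UtcNow - _window)
            _samples.Dequeue();
    }

    // Current speed in bytes per second, or null until enough data exists.
    public double? BytesPerSecond()
    {
        if (_samples.Count < 2) return null;
        var first = _samples.Peek();
        var last = _samples.Last();
        double seconds = (last.time - first.time).TotalSeconds;
        return seconds > 0 ? (last.bytes - first.bytes) / seconds : (double?)null;
    }
}
The remaining time is then simply (totalBytes - transferredBytes) / BytesPerSecond().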
Try something like this:
int chunkSize = 1024;
long sent = 0;
long total = reader.Length;
DateTime started = DateTime.Now;
while (reader.Position < reader.Length)
{
byte[] buffer = new byte[
Math.Min(chunkSize, reader.Length - reader.Position)];
int readBytes = reader.Read(buffer, 0, buffer.Length);
// send data packet
sent += readBytes;
TimeSpan elapsedTime = DateTime.Now - started;
TimeSpan estimatedTime =
TimeSpan.FromSeconds(
(total - sent) /
((double)sent / elapsedTime.TotalSeconds));
}
This is only tangentially related, but I assume if you're trying to calculate total time remaining, you're probably also going to be showing it as some kind of progress bar. If so, you should read this paper by Chris Harrison about perceptual differences. Here's the conclusion straight from his paper (emphasis mine).
Different progress bar behaviors appear to have a significant effect on user perception of process duration. By minimizing negative behaviors and incorporating positive behaviors, one can effectively make progress bars and their associated processes appear faster. Additionally, if elements of a multistage operation can be rearranged, it may be possible to reorder the stages in a more pleasing and seemingly faster sequence.
http://www.chrisharrison.net/projects/progressbars/ProgBarHarrison.pdf
I don't know why you need this, but I would go the simplest way possible and ask the user what connection type he has. Then take the file size, divide it by the speed, and then by 8, to get the number of seconds.
The point is that you won't need processing power to calculate speeds. Microsoft, on their website, uses a function that estimates the transfer time from the file size for the most common connection types; you can get the file size while uploading, or let the user enter the speed manually.
Again, maybe you have other needs and must calculate the upload on the fly...
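One reading of that arithmetic in code, assuming the user-supplied speed is in bits per second (as connections are usually advertised):
// size in bytes, speed in bits per second (e.g. 10 Mbit/s = 10000000)
static double EstimatedSeconds(long fileSizeBytes, long speedBitsPerSecond)
{
    // Dividing the speed by 8 converts it to bytes per second.
    return fileSizeBytes / (speedBitsPerSecond / 8.0);
}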
The following code computes the remaining time in minutes.
long totalReceived = 0;
DateTime lastProgressChange = DateTime.Now;
Stack<int> timeStack = new Stack<int>(5);
Stack<long> byteStack = new Stack<long>(5);
using (WebClient c = new WebClient())
{
    c.DownloadProgressChanged += delegate(object s, DownloadProgressChangedEventArgs args)
    {
        long bytes;
        if (totalReceived == 0)
        {
            totalReceived = args.BytesReceived;
            bytes = args.BytesReceived;
        }
        else
        {
            bytes = args.BytesReceived - totalReceived;
        }
        timeStack.Push(DateTime.Now.Subtract(lastProgressChange).Seconds);
        byteStack.Push(bytes);
        double r = timeStack.Average() * ((args.TotalBytesToReceive - args.BytesReceived) / byteStack.Average());
        this.textBox1.Text = (r / 60).ToString();
        totalReceived = args.BytesReceived;
        lastProgressChange = DateTime.Now;
    };
    c.DownloadFileAsync(new Uri("http://www.visualsvn.com/files/VisualSVN-1.7.6.msi"), @"C:\SVN.msi");
}
I think I've got the estimated time to download:
double timeToDownload = ((((totalFileSize/1024)-((fileStream.Length)/1024)) / Math.Round(currentSpeed, 2))/60);
this.Invoke(new UpdateProgessCallback(this.UpdateProgress), new object[] {
Math.Round(currentSpeed, 2), Math.Round(timeToDownload,2) });
where
private void UpdateProgress(double currentSpeed, double timeToDownload)
{
lblTimeUpdate.Text = string.Empty;
lblTimeUpdate.Text = " At Speed of " + currentSpeed + " it takes " + timeToDownload +" minute to complete download";
}
and current speed is calculated like
TimeSpan dElapsed = DateTime.Now - dStart;
if (dElapsed.Seconds > 0)
{
    currentSpeed = (fileStream.Length / 1024) / dElapsed.Seconds;
}
}