Referencing Stream.Position dramatically increases execution time - C#

Any idea why referencing the Position property of a stream drastically increases I/O time?
The execution time of:
sw.Restart();
fs = new FileStream("tmp", FileMode.Open);
var br = new BinaryReader(fs);
for (int i = 0; i < n; i++)
{
    fs.Position += 0; // should be a no-op
    a[i] = br.ReadDouble();
}
Debug.Write("\n");
Debug.Write(sw.ElapsedMilliseconds.ToString());
Debug.Write("\n");
fs.Close();
sw.Stop();
Debug.Write(a[0].ToString() + "\n");
Debug.Write(a[n - 1].ToString() + "\n");
is ~100 times slower than the equivalent loop without the fs.Position += 0; line.
Normally the point of using Seek (or manipulating the Position property) is to speed things up when you don't need all the data in the file.
But if, for example, you only need every second value in the file, it is apparently much faster to read the entire file and discard the data you don't need than to skip every second value by moving Stream.Position.
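For reference, here is a minimal sketch of the two approaches being compared, assuming a file of contiguous doubles; the class and method names are illustrative and not from the original post.
using System.IO;

static class EverySecondValue
{
    // Skip every second double by seeking past it.
    public static double[] BySeeking(string path, int count)
    {
        var result = new double[count];
        using (var fs = new FileStream(path, FileMode.Open))
        using (var br = new BinaryReader(fs))
        {
            for (int i = 0; i < count; i++)
            {
                result[i] = br.ReadDouble();
                fs.Seek(sizeof(double), SeekOrigin.Current); // skip the unwanted value
            }
        }
        return result;
    }

    // Read everything and simply discard the values that are not needed.
    public static double[] ByDiscarding(string path, int count)
    {
        var result = new double[count];
        using (var fs = new FileStream(path, FileMode.Open))
        using (var br = new BinaryReader(fs))
        {
            for (int i = 0; i < count; i++)
            {
                result[i] = br.ReadDouble();
                br.ReadDouble(); // read and throw away the unwanted value
            }
        }
        return result;
    }
}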

You're doing two things:
Fetching the position
Setting it
It's entirely possible that each of those performs a direct interaction with the underlying Win32 APIs, whereas normally you can read a fair amount of data without having to interop with native code at all, due to buffering.
I'm slightly surprised at the extent to which it's worse, but I'm not surprised that it is worse. I think it would be worth doing separate tests to find out which has more effect - the read or the write. So write similar code which either just reads or just writes the position. That shouldn't change the code you write later, but it may satisfy your curiosity a bit further.
You might also try
fs.Seek(0, SeekOrigin.Current);
... which is more likely to be ignored as a genuine no-op. But even so, skipping a single byte using fs.Seek(1, SeekOrigin.Current) may well be expensive again.
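A minimal sketch of those two separate tests, assuming the same "tmp" file of doubles and the n from the question; the setter-only variant tracks the position locally so the getter is never touched.
var a = new double[n];

using (var fs = new FileStream("tmp", FileMode.Open))
using (var br = new BinaryReader(fs))
{
    var sw = Stopwatch.StartNew();

    // Getter-only variant: fetch Position every iteration but never set it.
    for (int i = 0; i < n; i++)
    {
        long p = fs.Position;
        a[i] = br.ReadDouble();
    }

    sw.Stop();
    Debug.WriteLine("get only: " + sw.ElapsedMilliseconds + " ms");
}

using (var fs = new FileStream("tmp", FileMode.Open))
using (var br = new BinaryReader(fs))
{
    var sw = Stopwatch.StartNew();

    // Setter-only variant: assign Position from a locally tracked value,
    // so only the setter is timed.
    long tracked = 0;
    for (int i = 0; i < n; i++)
    {
        fs.Position = tracked;   // logically a no-op: the stream is already there
        a[i] = br.ReadDouble();
        tracked += sizeof(double);
    }

    sw.Stop();
    Debug.WriteLine("set only: " + sw.ElapsedMilliseconds + " ms");
}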

From Reflector (the decompiled FileStream.Position property):
public override long Position {
    [SecuritySafeCritical]
    get {
        if (this._handle.IsClosed) {
            __Error.FileNotOpen();
        }
        if (!this.CanSeek) {
            __Error.SeekNotSupported();
        }
        if (this._exposedHandle) {
            this.VerifyOSHandlePosition();
        }
        return this._pos + (long)(this._readPos - this._readLen + this._writePos);
    }
    set {
        if (value < 0L) {
            throw new ArgumentOutOfRangeException("value", Environment.GetResourceString("ArgumentOutOfRange_NeedNonNegNum"));
        }
        if (this._writePos > 0) {
            this.FlushWrite(false);
        }
        this._readPos = 0;
        this._readLen = 0;
        this.Seek(value, SeekOrigin.Begin);
    }
}
It's fairly self-explanatory: every set flushes any pending writes, discards the read buffer, and seeks, and there is no check for whether you're setting the position to the value it already has. If you're not happy with FileStream, write your own stream proxy that handles position updates more gracefully :)
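As a rough illustration of that suggestion, here is a minimal sketch of a pass-through proxy that skips redundant position writes; the class name is made up, and it is not a complete, production-ready Stream implementation.
using System;
using System.IO;

public sealed class PositionGuardStream : Stream
{
    private readonly Stream _inner;

    public PositionGuardStream(Stream inner)
    {
        _inner = inner;
    }

    public override bool CanRead { get { return _inner.CanRead; } }
    public override bool CanSeek { get { return _inner.CanSeek; } }
    public override bool CanWrite { get { return _inner.CanWrite; } }
    public override long Length { get { return _inner.Length; } }

    public override long Position
    {
        get { return _inner.Position; }
        set
        {
            // Skip the flush/seek entirely when the position would not actually change.
            if (value != _inner.Position)
                _inner.Position = value;
        }
    }

    public override void Flush() { _inner.Flush(); }
    public override int Read(byte[] buffer, int offset, int count) { return _inner.Read(buffer, offset, count); }
    public override long Seek(long offset, SeekOrigin origin) { return _inner.Seek(offset, origin); }
    public override void SetLength(long value) { _inner.SetLength(value); }
    public override void Write(byte[] buffer, int offset, int count) { _inner.Write(buffer, offset, count); }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
Wrapping the file stream (new BinaryReader(new PositionGuardStream(fs))) means a redundant Position assignment no longer throws away FileStream's internal read buffer, while a genuine reposition still works as before.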

Related

Is Marshal.Copy too processor-intensive in this situation?

I am working on a realtime simulation model. The models are written in unmanaged code, but they are controlled by managed C# code called the ExecutiveManager. An ExecutiveManager runs multiple models at a time and controls their timing (for example, if a model has a "framerate" of 20 frames per second, the executive tells the model when to start its next frame).
We are seeing a consistently high CPU load when running the simulation; it can reach 100% and stay there on a machine that should be entirely adequate. I used a profiler to determine where the issues are, and it pointed me to two methods: WriteMemoryRegion and ReadMemoryRegion. The ExecutiveManager calls these methods to read and write the models' shared memory regions. Both read and write call Marshal.Copy, and my gut tells me that's where the issue is, but I don't want to trust my gut! We are going to do further testing to narrow things down, but I wanted a quick sanity check on Marshal.Copy. WriteMemoryRegion and ReadMemoryRegion are called each frame for each model the ExecutiveManager runs, and each model typically has 6 shared regions. So for 10 models, each with 6 regions, running at 20 frames per second and calling both WriteMemoryRegion and ReadMemoryRegion, that's 2,400 calls to Marshal.Copy per second. Is this unreasonable, or could my problem lie elsewhere?
public async Task ReadMemoryRegion(MemoryRegionDefinition g) {
    if (!cache.ContainsKey(g.Name)) {
        cache.Add(g.Name, mmff.CreateOrOpen(g.Name, g.Size));
    }
    var mmf = cache[g.Name];
    using (var stream = mmf.CreateViewStream())
    using (var reader = brf.Create(stream)) {
        var buffer = reader.ReadBytes(g.Size);
        await WriteIcBuffer(g, buffer).ConfigureAwait(false);
    }
}

private Task WriteIcBuffer(MemoryRegionDefinition g, byte[] buffer) {
    Marshal.Copy(buffer, 0, new IntPtr(g.BaseAddress), buffer.Length);
    return Task.FromResult(0);
}

public async Task WriteMemoryRegion(MemoryRegionDefinition g) {
    if (!cache.ContainsKey(g.Name)) {
        if (g.Size > 0) {
            cache.Add(g.Name, mmff.CreateOrOpen(g.Name, g.Size));
        } else if (g.Size == 0) {
            throw new EmptyGlobalException(
                $"Global {g.Name} not created as it does not contain any variables.");
        } else {
            throw new NegativeSizeGlobalException(
                $"Global {g.Name} not created as it has a negative size.");
        }
    }
    var mmf = cache[g.Name];
    using (var stream = mmf.CreateViewStream())
    using (var writer = bwf.Create(stream)) {
        var buffer = await ReadIcBuffer(g);
        writer.Write(buffer);
    }
}

private Task<byte[]> ReadIcBuffer(MemoryRegionDefinition g) {
    var buffer = new byte[g.Size];
    Marshal.Copy(new IntPtr(g.BaseAddress), buffer, 0, g.Size);
    return Task.FromResult(buffer);
}
I need to come up with a solution so that my processor isn't catching on fire. I'm very green in this area so all ideas are welcome. Again, I'm not sure Marshal.Copy is the issue, but it seems possible. Please let me know if you see other issues that could contribute to the processor problem.
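As a quick sanity check (not from the original thread), here is a minimal sketch that times Marshal.Copy by itself at roughly the call rate described above; the 64 KB region size is an assumption, since the real region sizes are not given.
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

class MarshalCopySanityCheck
{
    static void Main()
    {
        const int regionSize = 64 * 1024;   // assumed size of one shared region
        const int copiesPerSecond = 2400;   // 10 models * 6 regions * 20 fps * 2 directions
        var managed = new byte[regionSize];
        IntPtr unmanaged = Marshal.AllocHGlobal(regionSize);
        try
        {
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < copiesPerSecond / 2; i++)
            {
                Marshal.Copy(managed, 0, unmanaged, regionSize); // managed -> unmanaged (write direction)
                Marshal.Copy(unmanaged, managed, 0, regionSize); // unmanaged -> managed (read direction)
            }
            sw.Stop();
            Console.WriteLine("One second's worth of copies took {0} ms", sw.ElapsedMilliseconds);
        }
        finally
        {
            Marshal.FreeHGlobal(unmanaged);
        }
    }
}
If this finishes in a small fraction of a second, Marshal.Copy on its own is unlikely to account for a CPU pinned at 100%.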

My static method seems to perform faster with reuse. Why? Does it become cached?

I created code to see how long my static compression method takes to execute, and I noticed that the first execution takes 8,500,000 nanoseconds, the second about half that, and then everything after that executes in 0 nanoseconds. Why?
private void CheckPerformance()
{
    while (KeepRunning)
    {
        // generates a complex random 500-character string
        string text = GenerateString(500, 4);
        DateTime beginTime = DateTime.Now;
        byte[] bytes = Compress(text); // <- timing this
        long elapsedTicks = DateTime.Now.Ticks - beginTime.Ticks;
        Console.WriteLine(" {0:N0} nanoseconds", elapsedTicks * 100);
        // sleep for 5 seconds
        Thread.Sleep(5000);
    }
}

public static byte[] Compress(string text)
{
    using (MemoryStream output = new MemoryStream())
    {
        using (DeflateStream ds = new DeflateStream(output, CompressionMode.Compress))
        {
            using (StreamWriter writer = new StreamWriter(ds, Encoding.UTF8))
            {
                writer.Write(text);
            }
        }
        return output.ToArray();
    }
}
DateTime.Now only gets updated around 10 times per second, but don't quote me on that (it can depend on hardware and OS timer settings). It is also slow because it needs to work out which time zone the system is in. UtcNow is faster, but its value is still cached for a short while. So subsequent calls may simply be getting the cached value.
Use Stopwatch instead for a more accurate measurement. Stopwatch uses a high-resolution hardware timer when one is available; you can check that with Stopwatch.IsHighResolution.
using System.Diagnostics;
Stopwatch sw = new Stopwatch();
sw.Start();
// code to benchmark
sw.Stop();
Let's see if you get the same numbers.
EDIT
While it is true that your method needs to be JIT compiled, the difference cannot be due to JIT compilation, because the method is compiled only once (not always, but in your case it will be once) and then reused. So only the first call should take longer; the subsequent calls should all take about the same time. To rule this explanation out, simply call Compress once outside the benchmarking phase so it is already JIT compiled, then benchmark it: JIT compilation will no longer occur, and DateTime will still give you erratic results because its value is cached.
Note: the JIT compiler does not necessarily compile a whole method into machine code up front, but only the code that execution actually passes through. So if you have if statements, some blocks may not be compiled until execution passes through them. Your method has no if statements, which is why it is JIT compiled once.
Furthermore, we cannot say with confidence that this was due to JIT compilation, since the Compress method could in principle have been inlined; in your case it most likely was not, because you most likely had the debugger attached, which disables JIT optimizations.
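A minimal sketch of the measuring loop with both suggestions applied: a single warm-up call so JIT compilation happens outside the timed region, and Stopwatch for the timing itself. It assumes the KeepRunning flag, the GenerateString helper, and the Compress method from the question.
// Warm up once so the method is already JIT compiled before any measurement.
Compress(GenerateString(500, 4));

while (KeepRunning)
{
    string text = GenerateString(500, 4);
    var sw = Stopwatch.StartNew();
    byte[] bytes = Compress(text); // <- timing this
    sw.Stop();
    // Stopwatch ticks are not DateTime ticks; convert via Stopwatch.Frequency.
    double nanoseconds = sw.ElapsedTicks * (1e9 / Stopwatch.Frequency);
    Console.WriteLine(" {0:N0} nanoseconds", nanoseconds);
    Thread.Sleep(5000); // sleep for 5 seconds
}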
Try this code and you will notice it gives random results for elapsed time even though the same code is executed:
for (int i = 0; i < 1000; i++)
{
    DateTime beginTime = DateTime.UtcNow;
    var sw = Stopwatch.StartNew();
    while (sw.ElapsedTicks < 100)
    {
        Console.WriteLine("*");
    }
    long elapsedTicks = DateTime.UtcNow.Ticks - beginTime.Ticks;
    Console.WriteLine(" {0:N0} nanoseconds", elapsedTicks * 100);
}
On my system, if I change this line to sw.ElapsedTicks < 2050, then a non-zero difference is reported consistently. That suggests this is roughly the point at which DateTime.Now picks up a new value instead of using the cached one.
In conclusion, I do not buy that JIT compilation explains what you are noticing.
When you hit it the first time, the method gets JIT compiled, and that takes time. I'm not sure, though, why the second and third times differ.

Perform operations while the StreamReader is open, or copy the stream locally, close it, and then perform the operations?

Which of the following approaches is better? In other words, is it better to copy the stream's contents locally, close it, and then do whatever operations are needed on the data, or to perform the operations while the stream is open? Assume that the input from the stream is huge.
First method:
public static int calculateSum(string filePath)
{
    int sum = 0;
    var list = new List<int>();
    using (StreamReader sr = new StreamReader(filePath))
    {
        while (!sr.EndOfStream)
        {
            list.Add(int.Parse(sr.ReadLine()));
        }
    }
    foreach (int item in list)
        sum += item;
    return sum;
}
Second method:
public static int calculateSum(string filePath)
{
    int sum = 0;
    using (StreamReader sr = new StreamReader(filePath))
    {
        while (!sr.EndOfStream)
        {
            sum += int.Parse(sr.ReadLine());
        }
    }
    return sum;
}
If the file is modified often, then read the data in first and work with it afterwards. If it is not accessed often, you are fine reading the file one line at a time and working with each line separately.
In general, if you can do it in a single pass, then do it in a single pass. You indicate that the input is huge, so it might not all fit into memory. If that's the case, then your first option isn't even possible.
Of course, there are exceptions to every rule of thumb. But you don't indicate that there's anything special about the file or the access pattern (other processes wanting to access it, for example) that prevents you from keeping it open longer than absolutely necessary to copy the data.
I don't know if your example is a real-world scenario or if you're just using the sum thing as a placeholder for more complex processing. In any case, if you're processing a file line-by-line, you can save yourself a lot of trouble by using File.ReadLines:
int sum = 0;
foreach (var line in File.ReadLines(filePath))
{
    sum += int.Parse(line);
}
This does not read the entire file into memory at once. Rather, it exposes an enumerator that yields one line at a time and only reads as much as it needs to maintain a relatively small (probably four-kilobyte) buffer.
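Since File.ReadLines returns a lazily evaluated IEnumerable<string>, the whole method can also collapse to a single LINQ expression (with using System.Linq), still reading one line at a time:
int sum = File.ReadLines(filePath).Sum(int.Parse);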

How does IEnumerable differ from IObservable under the hood?

I'm curious how IEnumerable differs from IObservable under the hood. I understand the pull and push patterns respectively, but how does C#, in terms of memory etc., notify subscribers (for IObservable) that they should receive the next piece of data to process? How does the observed instance know it has had a change in data to push to its subscribers?
My question comes from a test I was doing, reading lines from a file. The file was about 6 MB in total.
Standard Time Taken: 4.7s, lines: 36587
Rx Time Taken: 0.68s, lines: 36587
How is Rx able to massively improve a normal iteration over each of the lines in the file?
private static void ReadStandardFile()
{
    var timer = Stopwatch.StartNew();
    var linesProcessed = 0;
    foreach (var l in ReadLines(new FileStream(_filePath, FileMode.Open)))
    {
        var s = l.Split(',');
        linesProcessed++;
    }
    timer.Stop();
    _log.DebugFormat("Standard Time Taken: {0}s, lines: {1}",
        timer.Elapsed.ToString(), linesProcessed);
}

private static void ReadRxFile()
{
    var timer = Stopwatch.StartNew();
    var linesProcessed = 0;
    var query = ReadLines(new FileStream(_filePath, FileMode.Open)).ToObservable();
    using (query.Subscribe((line) =>
    {
        var s = line.Split(',');
        linesProcessed++;
    }));
    timer.Stop();
    _log.DebugFormat("Rx Time Taken: {0}s, lines: {1}",
        timer.Elapsed.ToString(), linesProcessed);
}

private static IEnumerable<string> ReadLines(Stream stream)
{
    using (StreamReader reader = new StreamReader(stream))
    {
        while (!reader.EndOfStream)
            yield return reader.ReadLine();
    }
}
My hunch is that the behavior you're seeing reflects the OS caching the file. I would imagine that if you reversed the order of the calls you would see a similar difference in speed, just swapped.
You could improve this benchmark by performing a few warm-up runs, or by copying the input file to a temp file with File.Copy before testing each version. That way the file would not be "hot" and you would get a fair comparison.
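A minimal sketch of that idea: each measured run reads a freshly copied temp file rather than the original. It assumes the same kind of setup as the question and needs System, System.IO, and System.Diagnostics.
static TimeSpan TimeRun(string sourcePath, Action<string> readFile)
{
    // Copy the input to a fresh temp file so the measured run is not reading a
    // file that a previous run has already pulled into the OS cache.
    string tempPath = Path.GetTempFileName();
    try
    {
        File.Copy(sourcePath, tempPath, overwrite: true);
        var timer = Stopwatch.StartNew();
        readFile(tempPath);
        timer.Stop();
        return timer.Elapsed;
    }
    finally
    {
        File.Delete(tempPath);
    }
}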
I'd suspect you're seeing some kind of internal optimization in the CLR. It probably caches the content of the file in memory between the two calls, so that ToObservable can pull the content much faster...
Edit: Oh, the good colleague with the crazy nickname, eh... #sixlettervariables, was faster, and he's probably right: it's the OS doing the optimizing rather than the CLR.

Dual-queue producer-consumer in .NET (forcing member variable flush)

I have a thread that produces data in the form of a simple object (a record). The thread may produce a thousand records for each one that successfully passes a filter and is actually enqueued. Once an object is enqueued, it is read-only.
I have one lock, which I acquire once a record has passed the filter, and I add the item to the back of the producer_queue.
On the consumer thread, I acquire the lock, confirm that the producer_queue is not empty, set consumer_queue equal to producer_queue, create a new (empty) queue, and set it as producer_queue. Without any further locking I then process consumer_queue until it is empty, and repeat.
Everything works beautifully on most machines, but on one particular dual-quad server I see, in roughly 1 in 500,000 iterations, an object that is not fully initialized when I read it out of consumer_queue. The condition is so fleeting that when I dump the object after detecting it, the fields are correct 90% of the time.
So my question is this: how can I ensure that the writes to the object are flushed to main memory when the queue is swapped?
Edit:
On the producer thread:
(producer_queue above is m_fillingQueue; consumer_queue above is m_drainingQueue)
private void FillRecordQueue() {
    while (!m_done) {
        int count;
        lock (m_swapLock) {
            count = m_fillingQueue.Count;
        }
        if (count > 5000) {
            Thread.Sleep(60);
        } else {
            DataRecord rec = GetNextRecord();
            if (rec == null) break;
            lock (m_swapLock) {
                m_fillingQueue.AddLast(rec);
            }
        }
    }
}
In the consumer thread:
private DataRecord Next(bool remove) {
    bool drained = false;
    while (!drained) {
        if (m_drainingQueue.Count > 0) {
            DataRecord rec = m_drainingQueue.First.Value;
            if (remove) m_drainingQueue.RemoveFirst();
            if (rec.Time < FIRST_VALID_TIME) {
                throw new InvalidOperationException("Detected invalid timestamp in Next(): " + rec.Time + " from record " + rec);
            }
            return rec;
        } else {
            lock (m_swapLock) {
                m_drainingQueue = m_fillingQueue;
                m_fillingQueue = new LinkedList<DataRecord>();
                if (m_drainingQueue.Count == 0) drained = true;
            }
        }
    }
    return null;
}
The producer is rate-limited (it sleeps when the queue backs up), so it can't get too far ahead of the consumer.
The behavior I see is that the Time field sometimes reads as DateTime.MinValue; by the time I construct the string to throw the exception, however, it is perfectly fine.
Have you tried the obvious: is the microcode update applied on the fancy 8-core box (via a BIOS update)? Did you run Windows Update to get the latest processor driver?
At first glance it looks like you are locking your containers correctly, so I would recommend the systems approach, since it sounds like you are not seeing this issue on a good ol' dual-core box.
Assuming these are in fact the only methods that interact with the m_fillingQueue variable, and that DataRecord cannot be changed after GetNextRecord() creates it (read-only properties, hopefully?), then at least on the face of it the code appears to be correct.
In that case I suggest GregC's answer would be the first thing to check; make sure the failing machine is fully updated (OS / drivers / .NET Framework), because the lock statement should involve all the required memory barriers to ensure that the rec variable is fully flushed out of any caches before the object is added to the list.
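To make that last point concrete, here is a stripped-down sketch of the handoff with comments marking where the lock's barriers come into play; it is only an illustration of the reasoning above, with names simplified, not a change to the original design.
using System.Collections.Generic;

// Illustrative model of the queue swap pattern described in the question.
class Handoff<T>
{
    private readonly object _swapLock = new object();
    private LinkedList<T> _fillingQueue = new LinkedList<T>();

    public void Produce(T record)          // record's fields were fully written by the caller
    {
        lock (_swapLock)                   // acquire: memory barrier on entry
        {
            _fillingQueue.AddLast(record);
        }                                  // release: the writes to record and to the list are
                                           // published before any later acquirer of _swapLock
    }                                      // can observe the new node

    public LinkedList<T> Swap()
    {
        lock (_swapLock)                   // acquire: sees everything published by Produce
        {
            var drained = _fillingQueue;
            _fillingQueue = new LinkedList<T>();
            return drained;                // only the consumer touches the drained list from
        }                                  // here on, so its unlocked reads are safe
    }
}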
