If in testing on a computer without a debugger, say a client's computer, I encounter a bug that may have corrupted the state of the program but not actually crashed it, I know I can take a memory dump using the Windows Task Manager (right click on process name, create dump file).
I can use these with WinDbg to peek around in memory, etc., but what would be most useful to me is to be able to restore the dump into memory so that I can continue interacting with the program. Is this possible? If so, how? Is there a tool that can restore it, or do I need to write my own?
The typical usermode dumps or minidumps do not contain enough information to do so. While they contain all usermode memory, they do not contain kernel memory, so open handles to kernel resources like files or network sockets will not be included in the dump (and even if they were, the hard disk has most likely changed so just trying to write to the hard disk may corrupt your system even more).
The only way I see to restore a memory dump is to restore the full memory and all other state, such as hard disk state, which can be done with most virtual machine software (which will, however, disconnect all your network connections on restore; thankfully most programs handle lost network connections better than lost file handles).
I discovered that I could do this with Hyper-V snapshots. If I run my program in a virtual machine, I can optionally dump the memory, create a snapshot, transfer the dump if necessary, come back some time later, restore the snapshot and continue the program.
Related
I've designed a logging service for multiple applications in C#.
For performance reasons, all logs are stored in a buffer first and then written to the log file when the buffer is full.
However, some extension cards (PCI / PCI-e) that are not under my control cause BSoDs. The logs in the buffer are lost when a BSoD occurs, but I want to find a way to keep them.
I've found some articles discussing how to dump data when software crashes. However, the minidump approach requires me to dump everything myself, which I think will cause performance issues; the other articles (A)(B) are only suitable for a single-application crash.
Does anyone have a suggestion for saving my logs even if a BSoD occurs?
EDIT: any suggestion that at least minimizes the loss of data is also welcome.
Since the buffer in your C# application is not written to disk for performance reasons, the only other place left for it is memory (RAM). Since you don't know how Windows is managing your memory at the moment of the crash, we have to consider two cases: a) the log is really in RAM, and b) that RAM has been swapped to disk (page file). To get access to all RAM at the time of a BSoD, you have to configure Windows to create a full memory dump instead of a kernel minidump.
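Switching to a full dump is just a registry value under the CrashControl key. A minimal sketch (assuming the standard CrashDumpEnabled value; verify the value names on your Windows version before relying on this):

```csharp
using Microsoft.Win32;

class EnableFullDump
{
    static void Main()
    {
        // 1 = complete memory dump, 2 = kernel dump, 3 = small (mini) dump.
        // Requires administrative rights and a reboot before it takes effect.
        Registry.SetValue(
            @"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl",
            "CrashDumpEnabled",
            1,
            RegistryValueKind.DWord);
    }
}
```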
At the time of a blue screen, the operating system stops relying on almost anything, even most kernel drivers. The only attempt it makes is to write the contents of the physical RAM onto disk. Furthermore, since it cannot even rely on valid NTFS data structures, it writes to the only place of contiguous disk space it knows: the page file. That's also the reason why your page file needs to be at least as large as physical RAM plus some metadata, otherwise it won't be able to hold the information.
At this point we can already give an answer to case b): if your log was actually swapped to the page file, it will likely be overwritten by the dump.
If the buffer was really part of the working set (RAM), that part will be included in the kernel dump. Debugging .NET applications from a kernel dump is almost impossible, because the SOS commands to analyze the .NET heaps only work for a user mode full memory dump. If you can identify your log entries by some other means (e.g. a certain substring), you may of course do a simple string search on the kernel dump.
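If you go the substring route, a minimal sketch (the marker string and entry format below are made up for illustration) is to prefix every buffered entry with an unlikely marker, so that a plain ASCII search over the crash dump, e.g. with WinDbg's s -a command, can locate the entries:

```csharp
using System;
using System.Text;

// Sketch only: "LOGBUF7F|" is an arbitrary marker chosen to be unlikely to occur elsewhere.
class MarkedLogBuffer
{
    const string Marker = "LOGBUF7F|";
    readonly StringBuilder buffer = new StringBuilder();

    public void Append(string message)
    {
        // Every entry starts with the marker, so entries can be found in a raw dump.
        buffer.Append(Marker)
              .Append(DateTime.UtcNow.ToString("o"))
              .Append('|')
              .AppendLine(message);
    }

    public override string ToString()
    {
        return buffer.ToString();
    }
}
```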
All in all, what you're trying to achieve sounds like an XY problem. If you want to test your service, why don't you remove or replace the unrelated problematic PCI cards, or test on a different PC?
If blue screen logging is an explicit feature of your logging service, you should have considered this as a risk and evaluated it before writing the service. That's a project management issue and off-topic for Stack Overflow.
Unfortunately I have to confirm what @MobyDisk said: it's (almost) impossible, and at the very least unreliable.
I have a problem with an application that I wrote in .NET/C#. It consists of a server which manages a few other machines and runs tests on them. It is a Windows Forms application. In order to run tests with proper error handling, I have two threads for each machine: one for running tests and one that pings it continuously. Each machine has a running queue in which the tasks to be run on that particular machine are stored.
The issue is that after some time, when more than a few tasks are present in the queue, the memory the process consumes (Process Explorer, Task Manager) gradually increases from about 50-100 MB to 1.6-1.8 GB. At about this limit, almost every transaction with the remote machines (file copy to a share, remote WMI access) fails with either "Not enough storage" or "Out of memory". I tried some tools to localize the leak, and the closest I got was .NET Memory Profiler. That wasn't of great help, because the largest amount of memory was residing in "Private Data - Unidentified". I'm guessing this is unmanaged data, because I can account for every other piece of data (from my program) down to each string and int, and every instance of it.
Can anyone suggest a tool I can use to properly localize the leak and maybe fix it? It would help me a lot if I knew which DLL (from my app) or thread is using that memory, or at least if I could somehow view what is in that memory.
Note: a lot of posts are out there about the two exceptions, "Not enough storage" and "Out of memory". Most of them suggest increasing IRPStackSize on the 'server' machine (in my case, the clients). I have an IRPStackSize of 50 (0x32) on all of the machines, including the server.
EDIT
Regarding the comments: yes, I do maintain a log, but nothing strange happens. Using a memory profiler I discovered that the .NET side of my application uses about 20 MB of memory, while the unmanaged part is well over 1 GB. With the help of WinDbg I found out what resides in most of that extra memory. In order to access the machines and run different tests on them I use WMI, for which I have a wrapper. Everything I use is being disposed (using statements, and for some objects actually calling the Dispose method). Strangely, though, the memory is filled with clones of this wrapper class. Does anyone know why a class would be duplicated in memory like this?
Note: the rate at which the memory usage increases is about 5 MB/s, so it's not really over a long period of time. I also wonder why it is not being freed by the garbage collector. I am using the C# classes to work with WMI, not COM or unmanaged code. Also, among the objects on the heap I see a lot of data belonging to wmiutils and CWbemError. Oddly enough, Google doesn't even know the word (no results for CWbemError).
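For reference, the disposal pattern I'm describing looks roughly like the sketch below (the machine name and query are placeholders; my real wrapper takes credentials and builds the queries dynamically):

```csharp
using System;
using System.Management;

class WmiQueryExample
{
    static void Main()
    {
        // Placeholder remote machine name.
        var scope = new ManagementScope(@"\\REMOTE-PC\root\cimv2");
        scope.Connect();

        using (var searcher = new ManagementObjectSearcher(
                   scope, new ObjectQuery("SELECT * FROM Win32_OperatingSystem")))
        using (ManagementObjectCollection results = searcher.Get())
        {
            foreach (ManagementObject os in results)
            {
                using (os)  // each result object holds COM resources of its own
                {
                    Console.WriteLine(os["Caption"]);
                }
            }
        }
    }
}
```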
I have a C#/.NET service running in production. The service functions as a TCP server to which clients register and make requests. Looking at Task Manager, it appears to be leaking about 10 MB/day. I don't seem to notice this in dev (perhaps because of far less traffic and client activity). In searching around I've read that Task Manager can be seriously wrong, but I'm not sure how accurate that is or in what circumstances the TM would display incorrect information.
To solve this problem I need to monitor memory consumption more closely. The problem is that the leak only seems to appear in production, where the deployed service was built for Release. Also, since it's a service, it can't be run directly by VS with an attached profiler/debugger, so I'm not sure how best to pinpoint the problem with something more precise than TM.
Any group wisdom would be much appreciated, thanks.
EDIT:
I've added perfmon counters for the private bytes of the service (7 MB to start out) as well as CLR memory in all heaps (30 MB to start out).
Task Manager reports the total memory as ~37 MB, so this seems to make sense.
The first part of this is to let the service go for a day and check out my counters again.
If my private bytes get huge but CLR memory stays roughly static, this would indicate an unmanaged leak. If both get huge, then it's a managed leak.
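In case it helps anyone else, the sampling I set up looks roughly like the sketch below (the instance name "MyTcpService" is a placeholder for the actual perfmon instance of the service; perfmon appends #1, #2, ... when several processes share a name):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class LeakCounters
{
    static void Main()
    {
        // Placeholder instance name; use the service's process name without ".exe".
        var privateBytes = new PerformanceCounter("Process", "Private Bytes", "MyTcpService");
        var clrHeaps = new PerformanceCounter(".NET CLR Memory", "# Bytes in all Heaps", "MyTcpService");

        while (true)
        {
            Console.WriteLine("{0:u}  Private Bytes={1:N0}  CLR heaps={2:N0}",
                DateTime.UtcNow, privateBytes.NextValue(), clrHeaps.NextValue());
            Thread.Sleep(60000); // one sample per minute is plenty for a 10 MB/day leak
        }
    }
}
```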
Thanks guys.
Your first task is figuring out whether the process is actually leaking memory. You can do this with perfmon by measuring Private Bytes:
http://www.goldstarsoftware.com/papers/CapturingVirtualBytesToALogFile.pdf
If the graph is consistently rising (for, say, half an hour), you have a memory leak. You can then use other counters to figure out whether this is a .NET leak (.NET memory), though this is unlikely. I find that in most of these cases there is a COM component that is being invoked but not released.
If you truly have a memory leak (and this isn't just variable memory usage)- the process will shutdown with an out of memory exception after running for a while.
You need one of the memory profilers below in order to monitor it:
http://www.jetbrains.com/profiler/
http://www.red-gate.com/products/dotnet-development/ants-memory-profiler/
There are other choices, but these are very capable, and you can profile a remote application's memory with them (at least JetBrains's solution handles that).
Follow this guide: http://blogs.msdn.com/b/tess/archive/2008/03/25/net-debugging-demos-lab-7-memory-leak.aspx
It goes over exactly what you're describing, a memory leak in production. As was mentioned you have to first determine whether it's unmanaged code or managed code that's leaking using perfmon and Private Bytes.
In general, make sure you wrap networking objects in using statements so that they're properly disposed.
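For example, a minimal sketch (the hostname and port are made up):

```csharp
using System.Net.Sockets;
using System.Text;

class DisposalExample
{
    static void Send()
    {
        // Both the client and the stream are closed here even if Write throws.
        using (var client = new TcpClient("server.example.com", 9000)) // placeholder endpoint
        using (NetworkStream stream = client.GetStream())
        {
            byte[] request = Encoding.ASCII.GetBytes("PING\n");
            stream.Write(request, 0, request.Length);
        }
    }
}
```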
A workflow I often use for managed memory leaks is to start the server on a test machine and hit it with a known number of connections (say 123,456 connections). Then take a memory snapshot by going to Task Manager, right-clicking the process name and selecting 'Create dump file'. Open this dump with WinDbg and SOS and run the command !dumpheap -stat. Look for objects that have a multiple of 123,456 instances. Should these objects still be in memory? If not, run !gcroot on an instance of those objects to find out why they are still in memory.
Get a dump of the memory when it's in a leaked state using Task Manager: right-click the process and select 'Create dump file'. You can also use ProcDump, which gives you more options.
Use the SOS extension in either WinDbg or Visual Studio to inspect the memory.
We are using File.WriteAllBytes to write data to the disk, but if a reboot happens right around the time we close the file, Windows writes nulls to the file. This seems to be happening on Windows 7, so once we come back to the file we see nulls in it. Is there a way to prevent this? Is Windows closing its internal handle after a certain time, and can this be forced to close immediately?
Depending on what behavior you want, you can either put it on a UPS as 0A0D suggested, or, in addition, use Windows Vista+ Transactional NTFS functionality. This allows you to write to the file system atomically, so in your case nothing would be written rather than improper data. It isn't directly part of the .NET Framework yet, but there are plenty of managed wrappers to be found online.
Sometimes no data is better than wrong data. When your application starts up again, it can see that the file is missing and "continue" from where it left off, depending on what your application does.
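If you would rather not take a dependency on a TxF wrapper, a simpler (non-transactional) pattern with similar "old data or nothing torn" behaviour is to write to a temporary file and swap it over the target with File.Replace. A rough sketch, with made-up file names:

```csharp
using System.IO;

class AtomicWriteExample
{
    // Writes to a temp file first, then swaps it over the target in one step,
    // keeping the previous version as a .bak file. File names are illustrative.
    static void WriteAllBytesSafely(string path, byte[] data)
    {
        string temp = path + ".tmp";
        File.WriteAllBytes(temp, data);

        if (File.Exists(path))
            File.Replace(temp, path, path + ".bak");
        else
            File.Move(temp, path);
    }
}
```

The write to the temporary file can still be torn by a power loss, but the previously committed file is never left half-written.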
Based on your comments, there are no guarantees when writing a file, especially if you lose power during a file write. Your best bet is to put the PC on an uninterruptible power supply. If you are able to create an auto-restore mechanism, like Microsoft Office products have, that would prevent a complete loss of data, but it won't fix the missing data upon power loss.
I would consider this a case of a fatal exception (sudden loss of power). There isn't anything you can do about it, and generally, trying to handle them only makes matters worse.
I have had to deal with something similar; essentially an embedded system running on Windows, where the expectation is that the power might be shut off at any time.
In practice, I work with the understanding that a file written to disk less than 10 seconds before loss-of-power means that the file will be corrupted. (I use 30 seconds in my code to play it safe).
I am not aware of any way of guaranteeing from code that a file has been fully closed, flushed to disk, and that the disk hardware has finalized its writes. Except to know that 10 (or 30) seconds has elapsed. It's not a very satisfying situation, but there it is.
Here are some pointers I have used in a real-life embedded project...
Use a system of checksums and backup files.
Checksums: at the end of any file you write, include a checksum (if it's a custom XML file then perhaps include a <checksum .../> tag of some sort). Then upon reading, if the checksum tag isn't there, or doesn't match the data, then you must reject the file as corrupt.
Backups: every time you write a file, save a copy to one of two backups; say A and B. If A exists on disk but is less than 30 seconds old, then copy to B instead. Then upon reading, read the original file first. If corrupt, then read A, if corrupt then read B.
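A minimal sketch of the checksum part (the file layout, class name and choice of MD5 are just illustrative; any hash will do):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

static class ChecksummedFile
{
    // Write the payload followed by a final checksum line.
    public static void Write(string path, string payload)
    {
        File.WriteAllText(path, payload + "\n" + Hash(payload));
    }

    // Returns the payload, or throws if the checksum line is missing or wrong.
    public static string Read(string path)
    {
        string text = File.ReadAllText(path);
        int split = text.LastIndexOf('\n');
        if (split < 0)
            throw new InvalidDataException("no checksum line - treat file as corrupt");

        string payload = text.Substring(0, split);
        string stored = text.Substring(split + 1);
        if (stored != Hash(payload))
            throw new InvalidDataException("checksum mismatch - treat file as corrupt");

        return payload;
    }

    static string Hash(string payload)
    {
        using (var md5 = MD5.Create())
            return BitConverter.ToString(md5.ComputeHash(Encoding.UTF8.GetBytes(payload)));
    }
}
```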
Also
If it is an embedded system, you need to run the DOS command "chkdsk /F" on the drive you write to, upon boot, because if you are getting corrupted files then you are also going to be getting a corrupted file system.
NTFS disk systems are meant to be more robust against errors than FAT32. But I believe that NTFS disks can also require more time to fully flush their data. I use FAT32 when I can.
Final thought: if you are really using an embedded system under Windows, you would do well to learn more about Windows Embedded and the Enhanced Write Filter system.
See this SuperUser question. To summarize, VM software lets you save state of arbitrary applications (by saving the whole VM image).
Would it be possible to write some software for Windows that allows you to save and reload arbitrary application state? If so (and presumably so), what would it entail?
I would be looking to implement this, if possible, in a high-level language like C#. I presume that if I used something else, I would need to dump the memory registers (or maybe dump the entire application memory block) to a file somewhere and load it back later to restore the state.
So how do I build this thing?
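For the "save" half I can at least see a starting point: dbghelp's MiniDumpWriteDump can write a full-memory dump of the current process from managed code via P/Invoke. A rough, untested sketch (the file path is arbitrary and error handling is omitted); the "restore" half is the real question:

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Runtime.InteropServices;

class SelfDump
{
    [DllImport("dbghelp.dll", SetLastError = true)]
    static extern bool MiniDumpWriteDump(
        IntPtr hProcess, uint processId, IntPtr hFile, uint dumpType,
        IntPtr exceptionParam, IntPtr userStreamParam, IntPtr callbackParam);

    const uint MiniDumpWithFullMemory = 0x00000002;

    public static void WriteFullDump(string path)
    {
        using (var self = Process.GetCurrentProcess())
        using (var file = File.Create(path))
        {
            // Dumps all user-mode memory of the current process to the given file.
            MiniDumpWriteDump(self.Handle, (uint)self.Id,
                file.SafeFileHandle.DangerousGetHandle(),
                MiniDumpWithFullMemory, IntPtr.Zero, IntPtr.Zero, IntPtr.Zero);
        }
    }
}
```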
Well, it's unlikely that this is going to happen, especially not in C#: addressing the low-level requirements in managed code would hardly be possible. Saving the state of a whole virtual machine is actually easier than saving a single process; all you have to do is dump the whole machine's memory and ensure a consistent disk image, which, given the capabilities of virtualization software, is rather straightforward.
Restoring a single process would imply loading all shared objects that the process refers to, including whatever objects the process refers to in kernel space, i.e. file/memory/mutex/whatever handles, and without the whole machine / operating system being virtual, this would mean poking deep down in the internals of Windows...
All I'm saying is: while it is possible, the effort would be tremendous and probably not worth it.