Embedded resources - resx - performance - C#

Are embedded resources only used to compile a file into the assembly as binary data?
Is it a good idea to use embedded resources for performance reasons?
This question concerns:
XML files saved as embedded resources
Resx files (strings)
In both cases, is there some performance benefit, such as a caching strategy or something similar (since they can be managed like source code/assemblies)?
For example, instead of the resx I could use a hash table built at runtime. In my case that hash table could be too big to keep in memory forever. So does resx help with some caching strategy?
I have the same problem for a tree object. Would using it from an embedded XML file help me, or would I have to implement the whole caching strategy myself?
Thanks.

The embedded resource feature was added primarily because of performance reasons. There is no way to do it faster on a demand-page virtual memory operating system like Windows. You get the full benefit of a memory-mapped file to read the resource content.
That is not massively better than reading a separate file, but you don't pay for having to find the file, which is usually the costly operation for small resources and heavily affects the cold-start time of an app. Not having a large blob of files to deploy is very much a practical advantage.
They do occupy memory of course, but it is virtual memory: just numbers to the processor, one for every 4096 bytes. It is also the cheap kind of virtual memory, backed by the executable file instead of the paging file. You don't actually start to use RAM until you access the resource in your program, and that RAM will usually be released again soon unless you use the resource repeatedly. It does set an upper limit on the amount of resource data you can embed, which peters out at 2 gigabytes.
That they start as an XML resource only matters at build time; the Resgen.exe tool turns it into a binary blob before it is embedded in the executable file. The advantage of XML is that it plays nice with the IDE and makes it easy to recover the original resource after you've lost track of the original artwork.
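For reference, consuming both kinds of embedded resource at runtime looks roughly like the sketch below; the names ("MyApp.Data.Tree.xml", "MyApp.Strings", "Greeting") are placeholders for whatever your project actually uses.

using System.IO;
using System.Reflection;
using System.Resources;

class ResourceReading
{
    // Raw embedded file (Build Action = Embedded Resource). The manifest name
    // is usually default-namespace + folder + file name; adjust the placeholder.
    static string ReadEmbeddedXml()
    {
        Assembly asm = Assembly.GetExecutingAssembly();
        using (Stream s = asm.GetManifestResourceStream("MyApp.Data.Tree.xml"))
        using (var reader = new StreamReader(s))
            return reader.ReadToEnd();
    }

    // Resx strings go through ResourceManager, which loads and caches the
    // resource set lazily the first time a value is requested.
    static string ReadResxString()
    {
        var rm = new ResourceManager("MyApp.Strings", Assembly.GetExecutingAssembly());
        return rm.GetString("Greeting");
    }
}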

Related

Does e.g. File.ReadAllText() utilize Windows File Caching?

Windows File Caching, as described here:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364218(v=vs.85).aspx
By default, Windows caches file data that is read from disks and written to disks. This implies that read operations read file data from an area in system memory known as the system file cache, rather than from the physical disk.
I have a C# app that reads files from disk in a very frequent, rapid manner. Should I worry about writing my own "file caching" mechanism, or will Windows handle this for me? When observing my app with Process Explorer, I still notice a lot of disk I/O during operation, even though I'm reading the same static file over and over again. Could it be that the Windows Cache Manager is simply telling the operating system that disk IO is taking place, when in fact the file is being read from the cache in memory?
Caching is enabled by default for filesystem operations in all OSes I'm aware of, Windows included. I'd be astonished if the implementation of File.ReadAllText disabled it, because it gives a pretty huge performance benefit.
The filesystem cache is fast, but a custom cache can be purpose-built and therefore much faster.
For instance, ReadAllText needs to decode the file into a string -- the filesystem cache won't help you there. You can also keep that same instance around, so that all parts of your app accessing it reference the same copy. This gives your CPU's cache a better chance of skipping main memory. Fewer allocations also mean reduced GC pressure.
Do you need your own second layer of caching? Maybe, maybe not -- you should write the simplest code you can first, and then work to optimize it if it's a bottleneck after measuring.
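If you do decide a second layer of caching is worth it, a minimal sketch might look like the following; the TextFileCache name is made up, and it never invalidates entries, which a real cache would have to handle (e.g. by checking LastWriteTime or using a FileSystemWatcher).

using System.Collections.Concurrent;
using System.IO;

// Hypothetical second-layer cache: keeps the decoded string in memory so
// repeated reads skip both the disk and the text-decoding step.
static class TextFileCache
{
    static readonly ConcurrentDictionary<string, string> cache =
        new ConcurrentDictionary<string, string>();

    public static string Get(string path)
    {
        // Reads the file once per path; later calls return the same instance.
        return cache.GetOrAdd(path, File.ReadAllText);
    }
}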

When does it become worthwhile to spend the execution time to zip files?

We are using the #ziplib (found here) in an application that synchronizes files from a server for an occasionally connected client application.
My question is, with this algorithm, when does it become worthwhile to spend the execution time to do the actual zipping of files? Presumably, if only one small text file is being synchronized, the time to zip would not sufficiently reduce the size of the transfer and would actually slow down the entire process.
Since the zip time profile is going to change based on the number of files, the types of files and the size of those files, is there a good way to discover programmatically when I should zip the files and when I should just pass them as is? In our application, files will almost always be photos though the type of photo and size may well change.
I haven't written the actual file transfer logic yet, but I expect to use System.Net.WebClient to do this; I am open to alternatives to save on execution time as well.
UPDATE: As this discussion develops, is "to zip, or not to zip" the wrong question? Should the focus be on replacing the older System.Net.WebClient method with compressed WCF traffic or something similar? The database synchronization portion of this utility already uses Microsoft Synchronization Framework and WCF, so I am certainly open to that. Anything we can do now to limit network traffic is going to be huge for our clients.
To determine whether it's useful to compress a file, you have to read the file anyway. While you're at it, you might as well zip it.
If you want to prevent useless zipping without reading the files, you could try to decide beforehand, based on other properties.
You could create an 'algorithm' that decides whether it's useful, for example based on file extension and size. So a .txt file of more than 1 KB can be zipped, but a .jpg file shouldn't be, regardless of the file size. It's a lot of work to create such a list, though (you could also create a blacklist or whitelist and allow or deny all files not on the list).
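For illustration, a minimal sketch of such an extension-and-size rule; the extension list and the 1 KB threshold are just example values, not a recommendation.

using System;
using System.Collections.Generic;
using System.IO;

static class ZipDecision
{
    // File types that are already compressed; zipping them rarely pays off.
    static readonly HashSet<string> alreadyCompressed = new HashSet<string>(
        new[] { ".jpg", ".jpeg", ".png", ".zip", ".mp3", ".mp4" },
        StringComparer.OrdinalIgnoreCase);

    public static bool ShouldZip(string path)
    {
        if (alreadyCompressed.Contains(Path.GetExtension(path)))
            return false;
        // Arbitrary threshold: very small files are not worth the overhead.
        return new FileInfo(path).Length > 1024;
    }
}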
You probably have plenty of CPU time, so the only issue is: does it shrink?
If you can decrease the file size you will save on (disk and network) I/O. That becomes profitable very quickly.
Alas, photos (jpeg) are already compressed so you probably won't see much gain.
You can write your own pretty simple heuristic analysis and then reuse it while processing each subsequent file. The collected statistics should be saved to keep the efficiency between restarts.
A basic interface:
enum FileContentType
{
    PlainText,
    OfficeDoc,
    OfficeXlsx
}

// Name is ugly, so find a better one
public interface IHeuristicZipAnalyzer
{
    bool IsWorthToZip(int fileSizeInBytes, FileContentType contentType);
    void AddInfo(FileContentType contentType, int fileSizeInBytes, int finalZipSize);
}
Then you can collect statistics by adding information about each file you have just zipped using AddInfo(...), and based on that you can determine whether it is worth zipping the next file by calling IsWorthToZip(...).
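One possible implementation of the interface above (an assumption, not part of the original answer): track the observed compression ratio per content type and only zip when it has been paying off so far. The 0.9 threshold is arbitrary, and persisting the statistics between restarts is left out.

using System.Collections.Generic;
using System.Linq;

public class AverageRatioZipAnalyzer : IHeuristicZipAnalyzer
{
    readonly Dictionary<FileContentType, List<double>> ratios =
        new Dictionary<FileContentType, List<double>>();

    public bool IsWorthToZip(int fileSizeInBytes, FileContentType contentType)
    {
        List<double> seen;
        // No statistics yet: try zipping once and learn from the result.
        if (!ratios.TryGetValue(contentType, out seen) || seen.Count == 0)
            return true;
        // Zip only if this content type has been shrinking by more than 10% on average.
        return seen.Average() < 0.9;
    }

    public void AddInfo(FileContentType contentType, int fileSizeInBytes, int finalZipSize)
    {
        List<double> seen;
        if (!ratios.TryGetValue(contentType, out seen))
            ratios[contentType] = seen = new List<double>();
        seen.Add((double)finalZipSize / fileSizeInBytes);
    }
}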

save output of a compiled application to memory instead of hard disk

I have an application (an EXE file). While it runs, it generates some files (JPEG files) on the hard disk. We know that reading from and writing to the hard disk has poor performance.
Is there any way to force this application to save its output JPEG files in memory instead?
A solution that works on Windows and uses C# would be ideal.
Thanks.
The simplest option is probably not a programmatic one - it's just to use a RAM disk such as RAMDisk (there are others available, of course).
That way other processes get to use the results easily, without any messing around.
Since you don't have the source for the EXE and you can't/won't use a RAM disk, the next option is to improve the IO performance of your machine:
Use an SSD or a RAID 0 array, or add loads of memory that can be used as a cache.
But without access to the source code for the application, this isn't really a programming question, because the only way you can 'program' a solution is to write your own RAM disk application - and you can't use a RAM disk, so you've said.
If you really need to make this solution programmatic then you need to dig deep: depending on the application, you will have to hook a lot of the functions used by the EXE...
That is a really tough thing to do and is prone to problems with several things: permissions/rights, antivirus protection...
Starting points:
http://www.codeproject.com/KB/winsdk/MonitorWindowsFileSystem.aspx
http://msdn.microsoft.com/en-us/windows/hardware/gg462968.aspx

When to use memory-mapped files?

I have an application that receives chunks of data over the network, and writes these to disk.
Once all chunks have been received, they can be decoded/recombined into the single file they actually represent.
I'm wondering if it's useful to use memory-mapped files or not - first for writing the single chunks to disk, second for the single file into which all of them are decoded.
My own feeling is that it might be useful for the second case only, anyone got some ideas on this?
Edit:
It's a C# app, and I'm only planning an x64 version.
(So running into the 'largest contiguous free space' problem shouldn't be relevant.)
Memory-mapped files are beneficial for scenarios where a relatively small portion (view) of a considerably larger file needs to be accessed repeatedly.
In this scenario, the operating system can help optimize the overall memory usage and paging behavior of the application by paging in and out only the most recently used portions of the mapped file.
In addition, memory-mapped files can expose interesting features such as copy-on-write or serve as the basis of shared-memory.
For your scenario, memory-mapped files can help you assemble the file if the chunks arrive out of order. However, you would still need to know the final file size in advance.
Also, you should be accessing the files only once, for writing a chunk. Thus, a performance advantage over explicitly implemented asynchronous I/O is unlikely, but it may be easier and quicker to implement your file writer correctly.
In .NET 4, Microsoft added support for memory-mapped files and there are some comprehensive articles with sample code, e.g. http://blogs.msdn.com/salvapatuel/archive/2009/06/08/working-with-memory-mapped-files-in-net-4.aspx.
Memory-mapped files are primarily used for Inter-Process Communication or I/O performance improvement.
In your case, are you trying to get better I/O performance?
I hate to point out the obvious, but Wikipedia gives a good rundown of the situation...
http://en.wikipedia.org/wiki/Memory-mapped_file
Specifically...
The memory mapped approach has its cost in minor page faults - when a block of data is loaded in page cache, but not yet mapped in to the process's virtual memory space. Depending on the circumstances, memory mapped file I/O can actually be substantially slower than standard file I/O.
It sounds like you're about to prematurely optimize for speed. Why not a regular file approach, and then refactor for MM files later if needed?
I'd say both cases are relevant. Simply write the single chunks to their proper place in the memory mapped file, out of order, as they come in. This of course is only useful if you know where each chunk should go, like in a bittorrent downloader. If you have to perform some extra analysis to know where the chunk should go, the benefit of a memory mapped file might not be as large.
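A minimal sketch of that approach using the .NET 4 memory-mapped file API; the ChunkAssembler name is made up, and as noted above the total file size has to be known in advance.

using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class ChunkAssembler : IDisposable
{
    readonly MemoryMappedFile file;

    // The backing file is created at its final size up front.
    public ChunkAssembler(string path, long totalSize)
    {
        file = MemoryMappedFile.CreateFromFile(path, FileMode.Create, null, totalSize);
    }

    // Write a chunk at its known offset, regardless of arrival order.
    public void WriteChunk(long offset, byte[] data)
    {
        using (MemoryMappedViewAccessor view = file.CreateViewAccessor(offset, data.Length))
        {
            view.WriteArray(0, data, 0, data.Length);
        }
    }

    public void Dispose()
    {
        file.Dispose();
    }
}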

Logic in Disk Defragmentation & Disk Check

What is the logic behind disk defragmentation and Disk Check in Windows? Can I do it using C# coding?
For completeness' sake, here's a C# API wrapper for defragmentation:
http://blogs.msdn.com/jeffrey_wall/archive/2004/09/13/229137.aspx
Defragmentation with these APIs is (supposed to be) very safe nowadays. You shouldn't be able to corrupt the file system even if you wanted to.
Commercial defragmentation programs use the same APIs.
Look at Defragmenting Files at msdn for possible API helpers.
You should carefully think about using C# for this task, as it may introduce some undesired overhead for marshaling into native Win32.
If you don't know the logic for defragmentation, and if you didn't write the file system yourself so you can't authoritatively check it for errors, why not just start new processes running 'defrag' and 'chkdsk'?
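A quick sketch of that suggestion; defrag with /A only analyzes and chkdsk without /F only checks, and both normally need to run from an elevated (administrator) process.

using System.Diagnostics;

class DiskTools
{
    // Launch the built-in tools instead of reimplementing them.
    public static void AnalyzeVolume(string driveLetter)   // e.g. "C:"
    {
        RunAndWait("defrag.exe", driveLetter + " /A");   // analysis only
        RunAndWait("chkdsk.exe", driveLetter);           // read-only check
    }

    static void RunAndWait(string exe, string args)
    {
        var psi = new ProcessStartInfo(exe, args)
        {
            UseShellExecute = false,
            RedirectStandardOutput = true
        };
        using (var p = Process.Start(psi))
        {
            // Read the output before waiting to avoid a pipe deadlock.
            string output = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
            Debug.WriteLine(output);
        }
    }
}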
Mark Russinovich wrote an article, Inside Windows NT Disk Defragmentation, a while ago which gives in-depth details. If you really want to do this I would really advise you to use the built-in facilities for defragmenting. More so, on recent OSes I have never seen a need as a user to even care about defragmenting; it is done automatically on a schedule, and the NTFS folks at MS are definitely smarter at that stuff than you (sorry, but they have been doing this for some time now, and you haven't).
Despite its importance, the file system is no more than a data structure that maps file names into lists of disk blocks, and keeps track of meta-information such as the actual length of the file and special files that keep lists of files (e.g., directories). A disk checker verifies that the data structure is consistent: every disk block must either be free for allocation to a file or belong to a single file. It can also check for certain cases where a set of disk blocks appears to be a file that should be in a directory but is not for some reason.
Defragmentation is about looking at the lists of disk blocks assigned to each file. Files will generally load faster if they use a contiguous set of blocks rather than ones scattered all over the disk. And generally the entire file system will perform best if all the disk blocks in use confine themselves to a single contiguous range of the disk. Thus the trick is moving disk blocks around safely to achieve this end while not destroying the file system.
The major difficulty here is running these applications while the disk is in use. It is possible, but one has to be very, very, very careful not to make some kind of obvious or extremely subtle error and destroy most or all of the files. It is easier to work on a file system offline.
The other difficulty is dealing with the complexities of the file system. For example, you'd be much better off building something that supports FAT32 rather than NTFS because the former is a much, much simpler file system.
As long as you have low-level block access and some sensible way for dealing with concurrency problems (best handled by working on the file system when it is not in use) you can do this in C#, perl or any language you like.
BUT BE VERY CAREFUL. Early versions of the program will destroy entire file systems. Later versions will do so but only under obscure circumstances. And users get extremely angry and litigious if you destroy their data.
