Caching FileInfo properties in C#


From the MSDN documentation for the FileInfo.Name property, I see that the data for the property is cached the first time it is called and will only be updated subsequently by using the Refresh method.
I have the following questions, which the documentation either doesn't answer or doesn't make clear:
1. Is the data for all properties cached at the same time?
2. Is the Refresh method called on creation of the FileInfo, or only when a property is accessed for the first time?
3. If I've accessed one property, e.g. the Name property, and it has called Refresh, will accessing a different property, e.g. the DirectoryName property, for the first time cause Refresh to be called again, or is it only called by the first property accessed on the instance (see question 1)?
4. Can I pre-cache all the properties by calling Refresh manually? (Assuming they're not pre-cached on construction of the object.)
5. Does calling Refresh manually cause properties which are pre-cached, e.g. CreationTime, to be refreshed as well?

1. At a guess, yes. It seems like a bit of a self-defeating "optimisation" for FileInfo to fetch only the properties you've fetched before, especially when they can be (and probably are) all fetched in one API call.
2. The fact that the documentation calls out DirectoryInfo methods which serve up already-cached FileInfos suggests quite strongly (to me, anyway) that simply constructing a FileInfo doesn't cache anything. It makes sense: if you construct a FileInfo directly, it might refer to a file that doesn't exist yet (you plan to create it, for instance), whereas all the methods which return cached FileInfos refer to files that exist at the time of the snapshot, under the assumption you're going to use at least some of them.
3. No, by my answer to question 1. That's why the Refresh method is there.
4. I would imagine so (see answer 1).
5. Yes. See answer 3.

The value of the CreationTime property is pre-cached if the current
instance of the FileSystemInfo object was returned from any of the
following DirectoryInfo methods:
GetDirectories
GetFiles
GetFileSystemInfos
EnumerateDirectories
EnumerateFiles
EnumerateFileSystemInfos
To get the latest value, call the Refresh method.
If the file described in the FileSystemInfo object does not exist,
this property will return 12:00 midnight, January 1, 1601 A.D. (C.E.)
Coordinated Universal Time (UTC), adjusted to local time.
NTFS-formatted drives may cache file meta-info, such as file creation
time, for a short period of time. This process is known as file
tunneling. As a result, it may be necessary to explicitly set the
creation time of a file if you are overwriting or replacing an
existing file.
(MSDN)
Internally, Refresh calls the standard Win32 API and thus fills all properties.
[...]
flag2 = Win32Native.GetFileAttributesEx(path, 0, ref data);
Accessing any property that is documented as cached calls Refresh if the data has not been initialised yet, which causes a full refresh. For instance:
public DateTime LastAccessTimeUtc
{
    [SecuritySafeCritical]
    get
    {
        if (this._dataInitialised == -1)
        {
            this._data = default(Win32Native.WIN32_FILE_ATTRIBUTE_DATA);
            this.Refresh();
        }
        [...]
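The caching described above can be observed from the outside. The following is a minimal sketch, not taken from the framework source: it assumes a scratch file in the temp directory (a hypothetical path created only for the demonstration) and compares FileInfo.Length before and after an explicit Refresh. The class and method names are made up for illustration.

```csharp
using System;
using System.IO;

static class FileInfoCacheDemo
{
    // Returns the Length seen on first access, after the file changed
    // (without Refresh), and after calling Refresh.
    public static long[] Run()
    {
        // Hypothetical scratch file, used only for this demonstration.
        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        File.WriteAllBytes(path, new byte[3]);
        try
        {
            var info = new FileInfo(path);
            long first = info.Length;               // first access fills the cache
            File.WriteAllBytes(path, new byte[7]);  // the file changes on disk
            long beforeRefresh = info.Length;       // served from the cached snapshot
            info.Refresh();                         // re-read the file attributes
            long afterRefresh = info.Length;
            return new[] { first, beforeRefresh, afterRefresh };
        }
        finally
        {
            File.Delete(path);
        }
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run()));
    }
}
```

If the properties are cached as documented, the middle value should still equal the first one, and only the value read after Refresh should reflect the new size.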

Related

Understanding lazy loading optimization in C#

After reading a bit about how yield, foreach, LINQ deferred execution and iterators work in C#, I decided to try optimizing an attribute-based validation mechanism inside a small project. The result:
private IEnumerable<string> GetPropertyErrors(PropertyInfo property)
{
    // where Entity is the current object instance
    string propertyValue = property.GetValue(Entity)?.ToString();
    foreach (var attribute in property.GetCustomAttributes().OfType<ValidationAttribute>())
    {
        if (!attribute.IsValid(propertyValue))
        {
            yield return $"Error: {property.Name} {attribute.ErrorMessage}";
        }
    }
}
// inside another method
foreach (string error in GetPropertyErrors(property))
{
    // Some display/insert log operation
}
I find this slow, but that could also be due to reflection or to the large number of properties to process.
So my question is: is this an optimal, or at least a good, use of the lazy loading mechanism? Or am I missing something and just wasting tons of resources?
NOTE: The code intention itself is not important, my concern is the use of lazy loading in it.

Lazy loading is not specific to C# or to Entity Framework. It's a common pattern that lets you defer loading some data. Deferring means not loading immediately. Some examples of when you need that:
Loading images in a (Word) document. The document may be big and contain thousands of images. If you load all of them when the document is opened, it might take a long time. Nobody wants to sit and watch a document load for 30 seconds. The same approach is used in web browsers: resources are not sent with the body of the page, and the browser defers loading them.
Loading graphs of objects. These may be objects from a database, file system objects, etc. Loading the full graph might amount to loading the entire database into memory. How long would that take? Is it efficient? No. If you are building a file system explorer, will you load info about every file in the system before you start using it? It's much faster to load info about the current directory only (and probably its direct children).
Lazy loading does not always mean deferring loading until you really need the data. Loading might occur on a background thread before you really need that data. E.g. you might never scroll to the bottom of a web page to see the footer image. Lazy loading means only deferring. And C# enumerators can help you with that. Consider getting the list of files in a directory:
string[] files = Directory.GetFiles("D:");
IEnumerable<string> filesEnumerator = Directory.EnumerateFiles("D:");
The first approach returns an array of files. It means the directory has to collect all its files and save their names to an array before you can get even the first file name. It's like loading all the images before you see the document.
The second approach uses an enumerator: it returns files one by one as you ask for the next file name. The enumerator is returned immediately, without getting all the files and saving them to some collection, and you can process files one by one as you need them. Here, getting the list of files is deferred.
But you should be careful. If the underlying operation is not deferred, then returning an enumerator gives you no benefit. E.g.
public IEnumerable<string> EnumerateFiles(string path)
{
    foreach (string file in Directory.GetFiles(path))
        yield return file;
}
Here you use the GetFiles method, which fills an array of file names before returning them. So yielding the files one by one gives you no speed benefit.
Btw, in your case you have exactly the same problem: the GetCustomAttributes extension internally uses the Attribute.GetCustomAttributes method, which returns an array of attributes. So you will not reduce the time it takes to get the first result.
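To make the deferral concrete, here is a small sketch that does not touch the file system. It assumes a made-up iterator (Numbers) that records each step in a log, so you can see that nothing runs until the sequence is enumerated and that stopping early skips the remaining work. All names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class DeferredDemo
{
    // Records what the iterator body has actually executed.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Numbers()
    {
        Log.Add("start");
        for (int i = 1; i <= 3; i++)
        {
            Log.Add("yield " + i);
            yield return i;
        }
    }

    // Returns the log size right after the call, then the log after
    // enumerating up to (and including) the value 2.
    public static string Run()
    {
        Log.Clear();
        IEnumerable<int> sequence = Numbers(); // the iterator body has not started yet
        int entriesBeforeEnumeration = Log.Count;

        foreach (int n in sequence)
        {
            if (n == 2) break; // stop early: the third item is never produced
        }
        return entriesBeforeEnumeration + " | " + string.Join(", ", Log);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints "0 | start, yield 1, yield 2"
    }
}
```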

This isn't quite how the term "lazy loading" is generally used in .NET. "Lazy loading" is most often used of something like:
public SomeType SomeValue
{
get
{
if (_backingField == null)
_backingField = RelativelyLengthyCalculationOrRetrieval();
return _backingField;
}
}
As opposed to just having _backingField set when an instance was constructed. Its advantage is that it costs nothing in the cases when SomeValue is never accessed, at the expense of a slightly greater cost when it is. It's therefore advantageous when the chances of SomeValue not being called are relatively high, and generally disadvantageous otherwise with some exceptions (when we might care about how quickly things are done in between instance creation and the first call to SomeValue).
Here we have deferred execution. It's similar, but not quite the same. When you call GetPropertyErrors(property), rather than receiving a collection of all of the errors, you receive an object that can find those errors when asked for them.
It will always save the time taken to get the first such item, because it allows you to act upon it immediately rather than waiting until all the processing has finished.
It will always reduce memory use, because it isn't spending memory on a collection.
It will also save time in total, because no time is spent creating a collection.
However, if you need to access the results more than once, a collection simply hands back the same results, whereas the deferred sequence has to calculate them all again (unlike lazy loading, which loads its results and stores them for subsequent reuse).
If you're rarely going to want to hit the same set of results, it's generally a win.
If you're almost always going to want to hit the same set of results, it's generally a loss.
If you are sometimes going to want to hit the same set of results, though, you can pass the decision on whether to cache or not up to the caller: a single use calls GetPropertyErrors() and acts on the results directly, while a repeated use calls ToList() on it and then acts repeatedly on that list.
As such, the approach of not returning a list is the more flexible one, allowing the calling code to decide which approach is more efficient for its particular use.
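A rough illustration of leaving that decision to the caller: the sketch below uses a stand-in for GetPropertyErrors (a hypothetical GetErrors with a counter added purely for demonstration) and compares enumerating the deferred sequence twice with calling ToList() once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CallerDecidesDemo
{
    // Counts how many times the iterator body produced an item.
    public static int Evaluations;

    // Hypothetical stand-in for GetPropertyErrors.
    public static IEnumerable<string> GetErrors()
    {
        foreach (string name in new[] { "Name", "Email" })
        {
            Evaluations++;
            yield return "Error: " + name + " is required";
        }
    }

    // Enumerates the deferred sequence twice; every pass redoes the work.
    public static int EvaluationsWhenEnumeratedTwice()
    {
        Evaluations = 0;
        IEnumerable<string> deferred = GetErrors();
        deferred.Count();
        deferred.Count();
        return Evaluations;
    }

    // Materialises once with ToList(), then reuses the list.
    public static int EvaluationsWhenCachedInList()
    {
        Evaluations = 0;
        List<string> cached = GetErrors().ToList();
        int firstUse = cached.Count;
        int secondUse = cached.Count;
        return Evaluations;
    }

    static void Main()
    {
        Console.WriteLine(EvaluationsWhenEnumeratedTwice()); // prints 4
        Console.WriteLine(EvaluationsWhenCachedInList());    // prints 2
    }
}
```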
You could also combine it with lazy loading:
private List<string> _store; // set once a full enumeration has completed

private IEnumerable<string> LazyLoadedEnumerator()
{
    if (_store == null)
        return StoringCalculatingEnumerator();
    return _store;
}
private IEnumerable<string> StoringCalculatingEnumerator()
{
    List<string> store = new List<string>();
    foreach (string str in SomethingThatCalculatesTheseStrings())
    {
        yield return str;
        store.Add(str);
    }
    // Only reached if the caller enumerates all the way to the end.
    _store = store;
}
This combination is rarely useful in practice though.
As a rule, start with deferred evaluation as the normal approach and decide further up the call chain whether to store the results or not. An exception, though, is if you can know the size of the results before you begin (you can't here, because you don't know whether an element will be added until you've examined the property). In this case there is the possibility of a performance improvement in how you create that list, because you can set its capacity ahead of time. This, though, is a micro-optimisation that is only applicable if you also know that you'll always want to work on a list, and it doesn't save that much in the grand scheme of things.

Read data once and use later

I read values in from a file on startup of my application.
I'd like to use those values during a condition in a timer every xx seconds later in my program's execution.
I don't want to read the file again. How do I go about referencing the values initially read in?
The timer is in a completely different project/class to initial reading of the file.

Assign them somewhere!
If you're reading from the file and creating the timer condition in the same place, you could even use a local variable to store the values.
If they need to be accessed later but you don't want to recreate them, you could store them in a field in the class where this is happening.
If these values will be used elsewhere in your application, but will remain relevant as long as this class' type is around, you could store them in a static field or property.
If you want them to be loaded on-demand and then saved for subsequent access, you could use a Lazy<T> type to store them.
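As a sketch of the last two options combined (a static member backed by Lazy<T>), assuming a hypothetical file path and a plain list of text lines as the data. The LoadCount counter exists only to show that the file is read once; none of these names come from the question.

```csharp
using System;
using System.IO;

static class StartupValues
{
    // Hypothetical file location; substitute the file your application reads.
    public static string SourcePath =
        Path.Combine(Path.GetTempPath(), "startup-values.txt");

    // Counts how often the file is actually read (for demonstration only).
    public static int LoadCount;

    // The factory runs on the first access to Cached.Value and never again.
    private static readonly Lazy<string[]> Cached = new Lazy<string[]>(() =>
    {
        LoadCount++;
        return File.ReadAllLines(SourcePath);
    });

    // Any class, including a timer callback in another project that
    // references this one, can read the values through this property.
    public static string[] Values
    {
        get { return Cached.Value; }
    }
}

static class Program
{
    static void Main()
    {
        File.WriteAllLines(StartupValues.SourcePath, new[] { "10", "20" });

        string[] first = StartupValues.Values;   // reads the file
        string[] second = StartupValues.Values;  // reuses the stored array

        Console.WriteLine(StartupValues.LoadCount);            // prints 1
        Console.WriteLine(ReferenceEquals(first, second));     // prints True
        File.Delete(StartupValues.SourcePath);
    }
}
```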

If you need to reference them from another class and hold them in memory, create a public static property somewhere to which you can assign the data.
public static MyDataType Data { get; set; }
... where MyDataType is an object that holds your data. You can then test for null in your timer method to make sure the data has been loaded before continuing.

Webservice Method generic list fill and keep for 1 day

I have a web method inside a web service that calls another web service to get data, fills a generic list and returns it. What I want to do is keep the list in memory, so that the next time the web method is invoked it does not hit the other web service but just returns the list. I have tried, but when I invoke the web method a second time the list count shows as 0; it looks like garbage collection is cleaning everything up. Any suggestions?

Store it in the ASP.NET cache. Setting an absolute expiration at a fixed time of day (midnight, or 7 AM as in the example below) should ensure that you only fetch it once per day (unless it gets tossed from the cache due to space issues).
[WebMethod]
public List<Foo> GetFoos()
{
    var foos = Cache["FooList"] as List<Foo>;
    if (foos == null)
    {
        ... get foos from remote web service ...
        // Expire at 7 AM today, or at 7 AM tomorrow if that time has already passed.
        var expiration = DateTime.Today.AddHours(7);
        if (DateTime.Now >= expiration)
        {
            expiration = expiration.AddDays(1);
        }
        Cache.Insert("FooList", foos, null, expiration, Cache.NoSlidingExpiration);
    }
    return foos;
}
Note: you could also use output caching, but then you're limited to a duration-based expiration. That is, the result will be cached for a duration that starts when the request occurs. It's not clear that's what you want. For example, if the first request occurs at 11pm with a 24-hour duration, you wouldn't check again until 11pm the next day. If you have data changing on a daily basis, you're better off using the ASP.NET cache in conjunction with output caching on a shorter duration, to ensure that you get the latest daily data in a timely fashion.
Updated example based on comments.

It sounds to me like your list might either not be static, or it might be new'd every time within a non-static constructor. There are three possible fixes for this:
1. Make sure that your generic list is a static property which only gets initialised within a static constructor.
2. Given your time requirements, I would also suggest looking into MemoryCache or Cache.
3. Use the WebMethod attribute and set a CacheDuration (e.g. [WebMethod(CacheDuration=86400)]).
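For the MemoryCache suggestion, a minimal sketch might look like the following. It assumes the project references System.Runtime.Caching, uses a list of strings in place of the real item type, and takes the remote call as a delegate (fetchFromRemote is a hypothetical parameter, not part of any web service API).

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Caching;

static class FooCache
{
    private const string Key = "FooList";

    // Returns the cached list, calling fetchFromRemote only on a cache miss.
    public static List<string> GetFoos(Func<List<string>> fetchFromRemote)
    {
        MemoryCache cache = MemoryCache.Default;
        var foos = cache.Get(Key) as List<string>;
        if (foos == null)
        {
            foos = fetchFromRemote();
            // Absolute expiration: the entry is dropped one day from now.
            cache.Set(Key, foos, DateTimeOffset.Now.AddDays(1));
        }
        return foos;
    }
}

static class Program
{
    static void Main()
    {
        int remoteCalls = 0;
        Func<List<string>> fetch = () =>
        {
            remoteCalls++; // stands in for the call to the other web service
            return new List<string> { "a", "b" };
        };

        FooCache.GetFoos(fetch);
        FooCache.GetFoos(fetch);
        Console.WriteLine(remoteCalls); // prints 1
    }
}
```

MemoryCache.Default lives for the lifetime of the application domain, so the list survives between web method invocations as long as the process is not recycled.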

I have not tried this on a web service, but I think output caching would work.
[WebMethod(CacheDuration=86400)]
public string FunctionName(string Name)
{
    ...code...
    return(sb.ToString());
}
Read: How to perform output caching with Web services in Visual C# .NET

C# Collection whose items expire

I am writing a Console Application in C# in which I want to cache certain items for a predefined time (let's say 1 hour). I want items that have been added into this cache to be automatically removed after they expire. Is there a built-in data structure that I can use? Remember this is a Console App not a web app.

Do you actually need them removed from the cache at that time? Or just that future requests to the cache for that item should return null after a given time?
To do the former, you would need some sort of background thread that was periodically purging the cache. This would only be needed if you were worried about memory consumption or something. If you just want the data to expire, that would be easy to do.
It is trivial to create such a class.
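class CachedObject<TValue>
{
    public DateTime Date { get; set; }      // when the item was cached
    public TimeSpan Duration { get; set; }  // how long it stays valid
    public TValue Cached { get; set; }
}
class Cache<TKey, TValue> : Dictionary<TKey, CachedObject<TValue>>
{
    private readonly TimeSpan _duration;
    public Cache(TimeSpan duration) { _duration = duration; }

    public new TValue this[TKey key]
    {
        get
        {
            CachedObject<TValue> val;
            if (!TryGetValue(key, out val))
                return default(TValue);
            // compare dates: if expired, remove from cache and
            // return default (null for reference types)
            if (DateTime.UtcNow - val.Date > val.Duration)
            {
                Remove(key);
                return default(TValue);
            }
            // else return the cached item
            return val.Cached;
        }
        set
        {
            // create new CachedObject, set date and timespan, set value, add to dictionary
            base[key] = new CachedObject<TValue> { Date = DateTime.UtcNow, Duration = _duration, Cached = value };
        }
    }
}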

It's already in the BCL. It's just not where you expect to find it: you can use System.Web.Caching from other kinds of applications too, not only in ASP.NET.
This search on Google links to several resources about this.

I don't know of any objects in the BCL which do this, but I have written similar things before.
You can do this fairly easily by just including a System.Threading.Timer inside of your caching class (no web/winforms dependencies), and storing an expiration (or last used) time on your objects. Just have the timer check every few minutes, and remove the objects you want to expire.
However, be watchful of events on your objects. I had a system like this and was not being very careful to unsubscribe from events on the objects in the cache, which caused a subtle but nasty memory leak over time. This can be very tricky to debug.
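A timer-based cache along those lines could be sketched as below. This is one possible shape, not the code from the system described above: ExpiringCache and its members are hypothetical names, a ConcurrentDictionary is assumed because the timer callback runs on a thread-pool thread, and the sweep takes the current time as a parameter so it can also be called directly.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

sealed class ExpiringCache<TKey, TValue> : IDisposable
{
    private readonly ConcurrentDictionary<TKey, KeyValuePair<TValue, DateTime>> _items =
        new ConcurrentDictionary<TKey, KeyValuePair<TValue, DateTime>>();
    private readonly TimeSpan _lifetime;
    private readonly Timer _timer;

    public ExpiringCache(TimeSpan lifetime, TimeSpan sweepInterval)
    {
        _lifetime = lifetime;
        // The timer periodically removes entries whose expiration time has passed.
        _timer = new Timer(_ => PurgeExpired(DateTime.UtcNow), null, sweepInterval, sweepInterval);
    }

    public int Count { get { return _items.Count; } }

    public void Set(TKey key, TValue value)
    {
        _items[key] = new KeyValuePair<TValue, DateTime>(value, DateTime.UtcNow + _lifetime);
    }

    // Expired entries are treated as missing even if the sweep has not run yet.
    public bool TryGet(TKey key, out TValue value)
    {
        KeyValuePair<TValue, DateTime> entry;
        if (_items.TryGetValue(key, out entry) && entry.Value > DateTime.UtcNow)
        {
            value = entry.Key;
            return true;
        }
        value = default(TValue);
        return false;
    }

    // Removes every entry that expires at or before nowUtc.
    public void PurgeExpired(DateTime nowUtc)
    {
        foreach (var pair in _items)
        {
            if (pair.Value.Value <= nowUtc)
            {
                KeyValuePair<TValue, DateTime> removed;
                _items.TryRemove(pair.Key, out removed);
            }
        }
    }

    public void Dispose()
    {
        _timer.Dispose();
    }
}
```

Usage would be something like `var cache = new ExpiringCache<string, string>(TimeSpan.FromHours(1), TimeSpan.FromMinutes(5));` followed by Set and TryGet calls. If the cached objects hold event subscriptions, the purge step is where you would unsubscribe them.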

Include an ExpirationDate property in the object that you will be caching (probably a wrapper around your real object) and set it to expire in an hour in its constructor. Instead of removing items from the collection, access the collection through a method that filters out the expired items. Or create a custom collection that does this automatically. If you need to actually remove items from the cache, your custom collection could instead purge expired items on every call to one of its members.

Overhead of File/Directory.Exists in getter?

I have a class with several properties that refer to file/directory locations on the local disk. These values can be dynamic, and I want to ensure that any time they are accessed, I verify that the location exists first, without having to include this code in every method that uses the values.
My question is: does putting this check in the getter incur a performance penalty? It would not be called thousands of times in a loop, so that is not a consideration. I just want to make sure I am not doing something that would cause unnecessary bottlenecks.
I know that it is generally not wise to optimize too early, but I would rather have this error checking in place now than have to go back later, remove it from the getter and add it all over the place.
Clarification:
The files/directories pointed to by the properties are going to be used by System.Diagnostics.Process. I won't be reading/writing these files/directories directly; I just want to make sure they exist before I spawn a child process.

Anything that's not a simple lookup or computation should go in a method, not a property. Properties should be conceptually similar to just accessing a field - if there is any additional overhead or chance of failure (and IO - even just checking a file exists - would fail that test on both counts) then properties are not the right choice.
Remember that properties even get called by the debugger when looking at object state.
Your question about what the overhead actually is, and optimising early, becomes irrelevant when looked at from this perspective. Hope this helps.

If you're that worried about performance (and you're right when you say that it's not a good idea to optimize too early), there are ways to mitigate this. If you consider that the expensive operation is the file I/O and you have lots of these going on, you could always look at using something like a Dictionary in your class. Consider this (fairly contrived) sample code:
private Dictionary<string, bool> _directories = new Dictionary<string, bool>();
private void CheckDirectory(string directory, bool create)
{
    // Only hit the file system for directories that have not been checked before.
    if (!_directories.ContainsKey(directory))
    {
        bool exists = Directory.Exists(directory);
        if (create && !exists)
        {
            Directory.CreateDirectory(directory);
        }
        // Add the directory to the dictionary. The value depends on
        // whether the directory previously existed or the method has been told
        // to create it.
        _directories.Add(directory, create || exists);
    }
}
It's a simple matter later on to create those directories that don't exist by iterating over this dictionary.

It is feasible for the path to exist at the point it is checked but to be moved or deleted between that check and the operation on it.
You may already know this and accept the risk, but just so you are aware of it.
If you are going to do it anyway, it doesn't matter whether it's in a property or not, only what granularity of checking you do (once per operation or once per group of operations).
If you use the non-static FileInfo operations, be aware that this object will cache its view of the file system.
This could be a good thing for you, as you can control how often the cache is refreshed via the Refresh() method, or it may lead to bugs in your code.
The usual recommendation to try it first before worrying about performance applies, but you indicate you are aware of this.
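The kind of bug a cached view can cause is easy to reproduce. This is a small sketch assuming a throwaway file in the temp directory (a hypothetical path); it deletes the file after the first Exists check and compares what a reused FileInfo reports with what the static File.Exists reports.

```csharp
using System;
using System.IO;

static class ExistsSnapshotDemo
{
    // Returns: Exists before deletion, Exists after deletion (no Refresh),
    // Exists after Refresh, and the static File.Exists result.
    public static bool[] Run()
    {
        // Hypothetical scratch file, used only for this demonstration.
        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        File.WriteAllText(path, "x");

        var info = new FileInfo(path);
        bool beforeDelete = info.Exists;   // first access takes the snapshot

        File.Delete(path);

        bool afterDelete = info.Exists;    // still answered from the snapshot
        info.Refresh();
        bool afterRefresh = info.Exists;   // reflects the current state
        bool staticCheck = File.Exists(path);

        return new[] { beforeDelete, afterDelete, afterRefresh, staticCheck };
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run()));
    }
}
```

With the documented caching behaviour, the second value should still be true even though the file is gone, so a getter that reuses a FileInfo instance would need a Refresh() before each check that has to be current.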

If you are reusing an object, you should consider using the FileInfo class rather than the static File class. The static methods of the File class perform a possibly unnecessary security check each time.
FileInfo - DirectoryInfo - File - Directory
EDIT:
My answer would still apply. To make sure your file exists, you would do something like this in your getter:
if (File.Exists(fName))
    //do stuff
else
    //file doesn't exist
OR
FileInfo fi = new FileInfo(fName);
if (fi.Exists)
    //do stuff
else
    //file doesn't exist
Correct?
What I am saying is that if you are looping through this logic thousands of times, then use a FileInfo instance rather than the static File class, because you will take a performance hit if you use the static File.Exists method.
