I am writing a program which will manipulate certain objects, let's call them "Words". Each word is an instance of one of several classes. There can be lots and lots of these at a time.
Each of these Word objects needs access to a dictionary, stored in an XML file. I don't feel like each Word object should be loading the XML file separately. The Word objects should be able to access some global pool of program data.
What's the best way to deal with this? Should I have a class called ProgramData which contains the XML document and gets passed to every Word object when they are created? This won't cause multiple instances of the XML file to be loaded into memory, will it? Can I do what I want to do without passing ProgramData to every new object?
You should load the XML file into one instance of ProgramData, and then pass that instance into each Word instance (maybe in the constructor). You could also make a static property in the Word class that you set before you start instantiating Words, but be sure to use locking for thread safety.
Your alternative is the Singleton pattern, but trust me, you don't want to go down that road.
Edit: Just to be clearer, this is the first option (the one I'd use):
public class Word
{
    private ProgramData _Data;

    public Word(ProgramData data)
    {
        _Data = data;
    }

    public void MethodThatUsesData()
    {
        // e.g. _Data.TryGetValue(...)
    }
}
// in your main method or initialization routine:
ProgramData data = MethodThatLoadsData();
Word w = new Word(data);
Your ProgramData class can have a static variable for the XmlDocument and thus you won't have to pass the variable through a constructor; your other class can just reference ProgramData.YourVariable.
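A minimal sketch of that approach (assuming you load the file into an XmlDocument; the names here are placeholders):
using System.Xml;

public static class ProgramData
{
    // Assigned once at startup; every Word just reads this.
    public static XmlDocument Dictionary { get; set; }
}

// somewhere in your initialization:
ProgramData.Dictionary = new XmlDocument();
ProgramData.Dictionary.Load("dictionary.xml"); // hypothetical path

// inside Word:
// var node = ProgramData.Dictionary.SelectSingleNode("...");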
To answer your other question: no, this approach will not cause multiple instances of the XML file to be loaded into memory, nor will drharris's approach.
Also, keep in mind that this information applies to all object-oriented programming languages (at least those that I'm aware of), not just C#. If you load the XML document into memory once in a method called myMethod(), and myMethod() is only called once... guess what, that XML document will only get loaded into memory once. Only things that you coded to happen, or that another dev coded (and that you called either directly or indirectly), will happen. There is no magic.
Sometimes, when I save to XML, I end up with a completely empty XML file.
I can't reproduce the issue on demand yet. It is just occasional. Are there steps that one can take to assist the user in this regard?
At the moment I do this:
public bool SavePublisherData()
{
    bool bSaved = false;
    try
    {
        XmlSerializer x = new XmlSerializer(_PublisherData.GetType());
        using (StreamWriter writer = new StreamWriter(_strPathXML))
        {
            _PublisherData.BuildPublisherListFromDictionary();
            x.Serialize(writer, _PublisherData);
            bSaved = true;
        }
    }
    catch
    {
        // intentionally empty; see below for why exceptions aren't surfaced here
    }
    return bSaved;
}
The reason I have not put anything in the catch block is because this code is part of a C# DLL and I am calling it from an MFC project. I have read that you can't (or shouldn't) pass exceptions through from one environment to another. Thus, when an exception happens in my DLL I don't really know how I can sensibly feed that information to the user so they can see it. That is a side issue.
But this is how I save it. So, what steps can one take to try and prevent complete data loss?
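One common mitigation (sketched below, reusing the names from the code above) is to serialize to a temporary file first, so the real file is only overwritten after serialization has fully succeeded:
public bool SavePublisherData()
{
    try
    {
        string tempPath = _strPathXML + ".tmp"; // hypothetical temp name
        XmlSerializer x = new XmlSerializer(_PublisherData.GetType());
        using (StreamWriter writer = new StreamWriter(tempPath))
        {
            _PublisherData.BuildPublisherListFromDictionary();
            x.Serialize(writer, _PublisherData);
        }
        // The original is only touched once the temp file is complete.
        File.Copy(tempPath, _strPathXML, true);
        File.Delete(tempPath);
        return true;
    }
    catch
    {
        return false; // swallowed at the DLL boundary, as noted above
    }
}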
Thank you.
Update
I have looked at the KB article that the link in the comments refers to and it states:
Use the following XmlSerializer class constructors. These class constructors cache the assemblies.
This is also restated in the article indicated in the comments:
What is the solution?
The default constructors XmlSerializer(type) and XmlSerializer(type, defaultNameSpace) cache the dynamic assembly, so if you use those constructors only one copy of the dynamic assembly needs to be created.
Seems pretty smart… why not do this in all constructors? Hmm… interesting idea, wonder why they didn’t think of that one:) Ok, the other constructors are used for special cases, and the assumption would be that you wouldn’t create a ton of the same XmlSerializers using those special cases, which would mean that we would cache a lot of items we later didn’t need and use up a lot of extra space. Sometimes you have to do what is good for the majority of the people.
So what do you do if you need to use one of the other constructors? My suggestion would be to cache the XmlSerializer if you need to use it often. Then it would only be created once.
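For reference, caching a serializer built with one of the non-default constructors might look like this sketch (the type and root name are illustrative, and it assumes System.Xml.Serialization):
// Created once and reused, so the dynamic assembly is generated only once.
private static readonly XmlSerializer _cachedSerializer =
    new XmlSerializer(typeof(PublisherData), new XmlRootAttribute("Publishers"));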
My code uses one of these default constructors as you can see:
new XmlSerializer(_PublisherData.GetType());
So I don't think I need to worry about this XmlSerializerFactory in this instance.
I read values in from a file on startup of my application.
I'd like to use those values during a condition in a timer every xx seconds later in my program's execution.
I don't want to read the file again. How do I go about referencing the values initially read in?
The timer is in a completely different project/class from the initial reading of the file.
Assign them somewhere!
If you're reading from the file and creating the timer condition in the same place, you could even use a local variable to store the values.
If they need to be accessed later but you don't want to recreate them, you could store them in a field in the class where this is happening.
If these values will be used elsewhere in your application, but will remain relevant as long as this class' type is around, you could store them in a static field or property.
If you want them to be loaded on-demand and then saved for subsequent access, you could use a Lazy<T> type to store them, as in the sketch below.
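A minimal sketch of that last option (MyValues and LoadValuesFromFile are stand-ins for your own type and file-reading code):
// The file is read on first access only; later accesses reuse the cached result.
private static readonly Lazy<MyValues> _values =
    new Lazy<MyValues>(() => LoadValuesFromFile(), true); // true = thread-safe

public static MyValues Values
{
    get { return _values.Value; }
}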
If you need to reference them from another class and hold them in memory, create a public static property somewhere to which you can assign the data:
public static MyDataType Data { get; set; }
... where MyDataType is an object that holds your data. You can then test for null in your timer method to make sure the assignment has happened before continuing.
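For illustration, the timer side might then look something like this (SettingsHolder and OnTimerTick are assumed names):
// at startup, after reading the file:
SettingsHolder.Data = ReadValuesFromFile(); // your existing file-reading code

// in the timer callback, in the other project/class:
private void OnTimerTick(object sender, System.Timers.ElapsedEventArgs e)
{
    if (SettingsHolder.Data == null)
        return; // values not loaded yet

    // ... use SettingsHolder.Data in the condition ...
}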
I have an abstract class which runs a fairly computationally intensive series of static functions inside several nested loops.
In a small number of these loops, I need to obtain a list of dates which are stored in a comma-separated string in a .settings file. I then parse them into DateTimes and use them.
The issue is, I'm re-parsing these strings over and over again, and this is using up quite a bit of CPU time (obviously). Profiling shows that 20% of the core algorithm's time is wasted on these operations. If I could somehow cache these in a place accessible by the static functions then it would save me a lot of processing time.
The simplest option would be to parse the list of DateTimes at the very start of computation, and then pass that list to each of the sub-functions. This would certainly cut down on CPU work, but it would mean that the sub-functions would need to accept this list when called outside the core algorithm. It doesn't make intuitive sense why a list of DateTimes would be needed when calling one of the parent static functions.
Another way to fix it would be to make the class non-abstract, make the functions non-static, and store the list of dates, etc., in instance variables for each of the functions to access. The reason I wanted it abstract with static functions is that I didn't want to have to instantiate the class every time I wanted to manually call one of the sub-functions.
Ideally, what I would like to do is to parse the list once and store it somewhere in memory. Then, when I do a subsequent iteration, I can somehow check to see if it's not null, then I can use it. If it's null (probably because I'm in the first iteration), then I know I need to parse it.
I was thinking I could have a .settings file which has the list in it. I would never save the settings file to disk, but it would basically allow for storage between static calls.
I know this is all very messy - I'm just trying to avoid re-writing a thousand lines of static code if feasible.
If you all think it's a terrible idea then I will raise my white flag and re-write it all.
If the dates are read-only then it's pretty straightforward - declare a static property on a class which loads the values if they don't exist and stores them in a static variable - something like this:
public class DateList
{
    private static readonly object _lock = new object();
    private static List<DateTime> mydates = null;

    public static List<DateTime> Current
    {
        get
        {
            if (mydates == null)
            {
                // Lock on a private object rather than typeof(DateList),
                // so no outside code can take the same lock.
                lock (_lock)
                {
                    if (mydates == null)
                    {
                        mydates = LoadDates(); // your parsing of the settings string
                    }
                }
            }
            return mydates;
        }
    }

    // thanks to Porges - if you're using .NET 4 then this is cleaner and achieves the same result:
    private static Lazy<List<DateTime>> mydates2 =
        new Lazy<List<DateTime>>(() => LoadDates(), true);

    public static List<DateTime> Current2
    {
        get { return mydates2.Value; }
    }
}
This example would then be accessed using:
var dates = DateList.Current;
Be careful if the dates are not read-only - then you'll have to consider things in more detail.
Another way to fix it would be to make the class non-abstract, make the functions non-static, and store the list of dates, etc., in instance variables for each of the functions to access. The reason I wanted it abstract with static functions is that I didn't want to have to instantiate the class every time I wanted to manually call one of the sub-functions.
Do this. Classes exist in order to encapsulate state. If you store the cache somewhere static, you'll only make trouble for yourself if/when you want to add parallelism, or refactor code.
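A rough sketch of that shape (the names are placeholders, and LoadDates stands in for parsing the settings string):
public class CoreAlgorithm
{
    private readonly List<DateTime> _dates;

    public CoreAlgorithm()
    {
        // Parsed once per instance; every method below shares the result.
        _dates = LoadDates();
    }

    public void SubFunction()
    {
        // ... uses _dates without re-parsing ...
    }

    private static List<DateTime> LoadDates()
    {
        // stand-in for parsing the comma-separated settings string
        return new List<DateTime>();
    }
}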
I'm not sure what you mean by the second part ("manually call"). Do you mean while testing?
I'm making a jquery clone for C#. Right now I've got it set up so that every method is an extension method on IEnumerable<HtmlNode> so it works well with existing projects that are already using HtmlAgilityPack. I thought I could get away without preserving state... however, then I noticed jQuery has two methods .andSelf and .end which "pop" the most recently matched elements off an internal stack. I can mimic this functionality if I change my class so that it always operates on SharpQuery objects instead of enumerables, but there's still a problem.
With JavaScript, you're given the Html document automatically, but when working in C# you have to explicitly load it, and you could use more than one document if you wanted. It appears that when you call $('xxx') you're essentially creating a new jQuery object and starting fresh with an empty stack. In C#, you wouldn't want to do that, because you don't want to reload/refetch the document from the web. So instead, you load it once, either into a SharpQuery object or into a list of HtmlNodes (you just need the DocumentNode to get started).
In the jQuery docs, they give this example
$('ul.first').find('.foo')
.css('background-color', 'red')
.end().find('.bar')
.css('background-color', 'green')
.end();
I don't have an initializer method because I can't overload the () operator, so you just start with sq.Find() instead, which operates on the root of the document, essentially doing the same thing. But then people are going to try and write sq.Find() on one line, and then sq.Find() somewhere down the road, and (rightfully) expect it to operate on the root of the document again... but if I'm maintaining state, then you've just modified the context after the first call.
So... how should I design my API? Do I add another Init method that all queries should begin with that resets the stack (but then how do I force them to start with that?), or add a Reset() that they have to call at the end of their line? Do I overload the [] instead and tell them to start with that? Do I say "forget it, no one uses those state-preserved functions anyway?"
Basically, how would you like that jQuery example to be written in C#?
sq["ul.first"].Find(".foo") ...
Downfalls: Abuses the [] property.
sq.Init("ul.first").Find(".foo") ...
Downfalls: Nothing really forces the programmer to start with Init, unless I add some weird "initialized" mechanism; user might try starting with .Find and not get the result he was expecting. Also, Init and Find are pretty much identical anyway, except the former resets the stack too.
sq.Find("ul.first").Find(".foo") ... .ClearStack()
Downfalls: programmer may forget to clear the stack.
Can't do it.
end() not implemented.
Use two different objects.
Perhaps use HtmlDocument as the base that all queries should begin with, and then every method thereafter returns a SharpQuery object that can be chained. That way the HtmlDocument always maintains the initial state, but the SharpQuery objects may have different states. This unfortunately means I have to implement a bunch of stuff twice (once for HtmlDocument, once for the SharpQuery object).
new SharpQuery(sq).Find("ul.first").Find(".foo") ...
The constructor copies a reference to the document, but resets the stack.
I think the major stumbling block you're running into here is that you're trying to get away with just having one SharpQuery object for each document. That's not how jQuery works; in general, jQuery objects are immutable. When you call a method that changes the set of elements (like find or end or add), it doesn't alter the existing object, but returns a new one:
var theBody = $('body');
// $('body')[0] is the <body>
theBody.find('div').text('This is a div');
// $('body')[0] is still the <body>
(see the documentation of end for more info)
SharpQuery should operate the same way. Once you create a SharpQuery object with a document, method calls should return new SharpQuery objects, referencing a different set of elements of the same document. For instance:
var sq = SharpQuery.Load(new Uri("http://api.jquery.com/category/selectors/"));
var header = sq.Find("h1"); // doesn't change sq
var allTheLinks = sq.Find(".title-link"); // all .title-link in the whole document; also doesn't change sq
var someOfTheLinks = header.Find(".title-link"); // just the .title-link in the <h1>; again, doesn't change sq or header
The benefits of this approach are several. Because sq, header, allTheLinks, etc. are all the same class, you only have one implementation of each method. Yet each of these objects references the same document, so you don't have multiple copies of each node, and changes to the nodes are reflected in every SharpQuery object on that document (e.g. after allTheLinks.text("foo"), someOfTheLinks.text() == "foo").
Implementing end and the other stack-based manipulations also becomes easy. As each method creates a new, filtered SharpQuery object from another, it retains a reference to that parent object (allTheLinks to header, header to sq). Then end is as simple as returning a new SharpQuery containing the same elements as the parent, like:
public SharpQuery end()
{
    return new SharpQuery(this.parent.GetAllElements());
}
(or however your syntax shakes out.)
I think this approach will get you the most jQuery-like behavior, with a fairly easy implementation. I'll definitely be keeping an eye on this project; it's a great idea.
I would lean towards a variant on option 2. In jQuery, $() is a function call. C# doesn't have global functions, so a static method call is the closest equivalent. I would use a method that indicates you're creating a wrapper, like:
SharpQuery.Create("ul.first").Find(".foo")
I wouldn't be concerned about shortening SharpQuery to sq, since IntelliSense means users won't have to type the whole thing (and if they have ReSharper they only need to type SQ anyways).
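A sketch of how that factory might fit with the immutable approach above (HtmlDocument/HtmlNode are the HtmlAgilityPack types; passing the document into Create is my assumption, since the question loads it explicitly; requires System.Linq):
public class SharpQuery
{
    private readonly HtmlDocument _document;
    private readonly IEnumerable<HtmlNode> _matched;

    private SharpQuery(HtmlDocument document, IEnumerable<HtmlNode> matched)
    {
        _document = document;
        _matched = matched;
    }

    // Every call starts a fresh query from the document root.
    public static SharpQuery Create(HtmlDocument document, string selector)
    {
        return new SharpQuery(document, FindNodes(document.DocumentNode, selector));
    }

    // Returns a new object and leaves the receiver untouched, jQuery-style.
    public SharpQuery Find(string selector)
    {
        return new SharpQuery(_document, _matched.SelectMany(n => FindNodes(n, selector)).ToList());
    }

    private static IEnumerable<HtmlNode> FindNodes(HtmlNode root, string selector)
    {
        // stand-in for whatever selector engine SharpQuery uses
        yield break;
    }
}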
I have a class that has several properties that refer to file/directory locations on the local disk. These values can be dynamic, and I want to ensure that any time they are accessed, I verify that they exist first, without having to include this code in every method that uses the values.
My question is: does putting this in the getter incur a performance penalty? It would not be called thousands of times in a loop, so that is not a consideration. I just want to make sure I am not doing something that would cause unnecessary bottlenecks.
I know that generally it is not wise to optimize too early, but I would rather have this error checking in place now than have to go back later, remove it from the getter, and add it all over the place.
Clarification:
The files/directories being pointed to by the properties are going to be used by System.Diagnostics.Process. I won't be reading/writing to these files/directories directly; I just want to make sure they exist before I spawn a child process.
Anything that's not a simple lookup or computation should go in a method, not a property. Properties should be conceptually similar to just accessing a field - if there is any additional overhead or chance of failure (and IO - even just checking a file exists - would fail that test on both counts) then properties are not the right choice.
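For example, something in this direction (a sketch; the names are invented):
private string _workingDirectory;

// A method signals "this does real work and can fail";
// a property getter suggests a cheap field read.
public string GetVerifiedWorkingDirectory()
{
    if (!Directory.Exists(_workingDirectory))
        throw new DirectoryNotFoundException(
            "Directory not found: " + _workingDirectory);
    return _workingDirectory;
}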
Remember that properties even get called by the debugger when looking at object state.
Your question about what the overhead actually is, and optimising early, becomes irrelevant when looked at from this perspective. Hope this helps.
If you're that worried about performance (and you're right when you say that it's not a good idea to optimize too early), there are ways to mitigate this. If you consider that the expensive operation is the File I/O and you have lots of these going on, you could always look at using something like a Dictionary in your class. Consider this (fairly contrived) sample code:
private Dictionary<string, bool> _directories = new Dictionary<string, bool>();

private void CheckDirectory(string directory, bool create)
{
    // Only hit the file system the first time we see this directory.
    if (!_directories.ContainsKey(directory))
    {
        bool exists = Directory.Exists(directory);
        if (create && !exists)
        {
            Directory.CreateDirectory(directory);
        }
        // Add the directory to the dictionary. The value depends on
        // whether the directory previously existed or the method has been told
        // to create it.
        _directories.Add(directory, create || exists);
    }
}
It's a simple matter later on to add those directories that don't exist by iterating over this dictionary, as sketched below.
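Something along these lines (a sketch; entries whose value is false were seen but never created; requires System.Linq):
// Create everything that was recorded as missing.
foreach (var entry in _directories.Where(e => !e.Value).ToList())
{
    Directory.CreateDirectory(entry.Key);
    _directories[entry.Key] = true;
}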
It is feasible for the path to exist at the point it is checked, but be moved/deleted between that check and the operation on it.
You may already know this and accept the risk, but just so you are aware of it.
If you are going to do it anyway, it doesn't matter whether it's in a property or not, just what granularity of checking you do (once per operation or once per group of operations).
If you use the non-static FileInfo operations, be aware that this object caches its view of the file system.
This could be a good thing for you, since you can control how often the cache is refreshed via the Refresh() method, but it may also lead to subtle bugs in your code.
The usual "try it first before worrying about performance" recommendation applies, but you indicate you are aware of this.
If you are reusing an object, you should consider using the FileInfo class rather than the static File class. The static methods of the File class do a possibly unnecessary security check each time.
FileInfo - DirectoryInfo - File - Directory
EDIT:
My answer would still apply. To make sure your file exists, you would do something like this in your getter:
if (File.Exists(path))
{
    // do stuff
}
else
{
    // file doesn't exist
}
OR
FileInfo fi = new FileInfo(fName);
if (fi.Exists)
{
    // do stuff
}
else
{
    // file doesn't exist
}
Correct?
What I am saying is that if you are looping through this logic thousands of times, then use the FileInfo instance rather than the static File class, because you will take a performance hit from the static File.Exists method.