I have an abstract class which runs a fairly computationally intensive series of static functions inside several nested loops.
In a small number of these loops, I need to obtain a list of dates which are stored in a comma-separated string in a .settings file. I then parse them into DateTimes and use them.
The issue is, I'm re-parsing these strings over and over again, and this is using up quite a bit of CPU time (obviously). Profiling shows that 20% of the core algorithm is wasted on these operations. If I could somehow cache these in a place accessible by the static functions then it would save me a lot of processing time.
The simplest option would be to parse the list of DateTimes at the very start of computation, and then pass that list to each of the sub-functions. This would certainly cut down on CPU work, but it would mean that the sub-functions would need to accept this list when called outside the core algorithm. It doesn't make intuitive sense why a list of DateTimes would be needed when calling one of the parent static functions.
Another thing to fix it would be to make the class not abstract, and the functions non-static, and store the list of dates, etc, in variables for each of the functions to access. The reason I wanted to have it abstract with static functions is because I didn't want to have to instantiate the class every time I wanted to manually call one of the sub-functions.
Ideally, what I would like to do is to parse the list once and store it somewhere in memory. Then, when I do a subsequent iteration, I can somehow check to see if it's not null, then I can use it. If it's null (probably because I'm in the first iteration), then I know I need to parse it.
I was thinking I could have a .settings file which has the list in it. I would never save the settings file to disk, but it would basically allow for storage between static calls.
I know this is all very messy - I'm just trying to avoid re-writing a thousand lines of static code if feasible.
If you all think it's a terrible idea then I will raise my white flag and re-write it all.
If the dates are read-only then it's pretty straightforward - declare a static property on a class which loads the values if they don't exist and stores them in a static variable - something like this:
public class DateList
{
    // Dedicated lock object; a lock on typeof(DateList) can also be taken by outside code.
    private static readonly object sync = new object();
    private static List<DateTime> mydates = null; // new List<DateTime>(); haha, oops

    // LoadDates() is assumed to read and parse the settings string.
    public static List<DateTime> Current
    {
        get
        {
            if (mydates == null)
            {
                lock (sync)
                {
                    if (mydates == null)
                    {
                        mydates = LoadDates();
                    }
                }
            }
            return mydates;
        }
    }

    // thanks to Porges - if you're using .NET 4 then this is cleaner and achieves the same result:
    private static Lazy<List<DateTime>> mydates2 = new Lazy<List<DateTime>>(() => LoadDates(), true);

    public static List<DateTime> Current2
    {
        get { return mydates2.Value; }
    }
}
this example would then be accessed using:
var dates = DateList.Current;
Be careful if the dates are not read-only - then you'll have to consider things in more detail.
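For the read-only case, a fuller sketch that includes the loader might look like the following. The names `CachedDates`, `RawSetting`, and `ParseDates` are hypothetical, the hard-coded string only stands in for whatever the .settings file holds, and the `yyyy-MM-dd` format is an assumption. `IReadOnlyList<T>` needs .NET 4.5 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class CachedDates
{
    // Assumption: stand-in for the comma-separated value read from the .settings file.
    private const string RawSetting = "2011-01-01,2011-02-15,2011-03-31";

    // Lazy<T> is thread-safe by default, so ParseDates runs at most once.
    private static readonly Lazy<IReadOnlyList<DateTime>> Dates =
        new Lazy<IReadOnlyList<DateTime>>(() => ParseDates(RawSetting));

    // Exposed as IReadOnlyList so callers are not handed the mutable List<T> API.
    public static IReadOnlyList<DateTime> Current
    {
        get { return Dates.Value; }
    }

    // Hypothetical parser; assumes every entry is in yyyy-MM-dd form.
    public static IReadOnlyList<DateTime> ParseDates(string raw)
    {
        return raw.Split(',')
                  .Select(s => DateTime.ParseExact(s.Trim(), "yyyy-MM-dd", CultureInfo.InvariantCulture))
                  .ToList()
                  .AsReadOnly();
    }
}
```

Any static function can then read `CachedDates.Current` inside its loops without re-parsing and without taking the list as a parameter.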
Another thing to fix it would be to make the class not abstract, and the functions non-static, and store the list of dates, etc, in variables for each of the functions to access. The reason I wanted to have it abstract with static functions is because I didn't want to have to instantiate the class every time I wanted to manually call one of the sub-functions.
Do this. Classes exist in order to encapsulate state. If you store the cache somewhere static, you'll only make trouble for yourself if/when you want to add parallelism, or refactor code.
I'm not sure what you mean by the second part ("manually call"). Do you mean while testing?
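A rough sketch of the instance-based version, with the dates held as state, could look like this. `DateCalculator` and `CountOnOrAfter` are made-up names for illustration; the real sub-functions would take their place.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical name; the dates are parsed once by the caller and held as instance state.
public sealed class DateCalculator
{
    private readonly IReadOnlyList<DateTime> _dates;

    public DateCalculator(IEnumerable<DateTime> dates)
    {
        // Copy the input so later changes to the caller's collection do not leak in.
        _dates = dates.ToList();
    }

    // Example sub-function: it reads the stored dates without taking them as a parameter.
    public int CountOnOrAfter(DateTime cutoff)
    {
        return _dates.Count(d => d >= cutoff);
    }
}
```

The cost is one `new DateCalculator(dates)` per computation, after which every method call sees the same parsed list.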
I have a static class with properties to store user's inputs:
public static class UserData
{
public static double UserInput1 { get; set; }
}
And I have nested methods that need the user's inputs
public static double Foo()
{
[...]
var input1 = UserData.UserInput1;
var bar = Bar();
[...]
}
private static double Bar()
{
var input1 = UserData.UserInput1;
[...]
}
The positive thing is that I do not have to pass all user inputs to Foo(), then to Bar() (and to further nested methods within Bar()).
The negative thing is that I have to get UserData.UserInput1 and other user inputs very often. I could change the code to get the user inputs only once:
public static double Foo()
{
[...]
var input1 = UserData.UserInput1;
var bar = Bar(input1);
[...]
}
private static double Bar(double input1)
{
[...]
}
Which one is faster?
The second one is faster than the first, because it avoids reading the static property from UserData on every call.
Reading a static member is not free: it is a load from shared memory rather than a value that may already be sitting in a register. Passing the input as a parameter avoids that load, so performance is slightly better.
Both options are fine, though. Unless you are working on a genuinely performance-critical path, focus on readability and maintainability rather than on this difference.
Which one is faster?
Using static mutable state in this way will be way slower in the long run. Because you will spend a bunch of time trying to find and fix bugs. This time could be better spent doing things that will actually help performance, like profiling and optimizing code.
Try to make methods that compute anything take the required input as parameters. Try to make input fields properties of the associated UI class. This should help keep the code simple and understandable.
Accessing a static property will be translated to an indirect memory access. Passing a parameter to a method might be free if the parameter is already in a register, or might involve a bit more work if it needs to be loaded, moved, or passed on the stack. But we are talking about single-digit cycles here. Optimization at this level should only be done in super tight loops that run many millions of times each second, and then you should typically ensure that all methods can be inlined, sidestepping the problem.
If you're worried about such micro-optimizations (which you generally wouldn't need), consider using inlining.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
https://learn.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.methodimploptions?view=net-7.0
PS: using one over the other, or using AggressiveInlining, will not save you anywhere close to the 1ms you are hoping for, under non-extreme/farfetched scenarios.
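For illustration only, this is roughly where the attribute goes. The `Calc` class and the arithmetic are placeholders, not the original `Foo`/`Bar` bodies.

```csharp
using System.Runtime.CompilerServices;

public static class Calc
{
    // The input is passed in explicitly instead of being read from static state.
    public static double Foo(double input1)
    {
        return Bar(input1) + 1.0;
    }

    // The attribute is only a hint; the JIT may still decline to inline.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static double Bar(double input1)
    {
        return input1 * 2.0;
    }
}
```

Whether this changes anything measurable can only be settled by profiling the real code.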
I am tasked with writing a system to process result files created by a different process (which I have no control over), and I am trying to modify my code to make use of Parallel.ForEach. The code works fine with a plain foreach, but I have some concerns about thread safety when using the parallel version. The basic question I need answered is: "Is the way I am doing this going to guarantee thread safety?" Or is this going to cause everything to go sideways on me?
I have tried to make sure all calls are to instances and have removed every static member except the initial static void Main. My current understanding is that this will do a lot towards assuring thread safety.
I have basically the following, edited for brevity
static void Main(string[] args)
{
MyProcess process = new MyProcess();
process.DoThings();
}
And then in the actual process to do stuff I have
public class MyProcess
{
    public void DoThings()
    {
        //Get some list of things
        List<Thing> things = getThings();
        Parallel.ForEach(things, item =>
        {
            //based on some criteria, take actions from MyActionClass
            MyActionClass myAct = new MyActionClass(item);
            string tempstring = myAct.DoOneThing();
            if (somecondition)
            {
                myAct.DoOtherThing();
            }
            // ...other similar calls to myAct below here
        });
    }
}
And over in the MyActionClass I have something like the following:
public class MyActionClass
{
private Thing _thing;
public MyActionClass(Thing item)
{
_thing = item;
}
public string DoOneThing()
{
return _thing.GetSubThings().FirstOrDefault();
}
public void DoOtherThing()
{
_thing.property1 = "Somenewvalue";
}
}
If I can explain this any better I'll try, but I think that's the basics of my needs
EDIT:
Something else I just noticed. If I change the value of a property of the item I'm working with while inside the Parallel.ForEach (in this case, a string value that gets written to a database inside the loop), will that have any effect on the rest of the loop iterations or just the one I'm on? Would it be better to create a new instance of Thing inside the loop to store the item I'm working with in this case?
As far as I can see, there is no shared mutable state between the actions in the Parallel.ForEach, so it should be thread-safe: each object is touched by at most one thread at a time.
But that only covers what is visible here. It doesn't mean that everything in your actual code is as clean as it looks in this excerpt.
Nor does it mean that you or a coworker will never make a change that turns some state both shared and mutable (in Thing, for example). Then you start getting hard-to-reproduce crashes at best, or plain wrong behaviour at worst, which can go undetected for a long time.
So, perhaps you should try to go fully immutable near threading code?
Perhaps.
Immutability is good, but it is not a silver bullet. It is not always easy to use and implement, and not every task can be reasonably expressed through immutable objects. Even then, an accidental "make it shared and mutable" change can still happen, though it is much less likely.
It should at least be considered as a possible option/alternative.
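As a rough sketch of what "going immutable" near the parallel loop could mean, the following uses a hypothetical `ImmutableThing` in place of `Thing`. It is an assumption about the shape of the data, not the asker's real class, and the get-only auto-properties need C# 6 or later.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical immutable stand-in for Thing: no setters, so no thread can change it after construction.
public sealed class ImmutableThing
{
    public string Property1 { get; }

    public ImmutableThing(string property1)
    {
        Property1 = property1;
    }

    // Returns a modified copy rather than mutating this instance.
    public ImmutableThing WithProperty1(string value)
    {
        return new ImmutableThing(value);
    }
}

public static class ImmutableProcessing
{
    public static List<ImmutableThing> Process(IEnumerable<ImmutableThing> things)
    {
        // ConcurrentBag is safe to add to from several threads; it does not preserve order.
        var results = new ConcurrentBag<ImmutableThing>();
        Parallel.ForEach(things, item => results.Add(item.WithProperty1("Somenewvalue")));
        return new List<ImmutableThing>(results);
    }
}
```

The inputs are never modified, so an accidental second reference to one of them elsewhere in the program cannot observe a half-finished update.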
About the EDIT
"If I change the value of a property of the item I'm working with while inside the Parallel.ForEach (in this case, a string value that gets written to a database inside the loop), will that have any effect on the rest of the loop iterations or just the one I'm on?"
If you change a property and that object is not used anywhere else, and it doesn't rely on some global mutable state (for example, some kind of public static Int32 ChangesCount that increments with each state change), then you should be safe.
"a string value that gets written to a database inside the loop" - depending on the data access technology and how you use it, you may be in trouble, because most of them are not designed for a multithreaded environment; EF DbContext is one example. And do not forget that dealing with concurrent access in the database itself is not always easy, though that is a bit away from our original theme.
"Would it be better to create a new instance of Thing inside the loop to store the item I'm working with in this case" - if there is no risk of external concurrent changes, then it is just unnecessary work. And if there is a chance of other threads (not Parallel.For) making changes to the objects being persisted, then you already have bigger problems than Parallel.For.
Objects should always have an observable, consistent state (unlike when half of the properties are set by one thread and half by another while you try to persist that who-knows-what). If they are used by many threads, they should already be thread-safe: there should be no way to put them into an inconsistent state.
And if they are to be persisted by external code, such objects should probably provide one of the following:
Either a SyncRoot property to synchronize the property-reading code.
Or some current-state snapshot DTO that is created internally by a thread-safe method like ThingSnapshot Thing.GetCurrentData() { lock() {} }.
Or something more exotic.
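The snapshot option could be sketched as follows. `SharedThing`, `Update`, and the `Version` counter are invented for the example; only the `GetCurrentData` idea comes from the text above.

```csharp
// Immutable copy of the state at one moment; safe to persist from any thread.
public sealed class ThingSnapshot
{
    public string Property1 { get; }
    public int Version { get; }

    public ThingSnapshot(string property1, int version)
    {
        Property1 = property1;
        Version = version;
    }
}

// Hypothetical thread-safe object: all reads and writes of its state go through one lock.
public sealed class SharedThing
{
    private readonly object _sync = new object();
    private string _property1 = "";
    private int _version;

    public void Update(string value)
    {
        lock (_sync)
        {
            // Both fields change together, so no reader can see one without the other.
            _property1 = value;
            _version++;
        }
    }

    public ThingSnapshot GetCurrentData()
    {
        lock (_sync)
        {
            return new ThingSnapshot(_property1, _version);
        }
    }
}
```

Persistence code then works on the snapshot it was handed, and later updates to the live object do not affect it.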
When they say static classes should not have state/side effects, does that mean:
static void F(Human h)
{
h.Name = "asd";
}
is violating it?
Edit:
I have a private variable now called p, which is an integer. It's never read anywhere in the program, so it can't affect program flow.
Is this violating "no side effects"?
static int p; // must be static to be assigned from a static method
static void F(Human h)
{
    p = 123;
    h.Name = "asd";
}
The input and output are still always the same in this case.
When you say "they", who are you referring to?
Anyways, moving on. A method such as what you presented is completely fine - if that's what you want it to do, then OK. No worries.
Similarly, it is completely valid for a static class to have some static state. Again, it could be that you would need that at some point.
The real thing to watch out for is something like
static class A
{
private static int x = InitX();
static A()
{
Console.WriteLine("A()");
}
private static int InitX()
{
Console.Out.WriteLine("InitX()");
return 0;
}
...
}
If you use something along these lines, then you could easily be confused about when the static constructor is called and when InitX() is called. If you had some side effects / state changing that occurs like in this example, then that would be bad practice.
But as far as your actual question goes, those kinds of state changes and side effects are fine.
Edit
Looking at your second example, and taking the rule precisely as it is stated, then, yes, you are in violation of it.
But...
Don't let that rule necessarily stop you from things like this. It can be very useful in some cases, e.g. when a method does intensive calculation, memoization is an easy way to reduce performance cost. While memoization technically has state and side effects, the output is always the same for every input, which is the really important part.
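A minimal memoization sketch, assuming a deterministic and reasonably expensive function; `Memo` and `SumOfSquares` are illustrative names only.

```csharp
using System.Collections.Concurrent;

public static class Memo
{
    // Hidden static state: results already computed, keyed by input.
    private static readonly ConcurrentDictionary<int, long> Cache =
        new ConcurrentDictionary<int, long>();

    // Same input always gives the same output; the cache only changes how fast it arrives.
    public static long SumOfSquares(int n)
    {
        return Cache.GetOrAdd(n, key => Compute(key));
    }

    // Stand-in for an expensive calculation.
    private static long Compute(int n)
    {
        long total = 0;
        for (int i = 1; i <= n; i++)
        {
            total += (long)i * i;
        }
        return total;
    }
}
```

`ConcurrentDictionary` keeps the cache safe under concurrent callers, although `Compute` may occasionally run more than once for the same key when two threads race.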
Side effects of a static member mean that it changes the value of some other members in its container class. The static member in your case does not affect other members of its class, so it is not violating the sentence you mentioned.
EDIT
In the second example, which you added by editing your question, you are violating it.
It is perfectly acceptable for methods of a static class to change the state of objects that are passed to them. Indeed, that is the primary use for non-function static methods (since a non-function method which doesn't change the state of something would be pretty useless).
The pattern to be avoided is having a static class where methods have side-effects that are not limited to the passed-in objects or objects referenced by them. Suppose, for example, one had an embroidery-plotting class which had functions to select an embroidery module, and to scale, translate, or rotate future graphic operations. If multiple routines expect to do some drawing, it could be difficult to prevent device-selections or transformations done by one routine from affecting other routines. There are two common ways to resolve this problem:
Have all the static graphic routines accept a parameter which will hold a handle to the current device and world transform.
Have a non-static class which holds a device handle and world transform, and have it expose a full set of graphic methods.
In many cases, the best solution will be to have a class which uses the second approach for its external interface, but possibly uses the first method internally. The first approach is somewhat better with regard to the Single Responsibility Principle, but from an external calling standpoint, using class methods is often nicer than using static ones.
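One way to picture the combination is sketched below. `PlotContext`, `PlotRoutines`, and `Plotter` are invented names, and a returned string stands in for real device output.

```csharp
using System.Globalization;

// Holds what would otherwise be global state: the selected device and the current scale.
public sealed class PlotContext
{
    public string Device { get; }
    public double Scale { get; }

    public PlotContext(string device, double scale)
    {
        Device = device;
        Scale = scale;
    }
}

internal static class PlotRoutines
{
    // First approach: a static routine that receives the context explicitly.
    public static string Line(PlotContext ctx, double length)
    {
        double scaled = length * ctx.Scale;
        return ctx.Device + ": line " + scaled.ToString(CultureInfo.InvariantCulture);
    }
}

public sealed class Plotter
{
    private readonly PlotContext _ctx;

    public Plotter(string device, double scale)
    {
        _ctx = new PlotContext(device, scale);
    }

    // Second approach for the external interface, delegating to the static routine.
    public string Line(double length)
    {
        return PlotRoutines.Line(_ctx, length);
    }
}
```

Two `Plotter` instances with different devices or scales can be used side by side without affecting each other, which is the problem the shared static state could not solve.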
I have various classes for handling form data and querying a database. I need some advice on reducing the amount of code I write from site to site.
The following code is for handling a form posted via ajax to the server. It simply instantiates a Form class, validates the data and processes any errors:
public static string submit(Dictionary<string, string> d){
Form f = new Form("myform");
if (!f.validate(d)){
return f.errors.toJSON();
}
//process form...
}
Is there a way to reduce this down to 1 line as follows:
if (!Form.validate("myform", d)){ return Form.errors.toJSON(); }
Let's break that down into two questions.
1) Can I write the existing logic all in one statement?
The local variable has to be declared in its own statement, but the initializer doesn't have to be there. It's perfectly legal to say:
Form f;
if (!(f=new Form("myform")).validate(d))return f.errors.toJSON();
Why you would want to is beyond me; doing so is ugly, hard to debug, hard to understand, and hard to maintain. But it's perfectly legal.
2) Can I make this instance method into a static method?
Probably not directly. Suppose you had two callers validating stuff on two different threads, both calling the static Form.Validate method, and both producing errors. Now you have a race. One of them is going to win and fill in Form.Errors. And now you have two threads reporting the same set of errors, but the errors are wrong for one of them.
The better way to make this into a static method is to make the whole thing into a static method that has the desired semantics, as in plinth's answer.
Errors errors = Validator.Validate(d);
if (errors != null) return errors.toJSON();
Now the code is very clear, and the implementation of Validate is straightforward. Create a form, call the validator, either return null or the errors.
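A sketch of such a `Validate` method follows. The `Form` and `Errors` classes here are minimal hypothetical stand-ins for the asker's real types, and the "email is required" rule is made up so the example has something to check.

```csharp
using System.Collections.Generic;

// Minimal hypothetical stand-in for the question's error collection.
public sealed class Errors
{
    private readonly List<string> _messages;

    public Errors(List<string> messages)
    {
        _messages = messages;
    }

    public string toJSON()
    {
        // Naive JSON for illustration; assumes messages need no escaping.
        return "[\"" + string.Join("\",\"", _messages) + "\"]";
    }
}

// Minimal hypothetical stand-in for the question's Form class.
public sealed class Form
{
    public Errors errors { get; private set; }

    public Form(string name) { }

    public bool validate(Dictionary<string, string> d)
    {
        var messages = new List<string>();
        string email;
        // Assumed rule for the sketch: a non-empty "email" field is required.
        if (!d.TryGetValue("email", out email) || string.IsNullOrEmpty(email))
        {
            messages.Add("email is required");
        }
        errors = messages.Count == 0 ? null : new Errors(messages);
        return errors == null;
    }
}

public static class Validator
{
    // Each call creates its own Form, so concurrent callers never share error state.
    public static Errors Validate(Dictionary<string, string> d)
    {
        var f = new Form("myform");
        return f.validate(d) ? null : f.errors;
    }
}
```

Because the errors travel in the return value rather than in a static property, the race described above cannot occur.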
I would suggest that you don't need advice on reducing the amount of code you write. Rather, get advice on how to make the code read more like the meaning it intends to represent. Sometimes that means writing slightly more code, but that code is clear and easy to understand.
I would move all common validation logic to a superclass.
I think the main problem with your code is not that it is long, but that you're repeating it in many places. Even if you managed to make it a one-liner, it would not be DRY.
Take a look at the Template Method pattern, it might help here (The abstract class with the validation would be the Template and your specific 'actions' would be the subclasses).
Of course you could write this:
public static string FormValidate(Dictionary<string, string> d)
{
Form f = new Form("myform");
if (!f.validate(d))
return f.errors.toJSON();
return null;
}
then your submit can be:
public static string submit(Dictionary<string, string> d)
{
    // A declaration cannot appear inside the condition, so declare errs first.
    string errs = FormValidate(d);
    if (errs != null) { return errs; }
    // process form
}
That cuts down your code and doesn't hurt readability much at all.
If you really, really wanted to, you could store the error text in a thread-local property.
Does C# have a "ThreadLocal" analog (for data members) to the "ThreadStatic" attribute?
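A thread-local holder could be sketched with `ThreadLocal<T>` (available from .NET 4). `FormErrors` and `Last` are hypothetical names.

```csharp
using System.Threading;

public static class FormErrors
{
    // Each thread sees its own value, so two validating threads cannot overwrite each other.
    private static readonly ThreadLocal<string> LastError =
        new ThreadLocal<string>(() => null);

    public static string Last
    {
        get { return LastError.Value; }
        set { LastError.Value = value; }
    }
}
```

A static field marked `[ThreadStatic]` is the older alternative; it has no initializer support per thread, which is one reason `ThreadLocal<T>` is often preferred.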
I've written a helper class that takes a string in the constructor and provides a lot of Get properties to return various aspects of the string. Currently the only way to set the line is through the constructor and once it is set it cannot be changed. Since this class only has one internal variable (the string) I was wondering if I should keep it this way or should I allow the string to be set as well?
Some example code may help explain why I'm asking:
StreamReader stream = new StreamReader("ScannedFile.dat");
ScannerLine line = null;
int responses = 0;
while (!stream.EndOfStream)
{
line = new ScannerLine(stream.ReadLine());
if (line.IsValid && !line.IsKey && line.HasResponses)
responses++;
}
Above is a quick example of counting the number of valid responses in a given scanned file. Would it be more advantageous to code it like this instead?
StreamReader stream = new StreamReader("ScannedFile.dat");
ScannerLine line = new ScannerLine();
int responses = 0;
while (!stream.EndOfStream)
{
line.RawLine = stream.ReadLine();
if (line.IsValid && !line.IsKey && line.HasResponses)
responses++;
}
This code is used in the back end of an ASP.NET web application and needs to be somewhat responsive. I am aware that this may be a case of premature optimization, but I'm coding this for responsiveness on the client side and for maintainability.
Thanks!
EDIT - I decided to include the constructor of the class as well (Yes, this is what it really is.) :
public class ScannerLine
{
private string line;
public ScannerLine(string line)
{
this.line = line;
}
/// <summary>Gets the date the exam was scanned.</summary>
public DateTime ScanDate
{
get
{
DateTime test = DateTime.MinValue;
DateTime.TryParseExact(line.Substring(12, 6).Trim(), "MMddyy", CultureInfo.InvariantCulture, DateTimeStyles.None, out test);
return test;
}
}
/// <summary>Gets a value indicating whether to use raw scoring.</summary>
public bool UseRaw { get { return (line.Substring(112, 1) == "R" ? true : false); } }
/// <summary>Gets the raw points per question.</summary>
public float RawPoints
{
get
{
float test = float.MinValue;
float.TryParse(line.Substring(113, 4).Insert(2, "."), out test);
return test;
}
}
}
EDIT 2 - I included some sample properties of the class to help clarify. As you can see, the class takes a fixed string from a scanner and simply makes it easier to break the line apart into more useful chunks. The file is a line-delimited file from a Scantron machine, and the only way to parse it is with a bunch of string.Substring calls and conversions.
I would definitely stick with the immutable version if you really need the class at all. Immutability makes it easier to reason about your code - if you store a reference to a ScannerLine, it's useful to know that it's not going to change. The performance difference is almost certain to be insignificant - the IO involved in reading the line is likely to be more significant than creating a new object. If you're really concerned about performance, you should benchmark/profile the code before you make a design decision based on those performance worries.
However, if your state is just a string, are you really providing much benefit over just storing the strings directly and having appropriate methods to analyse them later? Does ScannerLine analyse the string and cache that analysis, or is it really just a bunch of parsing methods?
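If the class did analyse the string once and cache the result, an immutable version might look roughly like this. `ParsedScannerLine` is a hypothetical name, and only two of the question's properties are shown, using the same offsets.

```csharp
using System;
using System.Globalization;

public sealed class ParsedScannerLine
{
    // Parsed once in the constructor; later reads are plain property gets.
    public DateTime ScanDate { get; }
    public bool UseRaw { get; }

    public ParsedScannerLine(string line)
    {
        // Offsets copied from the question's ScanDate and UseRaw properties.
        DateTime date;
        DateTime.TryParseExact(line.Substring(12, 6).Trim(), "MMddyy",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out date);
        ScanDate = date; // stays DateTime.MinValue when parsing fails, as in the original
        UseRaw = line.Substring(112, 1) == "R";
    }
}
```

This only pays off when each property is read more than once per line; for a single pass like the counting loop in the question, parsing on demand does the same amount of work.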
Your first approach is clearer. Performance-wise you can gain something, but I don't think it's worth it.
I would go with the second option. It's more efficient, and they're both equally easy to understand IMO. Plus, you probably have no way of knowing how many times those statements in the while loop are going to be called. So who knows? It could be a .01% performance gain, or a 50% performance gain (not likely, but maybe)!
Immutable classes have a lot of advantages. It makes sense for a simple value class like this to be immutable. The object creation time for classes is small for modern VMs. The way you have it is just fine.
I'd actually ditch the "instance" nature of the class entirely, and use it as a static class, not an instance as you are right now. Every property is entirely independent from each other, EXCEPT for the string used. If these properties were related to each other, and/or there were other "hidden" variables that were set up every time that the string was assigned (pre-processing the properties for example), then there'd be reasons to do it one way or the other with re-assignment, but from what you're doing there, I'd change it to be 100% static methods of the class.
If you insist on having the class be an instance, then for pure performance reasons I'd allow re-assignment of the string, as then the CLR isn't creating and destroying instances of the same class continually (except for the string itself obviously).
At the end of the day, IMO this is something you can really do any way you want since there are no other class instance variables. There may be style reasons to do one or the other, but it'd be hard to be "wrong" when solving that problem. If there were other variables in the class that were set upon construction, then this'd be a whole different issue, but right now, code for what you see as the most clear.
I'd go with your first option. There's no reason for the class to be mutable in your example. Keep it simple unless you actually have a need to make it mutable. If you're really that concerned with performance, then run some performance analysis tests and see what the differences are.