Question: Is there a significant performance hit when assigning a variable to itself?
Although the code can be easily changed to avoid assigning a variable to itself or to use flags to detect how something should be assigned .. I like the use of a null coalescing operator to make cleaner initialization code. Consider the following code:
public class MyObject
{
public string MyString { get; set; }
}
public class Worker
{
Dictionary<int, string> m_cache = new Dictionary<string, string>();
public void Assign(ref MyObject obj)
{
string tmp = null;
if (some condition)
{
//can't pass in MyObject.MyString to out param, so we need to use tmp
m_cache.TryGetValue("SomeKey", out tmp); //assumption: key is guaranteed to exist if condition is met
}
else
{
obj.MyString = MagicFinder(); //returns the string we are after
}
//finally, assign MyString property
obj.MyString = tmp ?? obj.MyString;
//is there a significant performance hit here that is worth changing
//it to the following?
if (!string.IsEmptyOrNull(tmp))
{
obj.MyString = tmp;
}
}
}
Your question seems to boil down to this section:
//finally, assign MyString property
obj.MyString = tmp ?? obj.MyString;
To:
//is there a significant performance hit here that is worth changing
//it to the following?
if (!string.IsNullOrEmpty(tmp))
{
obj.MyString = tmp;
}
Realize, however, that the two versions are not the same. In the first case (using the null coalescing operation), obj.MyString can get assigned to string.Empty. The second case prevents this explicitly, since it's checking against string.IsNullOrEmpty, which will precent the assignment in the case of a non-null, empty string.
Since the behavior is different, I would say this is not an optimization, but rather a behavioral change. Whether this is appropriate depends on the behavior this method is specified to demonstrate. How should empty strings be handled? That should be the determining factor here.
That being said, if you were to do an explicit null check, the performance would be near identical - I would suggest going for maintainability and readability here - to me, the first version is simpler, as it's short and clear.
Also, looking at your code, I think it would be far simpler to put the assignment into the statement. This would make the code far more maintainable (and more efficient, to the delight of your boss...):
public void Assign(ref MyObject obj)
{
if (some condition)
{
string tmp = null;
//can't pass in MyObject.MyString to out param, so we need to use tmp
m_cache.TryGetValue("SomeKey", out tmp); //assumption: key is guaranteed to exist if condition is met
obj.MyString = tmp;
}
else
{
obj.MyString = MagicFinder(); //returns the string we are after
}
}
Given the assumption in the code (that the key will always exist in cache if your condition is true), you can even simplify this down to:
public void Assign(MyObject obj) // No reason to pass by ref...
{
obj.MyString = someCondition ? m_cache["SomeKey"] : MagicFinder();
}
You shouldn't worry about that, the impact should be minimal. However, think about how many people are actually aware of the null coalescing operator. Personally, I find the latter more readable at a glance, you don't have to think about it what it means.
The performance worry of checking if a string is null or empty is of no immediate concern, but you could alter your code to be more concise, I think:
string tmp = null;
if (some condition)
{
m_cache.TryGetValue("SomeKey", out tmp);
}
if (string.IsNullOrEmpty(tmp))
{
tmp = MagicFinder();
}
obj.MyString = tmp;
"Premature optimization is the root of all evil."
In other words, I doubt that any theoretical answer will be representative of real world performance, unless you happen to know how it compiles down to bytecode... How about you create a quick and dirty test framework that'll run it 1000 times with random values (null or valid strings) and measure the performance yourself?
EDIT: To add, I'd be rather surprised if the performance was significantly different enough that it would matter in the grand scheme of things. Go for legibility first (although you could argue in favor of both :))
Related
I have a very quick question about the best way to use two variables. Essentially I have an enum and an int, the value for which I want to get within several ifs. Should I declare them outside the if's or inside - consider the following examples:
e.g.a:
public void test() {
EnumName? value = null;
int distance = 0;
if(anotherValue == something) {
distance = 10;
value = getValue(distance);
}
else if(anotherValue == somethingElse) {
distance = 20;
value = getValue(distance);
}
if (value == theValueWeWant){
//Do something
}
OR
e.g.2
public void test() {
if(anotherValue == something) {
int distance = 10;
EnumType value = getValue(distance);
if (value == theValueWeWant){
//Do something
}
else if(anotherValue == somethingElse) {
int distance = 20;
EnumType value = getValue(distance);
if (value == theValueWeWant){
//Do something
}
}
I am just curious which is best? or if there is a better way?
Purely in terms of maintenance, the first code block is better as it does not duplicate code (assuming that "Do something" is the same in both cases).
In terms of performance, the difference should be negligible. The second case does generate twice as many locals in the compiled IL, but the JIT should notice that their usage does not overlap and optimize them away. The second case is also going to cause emission of the same code twice (if (value == theValueWeWant) { ...), but this should also not cause any significant performance penalty.
(Though both aspects of the second example will cause the compiled assembly to be very slightly larger, more IL does not always imply worse performance.)
Both examples do two different things:
Version 1 will run the same code if you get the desired value, where as Version 2 will potentially run different code even if you get the desired value.
There's a lot of possible (micro)optimizations you could do.
For Example, if distance is only ever used in getValue(distance), you could get rid of it entirely:
/*Waring, micro-optimization!*/
public void test() {
EnumType value = getValue((anotherValue == something) ? 10 : (anotherValue == somethingElse) ? 20 : 0);
if (value == theValueWeWant){
//Do something
}
}
If you wish to use those later on, then g for the second method. Those variables will be lost as soon as they're out of scope.
Even if you don't want to use them later, declaring them before the if's is something you should do, to avoid code repetition.
This question is purely a matter of style and hence has no correct answer, only opinions
The C# best practice is generally to declare variables in the scope where they are used. This would point to the second example as the answer. Even though the types and names are the same, they represent different uses and should be constrained to the blocks in which they are created.
I want to see your ideas on a efficient way to check values of a newly serialized object.
Example I have an xml document I have serialized into an object, now I want to do value checks. First and most basic idea I can think of is to use nested if statments and checks each property, could be from one value checking that it has he correct url format, to checking another proprieties value that is a date but making sue it is in the correct range etc.
So my question is how would people do checks on all values in an object? Type checks are not important as this is already taken care of it is more to do with the value itself. It needs to be for quite large objects this is why I did not really want to use nested if statements.
Edit:
I want to achieve complete value validation on all properties in a given object.
I want to check the value it self not that it is null. I want to check the value for specific things if i have, an object with many properties one is of type string and named homepage.
I want to be able to check that the string in the in the correct URL format if not fail. This is just one example in the same object I could check that a date is in a given range if any are not I will return false or some form of fail.
I am using c# .net 4.
Try to use Fluent Validation, it is separation of concerns and configure validation out of your object
public class Validator<T>
{
List<Func<T,bool>> _verifiers = new List<Func<T, bool>>();
public void AddPropertyValidator(Func<T, bool> propValidator)
{
_verifiers.Add(propValidator);
}
public bool IsValid(T objectToValidate)
{
try {
return _verifiers.All(pv => pv(objectToValidate));
} catch(Exception) {
return false;
}
}
}
class ExampleObject {
public string Name {get; set;}
public int BirthYear { get;set;}
}
public static void Main(string[] args)
{
var validator = new Validator<ExampleObject>();
validator.AddPropertyValidator(o => !string.IsNullOrEmpty(o.Name));
validator.AddPropertyValidator(o => o.BirthYear > 1900 && o.BirthYear < DateTime.Now.Year );
validator.AddPropertyValidator(o => o.Name.Length > 3);
validator.Validate(new ExampleObject());
}
I suggest using Automapper with a ValueResolver. You can deserialize the XML into an object in a very elegant way using autommaper and check if the values you get are valid with a ValueResolver.
You can use a base ValueResolver that check for Nulls or invalid casts, and some CustomResolver's that check if the Values you get are correct.
It might not be exacly what you are looking for, but I think it's an elegant way to do it.
Check this out here: http://dannydouglass.com/2010/11/06/simplify-using-xml-data-with-automapper-and-linqtoxml
In functional languages, such as Haskell, your problem could be solved with the Maybe-monad:
The Maybe monad embodies the strategy of combining a chain of
computations that may each return Nothing by ending the chain early if
any step produces Nothing as output. It is useful when a computation
entails a sequence of steps that depend on one another, and in which
some steps may fail to return a value.
Replace Nothing with null, and the same thing applies for C#.
There are several ways to try and solve the problem, none of them are particularly pretty. If you want a runtime-validation that something is not null, you could use an AOP framework to inject null-checking code into your type. Otherwise you would really have to end up doing nested if checks for null, which is not only ugly, it will probably violate the Law of Demeter.
As a compromise, you could use a Maybe-monad like set of extension methods, which would allow you to query the object, and choose what to do in case one of the properties is null.
Have a look at this article by Dmitri Nesteruk: http://www.codeproject.com/Articles/109026/Chained-null-checks-and-the-Maybe-monad
Hope that helps.
I assume your question is: How do I efficiently check whether my object is valid?
If so, it does not matter that your object was just deserialized from some text source. If your question regards checking the object while deserializing to quickly stop deserializing if an error is found, that is another issue and you should update your question.
Validating an object efficiently is not often discussed when it comes to C# and administrative tools. The reason is that it is very quick no matter how you do it. It is more common to discuss how to do the checks in a manner that is easy to read and easily maintained.
Since your question is about efficiency, here are some ideas:
If you have a huge number of objects to be checked and performance is of key importance, you might want to change your objects into arrays of data so that they can be checked in a consistent manner. Example:
Instead of having MyObject[] MyObjects where MyObject has a lot of properties, break out each property and put them into an array like this:
int[] MyFirstProperties
float[] MySecondProperties
This way, the loop that traverses the list and checks the values, can be as quick as possible and you will not have many cache misses in the CPU cache, since you loop forward in the memory. Just be sure to use regular arrays or lists that are not implemented as linked lists, since that is likely to generate a lot of cache misses.
If you do not want to break up your objects into arrays of properties, it seems that top speed is not of interest but almost top speed. Then, your best bet is to keep your objects in a serial array and do:
.
bool wasOk = true;
foreach (MyObject obj in MyObjects)
{
if (obj.MyFirstProperty == someBadValue)
{
wasOk = false;
break;
}
if (obj.MySecondProperty == someOtherBadValue)
{
wasOk = false;
break;
}
}
This checks whether all your objects' properties are ok. I am not sure what your case really is but I think you get the point. Speed is already great when it comes to just checking properties of an object.
If you do string compares, make sure that you use x = y where possible, instead of using more sophisticated string compares, since x = y has a few quick opt outs, like if any of them is null, return, if the memory address is the same, the strings are equal and a few more clever things if I remember correctly. For any Java guy reading this, do not do this in Java!!! It will work sometimes but not always.
If I did not answer your question, you need to improve your question.
I'm not certain I understand the depth of your question but, wouldn't you just do somthing like this,
public SomeClass
{
private const string UrlValidatorRegex = "http://...
private const DateTime MinValidSomeDate = ...
private const DateTime MaxValidSomeDate = ...
public string SomeUrl { get; set; }
public DateTime SomeDate { get; set; }
...
private ValidationResult ValidateProperties()
{
var urlValidator = new RegEx(urlValidatorRegex);
if (!urlValidator.IsMatch(this.Someurl))
{
return new ValidationResult
{
IsValid = false,
Message = "SomeUrl format invalid."
};
}
if (this.SomeDate < MinValidSomeDate
|| this.SomeDate > MinValidSomeDate)
{
return new ValidationResult
{
IsValid = false,
Message = "SomeDate outside permitted bounds."
};
}
...
// Check other fields and properties here, return false on failure.
...
return new ValidationResult
{
IsValid = true,
};
}
...
private struct ValidationResult
{
public bool IsValid;
public string Message;
}
}
The exact valdiation code would vary depending on how you would like your class to work, no? Consider a property of a familar type,
public string SomeString { get; set; }
What are the valid values for this property. Both null and string.Empty may or may not be valid depending on the Class adorned with the property. There may be maximal length that should be allowed but, these details would vary by implementation.
If any suggested answer is more complicated than code above without offering an increase in performance or functionality, can it be more efficient?
Is your question actually, how can I check the values on an object without having to write much code?
I have been writing:
if(Class.HasSomething() == true/false)
{
// do somthing
}
else
{
// do something else
}
but I've also seen people that do the opposite:
if(true/false == Class.HasSomething())
{
// do somthing
}
else
{
// do something else
}
Is there any advantage in doing one or the other in terms of performance and speed? I'm NOT talking about coding style here.
They're both equivalent, but my preference is
if(Class.HasSomething())
{
// do something
}
else
{
// do something else
}
...for simplicity.
Certain older-style C programmers prefer "Yoda Conditions", because if you accidentally use a single-equals sign instead, you'll get a compile time error about assigning to a constant:
if (true = Foo()) { ... } /* Compile time error! Stops typo-mistakes */
if (Foo() = true) { ... } /* Will actually compile for certain Foo() */
Even though that mistake will no longer compile in C#, old habits die hard, and many programmers stick to the style developed in C.
Personally, I like the very simple form for True statements:
if (Foo()) { ... }
But for False statements, I like an explicit comparison.
If I write the shorter !Foo(), it is easy to over-look the ! when reviewing code later.
if (false == Foo()) { ... } /* Obvious intent */
if (!Foo()) { ... } /* Easy to overlook or misunderstand */
The second example is what I've heard called "Yoda conditions"; "False, this method's return value must be". It's not the way you'd say it in English and so among English-speaking programmers it's generally looked down on.
Performance-wise, there's really no difference. The first example is generally better grammatically (and thus for readability), but given the name of your method the "grammar" involved (and the fact you're comparing bool to bool) would make the equality check redundant anyway. So, for a true statement, I would simply write:
if(Class.HasSomething())
{
// do somthing
}
else
{
// do something else
}
This would be incrementally faster, as the if() block basically has a built-in equality comparison, so if you code if(Class.HasSomething() == true) the CLR will evaluate if((Class.HasSomething() == true) == true). But, we're talking a gain of maybe a few clocks here (not milliseconds, not ticks, but clocks; the ones that happen 2 billion times a second in modern processors).
For a false condition, it's a toss-up between using the not operator: if(!Class.HasSomething()) and using a comparison to false: if(Class.HasSomething() == false). The first is more concise, but it can be easy to miss that little exclamation point in a complex expression (especially since it occurs before the entire expression) and so I'd consider equating with false to ensure that the code is readable.
You will not see any performance difference.
The correct option is
if (Whatever())
The only time you should write == false or != true is when dealing with bool?s. (in which case all four options have different meanings)
You will not see any performance difference, either comparison is translated into the same IL...
if(Class.HasSomething())
{
// do somthing
}
is my way. But better try to avoid a multiple method call of HasSomething(). Better expose the return value once and reuse it.
you should write neither.
Write
if(Class.HasSomething())
{
// do something
}
else
{
// do something else
}
instead. If Class.HasSomething() is already a bool, it's pointless to compare it to another boolean
There is no perf advantage here. This coding style is used to guard against situation where programmer types = instead of ==. Compiler will cathc this because true/false are constants and cannot be assigned a new value
For the case of booleans, I'd recommend neither: just use if (method()) and if (!method()). For the case of things besides booleans, the convention of using yoda-speak, e.g. if (1 == x) came about to prevent mistakes, because if (1 = x) will throw a compiler error while if (x = 1) will not (it is valid code in C, but is probably not what you intended). In C#, such a statement is only valid if the variable was a boolean, which reduces the need to do that.
Which is a better programming practice and why?
I have a class like this:
class data {
public double time { get; internal set; }
public double count { get; internal set; }
public average_count { ... }
}
Where average_count should be read_only and give a calculation of count / time.
Is it better to write the accessor as:
public average_count { get {
return (time == 0) ? 0 : (count / time);
}}
Or should I do something like:
private _avg_count;
public average_count {
get
{
return _avg_count;
}
internal set
{
return _avg_count;
}
}
Where _avg_count is updated when in the time and count set accessors?
It seems like the first is easier to read but may be slower if average_count is accessed often. Will the compiler optimizations make the difference insignificant?
Doing it on the fly results in more readable code. Precalculating may improve performance, but you should only do this if (a) it's necessary and (b) you've profiled and it makes a difference.
The bottom line is, readability and maintainability should only be sacrificed for performance when it's absolutely necessary.
This is such a seemingly simple question, yet it's almost impossible to answer. The reason is that what's "right" depends on a number of factors.
1. Performance
In his answer Skilldrick recommends that you prefer what is readable over what performs best as a general rule:
[R]eadability and maintainability should
only be sacrificed for performance
when it's absolutely necessary.
I would counter this by saying that it is only true in a typical business application, where performance and functionality are two clearly distinguishable features. In certain high-performance software scenarios, this is not such an easy thing to say as performance and functionality may become inextricably linked -- that is, if how well your program accomplishes its task depends on how solidly it performs (this is the case at my current place of employment, a company that does algorithmic trading).
So it's a judgment call on your part. The best advice is to profile whenever you have an inkling; and if it's appropriate to sacrifice readability for performance in your case, then you should do so.
2. Memory Usage
0xA3 suggests a fairly elegant approach providing a compromise of sorts: only calculate the value as needed, and then cache it.
The downside to this approach, of course, is that it requires more memory to maintain. An int? requires essentially the same amount of memory as an int plus a bool (which, due to alignment issues, could actually mean 64 bits instead of just 40). If you have loads and loads of instances of this data class and memory is a scarce resource on the platform you're targeting, bloating your type with another 32 bits per instance might not be the smartest move.
3. Maintainability
That said, I do agree in general with what others have said that, all else being equal, you're better off erring on the side of readability so that you can understand your code if and when you revisit it in the future. However, none of the various approaches to this problem is particularly complicated, either.
The bottom line is that only you know your exact circumstances and thus you are in the best position to decide what to do here.
It depends how often you will call average_count and how often count and time are modified. For such kind of optimization I would suggest you to use profiler.
In cases where performance is critical and you need to frequently access the property you have another option as well: Calculating the result when it is needed and then cache the result. The pattern would look like this:
class Foo
{
int? _cachedResult = null;
int _someProperty;
public int SomeProperty
{
get { return _someProperty; }
set { _someProperty = value; _cachedResult = null; }
}
int _someOtherProperty;
public int SomeOtherProperty
{
get { return _someOtherProperty; }
set { _someOtherProperty = value; _cachedResult = null; }
}
public int SomeDerivedProperty
{
get
{
if (_cachedResult == null)
_cachedResult = someExpensiveCalculation();
return (int)_cachedResult;
}
}
}
In this simple case, I would definitely implement the calculated version first; then optimize if neccessary. If you store the value, you will also need extra code to recalculate the value if any of the values it depends on, changes, which leads to possibilities for errors.
Don't optimize prematurely.
I'd go with your first option. These are in-memory objects so the computation on what you're doing is going to be extremely fast. Further, if you created a dedicated property for this (e.g., average_count) then you're going to have to add more code to re-calculate this in the setter for both the time and the count.
As a side note (since you're asking about best practice), you should be using Pascal casing in C# which is initial caps and no underscores.
Will the compiler optimizations make the difference insignificant?
Depends on what you consider "significant". Reading a varaible is rather quick. Dividing two number is rather quick. In fact, depending on what in the RAM cache, reading the variables may take longer than doing the divide.
Use the first method. If it's seems to slow, then consider the second.
If you care about thread safety it may be way easier to do the second option rather than the first.
private double _avg_count;
static readonly object avgLock = new object();
public double time { get; internal set; }
public double count { get; internal set; }
public double average_count {
get
{
return _avg_count;
}
}
private void setAverageCount()
{
_avg_count = time == 0 ? 0 : (count / time);
}
public void Increment()
{
lock (avgLock)
{
count += 1;
setAverageCount();
}
}
public void EndTime(double time)
{
lock (avgLock)
{
time = time;
setAverageCount();
}
}
Which is faster? This:
bool isEqual = (MyObject1 is MyObject2)
Or this:
bool isEqual = ("blah" == "blah1")
It would be helpful to figure out which one is faster. Obviously, if you apply .ToUpper() to each side of the string comparison like programmers often do, that would require reallocating memory which affects performance. But how about if .ToUpper() is out of the equation like in the above sample?
I'm a little confused here.
As other answers have noted, you're comparing apples and oranges. ::rimshot::
If you want to determine if an object is of a certain type use the is operator.
If you want to compare strings use the == operator (or other appropriate comparison method if you need something fancy like case-insensitive comparisons).
How fast one operation is compared to the other (no pun intended) doesn't seem to really matter.
After closer reading, I think that you want to compare the speed of string comparisions with the speed of reference comparisons (the type of comparison used in the System.Object base type).
If that's the case, then the answer is that reference comparisons will never be slower than any other string comparison. Reference comparison in .NET is pretty much analogous to comparing pointers in C - about as fast as you can get.
However, how would you feel if a string variable s had the value "I'm a string", but the following comparison failed:
if (((object) s) == ((object) "I'm a string")) { ... }
If you simply compared references, that might happen depending on how the value of s was created. If it ended up not being interned, it would not have the same reference as the literal string, so the comparison would fail. So you might have a faster comparison that didn't always work. That seems to be a bad optimization.
According to the book Maximizing .NET Performance
the call
bool isEqual = String.Equals("test", "test");
is identical in performance to
bool isEqual = ("test" == "test");
The call
bool isEqual = "test".Equals("test");
is theoretically slower than the call to the static String.Equals method, but I think you'll need to compare several million strings in order to actually detect a speed difference.
My tip to you is this; don't worry about which string comparison method is slower or faster. In a normal application you'll never ever notice the difference. You should use the way which you think is most readable.
The first one is used to compare types not values.
If you want to compare strings with a non-sensitive case you can use:
string toto = "toto";
string tata = "tata";
bool isEqual = string.Compare(toto, tata, StringComparison.InvariantCultureIgnoreCase) == 0;
Console.WriteLine(isEqual);
How about you tell me? :)
Take the code from this Coding Horror post, and insert your code to test in place of his algorithm.
Comparing strings with a "==" operator compares the contents of the string vs. the string object reference. Comparing objects will call the "Equals" method of the object to determine whether they are equal or not. The default implementation of Equals is to do a reference comparison, returning True if both object references are the same physical object. This will likely be faster than the string comparison, but is dependent on the type of object being compared.
I'd assume that comparing the objects in your first example is going to be about as fast as it gets since its simply checking if both objects point to the same address in memory.
As it has been mentioned several times already, it is possible to compare addresses on strings as well, but this won't necessarily work if the two strings are allocated from different sources.
Lastly, its usually good form to try and compare objects based on type whenever possible. Its typically the most concrete method of identification. If your objects need to be represented by something other than their address in memory, its possible to use other attributes as identifiers.
If I understand the question and you really want to compare reference equality with the plain old "compare the contents": Build a testcase and call object.ReferenceEquals compared against a == b.
Note: You have to understand what the difference is and that you probably cannot use a reference comparison in most scenarios. If you are sure that this is what you want it might be a tiny bit faster. You have to try it yourself and evaluate if this is worth the trouble at all..
I don't feel like any of these answers address the actual question. Let's say the string in this example is the type's name and we're trying to see if it's faster to compare a type name or the type to determine what it is.
I put this together and to my surprise, it's about 10% faster to check the type name string than the type in every test I ran. I intentionally put the simplest strings and classes into play to see if it was possible to be faster, and turns out it is possible. Not sure about more complicated strings and type comparisons from heavily inherited classes. This is of course a micro-op and may possibly change at some point in the evolution of the language I suppose.
In my case, I was considering a value converter that switches based on this name, but it could also switch over the type since each type specifies a unique type name. The value converter would figure out the font awesome icon to show based on the type of item presented.
using System;
using System.Diagnostics;
using System.Linq;
namespace ConsoleApp1
{
public sealed class A
{
public const string TypeName = "A";
}
public sealed class B
{
public const string TypeName = "B";
}
public sealed class C
{
public const string TypeName = "C";
}
class Program
{
static void Main(string[] args)
{
var testlist = Enumerable.Repeat(0, 100).SelectMany(x => new object[] { new A(), new B(), new C() }).ToList();
int count = 0;
void checkTypeName()
{
foreach (var item in testlist)
{
switch (item)
{
case A.TypeName:
count++;
break;
case B.TypeName:
count++;
break;
case C.TypeName:
count++;
break;
default:
break;
}
}
}
void checkType()
{
foreach (var item in testlist)
{
switch (item)
{
case A _:
count++;
break;
case B _:
count++;
break;
case C _:
count++;
break;
default:
break;
}
}
}
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 100000; i++)
{
checkTypeName();
}
sw.Stop();
Console.WriteLine(sw.Elapsed);
sw.Restart();
for (int i = 0; i < 100000; i++)
{
checkType();
}
sw.Stop();
Console.WriteLine(sw.Elapsed);
}
}
}