Remove null check after lazy initialization - c#

When one decides to use lazy initialization, he usually has to pay for it.
class Loafer
{
    private VeryExpensiveField field;
    private VeryExpensiveField LazyInitField()
    {
        field = new VeryExpensiveField();
        // I want to remove the null check from the accessor here, but how?
        return field;
    }
    public VeryExpensiveField Field { get { return field ?? LazyInitField(); } }
}
Basically, he has to check every time whether his backing field is null/nil. What if he could escape from this practice? Once you have successfully initialized the field, you can get rid of this check, right?
Unfortunately, the majority of production languages do not allow you to modify their functions at run time, especially to add or remove single instructions from a function body, though it would be helpful if used wisely. However, in C# you can use delegates (I discovered them first and only afterwards realized what native languages have function pointers for) and the events mechanism to imitate such behavior, at some cost in performance, because the null checks just move to a lower level and do not disappear completely. Some languages, e.g. LISP and Prolog, allow you to modify their code easily, but they can hardly be treated as production languages.
In native languages like Delphi and C/C++ it seems better to write two functions, a safe one and a fast one, call them through a pointer, and switch this pointer to the fast version after initialization. You can even let the compiler or IDE generate the code for this without additional headache. But as @hvd mentioned, this can even decrease speed, because the CPU will not know that those functions are almost the same and thus will not prefetch them into its cache.
Yes, I'm a performance maniac seeking performance without an explicit problem, just to feed my curiosity. What common approaches exist for developing such functionality?

Actually, the overhead of the laziness toolkit is not always that important when you compare it to the actual computation.
There are many approaches.
You can use Lazy<T>, a self-modifying lambda setup, a boolean flag, or whatever suits your workflow best.
The overhead of a lazy evaluation toolkit only matters when the computation is repeated many times.
My code example with a micro-benchmark explores the comparative overhead of lazy computation in the context of an accompanying, more expensive operation in a loop.
You can see that the laziness toolkit's overhead is negligible even when used alongside a relatively cheap payload operation.
void Main()
{
    // If the payload is tiny, the overhead of the laziness toolkit is not negligible
    RunBenchmarks(i => i % 2 == 0, "Smaller payload");
    // Even this small string manipulation dwarfs the overhead of the laziness toolkit
    RunBenchmarks(i => i.ToString().Contains("5"), "Larger payload");
}
void RunBenchmarks(Func<int, bool> payload, string what)
{
    Console.WriteLine(what);
    var items = Enumerable.Range(0, 10000000).ToList();
    Func<Func<int, bool>> createPredicateWithBoolean = () =>
    {
        bool computed = false;
        return i => (computed || (computed = Compute())) && payload(i);
    };
    // Warm-up run, not timed
    items.Count(createPredicateWithBoolean());
    var sw = Stopwatch.StartNew();
    Console.WriteLine(items.Count(createPredicateWithBoolean()));
    sw.Stop();
    Console.WriteLine("Elapsed using boolean: {0}", sw.ElapsedMilliseconds);
    Func<Func<int, bool>> createPredicate = () =>
    {
        // Declared before it is assigned so the lambda can refer to (and replace) itself
        Func<int, bool> current = null;
        current = i =>
        {
            var computed2 = Compute();
            current = j => computed2;
            return computed2;
        };
        return i => current(i) && payload(i);
    };
    // Warm-up run, not timed
    items.Count(createPredicate());
    sw = Stopwatch.StartNew();
    Console.WriteLine(items.Count(createPredicate()));
    sw.Stop();
    Console.WriteLine("Elapsed using smart predicate: {0}", sw.ElapsedMilliseconds);
    Console.WriteLine();
}
bool Compute()
{
    return true; // not important for the exploration
}
Output:
Smaller payload
5000000
Elapsed using boolean: 161
5000000
Elapsed using smart predicate: 182
Larger payload
5217031
Elapsed using boolean: 1980
5217031
Elapsed using smart predicate: 1994
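For the Lazy<T> option mentioned above, here is a minimal sketch. It reuses the Loafer and VeryExpensiveField names from the question; the static Created counter is an assumption added only so the single construction can be observed. Note that Lazy<T> still checks internally whether the value exists on every access, so this tidies the code rather than removing the branch.

```csharp
using System;

public sealed class VeryExpensiveField
{
    // Hypothetical counter, present only to show that the factory runs once.
    public static int Created;

    public VeryExpensiveField() { Created++; }
}

public sealed class Loafer
{
    // Lazy<T> stores the factory and runs it on the first access to Value.
    private readonly Lazy<VeryExpensiveField> field =
        new Lazy<VeryExpensiveField>(() => new VeryExpensiveField());

    // No hand-written null check in the accessor any more.
    public VeryExpensiveField Field => field.Value;
}
```

Constructing a Loafer creates nothing; the first read of Field builds the instance, and later reads return the same object.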

FWIW with the help of Spring4D this can also be done in Delphi:
var
field: Lazy<VeryExpensiveField>;
begin
field :=
function: VeryExpensiveField
begin
Result := VeryExpensiveField.Create;
end;

What does _= mean in C#? [duplicate]

While going through the new C# 7.0 features, I got stuck on the discard feature. It says:
Discards are local variables which you can assign but cannot read
from. i.e. they are “write-only” local variables.
and, then, an example follows:
if (bool.TryParse("TRUE", out bool _))
What is a real use case where this would be beneficial? I mean, what if I had defined it the normal way, say:
if (bool.TryParse("TRUE", out bool isOK))
The discards are basically a way to intentionally ignore local variables which are irrelevant for the purposes of the code being produced. It's like when you call a method that returns a value but, since you are interested only in the underlying operations it performs, you don't assign its output to a local variable defined in the caller method, for example:
public static void Main(string[] args)
{
// I want to modify the records but I'm not interested
// in knowing how many of them have been modified.
ModifyRecords();
}
public static Int32 ModifyRecords()
{
Int32 affectedRecords = 0;
for (Int32 i = 0; i < s_Records.Count; ++i)
{
Record r = s_Records[i];
if (String.IsNullOrWhiteSpace(r.Name))
{
r.Name = "Default Name";
++affectedRecords;
}
}
return affectedRecords;
}
Actually, I would call it a cosmetic feature... in the sense that it's a design-time feature (the computations concerning the discarded variables are performed anyway) that helps keep the code clear, readable and easy to maintain.
I find the example shown in the link you provided kinda misleading. If I try to parse a String as a Boolean, chances are I want to use the parsed value somewhere in my code. Otherwise I would just try to see if the String corresponds to the text representation of a Boolean (a regular expression, for example... even a simple if statement could do the job if casing is properly handled). I'm far from saying that this never happens or that it's a bad practice, I'm just saying it's not the most common coding pattern you may need to produce.
The example provided in this article, by contrast, really shows the full potential of this feature:
public static void Main()
{
var (_, _, _, pop1, _, pop2) = QueryCityDataForYears("New York City", 1960, 2010);
Console.WriteLine($"Population change, 1960 to 2010: {pop2 - pop1:N0}");
}
private static (string, double, int, int, int, int) QueryCityDataForYears(string name, int year1, int year2)
{
int population1 = 0, population2 = 0;
double area = 0;
if (name == "New York City")
{
area = 468.48;
if (year1 == 1960) {
population1 = 7781984;
}
if (year2 == 2010) {
population2 = 8175133;
}
return (name, area, year1, population1, year2, population2);
}
return ("", 0, 0, 0, 0, 0);
}
From what I can see reading the above code, it seems that discards have more synergy with other features introduced in the most recent versions of C#, like tuple deconstruction.
For Matlab programmers, discards are far from being a new concept because the language has implemented them for a very, very long time (probably since the beginning, but I can't say for sure). The official documentation describes them as follows (link here):
Request all three possible outputs from the fileparts function:
helpFile = which('help');
[helpPath,name,ext] = fileparts('C:\Path\data.txt');
The current workspace now contains three variables from fileparts: helpPath, name, and ext. In this case, the variables are small. However, some functions return results that use much more memory. If you do not need those variables, they waste space on your system.
Ignore the first output using a tilde (~):
[~,name,ext] = fileparts(helpFile);
The only difference is that, in Matlab, inner computations for discarded outputs are normally skipped because output arguments are flexible and you can know how many and which one of them have been requested by the caller.
I have seen discards used mainly with methods which return Task<T> when you don't want to await the output.
So in the example below, we don't want to await the output of SomeOtherMethod() so we could do something like this:
//myClass.cs
public async Task<bool> Example() => await SomeOtherMethod();
// example.cs
Example();
Except this will generate the following warning:
CS4014 Because this call is not awaited, execution of the
current method continues before the call is completed. Consider
applying the 'await' operator to the result of the call.
To suppress this warning and essentially assure the compiler that we know what we are doing, you can use a discard:
//myClass.cs
public async Task<bool> Example() => await SomeOtherMethod();
// example.cs
_ = Example();
No more warnings.
To add another use case to the above answers.
You can use a discard in conjunction with a null coalescing operator to do a nice one-line null check at the start of your functions:
_ = myParam ?? throw new MyException();
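A minimal sketch of that guard inside a complete method. The Greeter class and Greet method are hypothetical names, and ArgumentNullException stands in for the answer's MyException:

```csharp
using System;

public static class Greeter
{
    public static string Greet(string name)
    {
        // The discard only gives the throw expression an assignment to live in;
        // nothing is stored when name is non-null.
        _ = name ?? throw new ArgumentNullException(nameof(name));
        return "Hello, " + name;
    }
}
```

Calling Greeter.Greet(null) throws before any other work is done, while a non-null argument passes straight through the guard.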
Many times I've done code along these lines:
TextBox.BackColor = int.TryParse(TextBox.Text, out int _) ? Color.LightGreen : Color.Pink;
Note that this would be part of a larger collection of data, not a standalone thing. The idea is to provide immediate feedback on the validity of each field of the data they are entering.
I use light green and pink rather than the green and red one would expect--the latter colors are dark enough that the text becomes a bit hard to read and the meaning of the lighter versions is still totally obvious.
(In some cases I also have a Color.Yellow to flag something which is not valid but neither is it totally invalid. Say the parser will accept fractions and the field currently contains "2 1". That could be part of "2 1/2" so it's not garbage, but neither is it valid.)
The discard pattern can be used with a switch expression as well.
string result = shape switch
{
    Rectangle r => "Rectangle",
    Circle c => "Circle",
    _ => "Unknown Shape"
};
For a list of patterns with discards refer to this article: Discards.
Consider this:
5 + 7;
This "statement" performs an evaluation, but the result is not assigned to anything. It will be immediately flagged with compiler error CS0201.
// Only assignment, call, increment, decrement, and new object expressions can be used as a statement
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/compiler-messages/cs0201?f1url=%3FappId%3Droslyn%26k%3Dk(CS0201)
A discard variable used here will not change the fact that it is an unused expression, rather it will appear to the compiler, to you, and others reviewing your code that it was intentionally unused.
_ = 5 + 7; //acceptable
It can also be used in lambda expressions when having unused parameters:
builder.Services.AddSingleton<ICommandDispatcher>(_ => dispatcher);

StackExchange.Redis Scan x amount of keys

I have a redis db that has thousands of keys and I'm currently running the following line to get all the keys:
string[] keysArr = keys.Select(key => (string)key).ToArray();
But because I have a lot of keys this takes a long time. I want to limit the number of keys being read. So I'm trying to run an execute command where I get 100 keys at a time:
var keys = Redis.Connection.GetDatabase(dbNum).Execute("scan", 0, "count", 100);
This runs the command successfully; however, I am unable to access the value as it is private, and unable to cast it even though the RedisResult class provides an explicit cast to it:
public static explicit operator string[] (RedisResult result);
Any ideas to get x amount of keys at a time from redis?
Thanks
SE.Redis has a .Keys() method on IServer API which fully encapsulates the semantics of SCAN. If possible, just use this method, and consume the data 100 at a time. It is usually pretty easy to write a batching function, i.e.
ExecuteInBatches(server.Keys(), 100, batch => DoSomething(batch));
with:
public void ExecuteInBatches<T>(IEnumerable<T> source, int batchSize,
Action<List<T>> action)
{
List<T> batch = new List<T>();
foreach(var item in source) {
batch.Add(item);
if(batch.Count == batchSize) {
action(batch);
batch = new List<T>(); // in case the callback stores it
}
}
if (batch.Count != 0) {
action(batch); // any leftovers
}
}
The enumerator will worry about advancing the cursor.
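If you are on .NET 6 or later, the built-in Enumerable.Chunk method does the same batching. The sketch below assumes that runtime and uses a hypothetical FakeKeys iterator as a stand-in for server.Keys(), so it can run without a Redis server:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchDemo
{
    // Hypothetical stand-in for server.Keys(): any lazily produced sequence will do.
    public static IEnumerable<string> FakeKeys(int count)
    {
        for (int i = 0; i < count; i++)
        {
            yield return "key:" + i;
        }
    }

    // Records the size of each batch that Chunk produces.
    public static List<int> BatchSizes(IEnumerable<string> keys, int batchSize)
    {
        var sizes = new List<int>();
        foreach (string[] batch in keys.Chunk(batchSize))
        {
            sizes.Add(batch.Length);
        }
        return sizes;
    }
}
```

BatchDemo.BatchSizes(BatchDemo.FakeKeys(250), 100) yields batch sizes 100, 100 and 50; the last batch carries the leftovers, like the final action(batch) call in ExecuteInBatches.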
You can use Execute, but: that is a lot of work! Also, SCAN makes no guarantees about how many will be returned per page; it can be zero - it can be 3 times what you asked for. It is ... guidance only.
Incidentally, the reason that the cast fails is because SCAN doesn't return a string[] - it returns an array of two items, the first of which is the "next" cursor, the second is the keys. So maybe:
var arr = (RedisResult[])server.Execute("scan", 0);
var nextCursor = (int)arr[0];
var keys = (RedisKey[])arr[1];
But all this is doing is re-implementing IServer.Keys, the hard way (and significantly less efficiently - RedisResult is not the ideal way to store data, it is simply necessary in the case of Execute and ScriptEvaluate).
I would use the .Take() method, outlined by Microsoft here.
Returns a specified number of contiguous elements from the start of a
sequence.
It would look something like this:
//limit to 100
var keysArr = keys.Select(key => (string)key).Take(100).ToArray();

FindAll vs Where extension-method

I just want to know whether "FindAll" will be faster than the "Where" extension method, and why?
Example :
myList.FindAll(item=> item.category == 5);
or
myList.Where(item=> item.category == 5);
Which is better ?
Well, FindAll copies the matching elements to a new list, whereas Where just returns a lazily evaluated sequence - no copying is required.
I'd therefore expect Where to be slightly faster than FindAll even when the resulting sequence is fully evaluated - and of course the lazy evaluation strategy of Where means that if you only look at (say) the first match, it won't need to check the remainder of the list. (As Matthew points out, there's work in maintaining the state machine for Where. However, this will only have a fixed memory cost - whereas constructing a new list may require multiple array allocations etc.)
Basically, FindAll(predicate) is closer to Where(predicate).ToList() than to just Where(predicate).
Just to react a bit more to Matthew's answer, I don't think he's tested it quite thoroughly enough. His predicate happens to pick half the items. Here's a short but complete program which tests the same list but with three different predicates - one picks no items, one picks all the items, and one picks half of them. In each case I run the test fifty times to get longer timing.
I'm using Count() to make sure that the Where result is fully evaluated. The results show that collecting around half the results, the two are neck and neck. Collecting no results, FindAll wins. Collecting all the results, Where wins. I find this intriguing: all of the solutions become slower as more and more matches are found: FindAll has more copying to do, and Where has to return the matched values instead of just looping within the MoveNext() implementation. However, FindAll gets slower faster than Where does, so loses its early lead. Very interesting.
Results:
FindAll: All: 11994
Where: All: 8176
FindAll: Half: 6887
Where: Half: 6844
FindAll: None: 3253
Where: None: 4891
(Compiled with /o+ /debug- and run from the command line, .NET 3.5.)
Code:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
class Test
{
static List<int> ints = Enumerable.Range(0, 10000000).ToList();
static void Main(string[] args)
{
Benchmark("All", i => i >= 0); // Match all
Benchmark("Half", i => i % 2 == 0); // Match half
Benchmark("None", i => i < 0); // Match none
}
static void Benchmark(string name, Predicate<int> predicate)
{
// We could just use new Func<int, bool>(predicate) but that
// would create one delegate wrapping another.
Func<int, bool> func = (Func<int, bool>)
Delegate.CreateDelegate(typeof(Func<int, bool>), predicate.Target,
predicate.Method);
Benchmark("FindAll: " + name, () => ints.FindAll(predicate));
Benchmark("Where: " + name, () => ints.Where(func).Count());
}
static void Benchmark(string name, Action action)
{
GC.Collect();
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 50; i++)
{
action();
}
sw.Stop();
Console.WriteLine("{0}: {1}", name, sw.ElapsedMilliseconds);
}
}
How about we test instead of guess? Shame to see the wrong answer get out.
var ints = Enumerable.Range(0, 10000000).ToList();
var sw1 = Stopwatch.StartNew();
var findall = ints.FindAll(i => i % 2 == 0);
sw1.Stop();
var sw2 = Stopwatch.StartNew();
var where = ints.Where(i => i % 2 == 0).ToList();
sw2.Stop();
Console.WriteLine("sw1: {0}", sw1.ElapsedTicks);
Console.WriteLine("sw2: {0}", sw2.ElapsedTicks);
/*
Debug
sw1: 1149856
sw2: 1652284
Release
sw1: 532194
sw2: 1016524
*/
Edit:
Even if I turn the above code from
var findall = ints.FindAll(i => i % 2 == 0);
...
var where = ints.Where(i => i % 2 == 0).ToList();
... to ...
var findall = ints.FindAll(i => i % 2 == 0).Count;
...
var where = ints.Where(i => i % 2 == 0).Count();
I get these results
/*
Debug
sw1: 1250409
sw2: 1267016
Release
sw1: 539536
sw2: 600361
*/
Edit 2.0...
If you want a list containing a subset of the current list, the fastest method is FindAll(). The reason for this is simple. The FindAll instance method uses the indexer on the current List instead of the enumerator state machine. The Where() extension method is an external call to a different class that uses the enumerator. If you step from each node in the list to the next node, you have to call the MoveNext() method under the covers. As you can see from the above examples, it is even faster to use the index entries to create a new list (which points to the original items, so memory bloat will be minimal), even when all you want is a count of the filtered items.
Now, if you are going to abort early from the enumerator, the Where() method could be faster. Of course, if you move the early-abort logic into the predicate of the FindAll() method, you will again be using the indexer instead of the enumerator.
Now, there are other reasons to use Where() (such as the other LINQ methods, foreach blocks and many more), but the question was whether FindAll() is faster than Where(). And unless you don't execute the Where(), the answer seems to be yes (when comparing apples to apples).
I am not saying don't use LINQ or the .Where() method. They make for code that is much simpler to read. The question was about performance, not about how easily you can read and understand the code. By far the fastest way to do this work would be a for block stepping through each index and doing whatever logic you want (even early exits). The reason LINQ is so great is the complex expression trees and transformations you can get with it. But using the iterator from the .Where() method has to go through tons of code to find its way to an in-memory state machine that is just getting the next index out of the List. It should also be noted that this .FindAll() method is only available on types that implement it (such as Array and List).
Yet more...
for (int x = 0; x < 20; x++)
{
var ints = Enumerable.Range(0, 10000000).ToList();
var sw1 = Stopwatch.StartNew();
var findall = ints.FindAll(i => i % 2 == 0).Count;
sw1.Stop();
var sw2 = Stopwatch.StartNew();
var where = ints.AsEnumerable().Where(i => i % 2 == 0).Count();
sw2.Stop();
var sw4 = Stopwatch.StartNew();
var cntForeach = 0;
foreach (var item in ints)
if (item % 2 == 0)
cntForeach++;
sw4.Stop();
Console.WriteLine("sw1: {0}", sw1.ElapsedTicks);
Console.WriteLine("sw2: {0}", sw2.ElapsedTicks);
Console.WriteLine("sw4: {0}", sw4.ElapsedTicks);
}
/* averaged results
sw1 575446.8
sw2 605954.05
sw4 394506.4
*/
Well, at least you can try to measure it.
The static Where method is implemented using an iterator block (yield keyword), which basically means that the execution will be deferred. If you only compare the calls to these two methods, the first one will be slower, since it immediately iterates the whole collection.
But if you include the complete iteration of the results you get, things can be a bit different. I'm pretty sure the yield solution is slower, due to the generated state machine mechanism it implies (see @Matthew's answer).
I can give some clues, but I'm not sure which one is faster.
FindAll() is executed right away.
Where() is executed in a deferred manner.
The advantage of Where is the deferred execution. See the difference if you had the following:
BigSequence.FindAll( x => DoIt(x) ).First();
BigSequence.Where( x => DoIt(x) ).First();
FindAll has covered the complete sequence, while Where will, for most sequences, stop enumerating as soon as one element is found.
The same effect applies when using Any(), Take(), Skip(), etc. I'm not sure, but I guess you'll have huge advantages in all functions that have deferred execution.
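To make the difference visible, here is a small sketch that counts how often the predicate runs in each case. DeferredDemo and its method are hypothetical names, and a list of 1000 integers stands in for BigSequence:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Returns how many times the predicate ran for FindAll(...).First()
    // and for Where(...).First() over the same 1000-element list.
    public static (int findAllCalls, int whereCalls) CountPredicateCalls()
    {
        List<int> numbers = Enumerable.Range(0, 1000).ToList();

        int findAllCalls = 0;
        // FindAll tests every element and builds a full list before First() runs.
        numbers.FindAll(x => { findAllCalls++; return x >= 10; }).First();

        int whereCalls = 0;
        // Where is deferred: First() pulls elements only until the first match.
        numbers.Where(x => { whereCalls++; return x >= 10; }).First();

        return (findAllCalls, whereCalls);
    }
}
```

With this data the FindAll predicate runs 1000 times, while the Where predicate runs 11 times (for the elements 0 through 10).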

Why are LINQ extensions written in a very difficult to read way?

I was checking some of the code that makes up the LINQ extensions in Reflector, and this is the kind of code I came across:
private bool MoveNext()
{
bool flag;
try
{
switch (this.<>1__state)
{
case 0:
this.<>1__state = -1;
this.<set>5__7b = new Set<TSource>(this.comparer);
this.<>7__wrap7d = this.source.GetEnumerator();
this.<>1__state = 1;
goto Label_0092;
case 2:
this.<>1__state = 1;
goto Label_0092;
default:
goto Label_00A5;
}
Label_0050:
this.<element>5__7c = this.<>7__wrap7d.Current;
if (this.<set>5__7b.Add(this.<element>5__7c))
{
this.<>2__current = this.<element>5__7c;
this.<>1__state = 2;
return true;
}
Label_0092:
if (this.<>7__wrap7d.MoveNext())
{
goto Label_0050;
}
this.<>m__Finally7e();
Label_00A5:
flag = false;
}
fault
{
this.System.IDisposable.Dispose();
}
return flag;
}
Was there a reason for Microsoft to write it this way?
Also what does the <> syntax mean, in lines like:
switch (this.<>1__state)
I have never seen it written before a variable, only after.
The MSIL is still valid 2.x code and the <> names you're seeing are auto generated by the C# 3.x compilers.
For example:
public void AttachEvents()
{
_ctl.Click += (sender,e) => MessageBox.Show( "Hello!" );
}
Translates to something like:
public void AttachEvents()
{
_ctl.Click += new EventHandler( <>b_1 );
}
private void <>b_1( object sender, EventArgs e )
{
MessageBox.Show( "Hello!" );
}
I should also note that the reason you're seeing it like that in Reflector is that you don't have .NET 3.5 optimization turned on. Go to View | Options and change Optimization to .NET 3.5 and it will do a better job of translating the generated identifiers back to their lambda expressions.
You're seeing the internal guts of the finite state machines that the C# compiler emits on your behalf when it handles iterators.
Jon Skeet has some great articles (Iterator block implementation details and Iterators, iterator blocks and data pipelines) on this subject. See also Chapter 6 of his book.
There was previously an SO post on this subject.
And, finally, Microsoft Research has a nice paper on the subject.
Read to your heart's content.
Identifiers starting with <> aren't valid C# identifiers, so I suspect they use them to mangle the names without fear of conflict, as no identifier in the C# code could be the same.
As to why it's hard to read, I suspect that it's more down to the fact it's easy to generate.
This is code that is automatically generated when you use iterators. The <> is used to ensure there are no collisions, and also to prevent you from accessing the compiler-generated classes directly in your code.
See the following for more information:
Using C# Yield for Readability and Performance
C# Iterators
These are types that have been auto-generated by the compiler from iterator methods.
The compiler will do exactly the same sort of thing to your own iterators. For example, write something like this and then take a look at the actual generated code in Reflector:
public IEnumerable<int> GetRandom()
{
Random rng = new Random();
while (true)
{
yield return rng.Next();
}
}
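This is a state machine that is automatically generated from an iterator, such as the following:
static IEnumerable<Func<KeyValuePair<int, int>>> FunnyMethod() {
    for (var i = 0; i < 10; i++) {
        var localVar = i;
        yield return () => new KeyValuePair<int, int>(localVar, i);
    }
}
If the returned delegates are invoked only after the enumeration has finished, this method will return 10 for all of the values, because they all share the loop variable i, while each key keeps its own localVar.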
The compiler transforms these methods into state machines that store their state in the <>1__state field and call each part of the iterator for a different value of the field.
The <> part is part of the generated field name, and is chosen so as not to conflict with anything.
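To get a feel for what such a state machine does, here is a hand-written sketch of roughly what the compiler produces for a trivial iterator that yields 1 up to max. It is an illustration only, not the actual generated code; the class and field names are hypothetical, with comments noting which generated field each one imitates:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public sealed class CountToEnumerator : IEnumerator<int>
{
    private readonly int max;
    private int state;   // plays the role of <>1__state
    private int current; // plays the role of <>2__current
    private int i;       // a local variable hoisted into a field

    public CountToEnumerator(int max) { this.max = max; }

    public int Current => current;
    object IEnumerator.Current => current;

    public bool MoveNext()
    {
        switch (state)
        {
            case 0:  // first call: run the loop initializer
                i = 1;
                break;
            case 1:  // resuming after a yield: run the loop increment
                i++;
                break;
            default: // already finished
                return false;
        }

        if (i <= max)
        {
            current = i; // the "yield return i" step
            state = 1;
            return true;
        }

        state = -1;      // mark the sequence as finished
        return false;
    }

    public void Reset() => throw new NotSupportedException();
    public void Dispose() { }
}
```

Each MoveNext() call jumps to the point where the previous call left off, which is what the switch on the state field in the decompiled code does with its goto labels.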
You must understand what Reflector does. It is not getting the source code back. That's not what a developer at MS wrote. :) It takes Intermediate Language (IL) and systematically converts it back to C# (or VB.NET). In doing so, it must come up with an approach. As you know, there are many ways to skin a cat in code that will eventually lead to the same IL. Reflector has to pick a way to move backwards from IL to a higher-level language and use that way every time.
(Fixed per comment, thank you.)

What is the worst gotcha in C# or .NET? [closed]

I was recently working with a DateTime object, and wrote something like this:
DateTime dt = DateTime.Now;
dt.AddDays(1);
return dt; // still today's date! WTF?
The intellisense documentation for AddDays() says it adds a day to the date, which it doesn't - it actually returns a date with a day added to it, so you have to write it like:
DateTime dt = DateTime.Now;
dt = dt.AddDays(1);
return dt; // tomorrow's date
This one has bitten me a number of times before, so I thought it would be useful to catalog the worst C# gotchas.
private int myVar;
public int MyVar
{
get { return MyVar; }
}
Blammo. Your app crashes with no stack trace. Happens all the time.
(Notice capital MyVar instead of lowercase myVar in the getter.)
Type.GetType
The one which I've seen bite lots of people is Type.GetType(string). They wonder why it works for types in their own assembly, and some types like System.String, but not System.Windows.Forms.Form. The answer is that it only looks in the current assembly and in mscorlib.
Anonymous methods
C# 2.0 introduced anonymous methods, leading to nasty situations like this:
using System;
using System.Threading;
class Test
{
static void Main()
{
for (int i=0; i < 10; i++)
{
ThreadStart ts = delegate { Console.WriteLine(i); };
new Thread(ts).Start();
}
}
}
What will that print out? Well, it entirely depends on the scheduling. It will print 10 numbers, but it probably won't print 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 which is what you might expect. The problem is that it's the i variable which has been captured, not its value at the point of the creation of the delegate. This can be solved easily with an extra local variable of the right scope:
using System;
using System.Threading;
class Test
{
static void Main()
{
for (int i=0; i < 10; i++)
{
int copy = i;
ThreadStart ts = delegate { Console.WriteLine(copy); };
new Thread(ts).Start();
}
}
}
Deferred execution of iterator blocks
This "poor man's unit test" doesn't pass - why not?
using System;
using System.Collections.Generic;
using System.Diagnostics;
class Test
{
static IEnumerable<char> CapitalLetters(string input)
{
if (input == null)
{
throw new ArgumentNullException(nameof(input));
}
foreach (char c in input)
{
yield return char.ToUpper(c);
}
}
static void Main()
{
// Test that null input is handled correctly
try
{
CapitalLetters(null);
Console.WriteLine("An exception should have been thrown!");
}
catch (ArgumentNullException)
{
// Expected
}
}
}
The answer is that the code in the body of CapitalLetters doesn't get executed until the iterator's MoveNext() method is first called.
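One common way around this, sketched below, is to split the method in two: a plain wrapper that validates eagerly and a private iterator that does the yielding. The Letters class and the CapitalLettersIterator helper are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public static class Letters
{
    // Not an iterator block (no yield here), so the null check runs
    // as soon as the method is called.
    public static IEnumerable<char> CapitalLetters(string input)
    {
        if (input == null)
        {
            throw new ArgumentNullException(nameof(input));
        }
        return CapitalLettersIterator(input);
    }

    // The deferred part: runs only when the result is enumerated.
    private static IEnumerable<char> CapitalLettersIterator(string input)
    {
        foreach (char c in input)
        {
            yield return char.ToUpper(c);
        }
    }
}
```

With this split, the "poor man's unit test" passes, because Letters.CapitalLetters(null) throws at the call site instead of on the first MoveNext().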
I've got some other oddities on my brainteasers page.
The Heisenberg Watch Window
This can bite you badly if you're doing load-on-demand stuff, like this:
private MyClass _myObj;
public MyClass MyObj {
get {
if (_myObj == null)
_myObj = CreateMyObj(); // some other code to create my object
return _myObj;
}
}
Now let's say you have some code elsewhere using this:
// blah
// blah
MyObj.DoStuff(); // Line 3
// blah
Now you want to debug your CreateMyObj() method. So you put a breakpoint on Line 3 above, with intention to step into the code. Just for good measure, you also put a breakpoint on the line above that says _myObj = CreateMyObj();, and even a breakpoint inside CreateMyObj() itself.
The code hits your breakpoint on Line 3. You step into the code. You expect to enter the conditional code, because _myObj is obviously null, right? Uh... so... why did it skip the condition and go straight to return _myObj?! You hover your mouse over _myObj... and indeed, it does have a value! How did THAT happen?!
The answer is that your IDE caused it to get a value, because you have a "watch" window open - especially the "Autos" watch window, which displays the values of all variables/properties relevant to the current or previous line of execution. When you hit your breakpoint on Line 3, the watch window decided that you would be interested to know the value of MyObj - so behind the scenes, ignoring any of your breakpoints, it went and calculated the value of MyObj for you - including the call to CreateMyObj() that sets the value of _myObj!
That's why I call this the Heisenberg Watch Window - you cannot observe the value without affecting it... :)
GOTCHA!
Edit - I feel @ChristianHayter's comment deserves inclusion in the main answer, because it looks like an effective workaround for this issue. So anytime you have a lazy-loaded property...
Decorate your property with [DebuggerBrowsable(DebuggerBrowsableState.Never)] or [DebuggerDisplay("<loaded on demand>")]. – Christian Hayter
Re-throwing exceptions
A gotcha that gets lots of new developers is the re-throw exception semantics.
Lots of times I see code like the following
catch(Exception e)
{
// Do stuff
throw e;
}
The problem is that it wipes the stack trace and makes diagnosing issues much harder, because you cannot track where the exception originated.
The correct code is either the throw statement with no arguments:
catch(Exception)
{
throw;
}
Or wrapping the exception in another one, and using the inner exception to get the original stack trace:
catch(Exception e)
{
// Do stuff
throw new MySpecialException(e);
}
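The difference can be observed by capturing the stack trace in both cases. This is a sketch with hypothetical names (RethrowDemo, Fail and the helpers); the NoInlining attributes are there only so the frames reliably show up in optimized builds:

```csharp
using System;
using System.Runtime.CompilerServices;

public static class RethrowDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Fail() => throw new InvalidOperationException("boom");

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void RethrowWithThrow()
    {
        try { Fail(); }
        catch (Exception) { throw; }     // keeps the original stack trace
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void RethrowWithThrowE()
    {
        try { Fail(); }
        catch (Exception e) { throw e; } // restarts the trace at this line
    }

    // Runs the action and returns the stack trace of whatever it throws.
    private static string TraceOf(Action action)
    {
        try { action(); return ""; }
        catch (Exception e) { return e.StackTrace ?? ""; }
    }

    public static string TraceWithThrow() => TraceOf(RethrowWithThrow);
    public static string TraceWithThrowE() => TraceOf(RethrowWithThrowE);
}
```

The trace from TraceWithThrow() still mentions Fail, where the exception was created; the trace from TraceWithThrowE() starts at RethrowWithThrowE, and the Fail frame is gone.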
Here's another one, involving time, that gets me:
static void PrintHowLong(DateTime a, DateTime b)
{
TimeSpan span = a - b;
Console.WriteLine(span.Seconds); // WRONG!
Console.WriteLine(span.TotalSeconds); // RIGHT!
}
TimeSpan.Seconds is the seconds portion of the timespan (2 minutes and 0 seconds has a seconds value of 0).
TimeSpan.TotalSeconds is the entire timespan measured in seconds (2 minutes has a total seconds value of 120).
Leaking memory because you didn't unhook events.
This even caught out some senior developers I know.
Imagine a WPF form with lots of things in it, and somewhere in there you subscribe to an event. If you don't unsubscribe, the entire form is kept around in memory after being closed and dereferenced.
I believe the issue I saw was creating a DispatcherTimer in the WPF form and subscribing to the Tick event; if you don't do a -= on the timer, your form leaks memory!
In this example your teardown code should have
timer.Tick -= TimerTickEventHandler;
This one is especially tricky since you created the instance of the DispatcherTimer inside the WPF form, so you would think that it would be an internal reference handled by the garbage collector... unfortunately the DispatcherTimer uses a static internal list of subscriptions and services requests on the UI thread, so the reference is 'owned' by the static class.
Maybe not really a gotcha because the behavior is written clearly in MSDN, but has broken my neck once because I found it rather counter-intuitive:
Image image = System.Drawing.Image.FromFile("nice.pic");
This guy leaves the "nice.pic" file locked until the image is disposed. At the time I faced it I thought it would be nice to load icons on the fly, and didn't realize (at first) that I ended up with dozens of open and locked files! Image keeps track of where it loaded the file from...
How to solve this? I thought a one-liner would do the job. I expected an extra parameter for FromFile(), but there is none, so I wrote this...
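using (Stream fs = new FileStream("nice.pic", FileMode.Open, FileAccess.Read))
using (Image loaded = System.Drawing.Image.FromStream(fs))
{
// An image created by FromStream needs its stream kept open for its whole lifetime,
// so copy the pixels into an independent Bitmap before the stream is closed.
image = new Bitmap(loaded);
}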
If you count ASP.NET, I'd say the webforms lifecycle is a pretty big gotcha to me. I've spent countless hours debugging poorly written webforms code, just because a lot of developers just don't really understand when to use which event handler (me included, sadly).
overloaded == operators and untyped containers (arraylists, datasets, etc.):
string my = "my ";
Debug.Assert(my+"string" == "my string"); //true
var a = new ArrayList();
a.Add(my+"string");
a.Add("my string");
// uses ==(object) instead of ==(string)
Debug.Assert(a[1] == "my string"); // true, due to interning magic
Debug.Assert(a[0] == "my string"); // false
Solutions?
Always use string.Equals(a, b) when you are comparing strings.
Use generic collections such as List<string> to ensure that both operands are strings.
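Both suggestions can be sketched roughly like this. StringComparisonDemo and its method names are made up for illustration; Build concatenates at run time so the result is not the interned literal.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class StringComparisonDemo
{
    // Builds the string at run time, so it is a different object from the literal.
    public static string Build(string prefix) { return prefix + "string"; }

    // Untyped container: elements are object, so compare by value with string.Equals.
    public static bool EqualsInArrayList(string value)
    {
        var untyped = new ArrayList { value };
        return string.Equals((string)untyped[0], "my string", StringComparison.Ordinal);
    }

    // Typed container: the element is a string, so == binds to string equality.
    public static bool EqualsInTypedList(string value)
    {
        var typed = new List<string> { value };
        return typed[0] == "my string";
    }
}
```

Both methods return true for Build("my "), whereas the reference comparison in the ArrayList example above does not.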
[Serializable]
class Hello
{
readonly object accountsLock = new object();
}
//Do stuff to deserialize Hello with BinaryFormatter
//and now... accountsLock == null ;)
Moral of the story: field initialisers are not run when an object is deserialized.
DateTime.ToString("dd/MM/yyyy"); This will not always give you dd/MM/yyyy. The "/" is a placeholder for the date separator of the current culture, so depending on the regional settings you might get dd-MM-yyyy or something similar.
The right way to do this is to use DateTime.ToString("dd'/'MM'/'yyyy");
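To see the difference without changing your machine's regional settings, here is a small sketch. DateFormatDemo and DashCulture are hypothetical names; the culture is a clone of the invariant culture with its date separator set to "-". Passing CultureInfo.InvariantCulture is shown as an alternative to quoting.

```csharp
using System;
using System.Globalization;

public static class DateFormatDemo
{
    // A writable copy of the invariant culture whose date separator is "-".
    public static CultureInfo DashCulture()
    {
        var culture = (CultureInfo)CultureInfo.InvariantCulture.Clone();
        culture.DateTimeFormat.DateSeparator = "-";
        return culture;
    }

    // Unquoted "/" is replaced by the culture's date separator.
    public static string Unquoted(DateTime d, CultureInfo c) { return d.ToString("dd/MM/yyyy", c); }

    // Quoted '/' is emitted literally, whatever the culture.
    public static string Quoted(DateTime d, CultureInfo c) { return d.ToString("dd'/'MM'/'yyyy", c); }

    // Alternative: format with the invariant culture, whose separator is "/".
    public static string Invariant(DateTime d) { return d.ToString("dd/MM/yyyy", CultureInfo.InvariantCulture); }
}
```

For 6 September 2011 with the dash culture, Unquoted gives 06-09-2011 while Quoted and Invariant give 06/09/2011.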
DateTime.ToString("r") is supposed to convert to RFC1123, which uses GMT. GMT is within a fraction of a second from UTC, and yet the "r" format specifier does not convert to UTC, even if the DateTime in question is specified as Local.
This results in the following gotcha (varies depending on how far your local time is from UTC):
DateTime.Parse("Tue, 06 Sep 2011 16:35:12 GMT").ToString("r")
> "Tue, 06 Sep 2011 17:35:12 GMT"
Whoops!
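One way around it, as a sketch: convert to UTC yourself before formatting. Rfc1123.Format is a hypothetical helper name. Note that ToUniversalTime treats a DateTime of Kind Unspecified as local time.

```csharp
using System;
using System.Globalization;

public static class Rfc1123
{
    // "r" only formats; it never converts, so convert to UTC explicitly first.
    public static string Format(DateTime value)
    {
        return value.ToUniversalTime().ToString("r", CultureInfo.InvariantCulture);
    }
}
```

A value that is already UTC, such as new DateTime(2011, 9, 6, 16, 35, 12, DateTimeKind.Utc), comes out as "Tue, 06 Sep 2011 16:35:12 GMT".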
I saw this one posted the other day, and I think it is pretty obscure and painful for those who don't know it:
int x = 0;
x = x++;
return x;
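That will return 0, not 1 as most would expect: x++ evaluates to the old value of x, and that old value is then assigned back to x, overwriting the increment.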
I'm a bit late to this party, but I have two gotchas that have both bitten me recently:
DateTime resolution
The Ticks property measures time in 10-millionths of a second (100 nanosecond blocks), however the resolution is not 100 nanoseconds, it's about 15ms.
This code:
long now = DateTime.Now.Ticks;
for (int i = 0; i < 10; i++)
{
System.Threading.Thread.Sleep(1);
Console.WriteLine(DateTime.Now.Ticks - now);
}
will give you an output of (for example):
0
0
0
0
0
0
0
156254
156254
156254
Similarly, if you look at DateTime.Now.Millisecond, you'll get values in rounded chunks of 15.625ms: 15, 31, 46, etc.
This particular behaviour varies from system to system, but there are other resolution-related gotchas in this date/time API.
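If what you actually want is to measure a short interval, System.Diagnostics.Stopwatch is the usual tool rather than subtracting DateTime.Now values. A minimal sketch, where ElapsedDemo.Measure is a hypothetical helper:

```csharp
using System;
using System.Diagnostics;

public static class ElapsedDemo
{
    // Times the action with Stopwatch instead of comparing DateTime.Now.Ticks.
    public static TimeSpan Measure(Action action)
    {
        var watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Stopwatch.IsHighResolution tells you whether the timer is backed by a high-resolution performance counter on the current machine.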
Path.Combine
A great way to combine file paths, but it doesn't always behave the way you'd expect.
If the second parameter starts with a \ character, it won't give you a complete path:
This code:
string prefix1 = "C:\\MyFolder\\MySubFolder";
string prefix2 = "C:\\MyFolder\\MySubFolder\\";
string suffix1 = "log\\";
string suffix2 = "\\log\\";
Console.WriteLine(Path.Combine(prefix1, suffix1));
Console.WriteLine(Path.Combine(prefix1, suffix2));
Console.WriteLine(Path.Combine(prefix2, suffix1));
Console.WriteLine(Path.Combine(prefix2, suffix2));
Gives you this output:
C:\MyFolder\MySubFolder\log\
\log\
C:\MyFolder\MySubFolder\log\
\log\
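A possible workaround, sketched here with a hypothetical helper named PathHelper.CombineRelative, is to strip leading separators from the second part so that Path.Combine no longer sees it as rooted. It only trims the separator characters of the current platform, so on a non-Windows system a leading backslash is left alone.

```csharp
using System;
using System.IO;

public static class PathHelper
{
    // Removes leading directory separators from the second part, so Path.Combine
    // does not treat it as a rooted path and discard the first part.
    public static string CombineRelative(string basePath, string relative)
    {
        string trimmed = relative.TrimStart(
            Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
        return Path.Combine(basePath, trimmed);
    }
}
```

With this, CombineRelative(prefix1, suffix2) gives the same result as Path.Combine(prefix1, suffix1) on Windows.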
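When you start a process (using System.Diagnostics) that writes to the console with its standard output redirected, but you never read the Process.StandardOutput stream, the child blocks once the output buffer fills up, so after a certain amount of output your app will appear to hang.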
No operator shortcuts in Linq-To-Sql
See here.
In short, inside the conditional clause of a Linq-To-Sql query, you cannot use conditional shortcuts like || and && to avoid null reference exceptions; Linq-To-Sql evaluates both sides of the OR or AND operator even if the first condition obviates the need to evaluate the second condition!
Using default parameters with virtual methods
abstract class Base
{
public virtual void foo(string s = "base") { Console.WriteLine("base " + s); }
}
class Derived : Base
{
public override void foo(string s = "derived") { Console.WriteLine("derived " + s); }
}
...
Base b = new Derived();
b.foo();
Output:
derived base
Value objects in mutable collections
struct Point { ... }
List<Point> mypoints = ...;
mypoints[i].x = 10;
has no effect.
mypoints[i] returns a copy of a Point value object. C# happily lets you modify a field of the copy. Silently doing nothing.
Update:
This appears to be fixed in C# 3.0:
Cannot modify the return value of 'System.Collections.Generic.List<Foo>.this[int]' because it is not a variable
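The usual way to actually change the element is to copy it out, modify the copy and store it back. A sketch, with a hypothetical Point struct mirroring the one in the example:

```csharp
using System.Collections.Generic;

// Hypothetical mutable struct, standing in for the Point elided above.
public struct Point
{
    public int x;
    public int y;
}

public static class PointListDemo
{
    // The indexer returns a copy, so write the modified copy back explicitly.
    public static void SetX(List<Point> points, int index, int newX)
    {
        Point p = points[index];
        p.x = newX;
        points[index] = p;
    }
}
```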
Perhaps not the worst, but some parts of the .NET Framework use degrees while others use radians (and the documentation that appears with IntelliSense never tells you which; you have to visit MSDN to find out).
All of this could have been avoided by having an Angle class instead...
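Something along these lines is what I have in mind. This Angle struct is purely hypothetical and only meant to show the idea: the unit is chosen once at construction, and callers never pass a bare double around.

```csharp
using System;

// Hypothetical Angle type for illustration only.
public readonly struct Angle
{
    public double Radians { get; }

    private Angle(double radians) { Radians = radians; }

    public static Angle FromRadians(double radians) { return new Angle(radians); }

    public static Angle FromDegrees(double degrees) { return new Angle(degrees * Math.PI / 180.0); }

    public double Degrees { get { return Radians * 180.0 / Math.PI; } }

    // Math.Sin expects radians; the caller no longer has to remember that.
    public double Sin() { return Math.Sin(Radians); }
}
```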
For C/C++ programmers, the transition to C# is a natural one. However, the biggest gotcha I've run into personally (and have seen with others making the same transition) is not fully understanding the difference between classes and structs in C#.
In C++, classes and structs are identical; they only differ in the default visibility, where classes default to private visibility and structs default to public visibility. In C++, this class definition
class A
{
public:
int i;
};
is functionally equivalent to this struct definition.
struct A
{
int i;
};
In C#, however, classes are reference types while structs are value types. This makes a BIG difference in (1) deciding when to use one over the other, (2) testing object equality, (3) performance (e.g., boxing/unboxing), etc.
There are all kinds of information on the web about the differences between the two (e.g., here). I would highly encourage anyone making the transition to C# to at least have a working knowledge of the differences and their implications.
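A small sketch of the most visible consequence, copy semantics on assignment. RefBox, ValBox and CopySemanticsDemo are hypothetical names used only for this illustration.

```csharp
public class RefBox { public int Value; }    // reference type
public struct ValBox { public int Value; }   // value type

public static class CopySemanticsDemo
{
    // Assignment copies the reference: both variables point at the same object.
    public static int ViaClass()
    {
        var a = new RefBox { Value = 1 };
        var b = a;
        b.Value = 2;
        return a.Value;
    }

    // Assignment copies the whole value: the original is untouched.
    public static int ViaStruct()
    {
        var a = new ValBox { Value = 1 };
        var b = a;
        b.Value = 2;
        return a.Value;
    }
}
```

ViaClass returns 2 and ViaStruct returns 1, which is exactly the behaviour that surprises people coming from C++.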
Garbage collection and Dispose(). Although you don't have to do anything to free up memory, you still have to free up resources via Dispose(). This is an immensely easy thing to forget when you are using WinForms, or tracking objects in any way.
Arrays implement IList
But they don't really implement it. When you call Add, it throws a NotSupportedException. So why does a class implement an interface when it can't support it?
Compiles, but doesn't work:
IList<int> myList = new int[] { 1, 2, 4 };
myList.Add(5);
We have this issue a lot, because the serializer (WCF) turns all the ILists into arrays and we get runtime errors.
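A defensive sketch for code that receives an IList<T> from somewhere else: check IsReadOnly (which is true for an array seen through IList<T>) and copy into a real List<T> before adding. ListHelper.Append is a hypothetical helper, not a framework method.

```csharp
using System.Collections.Generic;

public static class ListHelper
{
    // Returns a list containing the new item, copying into a List<T> first
    // when the incoming IList<T> (for example an array) is read-only.
    public static IList<T> Append<T>(IList<T> list, T item)
    {
        IList<T> target = list.IsReadOnly ? new List<T>(list) : list;
        target.Add(item);
        return target;
    }
}
```

The caller has to use the returned list, since the original array cannot grow.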
foreach loop variable scope!
var l = new List<Func<string>>();
var strings = new[] { "Lorem" , "ipsum", "dolor", "sit", "amet" };
foreach (var s in strings)
{
l.Add(() => s);
}
foreach (var a in l)
Console.WriteLine(a());
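prints five "amet" (with compilers before C# 5, where all iterations share a single loop variable that every lambda captures), while the following example works fine: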
var l = new List<Func<string>>();
var strings = new[] { "Lorem" , "ipsum", "dolor", "sit", "amet" };
foreach (var s in strings)
{
var t = s;
l.Add(() => t);
}
foreach (var a in l)
Console.WriteLine(a());
The MS SQL Server datetime type can't handle dates before 1753. Significantly, that is out of sync with the .NET DateTime.MinValue constant, which is 1/1/0001. So if you try to save DateTime.MinValue, a malformed date (as recently happened to me in a data import) or simply the birth date of William the Conqueror, you're going to be in trouble. The datetime type has no built-in workaround for this; if you're likely to need to work with dates before 1753, you need to write your own workaround or use a column type with a wider range, such as datetime2 (available from SQL Server 2008).
The contract on Stream.Read is something that I've seen trip up a lot of people:
// Read 8 bytes and turn them into a ulong
byte[] data = new byte[8];
stream.Read(data, 0, 8); // <-- WRONG!
ulong value = BitConverter.ToUInt64(data, 0);
The reason this is wrong is that Stream.Read will read at most the specified number of bytes, but is entirely free to read just 1 byte, even if another 7 bytes are available before end of stream.
It doesn't help that this looks so similar to Stream.Write, which is guaranteed to have written all the bytes if it returns with no exception. It also doesn't help that the above code works almost all the time. And of course it doesn't help that there is no ready-made, convenient method for reading exactly N bytes correctly.
So, to plug the hole, and increase awareness of this, here is an example of a correct way to do this:
/// <summary>
/// Attempts to fill the buffer with the specified number of bytes from the
/// stream. If there are fewer bytes left in the stream than requested then
/// all available bytes will be read into the buffer.
/// </summary>
/// <param name="stream">Stream to read from.</param>
/// <param name="buffer">Buffer to write the bytes to.</param>
/// <param name="offset">Offset at which to write the first byte read from
/// the stream.</param>
/// <param name="length">Number of bytes to read from the stream.</param>
/// <returns>Number of bytes read from the stream into buffer. This may be
/// less than requested, but only if the stream ended before the
/// required number of bytes were read.</returns>
public static int FillBuffer(this Stream stream,
byte[] buffer, int offset, int length)
{
int totalRead = 0;
while (length > 0)
{
var read = stream.Read(buffer, offset, length);
if (read == 0)
return totalRead;
offset += read;
length -= read;
totalRead += read;
}
return totalRead;
}
/// <summary>
/// Attempts to read the specified number of bytes from the stream. If
/// there are fewer bytes left before the end of the stream, a shorter
/// (possibly empty) array is returned.
/// </summary>
/// <param name="stream">Stream to read from.</param>
/// <param name="length">Number of bytes to read from the stream.</param>
public static byte[] Read(this Stream stream, int length)
{
byte[] buf = new byte[length];
int read = stream.FillBuffer(buf, 0, length);
if (read < length)
Array.Resize(ref buf, read);
return buf;
}
The Nasty Linq Caching Gotcha
See my question that led to this discovery, and the blogger who discovered the problem.
In short, the DataContext keeps a cache of all Linq-to-Sql objects that you have ever loaded. If anyone else makes any changes to a record that you have previously loaded, you will not be able to get the latest data, even if you explicitly reload the record!
This is because of a property called ObjectTrackingEnabled on the DataContext, which by default is true. If you set that property to false, the record will be loaded anew every time... BUT... you can't persist any changes to that record with SubmitChanges().
GOTCHA!
Events
I never understood why events are a language feature. They are complicated to use: you need to check for null before calling, you need to unregister (yourself), you can't find out who is registered (eg: did I register?). Why isn't an event just a class in the library? Basically a specialized List<delegate>?
Today I fixed a bug that had eluded me for a long time. The bug was in a generic class used in a multi-threaded scenario, where a static int field was used to provide lock-free synchronisation via Interlocked. The cause was that each constructed type of the generic class (one per type argument) has its own copy of the static field. So threads working with different type arguments each got their own field, and it never acted as the shared lock that was intended.
class SomeGeneric<T>
{
public static int i = 0;
}
class Test
{
public static void Main(string[] args)
{
SomeGeneric<int>.i = 5;
SomeGeneric<string>.i = 10;
Console.WriteLine(SomeGeneric<int>.i);
Console.WriteLine(SomeGeneric<string>.i);
Console.WriteLine(SomeGeneric<int>.i);
}
}
This prints
5
10
5
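One way to get a single shared field, sketched under the assumption that the state really should be common to all type arguments, is to move the static into a non-generic base class. SharedState and SharedGeneric<T> are hypothetical names, not the classes from my actual code.

```csharp
using System.Threading;

// Non-generic holder: exactly one Counter, shared by every SharedGeneric<T>.
public abstract class SharedState
{
    public static int Counter;

    // Lock-free increment of the single shared field.
    public static int Increment() { return Interlocked.Increment(ref Counter); }
}

// Hypothetical replacement for SomeGeneric<T>; it inherits the shared static.
public class SharedGeneric<T> : SharedState
{
}
```

Here SharedGeneric<int>.Counter and SharedGeneric<string>.Counter refer to the same field, so writing 10 through one is visible through the other.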
Just found a weird one that had me stuck in debug for a while:
You can increment a nullable int that is null without an exception being thrown, and the value stays null.
int? i = null;
i++; // I would have expected an exception but runs fine and stays as null
Enumerables can be evaluated more than once
It'll bite you when you have a lazily-enumerated enumerable and you iterate over it twice and get different results. (or you get the same results but it executes twice unnecessarily)
For example, while writing a certain test, I needed a few temp files to test the logic:
var files = Enumerable.Range(0, 5)
.Select(i => Path.GetTempFileName());
foreach (var file in files)
File.WriteAllText(file, "HELLO WORLD!");
/* ... many lines of code later ... */
foreach (var file in files)
File.Delete(file);
Imagine my surprise when File.Delete(file) throws FileNotFound!!
What's happening here is that the files enumerable gets iterated twice (the results from the first iteration are simply not remembered), and each new iteration re-calls Path.GetTempFileName(), so you get a different set of temp filenames.
The solution is, of course, to eager-enumerate the value by using ToArray() or ToList():
var files = Enumerable.Range(0, 5)
.Select(i => Path.GetTempFileName())
.ToArray();
This is even scarier when you're doing something multi-threaded, like:
foreach (var file in files)
content = content + File.ReadAllText(file);
and you find out content.Length is still 0 after all the writes!! You then begin to rigorously check that you don't have a race condition when... after one wasted hour... you figure out it's just that tiny little Enumerable gotcha you forgot about...
TextInfo textInfo = Thread.CurrentThread.CurrentCulture.TextInfo;
textInfo.ToTitleCase("hello world!"); //Returns "Hello World!"
textInfo.ToTitleCase("hElLo WoRld!"); //Returns "Hello World!"
textInfo.ToTitleCase("Hello World!"); //Returns "Hello World!"
textInfo.ToTitleCase("HELLO WORLD!"); //Returns "HELLO WORLD!"
Yes, this behavior is documented, but that certainly doesn't make it right.
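If you want all-uppercase input title-cased as well, a common workaround is to lower-case it first. A minimal sketch, where TitleCaseHelper.ToTitle is a hypothetical helper name:

```csharp
using System.Globalization;

public static class TitleCaseHelper
{
    // Lower-case first, so words in all capitals are not left untouched.
    public static string ToTitle(string text, CultureInfo culture)
    {
        TextInfo textInfo = culture.TextInfo;
        return textInfo.ToTitleCase(textInfo.ToLower(text));
    }
}
```

The price is that genuine acronyms get title-cased too, which is presumably what the built-in behaviour is trying to avoid.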
