Should variable declarations always be placed outside of a loop? - c#

Is it better to declare a variable used in a loop outside of the loop rather than inside? Sometimes I see examples where a variable is declared inside the loop. Does this effectively cause the program to allocate memory for a new variable each time the loop runs, or is .NET smart enough to know that it's really the same variable?
For example see the code below from this answer.
public static void CopyStream(Stream input, Stream output)
{
    byte[] buffer = new byte[32768];
    while (true)
    {
        int read = input.Read(buffer, 0, buffer.Length);
        if (read <= 0)
            return;
        output.Write(buffer, 0, read);
    }
}
Would this modified version be any more efficient?
public static void CopyStream(Stream input, Stream output)
{
    int read; // OUTSIDE LOOP
    byte[] buffer = new byte[32768];
    while (true)
    {
        read = input.Read(buffer, 0, buffer.Length);
        if (read <= 0)
            return;
        output.Write(buffer, 0, read);
    }
}

No, it wouldn't be more efficient. However, I'd rewrite it this way which happens to declare it outside the loop anyway:
byte[] buffer = new byte[32768];
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
output.Write(buffer, 0, read);
}
I'm not generally a fan of using side-effects in conditions, but effectively the Read method is giving you two bits of data: whether or not you've reached the end of the stream, and how much you've read. The while loop is now saying, "While we've managed to read some data... copy it."
It's a little bit like using int.TryParse:
if (int.TryParse(text, out value))
{
// Use value
}
Again you're using a side-effect of calling the method in the condition. As I say, I don't make a habit out of doing this except for this particular pattern, when you're dealing with a method returning two bits of data.
The same thing comes up reading lines from a TextReader:
string line;
while ((line = reader.ReadLine()) != null)
{
...
}
To go back to your original question: if a variable is going to be initialized in every iteration of a loop and it's only used within the body of the loop, I'd almost always declare it within the loop. One minor exception here is if the variable is being captured by an anonymous function - at that point it will make a difference in behaviour, and I'd pick whichever form gave me the desired behaviour... but that's almost always the "declare inside" form anyway.
EDIT: When it comes to scoping, the code above does indeed leave the variable in a larger scope than it needs to be... but I believe it makes the loop clearer. You can always address this by introducing a new scope if you care to:
{
int read;
while (...)
{
}
}

Even in the unlikely environment where the compiler doesn't handle this for you, it would still be a micro-optimization. Factors like clarity and proper scoping are much more important than the edge case where this might make next to no difference.
You should give your variables proper scope without thinking about performance. Of course, complex initializations are a different beast, so if something should only be initialized once but is only used within a loop, you'd still want to declare it outside.
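For example (a hypothetical sketch, not code from the question): a Regex is comparatively expensive to build, so it belongs outside the loop even though it is only used inside it, while the cheap per-iteration Match result stays inside:
using System.Text.RegularExpressions;

// 'lines' is assumed to be some IEnumerable<string> already in scope.
Regex digits = new Regex(@"\d+", RegexOptions.Compiled); // initialized once, outside the loop
foreach (string line in lines)
{
    Match match = digits.Match(line); // cheap, re-created each iteration, scoped to the loop body
    if (match.Success)
    {
        Console.WriteLine(match.Value);
    }
}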

I am going to agree with most of these other answers with a caveat.
If you are using lambda expressions, you must be careful about captured variables.
static void Main(string[] args)
{
    var a = Enumerable.Range(1, 3);
    var b = a.GetEnumerator();
    int x;
    while (b.MoveNext())
    {
        x = b.Current;
        Task.Factory.StartNew(() => Console.WriteLine(x));
    }
    Console.ReadLine();
}
will give the result
3
3
3
Where
static void Main(string[] args)
{
    var a = Enumerable.Range(1, 3);
    var b = a.GetEnumerator();
    while (b.MoveNext())
    {
        int x = b.Current;
        Task.Factory.StartNew(() => Console.WriteLine(x));
    }
    Console.ReadLine();
}
will give the result
1
2
3
or some order thereof. This is because when the task finally starts, it checks the current value of its reference to x. In the first example all three closures captured the same variable, whereas in the second example they each captured a different one.

As is the case with lots of simple optimizations like this, the compiler takes care of it for you. If you try both of these and look at the assemblies' IL in ildasm you can see that they both declare a single int32 read variable, although it does reorder the declarations:
.locals init ([0] int32 read,
              [1] uint8[] buffer,
              [2] bool CS$4$0000)

.locals init ([0] uint8[] buffer,
              [1] int32 read,
              [2] bool CS$4$0000)

It really doesn't matter, and if I was reviewing the code for that particular example, I wouldn't care either way.
However, be aware that the two can mean very different things if you end up capturing the 'read' variable in a closure.
See this excellent post from Eric Lippert where this issue comes up regarding foreach loops - Link

I've generally preferred the latter as a matter of personal habit because, even if .NET is smart enough, other environments in which I might work later may not be smart enough. It could be nothing more than compiling down to an extra line of code inside the loop to re-initialize the variable, but it's still overhead.
Even if they're identical for all measurable purposes in any given example, I would say the latter has less of a chance of causing problems in the long run.

Related

Can I have multiple arrays on the same memory in C#? [duplicate]

I know in C# we can always get the sub-array of a given array by using the Array.Copy() method. However, this will consume more memory and processing time, which is unnecessary in a read-only situation. For example, I'm writing a heavy-load network program which exchanges messages with other nodes in the cluster very frequently. The first 20 bytes of every message are the message header, while the remaining bytes make up the message body. Therefore, I will divide the received raw message into a header byte array and a body byte array in order to process them separately. However, this will obviously consume double the memory and extra time. In C, we can easily use a pointer and assign an offset to it to access different parts of the array.
For instance, in C language, if we have a char a[] = "ABCDEFGHIJKLMN", we can declare a char* ptr = a + 3 to represent the array DEFGHIJKLMN.
Is there a way to accomplish this in C#?
You might be interested in ArraySegment<T> or unsafe code.
ArraySegment<T> delimits a section of a one-dimensional array.
ArraySegment<T> usage example:
int[] array = { 10, 20, 30 };
ArraySegment<int> segment = new ArraySegment<int>(array, 1, 2);
// The segment contains offset = 1, count = 2 and range = { 20, 30 }
The unsafe keyword defines an unsafe context in which pointers can be used.
unsafe usage example:
int[] a = { 4, 5, 6, 7, 8 };
unsafe
{
    fixed (int* c = a)
    {
        // use the pointer
    }
}
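Applied to the message layout described in the question, a minimal sketch might look like this (rawMessage is assumed to hold one complete received message; ReceiveMessage is a hypothetical placeholder):
byte[] rawMessage = ReceiveMessage(); // hypothetical: one complete raw message
var header = new ArraySegment<byte>(rawMessage, 0, 20);
var body   = new ArraySegment<byte>(rawMessage, 20, rawMessage.Length - 20);

// Both segments share the original array; nothing is copied.
// Elements are reached through Offset, e.g. the first header byte:
byte firstHeaderByte = header.Array[header.Offset];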
First of all, you should consider whether this is premature optimization.
That said, there are several ways to reduce memory consumption if you are sure you really need to:
1) You may use Flyweight pattern https://en.wikipedia.org/wiki/Flyweight_pattern to pool duplicated resources.
2) You may try to use unsafe directive and manual pointer management.
3) You may just switch to C for this functionality and just call native code from your C# program.
In my experience, memory consumption for short-lived objects is not a big problem, so I'd just write the code with the flyweight pattern and profile the application afterwards.
Assuming you have a Message wrapper class in C#, why not just add a property on it called Header that returns the first 20 bytes?
You can easily accomplish this using Skip and Take as suggested by Jonathon Reinhart above if you have the entire initial array in memory, but it sounds like you may have it in a network stream, which means the property might be a little more involved, doing a read of the initial 20 bytes from the stream.
Something along the lines of:
class Message
{
    private readonly Stream _stream;
    private byte[] _inMemoryBytes = new byte[0];

    public Message(Stream stream)
    {
        _stream = stream;
    }

    public IEnumerable<byte> Header
    {
        get
        {
            if (_inMemoryBytes.Length >= 20)
                return _inMemoryBytes.Take(20);

            // Read the header from the stream the first time it is requested.
            _inMemoryBytes = new byte[20];
            _stream.Read(_inMemoryBytes, 0, 20);
            return _inMemoryBytes.Take(20);
        }
    }

    public IEnumerable<byte> FullMessage
    {
        get
        {
            // Read and return the whole message here; you might want to append to the data already read.
            throw new NotImplementedException();
        }
    }
}

Safely access data in MemoryStream

Assume that I have a MemoryStream and function that operates on bytes.
Current code is something like this:
void caller()
{
    MemoryStream ms = // not important
    func(ms.GetBuffer(), 0, (int)ms.Length);
}

void func(byte[] buffer, int offset, int length)
{
    // not important
}
I cannot change func, but I would like to minimize the possibility of the stream data being changed from within func.
How could / should I rewrite the code to be sure that the stream data won't be changed?
Or can this not be done?
EDIT:
I am sorry, I didn't mention that I would like to avoid making copies of the data.
Call .ToArray.
func(ms.GetBuffer().ToArray(), 0, (int)ms.Length);
From MSDN (emphasis mine):
Note that the buffer contains allocated bytes which might be unused.
For example, if the string "test" is written into the MemoryStream
object, the length of the buffer returned from GetBuffer is 256, not
4, with 252 bytes unused. To obtain only the data in the buffer, use
the ToArray method; however, ToArray creates a copy of the data in
memory.
Ideally you would change func to take an IEnumerable<byte>. Once a method has the array, you're trusting they won't modify the data if you don't want them to. If the contract was to provide IEnumerable<byte>, the implementer would have to decide if they need a copy to edit or not.
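For illustration, a sketch of what that contract could look like (the names here are made up):
// A read-only contract instead of handing out the raw buffer.
void func(IEnumerable<byte> data)
{
    foreach (byte b in data)
    {
        // consume b; the caller's buffer cannot be written to through this parameter
    }
}

// Caller exposes only the meaningful bytes (requires System.Linq):
func(ms.GetBuffer().Take((int)ms.Length));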
If you can't make a copy (ToArray, as suggested in other answers) and can't change the signature of the func function, the only thing left is to try to validate that the function did not change the data.
You may compute some sort of hash before/after the call and check whether it is the same. It will not guarantee that func did not change the underlying data (due to hash collisions), but at least it will give you a good chance of knowing if it happened. May be useful for non-production code...
The real solution is to either provide a copy of the data to untrusted code OR pass some wrapper interface/object that does not allow data changes (which requires signature changes/a rewrite for func).
Copy the data out of the stream by using ms.ToArray(). Obviously, there'll be a performance hit.
You cannot pass only a 'slice' of an array to a method. Either you pass a copy of the array to the method and copy the result back:
byte[] slice = new byte[length];
Buffer.BlockCopy(bytes, offset, slice, 0, length);
func(slice, 0, length);
Buffer.BlockCopy(slice, 0, bytes, offset, length);
or, if you can change the method, you pass some kind of proxy object that wraps the array and checks for each access if it's within the allowed range:
class ArrayView<T>
{
    private T[] array;
    private int offset;
    private int length;

    public ArrayView(T[] array, int offset, int length)
    {
        this.array = array;
        this.offset = offset;
        this.length = length;
    }

    public T this[int index]
    {
        get
        {
            if (index < offset || index >= offset + length)
                throw new ArgumentOutOfRangeException("index");
            return array[index];
        }
        set
        {
            if (index < offset || index >= offset + length)
                throw new ArgumentOutOfRangeException("index");
            array[index] = value;
        }
    }
}
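For illustration, usage might look like this (assuming func could be rewritten to accept the wrapper):
byte[] data = ms.GetBuffer();
var view = new ArrayView<byte>(data, 0, (int)ms.Length);
// Access through the wrapper is now bounds-checked against the allowed range:
byte first = view[0];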
Are you trying to make sure that func() is never actually able to change the memory stream, or is it enough if your code can throw an exception if something is changed? Sounds like you want to do something like:
void caller()
{
    MemoryStream ms = // not important
    var checksum = CalculateMyChecksum(ms);
    func(ms.GetBuffer(), 0, (int)ms.Length);

    if (checksum != CalculateMyChecksum(ms))
    {
        throw new Exception("Hey! Someone has been fiddling with my memory!");
    }
}
I would not feel comfortable recommending this for anything important / critical though. Could you give some more information? Maybe there is a better solution to your problem, and a way to avoid this issue completely.

Encryption Project: Need advice on how to eliminate method overhead

I am looking for advice. I have developed my own encryption algorithms because I enjoy it and I can. Now, I am looking to try a new idea.
My idea involves consolidating a number of my algorithms into a larger one. For instance, you call X.Encrypt() and it uses A.Encrypt(), B.Encrypt(), C.Encrypt(), etc. When you perform this kind of operation one byte per A, B, C method call, the method overhead becomes a killer, going from a few ms to several minutes. So, any questions?
I am merely looking for code design tips and tricks to maybe lessen the issue.
Thanks ahead of time.
Update
Code example of the issue:
// fast
moduleA.Transform(true, buffer, 0, buffer.Length);
moduleB.Transform(true, buffer, 0, buffer.Length);

// slow
for (int L = 0; L < buffer.Length; )
{
    moduleA.Transform(true, buffer, L++, 1);
    moduleB.Transform(true, buffer, L++, 1);
}
I know this problem is inherent to how it is being called. My goal is to change how I am doing it. I know inside the Transform methods there can be improvement. The fast operates in about 24s while the slow takes many minutes. Clearly, overhead from the methods, no profiler needed :)
I do have an idea I am going to try. I am thinking about using "run-modes" where, instead of looping outside of the Transform methods, I change how each method runs internally to fit my needs. So, I could do an every-other-byte encryption inside the Transform methods, as a batch. I believe this would eliminate the overhead I am getting.
FINAL UPDATE (Solved my own issue, still open to ideas!)
Incrementing the loop rate inside the Transform method has worked!
What I've done is the following and it seems to work well:
ITransformationModule moduleA = TransformationFactory.GetModuleInstance("Subspace28");
ITransformationModule moduleB = TransformationFactory.GetModuleInstance("Ataxia");
moduleA.IncrementInterval = 2;
moduleB.IncrementInterval = 2;
moduleA.Transform(true, buffer, 0, buffer.Length);
moduleB.Transform(true, buffer, 1, buffer.Length);
This runs at about 12s for 100MB on my work VM. Thank you all who contributed! It was a combination of responses that helped lead me to try it this way. I appreciate you all greatly!
This is just proof of concept at the moment. It is building towards greater things! :)
Are you encrypting the data by calling methods on a byte-by-byte basis? Why not call the method on a chunk of data and loop within that method? Also, while it is definitely fun to try out your own encryption methods, you should pretty much always use a known, tested, and secure algorithm if security is at all a concern.
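The difference being suggested, as a rough sketch (cipher, TransformByte and TransformBlock are hypothetical names, not the poster's actual API):
// Chatty: one managed call per byte; call overhead dominates the work done.
for (int i = 0; i < buffer.Length; i++)
{
    cipher.TransformByte(buffer, i);
}

// Chunky: one call per buffer; the per-byte loop lives inside the method.
cipher.TransformBlock(buffer, 0, buffer.Length);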
You could try to implement your algorithm such that your code makes chunky calls rather than chatty calls. That is, instead of calling functions hundreds of times, you could have fewer function calls such that each function has more work to do. This is one piece of advice; you might have to make your algorithm efficient as well so that it's not processor-intensive. Hope this helps.
You want to have class X call methods from class A, B, C, D, E, F, G, etc...without the method call overhead. At first, that seems absurd. You might be able to find a way to do it using System.Reflection.Emit. That is, dynamically create a method that does A+B+C+D+E+F+G, then call that.
Firstly profile your code so you know where you should operate first, then ask again :)
Would something like this work? Of course you would have to modify it to fit your encryption arguments and return types....
static class Encryptor
{
    delegate void Transform(bool b, byte[] buffer, int index, int length);

    static Transform[] transformers = new Transform[3];

    static Encryptor()
    {
        transformers[0] = (b, buffer, index, length) => { /* Method A */ };
        transformers[1] = (b, buffer, index, length) => { /* Method B */ };
        transformers[2] = (b, buffer, index, length) => { /* Method C */ };
    }

    public static void Encrypt(bool b, byte[] buffer)
    {
        int length = buffer.Length;
        int nTransforms = transformers.Length;
        for (int i = 0; i < length;)
        {
            for (int j = 0; j < nTransforms; j++)
            {
                transformers[i % nTransforms](b, buffer, i++, 1);
            }
        }
    }
}
Edit: So this would do the second example:
Encryptor.Encrypt(yourBoolean, yourBuffer);
I don't know the specifics of your implementation, but this shouldn't have overhead issues.

Working with byte arrays in C#

I have a byte array that represents a complete TCP/IP packet. For clarification, the byte array is ordered like this:
(IP Header - 20 bytes)(TCP Header - 20 bytes)(Payload - X bytes)
I have a Parse function that accepts a byte array and returns a TCPHeader object. It looks like this:
TCPHeader Parse( byte[] buffer );
Given the original byte array, here is the way I'm calling this function right now.
byte[] tcpbuffer = new byte[ 20 ];
System.Buffer.BlockCopy( packet, 20, tcpbuffer, 0, 20 );
TCPHeader tcp = Parse( tcpbuffer );
Is there a convenient way to pass the TCP byte array, i.e., bytes 20-39 of the complete TCP/IP packet, to the Parse function without extracting it to a new byte array first?
In C++, I could do the following:
TCPHeader tcp = Parse( &packet[ 20 ] );
Is there anything similar in C#? I want to avoid the creation and subsequent garbage collection of the temporary byte array if possible.
A common practice you can see in the .NET framework, and that I recommend using here, is specifying the offset and length. So make your Parse function also accept the offset in the passed array, and the number of elements to use.
Of course, the same rules apply as if you were to pass a pointer like in C++ - the array shouldn't be modified or else it may result in undefined behavior if you are not sure when exactly the data will be used. But this is no problem if you are no longer going to be modifying the array.
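With that convention, the call from the question could become something like this (a sketch):
// No temporary array: Parse works directly against 'packet' starting at byte 20.
TCPHeader tcp = Parse(packet, 20, 20);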
I would pass an ArraySegment<byte> in this case.
You would change your Parse method to this:
// Changed TCPHeader to TcpHeader to adhere to public naming conventions.
TcpHeader Parse(ArraySegment<byte> buffer)
And then you would change the call to this:
// Create the array segment.
ArraySegment<byte> seg = new ArraySegment<byte>(packet, 20, 20);
// Call parse.
TcpHeader header = Parse(seg);
Using the ArraySegment<T> will not copy the array, and it will do the bounds checking for you in the constructor (so that you don't specify incorrect bounds). Then you change your Parse method to work with the bounds specified in the segment, and you should be ok.
You can even create a convenience overload that will accept the full byte array:
// Accepts full array.
TcpHeader Parse(byte[] buffer)
{
// Call the overload.
return Parse(new ArraySegment<byte>(buffer));
}
// Changed TCPHeader to TcpHeader to adhere to public naming conventions.
TcpHeader Parse(ArraySegment<byte> buffer)
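Inside Parse, working against the segment's bounds might look roughly like this (a sketch; the TcpHeader members shown are hypothetical):
TcpHeader Parse(ArraySegment<byte> buffer)
{
    byte[] array = buffer.Array;
    int start = buffer.Offset; // first byte of the TCP header within the packet

    // Example: source port is the first two bytes, big-endian.
    int sourcePort = (array[start] << 8) | array[start + 1];
    // ... decode the remaining fields relative to 'start', staying within buffer.Count bytes ...

    return new TcpHeader { SourcePort = sourcePort };
}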
If an IEnumerable<byte> is acceptable as an input rather than byte[], and you're using C# 3.0, then you could write:
tcpbuffer.Skip(20).Take(20);
Note that this still allocates enumerator instances under the covers, so you don't escape allocation altogether, and so for a small number of bytes it may actually be slower than allocating a new array and copying the bytes into it.
I wouldn't worry too much about allocation and GC of small temporary arrays to be honest though. The .NET garbage collected environment is extremely efficient at this type of allocation pattern, particularly if the arrays are short lived, so unless you've profiled it and found GC to be a problem then I'd write it in the most intuitive way and fix up performance issues when you know you have them.
If you really need these kind of control, you gotta look at unsafe feature of C#. It allows you to have a pointer and pin it so that GC doesn't move it:
fixed(byte* b = &bytes[20]) {
}
However this practice is not suggested for working with managed only code if there are no performance issues. You could pass the offset and length as in Stream class.
If you can change the parse() method, change it to accept the offset where the processing should begin.
TCPHeader Parse( byte[] buffer , int offset);
You could use LINQ to do something like:
tcpbuffer.Skip(20).Take(20);
But System.Buffer.BlockCopy / System.Array.Copy are probably more efficient.
This is how I solved it, coming from being a C programmer to a C# programmer. I like to use a MemoryStream to convert the buffer to a stream and then a BinaryReader to break apart the binary block of data. I had to add the two helper functions to convert from network order to little endian. Also, for building a byte[] to send, see
"Is there a way cast an object back to it original type without specifing every case?", which has a function that allows converting from an array of objects to a byte[].
Hashtable parse(byte[] buf, int offset)
{
    Hashtable tcpheader = new Hashtable();
    if (buf.Length < (20 + offset)) return tcpheader;

    System.IO.MemoryStream stm = new System.IO.MemoryStream(buf, offset, buf.Length - offset);
    System.IO.BinaryReader rdr = new System.IO.BinaryReader(stm);

    tcpheader["SourcePort"] = ReadUInt16BigEndian(rdr);
    tcpheader["DestPort"] = ReadUInt16BigEndian(rdr);
    tcpheader["SeqNum"] = ReadUInt32BigEndian(rdr);
    tcpheader["AckNum"] = ReadUInt32BigEndian(rdr);
    tcpheader["Offset"] = rdr.ReadByte() >> 4;
    tcpheader["Flags"] = rdr.ReadByte() & 0x3f;
    tcpheader["Window"] = ReadUInt16BigEndian(rdr);
    tcpheader["Checksum"] = ReadUInt16BigEndian(rdr);
    tcpheader["UrgentPointer"] = ReadUInt16BigEndian(rdr);
    // ignoring tcp options in header might be dangerous

    return tcpheader;
}

UInt16 ReadUInt16BigEndian(BinaryReader rdr)
{
    UInt16 res = (UInt16)(rdr.ReadByte());
    res <<= 8;
    res |= rdr.ReadByte();
    return (res);
}

UInt32 ReadUInt32BigEndian(BinaryReader rdr)
{
    UInt32 res = (UInt32)(rdr.ReadByte());
    res <<= 8;
    res |= rdr.ReadByte();
    res <<= 8;
    res |= rdr.ReadByte();
    res <<= 8;
    res |= rdr.ReadByte();
    return (res);
}
I don't think you can do something like that in C#. You could either make the Parse() function use an offset, or create 3 byte arrays to begin with; one for the IP Header, one for the TCP Header and one for the Payload.
There is no way using verifiable code to do this. If your Parse method can deal with having an IEnumerable<byte> then you can use a LINQ expression
TCPHeader tcp = Parse(packet.Skip(20));
Some people who answered with
tcpbuffer.Skip(20).Take(20);
got it slightly wrong. Skip and Take are an excellent solution, but the code should look like:
packet.Skip(20).Take(20);
You should use the Skip and Take methods on your main packet, and tcpbuffer should not exist in the code you posted. You also don't have to use System.Buffer.BlockCopy then.
JaredPar was almost correct, but he forgot the Take method:
TCPHeader tcp = Parse(packet.Skip(20));
However, he didn't get it wrong with tcpbuffer.
The last line of your posted code should look like:
TCPHeader tcp = Parse(packet.Skip(20).Take(20));
But if you want to use System.Buffer.BlockCopy anyway instead of Skip and Take (maybe because it is more efficient, as Steven Robbins answered: "But System.Buffer.BlockCopy / System.Array.Copy are probably more efficient", or because your Parse function cannot deal with IEnumerable<byte>, or because you are simply more used to System.Buffer.BlockCopy), then I would recommend not making tcpbuffer a local variable. Instead, declare it as a field (private, protected, public or internal, static or not), so it is defined and created outside the method where your posted code is executed. That way tcpbuffer is created only once, and its values (bytes) are set every time execution passes the System.Buffer.BlockCopy line.
This way your code can look like:
class Program
{
    // Your defined fields, properties, methods, constructors, delegates, events, etc.

    private byte[] tcpbuffer = new byte[20];

    Your unposted method title(arguments/parameters...)
    {
        // Your unposted code before your posted code

        // byte[] tcpbuffer = new byte[20]; No need anymore! This line can be removed.
        System.Buffer.BlockCopy(packet, 20, this.tcpbuffer, 0, 20);
        TCPHeader tcp = Parse(this.tcpbuffer);

        // Your unposted code after your posted code
    }

    // Your defined fields, properties, methods, constructors, delegates, events, etc.
}
or simply only the necessary part:
private byte[] tcpbuffer = new byte[20];
...
{
...
//byte[] tcpbuffer = new byte[ 20 ]; No need anymore! This line can be removed.
System.Buffer.BlockCopy( packet, 20, this.tcpbuffer, 0, 20 );
TCPHeader tcp = Parse( this.tcpbuffer );
...
}
If you did:
private byte[] tcpbuffer;
instead, then you must on your constructor/s add the line:
this.tcpbuffer = new byte[20];
or
tcpbuffer = new byte[20];
Note that you don't have to type this. before tcpbuffer; it is optional. But if you defined the field as static, then you cannot use this. at all; instead you type the class name followed by a dot '.', or just the field name on its own.
Why not flip the problem and create classes that overlay the buffer to pull bits out?
// member variables
IPHeader ipHeader = new IPHeader();
TCPHeader tcpHeader = new TCPHeader();
// passing in the buffer, an offset and a length allows you
// to move the header over the buffer
ipHeader.SetBuffer( buffer, 0, 20 );
if( ipHeader.Protocol == TCP )
{
tcpHeader.SetBuffer( buffer, ipHeader.ProtocolOffset, 20 );
}
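A minimal sketch of what such an overlay class could look like (illustrative only; the poster's actual TCPHeader implementation isn't shown):
class TCPHeader
{
    private byte[] buffer;
    private int offset;

    public void SetBuffer(byte[] buffer, int offset, int length)
    {
        this.buffer = buffer;
        this.offset = offset;
    }

    // Fields are decoded on demand straight out of the shared buffer; nothing is copied.
    public ushort SourcePort
    {
        get { return (ushort)((buffer[offset] << 8) | buffer[offset + 1]); }
    }

    public ushort DestinationPort
    {
        get { return (ushort)((buffer[offset + 2] << 8) | buffer[offset + 3]); }
    }
}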

What is the worst gotcha in C# or .NET? [closed]

I was recently working with a DateTime object, and wrote something like this:
DateTime dt = DateTime.Now;
dt.AddDays(1);
return dt; // still today's date! WTF?
The intellisense documentation for AddDays() says it adds a day to the date, which it doesn't - it actually returns a date with a day added to it, so you have to write it like:
DateTime dt = DateTime.Now;
dt = dt.AddDays(1);
return dt; // tomorrow's date
This one has bitten me a number of times before, so I thought it would be useful to catalog the worst C# gotchas.
private int myVar;

public int MyVar
{
    get { return MyVar; }
}
Blammo. Your app crashes with no stack trace. Happens all the time.
(Notice capital MyVar instead of lowercase myVar in the getter.)
Type.GetType
The one which I've seen bite lots of people is Type.GetType(string). They wonder why it works for types in their own assembly, and some types like System.String, but not System.Windows.Forms.Form. The answer is that it only looks in the current assembly and in mscorlib.
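The usual fixes are to give an assembly-qualified name or to use typeof; for example (the exact version and public key token depend on the framework you target):
// Returns null: Form lives in neither the calling assembly nor mscorlib.
Type t1 = Type.GetType("System.Windows.Forms.Form");

// Works: assembly-qualified name.
Type t2 = Type.GetType(
    "System.Windows.Forms.Form, System.Windows.Forms, Version=4.0.0.0, " +
    "Culture=neutral, PublicKeyToken=b77a5c561934e089");

// Also works: resolved at compile time.
Type t3 = typeof(System.Windows.Forms.Form);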
Anonymous methods
C# 2.0 introduced anonymous methods, leading to nasty situations like this:
using System;
using System.Threading;

class Test
{
    static void Main()
    {
        for (int i = 0; i < 10; i++)
        {
            ThreadStart ts = delegate { Console.WriteLine(i); };
            new Thread(ts).Start();
        }
    }
}
What will that print out? Well, it entirely depends on the scheduling. It will print 10 numbers, but it probably won't print 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 which is what you might expect. The problem is that it's the i variable which has been captured, not its value at the point of the creation of the delegate. This can be solved easily with an extra local variable of the right scope:
using System;
using System.Threading;

class Test
{
    static void Main()
    {
        for (int i = 0; i < 10; i++)
        {
            int copy = i;
            ThreadStart ts = delegate { Console.WriteLine(copy); };
            new Thread(ts).Start();
        }
    }
}
Deferred execution of iterator blocks
This "poor man's unit test" doesn't pass - why not?
using System;
using System.Collections.Generic;
using System.Diagnostics;

class Test
{
    static IEnumerable<char> CapitalLetters(string input)
    {
        if (input == null)
        {
            throw new ArgumentNullException(input);
        }
        foreach (char c in input)
        {
            yield return char.ToUpper(c);
        }
    }

    static void Main()
    {
        // Test that null input is handled correctly
        try
        {
            CapitalLetters(null);
            Console.WriteLine("An exception should have been thrown!");
        }
        catch (ArgumentNullException)
        {
            // Expected
        }
    }
}
The answer is that the code within the source of the CapitalLetters code doesn't get executed until the iterator's MoveNext() method is first called.
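The usual fix is to split the method in two so the argument check runs eagerly and only the iterator part is deferred, something like:
static IEnumerable<char> CapitalLetters(string input)
{
    if (input == null)
    {
        throw new ArgumentNullException("input"); // now thrown at call time
    }
    return CapitalLettersImpl(input);
}

static IEnumerable<char> CapitalLettersImpl(string input)
{
    foreach (char c in input)
    {
        yield return char.ToUpper(c);
    }
}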
I've got some other oddities on my brainteasers page.
The Heisenberg Watch Window
This can bite you badly if you're doing load-on-demand stuff, like this:
private MyClass _myObj;
public MyClass MyObj
{
    get
    {
        if (_myObj == null)
            _myObj = CreateMyObj(); // some other code to create my object
        return _myObj;
    }
}
Now let's say you have some code elsewhere using this:
// blah
// blah
MyObj.DoStuff(); // Line 3
// blah
Now you want to debug your CreateMyObj() method. So you put a breakpoint on Line 3 above, with intention to step into the code. Just for good measure, you also put a breakpoint on the line above that says _myObj = CreateMyObj();, and even a breakpoint inside CreateMyObj() itself.
The code hits your breakpoint on Line 3. You step into the code. You expect to enter the conditional code, because _myObj is obviously null, right? Uh... so... why did it skip the condition and go straight to return _myObj?! You hover your mouse over _myObj... and indeed, it does have a value! How did THAT happen?!
The answer is that your IDE caused it to get a value, because you have a "watch" window open - especially the "Autos" watch window, which displays the values of all variables/properties relevant to the current or previous line of execution. When you hit your breakpoint on Line 3, the watch window decided that you would be interested to know the value of MyObj - so behind the scenes, ignoring any of your breakpoints, it went and calculated the value of MyObj for you - including the call to CreateMyObj() that sets the value of _myObj!
That's why I call this the Heisenberg Watch Window - you cannot observe the value without affecting it... :)
GOTCHA!
Edit - I feel #ChristianHayter's comment deserves inclusion in the main answer, because it looks like an effective workaround for this issue. So anytime you have a lazy-loaded property...
Decorate your property with [DebuggerBrowsable(DebuggerBrowsableState.Never)] or [DebuggerDisplay("<loaded on demand>")]. – Christian Hayter
Re-throwing exceptions
A gotcha that gets lots of new developers is exception re-throw semantics.
Lots of the time I see code like the following:
catch (Exception e)
{
    // Do stuff
    throw e;
}
The problem is that it wipes the stack trace and makes diagnosing issues much harder, because you cannot track where the exception originated.
The correct code is either the throw statement with no args:
catch (Exception)
{
    throw;
}
Or wrapping the exception in another one, and using inner exception to get the original stack trace:
catch (Exception e)
{
    // Do stuff
    throw new MySpecialException(e);
}
Here's another time-related one that gets me:
static void PrintHowLong(DateTime a, DateTime b)
{
    TimeSpan span = a - b;
    Console.WriteLine(span.Seconds);      // WRONG!
    Console.WriteLine(span.TotalSeconds); // RIGHT!
}
TimeSpan.Seconds is the seconds portion of the timespan (2 minutes and 0 seconds has a seconds value of 0).
TimeSpan.TotalSeconds is the entire timespan measured in seconds (2 minutes has a total seconds value of 120).
Leaking memory because you didn't un-hook events.
This even caught out some senior developers I know.
Imagine a WPF form with lots of things in it, and somewhere in there you subscribe to an event. If you don't unsubscribe then the entire form is kept around in memory after being closed and de-referenced.
I believe the issue I saw was creating a DispatcherTimer in the WPF form and subscribing to the Tick event: if you don't do a -= on the timer, your form leaks memory!
In this example your teardown code should have
timer.Tick -= TimerTickEventHandler;
This one is especially tricky since you created the instance of the DispatcherTimer inside the WPF form, so you would think that it would be an internal reference handled by the garbage collection process... unfortunately the DispatcherTimer uses a static internal list of subscriptions and services requests on the UI thread, so the reference is 'owned' by the static class.
Maybe not really a gotcha because the behavior is clearly documented in MSDN, but it bit me badly once because I found it rather counter-intuitive:
Image image = System.Drawing.Image.FromFile("nice.pic");
This guy leaves the "nice.pic" file locked until the image is disposed. At the time I faced it, I thought it would be nice to load icons on the fly and didn't realize (at first) that I ended up with dozens of open and locked files! Image keeps track of where it had loaded the file from...
How to solve this? I thought a one-liner would do the job. I expected an extra parameter for FromFile(), but there was none, so I wrote this...
using (Stream fs = new FileStream("nice.pic", FileMode.Open, FileAccess.Read))
{
    image = System.Drawing.Image.FromStream(fs);
}
If you count ASP.NET, I'd say the webforms lifecycle is a pretty big gotcha to me. I've spent countless hours debugging poorly written webforms code, just because a lot of developers just don't really understand when to use which event handler (me included, sadly).
Overloaded == operators and untyped containers (ArrayList, DataSet, etc.):
string my = "my ";
Debug.Assert(my+"string" == "my string"); //true
var a = new ArrayList();
a.Add(my+"string");
a.Add("my string");
// uses ==(object) instead of ==(string)
Debug.Assert(a[1] == "my string"); // true, due to interning magic
Debug.Assert(a[0] == "my string"); // false
Solutions?
Always use string.Equals(a, b) when you are comparing string types.
Use generics like List<string> to ensure that both operands are strings.
[Serializable]
class Hello
{
    readonly object accountsLock = new object();
}

// Do stuff to deserialize Hello with BinaryFormatter
// and now... accountsLock == null ;)
Moral of the story: field initializers are not run when deserializing an object.
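A common workaround is to mark such fields [NonSerialized] and rebuild them in a deserialization callback, roughly like this (note the field can no longer be readonly if it is reassigned in the callback):
using System.Runtime.Serialization;

[Serializable]
class Hello
{
    [NonSerialized]
    private object accountsLock = new object();

    [OnDeserialized]
    private void OnDeserialized(StreamingContext context)
    {
        // Field initializers don't run during deserialization, so recreate the lock here.
        accountsLock = new object();
    }
}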
DateTime.ToString("dd/MM/yyyy"); This will actually not always give you dd/MM/yyyy but instead it will take into account the regional settings and replace your date separator depending on where you are. So you might get dd-MM-yyyy or something alike.
The right way to do this is to use DateTime.ToString("dd'/'MM'/'yyyy");
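Another option is to pin the culture explicitly when formatting, which avoids having to escape the separator:
// Formats with a literal '/' regardless of the current culture's date separator.
string s = DateTime.Now.ToString("dd/MM/yyyy", System.Globalization.CultureInfo.InvariantCulture);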
DateTime.ToString("r") is supposed to convert to RFC1123, which uses GMT. GMT is within a fraction of a second from UTC, and yet the "r" format specifier does not convert to UTC, even if the DateTime in question is specified as Local.
This results in the following gotcha (varies depending on how far your local time is from UTC):
DateTime.Parse("Tue, 06 Sep 2011 16:35:12 GMT").ToString("r")
> "Tue, 06 Sep 2011 17:35:12 GMT"
Whoops!
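The workaround is to convert to UTC yourself before formatting:
DateTime.Parse("Tue, 06 Sep 2011 16:35:12 GMT").ToUniversalTime().ToString("r")
> "Tue, 06 Sep 2011 16:35:12 GMT"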
I saw this one posted the other day, and I think it is pretty obscure, and painful for those that don't know
int x = 0;
x = x++;
return x;
As that will return 0 and not 1 as most would expect
I'm a bit late to this party, but I have two gotchas that have both bitten me recently:
DateTime resolution
The Ticks property measures time in 10-millionths of a second (100-nanosecond blocks); however, the resolution is not 100 nanoseconds, it's about 15 ms.
This code:
long now = DateTime.Now.Ticks;
for (int i = 0; i < 10; i++)
{
    System.Threading.Thread.Sleep(1);
    Console.WriteLine(DateTime.Now.Ticks - now);
}
will give you an output of (for example):
0
0
0
0
0
0
0
156254
156254
156254
Similarly, if you look at DateTime.Now.Millisecond, you'll get values in rounded chunks of 15.625ms: 15, 31, 46, etc.
This particular behaviour varies from system to system, but there are other resolution-related gotchas in this date/time API.
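If you actually need to measure short intervals, Stopwatch (which uses the high-resolution performance counter when one is available) is the better tool:
using System.Diagnostics;

var sw = Stopwatch.StartNew();
System.Threading.Thread.Sleep(1);
sw.Stop();
// Far finer-grained than sampling DateTime.Now.Ticks.
Console.WriteLine(sw.Elapsed.TotalMilliseconds);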
Path.Combine
A great way to combine file paths, but it doesn't always behave the way you'd expect.
If the second parameter starts with a \ character, it won't give you a complete path:
This code:
string prefix1 = "C:\\MyFolder\\MySubFolder";
string prefix2 = "C:\\MyFolder\\MySubFolder\\";
string suffix1 = "log\\";
string suffix2 = "\\log\\";
Console.WriteLine(Path.Combine(prefix1, suffix1));
Console.WriteLine(Path.Combine(prefix1, suffix2));
Console.WriteLine(Path.Combine(prefix2, suffix1));
Console.WriteLine(Path.Combine(prefix2, suffix2));
Gives you this output:
C:\MyFolder\MySubFolder\log\
\log\
C:\MyFolder\MySubFolder\log\
\log\
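If you can't control the incoming strings, trimming the leading separator before combining is a common workaround:
// Treats suffix2 as relative even though it arrives with a leading backslash.
Console.WriteLine(Path.Combine(prefix1, suffix2.TrimStart('\\', '/')));
// C:\MyFolder\MySubFolder\log\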
When you start a process (using System.Diagnostics) that writes to the console, but you never read its redirected standard output stream, after a certain amount of output your app will appear to hang.
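The usual way to avoid this is to drain the redirected output as it is produced, for example with the async events (a sketch; "child.exe" is a placeholder):
var psi = new System.Diagnostics.ProcessStartInfo("child.exe")
{
    RedirectStandardOutput = true,
    RedirectStandardError = true,
    UseShellExecute = false
};

using (var process = System.Diagnostics.Process.Start(psi))
{
    // Consume output as it arrives so the child never blocks on a full pipe.
    process.OutputDataReceived += (s, e) => { if (e.Data != null) Console.WriteLine(e.Data); };
    process.ErrorDataReceived += (s, e) => { /* log or discard */ };
    process.BeginOutputReadLine();
    process.BeginErrorReadLine();
    process.WaitForExit();
}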
No operator shortcuts in Linq-To-Sql
See here.
In short, inside the conditional clause of a Linq-To-Sql query, you cannot use conditional shortcuts like || and && to avoid null reference exceptions; Linq-To-Sql evaluates both sides of the OR or AND operator even if the first condition obviates the need to evaluate the second condition!
Using default parameters with virtual methods
abstract class Base
{
    public virtual void foo(string s = "base") { Console.WriteLine("base " + s); }
}

class Derived : Base
{
    public override void foo(string s = "derived") { Console.WriteLine("derived " + s); }
}

...

Base b = new Derived();
b.foo();
Output:
derived base
Value objects in mutable collections
struct Point { ... }
List<Point> mypoints = ...;
mypoints[i].x = 10;
has no effect.
mypoints[i] returns a copy of a Point value object. C# happily lets you modify a field of the copy. Silently doing nothing.
Update:
This appears to be fixed in C# 3.0:
Cannot modify the return value of 'System.Collections.Generic.List<Foo>.this[int]' because it is not a variable
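The workaround is to copy the struct out, modify the copy, and store it back:
Point p = mypoints[i]; // copy the value out
p.x = 10;              // modify the copy
mypoints[i] = p;       // store the modified copy back into the list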
Perhaps not the worst, but some parts of the .net framework use degrees while others use radians (and the documentation that appears with Intellisense never tells you which, you have to visit MSDN to find out)
All of this could have been avoided by having an Angle class instead...
For C/C++ programmers, the transition to C# is a natural one. However, the biggest gotcha I've run into personally (and have seen with others making the same transition) is not fully understanding the difference between classes and structs in C#.
In C++, classes and structs are identical; they only differ in the default visibility, where classes default to private visibility and structs default to public visibility. In C++, this class definition
class A
{
public:
    int i;
};
is functionally equivalent to this struct definition.
struct A
{
    int i;
};
In C#, however, classes are reference types while structs are value types. This makes a BIG difference in (1) deciding when to use one over the other, (2) testing object equality, (3) performance (e.g., boxing/unboxing), etc.
There is all kinds of information on the web related to the differences between the two (e.g., here). I would highly encourage anyone making the transition to C# to at least have a working knowledge of the differences and their implications.
Garbage collection and Dispose(). Although you don't have to do anything to free up memory, you still have to free up resources via Dispose(). This is an immensely easy thing to forget when you are using WinForms, or tracking objects in any way.
Arrays implement IList
But don't implement it. When you call Add, it tells you that it doesn't work. So why does a class implement an interface when it can't support it?
Compiles, but doesn't work:
IList<int> myList = new int[] { 1, 2, 4 };
myList.Add(5);
We have this issue a lot, because the serializer (WCF) turns all the ILists into arrays and we get runtime errors.
foreach loop variable scoping!
var l = new List<Func<string>>();
var strings = new[] { "Lorem", "ipsum", "dolor", "sit", "amet" };

foreach (var s in strings)
{
    l.Add(() => s);
}

foreach (var a in l)
    Console.WriteLine(a());
prints five "amet", while the following example works fine
var l = new List<Func<string>>();
var strings = new[] { "Lorem", "ipsum", "dolor", "sit", "amet" };

foreach (var s in strings)
{
    var t = s;
    l.Add(() => t);
}

foreach (var a in l)
    Console.WriteLine(a());
MS SQL Server can't handle dates before 1753. Significantly, that is out of sync with the .NET DateTime.MinValue, which is 1/1/0001. So if you try to save a min date, a malformed date (as recently happened to me in a data import), or simply the birth date of William the Conqueror, you're going to be in trouble. There is no built-in workaround for this; if you're likely to need to work with dates before 1753, you need to write your own workaround.
The contract on Stream.Read is something that I've seen trip up a lot of people:
// Read 8 bytes and turn them into a ulong
byte[] data = new byte[8];
stream.Read(data, 0, 8); // <-- WRONG!
ulong value = BitConverter.ToUInt64(data, 0);
The reason this is wrong is that Stream.Read will read at most the specified number of bytes, but is entirely free to read just 1 byte, even if another 7 bytes are available before end of stream.
It doesn't help that this looks so similar to Stream.Write, which is guaranteed to have written all the bytes if it returns with no exception. It also doesn't help that the above code works almost all the time. And of course it doesn't help that there is no ready-made, convenient method for reading exactly N bytes correctly.
So, to plug the hole, and increase awareness of this, here is an example of a correct way to do this:
/// <summary>
/// Attempts to fill the buffer with the specified number of bytes from the
/// stream. If there are fewer bytes left in the stream than requested then
/// all available bytes will be read into the buffer.
/// </summary>
/// <param name="stream">Stream to read from.</param>
/// <param name="buffer">Buffer to write the bytes to.</param>
/// <param name="offset">Offset at which to write the first byte read from
/// the stream.</param>
/// <param name="length">Number of bytes to read from the stream.</param>
/// <returns>Number of bytes read from the stream into buffer. This may be
/// less than requested, but only if the stream ended before the
/// required number of bytes were read.</returns>
public static int FillBuffer(this Stream stream,
                             byte[] buffer, int offset, int length)
{
    int totalRead = 0;
    while (length > 0)
    {
        var read = stream.Read(buffer, offset, length);
        if (read == 0)
            return totalRead;
        offset += read;
        length -= read;
        totalRead += read;
    }
    return totalRead;
}
/// <summary>
/// Attempts to read the specified number of bytes from the stream. If
/// there are fewer bytes left before the end of the stream, a shorter
/// (possibly empty) array is returned.
/// </summary>
/// <param name="stream">Stream to read from.</param>
/// <param name="length">Number of bytes to read from the stream.</param>
public static byte[] Read(this Stream stream, int length)
{
    byte[] buf = new byte[length];
    int read = stream.FillBuffer(buf, 0, length);
    if (read < length)
        Array.Resize(ref buf, read);
    return buf;
}
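With those helpers in place, the original snippet can be written correctly, for example:
// Read exactly 8 bytes, or find out that the stream ended early.
byte[] data = new byte[8];
if (stream.FillBuffer(data, 0, 8) < 8)
    throw new EndOfStreamException();
ulong value = BitConverter.ToUInt64(data, 0);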
The Nasty Linq Caching Gotcha
See my question that led to this discovery, and the blogger who discovered the problem.
In short, the DataContext keeps a cache of all Linq-to-Sql objects that you have ever loaded. If anyone else makes any changes to a record that you have previously loaded, you will not be able to get the latest data, even if you explicitly reload the record!
This is because of a property called ObjectTrackingEnabled on the DataContext, which by default is true. If you set that property to false, the record will be loaded anew every time... BUT... you can't persist any changes to that record with SubmitChanges().
GOTCHA!
Events
I never understood why events are a language feature. They are complicated to use: you need to check for null before calling, you need to unregister (yourself), you can't find out who is registered (eg: did I register?). Why isn't an event just a class in the library? Basically a specialized List<delegate>?
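For anyone who hasn't met it, the boilerplate being complained about looks something like this:
public event EventHandler SomethingHappened;

protected void OnSomethingHappened()
{
    // Copy to a local so a concurrent unsubscribe can't null the field
    // between the null check and the invocation.
    EventHandler handler = SomethingHappened;
    if (handler != null)
    {
        handler(this, EventArgs.Empty);
    }
}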
Today I fixed a bug that had eluded me for a long time. The bug was in a generic class that was used in a multi-threaded scenario, and a static int field was used to provide lock-free synchronisation using Interlocked. The bug was caused because each instantiation of the generic class for a type has its own static. So each constructed type got its own static field, and it didn't act as the shared lock that was intended.
class SomeGeneric<T>
{
    public static int i = 0;
}

class Test
{
    public static void Main(string[] args)
    {
        SomeGeneric<int>.i = 5;
        SomeGeneric<string>.i = 10;
        Console.WriteLine(SomeGeneric<int>.i);
        Console.WriteLine(SomeGeneric<string>.i);
        Console.WriteLine(SomeGeneric<int>.i);
    }
}
This prints
5
10
5
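The fix in that situation is to move the shared state into a non-generic holder so every constructed type sees the same field, for example:
static class SharedCounter
{
    // One field shared by every SomeGeneric<T> instantiation.
    public static int i = 0;
}

class SomeGeneric<T>
{
    public static int Value
    {
        get { return SharedCounter.i; }
        set { SharedCounter.i = value; }
    }
}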
Just found a weird one that had me stuck in debug for a while:
You can increment null for a nullable int without throwing an exception, and the value stays null.
int? i = null;
i++; // I would have expected an exception but runs fine and stays as null
Enumerables can be evaluated more than once
It'll bite you when you have a lazily-enumerated enumerable and you iterate over it twice and get different results. (or you get the same results but it executes twice unnecessarily)
For example, while writing a certain test, I needed a few temp files to test the logic:
var files = Enumerable.Range(0, 5)
                      .Select(i => Path.GetTempFileName());

foreach (var file in files)
    File.WriteAllText(file, "HELLO WORLD!");

/* ... many lines of code later ... */

foreach (var file in files)
    File.Delete(file);
Imagine my surprise when File.Delete(file) threw FileNotFound!!
What's happening here is that the files enumerable got iterated twice (the results from the first iteration are simply not remembered), and on each new iteration you re-call Path.GetTempFileName(), so you get a different set of temp filenames.
The solution is, of course, to eager-enumerate the value by using ToArray() or ToList():
var files = Enumerable.Range(0, 5)
                      .Select(i => Path.GetTempFileName())
                      .ToArray();
This is even scarier when you're doing something multi-threaded, like:
foreach (var file in files)
    content = content + File.ReadAllText(file);
and you find out content.Length is still 0 after all the writes! You then begin rigorously checking that you don't have a race condition, and... after one wasted hour... you figure out it's just that tiny little Enumerable gotcha thing you forgot....
TextInfo textInfo = Thread.CurrentThread.CurrentCulture.TextInfo;
textInfo.ToTitleCase("hello world!"); //Returns "Hello World!"
textInfo.ToTitleCase("hElLo WoRld!"); //Returns "Hello World!"
textInfo.ToTitleCase("Hello World!"); //Returns "Hello World!"
textInfo.ToTitleCase("HELLO WORLD!"); //Returns "HELLO WORLD!"
Yes, this behavior is documented, but that certainly doesn't make it right.
