How do you recommend using #region / #endregion? To what extent should that replace using sub functions to clarify your code?
Not at all.
First of all, #regions are more a way of grouping many related functions/members into collapsible regions. They are not intended to structure a single multi-thousand-line function into parts. (That being said, if you write a single method that's so long that you consider structuring it with #regions, then you're probably doing something seriously wrong. Regions or not, that code would be unmaintainable. Period.)
Many people argue, however, that regions don't really help and that you should consider rewriting classes that actually need regions to be understandable. Also, regions tend to hide nasty code.
#region / #endregion is a way to logically group parts of the code belonging to the same class. Personally, I tend to group private field declarations, properties, public functions and private functions.
Sometimes I use these directives to group parts of the code that I need to revisit and update often, for instance calculation methods.
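A minimal sketch of that grouping, using a hypothetical Account class. The regions only control how the editor folds the file; the compiler ignores #region/#endregion entirely, so they have no effect on behavior.

```csharp
using System;

// Hypothetical class grouped the way described above:
// private fields, properties, public methods, private methods.
public class Account
{
    #region Private fields
    private decimal _balance;
    #endregion

    #region Properties
    public decimal Balance => _balance;
    #endregion

    #region Public methods
    public void Deposit(decimal amount)
    {
        Validate(amount);
        _balance += amount;
    }
    #endregion

    #region Private methods
    private static void Validate(decimal amount)
    {
        if (amount <= 0)
            throw new ArgumentOutOfRangeException(nameof(amount), "Amount must be positive.");
    }
    #endregion
}
```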
If you have more than one 'logical group of code' in a class, your class violates the single responsibility principle.
Sort that out and you no longer need regions.
Regions seem good in theory, but in my experience, it's a feature that is often abused.
Programmers love order; most folk love tidying things away into little boxes. They group messy code, fields, properties, constructors, methods, public methods, internal methods, private methods, helper methods, constants, interface implementations and God knows what else.
The only thing I can think of that irks me more is the use of partial classes to hide complexity.
Anyway, while over-use of regions is often a tell-tale sign of hiding a mess that shouldn't be there, I've also seen good code swamped by them. I've downloaded a few open source projects written by respected programmers. These fellows are writing some amazing code, but, oh, what's this?
One field? A field region!
Two properties? A property region!
One constructor? A constructor region!
One private method? A private method region!
I could go on.
To this very day, I am still astounded when I see this. In some cases, a region, a blank line, another blank line and the end region can take up 5x the space of the original code (5 lines with regions, 1 line without). It's basically a form of OCD; these regions may appeal to our sense of order while we're writing software, but in practice they're useless: pure noise. When I first started writing C#, I abused them in this way too. But then I realised how noisy my code was, and that having to hit ctrl-k l every time I opened a file was a sign I was doing it wrong.
I can understand it when a class implements an interface that has a lot of properties (e.g. for data binding), or even for a group of methods implementing some related functionality, but for everything? It makes no sense.
I still use regions now and then, but... I exercise a lot of restraint.
The only circumstance I've ever found where I felt using a region was totally okay is in the code below. Once I got it right, I never wanted to have to look at those constants again. Indeed, I use this class every day, and I think the only time I've uncollapsed this region in the last four years was when I needed to reimplement it in Python.
I think (hope, pray) that the circumstances of this code are an edge case. C# constants based on a VB3 type declaration that defines how the COBOL data structure returned by a C++ function is laid out. Yeah, I ported this to Python. I'm that good. I'm tempted to learn Haskell just so that I can rewrite my Python code in it, with an eye towards one day reimplementing my Haskell code in OCaml.
#region buffer_definition
/*
The buffer is a byte array that is passed to the underlying API. The VB representation of
the buffer's structure (using zero-based arrays, so each array has one more element than
its dimension) is this:
Public Type BUFFER_TYPE
Method As String * 50
Status As Integer
Msg As String * 200
DataLine As String * 1200
Prop(49) As String * 100
Fld(79) As String * 20
Fmt(79) As String * 50
Prompt(79) As String * 20
ValIn(79) As String * 80
ValOut(79) As String * 80
End Type
The constants defined here have the following prefixes:
len = field length
cnt = count of fields in an array
ptr = starting position within the buffer
*/
// data element lengths
private const int len_method = 50;
private const int len_status = 2;
private const int len_msg = 200;
private const int len_dataLine = 1200;
// array elements require both count and length:
private const int cnt_prop = 50;
private const int len_prop = 100;
private const int cnt_fld = 80;
private const int len_fld = 20;
private const int len_fmt = 50;
private const int len_prompt = 20;
private const int len_valIn = 80;
private const int len_valOut = 80;
// calculate the buffer length
private const int len_buffer =
len_method
+ len_status
+ len_msg
+ len_dataLine
+ (cnt_prop * len_prop)
+ (cnt_fld * (len_fld + len_fmt + len_prompt + len_valIn + len_valOut));
// calculate the pointers to the start of each field. These pointers are used
// in the marshalling methods to marshal data into and out of the buffer.
private const int PtrMethod = 0;
private const int PtrStatus = PtrMethod + len_method;
private const int PtrMsg = PtrStatus + len_status;
private const int PtrDataLine = PtrMsg + len_msg;
private const int PtrProp = PtrDataLine + len_dataLine;
private const int PtrFld = PtrProp + (cnt_prop * len_prop);
private const int PtrFmt = PtrFld + (cnt_fld * len_fld);
private const int PtrPrompt = PtrFmt + (cnt_fld * len_fmt);
private const int PtrValIn = PtrPrompt + (cnt_fld * len_prompt);
private const int PtrValOut = PtrValIn + (cnt_fld * len_valIn);
[MarshalAs(UnmanagedType.LPStr, SizeConst = len_buffer)]
private static byte[] buffer = new byte[len_buffer];
#endregion
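As a hedged sketch of how a marshalling method might use these constants (the BufferLayout class and its method names are hypothetical, and only the first three fields are reproduced): each fixed-width VB String * n field is copied into its slot and space-padded, the way VB pads fixed-length strings. With all of the constants above, len_buffer works out to 50 + 2 + 200 + 1200 + 50*100 + 80*(20 + 50 + 20 + 80 + 80) = 26,452 bytes. The sketch assumes a little-endian host, because BitConverter uses the machine's byte order.

```csharp
using System;
using System.Text;

// Hypothetical helpers showing how Ptr*/len_* constants locate a field in the buffer.
// Only Method, Status and Msg are reproduced; the other fields follow the same pattern.
static class BufferLayout
{
    public const int len_method = 50;
    public const int len_status = 2;
    public const int len_msg = 200;

    public const int PtrMethod = 0;
    public const int PtrStatus = PtrMethod + len_method;
    public const int PtrMsg = PtrStatus + len_status;

    // Copies a string into a fixed-width slot: truncates if too long,
    // pads the rest of the slot with spaces (like VB's String * n).
    public static void WriteString(byte[] buffer, int ptr, int len, string value)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(value ?? "");
        int count = Math.Min(bytes.Length, len);
        Array.Copy(bytes, 0, buffer, ptr, count);
        for (int i = ptr + count; i < ptr + len; i++)
            buffer[i] = (byte)' ';
    }

    // Reads a fixed-width slot back and strips trailing padding.
    public static string ReadString(byte[] buffer, int ptr, int len)
    {
        return Encoding.ASCII.GetString(buffer, ptr, len).TrimEnd(' ', '\0');
    }

    // VB Integer is a 16-bit value; BitConverter uses the machine's byte order.
    public static short ReadInt16(byte[] buffer, int ptr)
    {
        return BitConverter.ToInt16(buffer, ptr);
    }
}
```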
I think that functions should only be used for reusable code. That's what they were designed for. Nothing infuriates me more than seeing a function created for something that is called only once.
Use a region.
If you need to write 500 lines, then type the 500 lines in. If you want to neaten it up, use a region; if there is anything reusable, use a function.
Let's say we have this structure:
[StructLayout(LayoutKind.Explicit, Size=8)] // using System.Runtime.InteropServices;
public struct AirportHeader {
[FieldOffset(0)]
[MarshalAs(UnmanagedType.I4)]
public int Ident; // 4-byte ASCII code: "FIMP" { 0x46, 0x49, 0x4D, 0x50 }
[FieldOffset(4)]
[MarshalAs(UnmanagedType.I4)]
public int Offset;
}
What I want: direct access to both the string and the int value of the Ident field in this structure, without breaking the 8-byte size of the structure and without having to compute the string from the int value each time.
Having Ident as an int is useful because I can quickly check whether it matches other idents; those may come from data unrelated to this structure, but they use the same int format.
Question: is there a way to define a field that is not part of the structure layout? Something like:
[StructLayout(LayoutKind.Explicit, Size=8)]
public struct AirportHeader {
[FieldOffset(0)]
[MarshalAs(UnmanagedType.I4)]
public int Ident; // 4-byte ASCII code: "FIMP" { 0x46, 0x49, 0x4D, 0x50 }
[FieldOffset(4)]
[MarshalAs(UnmanagedType.I4)]
public int Offset;
[NoOffset()] // <- is there something like this I can do?
string _identStr;
public string IdentStr {
get { // EDIT: I had missed the getter on this property
if (string.IsNullOrEmpty(_identStr)) _identStr =
System.Text.Encoding.ASCII.GetString(Ident.GetBytes());
// do the above only once. May use an extra private bool field to go faster.
return _identStr;
}
}
}
PS: I use pointers ('*' and '&', unsafe) because I need to deal with endianness (local system, binary files/file formats, network), fast type conversions and fast array filling. I also use many flavours of Marshal methods (mapping structures onto byte arrays), and a little P/Invoke and COM interop. Too bad some of the assemblies I'm dealing with don't have a .NET counterpart yet.
TL;DR: everything below is background detail only.
The question above is all this is about; I just don't know the answer. The following should address most questions like "what other approaches?" or "why not do this instead?", but it can be ignored if the answer is straightforward. Anyway, I've put everything here preemptively so it's clear from the start what I'm trying to do. :)
Options/workarounds I'm currently using (or thinking of using):
Create a getter (not a field) that computes the string value each time:
public string IdentStr {
get { return System.Text.Encoding.ASCII.GetString(Ident.GetBytes()); }
// where GetBytes() is an extension method that converts an int to byte[]
}
This approach, while doing the job, performs poorly: the GUI displays aircraft from a database of default flights, and injects other flights from the network with a refresh rate of one second (I should increase that to 5 seconds). I have around 1200 flights within an area, relating to 2400 airports (departure and arrival), meaning I have 2400 calls to the above code each second to display the ident in a DataGrid.
Create another struct (or class) whose only purpose is to manage data on the GUI side, when not reading/writing a stream or file. That means: read the data with the explicit-layout struct, create another struct with the string version of the field, and work with that in the GUI. That would perform better overall, but in the process of defining structures for the game binaries I'm already at 143 structures of this kind (just for older versions of the game data; there are a bunch I haven't written yet, and I plan to add structures for the newest data types). At the moment, more than half of them require one or more extra fields to be of meaningful use. That would be fine if I were the only user of the assembly, but other users will probably get lost among AirportHeader, AirportHeaderEx, AirportEntry, AirportEntryEx, AirportCoords, AirportCoordsEx... I would rather avoid that.
Optimize option 1 to make the computation faster (thanks to SO, there are a bunch of ideas to look into; I'm currently working on this). For the Ident field, I guess I could use pointers (and I will). I'm already doing that for fields I must display in little-endian and read/write in big-endian. There are other values, like 4x4 grid information packed into a single 64-bit integer (ulong), that need bit shifting to expose the actual values. Same for GUIDs or object pitch/bank/yaw.
Try to take advantage of overlapping fields (under study). That would work for GUIDs. Perhaps it could work for the Ident example, if MarshalAs can constrain the value to an ASCII string. Then I'd just need to specify the same FieldOffset, '0' in this case. But I'm unsure whether setting the field value (entry.FieldStr = "FMEP";) actually applies the marshalling constraint on the managed side. My understanding is that it will store the string as Unicode on the managed side (?). Furthermore, that wouldn't work for packed bits (bytes that contain several values, or consecutive bytes hosting values that have to be bit-shifted). I believe it is impossible to specify a value's position, length and format at the bit level.
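On the overlapping-fields idea, a sketch (the AirportHeaderOverlay name and the C0..C3 fields are hypothetical): four byte fields can overlap the Ident int at offsets 0 to 3. This keeps the struct at 8 bytes and gives per-character access, but it cannot hold a string: the runtime refuses to load a type in which an object reference (such as a string) overlaps a non-reference field, so the string still has to be built on demand.

```csharp
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical variant of AirportHeader with byte fields overlapping Ident.
// Only value types overlap here; a string field at offset 0 would make the
// type fail to load (TypeLoadException).
[StructLayout(LayoutKind.Explicit, Size = 8)]
public struct AirportHeaderOverlay
{
    [FieldOffset(0)] public int Ident;
    [FieldOffset(0)] public byte C0; // first character in memory order
    [FieldOffset(1)] public byte C1;
    [FieldOffset(2)] public byte C2;
    [FieldOffset(3)] public byte C3;
    [FieldOffset(4)] public int Offset;

    // Still allocates a new string per call; the overlap only avoids the int-to-bytes step.
    public string IdentStr
    {
        get { return Encoding.ASCII.GetString(new[] { C0, C1, C2, C3 }); }
    }
}
```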
Why bother? Context:
I'm defining a bunch of structures to parse binary game data from byte arrays (IO.File.ReadAllBytes) or streams, and to write it back. Application logic should use the structures to quickly access and manipulate the data on demand. The assembly's expected capabilities are read, validate, edit, create and write, both outside the scope of the game (addon building, control) and inside it (API, live modding or monitoring). Another purpose is to understand the content of the binaries (hex) and use that understanding to build what's missing in the game.
The purpose of the assembly is to provide ready-to-use base components for C# addon contributors (I don't plan to make the code portable): creating applications for the game, or processing addons from source to compilation into game binaries. It's nice to have a class that loads the entire content of a file into memory, but some contexts require you not to do that and to retrieve only what is necessary from the file, hence the choice of the struct pattern.
I need to figure out the trust and legal issues (copyrighted data), but that's outside the scope of the main concern. If it matters, Microsoft has, over the years, provided publicly accessible SDKs exposing the binary structures of previous versions of the game, for exactly the purpose of what I'm doing (I'm not the first and probably not the last to do so). That said, I wouldn't dare expose undocumented binaries (for the latest game data, for instance), nor facilitate a breach of copyright on copyrighted materials/binaries.
I'm just asking for confirmation of whether or not there is a way to have private fields that are not part of the structure layout. My naive belief at the moment is "that's impossible, but there are workarounds". My C# experience is pretty sparse, so maybe I'm wrong, which is why I ask. Thanks!
As suggested, there are several ways to get the job done. Here are the getters/setters I came up with inside the structure. I'll measure how each performs in various scenarios later. The dictionary approach is very appealing, since in many scenarios I would need a directly accessible global database of (59,000) airports with runways and parking spots (not just the Ident), but a fast check between struct fields is also interesting.
public string IdentStr_Marshal {
get {
var output = "";
GCHandle pinnedHandle = default(GCHandle); // Initialized, otherwise CS0165 in the finally block (C# 5)
try { // Fast if no exception, (very) slow if one is thrown
// Alloc boxes (copies) the struct and pins the boxed copy.
pinnedHandle = GCHandle.Alloc(this, GCHandleType.Pinned);
IntPtr structPtr = pinnedHandle.AddrOfPinnedObject();
output = Marshal.PtrToStringAnsi(structPtr, 4); // Ident is at offset 0
// Cannot use UTF8 because the assembly should work in Framework v4.5
} finally { if (pinnedHandle.IsAllocated) pinnedHandle.Free(); }
return output;
}
set {
value = value.PadRight(4); // Strings are immutable, so keep the padded result (replaces the initial while loop, per Charlieface)
IntPtr intValuePtr = IntPtr.Zero;
// Cannot use UTF8 because some users are on Win7 with FlightSim 2004
try { // A try as a matter of habit; not convinced it's going to throw.
intValuePtr = Marshal.StringToHGlobalAnsi(value);
Ident = unchecked((UInt32)Marshal.ReadInt32(intValuePtr, 0)); // Reinterpret the 4 ANSI bytes as UInt32
} finally {
Marshal.FreeHGlobal(intValuePtr); // Free the unmanaged copy of the string
}
}
}
public unsafe string IdentStr_Pointer {
get {
string output = "";
fixed (UInt32* ident = &Ident) { // Fixing the field
sbyte* bytes = (sbyte*)ident;
output = new string(bytes, 0, 4, System.Text.Encoding.ASCII); // Encoding added (#Charlieface)
}
return output;
}
set {
// value must not exceed a length of 4 and must be in Ansi [A-Z,0-9,whitespace 0x20].
// value validation at this point occurs outside the structure.
fixed (UInt32* ident = &Ident) { // Fixing the field
byte* bytes = (byte*)ident;
byte[] asciiArr = System.Text.Encoding.ASCII.GetBytes(value);
if (asciiArr.Length >= 4) // (asciiArr.Length == 4) would also work
for (Int32 i = 0; i < 4; i++) bytes[i] = asciiArr[i];
else {
for (Int32 i = 0; i < asciiArr.Length; i++) bytes[i] = asciiArr[i];
for (Int32 i = asciiArr.Length; i < 4; i++) bytes[i] = 0x20;
}
}
}
}
static Dictionary<UInt32, string> ps_dict = new Dictionary<UInt32, string>();
public string IdentStr_StaticDict {
get {
string output; // logic update with TryGetValue (#Charlieface)
if (ps_dict.TryGetValue(Ident, out output)) return output;
output = System.Text.Encoding.ASCII.GetString(Ident.ToBytes(EndiannessType.LittleEndian));
ps_dict.Add(Ident, output);
return output;
}
set { // input can be "FMEE", "DME" or "DK". length of 2 characters is the minimum.
var bytes = new byte[4]; // Need to convert value to a 4 byte array
byte[] asciiArr = System.Text.Encoding.ASCII.GetBytes(value); // should be 4 bytes or less
// Put the valid ASCII codes in the array.
if (asciiArr.Length >= 4) // (asciiArr.Length == 4) would also work
for (Int32 i = 0; i < 4; i++) bytes[i] = asciiArr[i];
else {
for (Int32 i = 0; i < asciiArr.Length; i++) bytes[i] = asciiArr[i];
for (Int32 i = asciiArr.Length; i < 4; i++) bytes[i] = 0x20;
}
Ident = BitConverter.ToUInt32(bytes, 0); // Set structure int value
if (!ps_dict.ContainsKey(Ident)) // Add if missing
ps_dict.Add(Ident, System.Text.Encoding.ASCII.GetString(bytes));
}
}
As mentioned by others, it is not possible to exclude a field from a struct for marshalling.
You also cannot use a pointer as a string in most places.
If the number of different possible strings is relatively small (and it probably will be, given it's only 4 characters), then you could use a static Dictionary<int, string> as a kind of string-interning mechanism.
Then you write a property to add/retrieve the real string.
Note that dictionary lookup is O(1) on average, and an int's hash code is the int itself, so it will be very, very fast, but it will take up some memory.
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;
[StructLayout(LayoutKind.Explicit, Size=8)]
public struct AirportHeader
{
[FieldOffset(0)]
[MarshalAs(UnmanagedType.I4)]
public int Ident; // 4-byte ASCII code: "FIMP" { 0x46, 0x49, 0x4D, 0x50 }
[FieldOffset(4)]
[MarshalAs(UnmanagedType.I4)]
public int Offset;
// Static fields are not part of the instance layout, so they need no FieldOffset.
// (Dictionary is not thread-safe for concurrent writers.)
static Dictionary<int, string> _identStrings = new Dictionary<int, string>();
public string IdentStr =>
_identStrings.TryGetValue(Ident, out var ret) ? ret :
(_identStrings[Ident] = Encoding.ASCII.GetString(BitConverter.GetBytes(Ident)));
}
This is not possible because a structure must contain all of its values in a specific order. Usually this order is controlled by the CLR itself. If you want to change the data order, you can use StructLayout. However, you cannot exclude a field, or that data would simply not exist in memory.
Instead of a string (which is a reference type) you can use a pointer to point directly to that string and use that in your structure in combination with the StructLayout. To get this string value, you can use a get-only property that reads directly from unmanaged memory.
I have an array inside a class:
class MatchNode
{
public short X;
public short Y;
public NodeVal[] ControlPoints;
private MatchNode()
{
ControlPoints = new NodeVal[4];
}
}
The NodeVal is:
struct NodeVal
{
public readonly short X;
public readonly short Y;
public NodeVal(short x, short y)
{
X = x;
Y = y;
}
}
Now, what if we wanted to take performance to the next level and avoid having a separate object for the array? It doesn't actually have to be an array. The only restriction is that client code should be able to access a NodeVal by index, like:
matchNode.ControlPoints[i]
OR
matchNode[i]
and of course the solution should be at least as fast as array access, since it's supposed to be an optimization.
EDIT: As Ryan suggested, I should explain the motivation a bit more:
The MatchNode class is used heavily in the project. Millions of them are created, and each is accessed hundreds of times, so making them as compact as possible can lead to fewer cache misses and better overall performance.
Let's consider a 64-bit machine. In the current implementation, the class spends 8 bytes on the ControlPoints reference, and the array object adds at least 16 bytes of object overhead (for the method table and sync block) plus 16 bytes for the actual data (4 NodeVals of 4 bytes each). So we have at least 24 bytes of overhead beside 16 bytes of actual data.
These objects are used in bottlenecks of the project so it matters if we could optimize them more.
Of course, we could just have a super big array of NodeVal and store an index in MatchNode that locates the actual data, but again it would change all the client code that uses MatchNode, not to mention being a dirty, non-object-oriented solution.
It is okay to have a messy MatchNode that uses every kind of nasty trick like unsafe or static cache code. It is not okay to leak these optimizations out to the client code.
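One way to get rid of the separate array object, sketched under the assumption that there are always exactly four control points: store them as four inline fields and expose an indexer, so client code keeps writing matchNode[i]. The field names _p0.._p3 are hypothetical, and this is not benchmarked; whether the switch beats array indexing has to be measured.

```csharp
using System;

struct NodeVal
{
    public readonly short X;
    public readonly short Y;

    public NodeVal(short x, short y)
    {
        X = x;
        Y = y;
    }
}

// The four control points live inside the MatchNode object itself,
// so there is no second object (no array header, length or reference).
class MatchNode
{
    public short X;
    public short Y;

    private NodeVal _p0, _p1, _p2, _p3;

    public NodeVal this[int i]
    {
        get
        {
            switch (i)
            {
                case 0: return _p0;
                case 1: return _p1;
                case 2: return _p2;
                case 3: return _p3;
                default: throw new IndexOutOfRangeException();
            }
        }
        set
        {
            switch (i)
            {
                case 0: _p0 = value; break;
                case 1: _p1 = value; break;
                case 2: _p2 = value; break;
                case 3: _p3 = value; break;
                default: throw new IndexOutOfRangeException();
            }
        }
    }
}
```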
You're looking for indexers:
class MatchNode
{
public short X;
public short Y;
private NodeVal[] myField;
public NodeVal this[int i] { get { return myField[i]; } set { myField[i] = value; } }
public MatchNode(int size) { this.myField = new NodeVal[size]; }
}
Now you can simply use this:
var m = new MatchNode(10);
m[0] = new NodeVal();
However, I doubt this will affect performance (at least in terms of speed) in any way; you should investigate the actual problems with a profiling tool (dotTrace, for instance). Furthermore, this approach still uses a private backing array, so the memory footprint is the same.
I am trying to port a rather large source from VB6 to C#. This is no easy task, especially since I'm fairly new to C#/.NET. The source uses numerous Windows APIs as well as numerous Types. I know that there is no direct equivalent to the VB6 Type in C#, but I'm sure there is a way to reach the same outcome. I will post some code below to further explain my request.
VB6:
Private Type ICONDIRENTRY
bWidth As Byte
bHeight As Byte
bColorCount As Byte
bReserved As Byte
wPlanes As Integer
wBitCount As Integer
dwBytesInRes As Long
dwImageOffset As Long
End Type
Dim tICONDIRENTRY() As ICONDIRENTRY
ReDim tICONDIRENTRY(tICONDIR.idCount - 1)
For i = 0 To tICONDIR.idCount - 1
Call ReadFile(lFile, tICONDIRENTRY(i), Len(tICONDIRENTRY(i)), lRet, ByVal 0&)
Next i
I have tried using structs and classes - but no luck so far.
I would like to see a conversion of this Type structure, but if someone had any clue as to how to convert the entire thing it would be unbelievably helpful. I have spent countless hours on this small project already.
If it makes any difference, this is strictly for educational purposes only.
Thank you for any help in advance,
Evan
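struct is the equivalent. You'd express it like this (note that a VB6 Integer is 16 bits, i.e. a C# short, and a VB6 Long is 32 bits, i.e. a C# int):
struct IconDirEntry {
public byte Width;
public byte Height;
public byte ColorCount;
public byte Reserved;
public short Planes;     // VB6 Integer (16-bit)
public short BitCount;   // VB6 Integer (16-bit)
public int BytesInRes;   // VB6 Long (32-bit)
public int ImageOffset;  // VB6 Long (32-bit)
}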
You declare a variable like this:
IconDirEntry entry;
Generally, in C#, type prefixes are not used, nor are all caps, except possibly for constants. structs are value types in C#, so by default they are passed by value (copied). It looks like you're passing them to a method that populates them; for that usage, pass them with ref or out, or use classes instead.
I'm not exactly sure what your issue is, but this is a small example of how to use a struct.
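For the file-reading part, a minimal sketch of a C# counterpart to the ReDim/ReadFile loop, using BinaryReader instead of the ReadFile API (the IconReader class and ReadEntries method are hypothetical names). It assumes the standard .ico layout: a 6-byte ICONDIR header (reserved, type, count, each 16-bit) followed by count 16-byte entries, all little-endian, which is the byte order BinaryReader always uses.

```csharp
using System.IO;
using System.Text;

// VB6 Integer is 16-bit (short) and VB6 Long is 32-bit (int).
public struct IconDirEntry
{
    public byte Width;
    public byte Height;
    public byte ColorCount;
    public byte Reserved;
    public short Planes;
    public short BitCount;
    public int BytesInRes;
    public int ImageOffset;
}

public static class IconReader
{
    // Reads the ICONDIR header, then one 16-byte entry per image.
    // The caller owns the stream, so the reader is told to leave it open.
    public static IconDirEntry[] ReadEntries(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true))
        {
            reader.ReadInt16();               // idReserved (always 0)
            reader.ReadInt16();               // idType (1 for icons)
            short count = reader.ReadInt16(); // idCount

            // VB's ReDim a(count - 1) holds `count` elements; C# takes the length directly.
            var entries = new IconDirEntry[count];
            for (int i = 0; i < count; i++)
            {
                entries[i].Width = reader.ReadByte();
                entries[i].Height = reader.ReadByte();
                entries[i].ColorCount = reader.ReadByte();
                entries[i].Reserved = reader.ReadByte();
                entries[i].Planes = reader.ReadInt16();
                entries[i].BitCount = reader.ReadInt16();
                entries[i].BytesInRes = reader.ReadInt32();
                entries[i].ImageOffset = reader.ReadInt32();
            }
            return entries;
        }
    }
}
```

Usage would be along the lines of opening the file with File.OpenRead inside a using block and passing the FileStream to ReadEntries.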
struct aStrt
{
public int A;
public int B;
}
static void Main(string[] args)
{
aStrt saStrt;
saStrt.A = 5;
}
Your question is not clear.
What issues are you facing when you use either a struct or a class and define those field members? Are you not able to access those members through an instance of that class?
Otherwise, declare the class as static and make all its members static too, so that you can access them without creating an instance.
Maybe you're trying to get something like this?
struct IconDirEntry
{
public byte Width;
// etc...
}
IconDirEntry[] tICONDIRENTRY = new IconDirEntry[tICONDIR.idCount]; // VB's ReDim x(n - 1) holds n elements
Before you react from the gut, as I did initially, read the whole question please. I know they make you feel dirty, I know we've all been burned before and I know it's not "good style" but, are public fields ever ok?
I'm working on a fairly large-scale engineering application that creates and works with an in-memory model of a structure (anything from a high-rise building to a bridge to a shed, doesn't matter). There is a TON of geometric analysis and calculation involved in this project. To support this, the model is composed of many tiny immutable read-only structs representing things like points, line segments, etc. Some of the values of these structs (like the coordinates of the points) are accessed tens or hundreds of millions of times during a typical program execution. Because of the complexity of the models and the volume of calculation, performance is absolutely critical.
I feel that we're doing everything we can to optimize our algorithms, performance test to determine bottlenecks, use the right data structures, etc. I don't think this is a case of premature optimization. Performance tests show (at least) order-of-magnitude performance boosts when accessing fields directly rather than through a property on the object. Given this information, and the fact that we can also expose the same information as properties to support data binding and other situations... is this OK? Remember, read-only fields on immutable structs. Can anyone think of a reason I'm going to regret this?
Here's a sample test app:
using System;
using System.Diagnostics;
struct Point {
public Point(double x, double y, double z) {
_x = x;
_y = y;
_z = z;
}
public readonly double _x;
public readonly double _y;
public readonly double _z;
public double X { get { return _x; } }
public double Y { get { return _y; } }
public double Z { get { return _z; } }
}
class Program {
static void Main(string[] args) {
const int loopCount = 10000000;
var point = new Point(12.0, 123.5, 0.123);
var sw = new Stopwatch();
double x, y, z;
double calculatedValue;
sw.Start();
for (int i = 0; i < loopCount; i++) {
x = point._x;
y = point._y;
z = point._z;
calculatedValue = point._x * point._y / point._z;
}
sw.Stop();
double fieldTime = sw.ElapsedMilliseconds;
Console.WriteLine("Direct field access: " + fieldTime);
sw.Reset();
sw.Start();
for (int i = 0; i < loopCount; i++) {
x = point.X;
y = point.Y;
z = point.Z;
calculatedValue = point.X * point.Y / point.Z;
}
sw.Stop();
double propertyTime = sw.ElapsedMilliseconds;
Console.WriteLine("Property access: " + propertyTime);
double totalDiff = propertyTime - fieldTime;
Console.WriteLine("Total difference: " + totalDiff);
double averageDiff = totalDiff / loopCount;
Console.WriteLine("Average difference: " + averageDiff);
Console.ReadLine();
}
}
result:
Direct field access: 3262
Property access: 24248
Total difference: 20986
Average difference: 0.00020986
It's only 21 seconds, but why not?
Your test isn't really being fair to the property-based versions. The JIT is smart enough to inline simple properties so that they have a runtime performance equivalent to that of direct field access, but it doesn't seem smart enough (today) to detect when the properties access constant values.
In your example, the entire loop body of the field access version is optimized away, becoming just:
for (int i = 0; i < loopCount; i++)
00000025 xor eax,eax
00000027 inc eax
00000028 cmp eax,989680h
0000002d jl 00000027
}
whereas the second version actually performs the floating-point division on each iteration:
for (int i = 0; i < loopCount; i++)
00000094 xor eax,eax
00000096 fld dword ptr ds:[01300210h]
0000009c fdiv qword ptr ds:[01300218h]
000000a2 fstp st(0)
000000a4 inc eax
000000a5 cmp eax,989680h
000000aa jl 00000096
}
Making just two small changes to your application to make it more realistic makes the two operations practically identical in performance.
First, randomize the input values so that they aren't constants and the JIT can't remove the division entirely.
Change from:
Point point = new Point(12.0, 123.5, 0.123);
to:
Random r = new Random();
Point point = new Point(r.NextDouble(), r.NextDouble(), r.NextDouble());
Secondly, ensure that the results of each loop iteration are used somewhere:
Before each loop, set calculatedValue = 0 so they both start at the same point. After each loop call Console.WriteLine(calculatedValue.ToString()) to make sure that the result is "used" so the compiler doesn't optimize it away. Finally, change the body of the loop from "calculatedValue = ..." to "calculatedValue += ..." so that each iteration is used.
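The two changes described above, sketched as a self-contained program. Splitting the loops into SumFields/SumProperties methods is my own restructuring, not part of the original test; returning the accumulated value is what keeps each iteration's work observable.

```csharp
using System;
using System.Diagnostics;

struct Point
{
    public Point(double x, double y, double z) { _x = x; _y = y; _z = z; }
    public readonly double _x;
    public readonly double _y;
    public readonly double _z;
    public double X { get { return _x; } }
    public double Y { get { return _y; } }
    public double Z { get { return _z; } }
}

static class FairBenchmark
{
    // Field version: "+=" makes every iteration contribute to the result.
    public static double SumFields(Point point, int loopCount)
    {
        double calculatedValue = 0;
        for (int i = 0; i < loopCount; i++)
            calculatedValue += point._x * point._y / point._z;
        return calculatedValue;
    }

    // Property version: the same work, going through the X/Y/Z getters.
    public static double SumProperties(Point point, int loopCount)
    {
        double calculatedValue = 0;
        for (int i = 0; i < loopCount; i++)
            calculatedValue += point.X * point.Y / point.Z;
        return calculatedValue;
    }

    // Run from a release build, outside the debugger, for meaningful timings.
    public static void Run(int loopCount)
    {
        // Random inputs, so the JIT can't treat them as constants.
        var r = new Random();
        var point = new Point(r.NextDouble(), r.NextDouble(), r.NextDouble());

        var sw = Stopwatch.StartNew();
        double fieldResult = SumFields(point, loopCount);
        sw.Stop();
        Console.WriteLine("Direct field access: " + sw.ElapsedMilliseconds);
        Console.WriteLine(fieldResult.ToString()); // "use" the result

        sw.Restart();
        double propertyResult = SumProperties(point, loopCount);
        sw.Stop();
        Console.WriteLine("Property access: " + sw.ElapsedMilliseconds);
        Console.WriteLine(propertyResult.ToString());
    }
}
```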
On my machine, these changes (with a release build) yield the following results:
Direct field access: 133
Property access: 133
Total difference: 0
Average difference: 0
Just as we expect, the x86 code for each of these modified loops is identical (except for the loop address):
000000dd xor eax,eax
000000df fld qword ptr [esp+20h]
000000e3 fmul qword ptr [esp+28h]
000000e7 fdiv qword ptr [esp+30h]
000000eb fstp st(0)
000000ed inc eax
000000ee cmp eax,989680h
000000f3 jl 000000DF (This loop address is the only difference)
Given that you deal with immutable objects with readonly fields, I would say that you have hit the one case when I don't find public fields to be a dirty habit.
IMO, the "no public fields" rule is one of those rules that are technically correct, but unless you are designing a library intended for public use, it is unlikely to cause you any problems if you break it.
Before I get too massively downvoted, I should add that encapsulation is a good thing. Given the invariant "the Value property must be null if HasValue is false", this design is flawed:
class A {
public bool HasValue;
public object Value;
}
However, given that invariant, this design is equally flawed:
class A {
public bool HasValue { get; set; }
public object Value { get; set; }
}
The correct design is
class A {
public bool HasValue { get; private set; }
public object Value { get; private set; }
public void SetValue(bool hasValue, object value) {
if (!hasValue && value != null)
throw new ArgumentException();
this.HasValue = hasValue;
this.Value = value;
}
}
(and even better would be to provide an initializing constructor and make the class immutable).
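A sketch of that "even better" variant, keeping the class name A from above: the invariant is checked once, in the constructor, and get-only auto-properties (C# 6 or later) make the object immutable afterwards.

```csharp
using System;

// Immutable version: no setters at all, so the invariant can't be broken later.
public sealed class A
{
    public bool HasValue { get; }
    public object Value { get; }

    public A(bool hasValue, object value)
    {
        // Invariant: Value must be null when HasValue is false.
        if (!hasValue && value != null)
            throw new ArgumentException("Value must be null when HasValue is false.", nameof(value));
        HasValue = hasValue;
        Value = value;
    }
}
```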
I know you feel kind of dirty doing this, but it isn't uncommon for rules and guidelines to get shot to hell when performance becomes an issue. For example, quite a few high-traffic websites using MySQL have data duplication and denormalized tables. Others go even crazier.
Moral of the story - it may go against everything you were taught or advised, but the benchmarks don't lie. If it works better, just do it.
If you really need that extra performance, then it's probably the right thing to do. If you don't need the extra performance then it's probably not.
Rico Mariani has a couple of related posts:
Ten Questions on Value-Based Programming
Ten Questions on Value-Based Programming : Solution
Personally, the only time I would consider using public fields is in a very implementation-specific private nested class.
Other times it just feels too "wrong" to do it.
The CLR will take care of performance by optimising out the method/property (in release builds) so that shouldn't be an issue.
Not that I disagree with the other answers, or with your conclusion... but I'd like to know where you got the order-of-magnitude performance difference from. As I understand it, any simple property (with no code other than direct access to the field) should be inlined by the JIT compiler as a direct access anyway.
The advantage of using properties even in these simple cases (in most situations) is that writing it as a property allows for future changes that might modify the property. (Although in your case there would not be any such changes in the future, of course.)
Try compiling a release build and running directly from the exe instead of through the debugger. If the application was run through a debugger then the JIT compiler will not inline the property accessors. I was not able to replicate your results. In fact, each test I ran indicated that there was virtually no difference in execution time.
But, like the others, I am not completely opposed to direct field access, especially because it is easy to make the field private and add a public property accessor later without needing any other code modifications to get the application to compile.
Edit: Okay, my initial tests used an int data type instead of double. I see a huge difference when using doubles. With ints the direct vs. property is virtually the same. With doubles property access is about 7x slower than direct access on my machine. This is somewhat puzzling to me.
Also, it is important to run the tests outside of the debugger. Even in release builds the debugger adds overhead which skews the results.
Here are some scenarios where it is OK (from the Framework Design Guidelines book):
DO use constant fields for constants that will never change.
DO use public static readonly fields for predefined object instances.
And where it is not:
DO NOT assign instances of mutable types to readonly fields.
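To make the three guidelines concrete, here is a minimal C# sketch. The Color, Settings and AllowedHosts names are hypothetical, chosen only for illustration. The key point of the last guideline is that readonly protects the field's reference, not the object it points to.

```csharp
using System;
using System.Collections.Generic;

public sealed class Color
{
    // DO: a constant field for a value that will never change.
    public const int MaxChannelValue = 255;

    // DO: public static readonly fields for predefined object instances.
    // Color is immutable, so sharing these instances is safe.
    public static readonly Color Black = new Color(0, 0, 0);
    public static readonly Color White = new Color(MaxChannelValue, MaxChannelValue, MaxChannelValue);

    public int R { get; }
    public int G { get; }
    public int B { get; }

    public Color(int r, int g, int b) { R = r; G = g; B = b; }
}

public static class Settings
{
    // DO NOT: readonly stops reassignment of the field,
    // but anyone can still change the list's contents.
    public static readonly List<string> AllowedHosts = new List<string> { "localhost" };
}

public static class Program
{
    public static void Main()
    {
        // Compiles fine: the field still points to the same list,
        // but outside code has changed what is in it.
        Settings.AllowedHosts.Add("example.com");
        Console.WriteLine(Settings.AllowedHosts.Count); // 2
        Console.WriteLine(Color.White.R);              // 255
    }
}
```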
From what you have stated, I don't see why your trivial properties aren't being inlined by the JIT.
If you modify your test to use the temp variables you assign, rather than accessing the properties directly in your calculation, you will see a large performance improvement:
sw.Start();
for (int i = 0; i < loopCount; i++)
{
    x = point._x;
    y = point._y;
    z = point._z;
    calculatedValue = x * y / z;
}
sw.Stop();
double fieldTime = sw.ElapsedMilliseconds;
Console.WriteLine("Direct field access: " + fieldTime);
sw.Reset();
sw.Start();
for (int i = 0; i < loopCount; i++)
{
    x = point.X;
    y = point.Y;
    z = point.Z;
    calculatedValue = x * y / z;
}
sw.Stop();
double propertyTime = sw.ElapsedMilliseconds;
Console.WriteLine("Property access: " + propertyTime);
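For anyone who wants to reproduce the comparison, here is a self-contained sketch of the same measurement. The Point class (public _x/_y/_z fields wrapped by trivial X/Y/Z properties) is an assumption about the asker's type, and ViaFields/ViaProperties are hypothetical helper names. The timings are only meaningful for a Release build run outside the debugger; no particular numbers are implied here.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for the asker's type: public fields plus
// trivial read-only properties that wrap them.
public class Point
{
    public double _x, _y, _z;

    public Point(double x, double y, double z) { _x = x; _y = y; _z = z; }

    public double X => _x;
    public double Y => _y;
    public double Z => _z;
}

public static class Benchmark
{
    // Reads the public fields directly.
    public static double ViaFields(Point point, int loopCount)
    {
        double calculatedValue = 0;
        for (int i = 0; i < loopCount; i++)
        {
            double x = point._x, y = point._y, z = point._z;
            calculatedValue = x * y / z;
        }
        return calculatedValue;
    }

    // Reads through the properties; in an optimized build outside the
    // debugger the JIT is expected to inline these getters.
    public static double ViaProperties(Point point, int loopCount)
    {
        double calculatedValue = 0;
        for (int i = 0; i < loopCount; i++)
        {
            double x = point.X, y = point.Y, z = point.Z;
            calculatedValue = x * y / z;
        }
        return calculatedValue;
    }

    public static void Main()
    {
        const int loopCount = 100_000_000;
        var point = new Point(2.0, 3.0, 4.0);

        var sw = Stopwatch.StartNew();
        double a = ViaFields(point, loopCount);
        sw.Stop();
        Console.WriteLine("Direct field access: " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        double b = ViaProperties(point, loopCount);
        sw.Stop();
        Console.WriteLine("Property access: " + sw.ElapsedMilliseconds + " ms");

        // Using the results makes it harder for the JIT to discard the loops.
        Console.WriteLine(a + b);
    }
}
```

Putting each loop in its own method also keeps the two measurements from sharing local variables, which makes the comparison a little fairer.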
I may be repeating someone else, but here's my point of view too, in case it helps.
What you are taught is meant to give you the tools you need to handle such situations with a certain ease.
The Agile software development methodology says that you first have to deliver the product to your client, no matter what your code looks like. Only then may you optimize and make your code "beautiful" or state of the art.
Here, either you or your client require performance. Within your project, PERFORMANCE is CRUCIAL, if I understand correctly.
So I guess you'll agree with me that we don't care what the code looks like or whether it respects the "art". Do what you have to do to make it performant and powerful! Properties let your code "format" the data going in and out if required, but a property access is really a method call: unless the JIT inlines it, you pay for the call on top of reading the field. If performance is that critical, just do it, and make your immutable members public. :-)
This reflects some others' points of view too, if I read correctly. :)
Have a good day!
Types which encapsulate functionality should use properties. Types which only serve to hold data should use public fields, except in the case of immutable classes (where wrapping fields in read-only properties is the only way to reliably protect them against modification). Exposing members as public fields essentially proclaims "these members may be freely modified at any time without regard for anything else". If the type in question is a class type, it further proclaims "anyone who exposes a reference to this thing will be allowing the recipient to change these members at any time in any fashion they see fit." While one shouldn't expose public fields in cases where such a proclamation would be inappropriate, one should expose public fields in cases where such a proclamation would be appropriate and client code could benefit from the assumptions enabled thereby.
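A small sketch of the distinction drawn above, using hypothetical Vector3 and Temperature types: a plain data holder exposes public fields, while an immutable class uses get-only properties (C# 6 or later) so that nothing can modify it after construction.

```csharp
using System;

// Pure data holder: public fields openly say "change these whenever you like".
public struct Vector3
{
    public double X;
    public double Y;
    public double Z;
}

// Immutable class: get-only properties are assigned once in the constructor
// and cannot be modified by outside code afterwards.
public sealed class Temperature
{
    public double Celsius { get; }

    public Temperature(double celsius) { Celsius = celsius; }

    // Encapsulated functionality: callers don't know or care that this is computed.
    public double Fahrenheit => Celsius * 9 / 5 + 32;
}

public static class Program
{
    public static void Main()
    {
        var v = new Vector3 { X = 1, Y = 2, Z = 3 };
        v.Z = 10; // allowed, by design

        var t = new Temperature(100);
        // t.Celsius = 0; // would not compile: there is no setter

        Console.WriteLine(v.Z);          // 10
        Console.WriteLine(t.Fahrenheit); // 212
    }
}
```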
If you have two threads invoking a static function at the same moment in time, is there a concurrency risk? And if that function uses a static member of the class, is there even a bigger problem?
Are the two calls separated from each other? (Is the function effectively copied for each thread?)
Are they automatically queued?
For instance, in the following example, is there a risk?
private static int a = 5;
public static int Sum()
{
    int b = 4;
    a = 9;
    int c = a + b;
    return c;
}
And in this next example, is there a risk?
public static int Sum2()
{
    int a = 5;
    int b = 4;
    int c = a + b;
    return c;
}
Update: And indeed, if both functions are in the same class, what is the risk then?
thx, Lieven Cardoen
Yes, there is a concurrency risk when you modify a static variable in static methods.
The static functions themselves have distinct sets of local variables, but any static variables are shared.
In your specific samples you're not being exposed, but that's just because you're using constants (and assigning the same values to them). Change the code sample slightly and you'll be exposed.
Edit:
If you call both Sum1() AND Sum2() from different threads, you're in trouble: there's no way to guarantee which value of a is read in the statement int c = a + b; (b is a local and is safe).
private static int a = 5;
public static int Sum1()
{
    int b = 4;
    a = 9;
    int c = a + b;
    return c;
}
public static int Sum2()
{
    int b = 4;
    int c = a + b;
    return c;
}
You can also achieve concurrency problems with multiple invocations of a single method like this:
public static int Sum3(int currentA)
{
    a = currentA;
    int b = 4;
    int c = a + b;
    int d = a * b; // a may have changed here
    return c + d;
}
The issue here is that the value of a may change mid-method due to other invocations changing it.
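One way to remove the race in Sum3, sketched under the assumption that a really must stay a shared static field: do the write and both reads of a under a single lock (the Calculator class and _sync object are hypothetical names). If a does not need to be shared, the simpler fix is to make it a local variable, as in the question's second example.

```csharp
using System;
using System.Threading.Tasks;

public static class Calculator
{
    // Dedicated lock object; never lock on a publicly visible object.
    private static readonly object _sync = new object();
    private static int a = 5;

    // Sum3 with the write to 'a' and both reads of it inside one lock,
    // so no other call can change 'a' between them.
    public static int Sum3(int currentA)
    {
        lock (_sync)
        {
            a = currentA;
            int b = 4;
            int c = a + b;
            int d = a * b;
            return c + d; // always 5 * currentA + 4
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Many concurrent calls; with the lock, each one sees its own value of 'a'.
        Parallel.For(0, 1000, i =>
        {
            if (Calculator.Sum3(i) != 5 * i + 4)
                throw new InvalidOperationException("race detected");
        });
        Console.WriteLine("No interleaving observed.");
    }
}
```

The trade-off is that the lock serializes every call to Sum3, which is exactly why keeping state local is preferable when you can.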
See here for a discussion of local variables. Before your edit, neither of the above methods presented a concurrency risk by itself: the local variables are all independent per call, and the shared state (static int a) is visible to multiple threads, but you don't mutate it and you only read it once.
If you did something like:
if (a > 5) {
    Console.WriteLine(a + " is greater than 5");
} // could write "1 is greater than 5"
it would (in theory) not be safe, as the value of a could be changed by another thread; you would typically either synchronize access (via lock, etc.) or take a snapshot:
int tmp = a;
if (tmp > 5) {
    Console.WriteLine(tmp + " is greater than 5");
}
If you are editing the value, you would almost certainly require synchronization.
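For the simple case of updating a shared numeric value, System.Threading.Interlocked is a lighter-weight alternative to lock. A minimal sketch with a hypothetical HitCounter class:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class HitCounter
{
    private static int _count;

    // Interlocked.Increment performs the read-modify-write atomically.
    // A plain "_count++" here could lose updates under contention.
    public static void Record()
    {
        Interlocked.Increment(ref _count);
    }

    // Volatile.Read gives an up-to-date, non-cached read of the counter.
    public static int Count => Volatile.Read(ref _count);
}

public static class Program
{
    public static void Main()
    {
        // Parallel.For returns only after every iteration has finished.
        Parallel.For(0, 100_000, _ => HitCounter.Record());
        Console.WriteLine(HitCounter.Count); // 100000
    }
}
```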
Yes, there is a risk. That's why the MSDN documentation often says "This class is threadsafe for static members" (or something like that). It means that when MS wrote the code, they intentionally used synchronization primitives to make the static members threadsafe. This is common in libraries and frameworks: it is easier to make static members threadsafe than instance members, since you don't know what the library user is going to want to do with instances. If they made instance members threadsafe for many of the library classes, they would put too many restrictions on you ... so often they let you handle it.
So you likewise need to make your static members threadsafe (or document that they aren't).
By the way, static constructors are threadsafe in a sense. The CLR will make sure they are called only once and will prevent 2 threads from getting into a static constructor.
EDIT: Marc pointed out in the comments an edge case in which static constructors are not threadsafe. If you use reflection to explicitly call a static constructor, apparently you can call it more than once. So I revise the statement as follows: as long as you are relying on the CLR to decide when to call your static constructor, then the CLR will prevent it from being called more than once, and it will also prevent the static ctor from being called re-entrantly.
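A small sketch to observe the static-constructor guarantee described above, assuming a hypothetical Config class whose InitCount field exists purely for observation: many threads touch the type at once, yet the constructor body runs once and the other threads wait for it to finish.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Config
{
    // Hypothetical counter, only here to observe how often the static ctor runs.
    public static int InitCount;
    public static readonly string Value;

    static Config()
    {
        Interlocked.Increment(ref InitCount);
        Thread.Sleep(100); // widen the window in which other threads arrive
        Value = "loaded";
    }
}

public static class Program
{
    public static void Main()
    {
        // Many threads touch Config at the same time; the CLR runs the static
        // constructor once and blocks the others until it has finished.
        Parallel.For(0, 16, _ =>
        {
            if (Config.Value != "loaded")
                throw new InvalidOperationException("saw uninitialized state");
        });
        Console.WriteLine(Config.InitCount); // 1
    }
}
```

This relies on the CLR choosing when to run the static constructor; as noted in the edit above, invoking it explicitly through reflection is outside this guarantee.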
In your two examples, there are no thread-safety issues, because each call to the function has its own copy of the local variables on the stack, and in your first example, with 'a' being a static variable, you never change 'a', so there is no problem.
If you change the value in 'a' in your first example you will have a potential concurrency problem.
If the scope of the variables is contained within the static function, then there is no risk, but variables outside of the function scope (static / shared) DEFINITELY pose a concurrency risk.
Static methods in OO are no different from "just" functions in procedural programming. Unless you store some state inside a static variable, there is no risk at all.
You put "ASP.NET" in the question title; this blog post is a good summary of the problems with using the ThreadStatic attribute in ASP.NET:
http://piers7.blogspot.com/2005/11/threadstatic-callcontext-and_02.html
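A minimal sketch of what [ThreadStatic] does, using a hypothetical RequestContext class: each thread sees its own copy of the field. As I understand the linked post, this is the trap in ASP.NET, where a single request is not guaranteed to stay on one thread throughout, so a value stored this way early in a request may not be visible later.

```csharp
using System;
using System.Threading;

public static class RequestContext
{
    // Each thread gets its own copy of this field. An initializer here would
    // run only once, on whichever thread triggers type initialization, so it
    // is left uninitialized (null on every thread).
    [ThreadStatic]
    private static string _user;

    public static string User
    {
        get { return _user; }
        set { _user = value; }
    }
}

public static class Program
{
    public static void Main()
    {
        RequestContext.User = "main";
        string seenByWorker = "unset";

        var worker = new Thread(() =>
        {
            seenByWorker = RequestContext.User; // this thread's own copy: null
            RequestContext.User = "worker";     // does not affect the main thread
        });
        worker.Start();
        worker.Join();

        Console.WriteLine(seenByWorker ?? "null"); // null
        Console.WriteLine(RequestContext.User);    // main
    }
}
```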