How and when to use short in C#?

I want to know when to use short in C#.
I would like to use it instead of int where possible.
Is using short a good or bad idea?

short - aka Int16 - has some very real but limited uses.
Example scenarios:
when the input value is limited to 16-bits, and you don't want to violate an invariant (perhaps because it maps to a database column that is 16 bits - smallint in SQL Server, for example)
declaring an enum that is : short for similar reasons
because you're implementing an algorithm that demands 16-bit wrapping behaviour - CRC-16, for example
when you are writing a struct with explicit layout that needs to map a very specific configuration (usually related to C/C++ mapping)
It is unusual, but by no means unexpected. Similarly: byte, sbyte, ushort, uint, long, ulong, etc.
int is a great default, but it is by no means the only option.
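Here is a rough sketch of the enum and explicit-layout scenarios mentioned above; the type and member names are invented for the example:
using System.Runtime.InteropServices;

// An enum backed by a 16-bit integer, e.g. to mirror a smallint column.
enum RecordStatus : short
{
    Inactive = 0,
    Active = 1
}

// A struct with explicit layout whose 16-bit fields must line up with a
// native (C/C++) definition.
[StructLayout(LayoutKind.Explicit)]
struct NativeHeader
{
    [FieldOffset(0)] public short Version;
    [FieldOffset(2)] public ushort Flags;
    [FieldOffset(4)] public int Length;
}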

You will rarely need to use short, and I think it's reasonable to consider its use "bad" unless there's a compelling reason for using it.
int will generally perform better than short on modern CPUs.
For example, you may need to use short in a struct used to interoperate with legacy unmanaged code.

Using the appropriately sized type can save memory: short is 16 bits, while int is 32 bits.
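A quick way to confirm those sizes (sizeof works on the built-in numeric types in safe code):
Console.WriteLine(sizeof(short)); // 2 (bytes)
Console.WriteLine(sizeof(int));   // 4 (bytes)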

Related

Is struct field layout consistent with endianness in C#?

When I first learned endianness, I was very confused at how it worked. I finally explained it to myself by the following metaphor:
On a big-endian machine, an int[4] would be arranged like this:
| int[4] |
|int1|int2|int3|int4|
While on little-endian machines, it would be laid out like
| int[4] |
|1tni|2tni|3tni|4tni|
That way the layout of the array would be consistent in memory, while the values themselves would be arranged differently.
Now to the real question: I am writing more optimized versions of BinaryReader and BinaryWriter in my .NET library. One of the problems I have run into is the implementation of Write(decimal). A decimal contains 4 int fields: flags, hi, lo, and mid, in that order. So basically on your typical little-endian machine it would look like this in memory:
| lamiced |
|sgalf|ih|ol|dim|
My question is, how would the CLR arrange the struct on big-endian machines? Would it arrange it so that the basic layout of the decimal would be conserved, like so
| decimal |
|flags|hi|lo|mid|
or would it completely reverse the binary arrangement of the decimal, like
| decimal |
|mid|lo|hi|flags|
?
I don't have a big-endian machine nearby, otherwise I'd test it out myself.
edit: TL;DR does the following code print -1 or 0 on big-endian machines?
struct Pair
{
    public int a;
    public int b;
}

unsafe static void Main()
{
    var p = default(Pair);
    p.a = -1;
    Console.WriteLine(*(int*)&p);
}
It's not entirely clear what your actual question is.
Regarding the relationship between the layout of fields in a data structure and endianness, there is none. Endianness does not affect how fields in a data structure are laid out, only the order of bytes within a field.
I.e. in answer to this:
does the following code print -1 or 0 on big-endian machines?
… the output will be -1.
But you seem to be also or instead asking about the effect of endianness on the in-memory representation of the Decimal type. Which is a somewhat different question.
Regarding the endianness of the Decimal in-memory representation, I'm not aware of any requirement that .NET provide consistent implementations of the Decimal type. As commenter Hans Passant points out, there are multiple ways to view the current implementation; either as the CLR code you referenced, or as the more detailed declaration seen in e.g. wtypes.h or OleDb.h (another place a DECIMAL type appears, which has the same format as elsewhere). But in reality, as far as .NET is concerned, you are not promised anything about the in-memory layout of the type.
I would expect, for simplicity in implementation, the fields representing the 3 32-bit mantissa components may be affected by endianness, individually. (The sign and scale are represented as individual bytes, so endianness would not affect those). That is, while the order of the individual 32 bit fields would remain the same — high, low, mid — the bytes within each field will be represented according to the current platform's endianness.
But if Microsoft for some bizarre reason decided they wanted the .NET implementation to deviate from the native implementation (seems unlikely, but let's assume it for the sake of argument) and always use little-endian for the fields even on big-endian platforms, that would be within their rights.
For that matter, they could even rearrange the fields if they wanted to: their current order appears to me to be a concession to the de facto x86 standard of little-endianness, such that on little-endian architectures the combination of low and mid 32-bit values can be treated as a single 64-bit value without swapping words, so if they decided to deviate from the wtypes.h declaration, they might well decide to just make the mantissa a single 96-bit, little-endian or big-endian value.
Again, I'm not saying these actions are in any way likely. Just that they are theoretically possible and are just easy, obvious examples (a subset of all possible examples) of why writing managed code that assumes such private implementation details is probably not a good idea.
Even if you had access to a big-endian machine that could run .NET libraries (*) and so could test the actual behavior, today's current behavior doesn't offer you any guarantees of future behavior.
(*) (I don't even know of any…pure big-endian CPUs are fairly uncommon these days, and I can't think of a single one off the top of my head that is supported by Microsoft as an actual .NET platform.)
So…
I am skeptical that it is practical to author implementations of BinaryReader and BinaryWriter that are observably more optimized than those found in .NET already. The main reason for using these types is to handle I/O, and that necessarily means interacting with external systems that are orders of magnitude slower than the CPU that is handling the actual conversions to and from byte representations (and even the GC operations to support those conversions). Even if the existing Microsoft code were in some way hypothetically inefficient, in practice I doubt it would matter much.
But if you must implement these yourself, it seems to me that the only safe way to deal with the Decimal type is to use the Decimal.GetBits() method and Decimal.Decimal(int[]) constructor. These use clearly-documented, endian-independent mechanisms to convert the Decimal type. They are based on int, the in-memory representation of which will of course vary according to endianness, but your code will never need to worry about that, because it will only have to deal with entire int values, not their byte-wise representations.
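A minimal sketch of that approach, here using the standard BinaryWriter/BinaryReader for brevity (the method names are mine, for illustration only):
using System;
using System.IO;

static void WriteDecimal(BinaryWriter writer, decimal value)
{
    // GetBits returns four ints in the documented order lo, mid, hi, flags.
    int[] bits = decimal.GetBits(value);
    foreach (int part in bits)
        writer.Write(part);   // the writer decides how each whole int hits the wire
}

static decimal ReadDecimal(BinaryReader reader)
{
    int[] bits = new int[4];
    for (int i = 0; i < 4; i++)
        bits[i] = reader.ReadInt32();
    return new decimal(bits); // the int[] constructor expects the same lo, mid, hi, flags order
}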

Confusion between Word16 and UWord16

I'm porting some C code to C#. I'm seeing a lot of Word16 and Word32 usage, along with UWord16 and UWord32.
I know Word32 is an unsigned 32-bit int type, but then what could have been the need to write it with a different name, UWord32? Am I missing something here? Is it different from Word32 in some manner?
Also, can I just replace WORD32 with int in C#? If not, why not?
This source says WORD is an unsigned integral type. Yes, the source is for Haskell; I couldn't find any other documentation explaining the WORD data type.
but what could have been the need to write it with a different name UWord32?
This is an unsigned 32 bit integer type.
In general, you can likely replace (moving to C#):
WORD32 -> int (Int32)
UWORD32 -> uint (UInt32)
WORD16 -> short (Int16)
UWORD16 -> ushort (UInt16)
This is, however, all speculation based on my expectations given the naming scheme you've shown.
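If the headers do follow that scheme, one way to keep the ported code readable is a set of using aliases; these particular mappings are my assumption, so verify them against the actual C headers:
using WORD32 = System.Int32;
using UWORD32 = System.UInt32;
using WORD16 = System.Int16;
using UWORD16 = System.UInt16;

static class Ported
{
    // A ported routine keeps the width and signedness of the original.
    public static UWORD16 AddWrap(UWORD16 a, UWORD16 b)
    {
        return unchecked((UWORD16)(a + b)); // wraps modulo 2^16, as the C code would
    }
}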
Note that, if you're using Windows Data Types, WORD -> ushort, and DWORD -> uint. Signed types are INT/INT32 -> int, and then INT16 -> short, INT64 -> long, etc.
That being said, all of these are defines or typedefs in C or C++, not "native" (language-defined) types. Your code could define WORD to represent an unsigned 64-bit integer if it chose to. As such, you need to look at where the defines are coming from (I listed the Windows API standards here).
I know Word32 is an unsigned 32bit int type, but what could have been the need to write it with a different name UWord32?
If this is the case, there is likely no need to have two definitions for the same type. It may be that two headers you are using define things slightly differently. Again, you'd need to check the headers that define these types and see how they're specified.

using uint vs int [closed]

Closed. This question is opinion-based and is not currently accepting answers.
I have observed for a while that C# programmers tend to use int everywhere, and rarely resort to uint. But I have never discovered a satisfactory answer as to why.
If interoperability is your goal, uint shouldn't appear in public APIs because not all CLI languages support unsigned integers. But that doesn't explain why int is so prevalent, even in internal classes. I suspect this is the reason uint is used sparingly in the BCL.
In C++, if you have an integer for which negative values make no sense, you choose an unsigned integer.
This clearly signifies that negative numbers are not allowed or expected, and the compiler will do some checking for you. I also suspect that, in the case of array indices, the JIT can easily drop the lower bounds check.
However, when mixing int and uint types, extra care and casts will be needed.
Should uint be used more? Why?
int is shorter to type than uint.
Your observation of why uint isn't used in the BCL is the main reason, I suspect.
UInt32 is not CLS Compliant, which means that it is wholly inappropriate for use in public APIs. If you're going to be using uint in your private API, this will mean doing conversions to other types - and it's typically easier and safer to just keep the type the same.
I also suspect that this is not as common in C# development, even when C# is the only language being used, primarily because it is not common in the BCL. Developers, in general, try to (thankfully) mimic the style of the framework on which they are building - in C#'s case, this means trying to make your APIs, public and internal, look as much like the .NET Framework BCL as possible. This would mean using uint sparingly.
Normally int will suffice. If you can satisfy all of the following conditions, you can use uint:
It is not for a public API (since uint is not CLS compliant).
You don't need negative numbers.
You (might) need the additional range.
You are not using it in a comparison with < 0, as that is never true.
You are not using it in a comparison with >= 0, as that is never false.
The last requirement is often forgotten and will introduce bugs:
static void Main(string[] args)
{
    if (args.Length == 0) return;
    uint last = (uint)(args.Length - 1);
    // This will eventually throw an IndexOutOfRangeException:
    for (uint i = last; i >= 0; i--)
    {
        Console.WriteLine(args[i]);
    }
}
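One way to avoid that wraparound while still using uint (my own sketch, not part of the original answer) is to test against zero before decrementing and index with i - 1:
for (uint i = (uint)args.Length; i > 0; i--)
{
    Console.WriteLine(args[i - 1]); // i never wraps, because the loop stops at 1
}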
1) Bad habit. Seriously. Even in C/C++.
Think of the common for pattern:
for( int i=0; i<3; i++ )
foo(i);
There's absolutely no reason to use a signed integer there. You will never have negative values. But almost everyone will do a simple loop that way, even if it contains (at least) two other "style" errors.
2) int is perceived as the native type of the machine.
I prefer uint to int unless a negative number is actually in the range of acceptable values. In particular, accepting an int param but throwing an ArgumentException if the number is less than zero is just silly--use a uint!
I agree that uint is underused, and I encourage everyone else to use it more.
I program at a lower-level application layer where ints rarely get above 100, so negative values are not an issue (e.g. for i < myname.length() type stuff); it's just an old C habit, and shorter to type, as mentioned above. However, in some cases, when interfacing to hardware and dealing with event flags from devices, uint is important when a flag may use the leftmost (highest) bit.
Honestly, for 99.9% of my work I could easily use ushort, but int, you know, sounds a lot better than ushort.
I made a Direct3D 10 wrapper in C# and need to use uint if I want to create very large vertex buffers. Large buffers in the video card cannot be represented with a signed int.
uint is very useful, and it is silly to say otherwise. If anyone thinks that just because they have never needed to use uint nobody else will, they are wrong.
I think it is just laziness. C# is inherently a choice for development on desktops and other machines with relatively plentiful resources.
C and C++, however, have deep roots in old systems and embedded systems where memory is scarce, so programmers are used to thinking carefully about which data type to use.
C# programmers are lazy, and since there are enough resources in general, nobody really optimizes memory usage (in general, not always of course). Even if a byte would be sufficient, a lot of C# programmers, including me, just use int for simplicity. Moreover, a lot of API functions accept ints, so it avoids casting.
I agree that choosing the correct data type is good practice, but I think the main motivation is laziness.
Finally, choosing a signed integer is more mathematically natural. Unsigned ints don't exist in math (only natural numbers do), and since most programmers have a mathematical background, using an integer feels more natural.
I think a big part of the reason is that when C first came out most of the examples used int for brevity's sake. We rejoiced at not having to write integer like we did with Fortran and Pascal, and in those days we routinely used them for mundane things like array indices and loop counters. Unsigned integers were special cases for large numbers that needed that last extra bit. I think it's a natural progression that C habits continued into C# and other new languages like Python.
Some languages (e.g. many versions of Pascal) regard unsigned types as representing numeric quantities; an operation between an unsigned type and a signed type of the same size will generally be performed as though the operands were promoted to the next larger type (in some such languages, the largest type has no unsigned equivalent, so such promotion will always be possible).
Other languages (e.g. C) regard N-bit unsigned types as a group which wraps around modulo 2^N. Note that subtracting N from a member of such a group doesn't represent numerical subtraction, but rather yields the group member which, when N is added to it, would yield the original. Arguably, certain operations involving mixtures of signed and unsigned values don't really make sense and should perhaps have been forbidden, but even code which is sloppy with its specifications of things like numeric literals will usually work, and enough code has been written which mixes signed and unsigned types and, despite being sloppy, does work, that the spec isn't apt to change any time soon.
It's a lot easier to work exclusively with signed types than to work out all the intricacies of interactions between signed and unsigned types. Unsigned types are useful when decomposing large numbers into smaller pieces (e.g. for serialization) or for reconstituting such numbers, but in general it's better to simply use signed numbers for things that actually represent quantities.
I know this is probably an old thread but I wanted to give some clarification.
Let's take an int8: you can store -128 to 127 in one byte, which gives you 127 positive values.
With an int8, one of the bits is effectively the sign bit, which is what accounts for the range down to -128.
With a uint8, that negative range is handed over to the positive side, so you can use 255 positive values with the same amount of storage, one byte.
The only drawback is that you have now lost the ability to use negative values.
Another problem is that not all programming languages and databases support unsigned types.
In my opinion, the only reason you would use them is when you need to be efficient, as in game programming, and you have to store large non-negative numbers.
This is why not many programs use them.
The main reasons are that storage is not a problem and that you can't use them flexibly with other software, plugins, databases, or APIs. Also, a bank, for example, would need negative numbers to store money, etc.
I hope this will help someone.
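For reference, the C# equivalents of int8/uint8 are sbyte and byte; a quick check of the ranges described above:
Console.WriteLine($"sbyte: {sbyte.MinValue} to {sbyte.MaxValue}"); // sbyte: -128 to 127
Console.WriteLine($"byte: {byte.MinValue} to {byte.MaxValue}");    // byte: 0 to 255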

C# Network encoding

I'm working on a networking application in C#, sending a lot of plain numbers across the network. I discovered the IPAddress.HostToNetworkOrder and IPAddress.NetworkToHostOrder methods, which are very useful, but they left me with a few questions:
1. I know I need to encode and decode integers, but what about unsigned ones? I think they need it too, so at the moment I'm doing it by casting a pointer to the unsigned int into a pointer to an int, and then doing a network conversion on the int (since there is no method overload that takes unsigned ints):
public static UInt64 HostToNetworkOrder(UInt64 i)
{
    // Reinterpret the unsigned bits as signed, swap, then reinterpret back.
    Int64 a = *((Int64*)&i);
    a = IPAddress.HostToNetworkOrder(a);
    return *((UInt64*)&a);
}

public static UInt64 NetworkToHostOrder(UInt64 a)
{
    // Byte swapping is symmetric, so HostToNetworkOrder works in both directions.
    Int64 i = *((Int64*)&a);
    i = IPAddress.HostToNetworkOrder(i);
    return *((UInt64*)&i);
}
2. What about floating point numbers (Single and Double)? I think they don't need conversion, but if they do, should I write a similar method to the unsigned-int one and cast a Single pointer into an int pointer and convert it like that?
EDIT: Jon's answer doesn't address the second half of the question (it doesn't really answer the first either!). I would appreciate someone answering part 2.
I suspect you'd find it easier to use my EndianBinaryReader and EndianBinaryWriter in MiscUtil - then you can decide the endianness yourself. Alternatively, for individual values, you can use EndianBitConverter.
You'd better read several RFC documents to see how different TCP/IP protocols (at the application level, for example HTTP/FTP/SNMP and so on) encode their data.
Generally speaking, this is a protocol-specific question (both of your questions are), as your packet must encapsulate the integers or floating point numbers in a protocol-defined format.
For SNMP, this is a conversion that changes an integer/float number into a few bytes and back again. ASN.1 is used.
http://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One
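For part 2, a minimal sketch of my own (assuming the protocol simply wants IEEE 754 doubles in network byte order) is to reinterpret the double's bits as a long and reuse the existing integer swap, avoiding unsafe pointer casts:
using System;
using System.Net;

static long DoubleToNetworkOrder(double value)
{
    long bits = BitConverter.DoubleToInt64Bits(value); // reinterpret the 64 bits as a long
    return IPAddress.HostToNetworkOrder(bits);
}

static double NetworkOrderToDouble(long wireValue)
{
    long bits = IPAddress.NetworkToHostOrder(wireValue);
    return BitConverter.Int64BitsToDouble(bits);
}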

.NET Integer vs Int16?

I have a questionable coding practice.
When I need to iterate through a small list of items whose count limit is under 32000, I use Int16 for my i variable type instead of Integer. I do this because I assume using the Int16 is more efficient than a full blown Integer.
Am I wrong? Is there no effective performance difference between using an Int16 vs an Integer? Should I stop using Int16 and just stick with Integer for all my counting/iteration needs?
You should almost always use Int32 or Int64 (and, no, you do not get credit by using UInt32 or UInt64) when looping over an array or collection by index.
The most obvious reason that it's less efficient is that all array and collection indexes found in the BCL take Int32s, so an implicit cast is always going to happen in code that tries to use Int16s as an index.
The less-obvious reason (and the reason that arrays take Int32 as an index) is that the CIL specification says that all operation-stack values are either Int32 or Int64. Every time you either load or store a value to any other integer type (Byte, SByte, UInt16, Int16, UInt32, or UInt64), there is an implicit conversion operation involved. Unsigned types have no penalty for loading, but for storing the value, this amounts to a truncation and a possible overflow check. For the signed types every load sign-extends, and every store sign-collapses (and has a possible overflow check).
The place that this is going to hurt you most is the loop itself, not the array accesses. For example take this innocent-looking loop:
for (short i = 0; i < 32000; i++) {
...
}
Looks good, right? Nope! You can basically ignore the initialization (short i = 0), since it only happens once, but the comparison (i < 32000) and increment (i++) parts happen 32000 times. Here's some pseudo-code for what this thing looks like at the machine level:
Int16 i = 0;
LOOP:
    Int32 temp0 = Convert_I16_To_I32(i); // !!!
    if (temp0 >= 32000) goto END;
    ...
    Int32 temp1 = Convert_I16_To_I32(i); // !!!
    Int32 temp2 = temp1 + 1;
    i = Convert_I32_To_I16(temp2);       // !!!
    goto LOOP;
END:
There are 3 conversions in there that are run 32000 times. And they could have been completely avoided by just using an Int32 or Int64.
Update: As I said in the comment, I have now, in fact, written a blog post on this topic: .NET Integral Data Types And You.
According to the reference below, the runtime optimizes the performance of Int32 and recommends it for counters and other frequently accessed operations.
From the book: MCTS Self-Paced Training Kit (Exam 70-536): Microsoft® .NET Framework 2.0—Application Development Foundation
Chapter 1: "Framework Fundamentals"
Lesson 1: "Using Value Types"
Best Practices: Optimizing performance
with built-in types
The runtime optimizes the performance of 32-bit integer types (Int32 and UInt32), so use those types for counters and other frequently accessed integral variables.
For floating-point operations, Double is the most efficient type because those operations are optimized by hardware.
Also, Table 1-1 in the same section lists recommended uses for each type.
Relevant to this discussion:
Int16 - Interoperation and other specialized uses
Int32 - Whole numbers and counters
Int64 - Large whole numbers
Int16 may actually be less efficient because the x86 instructions for word access take up more space than the instructions for dword access. It will depend on what the JIT does. But no matter what, it's almost certainly not more efficient when used as the variable in an iteration.
The opposite is true.
32 (or 64) bit integers are faster than int16. In general the native datatype is the fastest one.
Int16 is nice if you want to make your data structures as lean as possible. That saves space and may improve performance.
Never assume efficiency.
What is or isn't more efficient will vary from compiler to compiler and platform to platform. Unless you actually tested this, there is no way to tell whether int16 or int is more efficient.
I would just stick with ints unless you come across a proven performance problem that using int16 fixes.
Any performance difference is going to be so tiny on modern hardware that for all intents and purposes it'll make no difference. Try writing a couple of test harnesses and run them both a few hundred times, take the average loop completion times, and you'll see what I mean.
It might make sense from a storage perspective if you have very limited resources - embedded systems with a tiny stack, wire protocols designed for slow networks (e.g. GPRS etc), and so on.
Use Int32 on 32-bit machines (or Int64 on 64-bit machines) for fastest performance. Use a smaller integer type if you're really concerned about the space it takes up (may be slower, though).
The others here are correct: only use something smaller than Int32 (for 32-bit code) / Int64 (for 64-bit code) if you need it for extreme storage requirements, or for another level of enforcement on a business object field (you should still have property-level validation in this case, of course).
And in general, don't worry about efficiency until there is a performance problem. In that case, profile it. And if guessing and checking both ways while profiling doesn't help you enough, check the IL code.
Good question though. You're learning more about how the compiler does its thing. If you want to learn to program more efficiently, learning the basics of IL and how the C#/VB compilers do their job would be a great idea.
I can't imagine there being any significant performance gain on Int16 vs. int.
You save some bits in the variable declaration.
And it's definitely not worth the hassle when the specs change, whatever you are counting can now go above 32,767, and you only discover that when your application starts throwing exceptions...
There is no significant performance gain in using a data type smaller than Int32; in fact, I read somewhere that using Int32 will be faster than Int16 because of memory allocation.
