I'm working on a networking application in C#, sending a lot of plain numbers across the network. I discovered the IPAddress.HostToNetworkOrder and IPAddress.NetworkToHostOrder methods, which are very useful, but they left me with a few questions:
1. I know I need to encode and decode integers, but what about unsigned ones? I think yes, so at the moment I'm doing it by casting a pointer to the unsigned int into a pointer to an int and then doing a network conversion on the int (since there is no method overload that takes unsigned ints):
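// Requires compiling with /unsafe (AllowUnsafeBlocks) and "using System.Net;"
public static unsafe UInt64 HostToNetworkOrder(UInt64 i)
{
    // Reinterpret the ulong's bits as a long without changing them.
    Int64 a = *((Int64*)&i);
    a = IPAddress.HostToNetworkOrder(a);
    return *((UInt64*)&a);
}

public static unsafe UInt64 NetworkToHostOrder(UInt64 a)
{
    Int64 i = *((Int64*)&a);
    i = IPAddress.NetworkToHostOrder(i);
    return *((UInt64*)&i);
}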
2. What about floating-point numbers (single and double)? I think not; however, if I do need to, should I write a method similar to the unsigned one, casting a single pointer into an int pointer and converting like so?
EDIT: Jon's answer doesn't address the second half of the question (it doesn't really answer the first either!). I would appreciate someone answering part 2.
I suspect you'd find it easier to use my EndianBinaryReader and EndianBinaryWriter in MiscUtil - then you can decide the endianness yourself. Alternatively, for individual values, you can use EndianBitConverter.
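For part 2, and as a pointer-free alternative to the unsafe code in part 1, here is a minimal sketch that uses only built-in APIs (IPAddress.HostToNetworkOrder/NetworkToHostOrder plus the BitConverter bit-reinterpretation methods) instead of MiscUtil. It assumes both ends use IEEE 754 floats; the method names (HostToNetworkOrderU64, DoubleToNetworkBits and so on) are hypothetical.

```csharp
using System;
using System.Net;

// Unsigned: reinterpret the ulong bits as a long with an unchecked cast (no pointers),
// swap with IPAddress, then reinterpret back. The swap is its own inverse.
static ulong HostToNetworkOrderU64(ulong value) =>
    unchecked((ulong)IPAddress.HostToNetworkOrder((long)value));

static ulong NetworkToHostOrderU64(ulong value) =>
    unchecked((ulong)IPAddress.NetworkToHostOrder((long)value));

// Floating point: move the IEEE 754 bits into an integer and swap that integer.
// Keep the swapped value as an integer until it is written out; the byte-swapped
// bit pattern is not a meaningful double (it may even be a NaN).
static long DoubleToNetworkBits(double value) =>
    IPAddress.HostToNetworkOrder(BitConverter.DoubleToInt64Bits(value));

static double DoubleFromNetworkBits(long networkBits) =>
    BitConverter.Int64BitsToDouble(IPAddress.NetworkToHostOrder(networkBits));

// float: SingleToInt32Bits / Int32BitsToSingle need .NET Core 2.0 or later.
static int SingleToNetworkBits(float value) =>
    IPAddress.HostToNetworkOrder(BitConverter.SingleToInt32Bits(value));

static float SingleFromNetworkBits(int networkBits) =>
    BitConverter.Int32BitsToSingle(IPAddress.NetworkToHostOrder(networkBits));

ulong id = 0x0102030405060708UL;
Console.WriteLine($"{HostToNetworkOrderU64(id):X16}");                             // 0807060504030201 on a little-endian host
Console.WriteLine(NetworkToHostOrderU64(HostToNetworkOrderU64(id)) == id);         // True
Console.WriteLine(DoubleFromNetworkBits(DoubleToNetworkBits(Math.PI)) == Math.PI); // True
Console.WriteLine(SingleFromNetworkBits(SingleToNetworkBits(1.5f)) == 1.5f);       // True
```

On .NET Core 2.1 and later, BinaryPrimitives.ReverseEndianness (System.Buffers.Binary) is another option for the unsigned case, but it always swaps, so you would check BitConverter.IsLittleEndian yourself.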
You'd better read several RFC documents to see how different application-level TCP/IP protocols (for example, HTTP, FTP, SNMP and so on) encode their data.
Generally speaking, both of your questions are protocol-specific, as your packet must encapsulate the integers or floating-point numbers in a protocol-defined format.
For SNMP, the conversion of an integer or floating-point number to a few bytes and back is defined by ASN.1.
http://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One
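To illustrate what a protocol-defined format looks like in practice, here is a minimal sketch of the BER encoding of an ASN.1 INTEGER as described in ITU-T X.690: tag 0x02, a length byte, then the shortest big-endian two's-complement representation. It assumes the short-form length and covers only INTEGER, not REAL or the other types SNMP uses; EncodeBerInteger is a hypothetical helper name.

```csharp
using System;
using System.Buffers.Binary;

// BER INTEGER: tag 0x02, short-form length, minimal big-endian two's-complement content.
static byte[] EncodeBerInteger(long value)
{
    var full = new byte[8];
    BinaryPrimitives.WriteInt64BigEndian(full, value);

    // Drop redundant leading 0x00 / 0xFF bytes as long as the sign bit of the
    // following byte still matches, keeping at least one content byte.
    int start = 0;
    while (start < 7 &&
           ((full[start] == 0x00 && (full[start + 1] & 0x80) == 0) ||
            (full[start] == 0xFF && (full[start + 1] & 0x80) != 0)))
    {
        start++;
    }

    int contentLength = 8 - start;            // at most 8, so short-form length suffices
    var result = new byte[2 + contentLength];
    result[0] = 0x02;                         // universal tag for INTEGER
    result[1] = (byte)contentLength;
    Array.Copy(full, start, result, 2, contentLength);
    return result;
}

Console.WriteLine(BitConverter.ToString(EncodeBerInteger(128)));  // 02-02-00-80
Console.WriteLine(BitConverter.ToString(EncodeBerInteger(-129))); // 02-02-FF-7F
```

The extra 0x00 in front of 0x80 is what keeps 128 from being read back as -128.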
I want to know when to use short in C#.
Please help: I want to use it instead of int.
Is using short a good or bad idea?
short - aka Int16 - has some very real but limited uses.
Example scenarios:
when the input value is limited to 16 bits, and you don't want to violate an invariant (perhaps because it maps to a database column that is 16 bits - smallint in SQL Server, for example)
declaring an enum that is : short for similar reasons
because you're implementing an algorithm that demands 16-bit wrapping behaviour - CRC-16, for example
when you are writing a struct with explicit layout that needs to map a very specific configuration (usually related to C/C++ mapping)
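As an illustration of the 16-bit wrapping scenario, here is a sketch of CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF, no reflection). CRC code normally uses the unsigned ushort rather than short; the (ushort) casts truncate each intermediate result to 16 bits, which is the wrap-around the algorithm relies on. Crc16CcittFalse is a hypothetical name.

```csharp
using System;
using System.Text;

// CRC-16/CCITT-FALSE, processed most significant bit first.
static ushort Crc16CcittFalse(byte[] data)
{
    ushort crc = 0xFFFF;
    foreach (byte b in data)
    {
        crc ^= (ushort)(b << 8);
        for (int bit = 0; bit < 8; bit++)
        {
            // Shifting left can push a bit past position 15; the cast drops it.
            crc = (crc & 0x8000) != 0
                ? (ushort)((crc << 1) ^ 0x1021)
                : (ushort)(crc << 1);
        }
    }
    return crc;
}

Console.WriteLine(Crc16CcittFalse(Encoding.ASCII.GetBytes("123456789")).ToString("X4")); // 29B1
```

0x29B1 is the published check value for this CRC variant over the ASCII string "123456789".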
It is unusual, but by no means unexpected. Similarly: byte, sbyte, ushort, uint, long, ulong, etc.
int is a great default, but it is by no means the only option.
You will rarely need to use short, and I think it's reasonable to consider its use "bad" unless there's a compelling reason for using it.
int will generally perform better than short on modern CPUs.
For example, you may need to use short in a struct used to interoperate with legacy unmanaged code.
Using the proper type is the better solution for saving memory, because short is 16 bits in size and int is 32 bits.
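A small sketch of the trade-offs mentioned in these answers: element storage for short[] versus int[], and the fact that C# performs arithmetic on short operands in int, which is where the extra casts come from.

```csharp
using System;

// Element storage: 2 bytes per short, 4 bytes per int.
short[] shorts = new short[1_000];
int[] ints = new int[1_000];
Console.WriteLine(Buffer.ByteLength(shorts)); // 2000
Console.WriteLine(Buffer.ByteLength(ints));   // 4000

// Arithmetic on short is performed in int, so storing the result back needs a cast.
short a = 30_000, b = 10_000;
int widened = a + b;                           // 40000: the addition happens in int
short narrowed = unchecked((short)(a + b));    // -25536: truncated to 16 bits
Console.WriteLine($"{widened} {narrowed}");
```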
When I first learned about endianness, I was very confused about how it worked. I finally explained it to myself with the following metaphor:
On a big-endian machine, an int[4] would be arranged like this:
| int[4] |
|int1|int2|int3|int4|
While on little-endian machines, it would be laid out like
| int[4] |
|1tni|2tni|3tni|4tni|
That way the layout of the array would be consistent in memory, while the values themselves would be arranged differently.
Now to the real question: I am writing more optimized versions of BinaryReader and BinaryWriter in my .NET library. One of the problems I have run into is the implementation of Write(decimal). A decimal contains 4 int fields: flags, hi, lo, and mid, in that order. So basically on your typical little-endian machine it would look like this in memory:
| lamiced |
|sgalf|ih|ol|dim|
My question is, how would the CLR arrange the struct on big-endian machines? Would it arrange it so that the basic layout of the decimal would be conserved, like so
| decimal |
|flags|hi|lo|mid|
or would it completely reverse the binary arrangement of the decimal, like
| decimal |
|mid|lo|hi|flags|
?
I don't have a big-endian machine nearby, otherwise I'd test it out myself.
Edit (TL;DR): does the following code print -1 or 0 on big-endian machines?
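using System;

struct Pair
{
    public int a;
    public int b;
}

static class Program
{
    // Requires compiling with /unsafe (AllowUnsafeBlocks).
    unsafe static void Main()
    {
        var p = default(Pair);
        p.a = -1;
        // Read the first 4 bytes of the struct as an int.
        Console.WriteLine(*(int*)&p);
    }
}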
It's not entirely clear what your actual question is.
Regarding the relationship between the layout of fields in a data structure and endianness, there is none. Endianness does not affect how fields in a data structure are laid out, only the order of bytes within a field.
I.e. in answer to this:
does the following code print -1 or 0 on big-endian machines?
… the output will be -1.
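To make the distinction concrete without a big-endian machine, the following sketch writes two consecutive 32-bit fields (a = 1, then b = 2) in both byte orders using BinaryPrimitives (System.Buffers.Binary, .NET Core 2.1 or later). It simulates the two layouts explicitly; it does not inspect how the CLR actually lays out a struct.

```csharp
using System;
using System.Buffers.Binary;

// Two consecutive 32-bit fields, a = 1 followed by b = 2.
byte[] little = new byte[8];
byte[] big = new byte[8];

BinaryPrimitives.WriteInt32LittleEndian(little.AsSpan(0, 4), 1);
BinaryPrimitives.WriteInt32LittleEndian(little.AsSpan(4, 4), 2);
BinaryPrimitives.WriteInt32BigEndian(big.AsSpan(0, 4), 1);
BinaryPrimitives.WriteInt32BigEndian(big.AsSpan(4, 4), 2);

// Field a occupies bytes 0-3 in both layouts; only the bytes inside each field flip.
Console.WriteLine(BitConverter.ToString(little)); // 01-00-00-00-02-00-00-00
Console.WriteLine(BitConverter.ToString(big));    // 00-00-00-01-00-00-00-02

// Reading the first field in the matching order yields a on either layout.
Console.WriteLine(BinaryPrimitives.ReadInt32LittleEndian(little)); // 1
Console.WriteLine(BinaryPrimitives.ReadInt32BigEndian(big));       // 1
```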
But you also seem to be asking, perhaps instead, about the effect of endianness on the in-memory representation of the Decimal type, which is a somewhat different question.
Regarding the endianness of the Decimal in-memory representation, I'm not aware of any requirement that .NET provide consistent implementations of the Decimal type. As commenter Hans Passant points out, there are multiple ways to view the current implementation; either as the CLR code you referenced, or as the more detailed declaration seen in e.g. wtypes.h or OleDb.h (another place a DECIMAL type appears, which has the same format as elsewhere). But in reality, as far as .NET is concerned, you are not promised anything about the in-memory layout of the type.
I would expect that, for simplicity of implementation, each of the three 32-bit mantissa fields would be affected by endianness individually. (The sign and scale are represented as individual bytes, so endianness would not affect those.) That is, while the order of the individual 32-bit fields would remain the same (high, low, mid), the bytes within each field would be represented according to the current platform's endianness.
But if Microsoft for some bizarre reason decided they wanted the .NET implementation to deviate from the native implementation (seems unlikely, but let's assume it for the sake of argument) and always use little-endian for the fields even on big-endian platforms, that would be within their rights.
For that matter, they could even rearrange the fields if they wanted to: their current order appears to me to be a concession to the de facto x86 standard of little-endianness, such that on little-endian architectures the combination of low and mid 32-bit values can be treated as a single 64-bit value without swapping words, so if they decided to deviate from the wtypes.h declaration, they might well decide to just make the mantissa a single 96-bit, little-endian or big-endian value.
Again, I'm not saying these actions are in any way likely. Just that they are theoretically possible and are just easy, obvious examples (a subset of all possible examples) of why writing managed code that assumes such private implementation details is probably not a good idea.
Even if you had access to a big-endian machine that could run .NET libraries (*) and so could test the actual behavior, today's current behavior doesn't offer you any guarantees of future behavior.
(*) (I don't even know of any; pure big-endian CPUs are fairly uncommon these days, and I can't think of a single one off the top of my head that is supported by Microsoft as an actual .NET platform.)
So…
I am skeptical that it is practical to author implementations of BinaryReader and BinaryWriter that are observably more optimized than those found in .NET already. The main reason for using these types is to handle I/O, and that necessarily means interacting with external systems that are orders of magnitude slower than the CPU that is handling the actual conversions to and from byte representations (and even the GC operations to support those conversions). Even if the existing Microsoft code were in some way hypothetically inefficient, in practice I doubt it would matter much.
But if you must implement these yourself, it seems to me that the only safe way to deal with the Decimal type is to use the Decimal.GetBits() method and Decimal.Decimal(int[]) constructor. These use clearly-documented, endian-independent mechanisms to convert the Decimal type. They are based on int, the in-memory representation of which will of course vary according to endianness, but your code will never need to worry about that, because it will only have to deal with entire int values, not their byte-wise representations.
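A minimal sketch of that approach, assuming a fixed little-endian order for the four ints on the wire (an arbitrary choice for this example; any fixed order works as long as both sides agree). WriteDecimal and ReadDecimal are hypothetical names, and BinaryPrimitives requires .NET Core 2.1 or later. Note that decimal.GetBits returns the parts in its own documented order (low, mid, high, flags), which is unrelated to the in-memory field order discussed above.

```csharp
using System;
using System.Buffers.Binary;

// GetBits: [0] low 32 bits, [1] middle 32 bits, [2] high 32 bits of the 96-bit
// integer, [3] flags (scale in bits 16-23, sign in bit 31). Each int is written
// with an explicit byte order, so the in-memory layout of decimal never matters.
static byte[] WriteDecimal(decimal value)
{
    int[] parts = decimal.GetBits(value);
    var buffer = new byte[16];
    for (int i = 0; i < 4; i++)
        BinaryPrimitives.WriteInt32LittleEndian(buffer.AsSpan(i * 4, 4), parts[i]);
    return buffer;
}

static decimal ReadDecimal(byte[] buffer)
{
    var parts = new int[4];
    for (int i = 0; i < 4; i++)
        parts[i] = BinaryPrimitives.ReadInt32LittleEndian(buffer.AsSpan(i * 4, 4));
    return new decimal(parts); // throws ArgumentException if the flags are invalid
}

decimal price = -1234.5678m;
Console.WriteLine(ReadDecimal(WriteDecimal(price)) == price); // True
```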
I know the question is a bit weird; I'm asking out of pure curiosity, as I couldn't find any relevant info around. Also, please feel free to edit the title. I know it's terrible, but I couldn't come up with anything better.
Let's say I have a variable foo of type object, which is either a short or a ushort. I need to send it over the network, so I use BitConverter to transform it into a byte[]:
byte[] b = new byte[2];
if(foo is short){
    BitConverter.GetBytes((short)foo, 0);
}else{
    BitConverter.GetBytes((ushort)foo, 0);
}
Network/socket magic happens, and I want my variable back. I know which type I am expecting to get, so I call BitConverter.ToUInt16 or ToInt16 accordingly.
Now, the question is: does it actually matter how I serialized the variable? The bits are the same, so it shouldn't make any difference, am I correct? So that I could
BitConverter.GetBytes((short)foo, 0);
and then do
BitConverter.ToUInt16(myByteArray, 0);
Anyone?
To serialize your variable, you should assign the result of BitConverter.GetBytes() to your byte[].
It doesn't matter if your variable is short or ushort, as those are the same size and hold the same values between 0 and 32767. As long as the size is ok, you should have no problems.
So you may make your code as simple as this:
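byte[] b = null;
if (foo is short s)
    b = BitConverter.GetBytes(s);
else if (foo is ushort us)
    // A boxed ushort must be unboxed as ushort; (short)foo would throw InvalidCastException.
    b = BitConverter.GetBytes(unchecked((short)us)); // same two bytes as GetBytes(us)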
However, at the decoding site you must know which type you need. For a short, you need:
short foo = BitConverter.ToInt16(b, 0);
but if you need a ushort, then you write:
ushort foo = BitConverter.ToUInt16(b, 0);
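The following sketch shows why the choice of serialization type doesn't change the bytes: a ushort above 32767 and the short with the same bit pattern produce identical output from BitConverter.GetBytes, and only the decoding method decides how the two bytes are interpreted.

```csharp
using System;

// 40000 does not fit in a short; reinterpreting its 16 bits as signed gives
// 40000 - 65536 = -25536, and both values produce the same two bytes.
ushort unsignedValue = 40000;
short signedValue = unchecked((short)unsignedValue);

byte[] fromUnsigned = BitConverter.GetBytes(unsignedValue);
byte[] fromSigned = BitConverter.GetBytes(signedValue);

Console.WriteLine(signedValue); // -25536
Console.WriteLine(BitConverter.ToString(fromUnsigned) == BitConverter.ToString(fromSigned)); // True

// The receiver's choice of method decides how the bits are read back.
Console.WriteLine(BitConverter.ToUInt16(fromSigned, 0));  // 40000
Console.WriteLine(BitConverter.ToInt16(fromUnsigned, 0)); // -25536
```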
When you send multibyte variables over the network, you should also ensure that they are in network byte order, as @itsme86 mentioned in his answer.
If you need to send both shorts and ushorts, then you also need to send type information to the other end to know if the data type is signed or not.
I won't go into detail about it here, as it would complicate the code.
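One possible way to send that type information, sketched under the assumption of a made-up 3-byte format: a tag byte (0 for short, 1 for ushort) followed by the value in big-endian order. EncodeTagged, DecodeTagged and the tag values are all hypothetical, not part of any standard protocol.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical wire format: [tag][value high byte][value low byte].
static byte[] EncodeTagged(object value)
{
    var buffer = new byte[3];
    switch (value)
    {
        case short s:
            buffer[0] = 0;
            BinaryPrimitives.WriteInt16BigEndian(buffer.AsSpan(1), s);
            break;
        case ushort us:
            buffer[0] = 1;
            BinaryPrimitives.WriteUInt16BigEndian(buffer.AsSpan(1), us);
            break;
        default:
            throw new ArgumentException("Expected a boxed short or ushort.", nameof(value));
    }
    return buffer;
}

// The tag tells the receiver which type to rebuild.
static object DecodeTagged(byte[] buffer) => buffer[0] switch
{
    0 => (object)BinaryPrimitives.ReadInt16BigEndian(buffer.AsSpan(1)),
    1 => (object)BinaryPrimitives.ReadUInt16BigEndian(buffer.AsSpan(1)),
    _ => throw new ArgumentException("Unknown type tag.", nameof(buffer)),
};

object received = DecodeTagged(EncodeTagged((ushort)40000));
Console.WriteLine($"{received.GetType().Name} {received}"); // UInt16 40000
```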
If you're transmitting it over the network, you could run into endianness issues (i.e. multibyte values might be stored in different byte order on different architectures). The standard convention when sending a multibyte value over a network is to transform it to Network Byte Order.
The receiver of the multibyte value would then convert it to Host Byte Order.
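A minimal sketch of that convention for a short, using the IPAddress.HostToNetworkOrder and NetworkToHostOrder overloads that take a short. There is no ushort overload, so a ushort would need an unchecked cast to short first.

```csharp
using System;
using System.Net;

// Sender: convert to network (big-endian) order before taking the bytes.
short toSend = 0x1234;
byte[] wire = BitConverter.GetBytes(IPAddress.HostToNetworkOrder(toSend));
Console.WriteLine(BitConverter.ToString(wire)); // 12-34 on any host

// Receiver: read the bytes, then convert back to host order.
short received = IPAddress.NetworkToHostOrder(BitConverter.ToInt16(wire, 0));
Console.WriteLine(received == toSend); // True
```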
I'm porting some C# decompression code to AS3, and since it's doing some pretty complex stuff, it's using a range of datatypes such as byte and short. The problem is, AS3 doesn't have those datatypes.
For the most part I can use uint to hold these values. However, at some points, I get a line such as:
length[symbol++] = (short)len;
To my understanding, this means that len must be read and assigned to the length array as a short. So I'm wondering how I would do this in AS3. I'm guessing perhaps:
length[symbol++] = len & 0xFF;
But I'm unsure if this would give a proper result.
So basically, my question is this: how do I make sure to keep the correct number of bytes when doing this sort of stuff in AS3? Maybe I should use ByteArrays instead?
Depending on why the cast is in the C# code, you may or may not need to keep it in the AS3 code. If the cast is purely to adjust the type to the element type of the length array (i.e. there is no loss of precision), then you don't need it. If len can actually be bigger than 0x7FFF, you'll need to perform some equivalent of the cast.
I think ByteArray may be a reasonable option if you need to handle the result the way you would with a C# StreamReader, though random access may be harder than necessary.
Note that short is 2 bytes long (a synonym for System.Int16), so to convert to it using bit manipulation you need to do & 0xFFFF, plus sign extension if bit 15 can be set, because (short) turns 0x8000 to 0xFFFF into negative values. Also be very careful when casting between signed and unsigned types...
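A sketch of what (short)len does, written with plain 32-bit int operations so the logic could be carried over to a language without a 16-bit type: & 0xFFFF keeps the low 16 bits, and the XOR/subtract pair sign-extends bit 15. ToShortBits is a hypothetical name, and the AS3 side is an assumption: the same expression should work there provided AS3's int behaves as a 32-bit signed integer for these operators.

```csharp
using System;

// Emulates C#'s (short) cast: keep the low 16 bits, then sign-extend bit 15.
static int ToShortBits(int value) => ((value & 0xFFFF) ^ 0x8000) - 0x8000;

foreach (int len in new[] { 5, 0x7FFF, 0x8000, 0xFFFF, 0x12345 })
{
    short viaCast = unchecked((short)len);
    Console.WriteLine($"0x{len:X}: cast={viaCast}, mask only={len & 0xFFFF}, emulated={ToShortBits(len)}");
}
```

For 0x8000 and 0xFFFF, masking alone gives 32768 and 65535, whereas the cast (and the emulation) give -32768 and -1.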
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I have observed for a while that C# programmers tend to use int everywhere, and rarely resort to uint. But I have never discovered a satisfactory answer as to why.
If interoperability is your goal, uint shouldn't appear in public APIs because not all CLI languages support unsigned integers. I suspect this is the reason uint is used sparingly in the BCL. But that doesn't explain why int is so prevalent, even in internal classes.
In C++, if you have an integer for which negative values make no sense, you choose an unsigned integer.
This clearly signifies that negative numbers are not allowed or expected, and the compiler will do some checking for you. I also suspect that, in the case of array indices, the JIT can easily drop the lower-bound check.
However, when mixing int and uint types, extra care and casts will be needed.
Should uint be used more? Why?
int is shorter to type than uint.
Your observation of why uint isn't used in the BCL is the main reason, I suspect.
UInt32 is not CLS-compliant, which means that it is wholly inappropriate for use in public APIs. If you're going to be using uint in your private API, this will mean doing conversions to other types - and it's typically easier and safer to just keep the type the same.
I also suspect that this is not as common in C# development, even when C# is the only language being used, primarily because it is not common in the BCL. Developers, in general, try to (thankfully) mimic the style of the framework on which they are building - in C#'s case, this means trying to make your APIs, public and internal, look as much like the .NET Framework BCL as possible. This would mean using uint sparingly.
Normally int will suffice. If you can satisfy all of the following conditions, you can use uint:
It is not for a public API (since uint is not CLS compliant).
You don't need negative numbers.
You (might) need the additional range.
You are not using it in a comparison with < 0, as that is never true.
You are not using it in a comparison with >= 0, as that is never false.
The last requirement is often forgotten and will introduce bugs:
static void Main(string[] args)
{
    if (args.Length == 0) return;
    uint last = (uint)(args.Length - 1);
    // This will eventually throw an IndexOutOfRangeException:
    for (uint i = last; i >= 0; i--)
    {
        Console.WriteLine(args[i]);
    }
}
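For contrast, here is a sketch of a reverse loop that does work with a uint index: the condition is checked while the index is still positive, and the element is read at i - 1, so the counter never needs to go below zero. ReverseWithUIntIndex is a hypothetical helper; another common form is for (uint i = n; i-- > 0; ).

```csharp
using System;

static string[] ReverseWithUIntIndex(string[] items)
{
    var result = new string[items.Length];
    uint j = 0;
    // Compare with 0 before decrementing and index with i - 1,
    // so the unsigned loop variable stops at 0 instead of wrapping.
    for (uint i = (uint)items.Length; i > 0; i--)
    {
        result[j++] = items[i - 1];
    }
    return result;
}

Console.WriteLine(string.Join(" ", ReverseWithUIntIndex(new[] { "a", "b", "c" }))); // c b a
```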
1) Bad habit. Seriously. Even in C/C++.
Think of the common for pattern:
for( int i=0; i<3; i++ )
    foo(i);
There's absolutely no reason to use a signed integer there. You will never have negative values. But almost everyone will write a simple loop that way, even if it contains (at least) two other "style" errors.
2) int is perceived as the native type of the machine.
I prefer uint to int unless a negative number is actually in the range of acceptable values. In particular, accepting an int param but throwing an ArgumentException if the number is less than zero is just silly; use a uint!
I agree that uint is underused, and I encourage everyone else to use it more.
I program at a lower-level application layer where ints rarely get above 100, so negative values are not an issue (e.g. for i < myname.length() type stuff); it's just an old C habit, and shorter to type, as mentioned above. However, in some cases, when interfacing to hardware where I'm dealing with event flags from devices, uint is important when a flag may use the leftmost (highest) bit.
Honestly, for 99.9% of my work I could easily use ushort, but int, you know, sounds a lot better than ushort.
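A sketch of the hardware-flag case, with hypothetical flag names: a constant in bit 31 fits naturally in a uint, while the same literal cannot be assigned to an int constant (0x8000_0000 is a uint literal, and const int Fault = 0x8000_0000; does not compile without an unchecked cast).

```csharp
using System;

// Hypothetical device status flags; Fault lives in the highest bit.
const uint Ready = 0x0000_0001;
const uint Fault = 0x8000_0000;

static bool HasFlag(uint status, uint flag) => (status & flag) != 0;

uint status = Fault | Ready;
Console.WriteLine($"{status:X8}");          // 80000001
Console.WriteLine(HasFlag(status, Fault));  // True
Console.WriteLine(unchecked((int)status));  // -2147483647: the same bits read as a signed int
```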
I have made a Direct3D 10 wrapper in C# and need to use uint if I want to create very large vertex buffers. Large buffers on the video card cannot be represented with a signed int.
uint is very useful, and it is silly to say otherwise. If anyone thinks that, just because they have never needed uint, no one else will, they are wrong.
I think it is just laziness. C# is inherently a choice for development on desktops and other machines with relatively plentiful resources.
C and C++, however, have deep roots in old systems and embedded systems where memory is scarce, so programmers are used to thinking carefully about which datatype to use.
C# programmers are lazy, and since there are generally enough resources, nobody really optimizes memory usage (in general, not always, of course). Even if a byte would be sufficient, a lot of C# programmers, including me, just use int for simplicity. Moreover, a lot of API functions accept ints, so using int avoids casting.
I agree that choosing the correct datatype is good practice, but I think the main motivation is laziness.
Finally, choosing an integer is more mathematically correct. Unsigned ints don't exist in math (only natural numbers). And since most programmers have a mathematical background, using an integer is more natural.
I think a big part of the reason is that when C first came out most of the examples used int for brevity's sake. We rejoiced at not having to write integer like we did with Fortran and Pascal, and in those days we routinely used them for mundane things like array indices and loop counters. Unsigned integers were special cases for large numbers that needed that last extra bit. I think it's a natural progression that C habits continued into C# and other new languages like Python.
Some languages (e.g. many versions of Pascal) regard unsigned types as representing numeric quantities; an operation between an unsigned type and a signed type of the same size will generally be performed as though the operands were promoted to the next larger type (in some such languages, the largest type has no unsigned equivalent, so such promotion will always be possible).
Other languages (e.g. C) regard N-bit unsigned types as a group which wraps around modulo 2^N. Note that subtracting a value k from a member of such a group doesn't represent numerical subtraction, but rather yields the group member which, when k is added to it, would yield the original. Arguably, certain operations involving mixtures of signed and unsigned values don't really make sense and should perhaps have been forbidden, but even code which is sloppy with its specifications of things like numeric literals will usually work, and so much code has been written that mixes signed and unsigned types and, despite being sloppy, does work, that the spec isn't apt to change any time soon.
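C# itself uses a bit of both approaches, as the following sketch shows: an int and a uint operand are promoted to long (the Pascal-like behaviour), while arithmetic on uint alone wraps modulo 2^32 in an unchecked context (the C-like behaviour). Mixing long with ulong in one operation is a compile-time error.

```csharp
using System;

// Mixed int/uint operands are both promoted to long.
int minusOne = -1;
uint one = 1;
var mixed = minusOne + one;
Console.WriteLine($"{mixed} ({mixed.GetType().Name})"); // 0 (Int64)

// Purely unsigned arithmetic wraps modulo 2^32.
uint zero = 0;
uint wrapped = unchecked(zero - one);
Console.WriteLine(wrapped);                  // 4294967295
Console.WriteLine(unchecked(wrapped + one)); // 0: adding one back yields the original
```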
It's a lot easier to work exclusively with signed types than to work out all the intricacies of interactions between signed and unsigned types. Unsigned types are useful when decomposing large numbers into smaller pieces (e.g. for serialization) or for reconstituting such numbers, but in general it's better to simply use signed numbers for things that actually represent quantities.
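A sketch of the serialization case: splitting a 64-bit value into big-endian bytes and reassembling it with shifts. Using ulong means >> is a logical shift, so no sign bits are shifted in, and each (byte) cast keeps just the low 8 bits. SplitBigEndian and JoinBigEndian are hypothetical names.

```csharp
using System;

static byte[] SplitBigEndian(ulong value)
{
    var bytes = new byte[8];
    for (int i = 0; i < 8; i++)
        bytes[i] = (byte)(value >> (56 - 8 * i)); // most significant byte first
    return bytes;
}

static ulong JoinBigEndian(byte[] bytes)
{
    ulong value = 0;
    foreach (byte b in bytes)
        value = (value << 8) | b; // shift in one byte at a time
    return value;
}

ulong n = 0xDEADBEEF_01020304;
Console.WriteLine(BitConverter.ToString(SplitBigEndian(n))); // DE-AD-BE-EF-01-02-03-04
Console.WriteLine(JoinBigEndian(SplitBigEndian(n)) == n);    // True
```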
I know this is probably an old thread but I wanted to give some clarification.
Let's take an int8: you can store -128 to 127 and it uses 1 byte, which gives a total of 127 positive numbers.
When you use an int8, one of the bits is used for the negative numbers (down to -128).
When you use a uint8, the range used by the negative numbers goes to the positive ones instead, so you can store 255 positive numbers (0 to 255) with the same amount of storage, 1 byte.
The only drawback is that you have now lost the capability to use negative values.
Another problem is that not all programming languages and databases support this.
In my opinion, the only reason to use this is when you need to be efficient, as in game programming, and you have to store large non-negative numbers.
This is why not many programs use it.
The main reason is that storage is not a problem, and you can't use it flexibly with other software, plugins, databases, or APIs. Also, for example, a bank would need negative numbers to store money, etc.
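The two ranges described above, shown with C#'s sbyte and byte: the same 8-bit pattern is read differently depending on the type.

```csharp
using System;

Console.WriteLine($"sbyte: {sbyte.MinValue} to {sbyte.MaxValue}"); // sbyte: -128 to 127
Console.WriteLine($"byte: {byte.MinValue} to {byte.MaxValue}");    // byte: 0 to 255

// Bit pattern 11111111: -1 as sbyte, 255 as byte.
sbyte signedByte = -1;
byte sameBits = unchecked((byte)signedByte);
Console.WriteLine(sameBits); // 255
```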
I hope this will help someone.