Print and parse hexfloat in .NET (C#/VB.NET) - c#

In c++, one can use std::cout << std::hexfloat << someFloatValue to print floating point values in base 16, instead of the usual base 10. For example, 0.09375 (i.e. 3/32) would be printed as "0x1.8p-4"
It should not print "0x3DC00000"! This is simply be type-punning the float to an int and printing that in hexadecimal. That's not what I'm looking for!
I am looking specifically for the C++ "hexfloat" format.
I cannot find anything similar in .NET. I need to print some floating point vaules from a VB.NET program (but that's not required, C# works too) that are later read by a C++ program. Ideally, I would also like to be able to parse those back into my VB.NET program.
How do I correctly print hexfloats from .NET? I would rather avoid rolling my own algorithm, if there is already something in the framework. If not, I would like to avoid platform-dependent code, e.g. plonking the float into a union ([StructLayout(LayoutKind.Explicit)] struct) and inspecting the underlying bits through that.

Related

Is struct field layout consistent with endianness in C#?

When I first learned endianness, I was very confused at how it worked. I finally explained it to myself by the following metaphor:
On a big-endian machine, an int[4] would be arranged like this:
| int[4] |
|int1|int2|int3|int4|
While on little-endian machines, it would be laid out like
| int[4] |
|1tni|2tni|3tni|4tni|
That way the layout of the array would be consistent in memory, while the values themselves would be arranged differently.
Now to the real question: I am writing more optimized versions of BinaryReader and BinaryWriter in my .NET library. One of the problems I have run into is the implementation of Write(decimal). A decimal contains 4 int fields: flags, hi, lo, and mid, in that order. So basically on your typical little-endian machine it would look like this in memory:
| lamiced |
|sgalf|ih|ol|dim|
My question is, how would the CLR arrange the struct on big-endian machines? Would it arrange it so that the basic layout of the decimal would be conserved, like so
| decimal |
|flags|hi|lo|mid|
or would it completely reverse the binary arrangement of the decimal, like
| decimal |
|mid|lo|hi|flags|
?
Don't have a big-endian machine nearby, otherwise I'd test it out myself.
edit: TL;DR does the following code print -1 or 0 on big-endian machines?
struct Pair
{
public int a;
public int b;
}
unsafe static void Main()
{
var p = default(Pair);
p.a = -1;
Console.WriteLine(*(int*)&p);
}
It's not entirely clear what your actual question is.
Regarding the relationship between the layout of fields in a data structure and endianness, there is none. Endianness does not affect how fields in a data structure are laid out, only the order of bytes within a field.
I.e. in answer to this:
does the following code print -1 or 0 on big-endian machines?
… the output will be -1.
But you seem to be also or instead asking about the effect of endianness on the in-memory representation of the Decimal type. Which is a somewhat different question.
Regarding the endianness of the Decimal in-memory representation, I'm not aware of any requirement that .NET provide consistent implementations of the Decimal type. As commenter Hans Passant points out, there are multiple ways to view the current implementation; either as the CLR code you referenced, or as the more detailed declaration seen in e.g. wtypes.h or OleDb.h (another place a DECIMAL type appears, which has the same format as elsewhere). But in reality, as far as .NET is concerned, you are not promised anything about the in-memory layout of the type.
I would expect, for simplicity in implementation, the fields representing the 3 32-bit mantissa components may be affected by endianness, individually. (The sign and scale are represented as individual bytes, so endianness would not affect those). That is, while the order of the individual 32 bit fields would remain the same — high, low, mid — the bytes within each field will be represented according to the current platform's endianness.
But if Microsoft for some bizarre reason decided they wanted the .NET implementation to deviate from the native implementation (seems unlikely, but let's assume it for the sake of argument) and always use little-endian for the fields even on big-endian platforms, that would be within their rights.
For that matter, they could even rearrange the fields if they wanted to: their current order appears to me to be a concession to the de facto x86 standard of little-endianness, such that on little-endian architectures the combination of low and mid 32-bit values can be treated as a single 64-bit value without swapping words, so if they decided to deviate from the wtypes.h declaration, they might well decide to just make the mantissa a single 96-bit, little-endian or big-endian value.
Again, I'm not saying these actions are in any way likely. Just that they are theoretically possible and are just easy, obvious examples (a subset of all possible examples) of why writing managed code that assumes such private implementation details is probably not a good idea.
Even if you had access to a big-endian machine that could run .NET libraries (*) and so could test the actual behavior, today's current behavior doesn't offer you any guarantees of future behavior.
(*) (I don't even know of any…pure big-endian CPUs are fairly uncommon these days, and I can't think of a single one off the top of my head that is supported by Microsoft as an actual .NET platform.)
So…
I am skeptical that it is practical to author implementations of BinaryReader and BinaryWriter that are observably more optimized than those found in .NET already. The main reason for using these types is to handle I/O, and that necessarily means interacting with external systems that are orders of magnitude slower than the CPU that is handling the actual conversions to and from byte representations (and even the GC operations to support those conversions). Even if the existing Microsoft code were in some way hypothetically inefficient, in practice I doubt it would matter much.
But if you must implement these yourself, it seems to me that the only safe way to deal with the Decimal type is to use the Decimal.GetBits() method and Decimal.Decimal(int[]) constructor. These use clearly-documented, endian-independent mechanisms to convert the Decimal type. They are based on int, the in-memory representation of which will of course vary according to endianness, but your code will never need to worry about that, because it will only have to deal with entire int values, not their byte-wise representations.

VB number representation is strange, how do i convert it to a natural number

So this may be obvious but i have recently inherited some legacy code and scattered around the code are array indexes like this
someArray(&H7D0)
I get that this "&H7D0" is the index but how do i go about changing it to a real number as i am converting the code to C#.
the code is a mess and it's not obvious what it might be.
This is a Hexidecimal number. The equivalent C# would be someArray(0x7d0)
Both are equivalent to the decimal number 2000 so you could actually write someArray(2000) to allow the code to be used in both languages.

Parsing a MIDI file for Note On only

I've been trying to figure out the mystical realm of MIDI parsing, and I'm having no luck. All I'm trying to do is get the note value (60 = C4, 72 = C5, etc), in order of when they occur.
My code is as follows. All it does is very simply open a file as a byte array and read everything out as hex:
byte[] MIDI = File.ReadAllBytes("TestMIDI.mid");
foreach (var element in MIDI) {
string b = Convert.ToString(element,16);
Debug.WriteLine(b);
}
All TestMIDI.mid contains is one note on C5. Here's a hex dump of it. Using this info, I'm trying to find the simple hex value for Note On (0x9, or just 9 in the dump), but there aren't any. I can find a few 72's, but there are 3, which doesn't make any sense to me (note on, note off, then what?).
This is my first attempt at parsing MIDI as a file and using hex dumps (are they even called that?), so I'm sorry if I'm heading in the complete wrong direction. All I need is to get the note that plays, and in what order. I don't need timing or anything fancy at all. The reason behind this, if it matters - is to then generate new code in a different language to be played out of a speaker, very similar to the beep command on *nix. Because of this, I don't want to use any frameworks that 1) I didn't program, and really didn't learn anything and 2) do far more than what I need, making the framework heavier than the actual code by me.
Accepted answer is not a solution for the problem. It will not work in common case. I'll provide several cases where this code either will not work or will fail. Order of these cases corresponds their probability - most probable cases go first.
False positives. MIDI files contain a lot of data structures where you can find a byte with the value 144. And these structures are not Note On events. For real MIDI files you'll get bunch of "notes" that are not notes but random values within the file.
Channels other than 0. Most of the modern MIDI files contain several track chunks. Each one holds events for the specific MIDI channel (from 0 to 15). 144 (or 90 in hex) represents a Note On event for the channel 0. So you are going to miss a lot of Note On events for other channels.
Running status. MIDI files actively use concept of running status. This technique allows don't store status bytes of consecutive events of the same type. It means that status byte 144 can be written only once for the first Note On event and you will not find it further in the file.
144 is the last byte in a file. MIDI file can end with this value. For example if a custom chunk is the last chunk in the file or track chunk doesn't end with End of Track event (which is corruption according to MIDI file specification but possible scenario in real world). In this case you' ll get IndexOutOfRangeException on MIDI[i+1].
Thus, you should never search for specific value to find some semantic data structure in a MIDI file. You must use one of the .NET libraries available on the Internet. For example, with the DryWetMIDI you can use this code:
IEnumerable<Note> notes = MidiFile.Read(filePath)
.GetNotes();
To do this right, you'll need at least some semblance of a MIDI parser. Searching through 0x9 events is a good start, but 0x9 is also a Note-Off event if the velocity field is 0. 0x9 can also be present inside other events (meta events, MPQN events, delta times, etc), so you'll get false positives. So, you need something that actually knows the MIDI file format to do this accurately.
Look for a library, write your own, or port an open-source one. Mine is in Java if you want to look.

How well do Script# numbers map to Javascript?

I've been playing with Script#, and I was wondering how the C# numbers were converted to Javascript. I wrote this little bit of code
int a = 3 / 2;
and looked at the relevant bit of compiled Javascript:
var $0=3/2;
In C#, the result of 3 / 2 assigned to an int is 1, but in Javascript, which only has one number type, is 1.5.
Because of this disparity between the C# and Javascript behaviour, and since the compiled code doesn't seem to compensate for it, should I assume that my numeric calculations written in C# might behave incorrectly when compiled to Javascript?
Should I assume that my numeric calculations written in C# might behave incorrectly when compiled to Javascript?
Yes.
Like you said, "the compiled code doesn't seem to compensate for it" - though for the case you mention where a was declared as an int it would be easy enough to compensate by using var $0 = Math.floor(3/2);. But if you don't control how the "compiler" works you're in a pickle. (You could correct the JavaScript manually, but you'd have to do that every time you regenerated it. Yuck.)
Note also that you are likely to have problems with decimal numbers too due to the way JavaScript represents decimal places. Most people are surprised the first time they find out that JavaScript will tell you that 0.4 * 3 works out to be 1.2000000000000002. For more details see one of the many other questions on this issue, e.g., How to deal with floating point number precision in JavaScript?. (Actually I think C# handles decimals the same way, so maybe this issue won't be such a surprise. Still, it can be a trap for new players...)

strtoul Equivalent in C#

I'm trying to read-in a bunch of unsigned integers from a configuration file into a class. These numbers may be specified in either base-10 (eg: 1234) or in base-16 (eg: 0xAB31). Therefore looking for the strtoul equivalent in C# 2.0.
More specifically, I'm interested in a C# function which mimics the behaviour of the this function when the argument indicating the base or radix is passed in as zero. (Under C++, strtoul will attempt to 'guess' the base or radix based on the first couple of characters in the string and then proceed to convert the number suitably)
Currently I'm manually checking the first two characters (using string.Substring() method) of the string and then calling Convert.ToUInt32(hex, 10) or Convert.ToUInt32(hex, 16) as needed.
I'm sure that there has to be a better way to deal with this problem and hence this post. More elegant ideas/solutions or work-arounds would be great help.
Well, you don't need to use Substring unless it's in hex, but it sounds like you're basically doing it the right way:
return text.StartsWith("0x") ? Convert.ToUInt32(text.Substring(2), 16)
: Convert.ToUInt32(text, 10);
Obviously this will create an extra object for the Substring call, and you could write your own hex parsing code to cope with this - but unless you've actually run into performance problems with this approach, I'd keep it simple.

Categories