File.WriteAllBytes does not change the file to binary 10101011 - C#

I have some confusion about the function File.WriteAllBytes.
I read from an image file using
byte[] b = System.IO.File.ReadAllBytes(textBox1.Text);
and then wrote the data back to a text file to see how it looks:
System.IO.File.WriteAllBytes(@"D:\abc.txt", b);
But the contents of abc.txt are not pure binary (1010110); instead they appear as:
ëžÕwN±k›“ùIRA=Ï¥Dh﬒ȪÊj:³0Æî(À÷«3ÚÉid¤n•O<‰-ª#–¢)cùY³Ö˜K„TûËEÇóþ}wtÑ+²=£v*NÌ!\ äji;âíÇ8ÿ ?犴ö¬€Áç#µ:+ŠVÜ„©³Û?çù~VèÖ·ÂËSŠE7RH8}GJGfT?Ý?çüÿ œÌÊR"6­ÓŠY¬Š¬L§|n¹> ÷’ÃU{D®t­vE!3** Ý× õ¨ã(¨qžO§ùÿ >Ó¥¤…K€#N{ñM(ÊÅ€ûÃŒRtj/²Æ¤¶¹RÁŽxqþÏó#KŒîn皘æ0C/-Ž1Mu>oÊ }é5(­Q¢i±pIôÀôÿ ?çÒÂB-á.ãï©Ú}êB®æÇÌyÿ ?çüU¥mã$”ã
‚DiFQ¸'µ,ARGLäc¯4%ËŸÃœsŸóù~H 3d‚zŠ‡Ø........................................
Are the binary 1s and 0s being converted to some other number system made up of all these symbols?

A text viewer like Notepad will try to interpret the bytes as text; it will probably treat them as Unicode.
If you want to see the bytes themselves, read in the image as bytes and convert the byte array to a string. For a hexadecimal representation (two digits per byte) you could use:
public static string ByteArrayToString(byte[] ba)
{
    // BitConverter.ToString yields "AB-CD-EF-..."; strip the dashes
    string hex = BitConverter.ToString(ba);
    return hex.Replace("-", "");
}
The conversion function is copied from here (accepted answer). Remember, this string will no longer be interpretable as an image; it is simply a large string of hexadecimal digits, two per byte.
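If you literally want the 0s and 1s instead, a minimal sketch (assuming the byte[] b from the question; the output path is made up, and it needs using System.Linq):
// convert each byte to its 8-bit binary form, e.g. 86 -> "01010110"
string bits = string.Concat(b.Select(x => Convert.ToString(x, 2).PadLeft(8, '0')));
System.IO.File.WriteAllText(@"D:\abc_bits.txt", bits);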

Each byte in the file consists of 8 bits. When you use ReadAllBytes, you get an array of byte instances, where each byte represents a number between 0 and 255 (inclusive). One human-readable representation of the number 86 is 01010110. However, when you use WriteAllBytes, it writes the sequence of bytes in their original form. Notepad then loads the file and displays each byte as a single character (or, in some encodings, treats multiple bytes as a single character to display). However, if you were to write "01010110" to a file such that Notepad shows those digits, you would actually end up writing 8 bytes, not 8 bits, like this, where each set of 8 bits represents the character '0' or '1':
00110000 00110001 00110000 00110001 00110000 00110001 00110001 00110000
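To see that concretely, encode the text "01010110" and inspect the result:
// each character of the text becomes one byte ('0' = 0x30, '1' = 0x31)
byte[] textBytes = System.Text.Encoding.ASCII.GetBytes("01010110");
Console.WriteLine(textBytes.Length);                  // 8 (bytes, not bits)
Console.WriteLine(BitConverter.ToString(textBytes));  // 30-31-30-31-30-31-31-30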

Related

I'm looking for a way to turn output into an Octet

I'm writing a program that handles binary numbers in C#. After I convert the decimal numbers to binary, each one only takes up as much space as needed, but I need all 4 outputs to be octets (8 characters).
Let's say I convert 255.255.255.0 to binary; I get the following outputs:
11111111
11111111
11111111
0
What I want is to get 7 more zeroes added to the 0, so that it fills out all 8 places.
It is always 8, since I'm working with subnet masks.
I hope any of you can help, thank you. :)
You can use the code below: split the string on '.', convert each part to binary, and left-pad each result to 8 characters.
string binary = String.Join(Environment.NewLine, (input.Split('.').Select(x => Convert.ToString(Int32.Parse(x), 2).PadLeft(8, '0'))).ToArray());
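Put together as a runnable sketch (the input value is just the example from the question):
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        string input = "255.255.255.0";
        // split on '.', convert each part to binary, and left-pad to 8 characters
        string binary = String.Join(
            Environment.NewLine,
            input.Split('.').Select(x => Convert.ToString(Int32.Parse(x), 2).PadLeft(8, '0')));
        Console.WriteLine(binary);
        // 11111111
        // 11111111
        // 11111111
        // 00000000
    }
}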

Maximum UTF-8 string size given UTF-16 size

What is the formula for determining the maximum number of UTF-8 bytes required to encode a given number of UTF-16 code units (i.e. the value of String.Length in C# / .NET)?
I see 3 possibilities:
# of UTF-16 code units x 2
# of UTF-16 code units x 3
# of UTF-16 code units x 4
A UTF-16 code point is represented by either 1 or 2 code units, so we just need to consider the worst case scenario of a string filled with one or the other. If a UTF-16 string is composed entirely of 2 code unit code points, then we know the UTF-8 representation will be at most the same size, since the code points take up a maximum of 4 bytes in both representations, thus worst case is option (1) above.
So the interesting case to consider, which I don't know the answer to, is the maximum number of bytes that a single code unit UTF-16 code point can require in UTF-8 representation.
If all single code unit UTF-16 code points can be represented with 3 UTF-8 bytes, which my gut tells me makes the most sense, then option (2) will be the worst case scenario. If there are any that require 4 bytes then option (3) will be the answer.
Does anyone have insight into which is correct? I'm really hoping for (1) or (2) as (3) is going to make things a lot harder :/
UPDATE
From what I can gather, UTF-16 encodes all characters in the BMP in a single code unit, and all other planes are encoded in 2 code units.
It seems that UTF-8 can encode the entire BMP within 3 bytes and uses 4 bytes for encoding the other planes.
Thus it seems to me that option (2) above is the correct answer, and this should work:
string str = "Some string";
int maxUtf8EncodedSize = str.Length * 3;
Does that seem like it checks out?
The worst case for a single UTF-16 code unit is U+FFFF, which in UTF-16 is encoded just as-is (0xFFFF). In UTF-8 it is encoded as ef bf bf (three bytes).
The worst case for two UTF-16 code units (a "surrogate pair") is U+10FFFF, which in UTF-16 is encoded as 0xDBFF 0xDFFF. In UTF-8 it is encoded as f4 8f bf bf (four bytes).
Therefore the worst case is a string full of U+FFFF, which converts a UTF-16 string of 2N bytes (N code units) into a UTF-8 string of 3N bytes.
So yes, you are correct. I don't think you need to consider stuff like glyphs because that sort of thing is done after decoding from UTF8/16 to code points.
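Both worst cases can be checked quickly in C# with Encoding.UTF8.GetByteCount; a small sketch:
// U+FFFF: one UTF-16 code unit -> three UTF-8 bytes
string bmpWorst = "\uFFFF";
Console.WriteLine(System.Text.Encoding.UTF8.GetByteCount(bmpWorst));       // 3
// U+10FFFF: a surrogate pair (two UTF-16 code units) -> four UTF-8 bytes
string supplementary = char.ConvertFromUtf32(0x10FFFF);
Console.WriteLine(supplementary.Length);                                   // 2
Console.WriteLine(System.Text.Encoding.UTF8.GetByteCount(supplementary));  // 4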
Properly formed UTF-8 can be up to 4 bytes per Unicode codepoint.
UTF-16 uses up to two 16-bit code units per Unicode codepoint.
Characters outside the Basic Multilingual Plane (including emoji and scripts added in more recent versions of Unicode) need up to 21 bits, which in UTF-8 results in 4-byte sequences; they also take up 4 bytes (two code units) in UTF-16.
However, there are some environments that do things oddly. Since UTF-16 characters outside the Basic Multilingual Plane are encoded as two 16-bit code units (detectable because they are surrogates in the range U+D800 to U+DFFF), some mistaken UTF-8 implementations, usually referred to as CESU-8, encode each surrogate separately as a 3-byte sequence, for a total of six bytes per codepoint. (I believe some early Oracle DB implementations did this, and I'm sure they weren't the only ones.)
There's one more minor wrench in things, which is that some glyphs are classified as combining characters, and multiple UTF-16 (or UTF-32) sequences are used when determining what gets displayed on the screen, but I don't think that applies in your case.
Based on your edit, it looks like you're trying to estimate the maximum output size of a .NET encoding conversion. String.Length counts Chars, which are UTF-16 code units. As a worst-case estimate, therefore, I believe you can safely use count(Char) * 3, because the non-BMP characters take two Chars each and yield only 4 bytes as UTF-8.
If you want to base the count on text elements (grapheme clusters) rather than Chars, you should be able to do something like
var maximumUtf8Bytes = new System.Globalization.StringInfo(myString).LengthInTextElements * 4;
(My C# is a bit rusty as I haven't used a .Net environment much in the last few years, but I think that does the trick).
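As a sanity check on the Length * 3 estimate in .NET, a small sketch (the test string is arbitrary; Encoding.UTF8.GetMaxByteCount is the framework's own, slightly more pessimistic, upper bound):
string str = "A\uFFFF" + char.ConvertFromUtf32(0x1F600);              // 'A', U+FFFF, and a surrogate pair
int estimate = str.Length * 3;                                        // 4 Chars -> 12
int actual = System.Text.Encoding.UTF8.GetByteCount(str);             // 1 + 3 + 4 = 8
int builtIn = System.Text.Encoding.UTF8.GetMaxByteCount(str.Length);  // >= actual, by contract
Console.WriteLine($"{estimate} {actual} {builtIn}");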

byte ToString without converting from hexadecimal

I am using C# to read information coming from a scale, and I am getting back 6 bytes of data. The last two contain the weight, in hexadecimal. The way it is set up is that if you append byte 5 onto byte 4 and convert to decimal, you get the correct weight.
I am trying to do this by calling ToString on the bytes and appending the results, but ToString automatically converts them from hexadecimal to decimal. This happens before I can append them, so I am getting incorrect weights.
Is there any way to convert a byte to a string without it being formatted from hexadecimal to decimal for you?
Use the X format string when calling ToString on your bytes to keep them in hexadecimal. You can append a number to X to specify the number of "digits" you want.
byte b = 0x0A;
b.ToString("X"); // A
b.ToString("X2"); // 0A

Succinct way to write a mixture of chars and bytes?

I'm trying to write an index file that follows the format of a preexisting (and immutable) text file.
The file is fixed length, with 11 bytes of string (in ASCII) followed by 4 bytes of long for a total of 15 bytes per line.
Perhaps I'm being a bit dim, but is there a simple way to do this? I get the feeling I need to open up two streams to write one line - one for the string and one for the bytes - but that feels wrong.
Any hints?
You can use BitConverter to convert between an int/long and an array of bytes. This way you would be able to write eleven bytes followed by four bytes, followed by eleven more bytes, and so on.
byte[] intBytes = BitConverter.GetBytes(intValue); // returns 4-byte array
Converting to bytes: BitConverter.GetBytes(int).
Converting back to int: BitConverter.ToInt32(byte[], int)
If you are developing a cross-platform solution, keep in mind the following note from the documentation (thanks to uriDium for the comment):
The order of bytes in the array returned by the GetBytes method depends on whether the computer architecture is little-endian or big-endian.
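A single BinaryWriter over one stream can also write both parts of each record; here is a sketch along those lines (the file name and field values are made up; only the 11-byte string + 4-byte number layout comes from the question):
using (var stream = System.IO.File.Open("index.dat", System.IO.FileMode.Create))
using (var writer = new System.IO.BinaryWriter(stream))
{
    string key = "SOME-KEY";   // the 11-character ASCII field
    int value = 12345;         // the 4-byte numeric field

    // pad/trim the string to exactly 11 ASCII bytes and write the raw bytes (no length prefix)
    byte[] keyBytes = System.Text.Encoding.ASCII.GetBytes(key.PadRight(11).Substring(0, 11));
    writer.Write(keyBytes);    // 11 bytes
    writer.Write(value);       // 4 bytes (BinaryWriter writes little-endian)
}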

ASCII values in hexadecimal notation

I am trying to parse some output data from a PBX and I have found something that I can't really figure out.
In the documentation it says the following
Information for type of call and feature. Eight character for 'status information 3' with following ASCII values in hexadecimal notation.
1. Character
Bit7 Incoming call
Bit6 Outgoing call
Bit5 Internal call
Bit4 CN call
2. Character
Bit3 Transferred call (transferring party inside)
Bit2 CN-transferred call (transferring party outside)
Bit1
Bit0
Any ideas how to interpret this? I have no raw data at the time to match against but I still need to figure it out.
Probably you'll receive two characters (hex digits: 0-9, A-F). The first digit represents the hex value of the most significant 4 bits, the next digit that of the least significant 4 bits.
Example:
You will probably receive something like the string "7C" as hex representation of the bitmap: 01111100.
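Decoding such a pair back into its bit pattern is a two-liner:
byte status = Convert.ToByte("7C", 16);                          // 0x7C
Console.WriteLine(Convert.ToString(status, 2).PadLeft(8, '0'));  // 01111100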
Eight character for 'status information 3' with following ASCII values in hexadecimal notation.
I think this means the following.
You will get 8 bytes - one byte per line, I guess.
It is just the wrong term. They mean two hex digits per byte but call them characters.
So it is just a byte with bit flags - or, more precisely, an array of eight such bytes.
Bit
7 incoming
6 outgoing
5 internal
4 CN
3 transferred
2 CN transferred
1 unused?
0 unused?
You could map this to an enum with the [Flags] attribute.
[Flags]
public enum CallInformation : byte
{
    Incoming = 128,
    Outgoing = 64,
    Internal = 32,
    CN = 16,
    Transferred = 8,
    CNTransferred = 4,
    Undefined = 0
}
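A small usage sketch under that interpretation (reusing the "7C" value from the other answer):
var info = (CallInformation)Convert.ToByte("7C", 16);            // 01111100
bool outgoing = info.HasFlag(CallInformation.Outgoing);          // true
bool transferred = info.HasFlag(CallInformation.Transferred);    // true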
Very hard without data. I'd guess that you will get two bytes (two ASCII characters), and need to pick them apart at the bit level.
For instance, if the first character is 'A', you will need to look up its character code (65, or hex 0x41) and then look at the bits. Of course the bits are the same whether you write the value in decimal or hex, but it's easier to work out by hand in hex. 0x41 is 01000001, i.e. bit 6 and bit 0 set, so that would be an "outgoing call"; bit 0 seems undocumented.
I'm not sure why the documentation makes it look as if that would require two characters; only eight bits are documented.
