I'm trying to read a file containing hex values and convert them to binary, but the file reader never advances to the next line and gets stuck in a loop.
Here is what the hex file looks like:
3c011001
34300000
8e080000
Below is the code I use to generate the output:
using System;
using System.IO;

class MaddinClass
{
    static void Main(string[] args)
    {
        StreamReader sr = new StreamReader("MachineCode.txt");
        string binary_from_file = sr.ReadLine();
        while (!sr.EndOfStream)
        {
            uint binary = Convert.ToUInt32(binary_from_file, 16);
            Console.WriteLine(binary);
        }
    }
}
I am getting a loop result like this:
1006702593
1006702593
1006702593
I expect it to move to the next line and store a new binary result, instead it just prints the same value repetitively.
With your current setup you're not actually reading through the file. You've created your StreamReader object and read the first line from the file. You then end up in an endless loop due to:
while (!sr.EndOfStream)
Since your loop body doesn't read anything from the stream, you're continuously processing the same line you stored before entering the loop, which is why you consistently see 1006702593. If you convert that value from decimal back to hexadecimal, you'll see that it matches your first hexadecimal input, 3c011001.
uint binary = Convert.ToUInt32(binary_from_file, 16);
Per the Microsoft documentation, you should assign each line in the condition clause of your while loop. This allows you to process each line individually until the end of the file, where ReadLine returns null since there is nothing left to read.
This example reads the contents of a text file, one line at a time,
into a string using the ReadLine method of the StreamReader class.
Each text line is stored into the string line and displayed on the
screen.
I would also like to point out that the line above isn't converting to binary, but to an unsigned integer (hence your value of 1006702593 instead of 111100000000010001000000000001); you'll need to convert the result to a string in base-2 representation, and unless you have a valid reason to use unsigned integers, I would use signed integers instead:
string binary = Convert.ToString(Convert.ToInt32(binary_from_file, 16), 2);
Below is a refactored copy paste from the link above to meet your needs:
string line;
using (StreamReader file = new StreamReader(@"c:\test.txt"))
{
    while ((line = file.ReadLine()) != null)
    {
        Console.WriteLine(Convert.ToString(Convert.ToInt32(line, 16), 2));
    }
}
Console.ReadLine();
In a nutshell, the code above follows the execution path below:
Create the variable for storing each line.
Create a new StreamReader object pointed at your file.
Read each line from the file.
Print that line in a binary representation.
Close the stream.
Dispose of the stream.
Suspend the Console to prevent it from closing automatically.
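Putting the pieces together, here is a minimal sketch of a corrected program. The file name MachineCode.txt comes from the question; the PadLeft call and the unchecked cast are additions of mine, so that words like 8e080000 (which overflow Int32) still print as full 32-bit patterns:

```csharp
using System;
using System.IO;

class MachineCodeConverter
{
    static void Main()
    {
        string line;
        using (var file = new StreamReader("MachineCode.txt"))
        {
            // Assign inside the loop condition so each iteration advances
            // the reader; ReadLine returns null at end of file.
            while ((line = file.ReadLine()) != null)
            {
                // Values like 8e080000 exceed Int32.MaxValue, so parse as
                // uint and reinterpret the bits for base-2 formatting.
                uint value = Convert.ToUInt32(line, 16);
                string binary = Convert.ToString(unchecked((int)value), 2)
                                       .PadLeft(32, '0');
                Console.WriteLine(binary);
            }
        }
    }
}
```

For the sample input 3c011001 this prints 00111100000000010001000000000001, one padded 32-bit word per input line.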
Related
I have to convert a project from old VB6 to C#, and to save time the aim is to preserve as much of the old code as possible.
A function of the old project loads a binary file into a string variable, and this variable is then analyzed character by character using the Asc function:
OLD VB Code:
Public Function LoadText(ByVal DirIn As String) As String
    Dim FileBuffer As String
    Dim LenghtFile As Long
    Dim ContIN As Long
    ContIN = FreeFile
    Open DirIn For Binary Access Read As #ContIN
    LenghtFile = LOF(ContIN)
    FileBuffer = Space(LenghtFile)
    Get #ContIN, , FileBuffer
    Close #ContIN
    LoadText = FileBuffer
    'following lines for test purposes
    Debug.Print (Asc(Mid(FileBuffer, 1, 1)))
    Debug.Print (Asc(Mid(FileBuffer, 2, 1)))
    Debug.Print (Asc(Mid(FileBuffer, 3, 1)))
End Function

Sub Main()
    Dim testString As String
    testString = LoadText("e:\testme.bin")
End Sub
Result in immediate window:
1
10
133
C# code:
public static string LoadText(string dirIn)
{
    string myString, myString2;
    FileStream fs = new FileStream(dirIn, FileMode.Open);
    BinaryReader br = new BinaryReader(fs);
    byte[] bin = br.ReadBytes(Convert.ToInt32(fs.Length));
    //myString = Convert.ToBase64String(bin);
    myString = Encoding.Default.GetString(bin);
    string m1 = Encoding.Default.GetString(bin);
    //string m1 = Encoding.ASCII.GetString(bin);
    //string m1 = Encoding.BigEndianUnicode.GetString(bin);
    //string m1 = Encoding.UTF32.GetString(bin);
    //string m1 = Encoding.UTF7.GetString(bin);
    //string m1 = Encoding.UTF8.GetString(bin);
    //string m1 = Encoding.Unicode.GetString(bin);
    Console.WriteLine(General.Asc(m1.Substring(0, 1)));
    Console.WriteLine(General.Asc(m1.Substring(1, 1)));
    Console.WriteLine(General.Asc(m1.Substring(2, 1)));
    br.Close();
    fs.Close();
    return myString;
}
General class:
public static int Asc(string stringToEValuate)
{
    return (int)stringToEValuate[0];
}
Result in output window:
1
10
8230 <--fail!
The string in VB6 has length 174848, identical to the size of the test file.
In C# the size is the same for the Default and ASCII encodings, while all the others produce a different size, and I cannot use them unless I change everything in the whole project.
The problem is that I can't find the correct encoding that yields a string whose Asc values are identical to the VB6 ones.
The problem is all there: if the string is not identical, I have to change a lot of lines of code, because the whole program is based on ASCII values and their positions in the string.
Maybe it's the wrong way to load a binary file into a string, or the Asc function..
If you want to try the example file you can download it from here:
http://www.snokie.org/testme.bin
8230 is correct. It is the UTF-16 code unit for the Unicode codepoint U+2026 (HORIZONTAL ELLIPSIS, which needs only one UTF-16 code unit). You expected 133: as a single byte, 133 is the encoding of the same character in at least one other character set, Windows-1252.
There is no text but encoded text.
When you read a text file you have to know the encoding that was used to write it. Once you read into a .NET String or Char, you have it in Unicode's UTF-16 encoding. Because Unicode is a superset of any character set you would be using, it is not incorrect.
If you don't want to compare characters as characters, read them as binary to keep them in the same encoding as the file. You can then compare the byte sequences.
The problem is that the VB6 code, rather than using Unicode for character code like it should have, used the "default ANSI" character set, which changes meaning from system to system and user to user.
The problem is this: "old project loads a binary file into a string variable". Yes, this was a common—but bad—VB6 practice. String datatypes are for text. Strings in VB6 are UTF-16 code unit sequences, just like in .NET (and Java, JavaScript, HTML, XML, …).
Get #ContIN, , FileBuffer converts from the system's default ANSI code page to UTF-16 and Asc converts it back again. So, you just have to do that in your .NET code, too.
Note: Just like in the VB6, Encoding.Default is hazardous because it can vary from system to system and user to user.
Reference Microsoft.VisualBasic.dll and
using static Microsoft.VisualBasic.Strings;
Then
var fileBuffer = File.ReadAllText(path, Encoding.Default);
Debug.WriteLine(Asc(Mid(fileBuffer, 3, 1)));
If you'd rather not bring Microsoft.VisualBasic.dll into a C# project, you can write your own versions
static class VB6StringReplacements
{
    public static byte Asc(string source) =>
        Encoding.Default.GetBytes(source.Substring(0, 1)).FirstOrDefault();

    // Note: VB6's Mid is 1-based, hence the offset adjustment.
    public static string Mid(string source, int offset, int length) =>
        source.Substring(offset - 1, length);
}
and, change your using directive to
using static VB6StringReplacements;
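To make the 133-versus-8230 mapping concrete, here is a self-contained sketch that pins the code page to Windows-1252 instead of the machine-dependent Encoding.Default. The byte value 133 is taken from the question's output; note that on .NET Core / .NET 5+ the legacy code pages require the System.Text.Encoding.CodePages package, while on .NET Framework GetEncoding(1252) works directly:

```csharp
using System;
using System.Text;

class AnsiDemo
{
    static void Main()
    {
        // Needed on .NET Core / .NET 5+ to make legacy code pages available;
        // on .NET Framework this call is unnecessary.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding win1252 = Encoding.GetEncoding(1252);

        // Byte 133 (0x85) in Windows-1252 is U+2026 HORIZONTAL ELLIPSIS;
        // decoding to UTF-16 and casting the char back to int gives 8230,
        // exactly the mismatch from the question.
        string s = win1252.GetString(new byte[] { 133 });
        Console.WriteLine((int)s[0]);              // 8230 (0x2026)
        Console.WriteLine(win1252.GetBytes(s)[0]); // 133
    }
}
```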
How it is all set up:
I receive a byte[] which contains CSV data
I don't know the encoding (should be unicode / utf8)
I need to detect the encoding or fallback to a default (the text may contain umlauts, so the encoding is important)
I need to read the header line and compare it with defined strings
After a short search for how to get a string out of the byte[], I found How to convert byte[] to string?, which suggested using something like
string result = System.Text.Encoding.UTF8.GetString(byteArray);
I now use this helper to detect the encoding, and afterwards the Encoding.GetString method to read the string, like so:
string csvFile = TextFileEncodingDetector.DetectTextByteArrayEncoding(data).GetString(data);
But when I now try to compare values from this result string with static strings in my code, all comparisons fail!
// headers is the first line from the string that I receive from EncodingHelper.ReadData(data)
for (int i = 0; i < headers.Count; i++) {
    switch (headers[i].Trim().ToLower()) {
        case "number":
            // do
            break;
        default:
            throw new Exception();
    }
}
// where (headers[i].Trim().ToLower()) => "number"
While this seems to be a problem with the encoding of both strings, my question is:
How can I detect the encoding of a string from a byte[] and convert it into the default encoding so that I am able to work with that string data?
Edit
The code supplied above was working as long the string data came from a file that was saved this way:
string tempFile = Path.GetTempFileName();
StreamReader reader = new StreamReader(inputStream);
string line = null;
TextWriter tw = new StreamWriter(tempFile);
fileCount++;
while ((line = reader.ReadLine()) != null)
{
if (line.Length > 1)
{
tw.WriteLine(line);
}
}
tw.Close();
and afterwards read out with
File.ReadAllText()
This
A. Forces the file to be unicode (ANSI format kills all umlauts)
B. requires the written file be accessible
Now I only have the inputStream and tried what I posted above. As I mentioned, this worked before, and the strings look identical. But they are not.
Note: If I use ANSI encoded file, which uses Encoding.Default all works fine.
Edit 2
While ANSI-encoded data works, the UTF-8 encoded data (Notepad++ shows "UTF-8", not "UTF-8 w/o BOM") starts with char [0]: 65279
So where is my error because I guess System.Text.Encoding.UTF8.GetString(byteArray) is working the right way.
Yes, Encoding.GetString doesn't strip the BOM (see https://stackoverflow.com/a/11701560/613130). You could do this instead:
string result;
using (var memoryStream = new MemoryStream(byteArray))
{
result = new StreamReader(memoryStream).ReadToEnd();
}
The StreamReader will autodetect the encoding (your encoding detector is essentially a copy of StreamReader.DetectEncoding()).
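A minimal sketch contrasting the two approaches; the bytes built here are hypothetical UTF-8 data with the 3-byte BOM (EF BB BF) prepended, standing in for the byte[] from the question:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

class BomDemo
{
    static void Main()
    {
        // UTF-8 bytes for "number", with the BOM prepended.
        byte[] withBom = Encoding.UTF8.GetPreamble()
            .Concat(Encoding.UTF8.GetBytes("number")).ToArray();

        // Encoding.GetString keeps the BOM as U+FEFF, so comparisons fail.
        string raw = Encoding.UTF8.GetString(withBom);
        Console.WriteLine(raw == "number"); // False
        Console.WriteLine((int)raw[0]);     // 65279

        // StreamReader detects and skips the BOM by default.
        string clean;
        using (var ms = new MemoryStream(withBom))
        using (var reader = new StreamReader(ms))
            clean = reader.ReadToEnd();
        Console.WriteLine(clean == "number"); // True
    }
}
```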
I am trying to write and read a binary file using c# BinaryWriter and BinaryReader classes.
When I store a string in the file, it is stored properly, but when I try to read it back, the returned string has a '\0' character in every alternate position.
Here is the code:
public void writeBinary(BinaryWriter bw)
{
    bw.Write("Hello");
}

public void readBinary(BinaryReader br)
{
    String s;
    s = br.ReadString();
}
Here s is getting value as = "H\0e\0l\0l\0o\0".
You are using different encodings when reading and writing the file.
You are using UTF-16 when writing the file, so each character ends up as a 16-bit code unit, i.e. two bytes.
You are using UTF-8 or some of the 8-bit encodings when reading the file, so each byte will end up as one character.
Pick one encoding and use for both reading and writing the file.
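A sketch of the fix, passing the same encoding explicitly to both sides. A MemoryStream stands in for the FileStream from the question; note that BinaryWriter actually defaults to UTF-8, so the interleaved '\0's suggest the original writer was created with Encoding.Unicode while the reader used an 8-bit encoding:

```csharp
using System;
using System.IO;
using System.Text;

class RoundTrip
{
    static void Main()
    {
        byte[] stored;
        using (var ms = new MemoryStream())
        {
            // Pass the encoding explicitly so both sides agree.
            using (var bw = new BinaryWriter(ms, Encoding.UTF8, leaveOpen: true))
                bw.Write("Hello"); // writes a length-prefixed string
            stored = ms.ToArray();
        }

        // Read back with the very same encoding.
        using (var ms = new MemoryStream(stored))
        using (var br = new BinaryReader(ms, Encoding.UTF8))
            Console.WriteLine(br.ReadString()); // Hello
    }
}
```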
I have a file that contains text data and binary data. This may not be a good idea, but there's nothing I can do about it.
I know the end and start positions of the binary data.
What would be the best way to read in that binary data between those positions, make a Base64 string out of it, and then write it back to the position it was.
EDIT: The Base64-encoded string won't be same length as the binary data, so I might have to pad the Base64 string to the binary data length.
int binaryStart = 100;
int binaryEnd = 150;

//buffer to copy the remaining data to, so it can be re-appended after inserting the base64 string
byte[] dataTailBuffer = null;
string base64String = null;

//get the binary data and convert it to a base64 string
using (System.IO.Stream fileStream = new FileStream(@"c:\Test Soap", FileMode.Open, FileAccess.Read))
{
    using (System.IO.BinaryReader reader = new BinaryReader(fileStream))
    {
        reader.BaseStream.Seek(binaryStart, SeekOrigin.Begin);
        var buffer = new byte[binaryEnd - binaryStart];
        reader.Read(buffer, 0, buffer.Length);
        base64String = Convert.ToBase64String(buffer);
        if (reader.BaseStream.Position < reader.BaseStream.Length - 1)
        {
            dataTailBuffer = new byte[reader.BaseStream.Length - reader.BaseStream.Position];
            reader.Read(dataTailBuffer, 0, dataTailBuffer.Length);
        }
    }
}

//write the new base64 string at the specified location
using (System.IO.Stream fileStream = new FileStream(@"c:\Test Soap", FileMode.Open, FileAccess.Write))
{
    using (System.IO.BinaryWriter writer = new BinaryWriter(fileStream))
    {
        writer.Seek(binaryStart, SeekOrigin.Begin);
        writer.Write(base64String); //writer.Write(Convert.FromBase64String(base64String));
        if (dataTailBuffer != null)
        {
            writer.Write(dataTailBuffer, 0, dataTailBuffer.Length);
        }
    }
}
You'll want to use a FileStream object, and the Read(byte[], int, int) and Write(byte[], int, int) methods.
Although the point about base64 being bigger than binary is valid - you'll actually need to grab the data beyond the end point of what you want to replace, store it, write to the file with your new data, then write out the stored data after you finish.
I trust you're not trying to mod exe files to write viruses here... ;)
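The read-store-write sequence described above can be sketched like this. The path and the offsets are hypothetical stand-ins for the question's values; since the base-64 text is longer than the source bytes, the stored tail is written back at a new, later position:

```csharp
using System;
using System.IO;
using System.Text;

class SegmentReplace
{
    static void Main()
    {
        const string path = "mixed.dat"; // hypothetical mixed text/binary file
        const int binaryStart = 100, binaryEnd = 150;

        byte[] all = File.ReadAllBytes(path);

        // Slice out the binary segment and encode it as base-64 text.
        byte[] segment = new byte[binaryEnd - binaryStart];
        Array.Copy(all, binaryStart, segment, 0, segment.Length);
        byte[] encoded = Encoding.ASCII.GetBytes(Convert.ToBase64String(segment));

        // Rebuild the file: head, then the base-64 text, then the stored tail.
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            fs.Write(all, 0, binaryStart);                    // untouched head
            fs.Write(encoded, 0, encoded.Length);             // replacement data
            fs.Write(all, binaryEnd, all.Length - binaryEnd); // saved tail
        }
    }
}
```

Because the rebuilt file grows (50 bytes become 68 characters of base-64 here), rewriting in place with FileMode.Open, as in the question, would leave stale bytes behind; FileMode.Create truncates first.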
Clearly, writing out base-64 in the place of binary data cannot work, since the base-64 will be longer. So the question is, what do you need to do this for?
I will speculate that you have inherited this terrible binary file format, and you would like to use a text-editor to edit the textual portions of this binary file. If that is the case, then perhaps a more robust round-tripping binary-to-text-to-binary conversion is what you need.
I recommend using base-64 for the binary portions, but the rest of the file should be wrapped up in XML, or some other format that would be easy to parse and interpret. XML is good, because the parsers for it are already available in the system.
<mydoc>
  <t>Original text</t>
  <b fieldId="1">base-64 binary</b>
  <t>Hello, world!</t>
  <b fieldId="2">928h982hr98h2984hf</b>
</mydoc>
This file can be easily created from your specification, and it can be easily edited in any text editor. Then the file can be converted back into the original format. If any text intrudes into the binary fields, then it can be truncated. Likewise, text that is too short could be padded with spaces.
In C#, I have a string that I'm obtaining from WebClient.DownloadString. I've tried setting client.Encoding to new UTF8Encoding(false), but that's made no difference - I still end up with a byte order mark for UTF-8 at the beginning of the result string. I need to remove this (to parse the resulting XML with LINQ), and want to do so in memory.
So I have a string that starts with \x00EF\x00BB\x00BF, and I want to remove that if it exists. Right now I'm using
if (xml.StartsWith(ByteOrderMarkUtf8))
{
xml = xml.Remove(0, ByteOrderMarkUtf8.Length);
}
but that just feels wrong. I've tried all sorts of code with streams, GetBytes, and encodings, and nothing works. Can anyone provide the "right" algorithm to strip a BOM from a string?
I recently had issues with the .NET 4 upgrade, but until then the simple answer was
String.Trim()
which removes the BOM up until .NET 3.5.
However, in .NET 4 you need to change it slightly:
String.Trim(new char[]{'\uFEFF'});
That will also get rid of the byte order mark, though you may also want to remove the ZERO WIDTH SPACE (U+200B):
String.Trim(new char[]{'\uFEFF','\u200B'});
This you could also use to remove other unwanted characters.
Some further information is in the String.Trim method documentation:
The .NET Framework 3.5 SP1 and earlier versions maintain an internal list of white-space characters that this method trims. Starting with the .NET Framework 4, the method trims all Unicode white-space characters (that is, characters that produce a true return value when they are passed to the Char.IsWhiteSpace method). Because of this change, the Trim method in the .NET Framework 3.5 SP1 and earlier versions removes two characters, ZERO WIDTH SPACE (U+200B) and ZERO WIDTH NO-BREAK SPACE (U+FEFF), that the Trim method in the .NET Framework 4 and later versions does not remove. In addition, the Trim method in the .NET Framework 3.5 SP1 and earlier versions does not trim three Unicode white-space characters: MONGOLIAN VOWEL SEPARATOR (U+180E), NARROW NO-BREAK SPACE (U+202F), and MEDIUM MATHEMATICAL SPACE (U+205F).
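A quick sketch of the version difference described above, using a hypothetical BOM-prefixed XML snippet:

```csharp
using System;

class TrimBomDemo
{
    static void Main()
    {
        string withBom = "\uFEFF<root/>";

        // On .NET 4 and later, plain Trim() no longer treats U+FEFF
        // as white space, so the BOM survives.
        Console.WriteLine(withBom.Trim() == "<root/>"); // False on .NET 4+

        // Passing the character explicitly works on every version.
        Console.WriteLine(withBom.Trim('\uFEFF') == "<root/>"); // True
    }
}
```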
I had some incorrect test data, which caused me some confusion. Based on How to avoid tripping over UTF-8 BOM when reading files I found that this worked:
private readonly string _byteOrderMarkUtf8 =
Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
public string GetXmlResponse(Uri resource)
{
string xml;
using (var client = new WebClient())
{
client.Encoding = Encoding.UTF8;
xml = client.DownloadString(resource);
}
if (xml.StartsWith(_byteOrderMarkUtf8, StringComparison.Ordinal))
{
xml = xml.Remove(0, _byteOrderMarkUtf8.Length);
}
return xml;
}
Setting the client Encoding property correctly reduces the BOM to a single character. However, XDocument.Parse still will not read that string. This is the cleanest version I've come up with to date.
This works as well
int index = xmlResponse.IndexOf('<');
if (index > 0)
{
xmlResponse = xmlResponse.Substring(index, xmlResponse.Length - index);
}
A quick and simple method to remove it directly from a string:
private static string RemoveBom(string p)
{
string BOMMarkUtf8 = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
if (p.StartsWith(BOMMarkUtf8))
p = p.Remove(0, BOMMarkUtf8.Length);
return p.Replace("\0", "");
}
How to use it:
string yourCleanString=RemoveBom(yourBOMString);
If the variable xml is of type string, you did something wrong already - in a character string, the BOM should not be represented as three separate characters, but as a single code point.
Instead of using DownloadString, use DownloadData, and parse byte arrays instead. The XML parser should recognize the BOM itself, and skip it (except for auto-detecting the document encoding as UTF-8).
I had a very similar problem (I needed to parse an XML document represented as a byte array with a byte order mark at the beginning). I used one of Martin's comments on his answer to come to a solution. I took the byte array I had (instead of converting it to a string) and created a MemoryStream object from it. Then I passed it to XDocument.Load, which worked like a charm. For example, let's say that xmlBytes contains your XML in UTF-8 encoding with a byte order mark at the beginning. This would be the code to solve the problem:
var stream = new MemoryStream(xmlBytes);
var document = XDocument.Load(stream);
It's that simple.
If starting out with a string, it should still be easy to do (assume xml is your string containing the XML with the byte order mark):
var bytes = Encoding.UTF8.GetBytes(xml);
var stream = new MemoryStream(bytes);
var document = XDocument.Load(stream);
I wrote the following post after coming across this issue.
Essentially instead of reading in the raw bytes of the file's contents using the BinaryReader class, I use the StreamReader class with a specific constructor which automatically removes the byte order mark character from the textual data I am trying to retrieve.
It's of course best if you can strip it out while still on the byte array level to avoid unwanted substrings / allocs. But if you already have a string, this is perhaps the easiest and most performant way to handle this.
Usage:
string feed = ""; // input
bool hadBOM = FixBOMIfNeeded(ref feed);
var xElem = XElement.Parse(feed); // now does not fail
/// <summary>
/// You can get this or test it originally with: Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble())[0];
/// But no need, this way we have a constant. As these three bytes `[239, 187, 191]` (a BOM) evaluate to a single C# char.
/// </summary>
public const char BOMChar = (char)65279;
public static bool FixBOMIfNeeded(ref string str)
{
if (string.IsNullOrEmpty(str))
return false;
bool hasBom = str[0] == BOMChar;
if (hasBom)
str = str.Substring(1);
return hasBom;
}
Pass the byte buffer (via DownloadData) to string Encoding.UTF8.GetString(byte[]) to get the string rather than download the buffer as a string. You probably have more problems with your current method than just trimming the byte order mark. Unless you're properly decoding it as I suggest here, Unicode characters will probably be misinterpreted, resulting in a corrupted string.
Martin's answer is better, since it avoids allocating an entire string for XML that still needs to be parsed anyway. The answer I gave best applies to general strings that don't need to be parsed as XML.
I ran into this when I had a Base64 encoded file to transform into the string. While I could have saved it to a file and then read it correctly, here's the best solution I could think of to get from the byte[] of the file to the string (based lightly on TrueWill's answer):
public static string GetUTF8String(byte[] data)
{
byte[] utf8Preamble = Encoding.UTF8.GetPreamble();
if (data.StartsWith(utf8Preamble))
{
return Encoding.UTF8.GetString(data, utf8Preamble.Length, data.Length - utf8Preamble.Length);
}
else
{
return Encoding.UTF8.GetString(data);
}
}
Where StartsWith(byte[]) is the logical extension:
public static bool StartsWith(this byte[] thisArray, byte[] otherArray)
{
    // Guard against nulls and a prefix longer than the array itself.
    if (thisArray == null || otherArray == null ||
        thisArray.Length < otherArray.Length)
    {
        return false;
    }
    for (int i = 0; i < otherArray.Length; ++i)
    {
        if (thisArray[i] != otherArray[i])
        {
            return false;
        }
    }
    return true;
}
// The 'true' argument enables byte-order-mark detection,
// so the reader strips the BOM before the XML parser sees it.
StreamReader sr = new StreamReader(strFile, true);
XmlDocument xdoc = new XmlDocument();
xdoc.Load(sr);
Yet another generic variation to get rid of the UTF-8 BOM preamble:
var preamble = Encoding.UTF8.GetPreamble();
if (!functionBytes.Take(preamble.Length).SequenceEqual(preamble))
preamble = Array.Empty<Byte>();
return Encoding.UTF8.GetString(functionBytes, preamble.Length, functionBytes.Length - preamble.Length);
Use a regex replace to filter out any other characters other than the alphanumeric characters and spaces that are contained in a normal certificate thumbprint value:
certficateThumbprint = Regex.Replace(certficateThumbprint, @"[^a-zA-Z0-9\-\s*]", "");
And there you go. Voila!! It worked for me.
I solved the issue with the following code
using System.IO;
using System.Xml.Linq;

void method()
{
    byte[] bytes = GetXmlBytes();
    XDocument doc;
    using (var stream = new MemoryStream(bytes))
    {
        doc = XDocument.Load(stream);
    }
}
}