C# Read and replace binary data in text file

I have a file that contains text data and binary data. This may not be a good idea, but there's nothing I can do about it.
I know the end and start positions of the binary data.
What would be the best way to read in the binary data between those positions, make a Base64 string out of it, and then write it back to the position it was in?
EDIT: The Base64-encoded string won't be the same length as the binary data, so I might have to pad the Base64 string to the binary data's length.

int binaryStart = 100;
int binaryEnd = 150;

// Buffer to hold the remaining data so it can be written back after the Base64 string.
byte[] dataTailBuffer = null;
string base64String = null;

// Read the binary data and convert it to a Base64 string.
using (System.IO.Stream fileStream = new FileStream(@"c:\Test Soap", FileMode.Open, FileAccess.Read))
{
    using (System.IO.BinaryReader reader = new BinaryReader(fileStream))
    {
        reader.BaseStream.Seek(binaryStart, SeekOrigin.Begin);
        var buffer = new byte[binaryEnd - binaryStart];
        reader.Read(buffer, 0, buffer.Length);
        base64String = Convert.ToBase64String(buffer);
        if (reader.BaseStream.Position < reader.BaseStream.Length - 1)
        {
            dataTailBuffer = new byte[reader.BaseStream.Length - reader.BaseStream.Position];
            reader.Read(dataTailBuffer, 0, dataTailBuffer.Length);
        }
    }
}

// Write the new Base64 string at the specified location.
using (System.IO.Stream fileStream = new FileStream(@"c:\Test Soap", FileMode.Open, FileAccess.Write))
{
    using (System.IO.BinaryWriter writer = new BinaryWriter(fileStream))
    {
        writer.Seek(binaryStart, SeekOrigin.Begin);
        writer.Write(base64String); // Note: BinaryWriter.Write(string) length-prefixes the string.
                                    // writer.Write(Convert.FromBase64String(base64String));
        if (dataTailBuffer != null)
        {
            writer.Write(dataTailBuffer, 0, dataTailBuffer.Length);
        }
    }
}

You'll want to use a FileStream object, and the Read(byte[], int, int) and Write(byte[], int, int) methods.
The point about Base64 being bigger than binary is valid, though: you'll need to grab the data beyond the end point of what you want to replace, store it, write your new data to the file, then write out the stored data after you finish.
I trust you're not trying to mod exe files to write viruses here... ;)
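For concreteness, here is a minimal sketch of that read-tail / overwrite / append approach, assuming the offsets and path from the question, that ASCII is an acceptable encoding for the Base64 text, and that each Read call returns the full count (production code should loop):

int binaryStart = 100;
int binaryEnd = 150;
string path = @"c:\Test Soap";

byte[] replacement;
byte[] tail;
using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
{
    // Read the binary section and re-encode it as Base64 text.
    var binary = new byte[binaryEnd - binaryStart];
    fs.Seek(binaryStart, SeekOrigin.Begin);
    fs.Read(binary, 0, binary.Length);
    replacement = System.Text.Encoding.ASCII.GetBytes(Convert.ToBase64String(binary));

    // Save everything after the binary section so it can be written back.
    tail = new byte[fs.Length - binaryEnd];
    fs.Read(tail, 0, tail.Length);
}
using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write))
{
    fs.Seek(binaryStart, SeekOrigin.Begin);
    fs.Write(replacement, 0, replacement.Length); // raw bytes, no length prefix
    fs.Write(tail, 0, tail.Length);
    fs.SetLength(fs.Position); // trim leftovers if the new content were shorter
}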

Clearly, writing out base-64 in the place of binary data cannot work, since the base-64 will be longer. So the question is, what do you need to do this for?
I will speculate that you have inherited this terrible binary file format, and you would like to use a text-editor to edit the textual portions of this binary file. If that is the case, then perhaps a more robust round-tripping binary-to-text-to-binary conversion is what you need.
I recommend using base-64 for the binary portions, but the rest of the file should be wrapped up in XML, or some other format that would be easy to parse and interpret. XML is good, because the parsers for it are already available in the system.
<mydoc>
  <t>Original text</t>
  <b fieldId="1">base-64 binary</b>
  <t>Hello, world!</t>
  <b fieldId="2">928h982hr98h2984hf</b>
</mydoc>
This file can be easily created from your specification, and it can be easily edited in any text editor. Then the file can be converted back into the original format. If any text intrudes into the binary fields, then it can be truncated. Likewise, text that is too short could be padded with spaces.
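As a rough sketch of producing that editable text form, assuming the field offsets are known as in the question and that the textual portions are ASCII (the path, offsets, and output file name are placeholders):

using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

byte[] raw = File.ReadAllBytes(@"c:\Test Soap");
int binaryStart = 100, binaryEnd = 150;

// Text portions become <t> elements; binary portions become Base64 <b> elements.
var doc = new XDocument(new XElement("mydoc",
    new XElement("t", Encoding.ASCII.GetString(raw, 0, binaryStart)),
    new XElement("b", new XAttribute("fieldId", 1),
        Convert.ToBase64String(raw, binaryStart, binaryEnd - binaryStart)),
    new XElement("t", Encoding.ASCII.GetString(raw, binaryEnd, raw.Length - binaryEnd))));
doc.Save("editable.xml");

The reverse direction walks the elements in order, emitting <t> contents as text (padded or truncated to the field length) and <b> contents via Convert.FromBase64String.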

Related

Storing the result of Convert.FromBase64String into a postgres bytea

I have a mini system where the frontend passes a file to the backend in Base64 format. I used Convert.FromBase64String to convert the Base64 string into a byte array, and a FileStream to save the file on the server.
The code is shown below:
byte[] bytes = Convert.FromBase64String(file.Split(',')[1]);
using (var fs = new FileStream(@"D:\test.txt", FileMode.Create))
{
    fs.Write(bytes, 0, bytes.Length);
    fs.Flush();
}
var db = await _context.insertDB.FromSql("INSERT INTO blobTable (blob) VALUES ('" + bytes + "')").SingleAsync();
var db = await _context.insertDB.FromSql("INSERT INTO blobTable (blob) VALUES ('" + bytes + "')").SingleAsync();
Convert.FromBase64String() returns a byte array, so I decided to store that value in my database, which is Postgres with a bytea[] column.
The weird thing is that when I try
Console.WriteLine(bytes);
the result printed is "System.Byte[]" instead of the byte values, so "System.Byte[]" is stored in my database instead of the actual value.
Can anybody tell me how to store the return value of Convert.FromBase64String() in a Postgres bytea[] column? Thank you.
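The concatenation in the INSERT statement is the culprit: gluing a byte[] into a string calls its ToString(), which yields the literal text "System.Byte[]". A minimal sketch of the usual fix, assuming the Npgsql ADO.NET driver, a plain bytea column, and a placeholder connection string; a parameter passes the array as a real binary value:

using Npgsql;

byte[] bytes = Convert.FromBase64String(file.Split(',')[1]);
using (var conn = new NpgsqlConnection(connectionString)) // connectionString is a placeholder
{
    conn.Open();
    using (var cmd = new NpgsqlCommand("INSERT INTO blobTable (blob) VALUES (@blob)", conn))
    {
        // The parameter carries the byte[] as bytea; no string conversion happens.
        cmd.Parameters.AddWithValue("blob", bytes);
        cmd.ExecuteNonQuery();
    }
}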

Image to binary returning System.byte[] instead of binary format in asp.net

I'm trying to convert an image to binary and then store it in a database. I have code to do this, and after several Google searches most answers look like the code I have written. The problem is that instead of seeing a binary format in my database, I'm getting System.byte[] as output. I also debugged and got the same thing.
Here's part of the code
if (Upload.HasFile)
{
    HttpPostedFile postedFile = Upload.PostedFile;
    string filename = Path.GetFileName(postedFile.FileName);
    string fileExtension = Path.GetExtension(filename);
    int filesize = postedFile.ContentLength;
    if (fileExtension.ToLower() == ".jpg")
    {
        Stream stream = postedFile.InputStream;
        BinaryReader binaryreader = new BinaryReader(stream);
        byte[] bytes = binaryreader.ReadBytes((int)stream.Length);
        Debug.WriteLine(bytes);
    }
}
The result of my debug gives System.byte[] as output.
You can convert a byte array to a string for DB storage:
var storedString = Convert.ToBase64String(bytes);
and get the byte array back from the stored string:
bytes = Convert.FromBase64String(storedString);
If you really want to use a binary-style format, you can look into the SoapHexBinary class, particularly its Parse() method and Value property.
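For example, a small sketch (SoapHexBinary lives in the System.Runtime.Remoting.Metadata.W3cXsd2001 namespace and is available on .NET Framework only):

using System.Runtime.Remoting.Metadata.W3cXsd2001;

byte[] bytes = { 0x48, 0x65, 0x6C, 0x6C, 0x6F };    // the bytes of "Hello"
string hex = new SoapHexBinary(bytes).ToString();    // "48656C6C6F"
byte[] back = SoapHexBinary.Parse(hex).Value;        // the original bytes again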

BinaryReader in c# reads '\0' between all characters of a string

I am trying to write and read a binary file using the C# BinaryWriter and BinaryReader classes.
When I store a string in the file, it is stored properly, but when I read it back, the returned string has a '\0' character in every alternate position.
Here is the code:
public void writeBinary(BinaryWriter bw)
{
    bw.Write("Hello");
}

public void readBinary(BinaryReader br)
{
    String s;
    s = br.ReadString();
}
Here s gets the value "H\0e\0l\0l\0o\0".
You are using different encodings when reading and writing the file.
You are using UTF-16 when writing the file, so each character ends up as a 16 bit character code, i.e. two bytes.
You are using UTF-8 or one of the 8-bit encodings when reading the file, so each byte ends up as one character.
Pick one encoding and use it for both reading and writing the file.
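For example, passing the same Encoding to both sides; UTF-8 here is just a choice, and any single encoding works as long as both ends agree (the file name is a placeholder):

using System.IO;
using System.Text;

using (var bw = new BinaryWriter(File.Create("data.bin"), Encoding.UTF8))
{
    bw.Write("Hello"); // written as a length-prefixed string in the chosen encoding
}
using (var br = new BinaryReader(File.OpenRead("data.bin"), Encoding.UTF8))
{
    string s = br.ReadString(); // "Hello", with no interleaved '\0'
}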

FileStream prepends junk characters while reading

I am reading a simple text file containing a single line using the FileStream class, but it seems FileStream.Read prepends some junk characters at the beginning.
Below is the code.
using (var _fs = File.Open(_idFilePath, FileMode.Open, FileAccess.ReadWrite, FileShare.Read))
{
    byte[] b = new byte[_fs.Length];
    UTF8Encoding temp = new UTF8Encoding(true);
    while (_fs.Read(b, 0, b.Length) > 0)
    {
        Console.WriteLine(temp.GetString(b));
        Console.WriteLine(ASCIIEncoding.ASCII.GetString(b));
    }
}
For example, my data in the text file is just "sample", but the above code returns
"?sample" and
"???sample"
What's the reason? Is it a start-of-file indicator? Is there a way to read only my actual content?
The byte order mark (BOM) consists of the Unicode character 0xFEFF and is used to mark a file with the encoding used for it.
So if you correctly decode the file as UTF-8, you get that character as the first char of your string. If you incorrectly decode it as ANSI, you get 3 chars, since the UTF-8 encoding of 0xFEFF is the byte sequence "EF BB BF", which is 3 bytes.
But your whole code can be replaced with
File.ReadAllText(fileName, Encoding.UTF8)
and that should remove the BOM too. Or leave out the encoding parameter and let the function autodetect the encoding (for which it uses the BOM).
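If you keep the manual byte reading instead, a small sketch of skipping those three bytes yourself (a plain prefix check, not a full encoding detector):

byte[] b = File.ReadAllBytes(_idFilePath); // _idFilePath as in the question
// Skip the UTF-8 BOM (EF BB BF) if present.
int start = (b.Length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) ? 3 : 0;
string text = Encoding.UTF8.GetString(b, start, b.Length - start); // "sample"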
Could be the BOM, a.k.a. the byte order mark.
You are reading the BOM from the stream. If you are reading text, try using a StreamReader, which handles this automatically.
Try instead:
using (StreamReader sr = new StreamReader(File.Open(path, FileMode.Open), Encoding.UTF8))
It will strip the BOM for you.

Using .NET how to convert ISO 8859-1 encoded text files that contain Latin-1 accented characters to UTF-8

I am being sent text files saved in ISO 8859-1 format that contain accented characters from the Latin-1 range (as well as normal ASCII a-z, etc.). How do I convert these files to UTF-8 using C# so that the single-byte accented characters in ISO 8859-1 become valid UTF-8 characters?
I have tried using a StreamReader with ASCIIEncoding, then converting the ASCII string to UTF-8 by instantiating an ASCII encoding and a UTF-8 encoding and calling Encoding.Convert(ascii, utf8, ascii.GetBytes(asciiString)), but the accented characters are rendered as question marks.
What step am I missing?
You need to get the proper Encoding object. ASCII is just as it's named: ASCII, meaning that it only supports 7-bit ASCII characters. If what you want to do is convert files, then this is likely easier than dealing with the byte arrays directly.
using (System.IO.StreamReader reader = new System.IO.StreamReader(fileName,
    Encoding.GetEncoding("iso-8859-1")))
{
    using (System.IO.StreamWriter writer = new System.IO.StreamWriter(
        outFileName, Encoding.UTF8))
    {
        writer.Write(reader.ReadToEnd());
    }
}
However, if you want to have the byte arrays yourself, it's easy enough to do with Encoding.Convert.
byte[] converted = Encoding.Convert(Encoding.GetEncoding("iso-8859-1"),
    Encoding.UTF8, data);
It's important to note here, however, that if you want to go down this road then you should not use an encoding-based string reader like StreamReader for your file IO. FileStream would be better suited, as it will read the actual bytes of the files.
In the interest of fully exploring the issue, something like this would work:
using (System.IO.FileStream input = new System.IO.FileStream(fileName,
    System.IO.FileMode.Open,
    System.IO.FileAccess.Read))
{
    byte[] buffer = new byte[input.Length];
    int readLength = 0;
    while (readLength < buffer.Length)
        readLength += input.Read(buffer, readLength, buffer.Length - readLength);
    byte[] converted = Encoding.Convert(Encoding.GetEncoding("iso-8859-1"),
        Encoding.UTF8, buffer);
    using (System.IO.FileStream output = new System.IO.FileStream(outFileName,
        System.IO.FileMode.Create,
        System.IO.FileAccess.Write))
    {
        output.Write(converted, 0, converted.Length);
    }
}
In this example, the buffer variable gets filled with the actual data in the file as a byte[], so no conversion is done. Encoding.Convert specifies a source and destination encoding, then stores the converted bytes in the variable named...converted. This is then written to the output file directly.
Like I said, the first option using StreamReader and StreamWriter will be much simpler if this is all you're doing, but the latter example should give you more of a hint as to what's actually going on.
If the files are relatively small (say, ~10 megabytes), you'll only need two lines of code:
string txt = System.IO.File.ReadAllText(inpPath, Encoding.GetEncoding("iso-8859-1"));
System.IO.File.WriteAllText(outPath, txt);
