System.Text.Encoding.Default.GetBytes fails - c#

Here is my sample code:
Code Snippet 1: This code executes on my file repository server and returns the file as an encoded string through the WCF service:
byte[] fileBytes = new byte[0];
using (FileStream stream = System.IO.File.OpenRead(@"D:\PDFFiles\Sample1.pdf"))
{
fileBytes = new byte[stream.Length];
stream.Read(fileBytes, 0, fileBytes.Length);
stream.Close();
}
string retVal = System.Text.Encoding.Default.GetString(fileBytes); // fileBytes size is 209050
Code Snippet 2:
The client box, which requested the PDF file, receives the encoded string, converts it to a PDF, and saves it locally.
byte[] encodedBytes = System.Text.Encoding.Default.GetBytes(retVal); /// GETTING corrupted here
string pdfPath = @"C:\DemoPDF\Sample2.pdf";
using (FileStream fileStream = new FileStream(pdfPath, FileMode.Create)) //encodedBytes is 327279
{
fileStream.Write(encodedBytes, 0, encodedBytes.Length);
fileStream.Close();
}
The above code works absolutely fine on .NET Framework 4.5 and 4.6.1.
When I use the same code in ASP.NET Core 2.0, it fails to convert to a byte array properly. I am not getting any runtime error, but the final PDF cannot be opened after it is created; the viewer reports that the PDF file is corrupted.
I also tried Encoding.Unicode and Encoding.UTF8, but I get the same error for the final PDF.
Also, I have noticed that when I use Encoding.Unicode, at least the original byte array and the resulting byte array are the same size. With the other encodings, the byte sizes do not match either.
So, the question is: is System.Text.Encoding.Default.GetBytes broken in .NET Core 2.0?
I have edited my question for better understanding.
Sample1.pdf exists on a different server, which uses WCF to transmit the data to the client; the client stores the encoded file stream and converts it to Sample2.pdf.
Hopefully my question makes some sense now.

1: the number of times you should ever use Encoding.Default is essentially zero; there may be a hypothetical case, but if there is one: it is elusive
2: PDF files are not text, so trying to use an Encoding on them is just... wrong; you aren't "GETTING corrupted here" - it just isn't text.
You may wish to see Extracting text from PDFs in C# or Reading text from PDF in .NET
If you simply wish to copy the content without parsing it: File.Copy or Stream.CopyTo are good options.
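If the bytes really do have to travel as a string, Base64 is a lossless option. As background, on .NET Core Encoding.Default is always UTF-8, whereas on .NET Framework it is the machine's ANSI code page, which is the likely reason the same code behaves differently on the two runtimes. The sketch below is only an illustration: ToTransportString and FromTransportString are hypothetical helper names, and the sample bytes merely stand in for PDF content. A WCF operation can also return byte[] directly, which avoids the string step altogether.

```csharp
using System;
using System.Linq;
using System.Text;

public static class BinaryTransferSketch
{
    // Server side (hypothetical helper): encode raw file bytes as Base64 text.
    public static string ToTransportString(byte[] fileBytes) =>
        Convert.ToBase64String(fileBytes);

    // Client side (hypothetical helper): decode the Base64 text back to the same bytes.
    public static byte[] FromTransportString(string payload) =>
        Convert.FromBase64String(payload);

    public static void Main()
    {
        // Sample bytes that are not valid UTF-8 text, standing in for PDF content.
        byte[] original = { 0x25, 0x50, 0x44, 0x46, 0xFF, 0xFE, 0x00, 0x80 };

        // Base64 round trip: the bytes come back unchanged.
        byte[] viaBase64 = FromTransportString(ToTransportString(original));
        Console.WriteLine(original.SequenceEqual(viaBase64)); // True

        // UTF-8 round trip: invalid bytes are replaced, so the result differs
        // and is longer than the original.
        byte[] viaUtf8 = Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(original));
        Console.WriteLine(original.SequenceEqual(viaUtf8)); // False
        Console.WriteLine(viaUtf8.Length > original.Length); // True
    }
}
```

Base64 makes the payload roughly a third larger than the raw bytes, which is the price of a text-safe representation.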

Related

File.ReadAllText does not return full content in C#

In my c# program, I have an image which is successfully stored in a byte[] data called bytes. I successfully write it into a .txt file using the following code
using (FileStream file = new FileStream("text.txt", FileMode.Create, FileAccess.Write))
{
file.Write(bytes, 0, numToWrite);
file.Close();
}
The above code stores the exact content I wish to store.
Whenever I wish to read the content of the file, text.txt, into textbox I only get the first line or little part of the first line. But when I open the file, text.txt, I see the complete content.
This is the code I use to read the file
string kk = File.ReadAllText("text.txt");
You said at the start of the question that you have a byte[] that you are writing into the file. It's not clear why you decided not to use File.WriteAllBytes, but let's assume that your code correctly writes all the data into the file called "text.txt". As has been explained in the comments, that name does not magically make it a text file.
Using File.ReadAllText is not going to work because the data in the file is binary data, not text. As you can see from the remarks in the documentation, it will try to detect the encoding of the text file (which won't work because the file contains binary data) and will do end-of-line processing, which you won't want for a binary file.
The best way to read the data back is to use File.ReadAllBytes, which gives you back a byte[], just like you started with.
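A minimal sketch of that round trip, assuming a temporary file is acceptable for the demonstration. RoundTrips and ToDisplayText are hypothetical helper names; ToDisplayText shows one way to put binary data into a textbox (as hex) without treating it as text.

```csharp
using System;
using System.IO;
using System.Linq;

public static class BinaryFileRoundTrip
{
    // Writes the bytes to a temporary file, reads them back, and compares.
    public static bool RoundTrips(byte[] bytes)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, bytes);
            byte[] readBack = File.ReadAllBytes(path);
            return bytes.SequenceEqual(readBack);
        }
        finally
        {
            File.Delete(path);
        }
    }

    // Hypothetical display helper: hex text such as "FF-D8" for a textbox.
    public static string ToDisplayText(byte[] bytes) => BitConverter.ToString(bytes);

    public static void Main()
    {
        byte[] imageLikeBytes = { 0xFF, 0xD8, 0x00, 0x0D, 0x0A, 0x1A };
        Console.WriteLine(RoundTrips(imageLikeBytes));                 // True
        Console.WriteLine(ToDisplayText(new byte[] { 0xFF, 0xD8 }));   // FF-D8
    }
}
```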

c# How to undo Encoding.UTF8.GetBytes or convert to File.ReadAllBytes

A C# application was written to transfer files to an FTP server, and the function below was used to read the JPEG file. This is a bad function because it corrupts the JPEG:
StreamReader sourceStream = new StreamReader("image.jpeg");
byte[] fileContents = Encoding.UTF8.GetBytes(sourceStream.ReadToEnd());
The code below would work for the file transfer:
fileContents = File.ReadAllBytes(sourceStream.ReadToEnd());
And now I have a library of corrupted JPEGs.
How to fix the mess?
You shouldn't use StreamReader at all for reading binary files, it's a TextReader. Even your 2nd piece of code is wrong, unless sourceStream only contains a file name.
It's likely that your data is corrupted beyond repair. You can do the inverse with Encoding.UTF8.GetString and StreamWriter, but your encoding has most likely caused irreparable damage already.
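To illustrate why the damage usually cannot be undone, here is a small sketch (DecodeAsUtf8 is a hypothetical helper name and the byte values are made up for the example). Two different byte sequences decode to the same string, so once the string has been re-encoded there is no way to tell which bytes were there originally.

```csharp
using System;
using System.Text;

public static class Utf8LossDemo
{
    // Decodes arbitrary bytes as UTF-8, the way a StreamReader would by default.
    public static string DecodeAsUtf8(byte[] data) => Encoding.UTF8.GetString(data);

    public static void Main()
    {
        // 0xFF and 0xFE can never appear in valid UTF-8.
        byte[] a = { 0x41, 0xFF };
        byte[] b = { 0x41, 0xFE };

        // Both invalid bytes become the replacement character U+FFFD,
        // so the two inputs are indistinguishable after decoding.
        Console.WriteLine(DecodeAsUtf8(a) == DecodeAsUtf8(b)); // True
    }
}
```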

Audio file is not working via FTP upload programmatically

I am uploading an .mp3 file via FTP using C#. The file is uploaded successfully to the server, but when I bind it to a simple audio control or view it directly in the browser, it does not work as expected, whereas when I upload it manually to the server it works perfectly.
Code:
var inputStream = FileUpload1.PostedFile.InputStream;
byte[] fileBytes = new byte[inputStream.Length];
inputStream.Read(fileBytes, 0, fileBytes.Length);
Note: When I view the file in Firefox, it says the MIME type is not supported.
Thanks!
You're reading the file as a string and then using UTF-8 encoding to turn it into bytes. If you do that, and the file contains any binary sequence that doesn't decode to a valid UTF-8 value, those parts of the data stream will be replaced or lost.
Instead, read it directly as bytes. Don't bother with the StreamReader. Call the Read() method on the underlying stream. Example:
var inputStream = FileUpload1.PostedFile.InputStream;
byte[] fileBytes = new byte[inputStream.Length];
inputStream.Read(fileBytes, 0, fileBytes.Length);
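A single Read call is not guaranteed to fill the buffer, so a slightly more defensive variant is to copy the stream into a MemoryStream. This is only a sketch; StreamBytes.ReadAllBytes is a hypothetical helper, not a framework method.

```csharp
using System;
using System.IO;

public static class StreamBytes
{
    // Hypothetical helper: drains a readable stream, from its current
    // position to the end, into a new byte array.
    public static byte[] ReadAllBytes(Stream input)
    {
        using (var buffer = new MemoryStream())
        {
            input.CopyTo(buffer); // CopyTo loops internally until the source is exhausted
            return buffer.ToArray();
        }
    }

    public static void Main()
    {
        // A MemoryStream stands in for the uploaded file's input stream.
        using (var source = new MemoryStream(new byte[] { 0x49, 0x44, 0x33, 0xFF, 0xFB }))
        {
            byte[] fileBytes = ReadAllBytes(source);
            Console.WriteLine(fileBytes.Length); // 5
        }
    }
}
```

In the upload handler this would be called as StreamBytes.ReadAllBytes(FileUpload1.PostedFile.InputStream).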

byte array to pdf

I am trying to convert content of a file stored in a sql column to a pdf.
I use the following piece of code:
byte[] bytes;
BinaryFormatter bf = new BinaryFormatter();
MemoryStream ms = new MemoryStream();
bf.Serialize(ms, fileContent);
bytes = ms.ToArray();
System.IO.File.WriteAllBytes("hello.pdf", bytes);
The pdf generated is corrupt in the sense that when I open the pdf in notepad++, I see some junk header (which is same irrespective of the fileContent). The junk header is NUL SOH NUL NUL NUL ....
You shouldn't be using the BinaryFormatter for this - that's for serializing .Net types to a binary file so they can be read back again as .Net types.
If it's stored in the database, hopefully, as a varbinary - then all you need to do is get the byte array from that (that will depend on your data access technology - EF and Linq to Sql, for example, will create a mapping that makes it trivial to get a byte array) and then write it to the file as you do in your last line of code.
With any luck - I'm hoping that fileContent here is the byte array? In which case you can just do
System.IO.File.WriteAllBytes("hello.pdf", fileContent);
Usually this happens if something is wrong with the byte array.
File.WriteAllBytes("filename.PDF", bytes); // bytes is a byte[]
This creates a new file, writes the specified byte array to the file, and then closes the file. If the target file already exists, it is overwritten.
An asynchronous implementation of this is also available.
public static System.Threading.Tasks.Task WriteAllBytesAsync
(string path, byte[] bytes, System.Threading.CancellationToken cancellationToken = default);
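A short usage sketch of the asynchronous overload, assuming a runtime that provides it (it exists on .NET Core 2.0 and later, not on .NET Framework). PdfWriter.SaveAsync is a hypothetical wrapper and the sample bytes are placeholders.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class PdfWriter
{
    // Hypothetical wrapper: writes the byte array to disk without blocking the caller.
    public static async Task SaveAsync(
        string path, byte[] content, CancellationToken cancellationToken = default)
    {
        await File.WriteAllBytesAsync(path, content, cancellationToken);
    }

    public static async Task Main()
    {
        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        byte[] content = { 0x25, 0x50, 0x44, 0x46 }; // "%PDF" header bytes as placeholder data

        await SaveAsync(path, content);
        Console.WriteLine(new FileInfo(path).Length); // 4

        File.Delete(path);
    }
}
```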

Reliable way to convert a file to a byte[]

I found the following code on the web:
private byte [] StreamFile(string filename)
{
FileStream fs = new FileStream(filename, FileMode.Open,FileAccess.Read);
// Create a byte array of file stream length
byte[] ImageData = new byte[fs.Length];
//Read block of bytes from stream into the byte array
fs.Read(ImageData,0,System.Convert.ToInt32(fs.Length));
//Close the File Stream
fs.Close();
return ImageData; //return the byte data
}
Is it reliable enough to use to convert a file to byte[] in c#, or is there a better way to do this?
byte[] bytes = System.IO.File.ReadAllBytes(filename);
That should do the trick. ReadAllBytes opens the file, reads its contents into a new byte array, then closes it. Here's the MSDN page for that method.
byte[] bytes = File.ReadAllBytes(filename);
or ...
var bytes = File.ReadAllBytes(filename);
Not to repeat what everyone has already said, but keep the following cheat sheet handy for file manipulation:
System.IO.File.ReadAllBytes(filename);
File.Exists(filename);
Path.Combine(folderName, restOfThePath);
Path.GetFullPath(path); // converts a relative path to absolute one
Path.GetExtension(path);
All these answers with .ReadAllBytes(). Another, similar (I won't say duplicate, since they were trying to refactor their code) question was asked on SO here: Best way to read a large file into a byte array in C#?
A comment was made on one of the posts regarding .ReadAllBytes():
File.ReadAllBytes throws OutOfMemoryException with big files (tested with 630 MB file
and it failed) – juanjo.arana Mar 13 '13 at 1:31
A better approach, to me, would be something like this, with BinaryReader:
public static byte[] FileToByteArray(string fileName)
{
byte[] fileData = null;
using (FileStream fs = File.OpenRead(fileName))
{
var binaryReader = new BinaryReader(fs);
fileData = binaryReader.ReadBytes((int)fs.Length);
}
return fileData;
}
But that's just me...
Of course, this all assumes you have the memory to handle the byte[] once it is read in, and I didn't put in the File.Exists check to ensure the file is there before proceeding, as you'd do that before calling this code.
It looks good enough as a generic version. You can modify it to meet your needs, if they're specific enough.
Also test for exceptions and error conditions, such as the file not existing or not being readable.
You can also do the following to save some space:
byte[] bytes = System.IO.File.ReadAllBytes(filename);
Others have noted that you can use the built-in File.ReadAllBytes. The built-in method is fine, but it's worth noting that the code you post above is fragile for two reasons:
Stream is IDisposable - you should place the FileStream fs = new FileStream(filename, FileMode.Open,FileAccess.Read) initialization in a using clause to ensure the file is closed. Failure to do this may mean that the stream remains open if a failure occurs, which will mean the file remains locked - and that can cause other problems later on.
fs.Read may read fewer bytes than you request. In general, the .Read method of a Stream instance will read at least one byte, but not necessarily all bytes you ask for. You'll need to write a loop that retries reading until all bytes are read. This page explains this in more detail.
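Both points can be addressed with a using block and a read loop. The following is a sketch of how the question's StreamFile method might look with those changes, not a replacement for File.ReadAllBytes; the FileReader class name is only there to make the example self-contained.

```csharp
using System;
using System.IO;

public static class FileReader
{
    public static byte[] StreamFile(string filename)
    {
        // using guarantees the stream is closed even if a read throws.
        using (FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read))
        {
            byte[] data = new byte[fs.Length];
            int offset = 0;

            // Read may return fewer bytes than requested, so loop until the buffer is full.
            while (offset < data.Length)
            {
                int read = fs.Read(data, offset, data.Length - offset);
                if (read == 0)
                {
                    // The file ended earlier than its reported length.
                    throw new EndOfStreamException("File ended before all bytes were read.");
                }
                offset += read;
            }

            return data;
        }
    }
}
```

As with the original, this allocates the whole file in memory, so it is only suitable for files that comfortably fit in a single array.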
string filePath = @"D:\MiUnidad\testFile.pdf";
byte[] bytes = await System.IO.File.ReadAllBytesAsync(filePath);
