We have a WinForms client app that consumes a web service we wrote. The client app requests documents that are delivered as XML files, generally a PDF written to a base64-encoded binary field in the XML file.
The client successfully downloads, decodes, and opens 99% of the documents.
However, we've started encountering some files that fail when the client makes this call:
byte[] buffer = Convert.FromBase64String(xNode["fileIMAGE"].InnerText);
System.FormatException
  Message="Invalid character in a Base-64 string."
  Source="mscorlib"
We've written out the base64 blob from the XML file to a text file. I don't see any "\0" characters. I could post the whole blob, but it's quite large.
Any ideas?
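(A diagnostic sketch that would have narrowed this down: scan the blob for anything Convert.FromBase64String rejects, i.e. characters outside the base64 alphabet or '=' padding before the end of the string. xNode is the same node as in the failing call.)

string blob = xNode["fileIMAGE"].InnerText;
string body = blob.TrimEnd('=', '\r', '\n', ' ', '\t');
for (int i = 0; i < body.Length; i++)
{
    char c = body[i];
    bool valid = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
              || (c >= '0' && c <= '9') || c == '+' || c == '/'
              || char.IsWhiteSpace(c); // FromBase64String ignores whitespace
    if (!valid)
    {
        Console.WriteLine("Bad character '{0}' (0x{1:X4}) at index {2}", c, (int)c, i);
    }
}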
Issue Resolved
To stream the file from the server, we use a callback function that reads/writes chunks of the file. We were base64-encoding each chunk individually. WRONG. Unless a chunk's length happens to be a multiple of 3 bytes, its encoding ends with its own '=' padding, so the concatenated result contains invalid characters partway through the string.
Resolution: write all the chunks to a global MemoryStream object. When the callbacks finish, base64-encode the whole stream in one pass.
In the callback function:
if (brData.ChunkNo == 1)
{
    // First chunk: set the Content-Type of the file...
    if (brData.MimeType.Length < 1)
    {
        mimeType = "application/unknown";
    }
    else
    {
        mimeType = brData.MimeType;
    }

    // ...and start a fresh stream to accumulate the raw, un-encoded chunks.
    msbase64Out = new MemoryStream();
}

if (brData.bytesJustRead > 0)
{
    // fileMS holds the chunk that was just read; append its raw bytes.
    fileMS.WriteTo(msbase64Out);
}

if (brData.bytesRemaining < 1)
{
    // Last chunk: encode the complete file in a single pass.
    byte[] imgBytes = msbase64Out.ToArray();
    string img64 = Convert.ToBase64String(imgBytes);
    viewdocWriter.WriteString(img64);
}
msbase64Out is a global MemoryStream that gets written to each time the callback fires.
viewdocWriter is a global XML writer that is responsible for writing out the XML stream sent to the client app.
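To illustrate why the per-chunk encoding produced an invalid string (the 5-byte chunk size below is made up for the example):

// Two hypothetical 5-byte chunks: each encoding carries its own '=' padding,
// so the concatenation has padding mid-string and FromBase64String rejects it.
byte[] data = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
string perChunk = Convert.ToBase64String(data, 0, 5)
                + Convert.ToBase64String(data, 5, 5);
string whole = Convert.ToBase64String(data);
Console.WriteLine(perChunk); // AQIDBAU=BgcICQo=  <- '=' in the middle
Console.WriteLine(whole);    // AQIDBAUGBwgJCg==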
Related
I have an API server method that accepts files as base64. It gets requests like this:
{
file: "-BASE 64 HERE-"
}
I want my server to take this file and store it on Azure Storage. So I run this code:
var blob = container.GetBlockBlobReference("file.zip");
var buffer = Convert.FromBase64String(Model.File);
await blob.UploadFromByteArrayAsync(buffer, 0, buffer.Length);
It works, but it's inefficient.
Why? Because the same bytes exist twice in my main memory: once as a byte array and once as a stream.
I wonder if it's possible to upload the base64 as text and let the server understand that this base64 should be treated as a file.
That way I would upload the text directly, without converting it to a stream.
Is this possible?
May be linked to this thread
Thanks.
Can't you just use UploadFromStreamAsync?
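A minimal sketch of that idea, assuming the WindowsAzure.Storage client (CloudBlockBlob) from the question; the method name is illustrative. Wrapping the base64 text in a CryptoStream with FromBase64Transform decodes it lazily while UploadFromStreamAsync reads, so the decoded bytes never exist as a second full array in memory:

using System.IO;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Blob;

public static async Task UploadBase64Async(CloudBlockBlob blob, string base64)
{
    byte[] textBytes = Encoding.ASCII.GetBytes(base64); // base64 is plain ASCII
    using (var text = new MemoryStream(textBytes))
    using (var decoded = new CryptoStream(text, new FromBase64Transform(), CryptoStreamMode.Read))
    {
        // The blob client pulls from the stream; decoding happens per read.
        await blob.UploadFromStreamAsync(decoded);
    }
}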
Here is my sample code:
Code Snippet 1: This code executes on my file repository server and returns the file as an encoded string via the WCF service:
byte[] fileBytes = new byte[0];
using (FileStream stream = System.IO.File.OpenRead(@"D:\PDFFiles\Sample1.pdf"))
{
    fileBytes = new byte[stream.Length];
    stream.Read(fileBytes, 0, fileBytes.Length);
    stream.Close();
}
string retVal = System.Text.Encoding.Default.GetString(fileBytes); // fileBytes size is 209050
Code Snippet 2:
The client box, which requested the PDF file, receives the encoded string, converts it back to a PDF, and saves it locally.
byte[] encodedBytes = System.Text.Encoding.Default.GetBytes(retVal); // GETTING corrupted here
string pdfPath = @"C:\DemoPDF\Sample2.pdf";
using (FileStream fileStream = new FileStream(pdfPath, FileMode.Create)) // encodedBytes is 327279
{
    fileStream.Write(encodedBytes, 0, encodedBytes.Length);
    fileStream.Close();
}
The above code works absolutely fine on Framework 4.5 and 4.6.1.
When I use the same code in ASP.NET Core 2.0, it fails to convert to a byte array properly. I don't get any runtime error, but the final PDF can't be opened after it is created; the viewer reports that the PDF file is corrupted.
I also tried Encoding.Unicode and Encoding.UTF8, but I get the same error for the final PDF.
I have also noticed that with Encoding.Unicode, at least the original byte array and the resulting byte array are the same size; with the other encodings even the byte counts don't match.
So, the question is: is System.Text.Encoding.Default.GetBytes broken in .NET Core 2.0?
I have edited my question for better understanding.
Sample1.pdf lives on a different server; we communicate over WCF to transmit the data to the client, which stores the encoded stream and converts it to Sample2.pdf.
Hopefully my question makes some sense now.
1: the number of times you should ever use Encoding.Default is essentially zero; there may be a hypothetical case, but if there is one: it is elusive
2: PDF files are not text, so trying to use an Encoding on them is just... wrong; you aren't "GETTING corrupted here" - it just isn't text.
You may wish to see Extracting text from PDFs in C# or Reading text from PDF in .NET
If you simply wish to copy the content without parsing it: File.Copy or Stream.CopyTo are good options.
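If the WCF contract really must carry the file as a string, a hedged sketch of the usual fix is base64, which exists precisely to carry binary data through text channels (paths reused from the question):

// Server side: read raw bytes and base64-encode them; base64 survives any
// string handling because it uses only ASCII characters.
byte[] fileBytes = System.IO.File.ReadAllBytes(@"D:\PDFFiles\Sample1.pdf");
string retVal = Convert.ToBase64String(fileBytes);

// Client side: decode back to the identical bytes and write the PDF.
byte[] decodedBytes = Convert.FromBase64String(retVal);
System.IO.File.WriteAllBytes(@"C:\DemoPDF\Sample2.pdf", decodedBytes);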
I have a PDF document which gets encrypted. During encryption, a code is embedded at the end of the file stream before it is written to a file.
This PDF is later decrypted and the details are view-able in any PDF viewer.
The issue is that the embedded code is then also visible in the decrypted PDF, and it needs removing.
I'm looking to decrypt the PDF document, remove the embedded code, then save it to a file.
// Reading the PDF
Encoding enc = Encoding.GetEncoding("us-ascii");
while ((read = cs.Read(buffer, 0, buffer.Length)) > 0)
{
    // Only decode the bytes actually read on this pass.
    x = x + enc.GetString(buffer, 0, read);
}
// Remove the code
x = x.Replace("CODE", "");
// Write file
byte[] bytes = enc.GetBytes(x);
File.WriteAllBytes(filePath, bytes);
When the original file is generated, it appears to use a different encoding: the first line of the original file reads %PDF-1.6%âãÏÓ, while the decoded file reads %PDF-1.6 %????.
I have tried ascii, us-ascii, UTF8, and Unicode, but upon removal of the embedded CODE the file stopped opening due to corruption. Note that the embedded code sits in the raw file after the PDF %%EOF tag.
Has anyone any ideas?
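One byte-level approach, sketched under the assumption (stated in the question) that the embedded code sits after the final %%EOF tag: skip the string round-trip entirely, since any Encoding round-trip can mangle bytes a PDF needs, and truncate the raw bytes instead. decryptedPath and outputPath are illustrative names.

// Sketch: remove everything after the last "%%EOF" without ever
// treating the PDF as text.
byte[] data = File.ReadAllBytes(decryptedPath);
byte[] marker = Encoding.ASCII.GetBytes("%%EOF");

// Find the last occurrence of the marker.
int end = -1;
for (int i = data.Length - marker.Length; i >= 0; i--)
{
    bool match = true;
    for (int j = 0; j < marker.Length; j++)
    {
        if (data[i + j] != marker[j]) { match = false; break; }
    }
    if (match) { end = i + marker.Length; break; }
}

if (end >= 0)
{
    using (var fs = new FileStream(outputPath, FileMode.Create))
    {
        fs.Write(data, 0, end); // keep everything up to and including %%EOF
    }
}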
I am uploading an .mp3 file via FTP code using C#. The file uploads to the server successfully, but when I bind it to a simple audio control or view it directly in the browser, it does not work as expected, whereas when I upload it to the server manually it works perfectly.
Code:
var inputStream = FileUpload1.PostedFile.InputStream;
byte[] fileBytes = new byte[inputStream.Length];
inputStream.Read(fileBytes, 0, fileBytes.Length);
Note: when I view the file in Firefox, it says the MIME type is not supported.
Thanks!
You're reading the file as a string then using UTF8 encoding to turn it into bytes. If you do that, and the file contains any binary sequence that doesn't code to a valid UTF8 value, parts of the data stream will simply get discarded.
Instead, read it directly as bytes. Don't bother with the StreamReader. Call the Read() method on the underlying stream. Example:
var inputStream = FileUpload1.PostedFile.InputStream;
byte[] fileBytes = new byte[inputStream.Length];
inputStream.Read(fileBytes, 0, fileBytes.Length);
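One caveat on the snippet above: Stream.Read isn't guaranteed to fill the buffer in a single call. A defensive variant (same FileUpload1 control assumed) copies through a MemoryStream instead:

// CopyTo loops internally, so short reads can't silently truncate the file.
using (var ms = new MemoryStream())
{
    FileUpload1.PostedFile.InputStream.CopyTo(ms);
    byte[] fileBytes = ms.ToArray();
    // Upload fileBytes over FTP in binary mode, never as text.
}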
I have a simple function to create a gzip file. The function works fine and passes the unit tests. Then I hosted the generated file on Amazon S3.
But it produces some invalid characters when the input value contains Unicode characters.
e.g. アームバンド & ケース > 9ÎvøS‰
public static void CompressStringToFile(string fileName, string value)
{
    // Use GZipStream to write compressed bytes to the target file.
    using (FileStream f2 = new FileStream(fileName, FileMode.Create))
    using (GZipStream gz = new GZipStream(f2, CompressionMode.Compress, false))
    {
        byte[] b = Encoding.Unicode.GetBytes(value);
        gz.Write(b, 0, b.Length);
        gz.Flush();
    }
}
The output of GZip compression isn't meant to be text. It's effectively arbitrary binary content, which you should only use to decompress it to the original binary content... which in your case is UTF-16-encoded text. You shouldn't expect to be able to read the gzip file as a text file.
GZip itself doesn't interpret the (binary) data that it's given - it just compresses it, so it can be faithfully decompressed later on. GZip couldn't care less whether it's text, an image, a sound file, whatever: it just does the best it can to compress it.
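To make the round trip concrete, here is a decompression sketch that mirrors the CompressStringToFile method above: inflate with GZipStream, then decode with the same Encoding.Unicode. Reading the .gz file any other way will show "garbage", by design.

public static string DecompressFileToString(string fileName)
{
    // Reverse of CompressStringToFile: decompress first,
    // then decode the raw bytes as UTF-16 (Encoding.Unicode).
    using (FileStream f = new FileStream(fileName, FileMode.Open))
    using (GZipStream gz = new GZipStream(f, CompressionMode.Decompress))
    using (MemoryStream ms = new MemoryStream())
    {
        gz.CopyTo(ms);
        return Encoding.Unicode.GetString(ms.ToArray());
    }
}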