I need to serve an AES-encrypted, base64-encoded file from PHP to a C# client (Mono, on various platforms). I've got the AES encryption/decryption working fine, but as soon as I attempt the base64 encoding/decoding I run into trouble. Both examples below have AES disabled, so it shouldn't be a factor.
My simplest test case, a Hello World string, works fine:
PHP serving output-
// Save encoded data to file
$data = base64_encode("Hello encryption world!!");
$file = fopen($targetPath, 'w');
fwrite($file, $data);
fclose($file);
// Later on, serve the file
header("Pragma: public");
header("Expires: 0");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
header("Cache-Control: private",false);
header("Content-Type: application/octet-stream");
header("Content-Disposition: attachment; filename=".basename($product->PackageFilename($packageId)));
header("Content-Transfer-Encoding: binary");
header("Content-Length: ".filesize($targetPath));
ob_clean();
flush();
$handle = fopen($targetPath, "r");
fpassthru($handle);
fclose($handle);
C# decoding and using-
StreamReader reader = new StreamReader(stream);
char[] buffer = DecodeBuffer;
string decoded = "";
int read = 0;
while (0 < (read = reader.Read(buffer, 0, DecodeBufferSize)))
{
byte[] decodedBytes = Convert.FromBase64CharArray(buffer, 0, read);
decoded += System.Text.Encoding.UTF8.GetString(decodedBytes);
}
Log(decoded); // Correctly logs "Hello encryption world!!"
However, once I start trying to do the same thing with the contents of a file, a FormatException ("Invalid character found") is thrown by Convert.FromBase64CharArray:
PHP serving output-
// Save encoded data to file
$data = base64_encode(file_get_contents($targetPath));
$file = fopen($targetPath, 'w');
fwrite($file, $data);
fclose($file);
// Later on, serve the file
// Same as above
C# decoding and using-
using (Stream file = File.Open(zipPath, FileMode.Create))
{
using (StreamReader reader = new StreamReader(stream))
{
char[] buffer = DecodeBuffer;
byte[] decodedBytes;
int read = 0;
while (0 < (read = reader.Read(buffer, 0, DecodeBufferSize)))
{
// Throws FormatException: Invalid character found
decodedBytes = Convert.FromBase64CharArray(buffer, 0, read);
file.Write(decodedBytes, 0, decodedBytes.Length);
}
}
}
Is there some kind of additional processing that should be done on larger data for the base64 to be valid? Or is it just not appropriate to do this with large binary data, and if so, how else would you prevent potential problems with characters unsafe for transmission?
Your code for reading the Base64 text is not correct.
Base64 is text, so consider using a text reader instead.
Base64 may contain newlines and whitespace; it is customary to split a Base64-encoded value into lines of 70-80 characters.
To verify that the data in the file is correct, read the whole file as a single string (StreamReader.ReadToEnd) and convert it to a byte array (Convert.FromBase64String), as in the sketch below.
If the file contains valid Base64 data but you can't read it as a single string, you should implement your own Base64 decoding, or manually read the correct number of non-whitespace characters (a multiple of 4) and decode such chunks.
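For the verification step, a minimal sketch, reusing the stream and zipPath names from the question (note this holds the entire payload in memory, so it is a diagnostic, not a streaming solution):
using (var reader = new StreamReader(stream))
{
    string base64 = reader.ReadToEnd();
    // Convert.FromBase64String ignores white-space, so wrapped
    // 70-80 character lines are handled for free.
    byte[] decodedBytes = Convert.FromBase64String(base64);
    File.WriteAllBytes(zipPath, decodedBytes);
}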
Base64 encoding converts 3 octets into 4 encoded characters, so the length of each chunk of data you hand to the decoder needs to be a multiple of 4.
First, ensure that DecodeBufferSize is such a multiple of 4. Next, since StreamReader.Read does not guarantee that all the requested characters will be read, you should continue reading into the buffer until either it has been filled or the end of the stream has been reached.
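A minimal sketch of that loop, reusing stream, file, DecodeBuffer and DecodeBufferSize from the question, and assuming DecodeBufferSize is a multiple of 4 and the Base64 text contains no whitespace:
using (var reader = new StreamReader(stream))
{
    char[] buffer = DecodeBuffer;
    int filled;
    do
    {
        // StreamReader.Read may return fewer characters than requested,
        // so keep topping up the buffer until it is full or the stream ends.
        filled = 0;
        int read;
        while (filled < DecodeBufferSize
            && (read = reader.Read(buffer, filled, DecodeBufferSize - filled)) > 0)
        {
            filled += read;
        }
        if (filled > 0)
        {
            // Every full buffer is a multiple of 4 characters; the final,
            // shorter chunk holds the rest of the Base64 text, which is
            // itself a valid multiple of 4.
            byte[] decodedBytes = Convert.FromBase64CharArray(buffer, 0, filled);
            file.Write(decodedBytes, 0, decodedBytes.Length);
        }
    } while (filled == DecodeBufferSize);
}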
I am trying to get the compressed ZIP file back in JavaScript. I am able to convert the zip file into Base64 string format. (The zip file is on the server.)
Here is my try (at Server Side)
System.IO.FileStream fs = new System.IO.FileStream(SourceFilePath + "Arc.zip", System.IO.FileMode.Open);
Byte[] zipAsBytes = new Byte[fs.Length];
fs.Read(zipAsBytes, 0, zipAsBytes.Length);
String base64String = System.Convert.ToBase64String(zipAsBytes, 0, zipAsBytes.Length);
fs.Close();
if (zipAsBytes.Length > 0)
{
_response.Status = "ZipFile";
_response.Result = base64String;
}
return _json.Serialize(_response);
This part of the code returns the JSON data, which includes the Base64 string. Now what I want to do is get the original zip file back from the Base64 string. I searched the internet but did not find a clear approach.
Is this achievable?
It is achievable. First you must convert the Base64 string to an Arraybuffer. Can be done with this function:
function base64ToBuffer(str){
str = window.atob(str); // decodes Base64 into a binary string (one char code per byte)
var buffer = new ArrayBuffer(str.length),
view = new Uint8Array(buffer);
for(var i = 0; i < str.length; i++){
view[i] = str.charCodeAt(i);
}
return buffer;
}
Then, using a library like JSZip, you can convert the ArrayBuffer to a Zip file and read its contents:
var buffer = base64ToBuffer(str);
var zip = new JSZip(buffer);
var fileContent = zip.file("someFileInZip.txt").asText();
JavaScript does not have that functionality built in.
Theoretically there could be some JS library that does this, but its size would probably be bigger than the original text file itself.
You can also enable gzip compression on your server, so that any text output gets compressed. Most browsers will then decompress the data when it arrives.
Forgive the lengthy setup here but I thought it may help to have the context...
I am implementing a custom digital signature validation method as part of a WCF service. We're using a custom method because of various differing interpretations of some industry standards, but the details there aren't all that relevant.
In this particular scenario, I am receiving an MTOM/XOP encoded request where the root MIME part contains a digital signature and the signature DigestValue and SignatureValue pieces are split up into separate MIME parts.
The MIME parts that contain the signature DigestValue and SignatureValue data are binary encoded, so they are literally a bunch of raw bytes in the web request, like this:
Content-Id: <c18605af-18ec-4fcb-bec7-e3767ef6fe53#example.jaxws.sun.com>
Content-Type: application/octet-stream
Content-Transfer-Encoding: binary
[non-printable-binary-data-goes-here]
--uuid:eda4d7f2-4647-4632-8ecb-5ba44f1a076d
I am reading the contents of the message in as a string (using the default UTF8 encoding) like this (see the requestAsString parameter below):
MessageBuffer buffer = request.CreateBufferedCopy(int.MaxValue);
try
{
using (MemoryStream mstream = new MemoryStream())
{
buffer.WriteMessage(mstream);
mstream.Position = 0;
using (StreamReader sr = new StreamReader(mstream))
{
requestAsString = sr.ReadToEnd();
}
request = buffer.CreateMessage();
}
}
finally
{
    buffer.Close();
}
After I read the MTOM/XOP message in, I attempt to re-organize the multiple MIME parts into one SOAP message where the signature DigestValue and SignatureValue elements are restored to the original SOAP envelope (rather than left as attachments). So basically I am decoding the MTOM/XOP request.
Unfortunately, I am having trouble reading the DigestValue and SignatureValue pieces correctly. I need to read the bytes out of the message and get the base64 string representation of that data.
Despite all the context above, it seems the core problem is reading the binary data in as a string (UTF8 encoded) and then converting it to a proper base64 representation.
Here is what I am seeing in my test code:
This is my example base64 string:
string base64String = "mowXMw68eLSv9J1W7f43MvNgCrc=";
I can then get the byte representation of that string. This yields an array of 20 bytes:
byte[] base64Bytes = Convert.FromBase64String(base64String);
I then get the UTF8 encoded version of those bytes:
string decodedString = UTF8Encoding.UTF8.GetString(base64Bytes);
Now the strange part... if I convert the string back to bytes as follows, I get an array of bytes that is 39 bytes long:
byte[] base64BytesBack = UTF8Encoding.UTF8.GetBytes(decodedString);
So obviously at this point, when I convert back into a base64 string, it doesn't match the original value:
string base64StringBack = Convert.ToBase64String(base64BytesBack);
base64StringBack is set to "77+977+9FzMO77+9eO+/ve+/ve+/vVbvv73vv703Mu+/vWAK77+9"
What am I doing wrong here? If I switch to using UTF8Encoding.Unicode.GetString() and UTF8Encoding.Unicode.GetBytes(), it works as expected:
string base64String = "mowXMw68eLSv9J1W7f43MvNgCrc=";
// First get an array of bytes from the base64 string
byte[] base64Bytes = Convert.FromBase64String(base64String);
// Get the Unicode representation of the base64 bytes.
string decodedString = UTF8Encoding.Unicode.GetString(base64Bytes);
byte[] base64BytesBack = UTF8Encoding.Unicode.GetBytes(decodedString);
string base64StringBack = Convert.ToBase64String(base64BytesBack);
Now base64StringBack is set to "mowXMw68eLSv9J1W7f43MvNgCrc=" so it seems I am mis-using the UTF8 encoding somehow or it is behaving differently than I would expect.
Arbitrary binary data cannot be decoded into a UTF-8 string and then encoded back to the same binary data. The section "Invalid byte sequences" in http://en.wikipedia.org/wiki/UTF-8 points that out: a decoder replaces each invalid sequence with the replacement character U+FFFD, which re-encodes as the three bytes EF BF BD. That is exactly why your 20 bytes come back as 39, and why base64StringBack starts with "77+9" (the Base64 encoding of EF BF BD).
I am a bit confused as to why you want the data encoded/decoded as UTF-8.
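A minimal sketch that reproduces the effect with the exact numbers from the question, and shows the safe path (keep binary data in a byte[]; Base64, not UTF-8, is the text representation for it):
using System;
using System.Text;

class Utf8RoundTripDemo
{
    static void Main()
    {
        byte[] original = Convert.FromBase64String("mowXMw68eLSv9J1W7f43MvNgCrc=");

        // Lossy: invalid UTF-8 sequences are replaced with U+FFFD (EF BF BD).
        string asText = Encoding.UTF8.GetString(original);
        byte[] roundTripped = Encoding.UTF8.GetBytes(asText);
        Console.WriteLine(original.Length);     // 20
        Console.WriteLine(roundTripped.Length); // 39 -- data was destroyed

        // Lossless: binary-to-text belongs to Base64, not to a character encoding.
        string base64 = Convert.ToBase64String(original);
        byte[] restored = Convert.FromBase64String(base64);
        Console.WriteLine(restored.Length);     // 20
    }
}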
Ok, I took a different approach to reading the MTOM/XOP message:
Instead of relying on my own code to parse the MIME parts by hand, I just used XmlDictionaryReader.CreateMtomReader() to get an XmlDictionaryReader and read the message into an XmlDocument (being careful to preserve whitespace on the XmlDocument so digital signatures aren't broken):
MessageBuffer buffer = request.CreateBufferedCopy(int.MaxValue);
messageContentType = WebOperationContext.Current.IncomingRequest.ContentType;
try
{
using (MemoryStream mstream = new MemoryStream())
{
buffer.WriteMessage(mstream);
mstream.Position = 0;
if (messageContentType.Contains("multipart/related;"))
{
Encoding[] encodings = new Encoding[1];
encodings[0] = Encoding.UTF8;
// MTOM
using (XmlDictionaryReader reader = XmlDictionaryReader.CreateMtomReader(mstream, encodings, messageContentType, XmlDictionaryReaderQuotas.Max))
{
XmlDocument msgDoc = new XmlDocument();
msgDoc.PreserveWhitespace = true;
msgDoc.Load(reader);
requestAsString = msgDoc.OuterXml;
reader.Close();
}
}
else
{
// Text
using (StreamReader sr = new StreamReader(mstream))
{
requestAsString = sr.ReadToEnd();
}
}
request = buffer.CreateMessage();
}
}
finally
{
buffer.Close();
}
I am trying to read an email via POP3 and switch to the correct encoding once I find the charset in the headers.
I use a TCP Client to connect to the POP3 server.
Below is my code :
public string ReadToEnd(POP3Client pop3client, out System.Text.Encoding messageEncoding)
{
messageEncoding = TCPStream.CurrentEncoding;
if (EOF)
return ("");
System.Text.StringBuilder sb = new System.Text.StringBuilder(m_bytetotal * 2);
string st = "";
string tmp;
do
{
tmp = TCPStream.ReadLine();
if (tmp == ".")
EOF = true;
else
sb.Append(tmp + "\r\n");
//st += tmp + "\r\n";
m_byteread += tmp.Length + 2; // CRLF discarded by read
FireReceived();
if (tmp.ToLower().Contains("content-type:") && tmp.ToLower().Contains("charset="))
{
try
{
string charSetFound = tmp.Substring(tmp.IndexOf("charset=") + "charset=".Length).Replace("\"", "").Replace(";", "");
var realEnc = System.Text.Encoding.GetEncoding(charSetFound);
if (realEnc != TCPStream.CurrentEncoding)
{
TCPStream = new StreamReader(pop3client.m_tcpClient.GetStream(), realEnc);
}
}
catch { }
}
} while (!EOF);
messageEncoding = TCPStream.CurrentEncoding;
return (sb.ToString());
}
If I remove this line:
TCPStream = new StreamReader(pop3client.m_tcpClient.GetStream(), realEnc);
Everything works fine, except that when the e-mail contains characters from a different charset I get question marks, since the initial encoding is ASCII.
Any suggestions on how to change the encoding while reading data from the Network Stream?
You're doing it wrong (tm).
Seriously, though, you are going about trying to solve this problem in completely the wrong way. Don't use a StreamReader for this. And especially don't read 1 byte at a time (as you said you needed to do in a comment on an earlier "solution").
For an explanation of why not to use a StreamReader, besides the obvious "because it isn't designed to switch between encodings during the process of reading", feel free to read over another answer I gave about the inefficiencies of using a StreamReader here: Reading an mbox file in C#
What you need to do is buffer your reads (a 4K buffer should be fine). Then, as you already have to do anyway, scan for the '\n' byte to extract content on a line-by-line basis, combining header lines that were folded.
Each header may have multiple encoded-word tokens which may each be in a separate charset, assuming they are properly encoded, otherwise you'll have to deal with undeclared 8-bit data and try to massage that into unicode somehow (probably by having a set of fallback charsets). I'd recommend trying UTF-8 first followed by a selection of charsets that the user of your library has provided before finally trying iso-8859-1 (make sure not to try iso-8859-1 until you've tried everything else, because any sequence of 8-bit text will convert properly to unicode using the iso-8859-1 character encoding).
When you get to text content of the message, you'll want to check the Content-Type header for a charset parameter. If no charset parameter is defined, it should be US-ASCII, but in practice it could be anything. Even if the charset is defined, it might not match the actual character encoding used in the text body of the message, so once again you'll probably want to have a set of fallbacks.
As you've probably guessed by this point, this is very clearly not a trivial task as it requires the parser to do on-the-fly character conversion as it goes (and the character conversion requires internal parser state about what the expected charset is at any given time).
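A minimal sketch of the buffered, byte-oriented line scanning described above (the names here are illustrative, not from any library); it hands back raw bytes per line so charset conversion can be deferred until the correct encoding is known:
using System.Collections.Generic;
using System.IO;

static class LineBuffer
{
    public static IEnumerable<byte[]> ReadLines(Stream stream)
    {
        byte[] buffer = new byte[4096];
        var line = new MemoryStream();
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            int start = 0;
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == (byte)'\n')
                {
                    // Complete the current line, including any carry-over
                    // from the previous chunk, and hand it back.
                    line.Write(buffer, start, i - start + 1);
                    yield return line.ToArray();
                    line.SetLength(0);
                    start = i + 1;
                }
            }
            // Keep the trailing partial line for the next chunk.
            line.Write(buffer, start, read - start);
        }
        if (line.Length > 0)
            yield return line.ToArray(); // final line without a terminator
    }
}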
Since I've already done the work, you should really consider using MimeKit which will parse the email and properly do charset conversion on the headers and the content using the appropriate charset encoding.
I've also written a Pop3Client class that is included in my MailKit library.
If your goal is to learn and write your own library, I'd still highly recommend reading over my code because it is highly efficient and does things in a proper way.
You can sometimes detect the encoding by looking at the Byte Order Mark (BOM), the first few bytes of the stream. However, the stream might not have a BOM, in which case it could be ASCII, UTF-8 or UTF-16 without a BOM, or something else.
You can convert your stream from one encoding to another with the Encoding Class:
Encoding textEncoding = Encoding.[your detected encoding here];
byte[] converted = Encoding.UTF8.GetBytes(textEncoding.GetString(rawBytes)); // rawBytes (hypothetical): the raw bytes you read from the stream
You may select your preferred encoding when converting.
Hope it answers your question.
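A minimal sketch of the BOM check itself, based on the well-known BOM byte sequences (a null result means no BOM was found and the caller must fall back to a default such as ASCII or UTF-8):
using System.Text;

static class BomSniffer
{
    // head: the first few bytes read from the stream.
    public static Encoding DetectBom(byte[] head)
    {
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return Encoding.UTF8;
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return Encoding.Unicode;          // UTF-16 little-endian
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16 big-endian
        return null;                          // no BOM detected
    }
}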
Edit:
You may use this code to read your stream in blocks.
MemoryStream st = new MemoryStream();
const int numOfBytes = 1024;
byte[] bytes = new byte[numOfBytes]; // reuse one buffer instead of allocating per iteration
int reads;
while ((reads = yourStream.Read(bytes, 0, numOfBytes)) > 0)
{
    st.Write(bytes, 0, reads); // Read already returns the count actually read
}
On my website I have an option to download all the images uploaded by users. The problem is with images that have Hebrew names (I need the original file name). I tried to decode the file names, but that did not help. Here is the code:
using ICSharpCode.SharpZipLib.Zip;
Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(file.Name);
byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
string name = iso.GetString(isoBytes);
var entry = new ZipEntry(name + ".jpg");
zipStream.PutNextEntry(entry);
using (var reader = new System.IO.FileStream(file.Name, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
byte[] buffer = new byte[ChunkSize];
int bytesRead;
while ((bytesRead = reader.Read(buffer, 0, buffer.Length)) > 0)
{
byte[] actual = new byte[bytesRead];
Buffer.BlockCopy(buffer, 0, actual, 0, bytesRead);
zipStream.Write(actual, 0, actual.Length);
}
}
After the UTF-8 conversion I get Hebrew file names like this: ??????.jpg
Where is my mistake?
Unicode (UTF-8 is one of its binary encodings) can represent more characters than the other 8-bit encodings. Moreover, you are not doing a proper conversion but a re-interpretation, which means that you get garbage for your filenames. You should really read the article from Joel on Unicode.
...
Now that you've read the article, you should know that in C# strings can store Unicode data, so you probably don't need to do any conversion of file.Name and can pass it directly to the ZipEntry constructor, provided the library does not contain encoding-handling bugs (which is always possible).
Try using
ZipStrings.UseUnicode = true;
It should be a part of the ICSharpCode.SharpZipLib.Zip namespace.
After that you can use something like
var newZipEntry = new ZipEntry($"My ünicödë string.pdf");
and add the entry as normal to the stream. You shouldn't need to do any conversion of the string before that in C#.
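Applied to the loop from the question, a minimal sketch (assuming a SharpZipLib version that has ZipStrings and ZipEntry.IsUnicodeText; zipStream and file are the names from the question):
ZipStrings.UseUnicode = true; // have SharpZipLib write UTF-8 names globally
var entry = new ZipEntry(file.Name + ".jpg")
{
    IsUnicodeText = true // or per entry: sets the zip Unicode name flag
};
zipStream.PutNextEntry(entry);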
You are doing the wrong conversion, since strings in C# are already Unicode.
What tools do you use to check the file names in the archive?
By default, Windows ZIP implementations use the system DOS (OEM) code page for file names, while other implementations may use other encodings.
I have raw base64Binary data.
string base64BinaryStr = "J9JbWFnZ......"
How can I make a PDF file from it? I know it needs some conversion. Please help me.
Step 1 is converting from your base64 string to a byte array:
byte[] bytes = Convert.FromBase64String(base64BinaryStr);
Step 2 is saving the byte array to disk:
using (var stream = new System.IO.FileStream(@"C:\file.pdf", FileMode.CreateNew))
using (var writer = new System.IO.BinaryWriter(stream))
{
    writer.Write(bytes, 0, bytes.Length);
}
using (System.IO.FileStream stream = System.IO.File.Create("c:\\temp\\file.pdf"))
{
System.Byte[] byteArray = System.Convert.FromBase64String(base64BinaryStr);
stream.Write(byteArray, 0, byteArray.Length);
}
First convert the Base64 string to a byte[], then write it to a file:
byte[] bytes = Convert.FromBase64String(base64BinaryStr);
File.WriteAllBytes(@"FolderPath\pdfFileName.pdf", bytes);
This code does not write any file on the hard drive.
Response.AddHeader("Content-Type", "application/pdf");
Response.AddHeader("Content-Length", base64Result.Length.ToString());
Response.AddHeader("Content-Disposition", "inline;");
Response.AddHeader("Cache-Control", "private, max-age=0, must-revalidate");
Response.AddHeader("Pragma", "public");
Response.BinaryWrite(Convert.FromBase64String(base64Result));
Note: the variable base64Result contains the Base64-String: "JVBERi0xLjMgCiXi48/TIAoxI..."
All you need to do is run it through any Base64 decoder, which will take your data as a string and pass back an array of bytes. Then simply write that file out with .pdf in the file name.
Or, if you are streaming this back to a browser, simply write the bytes to the output stream, setting the appropriate MIME type in the headers.
Most languages have built-in methods for converting to/from Base64, and a quick search for your specific language will turn up numerous implementations you can use. The process of going back and forth to Base64 is pretty straightforward and can be implemented by even novice developers.
base64BinaryStr comes from the web service SOAP message:
byte[] bytes = Convert.FromBase64String(base64BinaryStr);