PDF content inside XML response file - c#

I receive an XML file that includes PDF content:
<pdf>
<pdfContent>JVBERi0xLjQKJaqrrK0KNCAwIG9iago8PCAvV.......
How can I save the content to a PDF file?
I'm using C# 4.0

That string value is the PDF in base64. If you convert the base64 to a byte array with Convert.FromBase64String, you can just write that byte array to disk.
var buffer = Convert.FromBase64String(xmlStringValue);
File.WriteAllBytes(yourFileName, buffer);
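Putting the two steps together, a minimal sketch might look like the following. The element names pdf and pdfContent come from the question; the sample payload, the helper name ExtractPdfBytes, and the output path are assumptions for illustration. It uses top-level statements, so it assumes a recent .NET SDK rather than C# 4.0.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: finds the first <pdfContent> element and decodes its base64 text.
static byte[] ExtractPdfBytes(string xml)
{
    XDocument doc = XDocument.Parse(xml);
    var element = doc.Descendants()
        .FirstOrDefault(e => e.Name.LocalName == "pdfContent");
    if (element == null)
        throw new InvalidOperationException("No <pdfContent> element found.");
    // Convert.FromBase64String ignores white space such as line breaks inside the value.
    return Convert.FromBase64String(element.Value.Trim());
}

// Assumed sample payload: the base64 of the ASCII text "%PDF-1.4", not a complete PDF.
string sampleXml = "<pdf><pdfContent>JVBERi0xLjQ=</pdfContent></pdf>";
byte[] pdfBytes = ExtractPdfBytes(sampleXml);

// Assumed output location; write the decoded bytes unchanged.
string outputPath = Path.Combine(Path.GetTempPath(), "response.pdf");
File.WriteAllBytes(outputPath, pdfBytes);
Console.WriteLine("Wrote " + pdfBytes.Length + " bytes to " + outputPath);
```

With a real response, the XML string would come from the service and the element value would hold the whole document rather than just the header.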

It looks like the pdf content is encoded in base64. You will have to decode it and save it to a file.
Edit: indeed, when I use base64 to encode a pdf file, the first few characters are JVBERi0x...
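That observation can be checked in code: a PDF starts with the ASCII marker %PDF-, so decoding the first eight base64 characters should give text beginning with that marker. A small sketch follows; the helper name LooksLikeBase64Pdf is hypothetical, and the PNG string is only there for contrast.

```csharp
using System;
using System.Text;

// Hypothetical helper: decodes the first 8 base64 characters (6 bytes) and
// checks them against the "%PDF-" marker.
static bool LooksLikeBase64Pdf(string base64)
{
    if (base64 == null || base64.Length < 8) return false;
    try
    {
        byte[] head = Convert.FromBase64String(base64.Substring(0, 8));
        return Encoding.ASCII.GetString(head).StartsWith("%PDF-", StringComparison.Ordinal);
    }
    catch (FormatException)
    {
        return false; // the prefix was not valid base64
    }
}

bool pdfPrefix = LooksLikeBase64Pdf("JVBERi0xLjQKJaqrrK0K"); // prefix quoted in the question
bool pngPrefix = LooksLikeBase64Pdf("iVBORw0KGgoAAAANSUhE"); // start of a base64 PNG, for contrast
Console.WriteLine(pdfPrefix);
Console.WriteLine(pngPrefix);
```

This only inspects the header; it does not prove that the rest of the string is a valid PDF.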

It seems to be encoded with Base64, but I'm not sure. If it is, you can take that long string and convert it with Convert.FromBase64String. You will get a byte[] that you can save as the actual PDF.

Related

Convert Base64 String to an Image File and Save it to File System?

I am trying to make a .NET 6 console application that takes a base64 string and saves it to the file system as an actual image file.
Example
I have this image
https://cdn.pixabay.com/photo/2016/03/28/12/35/cat-1285634_960_720.png
I would have this image already as a base64 string.
Now I want to save to my file system as "cat-1285634_960_720.png"
I just can't figure out how to do it. All the examples I see say to use Image.Save(), but I can't find that in .NET 6 and it looks like it has been removed.
First convert the base64 string to a byte array and then use File.WriteAllBytes(...) to save it:
byte[] imageByteArray = Convert.FromBase64String(base64String);
File.WriteAllBytes("image.png", imageByteArray);

Signed Mails (not encrypted) with smime.p7m

I am trying to extract one or more PDF files from a signed mail. I simply tried to load the smime.p7m with
mimeMessage = MimeMessage.Load(mem);
// mem is a MemoryStream over the file created with File.WriteAllBytes(file, fileAttachment.Content); (EWS FileAttachment)
This does not work, because the file begins with:
0€ *†H†÷
 €0€10 + 0€ *†H†÷
 €$€‚
&Content-Type: multipart/mixed;
boundary="----=_NextPart_000_0024_01D432F9.7988F010"
So I removed everything before Content-Type (not all of it is visible here) with IndexOf and Substring, and now I can load it into a MimeMessage. Next I tried to decode the Base64 string. If I use the DecodeTo method, the file size is nearly the same but the file is damaged. Comparing the raw data of the original PDF as decoded by Outlook with my decoded one, they are nearly identical, but they differ in the last 10% (the original has more line breaks).
So I tried to use
Convert.FromBase64String()
But I always get an invalid base64 exception.
The PDF part with its header begins with:
Content-Type: application/pdf;
name="DE_Windows 7_WebDAV.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="WebDAV.pdf"
‚ JVBERi0xLjUNCiW1tbW1DQoxIDAgb...
(Before and after the "," there are characters that are not visible here; I removed them too.) If I paste the base64 text (copied as text with the Windows editor) into an online decoder, it decodes fine; if I upload the file containing the base64, it fails.
Also, inside the base64 there are some non-base64 characters: an "unknown" symbol, a ",", and an "up arrow" symbol. I think these break the decoding. The base64 is too long to post here (see picture).
But this is exactly what File.WriteAllBytes(file, fileAttachment.Content); and/or fileAttachment.Load(file); saves.
Can you help me, please? And where do these unknown characters come from?
OK, I got it ... 2 days of my life wasted on this ***
Before saving a signed attachment, you must run this code to "unsign" it, and all the unwanted characters are gone =)
byte[] content = fileAttachment.Content;
var signed = new SignedCms();
signed.Decode(content);
byte[] unsigned = signed.ContentInfo.Content;

Saving a byte array to PDF file with OfficeJs

Using OfficeJs I want to save a Word document as a PDF and post that file to an Api.
Office.context.document.getFileAsync will let you get the entire document in a choice of 3 formats:
compressed: returns the entire document (.pptx or .docx) in Office Open XML (OOXML) format as a byte array
pdf: returns the entire document in PDF format as a byte array
text: returns only the text of the document as a string. (Word only)
I am posting the PDF byte array to a WebApi action that looks like this:
public async Task<IHttpActionResult> Upload([FromBody]byte[] bytes)
{
File.WriteAllBytes(@"C:\temp\testpdf.pdf", bytes);
return Ok();
}
On inspection the byte array is the same array created by the getFileAsync from Office Js.
The problem is the file written in File.WriteAllBytes is corrupt. If I open it with notepad, it is a string of the bytes - 37,80,68,70,45,49,46,53,13,10,37... and so on.
Any idea why the method WriteAllBytes does not create a PDF file from the OfficeJS pdf byte stream?
UPDATE 25/5/16
As hawkeye @StefanHegny pointed out, the byte array appears to be ASCII characters. Converting each byte to a char and writing that out like this creates a blank PDF. On inspection with Notepad, the contents do look like a PDF document, though quite different from what I get when saving the same .docx as a .pdf.
var content = "";
foreach (var b in model.Bytes)
{
content += (char) b;
}
File.WriteAllText(@"C:\temp\testpdf.pdf", content);
Also note, this is extremely slow - about 5 minutes for 500kb PDF byte array on my dev machine.
I had the same empty PDF problem. It happened because I was converting the bytes to a string and writing the string to a file (an encoding problem). I solved it by sending the comma-separated byte codes to the C# code instead of converting to a string, then parsing the bytes and using File.WriteAllBytes().
C# code:
string[] strings = HttpUtility.HtmlDecode(pdf).Split(',');
byte[] bytes = strings.Select(s => byte.Parse(s)).ToArray();
System.IO.File.WriteAllBytes("filename.pdf", bytes);
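To see why the values have to be parsed rather than written out as text, here is a small sketch that uses the first byte values quoted in the question. The output path is an assumption, and the input is only the header, not a full document.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;

// The first byte values quoted in the question, as they appear in the corrupt file.
string csv = "37,80,68,70,45,49,46,53,13,10";

// Parse each decimal value back into a single byte.
byte[] bytes = csv.Split(',')
    .Select(s => byte.Parse(s.Trim(), CultureInfo.InvariantCulture))
    .ToArray();

// The first eight parsed bytes are the ASCII PDF header; CR LF follows.
string header = Encoding.ASCII.GetString(bytes, 0, 8);
Console.WriteLine(header);

// Write the raw bytes, not text, so that no character encoding is applied.
string path = Path.Combine(Path.GetTempPath(), "testpdf-header.bin"); // assumed location
File.WriteAllBytes(path, bytes);
```

The ten values become ten bytes on disk, whereas writing the comma-separated string itself would store the digits and commas as characters.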

Retrieve byte array from c# webservice into Qt

I use a web service written in C# that exposes the byte array of an audio file (MP3) stored in a database (using Entity Framework).
When I retrieve it in c# and save it into file using File.WriteAllBytes() I can listen to audio (file's size is 10kB).
I need to do the same with Qt. I parse the xml and save audio byte array to QByteArray like this:
QByteArray bytes = readValue().toUtf8();
where readValue() is QStringRef and then I save it to file
qint64 bytesWritten = file.write(bytes);
The file is then 14kB, so I suppose there is some format problem, but I'm not sure where.
I solved this problem. In the web service I return the byte array converted to a hex string
string hex = BitConverter.ToString(ba);
string hexString = hex.Replace("-", "");
then in Qt I parse data
QByteArray bytes = QByteArray::fromHex(readValue().toString().toLatin1());
and save it to file
qint64 bytesWritten = file.write(bytes);
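The hex approach can be checked on the C# side with a round trip. This is only a sketch: the sample bytes are made up, and Convert.FromHexString stands in for QByteArray::fromHex and assumes .NET 5 or later.

```csharp
using System;
using System.Linq;

// Assumed sample data standing in for the audio bytes.
byte[] original = { 0x49, 0x44, 0x33, 0x04, 0x00 };

// Server side, as in the answer: "49-44-33-04-00" becomes "4944330400".
string hexString = BitConverter.ToString(original).Replace("-", "");

// Client side counterpart of QByteArray::fromHex (requires .NET 5 or later).
byte[] decoded = Convert.FromHexString(hexString);

Console.WriteLine(hexString);
Console.WriteLine(original.SequenceEqual(decoded));
```

Note that a hex string is twice as long as the data it carries, so this trades payload size for simplicity.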

What is wrong with my encoding, when reading characters from PDF?

I'm reading a PDF file with C#, but the characters are coming from another encoding, and returning different characters than those which I expected from when I view the file in a PDF viewer.
I thought a UTF-8 encoding would be correct.
What am I doing wrong?
string file = @"c:\document.pdf";
Stream stream = File.Open(file, FileMode.Open);
BinaryReader binaryReady = new BinaryReader(stream);
byte[] buffer = binaryReady.ReadBytes(Convert.ToInt32(stream.Length));
var encoder = UTF8Encoding.UTF8.GetString(buffer);
PDF is a very complex multi-part file, it is not just UTF8 text.
If you want to read a PDF file, you must read over the full PDF File Format Documentation and fully implement the large and complex details of how the file format works.
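One way to see this is to decode the start of a PDF as UTF-8 and check whether the bytes survive. The sketch below reuses the base64 prefix quoted in the first question above as sample data (an assumption; any PDF with binary content should behave similarly). It only shows that the raw bytes are not UTF-8 text; it does not extract any page text.

```csharp
using System;
using System.Linq;
using System.Text;

// First 15 bytes of a PDF, taken from the base64 prefix quoted in the first question.
byte[] head = Convert.FromBase64String("JVBERi0xLjQKJaqrrK0K");

// The version header is plain ASCII and decodes cleanly.
string header = Encoding.ASCII.GetString(head, 0, 8);
Console.WriteLine(header);

// The comment line that follows holds raw binary bytes (0xAA 0xAB 0xAC 0xAD).
// They are not valid UTF-8, so decoding substitutes U+FFFD and the data is lost.
string asText = Encoding.UTF8.GetString(head);
bool lossy = asText.Contains('\uFFFD');
bool roundTrips = Encoding.UTF8.GetBytes(asText).SequenceEqual(head);
Console.WriteLine("lossy: " + lossy + ", round-trips: " + roundTrips);
```

Extracting the visible text would need a PDF library or a parser for the format; that is beyond this sketch.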
