Opening a PDF file as a raw text document

I have a PDF document that gets encrypted. During encryption, a code is appended to the end of the file stream before it is written to a file.
The PDF is later decrypted, and its contents can be viewed in any PDF viewer.
The problem is that the embedded code is then also present in the decrypted PDF, and it needs to be removed.
I want to decrypt the PDF document, remove the embedded code, and save the result to a file.
//Reading the PDF
Encoding enc = Encoding.GetEncoding("us-ascii");
while ((read = cs.Read(buffer, 0, buffer.Length)) > 0)
{
System.Text.Encoding.UTF8.GetString(buffer);
x = x + enc.GetString(buffer);
}
//Remove the code
x = x.Replace("CODE","");
//Write file
byte[] bytes = enc.GetBytes(x);
File.WriteAllBytes(filePath, bytes);
The original file appears to have been written with a different encoding: its first line reads %PDF-1.6%âãÏÓ, while the first line of the decoded file reads %PDF-1.6 %????.
I have tried ascii, us-ascii, UTF8 and Unicode, but after the embedded CODE is removed the file no longer opens because it is corrupted. Note that the embedded code sits in the raw file after the PDF %%EOF marker.
Does anyone have any ideas?
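Since the code sits after the %%EOF marker, one way to avoid the encoding problem altogether is to never turn the bytes into a string. The following is a minimal sketch, not a definitive fix: the class and method names are hypothetical, and it assumes that the embedded code does not itself contain the byte sequence %%EOF and that the reader tolerates a file ending directly at the marker.

```csharp
using System;
using System.IO;
using System.Text;

static class PdfTrailerTrimmer
{
    // ASCII bytes of the PDF end-of-file marker.
    private static readonly byte[] EofMarker = Encoding.ASCII.GetBytes("%%EOF");

    // Returns the index of the last occurrence of pattern in data, or -1 if absent.
    public static int LastIndexOf(byte[] data, byte[] pattern)
    {
        for (int i = data.Length - pattern.Length; i >= 0; i--)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return i;
        }
        return -1;
    }

    // Returns a copy of data cut off right after the last %%EOF marker.
    // If no marker is found, the input array is returned unchanged.
    public static byte[] TrimAfterEof(byte[] data)
    {
        int index = LastIndexOf(data, EofMarker);
        if (index < 0) return data;

        int length = index + EofMarker.Length;
        byte[] trimmed = new byte[length];
        Array.Copy(data, trimmed, length);
        return trimmed;
    }
}
```

With the decrypted bytes collected in a byte array (for example by copying the decrypting stream into a MemoryStream), the result could be written with File.WriteAllBytes(filePath, PdfTrailerTrimmer.TrimAfterEof(decryptedBytes)).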

Related

Exporting a Microsoft report to PDF doesn't show Chinese characters

I've got the problem that I get no Chinese characters when exporting a Microsoft Report to a PDF file.
byte[] mybytes = report.Render("pdf");
using (FileStream fs = File.Create(@"D:\output.pdf"))
{
fs.Write(mybytes, 0, mybytes.Length);
}
If I export the same report to a Word file it works fine.
byte[] myWordbytes = report.Render("word");
using (FileStream fs = File.Create(@"D:\output.doc"))
{
fs.Write(myWordbytes, 0, myWordbytes.Length);
}
When converting that Word file to PDF, I also get the Chinese characters in the converted PDF file.
I don't want to do this workaround. How can I solve this?
The required fonts seem to be embedded into the PDF.

C# generated PDF files doesn't open in PDF readers. Error shows damaged or corrupt file

I am trying to save imageData as a PDF file in a directory on the server. The HTML5 canvas imageData is sent to the server, converted to a byte array, and saved as a PDF file. The file is created at the specified path, but it does not open in most PDF readers (e.g. Adobe Reader, Foxit Reader), which report that the file is damaged or corrupt. It does open correctly in the MS Edge browser. I want it to open in common PDF readers too. Can you suggest a solution? Here is my server-side code.
public static string SaveImage(string imageData, string userEmail, int quantity)
{
string completePath = @"~\user-images\";
string imageName = "sample_file2.pdf";
string fileNameWitPath = completePath + imageName;
byte[] bytes = Convert.FromBase64String(imageData);
File.WriteAllBytes(HttpContext.Current.Server.MapPath(fileNameWitPath), bytes);
}
The following code generates the same output:
FileStream fs = new FileStream(HttpContext.Current.Server.MapPath(fileNameWitPath), FileMode.OpenOrCreate);
fs.Write(bytes, 0, bytes.Length);
fs.Close();
And so does this:
using (FileStream fs = new FileStream(HttpContext.Current.Server.MapPath(fileNameWitPath), FileMode.Create))
{
using (BinaryWriter bw = new BinaryWriter(fs))
{
byte[] data = Convert.FromBase64String(imageData);
bw.Write(data);
bw.Close();
}
}

Saving a raster image file (such as a PNG or JPG) with a .pdf file extension does not make it a PDF file; it remains an image file with a different extension. It probably opens in some browsers because they detect the file format from the content rather than from the extension alone.
To generate an actual PDF file you need to convert the image. Consider one of the following libraries:
iTextSharp: https://sourceforge.net/projects/itextsharp/ (AGPL, a free software license that permits use provided your own application's source is released under the same terms; otherwise you need to purchase a commercial license)
PDFsharp: http://www.pdfsharp.net/ (free, MIT license)
ABCpdf.NET: http://www.websupergoo.com/products.htm#abcpdf (proprietary)
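Before choosing a library, it can help to confirm what the decoded bytes actually are. The sketch below is illustrative only: the class and method names are hypothetical, and it checks just the well-known leading signatures of PDF, PNG and JPEG files.

```csharp
using System;

static class FileSignature
{
    // Returns "pdf", "png", "jpeg" or "unknown" based on the leading bytes of data.
    public static string Detect(byte[] data)
    {
        // "%PDF-"
        if (StartsWith(data, 0x25, 0x50, 0x44, 0x46, 0x2D)) return "pdf";
        // Eight-byte PNG signature
        if (StartsWith(data, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) return "png";
        // JPEG start-of-image marker followed by the next marker prefix
        if (StartsWith(data, 0xFF, 0xD8, 0xFF)) return "jpeg";
        return "unknown";
    }

    // True if data begins with exactly the given signature bytes.
    private static bool StartsWith(byte[] data, params byte[] signature)
    {
        if (data == null || data.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }
}
```

Calling FileSignature.Detect(Convert.FromBase64String(imageData)) on canvas data would be expected to report an image format rather than "pdf", which is consistent with the readers rejecting the file.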

How to convert byte array of word doc into byte array of pdf document in C#

I have a sequence of bytes (in Word format) exported from Crystal Reports as follows:
Stream stream = cryRpt.ExportToStream(CrystalDecisions.Shared.ExportFormatType.WordForWindows);
I would like to convert that stream to a PDF stream to be stored in a varbinary column in a SQL Server database.
There is a barcode in the document, and I am unable to embed it when I export to PDF. Exporting to Word works, though, so I want to try going from Word to PDF.
Can anyone help me get this PDF byte[]?

You can do that using the Microsoft.Office.Interop.Word NuGet package. Once you have added it to your application, you can write your byte array to a temporary file, open the temporary file with Interop.Word, save it as a PDF, and read the result back into a byte array.
Here's a sample code snippet that does just that:
// Requires: using System.IO;
// Requires: using Word = Microsoft.Office.Interop.Word;
// byte[] fileBytes = getFileBytesFromDB();
var tmpFile = Path.GetTempFileName();
File.WriteAllBytes(tmpFile, fileBytes);
var app = new Word.Application();
// Open the temporary file that was just written
Word.Document doc = app.Documents.Open(tmpFile);
// Save the Word document as a PDF
var pdfPath = "path-to-pdf-file.pdf";
doc.SaveAs2(pdfPath, Word.WdSaveFormat.wdFormatPDF);
doc.Close();
app.Quit(); // VERY IMPORTANT: do this to close the MS Word instance
byte[] pdfFileBytes = File.ReadAllBytes(pdfPath);
File.Delete(tmpFile);

Crystal reports 2008 sp5 generates a corrupt word document

Since the latest Crystal Reports update, I have had a problem generating Word files.
The Word document is corrupt, and I cannot find a solution or a similar problem on the web.
I can generate a PDF document without any problem.
Word reports the document as corrupt, but I can open it with WordPad.
The resulting .doc file is corrupted, yet my code raises no error:
try
{
t = CrystalDecisions.Shared.ExportFormatType.WordForWindows;
content_type = "application/msword";
var oStream = MonReport.ExportToStream(t);
byte[] byteArray = new byte[oStream.Length];
oStream.Read(byteArray, 0, Convert.ToInt32(oStream.Length - 1));
parent.Response.ClearContent();
parent.Response.ClearHeaders();
parent.Response.ContentType = content_type;
parent.Response.BinaryWrite(byteArray);
parent.Response.Flush();
parent.Response.Close();
MonReport.Close();
MonReport.Dispose();

Your length is 1 too short. That should be:
oStream.Read(byteArray, 0, Convert.ToInt32(oStream.Length));
                                           ^^^^^^^^^^^^^^
Stream.Read's third parameter is "The maximum number of bytes to be read from the current stream."
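Because that parameter is only a maximum, a single Read call may also return fewer bytes than requested. A minimal sketch of a helper that sidesteps both the length arithmetic and partial reads is shown below; the class and method names are hypothetical, and Stream.CopyTo assumes .NET Framework 4.0 or later.

```csharp
using System.IO;

static class StreamHelper
{
    // Copies everything from the current position of source into a new byte array.
    public static byte[] ReadAllBytes(Stream source)
    {
        using (var buffer = new MemoryStream())
        {
            // CopyTo keeps reading until the source stream reports end of stream.
            source.CopyTo(buffer);
            return buffer.ToArray();
        }
    }
}
```

In the question's code this would replace the manual allocation and Read call, for example byte[] byteArray = StreamHelper.ReadAllBytes(oStream);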

What is wrong with my encoding, when reading characters from PDF?

I'm reading a PDF file with C#, but the text I get back seems to be in a different encoding: the characters differ from those I see when I view the file in a PDF viewer.
I thought UTF-8 would be the correct encoding.
What am I doing wrong?
string file = @"c:\document.pdf";
Stream stream = File.Open(file, FileMode.Open);
BinaryReader binaryReady = new BinaryReader(stream);
byte[] buffer = binaryReady.ReadBytes(Convert.ToInt32(stream.Length));
var encoder = UTF8Encoding.UTF8.GetString(buffer);

PDF is a complex, multi-part file format; it is not just UTF-8 text.
If you want to read a PDF file yourself, you must study the full PDF file format specification and implement the large and complex details of how the format works.
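To see why decoding the raw bytes as text is unsafe, the sketch below checks whether a decode and re-encode round trip reproduces the original bytes. It is only an illustration: the class and method names are hypothetical, and the sample bytes are an assumed PDF header followed by four bytes above 0x7F, like the binary marker many PDF files carry on their first lines.

```csharp
using System;
using System.Linq;
using System.Text;

static class EncodingRoundTrip
{
    // True if decoding data and encoding it again with the same encoding
    // reproduces the original bytes exactly.
    public static bool IsLossless(byte[] data, Encoding encoding)
    {
        byte[] roundTripped = encoding.GetBytes(encoding.GetString(data));
        return data.SequenceEqual(roundTripped);
    }

    // "%PDF-1.6%" followed by the bytes E2 E3 CF D3 (assumed sample data).
    public static readonly byte[] SampleHeader =
    {
        0x25, 0x50, 0x44, 0x46, 0x2D, 0x31, 0x2E, 0x36, 0x25,
        0xE2, 0xE3, 0xCF, 0xD3
    };
}
```

For this sample, EncodingRoundTrip.IsLossless(EncodingRoundTrip.SampleHeader, Encoding.UTF8) is false, because the high bytes are not valid UTF-8 and get replaced during decoding, whereas the same call with Encoding.GetEncoding("iso-8859-1") is true. A lossless round trip only preserves the bytes; it does not make the PDF content meaningful as text.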
