Convert docx stream to pdf directly - c#

I need to convert any file that comes from a web response into PDF format. Currently I'm getting a Word .docx file from the URL and saving it into a memory stream so I can later insert it into its designated library.
The problem is that I'm saving my .docx files as .pdf just by changing the extension, which obviously means the file won't open later. So I'm trying to convert my memory stream to PDF directly.
Here is the code I tried for converting the stream to PDF, but the file doesn't seem to be converted correctly:
private Stream DownloadFromUrl(string url)
{
var webRequest = WebRequest.Create(url);
webRequest.Credentials = CredentialCache.DefaultNetworkCredentials;
webRequest.PreAuthenticate = true;
webRequest.UseDefaultCredentials = true;
//EventLogUtility.LogInformationMessage(DocumentURL);
string message = string.Empty;
using (Stream outputStream = new MemoryStream())
{
using (var response = webRequest.GetResponse())
{
using (var content = response.GetResponseStream())
{
var memory = new MemoryStream();
content.CopyTo(memory);
Document doc = new Document(memory);
doc.Save(memory, SaveFormat.Pdf);
return memory;
}
}
}
}

If the content in the stream really is in the Microsoft Word file format (and not just plain text), renaming it won't help: the Word document has to be converted into a PDF by something that understands both formats. Word itself has a 'Print to PDF' (or 'Save as PDF') function, so you could look into that, or use a conversion library.
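Two details in the question's code would likely break the conversion even with a converter in place, assuming Document and SaveFormat are the Aspose.Words types: after content.CopyTo(memory) the stream's Position is at the end, so it has to be rewound before it is loaded, and the PDF is saved back into the same stream that still holds the .docx. A minimal sketch of the stream handling follows; the helper name is illustrative, and the Aspose.Words calls are left as comments because which library is in use is an assumption.

```csharp
using System;
using System.IO;

public static class DocxStreams
{
    // Buffers a (possibly non-seekable) response stream into memory and
    // rewinds it, so whatever reads it next starts at byte 0.
    public static MemoryStream BufferAndRewind(Stream source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        var memory = new MemoryStream();
        source.CopyTo(memory);
        memory.Position = 0; // CopyTo leaves Position at the end of the data
        return memory;
    }

    // Assumed usage with Aspose.Words (not verified against the asker's setup):
    //
    // using (var content = response.GetResponseStream())
    // using (var docx = DocxStreams.BufferAndRewind(content))
    // {
    //     var doc = new Aspose.Words.Document(docx); // read the .docx
    //     var pdf = new MemoryStream();              // separate output stream
    //     doc.Save(pdf, Aspose.Words.SaveFormat.Pdf);
    //     pdf.Position = 0;                          // rewind before returning
    //     return pdf;
    // }
}
```

The caller then owns the returned PDF stream, so it should not be wrapped in a using block inside DownloadFromUrl.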

Related

xlsx excel cannot open the file because the file format or file extension is not valid - c#

I am trying to create an Excel file from the response of an SSRS reporting server. On the front end I can download the Excel file successfully, but when I open it I get this error:
xlsx excel cannot open the file because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file.
Here is the code:
HttpWebRequest req =
(HttpWebRequest)WebRequest.Create(sTargetURL);
req.PreAuthenticate = true;
req.Credentials = new System.Net.NetworkCredential(strReportUser, strReportUserPW, strReportUserDomain);
HttpWebResponse HttpWResp = (HttpWebResponse)req.GetResponse();
Stream fStream = HttpWResp.GetResponseStream();
//Now turn around and send this as the response..
byte[] fileBytes = ReadFully(fStream);
string fileToAttach = Convert.ToBase64String(fileBytes);
HttpWResp.Close();
Stream stream = new MemoryStream(fileBytes);
result.Content = fileToAttach;
result.ContentType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
result.FileName = fileName + ".xlsx";
result.result = true;
return result;
Any help would be appreciated.
Assuming result is some kind of response object that eventually gets sent to the browser, and that fStream holds the actual Excel file contents, the problem is that you are sending the Base64-encoded version of the file as the content, which Excel is not going to understand:
Stream fStream = HttpWResp.GetResponseStream();
byte[] fileBytes = ReadFully(fStream);
string fileToAttach = Convert.ToBase64String(fileBytes);
result.Content = fileToAttach;
Since I can't see what result is, all I can say is that you need to leave the file contents intact.
If you want to send the raw bytes, you could do:
Stream fStream = HttpWResp.GetResponseStream();
byte[] fileBytes = ReadFully(fStream);
result.Content = fileBytes;
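ReadFully, used in the snippets above, isn't shown in the question. Assuming it simply buffers the whole response into a byte array, a minimal version could look like this (the name is the asker's; the body is a guess):

```csharp
using System;
using System.IO;

public static class StreamUtil
{
    // Reads everything that remains in the stream into a byte array.
    // CopyTo is used because network streams do not support Length.
    public static byte[] ReadFully(Stream input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        using (var buffer = new MemoryStream())
        {
            input.CopyTo(buffer);
            return buffer.ToArray();
        }
    }
}
```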
If result can handle a Stream, that would be the ideal way (then you don't have to convert back and forth between a stream and byte data).
If result.Content has to be a string, don't push the bytes through a text encoding such as Encoding.UTF8.GetString (or UTF-16, for that matter): an .xlsx file is a ZIP archive, and arbitrary binary data does not survive a round trip through text decoding. Base64 is the lossless way to carry bytes in a string, so keep it, but then whatever consumes result.Content must decode it back to bytes before writing the file, e.g.:
byte[] fileBytes = Convert.FromBase64String(result.Content);
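Whether a given string encoding preserves binary data can be checked with a round trip. A minimal sketch comparing Base64 with UTF-8 (the class and method names are illustrative):

```csharp
using System;
using System.Linq;
using System.Text;

public static class RoundTrip
{
    // True if the bytes survive bytes -> Base64 string -> bytes.
    public static bool SurvivesBase64(byte[] data) =>
        Convert.FromBase64String(Convert.ToBase64String(data)).SequenceEqual(data);

    // True if the bytes survive bytes -> UTF-8 string -> bytes.
    // Invalid UTF-8 sequences are replaced with U+FFFD while decoding,
    // so arbitrary binary data generally does not come back intact.
    public static bool SurvivesUtf8(byte[] data) =>
        Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(data)).SequenceEqual(data);
}
```

For the bytes { 0x50, 0x4B, 0x03, 0x04, 0xFF, 0xFE } (a ZIP signature followed by two bytes that are not valid UTF-8), SurvivesBase64 returns true and SurvivesUtf8 returns false.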

Converting StreamReader back to pdf

I am trying to convert a PDF stream back into a PDF file and then save it on my server.
The function GetHTTPRequest takes a PDF URL and returns the response stream as a string.
I need to convert this into a PDF file.
My Code:
public ActionResult Html(string strUrl)
{
string xhr;
xhr = GetHTTPRequest(strUrl, "GET");
// make pdf file from xhr
return View("Index");
}
[ChildActionOnly]
public static string GetHTTPRequest(string RequestUrl, string RequestMethod)
{
try
{
HttpWebRequest r = (HttpWebRequest)WebRequest.Create(RequestUrl);
r.Method = RequestMethod;
HttpWebResponse res = (HttpWebResponse)r.GetResponse();
Stream sr = res.GetResponseStream();
StreamReader sre = new StreamReader(sr);
string s = sre.ReadToEnd();
return s;
}
catch (Exception)
{
return string.Empty;
}
}
This is just the first part of what xhr contains (the rest is similar binary data):
"%PDF-1.3\n%�쏢\n5 0 obj\n<>\nstream\nx��\k���V �DB��rq��\r}��_�4U[��\nTҮU�~�ِBI h����>gl����n�N���Y{.�~�33�;�h..."
Can someone please shed some light on this and help me solve it?
I've never used the stream library before.
Thanks.
PDF is a binary file format, so don't read it into a string; that doesn't make sense. What you are doing is equivalent to opening a PDF file in Notepad. Save the response bytes directly to a file instead.
This is how you would download the file:
string resourceUrl = "http://www.google.com/file.pdf";
string fileName = "downloadedFile.pdf";
using (WebClient myWebClient = new WebClient())
{
myWebClient.DownloadFile(resourceUrl, fileName);
}
If you need to read the content of file.pdf as text, that's a different matter: you will need a PDF parser. iTextSharp is not a bad choice to start with.
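WebClient is an older API; with HttpClient the same idea (copy the raw bytes, never decode them as text) looks roughly like this. A minimal sketch, assuming you already have an HttpResponseMessage, for example from client.GetAsync(url); the class and method names are illustrative, not from the original answer:

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class PdfDownload
{
    // Writes the body of an HTTP response to a file as raw bytes.
    // No StreamReader is involved, so the binary PDF data is never decoded as text.
    public static async Task SaveBodyToFileAsync(HttpResponseMessage response, string path)
    {
        response.EnsureSuccessStatusCode(); // throws for 4xx/5xx responses
        using (Stream body = await response.Content.ReadAsStreamAsync())
        using (FileStream file = File.Create(path))
        {
            await body.CopyToAsync(file);
        }
    }
}
```

In the controller this would replace the string-returning GetHTTPRequest: get the response with await client.GetAsync(strUrl), then pass it to SaveBodyToFileAsync together with the target path on the server.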

Download a PDF from a third party using ASP.NET HttpWebRequest/HttpWebResponse

I want to pass a URL as a query string, e.g.
localhost/abc.aspx?url=http://www.site.com/report.pdf
and detect whether that URL returns a PDF file. If it does, the file should be saved automatically; otherwise an error should be returned.
Some pages use a handler to fetch the files, so in that case I also want to detect and download the file.
localhost/abc.aspx?url=http://www.site.com/page.aspx?fileId=223344
The above may return a PDF file.
What is the best way to capture this?
Thanks
You can download a PDF like this:
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
using (HttpWebResponse response = (HttpWebResponse)req.GetResponse())
{
// check the file type returned, ignoring parameters such as "; charset=..."
string fileType = null;
string contentType = response.ContentType;
if (!string.IsNullOrEmpty(contentType))
{
fileType = contentType.Split(';')[0].Trim();
}
// see if it's a PDF
if (string.Equals(fileType, "application/pdf", StringComparison.OrdinalIgnoreCase))
{
// save it; a network stream has no Length, so copy it instead of pre-sizing a buffer
using (Stream stream = response.GetResponseStream())
using (FileStream fileStream = File.Create(fileFullPath))
{
stream.CopyTo(fileStream);
}
}
}
It doesn't matter that the file may come from an .aspx handler; what counts is the MIME type returned in the server's Content-Type header.
If you get a generic MIME type such as application/octet-stream, you have to use a more heuristic approach.
Assuming you cannot simply use the file extension (e.g. for .aspx), you can copy the file to a MemoryStream first (see How to get a MemoryStream from a Stream in .NET?). Once you have a memory stream of the file, you can take a 'cheeky' peek at it (I say cheeky because it's not the correct way to parse a PDF file).
I'm not an expert on the PDF format, but I believe reading the first 5 characters with an ASCII reader will yield "%PDF-", so you can identify it with:
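bool isPDF;
// CopyTo leaves the position at the end, so start reading from the beginning
memoryStream.Position = 0;
// leaveOpen: true, otherwise disposing the reader also disposes memoryStream
using (StreamReader srAsciiFromStream = new StreamReader(memoryStream,
System.Text.Encoding.ASCII, false, 1024, true))
{
string firstLine = srAsciiFromStream.ReadLine();
isPDF = firstLine != null && firstLine.StartsWith("%PDF-");
}
//set the memory stream back to the start so you can save the file
memoryStream.Position = 0;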
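The same check can be done on the raw bytes without a StreamReader, which avoids decoding binary data as text at all. A minimal sketch (the class and method names are illustrative); like the reader-based version it is only a header check, not a PDF parser, and some viewers tolerate a few bytes before the header, which this sketch would reject:

```csharp
using System;
using System.IO;

public static class PdfSniffer
{
    private static readonly byte[] Signature = { 0x25, 0x50, 0x44, 0x46, 0x2D }; // "%PDF-"

    // Checks whether a seekable stream starts with "%PDF-", then rewinds it
    // to the start so the stream can still be saved afterwards.
    public static bool StartsWithPdfSignature(Stream stream)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek) throw new ArgumentException("Stream must be seekable.", nameof(stream));

        try
        {
            stream.Position = 0;
            var header = new byte[Signature.Length];
            int total = 0;
            while (total < header.Length)
            {
                int read = stream.Read(header, total, header.Length - total);
                if (read == 0) return false; // shorter than the signature
                total += read;
            }
            for (int i = 0; i < Signature.Length; i++)
            {
                if (header[i] != Signature[i]) return false;
            }
            return true;
        }
        finally
        {
            stream.Position = 0;
        }
    }
}
```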

PDF generated is blank

private static void DownloadFile()
{
FtpWebRequest reqFTP;
WebResponse webResponse;
GetTheResponseFromFTP(out reqFTP, out webResponse, true);
FtpWebResponse response = (FtpWebResponse)webResponse;
Stream responseStream = response.GetResponseStream();
StreamReader reader = new StreamReader(responseStream);
using (StreamWriter streamWriter =
new StreamWriter("d:\\TestUnity.pdf", true))
{
streamWriter.WriteLine(reader.ReadToEnd());
}
reader.Close();
response.Close();
}
I have the above function, which downloads a file from an FTP location.
I read the content as text and try to write it to a file on my local machine.
The generated PDF file is the same size as the downloaded one, but when I open it, it's blank. I have two questions:
Can anyone suggest how to save the downloaded file to a path that can be changed?
What's the reason for the problem described above?
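From the documentation:
StreamWriter implements a TextWriter for writing characters to a stream
This means you haven't created a PDF file but a text file with a .pdf extension: StreamReader decodes the PDF's bytes as characters and StreamWriter encodes them again, which corrupts the binary parts of the file.
There are multiple utilities available for creating PDFs; wkhtmltopdf and iTextSharp are just two.
Here is some very simple code that works for me; it copies the raw bytes without any text conversion: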
void GeneratePDF(WebResponse response)
{
using (var responseStream = response.GetResponseStream())
using (var streamFile = File.Create("E:/JSS.pdf"))
{
responseStream.CopyTo(streamFile);
}
}
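The question also asks how to save to a path that can be changed. A minimal sketch that takes the target path as a parameter (the class and method names are hypothetical); in the FTP code it would replace the StreamReader/StreamWriter pair, e.g. DownloadSaver.SaveToPath(response.GetResponseStream(), targetPath), where targetPath could come from a config setting:

```csharp
using System;
using System.IO;

public static class DownloadSaver
{
    // Copies a response stream to a caller-chosen path as raw bytes.
    // Missing directories in the path are created first.
    public static void SaveToPath(Stream responseStream, string path)
    {
        if (responseStream == null) throw new ArgumentNullException(nameof(responseStream));
        if (string.IsNullOrWhiteSpace(path)) throw new ArgumentException("A file path is required.", nameof(path));

        string directory = Path.GetDirectoryName(Path.GetFullPath(path));
        if (!string.IsNullOrEmpty(directory))
        {
            Directory.CreateDirectory(directory); // no-op if it already exists
        }

        using (FileStream file = File.Create(path))
        {
            responseStream.CopyTo(file);
        }
    }
}
```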

put generated pdf file without saving it on the server

I have code (in an .ashx file) that generates a PDF file from a PDF template. The generated PDF gets personalized with a name and a code. I use iTextSharp to do so.
This is the code:
using (var existingFileStream = new FileStream(fileNameExisting, FileMode.Open))
using (var newFileStream = new FileStream(fileNameNew, FileMode.Create))
{
var pdfReader = new PdfReader(existingFileStream);
var stamper = new PdfStamper(pdfReader, newFileStream);
var form = stamper.AcroFields;
var fieldKeys = form.Fields.Keys;
form.SetField("Name", name);
form.SetField("Code", code);
stamper.FormFlattening = true;
stamper.Close();
pdfReader.Close();
}
context.Response.AppendHeader("content-disposition", "inline; filename=zenith_coupon.pdf");
context.Response.TransmitFile(fileNameNew);
context.Response.ContentType = "application/pdf";
This works, but it saves the file on the server. I don't want that, because a lot of people are going to download the PDF file and the server will fill up in no time.
So my question is: how can I generate a PDF with iTextSharp without saving it, and send it to the user?
Instead of using a FileStream you could use a MemoryStream and then write its contents to the response. Response.Write() writes text, so for binary PDF data use Response.BinaryWrite(memoryStream.ToArray()) or copy the stream to Response.OutputStream.
You can use any Stream (for example a MemoryStream) for the intermediate PDF (currently named newFileStream in your code) if you don't want to save it as a file. For sample code see http://www.developerfusion.com/code/6623/dynamically-generating-pdfs-in-net/ and http://forums.asp.net/t/1093198.aspx/1.
Just remember to rewind the MemoryStream (i.e. set Position = 0) before transmitting it to the client, for example with CopyTo(Response.OutputStream).
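One catch with a MemoryStream here: in iTextSharp 5, PdfStamper.Close() also closes the output stream by default, so rewinding it afterwards fails. MemoryStream.ToArray() still works on a closed MemoryStream, which makes "render into a buffer, then send the bytes" a simple pattern. A minimal sketch (the helper name is hypothetical; the iTextSharp and handler calls in the comment are an assumed adaptation of the question's code, not tested here):

```csharp
using System;
using System.IO;

public static class PdfBuffer
{
    // Runs a callback that writes a PDF into an in-memory stream and returns
    // the bytes. MemoryStream.ToArray() still works after the stream has been
    // closed, which matters if the writer closes its output stream when done.
    public static byte[] RenderToBytes(Action<Stream> writePdf)
    {
        if (writePdf == null) throw new ArgumentNullException(nameof(writePdf));
        using (var buffer = new MemoryStream())
        {
            writePdf(buffer);
            return buffer.ToArray();
        }
    }
}

// Assumed usage in the .ashx handler (iTextSharp 5, not verified here):
//
// byte[] pdf = PdfBuffer.RenderToBytes(output =>
// {
//     var pdfReader = new PdfReader(fileNameExisting);
//     var stamper = new PdfStamper(pdfReader, output);
//     stamper.AcroFields.SetField("Name", name);
//     stamper.AcroFields.SetField("Code", code);
//     stamper.FormFlattening = true;
//     stamper.Close();   // also closes `output` by default
//     pdfReader.Close();
// });
// context.Response.ContentType = "application/pdf";
// context.Response.AppendHeader("content-disposition", "inline; filename=zenith_coupon.pdf");
// context.Response.BinaryWrite(pdf);
```

Nothing is written to the server's disk; the only copy of the personalized PDF lives in memory for the duration of the request.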
