PDF upload encoding issue

I'll get straight to the point: how can I upload PDF files from a C# backend to an HTTP web service in a multipart/form-data request without the contents being mangled to the point that the file becomes unreadable? The web service documentation only states that text files should be text/plain and image files should be binary; PDF files are only mentioned as "also supported", with no mention of what format or encoding they should be in.
The code I'm using to create the request:
HttpWebRequest request;
string boundary = "---------------------------" + DateTime.Now.Ticks.ToString("x");
request.ContentType = "multipart/form-data; boundary=" + boundary;
using (StreamWriter sw = new StreamWriter(request.GetRequestStream())) {
    sw.WriteLine("--" + boundary);
    sw.WriteLine("Content-Disposition: form-data; name=\"files\"; filename=\"" + Path.GetFileName(filePath) + "\"");
    sw.WriteLine(filePath.EndsWith(".pdf") ? "Content-Type: application/pdf" : "Content-Type: text/plain");
    sw.WriteLine();
    if (filePath.EndsWith(".pdf")) {
        // write PDF content into the request stream
    }
    else sw.WriteLine(File.ReadAllText(filePath));
    sw.Write("--" + boundary);
    sw.Write("--");
    sw.Flush();
}
For simple text files, this code works just fine. However, I have trouble uploading a PDF file.
Writing the file into the request body using StreamWriter.WriteLine with either File.ReadAllText or Encoding.UTF8.GetString(File.ReadAllBytes) results in the uploaded file being unreadable due to .NET having replaced all the non-UTF-8 bytes with squares (which somehow also increased file size by over 100 kB). Same result with UTF-7 and ANSI, but UTF-8 results in the closest match to the original file's contents.
Writing the file into the request body as binary data using either BinaryWriter or Stream.Write results in the web service rejecting it outright as invalid POST data. Content-Transfer-Encoding: binary (indicated by the documentation as necessary for application/http, hence why I tried) also causes rejection.
What alternative options are available? How can I encode PDF without .NET silently replacing the invalid bytes with placeholder characters? Note that I have no control over what kind of content the web service accepts; if I did, I'd already have moved on to base64.

Problem solved, my bad. The multipart form header and the binary data were both correct but were in the wrong order because I didn't Flush() the StreamWriter before writing the binary data into the request stream with Stream.CopyTo().
Moral of the story: if you're writing into the same Stream with more than one Writer at the same time, always Flush() before doing anything with the next Writer.
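The ordering fix described above can be sketched roughly as follows. This is only an illustration under assumptions: the class and method names (MultipartSketch, WriteFilePart) are made up for the example, and the target stream could be request.GetRequestStream() or, for testing, a MemoryStream. The key point is the Flush() call before the raw bytes are copied into the same underlying stream.

```csharp
using System;
using System.IO;
using System.Text;

public static class MultipartSketch
{
    // Hypothetical helper: writes one file part plus the closing boundary into 'body'.
    public static void WriteFilePart(Stream body, string boundary, string fileName,
                                     string contentType, Stream fileContent)
    {
        // UTF8Encoding(false) avoids emitting a byte-order mark;
        // leaveOpen keeps 'body' usable by the caller after the writer is disposed.
        using (var sw = new StreamWriter(body, new UTF8Encoding(false), 1024, leaveOpen: true))
        {
            sw.NewLine = "\r\n"; // multipart header lines end in CRLF
            sw.WriteLine("--" + boundary);
            sw.WriteLine("Content-Disposition: form-data; name=\"files\"; filename=\"" + fileName + "\"");
            sw.WriteLine("Content-Type: " + contentType);
            sw.WriteLine();

            // Push the buffered headers into 'body' BEFORE writing raw bytes to it.
            sw.Flush();
            fileContent.CopyTo(body); // raw bytes, no text decoding involved

            sw.WriteLine();
            sw.Write("--" + boundary + "--");
            sw.WriteLine();
        } // disposing the writer flushes the closing boundary
    }
}
```

Without the Flush(), the headers would still be sitting in the StreamWriter's buffer when CopyTo runs, so the file bytes would reach the stream ahead of them.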

Related

Rest Sharp consuming API that returns raw file

I am making a call to an API which returns a file. I am told that the API returns...
"The payload is the raw file data with http file type headers."
The API returns the following response in the response.content...
%PDF-1.4\n% ����\n4\n0\nobj\n<<\n/Type\n/Catalog\n/Names\n<<\n/JavaScript\n3\n0\nR\n>>\n/PageLabels\n<<\n/Nums\n[\n0\n<<\n/S\n/D\n/St\n1\n>>\n]\n>>\n/Outlines\n2\n0\nR\n/Pages\n1\n0\nR\n>>\nendobj\n5\n0\nobj\n<<\n/Creator\n(��\0G\0o\0o\0g\0l\0e)\n>>\nendobj\n6\n0\nobj\n<<\n/Type\n/Page\n/Parent\n1\n0\nR\n/MediaBox\n[\n0\n0\n720\n405\n]\n/Contents\n7\n0\nR\n/Resources\n8\n0\nR\n/Annots\n10\n0\nR\n/Group\n<<\n/S\n/Transparency\n/CS\n/DeviceRGB\n>>\n>>\nendobj\n7\n0\nobj\n<<\n/Filter\n/FlateDecode\n/Length\n9\n0\nR\n>>\nstream\nx��SMK\u00031\u0010\rx��!��4�L2�QQ)^��ڃ��
This is not the full response as it is too big to post.
What is the best way to now download that as a file?
So far I have tried the below...
string file = ViewModel.Candidate.Id + "_" + g.ToString() + extension;
byte[] array = Encoding.ASCII.GetBytes(response.Content);
System.IO.File.WriteAllBytes(@"C:\Temp\" + file, array);
but it just creates an empty file
-- UPDATE
I can save file but get the below message

If you need to download raw file bytes, use the DownloadData method:
byte[] fileContent = client.DownloadData(request);
//using RestSharp.Extensions;
fileContent.SaveAs(filepath);
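As a small illustration of why the string-based attempt in the question cannot work for binary payloads, the sketch below decodes some bytes as text and re-encodes them as ASCII, then compares the result with the original. The class and method names are invented for this example, and UTF-8 is only assumed as the decoding step; the exact encoding used to build response.Content may differ.

```csharp
using System;
using System.Linq;
using System.Text;

public static class RoundTripDemo
{
    // Returns true when decoding 'original' as text and re-encoding it
    // as ASCII yields exactly the same bytes.
    public static bool SurvivesTextRoundTrip(byte[] original)
    {
        string asText = Encoding.UTF8.GetString(original); // invalid sequences become U+FFFD
        byte[] back = Encoding.ASCII.GetBytes(asText);     // non-ASCII characters become '?'
        return original.SequenceEqual(back);
    }
}
```

Pure ASCII input such as the "%PDF-1.4" header survives this round trip, but bytes outside the ASCII range do not, which is why working with the raw byte array is the safer route.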

how can I download Google Drive file in compressed format in c#

I am trying to improve performance for my application by downloading google drive files in compressed format. I am using this as reference.
https://developers.google.com/drive/api/v2/performance#gzip
I tried various things in the headers of the HttpWebRequest I am sending, but couldn't get it to work. Can anyone please help me with this?
This is the code I am using.
public void DownloadFile(string url, string filename)
{
    try
    {
        var tstart = DateTime.Now;
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Timeout = 60 * 1000; // Timeout
        request.Headers.Add("Authorization", "Bearer" + " " + AuthenticationKey);
        request.Headers.Add("Accept-Encoding", "gzip,deflate");
        request.UserAgent = "MyApplication/11.14 (gzip)";
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        string content = response.GetResponseHeader("content-disposition");
        Console.WriteLine(content);
        System.IO.Stream received = response.GetResponseStream();
        using (System.IO.FileStream file = new System.IO.FileStream(filename, System.IO.FileMode.Create, System.IO.FileAccess.Write))
        {
            received.CopyTo(file);
        }
        var tend = DateTime.Now;
        Console.WriteLine("time taken to download '{0}' is {1} seconds", filename, (tend - tstart).TotalSeconds);
    }
    catch (WebException e)
    {
        Console.WriteLine("Exception thrown - {0}", e.Message);
    }
}

Using gzip
An easy and convenient way to reduce the bandwidth needed for each request is to enable gzip compression. Although this requires additional CPU time to uncompress the results, the trade-off with network costs usually makes it very worthwhile.
In order to receive a gzip-encoded response you must do two things: Set an Accept-Encoding header, and modify your user agent to contain the string gzip. Here is an example of properly formed HTTP headers for enabling gzip compression:
Accept-Encoding: gzip
User-Agent: my program (gzip)
Gzip compresses the response from the API, not the file you are downloading. For example, file.export returns a file.resource JSON object; that object would be compressed, but not the actual file data you then need to download. Files are downloaded in their corresponding type (see manage downloads). While Google may be able to convert a Google Docs file to an MS Word file when you download it, it is not going to convert a Google Docs file to a zip file so that you can download it zipped.
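For the API responses that do come back compressed, one option on the client side is to let HttpWebRequest handle the encoding itself. The sketch below is an assumption-laden illustration, not the Drive API's prescribed method: the class name, method name, and accessToken parameter are made up, and it only builds the request without sending it. AutomaticDecompression makes the framework send the Accept-Encoding header and decompress the response stream on read, whereas adding Accept-Encoding by hand leaves any compressed body untouched.

```csharp
using System;
using System.Net;

public static class GzipRequestSketch
{
    // Builds (but does not send) a request that asks for gzip/deflate
    // and transparently decodes such responses when they are read.
    public static HttpWebRequest Create(string url, string accessToken)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Headers.Add("Authorization", "Bearer " + accessToken);
        request.UserAgent = "MyApplication/11.14 (gzip)";
        // Replaces the manual Accept-Encoding header from the question.
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }
}
```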

Get Size of Image File before downloading from web

I am downloading image files from web using the following code in my Console Application.
WebClient client = new WebClient();
client.DownloadFile(string address_of_image_file,string filename);
The code is running absolutely fine.
I want to know if there is a way I can get the size of this image file before I download it.
PS: I have written a crawler which moves around the site downloading image files, so I don't know the size beforehand. All I have is the complete path of the file, which has been extracted from the source of the webpage.

Here is a simple example you can try. If you have files with different extensions like .GIF, .JPG, etc., you can create a variable or wrap the code within a switch statement.
System.Net.WebClient client = new System.Net.WebClient();
client.OpenRead("http://someURL.com/Images/MyImage.jpg");
Int64 bytes_total = Convert.ToInt64(client.ResponseHeaders["Content-Length"]);
MessageBox.Show(bytes_total.ToString() + " Bytes");

If the web service gives you a Content-Length HTTP header, then it will be the image file size. However, if the web service wants to "stream" data to you (using chunked transfer encoding), then you won't know the size until the whole file is downloaded.

You can use this code:
using System.Net;
public long GetFileSize(string url)
{
    long result = 0;
    WebRequest req = WebRequest.Create(url);
    req.Method = "HEAD";
    using (WebResponse resp = req.GetResponse())
    {
        if (long.TryParse(resp.Headers.Get("Content-Length"), out long contentLength))
        {
            result = contentLength;
        }
    }
    return result;
}

You can use an HttpWebRequest to query the HEAD Method of the file and check the Content-Length in the response

You should look at this answer: C# Get http:/…/File Size where your question is fully explained. It's using HEAD HTTP request to retrieve the file size, but you can also read "Content-Length" header during GET request before reading response stream.

Download an Excel file

I have read some past posts here on how to download an Excel file from a website. So, I have set up the code below:
string path = MapPath(fname);
string name = Path.GetFileName(path);
string ext = Path.GetExtension(path);
string type = "application/vnd.ms-excel";
if (forceDownload)
{
    Response.AppendHeader("content-disposition",
        "attachment; filename=" + name);
}
if (type != "")
{
    Response.ContentType = type;
    Response.WriteFile(path);
    Response.End();
}
However, I get no download dialog box.
I tried this in both IE 8 and Firefox 10.0.2.
The file is there, it's not locked, and it's not set to read only.
I'm not sure where I went wrong.

According to this link, you need to add this line:
strFileName = Path.GetFileName(path);
Response.TransmitFile( Server.MapPath(strFileName) );
This will cause a Open / Save As dialog box to pop up with the filename of SailBig.jpg as the default filename preset.
This of course assumes you're feeding a file that already exists. If you need to feed dynamically generated - say an image [or any file] that was generated in memory - you can use Response.BinaryWrite() to stream a byte array or write the output directly in Response.OutputStream.
EDIT:
Microsoft's MSDN site has a detailed explanation about File Downloading. It includes samples for both Java and .NET applications; the concept is the same:
1. Get the response.
2. With the response:
2.1. Set the content type to "APPLICATION/OCTET-STREAM" (it means there's no application to open the file).
2.2. Set the header to "Content-Disposition", "attachment; filename=\"" + + "\"".
2.3. Write the file content into the response.
3. Close the response.
So, looking at the MSDN ASP.NET file download, you're lacking step 2.3. You're just writing the file name to the response.
// transfer the file byte-by-byte to the response object
System.IO.FileInfo fileToDownload = new
    System.IO.FileInfo("C:\\downloadJSP\\DownloadConv\\myFile.txt");
Response.Flush();
Response.WriteFile(fileToDownload.FullName);
With this example you will download your file successfully, of course if you can get the file with no problems :).
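For the Content-Disposition step, a small helper along these lines could build the header value with the file name quoted, which avoids surprises with names that contain spaces. This is a sketch only: DownloadHeaderSketch and BuildAttachmentValue are hypothetical names, not part of ASP.NET.

```csharp
using System;

public static class DownloadHeaderSketch
{
    // Hypothetical helper: builds the value for a Content-Disposition
    // header with the file name wrapped in double quotes.
    public static string BuildAttachmentValue(string fileName)
    {
        // Escape backslashes and double quotes so the quoted string stays well formed.
        string escaped = fileName.Replace("\\", "\\\\").Replace("\"", "\\\"");
        return "attachment; filename=\"" + escaped + "\"";
    }
}
```

In the question's code, the result would be passed to Response.AppendHeader("content-disposition", ...) in place of the hand-built string.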
EDIT 2:
The HTML component used to download any file must be a regular HTML Request. Any ajax request to download a file won't work. Microsoft explains that here. And the main quote:
Its impossible to attach an event before and after a download through javascript. Browser doesn't allow this type of events for security reasons.

You need to send this before the file attachment header:
Response.ContentType = "application/vnd.ms-excel"
See: Export data to excel file from Classic ASP failing

Try adding such HTTP headers
Content-Type: application/force-download
Content-Type: application/vnd.ms-excel
Content-Type: application/download

Invalid Character in base64 string when decoding XML

We have a WinForms client app that is consuming a web service we wrote. This client app requests documents that are contained in XML files, generally a PDF written to a base64-encoded binary field in the XML file.
Client successfully downloads, decodes, and opens 99% of the documents correctly.
However, we've started encountering some files that are failing when the client makes this call:
byte[] buffer = Convert.FromBase64String(xNode["fileIMAGE"].InnerText);
System.FormatException-
Message="Invalid character in a Base-64 string."
Source="mscorlib"
We've written out the base64 blob from the XML file to a text file. I don't see any "\0" characters. I could post the whole blob, but it's quite large.
Any ideas?

Issue Resolved
To stream the file from the server, we use a callback function to read/write chunks of the file. We were base64encoding each chunk. WRONG.
Resolution- Write all the chunks to a global memorystream object. At the end of the callbacks, then do the base64 encoding.
In the callback function:
if (brData.ChunkNo == 1)
{
    // Set the Content-type of the file
    if (brData.MimeType.Length < 1)
    {
        mimeType = "application/unknown";
    }
    else
    {
        mimeType = brData.MimeType;
    }
    msbase64Out = new MemoryStream();
}
if (brData.bytesJustRead > 0)
{
    fileMS.WriteTo(msbase64Out);
}
if (brData.bytesRemaining < 1)
{
    byte[] imgBytes = msbase64Out.ToArray();
    string img64 = Convert.ToBase64String(imgBytes);
    viewdocWriter.WriteString(img64);
}
msbase64Out is a global memory stream that gets written to each time the callback is called.
viewdocWriter is a global XML writer that is responsible for writing out the XML stream that gets sent to the client app.
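To illustrate why encoding each chunk separately can break decoding, here is a minimal, self-contained sketch. The class and method names are invented for the example and do not come from the code above. When a chunk's length is not a multiple of 3, its base64 text ends in '=' padding, and that padding then lands in the middle of the concatenated string.

```csharp
using System;
using System.IO;
using System.Text;

public static class Base64ChunkDemo
{
    // Encodes each chunk on its own and concatenates the text (the approach that failed).
    public static string EncodePerChunk(byte[] data, int chunkSize)
    {
        var sb = new StringBuilder();
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);
            sb.Append(Convert.ToBase64String(data, offset, count));
        }
        return sb.ToString();
    }

    // Collects all chunks in a MemoryStream first, then encodes once at the end.
    public static string EncodeAfterCollecting(byte[] data, int chunkSize)
    {
        using (var ms = new MemoryStream())
        {
            for (int offset = 0; offset < data.Length; offset += chunkSize)
            {
                int count = Math.Min(chunkSize, data.Length - offset);
                ms.Write(data, offset, count);
            }
            return Convert.ToBase64String(ms.ToArray());
        }
    }
}
```

With a chunk size that happens to be a multiple of 3, both methods produce the same text, which may be one reason such a bug only shows up for some files.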
