Download N Megabytes of an XML file - C#

I want to download the first N megabytes of a huge XML file so that I can then close the broken tags with HtmlAgilityPack. Unfortunately, I can't use XmlReader.
I tried setting the Range HTTP header, but that didn't seem to work, so now I'm trying this:
public string download(string url, int mb)
{
    Int32 bytesToGet = 1048576 * mb;
    HttpWebRequest request;
    request = WebRequest.Create(url) as HttpWebRequest;
    var buffer = new char[bytesToGet];
    using (WebResponse response = request.GetResponse())
    {
        using (StreamReader sr = new StreamReader(response.GetResponseStream()))
        {
            sr.Read(buffer, 0, bytesToGet);
        }
    }
    return new string(buffer);
}
But this doesn't work either: with mb=5 I only get a few lines of the XML file.

You're only calling Read() once, and a single call doesn't promise to fill your buffer. Keep count of what has been read so far (StreamReader.Read counts characters, not bytes) and keep reading until your buffer is full or the end of the stream is reached:
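int offset = 0;
int bytesRead = 0;
do
{
    // Read() returns how many chars it actually read (0 at end of stream)
    bytesRead = sr.Read(buffer, offset, bytesToGet - offset);
    offset += bytesRead;
} while (bytesRead > 0 && offset < bytesToGet);
// afterwards, use new string(buffer, 0, offset) so the unfilled '\0' slots are not included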
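Because the loop above fills a char[] through StreamReader, it limits characters rather than bytes; for a multi-byte encoding such as UTF-8 the two differ. A minimal sketch that reads at most a given number of raw bytes from any Stream and then stops. `PrefixReader` and `ReadPrefix` are hypothetical helper names, and decoding the result as UTF-8 is an assumption about the file's encoding.

```csharp
using System;
using System.IO;
using System.Text;

static class PrefixReader
{
    // Hypothetical helper: reads at most maxBytes from the stream.
    // Stream.Read may return fewer bytes than requested, so we loop
    // until the buffer is full or the stream ends (Read returns 0).
    public static byte[] ReadPrefix(Stream stream, int maxBytes)
    {
        byte[] buffer = new byte[maxBytes];
        int offset = 0;
        while (offset < maxBytes)
        {
            int n = stream.Read(buffer, offset, maxBytes - offset);
            if (n == 0)
                break; // end of stream before maxBytes were available
            offset += n;
        }
        // Trim to the number of bytes actually read.
        Array.Resize(ref buffer, offset);
        return buffer;
    }

    // Assumes the XML is UTF-8. A multi-byte character cut at the boundary
    // is decoded as U+FFFD by Encoding.UTF8 instead of throwing.
    public static string ReadPrefixAsUtf8(Stream stream, int maxBytes)
    {
        return Encoding.UTF8.GetString(ReadPrefix(stream, maxBytes));
    }
}
```

Usage inside the question's method would look like `string xml = PrefixReader.ReadPrefixAsUtf8(response.GetResponseStream(), 1048576 * mb);`.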

Related

Request stream fails to write

I have to upload a large file to the server with the following code snippet:
static async Task LordNoBugAsync(string token, string filePath, string uri)
{
    HttpWebRequest fileWebRequest = (HttpWebRequest)WebRequest.Create(uri);
    fileWebRequest.Method = "PATCH";
    fileWebRequest.AllowWriteStreamBuffering = false; // this line tells it to upload in chunks
    fileWebRequest.ContentType = "application/x-www-form-urlencoded";
    fileWebRequest.Headers["Authorization"] = "PHOENIX-TOKEN " + token;
    fileWebRequest.KeepAlive = false;
    fileWebRequest.Timeout = System.Threading.Timeout.Infinite;
    fileWebRequest.Proxy = null;
    using (FileStream fileStream = File.OpenRead(filePath))
    {
        fileWebRequest.ContentLength = fileStream.Length; // have to provide the length in order to upload in chunks
        int bufferSize = 512000;
        byte[] buffer = new byte[bufferSize];
        int lastBytesRead = 0;
        int byteCount = 0;
        Stream requestStream = fileWebRequest.GetRequestStream();
        requestStream.WriteTimeout = System.Threading.Timeout.Infinite;
        while ((lastBytesRead = fileStream.Read(buffer, 0, bufferSize)) != 0)
        {
            if (lastBytesRead > 0)
            {
                await requestStream.WriteAsync(buffer, 0, lastBytesRead);
                // for some reason this doesn't really write to the stream, even though buffer has content (>60MB)
                byteCount += lastBytesRead; // count bytes actually read; the last chunk is usually smaller than bufferSize
            }
        }
        requestStream.Flush();
        try
        {
            requestStream.Close();
            requestStream.Dispose();
        }
        catch
        {
            Console.Write("Error");
        }
        try
        {
            fileStream.Close();
            fileStream.Dispose();
        }
        catch
        {
            Console.Write("Error");
        }
    }
    // ...getting response parts...
}
In this code I create an HttpWebRequest and push the content to the server in buffered chunks. The code works perfectly for any file under 60MB.
When I try a 70MB PDF, the buffer array holds different content on each iteration, yet nothing seems to get written to the request stream. byteCount also reaches 70M, which shows the file is being read properly.
Edit (more info): I set a breakpoint at requestStream.Close(). Writing the request stream clearly takes about 2 minutes for a 60MB file, but only 2ms for a 70MB file.
How I call it:
Task magic = LordNoBugAsync(token, nameofFile, path);
magic.Wait();
I am sure the call itself is correct (it works for files from 0B to 60MB).
Any advice or suggestion is much appreciated.
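This does not explain the 60MB cutoff; it is only a sketch of the same read/write loop with a counter that reflects the bytes actually written, so that a check like "byteCount reached 70M" is meaningful. `UploadCopy` and `CopyCountingAsync` are hypothetical names; in the method above, the source would be fileStream and the destination requestStream. The built-in Stream.CopyToAsync performs the same loop if no count is needed.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class UploadCopy
{
    // Copies source to destination in chunks and returns the number of bytes
    // written. It adds the count returned by ReadAsync, not the buffer size,
    // because the final chunk is usually smaller than the buffer.
    public static async Task<long> CopyCountingAsync(Stream source, Stream destination, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        long byteCount = 0; // long, so totals above 2GB do not overflow
        int lastBytesRead;
        while ((lastBytesRead = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            await destination.WriteAsync(buffer, 0, lastBytesRead);
            byteCount += lastBytesRead;
        }
        await destination.FlushAsync();
        return byteCount;
    }
}
```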

PDF corrupted while downloading from URL (VB.NET/C#)

The problem persists with all three of the methods below.
Using the Windows API "URLDownloadToFile"
WebClient method:
webclient.DownloadFile(url, dest) ' with or without credentials
HttpWebRequest method:
public static void Download(String strURLFileandPath, String strFileSaveFileandPath)
{
    HttpWebRequest wr = (HttpWebRequest)WebRequest.Create(strURLFileandPath);
    HttpWebResponse ws = (HttpWebResponse)wr.GetResponse();
    Stream str = ws.GetResponseStream();
    byte[] inBuf = new byte[100000];
    int bytesToRead = (int) inBuf.Length;
    int bytesRead = 0;
    while (bytesToRead > 0)
    {
        int n = str.Read(inBuf, bytesRead, bytesToRead);
        if (n == 0)
            break;
        bytesRead += n;
        bytesToRead -= n;
    }
    FileStream fstr = new FileStream(strFileSaveFileandPath, FileMode.OpenOrCreate, FileAccess.Write);
    fstr.Write(inBuf, 0, bytesRead);
    str.Close();
    fstr.Close();
}
I'm still facing the problem: the file downloads to my local system, but when I open it, it shows as a corrupt PDF.
I just want to download the PDF from a URL in VB.NET/C#, without using the ASP.NET Response method.
Please help if anyone has faced this problem.
Thanks in advance!
Your code writes at most 100000 bytes of the downloaded PDF, so every PDF bigger than 100000 bytes is truncated and therefore corrupted.
To save the whole file, you have to keep reading and write the contents of every buffer to the FileStream.
The following should do it:
HttpWebRequest wr = (HttpWebRequest)WebRequest.Create(strURLFileandPath);
using (HttpWebResponse ws = (HttpWebResponse)wr.GetResponse())
using (Stream str = ws.GetResponseStream())
// FileMode.Create truncates an existing file; OpenOrCreate would keep stale trailing bytes
using (FileStream fstr = new FileStream(strFileSaveFileandPath, FileMode.Create, FileAccess.Write))
{
    byte[] inBuf = new byte[100000];
    int bytesRead = 0;
    // keep reading until Read returns 0 (end of stream), writing each chunk
    while ((bytesRead = str.Read(inBuf, 0, inBuf.Length)) > 0)
        fstr.Write(inBuf, 0, bytesRead);
}
(It's good coding practice to use a using on every IDisposable instead of manually closing the streams.)
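Since .NET Framework 4, Stream.CopyTo runs this same read/write loop internally. A minimal sketch of the save step using it; `StreamSaver` and `SaveToFile` are hypothetical names, and the FileMode.Create choice is deliberate so a shorter download fully replaces an older, longer file.

```csharp
using System.IO;

static class StreamSaver
{
    // Writes the entire source stream to path. FileMode.Create truncates an
    // existing file; FileMode.OpenOrCreate would leave stale trailing bytes
    // if the old file was longer than the new download.
    public static void SaveToFile(Stream source, string path)
    {
        using (FileStream target = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            // Reads chunks from source and writes them until Read returns 0.
            source.CopyTo(target);
        }
    }
}
```

In the download method this would be called as `using (Stream str = ws.GetResponseStream()) StreamSaver.SaveToFile(str, strFileSaveFileandPath);`.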

WebClient and WebRequest, downloading files > 4GB

Whenever I try to download a file larger than 4GB from a server (the server is on a device, it's not on the interwebs), only what appears to be (FileSize) % 4GB actually gets transferred. In other words, for a file just over 4.5GB, I end up transferring only around 600MB of data.
I think it has something to do with the Content-Length header and so on, but I'm not sure of the exact mechanism. I've tried both WebClient and WebRequest, and both exhibit the same behaviour.
Does anyone have any idea how I can get past this limit? Here's my current loop:
byte[] buffer = new byte[4096];
WebRequest request = WebRequest.Create(new Uri(transferDetails.URL));
using (WebResponse response = request.GetResponse())
{
    using (Stream responseStream = response.GetResponseStream())
    {
        using (FileStream fileStream = new FileStream(actualPath, FileMode.Create, FileAccess.Write))
        {
            int count = 0;
            do
            {
                // Read a block.
                count = responseStream.Read(buffer, 0, buffer.Length);
                // Write out to the local file.
                if (count > 0)
                {
                    fileStream.Write(buffer, 0, count);
                }
            } while (count != 0);
        }
    }
}
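The observed size matches what happens when a 64-bit length is stored in a 32-bit field: keeping only the low 32 bits of a non-negative number is the same as taking it modulo 2^32 = 4294967296 bytes. Where such a truncation would happen (client, server, or device firmware) is not established here; the sketch only demonstrates the arithmetic, and the 4900000000-byte size is a made-up example. `TruncationDemo` is a hypothetical name.

```csharp
using System;

static class TruncationDemo
{
    public const long FourGiB = 4294967296L; // 2^32 bytes

    // What survives if a 64-bit length is squeezed into a 32-bit unsigned
    // field: an unchecked conversion discards the high-order bits.
    public static long TruncatedLength(long actualLength)
    {
        return unchecked((uint)actualLength);
    }
}
```

For the example size, both `TruncationDemo.TruncatedLength(4900000000L)` and `4900000000L % TruncationDemo.FourGiB` give 605032704 bytes, about 600MB, which is the same order as the reported transfer.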

Download using C# code

I am developing a C# application in which I download a package (zip file) from a server machine. It was downloading properly, but recently our package data, which is a Flex application, has changed. We use C# to download it to the C: or D: drive.
With the new package, I now get this error:
Unable to read data from the transport connection: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.
My code is below:
byte[] packageData = null;
packageData = touchServerClient.DownloadFile("/packages/" + this.PackageName);

public byte[] DownloadFile(string url)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(remoteSite.Url + url);
    try
    {
        request.Method = "GET";
        request.KeepAlive = false;
        request.CookieContainer = new CookieContainer();
        if (this.Cookies != null && this.Cookies.Count > 0)
            request.CookieContainer.Add(this.Cookies);
        HttpWebResponse webResponse = (HttpWebResponse)request.GetResponse();
        // Console.WriteLine(response.StatusDescription);
        Stream responseStream = webResponse.GetResponseStream();
        int contentLength = Convert.ToInt32(webResponse.ContentLength);
        byte[] fileData = StreamToByteArray(responseStream, contentLength);
        return fileData;
    }
    catch (WebException)
    {
        // The original post omits this handler; rethrowing here is an assumption.
        throw;
    }
}

public static byte[] StreamToByteArray(Stream stream, int initialLength)
{
    // If we've been passed an unhelpful initial length, just
    // use 32K.
    if (initialLength < 1)
    {
        initialLength = 32768;
    }
    byte[] buffer = new byte[initialLength];
    int read = 0;
    int chunk;
    while ((chunk = stream.Read(buffer, read, buffer.Length - read)) > 0)
    {
        read += chunk;
        // If we've reached the end of our buffer, check to see if there's
        // any more information
        if (read == buffer.Length)
        {
            int nextByte = stream.ReadByte();
            // End of stream? If so, we're done
            if (nextByte == -1)
            {
                return buffer;
            }
            // Nope. Resize the buffer, put in the byte we've just
            // read, and continue
            byte[] newBuffer = new byte[buffer.Length * 2];
            Array.Copy(buffer, newBuffer, buffer.Length);
            newBuffer[read] = (byte)nextByte;
            buffer = newBuffer;
            read++;
        }
    }
    // Buffer is now too big. Shrink it.
    byte[] ret = new byte[read];
    Array.Copy(buffer, ret, read);
    return ret;
}
In the function above (StreamToByteArray), I get this error:
Unable to read data from the transport connection: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.
Please help me with this; I'm also not supposed to change the code.
Thanks in advance
Sangita
A few things to try:
Wrap your stream handling in a using statement, which will clean up that resource:
using (Stream responseStream = webResponse.GetResponseStream())
{
    int contentLength = Convert.ToInt32(webResponse.ContentLength);
    byte[] fileData = StreamToByteArray(responseStream, contentLength);
    return fileData;
}
Make sure no other memory-heavy processes are running on the same machine, particularly ones making socket-bound calls.
Try increasing the MaxUserPort registry value (see the article linked in the comments on the question).
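Separately from the socket error, which this does not address, the manual buffer doubling in StreamToByteArray can be expressed with MemoryStream, which grows its own buffer as data is written. A minimal sketch; `StreamBytes` and `ReadAllBytes` are hypothetical names.

```csharp
using System.IO;

static class StreamBytes
{
    // Reads the stream to its end and returns the bytes. No initial length
    // guess is needed because MemoryStream resizes itself as it is written.
    public static byte[] ReadAllBytes(Stream stream)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            stream.CopyTo(ms);
            return ms.ToArray(); // copy trimmed to the bytes actually written
        }
    }
}
```

Combined with the using suggestion above, the body of the using block would become `return StreamBytes.ReadAllBytes(responseStream);`.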

Download the first 1000 bytes of a file using C# [duplicate]

Closed 12 years ago.
Possible Duplicate:
Download the first 1000 bytes
I need to download a text file from the internet using C#. The file can be quite large, and the information I need is always within the first 1000 bytes.
This is what I have so far. I found out that the server might ignore the Range header. Is there a way to limit the StreamReader to read only the first 1000 characters?
string GetWebPageContent(string url)
{
    string result = string.Empty;
    HttpWebRequest request;
    const int bytesToGet = 1000;
    request = WebRequest.Create(url) as HttpWebRequest;
    // get the first 1000 bytes
    request.AddRange(0, bytesToGet - 1);
    // the following code is one alternative; adapt the function to your needs
    using (WebResponse response = request.GetResponse())
    {
        using (StreamReader sr = new StreamReader(response.GetResponseStream()))
        {
            result = sr.ReadToEnd();
        }
    }
    return result;
}
Please follow up on your question from yesterday!
There is a Read overload that lets you specify the number of characters to read.
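That overload is StreamReader.Read(char[], int, int). Like Stream.Read, it may return fewer characters than requested, whereas TextReader.ReadBlock keeps reading until the requested count is reached or the input ends. A minimal sketch; `CharPrefix` and `ReadFirstChars` are hypothetical names, and it counts characters, not bytes.

```csharp
using System.IO;

static class CharPrefix
{
    // Returns at most maxChars characters from the reader. ReadBlock keeps
    // reading until maxChars characters are read or the input is exhausted.
    public static string ReadFirstChars(TextReader reader, int maxChars)
    {
        char[] buffer = new char[maxChars];
        int read = reader.ReadBlock(buffer, 0, maxChars);
        return new string(buffer, 0, read); // exclude unfilled slots
    }
}
```

In the question's method, `result = sr.ReadToEnd();` would become `result = CharPrefix.ReadFirstChars(sr, bytesToGet);`.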
You can retrieve the first 1000 bytes from the stream, then decode the string from the bytes:
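using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    using (Stream stream = response.GetResponseStream())
    {
        byte[] bytes = new byte[bytesToGet];
        // Read() may return fewer bytes than requested, so loop until full or end of stream
        int count = 0, n;
        while (count < bytesToGet && (n = stream.Read(bytes, count, bytesToGet - count)) > 0)
            count += n;
        // WebResponse has no Encoding property; HttpWebResponse.CharacterSet holds the
        // charset from Content-Type (falling back to UTF-8 when it is empty is an assumption)
        Encoding encoding = string.IsNullOrEmpty(response.CharacterSet)
            ? Encoding.UTF8
            : Encoding.GetEncoding(response.CharacterSet);
        result = encoding.GetString(bytes, 0, count);
    }
}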
Instead of using request.AddRange(), which some servers may ignore as you said, read 1000 bytes from the stream and then close it. This is like being disconnected from the server after receiving 1000 bytes. Code:
int count = 0;
int result = 0;
byte[] buffer = new byte[1000];
// create stream from URL as you did above
do
{
    // we want 1000 bytes, but Read() may return fewer; result = bytes read by this call.
    // Reading at offset count keeps earlier chunks from being overwritten.
    result = stream.Read(buffer, count, 1000 - count); // use try around this for error handling
    count += result;
} while ((count < 1000) && (result != 0));
stream.Dispose();
// now buffer[0..count) holds the first bytes (up to 1000) of the response
