I'm using WebClient to mine a bunch of data. To conserve bandwidth (for both the client and web server), and speed my program up, I'd like to abort certain downloads early if it becomes evident that the file I'm downloading doesn't contain the information I'm looking for.
I'd like to base this decision on the headers (MIME type and file size), and possibly some of the content.
I'm presently using webClient.DownloadData, but I'd obviously have to switch this to an asynchronous method call. However, the async version doesn't pass the information I need either (headers and data). Is there perhaps another freely available class that meets these requirements?
Something that fires an event as soon as the headers have completed downloading would be nice, and periodically with progress updates.
If you want to decide whether or not to download something based on the headers, you can also send an HTTP HEAD request, which tells the server to reply with only its headers.
Use the WebRequest class.
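For example, a HEAD request via HttpWebRequest lets you inspect the MIME type and size before committing to a full download. A sketch only; the URL, the expected content type, and the size cap are placeholder assumptions:

```csharp
using System;
using System.Net;

class HeadCheck
{
    // Decides whether a resource is worth downloading, based on headers alone.
    // The "text/html" type and 1 MB cap are illustrative thresholds.
    static bool LooksInteresting(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD"; // ask the server for headers only, no body

        using (var response = (HttpWebResponse)request.GetResponse())
        {
            // ContentLength is -1 when the server omits the header
            bool rightType = response.ContentType.StartsWith("text/html");
            bool smallEnough = response.ContentLength >= 0 &&
                               response.ContentLength < 1024 * 1024;
            return rightType && smallEnough;
        }
    }

    static void Main()
    {
        if (LooksInteresting("http://example.com/"))
            Console.WriteLine("worth a full GET");
    }
}
```

If you'd rather abort mid-download based on the first bytes of content, the same HttpWebRequest gives you GetResponseStream(), which you can simply stop reading and dispose.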
I have a scenario where I have a stream of URLs which I need to make a HTTP request against. I'll then download the data received and save it in Blob storage. I have to do this using Azure functions so that I'm only paying for the service when there are actually URLs to process.
However, the difficulty I'm having is conceiving of a way of triggering downloads through a limited number of proxies. Although I'm happy for the download function to scale out to the number of proxies I have available, I want each proxy to deal with each URL it receives in series. In other words, each proxy must be limited to downloading data from one URL at a time.
I considered having URLs in one queue and proxies in another queue and triggering a function when one of each is available, then pushing the used proxy back into the proxy queue, but functions can only take one trigger.
I also considered creating as many queues as there are proxies and distributing URLs between the queues, but I'm not sure how to limit the concurrency on each triggered function to one.
Anybody got an idea how to do this?
Okay, I found a way to do this via this post:
https://medium.com/@yuka1984/azure-functions-%E3%81%AE-singletonattribute%E3%81%A8mode%E3%83%97%E3%83%AD%E3%83%91%E3%83%86%E3%82%A3-bb728062198e
The answer is to add a [Singleton] attribute to the function.
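For illustration, a [Singleton]-annotated queue-triggered function might look roughly like this; the queue name, binding, and body are assumptions, not the poster's actual code:

```csharp
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class DownloadFunction
{
    // [Singleton] with default (function) scope serializes executions:
    // only one invocation runs at a time, even when the app scales out,
    // so each URL is processed in series.
    [Singleton]
    [FunctionName("DownloadViaProxy")]
    public static void Run(
        [QueueTrigger("url-queue")] string url,
        ILogger log)
    {
        log.LogInformation($"Downloading {url} through the proxy");
        // download the data and save it to Blob storage here
    }
}
```

With one queue per proxy and one such function per queue, each proxy handles one URL at a time.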
However, according to this comment, you are spending money while your entities are awaiting processing:
https://github.com/Azure/azure-functions-host/issues/912#issuecomment-419608830
I'm trying to make an application for Windows Phone 8 and make it easy to extend or change by coding it into simple classes. I would like to do a simple web call with a WebClient that does an HTTP POST to log a user in to a service. Here is the flow I have so far:
User enters username and api details or they are called from previously saved details.
WebClient does a POST to web service and receives XML response with user information etc.
Call back executes after having set WebClient.UploadStringCompleted += new UploadStringCompletedEventHandler(name)
User information is processed and stuff done with it.
For various reasons, mainly wanting to code into classes and ultimately release the code as a free SDK for other developers, I want the flow to look like this:
User enters details or called from saved details.
A function within a class, called with something similar to if (class.Authenticate(user, apikey)) { /* logged-in stuff */ }, which returns a boolean value, or even better an integer value, so I can easily process errors from the web service.
Nothing I have tried will make this work, and I can't get my head around how to make it work with async and await, as WebClients just seem to execute on their own thread and won't hold one up for me until the callback has completed.
I've found a custom WebClient class somewhere, but it only did GET requests, which isn't good enough for my needs.
You don't want to do this.
To do this you would need to block the executing thread until the server responds:
- This could be a very long time.
- If on the UI thread this would stop the user interacting with the UI.
- What if there's no connection, or a very slow one?
This is the reason that WebClient doesn't have a synchronous way of making calls.
There are ways to make the execution appear synchronous but you'll get a lot more from learning to work with the framework and understanding why it wants you to work a specific way and why that's appropriate on an occasionally connected device.
You could also make this code look synchronous by making an Awaitable request. See more at http://msdn.microsoft.com/en-us/library/vstudio/hh191443.aspx
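One way to get the if (await client.AuthenticateAsync(...)) shape the question asks for, without blocking any thread, is to wrap the callback-based WebClient API in a TaskCompletionSource. A sketch, with a made-up endpoint URL and a made-up success marker in the response:

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

public class AuthClient
{
    // Wraps WebClient's event-based POST in a Task so callers can write:
    //   if (await client.AuthenticateAsync(user, key)) { /* logged in */ }
    // The URL and the "<ok>" check are placeholders for the real service.
    public async Task<bool> AuthenticateAsync(string user, string apiKey)
    {
        var tcs = new TaskCompletionSource<string>();
        var client = new WebClient();

        client.UploadStringCompleted += (s, e) =>
        {
            if (e.Error != null) tcs.TrySetException(e.Error);
            else tcs.TrySetResult(e.Result);
        };

        string body = "user=" + Uri.EscapeDataString(user) +
                      "&key=" + Uri.EscapeDataString(apiKey);
        client.UploadStringAsync(new Uri("http://example.com/login"), "POST", body);

        string xml = await tcs.Task; // resumes here when the callback fires
        return xml.Contains("<ok>");
    }
}
```

The await keeps the UI responsive while the request is in flight, yet the calling code reads almost exactly like the synchronous flow the question describes.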
Problem: I need to download hundreds of images from different hosts. Each host has anywhere between 20 and several hundred images.
Solution: use a new WebClient each time an image needs to be downloaded, via WebClient's DownloadData method.
Or would it be better to keep a pool of open socket connections and make the HTTP requests using lower-level calls?
Is it expensive to open and close a TCP connection (which I assume is what WebClient does), such that using a pool would be more efficient?
I believe the underlying infrastructure which WebClient uses will already pool HTTP connections, so there's no need to do this. You may want to check using something like Wireshark of course, with some sample URLs.
Fundamentally, I'd take the same approach to this as with other programming tasks - write the code in the simplest way that works, and then check whether it performs well enough for your needs. If it does, you're done. If it doesn't, use appropriate tools (network analyzers etc) to work out why it's not performing well enough, and use more complicated code only if it fixes the problem.
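For what it's worth, the simple version might be no more than this, leaving keep-alive and connection reuse to the framework's ServicePoint layer; URLs and the target folder are placeholders:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

class ImageFetcher
{
    // The simplest thing that works: one WebClient reused for all images,
    // downloaded sequentially. HTTP keep-alive means connections to the
    // same host are reused under the hood without any pooling code here.
    static void DownloadAll(IEnumerable<string> urls, string folder)
    {
        using (var client = new WebClient())
        {
            foreach (string url in urls)
            {
                byte[] data = client.DownloadData(url);
                string name = Path.GetFileName(new Uri(url).LocalPath);
                File.WriteAllBytes(Path.Combine(folder, name), data);
            }
        }
    }
}
```

Only if measurement shows this is too slow would I move to parallel downloads or hand-rolled connection management.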
My experience is that WebClient is fine if it does what you need - but it doesn't give you quite as much fine-grained control as WebRequest. If you don't need that control, go with WebClient.
I use HttpWebRequest and HttpWebResponse to scrape anything I want. Unless, of course, there are services available for the requirement - but even then there are sometimes limitations (business limitations), and I often prefer to dig the HTML out of a pure HTTP request. Sometimes it just makes me feel more like a developer, you know...
I'm writing a C# utility program in which I need the ability to download and save a file from a URL.
I have the code working to obtain the URL from a web service call, however I'm having trouble finding a simple example of how to begin the download and save the data to a disk file.
Does anyone have a good example of this, or can provide me with an example?
Thanks!
EDIT - I should mention that these files will be 3 to 4 GB in size. So if there are special considerations for files of this size I would appreciate any advice.
WebClient.DownloadData - the documentation contains a small sample. It is arguably more efficient to use WebRequest.GetResponseStream and save the data chunk by chunk, but you'll have to properly prepare the WebRequest yourself.
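A chunk-by-chunk sketch of that WebRequest approach, with the URL and destination path as placeholders:

```csharp
using System;
using System.IO;
using System.Net;

class StreamingDownload
{
    // Copies the response body to disk in fixed-size chunks, so memory use
    // stays constant no matter how large the file is.
    static void DownloadToFile(string url, string path)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (Stream input = response.GetResponseStream())
        using (Stream output = File.Create(path))
        {
            var buffer = new byte[81920];
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
                output.Write(buffer, 0, read);
        }
    }
}
```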
Updated
If you have to download 3-4 GB files then you must do much, much more than what the .Net framework offers out of the box. WebClient is immediately out of the question, since it returns the content as one monolithic byte[]. Even presuming that your VAS (virtual address space) has a contiguous 4 GB to burn on these downloads, .Net cannot allocate any single object larger than 2 GB (on x64 as well). So you must use streams, as in GetResponseStream().
Second, you must implement HTTP range requests, as per HTTP/1.1 section 3.12 (range units). Your request must contain a Range header to be able to resume interrupted downloads. And of course, your target server will have to accept and recognize this header, and respond with a proper Accept-Ranges, which few servers do.
You have your plate full; downloading 4 GB is anything but trivial.
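A sketch of a resumable download along those lines, assuming the server honours Range requests; the URL and path are placeholders:

```csharp
using System;
using System.IO;
using System.Net;

class ResumableDownload
{
    // Resumes a partial download: requests bytes from the current file
    // length onward via a Range header and appends to the existing file.
    static void Resume(string url, string path)
    {
        long have = File.Exists(path) ? new FileInfo(path).Length : 0;

        var request = (HttpWebRequest)WebRequest.Create(url);
        if (have > 0)
            request.AddRange(have); // sends "Range: bytes=<have>-"

        using (var response = (HttpWebResponse)request.GetResponse())
        using (Stream input = response.GetResponseStream())
        using (Stream output = new FileStream(path, FileMode.Append))
        {
            // 206 Partial Content confirms the server honoured the range;
            // a plain 200 would mean it is resending from byte zero.
            if (have > 0 && response.StatusCode != HttpStatusCode.PartialContent)
                throw new InvalidOperationException("Server ignored the Range header");
            input.CopyTo(output);
        }
    }
}
```

Wrapped in a retry loop, this gives you the connection recovery the other answers recommend for files this size.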
Just use WebClient.DownloadFile (or WebClient.DownloadFileAsync if you don't want to freeze the UI during the download)
Since you have gigantic files, prepare for some kind of connection recovery.
Whatever method you end up using to get the data over HTTP (WebClient, TcpStream, ...), you should code with recovery in mind from the start. That should be the focus here.
For that, it is imperative to check whether the Stream returned from GetResponseStream() supports Seek().
I'm looking for a way to pause or resume an upload process via C#'s WebClient.
pseudocode:
WebClient Client = new WebClient();
Client.UploadFileAsync(new Uri("http://mysite.com/receiver.php"), "POST", @"C:\MyFile.jpg");
Maybe something like..
Client.Pause();
any idea?
WebClient doesn't have this kind of functionality - even the slightly-lower-level HttpWebRequest doesn't, as far as I'm aware. You'll need to use an HTTP library which gives you more control over exactly when things happen (which will no doubt involve more code as well, of course). The point of WebClient is to provide a very simple API to use in very simple situations.
As stated by Jon Skeet, this is not available in either the WebClient or the HttpWebRequest class.
However, if you have control of the server, that receives the upload; perhaps you could upload small chunks of the file using WebClient, and have the server assemble the chunks when all has been received. Then it would be somewhat easier for you to make a Pause/resume functionality.
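A sketch of that chunked-upload idea, assuming a hypothetical receiver script that appends pieces keyed by an id and offset; the URL and query parameters are made up for illustration:

```csharp
using System;
using System.IO;
using System.Net;

class ChunkedUpload
{
    // Uploads a file in fixed-size pieces. Because each chunk is a separate
    // POST, pausing is simply a matter of not sending the next one; the
    // server reassembles the pieces using the id and offset parameters.
    static void UploadInChunks(string url, string path, int chunkSize)
    {
        string id = Guid.NewGuid().ToString(); // lets the server group the pieces
        byte[] file = File.ReadAllBytes(path);

        using (var client = new WebClient())
        {
            for (int offset = 0; offset < file.Length; offset += chunkSize)
            {
                int size = Math.Min(chunkSize, file.Length - offset);
                var chunk = new byte[size];
                Array.Copy(file, offset, chunk, 0, size);

                string target = url + "?id=" + id + "&offset=" + offset;
                client.UploadData(new Uri(target), "POST", chunk);
            }
        }
    }
}
```

Resume then just means restarting the loop at the last offset the server acknowledges.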
If you do not have control of the server, you will need to use an API that gives you more control, and subsequently gives you more stuff to worry about. And even then, the server might give you a time-out if you pause for too long.
OK, without giving you code examples, I will tell you what you can do.
Write a WCF service for your upload; that service needs to use streaming.
Things to remember:
- Client and server need a way to identify the file; I suggest using a Guid, so the server knows which file to append the extra data to.
- The client needs to keep track of its position in the array so it knows where to resume streaming. (You can even have the server tell the client how much data it already has, but make sure the client knows too.)
- The server needs to keep track of how much data it has already received and how much is still missing. Files should have a lifetime on the server; you don't want half-uploaded and forgotten files stored on the server forever.
- Please remember that streaming does not allow authentication, since the whole call is just one HTTP request. You can use SSL, but remember that will add overhead.
- You will need to create the service contract at the message level; the standard method won't do.
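As a rough sketch, the message-level contract those points call for might look like this; all names are illustrative, not from any real SDK:

```csharp
using System.IO;
using System.ServiceModel;

// A message contract: the file id and offset travel as headers,
// while the body is the raw stream being appended.
[MessageContract]
public class FileChunk
{
    [MessageHeader] public string FileId;   // Guid identifying the upload
    [MessageHeader] public long Offset;     // where this chunk starts
    [MessageBodyMember] public Stream Data; // the chunk itself
}

[ServiceContract]
public interface IUploadService
{
    [OperationContract]
    void AppendChunk(FileChunk chunk); // server appends Data at Offset

    [OperationContract]
    long GetReceivedLength(string fileId); // client asks where to resume
}
```

The binding would need TransferMode.Streamed for the upload operation.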
I'm currently writing a blog post about this very subject; it will be posted this week, with code samples showing how to get it working.
You can check it on My blog
I know this answer does not contain code samples, but the blog will have some. All in all, this is one way of doing stop-and-resume file uploads to a server.
To do something like this you must write your own worker thread that performs the actual HTTP POST step by step.
Before sending each chunk, check whether the operation is paused, and stop sending file content until it is resumed.
However, depending on the server, the connection may be closed if it isn't active for a certain period of time, and this can be just a couple of seconds.
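A minimal sketch of such a worker, using a gate that another thread can toggle; the URL is a placeholder, and the timeout caveat above still applies while the gate is closed:

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading;

class PausableUploader
{
    // When the gate is set, the upload proceeds; when reset, the worker
    // blocks just before writing the next chunk of the request body.
    private readonly ManualResetEventSlim _gate = new ManualResetEventSlim(true);

    public void Pause()  { _gate.Reset(); }
    public void Resume() { _gate.Set(); }

    public void Upload(string url, string path)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.AllowWriteStreamBuffering = false; // stream the body directly
        request.ContentLength = new FileInfo(path).Length;

        using (Stream body = request.GetRequestStream())
        using (Stream file = File.OpenRead(path))
        {
            var buffer = new byte[8192];
            int read;
            while ((read = file.Read(buffer, 0, buffer.Length)) > 0)
            {
                _gate.Wait();               // blocks here while paused
                body.Write(buffer, 0, read);
            }
        }
        using (request.GetResponse()) { }   // complete the exchange
    }
}
```

Run Upload on a background thread and call Pause/Resume from the UI; just be aware that a long pause risks the server dropping the connection mid-request.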