I have a scenario where I have a stream of URLs, each of which I need to make an HTTP request against. I'll then download the data received and save it in Blob storage. I have to do this using Azure Functions so that I'm only paying for the service when there are actually URLs to process.
However, the difficulty I'm having is conceiving of a way of triggering downloads through a limited number of proxies. Although I'm happy for the download function to scale out to the number of proxies I have available, I want each proxy to deal with each URL it receives in series. In other words, each proxy must be limited to downloading data from one URL at a time.
I considered having URLs in one queue and proxies in another queue and triggering a function when one of each is available, then pushing the used proxy back into the proxy queue, but functions can only take one trigger.
I also considered creating as many queues as there are proxies and distributing URLs between the queues, but I'm not sure how to limit the concurrency on each triggered function to one.
Anybody got an idea how to do this?
Okay, I found a way to do this via this post:
https://medium.com/#yuka1984/azure-functions-%E3%81%AE-singletonattribute%E3%81%A8mode%E3%83%97%E3%83%AD%E3%83%91%E3%83%86%E3%82%A3-bb728062198e
The answer is to add a [Singleton] attribute to the function.
However, according to this comment, you are spending money while your entities are awaiting processing:
https://github.com/Azure/azure-functions-host/issues/912#issuecomment-419608830
I have an endpoint that returns a response containing hotels and a flag indicating whether more results are available. The client needs to call this endpoint repeatedly until the server returns the more-results flag as false. What is the best way to implement this? Could anyone help me with this?
First Option: Avoid It If Possible
Try to avoid calls to HTTP APIs where you can, so as to avoid network latency.
This is very important if you want to make multiple calls from a client that is supposed to be responsive.
For example, if you are developing a web application or a WPF application and the user clicks on something that triggers 10-20 API calls, the operation may not complete quickly, which may result in a poor user experience.
If it is a background job, then multiple calls would probably make more sense.
Second Option: Optimize HTTP Calls From Client
If you still want to make multiple calls over HTTP, then you will have to somehow optimize the code in such a way that at least you avoid the network latency.
To avoid network latency, you can bring all of the data, or a major chunk of it, to the client in one call. The client can then iterate over this set of data.
Even if you cut the number of calls in half, you buy much more time for client processing.
Another Option
You can also consider whether this can be a disconnected operation: the client sends just one notification to the server, and the server then performs all of the iterations.
The client can read a status from somewhere, such as a database, to know whether the operation is complete.
That way your client UI would stay responsive and you would be able to offload all of the heavy processing to the server.
You will have to think about which of these options suits the high-level design of your product or project.
I hope I have given you enough food for thought (although this may not solve your issue directly).
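Wherever the iteration ends up running (client, background job, or server), "call until the flag is false" can be written as a plain loop rather than recursion. Here is a minimal sketch; the `Page` shape, the `HotelPager` name, and the `fetchPage` delegate are hypothetical stand-ins, since the question does not show the real API.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical shape of one response: a batch of hotels plus the "more results" flag.
public sealed record Page(IReadOnlyList<string> Hotels, bool HasMore);

public static class HotelPager
{
    // Calls fetchPage with increasing page numbers until the server reports no more results.
    // fetchPage stands in for the real HTTP call, which the question does not show.
    public static async Task<List<string>> FetchAllAsync(
        Func<int, Task<Page>> fetchPage, int maxPages = 1000)
    {
        var all = new List<string>();
        for (int pageNumber = 0; pageNumber < maxPages; pageNumber++)
        {
            Page page = await fetchPage(pageNumber);
            all.AddRange(page.Hotels);
            if (!page.HasMore)
            {
                return all;
            }
        }
        // Guard against a server that never clears the flag.
        throw new InvalidOperationException("Server kept reporting more results past maxPages.");
    }
}
```

The `maxPages` cap is an added safety net, not something the question asks for; it keeps a misbehaving server from turning the loop into an endless one.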
I want to know if there is any elegant way to ensure that a queue always has distinct messages (nothing related to the Duplicate Detection Window or any time period, for that matter).
I know that Service Bus Queue provides session concepts (as I mentioned Duplicate Detection of Service Bus Queue won't help me as it depends on time period), which can serve my purpose, but I don't want my component's dependency on another Azure service, just because of this feature.
Thanks,
This is not possible to do reliably.
There is just no mechanism that can query a Storage queue and find out if a message with the same contents is already there or was there before. You can try to implement your own logic using some storage table, but that will not be reliable - as the entry into the table may succeed and then entry into the queue may fail - and now you would potentially have bad data in the table.
Your code should always assume that it can retrieve a message containing the same data that was already processed. This is because messages can come back to the queue when workers that are working on them crash or take too long.
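One way to act on that advice is to make the handler idempotent, so that a redelivered message is simply skipped. The sketch below assumes each message carries a stable key; `IdempotentHandler` is a made-up name, and the in-memory set is only for illustration (a real worker would need a durable store).

```csharp
using System;
using System.Collections.Concurrent;

// Wraps a message handler so that a message key seen before is skipped.
// The in-memory set is an assumption for illustration; it does not survive restarts.
public sealed class IdempotentHandler
{
    private readonly ConcurrentDictionary<string, bool> _processed = new();
    private readonly Action<string> _handle;

    public IdempotentHandler(Action<string> handle) => _handle = handle;

    // Returns true if the message was handled, false if it was a duplicate.
    public bool Process(string messageKey, string body)
    {
        if (!_processed.TryAdd(messageKey, true))
        {
            return false; // already handled: ignore the redelivery
        }
        try
        {
            _handle(body);
            return true;
        }
        catch
        {
            // Forget the key so a retry after a failure is not mistaken for a duplicate.
            _processed.TryRemove(messageKey, out _);
            throw;
        }
    }
}
```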
You can use Service Bus. It is like an Azure Storage queue, but it allows messages of 256 KB to 1 MB and offers duplicate detection.
I'm tasked with creating a web application. I'm currently using C# and ASP.NET (MVC, but I doubt it's relevant to the question). I'm a rookie developer and somewhat new to .NET.
Part of the logic in the application I'm building is to make requests to an external SMS gateway by hitting a particular URL with a request, either as part of a user-initiated action in the web app (could be a couple of messages sent) or as part of a scheduled task run daily (could and will be several thousand messages sent).
For the daily task, I am afraid that looping, say, 10,000 times in one thread (especially if I also have to take action depending on the response to each request, like writing to a DB) is not the best strategy, and that I could gain some performance and save time through parallelization.
Ultimately I'm more afraid that thousands of users will, at the same time (very likely), perform the action that triggers a request. With a naive implementation that spawns some kind of background thread (whatever it's called) for each request, I fear a scenario with hundreds or thousands of requests at once.
So if my assumptions are correct, how do I deal with this? Do I have to manually spawn some appropriate number of new Thread()s and coordinate their work from a producer/consumer-like queue, or is there some easy way?
Cheers
If you have to make 10,000 requests to a service then it means that the service's API is anemic - probably CRUD-based, designed as a thin wrapper over a database instead of an actual service.
A single "request" to a well-designed service should convey all of the information required to perform a single "unit of work" - in other words, those 10,000 requests could very likely be consolidated into one request, or at least a small handful of requests. This is especially important if requests are going to a remote server or may take a long time to complete (and 2-3 seconds is an extremely long time in computing).
If you do not have control over the service, if you do not have the ability to change the specification or the API - then I think you're going to find this very difficult. A single machine simply can't handle 10,000 outgoing connections at once; it will struggle with even a few hundred. You can try to parallelize this, but even if you achieve a tenfold increase in throughput, it's still going to take half an hour to complete, which is the kind of task you probably don't want running on a public-facing web site (but then, maybe you do, I don't know the specifics).
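If you do end up parallelizing, it helps to cap the number of requests in flight rather than starting them all at once. This is a minimal sketch using `SemaphoreSlim`; `ThrottledSender` and `sendAsync` are hypothetical names, and a sensible value for `maxConcurrency` depends on what the gateway tolerates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledSender
{
    // Runs sendAsync for every item, with at most maxConcurrency calls in flight.
    // Results come back in the same order as the input items.
    public static async Task<TResult[]> SendAllAsync<TItem, TResult>(
        IEnumerable<TItem> items, Func<TItem, Task<TResult>> sendAsync, int maxConcurrency)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);
        var tasks = items.Select(async item =>
        {
            await gate.WaitAsync();          // wait for a free slot
            try { return await sendAsync(item); }
            finally { gate.Release(); }      // free the slot even if the call fails
        }).ToList();
        return await Task.WhenAll(tasks);
    }
}
```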
Perhaps you could be more specific about the environment, the architecture, and what it is you're trying to do?
In response to your update (possibly having thousands of users all performing an action at the same time that requires you to send one or two SMS messages for each):
This sounds like exactly the kind of scenario where you should be using Message Queuing. It's actually not too difficult to set up a solution using WCF. Some of the main reasons why one uses a message queue are:
There are a large number of messages to send;
The sending application cannot afford to send them synchronously or wait for any kind of response;
The messages must eventually be delivered.
And your requirements fit this like a glove. Since you're already on the Microsoft stack, I'd definitely recommend an asynchronous WCF service backed by MSMQ.
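To illustrate the shape of that design without any MSMQ or WCF setup, here is an in-process stand-in built on `BlockingCollection<T>`. It is only a sketch: `SmsOutbox` and `deliver` are invented names, and unlike MSMQ this queue is not durable, so queued messages are lost if the process dies.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// In-process stand-in for a message queue: callers enqueue and return at once,
// while a single background consumer delivers messages one by one.
public sealed class SmsOutbox : IDisposable
{
    private readonly BlockingCollection<string> _queue = new();
    private readonly Task _consumer;

    public SmsOutbox(Action<string> deliver)
    {
        _consumer = Task.Run(() =>
        {
            // Blocks while the queue is empty; ends once CompleteAdding has been called
            // and the remaining messages have been drained.
            foreach (string message in _queue.GetConsumingEnumerable())
            {
                deliver(message);
            }
        });
    }

    // Fire-and-forget from the caller's point of view.
    public void Enqueue(string message) => _queue.Add(message);

    // Stops accepting messages and waits until everything queued has been delivered.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _consumer.Wait();
        _queue.Dispose();
    }
}
```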
If you are working with SOAP, or some other type of XML request, you may not have an issue dealing with that level of requests in a loop.
I set up something similar using a SOAP server with 4-5K requests with no problem...
A SOAP request to a web service (assuming .NET 2.0 or later) looks something like this:
WebServiceProxyClient myclient = new WebServiceProxyClient();
myclient.SomeOperation(parameter1, parameter2);
myclient.Close();
I'm assuming that this code will be embedded in the business logic that you trigger as part of the user-initiated action or as part of the scheduled task.
You don't need to do anything special in your code to cope with a high volume of users. This will actually be a matter of scaling your platform.
When you say 10,000 requests, what do you mean? 10,000 requests per second, minute, or hour? Is this your page hits per day?
I'd also look into using an AsyncController, so that your site doesn't quickly become completely unusable.
I am working on a class library that retrieves information from a third-party web site. The web site being accessed will stop responding if too many requests are made within a set time period (~0.5 seconds).
The public methods of my library directly relate to a resource or file on the web server. In other words, each time a method is called, an HttpWebRequest is created and sent to the server. If all goes well, an XML file is returned to the caller. However, if this is the second web request in less than 0.5s, the request will time out.
My dilemma lies in how I should handle request throttling (if at all). Obviously, I don't want the caller to sit around waiting for a response -- especially if I'm completely certain that their request will time out.
Would it make more sense for my library to queue and throttle the web requests I create, or should my library simply throw an exception if a client does not wait long enough between API calls?
The concept of a library is to give its client code as little to worry about as possible. Therefore I would make it the library's job to queue requests and return results in a timely manner. In an ideal world you would use a callback or delegate model so that the client code can operate asynchronously, without blocking the UI. You could also offer the option of skipping the queue (and failing if a request is made too soon), and possibly even offer priorities within the queue model.
I also believe it is the responsibility of the library author to default to being a good citizen, and for the library's default operation to comply with the conditions of the data provider.
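As a rough sketch of what "queue and throttle inside the library" could look like, the class below serializes calls and spaces their start times at least a minimum interval apart. `RequestThrottle` is a hypothetical name, and the 0.5 s figure from the question would be passed in as `minInterval`.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Serializes calls and spaces their start times at least minInterval apart.
public sealed class RequestThrottle
{
    private readonly SemaphoreSlim _gate = new(1, 1);
    private readonly Stopwatch _clock = Stopwatch.StartNew();
    private readonly TimeSpan _minInterval;
    private TimeSpan _nextAllowed = TimeSpan.Zero;

    public RequestThrottle(TimeSpan minInterval) => _minInterval = minInterval;

    public async Task<T> RunAsync<T>(Func<Task<T>> request)
    {
        await _gate.WaitAsync(); // callers queue up here, one at a time
        try
        {
            TimeSpan wait = _nextAllowed - _clock.Elapsed;
            if (wait > TimeSpan.Zero)
            {
                await Task.Delay(wait); // too soon after the previous request: hold back
            }
            _nextAllowed = _clock.Elapsed + _minInterval;
            return await request();
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

Because callers await the returned task, a UI thread is not blocked while a request waits its turn.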
I'd say both - you're dealing with two independent systems and both should take measures to defend themselves from excessive load. The web server should refuse incoming connections, and the client library should take steps to reduce the requests it makes to a slow or unresponsive external service. A common pattern for dealing with this on the client is 'circuit breaker' which wraps calls to an external service, and fails fast for a certain period after failure.
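A bare-bones version of the circuit breaker idea might look like the following. It is a sketch only: the class and its thresholds are illustrative, it is not thread-safe, and production code would more likely use an existing resilience library.

```csharp
using System;

// Minimal circuit breaker: after failureThreshold consecutive failures, calls fail
// fast for openDuration instead of hitting the external service. Not thread-safe.
public sealed class CircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private readonly Func<DateTime> _utcNow;
    private int _consecutiveFailures;
    private DateTime _openUntilUtc = DateTime.MinValue;

    public CircuitBreaker(int failureThreshold, TimeSpan openDuration)
        : this(failureThreshold, openDuration, () => DateTime.UtcNow) { }

    // The clock is injectable so the behaviour can be tested without waiting.
    public CircuitBreaker(int failureThreshold, TimeSpan openDuration, Func<DateTime> utcNow)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration;
        _utcNow = utcNow;
    }

    public T Execute<T>(Func<T> call)
    {
        if (_utcNow() < _openUntilUtc)
        {
            throw new InvalidOperationException("Circuit is open; failing fast.");
        }
        try
        {
            T result = call();
            _consecutiveFailures = 0; // a success closes the circuit again
            return result;
        }
        catch
        {
            if (++_consecutiveFailures >= _failureThreshold)
            {
                _openUntilUtc = _utcNow() + _openDuration;
                _consecutiveFailures = 0;
            }
            throw;
        }
    }
}
```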
That's the web server's responsibility, in my opinion. Because the critical load depends on hardware, network bandwidth, and many other things that are outside your application's control, the application should not concern itself with trying to deal with it. IIS can throttle traffic based on various configuration options.
What kind of client is it? Is this an interactive client, for example a GUI-based app?
In that case, you can equate that to a web browser scenario and let the timeout surface to the caller. Also, if you know for sure that this web server is throttling requests, you can tell the client that it has to wait for a given time period before retrying. That way, the client will not keep re-issuing requests, and will know when the first timeout occurs that it is futile to issue requests too fast.
We are using a WCF service layer to return images from a repository. Some of the images are color, multi-page, nearly all are TIFF format. We experience slowness - one of many issues.
1.) What experiences have you had with returning images via WCF?
2.) Do you have any suggestions or tips for returning large images?
3.) All messages are serialized via SOAP, correct?
4.) Does WCF do a poor job of compressing the large TIFF files?
Thanks all!
Okay, just to second the responses by ZombieSheep and Seba Gomez: you should definitely look at streaming your data. By doing so you could seamlessly integrate GZipStream into the process. On the client side you can reverse the compression and convert the stream back to your desired image.
With streaming, only a select number of types can be used as parameters or return types, and you do need to modify your bindings throughout.
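The compression step on its own, independent of any WCF plumbing, could be sketched like this with `System.IO.Compression.GZipStream`. The `GzipStreams` helper name is made up, and how much is gained depends on how compressible the image data is.

```csharp
using System.IO;
using System.IO.Compression;

public static class GzipStreams
{
    // Copies source into destination, compressing on the fly.
    // destination is left open so the caller can rewind or keep writing.
    public static void Compress(Stream source, Stream destination)
    {
        using var gzip = new GZipStream(destination, CompressionMode.Compress, leaveOpen: true);
        source.CopyTo(gzip);
    }

    // Reverses Compress: reads gzip data from source and writes the original bytes.
    public static void Decompress(Stream source, Stream destination)
    {
        using var gzip = new GZipStream(source, CompressionMode.Decompress, leaveOpen: true);
        gzip.CopyTo(destination);
    }
}
```

Both methods work stream to stream, so neither side has to hold the whole image in a byte array.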
Here is the MSDN site on enabling streaming. This is the MSDN page that describes the restrictions on streaming contracts.
I assume you are also controlling the client side code, this might be really hard if you aren't. I have only used streaming when I had control of both the server and client.
Good luck.
If you are using another .NET assembly as your client, you can use two methods for returning large chunks of data: streaming or MTOM.
Streaming will allow you to pass a TIFF image as if it were a normal file stream on the local filesystem. See here for more details on the choices and their pros and cons.
Unfortunately, you're still going to have to transfer a large block of data, and I can't see any way around that, considering the points already raised.
I just wanted to add that it is pretty important to make sure your data is being streamed instead of buffered.
I read somewhere that even if you set transferMode to 'Streamed', the message is not streamed unless you are working with a Stream itself, a Message, or an implementation of IXmlSerializable.
Make sure you keep that in mind.
What bindings are you using? WCF will have some overhead, but if you use basic-http with MTOM you lose most of the base-64 overhead. You'll still have the headers etc.
Another option would be to (wait for it...) not use WCF here - perhaps just a handler (ashx etc) that returns the binary.
Re compression - WCF itself won't have much hand in compression; the transport might, especially via IIS etc with gzip enabled - however, images are notorious for being hard to compress.
In a previous project I worked on, we had a similar issue. We had a web service in C# that received requests for media. A media item could be anything from a file to an image and was stored in a database using BLOB columns. Initially the web method that handled media retrieval requests read the chunk from the BLOB and returned it to the caller. This was one round trip to the server. The problem with this approach is that the client has no feedback on the progress of the operation.
"There is no problem in computer science that cannot be solved by an extra level of indirection."
We started by refactoring the method into three methods.
Method1 sets up the conversation between the caller and the web service. This includes information about the request (like the media Id) and a capabilities exchange. The web service responds with a ticket Id, which the caller uses for future requests. This initial call is used for resource allocation.
Method2 is called repeatedly as long as there is more data to be retrieved for the media. The call includes the current offset and the ticket Id that was provided when Method1 was called. The return value updates the current position.
Method3 is called to finish the request when Method2 reports that reading of the requested media has completed. This frees the allocated resources.
This approach is practical because you can give the user immediate feedback about the progress of the operation. As a bonus, you can split the requests to Method2 across different threads. The progress can then be reported by chunk, as some BitTorrent clients do.
Depending on the size of the BLOB, you can choose to load it from the database in one go or read it in chunks as well. This means that you could use a balanced mechanism that, based on a given watermark (BLOB size), chooses to load it in one go or in chunks.
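The three calls described above can be sketched as an in-memory service. All names here (`MediaTransferService`, `Begin`, `ReadChunk`, `End`, `loadMedia`) are invented for illustration, and this version loads the whole BLOB when the session opens rather than reading it in chunks.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class MediaTransferService
{
    private readonly ConcurrentDictionary<Guid, byte[]> _sessions = new();
    private readonly Func<int, byte[]> _loadMedia; // stands in for the database read

    public MediaTransferService(Func<int, byte[]> loadMedia) => _loadMedia = loadMedia;

    // Method1: open a session and hand back a ticket id plus the total size.
    public (Guid Ticket, int TotalLength) Begin(int mediaId)
    {
        byte[] media = _loadMedia(mediaId);
        Guid ticket = Guid.NewGuid();
        _sessions[ticket] = media;
        return (ticket, media.Length);
    }

    // Method2: return up to maxBytes starting at offset; an empty array means "done".
    public byte[] ReadChunk(Guid ticket, int offset, int maxBytes)
    {
        if (offset < 0) throw new ArgumentOutOfRangeException(nameof(offset));
        byte[] media = _sessions[ticket];
        int count = Math.Min(maxBytes, media.Length - offset);
        if (count <= 0) return Array.Empty<byte>();
        var chunk = new byte[count];
        Array.Copy(media, offset, chunk, 0, count);
        return chunk;
    }

    // Method3: release the resources held for this ticket.
    public bool End(Guid ticket) => _sessions.TryRemove(ticket, out _);
}
```

A caller would loop on `ReadChunk`, advancing `offset` by the length of each chunk and reporting `offset / TotalLength` as progress, then call `End`.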
If there is still a performance issue consider packaging the results using GZipStream or read about message encoders and specifically pay attention to the binary and Message Transmission Optimization Mechanism (MTOM).