How to design a parallel web API in C#?

I am trying to design a web API that gets data from an external server, but with limitations. I'm trying to figure out how best to design it to be efficient.
My API has an endpoint that takes an input: a username qualified by a domain, like tom@domain.com. My endpoint then makes an HTTP call to the domain to get an auth token, then makes another call to that domain with the username to get some data, which is returned to the client. However, my API can accept multiple usernames (comma delimited, like ?users=tom@domain.a.com,bill@domain.b.com). My web server knows, for each domain, the maximum number of parallel connections I can make to get the data.
So the problem is how to organize the data so I can maximize parallelism but stay within the limits.
Here are my thoughts:
First, parse the user list and group the users by domain. Then have a static dictionary. The key is the domain; the value is a custom object with two queues. Both queues hold Tasks (from async/await), but the first queue's maximum length is the connection limit for that domain.
?users=bill@D.com, max@D.com, sarah@A.com, tom@D.com
dictionary = {
    "D.com" : [
        [],
        ["bill@D.com", "max@D.com", "tom@D.com"]
    ],
    "A.com" : [
        [],
        ["sarah@A.com"]
    ]
}
Then I can run code every second that loops through all dictionary values and fills the first queue with as many Task objects from the second queue as will fit (i.e. removing them from the second queue and putting them in the first) while staying within the limit.
As soon as a task is in the first queue, it executes using Parallel.Invoke(); when the task completes, it gets removed from the first queue (unless some request is waiting for it, as explained in the next paragraph).
I do this because if another API request is made to my endpoint with names that are already part of the first request, I want to reuse the work. So if a name is in the first queue, I just await that Task.
Somehow, when a task finishes, I need to know that no other callers are waiting on that user's task, and in that case remove it from the first queue. Also, if a client disconnects, it should stop watching those users for that client.
Does anyone know if this is a good approach?

Since it's parallel, you know right away you're probably going to need to use System.Collections.Concurrent, and since you need key/value lookup (user identifier/HTTP response) you need a ConcurrentDictionary. And since there is a common cache for all users, you will want to store it in a static variable, which is available to all threads and all HTTP requests.
Here is a simple example:
public class MyCacheClass
{
    // Store the list of users/requests.
    private static ConcurrentDictionary<string, Task<HttpResponseMessage>> _cache =
        new ConcurrentDictionary<string, Task<HttpResponseMessage>>();

    // Get from the ConcurrentDictionary, or add if it's not there. Note the lambda
    // overload: passing GetResponse(key) directly would start the HTTP call even
    // when the key is already cached.
    public async Task<HttpResponseMessage> GetUser(string key)
    {
        return await _cache.GetOrAdd(key, k => GetResponse(k));
    }

    // You just need to implement this method, potentially in a subclass, to get the data.
    protected virtual async Task<HttpResponseMessage> GetResponse(string key)
    {
        // In production you'd want to reuse a single HttpClient instance.
        var httpClient = new HttpClient();
        var url = string.Format(@"http://www.google.com?q={0}", key);
        return await httpClient.GetAsync(url);
    }
}
Then to get a user's information, just call:
var o = new MyCacheClass();
var userInfo = await o.GetUser(userID);
Note: If you're going to use code like this on a production system, you might consider adding some means of purging or trimming the cache after a period of time or when it reaches a certain size. Otherwise your solution may not scale the way you need it to.
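The cache above doesn't enforce the per-domain connection limits from the question. A minimal sketch of one way to add that, assuming the limit for each domain is known (the names and URL shape here are hypothetical), is a SemaphoreSlim per domain:
public class ThrottledClient
{
    // One gate per domain; its initial count is that domain's parallel-connection limit.
    private static readonly ConcurrentDictionary<string, SemaphoreSlim> _gates =
        new ConcurrentDictionary<string, SemaphoreSlim>();

    private static readonly HttpClient _http = new HttpClient();

    public static async Task<HttpResponseMessage> GetUserAsync(string user, string domain, int maxParallel)
    {
        var gate = _gates.GetOrAdd(domain, _ => new SemaphoreSlim(maxParallel));
        await gate.WaitAsync();
        try
        {
            // At most maxParallel requests per domain are in flight here;
            // the rest queue up on the semaphore.
            return await _http.GetAsync(string.Format("http://{0}/api/user/{1}", domain, user));
        }
        finally
        {
            gate.Release();
        }
    }
}
With this, you can start a task for every requested user and Task.WhenAll them; the semaphores do the queueing that the two-queue design did by hand.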

Related

Task is not running asynchronously

I have wrapped the action in Task.Run, but it seems that I am missing something very basic and am unable to figure it out.
public void SaveOrderList(List<Order> inputList)
{
    Dictionary<string, string> result = new Dictionary<string, string>();
    string code = string.Empty;
    Task.Run(() =>
    {
        foreach (var item in inputList)
        {
            code = CreateSingleOrder(item);
            result.Add(item.TicketNumber, code);
        }
        ////TODO: Write logic to send mail
        emailSender.SendEmail("abc@xyz.com");
    });
}
Since there can be many entries in inputList, and each entry may take 5 seconds to process, I don't want the UI to be blocked for the end user. Instead, I will send a mail notifying how many were processed successfully and which ones failed.
To achieve this, the best approach I knew was Task.Run. But the problem is that as soon as the function completes, I don't see that the code inside the foreach loop ever worked, because nothing ever made it to the DB.
Can anyone help me figure out what I am missing here?
Just for information, this function is called from a Web API endpoint, and the Web API POST method is called from JavaScript. Below is the code for the Web API endpoint.
[HttpPost, Route("SaveOrderList")]
[ResponseType(typeof(bool))]
public IHttpActionResult SaveOrderList(List<Order> orderList)
{
    orderManagerService.SaveOrderList(orderList);
    return this.Ok();
}
Thanks in advance for the help.
You need to consider carefully how this works. There are a few suggestions in this article:
https://blog.stephencleary.com/2014/06/fire-and-forget-on-asp-net.html
But I would point out that 'fire and forget' on a web application is usually the wrong approach.
For your example, you really want to consider your UX - if I make an order on your site and then only find out some time later that the order failed (via email, which I may not be checking), I'd not be too impressed. It would be better to await the save result, or make multiple API requests for single order items and show the incremental result of successful orders on your front end.
I'd also suggest a hard look at why your order saving is so slow - this will continue to be problematic for you until it's faster.
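For instance, here's a minimal sketch of the "await the save result" option (CreateSingleOrderAsync is a hypothetical async version of your CreateSingleOrder):
[HttpPost, Route("SaveOrderList")]
[ResponseType(typeof(Dictionary<string, string>))]
public async Task<IHttpActionResult> SaveOrderList(List<Order> orderList)
{
    var result = new Dictionary<string, string>();
    foreach (var item in orderList)
    {
        // Awaiting keeps the work inside the request, so ASP.NET won't tear it
        // down mid-loop; the thread is freed while each order is processed.
        result[item.TicketNumber] = await orderManagerService.CreateSingleOrderAsync(item);
    }
    return Ok(result);
}
The client then gets the per-order outcome directly in the response instead of waiting for an email.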

Send and return a variable with a C# API call?

I have a C# script task in an SSIS package designed to geocode data through my company's proprietary system. It currently works like this:
1) Pull a query of addresses and put it in a data table.
2) Loop through that table and, for each row, build a request, send it, wait for the response, then insert the result back into the database.
The issue is that each call takes forever to return, because before going out and getting a new address on the API side, it checks a current database (string match) to ensure the address does not already exist. If it doesn't exist, it then goes out and gets new data from a service like Google.
Because I'm doing one at a time, it is easy to keep the ID field with the record when I go back to insert it into the database.
Now comes the issue at hand... I was told to configure this as multi-threaded or asynchronous. Here is the page I was reading on here about this topic:
ASP.NET Multithreading Web Requests
var urls = new List<string>();
var results = new ConcurrentBag<OccupationSearch>();
Parallel.ForEach(urls, url =>
{
    WebRequest request = WebRequest.Create(url);
    string response = new StreamReader(request.GetResponse().GetResponseStream()).ReadToEnd();
    var result = new JsonSerializer().Deserialize<OccupationSearch>(new JsonTextReader(new StringReader(response)));
    results.Add(result);
});
Perhaps I'm thinking about this wrong, but if I send two requests (A and B) and let's say B actually returns first, how can I ensure that when I go back to update my database I'm updating the correct record? Can I send the ID with the API call and return it?
My thought is to create an array of requests, burn through them without waiting for responses, return the values in another array, and then loop through that on my insert statement.
Is this a good way of going about this? I've never used Parallel.ForEach, and all the info I find on it is too technical for me to visualize and apply to my situation.
Perhaps I'm thinking about this wrong, but if I send two requests (A and B) and let's say B actually returns first, how can I ensure that when I go back to update my database I'm updating the correct record? Can I send the ID with the API call and return it?
None of your code contains anything that looks like an "ID," but I assume everything you need is in the URL. If that is the case, one simple answer is to use a Dictionary instead of a Bag.
List<string> urls = GetListOfUrlsFromSomewhere();
var results = new ConcurrentDictionary<string, OccupationSearch>();
Parallel.ForEach(urls.Distinct(), url =>
{
    WebRequest request = WebRequest.Create(url);
    string response = new StreamReader(request.GetResponse().GetResponseStream()).ReadToEnd();
    var result = new JsonSerializer().Deserialize<OccupationSearch>(new JsonTextReader(new StringReader(response)));
    results.TryAdd(url, result);
});
After this code is done, the results dictionary will contain entries that correlate each response back to the original URL.
Note: you might want to use HttpClient instead of WebRequest, and you should take care to dispose of your disposable objects, e.g. the StreamReader and StringReader.
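For reference, a sketch of what that could look like with HttpClient and async/await instead of Parallel.ForEach (same hypothetical OccupationSearch type; JsonConvert is from Newtonsoft.Json):
public async Task<IDictionary<string, OccupationSearch>> FetchAllAsync(IEnumerable<string> urls)
{
    var results = new ConcurrentDictionary<string, OccupationSearch>();
    using (var httpClient = new HttpClient())
    {
        var tasks = urls.Distinct().Select(async url =>
        {
            // The url is captured alongside its response, so correlation
            // survives out-of-order completion.
            string response = await httpClient.GetStringAsync(url);
            results.TryAdd(url, JsonConvert.DeserializeObject<OccupationSearch>(response));
        });
        await Task.WhenAll(tasks);
    }
    return results;
}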

ASP.NET Web API: How to create a persistent collection across requests?

I have a Web API providing a backend to an Angular.JS web application. The backend API needs to track the state of user activities. (Example: it needs to note which content ID a user last retrieved from the API)
Most access to the API is authenticated via username/password. For these instances, it works fine for me to store the user state in our database.
However, we do need to allow "guest" access to the service. For guests, the state does need to be tracked but should not be persisted long-term (e.g. session-level tracking). I'd really like to not have to generate "pseudo users" in our user table just to store the state for guest users, which does not need to be maintained for a significant period of time.
My plan is to generate a random value and store it in the client as a cookie. (for guests only - we use bearer authentication for authenticated users.) I would then store whatever state is necessary in an in-memory object, such as a Dictionary, using the random value as a key. I could then expire items off the dictionary periodically. It is perfectly acceptable for this data to be lost if the Web API is ever relaunched, and it would even be acceptable for the dictionary to be reset say, every day at a certain time.
What I don't know how to do in WebAPI is create the dictionary object, so that it will persist across Web API calls. I basically need a singleton dictionary object that will maintain its contents for as long as the server is running the Web API (barring a scheduled clearing or programmatic flushing)
I had the idea of dumping the Dictionary off to disk every time an API call is made, and then reading it back in when it's needed, but this does not allow for multiple simultaneous in-flight requests. The only method I can think of right now is to add another database table (guest_state or something) that replicates the users table, and then set up some manual method to regularly clean out the data in the guest table.
Summary: what I need is
- a way to store some data persistently in a Web API backend without having to go off to a database
- preferably, to store this data in a Dictionary object so I can use randomly generated session IDs as keys and an object to store the state
- data that is OK to be cleared after a set period of time or on a regular basis (not too frequently; maybe a minimum of six hours' persistence)
I figured out a solution using the Singleton pattern:
public static class Services
{
    // A plain Dictionary is not safe for concurrent writes from multiple
    // requests, so use a ConcurrentDictionary; the lock only guards lazy creation.
    private static ConcurrentDictionary<string, string> cache;
    private static object cacheLock = new object();

    public static ConcurrentDictionary<string, string> AppCache
    {
        get
        {
            lock (cacheLock)
            {
                if (cache == null)
                {
                    cache = new ConcurrentDictionary<string, string>();
                }
                return cache;
            }
        }
    }
}
public class TestController : ApiController
{
    [HttpGet]
    public HttpResponseMessage Persist()
    {
        HttpResponseMessage hrm = Request.CreateResponse();
        hrm.StatusCode = HttpStatusCode.OK;
        Services.AppCache.TryAdd(Guid.NewGuid().ToString(), DateTime.Now.ToString());
        string resp = "";
        foreach (string s in Services.AppCache.Keys)
        {
            resp += String.Format("{0}\t{1}\n", s, Services.AppCache[s]);
        }
        resp += String.Format("{0} records.", Services.AppCache.Keys.Count);
        hrm.Content = new StringContent(resp, System.Text.Encoding.ASCII, "text/plain");
        return hrm;
    }
}
It seems the Services.AppCache object successfully holds onto data until either the idle timeout expires or the application pool recycles. Luckily I can control all of that in IIS, so I moved my app to its own AppPool and set up the idle timeout and recycling as appropriate, based on when I'm OK with the data being flushed.
Sadly, if you don't have control over IIS (or can't ask the admin to set the settings for you), this may not work if the default expirations are too soon for you... At that point using something like a LocalDB file or even a flat JSON file might be more useful.
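Another option worth sketching, since it avoids depending on IIS settings entirely: System.Runtime.Caching.MemoryCache gives you per-entry expiration in-process (the six-hour window below just mirrors the requirement above):
using System.Runtime.Caching;

public static class GuestState
{
    private static readonly MemoryCache _cache = MemoryCache.Default;

    public static void Set(string sessionId, object state)
    {
        // Each entry expires on its own schedule, independent of
        // app pool idle timeouts.
        _cache.Set(sessionId, state, new CacheItemPolicy
        {
            AbsoluteExpiration = DateTimeOffset.Now.AddHours(6)
        });
    }

    public static object Get(string sessionId)
    {
        return _cache.Get(sessionId); // null if missing or expired
    }
}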

Calling method async from WCF service and return immediately

A third party is calling our WCF service. The caller wants confirmation, within a small timeframe, that the sent records have been received and stored.
The records that are stored need some lengthy processing. Can the processing be executed asynchronously, right after storing the records, so the confirmation can be sent immediately?
Of course there could be a separate process that does the processing, but the question is whether I can combine storage and processing without timing out.
Update:
It looks like this works:
var aTask = new Task(myService.TheMethod);
aTask.Start();
return aVariableAsync;
Or is this a very bad idea to do from within my WCF host, because...?
You can set "AsyncPattern" to true on the OperationContract attribute as described on MSDN.
You can then control the concurrency using the following attribute on the service method:
[ServiceBehavior(ConcurrencyMode = ConcurrencyMode.Multiple)]
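For reference, a minimal sketch of the Begin/End shape that AsyncPattern expects (the service and Record type here are hypothetical):
[ServiceContract]
public interface IRecordService
{
    // WCF pairs BeginStoreRecords/EndStoreRecords into a single logical
    // StoreRecords operation on the wire.
    [OperationContract(AsyncPattern = true)]
    IAsyncResult BeginStoreRecords(Record[] records, AsyncCallback callback, object state);

    bool EndStoreRecords(IAsyncResult result);
}
On .NET 4.5 and later, WCF also accepts Task-returning operations directly, which is usually simpler than the Begin/End pair.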
Yes, it can be done. I don't have a ton of experience with it, but here is a snippet of some code showing that. The service calls this method on the controller, which saves an XML message to the hard drive, kicks off a separate task to process it into MongoDB, and returns a message back to the service that it was successfully saved.
public string SaveTransaction(XElement pTransactionXml, string pSavePath)
{
    // Save the transaction to the drive locally.
    pTransactionXml.Save(pSavePath);
    ...
    // Fire-and-forget: the task is deliberately not awaited, so the
    // confirmation below is returned immediately.
    var mongoTask = Task.Run(async () =>
    {
        await SendXMLFilesToMongo(pSavePath);
    });
    // "response" comes from the elided code above.
    return response.WithResult("Successfully saved to disk.");
}

public virtual async Task<int> SendXMLFilesToMongo(string pSavePath)
{
    // Call the code to save to Mongo and do additional processing.
}

Finding Connection by UserId in SignalR

I have a webpage that uses AJAX polling to get stock market updates from the server. I'd like to use SignalR instead, but I'm having trouble understanding how/if it would work.
OK, it's not really stock market updates, but the analogy works.
The SignalR examples I've seen send messages to either the current connection, all connections, or groups. In my example the stock updates happen outside of the current connection, so there's no such thing as the 'current connection'. And a user's account is associated with a few stocks, so sending a stock notification to all connections or to groups doesn't work either. I need to be able to find a connection associated with a certain userId.
Here's a fake code example:
foreach (var stock in StockService.GetStocksWithBigNews())
{
    var userIds = UserService.GetUserIdsThatCareAboutStock(stock);
    var connections = /* find connections associated with user ids */;
    foreach (var connection in connections)
    {
        connection.Send(...);
    }
}
In this question on filtering connections, they mention that I could keep current connections in memory, but (1) it's bad for scaling and (2) it's bad for multi-node websites. Both of these points are critically important to our current application. That makes me think I'd have to send a message out to all nodes to find the users connected to each node >> my brain explodes in confusion.
THE QUESTION
How do I find a connection for a specific user that is scalable? Am I thinking about this the wrong way?
I created a little project last night to learn this too. I used the 1.0 alpha and it was straightforward. I created a Hub, and from there on it just worked :)
In my project I have N compute units (some servers processing work); when they start up, they invoke ComputeUnitRegister:
await HubProxy.Invoke("ComputeUnitRegister", _ComputeGuid);
and every time they do something they call
HubProxy.Invoke("Running", _ComputeGuid);
where HubProxy is:
HubConnection Hub = new HubConnection(RoleEnvironment.IsAvailable ?
    RoleEnvironment.GetConfigurationSettingValue("SignalREndPoint") :
    "http://taskqueue.cloudapp.net/");
IHubProxy HubProxy = Hub.CreateHubProxy("ComputeUnits");
I used RoleEnvironment.IsAvailable because I can now run this as an Azure role, a console app, or whatever else in .NET 4.5. The Hub is placed in an MVC4 website project and is started like this:
GlobalHost.Configuration.ConnectionTimeout = TimeSpan.FromSeconds(50);
RouteTable.Routes.MapHubs();

public class ComputeUnits : Hub
{
    public Task Running(Guid MyGuid)
    {
        return Clients.Group(MyGuid.ToString()).ComputeUnitHeartBeat(MyGuid,
            DateTime.UtcNow.ToEpochMilliseconds());
    }

    public Task ComputeUnitRegister(Guid MyGuid)
    {
        Groups.Add(Context.ConnectionId, "ComputeUnits").Wait();
        return Clients.Others.ComputeUnitCameOnline(new { Guid = MyGuid,
            HeartBeat = DateTime.UtcNow.ToEpochMilliseconds() });
    }

    public void SubscribeToHeartBeats(Guid MyGuid)
    {
        Groups.Add(Context.ConnectionId, MyGuid.ToString());
    }
}
My clients are JavaScript clients that have methods for this (let me know if you need to see that code too). Basically, they listen for ComputeUnitCameOnline, and when it fires they call SubscribeToHeartBeats on the server. This means that whenever the server compute unit is doing some work it will call Running, which will trigger a ComputeUnitHeartBeat on the JavaScript clients.
I hope you can use this to see how groups and connections can be used. And lastly, it can also be scaled out over multiple Azure roles by adding a few lines of code:
GlobalHost.HubPipeline.EnableAutoRejoiningGroups();
GlobalHost.DependencyResolver.UseServiceBus(
    serviceBusConnectionString,
    2, // number of topics to split across
    3, // number of nodes
    GetRoleInstanceNumber(),
    topicPathPrefix /* the prefix applied to the name of each topic used */
);
You can get the connection string from the Service Bus on Azure; remember the Provider=SharedSecret part. When you add the NuGet package, the connection-string syntax is also pasted into your web.config.
2 is how many topics to split it across. Topics can contain 1 GB of data, so depending on performance you can increase it.
3 is the number of nodes to split it out on. I used 3 because I have two Azure instances plus my localhost. You can get the role number like this (note that I hard-coded my localhost to 2):
private static int GetRoleInstanceNumber()
{
    if (!RoleEnvironment.IsAvailable)
        return 2;
    var roleInstanceId = RoleEnvironment.CurrentRoleInstance.Id;
    var li1 = roleInstanceId.LastIndexOf(".");
    var li2 = roleInstanceId.LastIndexOf("_");
    var roleInstanceNo = roleInstanceId.Substring(Math.Max(li1, li2) + 1);
    return Int32.Parse(roleInstanceNo);
}
You can see it all live at : http://taskqueue.cloudapp.net/#/compute-units
When using SignalR, after a client has connected to the server, they are served a connection ID (this is essential to providing real-time communication). Yes, this is stored in memory, but SignalR can also be used in multi-node environments. You can use the Redis or even the SQL Server backplane (more to come), for example. So, long story short, we take care of your scale-out scenarios for you via backplanes/service bus without you having to worry about it.
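To tie that back to the question: one common pattern, sketched here with hypothetical hub and method names, is to put each connection in a group keyed by user ID on connect, then target that group from wherever the stock updates originate; the backplane makes the group addressable from any node:
public class StockHub : Hub
{
    public override Task OnConnected()
    {
        // Group-per-user: any node can later address "user-<id>"
        // without tracking connection IDs yourself.
        return Groups.Add(Context.ConnectionId, "user-" + Context.User.Identity.Name);
    }
}

// Elsewhere (e.g. the job that notices big stock news), outside any hub:
public void NotifyBigNews()
{
    var hub = GlobalHost.ConnectionManager.GetHubContext<StockHub>();
    foreach (var stock in StockService.GetStocksWithBigNews())
    {
        foreach (var userId in UserService.GetUserIdsThatCareAboutStock(stock))
        {
            hub.Clients.Group("user-" + userId).stockUpdated(stock);
        }
    }
}

// At startup, for multi-node deployments (Redis backplane):
// GlobalHost.DependencyResolver.UseRedis("server", 6379, "password", "StockApp");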
