Maintaining Data across machines in the Cloud - C#

I'm working on a Cloud-Hosted ZipFile creation service.
This is a Cross-Origin WebApi2 service used to provide ZipFiles from a file system that cannot host any server side code.
The basic operation goes like this:
User makes a POST request with a string[] of Urls that correlate to file locations
WebApi reads the array into memory, and creates a ticket number
WebApi returns the ticket number to the user
AJAX callback then redirects the user to a web address with the ticket number appended, which returns the zip file in the HttpResponseMessage
In order to handle the ticket system, my design approach was to set up a global Dictionary that pairs a randomly generated 10-digit number to a List<string> value, with the dictionary backed by a Queue storing 10,000 entries at a time. ([Reference here][1])
This is partially because WebApi does not support Cache.
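For illustration, a minimal sketch of that kind of in-memory ticket store (class and member names are mine, not the actual code):

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;

    // Illustrative in-memory ticket store: a ConcurrentDictionary paired with a
    // queue that evicts the oldest ticket once 10,000 entries are stored.
    // This only works while every request lands on the same machine.
    public static class TicketStore
    {
        private const int MaxTickets = 10000;
        private static readonly ConcurrentDictionary<string, List<string>> Tickets =
            new ConcurrentDictionary<string, List<string>>();
        private static readonly ConcurrentQueue<string> Order = new ConcurrentQueue<string>();
        private static readonly Random Rng = new Random();

        public static string Add(List<string> urls)
        {
            string ticket;
            lock (Rng) { ticket = Rng.Next(1000000000, int.MaxValue).ToString(); } // ~10-digit key
            Tickets[ticket] = urls;
            Order.Enqueue(ticket);

            // Evict the oldest tickets once the queue grows past the cap.
            string oldest;
            while (Order.Count > MaxTickets && Order.TryDequeue(out oldest))
            {
                List<string> removed;
                Tickets.TryRemove(oldest, out removed);
            }
            return ticket;
        }

        public static bool TryGet(string ticket, out List<string> urls)
        {
            // TryGetValue avoids the "key not present" exception described below.
            return Tickets.TryGetValue(ticket, out urls);
        }
    }

As the rest of the question shows, the real problem is not this data structure but that each of the three cloud machines ends up with its own copy of it.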
When I make my AJAX call locally, it works 100% of the time. When I make the call remotely, it works about 20% of the time.
When it fails, this is the error I get:
The given key was not present in the dictionary.
Meaning, the ticket number was not found in the Global Dictionary Object.
We (with the help of Stack) tracked down the issue to multiple servers in the Cloud.
In this case, there are three.
That doesn't mean there is a one-in-three chance of this working; what seems to be going on is this:
Calls made while the browser is on the cloud site work 100% of the time, because the same machine handles the whole operation end-to-end.
Calls made from other sites work far less often, because there is no continuity between the machine that takes the AJAX call and the machine that takes the subsequent REDIRECT to the website to download the file. It's simple luck of the draw whether the same machine handles both.
Now, I'm sure we could create a database to handle requests, but that seems like a lot more work to maintain state among these machines.
Is there any non-database way for these machines to maintain the same Dictionary across all sessions that doesn't involve setting up a fourth machine just to handle the queue?

Is the reason for the dictionary simply to have a queue of operations?
It seems you either need:
A third machine that hosts the queue (despite your objection). If you're using Azure, an obvious choice might be the distributed Azure Cache Service (a sketch follows this list).
To forget about the dictionary and just have the server package and deliver the requested result, perhaps in an asynchronous operation.
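For option 1, a hedged sketch of what the ticket store could look like backed by a cache shared by all three machines, here using the StackExchange.Redis client (the endpoint, key prefix and expiry are placeholders, and any distributed cache with get/set semantics would do):

    using System;
    using System.Collections.Generic;
    using StackExchange.Redis;

    // Sketch only: stores the URL list for a ticket in a cache shared by all
    // web servers, so whichever machine receives the redirect can find it.
    public class RedisTicketStore
    {
        // Placeholder connection string; use your actual cache endpoint.
        private static readonly ConnectionMultiplexer Redis =
            ConnectionMultiplexer.Connect("mycache.redis.cache.windows.net:6380,password=...,ssl=true");

        public void Save(string ticket, IEnumerable<string> urls)
        {
            IDatabase db = Redis.GetDatabase();
            // One cache entry per ticket; expire it so abandoned tickets clean themselves up.
            db.StringSet("ticket:" + ticket, string.Join("\n", urls), TimeSpan.FromMinutes(30));
        }

        public List<string> Load(string ticket)
        {
            IDatabase db = Redis.GetDatabase();
            RedisValue value = db.StringGet("ticket:" + ticket);
            return value.IsNullOrEmpty
                ? null
                : new List<string>(value.ToString().Split('\n'));
        }
    }

The point of the design is simply that the ticket lookup no longer depends on which machine handled the original POST.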

If your ASP.NET web app uses session state, you will need to configure an external session state provider (either the Redis Cache Service or a SQL Server session state provider).
There's a step-by-step guide here.

Related

What is an efficient way of checking if a URL corresponds to the current machine

I have a C# application acting as an HTTP server which hypothetically can be reached at example1.com, example2.com, etc.
The server does not have this information at startup. Instead, it looks at the "host" field in every HTTP request to learn its "known names" and populates a list, i.e., ('example1.com', 'example2.com', 'localhost')
If the server receives an incorrect or malicious HTTP request with an invalid host field, it will still add the wrong hostname.
I want to check the host field on HTTP requests coming into my server to see if they correspond to the current machine. Is it possible to do this without any additional network requests?
The app would need to test whether it's actually example.com. I don't see any other (reliable) solution. You can't necessarily rely on DNS lookups since the webapp could have a private address.
You can set up a special endpoint for these tests. The flow I imagine is something like this:
The server receives a request for www.example.com/blah.html
It's the first time that the application is asked about www.example.com so to make sure that it really is www.example.com, it generates and stores a large random number, say 123456 and an index, say 5.
The application then sends a challenge to www.example.com/verify_hostname, passing the index as a parameter (i.e. www.example.com/verify_hostname?index=5).
The thread that handles the verification request looks up the stored random number by its index, and responds with 123456.
After receiving a response of 123456, the server now knows that it really is accessible through www.example.com, at least for now.
Of course other variations of this solution are possible.
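A rough sketch of that flow, assuming a ConcurrentDictionary for the shared storage and HttpClient for the self-call (the endpoint path and types are illustrative):

    using System;
    using System.Collections.Concurrent;
    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;

    // Sketch of the self-verification flow: store a random challenge, then call
    // ourselves back through the claimed hostname and compare the answer.
    public class HostnameVerifier
    {
        // Shared between the requesting thread and the /verify_hostname handler.
        private static readonly ConcurrentDictionary<int, string> Challenges =
            new ConcurrentDictionary<int, string>();
        private static int _nextIndex;

        public static async Task<bool> VerifyAsync(string claimedHost)
        {
            int index = Interlocked.Increment(ref _nextIndex);
            string secret = Guid.NewGuid().ToString("N");

            // Store before issuing the challenge, to avoid the race condition noted below.
            Challenges[index] = secret;

            using (var client = new HttpClient())
            {
                string url = "http://" + claimedHost + "/verify_hostname?index=" + index;
                string answer = await client.GetStringAsync(url);
                return answer == secret;
            }
        }

        // What the /verify_hostname endpoint would return for a given index.
        public static string HandleVerificationRequest(int index)
        {
            string secret;
            return Challenges.TryGetValue(index, out secret) ? secret : string.Empty;
        }
    }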
Note that this approach leverages the fact that the application's threads have shared memory to store the random number and index. If the webapp is deployed in a cluster, you'd need to replace this simple authentication scheme. One way to do it would be using a shared secret and challenge-response, or some other cryptographic solution.
One other thing - in the solution I propose there's an inherent race condition between the thread storing the random number and index, and the thread verifying them. You'd need to make sure that the chosen random number and index are stored before the verification thread tries to read them. Various eventual consistency collections won't be enough to guarantee it, so they might fail from time to time.

How to transfer context to a WebSocket session on reconnect?

I am working on a web application in C#, ASP.NET, and .NET framework 4.5 with the use of WebSockets. In order to plan for scalability in the future, the application pool has the option for web gardens enabled to simulate multiple web servers on my single development machine.
The issue I am having is how to handle re-connects on the websocket side. When a new websocket session is initially created, the client browser can indirectly lock records in a SQL database. But when the connection is lost, my boss would like the browser to attempt to re-connect to the same instance of the websocket server session so it doesn't need to re-lock anything.
I don't know if something like this is possible because on re-connect the load balancer will "randomly" select which web server to handle the new connection. I was thinking of some hack to work around this but it isn't very clean:
Client opens initial websocket connection on Server A and locks a record.
Client temporarily loses internet connection and the websocket closes. (It is important to note that the server side will wait up to 60 seconds before it "disposes" itself; therefore, the SQL record will remain locked until the 60 seconds have elapsed.)
Client internet connection is restored and reconnects to the website but this time on Server B.
Server B sees that this context was initially connected on Server A; therefore, transfers the session to Server A.
Server A checks the process id to see if it is running in the correct worker process (in the case of a web garden).
Server A has found the initial instance and handles the connection.
I tried Googling this question but it doesn't seem like a very common issue, because I don't think most websocket web apps keep records locked for as long as my application does (which could be up to an hour).
Thanks in advance for all of your help!
Update 3/15/2016
I was hoping that the Server.TransferRequest would have been helpful however it doesn't seem to work for web sockets. Would anyone know of a way to best transfer a websocket context from one process to another?
First, you might want to re-examine why you're locking records for a long time and requiring a client to come back to the same server every time. That is not the usual shape of a high-scale web architecture, and you may only be creating this need to reconnect to the identical server because of that locking requirement; if you rethink that part of the design, your application might work just fine no matter which host a user connects to.
That would certainly simplify scaling to large numbers of users and servers if you could remove that requirement. You can always then implement local caching and semi-sticky connections later as a performance enhancement, but only after you release the requirement to 100% of the time connect to the same host.
If you're going to stick with that requirement to always connect to the same host, then you will ultimately need some sort of sticky load balancing. There are a lot of different schemes. Some are driven by the networking infrastructure in front of your server, some are driven by your server and some are even client driven. They all have different tradeoffs. Here's a brief run-down of some of the schemes:
Hardware, networking load balancer. Here you have a fairly transparent mechanism by which a hardware load balancer (which is really just software running on a custom piece of hardware) sits in front of your web server farm and uses various techniques to make sure whatever server a given user is originally connected to it will get reconnected to on subsequent connections. This can be based on various schemes (IP address, cookie value, etc...) as the key to identifying a particular user and it typically has a number of possible configurations for how it can work.
Proxy load balancer. This is essentially an all software version of the hardware load balancer. Here a proxy sits in front of your server farm and directs connections to a particular server based on some algorithm (IP address, cookie value, etc...).
Server Redirect. Here an incoming connection is randomly assigned to a server. Upon connection, the server figures out where the connection is supposed to go and returns a 302 redirect to the actual host, causing the client to reconnect to the proper server. This involves one less layer of infrastructure (no physical load balancers), but exposes the different server endpoints to the outside world, which the first two options do not.
Client Selection Algorithm. Here the client is given knowledge of the various server endpoints and is coded with an algorithm for consistently selecting one for this user. It could be a hash of a userID that is then divided into the server bucket pool, and the end result is that the client ends up choosing a particular DNS name such as cl003.myserver.com which it then connects to. This choice requires the least work server-side, so it can be simpler to implement, but it requires changing the client code in order to modify the algorithm (a sketch follows this list).
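A minimal sketch of that client-selection scheme (the pool size and host naming pattern are made up):

    using System;
    using System.Security.Cryptography;
    using System.Text;

    // Sketch: deterministically map a user id to one of N server endpoints so the
    // same user always reconnects to the same host (e.g. cl003.myserver.com).
    public static class StickyHostSelector
    {
        private const uint ServerCount = 8; // illustrative pool size

        public static string HostFor(string userId)
        {
            using (var md5 = MD5.Create())
            {
                byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(userId));
                uint bucket = BitConverter.ToUInt32(hash, 0) % ServerCount;
                return string.Format("cl{0:D3}.myserver.com", bucket + 1);
            }
        }
    }

The same hash has to be used by every client build, which is why changing the algorithm later means shipping new client code.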
For an article on sticky load balancing for Amazon Web Services to give you an idea on how one mechanism works, you can read this: Elastic Load Balancing: Configure Sticky Sessions for Your Load Balancer.
Here's another article on how the nginx proxy is configured for sticky load balancing.
You can find lots of other articles with a Google search for "sticky load balancing".
A discussion of the pros/cons of the various schemes is the subject of a much longer discussion and some of it involves knowledge of more specific requirements and specific capabilities of your infrastructure.

Notifications from Salesforce to .net application

We are building an ASP.NET web application (forms authenticated) which pushes all of its data to Salesforce. To minimize the number of API calls to Salesforce and reduce response time for the end user, when a user logs in we store all of the contact information in the session object. The problem is: when someone changes information in Salesforce, how can the ASP.NET web application find out, so it can query the updated information again and refresh the session object?
I know there is a Salesforce listener we can use to have notifications sent in the form of outbound messages. But I'm wondering how I can manage to update the currently running session object for a contact in the ASP.NET web application.
Your inputs are valuable to me.
If you do have access to the listener and you can use it to push events - then I think an approach like this would likely minimize events/API calls tremendously.
The remote service - SalesForce
The local service - a WCF/SOAP Kind of service
The web application - the ASP.NET app that you are referring to
The local cache - a caching system (could be filesystem, could be more elaborate)
First of all, you should look into creating a very simple local service whose purpose is to receive API calls from SalesForce whenever the data that matters to you is changed. When such a call is received, you should update a local cache with the new values. The web application should always check first whether the requested item is in the local cache; if it is not, you can allow it to make an API call to the remote service in order to retrieve the data. Once the data is retrieved, update the local cache and display it. From this point forward, unless data changes (which SalesForce should push to you and therefore to your local cache), you should never have to make an API call again.
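A hedged sketch of that cache-aside plus push-update flow using System.Runtime.Caching (the Contact type and ISalesforceClient wrapper are placeholders for whatever your Salesforce access layer looks like):

    using System;
    using System.Runtime.Caching;

    // Sketch: local cache checked first; Salesforce only hit on a miss, and the
    // push listener overwrites the entry when Salesforce reports a change.
    public class ContactCache
    {
        private static readonly MemoryCache Cache = MemoryCache.Default;
        private readonly ISalesforceClient _salesforce; // placeholder for your API wrapper

        public ContactCache(ISalesforceClient salesforce)
        {
            _salesforce = salesforce;
        }

        public Contact GetContact(string contactId)
        {
            var cached = Cache.Get("contact:" + contactId) as Contact;
            if (cached != null)
                return cached; // no API call at all on a hit

            // Miss: one API call, then keep the result locally.
            Contact fresh = _salesforce.GetContact(contactId);
            Cache.Set("contact:" + contactId, fresh, DateTimeOffset.MaxValue);
            return fresh;
        }

        // Called by the local service endpoint that receives Salesforce outbound messages.
        public void OnSalesforcePush(Contact updated)
        {
            Cache.Set("contact:" + updated.Id, updated, DateTimeOffset.MaxValue);
        }
    }

    // Placeholder types for the sketch.
    public class Contact { public string Id; public string Name; }
    public interface ISalesforceClient { Contact GetContact(string contactId); }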
You could even evolve to pushing data when it is created in SalesForce and also doing a massive series of API calls to SalesForce when the new local service is in place and the remote service is properly configured. This will then give you a solution where the "internet could die" and you would still have access to the local cache and therefore data.
The only challenge here is that I don't know if SalesForce outgoing API calls can be retried easily if they fail (in case the local service does go down, or the internet does, or SalesForce is not available) in order to keep eventual consistency.
If the local cache is the Session object (which I don't recommend because it's volatile) just integrate the local service and the web application into the same umbrella (same app).
The challenges here are
Make sure changes (including creations and deletions) trigger the proper calls from the remote service to the local service
Make sure the local cache is up to date - eventual consistency should be fine as long as it only takes minutes to update it locally when changes occur - a good design should be within 30 seconds if all services are operating normally
Make sure that you can push any changes back to SalesForce
Don't trust the network - it will eventually fail - account for that possibility
Good luck, hope this helps
Store the values in the cache, and set the expiration time of the entry to be something low enough that when a change is made, the update will be noticed quickly enough. For you that may be a few hours, it may be less, or it could be days.
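If you go the expiration route rather than push, a short sketch with MemoryCache (the two-hour window is only an example):

    using System;
    using System.Runtime.Caching;

    // Sketch: cache the Salesforce data with an absolute expiration so stale
    // values age out on their own, no outbound messages required.
    public static class ExpiringContactCache
    {
        private static readonly MemoryCache Cache = MemoryCache.Default;

        public static void Store(string contactId, object contactData)
        {
            Cache.Set("contact:" + contactId, contactData,
                      DateTimeOffset.UtcNow.AddHours(2)); // example expiry window
        }

        public static object Load(string contactId)
        {
            return Cache.Get("contact:" + contactId); // null after expiry => re-query Salesforce
        }
    }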

How to get end user Machine name in IIS logs

IIS 7.0 and above. No load balancer involved in this setup. The file being requested is a small spacer image which can be requested synchronously or loaded asynchronously using jQuery. The file itself is not important; it is just a way to get the end user to hit this IIS server for analytics.
I have a requirement to capture the machine name of visitors from IIS logs. The current log already has the client IP address in there. The problem is that IPs are short-lived in our environment, and if I don't resolve one to a machine name soon enough, it is not useful. So we need the machine name for a visiting IP determined pretty much in real time.
What is a good approach to go about this? These are the options I found:
1) Enable reverse DNS lookup in IIS -> http://www.expta.com/2010/01/how-to-enable-reverse-dns-lookup-in-iis.html. This affects server performance, and I am worried it will end up holding the user request and cause the page to load slowly due to the increased expense of the reverse lookup operation.
2) Write an IIS log module that enhances logging by doing a reverse lookup of IPs and writing machine names in the log. >> I'm afraid this will slow the request turnaround time for the end user and affect server performance due to the reverse DNS lookup. This is pretty much me doing point 1 above instead of relying on Microsoft's built-in capability. In the end the real-time reverse DNS lookups will affect performance.
3) Same as point 1 or 2 above, but I will change the HTML of the page users are hitting to load the IIS-hosted image file using an async JavaScript call (as opposed to an inline call). That way the end user doesn't have to wait for this IIS request to complete and can have the rest of the page (the content that matters to them) load without depending on the spacer image request to complete. But the browser will still dedicate one thread to the async image loading, and it is still a performance hit for the end user.
4) Just use default IIS logging to log in real time. Have a separate C# app read the file every 5 minutes or so, detect the new lines added, parse them to get the IP, do a reverse lookup, find the machine name and log it to a database or flat file as required (a rough sketch of this approach follows this list). The flip side is that I now need to process entries pretty much in real time, because if I don't, the IP might have been assigned to a different machine by the time my application reads the log, finds it and does a reverse lookup on it. I also have to deal with the complexity of reading only the newly inserted log entries since the previous read, etc.
5) http://www.iis.net/learn/extensions/advanced-logging-module/advanced-logging-for-iis-real-time-logging -> I guess this is the same as point 2 above, except it is written in VC++ instead of C#, so the same disadvantages apply.
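For option 4, a rough sketch of the tailing side (the field index assumes the default W3C field order; check your #Fields header, and treat the log path as a placeholder):

    using System;
    using System.IO;
    using System.Net;

    // Sketch: re-read an IIS W3C log from the last known position, pull the
    // client IP field and resolve it while the DHCP lease is still fresh.
    public class LogTailer
    {
        private long _lastPosition; // where the previous pass stopped

        public void ProcessNewEntries(string logPath)
        {
            using (var stream = new FileStream(logPath, FileMode.Open,
                                               FileAccess.Read, FileShare.ReadWrite))
            using (var reader = new StreamReader(stream))
            {
                stream.Seek(_lastPosition, SeekOrigin.Begin);
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    if (line.StartsWith("#")) continue; // W3C header lines

                    string[] fields = line.Split(' ');
                    if (fields.Length <= 8) continue;
                    string clientIp = fields[8]; // c-ip in the default W3C field order

                    try
                    {
                        string machineName = Dns.GetHostEntry(clientIp).HostName;
                        // write clientIp + machineName to your database or flat file here
                    }
                    catch (System.Net.Sockets.SocketException)
                    {
                        // no PTR record / lookup failed; log the raw IP instead
                    }
                }
                // Approximate: assumes the last line read was complete.
                _lastPosition = stream.Position;
            }
        }
    }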
So every method out there seems to have downsides. What do you think is a good way to go about solving this problem?
Reversing IP to machine name is not possible due to the way routing works - many machines can come via the same IP.
If you have found a way to map an IP to a machine name that is acceptable to you, one approach could be to simply have the site serving the image do all the necessary discovery in the normal request handler. This way you may also have more information about the user (cookies, authentication headers, ...). This approach may be more flexible than configuring IIS logging.
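A hedged sketch of that handler idea: an IHttpHandler that serves the spacer image immediately and pushes the reverse lookup onto a background task (the image path and the analytics sink are placeholders):

    using System;
    using System.IO;
    using System.Net;
    using System.Threading.Tasks;
    using System.Web;

    // Sketch: serve the 1x1 spacer image right away, then resolve the client IP
    // on a background task so the user's request is never held up by the lookup.
    public class SpacerImageHandler : IHttpHandler
    {
        // Placeholder: bytes of a 1x1 transparent GIF shipped with the site.
        private static readonly byte[] SpacerGif =
            File.ReadAllBytes(HttpRuntime.AppDomainAppPath + "spacer.gif");

        public bool IsReusable { get { return true; } }

        public void ProcessRequest(HttpContext context)
        {
            string clientIp = context.Request.UserHostAddress;

            context.Response.ContentType = "image/gif";
            context.Response.BinaryWrite(SpacerGif);

            // Fire-and-forget lookup so the response is not delayed.
            Task.Run(() =>
            {
                try
                {
                    string machineName = Dns.GetHostEntry(clientIp).HostName;
                    // write clientIp + machineName to your analytics store here
                }
                catch (System.Net.Sockets.SocketException)
                {
                    // no reverse record; store just the IP
                }
            });
        }
    }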

WCF - solution architecture

I am working on a project in which a WCF service will be consumed by iOS apps. The number of hits expected on the web server at any given point in time is around 900-1000. Every request may take 1-2 seconds to complete, and the same number of requests is expected every second, 24/7.
This is my plan:
Write WCF RESTful service (the instance context mode will be percall).
Request/Response will be in Json.
There is some information that needs to be persisted on the server - this information is actually received from another remote system - and it is shared among all the requests. Since using a database may not be a good idea (response time is very important - 2 seconds is the maximum the customer can wait), would it be good to keep it in server memory (say a static Dictionary - assume this dictionary will be a collection of 150,000 objects, each consisting of 5-7 string members and their keys)? I know, this is volatile!
Each request will spawn a new thread (by using Threading.Timers) to do some cleanup - this thread will do some database read/write as well.
Now, if there is a load balancer introduced sometime later, the in-memory stored objects cannot be shared between requests routed through another node - any ideas?
I hope you gurus could help me by throwing your comments/suggestions on the entire architecture, WCF throttling, object state persistence etc. Please provide some pointers on the required Hardware as well. We plan to use Windows 2008 Enterprise Edition server, IIS and SQL Server 2008 Std edition database.
Adding more to #3:
As I said, we get some information into the service from a remote system. On the web server where the WCF service is hosted, a client of the remote system will be installed, and the WCF service references one of this client's DLLs to get the information, in the form of a Hashtable (that method returns a Hashtable - around 150,000 objects will be in this collection). Would you suggest writing this information to the database, and having the iOS requests (every second) that reach the service retrieve it from the database directly? Would that perform better than consuming it directly from this Hashtable if it is made static?
Since you are using Windows Server 2008 I would definitely use the Windows Server App Fabric Cache to store your state:
http://msdn.microsoft.com/en-us/library/ff383813.aspx
It is free to use, well supported and integrated, and is (more or less) API compatible with the Windows Azure App Fabric Cache if you ever shift your service to Azure. In our company (disclaimer: not my team) we used to use MemCache but changed to the App Fabric Cache and don't regret it.
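A minimal sketch of what using the AppFabric client API looks like (the cache name is whatever you configured; endpoints come from the dataCacheClient config section):

    using Microsoft.ApplicationServer.Caching;

    // Sketch: the shared AppFabric cache replaces the static Dictionary, so every
    // server behind the load balancer reads and writes the same 150,000 entries.
    // Cached objects must be serializable.
    public class SharedStateCache
    {
        // Parameterless factory reads endpoints from the dataCacheClient config section.
        private static readonly DataCacheFactory Factory = new DataCacheFactory();
        private static readonly DataCache Cache = Factory.GetCache("MyServiceCache"); // your cache name

        public void Put(string key, object value)
        {
            Cache.Put(key, value);
        }

        public object Get(string key)
        {
            return Cache.Get(key);
        }
    }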
Let me throw out some comments/suggestions based on my experience serving a similar volume of requests under the WCF framework (3.5, back in the day).
I don't agree with #3. Using a database here is the right thing to do. To address response time, implement caching, and possibly cache dependencies in order to keep the data synchronized across all instances (assuming that you are load balanced) (also see App Fabric, suggested above/below). In real-world scenarios data changes, often, and you must minimize the impact.
We used Barracuda hardware and software to handle scalability as far as I can tell.
Consider indexing keys/values with Lucene if applicable. Lucene delivers extremely good performance when it comes to read/write. Do not use it to store your entire data set; read from it. A life saver if used correctly. Note that it could be complicated to implement in a load-balanced environment.
Basically, caching might be the only necessary change to your architecture.
