Best server design for MongoDB based system (low response times needed) - c#

I am developing a web service which will serve users all over the world.
The server is based on a C# WCF application hosted on IIS.
It uses an MsSQL Configuration Database (access time is not important here),
and a MongoDB database which contains all the important data (access time is VERY important here).
Also it serves small images (48px * 48px JPEGs).
Now, for the image hosting I will probably use Amazon's CloudFront CDN hosting (unless you guys have better suggestions).
My issue is maintaining a low access time (ping) to both the Web Application and the MongoDB.
I was thinking to lease 4 servers in Singapore + US + Europe + Middle East to get a low response time.
Each server will hold the Web Application and an instance of MongoDB.
And one server will hold the MsSQL instance.
I need all MongoDB instances to be synced (not instantly, if that's an issue).
What design would you use?

Low access time is a trade-off between cost and benefit. First you need to decide how low is low enough: do you need an overall response time of 100ms from the app, or 1s?
Once you do, you map out the different costs.
Total time taken = time for request (across the internet) + processing by web app + request for data + preparing the response + response back to client
If your desired latency is 100ms, there is a good chance that it can't be done regardless of how fast your servers are, simply because the network round trip might take too long.
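As a rough sketch of that budget, the stages can be added up and compared against the target. Every number below is a made-up placeholder, not a measurement; the stage names simply mirror the formula above.

```python
# Hypothetical per-stage latencies in milliseconds; replace with measured values.
stages_ms = {
    "request over the internet": 40.0,
    "web app processing": 10.0,
    "data request (MongoDB)": 5.0,
    "preparing the response": 5.0,
    "response back to client": 40.0,
}


def total_latency_ms(stages):
    """Sum the latency of every stage of one request."""
    return sum(stages.values())


def within_budget(stages, budget_ms):
    """True when the summed stages fit inside the latency target."""
    return total_latency_ms(stages) <= budget_ms


total = total_latency_ms(stages_ms)
network = (stages_ms["request over the internet"]
           + stages_ms["response back to client"])
print(f"total: {total:.0f} ms, network share: {network / total:.0%}")
print("fits a 100 ms budget:", within_budget(stages_ms, 100.0))
```

With these placeholder values the two network legs alone use most of the budget, which is the point: faster servers cannot buy that time back.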
You need to analyze your dataset. Querying 1,000 documents is different from querying 1 billion. You need to calculate how much space the indexes take and whether they fit in RAM. If an index is not in RAM, access is going to be slow.
MongoDB configuration
MongoDB can work in a cluster, with automatic syncing (immediate or delayed; this is configurable) and automatic failover (or manual; this is configurable too). It also supports sharding if your dataset is huge, so that a request is sent to the server that actually contains the data.
Similarly, you need to have a look at your app server and figure out how slow/fast components are to get a guaranteed response time.
With the information you have provided, this is about as detailed a response I can give.
Profile and then optimize
If 80% of your requests come from the Middle East, then you ought to make it fast for them first. Using the same principle, you need to figure out the slowest components of the response time and improve them. In order to do that, you need to gather the data.
Clustering
Setting up a cluster in one continent or across continents will help you provide redundancy, automatic failover (if configured), and load balancing (depending on how you configure it). If you have a lot of data, consider sharding.
Consider going through the docs for replication and sharding.
Example Server Setup
Suppose you want to have 10 shards with a replication factor of 3, i.e. your data is divided across 10 shards and each shard is really a replica set of 3 servers (for availability and failover), where each server in the replica set contains a copy of the same data.
Here the notation s1p1 means shard 1, primary 1, and s1s1 means shard 1, secondary 1, and so on:
s1p1 s2p1 ... s10p1
s1s1 s2s1 ... s10s1
s1s2 s2s2 ... s10s2
Shards 1-10 divide the data, with each shard keeping approximately 1/10 of the total. Each shard consists of a replica set with a primary and 2 secondaries. You can increase this if you need more redundancy. Try to keep the number of members odd, so that elections have a tie breaker. If you want to have only 2 copies of the data, you can introduce an "Arbiter" to break the tie.
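To make the layout and the odd-number advice concrete, here is a small sketch that only does the bookkeeping; it does not talk to MongoDB. The function names are invented for illustration, and the vote arithmetic assumes a simple majority of voting members is needed to elect a primary.

```python
def build_layout(num_shards, members_per_shard):
    """Name every server using the s<shard>p1 / s<shard>s<n> notation."""
    layout = {}
    for shard in range(1, num_shards + 1):
        names = [f"s{shard}p1"]
        names += [f"s{shard}s{i}" for i in range(1, members_per_shard)]
        layout[f"shard{shard}"] = names
    return layout


def votes_needed(voting_members):
    """Strict majority of the voting members."""
    return voting_members // 2 + 1


def failures_tolerated(voting_members):
    """How many members can be lost while a majority is still reachable."""
    return voting_members - votes_needed(voting_members)


layout = build_layout(10, 3)
print(layout["shard1"])
print("servers in total:", sum(len(m) for m in layout.values()))
for n in (2, 3, 4, 5):
    print(n, "members -> tolerates", failures_tolerated(n), "failure(s)")
```

Going from 3 to 4 members does not raise the number of tolerated failures, which is why an odd count (or two data-bearing members plus an arbiter) is the usual choice.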
You could analyze your queries, and choose a shard key so that they go to the closest server, or the server which serves the region. You'll most likely have to do some sort of analysis to optimize this bit.
Hope it helps.


How to Minimize Data Transfer Out From Azure Query in C# .NET

I have a small table(23 rows, 2 int columns), just a basic user-activity monitor. The first column represents user id. The second column holds a value that should be unique to every user, but I must alert the users if two values are the same. I'm using an Azure Sql database to hold this table, and Linq to Sql in C# to run the query.
The problem: Microsoft will bill me based on data transferred out of their data centers. I would like all of my users to be aware of the current state of this table at all times, second by second, while keeping data transfer under 5 GB per month. I'm thinking along the lines of a Linq-To-Sql expression such as
UserActivity.Where(x => x.Val == myVal).Count() > 1;
But this would download the table to the client, which cannot happen. Should I be implementing a Linq solution? Or would SqlDataReader download less metadata from the server? Am I taking the right approach by using a database at all? Gimme thoughts!
If it is data transfer you are worried about, you need to do your processing on the server and return only the results. A SqlDataReader solution can return a smaller, already processed set of data to minimise the traffic.
A couple thoughts here:
First, I strongly encourage you to profile the SQL generated by your LINQ-to-SQL queries. There are several tools available for this, here's one at random (I have no particular preference or affiliation):
LINQ Profiler from Devart
Your prior experience with LINQ query inefficiency notwithstanding, the LINQ sample you quote in your question isn't particularly complex so I would expect you could make it or similar work efficiently, given a good feedback mechanism like the tool above or similar.
Second, you don't explicitly mention whether your query client is running in Azure or outside, but I gather from your concern about data egress costs that it's running outside Azure. So the data egress will be query results sent over the TDS protocol (the low-level protocol for SQL Server), which is pretty efficient. Some quick back-of-the-napkin math shows that you should be fine to stay below your monthly 5 GB limit:
23 users
10 hours/day
30 days/month (less if only weekdays)
3600 requests/hour/user
32 bits of raw data per response
= about 95 MB of raw response data per month
Even if you assume 10x overhead of TDS for header metadata, etc. (and if my math is right :-) ) then you've still got plenty of room underneath 5 GB. The point isn't that you should stop thinking about it and assume it's fine... but don't assume it isn't fine, either. In fact, don't assume anything. Test, and measure, and make an informed choice. I suspect you'll find a way to stay well under 5 GB without much trouble, even with LINQ.
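The napkin math above can be checked in a few lines. The inputs are the ones listed in the estimate; the 10x overhead factor is the same pessimistic guess, not a measured TDS figure, and the 5 GB limit is treated as 5 GiB here.

```python
users = 23
hours_per_day = 10
days_per_month = 30
requests_per_hour_per_user = 3600   # one request per second
bytes_per_response = 4              # 32 bits of raw data

requests_per_month = (users * hours_per_day * days_per_month
                      * requests_per_hour_per_user)
raw_bytes = requests_per_month * bytes_per_response
raw_mib = raw_bytes / 2**20

overhead_factor = 10                # assumed protocol overhead, a guess
with_overhead_gib = raw_bytes * overhead_factor / 2**30
limit_gib = 5

print(f"{requests_per_month:,} requests per month")
print(f"{raw_mib:.1f} MiB of raw response data")
print(f"{with_overhead_gib:.2f} GiB with {overhead_factor}x overhead")
print("under the limit:", with_overhead_gib < limit_gib)
```

This only counts response payloads. Request traffic into the data center and connection setup are left out, so treat it as an order-of-magnitude check and measure the real traffic before relying on it.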
One other thought... perhaps you could consider running your query inside Azure, and weigh the cost of that vs. the cost of data egress under the "query running outside Azure" scenario? This could (for example) take the form of a small Azure Web Job that runs the query every second and notifies the 23 users if the count goes above 1.
Azure Web Jobs
In essence, you wouldn't notify them if the condition is false, only when it's true. As for the notification mechanism, there are various cloud-friendly options:
Azure mobile push notifications
SMS messaging
SignalR notifications
The key here is to determine whether it's more cost-effective, and more in line with any bigger-picture technology or business goals, to have each user issue the query continuously, or to use a separate process in Azure that notifies users asynchronously when the "trigger condition" is met.
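A minimal sketch of the "notify only when the condition is true" idea, written against plain tuples rather than a real database or notification service. The function names and the `already_notified` set are hypothetical; in practice the rows would come from one server-side query and the result would be handed to whichever push mechanism you pick.

```python
from collections import defaultdict


def find_conflicts(rows):
    """rows: iterable of (user_id, value). Returns {value: [user_ids]}
    for every value held by more than one user."""
    by_value = defaultdict(list)
    for user_id, value in rows:
        by_value[value].append(user_id)
    return {v: sorted(ids) for v, ids in by_value.items() if len(ids) > 1}


def users_to_notify(rows, already_notified):
    """Users involved in a conflict who have not been told about it yet."""
    conflicts = find_conflicts(rows)
    affected = {uid for ids in conflicts.values() for uid in ids}
    return affected - set(already_notified)


rows = [(1, 10), (2, 20), (3, 10)]          # users 1 and 3 share value 10
print(find_conflicts(rows))
print(sorted(users_to_notify(rows, already_notified={1})))
```

When there is no conflict the result is empty and nothing leaves the data center at all, which is where the egress saving comes from.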
Best of luck!

What is "Usage" in relation to Microsoft Azure billing

This might be a stupid question but I have to ask. I've never used Azure before, but a client is looking to move some SQL databases and their web server to the cloud. On the Azure site they refer to billing for usage per hour.
If I create 10 SQL Databases, is usage considered the actual amount of time they were used by the application, or am I charged for the amount of time I had the database instances themselves? Same with a web application...if the web application goes 2 weeks without any web traffic, does that still count as usage since I have the app live in Azure? If the app is not used then the databases wouldn't be either, so both would be idle and not used at the moment.
I guess I'm just confused as to what the word "usage" is actually referring to.
The meaning of "usage" in Azure varies with the type of resource. For some items, usage is calculated in terms of consumed hours (websites, virtual machines, etc. fall into this group), whereas for others it is calculated in terms of consumed space (Azure Storage is a good example of that).
Also, please note that pricing is not based on utilization (e.g. how many times a website got hit) but on provisioning. So in your example, if a website is provisioned for you, you will pay for it regardless of whether anybody is using that website or not.
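As a toy illustration of provisioning-based billing for hourly resources: the rate below is an invented placeholder (real prices are on the Azure pricing pages), and the point is only that traffic is not an input to the calculation.

```python
HOURS_IN_TWO_WEEKS = 14 * 24


def provisioned_cost_cents(rate_cents_per_hour, hours_provisioned, instances=1):
    """Cost depends on how long the resources exist, not on how busy they are."""
    return rate_cents_per_hour * hours_provisioned * instances


# Hypothetical rate of 2 cents per hour per database, 10 databases, two idle weeks.
idle_fortnight = provisioned_cost_cents(2, HOURS_IN_TWO_WEEKS, instances=10)
print(f"{idle_fortnight / 100:.2f} for two weeks with zero requests")
```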
I would recommend taking a look at the Azure Pricing Calculator to understand approximately how much you are going to pay per resource type.

High performance real time project .NET, SQL Server

I have a demanding project and I need your starting guidelines on this!
I need to have a database with approximately 2.000.000 records with markers lat,lng. These markers are moving objects and update their positions every 10 seconds. If the received marker does not exist in the database it needs to be inserted.
I need somehow the end user to have a realtime data in the web request e.g (www.example.com/getmarkers?minlat=x&maxlat=x&minlng=x&maxlng=x&zoom=x) for the specified zoom and eliminate the markers that overlap each other.
The main server app will receive the update commands via TCP and UDP protocol on multiple ports
Can I use C sharp and a memory datatable to do all these updates every second? Also can the end user hit this datatable so everything stays in memory to be faster? What do you think about performance and what is your opinion for develop a project like this? Real time data is what I need
I prefer to use C# and SQL Server 2008.
Thanks a lot
I'd start off by making estimates based on the following data, with the goal of estimating the number of requests per minute or second.
Average number of moving markers at any time. If you have 200 vehicles to track how many do you expect to be moving simultaneously? Does time of the day matter? If it does make sure you make calculations based on the peak hours.
How many simultaneous requests from users do you expect? If you have 800 users are they going to be using the application throughout the whole day or only several times a day or once a week?
Once you get the data multiply it by at least 3. This will accommodate for all false assumptions you may have made in the calculations and allow for future growth.
Once you get the final number it will be a lot easier to decide whether you need only one or two 6-core CPU servers, four 12-core CPU servers, or a mini data center with in-memory databases and other advanced stuff.
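Here is a rough version of that estimate using the figures from the question (2,000,000 markers, one update every 10 seconds). The moving fraction is a hypothetical input you would have to measure, and the factor of 3 is the safety margin suggested above.

```python
def updates_per_second(markers, update_interval_s, moving_fraction=1.0):
    """Average write rate if `moving_fraction` of the markers report each interval."""
    return markers * moving_fraction / update_interval_s


def with_headroom(rate, factor=3):
    """Pad an estimate to cover wrong assumptions and future growth."""
    return rate * factor


worst_case = updates_per_second(2_000_000, 10)
quarter_moving = updates_per_second(2_000_000, 10, moving_fraction=0.25)

print(f"all markers moving: {worst_case:,.0f} updates/s, "
      f"plan for {with_headroom(worst_case):,.0f}")
print(f"25% moving (assumed): {quarter_moving:,.0f} updates/s, "
      f"plan for {with_headroom(quarter_moving):,.0f}")
```

Read requests from end users need the same treatment and are not covered here.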

Live Analytic Data

I'm planning on creating a live analytics page for my website, a bit like Google Analytics but with real live data which will change as new users load a page on my site etc.
The main site is/will be written using Asp.Net/C# as the back end with a MS SQL database and the front end will support things like JavaScript (JQuery), CSS3, HTML5 (If required).
I was wondering what methods I can use to build the live analytics, in terms of: how to get the data onto the analytics page, what efficient graphing I can use, and how to store the data with fast input/output.
The first thing that came to my mind is to use Node.js. Could I use this to achieve a live analytics page? Is it a good idea? Are there any better alternatives? Any drawbacks with this?
Would I need a C# Application running on a server to use Node.js to send/receive all the data to and from the website?
Would using a MS SQL database be fast enough? Would I need to store all the data live, or could I store it in chunks every x amount of seconds/minutes? (Which would be more efficient?)
This illustrates my initial thoughts on the matter -
Edit:
I'm going to be using this system over multiple sites, I could be getting 10 hits at a time to around 1,000,000 (Highly unlikely, but still possible). I want to be able to scale this system and adapt it to the environment it's in.
It really depends on how "real time" the realtime data needs to be. For example, I made this recently:
http://www.reed.co.uk/labs/realtime/
Which shows job applications coming into the system. Obviously there is way too much going on during busy periods to actually be querying the main database in real time, so what we do is query a sliding "window" and cache it on the server. This is a chunk of the last 5 minutes' worth of events.
We then play this back to the user as if it's happening "now". Having a little latency as part of an SLA (where the users don't really care) can make the whole system vastly more scalable.
[EDIT- further explanation]
The data is just retrieved from a basic stored procedure call. Naturally, a big system like reed has hundreds of transactions/second, so we can't keep hitting the main cluster for every user.
All we do is make sure we have a current window, in this case the last 5 minutes of data, cached on the server. When a client comes to the site, we get that last 5 minutes of data and play it back like it's happening right now. The end user is none the wiser, but it means that all clients are reading off the cache. Once the cache is 5 minutes old, we invalidate it and start again. This means a maximum of 1 DB hit every five minutes, thus making the system vastly more scalable (not that it really needs to be, as it's just for fun, really).
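The caching pattern described here could look roughly like the sketch below. It is not the code behind the page linked above; the class name, the loader function and the fake clock are all invented for illustration, and the clock is injectable only so the expiry can be demonstrated without waiting five minutes.

```python
import time


class WindowCache:
    """Keep one shared copy of the latest window and reload it once it expires."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader        # function that queries the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        expired = self._loaded_at is None or now - self._loaded_at >= self._ttl
        if expired:
            self._value = self._loader()
            self._loaded_at = now
        return self._value


db_hits = [0]            # counts how often the "database" is queried
fake_now = [0.0]         # stand-in clock so the demo runs instantly


def load_last_five_minutes():
    db_hits[0] += 1
    return [f"events from load #{db_hits[0]}"]


cache = WindowCache(load_last_five_minutes, ttl_seconds=300.0,
                    clock=lambda: fake_now[0])

for _ in range(1000):    # a thousand clients inside the same window
    cache.get()
hits_in_first_window = db_hits[0]

fake_now[0] = 301.0      # the window is now older than five minutes
cache.get()
hits_after_expiry = db_hits[0]
print(hits_in_first_window, hits_after_expiry)
```

A production version on a multi-threaded web server would also need a lock around the reload so that concurrent requests do not all hit the database when the window expires.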
Just so you are aware, Google Analytics already offers live user tracking. When inside the dashboard of a site on Google Analytics, click the home button on the top bar, and then the real-time button on the left bar. Considering the design work and quality of this service, it may be a better option than attempting to recreate it. If you do choose to create your own, you can at least use their service as a benchmark for the desired features.
Using an API like Google's charting API https://developers.google.com/chart/ would be a good approach to displaying your stored data with decreased development time. If you provide more information on the number of hits you expect, and the scale of the server this software will be hosted on, it will be easier to give you answers to the speed questions.

Reducing roundtrips to the server/database

When writing ASP.NET pages, what signs do you look for that your page is making too many roundtrips to a database or server?
(This is a general question but I say ASP.NET as the majority of my coding is on the web side of things).
How much is too much? The €1M question! Profile. Then profile. If your app is spending most of its time doing data access, you have a problem (and should look at a sql trace). If it is spending most of its time drawing the UI, then (assuming your view isn't doing data access) you should probably look elsewhere first...
Round trips are more relevant to latency than the total quantity of data being moved, so it really does make sense to optimize for them. The usual way is to use stored procedures that do multiple steps, perhaps even returning multiple result sets.
What I do is I look at the ASP performance counters and SQL performance counters. To get an accurate measurement you must ensure that there is no random noise activity on the SQL Server (ie. import batches running unrelated to the web site).
The relevant counters I look at are:
SQL Statistics/Batch Requests/sec: This indicates exactly how many Transact-SQL batches the server receives. It can, in most cases, be equated 1:1 with the number of round trips from the web site to SQL.
Databases/Transactions/sec: This counter is instanced per database, so I can quickly see in which database there is 'activity'. This way I can tell apart the web site data round trips (i.e. my app logic requests, which go to the app database) and the ASP session state and user stuff (which goes to the ASP session db or tempdb).
Databases/Write Transactions/sec: This I correlate with the counter above (transactions per second) so I can get a feel for the read-to-write ratio the site is doing.
ASP.NET Applications/Requests/sec: With this counter I can get the number of requests/sec the site is seeing. Correlated with the number of SQL Batch Requests/sec it gives a good indication of the average number of round-trips per request.
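Once you have samples of those counters, the correlation is plain division. The sample values below are invented; only the arithmetic is the point, and the write share assumes Transactions/sec counts both reading and writing transactions.

```python
def round_trips_per_request(batch_requests_per_sec, web_requests_per_sec):
    """Average SQL batches issued per web request over the sampling period."""
    if web_requests_per_sec <= 0:
        raise ValueError("need a positive web request rate")
    return batch_requests_per_sec / web_requests_per_sec


def write_share(transactions_per_sec, write_transactions_per_sec):
    """Fraction of transactions that modified data."""
    if transactions_per_sec <= 0:
        raise ValueError("need a positive transaction rate")
    return write_transactions_per_sec / transactions_per_sec


# Hypothetical counter samples taken over the same interval.
avg_trips = round_trips_per_request(batch_requests_per_sec=1200,
                                    web_requests_per_sec=40)
writes = write_share(transactions_per_sec=900, write_transactions_per_sec=90)
print(f"{avg_trips:.0f} round trips per request, {writes:.0%} writes")
```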
The next thing is usually to get a feel for where the time is spent in the request. On my own projects I make abundant use of performance counters that I publish myself, so this is really easy to measure. But I'm not always lucky enough to be cleaning up only my own mess... Profiling is usually not an option for me, because most of the time I troubleshoot live production systems that I cannot instrument.
My approach is to try to sort out the SQL side of things first, since it's easy to find the relevant statistics for execution times in SQL: SQL keeps a nice aggregated statistic ready to look at in sys.dm_exec_query_stats. I can also use Profiler to measure execution duration in real time. With some analysis of the numbers collected, and knowing the normal request pattern of the most visited pages, you can give a pretty good estimate of the total time spent in SQL per web request. If this time adds up to nearly all the time it takes to serve the page, then you have your answer.
And to answer the original question title: to reduce the number of round-trips, you make fewer requests. Seriously. First, caching what is appropriate to cache I guess is obvious. Second you reduce the complexity: don't display unnecessary data on each page, you cache and display stale data when you can get away with it, you hide details on secondary navigation panels.
If you feel that the problem is the number of round-trips per se as opposed to the number of requests (ie. you would benefit tremendously from batching multiple requests in one round-trip), then you should somehow measure that the round-trip overhead is what's killing you. With connection pooling on a normal network connection this is usually not the most important factor.
And finally you should look if everything that can be done in sets is done in sets. If you have some half-brained ORM that retrieves objects one at a time from an ID keyset, get rid of it.
I know that this may sound repetitive, but the number of client-server round trips depends on how much program logic is located on each side of the connection.
First, check validation: you always have to validate and sanitize your input on the server side, but that does not mean you cannot also do it on the client side, which removes round trips that are used only to check input.
Second: what can you do on the client side to reduce server-side load? There are calculations that you can check or perform on the client side. There is also AJAX, which can be used to load only the part of the page that is changing.
Third: can you delegate work to another server? If your server is too loaded, why not use web services or simply delegate some part of the logic to another server?
As Mark wrote: how much is too much? It is up to you and your budget.
When writing ASP.NET pages, what signs do you look for that your page is making too many roundtrips to a database or server?
Of course it all depends and you have to profile. However, here are some indicators; they do not prove there is a problem, but they often point to one:
Page is taking a very long time to render locally.
Read this question: Slow response-time cheat sheet, and in particular this link.
To render the page you need more than 30 round trips. I pulled that number out of my hat, but assuming a round trip is taking about 3.5ms then 30 round trips will kick you over the 100ms guideline (before any other kind of processing).
All the queries involved in rendering the page are heavily optimized and do not take longer than a millisecond or two to execute. There are no operations that require lots of CPU cycles that execute every time you render the page.
Data access is abstracted away and not cached in any kind of way. If, for example, GetCustomer will call the DAL which in turn issues a query and your page is asking for 100 Customer objects which are not retrieved in a batch, you are probably in trouble.
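To illustrate the last indicator, here is a small sketch comparing one-at-a-time lookups with a single batched query. It uses an in-memory SQLite database purely so it runs anywhere, so the statement count stands in for network round trips; the table, the function names and the counter are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(i, f"customer-{i}") for i in range(1, 201)],
)

query_count = [0]   # every call to run() would be one round trip to a real server


def run(sql, params=()):
    query_count[0] += 1
    return conn.execute(sql, params).fetchall()


def get_customers_one_by_one(ids):
    """What a naive GetCustomer loop does: one query per object."""
    return [run("SELECT id, name FROM customers WHERE id = ?", (i,))[0]
            for i in ids]


def get_customers_batched(ids):
    """Fetch the whole set with one parameterized IN (...) query."""
    placeholders = ",".join("?" for _ in ids)
    sql = (f"SELECT id, name FROM customers "
           f"WHERE id IN ({placeholders}) ORDER BY id")
    return run(sql, tuple(ids))


wanted = list(range(1, 101))

query_count[0] = 0
slow = get_customers_one_by_one(wanted)
one_by_one_queries = query_count[0]

query_count[0] = 0
fast = get_customers_batched(wanted)
batched_queries = query_count[0]

print(one_by_one_queries, "queries vs", batched_queries)
print("same rows:", slow == fast)
```

At the 3.5ms per round trip assumed above, the first version would spend about 350ms just waiting on the network for 100 customers, while the batched version pays that cost once.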
