I have a demanding project and I need your starting guidelines on this!
I need to have a database with approximately 2.000.000 records with markers lat,lng. These markers are moving objects and update their positions every 10 seconds. If the received marker does not exist in the database it needs to be inserted.
I need the end user to get real-time data from a web request, e.g. (www.example.com/getmarkers?minlat=x&maxlat=x&minlng=x&maxlng=x&zoom=x), filtered for the specified zoom level, with markers that overlap each other eliminated.
The main server app will receive the update commands via TCP and UDP on multiple ports.
Can I use C# and an in-memory DataTable to do all these updates every second? Also, can the end user query this DataTable directly so that everything stays in memory and is faster? What do you think about performance, and what is your opinion on developing a project like this? Real-time data is what I need.
I prefer to use C# and SQL Server 2008.
Thanks a lot
I'd start off by making estimates based on the following data, with the goal of estimating the number of requests per minute or per second.
Average number of moving markers at any time. If you have 200 vehicles to track how many do you expect to be moving simultaneously? Does time of the day matter? If it does make sure you make calculations based on the peak hours.
How many simultaneous requests from users do you expect? If you have 800 users are they going to be using the application throughout the whole day or only several times a day or once a week?
Once you get the data multiply it by at least 3. This will accommodate for all false assumptions you may have made in the calculations and allow for future growth.
Once you have the final number it will be a lot easier to decide whether you need only one or two 6-core servers, four 12-core servers, or a mini data center with in-memory databases and other advanced stuff.
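As a rough illustration of that estimate, here is a minimal sketch in Python (the arithmetic ports directly to C#). The function names are hypothetical. The figures of 2,000,000 markers and a 10-second update interval come from the question, and the factor of 3 is the safety margin suggested above. The fraction of markers moving at peak is an assumption you would need to measure.

```python
import math

def peak_rate(sources: int, interval_s: float, active_fraction: float = 1.0) -> float:
    """Events per second from `sources` objects that each report every `interval_s` seconds.

    `active_fraction` is the (assumed) share of objects that are moving at peak time.
    """
    return sources * active_fraction / interval_s

def with_headroom(rate: float, factor: float = 3.0) -> int:
    """Multiply the estimate by a safety factor and round up."""
    return math.ceil(rate * factor)

# Worst case from the question: every marker reports every 10 seconds.
updates_per_sec = peak_rate(2_000_000, 10)
design_target = with_headroom(updates_per_sec)

print(f"peak updates/sec: {updates_per_sec:,.0f}")
print(f"design target with 3x headroom: {design_target:,}")
```

The same two functions can be reused for the read side once you have an estimate of concurrent users and how often each one polls the map.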
I have a small table(23 rows, 2 int columns), just a basic user-activity monitor. The first column represents user id. The second column holds a value that should be unique to every user, but I must alert the users if two values are the same. I'm using an Azure Sql database to hold this table, and Linq to Sql in C# to run the query.
The problem: Microsoft will bill me based on data transferred out of their data centers. I would like all of my users to be aware of the current state of this table at all times, second by second, and to keep data transfer under 5 GB per month. I'm thinking along the lines of a LINQ to SQL expression such as
UserActivity.Where(x => x.Val == myVal).Count() > 1;
But this would download the table to the client, which cannot happen. Should I be implementing a Linq solution? Or would SqlDataReader download less metadata from the server? Am I taking the right approach by using a database at all? Gimme thoughts!
If it is data transfer you are worried about, you need to do your processing on the server and return only the results. A SqlDataReader solution can return a smaller, already processed set of data to minimise the traffic.
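To make "process on the server, return only the result" concrete, here is a minimal sketch in Python using the built-in sqlite3 module as a stand-in for Azure SQL (an assumption made only so the example runs anywhere). The table name UserActivity and the column Val come from the question; the column name UserId and the sample rows are hypothetical. The point is that the database does the counting and only a single number crosses the wire.

```python
import sqlite3

def has_duplicate(conn: sqlite3.Connection, my_val: int) -> bool:
    """Ask the server to count matching rows; only one integer comes back."""
    row = conn.execute(
        "SELECT COUNT(*) FROM UserActivity WHERE Val = ?", (my_val,)
    ).fetchone()
    return row[0] > 1

# In-memory stand-in for the real 23-row table (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserActivity (UserId INTEGER PRIMARY KEY, Val INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO UserActivity VALUES (?, ?)", [(1, 10), (2, 20), (3, 10)]
)

print(has_duplicate(conn, 10))
print(has_duplicate(conn, 20))
```

In C# the equivalent would be a parameterized command whose result is read as a scalar, rather than materializing rows on the client.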
A couple thoughts here:
First, I strongly encourage you to profile the SQL generated by your LINQ-to-SQL queries. There are several tools available for this, here's one at random (I have no particular preference or affiliation):
LINQ Profiler from Devart
Your prior experience with LINQ query inefficiency notwithstanding, the LINQ sample you quote in your question isn't particularly complex so I would expect you could make it or similar work efficiently, given a good feedback mechanism like the tool above or similar.
Second, you don't explicitly mention whether your query client is running in Azure or outside, but I gather from your concern about data egress costs that it's running outside Azure. So the data egress costs are going to be query results using the TDS protocol (low-level protocol for SQL Server), which is pretty efficient. Some quick back-of-the-napkin math shows that you should be fine to stay below your monthly 5 GB limit:
23 users
10 hours/day
30 days/month (less if only weekdays)
3600 requests/hour/user
32 bits of raw data per response
= about 95 MB of raw response data per month
Even if you assume 10x overhead of TDS for header metadata, etc. (and if my math is right :-) ) then you've still got plenty of room underneath 5 GB. The point isn't that you should stop thinking about it and assume it's fine... but don't assume it isn't fine, either. In fact, don't assume anything. Test, and measure, and make an informed choice. I suspect you'll find a way to stay well under 5 GB without much trouble, even with LINQ.
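For anyone who wants to check the napkin math, here it is spelled out as a short Python sketch. All inputs are the figures listed above; the 10x overhead factor is the same deliberately pessimistic assumption, not a measured TDS number.

```python
users = 23
hours_per_day = 10
days_per_month = 30
requests_per_hour_per_user = 3600   # one request per second
bytes_per_response = 32 // 8        # 32 bits of raw data

requests_per_month = (
    users * hours_per_day * days_per_month * requests_per_hour_per_user
)
raw_bytes = requests_per_month * bytes_per_response

raw_mib = raw_bytes / 2**20                 # raw payload in MiB
with_overhead_gib = raw_bytes * 10 / 2**30  # assumed 10x protocol overhead, in GiB

print(f"requests per month: {requests_per_month:,}")
print(f"raw payload: {raw_mib:.1f} MiB")
print(f"with 10x overhead: {with_overhead_gib:.2f} GiB")
```

Changing any of the inputs (more users, larger responses, a higher polling rate) shows quickly how much margin is left before the 5 GB limit.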
One other thought... perhaps you could consider running your query inside Azure, and weigh the cost of that vs. the cost of data egress under the "query running outside Azure" scenario? This could (for example) take the form of a small Azure Web Job that runs the query every second and notifies the 23 users if the count goes above 1.
Azure Web Jobs
In essence, you wouldn't notify them if the condition is false, only when it's true. As for the notification mechanism, there are various cloud-friendly options:
Azure mobile push notifications
SMS messaging
SignalR notifications
The key here is to determine whether it's more cost-effective and in line with any bigger-picture technology or business goals to have each user issue the query continuously, or to use some separate process in Azure to notify users asynchronously if the "trigger condition" is met.
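A minimal sketch of the "notify only when the condition is true" idea, written in Python for brevity. The function names and the notify callback are hypothetical; in a real job the callback would wrap whichever notification mechanism you choose, and the rows would come from the database query.

```python
from collections import Counter

def duplicated_values(rows):
    """rows: list of (user_id, val). Return {val: [user_ids]} for values held by 2+ users."""
    counts = Counter(val for _, val in rows)
    result = {}
    for user_id, val in rows:
        if counts[val] > 1:
            result.setdefault(val, []).append(user_id)
    return result

def check_and_notify(rows, notify):
    """Call notify(user_id, val) only for users whose value collides. Return messages sent."""
    sent = 0
    for val, user_ids in duplicated_values(rows).items():
        for user_id in user_ids:
            notify(user_id, val)
            sent += 1
    return sent

# Example run with made-up data: users 1 and 3 share the value 10.
messages = []
check_and_notify([(1, 10), (2, 20), (3, 10)], lambda u, v: messages.append((u, v)))
print(messages)
```

When no values collide, nothing is sent at all, which is what keeps outbound traffic close to zero in the normal case.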
Best of luck!
I am developing a web service which will serve users all over the world.
The server is based on a C# WCF application hosted on IIS.
It uses an MsSQL Configuration Database (access time is not important here),
and a MongoDB database which contains all the important data (access time is VERY important here).
Also it serves small images (48px * 48px JPEGs).
Now, for the image hosting I will probably use Amazon's CloudFront CDN hosting (unless you guys have better suggestions).
My issue is maintaining a low access time (ping) to both the Web Application and the MongoDB.
I was thinking to lease 4 servers in Singapore + US + Europe + Middle East to get a low response time.
Each server will hold the Web Application and an instance of MongoDB.
And one server will hold the MsSQL instance.
I need all MongoDB instances to be synced (not instantly, if that's an issue).
What design would you use?
Low access time is a function of cost vs benefit. First you need to identify, how low is low. Do you need a response time of 100ms overall from the app? or 1s?
Once you do, you map out the different costs.
Total time taken = time for request + //across internet
processing by web app + request for data +
preparing the response + response back to client.
If your desired latency is 100ms, there is a good chance that it can't be done regardless of how fast your servers are, simply because network traffic might take too long.
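The breakdown above can be turned into a small budget check. This is a sketch in Python with made-up numbers: the 60 ms one-way network time and the server-side timings are purely illustrative assumptions, and you would replace them with measurements from your own users and servers.

```python
def total_latency_ms(request_net, app_processing, data_request,
                     response_prep, response_net):
    """Sum of the components listed above, all in milliseconds."""
    return (request_net + app_processing + data_request
            + response_prep + response_net)

budget_ms = 100

# Hypothetical figures for a user far from the server.
example = total_latency_ms(
    request_net=60,      # request crossing the internet
    app_processing=5,    # web app work
    data_request=10,     # query to the database
    response_prep=5,     # building the response
    response_net=60,     # response crossing the internet
)

print(f"total: {example} ms, budget: {budget_ms} ms, over budget: {example > budget_ms}")
```

With these numbers the two network legs alone already exceed the 100 ms budget, so no amount of server tuning would help; only moving the server closer to the user would.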
You need to analyze your dataset. Querying 1000 documents is different from querying 1 billion docs. You need to calculate how large the index is and whether it fits in RAM. If the index is not in RAM, your access is going to be slow.
MongoDB configuration
MongoDB can work in a cluster, with automatic syncing (immediate or delayed, this is configurable), and automatic failover (or manual, this is configurable too). It also supports sharding if your dataset is huge, so that a request is sent to the server that actually contains the data.
Similarly, you need to have a look at your app server and figure out how slow/fast components are to get a guaranteed response time.
With the information you have provided, this is about as detailed a response I can give.
Profile and then optimize
If 80% of your requests come from the Middle East, then you ought to make it fast for them first. Using the same principle, you need to figure out the slowest components in response time, and improve them. In order to do that, you need to gather the data.
Clustering
Setting up a cluster in one continent or across continents will help you provide redundancy, automatic failover (if configured), and load balancing (depending on how you configure it). If you have a lot of data, consider sharding.
Consider going through the docs for replication and sharding.
Example Server Setup
Suppose you want to have 10 shards with a replication factor of 3, i.e. your data is divided across 10 shards, and each shard is really a replica set of 3 servers (for availability and failover), each of which holds a full copy of that shard's data.
Here the notation s1p1 means shard 1, primary 1, and s1s1 means shard 1, secondary 1, and so on:
s1p1 s2p1 ... s10p1
s1s1 s2s1 ... s10s1
s1s2 s2s2 ... s10s2
Shards 1-10 divide the data, with each shard keeping approximately 1/10 of the total. Each shard consists of a replica set with a primary and 2 secondaries. You can increase this if you need more redundancy. Try to keep the number of members odd, so that there is a tie breaker during elections. If you want to have only 2 copies of the data, then you can also introduce an "Arbiter" to break the tie.
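To make the layout and the odd-member advice concrete, here is a small Python sketch. It does not talk to MongoDB; it only generates the naming grid used above and computes how many votes form a majority. The function names are hypothetical.

```python
def shard_layout(shards: int, members_per_shard: int):
    """Return rows of server names: first row primaries, then secondaries."""
    rows = []
    for member in range(members_per_shard):
        role = "p1" if member == 0 else f"s{member}"
        rows.append([f"s{shard}{role}" for shard in range(1, shards + 1)])
    return rows

def votes_needed(voting_members: int) -> int:
    """A strict majority of the voting members of one replica set."""
    return voting_members // 2 + 1

for row in shard_layout(10, 3):
    print(" ".join(row))

# With 3 voting members, 2 votes elect a primary, so one member can be down.
# With only 2 voting members, both are needed, so losing either one blocks
# the election; an arbiter adds a third vote without storing data.
print(votes_needed(3), votes_needed(2))
```

The layout above therefore needs 30 data-bearing servers in total, which is worth keeping in mind when weighing sharding against a single, larger replica set.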
You could analyze your queries, and choose a shard key so that they go to the closest server, or the server which serves the region. You'll most likely have to do some sort of analysis to optimize this bit.
Hope it helps.
I work with a data logging system in car racing. I am developing an application that aids in the analysis of this logged data and have found some of the query functionality of datasets, datatables and LINQ to be very useful, e.g. minimums, averages, etc.
Currently, I am extracting all data from its native format to a data table and post-processing that data. I am also currently working with data where all channels are logged at the same rate, i.e. 50 Hz (50 samples per second). I would like to start writing this logged data to a database so it is somewhat platform independent, and the extraction process doesn't have to happen every time I want to analyze a dataset.
Which leads me to the main question... Does anyone have a recommendation for the best way to store data that is related by time, but logged at different rates? I have approximately 200 channels that are logged and the rates vary from 1 Hz to 500 Hz.
Some of the methods I have thought of so far are:
creating a datatable for all data at 500 Hz using Double.NaN for values that are between actual logged samples
creating separate tables for each logging frequency, i.e. one table for 1 Hz, another for 10 Hz, and another for 500 Hz.
creating a separate table for each channel with a relationship to a time table. Each time step would then be indexed, and the table of data for each channel would not be dependent on a fixed time frequency
I think I'm leaning towards the index time stamp with a separate table for each channel, but I wanted to find out if anyone has advice on a best practice.
For the record, the datasets can range from 10 Mb to 200-300 Mb depending on the duration of the time that the car is on track.
I would like to have a single data store that houses an entire season, or at least an entire race event, so that is something I am considering as well.
Thanks very much for any advice!
Can you create a table something like this?
Channel, Timestamp, Measurement
The database structure doesn't need to depend on the frequency; the frequency can be determined by the amount of time between timestamps.
This gives you more flexibility as you can write one piece of code to handle the calculations on all the channels, just give it a channel name.
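Here is a minimal sketch of that single-table design, in Python with the built-in sqlite3 module as a stand-in database (an assumption made so the example is runnable; the same schema and queries work in SQL Server). The table name Samples, the channel names, the unit of Timestamp and the sample data are all hypothetical. It shows one function computing statistics for any channel, and the logging rate being derived from the timestamps rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Samples (
        Channel     TEXT NOT NULL,
        Timestamp   REAL NOT NULL,   -- seconds since session start (assumed unit)
        Measurement REAL NOT NULL,
        PRIMARY KEY (Channel, Timestamp)
    )
""")

# Two made-up channels logged at different rates.
rows = [("Speed", i / 10, 100.0 + i) for i in range(10)]   # 10 Hz
rows += [("OilTemp", float(i), 90.0) for i in range(2)]    # 1 Hz
conn.executemany("INSERT INTO Samples VALUES (?, ?, ?)", rows)

def channel_stats(conn, channel):
    """Return (min, avg, max, count) for one channel; same code for every channel."""
    return conn.execute(
        "SELECT MIN(Measurement), AVG(Measurement), MAX(Measurement), COUNT(*) "
        "FROM Samples WHERE Channel = ?",
        (channel,),
    ).fetchone()

def sample_rate_hz(conn, channel):
    """Derive the logging rate from the spacing of the timestamps."""
    n, first, last = conn.execute(
        "SELECT COUNT(*), MIN(Timestamp), MAX(Timestamp) "
        "FROM Samples WHERE Channel = ?",
        (channel,),
    ).fetchone()
    if n < 2 or last == first:
        return None
    return (n - 1) / (last - first)

print(channel_stats(conn, "Speed"), sample_rate_hz(conn, "Speed"))
print(channel_stats(conn, "OilTemp"), sample_rate_hz(conn, "OilTemp"))
```

The composite key on (Channel, Timestamp) keeps per-channel time-range queries cheap, which matters once a whole race event or season lives in one table.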
When writing ASP.NET pages, what signs do you look for that your page is making too many roundtrips to a database or server?
(This is a general question but I say ASP.NET as the majority of my coding is on the web side of things).
How much is too much? The €1M question! Profile. Then profile. If your app is spending most of its time doing data access, you have a problem (and should look at a sql trace). If it is spending most of its time drawing the UI, then (assuming your view isn't doing data access) you should probably look elsewhere first...
Round trips are more relevant to latency than the total quantity of data being moved, so it really does make sense to optimize for them. The usual way is to use stored procedures that do multiple steps, perhaps even returning multiple result sets.
What I do is I look at the ASP performance counters and SQL performance counters. To get an accurate measurement you must ensure that there is no random noise activity on the SQL Server (ie. import batches running unrelated to the web site).
The relevant counters I look at are:
SQL Statistics/Batch requests/sec: This indicates exactly how many Transact-SQL batches the server receives. It can be, in most cases, equated 1:1 with the number of round trips from the web site to SQL.
Databases/Transactions/sec: this counter is instanced per database, so I can quickly see in which database there is 'activity'. This way I can correlate the web site data round trips (i.e. my app logic requests, which go to the app database) and the ASP session state and user stuff (which goes to the ASP session DB or tempdb).
Databases/Write Transactions/sec: This I correlate with the counter above (transactions per second) so I can get a feel for the read-to-write ratio the site is doing.
ASP.NET Applications/Requests/sec: With this counter I can get the number of requests/sec the site is seeing. Correlated with the number of SQL Batch Requests/sec it gives a good indication of the average number of round-trips per request.
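The correlation described above is simple division. Here is a sketch in Python with hypothetical function names and made-up counter readings. It assumes, as described above, that each SQL batch corresponds to roughly one round trip, and that write transactions are counted as a subset of all transactions.

```python
def round_trips_per_request(batch_requests_per_sec: float,
                            web_requests_per_sec: float) -> float:
    """Average SQL round trips behind each web request."""
    if web_requests_per_sec <= 0:
        raise ValueError("web_requests_per_sec must be positive")
    return batch_requests_per_sec / web_requests_per_sec

def write_fraction(transactions_per_sec: float,
                   write_transactions_per_sec: float) -> float:
    """Share of transactions that write (assumes writes are a subset of the total)."""
    if transactions_per_sec <= 0:
        raise ValueError("transactions_per_sec must be positive")
    return write_transactions_per_sec / transactions_per_sec

# Made-up readings taken over the same quiet interval.
print(round_trips_per_request(1200, 100))   # average round trips per page request
print(write_fraction(500, 50))              # share of transactions that write
```

Both readings must be sampled over the same interval and with no unrelated load on the SQL Server, otherwise the ratio is meaningless.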
The next thing to measure is where the time is spent in the request. On my own projects I make abundant use of performance counters that I publish myself, so this is really easy to measure. But I'm not always so lucky as to clean up only my own mess... Profiling is usually not an option for me, because most of the time I troubleshoot live production systems that I cannot instrument.
My approach is to try to sort out the SQL side of things first, since it's easy to find the relevant statistics for execution times in SQL: SQL keeps a nice aggregated statistic ready to look at in sys.dm_exec_query_stats. I can also use Profiler to measure execution duration in real time. With some analysis of these collected numbers, and knowing the normal request pattern of the most visited pages, you can give a pretty good estimate of the total time spent in SQL per web request. If this time adds up to nearly all the time it takes to serve the page, then you have your answer.
And to answer the original question title: to reduce the number of round-trips, you make fewer requests. Seriously. First, caching what is appropriate to cache I guess is obvious. Second you reduce the complexity: don't display unnecessary data on each page, you cache and display stale data when you can get away with it, you hide details on secondary navigation panels.
If you feel that the problem is the number of round-trips per se as opposed to the number of requests (ie. you would benefit tremendously from batching multiple requests in one round-trip), then you should somehow measure that the round-trip overhead is what's killing you. With connection pooling on a normal network connection this is usually not the most important factor.
And finally you should look if everything that can be done in sets is done in sets. If you have some half-brained ORM that retrieves objects one at a time from an ID keyset, get rid of it.
I know that this may sound repetitive, but the number of client-server round trips depends on how much program logic is located on each side of the connection.
First thing to check is validation: you always have to validate and sanitize your input on the server side, but that does not mean you cannot also do it on the client side, which removes the round trips that are used only to check input.
Second: what can you do on the client side to reduce server-side load? There are calculations that you can check or make on the client side. There is also AJAX, which can be used to load only the part of the page that is changing.
Third: can you delegate work to another server? If your server is too loaded, why not use web services, or simply delegate some part of the logic to another server?
As Mark wrote: how much is too much? It is up to you and your budget.
When writing ASP.NET pages, what signs do you look for that your page is making too many roundtrips to a database or server?
Of course it all depends, and you have to profile. However, here are some indicators. They do not mean there is a problem, but they often point to one:
Page is taking a very long time to render locally.
Read this question: Slow response-time cheat sheet, and in particular this link.
To render the page you need more than 30 round trips. I pulled that number out of my hat, but assuming a round trip is taking about 3.5ms then 30 round trips will kick you over the 100ms guideline (before any other kind of processing).
All the queries involved in rendering the page are heavily optimized and do not take longer than a millisecond or two to execute. There are no operations that require lots of CPU cycles that execute every time you render the page.
Data access is abstracted away and not cached in any kind of way. If, for example, GetCustomer will call the DAL which in turn issues a query and your page is asking for 100 Customer objects which are not retrieved in a batch, you are probably in trouble.
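The last indicator, 100 objects fetched one at a time, can be sketched as follows. This is Python with the built-in sqlite3 module standing in for the real database and data access layer (an assumption for runnability). The Customer table, the helper names and the round-trip counter are hypothetical, and the 3.5 ms per round trip is the same rough figure used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?)",
    [(i, f"Customer {i}") for i in range(1, 101)],
)

stats = {"round_trips": 0}

def query(sql, params=()):
    """Run one statement and count it as one round trip to the database."""
    stats["round_trips"] += 1
    return conn.execute(sql, params).fetchall()

def get_customers_one_by_one(ids):
    """One query per customer: the pattern described in the last indicator."""
    return [query("SELECT Id, Name FROM Customer WHERE Id = ?", (i,))[0] for i in ids]

def get_customers_batched(ids):
    """A single query for the whole set of IDs."""
    placeholders = ",".join("?" * len(ids))
    return query(
        f"SELECT Id, Name FROM Customer WHERE Id IN ({placeholders}) ORDER BY Id",
        tuple(ids),
    )

def round_trip_overhead_ms(trips, ms_per_trip=3.5):
    """Latency spent purely on round trips, using the rough 3.5 ms figure."""
    return trips * ms_per_trip

ids = list(range(1, 101))
get_customers_one_by_one(ids)
print(stats["round_trips"], round_trip_overhead_ms(stats["round_trips"]))
```

Calling get_customers_batched(ids) instead returns the same rows in a single round trip, which is the difference between several hundred milliseconds of pure latency and a few.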
We are building an application which requires a daily insertion of approximately 1.5 million rows of data per table. We have 16 tables.
We keep track of 3-day historical data including the current day's data.
The application is done using C#; on the server side, we run an exe that fills the data tables during market hours (4.5 hours), and we update the 16 tables every 5 seconds.
On the client side, the application gets user queries which require the most recently inserted data ( in the last 5 seconds) and a historical point which could be today or before, and plots them somehow.
We are having some serious performance issues, as one query might take 1 second or more, which is too much. The question is, for today's data that is being inserted at runtime, can we make use of caching instead of going to the database each time we want something from today's data? Will that be more efficient? And if so, how can we do that?
P.S. One day's data is approximately 300 MB, and we have enough RAM.
Keep a copy of the data along with the datetime you used to retrieve the data. The next time, retrieve only the new data, which minimizes the amount of data you send over the wire.
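A minimal sketch of that idea in Python (the same structure ports to C#). The class name and the fetch_since callback are hypothetical; in a real application the callback would run a query along the lines of "rows newer than the last timestamp seen". It assumes rows come back in ascending timestamp order and that no row arrives later carrying a timestamp equal to or older than one already cached; if that can happen, a monotonically increasing key is a safer watermark than a datetime.

```python
class IncrementalCache:
    """Keeps today's rows in memory and fetches only rows newer than the last one seen."""

    def __init__(self, fetch_since):
        # fetch_since(last_seen) -> list of (timestamp, payload), ascending by timestamp
        self._fetch_since = fetch_since
        self._rows = []
        self._last_seen = None

    def refresh(self) -> int:
        """Pull only the new rows; return how many were added."""
        new_rows = self._fetch_since(self._last_seen)
        if new_rows:
            self._rows.extend(new_rows)
            self._last_seen = new_rows[-1][0]
        return len(new_rows)

    def rows(self):
        """Return a copy of everything cached so far."""
        return list(self._rows)

# Stand-in for the database table (made-up data).
source = [(1, "a"), (2, "b")]

def fetch_since(last_seen):
    return [r for r in source if last_seen is None or r[0] > last_seen]

cache = IncrementalCache(fetch_since)
```

Calling cache.refresh() every 5 seconds then transfers only the rows inserted since the previous call, while user queries on today's data are answered from cache.rows() without touching the database.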
If all the queries run in the operation add up to 1 second in total, maybe the issue you are seeing is that the UI is freezing. If that is the case, don't run them on the UI thread.
Update (based on comments): the code you run in the event handlers of the controls runs on the UI thread, which is what causes the UI to freeze. There isn't a single way to run it on a separate thread; I suggest BackgroundWorker for this scenario. Look at the community-provided example at the end.