Greetings,
I've been working on a C#.NET app that interacts with a data logger. The user can query and obtain logs for a specified time period, and view plots of the data. Typically a new data log is created every minute and stores a measurement for a few parameters. To get meaningful information out of the logger, a reasonable number of logs need to be acquired - data for at least a few days. The hardware interface is a UART to USB module on the device, which restricts transfers to a maximum of about 30 logs/second. This becomes quite slow when reading in the data acquired over a number of days/weeks.
What I would like to do is improve the perceived performance for the user. I realize that with the hardware speed limitation the user will have to wait for the full download cycle at least the first time they acquire a larger set of data. My goal is to cache all data seen by the app, so that it can be obtained faster if ever requested again. The approach I have been considering is to use a light database, like SqlServerCe, that can store the data logs as they are received. I am then hoping to first search the cache prior to querying a device for logs. The cache would be updated with any logs obtained by the request that were not already cached.
Finally my question - would you consider this to be a good approach? Are there any better alternatives you can think of? I've tried to search SO and Google for reinforcement of the idea, but I mostly run into discussions of web request/content caching.
Thanks for any feedback!
Seems like a very reasonable approach. Personally I'd go with SQL CE for storage, make sure you index the column holding the datetime of the record, then use TableDirect on the index for getting and inserting data so it's blazing fast. Since your data is already chronological there's no need to get any slow SQL query processor involved, just seek to the date (or the end) and roll forward with a SqlCeResultSet. You'll end up being speed limited only by I/O. I profiled doing really, really similar stuff on a project and found TableDirect with SQLCE was just as fast as a flat binary file.
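If it helps, here's a minimal sketch of that TableDirect pattern, assuming a hypothetical SQL CE table LogEntries(LogTime, Value) with an index IX_LogTime on the timestamp column - all names are illustrative, and the Seek/Read pattern is from memory, so treat it as a starting point rather than gospel:

    // Sketch only: assumes LogEntries(LogTime DATETIME, Value FLOAT) with index IX_LogTime.
    using System;
    using System.Data;
    using System.Data.SqlServerCe;

    static void ReadLogsFrom(SqlCeConnection conn, DateTime from)
    {
        using (SqlCeCommand cmd = conn.CreateCommand())
        {
            cmd.CommandType = CommandType.TableDirect; // bypass the query processor entirely
            cmd.CommandText = "LogEntries";            // table name, not SQL
            cmd.IndexName = "IX_LogTime";              // walk the datetime index

            using (SqlCeResultSet rs = cmd.ExecuteResultSet(ResultSetOptions.Scrollable))
            {
                // Seek to the first record at or after 'from', then roll forward.
                if (rs.Seek(DbSeekOptions.AfterEqual, from))
                {
                    while (rs.Read()) // the first Read lands on the sought record
                    {
                        DateTime logTime = rs.GetDateTime(0); // ordinal 0 = LogTime
                        double value = rs.GetDouble(1);       // ordinal 1 = Value
                        // ... hand off to the cache / plotting layer ...
                    }
                }
            }
        }
    }

Inserting newly downloaded logs would go through the same kind of result set opened with ResultSetOptions.Updatable, using CreateRecord/SetDateTime/SetDouble and then Insert.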
I think you're on the right track wanting to store it locally in some queryable form.
I'd strongly recommend SQLite. There's a .NET class here.
I have a small table (23 rows, 2 int columns), just a basic user-activity monitor. The first column represents the user id. The second column holds a value that should be unique to every user, but I must alert the users if two values are the same. I'm using an Azure SQL database to hold this table, and LINQ to SQL in C# to run the query.
The problem: Microsoft will bill me based on data transferred out of their data centers. I would like all of my users to be aware of the current state of this table at all times, second by second, and to keep data transfer under 5 GB per month. I'm thinking along the lines of a LINQ to SQL expression such as
UserActivity.Where(x => x.Val == myVal).Count() > 1;
But this would download the table to the client, which cannot happen. Should I be implementing a Linq solution? Or would SqlDataReader download less metadata from the server? Am I taking the right approach by using a database at all? Gimme thoughts!
If it is data transfer you are worried about you need to do your processing on the server and return only the results. A SQLDataReader solution can return a smaller, already processed set of data to minimise the traffic.
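As a rough illustration of that (the UserActivity table and Val column are just my guesses at the schema), you could have SQL Server do the grouping and return only the values that are duplicated:

    using System.Collections.Generic;
    using System.Data.SqlClient;

    static List<int> GetDuplicateValues(string connectionString)
    {
        // Only the duplicated values cross the wire, never the table itself.
        var duplicates = new List<int>();
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT Val FROM UserActivity GROUP BY Val HAVING COUNT(*) > 1", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    duplicates.Add(reader.GetInt32(0));
            }
        }
        return duplicates;
    }

An empty list means no clash; for a 23-row table the response is at most a handful of ints.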
A couple thoughts here:
First, I strongly encourage you to profile the SQL generated by your LINQ-to-SQL queries. There are several tools available for this, here's one at random (I have no particular preference or affiliation):
LINQ Profiler from Devart
Your prior experience with LINQ query inefficiency notwithstanding, the LINQ sample you quote in your question isn't particularly complex, so I would expect you could make it (or something similar) work efficiently, given a good feedback mechanism like the tool above.
Second, you don't explicitly mention whether your query client is running in Azure or outside, but I gather from your concern about data egress costs that it's running outside Azure. So the data egress costs are going to be query results using the TDS protocol (the low-level protocol for SQL Server), which is pretty efficient. Some quick back-of-the-napkin math shows that you should be fine to stay below your monthly 5 GB limit:
23 users
10 hours/day
30 days/month (less if only weekdays)
3600 requests/hour/user
32 bits of raw data per response
= about 95 MB of raw response data per month
Even if you assume 10x overhead of TDS for header metadata, etc. (and if my math is right :-) ) then you've still got plenty of room underneath 5 GB. The point isn't that you should stop thinking about it and assume it's fine... but don't assume it isn't fine, either. In fact, don't assume anything. Test, and measure, and make an informed choice. I suspect you'll find a way to stay well under 5 GB without much trouble, even with LINQ.
One other thought... perhaps you could consider running your query inside Azure, and weigh the cost of that vs. the cost of data egress under the "query running outside Azure" scenario? This could (for example) take the form of a small Azure Web Job that runs the query every second and notifies the 23 users if the count goes above 1.
Azure Web Jobs
In essence, you wouldn't notify them if the condition is false, only when it's true. As for the notification mechanism, there are various cloud-friendly options:
Azure mobile push notifications
SMS messaging
SignalR notifications
The key here is to determine whether it's more cost-effective, and in line with any bigger-picture technology or business goals, to have each user issue the query continuously, or to use some separate process in Azure to notify users asynchronously if the "trigger condition" is met.
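If you go that server-side route, a hedged sketch of the poller might look something like this (continuous WebJob style; the connection string, table and column names, and the notification call are placeholders, not anything from the question):

    using System;
    using System.Data.SqlClient;
    using System.Threading;

    class DuplicateWatcher
    {
        static void Main()
        {
            var connectionString = "<azure-sql-connection-string>"; // placeholder
            while (true)
            {
                int duplicateCount;
                using (var conn = new SqlConnection(connectionString))
                using (var cmd = new SqlCommand(
                    "SELECT COUNT(*) FROM (SELECT Val FROM UserActivity " +
                    "GROUP BY Val HAVING COUNT(*) > 1) AS dupes", conn))
                {
                    conn.Open();
                    duplicateCount = (int)cmd.ExecuteScalar();
                }

                if (duplicateCount > 0)
                    NotifyUsers(duplicateCount); // push / SMS / SignalR - whichever fits

                Thread.Sleep(TimeSpan.FromSeconds(1)); // poll once per second
            }
        }

        static void NotifyUsers(int duplicateCount)
        {
            // Stub: wire this up to the chosen notification channel.
            Console.WriteLine(duplicateCount + " duplicate value(s) detected");
        }
    }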
Best of luck!
I'll give you a basic overview, summarize it, then ask the question, so you are all as informed as you can be. If you need more information, please don't hesitate to ask.
Basic setup:
Client is constantly in communication with a server providing json data to be serialized and processed.
Client also needs to use this data and catalog it into a MySQL server (the main issue lies here)
Client also, to a lesser extent, needs to store some of the data provided by the server into a local database specific to that client.
So, as stated above, I have a client that communicates with a server that outputs JSON to be processed. The question isn't about the JSON data or the communication with the server; it's more about the remote and local databases and what approach I should take in, I guess, the DTO.
Now - originally I was going to process it in a loop, inserting individual segments of data into the database one after another until it reached the end of the paginated data. This almost immediately showed itself to be troublesome, as deadlocks became a problem very, very quickly. So quickly, in fact, that after about 1682 inserts the deadlock rate went from 1 in 500 to 9 out of 10, until the rollback function threw again and stopped execution.
Here is really my question.
What would you suggest for handling a large amount of data (> 500k rows) initially, and then, once the database is populated, smaller segmented sections (~1k rows)?
I've looked into CSVs, bulk insert, and query building with StringBuilder. Operationally the StringBuilder option executes the fastest, but I'm not sure how it will scale once data is constantly running through it rather than just test files.
Any general advice or suggestions, how you think it would be best done, stuff like that - anything would help. I'm just looking for real-world scenarios from people who have handled a situation like this and can guide me in the right direction.
As for being told what to do - if you'd rather stay vague, just name an option and I'll research it. That's fine :)
Thanks again
Edit: Also - do you think using Tasks or coding my own threads is the better option for such a situation? Thanks.
I personally would choose Bulk Copy. It's easy to implement and the fastest way to store thousands of records in the database.
Useful article to read: http://ignoringthevoices.blogspot.si/2014/09/working-with-entity-framework-code.html
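For SQL Server the built-in API is SqlBulkCopy; since the question mentions MySQL, the analogous route there would be MySqlBulkLoader / LOAD DATA INFILE, but the shape is much the same. A minimal SqlBulkCopy sketch (the destination table name is made up for illustration):

    using System.Data;
    using System.Data.SqlClient;

    static void BulkInsert(string connectionString, DataTable rows)
    {
        // Column names/types in 'rows' must line up with the destination table.
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.CatalogEntries"; // illustrative name
            bulk.BatchSize = 5000;                            // tune for your workload
            bulk.WriteToServer(rows);
        }
    }

Batching a whole page of JSON into one bulk operation should also help with the deadlocking you saw from row-by-row inserts.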
I have a number of images that are stored as VARBINARY(MAX) (using FileStream) in a database. I'm looking to retrieve about 10 images or so at a time.
The prescribed, most common way in ASP.NET is to use an HTTP handler and hit the database for each individual image. That seems fine, but it is a bit slow at times.
Is it best to download all images for a given page at the same time in one big data chunk? Or should I try to grab each individually? Best practice?
Probably best to do them individually on a domain that doesn't have cookies set, or make sure your handler will work with multiple simultaneous requests. That way you can stream multiple results from the DB at the same time, and stream multiple images from your webserver as it gets them.
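For reference, a bare-bones version of such a handler might look like the following. It assumes a table Images(Id, ImageData) holding JPEGs - an illustrative schema, not one from the question - and streams the blob instead of buffering it in memory:

    using System.Data;
    using System.Data.SqlClient;
    using System.Web;

    public class ImageHandler : IHttpHandler
    {
        public bool IsReusable { get { return true; } }

        public void ProcessRequest(HttpContext context)
        {
            int id = int.Parse(context.Request.QueryString["id"]);
            context.Response.ContentType = "image/jpeg"; // adjust to the stored type

            using (var conn = new SqlConnection("<connection string>")) // placeholder
            using (var cmd = new SqlCommand(
                "SELECT ImageData FROM Images WHERE Id = @id", conn))
            {
                cmd.Parameters.AddWithValue("@id", id);
                conn.Open();
                // SequentialAccess streams the blob rather than loading it whole.
                using (var reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
                {
                    if (!reader.Read())
                        return;

                    var buffer = new byte[8192];
                    long offset = 0;
                    long read;
                    while ((read = reader.GetBytes(0, offset, buffer, 0, buffer.Length)) > 0)
                    {
                        context.Response.OutputStream.Write(buffer, 0, (int)read);
                        offset += read;
                    }
                }
            }
        }
    }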
Well,
I think many people would have different opinions, and reasons, about what the best practice is for them, but in reality it all depends on hardware, software, data structure, and whether the data is normalized.
In general, SQL Server prefers set-based operations, meaning loops are generally slower. But loops are safer for IOPS-related issues, and they tend to cause fewer locks.
I am not sure which object mapper or built-in SQL library you are using (I have a feeling you may be using LINQ after you built a SQL class), but it also depends on that library, and I would definitely recommend Dapper.
I think reading them all at once would be faster, and here is why;
- If it is as you say and you hit the database each time for an image, that adds the delay of reconnecting to the database, so latency will occur. But with one connection, the data retrieval is straightforward and the connection is already open, without requiring further session authentication.
I would recommend downloading them all at once and showing the end user a progress screen while that happens. Also, for retrieving data, I believe this link is very helpful: https://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx
Depending on your server's edition and capabilities, you could definitely make use of different features.
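Since Dapper came up, here's a rough sketch of pulling a batch of images in one round trip - Dapper expands the list parameter into an IN clause; the Images table and its columns are assumed names, not from the question:

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Linq;
    using Dapper;

    class ImageRow
    {
        public int Id { get; set; }
        public byte[] ImageData { get; set; }
    }

    static List<ImageRow> GetImages(string connectionString, IEnumerable<int> ids)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            // One connection, one query, all requested images in a single result set.
            return conn.Query<ImageRow>(
                "SELECT Id, ImageData FROM Images WHERE Id IN @Ids",
                new { Ids = ids }).ToList();
        }
    }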
I'm planning on creating a live analytics page for my website - a bit like Google Analytics, but with real live data which will change as new users load a page on my site, etc.
The main site is/will be written using ASP.NET/C# as the back end with an MS SQL database, and the front end will support things like JavaScript (jQuery), CSS3, and HTML5 (if required).
I was wondering what methods I can use for the live analytics in terms of: how to get the data onto the analytics page, what efficient graphing I can use, and how to store the data with fast input/output.
The first thing that came to my mind is to use Node.js - could I use this to achieve a live analytics page? Is it a good idea? Are there any better alternatives? Any drawbacks with this?
Would I need a C# Application running on a server to use Node.js to send/receive all the data to and from the website?
Would using a MS SQL database be fast enough? Would I need to store all the data live, or could I store it in chunks every x amount of seconds/minutes? (Which would be more efficient?)
This illustrates my initial thoughts on the matter -
Edit:
I'm going to be using this system over multiple sites, I could be getting 10 hits at a time to around 1,000,000 (Highly unlikely, but still possible). I want to be able to scale this system and adapt it to the environment it's in.
It really depends on how "real time" the realtime data needs to be. For example, I made this recently:
http://www.reed.co.uk/labs/realtime/
Which shows job applications coming into the system. Obviously there is way too much going on during busy periods to actually be querying the main database in realtime - so what we do is query a sliding "window" and cache it on the server - a chunk of the last 5 minutes' worth of events.
We then play this back to the user as if it's happening "now". Having a little latency as part of the SLA (where the users don't really care) can make the whole system vastly more scalable.
[EDIT- further explanation]
The data is just retrieved from a basic stored procedure call - naturally, a big system like Reed has hundreds of transactions/second - so we can't keep hitting the main cluster for every user.
All we do is make sure we have a current window, in this case the last 5 minutes of data cached on the server. When a client comes to the site, we get that last 5 minutes of data and play it back like it's happening right now - the end user is none the wiser - but what it means is that all clients are reading off the cache. Once the cache is 5 minutes old, we invalidate it and start again. This means a max of 1 DB hit every five minutes - thus making the system vastly more scalable (not that it really needs to be - it's just for fun, really).
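To make the caching idea concrete, here's a hedged sketch of that server-side window using System.Runtime.Caching - the event type and the stored-procedure call are placeholders, not anything from the real system:

    using System;
    using System.Collections.Generic;
    using System.Runtime.Caching;

    static class EventWindow
    {
        private static readonly MemoryCache Cache = MemoryCache.Default;
        private const string Key = "recent-events";

        public static IList<JobApplicationEvent> GetRecentEvents()
        {
            // Every client reads from this window; at most ~1 DB hit per 5 minutes.
            var events = Cache.Get(Key) as IList<JobApplicationEvent>;
            if (events == null)
            {
                events = LoadLastFiveMinutesFromDatabase(); // stored proc call goes here
                Cache.Set(Key, events, DateTimeOffset.UtcNow.AddMinutes(5));
            }
            return events;
        }

        private static IList<JobApplicationEvent> LoadLastFiveMinutesFromDatabase()
        {
            // Placeholder: call the stored procedure that returns the last 5 minutes of events.
            return new List<JobApplicationEvent>();
        }
    }

    class JobApplicationEvent
    {
        public DateTime OccurredAt { get; set; }
        // ...whatever fields the front end needs to replay the event...
    }

(Two simultaneous cache misses can both hit the database; if that matters, wrap the reload in a lock.)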
Just so you are aware, Google Analytics already offers live user tracking. When inside the dashboard of a site on Google Analytics, click the Home button on the top bar, and then the Real-Time button on the left bar. Considering the design work and quality of this service, it may be a better option than attempting to recreate it. If you do choose to create your own, you can at least use their service as a benchmark for the desired features.
Using APIs like Google's Charts API (https://developers.google.com/chart/) would be a good approach to displaying the output of your stored data, with decreased development time. If you provide more information on the number of hits you expect, and the scale of the server this software will be hosted on, it will be easier to answer the speed questions.
I'm working on a web service at the moment and there is the potential that the returned results could be quite large ( > 5mb).
It's perfectly valid for this set of data to be this large and the web service can be called either sync or async, but I'm wondering what people's thoughts are on the following:
If the connection is lost, the entire resultset will have to be regenerated and sent again. Is there any way I can do any sort of "resume" if the connection is lost or reset?
Is sending a result set this large even appropriate? Would it be better to implement some sort of "paging" where the resultset is generated and stored on the server and the client can then download chunks of the resultset in smaller amounts and re-assemble the set at their end?
I have seen all three approaches: paged, store and retrieve, and massive push.
I think the solution to your problem depends to some extent on why your result set is so large and how it is generated. Do your results grow over time, are they calculated all at once and then pushed, do you want to stream them back as soon as you have them?
Paging Approach
In my experience, using a paging approach is appropriate when the client needs quick access to reasonably sized chunks of the result set similar to pages in search results. Considerations here are overall chattiness of your protocol, caching of the entire result set between client page requests, and/or the processing time it takes to generate a page of results.
Store and retrieve
Store and retrieve is useful when the results are not random access and the result set grows in size as the query is processed. Issues to consider here are complexity for clients and if you can provide the user with partial results or if you need to calculate all results before returning anything to the client (think sorting of results from distributed search engines).
Massive Push
The massive push approach is almost certainly flawed. Even if the client needs all of the information and it needs to be pushed in a monolithic result set, I would recommend taking the approach of WS-ReliableMessaging (either directly or through your own simplified version) and chunking your results - see the sketch after this list. By doing this you:
- ensure that the pieces reach the client
- can discard the chunk as soon as you get a receipt from the client
- can reduce the possible issues with memory consumption from having to retain 5 MB of XML, DOM, or whatever in memory (assuming that you aren't processing the results in a streaming manner) on the server and client sides.
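As one possible shape for that chunked exchange - this is only a sketch of the idea, not WS-ReliableMessaging itself, and the interface and names are invented for illustration:

    using System;

    // The client pulls the result in fixed-size pieces and acknowledges each one,
    // so the server can discard a chunk as soon as it is confirmed.
    public interface IChunkedResultService
    {
        // Runs the query, stores the serialized result server-side, returns a handle.
        ChunkedResultInfo BeginResult(string query);

        // Returns chunk 'index' (0-based) of the stored result.
        byte[] GetChunk(Guid resultId, int index);

        // Client confirms receipt; the server may free that chunk.
        void AcknowledgeChunk(Guid resultId, int index);
    }

    public class ChunkedResultInfo
    {
        public Guid ResultId { get; set; }
        public int ChunkCount { get; set; }
        public int ChunkSizeBytes { get; set; }
    }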
Like others have said though, don't do anything until you know your result set size, how it is generated, and overall performance to be actual issues.
There's no hard law against 5 MB as a result set size. Over 400 MB can be hard to send.
You'll automatically get async handlers (since you're using .NET).
implement some sort of "paging" where the resultset is generated and stored on the server and the client can then download chunks of the resultset in smaller amounts and re-assemble the set at their end
That's already happening for you -- it's called tcp/ip ;-) Re-implementing that could be overkill.
Similarly --
entire resultset will have to be regenerated and sent again
If it's MS SQL, for example, that is generating most of the resultset, then re-generating it will take advantage of some implicit caching in SQL Server, and subsequent generations will be quicker.
To some extent you can get away with not worrying about these problems, until they surface as 'real' problems -- because the platform(s) you're using take care of a lot of the performance bottlenecks for you.
I somewhat disagree with secretGeek's comment:
That's already happening for you -- it's called tcp/ip ;-) Re-implementing that could be overkill.
There are times when you may want to do just this, but really only from a UI perspective. If you implement some way to either stream the data to the client (via something like a pushlets mechanism), or chunk it into pages as you suggest, you can then load some really small subset on the client and then slowly build up the UI with the full amount of data.
This makes for a slicker, speedier UI (from the user's perspective), but you have to evaluate if the extra effort will be worthwhile... because I don't think it will be an insignificant amount of work.
So it sounds like you'd be interested in a solution that adds 'starting record number' and 'final record number' parameters to your web method (or 'page number' and 'results per page').
This shouldn't be too hard if the backing store is sql server (or even mysql) as they have built in support for row numbering.
Despite this you should be able to avoid doing any session management on the server, avoid any explicit caching of the result set, and just rely on the backing store's caching to keep your life simple.
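As a minimal sketch of such a web method against SQL Server - this assumes SQL Server 2012+ for OFFSET/FETCH (ROW_NUMBER() works similarly on older versions), and the table and column names are invented:

    using System.Collections.Generic;
    using System.Data.SqlClient;

    // Stateless paging: the client passes the page it wants; no server-side session needed.
    static List<string> GetResultPage(string connectionString, int pageNumber, int pageSize)
    {
        var rows = new List<string>();
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            @"SELECT Name
              FROM Results
              ORDER BY Id
              OFFSET @skip ROWS FETCH NEXT @take ROWS ONLY", conn))
        {
            cmd.Parameters.AddWithValue("@skip", (pageNumber - 1) * pageSize);
            cmd.Parameters.AddWithValue("@take", pageSize);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    rows.Add(reader.GetString(0));
            }
        }
        return rows;
    }

A stable ORDER BY (here on Id) matters; without it, pages can overlap or skip rows between requests.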