In my app a DataGridView displays Proxy objects.
Proxy has two properties: Address and Status.
The DataGridView is bound to a List which holds the Proxy objects.
The DataGridView and the UI become unresponsive due to the heavy memory load as the list reaches 1 million proxies.
The app is harvesting proxies from different websites; how do I scale the application to handle huge lists?
My concern is harvesting and implementing paging at the same time.
Is paging with SQL CE a good solution? Or will SQL CE slow down the harvesting process? Or is there a better solution? I don't know.
The app harvests around 500-1700 proxies per second; extracting "as fast as possible" is a feature. I know there are other obvious limitations and bottlenecks, but I am ignoring them for now.
Please advise on how I can keep the speed and make it scale (best practices). I am not sure about SQL CE.
Now why would you ever want to display 1 million records to the user?! Even if paged, he'd still have to click through, let's say, 10000 pages!
Implement filtering: only display what's necessary, and limit it to 7 records. Add a float Score to Proxy and express it as a percentage - 0% means google.com didn't load at all, 100% means no slowdown compared to a direct connection (haha).
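A minimal sketch of what the class might look like with that property added (the existing members are assumed from the question):

public class Proxy
{
    public string Address { get; set; }   // e.g. "203.0.113.10:8080"
    public string Status { get; set; }
    public float Score { get; set; }      // 0 = didn't load at all, 100 = as fast as a direct connection
}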
Then it's
var displayedProxies = myProxies.OrderByDescending(p => p.Score).Take(7);
Think of potential usage scenarios and make the UI fit. For example, if it's targeted at spammers wanting to send out billions of emails, you just need one button - "Export in (machine-readable format name here)". However, if it's just some user wanting to surf anonymously, you can give him a list of "7 random proxies" with a message that the scores are updating. Then just replace those 7 random ones in real time with the 7 best found so far.
I agree - the best approach is to get the data in chunks: call a stored proc that receives the page number and the number of records you want returned, then bind the records to the grid.
If there are filters applied to the grid, I would also pass them in to the stored proc.
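A minimal sketch of calling such a paging stored proc from C#; the proc name GetRecordsPaged and its parameter names are assumptions, not an existing API:

using System.Data;
using System.Data.SqlClient;

DataTable GetPage(string connectionString, int pageNumber, int pageSize)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("GetRecordsPaged", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@PageNumber", pageNumber);
        cmd.Parameters.AddWithValue("@PageSize", pageSize);
        // any filters applied to the grid would be passed here as extra parameters

        var page = new DataTable();
        new SqlDataAdapter(cmd).Fill(page);   // only one page of rows travels over the wire
        return page;
    }
}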
I would disable ViewState on the datagrid if you are still passing many records (say more than a thousand per page); in fact, if you have too many records and you want this thing to fly, I would prefer a mix of AJAX calls to a web service to get the data, coupled with the jQuery DataTables plugin, which I find fantastic and fairly well documented. Here's the link.
Edit: If you do the jQuery DataTables/web service approach, try to convince people not to use IE version < 9. The IE JavaScript engine sucks on IE 6 and 7, and less so on IE 8, but it's still pretty bad compared to FF, Chrome, etc.
I have a small table (23 rows, 2 int columns) - just a basic user-activity monitor. The first column represents the user id. The second column holds a value that should be unique to every user, but I must alert the users if two values are the same. I'm using an Azure SQL database to hold this table, and LINQ to SQL in C# to run the query.
The problem: Microsoft will bill me based on data transferred out of their data centers. I would like all of my users to be aware of the current state of this table at all times, second by second, and to keep data transfer under 5 GB per month. I'm thinking along the lines of a LINQ-to-SQL expression such as
UserActivity.Where(x => x.Val == myVal).Count() > 1;
But this would download the table to the client, which cannot happen. Should I be implementing a Linq solution? Or would SqlDataReader download less metadata from the server? Am I taking the right approach by using a database at all? Gimme thoughts!
If it is data transfer you are worried about, you need to do your processing on the server and return only the results. A SqlDataReader solution can return a smaller, already-processed set of data to minimise the traffic.
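A minimal sketch of doing the count on the server with plain ADO.NET, in that spirit; the table and column names are taken from the question's LINQ sample:

using System.Data.SqlClient;

bool HasDuplicate(string connectionString, int myVal)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "SELECT COUNT(*) FROM UserActivity WHERE Val = @val", conn))
    {
        cmd.Parameters.AddWithValue("@val", myVal);
        conn.Open();
        // only a single integer crosses the wire, not the table
        return (int)cmd.ExecuteScalar() > 1;
    }
}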
A couple thoughts here:
First, I strongly encourage you to profile the SQL generated by your LINQ-to-SQL queries. There are several tools available for this, here's one at random (I have no particular preference or affiliation):
LINQ Profiler from Devart
Your prior experience with LINQ query inefficiency notwithstanding, the LINQ sample you quote in your question isn't particularly complex so I would expect you could make it or similar work efficiently, given a good feedback mechanism like the tool above or similar.
Second, you don't explicitly mention whether your query client is running in Azure or outside, but I gather from your concern about data egress costs that it's running outside Azure. So the data egress costs are going to be query results using the TDS protocol (the low-level protocol for SQL Server), which is pretty efficient. Some quick back-of-the-napkin math shows that you should be fine to stay below your monthly 5 GB limit:
23 users
10 hours/day
30 days/month (less if only weekdays)
3600 requests/hour/user
32 bits of raw data per response
= about 95 MB of raw response data per month
Even if you assume 10x overhead of TDS for header metadata, etc. (and if my math is right :-) ) then you've still got plenty of room underneath 5 GB. The point isn't that you should stop thinking about it and assume it's fine... but don't assume it isn't fine, either. In fact, don't assume anything. Test, and measure, and make an informed choice. I suspect you'll find a way to stay well under 5 GB without much trouble, even with LINQ.
One other thought... perhaps you could consider running your query inside Azure, and weigh the cost of that vs. the cost of data egress under the "query running outside Azure" scenario? This could (for example) take the form of a small Azure Web Job that runs the query every second and notifies the 23 users if the count goes above 1.
Azure Web Jobs
In essence, you wouldn't notify them if the condition is false, only when it's true. As for the notification mechanism, there are various cloud-friendly options:
Azure mobile push notifications
SMS messaging
SignalR notifications
The key here is to determine whether it's more cost-effective and in line with any bigger-picture technology or business goals to have each user issue the query continuously, or to use some separate process in Azure to notify users asynchronously if the "trigger condition" is met.
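A minimal sketch of such a server-side check, assuming a plain polling loop and a hypothetical NotifyUsers helper standing in for whichever notification mechanism you choose:

using System;
using System.Data.SqlClient;
using System.Threading;

static void WatchForDuplicates(string connectionString)
{
    while (true)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT TOP 1 Val FROM UserActivity GROUP BY Val HAVING COUNT(*) > 1", conn))
        {
            conn.Open();
            object duplicate = cmd.ExecuteScalar();      // null when every Val is unique
            if (duplicate != null)
                NotifyUsers((int)duplicate);             // hypothetical: push / SMS / SignalR, as above
        }
        Thread.Sleep(TimeSpan.FromSeconds(1));
    }
}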
Best of luck!
Here is the scenario:
I have a dll which has a method that gets data from the db depending on the parameters passed, does various checks, and gives me the required data.
GetGOS_ForBill(AgencyCode)
In a Windows application, I have a listbox which lists 500+ agencies.
I retrieve the GOS for each agency and append it to a generic list.
If the user has selected all agencies (500+ for now), it takes about 10 min. to return data from the dll.
We thought about background processing, but that doesn't reduce the time; it just lets the user do other things on the screen. We are considering multithreading.
Can anybody help me with this? What would be the right approach, and how can we accomplish this with multithreading?
By the way you ask, I think you don't have much experience with multithreading, and multithreading is not a topic to be improvised and thrown together via a Stack Overflow question. I would strongly advise against using multithreading if you don't know what you're doing... instead of one problem you'll have two.
In your case the performance problem does not have to do with using threading to get a parallel workload but with correctly structuring the problem.
Right now you're querying each agency separately, which works fine for a couple of agencies but degrades quickly. The query itself is probably fast; the problem is that you're running it 500 times. Instead, why don't you try to get the GOS for all the agencies in a single query (which is probably going to be fast) and store the result in memory (say, a Dictionary)? Then just retrieve the appropriate set of GOS when needed.
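A minimal sketch of that idea, assuming a hypothetical GetGOS_ForAllAgencies bulk method and a GOS type with an AgencyCode property (neither appears in the question):

// requires: using System.Collections.Generic; using System.Linq;

// one call instead of 500+ (GetGOS_ForAllAgencies is a hypothetical bulk version of the dll method)
List<GOS> allGos = GetGOS_ForAllAgencies();

// index the results by agency code for cheap lookups
Dictionary<string, List<GOS>> gosByAgency = allGos
    .GroupBy(g => g.AgencyCode)
    .ToDictionary(grp => grp.Key, grp => grp.ToList());

// later, for each agency the user selected in the listbox
var result = new List<GOS>();
foreach (string agencyCode in selectedAgencies)
{
    List<GOS> gosForAgency;
    if (gosByAgency.TryGetValue(agencyCode, out gosForAgency))
        result.AddRange(gosForAgency);
}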
If the most usual case is a user selecting just a couple of them, you can always establish a threshold... if the selected number is less than, say, 30, run each query individually; otherwise run the general query and retrieve from memory.
I'm planning on creating a live analytics page for my website - a bit like Google Analytics, but with real live data which will change as new users load a page on my site, etc.
The main site is/will be written using ASP.NET/C# as the back end with an MS SQL database, and the front end will support things like JavaScript (jQuery), CSS3, and HTML5 (if required).
I was wondering what methods I can use for the live analytics in terms of: how to get the data onto the analytics page, what efficient graphing I can use, and how to store the data with fast input/output.
The first thing that came to my mind is to use Node.js - could I use this to achieve a live analytics page? Is it a good idea? Are there any better alternatives? Any drawbacks with this?
Would I need a C# Application running on a server to use Node.js to send/receive all the data to and from the website?
Would using a MS SQL database be fast enough? Would I need to store all the data live, or could I store it in chunks every x amount of seconds/minutes? (Which would be more efficient?)
This illustrates my initial thoughts on the matter -
Edit:
I'm going to be using this system over multiple sites; I could be getting anywhere from 10 hits at a time to around 1,000,000 (highly unlikely, but still possible). I want to be able to scale this system and adapt it to the environment it's in.
It really depends on how "real time" the realtime data needs to be. For example, I made this recently:
http://www.reed.co.uk/labs/realtime/
Which shows job applications coming into the system. Obviously there is way too much going on during busy periods to actually be querying the main database in real time - so what we do is query a sliding "window" and cache it on the server - this is a chunk of the last 5 minutes' worth of events.
We then play this back to the user as if it's happening "now". Having a little latency as part of the SLA (where the users don't really care) can make the whole system vastly more scalable.
[EDIT- further explanation]
The data is just retrieved from a basic stored procedure call - naturally, a big system like Reed has hundreds of transactions/second - so we can't keep hitting the main cluster for every user.
All we do is make sure we have a current window, in this case the last 5 minutes of data, cached on the server. When a client comes to the site, we get that last 5 minutes of data and play it back like it's happening right now - the end user is none the wiser - but what it means is that all clients are reading off the cache. Once the cache is 5 minutes old, we invalidate it and start again. This means a maximum of 1 DB hit every five minutes, thus making the system vastly more scalable (not that it really needs to be, as it's just for fun, really).
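A minimal sketch of that kind of server-side window cache in ASP.NET; the AnalyticsEvent type and the GetEventsSince data-access call are placeholders, not part of the real system:

using System;
using System.Collections.Generic;
using System.Web.Caching;

public static List<AnalyticsEvent> GetRecentEvents()
{
    var cache = System.Web.HttpRuntime.Cache;
    var events = cache["recent-events"] as List<AnalyticsEvent>;
    if (events == null)
    {
        // at most one DB hit every 5 minutes, shared by every client
        events = GetEventsSince(DateTime.UtcNow.AddMinutes(-5));   // hypothetical stored-proc wrapper
        cache.Insert("recent-events", events, null,
                     DateTime.Now.AddMinutes(5), Cache.NoSlidingExpiration);
    }
    return events;
}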
Just so you are aware, Google Analytics already offers live user tracking. When inside the dashboard of a site on Google Analytics, click the Home button on the top bar and then the Real-Time button on the left bar. Considering the design work and quality of this service, it seems this may be a better option than attempting to recreate it. If you do choose to proceed and create your own, then you can at least use their service as a benchmark for the desired features.
Using APIs like the Google Charts API (https://developers.google.com/chart/) would be a good approach to displaying the output of your stored data, with decreased development time. If you provide more information on the number of hits you expect and the scale of the server this software will be hosted on, then it will be easier to give you answers to the speed questions.
I have a Windows application in which a form is bound to data.
The form loads slowly because of the large amount of data. I am also showing paging in the form to navigate through the records.
How can I increase the performance?
Bottom line - your app needs to 'page the data' effectively.
This means, you will need to "lazy load" the data. The UI should only load and display data that it needs to show; therefore load additional data only when needed.
Since you didn't provide much information regarding your app and the data that you load, let's assume that your app fetches a very large number of records.
Load your form
If, for instance, your grid shows 25 records per page, use TOP 100 to fetch the top 100 records and fill in your first page plus the next three.
Upon each 'Next' (or after a few consecutive 'Nexts') you can hit the database to fetch the next batch of records. Note that you will require some mechanism (ROW_NUMBER?) to keep track of the records being fetched, row numbers, etc.
This article discusses exactly what you are after and what I am referring to.
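A minimal sketch of ROW_NUMBER-based paging from C#; the table name Records, its Id column, and the method name are placeholders:

using System.Data;
using System.Data.SqlClient;

DataTable FetchPage(string connectionString, int pageNumber, int pageSize)
{
    const string sql = @"
        SELECT *
        FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY Id) AS RowNum FROM Records) AS Numbered
        WHERE RowNum BETWEEN (@Page - 1) * @Size + 1 AND @Page * @Size;";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        cmd.Parameters.AddWithValue("@Page", pageNumber);
        cmd.Parameters.AddWithValue("@Size", pageSize);

        var page = new DataTable();
        new SqlDataAdapter(cmd).Fill(page);   // only pageSize rows come back
        return page;
    }
}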
It's hard to say for certain without knowing more about your application, but the immediate thing that comes to mind is that if your dataset is large, you should be doing pagination on the database side (by constraining the query using row counts) rather than on the application side.
Databinding is a convenience feature of .NET, but it comes with a severe performance overhead. In general it's only acceptable for working with small datasets of less than a few thousand rows bound to a couple of dozen controls at most. If the datasets grow very large, they take their toll very quickly and no amount of tweaking will make the application speedy. The key is always to constrain the amount of memory being juggled by the data binding system at any given time so that it doesn't overload itself with the meta-processing.
Here are some recommendations:
Find out why you need to bring back a large set of data. That much data displayed on the screen will not lead to a good user experience. If this is a search result or something similar, limit your search results to, say, 100, and let the user know that there is more but they need more fine-grained search criteria.
Check to make sure that your database query is well optimized and indexed and you are not bringing more data than you need to.
Assuming you are using a DataGridView, see if taking advantage of VirtualMode helps (a small sketch follows these recommendations). The description below is from MSDN, and there is also a link to an example in there.
Virtual mode is designed for use with very large stores of data. When the VirtualMode property is true, you create a DataGridView with a set number of rows and columns and then handle the CellValueNeeded event to populate the cells.
If you are using some other control, you can see if that control provides a similar feature. ListView also has VirtualMode.
Fire up the SQL profiler to see what your application is requesting from the database. You may see some unnecessary calls, and opportunities to trim your data needs and lazy load. Also debug and profile your application to see where you spend most of your time.
If you are using SQL Server, implement paging using Common Table Expressions and ROW_NUMBER(). This will allow you to get less data from SQL Server and definitely better performance.
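As mentioned above, a minimal VirtualMode sketch; the column names, totalRecordCount, and the dataCache helper are placeholders rather than anything from the question:

// the grid holds no data itself; it asks for each visible cell on demand
dataGridView1.VirtualMode = true;
dataGridView1.Columns.Add("Name", "Name");
dataGridView1.Columns.Add("Value", "Value");
dataGridView1.RowCount = totalRecordCount;          // total count known up front, rows not loaded

dataGridView1.CellValueNeeded += (sender, e) =>
{
    // fetch (or read from a small in-memory cache of) only the row being painted
    var row = dataCache.GetRow(e.RowIndex);          // hypothetical page-sized cache over the DB
    e.Value = e.ColumnIndex == 0 ? row.Name : row.Value;
};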
When writing ASP.NET pages, what signs do you look for that your page is making too many roundtrips to a database or server?
(This is a general question but I say ASP.NET as the majority of my coding is on the web side of things).
How much is too much? The €1M question! Profile. Then profile. If your app is spending most of its time doing data access, you have a problem (and should look at a sql trace). If it is spending most of its time drawing the UI, then (assuming your view isn't doing data access) you should probably look elsewhere first...
Round trips are more relevant to latency than the total quantity of data being moved, so it really does make sense to optimize for them. The usual way is to use stored procedures that do multiple steps, perhaps even returning multiple result sets.
What I do is I look at the ASP performance counters and SQL performance counters. To get an accurate measurement you must ensure that there is no random noise activity on the SQL Server (ie. import batches running unrelated to the web site).
The relevant counters I look at are:
SQL Statistics/Batch requests/sec: This indicates exactly how many Transact-SQL batches the server receives. It can be, in most cases, equated 1:1 with the number of round trips from the web site to SQL.
Databases/Transaction/sec: this counter is instanced per database, so I can quickly see in which database there is 'activity'. This way I can correlate the web site data roundtrips (ie. my app logic requests, goes to app database) and the ASP session state and user stuff (goes to Asp session db or tempdb)
Databases/Write Transaction/sec: This I correlate with the counters above (transaction per second) so I can get a feel of the read-to-write ratio the site is doing.
ASP.NET Applications/Requests/sec: With this counter I can get the number of requests/sec the site is seeing. Correlated with the number of SQL Batch Requests/sec it gives a good indication of the average number of round-trips per request.
The next thing to measure is usually trying to get a feel for where the time is spent in the request. On my own projects, I make abundant use of performance counters I publish myself, so it's really easy to measure. But I'm not always so lucky as to be cleaning up only my own mess... Profiling is usually not an option for me because most of the time I troubleshoot live production systems I cannot instrument.
My approach is to try to sort out the SQL side of things first, since it's easy to find the relevant statistics for execution times in SQL: SQL keeps a nice aggregated statistic ready to look at in sys.dm_exec_query_stats. I can also use Profiler to measure execution duration in real time. With some analysis of the numbers collected, knowing the normal request pattern of the most visited pages, you can give a pretty good estimate of the total time spent in SQL per web request. If this time adds up to nearly all the time it takes a request to serve the page, then you have your answer.
And to answer the original question title: to reduce the number of round-trips, you make fewer requests. Seriously. First, cache what is appropriate to cache - I guess that's obvious. Second, reduce the complexity: don't display unnecessary data on each page, cache and display stale data when you can get away with it, and hide details on secondary navigation panels.
If you feel that the problem is the number of round-trips per se as opposed to the number of requests (ie. you would benefit tremendously from batching multiple requests in one round-trip), then you should somehow measure that the round-trip overhead is what's killing you. With connection pooling on a normal network connection this is usually not the most important factor.
And finally you should look if everything that can be done in sets is done in sets. If you have some half-brained ORM that retrieves objects one at a time from an ID keyset, get rid of it.
I know that this may sound repetitive, but client-server round trips depend on how much program logic is located on each side of the connection.
The first thing to check is validation: you always have to validate and sanitize your input on the server side, but that does not mean you cannot also do it on the client side, cutting out round trips that are used only to check input.
Second: what can you do on the client side to reduce server-side load? There are calculations that you can check or make on the client side. There is also AJAX, which can be used to reload only the part of the page that is changing.
Third: can you delegate work to another server? If your server is too loaded, why not use web services or simply delegate some of the logic to another server?
As Mark wrote: how much is too much? It is up to you and your budget.
When writing ASP.NET pages, what signs do you look for that your page is making too many roundtrips to a database or server?
Of course it all depends, and you have to profile. However, here are some indicators; they do not necessarily mean there is a problem, but they often point to one:
The page is taking a very long time to render locally.
Read this question: Slow response-time cheat sheet, in particular this link.
To render the page you need more than 30 round trips. I pulled that number out of my hat, but assuming a round trip is taking about 3.5ms then 30 round trips will kick you over the 100ms guideline (before any other kind of processing).
All the queries involved in rendering the page are heavily optimized and do not take longer than a millisecond or two to execute, and there are no CPU-heavy operations that execute every time you render the page - yet the page is still slow.
Data access is abstracted away and not cached in any way. If, for example, GetCustomer calls the DAL, which in turn issues a query, and your page asks for 100 Customer objects which are not retrieved in a batch, you are probably in trouble.
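A minimal sketch of that "in trouble" pattern and a batched alternative; GetCustomer/GetCustomers and the Customer type are placeholders, not a real API:

// N+1 pattern: 100 ids -> 100 round trips
var customers = new List<Customer>();
foreach (int id in customerIds)
    customers.Add(dal.GetCustomer(id));        // one query per customer

// batched alternative: 100 ids -> 1 round trip
// e.g. SELECT ... FROM Customers WHERE Id IN (@id0, @id1, ...)
var customersBatched = dal.GetCustomers(customerIds);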