My app sends emails. I currently:
get a list of customers from the DB (objects)
get a list of their email types from the DB (ditto)
get a list of email recipients/unique data for the email from the DB (again)
use the data above to generate mailmessages
loop through the mailmessages and send them out while logging the smtp status
Now this behavior is fine when you're sending off 500 emails, but what is the impact if it's 10,000+ emails? I imagine at some point the number of objects I am holding onto until I get to step 5 becomes considerable. How can I measure it to know I am approaching capacity? I figure I can at least time the whole scenario to understand how long it takes, as a clue to when it is becoming a drag on the system.
Would it be better to run this scenario on a per-customer basis? That seems less efficient, hitting the DB potentially hundreds of times instead of 3 or so. I know the logging will be one-off hits back to the DB.
I am looking for an approach, not a code resolution. I got in trouble last time I didn't specify that.
I guess this depends on several things, such as how big/powerful your system is (database capacity, processing power, memory, and more), and how important it is to send out these mails quickly, among other things.
An idea might be to use a temporary DB table to store the info from steps 1-4. You could do this in batches (as Blogobeard mentioned), or all at once, depending on efficiency. The actual mailing job could then also be split into batches, and when a mail is sent, that customer's info would be deleted from the temp table.
There are probably several ways to fine-tune this, and it's probably easier to give better advice once you've tried something and have some specific results to act on.
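Just as a very rough sketch of that shape, using plain ADO.NET; the EmailStaging table, its columns, the SMTP host, and the addresses are all invented names for illustration, not anything from your setup:

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Net.Mail;

class StagedMailer
{
    // Pulls one batch of staged emails, sends them, and deletes each row once sent.
    public static void SendBatch(string connectionString, int batchSize)
    {
        var batch = new List<Tuple<int, MailMessage>>();

        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // Grab the next batch of rows from the temp/staging table.
            var select = new SqlCommand(
                "SELECT TOP (@n) Id, Recipient, Subject, Body FROM EmailStaging ORDER BY Id", conn);
            select.Parameters.AddWithValue("@n", batchSize);

            using (var reader = select.ExecuteReader())
            {
                while (reader.Read())
                {
                    var msg = new MailMessage("no-reply@example.com", reader.GetString(1),
                                              reader.GetString(2), reader.GetString(3));
                    batch.Add(Tuple.Create(reader.GetInt32(0), msg));
                }
            }

            using (var smtp = new SmtpClient("localhost"))
            {
                foreach (var item in batch)
                {
                    smtp.Send(item.Item2);

                    // Once the mail has gone out, remove that customer's row from the temp table.
                    var delete = new SqlCommand("DELETE FROM EmailStaging WHERE Id = @id", conn);
                    delete.Parameters.AddWithValue("@id", item.Item1);
                    delete.ExecuteNonQuery();
                }
            }
        }
    }
}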
Here is the scenario:
I have a DLL with a method that gets data from the DB; depending on the parameters passed, it does various checks and gives me the required data.
GetGOS_ForBill(AgencyCode)
In a Windows application, I have a listbox which lists 500+ agencies.
I retrieve the GOS for each agency and append it to a generic list.
If the user has selected all agencies (500+ for now), it takes about 10 minutes to return the data from the DLL.
We thought about background processing, but that doesn't reduce the time; it only lets the user do other things on the screen. We are considering multithreading.
Can anybody help me with this? What would be the right approach, and how can we accomplish it with multithreading?
From the way you ask, I think you don't have much experience with multithreading, and multithreading is not a topic to be improvised and thrown together from a Stack Overflow question. I would strongly advise against using multithreading if you don't know what you're doing... instead of one problem you'll have two.
In your case the performance problem is not about using threading to parallelize the workload, but about structuring the problem correctly.
Right now you're querying each agency separately, which works fine for a couple of agencies but degrades quickly. The query itself is probably fast; the problem is that you're running that query 500 times. Instead of that, why don't you try to get all the GOS for all the agencies in a single query (which is probably going to be fast) and store the result in memory (say, a Dictionary keyed by agency code)? Then just retrieve the appropriate set of GOS when needed.
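The rough shape of that, assuming a hypothetical GetGOS_ForAllAgencies() method alongside your existing GetGOS_ForBill, and a GOS type with an AgencyCode property (those names, gosService, and selectedAgencyCodes are my inventions for illustration):

using System.Collections.Generic;
using System.Linq;

// One round trip instead of 500: fetch everything, then index it by agency code.
List<GOS> allGos = gosService.GetGOS_ForAllAgencies();

Dictionary<string, List<GOS>> gosByAgency = allGos
    .GroupBy(g => g.AgencyCode)
    .ToDictionary(grp => grp.Key, grp => grp.ToList());

// Later, when agencies are selected in the listbox:
var selectedGos = new List<GOS>();
foreach (string agencyCode in selectedAgencyCodes)
{
    List<GOS> gosForAgency;
    if (gosByAgency.TryGetValue(agencyCode, out gosForAgency))
        selectedGos.AddRange(gosForAgency);
}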
If the most usual case is a user selecting just a couple of them, you can always establish a threshold... if the selected number is less than, say, 30, do each query individually; otherwise run the general query and retrieve from memory.
I am quite confused about which approach to take and what best practice is.
Let's say I have a C# application which does the following:
sends emails from a queue. The emails to send and all their content are stored in the DB.
Now, I know how to make my C# application almost scalable, but I need to go somewhat further.
I want some way of distributing the tasks across, say, X servers, so it is not just one server doing all the processing but the work is shared amongst them.
If one server goes down, then the load is shared between the other servers. I know NLB does this, but I'm not looking for NLB here.
Sure, you could add a column of some kind to the DB table to indicate which server should process each record, give each application on the servers an ID that matches the value in the DB, and have them pull only their own records - but I consider this cheap, bad practice and unrealistic.
Having a DB table row lock is also not something I would do, due to potential deadlocks and other possible issues.
I am also NOT suggesting using threading "to the extreme" here, but yes, there will be threading per item to process, or items batched up per thread for x number of threads.
How should I approach this, and what do you recommend for making a C# application which is scalable and has high availability? The aim is to have X servers, each with the same application, each able to get records and process them, but with the processing load shared amongst the servers so that if one server or service fails, the others can take on that load until another server is put back.
Sorry for my lack of understanding or knowledge, but I have been thinking about this quite a lot and have had a lack of sleep trying to think of a good robust solution.
I would be thinking of batching up the work, so each app only pulls back x records at a time, marking those retrieved records as taken with a bool field in the table. I'd amend the SELECT statement to pull only records not marked as taken/done. Table locks would be OK in this instance for very short periods, to ensure there is no overlap of apps processing the same records.
EDIT: It's not very elegant, but you could have a datestamp and a status for each entry (instead of a bool field as above). Then you could run a periodic Agent job which runs a sproc to reset the status of any records which are In Progress but have gone beyond a time threshold without being set to Complete. They would then be ready for reprocessing by another app later on.
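A sketch of what that claim step could look like, assuming a hypothetical EmailQueue table with Status and ClaimedAt columns (names invented here); the single UPDATE ... OUTPUT both marks the rows and returns them, so two apps can't grab the same records:

using System.Collections.Generic;
using System.Data.SqlClient;

// Atomically claim up to batchSize pending rows for this app instance and return their ids.
// EmailQueue, Status and ClaimedAt are assumed names, not from the question.
public static List<int> ClaimBatch(string connectionString, int batchSize)
{
    const string sql = @"
        UPDATE TOP (@n) EmailQueue
        SET Status = 'InProgress', ClaimedAt = GETUTCDATE()
        OUTPUT inserted.Id
        WHERE Status = 'Pending';";

    var claimedIds = new List<int>();
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        cmd.Parameters.AddWithValue("@n", batchSize);
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
                claimedIds.Add(reader.GetInt32(0));
        }
    }
    return claimedIds;
}

The periodic Agent job from the EDIT would then just flip rows back to 'Pending' when ClaimedAt is older than the threshold and the status is still 'InProgress'.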
This may not be enterprise-y enough for your tastes, but I'd bet my hide that there are plenty of apps out there in the enterprise which are just as un-sophisticated and work just fine. The best things work with the least complexity.
I have a site that people put stuff for sale on.
Every month, every user who has something for sale will get sent an email by a Windows service asking them if their item has been sold, with a custom link to click to confirm it hasn't (as it's against the user agreement for sold items to remain on the site).
First I must run a query to get all the unsold items along with the related user's email.
Currently I loop through all of these, generate a custom email for each person, and send them out as individual emails.
foreach (Item unsoldItem in unsoldItemsCollection)
{
    // generate the custom email body for this item
    string email = GenerateUnsoldEmail(unsoldItem.ItemName, unsoldItem.ItemPrice);
    Utils.Sendemail(unsoldItem.UserEmail, "no-reply@website.com", "Unsold Item", email);
}
(this is kind of pseudo code, but this is pretty much what I'm doing)
My problem is there could possibly be thousands (if it takes off, maybe millions :)) of items in that loop, all requiring emails, which I think is going to cause problems.
What other way could I do this?
Bex
I would do this asynchronously so that the program is not blocked while the current email is being sent.
You can't mail merge, since all the emails are different; the links inside them will be different... unless the body of the email just points to some sort of landing page that forces the user to log in and then redirects them to a specific page where all of their items are listed and they are asked to remove the ones already sold. In that case, you could send one generic email instead of thousands of custom ones.
SmtpClient has an async version of the Send method.
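For example, with the Task-based SendMailAsync added in .NET 4.5 (on older frameworks, SendAsync with a SendCompleted handler does the same job); the SMTP host below is a placeholder:

using System.Net.Mail;
using System.Threading.Tasks;

// Sends one message without blocking the calling thread.
// "smtp.example.com" is a placeholder for your real SMTP server.
public static async Task SendOneAsync(string to, string body)
{
    using (var client = new SmtpClient("smtp.example.com"))
    using (var message = new MailMessage("no-reply@website.com", to, "Unsold Item", body))
    {
        await client.SendMailAsync(message);
    }
}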
If you want to do something for every item in a collection, you will have to use some kind of loop (or something equivalent, like recursion). There is no way around that.
You should stop worrying about potential problems. If you think this code will be a performance problem, measure it and find out how fast it actually is and how many emails it will handle.
You're worried it might take hours to send all the emails. Is that actually a problem? Why?
You're also worried it might take too much memory. If you can't fit all items into memory at once, you should process them in batches: get for example 1000 at a time, process them, and then get another 1000.
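A sketch of that batching loop; GetUnsoldItems(offset, batchSize) is a made-up name standing in for whatever data-access call you already have:

const int batchSize = 1000;
int offset = 0;

while (true)
{
    // GetUnsoldItems is a placeholder: it should return at most batchSize items
    // starting at the given offset, so the whole set is never held in memory at once.
    List<Item> batch = GetUnsoldItems(offset, batchSize);
    if (batch.Count == 0)
        break;

    foreach (Item unsoldItem in batch)
    {
        string email = GenerateUnsoldEmail(unsoldItem.ItemName, unsoldItem.ItemPrice);
        Utils.Sendemail(unsoldItem.UserEmail, "no-reply@website.com", "Unsold Item", email);
    }

    offset += batch.Count;
}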
You can create a windows service which will send all your emails once a day.
Or, even simpler, you can create a small console utility which does the same and run it as a scheduled task.
I have a web app that will send out daily or weekly email updates depending on user permissions and their alert settings (daily, weekly, monthly, or none).
Each email to an account (which would have multiple users) requires a few DB calls and calculations, which makes these daily/weekly emails pretty expensive as the number of users increases.
Are there any general tips on writing these services? I'm looking for some architecture tips or patterns and not really topics like email deliverability.
I would cache the data ahead of the processing time if you are having to handle very large sets of information, so that the DB 'calculations' can be omitted from the processing cycle at the scheduled times. Effectively, break the work up so that the DB-intensive stuff is done a bit before the scheduled processing of the information. When it comes time to actually send these emails out, I would imagine you can process a very large volume quickly without a whole lot of tuning up front. Granted, I also don't know what kind of volume we're talking about here.
You might also thread the application so that your processing data is split into logical chunks, reducing the amount of data that has to be processed all at once. Depending on your situation, that might streamline things. Granted, I normally don't recommend getting into threading unless there is a good reason to, and you may have one. At the very least, use a background-worker type of threaded process and fire off a few, depending on how you segment your data.
When handling exceptions, remember not to let them bring your processing down; handle them through logging or notification of some sort and then move on. You wouldn't want one error to mess things up for further processing, though I'm sure you have probably planned for that.
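As a rough illustration of both points (splitting the work across a few threads and keeping one bad item from stopping the run), using Parallel.ForEach; accounts and BuildAndSendEmailFor are placeholders for your own collection and per-account work:

using System;
using System.Threading.Tasks;

Parallel.ForEach(
    accounts,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    account =>
    {
        try
        {
            BuildAndSendEmailFor(account);   // the expensive DB calls + email generation/send
        }
        catch (Exception ex)
        {
            // Log and carry on so one bad account doesn't bring the whole run down.
            Console.Error.WriteLine("Failed to process account {0}: {1}", account, ex);
        }
    });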
Also, send your emails asynchronously so they don't block processing. It's probably an obvious observation, but sometimes little things like that are overlooked and can create quite the bottleneck when sending out lots of emails.
Lastly, test it with a reasonable load beforehand, and shoot for well over capacity.
You may want to check out SQL Server Reporting Services.
You may have to translate the current setup into the Reporting Services format, but in return you'll get a whole administrative interface for scheduling report generation, allowing users to modify the report inputs, caching historical/current reports, and letting users manage their own email subscriptions.
http://msdn.microsoft.com/en-us/library/ms160334.aspx
I'm working on a web service at the moment and there is the potential that the returned results could be quite large (> 5 MB).
It's perfectly valid for this set of data to be this large and the web service can be called either sync or async, but I'm wondering what people's thoughts are on the following:
If the connection is lost, the entire resultset will have to be regenerated and sent again. Is there any way I can do any sort of "resume" if the connection is lost or reset?
Is sending a result set this large even appropriate? Would it be better to implement some sort of "paging" where the resultset is generated and stored on the server and the client can then download chunks of the resultset in smaller amounts and re-assemble the set at their end?
I have seen all three approaches: paged, store and retrieve, and massive push.
I think the solution to your problem depends to some extent on why your result set is so large and how it is generated. Do your results grow over time, are they calculated all at once and then pushed, do you want to stream them back as soon as you have them?
Paging Approach
In my experience, using a paging approach is appropriate when the client needs quick access to reasonably sized chunks of the result set similar to pages in search results. Considerations here are overall chattiness of your protocol, caching of the entire result set between client page requests, and/or the processing time it takes to generate a page of results.
Store and retrieve
Store and retrieve is useful when the results are not random access and the result set grows in size as the query is processed. Issues to consider here are complexity for clients and if you can provide the user with partial results or if you need to calculate all results before returning anything to the client (think sorting of results from distributed search engines).
Massive Push
The massive push approach is almost certainly flawed. Even if the client needs all of the information and it needs to be pushed in a monolithic result set, I would recommend taking the approach of WS-ReliableMessaging (either directly or through your own simplified version) and chunking your results (a rough sketch of a simplified chunked interface follows the list below). By doing this you
ensure that the pieces reach the client
can discard the chunk as soon as you get a receipt from the client
can reduce the possible issues with memory consumption from having to retain 5MB of XML, DOM, or whatever in memory (assuming that you aren't processing the results in a streaming manner) on the server and client sides.
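As a very rough sketch of what a simplified chunked interface could look like (every name here is invented for illustration, and the WS-ReliableMessaging-style acknowledgement handshake is left out):

using System.Collections.Generic;
using System.Linq;

// The client asks for one chunk at a time and re-assembles the set on its side.
public class ResultChunk
{
    public int ChunkIndex { get; set; }
    public int TotalChunks { get; set; }
    public List<string> Rows { get; set; }   // stand-in for whatever your row type really is
}

public ResultChunk GetResultChunk(string queryId, int chunkIndex, int chunkSize)
{
    // LoadOrGenerateResults is a placeholder: it would generate the full result set once
    // per queryId and cache it server-side, so later chunk requests don't regenerate it.
    List<string> all = LoadOrGenerateResults(queryId);

    return new ResultChunk
    {
        ChunkIndex = chunkIndex,
        TotalChunks = (all.Count + chunkSize - 1) / chunkSize,
        Rows = all.Skip(chunkIndex * chunkSize).Take(chunkSize).ToList()
    };
}

Once the client acknowledges a chunk, the server can drop that part of the cached set, which is where the memory savings on the server side come from.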
Like others have said though, don't do anything until you know your result set size, how it is generated, and overall performance to be actual issues.
There's no hard law against 5 MB as a result set size. Over 400 MB can be hard to send.
You'll automatically get async handlers (since you're using .NET).
implement some sort of "paging" where
the resultset is generated and stored
on the server and the client can then
download chunks of the resultset in
smaller amounts and re-assemble the
set at their end
That's already happening for you -- it's called tcp/ip ;-) Re-implementing that could be overkill.
Similarly --
entire resultset will have to be regenerated and sent again
If it's MS-SQL, for example, that is generating most of the resultset, then re-generating it will take advantage of some implicit caching in SQL Server, and the subsequent generations will be quicker.
To some extent you can get away with not worrying about these problems, until they surface as 'real' problems -- because the platform(s) you're using take care of a lot of the performance bottlenecks for you.
I somewhat disagree with secretGeek's comment:
That's already happening for you -- it's called tcp/ip ;-) Re-implementing that could be overkill.
There are times when you may want to do just this, but really only from a UI perspective. If you implement some way to either stream the data to the client (via something like a pushlets mechanism), or chunk it into pages as you suggest, you can then load some really small subset on the client and then slowly build up the UI with the full amount of data.
This makes for a slicker, speedier UI (from the user's perspective), but you have to evaluate if the extra effort will be worthwhile... because I don't think it will be an insignificant amount of work.
So it sounds like you'd be interested in a solution that adds 'starting record number' and 'final record number' parameter to your web method. (or 'page number' and 'results per page')
This shouldn't be too hard if the backing store is SQL Server (or even MySQL), as they have built-in support for row numbering.
Even so, you should be able to avoid doing any session management on the server, avoid any explicit caching of the result set, and just rely on the backing store's caching to keep your life simple.
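A sketch of such a web method, assuming SQL Server 2012+ OFFSET/FETCH (on older versions the same thing can be done with ROW_NUMBER() in a subquery); the Results table, its columns, and connectionString are placeholders:

using System.Data;
using System.Data.SqlClient;

// Returns one page of rows straight from the database, no server-side session state.
public DataTable GetResultsPage(int startRecord, int recordCount)
{
    const string sql = @"
        SELECT Id, Payload
        FROM Results
        ORDER BY Id
        OFFSET @start ROWS FETCH NEXT @count ROWS ONLY;";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        cmd.Parameters.AddWithValue("@start", startRecord);
        cmd.Parameters.AddWithValue("@count", recordCount);

        var page = new DataTable();
        new SqlDataAdapter(cmd).Fill(page);   // opens and closes the connection as needed
        return page;
    }
}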