I am wondering how to implement a solution that retrieves data I have scraped and displays it in an ASP.NET MVC web application.
The current implementation scrapes the data and passes it from the controller to the view, but this makes the page request take very long, because the scraper runs every time a request for the page with scraped data is processed.
Is there any implementation I can do to separate the data retrieval and the website?
Currently I have a console application scraper class that scrapes data, and an ASP.NET MVC web application that displays the data. How can I couple them together easily?
Based on system size, I think you can do one of two things:
Periodically scrape data and save it in the memory
Periodically scrape data and save it in the database
It is obvious that if the scraped data is big you need to store it in a database; otherwise you can keep it in memory and greatly boost performance.
Running periodic tasks in ASP.NET is covered by background workers. One easy way to run tasks periodically is to start a thread in Application_Start. I won't go deeper into the implementation, because it has already been answered. You can read it here: Best way to run scheduled tasks
For saving data in memory you can use something like this:
public static class Global
{
    // Thread-safe holder for the latest scrape results (System.Collections.Concurrent).
    public static ConcurrentBag<ScrapedItem> ScrapedItems = new ConcurrentBag<ScrapedItem>();
}
*Note: it is necessary to use a thread-safe collection, because reading from and adding to this collection will happen from different threads: one from the background worker, one from the request. Alternatively, you can use a lock object when getting/setting a non-thread-safe collection.
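For completeness, a minimal sketch of such an Application_Start worker (using System.Threading and System.Collections.Concurrent), assuming a hypothetical Scraper.Scrape() that wraps your console scraping logic and returns IEnumerable<ScrapedItem>:

protected void Application_Start()
{
    var worker = new Thread(() =>
    {
        while (true)
        {
            try
            {
                // Replace the whole collection; reference assignment is atomic,
                // so readers see either the old snapshot or the new one.
                Global.ScrapedItems = new ConcurrentBag<ScrapedItem>(Scraper.Scrape());
            }
            catch (Exception)
            {
                // Log and keep the loop alive.
            }
            Thread.Sleep(TimeSpan.FromMinutes(15)); // scrape interval; pick what fits
        }
    });
    worker.IsBackground = true; // don't keep the app domain alive on shutdown
    worker.Start();
}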
I have a C# Web API application with a React front end. Most of the front-end pages call multiple APIs on load. The performance and response times of each individual API are quite fast and good enough for my application. However, when I test the application, page load times take longer and longer as I add more users. I assume this is because there are more requests coming in than IIS can handle, i.e. all its threads are busy, so it takes longer to process each request.
Given this, the solution seems to be to reduce the number of requests coming in. I can do this by combining the page-load calls for each page into single calls, then retrieving all the required data in one go. This would mean that each individual piece of data would be obtained synchronously, slowing down response times again. Given this, it seems it would be best to multi-thread the combined API calls so the whole request would only take as long as the longest sub-request.
In summary, the problem seems to be that requests were slow because IIS was running out of threads to process them. The solution is to make single API calls that are multi-threaded.
My question is: would moving the extra thread demand from IIS to the C# application not just be moving the same problem to a different place? E.g. are the IIS and C# threads drawn from the same resource?
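For what it's worth, if the sub-requests are I/O-bound, the combined call doesn't need one thread per sub-request: with async/await the request thread is released while each call is in flight. A sketch, where _service and the Get*Async methods are hypothetical stand-ins for the individual page-load calls:

// Hypothetical combined endpoint; GetOrdersAsync/GetProfileAsync/GetAlertsAsync
// stand in for the individual calls the page used to make on load.
public async Task<IHttpActionResult> GetPageData()
{
    var ordersTask = _service.GetOrdersAsync();
    var profileTask = _service.GetProfileAsync();
    var alertsTask = _service.GetAlertsAsync();

    // The three calls run concurrently, but no thread is blocked while
    // they are in flight; the request thread goes back to the pool.
    await Task.WhenAll(ordersTask, profileTask, alertsTask);

    return Ok(new
    {
        Orders = ordersTask.Result,
        Profile = profileTask.Result,
        Alerts = alertsTask.Result
    });
}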
I have an application that runs ASP.NET Web API on the backend. There's a specific controller that looks something like
public class MyController : ApiController
{
    public dynamic Post(dynamic postParams)
    {
        // Retrieve data based on the POST parameters.
        var data = retrieveData(postParams.foo, postParams.bar);

        // Write the retrieved data to the DB.
        writeDataToDb(data);

        // Return the data as JSON to the client.
        return data;
    }
}
So there are three operations, the retrieval of data based on the post parameters, writing the retrieved data to a DB, and returning the retrieved data as JSON to the client.
This could be greatly sped up if the DB write step did not block returning the JSON to the client, as the two are not dependent on one another.
My concern is that if I spin the DB write off in another thread, the Post method returning will cause the request to tear down and could potentially kill the DB write. What's the correct way to handle this scenario?
Should I have a separate handler that I shoot an async web request to so that the DB write will occur in its own process?
I would introduce a separation of concerns: let one method query the data from the DB (suitable for a GET), and another update the data in your DB (suitable for a POST or PUT). That way, retrieving your data becomes a lighter, less time-consuming operation.
As a side note, spinning off new threads without registering them with the ASP.NET runtime is dangerous, as IIS may decide to recycle your app at any time. Therefore, if you're on .NET 4.5.2, make sure to use HostingEnvironment.QueueBackgroundWorkItem to queue work on the ThreadPool. If not, you can use Stephen Cleary's BackgroundTaskManager. Also, I suggest reading more in Stephen's article Fire and Forget on ASP.NET.
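Applied to the controller from the question, the queued write might look like this (a sketch; HostingEnvironment lives in System.Web.Hosting):

public dynamic Post(dynamic postParams)
{
    var data = retrieveData(postParams.foo, postParams.bar);

    // Queue the write so the response is not blocked on it; ASP.NET will try
    // to delay app-domain shutdown until queued work items have finished.
    HostingEnvironment.QueueBackgroundWorkItem(ct => writeDataToDb(data));

    return data;
}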
I'm writing a .NET MVC application that needs two background processes to be running and I'm not sure if these processes should be threads within the web app, or if they should be separate windows services. I'd prefer them to be within the web app so that it's easier to deploy, configure and manage.
The first process is a timed event that needs to check a datasource every 15 minutes to determine if certain products have been shipped. If any have been shipped, this process would take the info about the product and create a "job" record in a database table (a queue) so that it could be processed later.
The second process is another timed event that needs to check the database table (the queue) to see if there is any work to do. If there are unprocessed records, it would read them into a thread safe list, and then use several .NET Tasks to process them in parallel.
The reason I'm building this as a web app is that I'm going to give my customers the ability to view historical info on this process, the ability to manually submit jobs, and the ability to configure which products the application should "look" for.
Is there a good way to build all of this into the web app or should I be looking at splitting it up into multiple applications? The demand on the web views will be pretty low.
Phil Haack recently blogged about the challenges you will have to face if you ever decide to implement recurring background tasks directly in your web application instead of externalizing them in a separate service.
I would advise a dedicated service for the background actions. The reason for this is that IIS can tear down the whole app at any point in time.
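A minimal sketch of such a service, using two System.Timers.Timer instances for the intervals described in the question (all names are placeholders):

// Minimal Windows service sketch; CheckShipments/ProcessQueue are placeholders
// for the two processes described in the question.
public class JobService : System.ServiceProcess.ServiceBase
{
    private System.Timers.Timer _shipmentTimer;
    private System.Timers.Timer _queueTimer;

    protected override void OnStart(string[] args)
    {
        _shipmentTimer = new System.Timers.Timer(TimeSpan.FromMinutes(15).TotalMilliseconds);
        _shipmentTimer.Elapsed += (s, e) => CheckShipments(); // enqueue job records
        _shipmentTimer.Start();

        _queueTimer = new System.Timers.Timer(TimeSpan.FromMinutes(1).TotalMilliseconds);
        _queueTimer.Elapsed += (s, e) => ProcessQueue();      // drain the queue
        _queueTimer.Start();
    }

    protected override void OnStop()
    {
        _shipmentTimer.Stop();
        _queueTimer.Stop();
    }

    private void CheckShipments() { /* query the datasource, insert job rows */ }
    private void ProcessQueue()   { /* read unprocessed rows, process with Tasks */ }
}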
My environment - C# 3.5 and ASP.NET 4.0 and VS 2010
Apologies - I'm a bit new to some of the concepts related to threading and async methods.
My scenario is this:
My site will periodically make a couple of GET/POSTS to an external site and collect some data
This data will be cached in a central cache
The periodic action will happen about once every 5 minutes, and will happen for every new member who registers on my site. The querying for a member will stop based on certain conditions.
The user does NOT need to be logged in for these periodic queries - they register on the site, and then off my async code goes - it keeps working 24/7 and messages the user once in a while via email depending on certain trigger conditions. So essentially it should all happen in the background regardless of whether the user is explicitly logged in or not.
Load Expected - I anticipate about 100 total running members a day (accounting for new members + old ones leaving/stopping).
the equation is ~100 visitors/day × 4 POSTs per fetch × 12 fetches/hour × 8 hours/day ≈ 38,400 requests/day
In my mind - I'm running 100 threads a day, and each 'thread' wakes up once every 5 minutes and does some stuff. The threads will interact with a static central cache which is shared among all of them.
I've read some discussions on ThreadPools, AsyncPage etc - all a bit new territory. In my scenario what would you suggest? What's the best approach to doing this so it's efficient?
In your response I would appreciate if you mention specific classes/methods/links to use so I can chase this. Thanks a bunch!
You will not be able to do this with ASP.NET as such; you will not be able to keep the "threads" running with any level of reliability. IIS could decide to restart the application pool (i.e. the whole process) at any point in time. Really, what you need is some kind of Windows service that runs and makes the requests. You could then use the HttpWebRequest.BeginGetResponse method to make your calls. This will fire off the relevant delegate when the response comes back, and .NET will manage the threading.
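A sketch of what such a call might look like (placeholder URL; requires System.Net and System.IO):

// Async GET using HttpWebRequest.BeginGetResponse;
// the callback runs on a thread-pool thread when the response arrives.
var request = (HttpWebRequest)WebRequest.Create("http://example.com/data"); // placeholder URL

request.BeginGetResponse(ar =>
{
    var req = (HttpWebRequest)ar.AsyncState;
    using (var response = (HttpWebResponse)req.EndGetResponse(ar))
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        string body = reader.ReadToEnd();
        // parse body and write it into the central cache here
    }
}, request);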
Agreeing with Ben, I would not use threading in IIS with ASP.NET. It's not the same as using it in a desktop application.
If you're going to use some kind of polling or timed action, I recommend having a handler (.ashx) or asp.net page (aspx) that can take the request that you want to run in the background and return XML or JSON as a response. You can then set some javascript in your pages to do an AJAX request to that URI and get whatever data you need. That handler can do the server side operations that you need. This will let you run background processes and update the front-end for your users if need be, and will take advantage of the existing IIS thread pool, which you can scale to fit the traffic you're getting.
So, for instance
ajaxRequest.ashx : processes the "background" request, takes HTTP POST/GET parameters (a minimal sketch follows this list).
myPage.aspx : your UI.
someScript.js : JavaScript file with functions to call ajaxRequest.ashx from myPage.aspx (or any other page) when certain actions or intervals occur.
jQuery.js : no need to write all the AJAX code or event handlers yourself :)
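A minimal sketch of the handler's code-behind (the "action" parameter is hypothetical):

// ajaxRequest.ashx code-behind; "action" is a hypothetical parameter.
public class AjaxRequestHandler : System.Web.IHttpHandler
{
    public void ProcessRequest(System.Web.HttpContext context)
    {
        string action = context.Request["action"];

        // ... run the server-side operation for this action ...

        context.Response.ContentType = "application/json";
        context.Response.Write("{\"status\":\"ok\"}");
    }

    // Safe to reuse because the handler keeps no per-request state.
    public bool IsReusable { get { return true; } }
}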
You will need to create a separate Windows service (or a console app that runs via the Windows scheduler) to poll the remote server.
If you need to trigger requests based on user interaction with your site, the best way is to use some kind of queuing system (e.g. MSMQ) that your service monitors.
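For example, with MSMQ the hand-off could look roughly like this (queue path and payload are placeholders; requires a reference to System.Messaging):

using System.Messaging;

public static class FetchQueue
{
    const string QueuePath = @".\Private$\fetchRequests"; // placeholder queue name

    // Web app side: enqueue a request for the service to pick up.
    public static void EnqueueFetch(string memberId)
    {
        using (var queue = new MessageQueue(QueuePath))
        {
            queue.Send(memberId);
        }
    }

    // Service side: blocking receive loop (simplified; run it on its own thread).
    public static void ReceiveLoop()
    {
        using (var queue = new MessageQueue(QueuePath))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            while (true)
            {
                using (Message msg = queue.Receive()) // blocks until a message arrives
                {
                    string memberId = (string)msg.Body;
                    // ... poll the remote server for this member ...
                }
            }
        }
    }
}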
I'm having some trouble getting my cache to work the way I want.
The problem:
The process of retrieving the requested data is very time consuming. If using standard ASP.NET caching some users will take the "hit" of retrieving the data. This is not acceptable.
The solution?:
It is not super important that the data is 100% current. I would like to serve old invalidated data while updating the cached data in another thread making the new data available for future requests. I reckon that the data needs to be persisted in some way in order to be able to serve the first user after application restart without that user taking the "hit".
I've made a solution which does somewhat of the above, but I'm wondering if there is a "best practice" way, or if there is a caching framework out there that already supports this behaviour?
There are tools that do this, for example Microsoft's ISA Server (may be a bit expensive / overkill).
You can cache it in memory using Enterprise Library Caching. Let your users read from the cache, and have other pages that update the cache; these other pages should be called as regularly as needed to keep the data up to date.
You could listen for when the cached item is removed and process it then:
public void RemovedCallback(string k, object v, CacheItemRemovedReason r)
{
    // Put the item back in the cache (so others can use it
    // until you have finished grabbing the new data).
    // Spawn a thread to go get up-to-date data.
    // Overwrite the old data with the new result.
}
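The callback has to be registered when the item is inserted, e.g.:

// Register the callback when inserting the item (key/value are placeholders).
HttpRuntime.Cache.Insert(
    "myData",                        // cache key
    expensiveData,                   // the slow-to-build value
    null,                            // no dependencies
    DateTime.UtcNow.AddMinutes(5),   // absolute expiration
    Cache.NoSlidingExpiration,
    CacheItemPriority.Default,
    RemovedCallback);                // the handler above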
In Global.asax:
protected void Application_Start(object sender, EventArgs e)
{
// Spawn worker thread to pre-load critical data
}
I have no idea if this is best practice; I just thought it would be slick.
Good luck!
I created my own solution with a Dictionary/Hashtable in memory as a duplicate of the actual cache. When a method call came in requesting the object from the cache and it wasn't there but was present in memory, the memory-stored object was returned, and a new thread was fired to update the object in both memory and the cache using a delegate method.
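A rough sketch of that pattern (all names here are illustrative, not from any library):

// Cache-plus-memory-duplicate pattern: serve stale data instantly,
// refresh both stores on a background thread.
public static class StaleCache
{
    private static readonly ConcurrentDictionary<string, object> _backup =
        new ConcurrentDictionary<string, object>();

    public static object Get(string key, Func<object> rebuild)
    {
        object value = HttpRuntime.Cache[key];
        if (value != null) return value;

        // Cache miss: serve the stale backup if we have one, refresh in the background.
        if (_backup.TryGetValue(key, out value))
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                object fresh = rebuild();
                _backup[key] = fresh;
                HttpRuntime.Cache.Insert(key, fresh);
            });
            return value; // stale but instant
        }

        // First-ever request: no choice but to build synchronously.
        object built = rebuild();
        _backup[key] = built;
        HttpRuntime.Cache.Insert(key, built);
        return built;
    }
}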
You can do this pretty easily with the Cache and Timer classes built into .NET. The Timer runs on a separate thread.
And I actually wrote a very small wrapper library called WebCacheHelper which exposes this functionality in an overloaded constructor. The library also serves as a strongly typed wrapper around the Cache object.
Here's an example of how you could do this...
public readonly static WebCacheHelper.Cache<int> RegisteredUsersCount =
new WebCacheHelper.Cache<int>(new TimeSpan(0, 5, 0), () => GetRegisteredUsersCount());
This has a lazy loading aspect to it where GetRegisteredUsersCount() will be executed on the calling thread the instant that RegisteredUsersCount is first accessed. However, after that it's executed every 5 minutes on a background thread. This means that the only user who will be penalized with a slow wait time will be the very first user.
Then getting the value is as simple as referencing RegisteredUsersCount.Value.
Yeah, you could just cache the most frequently accessed data when your app starts, but that still means the first user to trigger that would "take the hit", as you say (assuming an in-proc cache, of course).
What I do in this situation is use a CacheTable in the DB to cache the latest data, and run a background job (with a Windows service; in a shared environment you can also use threads) that refreshes the data in the table.
There is a small possibility of showing the user a blank screen; I eliminate this by also caching via the ASP.NET cache for 1 minute.
I don't know if it's bad design, but it's been working great without a problem on a heavily used web site.
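Roughly, the read path described above (ReadCacheTable() is a placeholder for a SELECT against the CacheTable the service refreshes; DataTable is from System.Data):

// Two-layer read: ASP.NET cache first, then the DB cache table.
public static DataTable GetLatestData()
{
    var cached = (DataTable)HttpRuntime.Cache["LatestData"];
    if (cached != null) return cached;

    DataTable fromDb = ReadCacheTable(); // placeholder DB read
    HttpRuntime.Cache.Insert(
        "LatestData", fromDb, null,
        DateTime.UtcNow.AddMinutes(1),   // the one-minute ASP.NET cache layer
        System.Web.Caching.Cache.NoSlidingExpiration);
    return fromDb;
}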