I am not sure what the best practice is about coding server-side sometimes.
Lets say we have a rank tracker application which updates the domain google ranking for specific keywords. We create an application (e.g. Laravel Framework) which includes frontend and backend. Then we have to update the rankings for all websites from time to time. I know that a cronjob would help me to execute a script every few minutes.
But if it gets more complicated like the Uber Driving system... cronjobs will not be enough right? We need some server-side application written in C#, Java, ... which are continuously checking for tasks. right?
I just need some advice. Maybe someone could also point out in which case cronjobs are not enough and we have to use own applications (C#, Java,..) to make sure everything is working fine.
First you need to take a look at what Cron Jobs are used for:
https://en.wikipedia.org/wiki/Cron
The software utility Cron is a time-based job scheduler
So in cases where you want to execute/task things at specific times, then a Cron Job would satisfy that need.
If however you want to continuously check for a condition, and relay that information to-and-fro client and server, Sockets are generally what you're looking at. Specifically web sockets (in web based applications, also again depends on where/how you want to use it).
https://en.wikipedia.org/wiki/WebSocket
https://en.wikipedia.org/wiki/Computer_network_programming
"Continuously" disregards the current time (I.e.: It's not the same as running every second, or even millisecond).
Language also doesn't matter, but preferably you'd want to use something that has decent socket support / well documented libraries available.
TLDR;
Cron Jobs are good for when you want to do things at very specific times, like database backups. Where as sockets are more widely used to relay live information.
Related
I'm writing a calculation intensive program in C# using the TPL. Some preliminary benchmarking shows good reduction in computation time through using processors with more cores/threads.
However, there is a limit to how many threads are available on a single CPU (I think even the best Xeons money can buy is currently have about 16).
I've been reading about how render farms with a 'grid' of multiple inexpensive CPUs in their own machines is a good way to increase the overall core count, but I have no idea how I go about implementing one of these. Is it implemented at the OS level with Microsoft server technology (and if so, how?), or do I also need to modify the C# code itself?
Any help or links to existing information would be greatly appreciated.
If you want to do this at scale (100s of nodes) then developing your own system is hard. You have to handle; nodes becoming unavailable, data replication to each node, tracking job progress.. It's a long list. You also need to consider the sort of communication you're going to require between your nodes. Remember that the cost of sending a message (data) from one thread to another is tiny compared to the cost of sending it to another machine across a network (even a fast one). You may have to completely rewrite your multithreaded application to run well on a distributed system, even to the point of using a completely different algorithm.
Hadoop
Microsoft had plans to commercialize Dryad as LINQ to HPC but this project was sidelined a while back (I worked on this project before I left Microsoft). I believe you can still get the final "public preview", but it's unsupported. The SQL team opted to work with the Hadoop/Hortonworks people on getting a Windows/Azure/.NET friendly Hadoop distribution off the ground. As far as I know the only thing they have delivered is HDInsight. A Hadoop service running in Azure.
There is now a Microsoft .NET SDK For Hadoop which will allow you to manage a cluster and submit jobs etc. It does not seem to allow you to write code that executes on the Hadoop nodes. You can however use the Hadoop streaming API. This is fairly low level but is language agnostic so you can pretty much use it to integrate map reduce code written in any language with Hadoop. More details on this can be found in this blog post.
Hadoop for .NET Developers
If you want to do this as a smaller scale (10s of nodes) then I would look for something like MPI .NET. it looks like this project has been abandoned but something similar is probably what you want.
You might look into some like Dryad - http://research.microsoft.com/en-us/projects/dryadlinq/default.aspx
It might on the other hand also be a big too much for your situation, but the ideas in Dryad could be simplified to your needs.
You might also look into making your own TaskScheduler, which could handle the distribution of threads to agents running on other boxes, but you would have to implement a simple socket client/server communication to get and push the data.
Another and a bit odd suggestion, which might be okay for investigating things, is to do the following.
Let the master of the calculation cut the problem into the number of available client computers.
Write the parameters to kick of the calculation for each client to a file shared by all on the network.
Let the clients look for files dedicated to them, and kick of the calculation for their piece, when file appears. The output is written back to a result file.
The server will sit an listen for all clients completing their jobs.
The files could be replaced with a database, low-level sockets, REST services, Web Services etc. depending on your needs.
This may seem very elementary, but I don't really have any experience in this realm - all my experience has been on the web side of things.
I need to create a process of some sort that will repeatedly query an API (around 5 times a second), get the results from the API (in JSON format), and then my process will do what it needs to do with the results (in my case, insert them into a SQL database). These details don't really matter to the scope of the question I have, I just want to give you an idea on what I'm trying to achieve in case someone wants to recommend a better way of doing it.
My first thought was to create a console app that basically never quits (unless I specifically tell it to). Is a console app the way to go for this? The idea is I'll have a VM set up which will host my solution, including this "process" I create. I'm not all too familiar with Windows Services, or Windows Tasks, but I probably need to write some custom code so therefore I imagine I can't use the Windows Task Scheduler, am I right?
Once options would be to create a Windows Service which is the OS-level implementation of a long running process. To do so in C# you may wish to read through some tutorials online, perhaps starting with the MSDN Walkthrough. You should also read about Windows Services in general and the differences between a service and a regular user process (mainly the fact that services have no UI and can't interact with the user directly, and some of the other security considerations).
Other options may be to leverage a framework such as WCF or similar.
I am developing a asp.net site that needs hit a few social media sites daily for blanket friend/follower data. I have chosen arvixe business class as my hosting. In the future if we grow, I'd love to get onto a dedicated server and run a windows service, however since that is not in the cards at this point I need another reliable way of running scheduled tasks. I am familiar with running a thread timer from the app_code(global.aspx). However the app pool recycling will cause some problems with the timer. I have never used task scheduling like quartz but have read a lot about it on stackoverflow. I was looking for some advise as to how to approach my goal. One big problem I have using either method is that I will need the crawler threads to sleep for up to an hour regularly due to api call limits. My first thoughts were to use the db to save the starting and ending of a job. When the app pool recycles I would clear out any parts not completed and only start parts that do not have a record of running on that day. What do the experts here think? any good links to sample architecture of this type of scheduling?
It doesn't really matter what method you use, whether you roll your own or use Quartz. You are at the mercy of ASP.NET/IIS because that's where you want to host it.
Do you have a spare computer laying around that can just run a scheduled task and upload data to a hosted database? To be honest, it's possibly safer (depending on your use case) to just do it that way then try to run a scheduler in ASP.NET.
Somewhat along the lines of Bryan's post;
Find a spare computer.
Instead of allowing DB access have it call up a web service on your site. This service call should be the initiator of the process you are trying to do. Don't try to put params into it, just something like "StartProcess()" should work fine.
As far as going to sleep and resuming later take a look at Workflow Foundation. There are some nice built in features to persist state.
Don't expose your DB to the outside world, instead expose that page or web service and wraps some security around that. WCF has some nice built in security features for that.
The best part is when you decide to move off, you can keep your web service and have it called from a Windows Service in the same manner.
As long as you use a persistent job store (like a database) and you write and schedule your jobs so that they can handle things like being killed half way through, having IIS recycle your process is not that big a deal.
The bigger issue is that IIS shuts your site down if it doesn't have traffic. If you can keep your site up, then just make sure you set the misfire policy appropriately and that your jobs store any state data needed to pick up where they left off, you should be able to pull it off.
If you are language-agnostic and don't mind writing your "job-activation-script" in your favourite, Linux-supported language...
One solution that has worked very well for me is:
Getting relatively cheap, stable Linux hosting(from reputable
companies),
Creating a WCF service on your .Net hosted platform that will contain the logic you want to run regularly (RESTfully or SOAP or XMLRPC... whichever suits you),
Handling the calls through your Linux hosted cron jobs, written in your language of choice(I use PHP).
Working very well, like I said. No VPS expense,configurable and externally activated. I have one central place where my jobs are activated, with 99 to 100% uptime(never had any failures).
I am basically creating a site for recruiters. One of the functionality in my application requires posting to Facebook periodically. The posting frequency can be from 0(Never) to 4(High)
For Eg. If a recruiter has 4 open jobs and he has posting frequency set to 4, each job should be posted as per it's turn: 1st job on 1st day, 2nd job on 2nd, 3rd job on 3rd etc, on 5th day again 1st job (round robin fashion).
Had he set the posting frequency to 2, two jobs would be posted daily (thus each job would be posted every 2 days)
My only question is what type of threading should I create for this since this is all dynamic!! Also, any guidelines on what type of information should I store in database?
I need just a general strategy to solve this problem. No code..
I think you need to seperate it from your website, I mean its better to run the logic for posting jobs in a service hosted on IIS ( I am not sure such a thing exists or not, but I guess there is).
Also you need to have table for job queue to remember which jobs need to be posted, then your service would pick them up and post them one by one.
To decide if this is the time for posting a job you can define a timer with a configurable interval to check if there is any job to post or not.
Make sure that you keep the verbose log details if posting fails. It is important because it is possible that Facebook changes its API or your API key becomes invalid or anything else then you need to know what happened.
Also I strongly suggest to have a webpage for reporting the status of jobs-to-post queue, if they failed what was the causes of problem.
If you program runs non-stop, you can just use one of the Timer classes available in .NET framework, without the need to go for full-blown concurrency (e.g. via Task Parallel Library).
I suspect, though, that you'll need more than that - some kind of mechanism to detect which jobs were successfully posted and which were "missed" due program not running (or network problems etc.), so they can be posted the next time the program is started (or network becomes available). A small local database (such as SQLite or MS SQL Server Compact) should serve this purpose nicely.
If the requirements are as simple as you described, then I wouldn't use threading at all. It wouldn't even need to be a long-running app. I'd create a simple app that would just try to post a job and then exit immediately. However, I would scheduled it to run once every given period (via Windows Task Scheduler).
This app would check first if it hasn't posted any job yet for the given posting frequency. Maybe put a "Last-Successful-Post-Time" setting in your datastore. If it's allowed to post, the app would just query the highest priority job and then post it to Facebook. Once it successfully posts to Facebook, that job would then be downgraded to the lowest priority.
The job priority could just be a simple integer column in your data store. Lower values mean higher priorities.
Edit:
I guess what I'm suggesting is if you have clear boundaries in your requirements, I would suggest breaking your project into multiple applications. This way there is a separation of concerns. You wouldn't then need to worry how to spawn your Facebook notification process inside your web site code.
I have a require ment to read data from a table(SQL 2005) and send that data to other application for every 5 seconds. I am looking for the best approach to do the same.
Right now I am planning to write a console application(.NET and C#) which will read the data from sql server 2005(QUEUE table which will be filled through different applications) and send to other application through TCP/IP(Central server). Run that console application under schedule task for every 5 seconds. I am assuming scheduled task will take care to discard new run event if task is already running(avoid to run concurrent executions).
Does any body come accross similar situation? Please share your experience and advice me for best approach.
Thanks in advance for your valuable time spending for my request.
-Por-hills-
We have done simliar work. If you are going to query a sql database every 5 seconds, be sure to use a stored procedure that is optimized to be very fast. It should not update data unless aboslutely necessary. This approach is typically called 'polling' and I've found that it is acceptable if your sqlserver is not otherwise bogged down with too many other calls.
In approaches we've used, a Windows Service that does the polling works well.
To communicate results to another app, it all depends on what your other app is doing and what type of interface you can make into it, and how quickly you need the results. The WCF class libraries from Microsoft provide many workable approaches for real time communication. My preference is to write to the applications database, and then have the application read the data (if it works for that app). If you need something real time, WCF is the way to go, and I'd suggest using a stateless protocol like http if < 5 sec response time is required, (using standard HTTP posts), or TCP/IP if subsecond response time is required.
since I assume your central storage is also SQL 2005, have you considered using what SQL Server 2005 offers out of the box to achieve your requirements? Rather than pool every 5 seconds, marshal and unmarshal TCP/IP, implement authentication and authorization for the TCP/IP pipe, scale TCP transmission with boxcaring, manage message acknowledgments and retries, deal with central site availability, fragment large messages, implement fairness in transmission and so on and so forth, why not simply use Service Broker? It does all you need and more, out of the box, already tested, already tuned for performance and scalability.
Getting reliable messaging right is not trivial and you should focus your efforts in meeting your business specifics, not reiventing the wheel.
I would recommend writing a Windows Service (since you are C#) that has some timer which runs every 5 seconds. That way you wont be starting and stopping an application all the time, it can run even when there is no one logged into the machine, and it will automatically start when the machine is restarted.
For one of my projects, I needed to do something periodically. I opted for a service and set up a timer that takes care of reading the data. You might consider that solution. It has worked well for me.
I suggest to create a windows service and not an application and to perform the timing yourself - create a timer and execute one step on each timer event. For the communication you have many choices - I would consider using standard technologies like a webservice or Winows Communication Foundation.
Besides this custom solution I would evaluate if the task can be solved using Microsoft Integration Services .
Finally other question comes to mind - why do you need this application? Why doesn't/don't the application(s) consuming the data query the database? Is the expensive polling required? Is it possible for the data producers to signal the availibilty of new data directly to the data consumers?
I am not sure about the details of your project, specifically related to security but maybe it would be better to create an SSIS package and schedule it as a job?