I've been tasked with creating a small .NET application to serve content from a DB through an RSS feed. The content will be refreshed from the DB on a fixed interval (say every 30s or so). This will be my first time working with RSS and I have somewhat limited web application skills. However, I'm pretty good with DBs and the DA layer, so I'm not exactly starting from scratch.
My questions are:
I want to decouple the content updating process from the request servicing process. Am I better off writing an independent windows service to handle the db-related content retrieval and XML transformation or would using a background process in a web application be fine?
a. If the answer is dedicated WS, will thread-blocking be an issue as the service tries to update a page at the same time the page is being served?
b. If the answer is a BG process, is there a way to share a collection or some type of in-memory object between the background process and the main application, so that on a client request the XML is generated in real time from objects in the collection?
Is a SOAP/REST web service a strong option for content delivery, or am I better off with a full web application serving rss.aspx?
For transforming the content to XML, should I use the SyndicationFeed class or some form of XML template with substitution? There are a very limited number of fields (4-8) that will be updated routinely, so the XML will be relatively tiny.
Sorry if I seem all over the place on this. I'm just trying to think through a robust solution that's extensible and well designed. Thanks in advance, and please know I appreciate any thoughts/ideas on this project.
I have some experience building RSS systems, so let me try to answer your questions.
If by decoupling you mean "generating the XML files asynchronously", then it depends on how many different feeds you have. Based on what you describe, you'll be serving feeds based on queries to a database. If those queries have parameters, then you'll have as many different feeds as you have possible queries, and generating them offline will not work. Generally, I think most people generate feeds 'on the fly' as the requests come in.
I'm not familiar with rss.aspx, so I can't help much :)
The benefit of using your own XML templates is that you'll be able to extend (the X in XML!) your schema with other namespaces should you need to in the future.
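If you do go with SyndicationFeed instead, it covers a small feed like yours in a few lines. A minimal sketch, where the titles, URLs, and output path are placeholders:

```csharp
using System;
using System.ServiceModel.Syndication; // requires a reference to System.ServiceModel
using System.Xml;

class FeedWriter
{
    static void Main()
    {
        // Build a small feed from a handful of fields pulled from the DB.
        var feed = new SyndicationFeed(
            "Example Feed",
            "Content pulled from the database",
            new Uri("http://example.com/feed"));

        feed.Items = new[]
        {
            new SyndicationItem(
                "First item",
                "Item body text",
                new Uri("http://example.com/items/1"))
        };

        // Serialize as RSS 2.0 (Atom10FeedFormatter is also available).
        using (var writer = XmlWriter.Create("feed.xml"))
        {
            new Rss20FeedFormatter(feed).WriteTo(writer);
        }
    }
}
```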
So this is more of an advice question.
We have a project that involves pulling an RSS feed and saving the reports from the feed to a database. This will need a job service, so either Quartz or cron.
My question is about which project type to use. There are many types of project we use as developers, but in my line of work these are normally a Web API with MVC hooked up to an Angular front end.
With this project we do not need any endpoints, so no need for MVC. I'm just after some advice as to what others would recommend.
The flow will be
1. C# call to the RSS feed with a parameter (5 per second max)
2. XML returned
3. XML mapped to a DTO/model
4. DTO/model saved to the database
5. External reporting tool will handle the data.
Any help is appreciated.
Thanks in advance
I would recommend a console application for the following reasons:
Requirement for a job. Console apps can easily be run via scheduled tasks/jobs.
Lack of requirement for a user interface. Perhaps you might need to pass in a few parameters, not sure. Perfect for a console application.
The requirements for retrieving and storing the RSS feed's XML can all be handled in C#; nothing special is needed from a framework perspective that a console app can't easily do (see the sketch below).
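As a rough illustration of steps 1-4 in a console app; the feed URL, element names, and the SaveToDatabase helper are placeholders for your real feed and data-access layer:

```csharp
using System;
using System.Linq;
using System.Net;
using System.Xml.Linq;

// Hypothetical DTO for one feed entry; adjust the fields to the real feed.
public class ReportDto
{
    public string Title { get; set; }
    public string Link { get; set; }
    public string Published { get; set; }
}

public static class Program
{
    public static void Main()
    {
        using (var client = new WebClient())
        {
            // 1-2. Call the feed and get the XML back (URL/parameter are placeholders).
            string xml = client.DownloadString("https://example.com/rss?reportId=5");

            // 3. Map the XML to DTOs with LINQ to XML.
            var items = XDocument.Parse(xml)
                .Descendants("item")
                .Select(i => new ReportDto
                {
                    Title = (string)i.Element("title"),
                    Link = (string)i.Element("link"),
                    Published = (string)i.Element("pubDate")
                })
                .ToList();

            // 4. Save each DTO; SaveToDatabase stands in for your data-access layer.
            foreach (var item in items)
                SaveToDatabase(item);
        }
    }

    private static void SaveToDatabase(ReportDto item)
    {
        // Placeholder: insert via EF, Dapper, or ADO.NET here.
        Console.WriteLine("Saved: " + item.Title);
    }
}
```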
You may also consider a Windows service project.
Make it a purely background process.
You don't have to worry about console windows appearing in a user session or taking measures to keep them hidden.
There are also operational benefits that can be leveraged, like managing the service through the service management console and out-of-the-box support for logon and failure recovery.
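If you do go the service route, a bare-bones skeleton looks roughly like this; the service name, 30-second interval, and DoWork body are placeholders, and an installer (InstallUtil or sc.exe) still has to be set up separately:

```csharp
using System;
using System.ServiceProcess; // requires a reference to System.ServiceProcess
using System.Timers;

public class FeedService : ServiceBase
{
    private readonly Timer _timer = new Timer(30000); // 30-second interval (placeholder)

    public FeedService()
    {
        ServiceName = "FeedService";
        _timer.Elapsed += (sender, args) => DoWork();
    }

    protected override void OnStart(string[] args)
    {
        _timer.Start();
    }

    protected override void OnStop()
    {
        _timer.Stop();
    }

    private void DoWork()
    {
        // Placeholder: fetch the RSS feed and save it to the database here.
    }

    public static void Main()
    {
        ServiceBase.Run(new FeedService());
    }
}
```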
On a personal project I'm working on, I have a requirement where I need to save (on disk) an XML feed periodically from an external site, and then parse the XML and render the contents in a particular format. Parsing the XML and rendering it is no problem - the confusion comes in finding the appropriate way to poll the external site/URL and store the XML periodically.
I have done a fair amount of research, but I've ended up even more stumped. My initial thought was to create a service that polls the external site and retrieves and stores the XML at prescribed intervals. I've not created a service before, so a) I'm not really sure where to start, and b) I'll be hosting the site through a hosting provider and I'm not sure this is a viable option.
The SO thread writing a service to periodically retrieve XML and send SMS seems to do exactly what I need, but I don't entirely understand the proposed solution.
I also found an article on delivering data across domains using an AJAX proxy, but this seems overkill for what I need.
Does anyone have any recommendations on how to achieve this?
Read this, and when you're finished, I would suggest you read the XML via an HttpWebRequest instead of trying to download it. I assume you'll be able to do this and write the result to a file? If not, I can expand my answer a bit.
You'll definitely want to create a windows service as their sole purpose is to keep running in the background and periodically do stuff.
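For the fetch-and-save part, a minimal sketch along those lines; the feed URL and output path are placeholders:

```csharp
using System.IO;
using System.Net;

class FeedDownloader
{
    static void Main()
    {
        // Fetch the remote XML with HttpWebRequest and write it to disk.
        var request = (HttpWebRequest)WebRequest.Create("https://example.com/feed.xml");

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var responseStream = response.GetResponseStream())
        using (var file = File.Create(@"C:\feeds\feed.xml"))
        {
            // Copy the response body straight to the file.
            responseStream.CopyTo(file);
        }
    }
}
```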
I need to write a WCF service that brings data to a reporting tool.
The reporting tool presents data lazily; until the user clicks, it doesn't show the data.
I cannot send everything at once because there can be several megabits of data, so I need to send it in portions.
The problem is that I don't want to create a lot of web methods for each report, because that way part of the BL would end up in the reporting tool.
Is it possible to make each report run in its own web session, so that each time it asks for the next portion of data I can send it back within the same session?
Maybe you have a better solution to my problem.
There are a number of technologies that could help. I would take a look at WCF Data Services, which allows flexible querying (IQueryable) and association traversal, and should take care of your lazy-loading concerns without having to create a whole load of separate WCF calls.
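As a rough sketch of what that looks like; the entity, context, and service names here are hypothetical, and in practice the context would be your EF model over the reporting database:

```csharp
using System.Data.Services;
using System.Data.Services.Common;
using System.Linq;

// Hypothetical entity; the reflection provider uses the Id property as the key.
public class ReportRow
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Value { get; set; }
}

// Hypothetical context; each IQueryable property becomes an addressable entity set.
public class ReportingContext
{
    public IQueryable<ReportRow> ReportRows
    {
        get { return new ReportRow[0].AsQueryable(); } // placeholder data source
    }
}

// Hosted as a .svc endpoint; clients page through large results with
// $top/$skip/$filter instead of needing one WCF operation per report.
public class ReportDataService : DataService<ReportingContext>
{
    public static void InitializeService(DataServiceConfiguration config)
    {
        config.SetEntitySetAccessRule("*", EntitySetRights.AllRead);
        config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
    }
}
```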
Also take a look at SQL Server Reporting Services, which is a more general reporting solution that may appeal to you.
Either of these technologies should help you avoid your BL leaking into your reporting tool. There are probably a whole host of non-MS solutions that do similar things, but I have listed the two above because you are using WCF, so you are probably more familiar with the MS stack (but maybe that was a silly assumption on my part... if that's not the case, they will at least get you started on what to look for!)
I'm looking for books, tutorials, or videos covering best practices for consuming web services. My main goal is to learn how to build a user interface where results are pulled from many sources (e.g. the eBay developer API, Yahoo shopping XML, etc.) and displayed to customers in a list of results. Many years ago a website called www.mpire.com used to work that way, displaying results on demand.
I'm developing with C#, Razor, EF 4, and SQL Server.
Thanks in advance.
Here is an overview. With this you can start researching the concepts further on Google.
Learn how to connect to the various APIs and retrieve data. You can do this in your controller, but it would be considered best practice to create a C# API wrapper for each of the APIs you are connecting to. Your application can then use the wrapper, which simplifies things and separates concerns. For many popular APIs, .NET wrappers have already been created and are available on open-source sites such as CodePlex or GitHub. The documentation for each API is a good place to start; generally it will reference a wrapper for the language you are working in, or the vendor may have developed their own that you can download.
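As a minimal sketch of such a wrapper; the class name, endpoint URL, and raw-string return are all hypothetical, and a real wrapper would add authentication and typed results:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical wrapper around one external shopping API.
public class ShoppingApiClient
{
    private readonly HttpClient _http = new HttpClient();

    public async Task<string> SearchAsync(string keyword)
    {
        // Returns the raw response; a fuller wrapper would deserialize
        // it into strongly typed result objects.
        return await _http.GetStringAsync(
            "https://api.example.com/search?q=" + Uri.EscapeDataString(keyword));
    }
}
```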
Once you are retrieving data, you have to consider whether you are going to store the data in your app or always call the API to get it. Depending on the situation, you can store the data in your database, making things faster and reducing calls to the external API. This doesn't work / isn't allowed in all situations, but it depends on your use case. If you are going to save the data, you will need to learn about database persistence. LINQ to SQL is a good place to start because it is so easy. There are good examples on www.asp.net/mvc.
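For example, a small LINQ to SQL sketch for caching results locally; the entity, table name, and connection string are placeholders:

```csharp
using System;
using System.Data.Linq;
using System.Data.Linq.Mapping;

// Hypothetical LINQ to SQL entity mapped to a local cache table.
[Table(Name = "CachedResults")]
public class CachedResult
{
    [Column(IsPrimaryKey = true, IsDbGenerated = true)]
    public int Id { get; set; }

    [Column]
    public string Title { get; set; }

    [Column]
    public DateTime RetrievedAt { get; set; }
}

class PersistenceExample
{
    static void Main()
    {
        // Connection string is a placeholder.
        using (var db = new DataContext(@"Data Source=.;Initial Catalog=Shop;Integrated Security=True"))
        {
            // Insert one cached row and push the change to the database.
            var table = db.GetTable<CachedResult>();
            table.InsertOnSubmit(new CachedResult
            {
                Title = "Example item",
                RetrievedAt = DateTime.UtcNow
            });
            db.SubmitChanges();
        }
    }
}
```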
Depending on whether you are retrieving the data from your database or from the API directly, you will then need to create custom view models for your views. In your controller, you gather the data from the various sources and combine it into a single object called a view model. From there you pass the view model to your view and display the data on the page. I would stay away from asynchronous controllers until you get everything working properly and move on to performance tuning; they add complexity that you don't need while you are learning.
Because invoking remote services is an I/O-intensive task, it would be beneficial to use asynchronous controllers. The documentation contains a couple of examples of how to put them into practice.
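A sketch of a task-based asynchronous action; note this form needs ASP.NET MVC 4 or later (MVC 3 uses the older AsyncController pattern), and the external URL is a placeholder:

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using System.Web.Mvc;

public class SearchController : Controller
{
    private static readonly HttpClient Http = new HttpClient();

    public async Task<ActionResult> Index(string q)
    {
        // The request thread is released while the remote call is in flight.
        string raw = await Http.GetStringAsync(
            "https://api.example.com/search?q=" + Url.Encode(q));

        // A real action would map this into a view model instead.
        ViewBag.RawResults = raw;
        return View();
    }
}
```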
I want to crawl other companies' websites, let's say car listings, and extract read-only information into my local database. Then I want to be able to display this collected information on my website. Purely from a technology perspective, is there a .NET tool, program, etc. already out there that is generic enough for my purpose, or do I have to write it from scratch?
To do it effectively, I may need a WCF job that mines data on a constant basis and refreshes the database, which then provides data to the website.
Also, is there a way to mask my calls to those websites? Would I create "traffic burden" for my target websites? Would it impact their functionality if I am just harmlessly crawling them?
How do I make my request look "human" instead of coming from a crawler?
Are there code examples out there on how to use a library that parses the DOM tree?
Can I send request to a specific site and get a response in terms of DOM with WebBrowser control?
Use HtmlAgilityPack to parse the HTML. Then use a Windows Service (not WCF) to run the long-running process.
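A minimal HtmlAgilityPack sketch; the URL and XPath are placeholders for whatever pages and elements you're after:

```csharp
using System;
using HtmlAgilityPack;

class Scraper
{
    static void Main()
    {
        // Load the page and query the parsed DOM with XPath.
        var web = new HtmlWeb();
        HtmlDocument doc = web.Load("https://example.com/cars");

        var links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null) // SelectNodes returns null when nothing matches
        {
            foreach (HtmlNode link in links)
            {
                Console.WriteLine("{0} -> {1}",
                    link.InnerText.Trim(),
                    link.GetAttributeValue("href", ""));
            }
        }
    }
}
```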
I don't know how you'd affect a target site, but one nifty way to generate human-looking traffic is the WinForms browser control. I've used it a couple of times to grab things from Wikipedia, because my normal mode of using HttpWebRequest to perform HTTP GETs tripped a non-human filter there and I got blocked.
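A rough sketch of driving the WebBrowser control without a form; the URL is a placeholder, and the control needs an STA thread and a message loop to work:

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

class BrowserFetch
{
    [STAThread] // WebBrowser requires a single-threaded apartment
    static void Main()
    {
        var browser = new WebBrowser { ScriptErrorsSuppressed = true };

        browser.DocumentCompleted += (sender, e) =>
        {
            // DocumentText holds the HTML as the browser rendered it.
            Console.WriteLine(browser.DocumentText.Length);
            Application.ExitThread(); // stop the message loop
        };

        browser.Navigate("https://example.com/");
        Application.Run(); // pump messages until ExitThread is called
    }
}
```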
As far as affecting the target site, it totally depends on the site. If you crawl Stack Overflow enough times fast enough, they'll ban your IP. If you do the same to Google, they'll start asking you to answer CAPTCHAs. Most sites have rate limiters, so you can only make requests so often.
As far as scraping the data out of the page, never use regular expressions; it's been said over and over. You should be using either a library that parses the DOM tree or rolling your own if you want. At a previous startup of mine, the way we approached the issue was to write an intermediary template language that would tell our scraper where the data was on the page, so that we knew what data, and what type of data, we were extracting. The hard part, you'll find, is constantly changing and varying data. Once you have the parser working, it takes constant work to keep it working, even on the same site.
I use a fantastically flexible tool, Visual Web Ripper. Output to Excel, SQL, text. Input from the same.
There is no generic tool which will extract the data from the web for you. This is not a trivial operation. In general, crawling the pages is not that difficult, but stripping/extracting the content you need is, and that part will have to be customized for every website.
We use professional tools dedicated to this; they are designed to feed the crawler with instructions about which areas within the web page contain the data you need to extract.
I have also seen Perl scripts designed to extract data from specific web pages. They can be highly effective depending on the site you parse.
If you hit a site too frequently, you will be banned (At least temporarily).
To mask your IP you can try http://proxify.com/