How to load XML from external site periodically? - c#

On a personal project I'm working on, I have a requirement to periodically save (on disk) an XML feed from an external site, and then parse the XML and render the contents in a particular format. Parsing the XML and rendering it is no problem - the confusion comes in finding the appropriate way to poll the external site/URL and store the XML periodically.
I have done a fair amount of research, but I've ended up even more stumped. My initial thought was to create a service that polls the external site and retrieves and stores the XML at prescribed intervals. I've not created a service before, so a) I'm not really sure where to start, and b) I'll be hosting the site through a hosting provider and I'm not sure this is a viable option.
The SO thread writing a service to periodically retrieve XML and send SMS seems to do exactly what I need, but I don't entirely understand the proposed solution.
I also found an article on delivering data across domains using an AJAX proxy, but this seems overkill for what I need.
Does anyone have any recommendations on how to achieve this?

Read this, and when you're finished, I would suggest you read the XML via an HttpWebRequest instead of trying to download it. I assume you'll be able to do this and write the result to a file? If not, I can expand my answer a bit.
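Something along these lines should do it - a minimal sketch that fetches the feed with HttpWebRequest and streams the response to disk (the URL and output path are placeholders):

```csharp
using System.IO;
using System.Net;

class FeedFetcher
{
    // Fetches the XML feed and writes the raw response to a local file.
    // The feed URL and output path below are placeholders - substitute your own.
    public static void SaveFeed()
    {
        var request = (HttpWebRequest)WebRequest.Create("http://example.com/feed.xml");
        request.Method = "GET";

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var responseStream = response.GetResponseStream())
        using (var file = File.Create(@"C:\data\feed.xml"))
        {
            responseStream.CopyTo(file); // Stream.CopyTo requires .NET 4.0 or later
        }
    }
}
```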
You'll definitely want to create a Windows service, as its whole purpose is to keep running in the background and periodically do work.
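If you go the service route, the skeleton is small - something like the sketch below, which polls on a timer. The 15-minute interval and the call to the SaveFeed sketch above are just assumptions to show the shape:

```csharp
using System.ServiceProcess;
using System.Timers;

public class FeedPollingService : ServiceBase
{
    private Timer _timer;

    public static void Main()
    {
        ServiceBase.Run(new FeedPollingService());
    }

    protected override void OnStart(string[] args)
    {
        _timer = new Timer(15 * 60 * 1000);                      // poll every 15 minutes - adjust to taste
        _timer.Elapsed += (sender, e) => FeedFetcher.SaveFeed();  // fetch-and-save from the sketch above
        _timer.AutoReset = true;
        _timer.Start();
    }

    protected override void OnStop()
    {
        _timer.Stop();
        _timer.Dispose();
    }
}
```

Bear in mind that a shared hosting provider usually won't let you install a Windows service, so a scheduled task on a machine you control may be the more realistic route.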

Related

Advice on the best project type to use for an RSS reader

So this is more of an advice question.
We have a project that involves an RSS feed, with reports from the feed being saved to a database. This will need a job scheduler, so either Quartz or cron.
My question is about project type: there are many kinds of project we use as developers, but in my line of work these are normally Web API with MVC hooked up to an Angular front end.
With this project we do not need any endpoints, so there is no need for MVC. I'm just after some advice as to what others would recommend.
The flow will be
1. C# call to the RSS feed with a parameter (5 per second max)
2. XML returned
3. XML mapped to a DTO/model
4. DTO/model saved to the database
5. External reporting tool will handle the data.
Any help is appreciated.
Thanks in advance
I would recommend a console application for the following reasons:
Requirement for a job: console apps can easily be run via scheduled tasks/jobs.
Lack of requirement for a user interface. Perhaps you might need to pass in a few parameters - not sure - but that's perfect for a console application.
The requirements for retrieving and storing the RSS feed's XML can all be handled in C#; nothing special is needed from a framework perspective that a console app can't easily do.
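To make that concrete, here's a rough sketch of the console flow. The feed URLs, element names, and the SaveReports method are placeholders, and the Thread.Sleep is a crude stand-in for the 5-requests-per-second limit:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Xml.Linq;

class Program
{
    // Placeholder DTO - shape it to whatever the reporting tool needs.
    class ReportDto
    {
        public string Title { get; set; }
        public string Link { get; set; }
        public string Published { get; set; } // RFC 822 date string; parse it if the DB needs a DateTime
    }

    static void Main(string[] args)
    {
        // Parameters could also come in via args when the task scheduler runs the app.
        var feedUrls = new[] { "http://example.com/feed?area=1", "http://example.com/feed?area=2" };

        foreach (var url in feedUrls)
        {
            var doc = XDocument.Load(url);                 // steps 1-2: call the feed, get the XML back

            var reports = doc.Descendants("item")          // step 3: map the XML onto the DTO
                .Select(i => new ReportDto
                {
                    Title = (string)i.Element("title"),
                    Link = (string)i.Element("link"),
                    Published = (string)i.Element("pubDate")
                })
                .ToList();

            SaveReports(reports);                          // step 4: hand the DTOs to the data layer

            Thread.Sleep(200);                             // crude throttle: at most ~5 requests per second
        }
    }

    static void SaveReports(IEnumerable<ReportDto> reports)
    {
        // Placeholder: persist with whatever data access you already use (EF, Dapper, ADO.NET...).
    }
}
```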
You may also consider a windows service project.
Make it a purely background process.
You don’t have to worry about console windows appearing in a user session and taking measures to keep it hidden.
There are also a few operational benefits - such as managing the service through the service management console, and out-of-the-box support for logon accounts and failure recovery - which can be leveraged too.

Design questions for RSS generating application

I've been tasked with creating a small .NET application to serve content from a DB through an RSS feed. The content will be updated from the DB on a fixed interval (say every 30s or so). This will be my first time working with RSS and I have somewhat limited web application skills. However, I'm pretty good with the DBs and the DA layer, so I'm not exactly starting from scratch.
My questions are:
I want to decouple the content updating process from the request servicing process. Am I better off writing an independent windows service to handle the db-related content retrieval and XML transformation or would using a background process in a web application be fine?
a. If the answer is dedicated WS, will thread-blocking be an issue as the service tries to update a page at the same time the page is being served?
b. If the answer is BG process, is there a way to share a collection or some-type of in memory object between the background process and the main application so that on client request, the XML is generated real-time from objects in a collection?
Is a SOAP/REST web service a strong option for content delivery, or am I better off with a full web application serving an rss.aspx page?
For transforming the content to XML, should I use the SyndicationFeed class or some form of XML template with substitution? There are a very limited number of fields (4-8) that will be updated routinely, so the XML will be relatively tiny.
Sorry if I seem all over the place on this. I'm just trying to think through a robust solution that's extensible and well designed. Thanks in advance - I appreciate any thoughts/ideas on this project.
I have some experience building RSS systems, so let me try to answer your questions.
If by decoupling you mean "generating the XML files asynchronously", then it depends on how many different feeds you have. Based on what you describe, you'll be serving feeds based on queries to a database. If those queries have parameters, then you'll have as many different feeds as you have possible queries, and generating them offline will not work. Generally, I think most people generate feeds 'on the fly' as the requests come in.
I'm not familiar with rss.aspx, so I can't help much :)
The benefit of using your own XML templates is that you'll be able to extend (the X in XML!) your schema with some other namespaces should you need to do that in the future.
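For what it's worth, the SyndicationFeed route is only a few lines too, and it still lets you add your own namespaced elements. A quick sketch - the titles, URLs and the "price" extension are purely illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.ServiceModel.Syndication; // System.ServiceModel.dll (System.ServiceModel.Web.dll on .NET 3.5)
using System.Xml;

class FeedBuilder
{
    // Builds an RSS 2.0 feed from a handful of in-memory records; names and URLs are illustrative.
    public static void WriteFeed(IEnumerable<KeyValuePair<string, string>> records, string outputPath)
    {
        var items = new List<SyndicationItem>();
        foreach (var record in records)
        {
            var item = new SyndicationItem(
                record.Key,                                        // title
                record.Value,                                      // summary/content
                new Uri("http://example.com/items/" + record.Key)  // link back to the item
            );

            // Extra fields can live in your own namespace - the "X" in XML.
            item.ElementExtensions.Add("price", "http://example.com/schema", record.Value);
            items.Add(item);
        }

        var feed = new SyndicationFeed(
            "My Feed", "Content pulled from the database",
            new Uri("http://example.com/rss"), items);

        using (var writer = XmlWriter.Create(outputPath))
        {
            new Rss20FeedFormatter(feed).WriteTo(writer);
        }
    }
}
```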

regular expression: mine text data from other websites

I want to crawl through other companies' websites - car listings, let's say - and extract read-only information into my local database. Then I want to be able to display this collected information on my website. Purely from a technology perspective, is there a .NET tool, program, etc. already out there that is generic enough for my purpose, or do I have to write it from scratch?
To do it effectively, I may need a WCF job that just mines data on a constant basis and refreshes the database, which then provides data to the website.
Also, is there a way to mask my calls to those websites? Would I create "traffic burden" for my target websites? Would it impact their functionality if I am just harmlessly crawling them?
How do I make my requests look "human" instead of coming from a crawler?
Are there code examples out there on how to use a library that parses the DOM tree?
Can I send request to a specific site and get a response in terms of DOM with WebBrowser control?
Use HtmlAgilityPack to parse the HTML. Then use a Windows Service (not WCF) to run the long-running process.
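A bare-bones HtmlAgilityPack example looks something like this - the URL and XPath are placeholders for whatever listing pages you end up targeting:

```csharp
using System;
using HtmlAgilityPack; // add a reference to the HtmlAgilityPack library

class Scraper
{
    static void Main()
    {
        var web = new HtmlWeb();
        HtmlDocument doc = web.Load("http://example.com/cars"); // placeholder URL

        // Placeholder XPath - adjust it to the markup of the site you're scraping.
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='listing']//h2/a");
        if (nodes == null)
            return; // SelectNodes returns null when nothing matches

        foreach (HtmlNode node in nodes)
        {
            Console.WriteLine("{0} -> {1}",
                node.InnerText.Trim(),
                node.GetAttributeValue("href", string.Empty));
        }
    }
}
```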
I don't know how you'd affect a target site, but one nifty way to generate human-looking traffic is the WinForms browser control. I've used it a couple of times to grab things from Wikipedia, because my normal approach of using HttpWebRequest to perform an HTTP GET tripped a non-human filter there and I got blocked.
As far as affecting the target site, it totally depends on the site. If you crawl Stack Overflow enough times fast enough, they'll ban your IP. If you do the same to Google, they'll start asking you to answer CAPTCHAs. Most sites have rate limiters, so you can only make a request so often.
As far as scraping the data out of the page: never use regular expressions - it's been said over and over. You should use either a library that parses the DOM tree or roll your own if you want. At a previous startup of mine, the way we approached the issue was to write an intermediary template language that told our scraper where the data was on the page, so that we knew what data, and what type of data, we were extracting. The hard part you'll find is constantly changing and varying data. Once you have the parser working, it takes constant work to keep it working, even on the same site.
I use a fantastically flexible tool, Visual Web Ripper. It can output to Excel, SQL, or text, and take input from the same.
There is no generic tool that will extract the data from the web for you. This is not a trivial operation. In general, crawling the pages is not that difficult, but stripping/extracting the content you need is. This operation will have to be customized for every website.
We use professional tools dedicated to this; they are designed to feed the crawler instructions about which areas within the web page contain the data you need.
I have also seen Perl scripts designed to extract data from specific web pages. They can be highly effective depending on the site you parse.
If you hit a site too frequently, you will be banned (At least temporarily).
To mask your IP you can try http://proxify.com/

System that needs to upload documents into a MOSS document library

Hi I need your help if you are an expert in MOSS.
I have a system that needs to upload documents into a MOSS document library.
I decided that the easiest approach for a phase 1 system would simply be to map a network path to a MOSS Document library.
The whole thing seems too easy. After that it's a straight copy using System.IO.
What I would like to know, is this method reliable enough to be used in a production system?
Speculation would be great, but if you have real experience with working with MOSS in this way, your answer would mean a lot.
Thanks.
So long as you do the proper error checking around the copy, it's fine - if you bear in mind the standard caveats with SharePoint document libraries and file-naming conventions.
SharePoint does not allow some characters in file names that NTFS and FAT do - these will cause an error when you try to copy such files to the document library, regardless of how you do it, so you will need to sanitise your filenames beforehand.
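A simple sanitiser is enough for most cases. Note that the exact set of blocked characters differs between SharePoint versions, so treat the list in this sketch as illustrative rather than definitive:

```csharp
using System.Text.RegularExpressions;

static class FileNameSanitizer
{
    // Characters MOSS commonly rejects in file names - check the documentation for your version.
    private static readonly Regex InvalidChars = new Regex(@"[~#%&*{}\\:<>?/|""]");

    public static string Sanitize(string fileName)
    {
        // Replace the blocked characters and trim trailing periods, which SharePoint also refuses.
        return InvalidChars.Replace(fileName, "_").TrimEnd('.');
    }
}
```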
The only downside to using a network path to the WebDAV interface of SharePoint is that if you stress it too much (a large copy of a lot of files), you can easily overwhelm it, and the network share will become unavailable for a period of time. If you are talking about a few files every now and then - several an hour, for example - it should be fine.
You are better off reading the files off the network path and then using the object model API or the web services to upload them.
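If the code can run on a server in the farm, the object model version is only a few lines. A sketch, with the site URL and library name as placeholders (the Copy/Lists web services are the usual remote alternative):

```csharp
using System.IO;
using Microsoft.SharePoint; // server object model - the code must run on a machine in the farm

class DocumentUploader
{
    // Uploads one file into a document library; the site URL and library name are placeholders.
    public static void Upload(string localPath)
    {
        using (SPSite site = new SPSite("http://moss-server/sites/team"))
        using (SPWeb web = site.OpenWeb())
        {
            SPFolder library = web.GetFolder("Shared Documents");
            byte[] contents = File.ReadAllBytes(localPath);

            // The final 'true' overwrites an existing file with the same name.
            library.Files.Add(Path.GetFileName(localPath), contents, true);
        }
    }
}
```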
You can use timer jobs, which can be scheduled to run at a convenient time. The timer job can read its configuration settings from an XML file.
This system would be easier to maintain and troubleshoot than a straight copy using System.IO.
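A custom timer job is just a class derived from SPJobDefinition. A minimal sketch - the job name and what Execute does are assumptions, and scheduling/deployment via a feature receiver is omitted:

```csharp
using System;
using Microsoft.SharePoint.Administration;

// Minimal custom timer job; schedule it (e.g. with an SPMinuteSchedule) from a feature receiver.
public class DocumentImportJob : SPJobDefinition
{
    public DocumentImportJob() : base() { }

    public DocumentImportJob(string jobName, SPWebApplication webApplication)
        : base(jobName, webApplication, null, SPJobLockType.Job) { }

    public override void Execute(Guid targetInstanceId)
    {
        // Read the source folder and target library from an XML config file,
        // then upload any pending documents using the object model (see the sketch above).
    }
}
```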

Silverlight Application for the web - storing data on site

I've made a little game as an application for the web in silverlight using C#, and I simply would like to save the top ten scores of any of the users that go on it.
How can I write to a file and save it on my web hosting area? Is this possible?
I think this would be the best way, because I only need to store a name and score (csv file), and this would be extremely easy. I hope this is possible.
If not, could someone point me in the right direction for doing this with a database? I've created a template just in case, using MySQL with the features provided by my web host. Is there an easy way to do it that way?
Thanks in advance,
Lloyd
You can add a small WCF service to your website with an ISaveScores interface. The SL app can connect to the WCF service to post scores, and the WCF service can then store the data however you want. If you use a csv file, make sure you handle locking properly, since it is very possible for multiple requests to happen simultaneously.
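A sketch of what that could look like - the contract name comes from the suggestion above, while the members, the CSV path, and the locking are just illustrative:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.Serialization;
using System.ServiceModel;

[ServiceContract]
public interface ISaveScores
{
    [OperationContract]
    void SaveScore(ScoreEntry entry);

    [OperationContract]
    List<ScoreEntry> GetTopTen();
}

[DataContract]
public class ScoreEntry
{
    [DataMember] public string Name { get; set; }
    [DataMember] public int Score { get; set; }
}

public class ScoreService : ISaveScores
{
    private const string ScoresPath = "App_Data/scores.csv"; // placeholder path
    private static readonly object FileLock = new object();  // guards the CSV against concurrent requests

    public void SaveScore(ScoreEntry entry)
    {
        lock (FileLock)
        {
            File.AppendAllText(ScoresPath, entry.Name + "," + entry.Score + System.Environment.NewLine);
        }
    }

    public List<ScoreEntry> GetTopTen()
    {
        lock (FileLock)
        {
            if (!File.Exists(ScoresPath))
                return new List<ScoreEntry>();

            return File.ReadAllLines(ScoresPath)
                .Select(line => line.Split(','))
                .Select(parts => new ScoreEntry { Name = parts[0], Score = int.Parse(parts[1]) })
                .OrderByDescending(e => e.Score)
                .Take(10)
                .ToList();
        }
    }
}
```

Silverlight normally talks to a service like this over basicHttpBinding, and swapping the CSV for a database table sidesteps the locking concerns entirely.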
EDIT
Since the host is Linux, just create a REST service or some other service that Silverlight can post to in the same way. Silverlight can talk to pretty much any type of service, so use the same technique in your environment.
You could do it with a service as Brian suggested (although it sounds like you might not have Windows hosting, so you may not be able to use WCF for it), which is probably the best way - but if you wanted a simpler solution, you could also do it with just a postback to a particular page set up for the purpose.
Write a quick PHP page that looks for a name and score in the POST data and writes them to your MySQL database. Call it from your SL app with a web request. Then you just need another simple page to query the DB and list the results.
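On the Silverlight side, posting to that page is a small async call - something like this, with the URL and form field names as placeholders for whatever your PHP page expects:

```csharp
using System;
using System.Net;

public class ScoreSubmitter
{
    // Posts the name and score to a server-side page that writes them to the database.
    public void Submit(string playerName, int score)
    {
        var client = new WebClient();
        client.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";
        client.UploadStringCompleted += (sender, e) =>
        {
            if (e.Error != null)
            {
                // Handle/log the failure - all Silverlight network calls complete asynchronously.
            }
        };

        string body = "name=" + Uri.EscapeDataString(playerName) + "&score=" + score;
        client.UploadStringAsync(new Uri("http://www.example.com/savescore.php"), "POST", body);
    }
}
```

If the page lives on a different domain than the one hosting the XAP, the target server also needs a clientaccesspolicy.xml (or crossdomain.xml) before Silverlight will allow the call.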
