How to save RSS-feed to database - c#

Newbie here!
I'm trying to get data from different RSS-feeds and save it to my MS SQL 2008 database. So far I can successfully retrieve the data I want; see: Paging of RSS using System.ServiceModel.Syndication
My database currently has two tables: one for the separate RSS-feeds and another for their content. (Think of it as a TV series and its episodes.)
Since I want to work with the data from the RSS-feeds further, I need to save it all to my database, and also keep updating it as the RSS-feed(s) update.
My question is: how is this most effectively achieved, and how can I make it an automated process?
Since there will be a lot of RSS-feeds, I'm thinking the most efficient way is to take the date of the last update for each RSS-feed (stored in my database) and compare it to the RSS-feed itself, adding the new content and then setting the "last update" to the date of the RSS-feed's latest post?

So what I did was add all my RSS-feeds and their information to one table, then set up another table for all the feeds' content.
After this I built a script that runs every 6 hours, looping through each RSS-feed and then its content, checking whether new unique posts have been added to the RSS-feed; if so, these are added to the "content table" as well.
I think I over-complicated the problem, the solution was quite easy. :)
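
A minimal sketch of that loop body, combining the "last update" comparison from the question with the unique-post check from the solution. It is written in Python with an in-memory SQLite database standing in for MS SQL 2008; the table names (`feeds`, `content`), the column names, and the function `save_new_posts` are assumptions made for illustration, not the poster's actual schema or script.

```python
import sqlite3

def save_new_posts(conn, feed_id, posts):
    """Insert posts not yet stored for this feed; return how many were added.

    posts is a list of (guid, title, published) tuples, with published as an
    ISO-8601 string so that string comparison matches date order.
    """
    row = conn.execute(
        "SELECT last_update FROM feeds WHERE id = ?", (feed_id,)
    ).fetchone()
    last_update = row[0] or ""
    newest = last_update
    added = 0
    with conn:  # one transaction per feed
        for guid, title, published in posts:
            if published < last_update:
                continue  # older than the last stored post, already handled
            cur = conn.execute(
                "INSERT OR IGNORE INTO content (feed_id, guid, title, published) "
                "VALUES (?, ?, ?, ?)",
                (feed_id, guid, title, published),
            )
            added += cur.rowcount  # 0 when the UNIQUE constraint rejected a duplicate
            newest = max(newest, published)
        conn.execute(
            "UPDATE feeds SET last_update = ? WHERE id = ?", (newest, feed_id)
        )
    return added

# Hypothetical schema: one row per feed, one row per post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feeds (id INTEGER PRIMARY KEY, url TEXT, last_update TEXT);
    CREATE TABLE content (
        feed_id INTEGER, guid TEXT, title TEXT, published TEXT,
        UNIQUE (feed_id, guid)
    );
    INSERT INTO feeds (id, url, last_update)
    VALUES (1, 'http://example.com/rss', NULL);
""")
```

The date comparison only skips posts that are clearly old; the unique constraint on `(feed_id, guid)` is what actually prevents duplicates, so a post sharing a timestamp with the last stored one is still handled correctly.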

Related

Xml data file to Sql Database including data comparison and validation

Evening all,
Background: I have many XML data files which I need to import into an SQL database via my C# WPF-based Windows application. Whilst some data files are a simple and straightforward INSERT, many require validation and verification checks, and require GUIDs from existing records (from within the db) before the INSERT takes place in order to maintain certain relationships.
So I've broken the process in to three stages:-
1. Validate records. Many checks exist, e.g. 50,000 accounts in the XML file must each have a matching reference with an associated record already in the database. If there is no corresponding account, abandon the entire import process. Another would be that only the 'Current' record can be updated, so again, if it points to a 'Historic' record then it needs to crash and burn.
2. UPDATE the associated database records, e.g. set from 'Current' to 'Historic', as the record is to be superseded by what's to come next...
3. INSERT the records across multiple tables, e.g. 50,000 accounts to be inserted.
So, in summary, I need to validate records before any changes can be made to the database with a few validation checks. At which point I will change various tables for existing records before inserting the 'latest' records.
My first attempt to resolve this situation was to load the XML file into an XmlDocument or XDocument instance, iterate over every account in the XML file and perform an SQL command for every account: verify it exists, verify it's a current account, change the record before inserting it. Rinse and repeat for thousands of records. Not ideal, to say the least.
So my second attempt is to load the XML file into a data table, export the corresponding accounts from the database into another data table, and perform a nested-loop validation, e.g. does DT1.AccountID exist in DT2.AccountID, move DT2.GUID to DT1.GUID, etc. I appreciate this could also be a slow process. That said, I do then have the luxury of performing both the UPDATE and INSERT stored procedures with a table-valued parameter (TVP) and making use of the data table information.
I appreciate many will suggest letting the SQL do all of the work, but I'm lacking in that skillset unfortunately (happy to learn if that's the general consensus), and I would much rather do the work in C# code if at all possible.
Any views on this are greatly appreciated. Many of the questions I've found are around bulk INSERT, not so much about validating existing records, followed by updating records, followed by inserting records. I suppose my question is around the first part, the validation. Does extracting the data from the db into a data table to work on seem wrong, old-fashioned, pointless?
I'm sure I've missed out some vital piece of information, so apologies if unclear.
Cheers
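
One way the validation stage described above could avoid a nested loop is to index the exported database rows by account id first, so each XML record costs a single lookup. The sketch below is in Python for brevity (in C# a `Dictionary` keyed on the account id would play the same role); the field names `account_id`, `guid` and `status`, and the `ImportAborted` exception, are hypothetical, and it assumes the export holds one row per account id.

```python
class ImportAborted(Exception):
    """Raised when any record fails validation, so nothing is written."""

def validate_accounts(xml_accounts, db_accounts):
    """Return the XML records enriched with the database GUID.

    xml_accounts: list of dicts with an 'account_id' key (parsed from the file).
    db_accounts:  list of dicts with 'account_id', 'guid' and 'status' keys,
                  assumed to contain one row per account id.
    """
    # Build the index once; each lookup below is then constant time on average.
    by_id = {row["account_id"]: row for row in db_accounts}
    validated = []
    for record in xml_accounts:
        match = by_id.get(record["account_id"])
        if match is None:
            raise ImportAborted(f"no matching account for {record['account_id']}")
        if match["status"] != "Current":
            raise ImportAborted(f"account {record['account_id']} is not Current")
        # Copy the GUID across, as in "move DT2.GUID to DT1.GUID".
        validated.append({**record, "guid": match["guid"]})
    return validated
```

Because the function raises before returning anything, the UPDATE and INSERT stages only ever see a fully validated set, which matches the "abandon the entire import" requirement.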

Can a table be added to a replication Db which isn't part of the original Db?

I am trying to build a WinForms app that allows the user to fill out a form for our company while keeping it clean and standardized. As part of this project, I have created a replication from our production Db with only the tables for clients and client contacts. I need those tables available to the App so it can pull current client information for this form. I wanted to add another table to the replication that is not from the original Db that will take all the information added for the form. I guess I don't need it, but I was wondering if it was possible and, if so, will it break anything.
I can already hear some of you reading this and saying, "Why don't you just add another table to the production Db?" Well, I thought of that, but the app that is bound to that Db is very strict and runs a check every time it is launched to make sure the Db hasn't been corrupted. If it finds that a new table has been added, I'm sure it wouldn't work. I also hear some of you shouting at your monitors, "Why do you want this table as part of the replication? Why not just house it on a different Db that won't affect your precious app?" And to those people I would say, I just thought of that. But I am trying to make something lightweight that can be put on a lot of computers without much overhead or back end. Thank you in advance for considering this problem.

How would you solve this SQL writing problem in C#?

I'm using a data poller which retrieves data from an API every minute.
I take this data and write it to a table in my SQL database. The data from the API is like a feed: sometimes new items are added to the feed, and sometimes the feed is unchanged.
My problem is that I currently retrieve the feed from the API each minute and send it to the database, but what I want is to send the data table to the database only if there have actually been changes or new elements have been added since the last time I retrieved the feed from the API.
How would you solve this? The reason I want to solve it is that I don't want to overload the DB with write requests.
It depends on your data feed. The best option would be for your feed to offer the ability to fetch only the items added since some point in time (incremental imports). If this is not possible, you need some identifier for your items. You can then check whether an id is already present in your database. If it is, skip the entry. If not, send the whole entry to your database. If you don't have any sort of identifier either, you're most likely out of luck.
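
The identifier check in the answer above might look like the following sketch. It uses Python with an in-memory SQLite database as a stand-in for the real database; the table `feed_items`, its columns, and the function `write_new_items` are assumptions for illustration only.

```python
import sqlite3

def write_new_items(conn, items):
    """Write only items whose id is not stored yet; return the number written.

    items is a list of (item_id, payload) tuples from one poll of the API.
    """
    if not items:
        return 0
    existing = {row[0] for row in conn.execute("SELECT item_id FROM feed_items")}
    # A dict also collapses ids that appear twice within the same poll.
    new_items = {item_id: payload for item_id, payload in items
                 if item_id not in existing}
    if new_items:  # skip the write entirely when nothing is new
        with conn:
            conn.executemany(
                "INSERT INTO feed_items (item_id, payload) VALUES (?, ?)",
                list(new_items.items()),
            )
    return len(new_items)

# Hypothetical table holding what has already been stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feed_items (item_id TEXT PRIMARY KEY, payload TEXT)")
```

Reading every stored id on each poll is only reasonable for small tables; for a larger table the lookup could be limited to the ids present in the current poll.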

Best way to incorporate legacy data

I am working on a price list management program for my business in C# (the prototype is in WinForms, but I am thinking of using WPF for the final app as an MVVM learning exercise).
Our EMS system is based on a COBOL back end and will remain that way for at least 3 years, so I cannot really access its data directly. I want to pull data from the EMS system periodically to ensure that pricing remains in sync (and to provide some other information to users in a non-editable manner, such as bin locations). What I am looking at doing is...
Use WinBatch to automatically run a report nightly, then use Monarch to convert the text report to a flat file (.xls?)
Drop the file into a folder and write a small app to read it in and add it to the database
How should I add this to the database (SQL Express)? I could have a table that is just replaced completely each time, but I am a beginner at most of this and I am concerned about what would happen if an entire table was replaced while the database was being used by the price list app.
Mike
If you truncate and refill a whole table you should do it in one single transaction and place a full table lock. This is more secure and faster.
You also could update all changed rows, then insert new (missing rows) and then delete all rows which weren't updated in this run (insert some kind of version number in each row to determine this).
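
A rough sketch of the version-number approach from the answer above, run as a single transaction so that readers never see a half-updated table. It is written in Python against an in-memory SQLite database rather than SQL Express; the `prices` table, its columns, and `sync_prices` are hypothetical names. Note that it stamps every row present in the extract with the run's version, including rows whose price did not change, since otherwise the final delete would remove them.

```python
import sqlite3

def sync_prices(conn, rows, run_version):
    """Apply one nightly extract: upsert every row, then delete rows not seen.

    rows is a list of (part_no, price) tuples; run_version is any value that
    differs from the previous run's, such as an incrementing integer.
    """
    with conn:  # commits on success, rolls back on any exception
        for part_no, price in rows:
            updated = conn.execute(
                "UPDATE prices SET price = ?, version = ? WHERE part_no = ?",
                (price, run_version, part_no),
            ).rowcount
            if updated == 0:  # not there yet, so it is a new row
                conn.execute(
                    "INSERT INTO prices (part_no, price, version) VALUES (?, ?, ?)",
                    (part_no, price, run_version),
                )
        # Anything still carrying an older version was absent from this extract.
        conn.execute("DELETE FROM prices WHERE version <> ?", (run_version,))

# Hypothetical price table with a version column for the stamp.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (part_no TEXT PRIMARY KEY, price REAL, version INTEGER)"
)
```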
First create a .txt file from the legacy application. Then use a batch insert to pull it into a work table for whatever clean-up you need to do. Do the clean-up using T-SQL. Then run T-SQL to insert new data into the proper tables and/or to update rows where data has changed. If there are too many records, do the inserting and updating in batches. Schedule all this as a job to run during hours when the database is not busy.
You can of course do all of this best in SSIS but I don't know if that is available with Express.
Are there any fields/tables available to tell you when the price was last updated? If so, you can just pull the recently updated rows and update those in your database, assuming you have a readily available unique primary key in your COBOL app's datastore.
This wouldn't be fully up to date, though, because you're running it as a nightly script to update the database used by the new app. You could maybe create a .NET script to query the COBOL datastore specifically for whatever price the user is looking for and, if the COBOL datastore's update time is more recent than what you have logged, update the SQL Server record(s).
(I'm not familiar with COBOL at all, just throwing ideas out there.)

Best way to track changes and make changes from Mysql -> MSSQL

So I need to track changes that happen on a MySQL table. I was thinking of using triggers to log all the changes made to it and save these changes in another table. Then I will have a cron script get all these changes and propagate them into the MSSQL database.
I really don't expect a lot of information to be propagated, but the data is very time-sensitive. Ideally the MSSQL database will see these changes within a minute, but I know that this requirement may be too strict.
I was wondering if anyone had a better solution.
I have the bulk of the site written in .NET but use vBulletin for the forums (sorry, but there are no .NET forums as powerful or feature-rich as vBulletin).
The majority of replicator tools use this technique: insert/update/delete triggers fill another table with the table name and the PK or a unique key.
Then a reader reads this table, does the proper "select" for inserts and updates to get the data, and updates the other database.
HTH
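
As an illustration of the trigger-plus-reader technique described above, here is a small sketch in Python using two in-memory SQLite databases in place of MySQL and MSSQL. The trigger syntax shown is SQLite's and would need adapting for MySQL; the `users` and `change_log` tables and the `propagate` function are invented for the example.

```python
import sqlite3

SOURCE_SCHEMA = """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE change_log (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        table_name TEXT, pk INTEGER, op TEXT
    );
    CREATE TRIGGER users_ins AFTER INSERT ON users BEGIN
        INSERT INTO change_log (table_name, pk, op) VALUES ('users', NEW.id, 'I');
    END;
    CREATE TRIGGER users_upd AFTER UPDATE ON users BEGIN
        INSERT INTO change_log (table_name, pk, op) VALUES ('users', NEW.id, 'U');
    END;
    CREATE TRIGGER users_del AFTER DELETE ON users BEGIN
        INSERT INTO change_log (table_name, pk, op) VALUES ('users', OLD.id, 'D');
    END;
"""

def propagate(source, target):
    """Replay logged changes on the target, then clear the processed log rows."""
    changes = source.execute(
        "SELECT seq, pk, op FROM change_log ORDER BY seq"
    ).fetchall()
    with target:
        for seq, pk, op in changes:
            if op == "D":
                target.execute("DELETE FROM users WHERE id = ?", (pk,))
                continue
            # The log stores only the key, so fetch the current row data.
            row = source.execute(
                "SELECT id, name FROM users WHERE id = ?", (pk,)
            ).fetchone()
            if row is not None:  # the row may have been deleted by a later change
                target.execute(
                    "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)", row
                )
    if changes:
        with source:
            source.execute(
                "DELETE FROM change_log WHERE seq <= ?", (changes[-1][0],)
            )
    return len(changes)

source = sqlite3.connect(":memory:")  # stands in for the MySQL side
source.executescript(SOURCE_SCHEMA)
target = sqlite3.connect(":memory:")  # stands in for the MSSQL side
target.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
```

Running `propagate` from a scheduled job once a minute would roughly match the latency asked for in the question; only log rows up to the last one read are cleared, so changes logged while the reader is working are picked up on the next run.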
