How would you solve this SQL writing problem in C#?
I'm using a data poller which retrieves data from an API every minute.
What I'm doing is taking this data and writing it to a table in my SQL database. The data from the API is like a feed: sometimes new items are added to the feed, and sometimes the feed is unchanged.
My problem is that I currently retrieve the feed from the API each minute and send it to the database regardless. What I want is to only send the data table to the database if something has actually changed or new elements have been added since the last time I retrieved the feed.
How would you solve this? The reason I want to solve it is that I don't want to overload the DB with unnecessary requests.
It depends on your data feed. The best option would be if your feed offers the ability to fetch only new items since some point in time (incremental imports). If this is not possible, you should have some identifier for your items. You could then check whether an id is already present in your database: if it is, skip the entry; if not, send the whole entry to your database. If you don't have any sort of identifier either, you're most likely out of luck.
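For illustration, a minimal sketch of that identifier check in C#, assuming a SQL Server table and column names (FeedItems, ExternalId, Payload) that are stand-ins, not the poster's actual schema:

```csharp
using System.Data.SqlClient; // Microsoft.Data.SqlClient on newer stacks

static class FeedWriter
{
    // conn must be an open SqlConnection. The table and column names
    // (FeedItems, ExternalId, Payload) are assumptions, not a real schema.
    public static void WriteIfNew(SqlConnection conn, string externalId, string payload)
    {
        using (var check = new SqlCommand(
            "SELECT COUNT(1) FROM FeedItems WHERE ExternalId = @id", conn))
        {
            check.Parameters.AddWithValue("@id", externalId);
            if ((int)check.ExecuteScalar() > 0)
                return; // item already stored - skip it
        }

        using (var insert = new SqlCommand(
            "INSERT INTO FeedItems (ExternalId, Payload) VALUES (@id, @payload)", conn))
        {
            insert.Parameters.AddWithValue("@id", externalId);
            insert.Parameters.AddWithValue("@payload", payload);
            insert.ExecuteNonQuery();
        }
    }
}
```

Calling WriteIfNew for each feed item on every poll means unchanged feeds produce only cheap SELECTs and no INSERT traffic.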
Related
Evening all,
Background: I have many XML data files which I need to import into an SQL database via my C# WPF-based Windows application. While some data files are a simple and straightforward INSERT, many require validation and verification checks, and require GUIDs from existing records (from within the db) before the INSERT takes place in order to maintain certain relationships.
So I've broken the process into three stages:
1. Validate records. Many checks exist, e.g. the 50,000 accounts in the XML file must each have a matching reference to an associated record already in the database. If there is no corresponding account, abandon the entire import process. Another would be: only the 'Current' record can be updated, so again, if it points to a 'Historic' record then it needs to crash and burn.
2. UPDATE the associated database records, e.g. set 'Current' to 'Historic' as the record is to be superseded by what's to come next...
3. INSERT the records across multiple tables, e.g. 50,000 accounts to be inserted.
So, in summary, I need to validate records before any changes can be made to the database with a few validation checks. At which point I will change various tables for existing records before inserting the 'latest' records.
My first attempt to resolve this situation was to load the XML file into an XmlDocument or XDocument instance, iterate over every account in the XML file and perform an SQL command for every account: verify it exists, verify it's a current account, change the record before inserting it. Rinse and repeat for thousands of records - not ideal, to say the least.
So my second attempt is to load the XML file into a data table, export the corresponding accounts from the database into another data table, and perform a nested-loop validation, e.g. does DT1.AccountID exist in DT2.AccountID, move DT2.GUID to DT1.GUID, etc. I appreciate this could also be a slow process. That said, I then have the luxury of performing both the UPDATE and INSERT stored procedures with a table-valued parameter (TVP) and making use of the data table information.
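For illustration, a minimal sketch of that validation stage, with one tweak: indexing the database-side DataTable by AccountID in a Dictionary replaces the nested loop with a single lookup per XML row. The column names (AccountID, Status, GUID) are assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class ImportValidator
{
    // xmlAccounts = DT1 (from the XML file), dbAccounts = DT2 (from the db).
    public static void ValidateAndLink(DataTable xmlAccounts, DataTable dbAccounts)
    {
        // Index the database rows once so each XML row costs one lookup
        // instead of a scan over all of dbAccounts.
        var byAccountId = new Dictionary<string, DataRow>();
        foreach (DataRow dbRow in dbAccounts.Rows)
            byAccountId[(string)dbRow["AccountID"]] = dbRow;

        foreach (DataRow xmlRow in xmlAccounts.Rows)
        {
            if (!byAccountId.TryGetValue((string)xmlRow["AccountID"], out DataRow match))
                throw new InvalidOperationException("No matching account - abandon import.");

            if ((string)match["Status"] != "Current")
                throw new InvalidOperationException("Record is not Current - abandon import.");

            xmlRow["GUID"] = match["GUID"]; // carry the existing GUID across
        }
    }
}
```

After this pass, the XML-side table carries the GUIDs it needs and can be handed to the UPDATE and INSERT stored procedures as the TVP.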
I appreciate many will suggest letting the SQL do all of the work, but I'm lacking in that skillset unfortunately (happy to learn if that's the general consensus); I would much rather do the work in C# code if at all possible.
Any views on this are greatly appreciated. Many of the questions I've found are about bulk INSERT, not so much about validating existing records, followed by updating records, followed by inserting records. I suppose my question is about the first part, the validation. Does extracting the data from the db into a data table to work on seem wrong, old-fashioned, pointless?
I'm sure I've missed out some vital piece of information, so apologies if unclear.
Cheers
Case
I have an app where I am downloading information about some products from a server and storing it in an SQLite database.
Every minute I re-download all of the information in case anything has been modified, deleted or added, although I know this is not efficient.
Goal
What I need is some way of getting only the data that has been modified, if anything actually has changed.
How can I achieve this?
There are many possible solutions to this problem. No one here will be able to give a conclusive answer without knowing more about your system. But the following is a possible approach that might work for you.
The client side needs to cache/save the timestamp of when the API was last called.
The server needs to change the existing API to accept that timestamp. If any changes have been made to the data since the timestamp, the server returns those changes; otherwise it returns no data. A sketch of the client side follows below.
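A minimal sketch of the client side, assuming a hypothetical endpoint that accepts a since query parameter (the server-side change described above would have to exist):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class DeltaFetcher
{
    private static readonly HttpClient Http = new HttpClient();

    // Persist this (e.g. in the SQLite db) so it survives restarts;
    // MinValue forces a full download on the first call.
    private DateTimeOffset _lastFetch = DateTimeOffset.MinValue;

    // The URL and its "since" parameter are assumptions, not a real API.
    public async Task<string> FetchChangesAsync()
    {
        string url = "https://example.com/api/products?since="
                   + Uri.EscapeDataString(_lastFetch.ToString("o"));
        string body = await Http.GetStringAsync(url);
        _lastFetch = DateTimeOffset.UtcNow;
        return body; // empty when nothing has changed
    }
}
```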
Another approach would be a full data sync, which would be more efficient but involves more complexity and more work.
Newbie here!
I'm trying to get data from different RSS feeds and save it to my MS SQL 2008 database. As of now I can successfully retrieve the data I want, see: Paging of RSS using System.ServiceModel.Syndication
My database right now has two tables: one for the separate RSS feeds, and another table for their content. (Look at it as a TV series and its episodes.)
Since I want to work further with the data from the RSS feeds, I need to save it all to my database, but also keep updating it as the RSS feed(s) update.
My question is how this is most effectively achieved, and how I can make this an automated process?
Since there will be a lot of RSS feeds, I'm thinking the most efficient way is perhaps to take each feed's date of last update (stored in my database) and compare it to the RSS feed itself, adding the new content and after that updating the stored "last update" to that of the RSS feed's latest post?
So what I did was add all my RSS feeds and their information into one table, then set up another table for all the feeds' content.
After this I built a script that runs every 6 hours, looping through each RSS feed and then its content, checking whether new unique posts have been added to the RSS; if so, these are added to the "content table" as well.
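For illustration, a minimal sketch of one such pass for a single feed using System.ServiceModel.Syndication; SaveItem and SetLastUpdate are hypothetical helpers standing in for the writes to the content and feed tables:

```csharp
using System;
using System.Linq;
using System.ServiceModel.Syndication;
using System.Xml;

static class FeedPoller
{
    // lastUpdate comes from the feeds table; SaveItem and SetLastUpdate are
    // hypothetical helpers that write to the content and feeds tables.
    public static void PollFeed(string feedUrl, DateTimeOffset lastUpdate)
    {
        using (XmlReader reader = XmlReader.Create(feedUrl))
        {
            SyndicationFeed feed = SyndicationFeed.Load(reader);

            foreach (SyndicationItem item in feed.Items
                         .Where(i => i.PublishDate > lastUpdate)
                         .OrderBy(i => i.PublishDate))
            {
                SaveItem(feedUrl, item.Id, item.Title.Text, item.PublishDate);
            }

            if (feed.Items.Any())
                SetLastUpdate(feedUrl, feed.Items.Max(i => i.PublishDate));
        }
    }

    static void SaveItem(string feedUrl, string id, string title, DateTimeOffset published)
    { /* INSERT into the content table (hypothetical) */ }

    static void SetLastUpdate(string feedUrl, DateTimeOffset latest)
    { /* UPDATE the feed row's "last update" column (hypothetical) */ }
}
```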
I think I over-complicated the problem, the solution was quite easy. :)
I have a grid/table that I am populating with data coming from a scanner (USB). I am using SignalR to transfer data from the API (scanning) to the client. I also need to trigger a Print API when the table/grid reaches a predefined number of rows. The problem here is how I can manage that predefined value through SignalR.
Since it is a stateless process, I can't use a hidden variable (I haven't tried, but I think I can't). Also I can't go back to the DB again and again (performance issues).
So my question is: how can I check, with every scan, whether the number of rows has been reached or not?
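One possible approach, sketched below under the assumption of ASP.NET Core SignalR and a single server instance, is to keep the running count in the hub's server-side memory instead of a hidden variable or the database. The method name, event names and threshold are all stand-ins:

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR;

public class ScanHub : Hub
{
    // Shared across all connections; only valid while the app runs
    // on a single server instance.
    private static int _rowCount;
    private const int PrintThreshold = 50; // stand-in for the predefined value

    public async Task ScanReceived(string barcode)
    {
        int count = Interlocked.Increment(ref _rowCount);
        await Clients.All.SendAsync("RowAdded", barcode, count);

        if (count >= PrintThreshold)
        {
            Interlocked.Exchange(ref _rowCount, 0); // reset for the next batch
            await Clients.All.SendAsync("TriggerPrint");
        }
    }
}
```

With multiple servers, the count would need a shared store or backplane (e.g. Redis), since static state lives per process.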
I am working on an e-commerce website and there is an issue we are trying to solve.
After a customer completes an order, she receives three identical emails instead of one.
The website is running on three servers and we think that's the problem, because using only one server results in one email being delivered to the customer.
I would like to know what we should do so the user receives only one email instead of three while we still run the website on three servers.
Thanks in advance, Laziale
You cannot count on locking hints in the database for this. A hint is just a hint; there's no guarantee that the locking will happen as you expect (assuming this is SQL Server). In general, a relational database is just that, a database. A table is not a queuing mechanism and you will always have problems if you try to use it that way.
Nonetheless, in order to implement a different solution, we have to determine whether a single record is being added to the "queue" or three records are being added. If it is the first, and only a single record is added but three emails are sent out, then the solution is simple: instead of using a database table as your queue, use Microsoft Message Queuing (MSMQ). It is part of Windows Server and has been since at least 2003, maybe even all the way back to 2000. It provides an actual queue specifically designed for what you're trying to accomplish.
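A minimal sketch of that approach with System.Messaging (classic .NET Framework); the queue path and message body are assumptions:

```csharp
using System.Messaging; // reference System.Messaging.dll (.NET Framework)

class EmailQueueDemo
{
    const string QueuePath = @".\Private$\OrderEmails"; // hypothetical private queue

    // Producer: called once per completed order, on whichever server took it.
    static void EnqueueConfirmation(string orderId)
    {
        if (!MessageQueue.Exists(QueuePath))
            MessageQueue.Create(QueuePath);

        using (var queue = new MessageQueue(QueuePath))
            queue.Send(orderId, "Order confirmation"); // body, label
    }

    // Consumer: a single worker (e.g. a Windows service). Each message is
    // received exactly once, so only one email goes out per order.
    static void ProcessNext()
    {
        using (var queue = new MessageQueue(QueuePath))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            Message msg = queue.Receive(); // blocks until a message arrives
            string orderId = (string)msg.Body;
            // send the single confirmation email for orderId here
        }
    }
}
```

Because Receive hands each message to exactly one consumer, running the email worker against the queue guarantees one email per enqueued order.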
If there are three actual records being added to the "queue" table in the database, that means there is a code problem. Even with three Web servers behind the load balancer, the fact remains that a single order submission only happens on one of those servers. The business logic that places the email notification in the queue could not come from more than one server because the request only originates from one server.
I would check the table first and determine if there are multiple records being added. If not, change the implementation to use MSMQ. If so, check your code to see why more than one record is being added in the request.