All,
I have a test program that will serialize test subjects for some research sessions. I'll be running the program at different times, so I need this data to persist. It will be a simple ID number in 001 format (two leading zeros until 10, one leading zero until 100) and will max out at 999. How would I accomplish this in C#? Ideally, it starts up, reads the persistent data, then starts registering new test subjects at the latest number. This number will then be used as a primary key to recognize the test subjects in a database. I've never done anything remotely like this before, so I'm clueless as to what I should do.
EDIT:
I probably should have clarified... there are multiple databases. One is a local SQLite file that holds the test subject's trial data (the specific data from each test). The other is a much larger MySQL database that holds more general information (things about the test subject relevant to the study). The MySQL database is remote and data from the application is not directly submitted to it... that's handled by another application that takes the SQLite file and submits that data to the MySQL database. The test environment is variable and may not have a connection to the MySQL database. As such, it's not a viable candidate for holding such data as I need the ID numbers each time I start the program, regardless of the connection state to the MySQL database. The SQLite files are written after program execution from a text file (csv) and need to contain the ID number to be used as a primary key, so the SQLite database might not be the best candidate for storing the persistent data. Sorry I didn't explain this earlier... it's still early in the day :P
If these numbers are used in a database as the index, why not check the database for the next number? If 5 subjects have been registered already, next time just check the database, get the max for the index and add 1 for the next subject. Once you insert that subject, you can add 1 again.
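A minimal sketch of that "get the max and add 1" idea in C# - assuming the registered subjects end up in the local SQLite file, with a Subjects table and an integer SubjectId column (both names are placeholders for your real schema), using the System.Data.SQLite package:

    using System;
    using System.Data.SQLite; // System.Data.SQLite NuGet package

    static class SubjectIds
    {
        // Reads the highest ID registered so far and returns the next one,
        // formatted with leading zeros (1 -> "001", 42 -> "042", 100 -> "100").
        public static string GetNextId(string dbPath)
        {
            using (var conn = new SQLiteConnection("Data Source=" + dbPath))
            using (var cmd = new SQLiteCommand("SELECT IFNULL(MAX(SubjectId), 0) FROM Subjects", conn))
            {
                conn.Open();
                int next = Convert.ToInt32(cmd.ExecuteScalar()) + 1;
                if (next > 999)
                    throw new InvalidOperationException("Subject ID range (001-999) exhausted.");
                return next.ToString("D3"); // "D3" gives the 001 format described above
            }
        }
    }

If the SQLite file genuinely isn't available when the program starts, the same get-max-plus-one idea works against a tiny one-line text file that the program rewrites after each registration.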
Evening all,
Background: I have many xml data files which I need to import into a SQL database via my C# WPF-based Windows application. Whilst some data files are a simple and straightforward INSERT, many require validation and verification checks, and require GUIDs from existing records (from within the db) before the INSERT takes place, in order to maintain certain relationships.
So I've broken the process into three stages:
Validate records. Many checks exist, e.g. 50,000 accounts in the xml file must each have a matching reference to an associated record already in the database. If there is no corresponding account, abandon the entire import process. Another would be that only the 'Current' record can be updated, so again, if an account points to a 'Historic' record then the import needs to crash and burn.
UPDATE the associated database records, e.g. set from 'Current' to 'Historic', as the record is to be superseded by what's to come next...
INSERT the records across multiple tables. e.g. 50,000 accounts to be inserted.
So, in summary, I need to run a few validation checks on the records before any changes can be made to the database. At that point I will change various tables for existing records before inserting the 'latest' records.
My first attempt to resolve this was to load the xml file into an XmlDocument or XDocument instance, iterate over every account in the xml file and perform an SQL command for every account: verify it exists, verify it's a current account, change the record before inserting it. Rinse and repeat for thousands of records - not ideal, to say the least.
So my second attempt is to load the xml file into a DataTable, export the corresponding accounts from the database into another DataTable, and perform a nested-loop validation, e.g. does DT1.AccountID exist in DT2.AccountID, copy DT2.GUID to DT1.GUID, etc. I appreciate this could also be a slow process. That said, I do then have the luxury of performing both the UPDATE and INSERT stored procedures with a table-valued parameter (TVP) and making use of the DataTable information.
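For what it's worth, a rough sketch of that validation step - the column names (AccountRef, Guid, Status) are placeholders for your real schema, dt1 holds the xml accounts and dt2 the accounts exported from the database. Indexing the database rows in a Dictionary first avoids the nested loop:

    using System.Collections.Generic;
    using System.Data;

    static class AccountValidator
    {
        // Returns false if any xml account fails validation, so the caller can
        // abandon the whole import before touching the database.
        public static bool ValidateAndCopyGuids(DataTable dt1, DataTable dt2)
        {
            // Index the exported database accounts once, keyed on the reference.
            var dbAccounts = new Dictionary<string, DataRow>();
            foreach (DataRow dbRow in dt2.Rows)
                dbAccounts[(string)dbRow["AccountRef"]] = dbRow;

            foreach (DataRow xmlRow in dt1.Rows)
            {
                DataRow match;
                if (!dbAccounts.TryGetValue((string)xmlRow["AccountRef"], out match))
                    return false; // no corresponding account: abandon the import

                if ((string)match["Status"] != "Current")
                    return false; // points at a historic record: crash and burn

                xmlRow["Guid"] = match["Guid"]; // carry the existing GUID over for the TVP stage
            }
            return true;
        }
    }

With dt1 validated and enriched like this, it can then be passed straight to the UPDATE and INSERT stored procedures as the TVP.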
I appreciate many will suggest letting SQL do all of the work, but I'm unfortunately lacking in that skill set (happy to learn if that's the general consensus); I would much rather do the work in C# code if at all possible.
Any views on this are greatly appreciated. Many of the questions I've found are around bulk INSERT, not so much about validating existing records, followed by updating records, followed by inserting records. I suppose my question is around the first part, the validation. Does extracting the data from the db into a DataTable to work on seem wrong, old-fashioned, pointless?
I'm sure I've missed out some vital piece of information, so apologies if anything is unclear.
Cheers
I've got a database which will be accessed by multiple users.
For example, User 1 retrieves a list of all available datasets in the table "Test". User 2 does the same thing, and afterwards both users have the same datasets.
So now, if User 1 wants to write ABC to the dataset with Index 1, he can do so and the change is persisted.
But if User 2 NOW wants to write ABB to the dataset with Index 1, how can he know that the dataset has already been updated?
Is there a pattern for multi-user database access, or can I just use hashing algorithms to detect whether a dataset has been updated?
Or are there any other approaches?
There are a number of possible answers here and a lot depends on your application architecture.
Are two users going to be working on the same row at the same time? If so, and they have different ideas about what data should be there, this sounds like a business problem that needs to be considered and resolved.
That said, if you have a front end that is receiving data and then trying to write back, using either a timestamp or a checksum as part of validation is a common and useful way of handling this situation.
If I were implementing this I would use a stored procedure and force my application to pass the checksum back to the proc. The procedure would check to see if the checksum is still accurate and the write would fail if it wasn't.
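As a hedged illustration of that idea - shown here as an inline statement rather than a proc, assuming SQL Server and a hypothetical Test table carrying a rowversion column RowVer - the application sends back the row version it originally read, and the UPDATE only succeeds if nobody changed the row in the meantime:

    using System.Data;
    using System.Data.SqlClient;

    static class DatasetWriter
    {
        // Returns false if the row was modified by someone else since it was read;
        // the caller should then re-read the row and let the user decide what to do.
        public static bool TryUpdate(string connStr, int id, string newValue, byte[] originalRowVer)
        {
            const string sql =
                "UPDATE Test SET Value = @value " +
                "WHERE Id = @id AND RowVer = @rowVer;"; // 0 rows affected => concurrent update

            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand(sql, conn))
            {
                cmd.Parameters.AddWithValue("@value", newValue);
                cmd.Parameters.AddWithValue("@id", id);
                cmd.Parameters.Add("@rowVer", SqlDbType.Timestamp).Value = originalRowVer;

                conn.Open();
                return cmd.ExecuteNonQuery() == 1;
            }
        }
    }

The same check can just as well live inside a stored procedure, as suggested above, with the checksum or row version passed in as a parameter.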
We are using MySQL to get data from the database, match the data, and send the matched data back to the user. The MySQL database contains 10 tables; 9 of them hold relatively little data, which needs to be matched against the 10th table, which has 25 million records and is still growing. I need to create a C# application to match the data and send it to the user. Every minute, new data is added to the other 9 tables and the old data is deleted after it has been compared. I currently load all 10 tables into C# memory, but it sometimes runs out of memory. I'm thinking of dividing the C# application into 5-6 parts to handle the data and then do the rest of the logic, but I need some good suggestions on how to start.
Thanks
APS
I think you are approaching your problem incorrectly. From your post, it sounds like you are trying to load massive quantities of highly volatile data into memory. By doing that, you are entirely defeating the point of having a database server like MySql. Don't preload all of the data into memory...let your users query the data they need from the database via your C# application. That is exactly what database servers are for, and they are going to do a considerably better job at providing optimized, performant access to data than you can do yourself.
You should probably think about your algorithms and decide if there is any way to split the problem into smaller chunks, for example to work on small partitions of the data at a time.
32-bit .NET processes have a memory limit of 2 GB. Perhaps you are hitting this limit, hence the out-of-memory errors? If so, two things you could do are:
Have multiple processes running, each dealing with a subset of the data
Move to a 64-bit OS and recompile your code as a 64-bit executable
Please do not say you have a lot of data. 25 million rows is not exactly a lot by today's standards.
Where does C# enter here? This looks 100% like something (from your explanation) that should be done totally on the server side with SQL.
I don't use MySQL, but I would suggest using a stored procedure to sort through the data first. It depends on how complex or CPU-expensive your computation is and how big the dataset is that you're going to send over your network, but normally I'd try to let the server handle it. That way you don't end up sending all your data over the network. Plus you avoid trouble when your data model changes: you don't have to recompile and distribute your C# app, you change one stored procedure and you're done.
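On the C# side that might look like the sketch below - match_records is a purely hypothetical stand-in for whatever matching logic ends up in the stored procedure; the point is that only the matched rows cross the network and nothing is preloaded into application memory (MySQL Connector/NET shown):

    using System;
    using System.Data;
    using MySql.Data.MySqlClient; // MySQL Connector/NET

    static class Matcher
    {
        public static void StreamMatchedRows(string connStr)
        {
            using (var conn = new MySqlConnection(connStr))
            using (var cmd = new MySqlCommand("match_records", conn)) // hypothetical proc name
            {
                cmd.CommandType = CommandType.StoredProcedure;

                conn.Open();
                using (MySqlDataReader reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // Process one matched row at a time; only the current row
                        // is ever held in memory.
                        Console.WriteLine(reader.GetInt64(0));
                    }
                }
            }
        }
    }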
I am currently working on a project to parse an Excel sheet and insert any values into a database that were not inserted previously. The sheet contains roughly 80 date-value pairs for different names, with an average of about 1500 rows per pair.
Each name has 5 date-value pairs entered manually at the end of the week. Over the weekend, my process will parse the excel file and insert any values that are not currently in the database.
My question is, given the large amount of total data and the small amount added each week, how would you easily determine which values need to be inserted? I have considered adding another table to store the last date inserted for each name and taking any rows after that.
Simplest solution: I would bring it all into a staging table and do the compare on the server. Alternatively, SSIS with an appropriate sort and lookup could determine the differences and insert them.
120,000 rows is not a significant amount to compare in the database using SQL, but 120,000 individual calls to the database to verify whether each row is already there might take a while from the client side.
Option 1 would be to create a "lastdate" table that is automatically stamped at the end of your weekend import. Then the next week your program could query the last record in that table, and only read rows from the Excel file after that date. Probably your best bet.
Option 2 would be to find a unique field in the data, and row by row check whether that key exists in the database. If it doesn't exist, you add it; if it does, you skip it. This would be my second choice if Option 1 didn't work the way you expect.
It all depends on how bulletproof your solution needs to be. If you trust the users that the spreadsheet will not be tweaked in any way that would make it inconsistent, then your solution would be fine.
If you want to be on the safe side (e.g. if some old values could potentially change), you would need to compare the whole thing with the database. To be honest, the amount of data you are talking about here doesn't seem very big, especially when your process will run on a weekend. And you can still optimize by writing "batch"-type stored procs for the database.
Thanks for the answers all.
I have decided that, rather than creating a new table that stores the last date, I will just select the max date for each name, then insert values after that date into the table.
This assumes that the data prior to the last date remains consistent, which should be fine for this problem.
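A rough sketch of that plan - assuming a hypothetical Readings(Name, ReadingDate, Value) table in SQL Server and rows already parsed out of the spreadsheet; all names here are placeholders:

    using System;
    using System.Collections.Generic;
    using System.Data.SqlClient;

    class ExcelRow { public string Name; public DateTime Date; public double Value; }

    static class WeeklyImport
    {
        public static void InsertNewRows(string connStr, IEnumerable<ExcelRow> excelRows)
        {
            using (var conn = new SqlConnection(connStr))
            {
                conn.Open();

                // 1. Latest stored date per name.
                var lastDates = new Dictionary<string, DateTime>();
                using (var cmd = new SqlCommand(
                    "SELECT Name, MAX(ReadingDate) FROM Readings GROUP BY Name", conn))
                using (var reader = cmd.ExecuteReader())
                    while (reader.Read())
                        lastDates[reader.GetString(0)] = reader.GetDateTime(1);

                // 2. Insert only spreadsheet rows newer than what is already stored.
                foreach (var row in excelRows)
                {
                    DateTime last;
                    if (lastDates.TryGetValue(row.Name, out last) && row.Date <= last)
                        continue; // already in the database

                    using (var insert = new SqlCommand(
                        "INSERT INTO Readings (Name, ReadingDate, Value) VALUES (@n, @d, @v)", conn))
                    {
                        insert.Parameters.AddWithValue("@n", row.Name);
                        insert.Parameters.AddWithValue("@d", row.Date);
                        insert.Parameters.AddWithValue("@v", row.Value);
                        insert.ExecuteNonQuery();
                    }
                }
            }
        }
    }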
I am working on a price list management program for my business in C# (the prototype is in WinForms, but I am thinking of using WPF for the final app as an MVVM learning exercise).
Our EMS system is based on a COBOL back end and will remain that way for at least 3 years, so I cannot really access its data directly. I want to pull data from the EMS system periodically to ensure that pricing remains in sync (and to provide some other information to users in a non-editable manner, such as bin locations). What I am looking at doing is...
Use WinBatch to automatically run a report nightly, then use Monarch to convert the text report to a flat file (.xls?)
Drop the file into a folder and write a small app to read it in and add it to the database
How should I add this to the database (SQL Express)? I could have a table that is just replaced completely each time, but I am a beginner at most of this and I am concerned about what would happen if an entire table was replaced while the database was being used by the price list app.
Mike
If you truncate and refill a whole table, you should do it in a single transaction and take a full table lock. This is both safer and faster.
You could also update all changed rows, then insert new (missing) rows, and then delete all rows that weren't updated in this run (store some kind of version number in each row to determine this).
First create a .txt file from the legacy application. Then use a bulk insert to pull it into a work table for whatever clean-up you need to make. Do the clean-up using T-SQL. Then run T-SQL to insert new data into the proper tables and/or to update rows where data has changed. If there are too many records, do the inserting and updating in batches. Schedule all this as a job to run during hours when the database is not busy.
You can of course do all of this best in SSIS but I don't know if that is available with Express.
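If SSIS isn't an option, here is a hedged C# sketch of the work-table approach described above - PriceStaging, PriceList and their columns are placeholder names, and the parsed report is assumed to already be in a DataTable:

    using System.Data;
    using System.Data.SqlClient;

    static class PriceListImport
    {
        public static void Run(string connStr, DataTable parsedRows)
        {
            using (var conn = new SqlConnection(connStr))
            {
                conn.Open();

                // 1. Clear the work table and bulk-load the parsed report into it.
                using (var clear = new SqlCommand("TRUNCATE TABLE PriceStaging;", conn))
                    clear.ExecuteNonQuery();
                using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "PriceStaging" })
                    bulk.WriteToServer(parsedRows);

                // 2. Update changed prices and add new items in a single transaction,
                //    so the price list app never sees a half-replaced table.
                using (var tx = conn.BeginTransaction())
                using (var cmd = new SqlCommand(@"
                    UPDATE p SET p.Price = s.Price
                      FROM PriceList p JOIN PriceStaging s ON p.ItemCode = s.ItemCode;

                    INSERT INTO PriceList (ItemCode, Price)
                    SELECT s.ItemCode, s.Price
                      FROM PriceStaging s
                     WHERE NOT EXISTS (SELECT 1 FROM PriceList p WHERE p.ItemCode = s.ItemCode);",
                    conn, tx))
                {
                    cmd.ExecuteNonQuery();
                    tx.Commit();
                }
            }
        }
    }

Updating in place like this, rather than dropping and replacing the table, also sidesteps the concern about readers hitting the table mid-refresh.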
Are there any fields/tables available to tell you when the price was last updated? If so, you can just pull the recently updated rows and update them in your database... assuming you have a readily available unique primary key in your COBOL app's datastore.
This wouldn't be fully up to date, though, because you're running it as a nightly script to update the database used by the new app. You could maybe create a .NET routine to query the COBOL datastore specifically for whatever price the user is looking for, and if the COBOL datastore's update time is more recent than what you have logged, update the SQL Server record(s).
(I'm not familiar with COBOL at all, just throwing ideas out there.)