I have a SQL database where one of the columns is a varchar value. This value is always unique; it's not decided by me but by a 3rd-party application that supplies the data, its length is undefined, and it is a mixture of numbers and letters. I should add that it's not declared as unique in the database because, to my knowledge, you can't do that for a varchar type?
Each week I run an import of this data from a CSV file. However, the only way I know how to check whether I'm importing a unique value is to loop through each row in the database and compare it to each line in the CSV file.
Obviously this is very inefficient and will only get worse over time as the database grows.
I've tried checking Google, but to no avail; it could be that I'm searching for the wrong thing, though.
Any pointers would be much appreciated.
Application is written in C#
Look at running a MERGE statement in SQL instead of an INSERT, which allows you to explicitly specify the action to take on a duplicate.
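For instance, here's a minimal sketch of a MERGE issued from C# via ADO.NET, assuming SQL Server and a hypothetical table Items with the third-party value in an ExternalId column (all names are placeholders for your own schema):

```csharp
using System.Data.SqlClient;

// Upsert one CSV line: MERGE matches on ExternalId and only inserts
// when no matching row exists, so duplicates are silently skipped.
static void UpsertItem(string connectionString, string externalId, string payload)
{
    const string sql = @"
MERGE INTO Items AS target
USING (SELECT @externalId AS ExternalId, @payload AS Payload) AS source
    ON target.ExternalId = source.ExternalId
WHEN NOT MATCHED THEN
    INSERT (ExternalId, Payload) VALUES (source.ExternalId, source.Payload);";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        cmd.Parameters.AddWithValue("@externalId", externalId);
        cmd.Parameters.AddWithValue("@payload", payload);
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}
```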
Note that if the unique field has a unique index, then searching for a value is O(log N) rather than O(N). This means the overall cost of inserting N values is O(N log N) rather than O(N²). As N gets large, that is a substantial performance improvement.
Index the table on the unique field.
Do an IF EXISTS on the unique key field value. If it returns true, the row exists: update it. If it returns false, this is a new row: insert it.
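A sketch of both steps for SQL Server (incidentally, a varchar column can be given a unique index); the Items/ExternalId/Payload names are hypothetical:

```csharp
using System.Data.SqlClient;

static class CsvImport
{
    // One-time: a unique index on the varchar column makes the existence
    // check an O(log n) seek and lets the engine itself reject any
    // duplicate that slips through.
    public const string CreateIndexSql =
        "CREATE UNIQUE INDEX IX_Items_ExternalId ON Items(ExternalId);";

    // Per CSV line: update if the value already exists, insert otherwise.
    private const string UpsertSql = @"
IF EXISTS (SELECT 1 FROM Items WHERE ExternalId = @externalId)
    UPDATE Items SET Payload = @payload WHERE ExternalId = @externalId
ELSE
    INSERT INTO Items (ExternalId, Payload) VALUES (@externalId, @payload);";

    public static void Upsert(SqlConnection conn, string externalId, string payload)
    {
        using (var cmd = new SqlCommand(UpsertSql, conn))
        {
            cmd.Parameters.AddWithValue("@externalId", externalId);
            cmd.Parameters.AddWithValue("@payload", payload);
            cmd.ExecuteNonQuery();
        }
    }
}
```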
I need to update a column in a large table (over 30 million rows) that has no primary key. Each row has a unique email address column. The update involves generating a value in C# and appending it to a column value, so the row must be read, the column value updated, and the row written back out.
I was hoping there was some concept of cursoring in ADO.NET, but I don't see one. I can read the rows quickly enough, but the update call, using a WHERE clause on the email address, takes forever. After researching this, most answers amount to "add a primary key!", but that is not an option here. Any thoughts?
For a 30-million-row heap there aren't many options; without any index you can do basically nothing to speed it up.
About the only thing left is to check the fragmentation of the heap. You can add a clustered index to fix the table's fragmentation and then drop it immediately afterwards. But if you cannot touch that table in any way, it could even be faster to move all the data into a new table :-)
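A rough sketch of that one-off step, assuming SQL Server and a hypothetical Users table with the address in an Email column:

```csharp
using System.Data.SqlClient;

// One-off: build a clustered index to rewrite the heap in key order,
// then drop it again, as suggested above. Note that while any index on
// Email exists, each UPDATE ... WHERE Email = @email is a seek rather
// than a full scan of the 30M-row heap.
static void DefragmentHeap(string connectionString)
{
    const string sql = @"
CREATE CLUSTERED INDEX IX_Users_Email ON Users(Email);
DROP INDEX IX_Users_Email ON Users;";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        conn.Open();
        cmd.CommandTimeout = 0; // rebuilding 30M rows can take a while
        cmd.ExecuteNonQuery();
    }
}
```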
I have a Windows Forms program in C# that adds records to a database and can remove them.
In the database I have an ID column (an AutoNumber), but if I delete a record and then add another one, the AutoNumber keeps increasing and doesn't fill in the missing numbers.
I mean that if I have 9 records in my Access database and I remove one, there will be 8, but when I add a new record I get ID 10 instead of 9.
Is there any solution for that?
If it's an AutoNumber, the database will generate a number greater than the last one used; this is how relational databases are supposed to work. Why would there be a solution for this? Imagine deleting 5: would you then want the AutoNumber to create the next record as 5? If you are displaying an ID in your C# app (a bad idea), change it to some other value that you can control as you wish.
However, what you are trying to achieve does not make sense.
if I delete a record and then add another record, the AutoNumber keeps increasing and doesn't fill in the missing numbers.
[...]
Is there any solution for that?
The short answer is "No". Once used, AutoNumber values are typically never re-used, even if the deleted record had the largest AutoNumber value in the table. This is due (at least in part) to the fact that the Jet/Ace database engine has to be able to manage AutoNumber values in a multi-user environment.
(One exception to the above rule: if the Access database is compacted, the next available AutoNumber value for a table with a sequential AutoNumber field is reset to Max(current_value) + 1.)
For more details on how AutoNumber fields work, see my other answer here.
In MS Access there is no solution for this. In SQL Server, however, you could create your own key-generating function rather than using an Identity column.
I have various text values that I represent as ints, and I store the int value in the data table for better, faster searching. I have three options for displaying the text value:
I can declare an enum in my code and display the text value that corresponds to the int value. This is static: I have to change code whenever a new value is added.
To make it dynamic, I can store the int and text values in a table in another database that the admin owns. The admin can add new values to this table, and I use an inner join to display the text value whenever a record is fetched.
I can store the actual text in the respective data table. This will make searches slow.
My question is: which option is best under the following conditions?
Each data table has between 1 and 10 million records.
More than 5,000 users perform fetch, search, and update operations on the tables.
There are at most 12 distinct text values, each at most 50 characters long.
There are 30 data tables matching the conditions above.
I like a combination of option #2 and option #1: use ints, but keep a dictionary table in another database.
Let me explain:
store the int and text in a table in another database;
in the origin table, store the int only;
do not join to the other database's table to get the text; instead, cache the dictionary on the client and resolve the text from it (see the sketch below).
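A minimal sketch of that client-side cache, assuming a hypothetical Lookup(Id, Text) table in the admin database:

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

// Load the small lookup table once and cache it, so ints are resolved
// to text in memory instead of joining across databases on every fetch.
static Dictionary<int, string> LoadLookup(string adminConnectionString)
{
    var map = new Dictionary<int, string>();
    using (var conn = new SqlConnection(adminConnectionString))
    using (var cmd = new SqlCommand("SELECT Id, Text FROM Lookup", conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
            while (reader.Read())
                map[reader.GetInt32(0)] = reader.GetString(1);
    }
    return map;
}
```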
I would not go for option 1, for the reason you give: enums are not meant to be lookup tables. You could replace the enum with a hard-coded dictionary, but again it would need to be recompiled each time a change is made, which is bad.
Storing the text in the data table (i.e. option 3) is bad when it is guaranteed to be duplicated as heavily as here. This is exactly where you should use a lookup table, as you suggest in option 2.
So yes, store them in a database table and administer them through that.
The join shouldn't take long at all if it is just to a small table. If you are worried, though, an alternative is to load the lookup table into a dictionary in code the first time you need it and resolve values from that instead. I doubt you'll have problems just doing the join, though.
And I'd take this approach no matter what the conditions are (number of records, etc.), though the conditions do make it more sensible. :)
If you have literally millions of records, there's almost certainly no point in trying to spin up such a structure in server code or on the client in any form. It needs to be kept in a database, IMHO.
The query that creates the list needs to be smart enough to constrain the count of returned records to a manageable number. Perhaps partitioned views or stored procedures might help in this regard.
If this is primarily a read-only list, with updates only done in the context of management activities, it should be possible to make queries against the table very rapid with proper indexes and queries on the client side.
I have a program that reads a credit card statement, and I need it to insert the data into a table. The problem I am having is that when I just use INSERT, it lets the user insert the same information over and over again. However, I can't really mark any of the columns as unique, because there can be duplicates in every field.
The fields I have are DATE | Description | Amount.
So the user could have used the card on the same date, at the same place, and for the same amount. These are monthly statements, so is there a way to do this besides INSERT IGNORE with a unique key?
Brent
You have to clarify the business rules: either something is required to be unique (a single column or a combination of columns), or identical lines are allowed.
If identical lines are valid, i.e. the user may have used the card twice on the same date, at the same place, for the same amount, then you cannot require the data to be unique.
What you can do is warn the user (if the data entry is interactive) when an identical line already exists. If you are doing a batch import, you could issue a warning if all (or at least a contiguous block) of the transactions are identical to already existing ones.
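For the interactive case, a sketch assuming SQL Server via ADO.NET and a hypothetical Transactions(TransDate, Description, Amount) table:

```csharp
using System;
using System.Data.SqlClient;

// Returns true when an identical row is already on file, so the UI can
// ask "this transaction already exists - insert anyway?" before the INSERT.
static bool IdenticalRowExists(SqlConnection conn, DateTime date,
                               string description, decimal amount)
{
    const string sql = @"
SELECT COUNT(*) FROM Transactions
WHERE TransDate = @date AND Description = @description AND Amount = @amount;";

    using (var cmd = new SqlCommand(sql, conn))
    {
        cmd.Parameters.AddWithValue("@date", date);
        cmd.Parameters.AddWithValue("@description", description);
        cmd.Parameters.AddWithValue("@amount", amount);
        return (int)cmd.ExecuteScalar() > 0;
    }
}
```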
I have a bulk insert of around 100,000 records going into an Oracle table that has one unique-value column. This bulk insert will happen two or three times a day, indefinitely.
I need a robust mechanism to generate unique numbers for that column. I am building the dataset to commit to the database in one go.
Previously I created a sequence in Oracle, and while building each dataset row I hit the database to fetch a new sequence number and put it into that column. But this causes performance issues: 100,000 records means 100,000 database round trips.
Is there any other method? The unique-value column is varchar2 with a maximum length of 20.
Why not just create an autonumber with a sequence and a trigger, if you're only doing bulk inserts?
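A rough sketch of that setup, run once from C# with Oracle's managed driver; items, unique_col, and items_seq are hypothetical names, and direct NEXTVAL assignment in a trigger needs Oracle 11g or later:

```csharp
using Oracle.ManagedDataAccess.Client;

// One-time DDL: a sequence plus a BEFORE INSERT trigger that stamps
// unique_col on every row, so the bulk insert never has to round-trip
// to the database for sequence values.
static void CreateAutonumber(string connectionString)
{
    const string createSequence = "CREATE SEQUENCE items_seq";
    const string createTrigger = @"
CREATE OR REPLACE TRIGGER items_autonumber
BEFORE INSERT ON items
FOR EACH ROW
BEGIN
  :NEW.unique_col := TO_CHAR(items_seq.NEXTVAL);
END;";

    using (var conn = new OracleConnection(connectionString))
    {
        conn.Open();
        using (var cmd = new OracleCommand(createSequence, conn))
            cmd.ExecuteNonQuery();
        using (var cmd = new OracleCommand(createTrigger, conn))
            cmd.ExecuteNonQuery();
    }
}
```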
You didn't mention that the numbers must be sequential (1..n), so perhaps you could generate GUIDs and represent them in a compact way. In the long run you might encounter collisions, in which case you can simply generate a new GUID.
The only problem I see is that you'd need 24 chars to represent the GUID in Base64 (22 if you strip the padding), which doesn't fit in 20.
You can generate a new GUID, drop the '-' symbols, and take 20 of its characters to insert into the database. This GUID is not user-friendly, so no user will be able to remember it easily...
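A minimal sketch of that idea; note that truncating 32 hex digits to 20 raises the collision risk, so keep a unique index on the column and regenerate on the (rare) collision:

```csharp
using System;

// "N" formats the GUID as 32 hex digits with no dashes; the first 20
// fit the varchar2(20) column. Truncation weakens uniqueness, so pair
// this with a unique index and a retry when the insert is rejected.
static string NewCompactId()
{
    return Guid.NewGuid().ToString("N").Substring(0, 20);
}
```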