I have a program that reads a credit card statement, and I need it to insert the data into a table. The problem I am having is that when I just use INSERT, it lets the user insert the same information over and over again. However, I really can't set any of the columns as unique, because there can be duplicates in all of the fields.
The fields I have are DATE | Description | Amount.
So the user could have used the card on the same date, at the same place, and for the same amount. These are monthly statements, so is there a way to do this besides INSERT IGNORE with a unique key?
Brent
You have to clarify the business rules: either something is required to be unique (a single column or a combination of columns), or identical lines are allowed.
If identical lines are valid, i.e. the user has used the card twice on the same date, at the same place, for the same amount, you cannot require the data to be unique.
What you can do is warn the user (if the data entry is interactive) when an identical line already exists. If you are doing a batch import, you could issue a warning if all (or at least a contiguous block) of the transactions are identical to already existing ones. A sketch of such a check follows.
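A minimal sketch of that duplicate check from C#, assuming SQL Server via Microsoft.Data.SqlClient; the Transactions table name is a placeholder, with columns matching the fields in the question:

using Microsoft.Data.SqlClient;

static bool IdenticalRowExists(SqlConnection conn, DateTime date, string description, decimal amount)
{
    const string sql =
        "SELECT COUNT(*) FROM Transactions " +
        "WHERE Date = @date AND Description = @description AND Amount = @amount";
    using var cmd = new SqlCommand(sql, conn);
    cmd.Parameters.AddWithValue("@date", date);
    cmd.Parameters.AddWithValue("@description", description);
    cmd.Parameters.AddWithValue("@amount", amount);
    // COUNT(*) > 0 means an identical line is already stored.
    return (int)cmd.ExecuteScalar() > 0;
}

// Caller: warn and ask for confirmation instead of silently inserting a duplicate.
// if (IdenticalRowExists(conn, date, desc, amount)) { /* show the warning */ }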
We have many lookup tables in the system, and if a lookup table is already referenced by other tables, we shouldn't be allowed to update or delete its "value" column, e.g. EnrollStatusName in the table below.
For example:

Lookup table: EnrollStatus

ID | EnrollStatusName
---|-----------------
 1 | Pending
 2 | Approved
 3 | Rejected

Other table: UserRegistration

URID | EnrollStatusID (FK)
-----|--------------------
  11 | 1
  12 | 1
  13 | 2
Here I can still edit lookup row 3 (Rejected), since it is not referenced anywhere yet.
The solution that comes to my mind is to add a read-only flag column to the lookup table and, whenever there is DML on the UserRegistration table, set that flag to true. Is there a better approach? It can be handled either in application code or in SQL, hence I'm tagging c# as well to learn the possibilities.
Delete is easy: just establish a foreign key relationship from the other table, and don't specify ON DELETE CASCADE or ON DELETE SET NULL. It is then no longer possible to delete an in-use row, because it has dependent rows in other tables; see the sketch below.
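A sketch of such a restricting foreign key, created from C#; the constraint name is made up, and with no ON DELETE clause the default NO ACTION applies, so deleting a referenced EnrollStatus row fails:

// Run once, e.g. as part of a migration:
// new SqlCommand(addFkSql, conn).ExecuteNonQuery();
const string addFkSql = @"
    ALTER TABLE UserRegistration
    ADD CONSTRAINT FK_UserRegistration_EnrollStatus   -- made-up name
    FOREIGN KEY (EnrollStatusID) REFERENCES EnrollStatus (ID);";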
Update is perhaps trickier. You can use the same mechanism, and I think it's neatest: instead of doing the update as an UPDATE, do it as a delete and insert; if the row is in use, the foreign key will prevent the delete.
Belayer pointed out in the comments that you can use UPDATE as well; you'll have to include the PK column in the list of columns you SET, and you can't set it to the value it already has, nor to a value that is already in use. You'll probably need a strategy like two updates in a row if you want to keep a controlled list of IDs:
UPDATE EnrollStatus SET id=-id, EnrollStatusName='whatever' WHERE id=3
UPDATE EnrollStatus SET id=-id WHERE id=-3
The strategy of flipping the ID negative and then back positive succeeds only if the row is not in use; if it is in use, the first statement errors out on the foreign key.
If you don't care that your PKs end up a mix of positives and negatives (and you shouldn't, but people do seem to care more than they should about what values PKs have), you can forgo the second update: you can always insert new values as positive and incrementing, and flip-flop them while they're being edited, before they're brought into use.
I have a .NET app connected to a Postgres DB using Npgsql, and I am trying to import data into two tables, say Users and Todos. A user has many todos. The Users table has an id column that is set automatically by the DB, and the Todos table has a foreign key to Users called user_id.
Now, I know how to insert users and I know how to insert todos, but I do not know how to set the user_id for those todos, since the id of a user is only known after the users are inserted into the DB. Any ideas?
This depends on how you are importing and which tool you are using. If you are using raw INSERT statements, PostgreSQL has a RETURNING clause which sends you back the IDs of the inserted rows (see the docs); a sketch follows.
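For instance, with Npgsql this could look roughly as follows; the users table, its name column, and the connection string are placeholders:

using Npgsql;

var connString = "Host=localhost;Database=mydb;Username=me;Password=secret"; // placeholder
await using var conn = new NpgsqlConnection(connString);
await conn.OpenAsync();
await using var cmd = new NpgsqlCommand(
    "INSERT INTO users (name) VALUES (@name) RETURNING id", conn);
cmd.Parameters.AddWithValue("name", "alice");
// ExecuteScalar returns the first column of the first row: the generated id.
var id = await cmd.ExecuteScalarAsync();
var userId = (int)id!;
// userId can now be used as user_id when inserting this user's todos.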
If you are using binary COPY (which is the most efficient way to bulk-import data), there's no such option. In this case, one good way is to "allocate" all the IDs in one go, by incrementing the sequence backing the id column, and then to send the IDs explicitly when you're importing. This means the database is no longer generating those IDs; you're sending them explicitly like any other field.
In practical terms, say you have 100 users (and any number of todos). You can do one call to setval to increment the sequence by 100, and then you can import your users, explicitly setting their IDs to those 100 values. This also allows you to set the user IDs on the todos. However, if you do this, be mindful of concurrency issues if someone else modifies the sequence at the same time. A sketch of the allocation follows.
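With Npgsql, that allocation could be sketched like this; users_id_seq is the default name PostgreSQL gives the sequence backing users.id and is an assumption here, and the setval call is subject to the concurrency caveat just mentioned:

// conn is an open NpgsqlConnection, as in the earlier sketch.
await using var alloc = new NpgsqlCommand(
    "SELECT setval('users_id_seq', last_value + 100) FROM users_id_seq", conn);
// setval returns the new current value: the top of our reserved block.
var result = await alloc.ExecuteScalarAsync();
var upper = (long)result!;
var firstId = upper - 99; // we now own IDs firstId .. upper
// Assign firstId, firstId + 1, ... to the users in memory, set user_id on
// their todos from those values, then bulk-import both tables with COPY.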
I have a SQL database where one of the columns is a varchar value. This value is always unique; it's not decided by me but by a 3rd-party application that supplies the data. Its length is undefined, and it is a mixture of numbers and letters. I should add that it's not declared as unique in the database, as to my knowledge you can't do that for a varchar type?
Each week I run an import of this data from a CSV file; however, the only way I know to check whether I'm importing a unique value is to loop through each row in the database and compare it against each line in the CSV file.
Obviously this is very inefficient and will only get worse over time as the database grows.
I've tried checking Google, but to no avail; it could be that I'm looking for the wrong thing, though.
Any pointers would be much appreciated.
The application is written in C#.
Look at running a MERGE command in SQL instead of an INSERT; it allows you to explicitly specify the action to be taken when a duplicate is found. A sketch follows.
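A hedged sketch of such a MERGE driven from C#, assuming SQL Server via Microsoft.Data.SqlClient; the Items table, its columns, the connection string, and csvRows are all placeholders:

using System.Collections.Generic;
using Microsoft.Data.SqlClient;

var connectionString = "...";                            // placeholder
var csvRows = new List<(string Key, string Payload)>();  // parsed CSV lines go here

const string mergeSql = @"
    MERGE Items AS target
    USING (VALUES (@key, @payload)) AS source (ExternalKey, Payload)
        ON target.ExternalKey = source.ExternalKey
    WHEN MATCHED THEN
        UPDATE SET Payload = source.Payload
    WHEN NOT MATCHED THEN
        INSERT (ExternalKey, Payload) VALUES (source.ExternalKey, source.Payload);";

using var conn = new SqlConnection(connectionString);
conn.Open();
foreach (var (key, payload) in csvRows)
{
    using var cmd = new SqlCommand(mergeSql, conn);
    cmd.Parameters.AddWithValue("@key", key);
    cmd.Parameters.AddWithValue("@payload", payload);
    cmd.ExecuteNonQuery(); // updates the existing row or inserts a new one
}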
Note that if the unique field has a unique index, then searching for a value is O(log N) rather than O(N). This means that the overall cost of inserting N values is O(N log N) rather than O(N²). As N gets large, this is a substantial performance improvement.
Index the table on the unique field.
Do an 'if exists' check on the unique key value: if it returns true, the row exists, so update it; if false, it's a new row, so insert it, as sketched below.
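One way to express that check-then-act as a single T-SQL batch sent from C#, reusing the hypothetical Items table from the MERGE sketch above; the transaction and locking hints are an assumption, there to keep the check and the write atomic under concurrent imports:

const string upsertSql = @"
    BEGIN TRANSACTION;
    IF EXISTS (SELECT 1 FROM Items WITH (UPDLOCK, HOLDLOCK)
               WHERE ExternalKey = @key)
        UPDATE Items SET Payload = @payload WHERE ExternalKey = @key;
    ELSE
        INSERT INTO Items (ExternalKey, Payload) VALUES (@key, @payload);
    COMMIT;";
// Execute with a SqlCommand and @key/@payload parameters, exactly as in the
// MERGE sketch above.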
I have a Windows Forms program in C# that adds a record to the database and can remove it.
In the database I have an ID column (which is an AutoNumber), but if I delete a record and then add another record in its place, the AutoNumber keeps increasing and doesn't fill in the missing numbers.
I mean that if I have 9 records in my Access database and I remove one, 8 remain, but when I add a new record its ID is 10 instead of 9.
Is there any solution for that?
If it's an AutoNumber, the database will generate a number greater than the last one used; this is how relational databases are supposed to work. Why would there be a solution for this? Imagine deleting 5: what would you want to happen then, have the AutoNumber create the next record as 5? If you are displaying the ID in your C# app (a bad idea), then change it to some other value that you can control as you wish.
However, what you are trying to achieve does not make sense.
if I delete a record and then add another record in its place, the AutoNumber keeps increasing and doesn't fill in the missing numbers.
[...]
Is there any solution for that?
The short answer is "No". Once used, AutoNumber values are typically never re-used, even if the deleted record had the largest AutoNumber value in the table. This is due (at least in part) to the fact that the Jet/Ace database engine has to be able to manage AutoNumber values in a multi-user environment.
(One exception to the above rule: if the Access database is compacted, the next available AutoNumber value for a table with a sequential AutoNumber field is reset to Max(current_value) + 1.)
For more details on how AutoNumber fields work, see my other answer here.
In MS Access, there is no solution for this. In SQL Server, however, you could generate your own key values with a custom function rather than using an identity column.
I have various text values represented by ints. I store the int value in the data table for better and faster searching. I have three options for displaying the text value:
I declare an Enum in my code and display the text value according to the int value. This is static, and I have to change code whenever a new value is added.
To make it dynamic, I can store the int and text values in a table in another database, owned by the admin. New values can be added by the admin in this table. I then use an inner join to display the text value whenever a record is fetched.
I store the actual text in the respective data table. This will make searching slow.
My question is: which option is best under the following conditions?
Data tables have between 1 and 10 million records.
More than 5000 users run fetch, search, and update operations on the tables.
There are at most 12 distinct text values, each at most 50 characters long.
There are 30 data tables with the above characteristics and functions.
I like a combination of option #2 and option #1: use ints, but keep the dictionary table in another database.
Let me explain:
store the int and text in a table in another database;
in the origin table, store the int only;
do not join the table from the other database to get the text; instead, cache the dictionary on the client and resolve the text from it, as in the sketch below.
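A minimal sketch of that client-side cache in C#, assuming SQL Server and a hypothetical dictionary table StatusDict(Id int, Text varchar(50)):

using System.Collections.Generic;
using Microsoft.Data.SqlClient;

static Dictionary<int, string> LoadStatusDict(string connectionString)
{
    var dict = new Dictionary<int, string>();
    using var conn = new SqlConnection(connectionString);
    conn.Open();
    using var cmd = new SqlCommand("SELECT Id, Text FROM StatusDict", conn);
    using var reader = cmd.ExecuteReader();
    while (reader.Read())
        dict[reader.GetInt32(0)] = reader.GetString(1);
    // Load once and reuse; refresh only when the admin changes the values.
    return dict;
}

// Usage: resolve the int stored in the big data table without a join.
// var statusText = statusDict[row.StatusId];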
I would not go for option 1, for the reason given: enums are not meant to be lookup tables. You could replace the enum with a hard-coded dictionary, but again it would need to be recompiled each time a change is made, which is bad.
Storing the text in the data table (i.e. option 3) is bad when it is guaranteed to be duplicated as much as it is here. This is exactly where you should use a lookup table, as you suggest in option 2.
So yes, store them in a database table and administer them through that.
The join shouldn't take long at all if it is just to a small table. If you are worried, though, an alternative is to load the lookup table into a dictionary in code the first time you need it and resolve the values from there. I doubt you'll have problems just doing the join, though.
And I'd take this approach no matter what the conditions are (number of records, etc.). The conditions do make it more sensible, though. :)
If you have literally millions of records, there's almost certainly no point in trying to spin up such a structure in server code or on the client in any form. It needs to be kept in a database, IMHO.
The query that creates the list needs to be smart enough to constrain the count of returned records to a manageable number. Perhaps partitioned views or stored procedures might help in this regard.
If this is primarily a read-only list, with updates only done in the context of management activities, it should be possible to make queries against the table very rapid with proper indexes and queries on the client side.