Is it possible to store data in sets or batches of 12 in SQL? I have a query which, before inserting a new row, should check whether the existing records number twelve or fewer. If there are already twelve, it should create a new batch, which also stores a maximum of twelve records.
There's no built-in concept of data batching in SQL Server, or any other SQL-like database out there. What you're describing is data grouping/relations, which is the responsibility of the database designer to figure out. What you should do is create a related table called Batches and give it a primary key. Then add that key to your actual data table as a foreign key column. The logic for when a new batch is created should probably live in a trigger; alternatively, define it in a UDF and set the default value of the BatchId column to the result of that UDF.
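The Batches design above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python with an in-memory SQLite database and puts the "is the batch full?" check in application code rather than in a trigger or UDF, and the table and column names (batches, records, batch_id, payload) are hypothetical.

```python
import sqlite3

BATCH_SIZE = 12  # maximum rows per batch, taken from the question

def insert_record(conn, payload):
    """Insert one record, opening a new batch when the latest one is full."""
    with conn:  # one transaction per insert
        row = conn.execute(
            "SELECT b.id, COUNT(r.id) FROM batches b "
            "LEFT JOIN records r ON r.batch_id = b.id "
            "GROUP BY b.id ORDER BY b.id DESC LIMIT 1"
        ).fetchone()
        if row is None or row[1] >= BATCH_SIZE:
            # No batch yet, or the newest batch already holds 12 rows.
            batch_id = conn.execute("INSERT INTO batches DEFAULT VALUES").lastrowid
        else:
            batch_id = row[0]
        conn.execute(
            "INSERT INTO records (batch_id, payload) VALUES (?, ?)",
            (batch_id, payload),
        )
    return batch_id

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE batches (id INTEGER PRIMARY KEY);
    CREATE TABLE records (
        id INTEGER PRIMARY KEY,
        batch_id INTEGER NOT NULL REFERENCES batches(id),
        payload TEXT
    );
""")
batch_ids = [insert_record(conn, f"row {i}") for i in range(30)]
```

With several concurrent writers the check and the insert would need to be serialized by the database, which is one reason the answer points at a trigger or UDF.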
I have a .NET App connected to a Postgres DB using Npgsql and I am trying to import data into two tables, say Users and Todos. A user has many todos. The User table has an id column that is automatically set by the DB, and the Todos table has a foreign key to the Users table called user_id.
Now, I know how to insert Users, and I know how to insert Todos, but I do not know how to set the user_id for those Todos since the id column from User is only known after the users are inserted into the DB. Any idea?
This depends on how you are importing and which tool you are using. If you are using raw INSERT statements, PostgreSQL has a RETURNING clause which will send you back the IDs of the inserted rows (see the docs).
If you are using binary COPY (which is the most efficient way to bulk-import data), there's no such option. In this case, one good way is to "allocate" all the IDs in one go by incrementing the sequence backing the ID column, and then to send those IDs when you're importing. This means the database is no longer generating those IDs - you're sending them explicitly like any other field.
In practical terms, say you have 100 users (and any number of todos). You can do one call to setval to increment the sequence by 100, and then you can import your users, explicitly setting their IDs to those 100 values. This allows you to also specify the user IDs on the todos. However, if you do this, be mindful of concurrency issues if someone else modifies the sequence at the same time.
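The client-side half of this approach can be sketched as below. It is a minimal illustration in Python, not Npgsql code: it assumes the block of IDs has already been reserved on the sequence and that its first value is passed in as first_id. The field names "key" and "user_key" are hypothetical stand-ins for whatever links a todo to its user before real IDs exist.

```python
def assign_reserved_ids(users, todos, first_id):
    """Give each user an ID from a reserved block and copy it onto its todos.

    `users` is a list of dicts with a temporary "key"; each todo refers to
    its user through "user_key". Both field names are hypothetical.
    """
    id_by_key = {}
    user_rows = []
    for offset, user in enumerate(users):
        new_id = first_id + offset
        id_by_key[user["key"]] = new_id
        user_rows.append({"id": new_id, "name": user["name"]})
    todo_rows = [
        {"user_id": id_by_key[todo["user_key"]], "title": todo["title"]}
        for todo in todos
    ]
    return user_rows, todo_rows

users = [{"key": "a", "name": "Ann"}, {"key": "b", "name": "Bob"}]
todos = [{"user_key": "b", "title": "write report"},
         {"user_key": "a", "title": "buy milk"}]
# Assume the sequence was advanced so that IDs 501 and 502 are reserved.
user_rows, todo_rows = assign_reserved_ids(users, todos, first_id=501)
```

Both row lists can then be bulk-imported with explicit IDs, users first.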
I have mapped various text values to ints. I store the int value in the data table for better and faster searching. I have three options for displaying the text value:
1. I declare an Enum in my code and display the text value according to the int value. It is static, and I have to change the code if new values are to be added.
2. To make it dynamic, I can store the int and text values in a table which is in another database and is owned by the admin. New values can be added by the admin in this table. I use an inner join to display the text value whenever a record is fetched.
3. I store the actual text in the respective data table. This will make searching slow.
My question is which option is best to use under the following conditions?
Each data table has between 1 and 10 million records.
There are more than 5000 users fetching, searching, and updating the table.
There are at most 12 text values, each at most 50 characters long.
There are 30 data tables with the above conditions and usage.
I like a combination of option #2 and option #1: use ints, but keep the dictionary table in another database.
Let me explain:
to store int and text in a table which is in another database;
in origin table to store int only;
do not join table from another database to get text but cache dictionary on client and resolve text from that dictionary
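The client-side cache described in these steps might look like the sketch below. It is written in Python against an in-memory SQLite database purely for illustration; the class name LookupCache, the table status_labels, and its columns are assumptions, not anything from the question.

```python
import sqlite3

class LookupCache:
    """Loads a small int-to-text dictionary table once and resolves locally."""

    def __init__(self, conn, table):
        self._conn = conn
        self._table = table  # trusted, hard-coded table name (not user input)
        self._values = None

    def text_for(self, code):
        if self._values is None:  # load lazily on first use
            rows = self._conn.execute(f"SELECT id, label FROM {self._table}")
            self._values = dict(rows)
        return self._values.get(code, f"<unknown {code}>")

    def refresh(self):
        """Drop the cached copy so the next lookup reloads admin changes."""
        self._values = None

lookup_db = sqlite3.connect(":memory:")
lookup_db.executescript("""
    CREATE TABLE status_labels (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
    INSERT INTO status_labels VALUES (1, 'Open'), (2, 'Closed');
""")
cache = LookupCache(lookup_db, "status_labels")
labels = [cache.text_for(code) for code in (1, 2, 1)]
```

How and when refresh is called (on a timer, on a cache miss, on an admin notification) is a design decision this sketch leaves open.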
I would not go for option 1 for the reason given. Enums are not there as lookups. You could replace option 1 with a dictionary in code, but again it would need to be recompiled each time a change is made, which is bad.
Storing text in a table (ie option 3) is bad if it is guaranteed to be duplicated a lot as here. This is exactly where you should use a lookup table as you suggest in number 2.
So yes, store them in a database table and administer them through that.
The join shouldn't take long at all if it is just to a small table. If you are worried, though, an alternative is to load the lookup table into a dictionary in code the first time you need it and resolve the values in code from that small lookup table. I doubt you'll have problems with just doing the join, though.
And I'd do this approach no matter what the conditions are (ie number of records, etc.). The conditions do make it more sensible though. :)
If you have literally millions of records, there's almost certainly no point in trying to spin up such a structure in server code or on the client in any form. It needs to be kept in a database, IMHO.
The query that creates the list needs to be smart enough to constrain the count of returned records to a manageable number. Perhaps partitioned views or stored procedures might help in this regard.
If this is primarily a read-only list, with updates only done in the context of management activities, it should be possible to make queries against the table very rapid with proper indexes and queries on the client side.
In my ASP.NET web app I'm trying to implement an import/export procedure to save or insert data in the application DB. My procedure generates some CSV files: one for each table.
Obviously there are relations between some of these tables and when I import CSV in my DB I'd like to maintain association between rows.
Say I have Table1 and Table2 with Table2 that has a foreign key to Table1. So I could have a row in Table1 with ID = 100 and a row in Table2 with Table1_ID = 100.
When I import the CSV with Table1 data, new IDs are generated for the Table1 rows. How can I maintain consistency of the foreign keys in Table2 when I import the corresponding CSV file?
I'm using Linq-to-SQL to retrieve data from the DB. Would using DataSet and DataTable help?
NOTE I'd like to permit cumulative import, so when I import a CSV file there may already be data in the DB. So I cannot use 'Set Identity OFF'.
Add the items of Table1 first, so that when you add the items of Table2 the corresponding records of Table1 are already in the database. For more tables you will have to figure out the order. If you are building a system for arbitrary database schemas, you will want to create a table graph in memory (each node is a table and each arc is a foreign key; there are no types for that in the base library) and then convert it to a tree such that traversing the tree breadth-first gives the correct order.
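One way to get such an order is sketched below in Python. Note that it uses a topological sort from the standard library's graphlib module (Python 3.9+) rather than the tree conversion and breadth-first traversal described above; the three-table schema is hypothetical.

```python
from graphlib import TopologicalSorter

def import_order(foreign_keys):
    """Return table names so that every referenced table comes first.

    `foreign_keys` maps each table to the set of tables it references.
    """
    return list(TopologicalSorter(foreign_keys).static_order())

# Hypothetical schema: Table2 references Table1, Table3 references both.
schema = {
    "Table1": set(),
    "Table2": {"Table1"},
    "Table3": {"Table1", "Table2"},
}
order = import_order(schema)
```

If the foreign keys form a cycle, static_order raises graphlib.CycleError, which is a useful signal that no simple insert order exists.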
You can let the database handle foreign key violations, which occur when the referenced row does not exist. You will have to decide whether to wrap the whole import operation in one transaction or to use one transaction per item.
Analysing the CSVs beforehand is also possible. To do that, store the primary key values of each table in a set (again, iterating over the tables in the correct order). Then, when you read a table that has a foreign key to a table you have already read, you can check whether the key is there; this also helps you detect possible duplicates. [If existing database rows have to be taken into account, you would have to query for them too. Take care if the database belongs to an active system, where records could be deleted while you are still deciding whether the CSVs can be added without problems.]
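A minimal sketch of that pre-check, in Python, is shown here. It only looks at the CSV contents and ignores rows already in the database; the column names ID and Table1_ID follow the question, while the function name and sample data are made up for illustration.

```python
import csv
import io

def check_csvs(parent_csv, child_csv, pk="ID", fk="Table1_ID"):
    """Return (duplicate parent keys, child foreign keys with no parent)."""
    seen, duplicates = set(), set()
    for row in csv.DictReader(parent_csv):
        key = row[pk]
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    missing = {row[fk] for row in csv.DictReader(child_csv)} - seen
    return duplicates, missing

table1 = io.StringIO("ID,Name\n100,alpha\n101,beta\n100,alpha again\n")
table2 = io.StringIO("ID,Table1_ID\n1,100\n2,101\n3,999\n")
duplicates, missing = check_csvs(table1, table2)
```

Here duplicates reports the repeated key 100 and missing reports 999, which no Table1 row provides.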
To address that you are generating new IDs when you add...
The simplest solution I can think of is: don't. This applies in particular to an active system where other requests are being processed, because there is then no way to predict the new IDs beforehand. Your best bet would be to add the rows one by one; in that case you will have to plan your transaction strategy accordingly, and it may be that you will not be able to roll back.
Although, I think your question is a bit deeper: If the ID of the Table1 did change, then how can I update the corresponding records in the Table2 so they point to the correct record in Table1?
To do that, I suggest doing the analysis described above; you will then have a group of sets that work as indexes. These help you locate the records in Table2 that need updating for each ID in Table1. [It is also important to keep track of which records you have already updated so that you don't update them twice, because a generated ID may match an ID that is yet to be sent to the database.]
To roll back, you can also use those sets, as they will end up having the new IDs that identify the records that you will have to pull out of the database if you want to abort the operation.
Edit: those sets (I recommend HashSet) are only half the story, because they hold only the primary keys (for instance, ID in Table1). You will also need bags to keep the foreign keys (in this case, Table1_ID in Table2).
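The remapping idea can be illustrated with the Python sketch below. It is a simplification of what is described above: instead of sets and bags it keeps a single dictionary from the ID in the CSV to the ID the database generated, and it assumes the whole import fits in one transaction. SQLite stands in for the real database, and the Name and Note columns are hypothetical.

```python
import sqlite3

def import_with_remap(conn, table1_rows, table2_rows):
    """Insert Table1 rows, then Table2 rows with translated foreign keys."""
    new_id = {}  # ID from the CSV -> ID generated by the database
    with conn:  # a single transaction, so a failure rolls everything back
        for old_id, name in table1_rows:
            cur = conn.execute("INSERT INTO Table1 (Name) VALUES (?)", (name,))
            new_id[old_id] = cur.lastrowid
        for old_fk, note in table2_rows:
            conn.execute(
                "INSERT INTO Table2 (Table1_ID, Note) VALUES (?, ?)",
                (new_id[old_fk], note),
            )
    return new_id

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Table2 (ID INTEGER PRIMARY KEY,
                         Table1_ID INTEGER REFERENCES Table1(ID), Note TEXT);
    INSERT INTO Table1 (Name) VALUES ('already here');
""")
mapping = import_with_remap(conn, [(100, "alpha"), (1, "beta")],
                            [(100, "first"), (1, "second")])
```

Because the old and new IDs live on opposite sides of the dictionary, a CSV ID that happens to equal an ID already in the database (1 in this example) is never confused with it.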
I have a C# app which allows the user to update some columns in a DB. My problem is that I have 300,000 records in the DB, and just updating 50,000 took 30 minutes. Can I do something to speed things up?
My update query looks like this:
UPDATE SET UM = 'UM', Code = 'Code' WHERE Material = 'MaterialCode'.
My only unique constraint is on Material. I read the file the user selects, put the data in a DataTable, and then go row by row, updating the corresponding material in the DB.
Limit the number of indexes in your database, especially if your application updates data very frequently. Each index takes up disk space and slows the adding, deleting, and updating of rows, so you should create new indexes only after analyzing how the data is used, the types and frequencies of the queries performed, and how your queries will use the new indexes.
In many cases, the speed advantages of creating new indexes outweigh the disadvantages of the additional space used and slower row modification. However, avoid redundant indexes; create them only when necessary. For a read-only table, the number of indexes can be increased.
Use non clustered index on the table if the update is frequent.
Use clustered index on the table if the updates/inserts are not frequent.
The C# code may not be the problem; your update statement is what matters. The WHERE clause of the update statement is the place to look. You need an indexed column in the WHERE clause.
Another thing: is the Material field indexed? And does the WHERE clause need to be on a field with a varchar value? Can't it be an integer field?
Performance will be better if you filter on fields having integers and not strings. Not sure if this is possible for you.
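To make the point about an indexed WHERE column concrete, here is a small Python/SQLite sketch. The table name materials is hypothetical (the question does not give one), and running all updates inside a single transaction is an extra assumption of this sketch, not something the answers above propose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "materials" is a hypothetical table name; the question does not give one.
conn.execute("CREATE TABLE materials (Material TEXT, UM TEXT, Code TEXT)")
# The index lets each UPDATE locate its row without scanning the table.
conn.execute("CREATE UNIQUE INDEX ix_materials_material ON materials (Material)")
conn.executemany(
    "INSERT INTO materials (Material, UM, Code) VALUES (?, ?, ?)",
    [(f"M{i}", "old", "old") for i in range(1000)],
)
conn.commit()

# Rows as they might come from the user's file: (UM, Code, Material).
updates = [("kg", f"C{i}", f"M{i}") for i in range(0, 1000, 2)]
with conn:  # one transaction for the whole batch, not one per row
    conn.executemany(
        "UPDATE materials SET UM = ?, Code = ? WHERE Material = ?", updates
    )

# Ask SQLite how it would look up a row by Material.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM materials WHERE Material = ?", ("M10",)
).fetchall()
updated = conn.execute(
    "SELECT COUNT(*) FROM materials WHERE UM = 'kg'"
).fetchone()[0]
```

The plan rows name ix_materials_material when the lookup goes through the index; in SQL Server the equivalent check would be the execution plan of the UPDATE.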
Relatively simple problem.
Table A has ID int PK, unique Name varchar(500), and cola, colb, etc
Table B has a foreign key to Table A.
So, in the application, we are generating records for both table A and table B into DataTables in memory.
We would be generating thousands of these records on a very large number of "clients".
Eventually we make the call to store these records. However, records from table A may already exist in the database, so we need to get the primary keys for the records that already exist, and insert the missing ones. Then insert all records for table B with the correct foreign key.
Proposed solution:
I was considering sending an xml document to SQL Server to open as a rowset into TableVarA, update TableVarA with the primary keys for the records that already exist, then insert the missing records and output that to TableVarNew, I then select the Name and primary key from TableVarA union all TableVarNew.
Then in code populate the correct FKs into TableB in memory, and insert all of these records using SqlBulkCopy.
Does this sound like a good solution? And if so, what is the best way to populate the FKs in memory for TableB to match the primary keys from the returned DataSet?
Sounds like a plan - but I think the handling of Table A can be simpler (a single in-memory table/table variable should be sufficient):
have a TableVarA that contains all rows for Table A
update the ID for all existing rows with their ID (should be doable in a single SQL statement)
insert all non-existing rows (that still have an empty ID) into Table A and make a note of their ID
This could all happen in a single table variable - I don't see why you need to copy stuff around....
Once you've handled your Table A, as you say, update Table B's foreign keys and bulk insert those rows in one go.
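The steps above can be sketched roughly as follows, using Python and SQLite in place of SQL Server table variables and SqlBulkCopy. It assumes Table B rows refer to Table A by Name until the IDs are known, and the column names A_ID and Value are hypothetical.

```python
import sqlite3

def resolve_ids(conn, names):
    """Return {name: ID} for TableA, inserting the names that are missing."""
    names = list(dict.fromkeys(names))  # de-duplicate, keep order
    marks = ",".join("?" * len(names))
    query = f"SELECT Name, ID FROM TableA WHERE Name IN ({marks})"
    with conn:
        known = dict(conn.execute(query, names))
        missing = [(n,) for n in names if n not in known]
        conn.executemany("INSERT INTO TableA (Name) VALUES (?)", missing)
        return dict(conn.execute(query, names))

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID INTEGER PRIMARY KEY, Name TEXT UNIQUE);
    CREATE TABLE TableB (ID INTEGER PRIMARY KEY,
                         A_ID INTEGER REFERENCES TableA(ID), Value TEXT);
    INSERT INTO TableA (Name) VALUES ('existing');
""")
pending_b = [("existing", "x"), ("new", "y"), ("new", "z")]  # (A name, value)
ids = resolve_ids(conn, [name for name, _ in pending_b])
with conn:
    conn.executemany(
        "INSERT INTO TableB (A_ID, Value) VALUES (?, ?)",
        [(ids[name], value) for name, value in pending_b],
    )
```

Keying the in-memory lookup on the unique Name is what lets the foreign keys be filled in with a plain dictionary access. For thousands of names the IN list would need to be sent in chunks or replaced by a temporary table or table-valued parameter.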
What I'm not quite clear on is how Table B references Table A - you just said it had an FK, but you didn't specify what column it was on (assuming on ID). Then how are your rows from Table B referencing Table A for new rows, that aren't inserted yet and thus don't have an ID in Table A yet?
This is more of a comment than a complete answer but I was running out of room so please don't vote it down for not being up to answer criteria.
My concern would be that by evaluating a set for missing keys and then inserting in bulk, you take a risk that a key got added elsewhere in the meantime. You stated this could come from a large number of clients, so this is going to happen. Yes, you could wrap it in a big transaction, but big transactions are hogs and would lock out other clients.
My thought is to deal with the rows that already have keys in bulk, separately, assuming there is no risk the PK would be deleted. A TVP is efficient, but you need explicit knowledge of which rows got processed. I think you need to first search on Name to get a list of the PKs that exist and then process that via a TVP.
For data integrity process the rest one at a time via a stored procedure that creates the PK as necessary.
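A per-row "create the PK as necessary" routine could look like the sketch below. It is Python with SQLite, not a SQL Server stored procedure, and it relies on SQLite's INSERT OR IGNORE together with a UNIQUE constraint on Name; the function name get_or_create is an assumption made for illustration.

```python
import sqlite3

def get_or_create(conn, name):
    """Return the ID for `name`, inserting the row if it does not exist."""
    with conn:  # short transaction per row
        # The UNIQUE constraint on Name makes the insert a no-op if another
        # client already added the same name.
        conn.execute("INSERT OR IGNORE INTO TableA (Name) VALUES (?)", (name,))
        return conn.execute(
            "SELECT ID FROM TableA WHERE Name = ?", (name,)
        ).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (ID INTEGER PRIMARY KEY, Name TEXT UNIQUE)")
first = get_or_create(conn, "widget")
again = get_or_create(conn, "widget")
other = get_or_create(conn, "gadget")
```

Calling it twice with the same name returns the same ID, so it does not matter which client got there first.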
Thousands of records is not scary (millions is). Large number of "clients" that is the scary part.