This is an extension of a question I asked before: C#: How do I get the ID number of the last row inserted using Informix
I am writing some code in C# to insert records into the Informix DB using the .NET Informix driver. I was able to get the ID of the last insert, but some of my tables do not use the 'serial' attribute. I was looking for a command similar to the following, but one that returns the rowid instead of the id.
SELECT DBINFO ('sqlca.sqlerrd1') FROM systables WHERE tabid = 1;
And yes, I do realize working with the rowid is dangerous because it is not constant. However, I plan to have my application force the client apps to reset their data if the table is altered in a way that rearranges the rowids or the like.
One problem with ROWID is that it is a 4-byte quantity but the value used on a fragmented table is an 8-byte quantity (nominally FRAGID and ROWID), but Informix has never exposed the FRAGID.
In theory, the SQLCA data structure reports the ROWID in the sqlca.sqlerrd[5] element (assuming C-style indexing from 0; it is sqlca.sqlerrd[6] in Informix 4GL which indexes from 1). If anything was going to work with DBINFO, it would be DBINFO('sqlca.sqlerrd5'), but I get:
SQL -728: Unknown first argument of dbinfo(sqlca.sqlerrd5).
So, the indirect approach using DBINFO is not on. In ESQL/C, where sqlca is readily available, the information is available too:
SQL[739]: begin;
BEGIN WORK: Rows processed = 0
SQL[740]: create table p(q integer);
CREATE TABLE: Rows processed = 0
SQL[741]: insert into p values(1);
INSERT: Rows processed = 1, Last ROWID = 257
SQL[742]: select dbinfo('sqlca.sqlerrd5') from dual;
SQL -728: Unknown first argument of dbinfo(sqlca.sqlerrd5).
SQLSTATE: IX000 at /dev/stdin:4
SQL[743]:
I am not a user of C# or the .NET driver, so I have no knowledge of whether there is a back-door mechanism to get at the information. Even in ODBC, there might not be a front-door mechanism to get at it, but you could drop into C code to read the global data structure easily enough:
#include <sqlca.h>
#include <ifxtypes.h>
int4 get_sqlca_sqlerrd5(void)
{
return sqlca.sqlerrd[5];
}
Or, even:
int4 get_sqlca_sqlerrdN(int N)
{
if (N >= 0 && N <= 5)
return sqlca.sqlerrd[N];
else
return -22; /* errno 22 (EINVAL): Invalid argument */
}
If C# can access DLLs written in C, you could package that up.
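It can, via P/Invoke; a minimal sketch, assuming the C function above is compiled into a shared library named esqlhelper (a hypothetical name) linked against the ESQL/C libraries. Note that sqlca is per-process state, so this only helps if the SQL statements were executed through ESQL/C in the same process; the .NET driver keeps its own connection state.

using System.Runtime.InteropServices;

static class SqlcaBridge
{
    // 'esqlhelper' is a hypothetical library name (esqlhelper.dll on
    // Windows); it wraps the C helper functions shown above.
    [DllImport("esqlhelper", CallingConvention = CallingConvention.Cdecl)]
    public static extern int get_sqlca_sqlerrdN(int N);
}

// Usage, after an ESQL/C INSERT has run in this process:
// int rowid = SqlcaBridge.get_sqlca_sqlerrdN(5);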
Otherwise, the approved way of identifying rows of data is via the primary key (or any other unique identifier, sometimes known as an alternative key or candidate key) for the row. If you don't have a primary key or other unique identifier for the row, you are making life difficult for yourself. If it is a compound key, that 'works' but could be inconvenient. Maybe you need to consider adding a SERIAL column (or BIGSERIAL column) to the table.
You can use:
SELECT ROWID
FROM TargetTable
WHERE PK_Column1 = <value1> AND PK_Column2 = <value2>
or something similar to obtain the ROWID, assuming you can identify the row accurately.
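From C#, that query runs like any other through the IBM.Data.Informix provider; a minimal sketch, with TargetTable and PK_Column1 as the placeholder names from the query above, and connectionString and value1 as assumed locals (the Informix provider uses ? positional parameters):

using IBM.Data.Informix;

using (var conn = new IfxConnection(connectionString))
using (var cmd = new IfxCommand(
           "SELECT ROWID FROM TargetTable WHERE PK_Column1 = ?", conn))
{
    var p = cmd.CreateParameter();        // positional '?' parameter
    p.Value = value1;
    cmd.Parameters.Add(p);

    conn.Open();
    object rowid = cmd.ExecuteScalar();   // null if no row matched
}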
In dire straits, there is a mechanism to add a physical ROWID column to a fragmented table (normally, it is a virtual column). You'd then use the query above. This is not recommended, but the option is there.
Related
If I do a query like this
SELECT * from Foo where Bar = '42'
and Bar is an int column. Will that string value be optimized to 42 in the DB engine? Will it have some kind of impact if I leave it as it is instead of changing it to:
Select * from Foo where Bar = 42
This is done on a SQL Compact database, if that makes a difference.
I know it's not the correct way to do it, but it's a big pain going through all the code, looking at every query and the DB schema, to see whether the column is an int type or not.
SQL Server automatically converts it to INT because INT has higher precedence than VARCHAR.
You should also be aware of the impact that implicit conversions can
have on a query’s performance. To demonstrate what I mean, I’ve created and populated the following table in the AdventureWorks2008 database:
USE AdventureWorks2008;
IF OBJECT_ID ('ProductInfo', 'U') IS NOT NULL
DROP TABLE ProductInfo;
CREATE TABLE ProductInfo
(
ProductID NVARCHAR(10) NOT NULL PRIMARY KEY,
ProductName NVARCHAR(50) NOT NULL
);
INSERT INTO ProductInfo
SELECT ProductID, Name
FROM Production.Product;
As you can see, the table includes a primary key configured with the
NVARCHAR data type. Because the ProductID column is the primary key,
it will automatically be configured with a clustered index. Next, I
set the statistics IO to on so I can view information about disk
activity:
SET STATISTICS IO ON;
Then I run the following SELECT statement to retrieve product
information for product 350:
SELECT ProductID, ProductName
FROM ProductInfo
WHERE ProductID = 350;
Because statistics IO is turned on, my results include the following
information:
Table 'ProductInfo'. Scan count 1, logical reads 6, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.
Two important items to notice are that the query performed a scan and that it took six logical reads to retrieve the data. Because my WHERE clause specified a value in the primary key column as part of the search condition, I would have expected an index seek to be performed, rather than a scan. The execution plan confirms that the database engine performed a scan rather than a seek; the details of that scan can be seen by hovering the mouse over the scan icon in the plan.
Notice that in the Predicate section, the CONVERT_IMPLICIT function is being used to convert the values in the ProductID column in order to compare them to the value of 350 (represented by @1) I passed into the WHERE clause. The reason that the data is being implicitly converted is because I passed the 350 in as an integer value, not a string value, so SQL Server is converting all the ProductID values to integers in order to perform the comparisons.
Because there are relatively few rows in the ProductInfo table,
performance is not much of a consideration in this instance. But if
your table contains millions of rows, you’re talking about a serious
hit on performance. The way to get around this, of course, is to pass
in the 350 argument as a string, as I’ve done in the following
example:
SELECT ProductID, ProductName
FROM ProductInfo
WHERE ProductID = '350';
Once again, the statement returns the product information, this time with far fewer logical reads reported in the statistics IO output.
Now the index is being properly used to locate the record. And if you check the execution plan again, you'll see that the values in the ProductID column are no longer being implicitly converted before being compared to the '350' specified in the search condition.
As this example demonstrates, you need to be aware of how performance
can be affected by implicit conversions, just like you need to be
aware of any types of implicit conversions being conducted by the
database engine. For that reason, you’ll often want to explicitly
convert your data so you can control the impact of that conversion.
You can read more about Data Conversion in SQL Server.
If you look at the MSDN chart covering implicit conversions, you will find that a string is implicitly converted into an int.
Both should work in your case, but the norm is to use quotes anyway, because while this works:
Select * from Foo where Bar = 42
this does not (an unquoted pattern is a syntax error):
Select * from Foo where Bar = %42%
and this will (a pattern needs quotes, and LIKE to match):
SELECT * from Foo where Bar LIKE '%42%'
PS: you should look at Entity Framework and LINQ queries anyway; they make this simpler...
If I am not mistaken, SQL Server will read it as an INT if the string contains only numbers (numeric) and you're comparing it to an INTEGER column, but if the string is alphanumeric, then that is when you will encounter an error or get an unexpected result.
My suggestion: in the WHERE clause, if you are comparing integers, do not put single quotes. That is the best practice to avoid errors and unexpected results.
You should always use parameters when executing SQL from code, to avoid security holes (e.g. SQL injection).
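For example, a minimal ADO.NET sketch (SQL Compact's SqlCeCommand works the same way); the typed parameter tells the provider that Bar is an int, so no conversion ambiguity arises (connectionString is an assumed local):

using System.Data.SqlClient;

// A typed parameter sends 42 as an int rather than embedding a string
// literal in the SQL text, and it removes the injection risk.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("SELECT * FROM Foo WHERE Bar = @bar", conn))
{
    cmd.Parameters.AddWithValue("@bar", 42);   // int value, not "42"
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // process the row
        }
    }
}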
Can I store SOF/2015/01 as my ID, and can I auto-increment the 01 part like a usual primary key?
Can I store SOF/2015/01 as my ID?
Answer: yes, you can.
Can I auto-increment 01 like a usual primary key?
Answer: no, you can't.
Auto-increment can increment only numbers.
You have to do that manually.
You can use a function in a trigger to generate your desired auto-incremented number, like this:
create function NextCustomerNumber()
returns char(5)
as
begin
    declare @lastval char(5)
    set @lastval = (select max(customerNumber) from Customers)
    if @lastval is null set @lastval = 'C0001'
    declare @i int
    set @i = right(@lastval, 4) + 1
    return 'C' + right('000' + convert(varchar(10), @i), 4)
end
This can cause some issues, however:
What if two processes attempt to add a row to the table at the exact
same time? Can you ensure that the same value is not generated for
both processes?
There can be overhead querying the existing data each time you'd like
to insert new data
Unless this is implemented as a trigger, this means that all inserts
to your data must always go through the same stored procedure that
calculates these sequences. This means that bulk imports, or moving
data from production to testing and so on, might not be possible or
might be very inefficient.
If it is implemented as a trigger, will it work for a set-based
multi-row INSERT statement? If so, how efficient will it be? This
function wouldn't work if called for each row in a single set-based
INSERT -- each NextCustomerNumber() returned would be the same value.
You can learn more from this
Create a two-column unique primary key with the string 'SOF/2015' as part one and an auto-incrementing integer as the second column. You can combine the two columns using a function that returns a string to give you the combined key. For syntactic sugar, you can create a view on the table using your function to combine the keys into one view column.
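If you would rather compose the display key in the application than in a view, a minimal C# sketch (the method name and the two-digit width are assumptions):

// prefix is column one of the key ('SOF/2015'); seq is the
// auto-incrementing column two.
public static string FormatId(string prefix, int seq)
    => $"{prefix}/{seq:D2}";   // FormatId("SOF/2015", 1) == "SOF/2015/01"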
You can certainly use NCHAR or NVARCHAR types as primary keys, on the presumption that any variable-sized columns don't use MAX, and data doesn't exceed the maximum allowable size of your index.
As for using it as an auto-incremented column, that won't work. SQL is very smart, but not quite in that way.
I would suggest pulling that string into two or three separate columns, so that you can store the "01" portion as a separate, IDENTITY'd column. But certainly this is a design question that you'd have to work out on your own.
The other solution would be a trigger, but I'd probably hesitate in general with using something like this as a primary key. Using numeric types is just a lot nicer in many ways, particularly when you have to reference the tables elsewhere. You could always apply a UNIQUE index on the string representation.
This table schema in question is here: Oracle SQL: Selecting a single row with the latest date between multiple columns
I'm working with a table that has over 5 million entries. What is the fastest and most accurate way to upsert to this table AND return the last upserted row id using a stored procedure?
Most of what I've read recommends using the merge statement for upserts. However, merge doesn't support returning into.
In our table, we have the CREATE_DATE, CREATE_USER, UPDATE_DATE, and UPDATE_USER fields that are updated as expected. My thought was to create a stored procedure that returned the id of the row that has the latest date between those two columns and where the respective user data was equal to the current user data. This is what the people who answered the referring question helped me with (thanks!).
However, I'm concerned about the combined execution time vs other methods, as well as the huge gaps created in sequences due to merging. Calling a separate statement simply to get the id also seems a bit inefficient. However, almost everything I've read says that merge is much faster than the pre-merge upsert statements.
Note that these are being called through a C#/ASP web application. Any help is appreciated :)
edit
Below is an example of the stored procedure I'm using for the Upsert. Note that the CREATE_DATE and UPDATE_DATE columns are updated with triggers.
create or replace
PROCEDURE P_SAVE_EXAMPLE_TABLE_ROW
(
    pID                IN OUT EXAMPLE_TABLE.ID%type,
    /* other row params here */
    pUSER              IN EXAMPLE_TABLE.CREATE_USER%type,
    pPLSQLErrorNumber  OUT NUMBER,
    pPLSQLErrorMessage OUT VARCHAR2
)
AS
BEGIN
    MERGE INTO EXAMPLE_TABLE USING dual ON (ID = pID)
    WHEN NOT MATCHED THEN
        INSERT (/* other cols, */ CREATE_USER) VALUES (/* other cols, */ pUSER)
    WHEN MATCHED THEN
        UPDATE SET
            /* other cols, */
            UPDATE_USER = pUSER
        WHERE ID = pID;

    -- STATEMENT TO RETURN LAST AFFECTED ID INTO pID GOES HERE

EXCEPTION
    WHEN OTHERS THEN
        pID := 0;
        pPLSQLErrorNumber := SQLCODE;
        pPLSQLErrorMessage := SUBSTR(SQLERRM, 1, 256);
        RETURN;
END;
If you're trying to return the maximum value of a sequence-generated PK on the table then I'd just run a "Select max(id) .." directly afterwards. If other sessions are also modifying the table then maybe reading the currval of the sequence would be better.
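Whichever statement ends up filling pID (a SELECT MAX(id) INTO pID, or reading the sequence's currval), the C# side reads it back through the IN OUT parameter. A minimal sketch with the Oracle managed provider (Oracle.ManagedDataAccess), assuming the procedure signature shown above; connectionString, id and userName are assumed locals:

using System;
using System.Data;
using Oracle.ManagedDataAccess.Client;

using (var conn = new OracleConnection(connectionString))
using (var cmd = new OracleCommand("P_SAVE_EXAMPLE_TABLE_ROW", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;

    var pId = cmd.Parameters.Add("pID", OracleDbType.Int32);
    pId.Direction = ParameterDirection.InputOutput;
    pId.Value = id;                      // existing id, or null to force an insert

    cmd.Parameters.Add("pUSER", OracleDbType.Varchar2).Value = userName;

    var pErrNum = cmd.Parameters.Add("pPLSQLErrorNumber", OracleDbType.Int32);
    pErrNum.Direction = ParameterDirection.Output;
    var pErrMsg = cmd.Parameters.Add("pPLSQLErrorMessage", OracleDbType.Varchar2, 256);
    pErrMsg.Direction = ParameterDirection.Output;

    conn.Open();
    cmd.ExecuteNonQuery();

    // Output values come back as Oracle types (OracleDecimal here).
    int lastAffectedId = Convert.ToInt32(pId.Value.ToString());
}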
I have a SQL Server database designed like this :
TableParameter
Id (int, PRIMARY KEY, IDENTITY)
Name1 (string)
Name2 (string, can be null)
Name3 (string, can be null)
Name4 (string, can be null)
TableValue
Iteration (int)
IdTableParameter (int, FOREIGN KEY)
Type (string)
Value (decimal)
So, as you've just understood, TableValue is linked to TableParameter.
TableParameter is like a multidimensional dictionary.
TableParameter is supposed to have a lot of rows (more than 300,000 rows)
From my C# client program, I have to fill this database after each Compute() call:
for (int iteration = 0; iteration < 5000; iteration++)
{
Compute();
FillResultsInDatabase();
}
In the FillResultsInDatabase() method, I have to:
Check if the label of my parameter already exists in TableParameter. If it doesn't exist, I have to insert a new one.
Insert the value into TableValue.
Step 1 takes a long time! I load the whole TableParameter table into an IEnumerable property and then, for each parameter, I run a
.FirstOrDefault( x => x.Name1 == item.Name1 &&
x.Name2 == item.Name2 &&
x.Name3 == item.Name3 &&
x.Name4 == item.Name4 );
in order to detect whether it already exists (and then to get the id).
Performance is very bad like this!
I've tried making the selection with a WHERE clause to avoid loading every row of TableParameter, but performance is worse!
How can I improve the performance of step 1?
For step 2, performance is still bad with a classic INSERT. I am going to try SqlBulkCopy.
How can I improve the performance of step 2?
EDITED
I've tried with stored procedures:
CREATE PROCEDURE GetIdParameter
    @Id int OUTPUT,
    @Name1 nvarchar(50) = null,
    @Name2 nvarchar(50) = null,
    @Name3 nvarchar(50) = null
AS
    SELECT TOP 1 @Id = Id FROM TableParameter
    WHERE
        TableParameter.Name1 = @Name1
        AND (@Name2 IS NULL OR TableParameter.Name2 = @Name2)
        AND (@Name3 IS NULL OR TableParameter.Name3 = @Name3)
GO

CREATE PROCEDURE CreateValue
    @Iteration int,
    @Type nvarchar(50),
    @Value decimal(32, 18),
    @Name1 nvarchar(50) = null,
    @Name2 nvarchar(50) = null,
    @Name3 nvarchar(50) = null
AS
    DECLARE @IdParameter int
    EXEC GetIdParameter @IdParameter OUTPUT, @Name1, @Name2, @Name3

    IF @IdParameter IS NULL
    BEGIN
        INSERT TableParameter (Name1, Name2, Name3)
        VALUES (@Name1, @Name2, @Name3)

        SELECT @IdParameter = SCOPE_IDENTITY()
    END

    INSERT TableValue (Iteration, IdTableParameter, Type, Value)
    VALUES (@Iteration, @IdParameter, @Type, @Value)
GO
I still have the same performance... :-( (not acceptable)
If I understand what's happening, you're querying the database to see if the data is there in step 1. I'd use a DB call to a stored procedure that inserts the data if it is not there. So just compute the results and pass them to the SP.
Can you compute the results first, and then insert in batches?
Does the compute function take data from the database? If so, can you turn the operation into a set-based operation and perform it on the server itself? Or maybe part of it?
Remember that SQL Server is designed for large dataset operations.
Edit: reflecting comments
Since the code is slow on the data inserts, and you suspect that it's because the insert has to search back before it can be done, I'd suggest that you may need to place SQL Indexes on the columns that you search on in order to improve searching speed.
However I have another idea.
Why don't you just insert the data without the check and then later when you read the data remove the duplicates in that query?
Given the fact that Name2 and Name3 can be null, would it be possible to restructure the parameter table:
TableParameter
Id (int, PRIMARY KEY, IDENTITY)
Name (string)
Dimension int
Now you can index it and simplify the query (WHERE Name = 'TheNameIWant' AND Dimension = 2).
(And speaking of indices, you do have an index on the name columns in the parameter table?)
Where do you do your commits on the insert? If you do single-statement commits, group multiple inserts into one transaction.
If you are the only one inserting values, and speed is really of the essence, load all values from the database into memory and check there, as sketched below.
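A minimal sketch of that in-memory check: key a Dictionary on the four name columns so each lookup is O(1) instead of an O(n) FirstOrDefault scan (LoadAllParameters and InsertParameter are hypothetical stand-ins for your existing data access):

using System.Collections.Generic;

// Key = the four name columns as a value tuple; value = the parameter id.
var byName = new Dictionary<(string, string, string, string), int>();

foreach (var p in LoadAllParameters())              // load once, up front
    byName[(p.Name1, p.Name2, p.Name3, p.Name4)] = p.Id;

// Per computed item: O(1) existence check plus id lookup.
if (!byName.TryGetValue((item.Name1, item.Name2, item.Name3, item.Name4),
                        out int id))
{
    id = InsertParameter(item);                     // your existing insert
    byName[(item.Name1, item.Name2, item.Name3, item.Name4)] = id;
}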
just some ideas
hth
Mario
I must admit that I'm struggling to grasp the business process that you are trying to achieve here.
On initial review, it appears as if you are performing a data comparison within your application tier. I would advise against this and suggest that you let the Database Engine do what it is designed to do: manage and implement your data access.
As another poster has mentioned, I concur that you should look to create a Stored Procedure to handle your record insertion logic. The procedure can perform a simple check to see if your records already exist.
You should also consider:
Enforcing the insertion logic/rule by creating a Unique Constraint across the four name columns.
Creating a covering non-clustered index incorporating the four name columns.
With regard to performance of your inserts, perhaps you can provide some metrics to qualify what it is that you are seeing and how you are measuring it?
To give you a yardstick, the current ETL insertion record for SQL Server is approx 16 million rows per second. What sort of numbers are you expecting and wanting to see?
The fastest way (that I know of so far) is a bulk insert, but not just lines of INSERTs; try INSERT + SELECT + UNION. It works pretty fast.
insert into myTable
select a1, b1, c1, ...
union select a2, b2, c2, ...
union select a3, b3, c3, ...
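Since the inserts are driven from C#, SqlBulkCopy is the usual fast path for step 2 as well; a minimal sketch, assuming a DataTable whose columns match TableValue and a computedResults collection holding the output of Compute():

using System.Data;
using System.Data.SqlClient;

// Buffer the computed rows locally, then push them in one bulk operation
// instead of issuing one INSERT per row.
var table = new DataTable();
table.Columns.Add("Iteration", typeof(int));
table.Columns.Add("IdTableParameter", typeof(int));
table.Columns.Add("Type", typeof(string));
table.Columns.Add("Value", typeof(decimal));

foreach (var r in computedResults)
    table.Rows.Add(r.Iteration, r.IdTableParameter, r.Type, r.Value);

using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "TableValue";
    bulk.WriteToServer(table);
}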
We have some code in which we need to maintain our own identity (PK) column in SQL. We have a table in which we bulk insert data, but we add data to related tables before the bulk insert is done, thus we can not use an IDENTITY column and find out the value up front.
The current code selects the MAX value of the field and increments it by 1. Although there is a highly unlikely chance that two instances of our application will be running at the same time, it is still not thread-safe (not to mention that it goes to the database every time).
I am using the ADO.NET entity model. How would I go about 'reserving' a range of IDs to use, grabbing a new block when that range runs out, with a guarantee that the same range will not be handed out twice?
Use a more universal unique identifier data type like UNIQUEIDENTIFIER (UUID) instead of INTEGER. In this case you can basically create it on the client side, pass it to SQL, and not have to worry about it. The disadvantage is, of course, the size of this field.
Create a simple table in the database, CREATE TABLE ID_GEN (ID INTEGER IDENTITY), and use it as a factory to give you identifiers. Ideally you would create a stored procedure (or function) to which you pass the number of identifiers you need. The stored procedure would insert that number of (empty) rows into the ID_GEN table and return all the new IDs, which you can use in your code. Obviously, your original tables would no longer have the IDENTITY.
Create your own variation of the ID_GEN factory above.
I would choose simplicity (UUID) if you are not constrained otherwise.
If it's viable to change the structure of the table, then perhaps use a uniqueidentifier for the PK instead along with newid() [SQL] or Guid.NewGuid() [C#] in your row generation code.
From the Guid.NewGuid() documentation:
There is a very low probability that the value of the new Guid is all zeroes or equal to any other Guid.
Why are you using the ADO.NET Entity Framework to do what sounds like ETL work? (See the critique of the ADO.NET Entity Framework, and ORM in general, below. It is rant-free.)
Why use ints at all? Using a uniqueidentifier would solve the "multiple instances of the application running" issue.
Using a uniqueidentifier as a column default will be slower than using an int IDENTITY; it takes more time to generate a guid than an int, and a guid is also larger (16 bytes) than an int (4 bytes). Try this first and, if it results in acceptable performance, run with it.
If the delay introduced by generating a guid on each row insert is unacceptable, create guids in bulk (or on another server) and cache them in a table.
Sample TSQL code:
CREATE TABLE testinsert
(
date_generated datetime NOT NULL DEFAULT GETDATE(),
guid uniqueidentifier NOT NULL,
TheValue nvarchar(255) NULL
)
GO
CREATE TABLE guids
(
guid uniqueidentifier NOT NULL DEFAULT newid(),
used bit NOT NULL DEFAULT 0,
date_generated datetime NOT NULL DEFAULT GETDATE(),
date_used datetime NULL
)
GO
CREATE PROCEDURE GetGuid
    @guid uniqueidentifier OUTPUT
AS
BEGIN
    SET NOCOUNT ON
    DECLARE @return int = 0
    BEGIN TRY
        BEGIN TRANSACTION
        SELECT TOP 1 @guid = guid FROM guids WHERE used = 0
        IF @guid IS NOT NULL
            UPDATE guids
            SET
                used = 1,
                date_used = GETDATE()
            WHERE guid = @guid
        ELSE
        BEGIN
            SET @return = -1
            PRINT 'GetGuid Error: No Unused guids are available'
        END
        COMMIT TRANSACTION
    END TRY
    BEGIN CATCH
        SET @return = ERROR_NUMBER() -- some error occurred
        SET @guid = NULL
        PRINT 'GetGuid Error: ' + CAST(ERROR_NUMBER() as varchar) + CHAR(13) + CHAR(10) + ERROR_MESSAGE()
        ROLLBACK
    END CATCH
    RETURN @return
END
GO
CREATE PROCEDURE InsertIntoTestInsert
    @TheValue nvarchar(255)
AS
BEGIN
    SET NOCOUNT ON
    DECLARE @return int = 0
    DECLARE @guid uniqueidentifier
    DECLARE @getguid_return int

    EXEC @getguid_return = GetGuid @guid OUTPUT

    IF @getguid_return = 0
    BEGIN
        INSERT INTO testinsert(guid, TheValue) VALUES (@guid, @TheValue)
    END
    ELSE
        SET @return = -1

    RETURN @return
END
GO
-- generate the guids
INSERT INTO guids(used) VALUES (0)
INSERT INTO guids(used) VALUES (0)
--Insert data through the stored proc
EXEC InsertIntoTestInsert N'Foo 1'
EXEC InsertIntoTestInsert N'Foo 2'
EXEC InsertIntoTestInsert N'Foo 3' -- will fail, only two guids were created
-- look at the inserted data
SELECT * FROM testinsert
-- look at the guids table
SELECT * FROM guids
The fun question is... how do you map this to ADO.Net's Entity Framework?
This is a classic problem that started in the early days of ORM (Object Relational Mapping).
If you use relational-database best practices (never allow direct access to base tables, only allow data manipulation through views and stored procedures), then you add headcount (someone capable and willing to write not only the database schema, but also all the views and stored procedures that form the API) and introduce delay (the time to actually write this stuff) to the project.
So everyone cuts this and people write queries directly against a normalized database, which they don't understand... thus the need for ORM, in this case, the ADO.NET Entity Framework.
ORM scares the heck out of me. I've seen ORM tools generate horribly inefficient queries which bring otherwise performant database servers to their knees. What was gained in programmer productivity was lost in end-user waiting and DBA frustration.
The Hi/Lo algorithm may be of interest to you:
What's the Hi/Lo algorithm?
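A minimal sketch of the idea: the only database work is reserving the next 'hi' block, hedged here behind a hypothetical reserveNextHi delegate (for example, a stored procedure that atomically increments a one-row counter table and returns the new value); every id within a block is then handed out without touching the database:

using System;

public sealed class HiLoGenerator
{
    private readonly int _blockSize;
    private readonly Func<int> _reserveNextHi;  // atomic DB increment, assumed
    private readonly object _gate = new object();
    private int _hi = -1;
    private int _lo;

    public HiLoGenerator(int blockSize, Func<int> reserveNextHi)
    {
        _blockSize = blockSize;
        _reserveNextHi = reserveNextHi;
    }

    public long NextId()
    {
        lock (_gate)
        {
            if (_hi < 0 || _lo >= _blockSize)   // block exhausted
            {
                _hi = _reserveNextHi();         // one round-trip per block
                _lo = 0;
            }
            return (long)_hi * _blockSize + _lo++;
        }
    }
}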
Two clients could reserve the same block of id's.
There is no solution short of serializing your inserts by locking.
See Locking Hints in MSDN.
If you have a lot of child tables, you might not want to change the PK. Plus, the integer fields are likely to perform better in joins. But you could still add a GUID field and populate it in the bulk insert with pre-generated values. Then you could leave the identity insert alone (it is almost always a bad idea to turn it off) and use the GUID values you pre-generated to get back the identity values you just inserted, for the insert into the child tables.
If you use a regular set-based insert (one with the select clause instead of the values clause) instead of a bulk insert, you could use the output clause to get the identities back for the rows if you are using SQL Server 2008.
The most general solution is to generate client identifiers that never overlap with database identifiers (usually negative values are used), then update them with the identifiers generated by the database on insert.
This approach is safe to use in an application where many users insert data simultaneously. Other approaches, apart from GUIDs, are not multiuser-safe.
But if you have the rare case where an entity's primary key must be known before the entity is saved to the database, and it is impossible to use a GUID, you may use an identifier-generation algorithm that prevents identifier overlap.
The simplest is to assign a unique identifier prefix to each connected client and prepend it to each identifier generated by that client.
If you are using the ADO.NET Entity Framework, you probably should not worry about identifier generation: EF generates identifiers by itself; just mark the primary key of the entity as IsDbGenerated=true.
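In code-first EF the equivalent marking is the DatabaseGenerated attribute; a minimal sketch (the Customer entity is hypothetical):

using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

public class Customer
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public int Id { get; set; }    // value assigned by the database on SaveChanges

    public string Name { get; set; }
}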
Strictly speaking, Entity Framework, like other ORMs, does not require an identifier for objects that are not yet saved to the database; an object reference is enough to operate correctly on new entities. The actual primary key value is required only when updating or deleting an entity, and when updating, deleting, or inserting an entity that references the new entity, i.e. in cases where the actual primary key value is about to be written to the database. If an entity is new, it is impossible to save other entities that reference it until the new entity itself is saved, and ORMs maintain a specific saving order that takes the reference map into account.