Where to process data? Db or locally? - c#
We are working on a solution which fires many search requests towards three different public databases located in three different countries. For example, a search fetches data from one DB and passes it as a parameter to another DB. The parameter is a list in which each item needs to be logically connected with an OR operator, so we end up with a SQL SELECT statement containing up to 1000 OR operators in the WHERE clause.
My question: do 500, 1000, or even 5000 logical AND/OR operators inside a SELECT statement make the DB slower, and would it be better to fetch all the data to my PC and do the matching there?
The amount of data is between 5000 and 10000 records; since these are public databases, the amount keeps growing.
For example such a sql statement:
select * from some_table
where .. and .. or .. or.. or..
or.. or.. or.. or.. or.. or.. (1000 times)
If I fetch all the data to my PC, I could use a LINQ statement to do the filtering.
What do you suggest I do? Does anyone have experience with this?
Sorry if this is a duplicate; just let me know in the comments and I'll delete this question.
EDIT:
It should be considered that many users may access the databases at the same time.
I always learned that running a query with hundreds of OR conditions is bad for performance. However, even when running a sample here on 12c, querying a table with OR or IN against a primary key index doesn't seem to change the execution plan.
Therefore I say: it doesn't matter. The only things you could consider are readability, query length, etc.
Still, I personally prefer the WHERE ... IN form.
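As a rough sketch of what is being compared here (the id column and the literal values are placeholders), a long OR chain and the equivalent IN list look like this:
select * from some_table
where id = 101 or id = 102 or id = 103;  -- ...repeated for every value in the list
select * from some_table
where id in (101, 102, 103);             -- ...the same filter expressed with IN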
See this other useful question with sample data.
Process this all in the database with a single query. Batching similar operations is usually the best thing you can do for database performance.
The most expensive part of the query is reading the data from disk. Once the data is in memory, filtering out a few thousand conditions is a small amount of work. Your local processor is probably faster than the database server, but it doesn't matter, because your machine would spend too much time on unnecessary IO if you returned all the records.
Also, 5000 conditions in a SQL query is only a problem if you run that query a hundred times a second.
I think you should just try.
Create an example that is as simple as possible, yet complex enough to be realistic, and then run it with some form of benchmarking.
Whatever works best for you is what you should choose to do.
Edit:
That said - such a large number of ANDs and ORs in a single SQL statement does sound complicated and messy. Unless there is a real benefit from doing it this way(?), I would probably try to find a cleaner way to do it, for instance by splitting the operation into several steps and applying LINQ or something similar, as you suggest, even if it is just to make the solution more manageable.
The answer is: it depends.
How big is the data in the public DB? If you are querying something the size of Google, then fetching all the data is not an option.
It would be reasonable to assume that those public DBs have much stronger hardware and DB tuning than your home PC.
Is there a chance you will get blacklisted by those public DBs?
Does order matter? Is querying DB 1 and then DB 2 faster than querying DB 2 and then DB 1?
Mostly it's trial and error: whatever works best for you and is possible.
SQL Queries on ORACLE Using Multiple Boolean Operators
Comments: I have worked many years with CRYSTAL REPORTS, a database report designer. It was one of the first drag-and-drop, GUI-based tools which made it easier for developers without much database background to construct queries with multiple tables and filter conditions. The trade-off was that the tool was writing SQL under the hood; many times it was a serious performance hog, because the workstation running the report file had to suck down the entire contents of the database tables being queried, only to run the filtering process locally on the client system. That was more than a decade ago, but I see other next-gen tools that also auto-generate really awful SQL code.
No amount of software can compensate for lousy database design. You won't get everything right the first time (as others have noticed), but a little planning can give some breathing room when the product reveals under real-world use the demands of PERFORMANCE and SCALABILITY.
Demonstration Schema and Test Data
The following solution was designed on an ORACLE 11g Release 2 RDBMS system. The first table can be represented by a database VIEW, INLINE QUERY, SUB QUERY, MATERIALIZED VIEW or even a CURSOR output, so the "attributes" discussed in this example could be coming from multiple table sources and joining criteria.
CREATE TABLE "ZZ_DATA_ATTRIBUTES"
( "DATA_ID" NUMBER(10,0) NOT NULL ENABLE,
"NAME" VARCHAR2(50),
"AGE" NUMBER(5,0),
"HH_SIZE" NUMBER(5,0),
"SURVEY_SCORE" NUMBER(5,0),
"DMA_REGION" VARCHAR2(100),
"LAST_CONTACT" DATE,
CONSTRAINT "ZZ_DATA_ATTRIBUTES_PK" PRIMARY KEY ("DATA_ID") ENABLE
)
/
CREATE SEQUENCE "ZZ_DATA_ATTRIBUTES_SEQ" MINVALUE 1 MAXVALUE
9999999999999999999999999999 INCREMENT BY 1 START WITH 41 CACHE 20 NOORDER NOCYCLE
/
CREATE OR REPLACE TRIGGER "BI_ZZ_DATA_ATTRIBUTES"
before insert on "ZZ_DATA_ATTRIBUTES"
for each row
begin
if :NEW."DATA_ID" is null then
select "ZZ_DATA_ATTRIBUTES_SEQ".nextval into :NEW."DATA_ID" from sys.dual;
end if;
end;
/
ALTER TRIGGER "BI_ZZ_DATA_ATTRIBUTES" ENABLE
/
The SEQUENCE and TRIGGER objects are just for unique, auto-incremented values for the primary key on each table.
CREATE TABLE "ZZ_CONDITION_RESULTS"
( "RESULT_ID" NUMBER(10,0) NOT NULL ENABLE,
"DATA_ID" NUMBER(10,0) NOT NULL ENABLE,
"COND_ONE" NUMBER(10,0),
"COND_TWO" NUMBER(10,0),
"COND_THREE" NUMBER(10,0),
"COND_FOUR" NUMBER(10,0),
"COND_FIVE" NUMBER(10,0),
CONSTRAINT "ZZ_CONDITION_RESULTS_PK" PRIMARY KEY ("RESULT_ID") ENABLE
)
/
ALTER TABLE "ZZ_CONDITION_RESULTS" ADD CONSTRAINT "ZZ_CONDITION_RESULTS_FK"
FOREIGN KEY ("DATA_ID") REFERENCES "ZZ_DATA_ATTRIBUTES" ("DATA_ID") ENABLE
/
CREATE SEQUENCE "ZZ_CONDITION_RESULTS_SEQ" MINVALUE 1 MAXVALUE
9999999999999999999999999999 INCREMENT BY 1 START WITH 1 CACHE 20 NOORDER NOCYCLE
/
CREATE OR REPLACE TRIGGER "BI_ZZ_CONDITION_RESULTS"
before insert on "ZZ_CONDITION_RESULTS"
for each row
begin
if :NEW."RESULT_ID" is null then
select "ZZ_CONDITION_RESULTS_SEQ".nextval into :NEW."RESULT_ID" from sys.dual;
end if;
end;
/
ALTER TRIGGER "BI_ZZ_CONDITION_RESULTS" ENABLE
/
The table ZZ_CONDITION_RESULTS should be a TABLE type. It will contain the results of each individual boolean OR criterion. While thousands of columns may not be practically feasible, this initial approach shows how you can line up lots of boolean outputs and quickly identify and isolate the combinations and patterns of interest.
Sample Data
You can pick your own data values, but these were created to make the examples work. I chose the theme of MARKETING, where the data pulled together are different attributes our fictional company has gathered about its customers: customer name, age, hh_size (household size), the scoring results of some benchmarked survey, DMA (Demographic Marketing Area) region, and the date the customer was last contacted.
Defined Boolean Arguments Using an Oracle Package Structure
The initial design is to calculate the business logic through an Oracle PL/SQL Package Object. For example, in the OP:
select * from some_table
where .. and .. or .. or.. or..
or.. or.. or.. or.. or.. or.. (1000 times)
Each blank is a separate Oracle function call from within the package(s). The result is represented as a column value for each record of attributes that are evaluated.
create or replace package ZZ_PKG_MARKETING_DEMO as
c_result_true constant pls_integer:= 1;
c_result_false constant pls_integer:= 0;
cursor attrib_cur is
select data_id, name, age, hh_size, survey_score, dma_region,
last_contact
from zz_data_attributes;
TYPE attrib_record_type IS RECORD (
data_id zz_data_attributes.data_id%TYPE,
name zz_data_attributes.name%TYPE,
age zz_data_attributes.age%TYPE,
hh_size zz_data_attributes.hh_size%TYPE,
survey_score zz_data_attributes.survey_score%TYPE,
dma_region zz_data_attributes.dma_region%TYPE,
last_contact zz_data_attributes.last_contact%TYPE
);
function evaluate_cond_one (
p_attrib_rec attrib_record_type) return pls_integer;
function evaluate_cond_two (
p_attrib_rec attrib_record_type) return pls_integer;
function evaluate_cond_three (
p_attrib_rec attrib_record_type) return pls_integer;
function evaluate_cond_four (
p_attrib_rec attrib_record_type) return pls_integer;
function evaluate_cond_five (
p_attrib_rec attrib_record_type) return pls_integer;
procedure main_driver;
end;
create or replace package body "ZZ_PKG_MARKETING_DEMO" is
function evaluate_cond_one (
p_attrib_rec attrib_record_type) return pls_integer
as
begin
-- Checks if person is from a DMA Region in California.
IF p_attrib_rec.dma_region like 'CA%'
THEN return c_result_true;
ELSE return c_result_false;
END IF;
end EVALUATE_COND_ONE;
function evaluate_cond_two (
p_attrib_rec attrib_record_type) return pls_integer
as
c_begin_age_range constant zz_data_attributes.age%TYPE:= 20;
c_end_age_range constant zz_data_attributes.age%TYPE:= 35;
begin
-- Part 1 of 2 Checks if person belongs to the 20 to 35 years age bracket
IF p_attrib_rec.age between c_begin_age_range and c_end_age_range
THEN return c_result_true;
ELSE return c_result_false;
END IF;
end EVALUATE_COND_TWO;
function evaluate_cond_three (
p_attrib_rec attrib_record_type) return pls_integer
as
c_lowest_age constant zz_data_attributes.age%TYPE:= 45;
begin
-- Part 2 of 2 Checks if person is from age 45 and up demographic.
IF p_attrib_rec.age >= c_lowest_age
THEN return c_result_true;
ELSE return c_result_false;
END IF;
end EVALUATE_COND_THREE;
function evaluate_cond_four (
p_attrib_rec attrib_record_type) return pls_integer
as
c_cutoff_score CONSTANT zz_data_attributes.survey_score%TYPE:= 1200;
begin
-- Checks if person's survey score is higher than c_cutoff_score
IF p_attrib_rec.survey_score >= c_cutoff_score
THEN return c_result_true;
ELSE return c_result_false;
END IF;
end EVALUATE_COND_FOUR;
function evaluate_cond_five (
p_attrib_rec attrib_record_type) return pls_integer
as
c_last_contact_period CONSTANT pls_integer:= -750;
-- Note current date is anchored to a static value so the data output
-- in this example will still work regardless of how old this post
-- may get.
c_current_date CONSTANT zz_data_attributes.last_contact%TYPE:=
to_date('03/25/2014','MM/DD/YYYY');
begin
-- Checks if person's last contact date has been in the last 750
-- days.
IF p_attrib_rec.last_contact >=
(c_current_date + c_last_contact_period)
THEN return c_result_true;
ELSE return c_result_false;
END IF;
end EVALUATE_COND_FIVE;
procedure MAIN_DRIVER
as
v_rec_attr attrib_record_type;
v_rec_cond zz_condition_results%ROWTYPE;
begin
for i in attrib_cur
loop
-- Set the input record variable with the attribute values queried by the
-- current cursor.
v_rec_attr.data_id := i.data_id;
v_rec_attr.name := i.name;
v_rec_attr.age := i.age;
v_rec_attr.hh_size := i.hh_size;
v_rec_attr.survey_score := i.survey_score;
v_rec_attr.dma_region := i.dma_region;
v_rec_attr.last_contact := i.last_contact;
-- Set each condition column value equal to their matching package function.
v_rec_cond.cond_one := evaluate_cond_one(p_attrib_rec => v_rec_attr);
v_rec_cond.cond_two := evaluate_cond_two(p_attrib_rec => v_rec_attr);
v_rec_cond.cond_three:= evaluate_cond_three(p_attrib_rec => v_rec_attr);
v_rec_cond.cond_four := evaluate_cond_four(p_attrib_rec => v_rec_attr);
v_rec_cond.cond_five := evaluate_cond_five(p_attrib_rec => v_rec_attr);
INSERT INTO zz_condition_results (data_id, cond_one, cond_two,
cond_three, cond_four, cond_five)
VALUES
( v_rec_attr.data_id,
v_rec_cond.cond_one,
v_rec_cond.cond_two,
v_rec_cond.cond_three,
v_rec_cond.cond_four,
v_rec_cond.cond_five );
end loop;
COMMIT;
end MAIN_DRIVER;
end "ZZ_PKG_MARKETING_DEMO";
PL/SQL Notes: Some may not be familiar with CUSTOM DATA TYPES such as the RECORD VARIABLE TYPE defined within the package and used in procedure MAIN_DRIVER. They make the data being processed easier to handle and reference.
Boolean Arithmetic in Plain English (well, sort of)
The CURSOR Named ATTRIB_CUR can be modified to operate on a single record or a smaller input data set. For now, invoke the MAIN_DRIVER procedure to process all the records in the attributes data source (again, this doesn't have to be a single table).
BEGIN
ZZ_PKG_MARKETING_DEMO.MAIN_DRIVER;
END;
Now that each example condition has been evaluated for all the sample records, there are several simpler pathways to evaluating the boolean values, currently captured as values of "1" (for TRUE) and "0" (for FALSE).
If only one of this series of conditions need to be met (as in a long chain of OR operators), then the WHERE clause should look something like this:
WHERE COND_ONE = 1 OR COND_TWO = 1 OR COND_THREE = 1 OR COND_FOUR = 1 OR COND_FIVE = 1
A shorthand approach could be:
WHERE (COND_ONE + COND_TWO + COND_THREE + COND_FOUR + COND_FIVE) > 0
What does this buy? There are performance gains by processing an otherwise static evaluation (the custom conditions) at the time that the data record is populated. One good reason is that each subsequent query that asks about this criteria will not need to crunch through the business logic again. We also leverage an advantage through a decision value with a very, very, very low cardinality (TWO!)
The second "shorthand" example of the WHERE filter criteria is a clue about how the final approach will manage "thousands" of Boolean evaluations.
Scalability: How to Do This Several Thousand More Times in a Row
It would be impractical to assume this approach could scale up to the magnitude presented in the OP. The final question: How can this solution apply for an N thousand chain of boolean values?
Hint: PIVOT your results.
Expandable Table Design for Lots of Boolean Conditions
The pivoted design keeps one row per data record per condition, so the sample data maps onto a narrow layout of DATA_ID, condition, and RESULT values.
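A minimal DDL sketch of what such a pivot table could look like follows; DATA_ID and RESULT are taken from the aggregation query below, while COND_NAME is a hypothetical column added here to identify which condition each row represents:
CREATE TABLE "ZZ_CONDITION_PIVOT"
( "DATA_ID"   NUMBER(10,0) NOT NULL ENABLE,
  "COND_NAME" VARCHAR2(50) NOT NULL ENABLE,  -- e.g. 'COND_ONE', 'COND_TWO', ...
  "RESULT"    NUMBER(1,0)  NOT NULL ENABLE,  -- 1 = TRUE, 0 = FALSE
  CONSTRAINT "ZZ_CONDITION_PIVOT_PK" PRIMARY KEY ("DATA_ID", "COND_NAME") ENABLE
)
/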
The SQL needed to fetch a multiple OR relation between the five sample conditions can be accomplished through an aggregation query:
-- For multiple OR relations:
SELECT DATA_ID
FROM ZZ_CONDITION_PIVOT
GROUP BY DATA_ID
HAVING SUM(RESULT) > 0
Veterans will probably note this syntax can be further simplified with the use of database supported ANALYTICAL FUNCTIONS.
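For comparison, a sketch of the AND counterpart against the same hypothetical pivot table, requiring every condition for a DATA_ID to evaluate to 1:
-- For multiple AND relations:
SELECT DATA_ID
FROM ZZ_CONDITION_PIVOT
GROUP BY DATA_ID
HAVING MIN(RESULT) = 1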
This design should be low maintenance with any number of boolean conditions introduced during or after the implementation. The table designs should remain the same throughout.
Let me know your thoughts. It looks like the discussion has moved on to other issues and contributors, so this is probably long enough to get you started. Onward!
Related
Bulk Insert in PostgreSql using c# [duplicate]
I need to programmatically insert tens of millions of records into a Postgres database. Presently, I'm executing thousands of insert statements in a single query. Is there a better way to do this, some bulk insert statement I do not know about?
PostgreSQL has a guide on how to best populate a database initially, and they suggest using the COPY command for bulk loading rows. The guide has some other good tips on how to speed up the process, like removing indexes and foreign keys before loading the data (and adding them back afterwards).
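As a minimal sketch of the COPY form (reusing the films table from the multirow example below; the file path and HEADER option are assumptions):
COPY films (code, title, did, date_prod, kind)
FROM '/path/to/films.csv'
WITH (FORMAT csv, HEADER true);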
There is an alternative to using COPY, which is the multirow VALUES syntax that Postgres supports. From the documentation:
INSERT INTO films (code, title, did, date_prod, kind) VALUES
    ('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
    ('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');
The above code inserts two rows, but you can extend it arbitrarily, until you hit the maximum number of prepared statement tokens (it might be $999, but I'm not 100% sure about that). Sometimes one cannot use COPY, and this is a worthy replacement for those situations.
One way to speed things up is to explicitly perform multiple inserts or copy's within a transaction (say 1000). Postgres's default behavior is to commit after each statement, so by batching the commits, you can avoid some overhead. As the guide in Daniel's answer says, you may have to disable autocommit for this to work. Also note the comment at the bottom that suggests increasing the size of the wal_buffers to 16 MB may also help.
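A minimal sketch of that batching idea in plain SQL, with placeholder table and column names; each transaction groups many statements so the commit overhead is paid once per batch:
BEGIN;
INSERT INTO tablename (fieldname1, fieldname2) VALUES (1, 'a');
INSERT INTO tablename (fieldname1, fieldname2) VALUES (2, 'b');
-- ...roughly 1000 statements per batch...
COMMIT;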
UNNEST with arrays can be used along with the multirow VALUES syntax. I think this method is slower than using COPY, but it is useful to me when working with psycopg and Python (a Python list passed to cursor.execute becomes a pg ARRAY):
INSERT INTO tablename (fieldname1, fieldname2, fieldname3)
VALUES (
    UNNEST(ARRAY[1, 2, 3]),
    UNNEST(ARRAY[100, 200, 300]),
    UNNEST(ARRAY['a', 'b', 'c'])
);
Without VALUES, using a subselect with an additional existence check:
INSERT INTO tablename (fieldname1, fieldname2, fieldname3)
SELECT * FROM (
    SELECT UNNEST(ARRAY[1, 2, 3]),
           UNNEST(ARRAY[100, 200, 300]),
           UNNEST(ARRAY['a', 'b', 'c'])
) AS temptable
WHERE NOT EXISTS (
    SELECT 1
    FROM tablename tt
    WHERE tt.fieldname1 = temptable.fieldname1
);
The same syntax works for bulk updates:
UPDATE tablename
SET fieldname1 = temptable.data
FROM (
    SELECT UNNEST(ARRAY[1, 2]) AS id,
           UNNEST(ARRAY['a', 'b']) AS data
) AS temptable
WHERE tablename.id = temptable.id;
(This is a wiki: you can edit and enhance the answer!)
The external file is the best and typical bulk data. The term "bulk data" is related to "a lot of data", so it is natural to use the original raw data, with no need to transform it into SQL. Typical raw data files for "bulk insert" are CSV and JSON formats.
Bulk insert with some transformation. In ETL applications and ingestion processes, we need to change the data before inserting it. A temporary table consumes (a lot of) disk space, and it is not the faster way to do it. The PostgreSQL foreign-data wrapper (FDW) is the best choice.
CSV example. Suppose the tablename (x, y, z) on SQL and a CSV file like:
fieldname1,fieldname2,fieldname3
etc,etc,etc
... million lines ...
You can use the classic SQL COPY to load the original raw data into tmp_tablename, then insert the filtered data into tablename... But, to avoid disk consumption, the best is to ingest directly by:
INSERT INTO tablename (x, y, z)
SELECT f1(fieldname1), f2(fieldname2), f3(fieldname3)  -- the transforms
FROM tmp_tablename_fdw
-- WHERE conditions
;
You need to prepare the database for FDW, and instead of a static tmp_tablename_fdw you can use a function that generates it:
CREATE EXTENSION file_fdw;
CREATE SERVER import FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE tmp_tablename_fdw( ... )
  SERVER import
  OPTIONS ( filename '/tmp/pg_io/file.csv', format 'csv');
JSON example. A set of two files, myRawData1.json and Ranger_Policies2.json, can be ingested by:
INSERT INTO tablename (fname, metadata, content)
SELECT fname, meta, j  -- do any data transformation here
FROM jsonb_read_files('myRawData%.json')
-- WHERE any_condition_here
;
where the function jsonb_read_files() reads all files in a folder, defined by a mask:
CREATE or replace FUNCTION jsonb_read_files(
  p_flike text, p_fpath text DEFAULT '/tmp/pg_io/'
) RETURNS TABLE (fid int, fname text, fmeta jsonb, j jsonb) AS $f$
  WITH t AS (
    SELECT (row_number() OVER ())::int id, f AS fname, p_fpath ||'/'|| f AS f
    FROM pg_ls_dir(p_fpath) t(f)
    WHERE f LIKE p_flike
  )
  SELECT id, fname,
         to_jsonb( pg_stat_file(f) ) || jsonb_build_object('fpath', p_fpath),
         pg_read_file(f)::jsonb
  FROM t
$f$ LANGUAGE SQL IMMUTABLE;
Lack of gzip streaming. The most frequent method for "file ingestion" (mainly in Big Data) is preserving the original file in gzip format and transferring it with a streaming algorithm, anything that can run fast and without disk consumption in unix pipes:
gunzip remote_or_local_file.csv.gz | convert_to_sql | psql
So the ideal (future) solution would be a server-side option for the .csv.gz format. Note, after @CharlieClark's comment: currently (2022) there is nothing built in; the best alternative seems to be pgloader with STDIN:
gunzip -c file.csv.gz | pgloader --type csv ... - pgsql:///target?foo
You can use COPY table TO ... WITH BINARY which is "somewhat faster than the text and CSV formats." Only do this if you have millions of rows to insert, and if you are comfortable with binary data. Here is an example recipe in Python, using psycopg2 with binary input.
It mostly depends on the (other) activity in the database. Operations like this effectively freeze the entire database for other sessions. Another consideration is the data model and the presence of constraints, triggers, etc.
My first approach is always: create a (temp) table with a structure similar to the target table (create table tmp AS select * from target where 1=0), and start by reading the file into the temp table. Then I check what can be checked: duplicates, keys that already exist in the target, etc. Then I just do insert into target select * from tmp, or similar. If this fails, or takes too long, I abort it and consider other methods (temporarily dropping indexes/constraints, etc.).
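A sketch of that staging-table workflow, with placeholder names and assuming the file is a CSV:
CREATE TABLE tmp AS SELECT * FROM target WHERE 1 = 0;  -- empty copy of the target structure
COPY tmp FROM '/path/to/input.csv' WITH (FORMAT csv);  -- read the file into the temp table
-- check what can be checked here: duplicates, keys that already exist in the target, etc.
INSERT INTO target SELECT * FROM tmp;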
I just encountered this issue and would recommend csvsql (releases) for bulk imports to Postgres. To perform a bulk insert you'd simply createdb and then use csvsql, which connects to your database and creates individual tables for an entire folder of CSVs.
$ createdb test
$ csvsql --db postgresql:///test --insert examples/*.csv
I implemented a very fast PostgreSQL data loader using native libpq methods. Try my package: https://www.nuget.org/packages/NpgsqlBulkCopy/
Maybe I'm late already, but there is a Java library called pgbulkinsert by Bytefish. My team and I were able to bulk insert 1 million records in 15 seconds. Of course, there were some other operations that we performed, like reading 1M+ records from a file sitting on Minio, doing a couple of processing steps on top of the 1M+ records, filtering out duplicate records, and then finally inserting the 1M records into the Postgres database. And all these processes were completed within 15 seconds. I don't remember exactly how much time the DB operation took, but I think it was less than 5 seconds. Find more details at https://www.bytefish.de/blog/pgbulkinsert_bulkprocessor.html
As others have noted, when importing data into Postgres, things will be slowed by the checks that Postgres is designed to do for you. Also, you often need to manipulate the data in one way or another so that it's suitable for use. Anything that can be done outside of the Postgres process means you can import using the COPY protocol. For my use I regularly import data from the httparchive.org project using pgloader. As the source files are created by MySQL, you need to be able to handle some MySQL oddities such as the use of \N for an empty value, along with encoding problems. The files are also so large that, at least on my machine, using FDW runs out of memory. pgloader makes it easy to create a pipeline that lets you select the fields you want, cast them to the relevant data types, and do any additional work before the data goes into your main database, so that index updates, etc. are minimal.
The query below creates a test table with a generate_series column that has 10,000 rows. (I usually create such a test table to check query performance; see generate_series()):
CREATE TABLE test AS SELECT generate_series(1, 10000);

postgres=# SELECT count(*) FROM test;
 count
-------
 10000
(1 row)

postgres=# SELECT * FROM test;
 generate_series
-----------------
               1
               2
               3
               4
               5
               6
--More--
And run the query below to insert 10,000 rows if you already have the test table:
INSERT INTO test (generate_series) SELECT generate_series(1, 10000);
Oracle SQL insert into table a user ID and a value of a list of strings
I would like to do the following, but I lack the knowledge of Oracle SQL to do it. I wish to give a user ID as a string, and then a list of strings as a parameter, to a procedure. In the procedure I wish to insert a row into a table for each value in that list, together with the same user ID each time. My questions:
I can't find an example of what to declare the input parameter as to make it a list. Do I need to make it a long VARCHAR? I roughly know what the max length could be if the list is filled out entirely, but I was wondering if there is a data type for a sort of list.
How do I loop through the list? I see a lot of examples like this:
FOR r IN ('The', 'Quick', 'brown', 'fox')
LOOP
  -- Do stuff
END LOOP;
So does this mean I should provide my list as a single string with comma-separated values? I was wondering if there are other ways to do this.
Here's a (PL/)SQL option you might want to consider. For testing purposes, I've created a TEST table which will contain the ID - VALUE pairs. The procedure accepts two parameters, both strings: ID will be common for all values, and VALUE is a comma-separated values list. Although you can pass a collection, I'd suggest you use VARCHAR2 as it is quite simple to maintain. The SELECT within the procedure uses a hierarchical query with regular expressions; its purpose is to split that comma-separated values string into rows so that you can insert each value into its own row. Doing so, you don't even need a loop. Besides, that SELECT would work even if you ran it standalone, but - you want a procedure. OK, here it goes:
SQL> create table test (id varchar2(10), value varchar2(20));

Table created.

SQL> create or replace procedure p_ins (par_id in varchar2,
  2                                     par_value in varchar2)
  3  is
  4  begin
  5    insert into test (id, value)
  6    select par_id,
  7           trim(regexp_substr(par_value, '[^,]+', 1, level))
  8    from dual
  9    connect by level <= regexp_count(par_value, ',') + 1;
 10  end;
 11  /

Procedure created.

Testing:
SQL> begin
  2    p_ins('A', 'The, quick, brown fox, runs, or, whatever, it does');
  3  end;
  4  /

PL/SQL procedure successfully completed.

SQL> select * from test;

ID         VALUE
---------- --------------------
A          The
A          quick
A          brown fox
A          runs
A          or
A          whatever
A          it does

7 rows selected.
To execute the same insert N times with N different strings:
string[] things = new[] { "foo", "bar", "baz" };

SomeSqlCommand sql = new SomeSqlCommand(
    "INSERT INTO table(a, b) VALUES(@a, @b)",
    "some connection string");
sql.Parameters.AddWithValue("@a", "fixed value");
sql.Parameters.AddWithValue("@b", "dummy value - will change in loop");
sql.Connection.Open();
foreach (string thing in things)
{
    // Only the second parameter changes on each iteration.
    sql.Parameters["@b"].Value = thing;
    sql.ExecuteNonQuery();
}
sql.Connection.Close();
Omitted using etc. for clarity; the basic premise: set up a parameterised SQL command, set the parameter values, execute, change the values, execute again... When I used Oracle, parameter names were preceded by a colon; no idea if that's still true. Treat this as pseudocode (it's more like SQL Server syntax) and merge the concept into your existing Oracle style.
Avoid duplicates in SQL Server due to latency
I have a POS-like system in C#, and for a long time it did not present any problems (it was just one POS). But these days there are 4 POS terminals using the system, connected to the same database, and all the sales from each POS go to the same Audit table where all of the other sales go. The procedure in this system is:
1. A function gets the last ticket number (with a simple SELECT).
2. Add 1 to that number (the next ticket number).
3. Generate an ID code by injecting this ticket number (together with the terminal, date, and employee code) into the algorithm.
4. Insert a record of the sale into the database with all the necessary information (date, client, employee, ID code, etc.) (with a simple INSERT INTO).
But with 4 POS terminals, I realize that some sales end up with the same ticket number. Fortunately the ticket ID codes are not the same because the terminal and the employee are different, but how can I avoid this?
Edit 1: Every POS system has a dual function: in one mode the POS sales are centralized, and every POS in this mode generates consecutive tickets (as if they all were one POS); in the other mode every POS has its own ticket numbering. For that reason I can't use an identity column.
Just use a sequence to generate the next ticket number.
CREATE SEQUENCE Tickets START WITH 1 INCREMENT BY 1;
Then each POS just does:
SELECT NEXT VALUE FOR Tickets;
The sequence is guaranteed to never return the same number twice.
As has been mentioned, if the TicketNumber is sequential and unique, it sounds like an IDENTITY field would be the way to go. BUT, if for some reason something prevents that, or if that would require too many changes at this time, you could constrain the process itself to be single-threaded by creating a lock around the ID code generation process through the use of Application Locks (see sp_getapplock and sp_releaseapplock).
Application Locks let you create locks around arbitrary concepts. Meaning, you can define the @Resource as "generate_id_code", which will force each caller to wait their turn. It would follow this structure:
BEGIN TRANSACTION;

EXEC sp_getapplock @Resource = 'generate_id_code', @LockMode = 'Exclusive';

-- ...current 4 steps to generate the ID Code...

EXEC sp_releaseapplock @Resource = 'generate_id_code';

COMMIT TRANSACTION;
You need to manage errors / ROLLBACK yourself (as stated in the linked MSDN documentation), so put in the usual TRY / CATCH. But this does allow you to manage the situation.
Please note: sp_getapplock / sp_releaseapplock should be used sparingly; Application Locks can definitely be very handy (such as in cases like this one) but they should only be used when absolutely necessary.
You need to do this as an atomic action, so you can wrap everything in a transaction and lock the table. See here for a good discussion on locking, etc. Locking will slow everything else down, since everything will start waiting for the table to free up before it can complete, and that may not be something you can live with.
Or you should use an identity column, which will be managed by the database and maintain unique, incrementing numbers.
You could also make your primary key (hope you have one) a combination of a few things, and then keep a running number for each POS endpoint to see more data about how they are performing. But that gets more into analytics, which isn't in scope here.
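For reference, a minimal sketch of the identity-column idea (the table and columns are hypothetical; note the question's edit explains why a single identity may not fit the dual-mode numbering):
CREATE TABLE Audit
(
    TicketNumber INT IDENTITY(1,1) PRIMARY KEY,  -- database-managed, unique, incrementing
    Terminal     VARCHAR(20) NOT NULL,
    EmployeeCode VARCHAR(20) NOT NULL
    -- ...other columns...
);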
I would strongly suggest moving away from the current approach if you can, and changing to a GUID PK. However, I realize that in some cases a redesign is not possible (we have the exact same scenario that you describe in a legacy database). In this case, you can get the maximum value safely using the UPDLOCK table hint in combination with the insert command, and use the OUTPUT INSERTED functionality to retrieve the new primary key value into a local variable if needed:
DECLARE @PK Table (PK INT NOT NULL)

INSERT INTO Audit
(
    TicketNumber,
    Terminal,
    Date,
    EmployeeCode,
    Client,
    IDCode,
    ... other fields
)
/* Record the new PK in the table variable */
OUTPUT INSERTED.TicketNumber INTO @PK
SELECT IsNull(MAX(TicketNumber), 0) + 1,
       @Terminal,
       @Date,
       @EmployeeCode,
       @Client,
       @IDCode,
       ... other values
FROM Audit WITH (UPDLOCK)

DECLARE @TicketNumber INT

/* Move the new PK from the table variable into a local variable for subsequent use */
SELECT @TicketNumber = PK
FROM @PK
Oracle: Fastest way to UPSERT and return the last affected Row ID in oracle for large data sets
The table schema in question is here: Oracle SQL: Selecting a single row with the latest date between multiple columns
I'm working with a table that has over 5 million entries. What is the fastest and most accurate way to upsert to this table AND return the last upserted row id using a stored procedure?
Most of what I've read recommends using the MERGE statement for upserts. However, MERGE doesn't support RETURNING INTO. In our table, we have the CREATE_DATE, CREATE_USER, UPDATE_DATE, and UPDATE_USER fields that are updated as expected. My thought was to create a stored procedure that returned the id of the row that has the latest date between those two columns and where the respective user data was equal to the current user data. This is what the people who answered the referring question helped me with (thanks!).
However, I'm concerned about the combined execution time vs other methods, as well as the huge gaps created in sequences due to merging. Calling a separate statement simply to get the id also seems a bit inefficient. However, almost everything I've read says that MERGE is much faster than the pre-merge upsert statements. Note that these are being called through a C#/ASP web application. Any help is appreciated :)
edit: Below is an example of the stored procedure I'm using for the upsert. Note that the CREATE_DATE and UPDATE_DATE columns are updated with triggers.
create or replace PROCEDURE P_SAVE_EXAMPLE_TABLE_ROW
(
    pID                IN OUT EXAMPLE_TABLE.ID%type,
    --Other row params here
    pUSER              IN     EXAMPLE_TABLE.CREATE_USER%type,
    pPLSQLErrorNumber  OUT    NUMBER,
    pPLSQLErrorMessage OUT    VARCHAR2
)
AS
BEGIN
    MERGE INTO USERS_WORKGROUPS_XREF
    USING dual
    ON (ID=pID)
    WHEN NOT MATCHED THEN
        INSERT (--OTHER COLS--, CREATE_USER)
        VALUES (--OTHER COLS--, pUSER)
    WHEN MATCHED THEN
        UPDATE SET
            --OTHER COLS--
            UPDATE_USER=pUSER
        WHERE ID=pID;

EXCEPTION
    WHEN OTHERS THEN
        pID := 0;
        pPLSQLErrorNumber := SQLCODE;
        pPLSQLErrorMessage := SUBSTR(SQLERRM, 1, 256);
        RETURN;

    -- STATEMENT TO RETURN LAST AFFECTED ID INTO pID GOES HERE

END;
If you're trying to return the maximum value of a sequence-generated PK on the table then I'd just run a "Select max(id) .." directly afterwards. If other sessions are also modifying the table then maybe reading the currval of the sequence would be better.
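A minimal sketch of both options, assuming a hypothetical sequence named EXAMPLE_TABLE_SEQ populates the ID column:
-- Maximum key directly after the MERGE:
SELECT MAX(ID) FROM EXAMPLE_TABLE;

-- Or, in the same session that performed the insert:
SELECT EXAMPLE_TABLE_SEQ.CURRVAL FROM DUAL;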
Performance: Returning multiple tables from Sql Server. C# .Net
I have a form containing 10 drop-down lists. We fetch the data for these lists by making 10 calls to the database when the form loads. I want to know the performance impact on the application as well as on SQL Server in the following 2 cases, and which approach is best:
1. Fetch the data for each of these drop-down lists with 10 separate requests.
2. Create a stored procedure which fetches the 10 tables and returns them to the UI in a single data reader (one hit) to create the entities.
Please share your views...
It's good if you fetch the data in one go, i.e., by calling the procedure once and getting the data for all ten drop-downs. But it also depends on the number of records you have and the time it takes to process each record that you are going to bind to each drop-down box.
Option 1. It is easy to maintain:
1. 10 requests don't cost very much.
2. Assume some day you want to query only five of them; you can easily combine the data parts. If you put them into one stored procedure, things will be difficult when the business logic changes.
You can return multiple tables from a SQL Server stored procedure. Create a stored procedure with multiple SELECT queries; for example, if your SP has 10 SELECT queries, it will return ten result sets (tables).
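A minimal sketch of such a stored procedure, with placeholder table names, returning one result set per drop-down list:
CREATE PROCEDURE dbo.GetDropDownData
AS
BEGIN
    SELECT * FROM Countries;  -- result set 1
    SELECT * FROM Cities;     -- result set 2
    -- ...one SELECT per drop-down list, ten in total...
END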
A few months back we had the same situation, and we went with option 2 in a way that we had 5 data tables being returned from different SPs, so we made one SP with 5 output parameters. In those parameters we send as input whether a specific data table is required or not, and later the SP returns the index at which that specific data table is returned.
CREATE PROCEDURE [dbo].[MySP]
    @pTable1 smallint OUTPUT,
    @pTable2 smallint OUTPUT
AS
    DECLARE @iLocation smallint = 0;
BEGIN
    IF @pTable1 = 1
    BEGIN
        SELECT * FROM TABLE1;
        SET @pTable1 = @iLocation;
        SET @iLocation = @iLocation + 1;
    END
END
-- ..... AND SO ON
I hope it will give you a better idea.