I have a DataSet in memory which has two DataTables in it:
Cont
Ins
Secondly, I have a separate DataTable (in memory) whose rows have the following format.
Suppose it has two rows; in reality it may have millions of rows.
ID  TableName  ColumnName  Operator  Value
1   Cont       ID          =         1
2   Ins        app_ID      =         558877
As-is: Currently I take the information from those rows, construct a query, and execute it on the database server, like:
Select count(*) from Cont where Cont.Id = 1 and Ins.app_id = 558877;
On the basis of this query's result I have implemented my business logic.
Objective: In order to increase performance I want to execute the query on the application server, since I now have the complete tables in memory. How can I do that?
Note: The table names may vary according to the data.
Do you really want to keep tables with millions of rows in memory?
In terms of counting an in-memory table, how is it kept? If it's a DataTable, as per your tag, you can use the DataTable.Rows.Count property.
If your table names aren't known, you can loop over the tables in DataSet.Tables and call Rows.Count on each of them.
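For example, a rough, untested sketch of evaluating the rule rows against the in-memory tables (it assumes the rules can be expressed as DataColumn filter expressions; the method name RulesMatch and the variable names are made up):

using System.Data;

// Rough sketch: returns true when every rule row matches at least one row
// in the corresponding in-memory DataTable.
static bool RulesMatch(DataSet ds, DataTable rules)
{
    foreach (DataRow rule in rules.Rows)
    {
        string tableName  = rule["TableName"].ToString();
        string columnName = rule["ColumnName"].ToString();
        string op         = rule["Operator"].ToString();
        string value      = rule["value"].ToString();

        DataTable table = ds.Tables[tableName];
        if (table == null)
            return false;

        // DataTable.Select takes a filter expression such as "ID = 1".
        // The sample rules are numeric; string values would need quoting.
        string filter = string.Format("{0} {1} {2}", columnName, op, value);
        if (table.Select(filter).Length == 0)
            return false;
    }
    return true;
}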
You have to create an index on the app_ID field. After that your count will run with good performance.
Your query references the undefined alias "Ins". You must specify another table in the FROM/JOIN clause and change "count(*)" to "count(Cont.*)".
I have a .Net DataTable that contains records, all of which are "added" records. The corresponding table in the database may contain millions of rows. If I attempt to simply call the "Update" method on my SqlDataAdapter, any existing records cause an exception to be raised due to a violation of the primary key constraint. I considered loading all of the physical table's records into a second DataTable instance, merging the two, and then calling the Update method on the second DataTable. This actually works exactly like I want. However, my concern is that if there are 30 billion records in the physical table, loading all of that data into a DataTable in memory could be an issue.
I considered selecting a sub-set of data from the physical table and proceeding as described above, but the construction of the sub-query has proved to be very involved and very tedious. You see, I am not working with a single known table. I am working with a DataSet that contains several hundred DataTables. Each of the DataTables maps to its own physical table. The name and schema of the tables are not known at compile time. This has to all be done at run time.
I have played with the SqlBulkCopy class but have the same issue - duplicate records raise an exception.
I don't want to have to dynamically construct queries for each table at run time. If that is the only way, so be it, but I just can't help but think that there must be a simpler solution using what Ado.Net provides.
You could create your InsertCommand like this:
declare @pk int = 1
declare @txt nvarchar(100) = 'nothing'
insert into #temp (id, txt)
select distinct @pk, @txt
where not exists (select id from #temp x where x.id = @pk)
assuming that your table #temp (temporary table used for this example) is created like this (with primary key on id)
create table #temp (id int not null primary key, txt nvarchar(100))
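As a rough, untested sketch of how that conditional insert could be wired into a SqlDataAdapter (MyTable, the column names, connection, and myDataTable are placeholders):

using System.Data;
using System.Data.SqlClient;

var adapter = new SqlDataAdapter();
var insert = new SqlCommand(
    @"insert into MyTable (id, txt)
      select @id, @txt
      where not exists (select 1 from MyTable x where x.id = @id)",
    connection);                          // an open SqlConnection

// Bind the parameters to the source columns of the DataTable.
insert.Parameters.Add("@id", SqlDbType.Int, 0, "id");
insert.Parameters.Add("@txt", SqlDbType.NVarChar, 100, "txt");

adapter.InsertCommand = insert;

// If Update complains that the InsertCommand affected 0 rows for duplicates,
// ContinueUpdateOnError or the RowUpdated event can be used to skip them.
adapter.Update(myDataTable);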
I am creating an application that takes data from a text file containing sales data from the Amazon marketplace. The marketplace has items with different names compared to the data in our main database. The application accepts the text file as input and needs to check whether each item exists in our database. If it is not present, I should offer the option to save the item to a Master table, or to a Sub item table and map it to a master item. My question: if the text file has 100+ items, should I hit the database each time to check if the data exists there? Is there any better way of doing this so that we can minimize the database hits?
I have two options that I have used earlier:
Hit the database and check if it exists in the table.
Fill the data into a DataTable and use DataTable.Select to check if it exists.
Can someone tell me the best way to do this? I have to check two tables (master table, subItem table), maybe one at a time. Thanks.
Update:
@Downvoters, add a comment.
I am not asking how to check whether an item exists in the database; I just want to know the best way of doing that. Should I be hitting the database 1000 times if a file has 1000 items? That's my question.
The current query I use:
if exists (select * from [table] where itemname= [itemname] )
select 'True'
else
select 'False'
return
(From Chat)
I would create a stored procedure which takes a table-valued parameter of all the items that you want to check. You can then use a join (a couple of options here)* to return a result set of items and whether or not each one exists. You can use TVPs from ADO as sketched below.
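A rough, untested sketch of the ADO.NET side (the proc name dbo.CheckItems, the table type dbo.ItemNameList, the result columns, connection, and itemNamesFromFile are all placeholders):

using System.Data;
using System.Data.SqlClient;

// Build a DataTable whose schema matches the user-defined table type.
var tvp = new DataTable();
tvp.Columns.Add("ItemName", typeof(string));
foreach (string name in itemNamesFromFile)      // the 100-1000 items from the text file
    tvp.Rows.Add(name);

using (var cmd = new SqlCommand("dbo.CheckItems", connection))
{
    cmd.CommandType = CommandType.StoredProcedure;

    var p = cmd.Parameters.AddWithValue("@Items", tvp);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.ItemNameList";            // the user-defined table type

    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            string itemName = reader.GetString(0);
            bool exists = reader.GetBoolean(1); // assumes the proc returns a bit flag
            // decide between master table and sub item table here
        }
    }
}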
It will certainly handle the 100 to 1000 row range mentioned in your post. To be honest, I haven't used it in the 1M+ range.
In newer versions of SQL Server, I would prefer TVPs over using an XML input parameter, as it is really quite cumbersome to pack the XML in your .NET code and then unpack it again in your SPROC.
(*) Re joins: with the result set, you can either just inner join the TVP to your items/product table and check in .NET if a row doesn't exist, or you can do a left outer join with the TVP as the left table and, e.g., ISNULL() missing items to 0 / 'false', etc.
Make it a batch of 100 items to the database. A stored procedure might help, since repetitive queries have to be fired. If the data is not changed frequently, you can consider caching. I assume you will be making service calls from your .NET application, so ingest an XML from the back end, in batches. Consider increasing the batch size based on the file size.
If your entire application is local, the batch size may be very high, as there is no network overhead, but still don't make 100 calls to the database.
Try it like this:
SELECT EXISTS(SELECT * FROM table1 WHERE itemname= [itemname])
SELECT EXISTS(SELECT 1 FROM table1 WHERE itemname= [itemname])
We are refactoring a project from plain MySQL queries to the usage of NHibernate.
In the MySQL connector there is the ExecuteNonQuery function that returns the rows affected. So
int RowsDeleted = ExecuteNonQuery("DELETE FROM `table` WHERE ...");
would show me how many rows were actually deleted.
How can I achieve the same with NHibernate? So far I can see it is not possible with Session.Delete(query).
My current workaround is first loading all of the objects that are about to be deleted and deleting them one by one, incrementing a counter on each delete. But that will cost performance, I assume.
If you don't mind that nHibernate will create delete statements for each row and maybe additional statements for orphans and/or other relationships, you can use session.Delete.
For better performance I would recommend doing batch deletes (see the example below).
session.Delete
If you delete many objects with session.Delete, nHibernate makes sure that integrity is preserved; it will load everything into the session if needed anyway. So there is no real reason to count your objects or have a method that retrieves the number of objects that have been deleted, because you would simply run a query before the delete to determine the number of objects that will be affected...
The following statement will delete all entities of type Post by Id.
The select statement will query the database only for the Ids, so it is actually very performant...
var idList = session.Query<Post>().Select(p => p.Id).ToList<int>();
session.Delete(string.Format("from Post where Id in ({0})", string.Join(",", idList.ToArray())));
The number of objects deleted will be equal to the number of Ids in the list...
This is actually the same (in terms of queries nHibernate will fire against your database) as if you would query<T> and loop over the result and delete all of them one by one...
Batch delete
You can use session.CreateSqlQuery to run native SQL commands. It also allows you to have input and output parameters.
The following statement would simply delete everything from the table, as you would expect:
session.CreateSQLQuery(@"Delete from MyTableName").ExecuteUpdate();
To retrieve the number of rows deleted, we'll use the normal T-SQL @@ROWCOUNT variable and output it via select. To retrieve the selected row count, we have to add an output parameter to the created query via AddScalar, and UniqueResult simply returns the integer:
var rowsAffected = session.CreateSQLQuery(@"
Delete from MyTableName;
Select @@ROWCOUNT as NumberOfRows")
.AddScalar("NumberOfRows", NHibernateUtil.Int32)
.UniqueResult();
To pass input variables you can do this with .SetParameter(<name>, <value>):
var rowsAffected = session.CreateSQLQuery(@"
DELETE from MyTableName where ColumnName = :val;
select @@ROWCOUNT NumberOfRows;")
.AddScalar("NumberOfRows", NHibernateUtil.Int32)
.SetParameter("val", 1)
.UniqueResult();
I'm not so comfortable with MySQL; the example I wrote is for MSSQL. I think the MySQL equivalent of @@ROWCOUNT would be SELECT ROW_COUNT()?
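If so, the MySQL variant would presumably look like this (untested sketch; it also assumes the connection allows multiple statements in one batch):

var rowsAffected = session.CreateSQLQuery(@"
DELETE FROM MyTableName WHERE ColumnName = :val;
SELECT ROW_COUNT() AS NumberOfRows;")
.AddScalar("NumberOfRows", NHibernateUtil.Int32)
.SetParameter("val", 1)
.UniqueResult();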
In a C# program I have an array with about 100,000 elements.
Then I have a SQL Server 2008 table where the primary key column contains more or less all elements of the array (but a few are missing). The table can have up to 30,000,000 rows.
Now I want to determine which elements of the array do not exist in the table. How can this be achieved efficiently?
The most efficient method would probably be to bulk-insert those 100,000 elements into a temp table and then perform the comparison within the database itself.
(Note that I haven't tested this theory; it's just an educated guess.)
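An untested sketch of that approach (BigTable, PrimaryKeyColumn, myArray, connection, and the temp table name are placeholders; the temp table lives as long as the open connection):

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Load the ~100,000 array elements into a single-column DataTable.
var candidates = new DataTable();
candidates.Columns.Add("Id", typeof(int));
foreach (int id in myArray)
    candidates.Rows.Add(id);

// Create a temp table and bulk-insert the candidates into it.
using (var create = new SqlCommand("create table #Candidates (Id int primary key)", connection))
    create.ExecuteNonQuery();

using (var bulk = new SqlBulkCopy(connection))
{
    bulk.DestinationTableName = "#Candidates";
    bulk.WriteToServer(candidates);
}

// Everything in the temp table that is not in the big table.
var missing = new List<int>();
using (var cmd = new SqlCommand(
    "select Id from #Candidates except select PrimaryKeyColumn from BigTable", connection))
using (var reader = cmd.ExecuteReader())
{
    while (reader.Read())
        missing.Add(reader.GetInt32(0));
}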
Query the table with a
select <primarykey> from <table> where <primarykey> in (<primary keys of your list of elements in C#>)
This should be faster than inserting all rows into a table and then checking with an except/minus command for missing elements, because it does not involve any write operation.
Once you have the list of primary keys which are common, pull it back into C# and compare.
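A rough sketch of that comparison on the C# side (assumes an int key; BigTable, PrimaryKeyColumn, myArray, and connection are placeholders, and with 100,000 values you would realistically have to chunk the IN list):

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

// Build the IN list from the array (chunk it in a real implementation).
string inList = string.Join(",", myArray);

var found = new HashSet<int>();
using (var cmd = new SqlCommand(
    "select PrimaryKeyColumn from BigTable where PrimaryKeyColumn in (" + inList + ")",
    connection))
using (var reader = cmd.ExecuteReader())
{
    while (reader.Read())
        found.Add(reader.GetInt32(0));
}

// The elements of the array that do not exist in the table.
var missing = myArray.Where(id => !found.Contains(id)).ToList();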
A way to avoid creating temp tables would be to use a stored procedure which accepts a table valued parameter of a user-defined table type (udtt). This table would have a schema of one column of a data type matching that in your array.
If you populate a DataTable (with a schema matching the udtt schema) with your array values and supply the data table as your stored proc's parameter, you can pass up all 100,000 of your items in their sql binary format. The proc can just do a join between the 30M row table and the table-valued parameter, returning the items in the TVP table with no matches in the master table.
This avoids needing to build massive IN statements.
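The SQL side might look something like this (a rough sketch; the type, procedure, and table names are placeholders):

create type dbo.IdList as table (Id int primary key);
go

create procedure dbo.GetMissingIds
    @Ids dbo.IdList readonly
as
begin
    -- Return the items in the TVP that have no match in the 30M-row table.
    select t.Id
    from @Ids t
    left join dbo.BigTable b on b.PrimaryKeyColumn = t.Id
    where b.PrimaryKeyColumn is null;
end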
EDIT: Regarding the comment from @Kyro below
I'm now less confident in this approach. I found an article showing the under-the-covers row-by-row inserts that Kyro describes. What you might gain by sending binary data over the network rather than a large T-SQL WHERE IN() statement may well be taken away on the SQL side. However, it's a fairly simple code approach, so it might just be worth a quick test. Let us know how you get on.
I am confused about choosing between two approaches.
Scenario
There are two tables, Table 1 and Table 2. Table 1 contains users' data, for example first name, last name, etc.
Table 2 contains the cars each user has, with their descriptions, i.e. color, registration no., etc.
Now, if I want all the information for all users, which approach is best to complete it in minimum time?
Approach 1.
Query for all rows in Table 1 and store them all in a list, for example;
then loop through the list and query Table 2 to get the data for each user saved in the first step.
Approach 2
Query for all rows and, while saving each row, get all of its values from Table 2 and save them too.
In terms of processing I think it might be the same, because the same number of records has to be processed in both approaches.
If there is any other, better idea, please let me know.
Your two approaches will have about the same performance (slow because of N+1 queries). It would be faster to do a single query like this:
select *
from T1
left join T2 on ...
order by T1.PrimaryKey
Your client app can then interpret the results and have all the data from a single query. An alternative would be:
select *, 1 as Tag
from T1
union all
select *, 2 as Tag
from T2
order by T1.PrimaryKey, Tag
This is just pseudo code but you could make it work.
The union-all query will have surprisingly good performance because sql server will do a "merge union" which works like a merge-join. This pattern also works for multi-level parent-child relationships, although not as well.
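For the first (join) query, interpreting the result on the client might look something like this (a rough sketch; the table and column names are made up, and connection is an open SqlConnection):

using System.Collections.Generic;
using System.Data.SqlClient;

// UserId -> registration numbers of that user's cars.
var carsByUser = new Dictionary<int, List<string>>();

using (var cmd = new SqlCommand(@"
    select u.UserId, u.FirstName, c.RegistrationNo
    from Table1 u
    left join Table2 c on c.UserId = u.UserId
    order by u.UserId", connection))
using (var reader = cmd.ExecuteReader())
{
    while (reader.Read())
    {
        int userId = reader.GetInt32(0);
        if (!carsByUser.ContainsKey(userId))
            carsByUser[userId] = new List<string>();

        // RegistrationNo is NULL when the user has no car (left join).
        if (!reader.IsDBNull(2))
            carsByUser[userId].Add(reader.GetString(2));
    }
}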