I am confused about choosing between two approaches.
Scenario
There are two tables, Table 1 and Table 2. Table 1 contains users' data, for example first name, last name, etc.
Table 2 contains the cars each user has, with their descriptions, i.e. color, registration number, etc.
Now, if I want to have all the information for all users, which approach completes in the minimum time?
Approach 1.
Query for all rows in Table 1 and store them all in a list.
Then loop through the list and, for each user saved in the first step, query Table 2 for that user's data.
Approach 2
Query for all rows in Table 1 and, while saving each row, get all its values from Table 2 and save them too.
If I think about the system processing involved, I think it might be the same, because the same number of records has to be processed in both approaches.
If there is any other better idea, please let me know.
Your two approaches will have about the same performance (slow because of N+1 queries). It would be faster to do a single query like this:
select *
from T1
left join T2 on T2.UserId = T1.PrimaryKey -- hypothetical key columns
order by T1.PrimaryKey
Your client app can then interpret the results and have all the data from a single query. An alternative would be:
select PrimaryKey as UserKey, 1 as Tag, FirstName, LastName, null as Color, null as RegistrationNo
from T1
union all
select UserId, 2 as Tag, null, null, Color, RegistrationNo
from T2
order by UserKey, Tag
This is just pseudo code (the column names and null padding depend on your schema), but you could make it work.
The union-all query will have surprisingly good performance, because SQL Server will do a "merge union", which works like a merge join. This pattern also works for multi-level parent-child relationships, although not as well.
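For the first (left join) form, here is a minimal sketch of how the client might regroup the combined rows into users and cars; it assumes cmd is the SqlCommand running that query, the column names are hypothetical, and the car columns come back NULL for users without cars:

using (var reader = cmd.ExecuteReader())
{
    int? currentUser = null;
    while (reader.Read())
    {
        int userId = reader.GetInt32(reader.GetOrdinal("PrimaryKey"));
        if (userId != currentUser)
        {
            currentUser = userId;
            // ... build a new user from the T1 columns ...
        }
        // a left join emits one row even for car-less users, so skip NULL cars
        if (!reader.IsDBNull(reader.GetOrdinal("Color")))
        {
            // ... add a car from the T2 columns to the current user ...
        }
    }
}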
In our current application we have some performance issues with some of our queries. Usually we have something like:
List<int> idList = some data here…;
var query = from a in someTable where idList.Contains(a.Id) select a;
While for simple queries this is acceptable, it becomes a bottleneck when there are more items in idList (in some queries we have about 700 IDs to check, for example).
Is there any way to use something other than Contains? We are thinking of using temporary tables, first inserting the IDs and then executing a join instead of Contains, but it would seem Entity Framework does not support such operations (creating temporary tables in code) :(
What else can we try?
I suggest using LINQPad; it can show your query translated to SQL syntax.
There is a chance that this is already the optimal solution (if you're not into messy stuff).
You might also try holding idList as a sorted array and replacing the Contains call with a binary search (you can implement your own extension; a sketch follows).
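As a sketch, such an extension could look like the following; note it only helps if the query is evaluated in memory (LINQ to Objects), since Entity Framework would otherwise translate Contains into SQL anyway:

public static class SortedArrayExtensions
{
    // Binary search over a pre-sorted int array: O(log n) per lookup
    // instead of O(n) for List<int>.Contains.
    public static bool ContainsSorted(this int[] sortedIds, int id)
    {
        return Array.BinarySearch(sortedIds, id) >= 0;
    }
}

// usage (in-memory only):
// int[] sortedIds = idList.OrderBy(x => x).ToArray();
// var query = someTable.AsEnumerable().Where(a => sortedIds.ContainsSorted(a.Id));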
You can try this:
var query = someTable.Where(a => idList.Any(b => b == a.Id));
If you don't mind having a physical table you could use a semi-temporary table. The basic idea is:
Create a physical table with a "query id" column
Generate a unique ID (not random, but unique)
Insert data into the table tagging the records with the query ID
Pass the query id to the main query, using it to join to the link table
Once the query is complete, delete the temporary records
At worst if something goes wrong you will have orphaned records in the link table (which is why you use a unique query ID).
It's not the cleanest solution but it will be faster than using Contains if you have a lot of values to check against.
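A minimal sketch of the pattern with ADO.NET, assuming a hypothetical link table QueryIds(QueryId uniqueidentifier, Id int) and that connectionString and idList are already defined:

var queryId = Guid.NewGuid(); // unique ID tagging this query's records

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // 1. Insert the IDs, tagging each record with the query ID.
    var rows = new DataTable();
    rows.Columns.Add("QueryId", typeof(Guid));
    rows.Columns.Add("Id", typeof(int));
    foreach (var id in idList)
        rows.Rows.Add(queryId, id);
    using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "QueryIds" })
        bulk.WriteToServer(rows);

    // 2. Pass the query ID to the main query, joining to the link table
    //    instead of using Contains.
    using (var cmd = new SqlCommand(
        "SELECT s.* FROM SomeTable s JOIN QueryIds q ON q.Id = s.Id AND q.QueryId = @qid", conn))
    {
        cmd.Parameters.AddWithValue("@qid", queryId);
        using (var reader = cmd.ExecuteReader())
        {
            // ... read the results ...
        }
    }

    // 3. Delete the temporary records once the query is complete.
    using (var cmd = new SqlCommand("DELETE FROM QueryIds WHERE QueryId = @qid", conn))
    {
        cmd.Parameters.AddWithValue("@qid", queryId);
        cmd.ExecuteNonQuery();
    }
}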
When Entity Framework starts being a performance bottleneck, generally it's time to write actual SQL.
So what you could do for example is build a table-valued function that takes a table-valued parameter (your list of IDs) as parameter. The function would just return the result of your JOIN.
The table-valued function feature requires EF5, so it might not be an option if you're really stuck with EF4.
The idea is to refactor your queries to get rid of idList.
For example, suppose you need to return the list of orders for male users aged 18-25 from France. If you filter the Users table by age, sex and country to get an idList of users, you end up with 700+ IDs. Instead, join the Orders table with Users and apply the filters to the Users table. That way you don't have two requests (one for the IDs and one for the orders), and it works much faster because the server can use indexes while joining the tables.
Makes sense?
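A minimal sketch of that refactoring, with hypothetical Users(Id, Sex, Age, Country) and Orders(Id, UserId) entities:

// Before: two steps, materializing 700+ IDs on the client.
// var idList = db.Users
//     .Where(u => u.Sex == "M" && u.Age >= 18 && u.Age <= 25 && u.Country == "France")
//     .Select(u => u.Id).ToList();
// var orders = db.Orders.Where(o => idList.Contains(o.UserId)).ToList();

// After: one query, so the filter and join run on the server with indexes.
var orders = (from o in db.Orders
              join u in db.Users on o.UserId equals u.Id
              where u.Sex == "M" && u.Age >= 18 && u.Age <= 25 && u.Country == "France"
              select o).ToList();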
I have 2 tables linked by a foreign key (ID). Table 1 has 1 million records; Table 2 has 50 million records.
I would like to read a record from Table 1 and then read all the associated records from Table 2. I can use SqlDataReader and implement a Peek() to achieve this, as discussed in "How do I implement a Peek() function on a DataReader?":
select ID, Col1 from Table1 order by ID
select ID, col2 from Table2 order by ID
But the downside of the peek approach is that I have to compare each child record with the parent before advancing the pointer of the parent result.
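In outline, the merge loop would look something like this (a sketch, assuming both readers come back ordered by ID, and parentCmd/childCmd are the two commands above):

using (var parent = parentCmd.ExecuteReader())
using (var child = childCmd.ExecuteReader())
{
    bool hasChild = child.Read();
    while (parent.Read())
    {
        int parentId = parent.GetInt32(0);   // Table1.ID
        // consume child rows while they match the current parent
        while (hasChild && child.GetInt32(0) == parentId)
        {
            // ... process the Table2 row for this parent ...
            hasChild = child.Read();
        }
    }
}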
If I use a join in SQL Server, it will perform the join operation and then start streaming the result, which requires a lot of memory.
Another approach would be to divide the join operation into batches, but that involves firing multiple SQL queries, which I don't want.
Can you please suggest some alternative approach to achieve this?
If I understand your problem correctly, you might want to look into using a partitioned table. See the MySQL manual page on partitioning, and the Stack Overflow questions that deal with partitioning and joins.
I am creating an application that takes data from a text file containing sales data from the Amazon marketplace. The marketplace has items with different names compared to the data in our main database. The application accepts the text file as input and needs to check whether each item exists in our database. If an item is not present, I should offer the option to save it to a Master table, or to the Sub item table mapped to a master item. My question is: if the text file has 100+ items, should I hit the database each time to check whether the data exists there? Is there a better way of doing this so that we can minimize the database hits?
I have two options that I have used earlier:
Hit the database and check if the item exists in the table.
Fill the data into a DataTable and use DataTable.Select to check if it exists.
Can someone tell me the best way to do this? I have to check two tables (the master table and the subItem table), maybe one at a time. Thanks.
Update:
@Downvoters, add a comment.
I am not asking what the way is to check whether an item exists in the database. I just want to know the best way of doing that. Should I be hitting the database 1000 times if a file has 1000 items? That's my question.
The current query I use:
if exists (select * from [table] where itemname = @itemname)
select 'True'
else
select 'False'
return
(From Chat)
I would create a stored procedure which takes a table-valued parameter containing all the items that you want to check. You can then use a join (a couple of options here)* to return a result set of items and whether each one exists or not. You can use TVPs from ADO.NET, as sketched below.
It will certainly handle the 100 to 1000 row range mentioned in your post. To be honest, I haven't used it in the 1M+ range.
In newer versions of SQL Server, I would prefer TVPs over using an XML input parameter, as it is really quite cumbersome to pack the XML in your .NET code and then unpack it again in your SPROC.
(*) Re joins: with the result set, you can either just inner join the TVP to your items/product table and check in .NET whether a row is missing, or you can do a left outer join with the TVP as the left table and, e.g., ISNULL() missing items to 0/'false', etc.
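A minimal sketch of the call, assuming a hypothetical table type dbo.ItemNameList and procedure dbo.CheckItems along these lines:

// Assumed (hypothetical) SQL objects:
//   CREATE TYPE dbo.ItemNameList AS TABLE (ItemName nvarchar(50));
//   CREATE PROCEDURE dbo.CheckItems @names dbo.ItemNameList READONLY AS
//     SELECT n.ItemName,
//            CASE WHEN i.itemname IS NULL THEN 0 ELSE 1 END AS ItemExists
//     FROM @names n
//     LEFT OUTER JOIN [table] i ON i.itemname = n.ItemName;

var tvp = new DataTable();
tvp.Columns.Add("ItemName", typeof(string));
foreach (var name in itemNames)          // itemNames: the names parsed from the file
    tvp.Rows.Add(name);

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.CheckItems", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    var p = cmd.Parameters.AddWithValue("@names", tvp);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.ItemNameList";

    conn.Open();
    using (var reader = cmd.ExecuteReader())
        while (reader.Read())
            Console.WriteLine("{0}: {1}", reader.GetString(0), reader.GetInt32(1) == 1);
}

This is one round trip for the whole file, whether it has 100 or 1000 items.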
Send the items to the database in batches of, say, 100. A stored procedure would probably help, since repetitive queries have to be fired. If the data is not changed frequently, you can consider caching. I assume you will be making service calls from your .NET application, so ingest XML from the back end in batches, and consider increasing the batch size based on the file size.
If your entire application is local, the batch size can be very high, as there is no network overhead, but still don't make 100 calls to the database.
Try like this:
SELECT CASE WHEN EXISTS (SELECT 1 FROM table1 WHERE itemname = @itemname) THEN 'True' ELSE 'False' END
I have encoded various text values as ints. I store the int value in the data table for better and faster searching. I have three options for displaying the text value:
Declare an enum in my code and display the text value according to the int value. This is static, and I have to change code if a new value is to be added.
To make it dynamic, store the int and text values in a table in another database that the admin owns. New values can be added by the admin in this table. I use an inner join to display the text value whenever a record is fetched.
Store the actual text in the respective data table. This will make searches slow.
My question is: which option is best to use under the following conditions?
Each data table has between 1 and 10 million records.
There are more than 5000 users doing fetch, search and update operations on the tables.
There are at most 12 text values, with a maximum length of 50 characters.
There are 30 data tables with the above conditions and functions.
I like a combination of option #2 and option #1: use ints, but have the dictionary table in another database.
Let me explain:
store the int and text values in a table in another database;
in the origin table, store the int only;
do not join to the table in the other database to get the text; instead, cache the dictionary on the client and resolve the text from that dictionary (a sketch follows).
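A minimal sketch of that client-side cache, assuming a hypothetical LookupValues(IntValue, TextValue) table in the admin database:

private static Dictionary<int, string> _lookup;

private static string ResolveText(int value)
{
    if (_lookup == null)
    {
        // Load the dictionary table once and keep it in memory;
        // 12 values of up to 50 chars each is trivially small.
        var map = new Dictionary<int, string>();
        using (var conn = new SqlConnection(adminConnectionString))
        using (var cmd = new SqlCommand("SELECT IntValue, TextValue FROM LookupValues", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    map[reader.GetInt32(0)] = reader.GetString(1);
        }
        _lookup = map;
    }
    string text;
    return _lookup.TryGetValue(value, out text) ? text : value.ToString();
}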
I would not go for option 1, for the reason given: enums are not meant to be lookups. You could replace option 1 with a hard-coded dictionary, but again it would need to be recompiled each time a change is made, which is bad.
Storing text in a table (i.e. option 3) is bad if it is guaranteed to be duplicated a lot, as it is here. This is exactly where you should use a lookup table, as you suggest in option 2.
So yes, store them in a database table and administer them through that.
The join shouldn't take long at all if it is just to a small table. If you are worried, though, an alternative is to load the lookup table into a dictionary in code the first time you need it and look the values up in code from your small lookup table. I doubt you'll have problems just doing it with the join, though.
And I'd take this approach no matter what the conditions are (i.e. number of records, etc.). The conditions do make it more sensible, though. :)
If you have literally millions of records, there's almost certainly no point in trying to spin up such a structure in server code or on the client in any form. It needs to be kept in a database, IMHO.
The query that creates the list needs to be smart enough to constrain the count of returned records to a manageable number. Perhaps partitioned views or stored procedures might help in this regard.
If this is primarily a read-only list, with updates only done in the context of management activities, it should be possible to make queries against the table very rapid with proper indexes and queries on the client side.
I have a medical database that keeps different types of data on patients: examinations, lab results, x-rays... each type of record exists in a separate table. I need to present this data on one table to show the patient's history with a particular clinic.
My question: what is the best way to do it? Should I do a SELECT from each table where the patient ID matches, order them by date, and then keep them in some artificial list-like structure (ordered by date)? Or is there a better way of doing this?
I'm using WPF and SQL Server 2008 for this app.
As others have said, JOIN is the way you'd normally do this. However, if there are multiple rows in one table for a patient then there's a chance you'll get data in some columns repeated across multiple rows, which often you don't want. In that case it's sometimes easier to use UNION or UNION ALL.
Let's say you have two tables, examinations and xrays, each with a PatientID, a Date and some extra details. You could combine them like this:
SELECT PatientID, ExamDate [Date], ExamResults [Details]
FROM examinations
WHERE PatientID = @patient
UNION ALL
SELECT PatientID, XrayDate [Date], XrayComments [Details]
FROM xrays
WHERE PatientID = @patient
Now you have one big result set with PatientID, Date and Details columns. I've found this handy for "merging" multiple tables with similar, but not identical, data.
If this is something you're going to be doing often, I'd be tempted to create a denormalized view over all the patient data (joining the appropriate tables) and index the appropriate column(s) in the view. Then use an appropriate method (stored procedure, etc.) to retrieve the data for a passed-in patient ID.
Use a JOIN to get data from several tables.
You can use a join (can't remember which type exactly) to get all the records from each table for a specific patient. The way this works depends on your database design.
I'd do it with separate SELECT statements, since a simple JOIN probably won't do, due to the fact that some tables might have more than one row for the patient.
So I would retrieve the multiple result sets into a simple DataSet, add a DataRelation, cache the object and query it down the line (by date, by exam type, subsets, ...).
The main point is that you have all the data handy, even cached if needed, in a structure which is easily queried and filtered.
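A minimal sketch of that approach, with hypothetical patients and examinations tables keyed by PatientID:

var ds = new DataSet();
using (var conn = new SqlConnection(connectionString))
using (var adapter = new SqlDataAdapter(
    "SELECT * FROM patients WHERE PatientID = @pid; " +
    "SELECT * FROM examinations WHERE PatientID = @pid;", conn))
{
    adapter.SelectCommand.Parameters.AddWithValue("@pid", patientId);
    adapter.Fill(ds);   // one DataTable per SELECT
}

ds.Tables[0].TableName = "Patients";
ds.Tables[1].TableName = "Examinations";
ds.Relations.Add("PatientExams",
    ds.Tables["Patients"].Columns["PatientID"],
    ds.Tables["Examinations"].Columns["PatientID"]);

// Query the cached data down the line, e.g. all exams for the patient:
foreach (DataRow exam in ds.Tables["Patients"].Rows[0].GetChildRows("PatientExams"))
{
    // ... order/filter by date, exam type, etc. ...
}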