I am executing the following two queries against a SQL database from within my C# MVC application.
Query1
SELECT tableone.id, name, time, type, grade, product, element, value
FROM dbo.tableone INNER JOIN dbo.tabletwo ON dbo.tableone.id = dbo.tabletwo.id
Where name = '" + Name + "' Order By tableone.id Asc, element
Query2
Select DISTINCT element FROM dbo.tableone
INNER JOIN dbo.tabletwo ON dbo.tableone.id = dbo.tabletwo.id
Where name = '" + Name + "'"
Upon running the method that executes these queries, each query hangs, and oftentimes the next page of my application will not load for over a minute, or it times out on one query or the other. When I run the same queries directly in SQL Server, each of them takes between 10 and 15 seconds, which is still too long.
How can I speed them up? I've never created a SQL index, and I'm not sure how to create one for each of these queries, or whether that's even the right path to pursue.
Tableone currently has 20,808,805 rows and 3 columns; tabletwo has 597,707 rows and 6 columns.
Tableone
id(int, not null)
element(char(9), not null)
value(real, null)
Tabletwo
id(int, not null)
name(char(7), null)
time(datetime, null)
type(char(5), null)
grade(char(4), null)
product(char(14), null)
Firstly, as @Robert Co said, an index on tabletwo.name will help performance.
Also, are there indexes on tableone.id and tabletwo.id? I will assume there are, given they look like primary keys. If not, you definitely need to put indexes on them. I can see that tableone-to-tabletwo is a many-to-one relationship, which means you probably don't have a primary key on tableone. You seriously need to add a primary key to tableone, such as a tableoneid column, and make it a clustered index!
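A minimal sketch of both steps (assuming tableone is currently a heap; the column and index names here are illustrative, not from the original post):

-- See which indexes already exist on the two tables:
SELECT OBJECT_NAME(object_id) AS table_name, name AS index_name, type_desc, is_unique
FROM sys.indexes
WHERE object_id IN (OBJECT_ID('dbo.tableone'), OBJECT_ID('dbo.tabletwo'));

-- Add a surrogate key to tableone and cluster on it:
ALTER TABLE dbo.tableone ADD tableoneid INT IDENTITY(1,1) NOT NULL;
CREATE UNIQUE CLUSTERED INDEX IX_tableone_tableoneid ON dbo.tableone (tableoneid);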
I think another factor here is that tableone is much bigger than tabletwo, and tabletwo is narrowed down even further by the WHERE clause (name = 'Name'). This means you are joining a large table (tableone) to a small, filtered table (tabletwo). In SQL, joining a large unindexed table to a small one like this is going to be slow.
The solution I can think of is to move some columns, such as 'type', to tableone, so that you can restrict tableone to a small set in your query as well:
Select DISTINCT element FROM dbo.tableone
INNER JOIN dbo.tabletwo ON dbo.tableone.id = dbo.tabletwo.id
Where tableone.type = 'some type' and name = '" + Name + "'"
I am not quite sure how these suggestions fit into your data model; I just hope they may help.
10 to 15 seconds with 20 million rows and no index? That's not bad!
As Ethen Li says, it's all about indexes. In an ideal world you would create indexes on all columns that feature in a filter (JOINs and WHEREs) or an ORDER BY. However, as this could severely impact UPDATEs and INSERTs, you need to be more practical and less idealistic. With the information you have provided I would suggest creating the following indexes:
CREATE INDEX index1 ON tabletwo (name);
If tableone.id is your candidate key (that which uniquely identifies the row), you should also create an index on it - possibly clustered (it depends on how ID is generated):
CREATE UNIQUE INDEX IX1TableOne ON tableone (id);
Or
CREATE UNIQUE CLUSTERED INDEX IX1TableOne ON tableone (id);
For tabletwo, the same applies to ID as for tableone: create at least a unique index on ID.
With these indexes in-place you should find a significant performance improvement.
Alternatively, add primary key constraints:
ALTER TABLE tableone ADD CONSTRAINT pktableone PRIMARY KEY CLUSTERED (id);
ALTER TABLE tabletwo ADD CONSTRAINT pktabletwo PRIMARY KEY CLUSTERED (id);
On tableone this might take a while, because the data might have to be physically re-ordered. Therefore, do it during a maintenance window when there are no active users.
Related
Can I make a primary key like 'c0001, c0002' and for supplier 's0001, s0002' in one table?
The idea in database design is to keep each data element separate, each with its own datatype, constraints, and rules. That c0002 is not one field but two; the same goes for XXXnnn or whatever. It is incorrect, and it will severely limit your ability to use the data and to use database features and facilities.
Break it up into two discrete data items:
column_1 CHAR(1)
column_2 INTEGER
Then set AUTOINCREMENT on column_2
And yes, your Primary Key can be (column_1, column_2), so you have not lost whatever meaning c0002 has for you.
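In SQL Server terms, AUTOINCREMENT means an IDENTITY column. A minimal sketch of this two-column design (the table and constraint names are illustrative):

CREATE TABLE dbo.TwoPartKeyDemo
(
    column_1 CHAR(1) NOT NULL,            -- the alpha part, e.g. 'c'
    column_2 INT IDENTITY(1,1) NOT NULL,  -- the numeric part, auto-incremented
    CONSTRAINT PK_TwoPartKeyDemo PRIMARY KEY (column_1, column_2)
);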
Never place suppliers and customers (whatever "c" and "s" means) in the same table. If you do that, you will not have a database table, you will have a flat file. And various problems and limitations consequent to that.
That means, Normalise the data. You will end up with:
one table for Person or Organisation containing the common data (Name, Address...)
one table for Customer containing customer-specific data (CreditLimit...)
one table for Supplier containing supplier-specific data (PaymentTerms...)
no ambiguous or optional columns, therefore no Nulls
no limitations on use or SQL functions
And when you need to add columns, you do it only where it is required, without affecting all the other uses of the flat file. The scope of effect is limited to the scope of change.
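A rough T-SQL sketch of that normalized structure (all names and types here are illustrative assumptions, not from the original answer):

CREATE TABLE dbo.Person
(
    PersonId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Name     VARCHAR(100) NOT NULL,
    Address  VARCHAR(200) NOT NULL
);

CREATE TABLE dbo.Customer   -- customer-specific data only
(
    PersonId    INT NOT NULL PRIMARY KEY REFERENCES dbo.Person (PersonId),
    CreditLimit DECIMAL(19, 4) NOT NULL
);

CREATE TABLE dbo.Supplier   -- supplier-specific data only
(
    PersonId     INT NOT NULL PRIMARY KEY REFERENCES dbo.Person (PersonId),
    PaymentTerms VARCHAR(50) NOT NULL
);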
My approach would be:
create an ID INT IDENTITY column and use that as your primary key (it's unique, narrow, static - perfect)
if you really need an ID with a letter or something, create a computed column based on that ID INT IDENTITY
Try something like this:
CREATE TABLE dbo.Demo(ID INT IDENTITY PRIMARY KEY,
IDwithChar AS 'C' + RIGHT('000000' + CAST(ID AS VARCHAR(10)), 6) PERSISTED
)
This table would contain ID values 1, 2, 3, 4, ... and IDwithChar would be something like C000001, C000002, ..., C000042, and so forth.
With this, you have the best of both worlds:
a proper, perfectly suited primary key (and clustering key) on your table, ideally suited to be referenced from other tables
your character-based ID, properly defined, computed, always up to date.....
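To see it in action, a hypothetical usage:

INSERT INTO dbo.Demo DEFAULT VALUES;   -- both ID and IDwithChar are generated
SELECT ID, IDwithChar FROM dbo.Demo;   -- e.g. 1, C000001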
Yes - actually these are two different questions:
1. Can we use a varchar column as an auto-increment column with unique values, like roll numbers in a class?
ANS: Yes, you can get it with the piece of code below, without specifying values for ID and P_ID:
CREATE TABLE dbo.TestDemo
(ID INT IDENTITY(786,1) NOT NULL PRIMARY KEY CLUSTERED,
P_ID AS 'LFQ' + RIGHT('00000' + CAST(ID AS VARCHAR(5)), 5) PERSISTED,
Name varchar(50),
PhoneNumber varchar(50)
)
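For example, a hypothetical insert (with the identity seed of 786, the first generated pair would be 786 and LFQ00786):

INSERT INTO dbo.TestDemo (Name, PhoneNumber) VALUES ('John', '555-0100');
SELECT ID, P_ID FROM dbo.TestDemo;   -- 786, LFQ00786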
2. Can we have two different increments in the same column?
ANS: No, you can't do that in one table.
I prefer artificial primary keys. Your requirement can also be implemented as a unique index on a computed column:
CREATE TABLE [dbo].[AutoInc](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Range] [varchar](50) NOT NULL,
[Descriptor] AS ([range]+CONVERT([varchar],[id],(0))) PERSISTED,
CONSTRAINT [PK_AutoInc] PRIMARY KEY ([ID] ASC)
)
GO
CREATE UNIQUE INDEX [UK_AutoInc] ON [dbo].[AutoInc]
(
[Descriptor] ASC
)
GO
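A quick hypothetical usage of this table:

INSERT INTO [dbo].[AutoInc] ([Range]) VALUES ('C');
SELECT [ID], [Range], [Descriptor] FROM [dbo].[AutoInc];   -- 1, C, C1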
Assigning domain meaning to the primary key is a practice that goes way, way back to the time when COBOL programmers and dinosaurs walked the earth together. The practice survives to this day, most often in legacy inventory systems. It is mainly a way of eliminating one or more columns of data and embedding the data from the eliminated column(s) in the PK value.
If you want to store customers and suppliers in the same table, just do it: use an autoincrementing integer PK and add a column called ContactType or something similar, which can contain the values 'S' and 'C' or whatever. You do not need a composite primary key.
You can always concatenate these columns (PK and ContactType) on reports, e.g. C12345 or S20000 (casting the integer to a string), if you want to eliminate a column to save space on the printed or displayed page; everyone in your organization just needs to understand the convention that the first character of the entity id stands for the ContactType code.
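A sketch of that report-side concatenation (the table and column names are assumptions):

SELECT ContactType + CAST(ID AS VARCHAR(10)) AS DisplayId
FROM dbo.Contact;   -- e.g. 'C12345', 'S20000'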
This approach will leverage autoincrementing capabilities that are built into the database engine, simplify your PK and related code in the data layer, and make your program and database more robust.
First, let us state that you can't do it directly. If you try
create table dbo.t1 (
    id varchar(10) identity
);
the error message tells you which data types are supported directly.
Msg 2749, Level 16, State 2, Line 1
Identity column 'id' must be of data type
int, bigint, smallint, tinyint, or decimal
or numeric with a scale of 0, and
constrained to be nonnullable.
BTW: I tried to find this information in BOL or on MSDN and failed.
Now, knowing that you can't do it the direct way, it is a good choice to follow @marc_s's proposal and use computed columns.
Instead of doing 'c0001, c0002' for customers and 's0001, s0002' for suppliers in one table, proceed in the following way:
Create one Auto-Increment field "id" of Data Type "int (10) unsigned".
Create another field "type" of Data Type "enum ('c', 's')" (where c=Customer, s=Supplier).
As @PerformanceDBA pointed out, you can then make the primary key span the two fields "id" & "type", so that your requirement is fulfilled with the correct methodology.
INSERT INTO [Yourtable] (yourvarcharID)
VALUES ('yourvarcharPrefix' +
        CAST(CAST(SUBSTRING((SELECT MAX(yourvarcharID) FROM [Yourtable]), 3, 6) AS INT) + 1
             AS VARCHAR(20)));
Here the varchar column is prefixed with 'RX' followed by 001, so I take the substring after the prefix, cast it to int, and increment that number alone.
We can add a default constraint that calls a user-defined function to achieve this.
First, create the table:
create table temp_so (prikey varchar(100) primary key, name varchar(100))
go
Second, create a new user-defined function:
create function dbo.fn_AutoIncrementPriKey_so ()
returns varchar(100)
as
begin
    declare @prikey varchar(100)
    -- take the highest existing key, keep its 2-character prefix, and increment the numeric remainder
    set @prikey = (select top (1) left(prikey, 2) + cast(cast(stuff(prikey, 1, 2, '') as int) + 1 as varchar(100)) from temp_so order by prikey desc)
    return isnull(@prikey, 'SB3000')  -- seed value for an empty table
end
go
Third, alter the table definition to add the default constraint:
alter table temp_so
add constraint df_temp_prikey
default dbo.[fn_AutoIncrementPriKey_so]() for prikey
go
Fourth, insert new rows into the table without specifying a value for the primary key column (go 4 runs the batch four times):
insert into temp_so (name) values ('Rohit')
go 4
Check the data in the table now:
select * from temp_so
OUTPUT -
prikey name
SB3000 Rohit
SB3001 Rohit
SB3002 Rohit
SB3003 Rohit
You may try the code below:
SET @variable1 = SUBSTR((SELECT id FROM user WHERE id = (SELECT MAX(id) FROM user)), 5, 7) + 1;
SET @variable2 = CONCAT("LHPL", @variable1);
INSERT INTO `user`(`id`, `name`) VALUES (@variable2, "Jeet");
The first line gets the last inserted id, strips the four-character prefix, increments the number by one, and assigns it to @variable1.
The second line builds the complete id with the four-character prefix and assigns it to @variable2.
The third line inserts the new row with the generated primary key, @variable2.
The table must already contain at least one row for this SQL to work.
No. If you really need this, you will have to generate the ID manually.
My database is designed in SQL Server, and I want to get the output in ASP.NET using LINQ and C#.
I have 2 tables linked to 1 table (1:1)
My question is: how can I tell which table a primary key in tbl_Document is linked to?
For example:
tbl_Document (ID, Date, ...)
tbl_Factor (ID, DocID, ...)
tbl_Finance (ID, DocID, ...)
What is the best way to know which table an ID in tbl_Document is linked to?
I could add a column to tbl_Document, such as 'WhichTable', write the name of the linked table into it for every row, and then check 'WhichTable' with an "if" every time I search.
Is there a better way to do that?
Thanks, and sorry for my bad English :)
By default, you can only find out which tables have foreign key constraints referencing the parent table:
select object_name(f.referenced_object_id) pk_table,
       c1.name pk_column_name,
       object_name(f.parent_object_id) fk_table,
       c2.name fk_column_name
from sys.foreign_keys f
join sys.columns c1 on c1.object_id = f.referenced_object_id
join sys.columns c2 on c2.object_id = f.parent_object_id
join sys.foreign_key_columns k
    on (k.constraint_object_id = f.object_id
        and c2.column_id = k.parent_column_id
        and c1.column_id = k.referenced_column_id)
where object_name(f.referenced_object_id) = 'tbl_Document'
There is no such additional information stored for each particular row of the parent table; it would be duplicate information, since you can figure it out by searching every child table. Thus, as you mentioned, you can store the child table name in an additional column and then, as an option, construct SQL dynamically to query the child row.
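As a sketch of that dynamic approach (assuming a WhichTable column that stores the child table name, plus hypothetical ID/DocID columns):

DECLARE @DocId INT = 1;   -- hypothetical document id
DECLARE @table SYSNAME, @sql NVARCHAR(MAX);

-- look up which child table this document row points to
SELECT @table = WhichTable FROM dbo.tbl_Document WHERE ID = @DocId;

-- query that child table dynamically, quoting the name to be safe
SET @sql = N'SELECT * FROM dbo.' + QUOTENAME(@table) + N' WHERE DocID = @DocId;';
EXEC sys.sp_executesql @sql, N'@DocId INT', @DocId = @DocId;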
I have written an application in C# connected to a SQL Server Express database. From the front end I populate a particular table every few seconds, inserting approximately 200~300 rows each time.
The table currently contains approximately 3.5 million rows and keeps growing. The table definition is as below:
[DEVICE_ID] [decimal](19, 5) NULL,
[METER_ID] [decimal](19, 5) NULL,
[DATE_TIME] [decimal](19, 5) NULL,
[COL1] [decimal](19, 5) NULL,
[COL2] [decimal](19, 5) NULL,
.
.
.
.
[COL25] [decimal](19, 5) NULL
I have created a non-clustered index on the Date_Time column, and note that no unique column exists; if required I can add an identity (auto-increment) column, but my report-generation logic is entirely based on the Date_Time column.
I usually query based on time, i.e., if I need to calculate the variation of COL1 over a month, I need the value of COL1 from the first reading of the first day and the last reading of the last day of that month. Likewise, I need to run the query for flexible date ranges, and I usually need only an opening value and a closing value, based on Date_Time, for any chosen column.
To get the first value of COL1 for the first day, the query is:
select top (1) COL1 from VALUEDATA where DeviceId = @DId and MeterId = @MId and Date_Time between @StartDateTime and @EndDateTime order by Date_Time
To get the last value of COL1 for the last day, the query is:
select top (1) COL1 from VALUEDATA where DeviceId = @DId and MeterId = @MId and Date_Time between @StartDateTime and @EndDateTime order by Date_Time desc
But when I run the above queries, each takes approximately 20~30 seconds. I believe this can be optimized further, but I don't know the way ahead.
One thought I have given to this is to create another table, insert the first and last rows on a daily basis, and fetch data from that. But I would rather avoid this if I can fix it within the existing table and queries.
Any input would be greatly appreciated.
To fully optimize those queries you need two composite indexes:
CREATE INDEX ix_valuedata_asc ON VALUEDATA (DeviceId, MeterId, Date_Time);
CREATE INDEX ix_valuedata_des ON VALUEDATA (DeviceId, MeterId, Date_Time DESC);
I have another suggestion: if your goal is to get the values of COL1, COL2, etc. after the index lookup, a solution with just a nonclustered index on the filtering columns still has to join back to the main table, i.e., do a bookmark/RID lookup.
Your info gives me the impression that your base table is not clustered (has no clustered index); it is, in fact, a heap table.
If most of your queries on the table follow the pattern you describe, I would make this table clustered. Contrary to what most people think, you do not have to define the clustered index as the (unique) primary key. If you define a clustered index in SQL Server on non-unique data, SQL Server will make it unique 'under water' by adding an invisible row identifier (a 'uniquifier')...
If the main, most often used selection/filter criterion on this table is date-time, I would change the table to the following clustered structure:
First, remove all non-clustered indexes.
Then add the following clustered index:
CREATE CLUSTERED INDEX clix_valuedata ON VALUEDATA (Date_Time, DeviceId, MeterId);
With queries that follow your pattern, you will (probably!) get very performant clustered index seek access to your table, as the query plan will show. You now get all the other columns in the row for free, since bookmark lookups are no longer needed. This approach will probably also scale better as the table grows, because of the seek behaviour...
I have a view over some tables. When I select from the view in SQL Server Management Studio it works fine, but when I use Entity Framework to get the data from the view, the result is different.
ReturnDbForTesEntities1 db = new ReturnDbForTesEntities1();
List<VJOBS2> list = new List<VJOBS2>();
list = db.VJOBS2.ToList();
Same number of records but last 2 rows are different.
I have a table for job applicants; an applicant can apply for 2 or more jobs:
ApplicantId ApplicantName JobId JobName
1 Mohamed 1 Developer
1 Mohamed 2 IT Supporter
but in the list I get:
ApplicantId ApplicantName JobId JobName
1 Mohamed 1 Developer
1 Mohamed 1 Developer
There is a subtle problem with views when used from Entity Framework.
If you want to use a table with EF, it needs to have a primary key to uniquely identify each row. Typically, that's a single column, e.g. an ID or something like that.
With a view, you don't have the concept of a "primary key" - the view just contains some columns from some tables.
So when EF maps a view, it cannot find a primary key - and therefore, it will use all non-nullable columns from the view as "substitute" primary key.
I don't know what these are in your case - you should be able to tell from the .edmx model.
Let's assume that (ApplicantId, ApplicantName) are the two non-nullable columns that EF now uses as a "substitute" primary key. When EF goes to read the data, it will read the first line (1, Mohamed, 1, Developer) and create an object for that.
When EF reads the second line (1, Mohamed, 2, IT Supporter), it notices that the "primary key" (1, Mohamed) is the same as before, so it doesn't bother creating a new object from the values just read; since the primary key is the same, it concludes this must be the same object it has already read, and it uses the existing object instead.
So the problem really is that you can't have explicit primary keys on a view.
Either tweak your EF model to make it clear to EF that e.g. (ApplicantId, JobId) is really the primary key (you need to make sure both columns are non-nullable), or add something like an "artificial" primary key to your view:
CREATE VIEW dbo.VJOBS2
AS
SELECT
ApplicantId, ApplicantName, JobId, JobName,
RowNum = ROW_NUMBER() OVER(ORDER BY JobId)
FROM
dbo.YourBaseTable
By adding this RowNum column to your view, which just numbers the rows 1, 2, ..., n, you get a new non-nullable column that EF will include in the "substitute PK". Since those numbers are sequential, no two rows will have the same "PK" values, and therefore no row will be erroneously replaced by something that has already been read from the database.
FYI, I had to add ISNULL to get it to work for me; see the modification in the first line of this code example:
SELECT ISNULL(ROW_NUMBER() OVER(ORDER BY a.OrderItemID),0) as ident, a.*
FROM
(
SELECT e.AssignedMachineID, e.StartDate, e.OrderItemID,
       e2.OrderItemID AS doubleBookedEventID,
       e.StartTime, e.EndTime,
       e2.StartTime AS doubleBookedStartDateTime,
       e2.EndTime AS doubleBookedEndDateTime,
       DATEDIFF(MINUTE, e2.StartTime, e.EndTime) AS doubleBookedMinutes
FROM schedule e
INNER JOIN schedule e2
    ON e.AssignedMachineID = e2.AssignedMachineID
    AND e.StartDate = e2.StartDate
    AND e.schedID <> e2.schedID
    AND e2.StartTime BETWEEN DATEADD(minute, 1, e.StartTime) AND DATEADD(minute, -1, e.EndTime)
WHERE COALESCE(e.ManuallyOverrided, 0) = 0
  AND COALESCE(e.AssignedMachineID, 0) > 0
) a
I have a database table with over 3,000,000 rows; each row has an id and an xml field of varchar(6000).
If I do SELECT id FROM bigtable it takes about 2 minutes to complete. Is there any way to get this down to 30 seconds?
Build a clustered index on the id column.
See http://msdn.microsoft.com/en-us/library/ms186342.aspx
You could apply indexes to your tables. In your case a clustered index.
Clustered indexes:
http://msdn.microsoft.com/en-gb/library/aa933131(v=sql.80).aspx
I would also suggest filtering your query so it doesn't return all 3 million rows each time; this can be done using TOP or WHERE.
TOP:
SELECT TOP 1000 ID
FROM bigtable
WHERE:
SELECT ID FROM
bigtable
WHERE id IN (1,2,3,4,5)
First of all, 3 million records don't make a table 'huge'.
To optimize your query, you should do the following.
Filter your query: why do you need to get ALL your IDs?
Create a clustered index on the ID column, so the engine has a compact, ordered structure to search instead of scanning the whole table to locate the selected rows.
Okay, why are you returning all the Ids to the client?
Even if your table has no clustered index (which I doubt), the vast majority of your processing time will be client-side: transferring the Id values over the network and displaying them on the screen.
Querying for all values rather defeats the point of having a query engine.
The only reason I can think of (perhaps I lack imagination) for fetching all the Ids is some sort of misguided caching.
If you want to know how many you have, do:
SELECT count(*) FROM [bigtable]
If you want to know whether an Id exists, do:
SELECT count([Id]) FROM [bigtable] WHERE [Id] = 1 /* or some other Id */
This will return 1 row with a 1 or 0 indicating existence of the specified Id.
Both these queries will benefit massively from a clustered index on Id and will return minimal data with maximal information.
Both of these queries will return in less than 30 seconds, and in less than 30 milliseconds if you have a clustered index on Id
Selecting all the Ids will provide no more useful information than these queries; all it will achieve is a workout for your network and client.
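If you only need existence, a variant that can stop at the first match (a sketch using the same hypothetical Id):

SELECT CASE WHEN EXISTS (SELECT 1 FROM [bigtable] WHERE [Id] = 1)
            THEN 1 ELSE 0 END AS IdExists;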
You could index your table for better performance.
There are additional options as well that you could use to improve performance, such as the partitioning feature.