Let's say I need to fetch some records from the database and filter them based on an enumeration-type property.
fetch List<SomeType>
filter on SomeType.Size
enum Size { Small, Medium, Large }
When displaying records, there will be a predefined value for the Size filter (e.g. Medium). In most cases, the user will select a value from the data already filtered by the predefined value.
There is also a possibility that a user filters to Large, then to Medium, then to Large again.
I have different situations with the same scenario:
List contains less than 100 records and 3-5 properties
List contains 100-500 records and 3-5 properties
List contains max 2000 records with 3-5 properties
What is my best approach here? Should I have a tab with a grid for each enum value, or should I have one common grid that I always filter, or something else?
I would do the filtering right on the database; if those fields are indexed, I suspect having the DB filter them would be much faster than filtering in C# after the fact.
Of course, you can always cache the filtered database results to prevent multiple unnecessary database calls.
EDIT: as for storing the information in the database, suppose you had this table setup:
CREATE TABLE Tshirts
(
id int not null identity(1,1),
name nvarchar(255) not null,
tshirtsizeid int not null,
primary key(id)
)
CREATE TABLE TshirtSizes
(
id int not null, -- not auto-increment
name nvarchar(255),
primary key(id)
)
INSERT INTO TshirtSizes(id, name) VALUES(1, 'Small')
INSERT INTO TshirtSizes(id, name) VALUES(2, 'Medium')
INSERT INTO TshirtSizes(id, name) VALUES(3, 'Large')
ALTER TABLE Tshirts ADD FOREIGN KEY(tshirtsizeid) REFERENCES TshirtSizes(id)
then in your C#
public enum TShirtSizes
{
Small = 1,
Medium = 2,
Large = 3
}
In this example, the TshirtSizes table is only used so that someone reading the database knows what the magic numbers 1, 2, and 3 mean. If you don't care about database readability, you can omit that table and just have an indexed column.
Memory is usually cheap. Otherwise, you could sort all the values once and retrieve based on comparison, which would be O(n); you could also keep track of the positions of things and retrieve faster that way.
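To make the filtering and caching ideas concrete, here is a minimal C# sketch of a per-size cache over a parameterized database filter. The Tshirt record, the TshirtCache class, and the connection string are illustrative; only the table, columns, and enum come from the example above.

using System.Collections.Generic;
using System.Data;
using Microsoft.Data.SqlClient;

public record Tshirt(int Id, string Name, TShirtSizes Size);

public class TshirtCache
{
    private readonly string _connectionString;

    // One cached list per size, filled lazily on first request, so filtering
    // Large -> Medium -> Large again only hits the database once per size.
    private readonly Dictionary<TShirtSizes, List<Tshirt>> _cache = new();

    public TshirtCache(string connectionString) => _connectionString = connectionString;

    public List<Tshirt> GetBySize(TShirtSizes size)
    {
        if (_cache.TryGetValue(size, out var cached))
            return cached;

        var result = new List<Tshirt>();
        using var conn = new SqlConnection(_connectionString);
        conn.Open();

        // Let the database do the filtering; the enum maps straight to the FK column.
        using var cmd = new SqlCommand(
            "SELECT id, name, tshirtsizeid FROM Tshirts WHERE tshirtsizeid = @sizeId", conn);
        cmd.Parameters.Add("@sizeId", SqlDbType.Int).Value = (int)size;

        using var reader = cmd.ExecuteReader();
        while (reader.Read())
            result.Add(new Tshirt(reader.GetInt32(0), reader.GetString(1),
                                  (TShirtSizes)reader.GetInt32(2)));

        _cache[size] = result;
        return result;
    }
}

At the sizes in the question (at most 2,000 records with 3-5 properties), even caching all three lists stays trivially small.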
Related
If I do a query like this:
SELECT * from Foo where Bar = '42'
and Bar is an int column, will that string value be optimized to 42 in the DB engine? Will it have some kind of impact if I leave it as it is instead of changing it to:
Select * from Foo where Bar = 42
This is done on a SQL Server Compact database, if that makes a difference.
I know it's not the correct way to do it, but it's a big pain going through all the code looking at every query and the DB schema to see whether the column is an int type or not.
SQL Server automatically converts it to INT, because INT has a higher data type precedence than VARCHAR.
You should also be aware of the impact that implicit conversions can
have on a query’s performance. To demonstrate what I mean, I’ve created and populated the following table in the AdventureWorks2008 database:
USE AdventureWorks2008;
IF OBJECT_ID ('ProductInfo', 'U') IS NOT NULL
DROP TABLE ProductInfo;
CREATE TABLE ProductInfo
(
ProductID NVARCHAR(10) NOT NULL PRIMARY KEY,
ProductName NVARCHAR(50) NOT NULL
);
INSERT INTO ProductInfo
SELECT ProductID, Name
FROM Production.Product;
As you can see, the table includes a primary key configured with the
NVARCHAR data type. Because the ProductID column is the primary key,
it will automatically be configured with a clustered index. Next, I
set the statistics IO to on so I can view information about disk
activity:
SET STATISTICS IO ON;
Then I run the following SELECT statement to retrieve product
information for product 350:
SELECT ProductID, ProductName
FROM ProductInfo
WHERE ProductID = 350;
Because statistics IO is turned on, my results include the following
information:
Table 'ProductInfo'. Scan count 1, logical reads 6, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.
Two important items to notice are that the query performed a scan and
that it took six logical reads to retrieve the data. Because my WHERE
clause specified a value in the primary key column as part of the
search condition, I would have expected an index seek to be performed,
rather than a scan. As the figure below confirms, the database engine performed a scan, rather than a seek. The figure shows the details of that scan (accessed by hovering the mouse over the scan icon).
Notice that in the Predicate section, the CONVERT_IMPLICIT function is
being used to convert the values in the ProductID column in order to
compare them to the value of 350 (represented by @1) I passed into the
WHERE clause. The reason that the data is being implicitly converted
is because I passed the 350 in as an integer value, not a string
value, so SQL Server is converting all the ProductID values to
integers in order to perform the comparisons.
Because there are relatively few rows in the ProductInfo table,
performance is not much of a consideration in this instance. But if
your table contains millions of rows, you’re talking about a serious
hit on performance. The way to get around this, of course, is to pass
in the 350 argument as a string, as I’ve done in the following
example:
SELECT ProductID, ProductName
FROM ProductInfo
WHERE ProductID = '350';
Once again, the statement returns the product information along with the statistics IO data.
Now the index is being properly used to locate the record. And if you
refer to the figure below, you'll see that the values in the ProductID
column are no longer being implicitly converted before being compared
to the 350 specified in the search condition.
As this example demonstrates, you need to be aware of how performance
can be affected by implicit conversions, just as you need to be aware
of any type of implicit conversion being performed by the database
engine. For that reason, you'll often want to explicitly convert your
data so you can control the impact of that conversion.
You can read more about Data Conversion in SQL Server.
If you look at the MSDN chart that covers implicit conversions, you will find that a string is implicitly converted into an int.
Both should work in your case, but the norm is to use quotes anyway, because while this works:
Select * from Foo where Bar = 42
this does not:
Select * from Foo where Bar = %42%
and this will:
SELECT * from Foo where Bar = '%42%'
PS: you should look at Entity Framework and LINQ queries anyway; they make this simpler...
If I am not mistaken, SQL Server will read the string as an INT if the string contains only numbers (numeric) and you're comparing it to an INTEGER column; but if the string is alphanumeric, that is when you will encounter an error or an unexpected result.
My suggestion is: in the WHERE clause, if you are comparing against an integer column, do not put single quotes around the value. That is the best practice to avoid errors and unexpected results.
You should always use parameters when executing SQL from code, to avoid security holes (e.g. SQL injection).
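A minimal sketch of that advice in C#; the Foo table and Bar column come from the question, while the connection string is a placeholder. Declaring the parameter type explicitly also removes the quoting question entirely and avoids the implicit-conversion cost described earlier, because the value reaches the server already typed as INT:

using System;
using System.Data;
using Microsoft.Data.SqlClient;

class Program
{
    static void Main()
    {
        // Placeholder connection string.
        using var conn = new SqlConnection("Server=.;Database=MyDb;Integrated Security=true");
        conn.Open();

        // The parameter is declared INT to match the column type, so there is
        // no quoting decision in the SQL text and no injection risk.
        using var cmd = new SqlCommand("SELECT * FROM Foo WHERE Bar = @bar", conn);
        cmd.Parameters.Add("@bar", SqlDbType.Int).Value = 42;

        using var reader = cmd.ExecuteReader();
        while (reader.Read())
            Console.WriteLine(reader[0]);
    }
}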
On a SQL Server database I have a table with about 200 columns:
create table dbo.Reports
(
Id int identity not null,
HealthRating int not null, --- { One of: NoProblems; TemporaryChange; ... }
Hobbies int not null, --- { Many of: None, Running, Tennis, Football, Swimming }
HobbiesOthers nvarchar (400) null
-- 100 more columns
);
So I have about 200 columns with types: INT, NVARCHAR, BIT and DATETIME.
Some of the INT columns, like HealthRating, store a single value.
Others, like Hobbies, hold many items as flags, and usually have an extra column to store other options as text (nvarchar).
How should I structure this table? I see 3 options:
Have one column for each property so:
HealthRatingNoProblems bit not null,
HealthRatingTemporaryChange bit not null,
Create lookup tables for HealthRatings, Hobbies, ...
I would probably end up with 60 or so more tables ...
Use enums and flags enums, which are now supported in Entity Framework, and store single-choice and multiple-choice items in INT columns, as I posted.
What would you suggest?
By all means -- please! -- normalize that poor table. If you end up with 50 or 60 tables then so be it. That is the design. If a user has a hobby, that information will be in the Hobby table. If he has three hobbies, there will be three entries in the Hobby table. If he doesn't have any hobbies, there will be nothing in the Hobby table. So on with all the other tables.
And for all those times you are only interested in hobbies, you involve only the Hobby table and the Reports table, and leave all the other tables alone. You can't do that with one huge, all-encompassing row that attempts to hold everything. There, if you only want to look at hobby information, you still have to read in the entire row, bringing in all the data you don't want. Why read in data you are just going to discard?
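Since the question mentions Entity Framework, here is a minimal sketch of what the normalized shape might look like as an EF model. The class and property names are illustrative, not taken from the question:

using System.Collections.Generic;

// One row per report; genuinely single-valued facts stay as columns.
public class Report
{
    public int Id { get; set; }
    public int HealthRating { get; set; }
    public string HobbiesOthers { get; set; }

    // Multi-valued facts move to their own table: one row per hobby.
    public ICollection<ReportHobby> Hobbies { get; set; } = new List<ReportHobby>();
}

public class ReportHobby
{
    public int Id { get; set; }
    public int ReportId { get; set; }
    public Report Report { get; set; }
    public string Name { get; set; } // e.g. "Running", "Tennis"
}

A query that only cares about hobbies then touches only the ReportHobby table, which is exactly the point made above.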
Let's say I have a table Person and I need users to be able to add different attributes to themselves.
A user should be able to add a date, a string, a number, a boolean, or multiple values.
Let's say they want to add:
Date of birth
Name
Height
Children names
How would I store this in the database?
I have 2 ideas:
I can hold all the values as strings (varchar) and always parse each value back to its original type when used, holding multiple values as something like text1#text2#text3.
Have a table with a column for each type (date, string, number); only the column that is needed will be populated and the others will stay null.
Any suggestions?
Good database design should always be N:1 (many to one) or 1:1 (one to one), never 1:N (one to many) or N:N (many to many), meaning that if a user can have multiple values for a field, you should make a new table that refers back to the user.
Since a user can only have one birth date, though, you should keep that as a column on the Users table.
In this case, for example, the children's names are the "multiple" values assigned to one user.
A simple table for that could look like this:
ID int primary key
UserID int references User(ID)
Name varchar
That way, you can store multiple children's names for one user while still keeping constraints in the database (which helps ensure code correctness if you're interfacing with it through an application!).
Some people will suggest having a table for each of the values, just to avoid nulls. For example, to store the birth date, you would make a table similar to the children-names table above, so you don't have to add a column to the Users table that might be null.
Personally, I think using nulls is fine, as they let you see whether there is a relevant result without joining (or worse, left joining) an entire table of potentially irrelevant information.
Use your second approach. In your Person table, have a row for each record, with multiple columns that each hold a single value for your desired fields.
So..
tbPerson
ID | Date Of Birth | Name | Height | Childrens names | etc...
To Create a table...
CREATE TABLE tbPerson([ID] INT IDENTITY(1,1), [Date Of Birth] DATE, [Name] VARCHAR(50), Height INT, [Childrens names] VARCHAR(250))
This is the best and easiest way, and it makes editing one field of a person's record simple. With your first approach you will have endless nightmares storing everything as one long string.
I have a database with over 3,000,000 rows; each has an id and an xml field stored as varchar(6000).
If I do SELECT id FROM bigtable it takes about 2 minutes to complete. Is there any way to get this down to 30 seconds?
Build a clustered index on the id column.
See http://msdn.microsoft.com/en-us/library/ms186342.aspx
You could apply indexes to your tables, in your case a clustered index.
Clustered indexes:
http://msdn.microsoft.com/en-gb/library/aa933131(v=sql.80).aspx
I would also suggest filtering your query so it doesn't return all 3 million rows each time, this can be done by using TOP or WHERE.
TOP:
SELECT TOP 1000 ID
FROM bigtable
WHERE:
SELECT ID
FROM bigtable
WHERE id IN (1,2,3,4,5)
First of all, 3 million records don't make a table "huge".
To optimize your query, you should do the following:
Filter your query; why do you need to get ALL your IDs?
Create a clustered index on the ID column to get a smaller lookup structure to search before pointing to the selected row.
Okay, why are you returning all the Ids to the client?
Even if your table has no clustered index (which I doubt), the vast majority of your processing time will be client-side, transferring the Id values over the network and displaying them on the screen.
Querying for all values rather defeats the point of having a query engine.
The only reason I can think of (perhaps I lack imagination) for getting all the Ids is some sort of misguided caching.
If you want to know how many you have, do
SELECT count(*) FROM [bigtable]
If you want to know if an Id exists do
SELECT count([Id]) FROM [bigtable] WHERE [Id] = 1 /* or some other Id */
This will return 1 row with a 1 or 0 indicating existence of the specified Id.
Both these queries will benefit massively from a clustered index on Id and will return minimal data with maximal information.
Both of these queries will return in less than 30 seconds, and in less than 30 milliseconds if you have a clustered index on Id
Selecting all the Ids will provide no more useful information than these queries, and all it will achieve is a workout for your network and client.
You could index your table for better performance.
There are additional options as well which you could use to improve performance, like the partitioning feature.
I have a SQL Server database designed like this :
TableParameter
Id (int, PRIMARY KEY, IDENTITY)
Name1 (string)
Name2 (string, can be null)
Name3 (string, can be null)
Name4 (string, can be null)
TableValue
Iteration (int)
IdTableParameter (int, FOREIGN KEY)
Type (string)
Value (decimal)
So, as you've just understood, TableValue is linked to TableParameter.
TableParameter is like a multidimensional dictionary.
TableParameter is supposed to have a lot of rows (more than 300,000).
From my C# client program, I have to fill this database after each Compute() call:
for (int iteration = 0; iteration < 5000; iteration++)
{
Compute();
FillResultsInDatabase();
}
In the FillResultsInDatabase() method, I have to:
Check whether the label of my parameter already exists in TableParameter; if it doesn't exist, I have to insert a new one.
Insert the value into TableValue.
Step 1 takes a long time! I load the whole TableParameter table into an IEnumerable property and then, for each parameter, I run a
.FirstOrDefault( x => x.Name1 == item.Name1 &&
x.Name2 == item.Name2 &&
x.Name3 == item.Name3 &&
x.Name4 == item.Name4 );
in order to detect whether it already exists (and then to get the id).
Performance is very bad like this!
I've tried making the selection with a WHERE clause to avoid loading every row of TableParameter, but performance is even worse!
How can I improve the performance of step 1 ?
For step 2, performance is still bad with classic INSERTs. I am going to try SqlBulkCopy.
How can I improve the performance of step 2 ?
EDITED
I've tried with stored procedures:
CREATE PROCEDURE GetIdParameter
@Id int OUTPUT,
@Name1 nvarchar(50) = null,
@Name2 nvarchar(50) = null,
@Name3 nvarchar(50) = null
AS
SELECT TOP 1 @Id = Id FROM TableParameter
WHERE
TableParameter.Name1 = @Name1
AND
(@Name2 IS NULL OR TableParameter.Name2 = @Name2)
AND
(@Name3 IS NULL OR TableParameter.Name3 = @Name3)
GO
CREATE PROCEDURE CreateValue
@Iteration int,
@Type nvarchar(50),
@Value decimal(32, 18),
@Name1 nvarchar(50) = null,
@Name2 nvarchar(50) = null,
@Name3 nvarchar(50) = null
AS
DECLARE @IdParameter int
EXEC GetIdParameter @IdParameter OUTPUT,
@Name1, @Name2, @Name3
IF @IdParameter IS NULL
BEGIN
INSERT TableParameter (Name1, Name2, Name3)
VALUES
(@Name1, @Name2, @Name3)
SELECT @IdParameter = SCOPE_IDENTITY()
END
INSERT TableValue (Iteration, IdTableParameter, Type, Value)
VALUES
(@Iteration, @IdParameter, @Type, @Value)
GO
I still have the same performance... :-( (not acceptable)
If I understand what's happening, you're querying the database in step 1 to see if the data is there. I'd use a single call to a stored procedure that inserts the data if it is not there. So just compute the results and pass them to the SP.
Can you compute the results first, and then insert them in batches?
Does the Compute() function take data from the database? If so, can you turn the operation into a set-based operation and perform it on the server itself, or maybe part of it?
Remember that SQL Server is designed for large set-based operations.
Edit: reflecting comments
Since the code is slow on the data inserts, and you suspect that it's because the insert has to search for existing rows before it can be done, I'd suggest placing SQL indexes on the columns you search on in order to improve search speed.
However I have another idea.
Why don't you just insert the data without the check, and then remove the duplicates later, in the query that reads the data back?
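A minimal sketch of that read-side deduplication, assuming a "duplicate" means rows sharing the same name tuple; the table and column names come from the question, everything else is illustrative:

public static class ParameterReader
{
    // Read-side deduplication: keep one Id per distinct name tuple, so
    // unchecked duplicate inserts are harmless to readers.
    public const string DedupSql = @"
        SELECT MIN(Id) AS Id, Name1, Name2, Name3, Name4
        FROM TableParameter
        GROUP BY Name1, Name2, Name3, Name4;";
}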
Given that Name2 and Name3 can be null, would it be possible to restructure the parameter table:
TableParameter
Id (int, PRIMARY KEY, IDENTITY)
Name (string)
Dimension int
Now you can index it and simplify the query (WHERE Name = 'TheNameIWant' AND Dimension = 2).
(And speaking of indexes, you do have indexes on the name columns in the parameter table, right?)
Where do you commit your inserts? If you commit per statement, group multiple inserts into one commit.
If you are the only one inserting values, and speed is really of the essence, load all the values from the database into memory and check there.
just some ideas
hth
Mario
I must admit that I'm struggling to grasp the business process that you are trying to achieve here.
On initial review, it appears as if you are performing a data comparison within your application tier. I would advise against this and suggest that you let the database engine do what it is designed to do: manage and implement your data access.
As another poster has mentioned, I concur that you should look to create a Stored Procedure to handle your record insertion logic. The procedure can perform a simple check to see if your records already exist.
You should also consider the following (a sketch of both follows this list):
Enforcing the insertion logic/rule by creating a unique constraint across the four name columns.
Creating a covering non-clustered index incorporating the four name columns.
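A minimal sketch of that one-time setup, run from C#; the constraint and index names are illustrative, and the column list comes from the question. Note that the unique constraint creates its own index over the four name columns, and since every non-clustered index in SQL Server carries the clustered key, that index alone may already cover an Id lookup; measure before adding the second one.

using Microsoft.Data.SqlClient;

public static class SchemaSetup
{
    public static void Apply(string connectionString)
    {
        using var conn = new SqlConnection(connectionString);
        conn.Open();
        using var cmd = new SqlCommand(@"
            -- Reject duplicate parameter tuples at the engine level.
            ALTER TABLE TableParameter
            ADD CONSTRAINT UQ_TableParameter_Names
            UNIQUE (Name1, Name2, Name3, Name4);

            -- Covering non-clustered index for the existence check.
            CREATE NONCLUSTERED INDEX IX_TableParameter_Names
            ON TableParameter (Name1, Name2, Name3, Name4)
            INCLUDE (Id);", conn);
        cmd.ExecuteNonQuery();
    }
}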
With regard to performance of your inserts, perhaps you can provide some metrics to qualify what it is that you are seeing and how you are measuring it?
To give you a yardstick, the current ETL insertion record for SQL Server is approx 16 million rows per second. What sort of numbers are you expecting and wanting to see?
The fastest way (that I know of so far) is bulk insert, but not just lines of INSERTs; try INSERT + SELECT + UNION ALL. It works pretty fast.
insert into myTable
select a1, b1, c1, ...
union all select a2, b2, c2, ...
union all select a3, b3, c3, ...
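Since the question mentions trying SqlBulkCopy for step 2, here is a minimal sketch of what that could look like. The DataTable layout mirrors the TableValue columns from the question; the batch size and method names are illustrative:

using System.Data;
using Microsoft.Data.SqlClient;

public static class BulkWriter
{
    public static void WriteValues(string connectionString, DataTable values)
    {
        // 'values' must have columns matching TableValue:
        // Iteration (int), IdTableParameter (int), Type (nvarchar), Value (decimal).
        using var bulk = new SqlBulkCopy(connectionString)
        {
            DestinationTableName = "TableValue",
            BatchSize = 5000 // illustrative; tune for the workload
        };
        bulk.ColumnMappings.Add("Iteration", "Iteration");
        bulk.ColumnMappings.Add("IdTableParameter", "IdTableParameter");
        bulk.ColumnMappings.Add("Type", "Type");
        bulk.ColumnMappings.Add("Value", "Value");
        bulk.WriteToServer(values);
    }
}

One bulk copy of many rows replaces thousands of single-INSERT round-trips, which is usually where most of the time goes.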