I've taken over an ASP.NET application that needs to be rewritten. The core functionality of this application that I need to replicate modifies a SQL Server database that is accessed via ODBC from third-party software.
The third-party application creates files that represent printer labels, generated by a user. These label files directly reference an ODBC source's fields. Each row of the table represents a product that populates the label's fields. (So, within these files are direct references to the column names of the table.)
The ASP.NET application allows the user to create/update the data for these fields that are referenced by the labels, by adding or editing a particular row representing a product.
It also allows the occasional addition of new fields... where it actually creates a new column in the core table that is referenced by the labels.
My concern: I've never programmatically altered an existing table's columns before. The existing application seems to handle this functionality fine, but before I blindly do the same thing in my new application, I'd like to know what sort of pitfalls exist in doing this, if any... and if there are any obvious alternatives.
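For reference, the operation in question boils down to a dynamic ALTER TABLE. A minimal sketch of what such code might look like (the table, column, and variable names here are hypothetical, not taken from the existing application), with QUOTENAME guarding the user-supplied name:

-- Hypothetical sketch: add a user-defined column to the table the labels reference.
-- QUOTENAME protects against malformed or malicious column names coming from the UI.
DECLARE @newColumn sysname = N'PromoText';  -- value would come from the ASP.NET page
DECLARE @sql nvarchar(max) =
    N'ALTER TABLE dbo.LabelData ADD ' + QUOTENAME(@newColumn) + N' nvarchar(255) NULL;';
EXEC sp_executesql @sql;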
It can become a problem when too many columns are added to a table, and you have to be careful if performance is a consideration (covering indexes may not be applicable, so expensive bookmark lookups might be performed).
The other alternative is a Key-Value Pair structure (see Key Value Pairs in Database design), but that too has its pitfalls, and you are better off creating new columns, as you are suggesting. (KVPs are good for settings.)
One option I think is to use a KVP table for storing dynamic "columns" (as first mentioned by Mitch), join the products table with the KVP table based on the product id then pivot the results in order to have all the dynamic columns in the resultset.
EDIT: something along these lines:
Prepare:
create table Product(ProductID nvarchar(50))
insert Product values('Product1')
insert Product values('Product2')
insert Product values('Product3')
create table ProductKVP(ProductID nvarchar(50), [Key] nvarchar(50), [Value] nvarchar(255))
insert ProductKVP values('Product1', 'Key2', 'Value12')
insert ProductKVP values('Product2', 'Key1', 'Value21')
insert ProductKVP values('Product2', 'Key2', 'Value22')
insert ProductKVP values('Product2', 'Key3', 'Value23')
insert ProductKVP values('Product3', 'Key4', 'Value34')
Retrieve:
declare @forClause nvarchar(max),
        @sql nvarchar(max)

select @forClause = isnull(@forClause + ',', '') + '[' + [Key] + ']' from (
    select distinct [Key] from ProductKVP /* WHERE CLAUSE */
) t

set @forClause = 'for [Key] in (' + @forClause + ')'

set @sql = '
select * from (
    select
        ProductID, [Key], [Value]
    from (
        select k.* from
            Product p
            inner join ProductKVP k on (p.ProductID = k.ProductID)
        /* WHERE CLAUSE */
    ) sq
) t pivot (
    max([Value]) ' +
@forClause + '
) pvt'

exec(@sql)
Results:
ProductID Key1 Key2 Key3 Key4
----------- --------- --------- --------- -------
Product1 NULL Value12 NULL NULL
Product2 Value21 Value22 Value23 NULL
Product3 NULL NULL NULL Value34
It very much depends on the queries you want to run against those tables. The main disadvantage of KVP is that more complex queries can become very inefficient.
A "hybrid" approach of both might be interesting.
Store the values you want to query in dedicated columns and leave the rest in an XML blob (MS SQL has nice features to even query inside the XML) or, alternatively, in a KVP bag. Personally, I really don't like KVPs in DBs because you cannot build application-logic-specific indexes anymore.
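To illustrate the XML option, a minimal sketch (the ExtraFields column and the element names are assumptions for illustration, reusing the Product table from the earlier example):

-- Hypothetical: keep rarely-queried label fields in an XML column on Product
ALTER TABLE Product ADD ExtraFields xml NULL;

UPDATE Product
SET ExtraFields = '<fields><Key2>Value12</Key2></fields>'
WHERE ProductID = 'Product1';

-- Query inside the XML blob with the xml type's value() method
SELECT ProductID,
       ExtraFields.value('(/fields/Key2)[1]', 'nvarchar(255)') AS Key2
FROM Product;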
Just another approach would be not to model the specific columns at all. You create generic "custom attribute" tables like: Attribute1, Attribute2, Attribute3, Attribute4 (for the required data type etc...) You then add meta data to your database that describes what AttrX means for a specific type of printer label.
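A rough sketch of that idea (all table and column names here are made up for illustration):

-- Generic attribute columns on the data table...
CREATE TABLE ProductAttributes
(
    ProductID  nvarchar(50),
    Attribute1 nvarchar(255),
    Attribute2 nvarchar(255),
    Attribute3 int,
    Attribute4 datetime
);

-- ...plus metadata describing what AttrX means for a given label type
CREATE TABLE LabelAttributeMeta
(
    LabelType     nvarchar(50),
    AttributeName sysname,         -- 'Attribute1', 'Attribute2', ...
    Meaning       nvarchar(100)    -- e.g. 'BatchNumber', 'ExpiryDate'
);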
Again, it really depends on how you want to use that data in the end.
One risk is the table getting too wide. I used to maintain a horrible app that added 3 columns "automagically" when new values were added to some XML (for some reason it thought everything would be a string, a date, or a number, hence the creation of 3 columns).
There are other techniques like serializing a BLOB or designing the tables differently that may help.
Related
I'm after some advice on whether there is a simpler and more efficient way of doing what I'm about to do....
I have a table with product data in SQL Server and a frontend in ASP.NET C#; this has options to export to Excel or a text file and to publish to APIs.
Now I need to build in the ability to hold certain fields, like the product description, in a different format for certain customers,
so the product table looks like this:
PT_PRODUCT | PT_DESC              | PT_SIZE
ABC123     | Super Cool Ice-Cream | small
but then for 'Customer 1' the product description needs to be 'Ice Cool Lollypop'
I was going to create a class for 'Product' in my application and fill that with the values from the main table,
then query a second table that would look like this,
CUST | PRODUCT | FIELD_ID | FIELD_VAL
CU1  | ABC123  | PT_DESC  | Ice Cool Lollypop
and would run something like
select * from table2 where cust='CU1' and product='ABC123'
for (int i = 0; i < ds.Tables[0].Rows.Count; i++)
{
    switch (ds.Tables[0].Rows[i]["FIELD_ID"].ToString())
    {
        case "PT_DESC":
            ClassProd.DESC = ds.Tables[0].Rows[i]["FIELD_VAL"].ToString();
            break;
        // and so on, updating the class
    }
}
then use the updated class to update the customer's site via the API or export to Excel etc.,
Now for the slight curve-ball: there may be around 20+ fields that need to be overridden by the customer's data. Also, going down this route I will be dictating which fields can be overridden, so I was wondering if there is a way of doing this in the original SQL select.
You could create a stored procedure and forget about having to do any of the C# to get the customer's custom products.
This left joins the CustomerProducts table on the Product Id and the Customer Id. If it is NULL, it didn't find a customer product description so it will use the default one from the Products table. If it is NOT NULL, then it found a customer product description in CustomerProducts and uses that instead.
I don't know your schema exactly, but this is the gist:
CREATE PROCEDURE GetCustomerProducts
(
    @CustomerId VARCHAR(255),
    @ProductId VARCHAR(255)
)
AS
BEGIN
    SELECT PRODUCT
        ,FIELD_ID
        ,CASE
            WHEN cp.FIELD_VAL IS NOT NULL
                THEN cp.FIELD_VAL
            ELSE p.FIELD_VAL
         END AS FIELD_VAL
    FROM Products p
    LEFT JOIN CustProducts cp ON cp.PT_PRODUCT = p.PT_PRODUCT
        AND cp.CUST = @CustomerId
    WHERE p.PT_PRODUCT = @ProductId
END
Then:
EXEC GetCustomerProducts @CustomerId = 'CU1', @ProductId = 'ABC123'
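As a side note, the CASE expression in the procedure could be written more compactly with COALESCE, which returns the first non-NULL of its arguments:

COALESCE(cp.FIELD_VAL, p.FIELD_VAL) AS FIELD_VAL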
I am using a local database for the first time with my WPF project. I have the database set up, and I am connecting fine etc. There are some columns which I want to be multiple choice, either between a few values or a whole bunch of values. The problem is, obviously, human error will produce typos now and then when inputting the data.
How would I go about making the data entry give the user a multiple choice? So for example, I have a column called "Category", and at the moment (this will be expanded later) I only want to allow the following options:
Bronze
Misc
I have the columns set to nvarchar(50) at present, but typing the same string manually constantly is not what I would like to be doing TBH... so... could I set it so that there is a list of predefined values it will accept? :)
thanks :D
You can use a CHECK constraint of any complexity on your table column(s); see the MSDN documentation on CHECK constraints.
So your table definition would be:
CREATE TABLE T
(
Category nvarchar(50) CHECK (Category in ('Bronze','Misc'))
)
If you expect your list of possible values to change in the future and you do not want to change the table definition, you can create a separate table with the list of values and use a foreign key:
CREATE TABLE Categories
(
Id int PRIMARY KEY,
CategoryName nvarchar(50)
)
INSERT INTO Categories VALUES (1, 'Bronze'), (2, 'Silver'), (3, 'Misc')
CREATE TABLE T
(
CategoryId int REFERENCES Categories
)
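Either way, invalid values are rejected by the database itself; for example, with the foreign-key version above:

INSERT INTO T (CategoryId) VALUES (1)   -- succeeds: 1 = 'Bronze'
INSERT INTO T (CategoryId) VALUES (99)  -- fails: violates the FOREIGN KEY constraint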
I have a windows service which basically watches a folder for any CSV file. Each record in the CSV file is inserted into a SQL table. If the same CSV file is put in that folder, it can lead to duplicate record entries in the table. How can I avoid duplicate insertions into the SQL table?
Try INSERT ... WHERE NOT EXISTS, where a, b and c are the relevant columns, and @a, @b and @c are the relevant values.
INSERT INTO table
(
    a,
    b,
    c
)
VALUES
(
    @a,
    @b,
    @c
)
WHERE NOT EXISTS
(
    SELECT 0 FROM table WHERE a = @a AND b = @b AND c = @c
)
The accepted answer has a syntax error and is not compatible with relational databases like MySQL.
Specifically, the following is not compatible with most databases:
values(...) where not exists
While the following is generic SQL, and is compatible with all databases:
select ... where not exists
Given that, if you want to insert a single record into a table after checking if it already exists, you can do a simple select with a where not exists clause as part of your insert statement, like this:
INSERT
INTO table_name (
primary_col,
col_1,
col_2
)
SELECT 1234,
'val_1',
'val_2'
WHERE NOT EXISTS (
SELECT 1
FROM table_name
WHERE primary_col=1234
);
Simply pass all values with the select keyword, and put the primary or unique key condition in the where clause.
Problems with the answers using WHERE NOT EXISTS are:
performance -- row-by-row processing potentially requires a very large number of table scans against the target table
NULL handling -- for every column where there might be NULLs you will have to write the matching condition in a more complicated way, like
(a = @a OR (a IS NULL AND @a IS NULL)).
Repeat that for 10 columns and voila -- you hate SQL :)
A better answer would take into account the great SET processing capabilities that relational databases provide (in short -- never use row-by-row processing in SQL if you can avoid it. If you can't -- think again and avoid it anyway).
So for the answer:
load (all) data into a temporary table (or a staging table that can be safely truncated before load)
run the insert in a "set"-way:
INSERT INTO table (<columns>)
select <columns> from #temptab
EXCEPT
select <columns> from table
Keep in mind that EXCEPT deals safely with NULLs for every kind of column ;) and that the engine can choose a high-performance join type for the matching (hash, loop, or merge join) depending on the available indexes and table statistics.
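Putting the two steps together, a minimal end-to-end sketch might look like the following (the staging table, target table, column names, and file path are all hypothetical; the CSV load is shown with BULK INSERT, but the Windows service could just as well insert into the staging table itself):

-- 1. Load the whole CSV into a staging table
CREATE TABLE #staging (a int, b int, c int);

BULK INSERT #staging
FROM 'C:\drop\incoming.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

-- 2. Insert only the rows not already present, NULL-safe thanks to EXCEPT
INSERT INTO TargetTable (a, b, c)
SELECT a, b, c FROM #staging
EXCEPT
SELECT a, b, c FROM TargetTable;

DROP TABLE #staging;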
The database for my application contains tables (not editable by the user) that are necessary for my application to run. For instance, there is a Report table containing a list of my SSRS reports.
Except for the Auto-Increment and GUID fields, the data in my Report Table should match across all databases.
To keep existing client databases in sync with the ones created from scratch, there is a database updater app that runs scripts to update the existing client base.
There are Unit Tests to ensure Reports run correctly on both types of databases. However, other than the developer's eye, there is no system check to ensure the rows, and the values in those rows, match among the tables. This is prone to human error.
To fix this, I plan to add a small report to the Unit Test report that will inform development of the following:
Records missing from the "Made From Scratch" database that exist in the "Updated" Database
Records missing from the "Updated" database that exist in the "Made From Scratch" Database
Fields that do not match between the tables
So far, I have a query to report the above information for all tables involved.
A sample query would look something like this:
--Take the fields I want to compare from TableToCompare in MadeFromScratch and put them in @First_Table_Var
--NOTE: MyFirstField should match in both tables in order to compare the values between rows
DECLARE @First_Table_Var table(
    MyFirstField Varchar(255),
    MySecondField VarChar(255),
    MyThirdField Varchar(255)
);
INSERT INTO @First_Table_Var
SELECT
    r.MyFirstField,
    r.MySecondField,
    l.MyThirdField
FROM
    MadeFromScratch.dbo.TableToCompare r
    INNER JOIN MadeFromScratch.dbo.LookUpTable l ON r.ForeignKeyID = l.PrimaryKeyID

--Take the fields I want to compare from TableToCompare in UpdatdDatabase and put them in @Second_Table_Var
DECLARE @Second_Table_Var table(
    MyFirstField Varchar(255),
    MySecondField VarChar(255),
    MyThirdField Varchar(255)
);
INSERT INTO @Second_Table_Var
SELECT
    r.MyFirstField,
    r.MySecondField,
    l.MyThirdField
FROM
    UpdatdDatabase.dbo.TableToCompare r
    INNER JOIN UpdatdDatabase.dbo.LookUpTable l ON r.ForeignKeyID = l.PrimaryKeyID
--**********************
-- CREATE OUTPUT
--**********************
--List rows that exist in @Second_Table_Var but not @First_Table_Var
--(e.g. these rows need to be added to the table in MadeFromScratch)
SELECT
    Problem = '1 MISSING ROW IN A MADE-FROM-SCRATCH DATABASE',
    hur.MyFirstField,
    hur.MySecondField,
    hur.MyThirdField
FROM
    @Second_Table_Var hur
WHERE
    NOT EXISTS
    (SELECT
        *
     FROM
        @First_Table_Var hu
     WHERE
        hu.MyFirstField = hur.MyFirstField
    )
UNION
--List rows that exist in @First_Table_Var but not @Second_Table_Var
--(e.g. these rows need to be added to the table in UpdatdDatabase)
SELECT
    Problem = '2 MISSING IN UPDATE DATABASE',
    hur.MyFirstField,
    hur.MySecondField,
    hur.MyThirdField
FROM
    @First_Table_Var hur
WHERE
    NOT EXISTS
    (SELECT
        *
     FROM
        @Second_Table_Var hu
     WHERE
        hu.MyFirstField = hur.MyFirstField
    )
UNION
--Compare fields among the tables where MyFirstField matches but the other fields do not
SELECT
    Problem = '3 MISMATCHED FIELD',
    h.MyFirstField,
    MySecondField = CASE WHEN h.MySecondField = hu.MySecondField THEN '' ELSE 'Created Value: ' + h.MySecondField + ' Updated Value: ' + hu.MySecondField END,
    MyThirdField = CASE WHEN h.MyThirdField = hu.MyThirdField THEN '' ELSE 'Created Value: ' + CAST(h.MyThirdField AS VARCHAR(4)) + ' Updated Value: ' + CAST(hu.MyThirdField AS VARCHAR(4)) END
FROM
    @First_Table_Var h
    INNER JOIN @Second_Table_Var hu on h.MyFirstField = hu.MyFirstField
WHERE
    NOT EXISTS
    (SELECT
        *
     FROM
        @Second_Table_Var hu
     WHERE
        hu.MyFirstField = h.MyFirstField and
        hu.MySecondField = h.MySecondField and
        hu.MyThirdField = h.MyThirdField
    )
ORDER BY Problem
I won't have any problem writing code to parse through the results, but this methodology feels antiquated for the following reasons:
Several queries (which essentially do the same thing) will need to be written
Maintenance for this process can get cumbersome
I would like to be able to write something where the list of tables and fields to compare is maintained in some kind of file (XML?). So, whether fields are added or changed, all the user has to do is update this file.
Is there a way to use LINQ and/or Reflection (or any feature in .NET 4.0 for that matter) where I could compare tables between two databases and maintain them like I've listed above?
Ideas are welcome. Ideas with an example would be great! :D
you said "Except for the Auto-Increment and GUID fields, the data in my Report Table should match across all databases."
I assume that these fields are ID fields, ideally, replication of the database should replicate the id fields too ensuring this will allow you to check for new inserts by ID, in case of updates, you can set a timestamp field for comparison.
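For the row-and-value comparison itself, a set-based alternative to the hand-written UNION query is to let EXCEPT do the work in both directions. A sketch, reusing the database and column names from the question (they are otherwise assumptions):

-- Rows present (or different) in the updated database but not in the made-from-scratch one
SELECT 'Missing or different in MadeFromScratch' AS Problem, d.*
FROM (
    SELECT MyFirstField, MySecondField, MyThirdField
    FROM   UpdatdDatabase.dbo.TableToCompare
    EXCEPT
    SELECT MyFirstField, MySecondField, MyThirdField
    FROM   MadeFromScratch.dbo.TableToCompare
) d;

-- And the reverse direction
SELECT 'Missing or different in UpdatdDatabase' AS Problem, d.*
FROM (
    SELECT MyFirstField, MySecondField, MyThirdField
    FROM   MadeFromScratch.dbo.TableToCompare
    EXCEPT
    SELECT MyFirstField, MySecondField, MyThirdField
    FROM   UpdatdDatabase.dbo.TableToCompare
) d;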
A database exists with two tables:
Data_t : DataID Primary Key that is Identity(1,1). Also has another field 'LEFT' TINYINT.
Data_Link_t : DataID PK and FK where DataID MUST exist in Data_t. Also has another field 'RIGHT' SMALLINT.
Coming from a Microsoft Access environment into C# and SQL Server, I'm looking for a good method of importing a record into this relationship.
The record contains information that belongs on both sides of this join (possibly inserting/updating upwards of 5,000 records at once). It would be a bonus to process the entire batch in some kind of LINQ list-type command, but even if this is done record by record, the key goal is that BOTH sides of this record should be processed in the same step.
There are countless approaches, and I'm looking at too many to determine which way I should go, so I thought it faster to ask the general public. Is LINQ an option for inserting/updating a big list like this with LINQ to SQL? Should I go record by record? What approach should I use to add a record to normalized tables that, when joined, create the full record?
Sounds like a case where I'd write a small stored proc and call that from C# - e.g. as a function on my Linq-to-SQL data context object.
Something like:
CREATE PROCEDURE dbo.InsertData(@Left TINYINT, @Right SMALLINT)
AS BEGIN
    DECLARE @DataID INT

    INSERT INTO dbo.Data_t([Left]) VALUES(@Left)
    SELECT @DataID = SCOPE_IDENTITY();
    INSERT INTO dbo.Data_Link_T(DataID, [Right]) VALUES(@DataID, @Right)
END
If you import that into your data context, you could call it with something like:
using(YourDataContext ctx = new YourDataContext())
{
    foreach(YourObjectType obj in YourListOfObjects)
    {
        ctx.InsertData(obj.Left, obj.Right);
    }
}
and let the stored proc handle all the rest (all the details, like determining and using the IDENTITY from the first table in the second one) for you.
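If the two inserts must succeed or fail as a unit, a variant of the same proc could wrap them in a transaction. A sketch (the name InsertDataTx and the use of XACT_ABORT are my own choices, not part of the original answer):

CREATE PROCEDURE dbo.InsertDataTx(@Left TINYINT, @Right SMALLINT)
AS BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;   -- any run-time error rolls back the whole transaction
    BEGIN TRANSACTION;

    DECLARE @DataID INT;
    INSERT INTO dbo.Data_t([Left]) VALUES(@Left);
    SELECT @DataID = SCOPE_IDENTITY();
    INSERT INTO dbo.Data_Link_T(DataID, [Right]) VALUES(@DataID, @Right);

    COMMIT TRANSACTION;
END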
I have never tried it myself, but you might be able to do exactly what you are asking for by creating an updateable view and then inserting records into the view.
UPDATE
I just tried it, and it doesn't look like it will work.
Msg 4405, Level 16, State 1, Line 1
View or function 'Data_t_and_Data_Link_t' is not updatable because the modification affects multiple base tables.
I guess this is just one more thing for all the Relational Database Theory purists to hate about SQL Server.
ANOTHER UPDATE
Further research has found a way to do it. It can be done with a view and an "instead of" trigger.
create table Data_t
(
    DataID int not null identity primary key,
    [LEFT] tinyint
)
GO
create table Data_Link_t
(
    DataID int not null primary key foreign key references Data_T (DataID),
    [RIGHT] smallint
)
GO
create view Data_t_and_Data_Link_t
as
select
d.DataID,
d.[LEFT],
dl.[RIGHT]
from
Data_t d
inner join Data_Link_t dl on dl.DataID = d.DataID
GO
create trigger trgInsData_t_and_Data_Link_t on Data_t_and_Data_Link_T
instead of insert
as
    -- note: @@IDENTITY is fine for a single-row insert like the example below;
    -- a multi-row insert through the view would need a different technique
    insert into Data_t ([LEFT]) select [LEFT] from inserted
    insert into Data_Link_t (DataID, [RIGHT]) select @@IDENTITY, [RIGHT] from inserted
go
insert into Data_t_and_Data_Link_t ([LEFT],[RIGHT]) values (1, 2)