When performing an insert, let's say from C# into a SQL Server table (using parameterized SQL statements), do you need to specify every table field in the insert statement?
I noticed that the fields that I do not specify in the insert default to the defaults set in the table. I don't know if that's good or bad in my insert statement to leave out fields and let the defaults take care of setting the fields I don't care about. It must be ok because it works.
You need to specify all those fields for which you want to insert a value. You do not have to specify all fields in the table!
As you noticed, any of the fields that you do not specify and that do have a default constraint on them will be set to that defined default value. It's a "good thing" (tm) - for sure! This allows you to write less T-SQL insert code - all the defined defaults will be set already. I find this to be a great feature of SQL Server (and lots of other relational databases, too) - you can initialize things like "last modified" date fields to "today" upon insert without having to specifically add those fields to your INSERT statement.
Any fields that are neither part of your INSERT statement, nor have a default value defined, will be left NULL.
Any fields that are defined as NOT NULL must be either part of the list of fields in your INSERT statement (so that you give them a specific NON NULL value), or they have to have a default constraint on them.
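To make the three cases above concrete (default applied, NULL stored, NOT NULL rejected), here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, and the Customer table and its columns are hypothetical; SQL Server behaves the same way for these cases, but the error it raises is of course different.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; the table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customer (
        Id     INTEGER PRIMARY KEY,
        Name   TEXT NOT NULL,                -- NOT NULL, no default
        Status TEXT NOT NULL DEFAULT 'new',  -- NOT NULL, with default
        Notes  TEXT                          -- nullable, no default
    )
""")

# Only Name is listed: Status takes its default, Notes is left NULL.
conn.execute("INSERT INTO Customer (Name) VALUES (?)", ("Alice",))
row = conn.execute("SELECT Name, Status, Notes FROM Customer").fetchone()
print(row)

# Omitting a NOT NULL column that has no default is rejected.
try:
    conn.execute("INSERT INTO Customer (Notes) VALUES (?)", ("no name given",))
    not_null_rejected = False
except sqlite3.IntegrityError:
    not_null_rejected = True
print(not_null_rejected)
```

The insert statement stays parameterized either way; leaving a column out only changes who supplies its value.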
This is not a behavior specific to C# or .NET.
With most databases, any fields that are omitted from an INSERT statement are assigned the default values, or if nullable are stored as NULL. This is standard, and expected behavior.
As for whether this is good or bad - it depends. Personally, I prefer to specify the values of all fields in any table I am inserting into so that future maintainers realize that I chose to insert the values by intent. However, in some cases, there are calculations or trigger-based columns which the application cannot set - in which case I allow the database to handle this.
If you really want to be thorough and clear - you can perform all of your INSERTs through stored procedures - this way the knowledge is captured in the database layer and can be leveraged by any applications that interact with the database.
Yes, it is OK to leave out fields in an insert statement if the defaults are good enough for you.
Well, seeing as it works, it's OK if it's what you want to achieve. If you have nullable fields and fields with default values, and nothing to put in them - well, that's just fine.
This is what default values are there for. They help you to:
save your time
avoid writing unnecessary code
It is fine if the default values are the values that are correct for the record you are inserting. What concerns me is that you talk about them as fields that you don't care about. You should care what goes into them; it can lead to serious data problems to not care. Those fields are there for a reason; you need to understand what the defaults mean and if they are appropriate.
I wanted to see what others have experienced when working with types like List<> or Dictionary<> and then having to store and retrieve that data.
Here's an example scenario: users will be creating their own "templates", where each template is essentially a Dictionary, e.g. for user1 the values are (1, Account), (2, Bank), (3, Code), (4, Savings), and for user2 the (unrelated) values could be (1, Name), (2, Grade), (3, Class), and so on. These templates/lists could be of varying length, but they will always have an index and a value. Also, each list/template will have one and only one User linked to it.
What types did you choose on the database side?
Any pain points and/or advice I should be aware of?
As far as the types within the collection go, there is a fairly 1-to-1 mapping between .Net types and SQL types: SQL Server Data Type Mappings. You mostly need to worry about string fields:
Will they always be standard ASCII characters (0 - 127), or characters covered by the single 8-bit code page of the column's collation (0 - 255)? Then use VARCHAR. If they might contain other Unicode characters, then use NVARCHAR.
What is their likely max length?
Of course, sometimes you might want to use a slightly different numeric type in the database. The main reason would be if an int was chosen on the app side because it is "easier" (or so I have been told) to deal with than Int16 and byte, but the values will never be above 32,767 or 255; in that case you should most likely use SMALLINT or TINYINT respectively. The difference between int and byte in terms of memory in the app layer might be minimal, but it does have an impact in terms of physical storage, especially as row counts increase. And if that is not clear, "impact" means slowing down queries and sometimes costing more money when you need to buy more SAN space. But the reason I said to "most likely use SMALLINT or TINYINT" is that if you have Enterprise Edition and have Row Compression or Page Compression enabled, then the values will be stored in the smallest datatype that they will fit in.
As far as retrieving the data from the database, that is just a simple SELECT.
As far as storing that data (at least in terms of doing it efficiently), well, that is more interesting :). A nice way to transport a list of fields to SQL Server is to use Table-Valued Parameters (TVPs). These were introduced in SQL Server 2008. I have posted a code sample (C# and T-SQL) in this answer on a very similar question here: Pass Dictionary<string,int> to Stored Procedure T-SQL. There is another TVP example on that question (the accepted answer), but instead of using IEnumerable<SqlDataRecord>, it uses a DataTable which is an unnecessary copy of the collection.
EDIT:
With regards to the recent update of the question that specifies the actual data being persisted, that should be stored in a table similar to:
UserID INT NOT NULL,
TemplateIndex INT NOT NULL,
TemplateValue VARCHAR(100) NOT NULL
The PRIMARY KEY should be (UserID, TemplateIndex) as that is a unique combination. There is no need (at least not with the given information) for an IDENTITY field.
The TemplateIndex and TemplateValue fields would get passed in the TVP as shown in my answer to the question that I linked above. The UserID would be sent by itself as a second SqlParameter. In the stored procedure, you would do something similar to:
INSERT INTO SchemaName.TableName (UserID, TemplateIndex, TemplateValue)
SELECT @UserID,
       tmp.TemplateIndex,
       tmp.TemplateValue
FROM   @ImportTable tmp;
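For illustration, the round trip of a Dictionary through such a table can be sketched as follows. This is only a sketch under assumptions: Python's sqlite3 stands in for SQL Server, executemany stands in for the table-valued parameter, and the UserTemplate table name and the two helper functions are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE UserTemplate (
        UserID        INTEGER NOT NULL,
        TemplateIndex INTEGER NOT NULL,
        TemplateValue TEXT    NOT NULL,
        PRIMARY KEY (UserID, TemplateIndex)  -- unique combination, no IDENTITY needed
    )
""")

def save_template(conn, user_id, template):
    """Store one row per dictionary entry (executemany plays the role of the TVP)."""
    rows = [(user_id, index, value) for index, value in template.items()]
    conn.executemany(
        "INSERT INTO UserTemplate (UserID, TemplateIndex, TemplateValue) "
        "VALUES (?, ?, ?)",
        rows,
    )

def load_template(conn, user_id):
    """Rebuild the dictionary for one user with a simple SELECT."""
    cur = conn.execute(
        "SELECT TemplateIndex, TemplateValue FROM UserTemplate "
        "WHERE UserID = ? ORDER BY TemplateIndex",
        (user_id,),
    )
    return dict(cur.fetchall())

save_template(conn, 1, {1: "Account", 2: "Bank", 3: "Code", 4: "Savings"})
save_template(conn, 2, {1: "Name", 2: "Grade", 3: "Class"})
user1 = load_template(conn, 1)
user2 = load_template(conn, 2)
print(user1, user2)
```

Because each entry is its own row, a single value can be updated or queried without touching the rest of the template.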
And just to have it stated explicitly, unless there is a very specific reason for doing so (which would need to include never, ever needing to use this data in any queries, such that this data is really just a document and no more usable in queries than a PDF or image), then you shouldn't serialize it to any format. Though if you were inclined to do so, XML is a better choice than JSON, at least for SQL Server, as there is built-in support for interacting with XML data in SQL Server but not so much for JSON.
A List, or any other collection, is supposed to be represented in a database as a table. Always think of it as a collection and relate it to what a database offers.
Though you can always serialize a collection, I do not suggest it: to update or insert a single entry you would have to rewrite the whole serialized value, whereas with a table you only have to query by the key, which a Dictionary already gives you.
I have many tables in the database that have at least one column that contains a Url. And these are repeated a lot throughout the database. So I normalize them to a dedicated table and I just use numeric IDs everywhere I need them. I often need to join them, so numeric IDs are much better than full strings.
In MySql + C++, to insert a lot of Urls in one strike, I used to use multi-row INSERT IGNOREs or mysql_set_local_infile_handler(). Then batch SELECT with IN () to pull the IDs back from the database.
In C# + SQLServer I noticed there's a SqlBulkCopy class that's very useful and fast in mass-insertion. But I also need mass-selection to resolve the Url IDs after I insert them. Is there any such helper class that would work the same as SELECT WHERE IN (many, urls, here)?
Or do you have a better idea for turning Urls into numbers in a consistent manner in C#? I thought about crc32'ing the urls or crc64'ing them but I worry about collisions. I wouldn't care if collisions are few, but if not... it would be an issue.
PS: We're talking about tens of millions of Urls to get an idea of scale.
PS: For basic large insert, SQLBulkCopy is faster than SqlDbType.Structured. Plus it has the SqlRowsCopied event for a status tracking callback.
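As a rough feel for the collision worry, the usual birthday-problem approximation can be computed directly. This is a back-of-the-envelope sketch that assumes the hash spreads values uniformly (a CRC is not designed as a hash, so real results may be worse) and takes 10 million as a stand-in for "tens of millions"; the function name is hypothetical.

```python
def expected_colliding_pairs(n_items, hash_bits):
    """Birthday approximation: expected number of colliding pairs when
    n_items values are hashed uniformly into 2**hash_bits buckets."""
    buckets = 2 ** hash_bits
    return n_items * (n_items - 1) / (2 * buckets)

n_urls = 10_000_000  # assumption: lower end of "tens of millions"
pairs_32 = expected_colliding_pairs(n_urls, 32)
pairs_64 = expected_colliding_pairs(n_urls, 64)
print(f"32-bit: about {pairs_32:,.0f} colliding pairs")
print(f"64-bit: about {pairs_64:.2e} colliding pairs")
```

Under these assumptions a 32-bit value gives on the order of ten thousand colliding pairs at this scale, while a 64-bit value gives far less than one, so a 32-bit checksum cannot serve as a unique ID without a fallback comparison on the full string.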
There is even a better way than SQLBulkCopy.
It's called Structured Parameters and it allows you to pass a table-valued parameter to stored procedure or query through ADO.NET.
There are code examples in the article, so I will only highlight what you need to do to get it up and working:
Create a user-defined table type in the database. You can call it UrlTable.
Set up a stored procedure or query which does the SELECT by joining with a table variable of type UrlTable.
In your backing code (C#), create a DataTable with the same structure as UrlTable, populate it with URLs and pass it to a SqlCommand as a structured parameter. Note that column order correspondence is critical between the data table and the table type.
What ADO.NET does behind the scenes (if you profile the query you can see this) is that before the query it declares a variable of type UrlTable and populates it (INSERT statements) with what you pass in the structured parameter.
Other than that, query-wise, you can do pretty much everything with table-valued parameters in SQL (join, select, etc).
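The "send a table of URLs, join, get the IDs back" idea can be sketched without SQL Server at hand. The following assumes Python's sqlite3 as a stand-in: a temp table plays the role of the table-valued parameter, and SQLite's INSERT OR IGNORE is used to skip URLs that are already stored. The Url and UrlBatch tables and the resolve_url_ids helper are hypothetical names.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Url (Id INTEGER PRIMARY KEY, Address TEXT NOT NULL UNIQUE)"
)

def resolve_url_ids(conn, urls):
    """Insert any unknown URLs, then return {url: id} for the whole batch."""
    # The temp table plays the role of the table-valued parameter.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS UrlBatch (Address TEXT NOT NULL)")
    conn.execute("DELETE FROM UrlBatch")
    conn.executemany("INSERT INTO UrlBatch (Address) VALUES (?)", [(u,) for u in urls])
    # Add URLs that are not stored yet; existing ones are skipped.
    conn.execute(
        "INSERT OR IGNORE INTO Url (Address) SELECT DISTINCT Address FROM UrlBatch"
    )
    # One set-based join replaces SELECT ... WHERE Address IN (many, urls, here).
    cur = conn.execute(
        "SELECT u.Address, u.Id FROM Url u JOIN UrlBatch b ON b.Address = u.Address"
    )
    return dict(cur.fetchall())

first = resolve_url_ids(conn, ["http://a.example/", "http://b.example/"])
second = resolve_url_ids(conn, ["http://b.example/", "http://c.example/"])
print(first, second)
```

A URL that appears in both batches resolves to the same ID each time, which is the consistency the question asks for.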
I think you could use the IGNORE_DUP_KEY option on your index. If you set IGNORE_DUP_KEY = ON on the index of the URL column, the duplicate values are simply ignored and the rest are inserted appropriately.
The Client table:
Id (PK), int, not null (IDENTITY)
NoClient, int, not null
The form (wireframe):
The field NoClient should be a number
The field NoClient should be unique
The field NoClient should be auto-generated if null
The field NoClient is for reference only
The field NoClient is NOT the primary key
The field NoClient is NOT the identity column
How to solve that problem SQL-wise?
EDIT. I'm talking about the NoClient column, not ID.
Strictly interpreting those rules, there is no solution. One of those rules is either not correct, or not precise. You can't solve it with an AFTER trigger, because you can't attempt to insert a blank into a numeric field, nor can you with a BEFORE trigger. You can't use a default either.
Now, if you mean that "when left blank" means "when left null", then you can solve it with a very carefully crafted BEFORE TRIGGER. (Or an AFTER TRIGGER, if you can change the field to a nullable int)
If you mean that "when left blank" means that you don't mention the column in your insert/update, then you might be able to get by with a carefully crafted default, by converting a call to GUID via NewID to a very large number.
As a side note, I would tell the designer to go back and redesign it, because whatever solution you do finally come up with, it is not very scalable, and a PITA to do correctly. You have to basically lock the entire table (from reading and writing), do an entire table/index scan to make sure the value you come up with is UNIQUE. You probably should be using the ID field as the client no, possibly seeding the identity with something not starting with 0.
Execute SET IDENTITY_INSERT Client ON first, insert the row with your own ID, then execute SET IDENTITY_INSERT Client OFF again.
How would I get the primary key ID number from a Table without making a second trip to the database in LINQ To SQL?
Right now, I submit the data to a table, and make another trip to figure out what id was assigned to the new field (in an auto increment id field). I want to do this in LINQ To SQL and not in Raw SQL (I no longer use Raw SQL).
Also, the second part of my question is: I am always careful to know the ID of a user that's online because I'd rather call their information in various tables using their ID as opposed to using a GUID or a username, which are all long strings. I do this because I think that SQL Server doing a numeric compare is much (?) more efficient than doing a username (string) or even a guid (very long string) compare. My question is, am I more concerned than I should be? Is the difference worth always keeping the userid (int32) in, say, session state?
@RedFilter provided some interesting/promising leads for the first question. Because I am unable to try them at this stage, can anyone confirm the changes he recommended in the comments section of his answer?
If you have a reference to the object, you can just use that reference and call the primary key after you call db.SubmitChanges(). The LINQ object will automatically update its (Identifier) primary key field to reflect the new one assigned to it via SQL Server.
Example (vb.net):
Dim db As New NorthwindDataContext
Dim prod As New Product
prod.ProductName = "cheese!"
db.Products.InsertOnSubmit(prod)
db.SubmitChanges()
MessageBox.Show(prod.ProductID)
You could probably include the above code in a function and return the ProductID (or equivalent primary key) and use it somewhere else.
EDIT: If you are not doing atomic updates, you could add each new product to a separate Collection and iterate through it after you call SubmitChanges. I wish LINQ provided a 'database sneak peek' like a dataset would.
Unless you are doing something out of the ordinary, you should not need to do anything extra to retrieve the primary key that is generated.
When you call SubmitChanges on your Linq-to-SQL datacontext, it automatically updates the primary key values for your objects.
Regarding your second question - there may be a small performance improvement by doing a scan on a numeric field as opposed to something like varchar() but you will see much better performance either way by ensuring that you have the correct columns in your database indexed. And, with SQL Server if you create a primary key using an identity column, it will by default have a clustered index over it.
Linq to SQL automatically sets the identity value of your class with the ID generated when you insert a new record. Just access the property. I don't know if it uses a separate query for this or not, having never used it, but it is not unusual for ORMs to require another query to get back the last inserted ID.
Two ways you can do this independent of Linq To SQL (that may work with it):
1) If you are using SQL Server 2005 or higher, you can use the OUTPUT clause:
"Returns information from, or expressions based on, each row affected by an INSERT, UPDATE, or DELETE statement. These results can be returned to the processing application for use in such things as confirmation messages, archiving, and other such application requirements. Alternatively, results can be inserted into a table or table variable."
2) Alternately, you can construct a batch INSERT statement like this:
insert into MyTable
(field1)
values
('xxx');
select scope_identity();
which works at least as far back as SQL Server 2000.
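The same "insert and get the generated key back in one round trip" pattern can be tried out quickly with Python's built-in sqlite3 module. This is only an analogy under assumptions: SQLite is not SQL Server, and cursor.lastrowid is the sqlite3 counterpart of reading scope_identity() after the insert. The table mirrors the hypothetical MyTable above.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (Id INTEGER PRIMARY KEY AUTOINCREMENT, field1 TEXT)"
)

# The cursor that ran the INSERT exposes the generated key directly,
# so no separate lookup query is needed.
cur = conn.execute("INSERT INTO MyTable (field1) VALUES (?)", ("xxx",))
new_id = cur.lastrowid

stored = conn.execute(
    "SELECT field1 FROM MyTable WHERE Id = ?", (new_id,)
).fetchone()
print(new_id, stored)
```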
In T-SQL, you could use the OUTPUT clause, saying:
INSERT table (columns...)
OUTPUT inserted.ID
SELECT columns...
So if you can configure LINQ to use that construct for doing inserts, then you can probably get it back easily. But whether LINQ can get a value back from an insert, I'll let someone else answer that.
Calling a stored procedure from LINQ that returns the ID as an output parameter is probably the easiest approach.
I have several tables within my database that contain nothing but "metadata".
For example, we have different grouptypes, contentItemTypes, languages, etc.
The problem is, if you use automatic numbering then it is possible that you create gaps.
The IDs are used within our code, so the number is very important.
Now I wonder if it isn't better not to use autonumbering within these tables?
Now we have to create the row in the database first, before we can write our code. And in my opinion this should not be the case.
What do you guys think?
I would use an identity column, as you suggest, as your primary key (surrogate key), and then assign your candidate key (the identifier from your system) to a standard column with a unique constraint applied to it. This way you can ensure you do not insert duplicate records.
Make sense?
If these are FK tables used just to expand codes into a description or contain other attributes, then I would NOT use an IDENTITY. Identity columns are good for continually inserted user data; metadata tables are usually static. When you deploy an update to your code, you don't want to be surprised by an IDENTITY value different from the one you expect.
For example, you add a new value to the "Languages" table and expect the ID to be 6, but for some reason (development is out of sync, another person has not implemented their next language type, etc.) the next identity you get is different, say 7. You then insert or convert a bunch of rows using Language ID=6, which all fail because it does not exist (it is 7 in the metadata table). Worse yet, they all actually insert or update, because the value 6 you thought was yours was already in the metadata table; you now have a mix of two items sharing the same 6 value, and your new 7 value is left unused.
I would pick the proper data type based on how many codes you need and how often you will need to look at it (CHARs are nice to look at for a few values, helps with memory).
For example, if you only have a few groups, and you'll often look at the raw data, then a char(1) may be good:
GroupTypes table
-----------------
GroupType char(1) --'M'=manufacturing, 'P'=purchasing, 'S'=sales
GroupTypeDescription varchar(100)
However, if there are many different values, then some form of an int (tinyint, smallint, int, bigint) may do it:
EmailTypes table
----------------
EmailType smallint --2 bytes, up to 32k different positive values
EmailTypeDescription varchar(100)
If the numbers are hardcoded in your code, don't use identity fields. Hardcode them in the database as well, as they'll be less prone to changing because someone scripted a database badly.
I would also use an identity column as the primary key, just for the simplicity of inserting the records into the database, but then use a column for the type of metadata, which I call LookUpType (int), as well as columns for LookUpId (the int value used in code or as the value in select lists) and LookUpName (string). If those values require additional settings, so to speak, use extra columns. I personally use two extras: LookUpKey for hierarchical relations, and LookUpValue for abbreviations or alternate values of LookUpName.
Well, if those numbers are important to you because they'll be in code, I would probably not use an IDENTITY.
Instead, just make sure you use an INT column and make it the primary key - in that case, you will have to provide the IDs yourself, and they'll have to be unique.
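A small sketch of what "provide the IDs yourself" can look like when the same numbers live in code: the application defines the codes once, seeds the lookup table with exactly those values, and can check at deploy time that both sides agree. It assumes Python's sqlite3 as a stand-in for SQL Server, and the Language enum, its values and the Languages table are hypothetical.

```python
import sqlite3
from enum import IntEnum

class Language(IntEnum):
    """Hypothetical codes that are hard-coded in the application."""
    ENGLISH = 1
    DUTCH = 2
    FRENCH = 3

# Assumption: in-memory SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Languages (
        LanguageID INTEGER NOT NULL PRIMARY KEY,  -- supplied by us, not generated
        Name       TEXT    NOT NULL UNIQUE
    )
""")

# Seed the table with the exact IDs the code relies on.
conn.executemany(
    "INSERT INTO Languages (LanguageID, Name) VALUES (?, ?)",
    [(int(member), member.name.title()) for member in Language],
)

# A deployment check: the IDs in the database must match the ones in code.
db_ids = {row[0] for row in conn.execute("SELECT LanguageID FROM Languages")}
in_sync = db_ids == {int(member) for member in Language}

dutch = conn.execute(
    "SELECT Name FROM Languages WHERE LanguageID = ?", (int(Language.DUTCH),)
).fetchone()
print(in_sync, dutch)
```

With explicit IDs, a seed script like this produces the same rows in every environment, which is the property the hard-coded values depend on.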