When I write C# applications, I usually express SQL relations as inner joins in the query itself, for example:
select xxx from TableA as A inner join TableB...
I don't really see why I should also define these relations (hard-defined) in Management Studio.
Should I, and if so, why?
Regards
Two main reasons and a minor third one: data-integrity and performance - and documentation.
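Performance: the column a relationship points to has to be a primary key or a unique column, so it is always backed by an index. Note that SQL Server does not automatically index the referencing (foreign key) column; you normally add that index yourself, and your database will use these indexes to speed up look-ups and joins when they can be used. A trusted foreign key also gives the optimizer extra information, which in some queries lets it skip the referenced table altogether.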
As for data-integrity: even though you leave the important part of your join out of your example, a join assumes that a foreign key field can only contain values that exist in the primary key field. Your database can make sure of that, and will make sure of that, when you define those relations in SQL Server.
If you do not define those relationships, you could easily create a situation where you have, say, Orders, that belong to Customer 12345, who does not exist in your Customer table.
Or you could delete Customer 23456, leaving all their Orders in your system, but without an existing Customer.
A final reason is documentation: your database is not only made to be accessed by your code. If someone else accesses your database and sees only unconnected, unrelated tables, how will they know that field cstID in table CstOrderHdr happens to be a reference to field id in table Relations where Relations.RelTyp = 'Customer'? Who is to stop them from filling in 0, null or a random number?
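To make the integrity point concrete, here is a minimal T-SQL sketch. The tables dbo.Customers and dbo.Orders, their columns, and the constraint and index names are all hypothetical, chosen only to mirror the Orders/Customer example above; the index on the referencing column is created explicitly.

```sql
-- Hypothetical tables for illustration only.
CREATE TABLE dbo.Customers (
    CustomerId int NOT NULL CONSTRAINT PK_Customers PRIMARY KEY,
    Name       nvarchar(100) NOT NULL
);

CREATE TABLE dbo.Orders (
    OrderId    int NOT NULL CONSTRAINT PK_Orders PRIMARY KEY,
    CustomerId int NOT NULL
        CONSTRAINT FK_Orders_Customers
        FOREIGN KEY REFERENCES dbo.Customers (CustomerId)
);

-- Index on the referencing column, added explicitly to support
-- joins and look-ups by customer.
CREATE INDEX IX_Orders_CustomerId ON dbo.Orders (CustomerId);

INSERT INTO dbo.Customers (CustomerId, Name) VALUES (23456, N'Existing customer');
INSERT INTO dbo.Orders (OrderId, CustomerId) VALUES (1, 23456);  -- succeeds

-- Rejected with a foreign key violation: customer 12345 does not exist.
INSERT INTO dbo.Orders (OrderId, CustomerId) VALUES (2, 12345);

-- Also rejected (default ON DELETE NO ACTION): customer 23456 still has orders.
DELETE FROM dbo.Customers WHERE CustomerId = 23456;
```

With the constraint in place, both of the "orphaned order" situations described above are stopped by the database rather than by application code.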
Are they the same thing, and which one is the better way to query data from several tables? I prefer the projection method because it's quite simple.
Projection
https://benjii.me/2018/01/expression-projection-magic-entity-framework-core/
Join table
https://entityframeworkcore.com/querying-data-joining
Select is probably better if you're talking about Entity Framework, if it's an option, and you're just following the foreign keys (i.e. navigation properties).
However, if you need to join in an unusual way, particularly on unrelated data, then join is sometimes the better (or sometimes the only) option.
For instance, getting a count of logins every day against a count of signups every day, would use a join, because login count and signup count are using two completely unrelated tables and joining on date, not userID.
Getting a list of activities belonging to all active users, you can do with a Select, because the "belonging to" relationship is represented by the FK.
Finally, don't forget Select allows you to take a subset of data from a single table, whereas join requires you to be joining two tables together to then filter to a subset.
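In SQL terms, the login/signup case might look roughly like the sketch below. The tables dbo.Logins and dbo.Users and the columns LoggedInAt and CreatedAt are hypothetical; the point is only that the join condition is a date, not a foreign key, so there is no navigation property to follow.

```sql
-- Hypothetical tables: dbo.Logins(LoggedInAt), dbo.Users(CreatedAt).
SELECT
    l.LoginDate,
    l.LoginCount,
    s.SignupCount
FROM (
    SELECT CAST(LoggedInAt AS date) AS LoginDate, COUNT(*) AS LoginCount
    FROM dbo.Logins
    GROUP BY CAST(LoggedInAt AS date)
) AS l
INNER JOIN (
    SELECT CAST(CreatedAt AS date) AS SignupDate, COUNT(*) AS SignupCount
    FROM dbo.Users
    GROUP BY CAST(CreatedAt AS date)
) AS s
    ON s.SignupDate = l.LoginDate   -- joined on date, not on userID
ORDER BY l.LoginDate;
```

An inner join drops days that have logins but no signups (or the reverse); a FULL OUTER JOIN would keep them if that matters.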
I need to do a BULK INSERT of several hundred-thousand records across 3 tables. A simple breakdown of the tables would be:
TableA
--------
TableAID (PK)
TableBID (FK)
TableCID (FK)
Other Columns
TableB
--------
TableBID (PK)
Other Columns
TableC
--------
TableCID (PK)
Other Columns
The problem with a bulk insert, of course, is that it only works with one table so FK's become a problem.
I've been looking around for ways to work around this, and from what I've gleaned from various sources, using a SEQUENCE column might be the best bet. I just want to make sure I have correctly cobbled together the logic from the various threads and posts I've read on this. Let me know if I have the right idea.
First, I would modify the tables to look like this:
TableA
--------
TableAID (PK)
TableBSequence
TableCSequence
Other Columns
TableB
--------
TableBID (PK)
TableBSequence
Other Columns
TableC
--------
TableCID (PK)
TableCSequence
Other Columns
Then, from within the application code, I would make five calls to the database with the following logic:
Request X Sequence numbers from TableC, where X is the known number of records to be inserted into TableC. (1st DB call.)
Request Y Sequence numbers from TableB, where Y is the known number of records to be inserted into TableB (2nd DB call.)
Modify the existing objects for A, B and C (which are models generated to mirror the tables) with the now known Sequence numbers.
Bulk insert to TableA. (3rd DB call)
Bulk insert to TableB. (4th DB call)
Bulk insert to TableC. (5th DB call)
And then, of course, we would always join on the Sequence.
I have three questions:
Do I have the basic logic correct?
In Tables B and C, would I remove the clustered index from the PK and put it on the Sequence instead?
Once the Sequence numbers are requested from Tables B and C, are they then somehow locked between the request and the bulk insert? I just need to make sure that between the request and the insert, some other process doesn't request and use the same numbers.
Thanks!
EDIT:
After typing this up and posting it, I've been reading deeper into the SEQUENCE documentation. I think I misunderstood it at first. SEQUENCE is not a column type. For the actual column in the table, I would just use an INT (or maybe a BIGINT, depending on the number of records I expect to have). The actual SEQUENCE object is an entirely separate entity whose job is to generate numeric values on request and keep track of which ones have already been generated. So, if I understand correctly, I would create two SEQUENCE objects, one to be used in conjunction with Table B and one with Table C.
So that answers my third question.
Do I have the basic logic correct?
Yes. The other common approach here is to bulk load your data into a staging table, and do something similar on the server-side.
From the client you can request ranges of sequence values using the sp_sequence_get_range stored procedure.
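As a rough sketch of what that call could look like: the sequence name dbo.TableBSeq and the range size of 1000 are made up for illustration, and the first value comes back as a sql_variant that needs converting.

```sql
-- Hypothetical sequence used to hand out key values for TableB.
CREATE SEQUENCE dbo.TableBSeq AS bigint START WITH 1 INCREMENT BY 1;
GO

DECLARE @first sql_variant;

-- Reserve 1000 consecutive values in one call.
EXEC sys.sp_sequence_get_range
    @sequence_name     = N'dbo.TableBSeq',
    @range_size        = 1000,
    @range_first_value = @first OUTPUT;

-- With INCREMENT BY 1 the reserved range is [first, first + 999].
SELECT CONVERT(bigint, @first)       AS FirstValue,
       CONVERT(bigint, @first) + 999 AS LastValue;
```

The values in the returned range are consumed from the sequence by the call itself, so another caller asking for a range afterwards gets different numbers; no lock has to be held between the request and the bulk insert.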
In Tables B and C, would I remove the clustered index from the PK
No, as you later noted the sequence just supplies the PK values for you.
Sorry, I read your question wrong at first. I see now that you are trying to generate your own PKs rather than allow MS SQL to generate them for you. Scratch my above comment.
As David Browne mentioned, you might want to use a staging table to avoid the strain you'll put on your app's heap. Use tempdb and do the modifications directly on the staging tables, using a single transaction for each table. Then copy the staging tables over to their targets, or use a MERGE if appending. If you are enforcing FKs, you can either temporarily disable those constraints or insert in reverse order (C => B => A) so that the referenced rows exist first. You may also want to consider temporarily dropping indexes if you experience performance issues during the insert. Lastly, consider using SSIS instead of a custom app.
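One possible shape for the staging-table route is sketched below. It rests on assumptions that are not in the question: that TableB and TableC each have a natural key (here the hypothetical columns BKey and CKey), that TableBID and TableCID are generated by the server (IDENTITY or a sequence default), and that the staging table #StageA has already been bulk loaded from the client.

```sql
-- Hypothetical staging table, bulk loaded from the client beforehand.
CREATE TABLE #StageA (
    BKey       nvarchar(50)  NOT NULL,  -- natural key of the TableB row
    CKey       nvarchar(50)  NOT NULL,  -- natural key of the TableC row
    OtherValue nvarchar(100) NULL
);

BEGIN TRANSACTION;

-- Referenced tables first (C, then B), so the FKs on TableA are satisfied.
INSERT INTO dbo.TableC (CKey)
SELECT DISTINCT s.CKey
FROM #StageA AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.TableC AS c WHERE c.CKey = s.CKey);

INSERT INTO dbo.TableB (BKey)
SELECT DISTINCT s.BKey
FROM #StageA AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.TableB AS b WHERE b.BKey = s.BKey);

-- Finally TableA, resolving the generated keys through the natural keys.
INSERT INTO dbo.TableA (TableBID, TableCID, OtherValue)
SELECT b.TableBID, c.TableCID, s.OtherValue
FROM #StageA AS s
INNER JOIN dbo.TableB AS b ON b.BKey = s.BKey
INNER JOIN dbo.TableC AS c ON c.CKey = s.CKey;

COMMIT TRANSACTION;
```

Because the parents are inserted before the children, the FK constraints can stay enabled throughout.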
I have two tables, one containing patient information, the other, the notes for each patient.
(One patient, many notes for a patient).
Given this, in the Designer (which you access by right-clicking on the chosen DataSet), how do I create a one-to-many relationship? I have never performed this before.
Secondly, for the patient notes table, how would I add a note to a patient record using SQL syntax? Note, this is not updating an existing one, but adding a completely new one to the patientNotes table using the unique patient ID number as the reference (so only that specific patient has that note added to them, not them and everyone else).
Very technically speaking, you don't need to do anything to create a one-to-many relationship. You just have to have the two tables set up as you have them and use them as you intend on using them. I work in data warehousing and unfortunately a great many of our relationships like this are not formalized with any sort of key or constraint.
The correct way to do it is to implement a foreign key constraint on the patient ID column of the patientNotes table. A FK will only allow you to insert data into patientNotes IF the patient ID exists in the patient table. If you try to insert a note with a patient ID that doesn't exist in the patient table, the insert fails and the SQL engine gives you an error. Note that the column on the patient table that the FK points to must be a primary key (or have a unique constraint).
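A minimal sketch of that constraint follows. The parent table name dbo.patients, its key column patientId, and the constraint name are assumptions; adjust them to your actual schema.

```sql
-- Assumes dbo.patients(patientId) is the primary key of the patient table.
ALTER TABLE dbo.patientNotes
ADD CONSTRAINT FK_patientNotes_patients
    FOREIGN KEY (patientId)
    REFERENCES dbo.patients (patientId);
```

Adding the constraint fails if patientNotes already contains rows whose patientId has no match, which is a useful check in itself.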
Inserting data will really go as any other insert would:
INSERT INTO dbo.patientNotes (patientId, NoteText)
VALUES (4265, 'During his 8/14/2014 visit, Mr. Cottinsworth complained of chest pains. Evidently he has been wearing a lady''s corset to hide his large gut. Advised the very portly Mr. Cottinsworth to discontinue corset use');
You could toss that in a SP, put it in your code and use parameters for the patientId and NoteText, however you wanted to do it.
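If you go the stored procedure route, it could look something like this. The procedure name dbo.AddPatientNote and the parameter types are guesses, since the column definitions aren't shown in the question.

```sql
-- Hypothetical procedure; parameter types should match the real columns.
CREATE PROCEDURE dbo.AddPatientNote
    @patientId int,
    @NoteText  nvarchar(max)
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.patientNotes (patientId, NoteText)
    VALUES (@patientId, @NoteText);
END;
GO

-- Example call: adds one new note for patient 4265 only.
EXEC dbo.AddPatientNote
    @patientId = 4265,
    @NoteText  = N'Follow-up scheduled.';
```

Passing the values as parameters, whether to a procedure or to an inline INSERT, also avoids having to escape quotes in the note text by hand.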
As far as doing this all graphically in Visual Studio, I can't be of much help there. I typically use the TSQL editor and type out what I want to do to the DB. I'm sure tutorials abound on how to set up FKs in Visual Studio.
Further reading:
http://msdn.microsoft.com/en-us/library/ms189049.aspx
http://www.scarydba.com/2010/11/22/do-foreign-key-constraints-help-performance/
what are the advantages of defining a foreign key
I'm trying out using Dapper for my data access (in ASP.NET MVC3 FWIW). I have a T-SQL view (in SQL Server) which is something like this:
SELECT s.*, c.CompanyId AS BreakPoint, c.Name AS CompanyName
FROM tblStaff AS s
INNER JOIN tblCompanies AS c ON c.CompanyId = s.CompanyId
So pretty simple. Essentially a list of staff each of which have a single company.
The problem I'm having is that I'm trying to map the output of this query onto my POCOs, but because each field in the View has to be unique (i.e. CompanyName instead of Name which already exists in tblStaff) the mapping to POCOs isn't working.
Here's the code:
var sql = @"select * from qryStaff";
var people = _db.Query<Person, Company, Person>(sql, (person, company) => {person.Company = company; return person;}, splitOn: "BreakPoint");
Any advice how I might solve this puzzle? I'm open to changing the way I do views as right now I'm stumped about how to progress.
You should explicitly list all the fields returned from your view (no asterisks!) and, where the field names are not unique, make use of aliases to deduplicate. As an example:
SELECT
s.CompanyName as CompanyName1,
s.BreakPoint as BreakPoint1,
...
c.CompanyId AS BreakPoint,
c.Name AS CompanyName
FROM tblStaff AS s
INNER JOIN tblCompanies AS c ON c.CompanyId = s.CompanyId
The fields listed and the aliases you might use depend, of course, entirely on your code. Typically you adjust the aliases in your query to match the property names of the POCO.
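One way to apply this without touching the view is to do the re-aliasing in the statement you hand to Dapper, since an ad hoc SELECT (unlike a view) may return two columns with the same name. The sketch below assumes a Person POCO with StaffId and Name properties and a Company POCO with CompanyId and Name properties; StaffId and Name stand in for whatever s.* actually exposes.

```sql
-- Sent from Dapper in place of "select * from qryStaff".
SELECT
    v.StaffId,                   -- Person columns come first
    v.Name,
    v.BreakPoint  AS CompanyId,  -- first column of the Company part
    v.CompanyName AS Name        -- duplicate column name is fine here
FROM qryStaff AS v;
```

With this shape the split column would be CompanyId (splitOn: "CompanyId") rather than BreakPoint, and the columns from CompanyId onward are mapped to the Company object.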
Also, as a general rule of thumb, it's good to stay away from wildcards in SQL queries exactly because issues like this are introduced. Here's a decent article on SQL query best practices.
Excerpt:
Using explicit names of columns in your SELECT statements within your
code has a number of advantages. First, SQL Server is only returning
the data your application needs, and not a bunch of additional data
that your application will not use. By returning only the data you
need you are optimizing the amount of work SQL Server needs to do to
gather all the columns of information you require. Also by not using
the asterisk (*) nomenclature you are also minimizing the amount of
network traffic (number of bytes) required to send the data associated
with your SELECT statement to your application.
Additionally by explicitly naming your columns, you are insulating
your application from potential failures related to some database
schema change that might happen to any table you reference in your
SELECT statement. If you were to use the asterisk (*) nomenclature and
someone was to add a new column to a table, your application would
start receiving data for this additional column of data, even without
changing your application code. If your application were expecting
only a specific number of columns to be returned, then it would fail
as soon as someone added an additional column to one of your
referenced tables. Therefore, by explicitly naming columns in your
SELECT statement your application will always get the same number of
columns returned, even if someone adds a new column to any one of the
tables referenced in your SELECT statement.
I just finished using Linq to Sql to map our existing Database structure for use in a Thick Client app.
While writing some Linq Methods to replace some Stored Procedures I noticed that sometimes I could do tblOne.tblTwo.MyDesiredField. I learned that there needed to be an association in the dbml for that to work. Well mine was missing some obvious ones so I added a bunch.
That was when I noticed that sometimes I couldn't do the above as some of the associated tables are considered EntitySets<tblThree> instead of the table, tblThree itself?
To me, there seems to be no rhyme or reason as to what I'll get. Am I doing something wrong in the dbml? Something I need to change in the Properties?
Is this cause for concern? I noticed that to use an EntitySet<tblThree> I need to add an extra from..
from person in context.tblPersons
from address in person.tblAddress where address.AddressType == "Home"
select new {person.Name, address.Home};
EntitySet is a result set. If tableA has a 1 to many relationship with tableB then tableA.tableB refers to the collection of results in tableB that reference the result in tableA.
Table is just the table. If you drag and drop using the designer you'll see that it pluralizes the EntitySets, which makes things more readable.
EDIT: I imagine from the sounds of your setup, you'll likely see an entitySet as follows
from b in TableA select b.TableB
in this case TableA is a Table, and b.TableB is the EntitySet