Alright, the system I got is a pretty outdated ERP system based around an Ingres database. The database schema is ... well ... not very nice (not really normalized) but basically it works out. Please understand that I cannot change anything related to the database.
Consider the following SQL statement:
SELECT
    -- some selected fields here
FROM
    sta_artikelstamm s
    left join sta_chargen c on c.artikel_nr = s.artikel_nr and c.lager != 93
    left join sta_artikelbeschreib b on s.artikel_nr = b.artikel_nr and b.seite = 25 and b.zeilennr = 1
    left join sta_einkaufskonditionen ek on s.artikel_nr = ek.artikel_nr AND s.lieferant_1 = ek.kunden_nr
    left join sta_kundenstamm ks on ek.kunden_nr = ks.nummer AND ks.nummer = s.lieferant_1
    left join tab_teilegruppe2 tg2 on s.teilegruppe_2 = tg2.teilegruppe
WHERE
    (s.status = 0)
    AND (s.teilegruppe_2 IS NOT NULL) AND (s.teilegruppe_2 != '')
So far, this works as expected: I get exactly 40742 rows back. The result set looks alright, the count is about what I would expect, and there are no duplicates. I explicitly use a LEFT JOIN since some related tables may not have matching entries, but I still want to keep the information from the main article table.
Now, table tab_teilegruppe2 consists of 3 fields: bezeichnung (description), teilegruppe (part group, the primary key), and taricnr (please ignore this field; it may be null or contain some values, but I don't need it).
I thought of adding the following SQL part to only include rows in the result set which do NOT belong to a specific set of part groups. I therefore added the following line at the very end of the SQL statement.
AND (s.teilegruppe_2 NOT IN (49,57,60,63,64,65,66,68,71,73,76,77,78,79,106,107))
I'm by no means an SQL expert (you have probably guessed that already), but shouldn't an additional WHERE condition remove rows rather than add them? As soon as I add this simple extra condition to the WHERE clause, I get 85170 result rows.
Now I'm guessing it has to do with the "NOT IN" condition, but I don't understand why I suddenly get more rows than before. Can anyone give me a pointer where to look for my error?
What is the type of the s.teilegruppe_2 column? Is it an integer or some sort of string (VARCHAR)?
The (s.teilegruppe_2 != '') check suggests it is a string, but your NOT IN is comparing it against a list of integers.
If the column involved is a string, the NOT IN condition will match every value, since none of the string values are going to compare equal to an integer value.
I have a database structure which has a set of users and their UserIds.
I then have a table called 'Post' which consists of a text field and a CreatedBy field.
I then have a 'Follows' table which consists of 'WhoIsFollowing' and 'WhoTheyFollow' fields.
The idea is that the 'Follows' table maps which users another user 'Follows'.
If I am using the application as a particular user and I want to get all my relevant 'Posts', these would be posts of those users I follow, or my own posts.
I have been trying to get this into one LINQ statement but have been failing to get it perfect. Ultimately I need to query the 'Posts' table for all the 'Posts' that I have posted, joined with all the posts of the people I follow in the 'Follows' table.
I have got it working with this statement
postsWeWant = (from s in db.Posts
               join sa in db.Follows on s.CreatedBy equals sa.WhoTheyAreFollowing into joinTable1
               from x in joinTable1.DefaultIfEmpty()
               where (x.WhoIsFollowing == userId || s.CreatedBy == userId) && !s.Deleted
               orderby s.DateCreated descending
               select s).Take(25).ToList();
The issue is that it comes back with duplicates for all the posts posted by the user themselves. I have added .Distinct() to get around this, but instead of returning 25 posts each time, the duplicates mean it comes back with far fewer when many of the latest 25 are posts by that user.
First off, why is the above coming back with duplicates? (It would help me understand the statement a bit more.) And secondly, how do I get around it?
It's difficult to say exactly without seeing the data structure, but I would recommend investigating and perhaps expanding your join to eliminate the duplicate associations.
If that fails, I would use a group by clause to remove duplicates so there is no need for a Distinct. The reason you are ending up with fewer than 25 records is probably that the elimination of duplicates is happening after you take 25. But I think I would need more of your code to tell for sure.
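If you do keep the Distinct(), one thing worth trying (a sketch only, assuming the same entities as in your query) is to apply it before the Take(25), so the limit counts distinct posts:

// Sketch: de-duplicate before taking 25, so the limit applies to distinct
// posts rather than to the joined (and therefore duplicated) rows.
postsWeWant = (from s in db.Posts
               join sa in db.Follows on s.CreatedBy equals sa.WhoTheyAreFollowing into joinTable1
               from x in joinTable1.DefaultIfEmpty()
               where (x.WhoIsFollowing == userId || s.CreatedBy == userId) && !s.Deleted
               select s)
              .Distinct()
              .OrderByDescending(p => p.DateCreated)
              .Take(25)
              .ToList();

Note the ordering is applied after the Distinct(), since the de-duplication does not preserve order.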
We are using C#, VS2012, EF 5.0, and Oracle 11g. The approach is code first. I have a table entity whose definition plainly shows all the correct columns (and none that are not there).
Still, when I run certain LINQ queries (joins) and attempt to select the results into a new object, things break. Here is the LINQ:
IQueryable<CheckWage> query =
    from clientWage in context.ClientWages
    join paycheckWage in context.PaycheckWages
        on new { clientWage.PermanentClientId, clientWage.WageId }
        equals new { paycheckWage.PermanentClientId, paycheckWage.WageId }
    where
        (paycheckWage.PermanentClientId == Session.PermanentClientId) &&
        (clientWage.PermanentClientId == Session.PermanentClientId)
    select new CheckWage
    {
        CWage = clientWage,
        PWage = paycheckWage
    };
Now, here is the SQL it emits (as captured by Devart's DbMonitor tool):
SELECT
"Extent1".ASSOCIATE_NO,
"Extent1".PCLIENT_ID,
"Extent1".CLIENT_NO,
"Extent1".CLIENT_NAME,
"Extent1".ADDRESS1,
"Extent1".ADDRESS2,
"Extent1".CITY,
"Extent1".STATE,
"Extent1".ZIP,
"Extent1".COUNTRY,
"Extent1".CLIENT_TYPE,
"Extent1".DOING_BUSINESS_AS,
"Extent1".CONTACT,
"Extent1".PHONE,
"Extent1".EXTENSION,
"Extent1".FAX,
"Extent1".FAX_EXTENSION,
"Extent1".EMAIL,
"Extent1".NEXTEMP,
"Extent1".PAY_FREQ,
"Extent1".EMPSORT,
"Extent1".DIVUSE,
"Extent1".CLIENT_ACCESS_TYPE,
"Extent1".AUTOPAY_WAGE_ID,
"Extent1".FEIN,
"Extent1".HR_MODULE,
"Extent1".BANK_CODE,
"Extent1".ACH_DAYS,
"Extent1".ACH_COLLECT,
"Extent1".UPDATED,
"Extent1".IAT_FLAG,
"Extent1".ORIG_EMAIL,
"Extent1"."R1",
"Extent1"."R2"
FROM INSTANTPAY.CLIENT "Extent1"
WHERE "Extent1".PCLIENT_ID = :EntityKeyValue1'
There are no such columns as "R1" and "R2". I am guessing it has something to do with the join into a new object type with two properties, but I am pulling my hair out trying to figure out what I've done or haven't done that is resulting in this errant SQL. Naturally, the error from the Oracle server is "ORA-00904: "Extent1"."R2": invalid identifier". Strange that it doesn't choke on R1, but perhaps it only lists the last error or something...
Thanks in advance,
Peter
5/23/2014: I left out an important detail. The SQL is emitted when I attempt to drill into one of the CheckWage objects (using lazy loading), as both of the contained objects have a navigation property to the "Client" entity. I can access the client table just fine in other LINQ queries that do not use a join; it is only this one that creates the "R1" and "R2" in the SELECT statement.
Peter
There are probably 10 duplicates of this question but I would like to know if there is a better way than I am currently doing this. This is a small example that I'm using to show how I'm determining differences:
//let t1 be a representation of the ID's in the database.
List<int> t1 = new List<int>() { 5, 6, 7, 8 };
//let t2 be the list of ID's that are in memory.
//these changes need to be reflected to the database.
List<int> t2 = new List<int>() { 6, 8, 9, 10 };
var hash = new HashSet<int>(t1);
var hash2 = new HashSet<int>(t2);
//determines which ID's need to be removed from the database
hash.ExceptWith(t2);
//determines which ID's need to be added to the database.
hash2.ExceptWith(t1);
//remove contents of hash from database
//add contents of hash2 to database
I want to know if I can determine what to add and remove in ONE operation instead of the two that I currently have to do. Is there any way to increase the performance of this operation? Keep in mind that in the actual database situation there are hundreds of thousands of IDs.
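For illustration, this is roughly what I mean by "one operation": a hypothetical sketch that fills one dictionary from both lists and ends up holding only the differences, each marked with the action it needs (I haven't benchmarked this):

// Hypothetical sketch: build one dictionary from both lists.
// An ID left marked true exists only in t2 (add); false means only in t1 (remove).
var diff = new Dictionary<int, bool>();
foreach (var id in t2)
    diff[id] = true;                 // assume "add" until we also see it in t1
foreach (var id in t1)
{
    if (diff.ContainsKey(id))
        diff.Remove(id);             // present in both lists: nothing to do
    else
        diff[id] = false;            // only in the database list: remove
}
var toAdd    = diff.Where(kv => kv.Value).Select(kv => kv.Key).ToList();
var toRemove = diff.Where(kv => !kv.Value).Select(kv => kv.Key).ToList();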
EDIT Or, as a second question: is there a LINQ query that I can run directly against the database so I can just supply the new list of IDs and have it automatically remove/add the differences? (Using MySQL.)
CLARIFICATION I know I need two SQL queries (or a stored procedure). The question is whether I can determine the differences between the lists in one action, and whether it can be done faster than this.
EDIT2
This operation from SPFiredrake appears to be faster than my HashSet version; however, I have no idea how to determine which IDs to add and which to remove from the database. Is there a way to include that information in the operation?
t1.Union(t2).Except(t1.Intersect(t2))
EDIT3
Never mind, I forgot that this statement in fact has the problem of deferred execution. In case anyone is wondering, though, I solved my prior problem with it by using a custom comparer and an added variable determining which list each value came from.
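In case it helps anyone, this is a from-memory sketch of that approach (the names are made up):

// Wrap each ID with its origin; the comparer looks only at the ID, so the
// symmetric difference still carries the "which list" flag through.
class TaggedId
{
    public int Id;
    public bool FromDatabase;   // true = came from t1, so a survivor means "remove"
}

class TaggedIdComparer : IEqualityComparer<TaggedId>
{
    public bool Equals(TaggedId a, TaggedId b) { return a.Id == b.Id; }
    public int GetHashCode(TaggedId x) { return x.Id.GetHashCode(); }
}

var left  = t1.Select(i => new TaggedId { Id = i, FromDatabase = true });
var right = t2.Select(i => new TaggedId { Id = i, FromDatabase = false });
var cmp   = new TaggedIdComparer();

// Same (A | B) - (A & B) shape as before, but each surviving item knows its action.
var changes = left.Union(right, cmp).Except(left.Intersect(right, cmp), cmp).ToList();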
Ultimately, you're going to use a full outer join (which in the LINQ world means two GroupJoins). However, we ONLY care about values that don't have a matching record in the other table: a null right value (left outer join) indicates a removal, and a null left value (right outer join) indicates an addition. So we just perform two left outer joins (switching the inputs for the second one to emulate the right outer join) and concatenate them together (you could use Union, but that's unnecessary since we'll be getting rid of any duplicates anyway).
List<int> t1 = new List<int>() { 5, 6, 7, 8 };
List<int> t2 = new List<int>() { 6, 8, 9, 10 };
var operations =
    t1.GroupJoin(
        t2,
        t1i => t1i,
        t2i => t2i,
        (t1i, t2join) => new { Id = t1i, Action = !t2join.Any() ? "Remove" : null })
    .Concat(
        t2.GroupJoin(
            t1,
            t2i => t2i,
            t1i => t1i,
            (t2i, t1join) => new { Id = t2i, Action = !t1join.Any() ? "Insert" : null }))
    .Where(tr => tr.Action != null);
This gives you the set of changes to make. Then, you can feed this data into a stored procedure that removes the values that already exist in the table and adds the rest (or split it into two lists to run removals and additions against, as sketched below). Either way, it's still not the cleanest way to do it, but at least this gets you thinking.
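For example (a sketch building directly on the operations query above), the two lists would just be:

// Split the combined result into the two lists mentioned above.
var toRemove = operations.Where(o => o.Action == "Remove").Select(o => o.Id).ToList();
var toInsert = operations.Where(o => o.Action == "Insert").Select(o => o.Id).ToList();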
Edit: My original solution was to separate out the two lists based on what action was needed, which is why it's so ghastly. The same can be done using a one-liner (not caring about which action to take, however), although I think you'll still suffer from the same issues (using LINQ [enumeration] as opposed to Hashsets [hash collection]).
// XOR of sets = (A | B) - (A & B), - being set difference (Except)
t1.Union(t2).Except(t1.Intersect(t2))
I'm sure it'll still be slower than using the Hashsets, but give it a shot anyway.
Edit: Yes, it is faster, because it doesn't actually do anything with the collection until you enumerate over it (either in a foreach or by getting it into a concrete data type [i.e. List<>, Array, etc.]). It's still going to take extra time to sort out which ones to add/remove, and that's ultimately the problem. I was able to get comparable speed by breaking down the two queries, but getting it into the in-memory world (via ToList()) made it slower than the HashSet version:
t1.Except(t2); // .ToList() slows these down
t2.Except(t1);
Honestly, I would handle it on the SQL side. In the stored proc, store all the values in a table variable with another column indicating addition or removal (based on whether the value already exists in the table). Then you can just do a bulk deletion/insertion by joining back to this table variable.
Edit: Thought I'd expand on what I meant by sending the full list to the database and having it handled in the sproc:
var toModify = t1.Union(t2).Except(t1.Intersect(t2));
var mods = string.Join(",", toModify.ToArray());
// Pass mods (comma separated list) to your sproc.
Then, in the stored procedure, you would do this:
-- @delimitedIDs: some unbounded text type, in case you have a LOT of records
-- I use XQuery to build the table (found it's faster than some other methods)
DECLARE @idTable TABLE (ID int, AddRecord bit)
DECLARE @xmlString XML

SET @xmlString = CAST('<NODES><NODE>' + REPLACE(@delimitedIDs, ',', '</NODE><NODE>') + '</NODE></NODES>' AS XML)

INSERT INTO @idTable (ID)
SELECT node.value('.', 'int')
FROM @xmlString.nodes('//NODE') AS xs(node)

UPDATE id
SET AddRecord = CASE WHEN someTable.ID IS NULL THEN 1 ELSE 0 END
FROM @idTable id LEFT OUTER JOIN [SomeTable] someTable ON someTable.ID = id.ID

DELETE a
FROM [SomeTable] a JOIN @idTable b ON b.ID = a.ID AND b.AddRecord = 0

INSERT INTO [SomeTable] (ID)
SELECT ID FROM @idTable WHERE AddRecord = 1
Admittedly, this just inserts the IDs; it doesn't actually add any other information. However, you can still pass in XML data to the sproc and use XQuery in a similar fashion to get whatever other information you need to add.
Even if you replace it with a LINQ version, you still need two operations.
Let's assume you are doing this using pure SQL. You would probably need two queries:
one for removing the records
another one for adding them
Using LINQ, the code would be much more complicated and less readable than your solution.
I have the following query:
;WITH valRules AS
(
    SELECT vr.valRuleID, Count(*) AS totalRows, Sum(vt.test) AS validRows
    FROM (SELECT NULL AS x) AS x
    JOIN #itemMap AS IM
        ON IM.lngitemID = 1
    JOIN tblValidationRule AS vr
        ON IM.RuleID = vr.valRuleID
    JOIN tblValidationRuleDetl AS vrd
        ON vr.valRuleID = vrd.valRuleID
    LEFT JOIN #ValTest AS vt
        ON vrd.type = vt.type
        AND vrd.typeSequence = vt.typeSequence
        AND vrd.valRule & vt.Response > 0
        OR (vrd.valrule = 0 AND vt.response = 0)
    GROUP BY vr.valRuleID
)
SELECT Count(*)
FROM valrules
WHERE totalrows = validRows
Note the CTE, and the bitwise operator in the LEFT JOIN condition. This is currently used in a stored procedure that takes values from a C# application in the form of an XML variable. The XML variable is placed into the table #ValTest. All columns are of datatype INT. If vt.Response is valid for vrd.valRule, the result of & will be greater than zero (e.g. 31 & 8 = 8, but 12 & 2 = 0). The vt.test column contains the number 1 for each row, so that it can be summed up (NULLs are automatically excluded) to get a count of the validations that pass per rule. Each rule has a number of attributes that must pass validation for success. If the number of attributes is equal to the number that passed, we have success.
In an effort to reduce calls to the database, the goal is to cache ALL the rules in the ASP.NET cache and handle validation locally. The developers are asking for a de-normalized version of the validation data, with the claim that the SQL set-based operation is not a simple task in C# with LINQ. From what I have looked into, I would agree. At this point my investigation shows the bitwise comparison in the join condition is particularly problematic.
The main question is: how can this be converted to something that uses LINQ on the C# side? Or are there more efficient ways to deal with this on the client side, where LINQ is not the right tool (i.e. just give them flat data)?
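To make the problematic part concrete, my rough (untested) idea of the equivalent check in LINQ-to-Objects looks something like this, over hypothetical in-memory classes (RuleDetail roughly standing in for tblValidationRuleDetl rows, Response for the #ValTest rows):

// Rough sketch: a rule passes when every one of its detail rows finds a
// response of the matching type/sequence whose bits overlap (or both are zero).
int passingRules = cachedRuleDetails
    .GroupBy(d => d.ValRuleId)
    .Count(rule => rule.All(d =>
        responses.Any(r =>
            r.Type == d.Type &&
            r.TypeSequence == d.TypeSequence &&
            ((d.ValRule & r.Response) > 0 || (d.ValRule == 0 && r.Response == 0)))));

Whether that performs acceptably over the cached data is part of what I'm unsure about.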
thanks
LINQ-to-SQL isn't going to do anything quite as bespoke as that query. Which isn't a criticism of either LINQ-to-SQL or the query: simply, there are limits.
There are two ways I would approach that:
1: as a parameterized TSQL query via ExecuteQuery<T> - i.e.
var result = db.ExecuteQuery<YourType>(@"your query here with {0}, {1} etc",
    arg0, arg1, ...);
2: write that TSQL as a UDF mapped into the data-context:
var result = db.YourUdf(arg0, ...);
Both are valid and will work with LINQ-to-SQL; personally I prefer the first approach, but the UDF approach allows greater re-use within the DB layer, at the expense of having more complex deployment (i.e. app tier and db tier all at the same time).
I have a view:
SELECT dbo.Theme.ThemeID,
dbo.ThemeObject.XLocation,
dbo.ThemeObject.YLocation,
dbo.ThemeObject.ThemeElementTypeID AS ThemeObject_ThemeElementTypeID,
dbo.ThemeImage.ThemeImageData
FROM dbo.Theme INNER JOIN
dbo.ThemeObject ON dbo.Theme.ThemeID = dbo.ThemeObject.ThemeID LEFT OUTER JOIN
dbo.ThemeImage ON dbo.ThemeObject.ThemeObjectID = dbo.ThemeImage.ThemeObjectID
WHERE (dbo.ThemeObject.IsDeleted = 0 OR dbo.ThemeObject.IsDeleted IS NULL) AND
(dbo.ThemeImage.IsDeleted = 0 OR dbo.ThemeImage.IsDeleted IS NULL) AND
(dbo.Theme.IsDeleted = 0)
My problem here is that not all ThemeObjects will have an image (thus the outer join). LINQ doesn't recognize this and doesn't tell the generator to allow nulls in the column, which causes a crash if we don't manually set that column to allow nulls every time.
This is the only view that seems to do this and I can't figure out why.
Other than manually configuring the column in the designer after every recreation of the DAL (I sometimes delete all tables and views and re-drop them because of subtle changes that mysteriously don't find their way back in here when they occur... but getting that person to keep up with it is beyond my control) -- is there something I can adjust to either get it to generate it correctly or tell it to alter the output? Or some kind of override I'm not aware of that I can use?