SQL user defined aggregate order of values preserved? - c#

I'm using the code from this MSDN page to create a user-defined aggregate to concatenate strings with GROUP BYs in SQL Server. One of my requirements is that the order of the concatenated values is the same as in the query. For example:
Value  Group
1      1
2      1
3      2
4      2
Using the query:
SELECT
    dbo.Concat(tbl.Value) AS Concat,
    tbl.[Group]
FROM
    (SELECT TOP 1000
         tblTest.*
     FROM
         tblTest
     ORDER BY
         tblTest.Value) AS tbl
GROUP BY
    tbl.[Group]
Would result in:
Concat  Group
"1,2"   1
"3,4"   2
The result always seems to come out correct and as expected, but then I came across this page, which states that the order is not guaranteed and that the attribute SqlUserDefinedAggregateAttribute.IsInvariantToOrder is only reserved for future use.
So my question is: Is it correct to assume that the concatenated values in the string can end up in any order? If that is the case then why does the example code on the MSDN page use the IsInvariantToOrder attribute?
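For reference, the aggregate follows the MSDN sample and looks roughly like the condensed sketch below (this is a paraphrase, not the exact MSDN code):

using System;
using System.Data.SqlTypes;
using System.IO;
using System.Text;
using Microsoft.SqlServer.Server;

[Serializable]
[SqlUserDefinedAggregate(
    Format.UserDefined,          // custom serialization via IBinarySerialize
    IsInvariantToNulls = true,   // NULLs are skipped, so they don't affect the result
    IsInvariantToDuplicates = false,
    IsInvariantToOrder = false,  // per the docs, reserved for future use
    MaxByteSize = 8000)]
public struct Concat : IBinarySerialize
{
    private StringBuilder accumulator;

    public void Init()
    {
        accumulator = new StringBuilder();
    }

    public void Accumulate(SqlString value)
    {
        if (value.IsNull) return;
        if (accumulator.Length > 0) accumulator.Append(',');
        accumulator.Append(value.Value);
    }

    public void Merge(Concat other)
    {
        // Called when SQL Server combines partial aggregates (e.g. parallel plans);
        // the order in which the partials are merged is not under your control.
        if (other.accumulator.Length == 0) return;
        if (accumulator.Length > 0) accumulator.Append(',');
        accumulator.Append(other.accumulator.ToString());
    }

    public SqlString Terminate()
    {
        return new SqlString(accumulator.ToString());
    }

    public void Read(BinaryReader reader)
    {
        accumulator = new StringBuilder(reader.ReadString());
    }

    public void Write(BinaryWriter writer)
    {
        writer.Write(accumulator.ToString());
    }
}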

I suspect a big problem here is your statement "the same as in the query" - however, your query never defines (and cannot define) an order for the things being aggregated (you can of course order the groups by putting an ORDER BY after the GROUP BY). Beyond that, I can only say that aggregation is based purely on a set (rather than an ordered sequence), and that technically the order is indeed undefined.

While the accepted answer is correct, I wanted to share a workaround that others may find useful. Warning: it involves not using a user-defined aggregate at all :)
The link below describes an elegant way to build a concatenated, delimited list using only a SELECT statement and a varchar variable. The upside (for this thread) is that you can specify the order in which the rows are processed. The downside is that you can't easily concatenate across many different subsets of rows without painful iteration.
Not perfect, but for my use case it was a good workaround.
http://blog.sqlauthority.com/2008/06/04/sql-server-create-a-comma-delimited-list-using-select-clause-from-table-column/
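A minimal sketch of the variable-concatenation technique the linked post describes, wrapped in ADO.NET so it can be run from C# (the table and column are taken from the question; the connection string is a placeholder):

using System;
using System.Data.SqlClient;

class CommaListExample
{
    static void Main()
    {
        const string sql = @"
            DECLARE @list varchar(max) = '';
            SELECT @list = @list + CASE WHEN @list = '' THEN '' ELSE ',' END
                          + CAST(t.Value AS varchar(20))
            FROM tblTest AS t
            ORDER BY t.Value;
            SELECT @list;";

        using (var connection = new SqlConnection("your connection string"))
        using (var command = new SqlCommand(sql, connection))
        {
            connection.Open();
            // Returns e.g. "1,2,3,4" for the sample data in the question.
            string list = (string)command.ExecuteScalar();
            Console.WriteLine(list);
        }
    }
}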

Related

I want to show numbers in "From - To" Format

I have multiple random numbers in a table column ID, like:
8v12027
8v12025
8v12024
8v12029
8v12023
8v12030
8v12020
Expected output: 8v12020, From 8v12023 To 8v12025, 8v12027, From 8v12029 To 8v12030
I assume you're looking for a SQL solution, so:
You have to use the LEAD or LAG keyword and concatenate, for example:
SELECT CONCAT('From ', p.Id, ' To ', LEAD(p.Id) OVER (ORDER BY p.Id)) FROM YourTable p
There is a really good explanation of those keywords on the SQLAuthority website.
https://blog.sqlauthority.com/2013/09/22/sql-server-how-to-access-the-previous-row-and-next-row-value-in-select-statement/
But if you were looking for a pure C# solution, you can retrieve the data set into an array, order it by Id, and then with a for loop concatenate the current value with the previous (or next) one.
Or, with LINQ, use Aggregate:
yourArray.Aggregate((a,b)=> String.Concat("From ",a," To ",b,";")).Split(';')
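For the pure C# route, a sketch along these lines groups consecutive numeric suffixes into From/To ranges (it assumes every ID is a fixed prefix such as "8v" followed by an integer, as in the sample data):

using System;
using System.Collections.Generic;
using System.Linq;

class FromToExample
{
    static void Main()
    {
        string[] ids = { "8v12027", "8v12025", "8v12024", "8v12029", "8v12023", "8v12030", "8v12020" };

        // Sort by the numeric suffix, assuming a fixed two-character prefix.
        var ordered = ids.OrderBy(id => int.Parse(id.Substring(2))).ToArray();

        var parts = new List<string>();
        int start = 0;
        for (int i = 1; i <= ordered.Length; i++)
        {
            // Close the current run when the next number is not consecutive (or we ran out).
            bool endOfRun = i == ordered.Length ||
                int.Parse(ordered[i].Substring(2)) != int.Parse(ordered[i - 1].Substring(2)) + 1;
            if (endOfRun)
            {
                parts.Add(start == i - 1
                    ? ordered[start]                                  // single value
                    : $"From {ordered[start]} To {ordered[i - 1]}");  // consecutive range
                start = i;
            }
        }

        // 8v12020, From 8v12023 To 8v12025, 8v12027, From 8v12029 To 8v12030
        Console.WriteLine(string.Join(", ", parts));
    }
}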

GROUP BY / Case Insensitivity extension for Asp.Net Web API ODATA?

I couldn't find any group by references in the OData V4 documentation. When we pass a group by query in the URL, it just returns the key, not the actual grouped results.
Are there any references for using group by in ASP.NET OData, particularly the extensibility points of the OData Web API? We need to take full command over how the OData query is parsed and transformed into a LINQ to Entities query.
I am talking along the lines of intercepting OData queries and performing manual customization, e.g. in LINQ to Entities.
I am trying to achieve a similar extension for Case Sensitivity.
OData Case In-Sensitive filtering in Web API?
Try to approach it from the SQL perspective:
- by grouping you get exactly the keys, plus some aggregate values
- by taking the original filter you used for the grouping and extending it with the group keys (so the original filter and the group keys match), you actually load the data for a given group (see the sketch further below)
This is how our grouping grid works in Angular Telerik Kendo (they have a nice toOdataString implementation, which I also extended: https://github.com/telerik/kendo-angular/issues/2102).
This approach ensures a fixed number of groups in the grid (total groups, or first-level groups).
PRO: you see all the groups (or at least N of them)
CONS: if you unfold a group you might end up with too many items; it needs a lot of extra code and additional calls with special OData queries
See: http://www.reflection.sk/#portfolios, check screenshot: Universal Plans Services (UPS) Software Bundle (.NET & Angular with KendoUI)
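As a rough LINQ illustration of that two-step idea (entity and property names are invented; this is only the shape of the pattern, not the actual Kendo/OData plumbing):

using System.Linq;

// Hypothetical entity, just to show the shape of the two queries.
public class Candidate
{
    public int Id { get; set; }
    public string State { get; set; }
    public string Name { get; set; }
}

public static class GroupingSketch
{
    // Step 1: the "group by" call - returns only the keys plus aggregates,
    // which is essentially what a groupby query gives you back.
    public static object GetGroups(IQueryable<Candidate> candidates)
    {
        return candidates
            .GroupBy(c => c.State)
            .Select(g => new { State = g.Key, Count = g.Count() })
            .ToList();
    }

    // Step 2: when a group is expanded, reuse the original filter and
    // add the group key to it to load that group's rows.
    public static object GetGroupRows(IQueryable<Candidate> candidates, string stateKey)
    {
        return candidates
            .Where(c => c.State == stateKey)   // original filter + group key
            .OrderBy(c => c.Name)
            .ToList();
    }
}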
If you take it from the UI perspective:
Then grouping is nothing more than a list of data with a priority sort over the grouped field. This is the default Kendo grouping-grid approach. So they just sort the data, take a page size of it, and then the grouping UX items are added (virtual items in the grid).
This approach ensures that you get a fixed number of items in the grid, but when you collapse all items you might have just 1, or up to pageSize, groups (depending on how many items are in each of the groups). See it here: https://www.telerik.com/kendo-angular-ui/components/grid/grouping/ - you'd actually need paging to be turned off to see the difference.
With items up to a fixed count this approach is the fastest: just one call per page, but the number of groups is not known in advance (if you collapse them, it might be just 1, or even N, where N is the page size).
Regarding case sensitivity:
when filtering, wrapping the column name and the entered value in tolower() helps (see the sketch below),
but it's up to the DB settings how case sensitivity is handled by default.
Also a note: with grouping I was not able to do something like $groupby(tolower(columnname)) with odata, so...
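On the LINQ to Entities side, the equivalent of that tolower() wrapping is roughly the following sketch (the entity and the Contains-based filter are made up; whether the comparison ends up case-insensitive still depends on the column's collation):

using System.Linq;

public class Person
{
    public string Name { get; set; }
}

public static class CaseInsensitiveFilterSketch
{
    // Mirrors an OData filter such as $filter=contains(tolower(Name), tolower('smith')).
    // With LINQ to Entities, ToLower() translates to LOWER(...) in the generated SQL.
    public static IQueryable<Person> FilterByName(IQueryable<Person> source, string term)
    {
        string loweredTerm = term.ToLower();
        return source.Where(p => p.Name.ToLower().Contains(loweredTerm));
    }
}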

Remove certain parts of a statement

I've got an application which generates a number of SQL statements, whose selection fields all contain AS clauses, like so:
SELECT TOP 200 [customer_name] AS [Customer Name], [customer_age] AS [Customer Age], AVG([customer_age]) AS 'Average Customer Age' FROM [tb_customers] GROUP BY [customer_age]
My statements will always be in that format. My task is to parse them so that "TOP 200" is removed, as well as all the AS clauses except for aggregates. In other words, I would want to parse the statements and in that case it would end up like so:
SELECT [customer_name], [customer_age], AVG([customer_age]) AS 'Average Customer Age' FROM [tb_customers] GROUP BY [customer_age]
How would I go about doing this? Is it even possible? It seems like a very complex parsing task, since the number of fields is never going to be the same. If it helps, I've got a variable which stores the number of fields (not including aggregates).
You may use a regular expression: replace all occurrences of
AS \[.*?\]
with empty text,
or all occurrences of
AS \[.*?\],
with a comma ",".
The question mark "?" is important here, as it turns off greedy matching.
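A rough C# sketch of that idea, extended to also strip the TOP clause (the patterns below are my own and only cover statements shaped exactly like the example):

using System;
using System.Text.RegularExpressions;

class StripAliasesExample
{
    static void Main()
    {
        string sql = "SELECT TOP 200 [customer_name] AS [Customer Name], [customer_age] AS [Customer Age], " +
                     "AVG([customer_age]) AS 'Average Customer Age' FROM [tb_customers] GROUP BY [customer_age]";

        // Remove the TOP n clause right after SELECT.
        string result = Regex.Replace(sql, @"^SELECT\s+TOP\s+\d+\s+", "SELECT ", RegexOptions.IgnoreCase);

        // Remove bracketed aliases (AS [...]); quoted aliases on aggregates (AS '...') are untouched.
        result = Regex.Replace(result, @"\s+AS\s+\[.*?\]", "", RegexOptions.IgnoreCase);

        // SELECT [customer_name], [customer_age], AVG([customer_age]) AS 'Average Customer Age'
        // FROM [tb_customers] GROUP BY [customer_age]
        Console.WriteLine(result);
    }
}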

SQL Server Architecture for specific problem - full-text search - with full join

I am building an application that searches candidates' resumes. I need to use full-text search in the application, as there are a lot of records and the resume field is fairly large. The issue is that for advanced searches I have another table, RelocationItems, that lists zips, states, etc. for the candidates' relocation preferences and is related through a candidateID in the RelocationItems table. The problem is that sometimes a candidate will have no RelocationItems, sometimes they will have one, and sometimes they will have more than one. So, simple enough, I created a View that uses full outer join and then can select using DISTINCT on candidateID to find the candidates I need that will relocate to a certain area based on the search criteria.
The big problem with this view, though, is that since it uses a FULL JOIN, I can't use full-text search now! (Obviously so, because my full-text index field is now not a unique, not-null field.)
And my stored procedure has the CONTAINS keyword in it, so it won't even compile.
Should I:
- Create a new table based on the view? (and then create another index identity field)
- Do something to store the relocation items in the candidate table (maybe an XML field)? (I don't think you can store a table-valued parameter in 2008, can you?)
- Do some sort of union of tables (queries)? (Run the search against the Candidates table and then against the RelocationTable, and then merge or union?)
Thanks for any suggestions on the best way to work around this problem!!!
I created a View that uses full outer join and then can select using DISTINCT on candidateID to
find the candidates I need that will relocate to a certain area based on the search criteria.
That is already a potential problem - a subselect with EXISTS would be better.
A properly set up query would have no problem - do not use a join; go for a subselect with EXISTS, as sketched below.
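Roughly what that could look like (table and column names are guesses from the question; the CONTAINS predicate stays on the base Candidates table, and the relocation check becomes a correlated EXISTS instead of the FULL JOIN):

using System.Data.SqlClient;

class CandidateSearchSketch
{
    // CONTAINS stays on the full-text-indexed Candidates table; the relocation check
    // is a correlated EXISTS subquery instead of a FULL JOIN, so no view is needed.
    const string SearchSql = @"
        SELECT c.CandidateID, c.Name
        FROM Candidates AS c
        WHERE CONTAINS(c.Resume, @searchTerm)
          AND EXISTS (
                SELECT 1
                FROM RelocationItems AS r
                WHERE r.CandidateID = c.CandidateID
                  AND r.State = @state);";

    public static SqlCommand BuildCommand(SqlConnection connection, string searchTerm, string state)
    {
        var command = new SqlCommand(SearchSql, connection);
        command.Parameters.AddWithValue("@searchTerm", searchTerm);
        command.Parameters.AddWithValue("@state", state);
        return command;
    }
}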

Selecting first 100 records using Linq

How can I return first 100 records using Linq?
I have a table with 40million records.
This code works, but it's slow, because it will return all the values before filtering:
var values = (from e in dataContext.table_sample
where e.x == 1
select e)
.Take(100);
Is there a way to return only the filtered rows, like the T-SQL TOP clause does?
No, that doesn't return all the values before filtering. The Take(100) will end up being part of the SQL sent up - quite possibly using TOP.
Of course, it makes more sense to do that when you've specified an orderby clause.
LINQ doesn't execute the query when it reaches the end of your query expression. It only sends up any SQL when either you call an aggregation operator (e.g. Count or Any) or you start iterating through the results. Even calling Take doesn't actually execute the query - you might want to put more filtering on it afterwards, for instance, which could end up being part of the query.
When you start iterating over the results (typically with foreach) - that's when the SQL will actually be sent to the database.
(I think your where clause is a bit broken, by the way. If you've got problems with your real code it would help to see code as close to reality as possible.)
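To make the deferred-execution point concrete, a rough sketch (reusing the dataContext, table, and column names from the question):

// Composing the query does not touch the database yet.
var query = dataContext.table_sample
    .Where(e => e.x == 1)
    .Take(100);

// Still no SQL sent - you could add more operators here.

// Only when the results are iterated does LINQ to SQL send a single
// statement, with the Take(100) translated into TOP (100).
foreach (var row in query)
{
    Console.WriteLine(row);
}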
I don't think you are right about it returning all records before taking the top 100. I think LINQ decides what the SQL string is going to be at the time the query is executed (deferred execution), and your database server will optimize it.
Have you compared a standard SQL query with your LINQ query? Which one is faster, and how significant is the difference?
I do agree with the above comments that your LINQ query is generally correct, but:
- your 'where' clause should probably be x == 1, not x = 1 (comparison instead of assignment)
- 'select e' will return all columns, where you probably need only some of them - be more precise with the select clause and list only the required columns, as sketched below; 'select *' is a waste of resources
- make sure your database is well indexed, and try to make use of the indexed data
Anyway, a 40-million-record database is quite huge - do you need all that data all the time? Maybe some kind of partitioning could reduce it to the most commonly used records.
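For the second point, a sketch of a narrower projection (the id and name columns are only placeholders; the point is that the generated SQL then selects just those columns):

var values = (from e in dataContext.table_sample
              where e.x == 1
              orderby e.id
              select new { e.id, e.name })   // project only the columns you need
             .Take(100);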
I agree with Jon Skeet, but just wanted to add:
The generated SQL will use TOP to implement Take().
If you're able to run SQL-Profiler and step through your code in debug mode, you will be able to see exactly what SQL is generated and when it gets executed. If you find the time to do this, you will learn a lot about what happens underneath.
There is also a DataContext.Log property to which you can assign a TextWriter to view the generated SQL, for example:
dbContext.Log = Console.Out;
Another option is to experiment with LINQPad. LINQPad allows you to connect to your data source and easily try different LINQ expressions. In the results panel, you can switch to see the SQL generated by the LINQ expression.
I'm going to go out on a limb and guess that you don't have an index on the column used in your where clause. If that's the case, then it's undoubtedly doing a table scan when the query is materialized, and that's why it's taking so long.
