I have two tables in my database: Ticket and TicketNumbers. I would like to write a LINQ query to count the number of tickets whose numbers match those passed into this function. Since we don't know in advance how many numbers must be matched, the LINQ has to be dynamic... to my understanding.
public int CountPartialMatchingTicket(IList<int> numbers)
{
    // where's the code? =_=;
}
Say for example there are 3 Tickets in the database now and I want to count up all those that have the numbers 3 and 4.
(1) 1 2 3 4
(2) 1 3 4 6 7
(3) 1 2 3
In this case the function should return 2, since ticket (1) and (2) have the matching numbers.
In another case, if asked to match 1, 2 and 3, the function should again return 2 - this time because of tickets (1) and (3).
Here's what those two tables in the database look like:
Ticket:
TicketId, Name, AddDateTime
TicketNumbers:
Id, TicketId, Number
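(For reference, the EF entity classes for these two tables would presumably look something like the following sketch - the property and navigation names are guesses from the column lists:)

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity classes mirroring the two tables described above.
public class Ticket
{
    public int TicketId { get; set; }
    public string Name { get; set; }
    public DateTime AddDateTime { get; set; }
    public ICollection<TicketNumber> TicketNumbers { get; set; } = new List<TicketNumber>();
}

public class TicketNumber
{
    public int Id { get; set; }
    public int TicketId { get; set; }
    public int Number { get; set; }
}

class Demo
{
    static void Main()
    {
        var t = new Ticket { TicketId = 1, Name = "sample", AddDateTime = DateTime.Today };
        t.TicketNumbers.Add(new TicketNumber { Id = 1, TicketId = 1, Number = 3 });
        Console.WriteLine(t.TicketNumbers.Count); // 1
    }
}
```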
I've never used Dynamic LINQ before, so just in case, this is what I have at the top of my .cs file:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using LottoGen.Models;
using System.Linq.Dynamic;
First things first, though: I don't even know how I would write the LINQ for a fixed set of numbers. I suppose the SQL would be like this:
SELECT TicketId, COUNT(0) AS Expr1
FROM TicketNumbers
WHERE (Number = 3) OR (Number = 4)
GROUP BY TicketId
However, this isn't what I want either. The above query would get me tickets that have either a 3 or a 4 - but I want only the tickets that have BOTH numbers. And I guess it has to be nested somehow to return a single count. If I had to use my imagination to complete the function, it would be something like this:
public int CountPartialMatchingTicket(IList<int> numbers)
{
    string query = "";
    foreach (int number in numbers)
    {
        query += "Number = " + number.ToString() + " AND ";
    }
    // I know.. there is a trailing AND.. lazy
    int count = DbContext.TicketNumbers.Where(query).Count();
    return count;
}
Oh wait a minute. There's no Dynamic Linq there... The above is looking like something I would do in PHP and that query statement obviously does not do anything useful. What am I doing? :(
At the end of the day, I want to output a little table to the webpage looking like this.
Numbers    Matching Tickets
---------------------------
3 4        2
Trinity, help!
public int CountPartialMatchingTicket(IList<int> numbers)
{
    var arr = numbers.ToArray();
    int count = DbContext.Tickets
        .Count(tk => arr.All(n => tk.TicketNumbers.Any(tn => tn.Number == n)));
    return count;
}
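To see why the All/Any combination gives the right semantics, here is an in-memory check against the sample data from the question (SimpleTicket is just a stand-in for the EF entity):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SimpleTicket { public List<int> Numbers = new List<int>(); }

class Demo
{
    static void Main()
    {
        var tickets = new List<SimpleTicket>
        {
            new SimpleTicket { Numbers = { 1, 2, 3, 4 } },
            new SimpleTicket { Numbers = { 1, 3, 4, 6, 7 } },
            new SimpleTicket { Numbers = { 1, 2, 3 } },
        };
        var wanted = new[] { 3, 4 };

        // A ticket matches only if every wanted number appears among its numbers.
        int count = tickets.Count(tk => wanted.All(n => tk.Numbers.Contains(n)));
        Console.WriteLine(count); // 2 - tickets (1) and (2) match
    }
}
```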
UPDATE: "If you don't mind, how would I limit this query to just the tickets made on a particular AddDateTime (the whole day)?"
The part inside the Count() method call is the WHERE condition, so just extend that:
DateTime targetDate = ......;
DateTime tooLate = targetDate.AddDays(1);
int count = DbContext.Tickets
    .Count(tk =>
        targetDate <= tk.AddDateTime && tk.AddDateTime < tooLate
        && arr.All(n => tk.TicketNumbers.Any(tn => tn.Number == n)));
This is similar to James Curran's answer, but a little simpler, and it should produce a simpler WHERE IN-based query:
// count all tickets...
return DbContext.Tickets.Count(
// where any of their ticket numbers
tk => tk.TicketNumbers.Any(
// are contained in our list of numbers
    tn => numbers.Contains(tn.Number)));
This question already has answers here:
"order by newid()" - how does it work?
(5 answers)
Closed 2 years ago.
I am creating a quiz and I'm using a random number from a range of 1-20 (the primary keys):
Random r = new Random();
int rInt = r.Next(1, 9);
The numbers (primary keys) are then used in a query to select 5 random questions, but the problem is that I am getting repeated questions because the numbers repeat:
string SQL = "SELECT QuestionText,CorrectAnswer,WrongAnswer1,WrongAnswer2,WrongAnswer3 FROM Question Where QuestionID = " + rInt;
I have tried some methods to fix it, but it's not working and I'm running out of ideas. Does anyone have any suggestions?
Just ask the database for it:
string SQL = @"
    SELECT TOP 5 QuestionText, CorrectAnswer, WrongAnswer1, WrongAnswer2, WrongAnswer3
    FROM Question
    ORDER BY NEWID()";
If/when you outgrow this, there exists a more optimized solution as well:
string SQL = @"
    WITH cte AS
    (
        SELECT TOP 5 QuestionId FROM Questions ORDER BY NEWID()
    )
    SELECT QuestionText, CorrectAnswer, WrongAnswer1, WrongAnswer2, WrongAnswer3
    FROM cte c
    JOIN Questions q
        ON q.QuestionId = c.QuestionId";
The second query will perform much better (assuming QuestionId is your primary key) because it will only have to read the primary index (which will likely already be in memory), generate the Guids, pick the top 5 using the most efficient method, then look up those 5 records using the primary key.
The first query should work just fine for a smaller number of questions, but I believe it may cause a table scan and some pressure on tempdb. So if your question texts are varchar(max) and get very long, or you have tens of thousands of questions and a very small tempdb on some versions of SQL Server, it may not perform well.
Something like this might do the trick for you:
[ThreadStatic]
private static Random __random = null;

public int[] Get5RandomQuestions()
{
    __random = __random ?? new Random(Guid.NewGuid().GetHashCode()); // approx one in 850 chance of seed collision
    using (var context = new MyDBContext())
    {
        var questions = context.Questions.Select(x => x.Question_ID).ToArray();
        return questions.OrderBy(_ => __random.Next()).Take(5).ToArray();
    }
}
Another, server side approach:
private static Random _r = new Random();
...
var seed = _r.NextDouble();
using var context = new SomeContext();
var questions = context.Questions
.OrderBy(p => SqlFunctions.Checksum(p.Id * seed))
.Take(5);
Note : Checksum is not bullet proof, limitations apply. This approach should not be used to generate quiz questions in life or death situations.
As per request:
SqlFunctions.Checksum will essentially generate a hash and order by it
CHECKSUM([Id] * <seed>) AS [C1],
...
ORDER BY [C1] ASC
CHECKSUM (Transact-SQL)
The CHECKSUM function returns the checksum value computed over a table
row, or over an expression list. Use CHECKSUM to build hash indexes.
...
CHECKSUM computes a hash value, called the checksum, over its argument
list. Use this hash value to build hash indexes. A hash index will
result if the CHECKSUM function has column arguments, and an index is
built over the computed CHECKSUM value. This can be used for equality
searches over the columns.
Note, as mentioned before, Checksum is not bulletproof - it returns an int, so take it for what it is. However, the chance of a collision or duplicate is extremely small for smaller data sets when it's used this way with a unique Id, and it's also fairly performant.
Running this many times on a production database with 10 million records, there were no collisions.
In regards to speed, it can get the top 5 in about 75 ms, though it is slower when the query is generated by EF.
The CTE solution offered for NewId runs in about 125 ms.
The LINQ .Distinct() method is too nice not to use here. The easiest way I know of doing this is shown below, using a method that creates an infinite stream of random numbers, which can then be nicely wrangled with LINQ:
using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<int> GenRandomNumbers()
{
    var random = new Random();
    while (true)
    {
        yield return random.Next(1, 21); // upper bound is exclusive, so 21 covers 1-20
    }
}
var numbers = GenRandomNumbers()
.Distinct()
.Take(5)
.ToArray();
Though it looks like the generator method will run forever because of its infinite loop, it only runs until it has yielded 5 distinct numbers, thanks to lazy evaluation.
Try selecting all the questions from the DB. Say you have them in a collection Questions; you could then try:
Questions.OrderBy(y => Guid.NewGuid()).ToList()
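Note that Guid.NewGuid() won't translate to SQL in every provider, so this shuffle is best done in memory. A minimal LINQ-to-Objects sketch of the pattern:

```csharp
using System;
using System.Linq;

class Demo
{
    static void Main()
    {
        // Shuffle the IDs 1..20 by ordering on fresh GUIDs, then take five.
        var five = Enumerable.Range(1, 20)
            .OrderBy(_ => Guid.NewGuid())
            .Take(5)
            .ToList();

        // Always exactly 5 distinct values, all within 1..20; order varies per run.
        Console.WriteLine(string.Join(", ", five));
    }
}
```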
I have two tables, both of them with a Phone column.
This column -NVARCHAR(30)- can have its stored data formatted in several different ways, from 0001 222 333 444 to 222 333 444 to (0001)-222333444 to - to even an empty string.
I would like to do a query using LINQ where the first three examples shown above would all match, so I need to strip everything that's not a digit and then take the last 9 digits of the string. However, I haven't been able to do this with just one query; instead I'm looping through each of my results and applying the phone-number filters there. Is there any way this could be done with just the query?
A string is essentially an IEnumerable<char>, so what you can do is:
var digits = s.Where(char.IsDigit);
Now, there's no really elegant way to take the last 9 digits: IEnumerable<> implies that there's no way to find out its length other than to iterate over it. What I can suggest is:
var digits = new string(s.Where(char.IsDigit).Reverse().Take(9).Reverse().ToArray());
Or you can get all fancy and write your own TakeLast() extension method:
public static IList<T> TakeLast<T>(this IEnumerable<T> enumerable, int n)
{
    var queue = new Queue<T>(n);
    foreach (var item in enumerable)
    {
        queue.Enqueue(item);
        if (queue.Count > n)
            queue.Dequeue();
    }
    return queue.ToList();
}
Which will greatly simplify your code:
var digits = new string(s.Where(char.IsDigit).TakeLast(9).ToArray());
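As a quick in-memory sanity check with one of the sample numbers from the question, the digits-then-last-nine logic behaves like this:

```csharp
using System;
using System.Linq;

class Demo
{
    static void Main()
    {
        string s = "0001 222 333 444";

        // Keep digits only, then take the last nine of them.
        var digits = new string(s.Where(char.IsDigit).Reverse().Take(9).Reverse().ToArray());

        Console.WriteLine(digits); // 222333444
    }
}
```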
Similar to Jonesopolis's answer, you can do the following to get the last n digits from a string:
var str = "0001 222 333 444";
var n = 9;
var digitsOnly = str.Where(char.IsDigit).ToArray();
var result = string.Concat(digitsOnly.Skip(digitsOnly.Length - n));
This skips everything before the last n digits and pulls them back. Note that the skip count has to be based on the number of digits, not the length of the original string - the spaces and punctuation would otherwise throw it off. If the string contains fewer than n digits, Skip treats the negative count as zero and you simply get all the digits back.
Assuming an EntityFramework mapped class like:
public class Person
{
public int Id { get; set; }
public string Phone { get; set; }
}
and an EF context that contains
public DbSet<Person> Persons
You can filter using something like
var ppl = context.Persons.Where(x => x.Phone.Replace(" ", "").EndsWith("222333444")).ToList();
This will produce SQL that looks like the following, satisfying your request for a filtering solution.
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Phone] AS [Phone]
FROM [dbo].[People] AS [Extent1]
WHERE REPLACE([Extent1].[Phone], N' ', N'') LIKE N'%222333444'
Then any formatting can be applied in a middle tier / model builder / automapper style solution.
How about a combination of Regex & LINQ :
Regex r1 = new Regex("[^0-9]");
Regex r2 = new Regex("(.{9})$");
var Last9Digits = PhoneNumbers.Select(PhoneNo => r2.Match(r1.Replace(PhoneNo, "")));
/// <summary>
/// Get the last N characters of a string.
/// </summary>
public static string GetLast(this string source, int numberOfChars)
{
if (string.IsNullOrEmpty(source) || numberOfChars >= source.Length)
return source;
return source.Substring(source.Length - numberOfChars);
}
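A quick sanity check of GetLast (the helper is repeated so the sample compiles on its own):

```csharp
using System;

static class StringExtensions
{
    // Same GetLast helper as above, repeated here so the sample is self-contained.
    public static string GetLast(this string source, int numberOfChars)
    {
        if (string.IsNullOrEmpty(source) || numberOfChars >= source.Length)
            return source;
        return source.Substring(source.Length - numberOfChars);
    }
}

class Demo
{
    static void Main()
    {
        Console.WriteLine("0001222333444".GetLast(9)); // 222333444
        Console.WriteLine("222333444".GetLast(9));     // 222333444 (length == n, returned whole)
        Console.WriteLine("".GetLast(9));              // "" (empty in, empty out)
    }
}
```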
I have a Windows forms (c#) application and a table in SQL Server that has two columns like this:
ticket (int) | numbers (string)
12345 | '01, 02, 04, 05, 09, 10, 23'
This table may have 100,000 rows or more.
What I have to do is find the number of hits given an array of numbers, like a lottery.
I have 12-hit, 11-hit and 9-hit prizes, for example, and for each raffled draw I have to work out which tickets win the 12-hit, 11-hit or 9-hit prize.
So what is the best approach to this? I need the best performance.
For now I have this code:
string sentSQL = " SELECT ticket, numbers FROM tableA";
/* CODE TO PERFORM THE CONNECTION */
/*...*/
DbDataReader reader = connection.ExecuteReader();
int hits12 = 0, hits11 = 0, hits9 = 0;
int count;
while (reader.Read())
{
    count = 0;
    string numbers = reader["numbers"].ToString();
    string ticketNumber = reader["ticket"].ToString();
    int maxJ = balls.Count; // balls is the ArrayList with the numbers currently extracted in the raffle
    for (int j = 0; j < maxJ; j++)
    {
        if (numbers.Contains(balls[j].ToString()))
        {
            count++;
        }
    }
    switch (count)
    {
        case 12:
            hits12++;
            break;
        case 11:
            hits11++;
            break;
        case 9:
            hits9++;
            break;
    }
}
This is working, but maybe there is a better way to do it.
I'm using SQL Server 2012 - maybe there is a function that can help me?
Edit: Can I perform a SUM of the CHARINDEX of each number in the SQL query, to get the number of hits inside the query itself?
You currently have a totally tacky solution.
create table Ticket (
    ticketId int not null -- PK
)

create table TicketNumbers (
    ticketId int not null,
    numberSelected int not null
)
TicketNumbers has an FK to Ticket, and a PK of (ticketId, numberSelected).
select t.ticketId, count(*) CorrectNumbers
from ticket t
inner join TicketNumbers tn on tn.ticketId = t.TicketId
where tn.numberSelected in (9, 11, 12, 15) -- list all winning numbers
group by t.ticketId
order by count(*) desc
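For completeness, the same GROUP BY logic can be sketched in LINQ to Objects, with hypothetical (TicketId, Number) tuples standing in for the TicketNumbers rows:

```csharp
using System;
using System.Linq;

class Demo
{
    static void Main()
    {
        // (TicketId, Number) pairs standing in for rows of TicketNumbers.
        var rows = new[]
        {
            (TicketId: 1, Number: 9),
            (TicketId: 1, Number: 11),
            (TicketId: 1, Number: 12),
            (TicketId: 1, Number: 3),
            (TicketId: 2, Number: 12),
            (TicketId: 2, Number: 15),
        };
        var winning = new[] { 9, 11, 12, 15 };

        // Keep winning numbers, count per ticket, best ticket first.
        var hits = rows
            .Where(r => winning.Contains(r.Number))
            .GroupBy(r => r.TicketId)
            .Select(g => new { TicketId = g.Key, CorrectNumbers = g.Count() })
            .OrderByDescending(x => x.CorrectNumbers)
            .ToList();

        foreach (var h in hits)
            Console.WriteLine($"{h.TicketId}: {h.CorrectNumbers}"); // 1: 3, then 2: 2
    }
}
```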
Cheers -
One simple way to improve this is to update your SELECT statement to get only records with numbers greater than your first ball number and less than your last ball number + 1.
Example (probably not correct SQL):
SELECT ticket, numbers FROM tableA where '10' < numbers and '43' > numbers
I have an expression:
Records.OrderBy(o => o.TIME).Where((o, i) => i % interval == 0).ToList();
This does an alright job of taking a large list of data records and paring it down to a smaller list (interval is the number of records to skip). The problem is, I want to average out some of the fields, not just skip them. I have no idea how to do this without writing a huge loop. It is worth noting that each data record has about 90 fields. Ideas?
Edit: I want to be able to skip exactly every nth record, and average 2 specific fields (lat and long (stored as decimal)) and most likely leave the other 88 fields untouched.
Edit: I would like to go from
time  lat  long  many other fields
  1     2     3  field1
  2     3     4  field1
  3     4     5  field1
  4     5     6  field1
  5     6     7  field1
  6     7     8  field1
  7     8     9  field1
  8     9    10  field1
  9    10    11  field1
 10    11    12  field1
 11    12    13  field1
 12    13    14  field1
To:
time  lat  long  other fields
  3     3     4  field1
  6     6     7  field1
  9     9    10  field1
 12    12    13  field1
If I understand correctly, you want to group a large number of items into a smaller number of equally-sized "buckets", where for each bucket some fields are aggregated (e.g. averaged) and some are skipped (i.e. taken from the last item in the bucket).
Consider if you could do this:
Records
    .ToBuckets(interval)
    .Select(bucket => new Record {
        Time = bucket.Last().Time,
        Count = bucket.Count,
        Lat = bucket.Average(x => x.Lat),
        Long = bucket.Average(x => x.Long),
        Other = bucket.First().Other
    })
    .ToList()
If this is what you want, all you need to do is create the ToBuckets method, which is a much simpler (and generic!) problem:
public static IEnumerable<IList<T>> ToBuckets<T>(this IEnumerable<T> source, int size)
{
    var bucket = new List<T>(size);
    foreach (var item in source)
    {
        bucket.Add(item);
        if (bucket.Count == size)
        {
            yield return bucket;
            bucket = new List<T>(size); // or you can reuse the same one if you're careful
        }
    }
    if (bucket.Count > 0)
        yield return bucket; // emit the trailing partial bucket
}
(The above is given as an extension method to support the example, but this can also be a regular method of course).
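A quick in-memory check of the bucketing behaviour (the helper is repeated here so the sample compiles on its own):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Extensions
{
    // Same ToBuckets as above: full buckets are yielded as they fill,
    // and the trailing partial bucket is yielded after the loop.
    public static IEnumerable<IList<T>> ToBuckets<T>(this IEnumerable<T> source, int size)
    {
        var bucket = new List<T>(size);
        foreach (var item in source)
        {
            bucket.Add(item);
            if (bucket.Count == size)
            {
                yield return bucket;
                bucket = new List<T>(size);
            }
        }
        if (bucket.Count > 0)
            yield return bucket;
    }
}

class Demo
{
    static void Main()
    {
        var buckets = Enumerable.Range(1, 10).ToBuckets(3).ToList();
        Console.WriteLine(string.Join(" | ", buckets.Select(b => string.Join(",", b))));
        // 1,2,3 | 4,5,6 | 7,8,9 | 10
    }
}
```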
If you want to include a given record in an average, you are going to have to touch that record. Something is going to have to loop through all records, whether you are doing it explicitly or whether Linq is doing that behind the scenes.
A given Linq expression can only return one thing.
The Linq expression you currently have will return the filtered list.
You will need a second Linq expression (or your own loop) to average all of the records, e.g.
var avg = Records.Average(r => r.FieldToAverage);
I'm not sure what you meant by
It is worth noting that each data record has about 90 fields
Do you somehow have to average the fields within a given record? If so, what data type(s) are they? Is there some existing method to enumerate all of those fields? If not, you will need to explicitly access each field, or use reflection to enumerate (relevant) fields.
You should be able to stick it in the Where clause. It'll be a bit ugly, but something like this:
[EDIT: From your edit I now understand you wanted something a little different. This code has been edited accordingly].
decimal latSum = 0;
decimal longSum = 0;
int count = 0;
var recordList = Records
.OrderBy(o => o.TIME)
.Where((o, i) => {
if (i % interval == 0)
{
// Modify the record in place (hope that's OK)
o.Lat = (o.Lat + latSum) / (count + 1);
o.Long = (o.Long + longSum) / (count + 1);
latSum = longSum = count = 0;
return true;
}
latSum += o.Lat;
longSum += o.Long;
count++;
return false;
})
.ToList();
I have an array with a huge number of IDs that I would like to select from the DB.
The usual approach would be to do select blabla from xxx where yyy IN (ids) OPTION (RECOMPILE).
(The option recompile is needed, because SQL server is not intelligent enough to see that putting this query in its query cache is a huge waste of memory)
However, SQL Server is horrible at this type of query when the number of IDs is high; the parser it uses is simply too slow.
Let me give an example:
SELECT * FROM table WHERE id IN (288525, 288528, 288529,<about 5000 ids>, 403043, 403044) OPTION (RECOMPILE)
Time to execute: ~1100 msec (This returns appx 200 rows in my example)
Versus:
SELECT * FROM table WHERE id BETWEEN 288525 AND 403044 OPTION (RECOMPILE)
Time to execute: ~80 msec (This returns appx 50000 rows in my example)
So even though I get 250 times more data back, it executes 14 times faster...
So I built this function to take my list of ids and build something that will return a reasonable compromise between the two (something that doesn't return 250 times as much data, yet still gives the benefit of parsing the query faster)
private const int MAX_NUMBER_OF_EXTRA_OBJECTS_TO_FETCH = 5;
public static string MassIdSelectionStringBuilder(
List<int> keys, ref int startindex, string colname)
{
const int maxlength = 63000;
if (keys.Count - startindex == 1)
{
string idstring = String.Format("{0} = {1}", colname, keys[startindex]);
startindex++;
return idstring;
}
StringBuilder sb = new StringBuilder(maxlength + 1000);
List<int> individualkeys = new List<int>(256);
int min = keys[startindex++];
int max = min;
sb.Append("(");
const string betweenAnd = "{0} BETWEEN {1} AND {2}\n";
for (; startindex < keys.Count && sb.Length + individualkeys.Count * 8 < maxlength; startindex++)
{
int key = keys[startindex];
if (key > max+MAX_NUMBER_OF_EXTRA_OBJECTS_TO_FETCH)
{
if (min == max)
individualkeys.Add(min);
else
{
if(sb.Length > 2)
sb.Append(" OR ");
sb.AppendFormat(betweenAnd, colname, min, max);
}
min = max = key;
}
else
{
max = key;
}
}
if (min == max)
individualkeys.Add(min);
else
{
if (sb.Length > 2)
sb.Append(" OR ");
sb.AppendFormat(betweenAnd, colname, min, max);
}
if (individualkeys.Count > 0)
{
if (sb.Length > 2)
sb.Append(" OR ");
string[] individualkeysstr = new string[individualkeys.Count];
for (int i = 0; i < individualkeys.Count; i++)
individualkeysstr[i] = individualkeys[i].ToString();
sb.AppendFormat("{0} IN ({1})", colname, String.Join(",",individualkeysstr));
}
sb.Append(")");
return sb.ToString();
}
It is then used like this:
List<int> keys; //Sort and make unique
...
for (int i = 0; i < keys.Count;)
{
    string idstring = MassIdSelectionStringBuilder(keys, ref i, "id");
    string sqlstring = string.Format("SELECT * FROM table WHERE {0} OPTION (RECOMPILE)", idstring);
    // ... execute sqlstring ...
}
However, my question is...
Does anyone know of a better/faster/smarter way to do this?
In my experience the fastest way was to pack the numbers in binary format into an image parameter. I was sending up to 100K IDs that way, and it worked just fine:
Mimicking a table variable parameter with an image
That was a while ago, though. The following articles by Erland Sommarskog are up to date:
Arrays and Lists in SQL Server
If the list of IDs were in another table that was indexed, this would execute a whole lot faster using a simple INNER JOIN.
If that isn't possible, then try creating a TABLE variable like so:
DECLARE @tTable TABLE
(
    Id int
)
Store the IDs in the table variable first, then INNER JOIN to your table xxx. I have had limited success with this method, but it's worth a try.
You're using (key > max + MAX_NUMBER_OF_EXTRA_OBJECTS_TO_FETCH) as the check to determine whether to do a range fetch instead of an individual fetch. It appears that's not the best way to do that.
Let's consider the four ID sequences {2, 7}, {2, 8}, {1, 2, 7}, and {1, 2, 8}.
They translate into
ID BETWEEN 2 AND 7
ID IN (2, 8)
ID BETWEEN 1 AND 7
ID BETWEEN 1 AND 2 OR ID IN (8)
The decision to fetch and filter the IDs 3-6 now depends only on the difference between 2 and 7/8. However, it does not take into account whether 2 is already part of a range or an individual ID.
I think the proper criterion is how many individual IDs you save. Converting two individuals into a range has a net benefit of 2 * Cost(individual) - Cost(range), whereas extending a range has a net benefit of Cost(individual) - Cost(range extension).
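To make that trade-off concrete, here is a small sketch with an entirely hypothetical cost model - MergeIntoRangePaysOff and both cost parameters are illustrative, not measured:

```csharp
using System;

class Demo
{
    // Hypothetical cost model: merging k IDs into one BETWEEN saves (k - 1)
    // individual predicates, but fetches `gapRows` unwanted rows in between.
    static bool MergeIntoRangePaysOff(int idsMerged, int gapRows,
                                      double costIndividual, double costExtraRow)
    {
        double saving = (idsMerged - 1) * costIndividual;
        double penalty = gapRows * costExtraRow;
        return saving > penalty;
    }

    static void Main()
    {
        // Merging {2, 7} into BETWEEN 2 AND 7 fetches 4 unwanted rows (3..6).
        Console.WriteLine(MergeIntoRangePaysOff(2, 4, costIndividual: 1.0, costExtraRow: 0.1)); // True
        // With expensive extra rows the same merge no longer pays off.
        Console.WriteLine(MergeIntoRangePaysOff(2, 4, costIndividual: 1.0, costExtraRow: 0.5)); // False
    }
}
```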
Adding RECOMPILE is not a good idea. Precompiling means SQL Server does not save your query results, but it does save the execution plan, which is what makes repeated queries faster. If you add RECOMPILE, you incur the overhead of compiling the query every time. Try creating a stored procedure to hold the query and call it from there, as stored procedures are precompiled.
Another dirty idea, similar to Neil's: have an indexed view which holds the IDs alone, based on your business condition, and join the view with your actual table to get the desired result.
The efficient way to do this is to:
Create a temporary table to hold the IDs
Call a SQL stored procedure with a string parameter holding all the comma-separated IDs
The SQL stored procedure uses a loop with CHARINDEX() to find each comma, then SUBSTRING to extract the string between two commas and CONVERT to make it an int, and use INSERT INTO #Temporary VALUES ... to insert it into the temporary table
INNER JOIN the temporary table or use it in an IN (SELECT ID from #Temporary) subquery
Every one of these steps is extremely fast, because a single string is passed, no compilation is done during the loop, and no substrings are created except the actual ID values.
No recompilation is done at all when this is executed as long as the large string is passed as a parameter.
Note that in the loop you must track the prior and the current comma positions in two separate variables.
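As a sketch of that two-pointer walk - written here in C# rather than T-SQL; the T-SQL version uses CHARINDEX and SUBSTRING the same way, and SplitIds and the variable names are illustrative:

```csharp
using System;
using System.Collections.Generic;

class Demo
{
    // Walks the comma-separated list keeping the prior and current comma
    // positions, mirroring the CHARINDEX loop the procedure would run.
    static List<int> SplitIds(string csv)
    {
        var ids = new List<int>();
        int prior = -1; // position of the previous comma (-1 = before the string)
        while (prior < csv.Length)
        {
            int current = csv.IndexOf(',', prior + 1);
            if (current < 0) current = csv.Length; // last segment has no trailing comma
            string segment = csv.Substring(prior + 1, current - prior - 1);
            if (segment.Length > 0)
                ids.Add(int.Parse(segment));
            prior = current;
        }
        return ids;
    }

    static void Main()
    {
        var ids = SplitIds("288525,288528,403044");
        Console.WriteLine(string.Join(" ", ids)); // 288525 288528 403044
    }
}
```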
Off the cuff here - does incorporating a derived table help performance at all? I am not set up to test this fully; I just wonder if it would optimize to use BETWEEN and then filter out the unneeded rows:
Select * from
( SELECT *
FROM dbo.table
WHERE ID between <lowerbound> and <upperbound>) as range
where ID in (
1206,
1207,
1208,
1209,
1210,
1211,
1212,
1213,
1214,
1215,
1216,
1217,
1218,
1219,
1220,
1221,
1222,
1223,
1224,
1225,
1226,
1227,
1228,
<...>,
1230,
1231
)