I have a Windows Forms (C#) application and a table in SQL Server with two columns like this:
ticket (int) | numbers (string)
12345 | '01, 02, 04, 05, 09, 10, 23'
This table may have 100,000 rows or more.
What I have to do is find the number of hits for a given array of drawn numbers, like a lottery.
For example, there are 12-hit, 11-hit and 9-hit prize levels, and for each number drawn I have to search for the tickets that win with 12, 11 or 9 hits.
So, what is the best approach? I need the best possible performance.
For now I have this code:
using System;
using System.Collections.Generic;
using System.Data.Common;
using System.Linq;

string sentSQL = "SELECT ticket, numbers FROM tableA";
/* CODE TO PERFORM THE CONNECTION */
/*...*/
DbDataReader reader = connection.ExecuteReader();
int hits12 = 0, hits11 = 0, hits9 = 0; // each counter needs its own initializer
while (reader.Read())
{
    string numbers = reader["numbers"].ToString();
    string ticketNumber = reader["ticket"].ToString();
    // Parse "01, 02, 04, ..." into integers: a substring test would let ball 3 match "23"
    var ticketSet = new HashSet<int>(numbers.Split(',').Select(s => int.Parse(s)));
    int count = 0;
    foreach (object ball in balls) //balls is the ArrayList with the numbers currently extracted in the raffle
    {
        if (ticketSet.Contains(Convert.ToInt32(ball)))
            count++;
    }
    switch (count)
    {
        case 12:
            hits12++;
            break;
        case 11:
            hits11++;
            break;
        case 9:
            hits9++;
            break;
    }
}
This works, but maybe there is a better method.
I'm using SQL Server 2012; is there a function that could help me?
Edit: Can I compute, in the SQL query, a SUM of the CHARINDEX of each number to get the number of hits inside the query itself?
You currently have a totally tacky solution; store one selected number per row instead:
create table ticket (
    ticketId int not null -- PK
)
create table TicketNumbers (
    ticketId int not null,
    numberSelected int not null
)
TicketNumbers has an FK to ticket, and a PK of ticketId + numberSelected.
select t.ticketId, count(*) CorrectNumbers
from ticket t
inner join TicketNumbers tn on tn.ticketId = t.TicketId
where tn.numberSelected in (9, 11, 12, 15) -- list all winning numbers
group by t.ticketId
order by count(*) desc
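For comparison, here is a minimal in-memory sketch of what that query computes, assuming the TicketNumbers rows have already been loaded as (TicketId, Number) pairs; CountHits and TallyPrizes are hypothetical names. It counts hits per ticket like the GROUP BY above (tickets with no hits are absent, as with the WHERE ... IN filter) and then tallies how many tickets reached each prize level, such as 12, 11 or 9 hits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hits per ticket: the in-memory counterpart of
// SELECT ticketId, COUNT(*) ... WHERE numberSelected IN (...) GROUP BY ticketId
static Dictionary<int, int> CountHits(
    IEnumerable<(int TicketId, int Number)> ticketNumbers, IEnumerable<int> drawn)
{
    var drawnSet = new HashSet<int>(drawn);
    return ticketNumbers
        .Where(tn => drawnSet.Contains(tn.Number))
        .GroupBy(tn => tn.TicketId)
        .ToDictionary(g => g.Key, g => g.Count());
}

// Number of tickets that reached each prize level (e.g. 12, 11 and 9 hits).
static Dictionary<int, int> TallyPrizes(Dictionary<int, int> hitsPerTicket, params int[] prizeLevels)
{
    return prizeLevels.ToDictionary(
        level => level,
        level => hitsPerTicket.Values.Count(h => h == level));
}

var rows = new List<(int TicketId, int Number)>
{
    (1, 1), (1, 2), (1, 4),
    (2, 2), (2, 9), (2, 23),
    (3, 5), (3, 10), (3, 23),
};
var hits = CountHits(rows, new[] { 2, 4, 23 });
var prizes = TallyPrizes(hits, 12, 11, 9);
foreach (var kv in hits)
    Console.WriteLine($"ticket {kv.Key}: {kv.Value} hits");
```

With real data the drawn array would hold the raffled balls and the prize levels would be 12, 11 and 9.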
One simple way to improve this is to update your SELECT statement to get only records with numbers greater than your first ball number and less than your last ball number + 1.
Example (probably not correct SQL):
SELECT ticket, numbers FROM tableA where '10' < numbers and '43' > numbers
I have a task to solve. I am trying to display the operation time of two machines (number1 and number2) in a diagram. To do this, I store the information in a table with the columns id, date, number1 and number2.
Let's assume I have this dataset:
id | date     | number1 | number2
1  | 24.09.14 | 100     | 120
2  | 01.10.14 | 150     | 160
To display the information, I need to compute the following:
((number1(2) - number1(1)) + (number2(2) - number1(1))) / 2 / (number of days (date2 - date1))
This should result in the following specific numbers.
((150 - 100 + 160 - 120) / 2) / 7 ≈ 6.43
In plain words, the result should be the average daily operation time across all of my machines. Excluding Saturdays and Sundays from the number of days would be nice but is not necessary.
I hope my question is clear. In essence, my problem is that I don't know how to combine values from different rows in a simple SQL query.
The programming language is C#, in a Razor-based web project.
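For the weekend exclusion mentioned above, the day count can be replaced by a weekday count. A minimal sketch, assuming a half-open range (the start date is counted, the end date is not) so that a plain week gives 5 instead of date2 - date1 = 7; CountWeekdays is a hypothetical helper.

```csharp
using System;

// Counts Monday-Friday days in [start, end): the start day is included, the end day is not.
static int CountWeekdays(DateTime start, DateTime end)
{
    int count = 0;
    for (DateTime day = start.Date; day < end.Date; day = day.AddDays(1))
    {
        if (day.DayOfWeek != DayOfWeek.Saturday && day.DayOfWeek != DayOfWeek.Sunday)
            count++;
    }
    return count;
}

var periodStart = new DateTime(2014, 9, 24);
var periodEnd = new DateTime(2014, 10, 1);
int weekdays = CountWeekdays(periodStart, periodEnd);
// Same numerator as the question's example, divided by weekdays instead of calendar days
double average = ((150 - 100) + (160 - 120)) / 2.0 / weekdays;
Console.WriteLine($"{weekdays} weekdays, average {average} per weekday");
```

Looping day by day is simple and fast enough for periods of a few weeks; public holidays would need an extra lookup.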
First, I doubt that you have only two records in the database. Here is some code that performs the calculation for every pair of rows in the DataSet.
var rows = dst.Tables[0].Rows;
if (rows.Count % 2 != 0)
    Console.WriteLine("Wrong records count");
for (int i = 0; i < rows.Count - 1; i += 2)
{
    int number1Row1 = Convert.ToInt32(rows[i]["Number1"]);
    int number2Row1 = Convert.ToInt32(rows[i]["Number2"]);
    int number1Row2 = Convert.ToInt32(rows[i + 1]["Number1"]);
    int number2Row2 = Convert.ToInt32(rows[i + 1]["Number2"]);
    DateTime dateRow1 = Convert.ToDateTime(rows[i]["Date"]);
    DateTime dateRow2 = Convert.ToDateTime(rows[i + 1]["Date"]);
    // Divide (not multiply) by the elapsed days; 2.0 avoids integer division
    double calc = ((number1Row2 - number1Row1) + (number2Row2 - number2Row1)) / 2.0
                  / (dateRow2 - dateRow1).TotalDays;
    Console.WriteLine(calc);
}
It is written to be as clear as possible.
Your formula probably has a mistake compared to your numerical example; it should be:
((number1(2) - number1(1)) + (number2(2) - number2(1))) / 2 / (number of days (date2 - date1))
If the values of the id column are chronological and have no gaps (1, 2, 3, 4, ... is OK, but 1, 3, 4, 6 is not), you can try the following script:
SELECT t2.number1, t1.number1, t2.number2, t1.number2, DATEDIFF(DAY, t1.date, t2.date) AS days
, (((t2.number1 - t1.number1) + (t2.number2 - t1.number2)) / 2) / DATEDIFF(DAY, t1.date, t2.date) AS result
FROM #tmp t1
INNER JOIN #tmp t2 ON t1.id + 1 = t2.id
-- Create a #tmp table for testing
CREATE table #tmp
(
id int,
Date DateTime,
number1 float,
number2 float
)
-- Insert sample data
INSERT INTO #tmp (id, Date, number1, number2) VALUES (1, '2014-09-24T00:00:00', 100, 120), (2, '2014-10-01T00:00:00', 150, 160)
It works well on my SQL Server.
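The same id + 1 self-join idea can be sketched in C# by pairing each reading with the next one after sorting by date, which also sidesteps the requirement that ids have no gaps. This assumes the rows have been loaded as (Date, Number1, Number2) tuples and that no two readings share a date; PeriodAverages is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Average daily operation time for each pair of consecutive readings,
// mirroring the "t1.id + 1 = t2.id" self-join but ordering by date instead of id.
static List<double> PeriodAverages(IEnumerable<(DateTime Date, double Number1, double Number2)> readings)
{
    var sorted = readings.OrderBy(r => r.Date).ToList();
    var result = new List<double>();
    for (int i = 0; i + 1 < sorted.Count; i++)
    {
        var earlier = sorted[i];
        var later = sorted[i + 1];
        double days = (later.Date - earlier.Date).TotalDays;
        double delta = (later.Number1 - earlier.Number1) + (later.Number2 - earlier.Number2);
        result.Add(delta / 2 / days);
    }
    return result;
}

var data = new[]
{
    (new DateTime(2014, 10, 1), 150.0, 160.0),
    (new DateTime(2014, 9, 24), 100.0, 120.0),
};
foreach (double avg in PeriodAverages(data))
    Console.WriteLine(avg); // 45 / 7 for the sample rows
```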
Yes, you can do it with a SQL query. Try the query below:
SELECT
N1.Date as PeriodStartDate,
N2.Date as PeriodEndDate,
CAST(((N2.number1 - N1.number1) + (N2.number2 - N1.number2)) / 2.0 / DATEDIFF(d, N1.Date, N2.Date) AS DECIMAL(18,2)) AS AverageDailyOperation
FROM
[dbo].[NumberTable] N1
INNER JOIN
[dbo].[NumberTable] N2
ON N2.Date>N1.Date
I have assumed the table is named NumberTable, and I added PeriodStartDate and PeriodEndDate to make the result meaningful. You can remove them as needed.
I have two tables in my database, Ticket and TicketNumbers. I would like to write a LINQ query to count the number of tickets that contain all of the numbers passed into this function. Since we don't know how many numbers must be matched, the LINQ query has to be dynamic, to my understanding.
public int CountPartialMatchingTicket(IList<int> numbers)
{
// where's the code? =_=;
}
Say, for example, there are 3 tickets in the database and I want to count all those that have the numbers 3 and 4:
(1) 1 2 3 4
(2) 1 3 4 6 7
(3) 1 2 3
In this case the function should return 2, since tickets (1) and (2) contain the matching numbers.
In another case, if asked to match 1, 2, 3, the function should again return 2.
Here's what those two tables in the database look like:
Ticket:
TicketId, Name, AddDateTime
TicketNumbers:
Id, TicketId, Number
I've never used Dynamic LINQ before, so, just in case, this is what I have at the top of my .cs file:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using LottoGen.Models;
using System.Linq.Dynamic;
First things first, though: I don't even know how to write the LINQ query for a fixed set of numbers. I suppose the SQL would look like this:
SELECT TicketId, COUNT(0) AS Expr1
FROM TicketNumbers
WHERE (Number = 3) OR (Number = 4)
GROUP BY TicketId
However, this isn't what I want either. The above query would get me tickets that have either a 3 or a 4, but I only want the tickets that have BOTH numbers. And I guess it has to be nested somehow to return a single count. If I had to use my imagination to complete the function, it would be something like this:
public int CountPartialMatchingTicket(IList<int> numbers)
{
string query = "";
foreach(int number in numbers) {
query += "Number = " + number.ToString() + " AND ";
}
// I know.. there is a trailing AND.. lazy
int count = DbContext.TicketNumbers.Where(query).Count();
return count;
}
Oh wait a minute. There's no Dynamic Linq there... The above is looking like something I would do in PHP and that query statement obviously does not do anything useful. What am I doing? :(
At the end of the day, I want to output a little table to the webpage looking like this.
Ticket | Matching Tickets
-----------------------------------
3 4    | 2
Trinity, help!
public int CountPartialMatchingTicket(IList<int> numbers)
{
    var arr = numbers.ToArray();
    int count = DbContext.Tickets
        .Count(tk => arr.All(n => tk.TicketNumbers.Any(tn => tn.Number == n)));
    return count;
}
UPDATE: "If you don't mind, how would I limit this query to just the tickets made on a particular AddDateTime (the whole day)?"
The part inside the Count() method call is the WHERE condition, so just extend that:
DateTime targetDate = ......;
DateTime tooLate = targetDate.AddDays(1);
int count = DbContext.Tickets
    .Count(tk =>
        targetDate <= tk.AddDateTime && tk.AddDateTime < tooLate
        && arr.All(n => tk.TicketNumbers.Any(tn => tn.Number == n)));
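To check the All/Any predicate against the question's own example without a database, here is an in-memory sketch in which each int[] stands in for a ticket's TicketNumbers; CountTicketsContainingAll is a hypothetical name. It only verifies the logic of the lambda; the SQL that the real DbContext generates is a separate matter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A ticket matches when every requested number appears among its numbers,
// i.e. the same arr.All(n => tk.TicketNumbers.Any(tn => tn.Number == n)) predicate.
static int CountTicketsContainingAll(IEnumerable<int[]> tickets, IList<int> numbers)
{
    var arr = numbers.ToArray();
    return tickets.Count(tk => arr.All(n => tk.Any(number => number == n)));
}

var sampleTickets = new List<int[]>
{
    new[] { 1, 2, 3, 4 },    // ticket (1)
    new[] { 1, 3, 4, 6, 7 }, // ticket (2)
    new[] { 1, 2, 3 },       // ticket (3)
};
Console.WriteLine(CountTicketsContainingAll(sampleTickets, new[] { 3, 4 }));    // 2
Console.WriteLine(CountTicketsContainingAll(sampleTickets, new[] { 1, 2, 3 })); // 2
```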
This is similar to James Curran's answer, but a little simpler, and it should produce a simpler WHERE IN-based query:
// count all tickets...
return DbContext.Tickets.Count(
// where any of their ticket numbers
tk => tk.TicketNumbers.Any(
// are contained in our list of numbers
tn => numbers.Contains(tn.Number)));
I have a temp table as shown in the following screenshot. I populate this table using a stored procedure (SP) as an intermediate step to generate a report.
This table contains Employee ID, PID (working location) and the days from the 1st to the 31st. If an employee worked a day shift, it is denoted by D, and night shifts are denoted by N. If an employee worked both shifts, it is denoted by D/N. Now I have to fill the last columns with totals as follows:
sub_totals: totals of "D" and "N" separately, e.g. "15/08"
shift_totals: total of all shifts together, e.g. "23" (all "D"s and "N"s)
day_totals: number of days worked (count of D, N and D/N), e.g. "20"
NOTE: When calculating day_totals, "D/N" should be treated as one day worked.
Finally I want to show this on a report. (development language is C# and I'm using ADO.NET)
Could someone please show me how to do this in SQL if possible?
You would have to do something like this:
SELECT
EmployeeID,
PID,
(
CASE WHEN [1] = 'D' OR [1] = 'N' THEN 1 WHEN [1] = 'D/N' THEN 2 ELSE 0 END
+ CASE WHEN [2] = ...
+ ...
+ CASE WHEN [31] = 'D' OR [31] = 'N' .....
) AS shift_totals,
(
CASE WHEN [1] IS NOT NULL THEN 1 ELSE 0 END
+ CASE WHEN [2] IS ...
+ ...
) AS day_totals
FROM #ShiftTable -- hypothetical name: use your temp table here
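If it is easier to compute the totals on the ADO.NET side, the rules from the question can be sketched in C#, assuming the 31 day columns of one row have been read into a string array (null or empty for a day off) and that a "D/N" cell counts towards both the D and N sub-totals, which is what makes 15 + 8 = 23 in the example. DayTotals is a hypothetical name.

```csharp
using System;

// Totals for one employee row. Assumes each cell is "D", "N", "D/N", or null/empty.
static (int DayShifts, int NightShifts, int ShiftTotal, int DaysWorked) DayTotals(string[] days)
{
    int d = 0, n = 0, worked = 0;
    foreach (string raw in days)
    {
        string cell = raw?.Trim() ?? "";
        bool day = cell == "D" || cell == "D/N";
        bool night = cell == "N" || cell == "D/N";
        if (day) d++;
        if (night) n++;
        if (day || night) worked++; // "D/N" counts as one day worked
    }
    return (d, n, d + n, worked);
}

var row = new string[31];
row[0] = "D"; row[1] = "N"; row[2] = "D/N"; row[4] = "D";
var t = DayTotals(row);
// sub_totals formatted like the question's "15/08"
Console.WriteLine($"{t.DayShifts:00}/{t.NightShifts:00} shifts={t.ShiftTotal} days={t.DaysWorked}");
```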
I have a situation where I need to create tens of thousands of unique numbers. However, these numbers must be 9 digits long and cannot contain any 0s. My current approach is to generate 9 digits (1-9), concatenate them together, and add the number to the list if it is not already in it. For example:
public void generateIdentifiers(int quantity)
{
uniqueIdentifiers = new List<string>(quantity);
while (this.uniqueIdentifiers.Count < quantity)
{
string id = string.Empty;
id += random.Next(1,10);
id += random.Next(1,10);
id += random.Next(1,10);
id += " ";
id += random.Next(1,10);
id += random.Next(1,10);
id += random.Next(1,10);
id += " ";
id += random.Next(1,10);
id += random.Next(1,10);
id += random.Next(1,10);
if (!this.uniqueIdentifiers.Contains(id))
{
this.uniqueIdentifiers.Add(id);
}
}
}
However, at about 400,000 the process slows down dramatically, as more and more of the generated numbers are duplicates. I am looking for a more efficient way to perform this process; any help would be really appreciated.
Edit: I'm generating these: http://www.nhs.uk/NHSEngland/thenhs/records/Pages/thenhsnumber.aspx
As others have mentioned, use a HashSet<T> instead of a List<T>.
Furthermore, using StringBuilder instead of simple string operations will gain you another 25%. If you can use numbers instead of strings, you win, because it only takes a third or fourth of the time.
var quantity = 400000;
var uniqueIdentifiers = new HashSet<int>();
while (uniqueIdentifiers.Count < quantity)
{
int i=0;
i = i*10 + random.Next(1,10);
i = i*10 + random.Next(1,10);
i = i*10 + random.Next(1,10);
i = i*10 + random.Next(1,10);
i = i*10 + random.Next(1,10);
i = i*10 + random.Next(1,10);
i = i*10 + random.Next(1,10);
i = i*10 + random.Next(1,10);
i = i*10 + random.Next(1,10);
uniqueIdentifiers.Add(i);
}
It takes about 270 ms on my machine for 400,000 numbers and about 700 ms for 1,000,000, and that is without any parallelism.
Because of the use of a HashSet<T> instead of a List<T>, this algorithm runs in O(n), i.e. the duration grows linearly. 10,000,000 values therefore take about 7 seconds.
This suggestion may or may not be popular; it depends on people's perspective. Because you haven't been too specific about what you need them for, how often, or the exact number, I will suggest a brute-force approach.
I would generate a hundred thousand numbers; that shouldn't take very long at all, maybe a few seconds. Then use Parallel LINQ (PLINQ) to do a Distinct() on them to eliminate duplicates, then use another PLINQ query to run a regex against the remainder to eliminate any with zeros in them, and then take the top x thousand. (PLINQ is brilliant for ripping through large tasks like this.) If needed, rinse and repeat until you have enough.
On a decent machine it will take you longer to write this simple function than it will take to run it. I would also question why you have 400K entries to test when you state you actually need "tens of thousands".
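A minimal sketch of that brute-force pipeline, assuming .NET 6 or later (for the thread-safe Random.Shared); GenerateBatch and the batch sizes are illustrative. If one batch yields fewer than the requested count, call it again and merge the results.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Generate a batch of random 9-digit numbers in parallel, drop duplicates with Distinct(),
// drop anything containing a zero with a regex, and keep the first `needed` results.
static string[] GenerateBatch(int batchSize, int needed)
{
    var valid = new Regex("^[1-9]{9}$", RegexOptions.Compiled); // Regex instances are thread-safe
    return ParallelEnumerable.Range(0, batchSize)
        .Select(_ => Random.Shared.Next(100_000_000, 1_000_000_000).ToString())
        .Distinct()
        .Where(s => valid.IsMatch(s))
        .Take(needed)
        .ToArray();
}

string[] ids = GenerateBatch(100_000, 10_000);
Console.WriteLine($"{ids.Length} ids, e.g. {ids[0]}");
```

Roughly 43% of 9-digit numbers contain no zero (9^9 out of 9 * 10^8), so a batch of 100,000 comfortably covers 10,000 results.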
The trick here is that you only need ten thousand unique numbers. Theoretically there are 9^9 = 387,420,489 possibilities, but why care if you need so many fewer?
Once you realize that you can cut down the number of combinations that much, creating enough unique numbers is easy:
long[] numbers = { 1, 3, 5, 7, 9 }; //note that we just take a few digits: 5^9 = 1,953,125 combinations, more than we might need
var list = (from i0 in numbers
            from i1 in numbers
            from i2 in numbers
            from i3 in numbers
            from i4 in numbers
            from i5 in numbers
            from i6 in numbers
            from i7 in numbers
            from i8 in numbers // nine positions for nine digits
            select i0 + i1 * 10 + i2 * 100 + i3 * 1000 + i4 * 10000 + i5 * 100000 + i6 * 1000000 + i7 * 10000000 + i8 * 100000000).ToList();
This snippet creates a list of more than 1,000,000 valid, unique 9-digit numbers pretty much instantly.
Try avoiding the checks altogether by making sure that you always pick a unique number:
static char[] base9 = "123456789".ToCharArray();
static string ConvertToBase9(int value) {
int num = 9;
char[] result = new char[9];
for (int i = 8; i >= 0; --i) {
result[i] = base9[value % num];
value = value / num;
}
return new string(result);
}
public static List<string> generateIdentifiers(int quantity) {
    var uniqueIdentifiers = new List<string>(quantity);
    // we have 387420489 (9^9) possible numbers of 9 digits in base 9.
    // if we step by a number that is coprime with that, we always get
    // unique numbers (for up to 9^9 values)
    Random random = new Random();
    int inc = 386000000;
    int seed = random.Next(0, 387420489);
    while (uniqueIdentifiers.Count < quantity) {
        uniqueIdentifiers.Add(ConvertToBase9(seed));
        seed += inc;
        seed %= 387420489;
    }
    return uniqueIdentifiers;
}
I'll try to explain the idea with small numbers.
Suppose you have at most 7 possible combinations. We choose a number that is coprime with 7, e.g. 3, and a random starting number, e.g. 4.
At each round, we add 3 to our current number and then take the result modulo 7, so we get this sequence:
4 -> (4 + 3) % 7 = 0
0 -> (0 + 3) % 7 = 3
3 -> (3 + 3) % 7 = 6
6 -> (6 + 3) % 7 = 2
2 -> (2 + 3) % 7 = 5
5 -> (5 + 3) % 7 = 1
In this way, we generate all the values from 0 to 6 in a non-consecutive order before the sequence returns to 4. In my code we do the same, but with 9^9 possible combinations, and as a number coprime with that I chose 386000000 (since 9^9 = 3^18, you just have to avoid multiples of 3).
Then I take each number in the sequence and convert it to base 9, using the digits 1 to 9.
I hope this is clear :)
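A small sketch that reproduces the sequence above and checks the coprimality condition; Gcd and StepSequence are hypothetical names. Calling StepSequence(387420489, 386000000, seed, quantity) yields the distinct values that the answer then converts to base 9.

```csharp
using System;
using System.Collections.Generic;

// Greatest common divisor (Euclid); the step must satisfy Gcd(step, n) == 1.
static long Gcd(long a, long b)
{
    while (b != 0) (a, b) = (b, a % b);
    return a;
}

// The first `count` values of start, start+step, start+2*step, ... (mod n).
// When Gcd(step, n) == 1, the first n values are a permutation of 0..n-1.
static List<long> StepSequence(long n, long step, long start, int count)
{
    if (Gcd(step, n) != 1) throw new ArgumentException("step must be coprime with n");
    var values = new List<long>(count);
    long current = start % n;
    for (int i = 0; i < count; i++)
    {
        values.Add(current);
        current = (current + step) % n;
    }
    return values;
}

Console.WriteLine(string.Join(" -> ", StepSequence(7, 3, 4, 7))); // 4 -> 0 -> 3 -> 6 -> 2 -> 5 -> 1
```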
I tested it on my machine, and generating 400k unique values took ~ 1 second.
Maybe this will be faster:
// Pick a random number in [0, 9^9); written in base 9 with leading zeros it has exactly 9 digits
// (starting from 0 is fine: the +1 below maps digit 0 to 1, so every leading digit 1-9 is equally likely)
int randNumber = random.Next(0, (int)Math.Pow(9, 9));
// Now convert the number to base 9, adding 1 to each digit so that only the digits 1-9 appear
StringBuilder builder = new StringBuilder();
for (int i = (int)Math.Pow(9, 8); i > 0; i = i / 9)
{
    builder.Append(randNumber / i + 1);
    randNumber = randNumber % i;
}
string id = builder.ToString();
Looking at the solutions already posted, mine seems fairly basic. But it works, and generates 1 million values in approximately 1 s (10 million in 11 s).
public static HashSet<int> generateIdentifiers(int quantity)
{
    HashSet<int> uniqueIdentifiers = new HashSet<int>();
    while (uniqueIdentifiers.Count < quantity)
    {
        // Random.Next's upper bound is exclusive, so 1000000000 lets 999999999 be chosen
        int value = random.Next(111111111, 1000000000);
        if (!value.ToString().Contains('0'))
            uniqueIdentifiers.Add(value); // HashSet.Add silently ignores duplicates
    }
    return uniqueIdentifiers;
}
Use a string array or StringBuilder when working with string concatenation.
Moreover, your code is inefficient because, after many IDs have been generated, each new ID is increasingly likely to be in the list already, so the while loop runs more often than needed.
Use for loops to generate your IDs without randomizing. If random IDs are required, again use for loops to generate more than you need within a given interval, and then randomly select as many as you need from that list.
Use the code below to fill a static list when your program starts. I will add a second piece of code later to generate a random ID list (I'm a little busy).
public static Random RANDOM = new Random();
public static List<int> randomNumbers = new List<int>();
public static List<string> randomStrings = new List<string>();
private void fillRandomNumbers()
{
    // i must be incremented on every pass, otherwise the loop never ends
    for (int i = 100; i < 1000; i++)
    {
        if (!i.ToString().Contains('0'))
        {
            randomNumbers.Add(i);
        }
    }
}
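For the second step described above (enumerate candidates in an interval with plain loops, then pick the required number at random), one possible sketch uses a partial Fisher-Yates shuffle; PickRandomIds and the interval are assumptions, not the code the answer promises. Enumerating the full 9-digit range would mean 387,420,489 candidates, so this only pays off for a narrower interval.

```csharp
using System;
using System.Collections.Generic;

// Deterministically enumerate every zero-free number in [low, high], then choose `count`
// of them at random with a partial Fisher-Yates shuffle (no duplicate checks needed).
static List<int> PickRandomIds(int low, int high, int count, Random rng)
{
    var candidates = new List<int>();
    for (int i = low; i <= high; i++)
    {
        if (!i.ToString().Contains('0'))
            candidates.Add(i);
    }
    if (count > candidates.Count)
        throw new ArgumentException("interval does not contain enough valid ids");

    for (int i = 0; i < count; i++)
    {
        int j = rng.Next(i, candidates.Count); // pick from the not-yet-shuffled tail
        (candidates[i], candidates[j]) = (candidates[j], candidates[i]);
    }
    return candidates.GetRange(0, count);
}

var picked = PickRandomIds(100, 999, 10, new Random(42));
Console.WriteLine(string.Join(", ", picked));
```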
I think the first thing would be to use StringBuilder instead of concatenation; you'll be pleasantly surprised.
Another thing: use a more efficient data structure, for example HashSet<T> or Hashtable.
If you could drop the rather odd requirement of having no zeros, then you could of course use just one random operation and then format the resulting number the way you want.
I think @slugster is broadly right, although you could run two parallel processes: one to generate numbers, the other to verify them and add them to the list of accepted numbers. Once you have enough, signal the first process to stop.
Combine this with other suggestions - using more efficient and appropriate data structures - and you should have something that works acceptably.
However, the question of why you need such numbers is also significant; this requirement seems like one that should be analysed.
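A minimal sketch of the two-worker idea, using two tasks in one process rather than separate processes: a producer task feeds random candidates into a bounded BlockingCollection, and the consumer keeps the unique, zero-free ones in a HashSet and cancels the producer once it has enough. GenerateConcurrently and the queue capacity are illustrative choices.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static HashSet<int> GenerateConcurrently(int quantity)
{
    var accepted = new HashSet<int>();
    if (quantity <= 0) return accepted;

    using var queue = new BlockingCollection<int>(boundedCapacity: 10_000);
    using var cts = new CancellationTokenSource();

    // Producer: generates candidates until the consumer signals that it has enough.
    Task producer = Task.Run(() =>
    {
        var rng = new Random();
        try
        {
            while (!cts.IsCancellationRequested)
                queue.Add(rng.Next(111_111_111, 1_000_000_000), cts.Token);
        }
        catch (OperationCanceledException) { /* consumer has enough */ }
        finally { queue.CompleteAdding(); }
    });

    // Consumer: verifies candidates and keeps the unique, zero-free ones.
    foreach (int candidate in queue.GetConsumingEnumerable())
    {
        if (!candidate.ToString().Contains('0'))
            accepted.Add(candidate);
        if (accepted.Count == quantity)
        {
            cts.Cancel(); // signal the producer to stop
            break;
        }
    }

    producer.Wait();
    return accepted;
}

var result = GenerateConcurrently(50_000);
Console.WriteLine(result.Count);
```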
Something like this?
public List<string> generateIdentifiers2(int quantity)
{
    var uniqueIdentifiers = new List<string>(quantity);
    while (uniqueIdentifiers.Count < quantity)
    {
        var sb = new StringBuilder();
        sb.Append(random.Next(111, 1000)); // three digits per group, nine in total
        sb.Append(" ");
        sb.Append(random.Next(111, 1000));
        sb.Append(" ");
        sb.Append(random.Next(111, 1000));
        var id = sb.ToString();
        // replace any '0' with a random digit from 1 to 9
        id = new string(id.ToList().ConvertAll(x => x == '0' ? char.Parse(random.Next(1, 10).ToString()) : x).ToArray());
        if (!uniqueIdentifiers.Contains(id))
        {
            uniqueIdentifiers.Add(id);
        }
    }
    return uniqueIdentifiers;
}
I have an array with a huge number of IDs that I would like to select from the DB.
The usual approach would be select blabla from xxx where yyy IN (ids) OPTION (RECOMPILE).
(OPTION (RECOMPILE) is needed because SQL Server is not intelligent enough to see that putting this query in its plan cache is a huge waste of memory.)
However, SQL Server is horrible at this type of query when the number of IDs is high; the parser it uses is simply too slow.
Let me give an example:
SELECT * FROM table WHERE id IN (288525, 288528, 288529,<about 5000 ids>, 403043, 403044) OPTION (RECOMPILE)
Time to execute: ~1100 ms (this returns approx. 200 rows in my example)
Versus:
SELECT * FROM table WHERE id BETWEEN 288525 AND 403044 OPTION (RECOMPILE)
Time to execute: ~80 ms (this returns approx. 50000 rows in my example)
So even though I get 250 times more data back, it executes 14 times faster...
So I built this function to take my list of IDs and build something that is a reasonable compromise between the two (something that doesn't return 250 times as much data, yet still gets the benefit of faster query parsing):
private const int MAX_NUMBER_OF_EXTRA_OBJECTS_TO_FETCH = 5;
public static string MassIdSelectionStringBuilder(
List<int> keys, ref int startindex, string colname)
{
const int maxlength = 63000;
if (keys.Count - startindex == 1)
{
string idstring = String.Format("{0} = {1}", colname, keys[startindex]);
startindex++;
return idstring;
}
StringBuilder sb = new StringBuilder(maxlength + 1000);
List<int> individualkeys = new List<int>(256);
int min = keys[startindex++];
int max = min;
sb.Append("(");
const string betweenAnd = "{0} BETWEEN {1} AND {2}\n";
for (; startindex < keys.Count && sb.Length + individualkeys.Count * 8 < maxlength; startindex++)
{
int key = keys[startindex];
if (key > max+MAX_NUMBER_OF_EXTRA_OBJECTS_TO_FETCH)
{
if (min == max)
individualkeys.Add(min);
else
{
if(sb.Length > 2)
sb.Append(" OR ");
sb.AppendFormat(betweenAnd, colname, min, max);
}
min = max = key;
}
else
{
max = key;
}
}
if (min == max)
individualkeys.Add(min);
else
{
if (sb.Length > 2)
sb.Append(" OR ");
sb.AppendFormat(betweenAnd, colname, min, max);
}
if (individualkeys.Count > 0)
{
if (sb.Length > 2)
sb.Append(" OR ");
string[] individualkeysstr = new string[individualkeys.Count];
for (int i = 0; i < individualkeys.Count; i++)
individualkeysstr[i] = individualkeys[i].ToString();
sb.AppendFormat("{0} IN ({1})", colname, String.Join(",",individualkeysstr));
}
sb.Append(")");
return sb.ToString();
}
It is then used like this:
List<int> keys; //Sort and make unique
...
for (int i = 0; i < keys.Count;)
{
    string idstring = MassIdSelectionStringBuilder(keys, ref i, "id");
    string sqlstring = string.Format("SELECT * FROM table WHERE {0} OPTION (RECOMPILE)", idstring);
    // ... execute sqlstring and process the rows ...
}
However, my question is...
Does anyone know of a better/faster/smarter way to do this?
In my experience, the fastest way was to pack the numbers in binary format into an image parameter. I was sending up to 100K IDs, which works just fine:
Mimicking a table variable parameter with an image
That was a while ago, though. The following articles by Erland Sommarskog are up to date:
Arrays and Lists in SQL Server
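The client side of the binary packing can be sketched as below, assuming each ID is written as 4 bytes with the most significant byte first, which is the byte order SQL Server uses when converting binary to int, so the server can recover ID k (0-based) with CONVERT(int, SUBSTRING(@bin, 4 * k + 1, 4)). PackIds, UnpackIds and @bin are hypothetical names; the byte array would be sent as a varbinary(max) (or image) parameter.

```csharp
using System;
using System.Buffers.Binary;

// 4 bytes per id, big-endian (most significant byte first).
static byte[] PackIds(int[] ids)
{
    var buffer = new byte[ids.Length * 4];
    for (int k = 0; k < ids.Length; k++)
        BinaryPrimitives.WriteInt32BigEndian(buffer.AsSpan(k * 4, 4), ids[k]);
    return buffer;
}

// Inverse of PackIds; mirrors what the server-side split would do.
static int[] UnpackIds(byte[] buffer)
{
    var ids = new int[buffer.Length / 4];
    for (int k = 0; k < ids.Length; k++)
        ids[k] = BinaryPrimitives.ReadInt32BigEndian(buffer.AsSpan(k * 4, 4));
    return ids;
}

byte[] packed = PackIds(new[] { 1, 258, 403044 });
Console.WriteLine(BitConverter.ToString(packed)); // 00-00-00-01-00-00-01-02-00-06-26-64
```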
If the list of IDs were in another table that was indexed, this would execute a whole lot faster using a simple INNER JOIN.
If that isn't possible, try creating a table variable like so:
DECLARE @tTable TABLE
(
Id int
)
Store the IDs in the table variable first, then INNER JOIN it to your table xxx. I have had limited success with this method, but it's worth a try.
You're using (key > max+MAX_NUMBER_OF_EXTRA_OBJECTS_TO_FETCH) as the check to determine whether to do a range fetch instead of an individual fetch. That doesn't appear to be the best criterion.
Let's consider the 4 ID sequences {2, 7}, {2, 8}, {1, 2, 7} and {1, 2, 8}.
They translate into
ID BETWEEN 2 AND 7
ID IN (2, 8)
ID BETWEEN 1 AND 7
ID BETWEEN 1 AND 2 OR ID in (8)
The decision to fetch and filter IDs 3-6 depends only on the difference between 2 and 7/8. However, it does not take into account whether 2 is already part of a range or an individual ID.
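The translations above can be reproduced with a sketch of the same gap rule, where maxGap plays the role of MAX_NUMBER_OF_EXTRA_OBJECTS_TO_FETCH = 5. GroupKeys and Describe are hypothetical helpers that return (Min, Max) runs and a readable condition instead of the exact SQL text the question's function builds; a run with Min == Max is an individual ID.

```csharp
using System;
using System.Collections.Generic;

// Groups sorted, unique keys into (Min, Max) runs using the question's rule:
// a key more than maxGap above the current Max starts a new run.
static List<(int Min, int Max)> GroupKeys(IReadOnlyList<int> sortedKeys, int maxGap)
{
    var runs = new List<(int Min, int Max)>();
    if (sortedKeys.Count == 0) return runs;
    int min = sortedKeys[0], max = min;
    for (int i = 1; i < sortedKeys.Count; i++)
    {
        int key = sortedKeys[i];
        if (key > max + maxGap)
        {
            runs.Add((min, max)); // close the current run
            min = key;
        }
        max = key;
    }
    runs.Add((min, max));
    return runs;
}

static string Describe(List<(int Min, int Max)> runs) =>
    string.Join(" OR ", runs.ConvertAll(r => r.Min == r.Max ? $"ID = {r.Min}" : $"ID BETWEEN {r.Min} AND {r.Max}"));

foreach (var keys in new[] { new[] { 2, 7 }, new[] { 2, 8 }, new[] { 1, 2, 7 }, new[] { 1, 2, 8 } })
    Console.WriteLine(Describe(GroupKeys(keys, 5)));
```

Swapping the `key > max + maxGap` test for a cost comparison is where the criterion discussed next would plug in.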
I think the proper criterion is how many individual IDs you save. Converting two individual IDs into a range has a net benefit of 2 * Cost(individual) - Cost(range), whereas extending a range has a net benefit of Cost(individual) - Cost(range extension).
Adding RECOMPILE is not a good idea. Caching does not save your query results; it saves the execution plan, to make the query faster. If you add RECOMPILE, you always pay the overhead of compiling the query. Try creating a stored procedure that contains the query and calling it from there, as stored procedures are precompiled.
Another dirty idea, similar to Neil's:
Have an indexed view that holds only the IDs, based on your business condition.
Then you can join the view with your actual table to get the desired result.
The efficient way to do this is to:
Create a temporary table to hold the IDs
Call a SQL stored procedure with a string parameter holding all the comma-separated IDs
The SQL stored procedure uses a loop with CHARINDEX() to find each comma, SUBSTRING to extract the string between two commas and CONVERT to turn it into an int, then uses INSERT INTO #Temporary VALUES ... to insert it into the temporary table
INNER JOIN the temporary table or use it in an IN (SELECT ID from #Temporary) subquery
Every one of these steps is extremely fast because a single string is passed, no compilation is done during the loop, and no substrings are created except the actual id values.
No recompilation is done at all when this is executed as long as the large string is passed as a parameter.
Note that in the loop you must track the positions of the previous and current comma in two separate variables.
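The comma scan described above is sketched here in C# so the index arithmetic is easy to check; SplitIds is a hypothetical name. In the stored procedure the same steps use CHARINDEX, SUBSTRING and CONVERT; note that T-SQL positions are 1-based and CHARINDEX returns 0 when nothing is found, whereas IndexOf is 0-based and returns -1.

```csharp
using System;
using System.Collections.Generic;

// Walks the string once: 'previous' is the index of the last comma seen (-1 before the first),
// 'current' is the next comma, or the end of the string for the last value.
static List<int> SplitIds(string csv)
{
    var ids = new List<int>();
    int previous = -1;
    while (previous < csv.Length)
    {
        int current = csv.IndexOf(',', previous + 1);
        if (current < 0) current = csv.Length;      // last value has no trailing comma
        string piece = csv.Substring(previous + 1, current - previous - 1);
        if (piece.Trim().Length > 0)
            ids.Add(int.Parse(piece));              // like CONVERT(int, SUBSTRING(...))
        previous = current;
    }
    return ids;
}

Console.WriteLine(string.Join("|", SplitIds("288525,288528,288529,403044")));
```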
Off the cuff here: does incorporating a derived table help performance at all? I am not set up to test this fully; I just wonder whether this would be optimized to use BETWEEN and then filter the unneeded rows out:
Select * from
( SELECT *
FROM dbo.table
WHERE ID between <lowerbound> and <upperbound>) as range
where ID in (
1206,
1207,
1208,
1209,
1210,
1211,
1212,
1213,
1214,
1215,
1216,
1217,
1218,
1219,
1220,
1221,
1222,
1223,
1224,
1225,
1226,
1227,
1228,
<...>,
1230,
1231
)