Comparing integer numbers predicate to LINQ


I have this simple query where I need to identify all tickets within start and end number of a specific TicketBook object on api side in EF Core.
var ticketBook = await Context.TicketBooks.FirstOrDefaultAsync(x=>x.Id == query.TicketBookId);
if (ticketBook != null)
{
dbTickets = dbTickets.Where(x => ConvertTicketNumberToInt(x, ticketBook));
}
private bool ConvertTicketNumberToInt(Ticket t, TicketBook tb)
{
try
{
var numberOnly = new string(t.Number.Where(t => char.IsDigit(t)).ToArray());
var tNumber = Convert.ToInt64(numberOnly);
return tNumber >= tb.StartIntNumber && tNumber <= tb.EndIntNumber;
}
catch(OverflowException)
{
return false;
}
}
The problem is that the "Number" property in the Ticket class is nvarchar (string), but I need to convert it into an integer for this particular query only, so I wrote a small method that does it for me. But as you can see, it's very time-consuming and not efficient at all, so my API call just times out.
I am trying to figure out how to do this in LINQ without writing extra methods like this. The trick is that the "Number" property can sometimes contain a few letters, which throws an exception when converting it to an integer, so I need to remove those non-digit characters before the comparison. That's why I had to write this dedicated method.

As already mentioned, you are facing some performance issues storing nvarchar instead of long.
Anyway, what you're doing in your code is not that bad - you have fairly simple method for the job which keeps your LINQ code clean and tidy. But since you want to have a single LINQ query, try the following (it can be done shorter but I've chosen this way for readability):
var ticketBook = await Context.TicketBooks.FirstOrDefaultAsync(x=>x.Id == query.TicketBookId);
if (ticketBook != null)
{
dbTickets = dbTickets
.Select(t => new { Ticket = t, Number = new string(t.Number.Where(n => char.IsDigit(n)).ToArray()) })
.Select(t =>
{
// TryParse sets its out argument to 0 on failure, so assign the sentinel explicitly
long ticketNumber;
if (!long.TryParse(t.Number, out ticketNumber)) ticketNumber = long.MinValue;
return new { Ticket = t.Ticket, Number = ticketNumber };
})
.Where(t => t.Number >= ticketBook.StartIntNumber && t.Number <= ticketBook.EndIntNumber)
.Select(t => t.Ticket);
}
What it does:
In the first pass, the letters are stripped from each Number so that only the digits remain, and an anonymous type holding the complete Ticket along with this digit string is returned.
The strings are parsed to long. I've abused long.MinValue to indicate a failed conversion (since you're using char.IsDigit(c), I assume you don't expect any negative values in your results; you might as well use ulong for twice the positive range and abuse the value 0), and again an anonymous type is returned.
Those anonymous structures are filtered with the condition you provided.
Finally, only the original Ticket structure is returned.
If you're concerned about the number of passes over the initial results - I've run several performance tests to find out whether having a number of Selects with short operations inside is slower than having one pass with an elaborate operation and I haven't observed any significant difference.
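The pipeline described above can be sketched as a self-contained, in-memory example. The `Ticket` class here is a hypothetical stand-in for the entity in the question, and `TicketFilter`, `DigitsToLong` and `InRange` are names made up for illustration. Note the assumption that the tickets are already loaded: a filter like this runs on the client, so it does not reduce the number of rows fetched from the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Ticket entity in the question.
public sealed class Ticket
{
    public string Number { get; set; } = "";
}

public static class TicketFilter
{
    // Keeps only the digits of a ticket number and parses them.
    // Returns null when no digits remain or the value does not fit in a long.
    public static long? DigitsToLong(string number)
    {
        var digits = new string((number ?? "").Where(c => char.IsDigit(c)).ToArray());
        return long.TryParse(digits, out var value) ? value : (long?)null;
    }

    // In-memory filter: tickets whose numeric part lies in [start, end].
    public static List<Ticket> InRange(IEnumerable<Ticket> tickets, long start, long end)
    {
        return tickets
            .Select(t => new { Ticket = t, Number = DigitsToLong(t.Number) })
            .Where(x => x.Number >= start && x.Number <= end) // a null Number never matches
            .Select(x => x.Ticket)
            .ToList();
    }
}
```

Using a nullable `long?` instead of a sentinel such as `long.MinValue` keeps "could not parse" distinct from every real value, at the cost of a slightly wider intermediate type.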

Your best bet is to do most of the conversion in the database.
If you have access to the context, you can do this:
dbTickets = Context.Tickets
.FromSqlRaw("SELECT * FROM Tickets WHERE CAST(CASE WHEN PATINDEX('%[^0-9]%',Number) = 0 THEN Number ELSE LEFT(Number,PATINDEX('%[^0-9]%',Number)-1) END as int) BETWEEN {0} AND {1}", ticketBook.StartIntNumber, ticketBook.EndIntNumber)
.ToList();
This will strip off any trailing letters from the Number column and convert it to an int, then use that to make sure it is between your StartIntNumber and EndIntNumber.
That said, I would highly suggest adding a column to your tickets table that uses a derivative of the above to calculate an integer, and making it a persisted computed column. Then you can index that column. Very little (if anything) should need to change in your code if you do this, and the performance benefit will be huge.
This is based on your comment that said sometimes Number has additional letters at the end, like 123A. The above would need to be modified if Number can have letters at the start or in the middle like A123 or 1A23. Currently, it would treat A123 as 0, and 1A23 as 1.

Related

FsCheck: How to generate test data that depends on other test data?

FsCheck has some neat default Arbitrary types to generate test data. However, what if one piece of my test data depends on another?
For instance, consider the property of string.Substring() that a resulting substring can never be longer than the input string:
[Fact]
public void SubstringIsNeverLongerThanInputString()
{
Prop.ForAll(
Arb.Default.NonEmptyString(),
Arb.Default.PositiveInt(),
(input, length) => input.Get.Substring(0, length.Get).Length <= input.Get.Length
).QuickCheckThrowOnFailure();
}
Although the implementation of Substring certainly is correct, this property fails, because eventually a PositiveInt will be generated that is larger than the length of the generated NonEmptyString, resulting in an exception.
Shrunk: NonEmptyString "a" PositiveInt 2 with exception: System.ArgumentOutOfRangeException: Index and length must refer to a location within the string.
I could guard the comparison with an if (input.Length < length) return true; but that way I end up with lots of test runs where the property isn't even checked.
How do I tell FsCheck to only generate PositiveInts that don't exceed the length of the input string? I presume I have to use the Gen<T> class, but its interface is just hella confusing to me... I tried the following but still got PositiveInts exceeding the string:
var inputs = Arb.Default.NonEmptyString();
// I have no idea what I'm doing here...
var lengths = inputs.Generator.Select(s => s.Get.Length).ToArbitrary();
Prop.ForAll(
inputs,
lengths,
(input, length) => input.Get.Substring(0, length).Length <= input.Get.Length
).QuickCheckThrowOnFailure();

You can create generators which depend on values generated from another using SelectMany. This also allows you to use the LINQ query syntax e.g.
var gen = from s in Arb.Generate<NonEmptyString>()
from i in Gen.Choose(0, s.Get.Length - 1)
select Tuple.Create(s, i);
var p = Prop.ForAll(Arb.From(gen), t =>
{
var s = t.Item1.Get;
var len = t.Item2;
return s.Substring(0, len).Length <= s.Length;
});
Check.Quick(p);

C# converting a list of generated numbers into an int array

Again, I am a discrete mathematician, not a coder, but I am trying to use C# for a paper I am working on and need some help.
I have code that generates a set of random integers, based on user input, and prints them as a string separated by commas, but I need to convert them into a vector (or an int array?). I am not exactly sure which is appropriate, and I cannot find much online about how to use either in C#. I need to be able to apply vector functions to them, so each entry still needs to be identifiable and an integer, but the vector needs to be able to vary in size depending on the user input.

If you already have a comma-delimited string, you can use the String.Split() method to split it on the commas into an array, and then convert each of these values into its integer using Int32.Parse() or Convert.ToInt32():
// The Split() method will yield an array, then the Select() statement
// will map each string value to its integer and finally
// the last ToArray() call will make this into an actual array of integers
var output = input.Split(',').Select(n => Int32.Parse(n)).ToArray();
If you need to explicitly ignore possible empty entries and whitespace, you could use the following adjusted example:
var output = input.Split(new char[]{','}, StringSplitOptions.RemoveEmptyEntries)
.Select(s => Int32.Parse(s.Trim()))
.ToArray();
An even safer approach still would be to only use values that could be properly parsed as integers via the Int32.TryParse() method as seen below :
// Split your string, removing any empty entries
var output = input.Split(new char[]{','}, StringSplitOptions.RemoveEmptyEntries)
.Select(n => {
// A variable to store your value
int v;
// Attempt to parse it, store an indicator if the parse was
// successful (and store the value in your v parameter)
var success = Int32.TryParse(n, out v);
// Return an object containing your value and if it was successful
return new { Number = v, Successful = success };
})
// Now only select those that were successful
.Where(attempt => attempt.Successful)
// Grab only the numbers for the successful attempts
.Select(attempt => attempt.Number)
// Place this into an array
.ToArray();
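If the random numbers are generated in the same program, the round trip through a string can be skipped entirely. The following is a minimal sketch, assuming the count, bounds and seed come from user input; `RandomVector` and `Generate` are names invented for illustration. An `int[]` has a fixed length once created but can be created with any length at run time; if entries need to be added later, a `List<int>` is the growable alternative.

```csharp
using System;
using System.Linq;

public static class RandomVector
{
    // Builds an int array of the requested length with values in [min, max].
    // Assumes max < int.MaxValue, because Random.Next takes an exclusive upper bound.
    public static int[] Generate(int count, int min, int max, int seed)
    {
        var rng = new Random(seed); // seeded so that runs are repeatable
        return Enumerable.Range(0, count)
                         .Select(_ => rng.Next(min, max + 1))
                         .ToArray();
    }
}
```

Each entry stays individually addressable (`v[0]`, `v[1]`, ...), and LINQ methods such as `Sum()` or `Select()` can then be applied to the whole array.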

Handling null in Linq

I am new to LINQ, and this keeps throwing an exception on a null volume field. The file is unpredictable, and it will happen, so I would like to put a 0 in where there is an exception. Is there any quick and easy way to do it?
var qry =
from line in File.ReadAllLines("C:\\temp\\T.txt")
let myRecX = line.Split(',')
select new myRec()
{
price = Convert.ToDecimal( myRecX[0].Replace("price = ", "")) ,
volume = Convert.ToInt32(myRecX[1].Replace("volume =", "")),
dTime = Convert.ToDateTime( myRecX[2].Replace("timestamp =", ""))
};

If you would like to use a default when the incoming data is null, empty, or consists entirely of whitespace characters, you can do it like this:
volume = string.IsNullOrWhiteSpace(myRecX[1].Replace("volume =", ""))
? defaultVolume // <<== You can use any constant here
: Convert.ToInt32(myRecX[1].Replace("volume =", ""))
However, this is a "quick and dirty" way of achieving what you need, because the position of each named parameter remains hardcoded. A more robust way would be writing a mini-parser that pays attention to the names of attributes specified in the file, rather than replacing them with an empty string.
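One way to read the "mini-parser" suggestion is to split each field into a name and a value and then look values up by name instead of by position. The sketch below assumes the `name = value` layout shown in the question; `RecordParser`, `ParseLine` and `GetInt` are hypothetical helper names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class RecordParser
{
    // Splits "price = 1.5,volume = 10" into { price: "1.5", volume: "10" }.
    public static Dictionary<string, string> ParseLine(string line)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var part in line.Split(','))
        {
            int eq = part.IndexOf('=');
            if (eq < 0) continue; // not a name=value pair, skip it
            fields[part.Substring(0, eq).Trim()] = part.Substring(eq + 1).Trim();
        }
        return fields;
    }

    // Returns the named field as an int, or the fallback if it is missing or unparsable.
    public static int GetInt(Dictionary<string, string> fields, string name, int fallback)
    {
        return fields.TryGetValue(name, out var text)
            && int.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out var value)
            ? value
            : fallback;
    }
}
```

With this shape, `GetInt(ParseLine(line), "volume", 0)` yields 0 for a line whose volume is empty or absent, regardless of the order in which the fields appear.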

You could use something like this, which offers an expressive way to write what you want:
static TOutput Convert<TInput, TOutput>(
TInput value,
params Func<TInput, TOutput>[] options)
{
foreach (var option in options) {
try { return option(value); }
catch { }
}
throw new InvalidOperationException("No option succeeded.");
}
Used like:
select new myRec()
{
price = Convert(myRecX[0].Replace("price = ", ""),
input => System.Convert.ToDecimal(input), // qualified, because the helper above hides the name Convert
or => 0M),
...
};
The function indirection and implicit array construction may incur a slight performance penalty, but it gives you a nice syntax with which to specify a number of possible conversions, where the first successful one is taken.

I think there's an issue here beyond the use of LINQ.
In general, it is bad practice to manipulate file data before sanitizing it.
Even though the following question is about the file name (rather than its content), it is a good starting point for understanding the concept of sanitizing input:
C# Sanitize File Name
After all, you say yourself that your code has no control over the file content, so before calling:
let myRecX = line.Split(',')
I suggest defining a private method like:
string SanitizeInputLine(string input) {
// here do whatever is needed to bring back input to
// a valid format in a way that subsequent calls will not
// fail
return input;
}
Applying it is straightforward:
let myRecX = SanitizeInputLine(line).Split(',')
As a general rule, never trust input.
Let me quote Chapter 10, "All Input Is Evil!", of Writing Secure Code by Howard/LeBlanc:
...you should never trust data until data is validated. Failure to do
so will render your application vulnerable. Or, put another way: all
input is evil until proven otherwise.

Parallel For Loop

I am trying to utilize the parallel for loop in .NET Framework 4.0. However I noticed that, I am missing some elements in the result set.
I have snippet of code as below. lhs.ListData is a list of nullable double and rhs.ListData is a list of nullable double.
int recordCount = lhs.ListData.Count > rhs.ListData.Count ? rhs.ListData.Count : lhs.ListData.Count;
List<double?> listResult = new List<double?>(recordCount);
var rangePartitioner = Partitioner.Create(0, recordCount);
Parallel.ForEach(rangePartitioner, range =>
{
for (int index = range.Item1; index < range.Item2; index++)
{
double? result = lhs.ListData[index] * rhs.ListData[index];
listResult.Add(result);
}
});
lhs.ListData has the length of 7964 and rhs.ListData has the length of 7962. When I perform the "*" operation, listResult has only 7867 as output. There are null elements in the both input list.
I am not sure what is happening during the execution. Is there any reason why I am seeing fewer elements in the result set? Please advise...

One way to do this is to use PLINQ's AsParallel() extension. It does all of the partitioning for you, and because you never write to a shared list yourself, the race on Add goes away. There is another LINQ extension called Zip that zips two collections into one, based on a function that you give it. Note that Zip only goes to the length of the shorter of the two lists, not the longer. If you need a result for every element of the longer list, it would probably be easiest to first expand the shorter list to the length of the longer one by padding it with null at the end.
IEnumerable<double?> lhs, rhs; // Assume these are filled with your numbers.
double?[] result = System.Linq.Enumerable.Zip(lhs, rhs, (a, b) => a * b).AsParallel().ToArray();
Here's the MSDN page on Zip:
http://msdn.microsoft.com/en-us/library/dd267698%28VS.100%29.aspx

That's probably because the operations on a List<T> (e.g. Add) are not thread safe - your results may vary. As a workaround you could use a lock, but that would very much reduce performance.
It looks like you just want each item in the result list to be the product of the items at the corresponding index in the two input lists, how about this instead using PLINQ:
var listResult = lhs.AsParallel()
.Zip(rhs.AsParallel(), (a,b) => a*b)
.ToList();
Not sure why you chose parallelism here, I would benchmark if this is even necessary - is this truly the bottleneck in your application?

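You are using List<double?> to store the results, but its Add method is not thread-safe.
You can store each result at an explicit index instead of calling Add. Note that the target must already have recordCount elements (for example a double?[recordCount] array); a List created with only a capacity still has Count 0, so its indexer would throw: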
listResult[index] = result;
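A minimal sketch of the index-based approach, assuming the inputs are lists of nullable doubles as in the question. The result is a pre-sized array, so each index is written by exactly one range and no lock is needed. `ParallelMultiply` and `MultiplyPairs` are illustrative names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelMultiply
{
    public static double?[] MultiplyPairs(IList<double?> lhs, IList<double?> rhs)
    {
        int count = Math.Min(lhs.Count, rhs.Count);
        var result = new double?[count];   // every slot exists up front
        if (count == 0) return result;     // Partitioner.Create rejects an empty range

        Parallel.ForEach(Partitioner.Create(0, count), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                result[i] = lhs[i] * rhs[i]; // null if either operand is null
            }
        });
        return result;
    }
}
```

Because the output position is the input index, the results also stay in the original order, which `Add` from several threads would not guarantee even with a lock.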

c# - BinarySearch StringList with wildcard

I have a sorted StringList and wanted to replace
foreach (string line3 in CardBase.cardList)
if (line3.ToLower().IndexOf((cardName + Config.EditionShortToLong(edition)).ToLower()) >= 0)
{
return true;
}
with a binary search, since the cardList is rather large (~18k) and this search takes up around 80% of the time.
So I found the List.BinarySearch method, but my problem is that the lines in the cardList look like this:
Brindle_Boar_(Magic_2012).c1p247924.prod
But I have no way to generate the c1p... part, which is a problem because List.BinarySearch only finds exact matches.
How do I modify List.BinarySearch so that it finds a match if only a part of the string matches?
e.g.
searching for Brindle_Boar_(Magic_2012) should return the position of Brindle_Boar_(Magic_2012).c1p247924.prod

List.BinarySearch returns the bitwise complement of the index of the next item larger than the requested one if an exact match is not found.
So, you can do it like this (assuming you'll never get an exact match):
var key = (cardName + Config.EditionShortToLong(edition)).ToLower();
var list = CardBase.cardList;
var index = ~list.BinarySearch(key);
return index != list.Count && list[index].StartsWith(key);
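A slightly more defensive variant of the same idea, sketched under the assumption that the list has been sorted with `StringComparer.OrdinalIgnoreCase` (a binary search is only valid when it uses the same ordering as the sort). It also covers the case where the key itself is in the list, and compares case-insensitively instead of lower-casing the key. `PrefixSearch` and `IndexOfPrefix` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixSearch
{
    // Returns the index of an entry that starts with prefix, or -1 if there is none.
    // The list must already be sorted with StringComparer.OrdinalIgnoreCase.
    public static int IndexOfPrefix(List<string> sorted, string prefix)
    {
        int index = sorted.BinarySearch(prefix, StringComparer.OrdinalIgnoreCase);
        if (index >= 0) return index;   // exact match
        index = ~index;                 // first entry greater than the prefix
        return index < sorted.Count
            && sorted[index].StartsWith(prefix, StringComparison.OrdinalIgnoreCase)
            ? index
            : -1;
    }
}
```

This works because, in an ordinal ordering, every string that begins with the prefix sorts directly after the position where the prefix itself would be inserted. It only finds prefix matches, not matches in the middle of a line as `IndexOf` does.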

BinarySearch() has an overload that takes an IComparer<T> as its second parameter. Implement a custom comparer and return 0 when you have a match within the string; you can use the same IndexOf() method there.
Edit:
Does a binary search make sense in your scenario? How do you determine that a certain item is "less" or "greater" than another item? Right now you only provide what would constitute a match. Only if you can answer this question, binary search applies in the first place.

You can take a look at the C5 Generic Collection Library (you can install it via NuGet also).
Use the SortedArray(T) type for your collection. It provides a handful of methods that could prove useful. You can even query for ranges of items very efficiently.
var data = new SortedArray<string>();
// query for the first string greater than or equal to "Brindle_Boar_(Magic_2012)"
// and check whether it starts with "Brindle_Boar_(Magic_2012)"
var a = data.RangeFrom("Brindle_Boar_(Magic_2012)").FirstOrDefault();
return a != null && a.StartsWith("Brindle_Boar_(Magic_2012)");
// take the first 5 items from "Brindle_Boar" onwards and keep those that start with "Brindle_Boar"
var b = data.RangeFrom("Brindle_Boar").Take(5).Where(s => s.StartsWith("Brindle_Boar"));
// query for all items that start with "Brindle_Boar" (provided only ASCII chars)
var c = data.RangeFromTo("Brindle_Boar", "Brindle_Boar~").ToList();
// query for all items that start with "Brindle_Boar", iterates until first non-match
var d = data.RangeFrom("Brindle_Boar").TakeWhile(s => s.StartsWith("Brindle_Boar"));
The RangeFrom... methods perform a binary search to find the first element greater than or equal to your argument and return an iterator from that position.
