FsCheck has some neat default Arbitrary types to generate test data. However what if one of my test dates depends on another?
For instance, consider the property of string.Substring() that a resulting substring can never be longer than the input string:
[Fact]
public void SubstringIsNeverLongerThanInputString()
{
Prop.ForAll(
Arb.Default.NonEmptyString(),
Arb.Default.PositiveInt(),
(input, length) => input.Get.Substring(0, length.Get).Length <= input.Get.Length
).QuickCheckThrowOnFailure();
}
Although the implementation of Substring certainly is correct, this property fails, because eventually a PositiveInt will be generated that is longer than the genereated NonEmptyString resulting in an exception.
Shrunk: NonEmptyString "a" PositiveInt 2 with exception: System.ArgumentOutOfRangeException: Index and length must refer to a location within the string.
I could guard the comparison with an if (input.Length < length) return true; but that way I end up with lots of test runs were the property isn't even checked.
How do I tell FsCheck to only generate PositiveInts that don't exceed the input string? I presume I have to use the Gen<T> class, but it's interface is just hella confusing to me... I tried the following but still got PositiveInts exceeding the string:
var inputs = Arb.Default.NonEmptyString();
// I have no idea what I'm doing here...
var lengths = inputs.Generator.Select(s => s.Get.Length).ToArbitrary();
Prop.ForAll(
inputs,
lengths,
(input, length) => input.Get.Substring(0, length).Length <= input.Get.Length
).QuickCheckThrowOnFailure();
You can create generators which depend on values generated from another using SelectMany. This also allows you to use the LINQ query syntax e.g.
var gen = from s in Arb.Generate<NonEmptyString>()
from i in Gen.Choose(0, s.Get.Length - 1)
select Tuple.Create(s, i);
var p = Prop.ForAll(Arb.From(gen), t =>
{
var s = t.Item1.Get;
var len = t.Item2;
return s.Substring(0, len).Length <= s.Length;
});
Check.Quick(p);
Related
I have this simple query where I need to identify all tickets within start and end number of a specific TicketBook object on api side in EF Core.
var ticketBook = await Context.TicketBooks.FirstOrDefaultAsync(x=>x.Id == query.TicketBookId);
if (ticketBook != null)
{
dbTickets = dbTickets.Where(x => ConvertTicketNumberToInt(x, ticketBook));
}
private bool ConvertTicketNumberToInt(Ticket t, TicketBook tb)
{
try
{
var numberOnly = new string(t.Number.Where(t => char.IsDigit(t)).ToArray());
var tNumber = Convert.ToInt64(numberOnly);
return tNumber >= tb.StartIntNumber && tNumber <= tb.EndIntNumber;
}
catch(OverflowException)
{
return false;
}
}
the problem is the "Number" property in Ticket class is nvarchar (string) but I need to convert it into integer for this particular query only and for that I have written a small method which does it for me. But as you can see its very time consuming and not efficient at all so my api call just times out.
I am trying to figure out how to do this in LINQ without writing extra methods like this. The trick is that "number" property can sometimes can have a few alphabets in it which throws exception while converting it to integer so I need to remove those non digit characters before the comparison that's why I had to write this dedicated method for it.
As already mentioned, you are facing some performance issues storing nvarchar instead of long.
Anyway, what you're doing in your code is not that bad - you have fairly simple method for the job which keeps your LINQ code clean and tidy. But since you want to have a single LINQ query, try the following (it can be done shorter but I've chosen this way for readability):
var ticketBook = await Context.TicketBooks.FirstOrDefaultAsync(x=>x.Id == query.TicketBookId);
if (ticketBook != null)
{
dbTickets = dbTickets
.Select(t => new { Ticket = t, Number = new string(t.Number.Where(n => char.IsDigit(n)).ToArray()) })
.Select(t =>
{
long ticketNumber = long.MinValue;
long.TryParse(t.Number), out ticketNumber);
return new { Ticket = t, Number = ticketNumber };
})
.Where(t => t.Ticket >= ticketBook.StartIntNumber && t.Ticket <= ticketBook.EndIntNumber)
.Select(t => t.Ticket);
}
What it does:
in first pass all your varchars are stripped of the letters and converted to strings containing only the digits, then an anonymous type with the complete Ticket class is returned along with this string
the strings are parsed to long - I've abused long.MinValue to indicate a failed conversion (since you're using char.IsDigit(c) I see you're not expect any negative values in your results. You might as well use ulong for twice the positive range and abuse 0 value) and again, an anonymous type is returned
those anonymous structures are filtered with the condition you provided
finally, only the original Ticket structure is returned
If you're concerned about the number of passes over the initial results - I've run several performance tests to find out whether having a number of Selects with short operations inside is slower than having one pass with an elaborate operation and I haven't observed any significant difference.
Your best bet is to do most of the conversion in the database.
If you have access to the context, you can do this:
dbTickets = Context.Tickets
.FromSqlRaw("SELECT * FROM Tickets WHERE CAST(CASE WHEN PATINDEX('%[^0-9]%',Number) = 0 THEN Number ELSE LEFT(Number,PATINDEX('%[^0-9]%',Number)-1) END as int) BETWEEN {0} AND {1}", ticketBook.StartIntNumber, ticketBook.EndIntNumber)
.ToList();
This will strip off any trailing letters from the Number column and convert it to an int, then use that to make sure it is between your StartIntNumber and EndIntNumber.
That said, I would highly suggest you add an additional column into your tickets table that uses a derivative of the above to calculate an integer and then make the column a persistent calculated column. Then you can index on that column. Very little (if ANY) should need to be changed in your code if you do this, and the performance benefit will be huge.
This is based on your comment that said sometimes Number has additional letters at the end, like 123A. The above would need to be modified if Number can have letters at the start or in the middle like A123 or 1A23. Currently, it would treat A123 as 0, and 1A23 as 1.
I'm still getting acquainted with delegates and lambdas, I'm not using LINQ, and I also only discovered the ConvertAll function today, so I'm experimenting and asking this to bolster my understanding.
The task I had was to establish whether a string of numbers were even or odd. So first, convert the string list to an int list, and from there into a bool list. As bloated as the code would be, I wondered if I could get it all on one line and reduce the need for an extra for loop.
string numbers = "2 4 7 8 10";
List<bool> evenBools = new List<bool>(Array.ConvertAll(numbers.Split(' '), (x = Convert.Int32) => x % 2 == 0))
The expected result is [true, true, false, true, true]. Obviously the code doesn't work.
I understand that the second argument of Array.ConvertAll) requires the conversation to take place. From string to int, that's simply Convert.ToInt32. Is it possible to do that on the fly though (i.e. on the left side of the lambda expression), so that I can get on with the bool conversion and return on the right?
Second parameter of ConvertAll method is Converter<TInput,TOutput> delegate. You can use anonymous method here, pass string input to it (left side of =>), parse to int inside and return bool value, indicating whether it's even or odd
var boolArray = Array.ConvertAll(numbers.Split(' '), input => Convert.ToInt32(input) % 2 == 0);
List<bool> evenBools = new List<bool>(boolArray);
Finally create a List<bool> from array.
Left side of lambda expressions contains input parameters only (input string in your case). Right side is used for expressions and statements, here you have a logic for parsing and return a bool value.
With help of System.Linq it can be written on the similar way
var boolArray = numbers.Split(' ').Select(input => Convert.ToInt32(input) % 2 == 0).ToArray();
You also might use int.Parse instead of Convert.ToInt32
var boolList = numbers.Split(' ').Select(Int32.Parse).ToList().ConvertAll(i => i % 2 == 0);
I have a case where I have the name of an object, and a bunch of file names. I need to match the correct file name with the object. The file name can contain numbers and words, separated by either hyphen(-) or underscore(_). I have no control of either file name or object name. For example:
10-11-12_001_002_003_13001_13002_this_is_an_example.svg
The object name in this case is just a string, representing an number
10001
I need to return true or false if the file name is a match for the object name. The different segments of the file name can match on their own, or any combination of two segments. In the example above, it should be true for the following cases (not every true case, just examples):
10001
10002
10003
11001
11002
11003
12001
12002
12003
13001
13002
And, we should return false for this case (among others):
13003
What I've come up with so far is this:
public bool IsMatch(string filename, string objectname)
{
var namesegments = GetNameSegments(filename);
var match = namesegments.Contains(objectname);
return match;
}
public static List<string> GetNameSegments(string filename)
{
var segments = filename.Split('_', '-').ToList();
var newSegments = new List<string>();
foreach (var segment in segments)
{
foreach (var segment2 in segments)
{
if (segment == segment2)
continue;
var newToken = segment + segment2;
newSegments.Add(newToken);
}
}
return segments.Concat(newSegments).ToList();
}
One or two segments combined can make a match, and that is enought. Three or more segments combined should not be considered.
This does work so far, but is there a better way to do it, perhaps without nesting foreach loops?
First: don't change debugged, working, sufficiently efficient code for no reason. Your solution looks good.
However, we can make some improvements to your solution.
public static List<string> GetNameSegments(string filename)
Making the output a list puts restrictions on the implementation that are not required by the caller. It should be IEnumerable<String>. Particularly since the caller in this case only cares about the first match.
var segments = filename.Split('_', '-').ToList();
Why ToList? A list is array-backed. You've already got an array in hand. Just use the array.
Since there is no longer a need to build up a list, we can transform your two-loop solution into an iterator block:
public static IEnumerable<string> GetNameSegments(string filename)
{
var segments = filename.Split('_', '-');
foreach (var segment in segments)
yield return segment;
foreach (var s1 in segments)
foreach (var s2 in segments)
if (s1 != s2)
yield return s1 + s2;
}
Much nicer. Alternatively we could notice that this has the structure of a query and simply return the query:
public static IEnumerable<string> GetNameSegments(string filename)
{
var q1= filename.Split('_', '-');
var q2 = from s1 in q1
from s2 in q1
where s1 != s2
select s1 + s2;
return q1.Concat(q2);
}
Again, much nicer in this form.
Now let's talk about efficiency. As is often the case, we can achieve greater efficiency at a cost of increased complication. This code looks like it should be plenty fast enough. Your example has nine segments. Let's suppose that nine or ten is typical. Our solutions thus far consider the ten or so singletons first, and then the hundred or so combinations. That's nothing; this code is probably fine. But what if we had thousands of segments and were considering millions of possibilities?
In that case we should restructure the algorithm. One possibility would be this general solution:
public bool IsMatch(HashSet<string> segments, string name)
{
if (segments.Contains(name))
return true;
var q = from s1 in segments
where name.StartsWith(s1)
let s2 = name.Substring(s1.Length)
where s1 != s2
where segments.Contains(s2)
select 1; // Dummy. All we care about is if there is one.
return q.Any();
}
Your original solution is quadratic in the number of segments. This one is linear; we rely on the constant order contains operation. (This assumes of course that string operations are constant time because strings are short. If that's not true then we have a whole other kettle of fish to fry.)
How else could we extract wins in the asymptotic case?
If we happened to have the property that the collection was not a hash set but rather a sorted list then we could do even better; we could binary search the list to find the start and end of the range of possible prefix matches, and then pour the list into a hashset to do the suffix matches. That's still linear, but could have a smaller constant factor.
If we happened to know that the target string was small compared to the number of segments, we could attack the problem from the other end. Generate all possible combinations of partitions of the target string and check if both halves are in the segment set. The problem with this solution is that it is quadratic in memory usage in the size of the string. So what we'd want to do there is construct a special hash on character sequences and use that to populate the hash table, rather than the standard string hash. I'm sure you can see how the solution would go from there; I shan't spell out the details.
Efficiency is very much dependent on the business problem that you're attempting to solve. Without knowing the full context/usage it's difficult to define the most efficient solution. What works for one situation won't always work for others.
I would always advocate to write working code and then solve any performance issues later down the line (or throw more tin at the problem as it's usually cheaper!) If you're having specific performance issues then please do tell us more...
I'm going to go out on a limb here and say (hope) that you're only going to be matching the filename against the object name once per execution. If that's the case I reckon this approach will be just about the fastest. In a circumstance where you're matching a single filename against multiple object names then the obvious choice is to build up an index of sorts and match against that as you were already doing, although I'd consider different types of collection depending on your expected execution/usage.
public static bool IsMatch(string filename, string objectName)
{
var segments = filename.Split('-', '_');
for (int i = 0; i < segments.Length; i++)
{
if (string.Equals(segments[i], objectName)) return true;
for (int ii = 0; ii < segments.Length; ii++)
{
if (ii == i) continue;
if (string.Equals($"{segments[i]}{segments[ii]}", objectName)) return true;
}
}
return false;
}
If you are willing to use the MoreLINQ NuGet package then this may be worth considering:
public static HashSet<string> GetNameSegments(string filename)
{
var segments = filename.Split(new char[] {'_', '-'}, StringSplitOptions.RemoveEmptyEntries).ToList();
var matches = segments
.Cartesian(segments, (x, y) => x == y ? null : x + y)
.Where(z => z != null)
.Concat(segments);
return new HashSet<string>(matches);
}
StringSplitOptions.RemoveEmptyEntries handles adjacent separators (e.g. --). Cartesian is roughly equivalent to your existing nested for loops. The Where is to remove null entries (i.e. if x == y). Concat is the same as your existing Concat. The use of HashSet allows for your Contains calls (in IsMatch) to be faster.
I have a sorted StringList and wanted to replace
foreach (string line3 in CardBase.cardList)
if (line3.ToLower().IndexOf((cardName + Config.EditionShortToLong(edition)).ToLower()) >= 0)
{
return true;
}
with a binarySearch, since the cardList ist rather large(~18k) and this search takes up around 80% of the time.
So I found the List.BinarySearch-Methode, but my problem is that the lines in the cardList look like this:
Brindle_Boar_(Magic_2012).c1p247924.prod
But I have no way to generate the c1p... , which is a problem cause the List.BinarySearch only finds exact matches.
How do I modify List.BinarySearch so that it finds a match if only a part of the string matches?
e. g.
searching for Brindle_Boar_(Magic_2012) should return the position of Brindle_Boar_(Magic_2012).c1p247924.prod
List.BinarySearch will return the ones complement of the index of the next item larger than the request if an exact match is not found.
So, you can do it like this (assuming you'll never get an exact match):
var key = (cardName + Config.EditionShortToLong(edition)).ToLower();
var list = CardBase.cardList;
var index = ~list.BinarySearch(key);
return index != list.Count && list[index].StartsWith(key);
BinarySearch() has an overload that takes an IComparer<T> has second parameter, implement a custom comparer and return 0 when you have a match within the string - you can use the same IndexOf() method there.
Edit:
Does a binary search make sense in your scenario? How do you determine that a certain item is "less" or "greater" than another item? Right now you only provide what would constitute a match. Only if you can answer this question, binary search applies in the first place.
You can take a look at the C5 Generic Collection Library (you can install it via NuGet also).
Use the SortedArray(T) type for your collection. It provides a handful of methods that could prove useful. You can even query for ranges of items very efficiently.
var data = new SortedArray<string>();
// query for first string greater than "Brindle_Boar_(Magic_2012)" an check if it starts
// with "Brindle_Boar_(Magic_2012)"
var a = data.RangeFrom("Brindle_Boar_(Magic_2012)").FirstOrDefault();
return a.StartsWith("Brindle_Boar_(Magic_2012)");
// query for first 5 items that start with "Brindle_Boar"
var b = data.RangeFrom("string").Take(5).Where(s => s.StartsWith("Brindle_Boar"));
// query for all items that start with "Brindle_Boar" (provided only ascii chars)
var c = data.RangeFromTo("Brindle_Boar", "Brindle_Boar~").ToList()
// query for all items that start with "Brindle_Boar", iterates until first non-match
var d = data.RangeFrom("Brindle_Boar").TakeWhile(s => s.StartsWith("Brindle_Boar"));
The RageFrom... methods perform a binary search, find the first element greater than or equal to your argument, that returns an iterator from that position
I have a simple method that returns an array of the letters of the alphabet.
public char[] GetLettersOfTheAlphabet() {
char[] result = new char[26];
int index = 0;
for (int i = 65; i < 91; i++) {
result[index] = Convert.ToChar(i);
index += 1;
}
return result;
}
I tried to unit test the code with
[TestMethod()]
public void GetLettersOfTheAlphabetTest_Pass() {
HomePage.Rules.BrokerAccess.TridionCategories target = new HomePage.Rules.BrokerAccess.TridionCategories();
char[] expected = new char[] { char.Parse("A"),
char.Parse("B"),
char.Parse("C"),
char.Parse("D"),
char.Parse("E"),
char.Parse("F"),
char.Parse("G"),
char.Parse("H"),
char.Parse("I"),
char.Parse("J"),
char.Parse("k"),
char.Parse("L"),
char.Parse("M"),
char.Parse("N"),
char.Parse("O"),
char.Parse("P"),
char.Parse("Q"),
char.Parse("R"),
char.Parse("S"),
char.Parse("T"),
char.Parse("U"),
char.Parse("V"),
char.Parse("W"),
char.Parse("X"),
char.Parse("Y"),
char.Parse("Z")};
char[] actual;
actual = target.GetLettersOfTheAlphabet();
Assert.AreEqual(expected, actual);
}
It seems that it comes down the compare function that the array uses. How do you handle this case?
If you're using .NET 3.5, you can also use SequenceEqual():
[TestMethod()]
public void GetLettersOfTheAlphabetTest_Pass() {
var target = new HomePage.Rules.BrokerAccess.TridionCategories();
var expected = new[] { 'A','B','C', ... };
var actual = target.GetLettersOfTheAlphabet();
Assert.IsTrue(expected.SequenceEqual(actual));
}
CollectionAssert.AreEquivalent(expected, actual) works.
I am not exactly sure how it works but it does work for my needs. I think it checks each item in each item in the expected collection exsits in the actual column, it also checks that they are the same count, but order does not matter.
Just iterate through the array and compare each item. Index of 4 in each array should be identical.
Another way is to check the result array if it contains say 'A', 'G', 'K' or random letters.
Something I have learned from Unit Tests sometimes you have to either copy code or do more work because that is the nature of testing. Basically what works for a unit test might not be acceptable for a production site.
You could also do a test on the size and a few spot checks of values in the array....
Assert.AreEqual(expected.Length, actual.Length)
Assert.AreEqual(expected[0], actual[0])
....
Your K is lowercase in the expected array, but I believe that code generates uppercase characters. If the comparison is case sensitive, it would cause the failure when the two arrays don't match.
First of all:
Assert.AreEqual(expected.Length,actual.Length);
Then:
for(int i = 0; i < actual.Length; i++)
{
//Since you are testing whether the alphabet is correct, then is ok that the characters
//have different case.
Assert.AreEqual(true,actual[i].toString().Equals(expected[i].toString(),StringComparison.OrdinalIgnoreCase));
}