c# - BinarySearch StringList with wildcard

I have a sorted StringList and wanted to replace
foreach (string line3 in CardBase.cardList)
    if (line3.ToLower().IndexOf((cardName + Config.EditionShortToLong(edition)).ToLower()) >= 0)
    {
        return true;
    }
with a binary search, since the cardList is rather large (~18k entries) and this search takes up around 80% of the time.
So I found the List.BinarySearch method, but my problem is that the lines in the cardList look like this:
Brindle_Boar_(Magic_2012).c1p247924.prod
But I have no way to generate the c1p... part, which is a problem because List.BinarySearch only finds exact matches.
How do I modify List.BinarySearch so that it finds a match if only a part of the string matches?
E.g. searching for Brindle_Boar_(Magic_2012) should return the position of Brindle_Boar_(Magic_2012).c1p247924.prod

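List.BinarySearch returns the bitwise complement of the index of the next item larger than the requested one if no exact match is found.
So you can do it like this. The list must be sorted with the same comparer that is passed to BinarySearch, and the prefix check is case-insensitive because the entries in cardList are not lower-cased:
var key = cardName + Config.EditionShortToLong(edition);
var list = CardBase.cardList;
var index = list.BinarySearch(key, StringComparer.OrdinalIgnoreCase);
if (index < 0) index = ~index; // no exact match: the complement is the insertion point
return index < list.Count && list[index].StartsWith(key, StringComparison.OrdinalIgnoreCase);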
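For reference, here is a minimal self-contained sketch of the same idea that returns the position of the match instead of a bool. The names PrefixSearch and IndexOfPrefix are made up for illustration, and the sketch assumes the list has been sorted with StringComparer.Ordinal; with a different sort order the comparer and the StartsWith comparison would have to change accordingly.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixSearch
{
    // Returns the index of the first entry at or after the insertion point
    // that starts with the prefix, or -1 if that entry does not start with it.
    // Assumption: sortedList is sorted with StringComparer.Ordinal.
    public static int IndexOfPrefix(List<string> sortedList, string prefix)
    {
        int index = sortedList.BinarySearch(prefix, StringComparer.Ordinal);

        // A negative result is the bitwise complement of the insertion point,
        // i.e. the index of the first entry that sorts after the prefix.
        if (index < 0)
            index = ~index;

        if (index < sortedList.Count &&
            sortedList[index].StartsWith(prefix, StringComparison.Ordinal))
            return index;

        return -1;
    }
}
```

With the example from the question, searching for Brindle_Boar_(Magic_2012) lands on the entry Brindle_Boar_(Magic_2012).c1p247924.prod, because in ordinal order a string sorts immediately before any longer string that starts with it.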

BinarySearch() has an overload that takes an IComparer<T> as its second parameter. Implement a custom comparer that returns 0 when the key matches within the string; you can use the same IndexOf() method there.
Edit:
Does a binary search make sense in your scenario? How do you determine that a certain item is "less" or "greater" than another item? Right now you only provide what would constitute a match. Only if you can answer this question, binary search applies in the first place.

You can take a look at the C5 Generic Collection Library (you can also install it via NuGet).
Use the SortedArray<T> type for your collection. It provides a handful of methods that could prove useful. You can even query for ranges of items very efficiently.
var data = new SortedArray<string>();
// query for the first string greater than or equal to "Brindle_Boar_(Magic_2012)" and check
// whether it starts with "Brindle_Boar_(Magic_2012)"
var a = data.RangeFrom("Brindle_Boar_(Magic_2012)").FirstOrDefault();
return a != null && a.StartsWith("Brindle_Boar_(Magic_2012)");
// query for the first 5 items that start with "Brindle_Boar"
var b = data.RangeFrom("Brindle_Boar").TakeWhile(s => s.StartsWith("Brindle_Boar")).Take(5);
// query for all items that start with "Brindle_Boar" (assumes ordinal ordering and ASCII-only characters)
var c = data.RangeFromTo("Brindle_Boar", "Brindle_Boar~").ToList();
// query for all items that start with "Brindle_Boar", iterates until the first non-match
var d = data.RangeFrom("Brindle_Boar").TakeWhile(s => s.StartsWith("Brindle_Boar"));
The RangeFrom... methods perform a binary search to find the first element greater than or equal to the argument and return an iterator starting at that position.

Related

Replace a string placeholder with consecutive elements from a list in c#

I know similar questions have been asked, but I couldn't find anything specifically fitting my need and I am supremely ignorant about Regex.
I have sentences of varying length like this one:
Provides a +$modifier% bonus to Maximum Quality and a +$modifier% chance for Special Traits when developing a Recipe.
so $modifier is my placeholder in all of them. I have a list of floats that should replace the placeholders in order.
In this case I have a List<float> values {5, 0.5}. The replaced string should end up as
Provides a +5% bonus to Maximum Quality and a +0.5% chance for Special Traits when developing a Recipe.
I would like to avoid string.Replace, as the texts might get longer and I wouldn't like to loop over them multiple times. Could anyone suggest a good approach?
Cheers and thanks
H

The method Regex.Replace has an overload where you can specify a callback method to provide the value to use for each replacement.
private string ReplaceWithList(string source, string placeHolder, IEnumerable<object> list)
{
    // Escape the placeholder so that it is a valid regular expression
    placeHolder = Regex.Escape(placeHolder);
    // Get an enumerator for the list
    var enumerator = list.GetEnumerator();
    // Use the Regex engine to replace all occurrences of the placeholder
    // with the next entry from the enumerator
    string result = Regex.Replace(source, placeHolder, (m) =>
    {
        enumerator.MoveNext();
        return enumerator.Current?.ToString();
    });
    return result;
}
Use it like this:
string s = "Provides a +$modifier% bonus to Maximum Quality and a +$modifier% chance for Special Traits when developing a Recipe.";
List<object> list = new List<object> { 5, 0.5 };
s = ReplaceWithList(s, "$modifier", list);
Note that you need to add sensible error handling.
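As one possible reading of "sensible error handling", here is a hedged sketch that validates its arguments, throws when the text contains more placeholders than there are values, and formats numbers with the invariant culture so that 0.5 does not turn into 0,5 on some systems. The class name PlaceholderReplacer and the decision to throw (rather than, say, leave the placeholder in place) are assumptions, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class PlaceholderReplacer
{
    // Replaces each occurrence of the placeholder with the next value,
    // in order. Throws if the values run out before the placeholders do.
    public static string ReplaceWithList<T>(string source, string placeholder, IEnumerable<T> values)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (values == null) throw new ArgumentNullException(nameof(values));
        if (string.IsNullOrEmpty(placeholder))
            throw new ArgumentException("Placeholder must not be empty.", nameof(placeholder));

        using (IEnumerator<T> enumerator = values.GetEnumerator())
        {
            // Regex.Escape makes characters such as '$' match literally.
            return Regex.Replace(source, Regex.Escape(placeholder), match =>
            {
                if (!enumerator.MoveNext())
                    throw new InvalidOperationException("More placeholders than values.");

                // Culture-independent formatting of the current value.
                return Convert.ToString((object)enumerator.Current, CultureInfo.InvariantCulture)
                       ?? string.Empty;
            });
        }
    }
}
```

Surplus values are silently ignored in this sketch; whether that should also be an error depends on the caller.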

As strings are immutable in the CLR, there's probably no sensible way around splitting your string and putting it back together in some way.
One option is to split at your marker string, insert your replacement values, and then concatenate the parts again:
var s = "Provides a +$modifier% bonus to Maximum Quality and a +$modifier% chance for Special Traits when developing a Recipe.";
var v = new List<float> { 5.0f, 0.5f };
var parts = s.Split(new[] { "$modifier" }, StringSplitOptions.None);
var result = string.Concat(parts.Select((part, i) => $"{part}{(i < v.Count ? v[i].ToString() : string.Empty)}"));

How to combine items in List<string> to make new items efficiently

I have a case where I have the name of an object, and a bunch of file names. I need to match the correct file name with the object. The file name can contain numbers and words, separated by either hyphen(-) or underscore(_). I have no control of either file name or object name. For example:
10-11-12_001_002_003_13001_13002_this_is_an_example.svg
The object name in this case is just a string representing a number
10001
I need to return true or false if the file name is a match for the object name. The different segments of the file name can match on their own, or any combination of two segments. In the example above, it should be true for the following cases (not every true case, just examples):
10001
10002
10003
11001
11002
11003
12001
12002
12003
13001
13002
And, we should return false for this case (among others):
13003
What I've come up with so far is this:
public bool IsMatch(string filename, string objectname)
{
    var namesegments = GetNameSegments(filename);
    var match = namesegments.Contains(objectname);
    return match;
}
public static List<string> GetNameSegments(string filename)
{
    var segments = filename.Split('_', '-').ToList();
    var newSegments = new List<string>();
    foreach (var segment in segments)
    {
        foreach (var segment2 in segments)
        {
            if (segment == segment2)
                continue;
            var newToken = segment + segment2;
            newSegments.Add(newToken);
        }
    }
    return segments.Concat(newSegments).ToList();
}
One or two segments combined can make a match, and that is enough. Three or more segments combined should not be considered.
This does work so far, but is there a better way to do it, perhaps without nesting foreach loops?

First: don't change debugged, working, sufficiently efficient code for no reason. Your solution looks good.
However, we can make some improvements to your solution.
public static List<string> GetNameSegments(string filename)
Making the output a list puts restrictions on the implementation that are not required by the caller. It should be IEnumerable<String>. Particularly since the caller in this case only cares about the first match.
var segments = filename.Split('_', '-').ToList();
Why ToList? A list is array-backed. You've already got an array in hand. Just use the array.
Since there is no longer a need to build up a list, we can transform your two-loop solution into an iterator block:
public static IEnumerable<string> GetNameSegments(string filename)
{
    var segments = filename.Split('_', '-');
    foreach (var segment in segments)
        yield return segment;
    foreach (var s1 in segments)
        foreach (var s2 in segments)
            if (s1 != s2)
                yield return s1 + s2;
}
Much nicer. Alternatively we could notice that this has the structure of a query and simply return the query:
public static IEnumerable<string> GetNameSegments(string filename)
{
    var q1 = filename.Split('_', '-');
    var q2 = from s1 in q1
             from s2 in q1
             where s1 != s2
             select s1 + s2;
    return q1.Concat(q2);
}
Again, much nicer in this form.
Now let's talk about efficiency. As is often the case, we can achieve greater efficiency at a cost of increased complication. This code looks like it should be plenty fast enough. Your example has nine segments. Let's suppose that nine or ten is typical. Our solutions thus far consider the ten or so singletons first, and then the hundred or so combinations. That's nothing; this code is probably fine. But what if we had thousands of segments and were considering millions of possibilities?
In that case we should restructure the algorithm. One possibility would be this general solution:
public bool IsMatch(HashSet<string> segments, string name)
{
    if (segments.Contains(name))
        return true;
    var q = from s1 in segments
            where name.StartsWith(s1)
            let s2 = name.Substring(s1.Length)
            where s1 != s2
            where segments.Contains(s2)
            select 1; // Dummy. All we care about is if there is one.
    return q.Any();
}
Your original solution is quadratic in the number of segments. This one is linear; we rely on the constant-time Contains operation of the hash set. (This assumes of course that string operations are constant time because strings are short. If that's not true then we have a whole other kettle of fish to fry.)
How else could we extract wins in the asymptotic case?
If we happened to have the property that the collection was not a hash set but rather a sorted list then we could do even better; we could binary search the list to find the start and end of the range of possible prefix matches, and then pour the list into a hashset to do the suffix matches. That's still linear, but could have a smaller constant factor.
If we happened to know that the target string was small compared to the number of segments, we could attack the problem from the other end. Generate all possible combinations of partitions of the target string and check if both halves are in the segment set. The problem with this solution is that it is quadratic in memory usage in the size of the string. So what we'd want to do there is construct a special hash on character sequences and use that to populate the hash table, rather than the standard string hash. I'm sure you can see how the solution would go from there; I shan't spell out the details.
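The "attack it from the other end" idea can be sketched roughly as follows. This is only an illustration that uses plain substrings and the standard string hash, not the special character-sequence hash mentioned above, so it still allocates a prefix and a suffix string per split point. PartitionMatcher is a made-up name, and the s1 != s2 rule is carried over from the earlier versions.

```csharp
using System;
using System.Collections.Generic;

public static class PartitionMatcher
{
    // Returns true if name is one of the segments, or the concatenation
    // of two different segments. The work is proportional to the length
    // of name, not to the number of segments.
    public static bool IsMatch(HashSet<string> segments, string name)
    {
        if (segments.Contains(name))
            return true;

        // Try every split point: prefix = name[0..i), suffix = name[i..).
        for (int i = 1; i < name.Length; i++)
        {
            string prefix = name.Substring(0, i);
            string suffix = name.Substring(i);

            if (prefix != suffix &&
                segments.Contains(prefix) &&
                segments.Contains(suffix))
                return true;
        }

        return false;
    }
}
```

For the file name in the question, the set would be built once from filename.Split('_', '-'), and 10001 then matches via the split 10 + 001, while 13003 finds no split whose halves are both segments.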

Efficiency is very much dependent on the business problem that you're attempting to solve. Without knowing the full context/usage it's difficult to define the most efficient solution. What works for one situation won't always work for others.
I would always advocate to write working code and then solve any performance issues later down the line (or throw more tin at the problem as it's usually cheaper!) If you're having specific performance issues then please do tell us more...
I'm going to go out on a limb here and say (hope) that you're only going to be matching the filename against the object name once per execution. If that's the case I reckon this approach will be just about the fastest. In a circumstance where you're matching a single filename against multiple object names then the obvious choice is to build up an index of sorts and match against that as you were already doing, although I'd consider different types of collection depending on your expected execution/usage.
public static bool IsMatch(string filename, string objectName)
{
    var segments = filename.Split('-', '_');
    for (int i = 0; i < segments.Length; i++)
    {
        if (string.Equals(segments[i], objectName)) return true;
        for (int ii = 0; ii < segments.Length; ii++)
        {
            if (ii == i) continue;
            if (string.Equals($"{segments[i]}{segments[ii]}", objectName)) return true;
        }
    }
    return false;
}

If you are willing to use the MoreLINQ NuGet package then this may be worth considering:
public static HashSet<string> GetNameSegments(string filename)
{
    var segments = filename.Split(new char[] {'_', '-'}, StringSplitOptions.RemoveEmptyEntries).ToList();
    var matches = segments
        .Cartesian(segments, (x, y) => x == y ? null : x + y)
        .Where(z => z != null)
        .Concat(segments);
    return new HashSet<string>(matches);
}
StringSplitOptions.RemoveEmptyEntries handles adjacent separators (e.g. --). Cartesian is roughly equivalent to your existing nested for loops. The Where is to remove null entries (i.e. if x == y). Concat is the same as your existing Concat. The use of HashSet allows for your Contains calls (in IsMatch) to be faster.

How to get the length of an array without empty values?

I'm now doing a project about solving a Magic cube problem. I want to create an array to remember the steps like this:
char[] Steps = new char[200];
Each time I do the 'F','B','R','L','U','D' turn method, it will add a 'F','B','R','L','U','D' character in the array.
But when I want to get the length of the steps, it always shows 200.
for example:
char[] steps = new char[5];
and now I've already added 3 steps:
steps[] = {'f','b','f','',''};
How can I get the length '3'?
Or is there any alternative method I can use that I don't need to set the length at the beginning?

You can just use a List<char>. If performance is really critical in your scenario, you can set the initial capacity up front, something like the following:
List<char> list = new List<char>(200);
list.Add('c');
list.Add('b');
Here Count returns just the number of items you have actually added:
var c = list.Count;
Note that on a list you can use LINQ's Count() or the Count property; prefer the property, which returns the stored count directly.

You will get a compilation error on this line:
steps[] = {'f','b','f','',''};
because an empty char literal ('') is not allowed, and you need to write steps instead of steps[].
I suggest using a string array instead and counting the non-empty elements with LINQ:
string[] steps = {"f","b","f","",""};
Console.WriteLine(steps.Count(x => !string.IsNullOrEmpty(x)));

To count non-empty items using System.Linq:
steps.Count(x => x != '\0');
Your code doesn't compile since '' isn't allowed as a char, but I'm assuming that you mean empty elements in a char array which are actually represented by '\0' or the Unicode Null. So the above condition simply counts the non null items in your array.
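A small sketch of that, assuming the steps are kept in a fixed-size char array whose unused slots still hold the default value '\0'. StepCounter and CountRecorded are illustrative names only.

```csharp
using System;
using System.Linq;

public static class StepCounter
{
    // Counts the slots that have been assigned a move. Elements of a new
    // char[] are initialized to '\0', so anything else counts as a step.
    public static int CountRecorded(char[] steps)
    {
        return steps.Count(c => c != '\0');
    }
}
```

Note that this counts every non-null element wherever it sits in the array, and it walks the whole array on each call, so for 200 slots it checks all 200.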

You could use a list of characters, which would make things a lot simpler:
List<char> steps = new List<char>();
and just add an entry to the list for each step:
char move = 'F';
steps.Add(move);
Then you can easily get the number of moves in the list:
int numberOfMoves = steps.Count;

Get Second To Last Element For SortedDictionary

I have a sorted dictionary that looks like such:
SortedDictionary<DateTime, string> mySortedDictionary = GetDataSource();
To get the last element, I noticed that I am able to do this:
DateTime last = Convert.ToDateTime(mySortedDictionary.Keys.Last());
Is there any way to get the second-to-last item? The way that I am currently thinking of involves getting the last item and then calculating what the second to last item would be. My DateTime keys all have a set pattern, however, it is not guaranteed that I know them exactly.

dictionary.Keys.Reverse().Skip(1).FirstOrDefault()
This will take O(n) time, but as far as I can tell there seems to be no fast solution.

Using linq you can skip all items until the second to last and take the first one (but first check if the dictionary has at least 2 elements):
var secondToLast = mySortedDictionary.Skip(mySortedDictionary.Count - 2).First();

You can use this method to get the second to last item. Note that it needs to iterate the entire sequence of keys to get it, so it will not be efficient. Also note that I've mostly ignored the cases of a 0 or 1 item sequence; you can check for it and throw, or do something else, if you don't want to be given the default value.
public static T SecondToLast<T>(this IEnumerable<T> source)
{
    T previous = default(T);
    T current = default(T);
    foreach (var item in source)
    {
        previous = current;
        current = item;
    }
    return previous;
}
To use it:
DateTime secondToLast = mySortedDictionary.Keys.SecondToLast();

Can you store the keys reversed? In that case you can just use mySortedDictionary.Skip(1).FirstOrDefault().
You can reverse the key sort order by specifying a (simple) custom IComparer in the constructor.
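A minimal sketch of the reversed-order idea, assuming it is acceptable for the dictionary to enumerate newest-first everywhere it is used. ReverseOrderExample and its method names are invented for illustration; Comparer<T>.Create is the standard helper for building an IComparer<T> from a lambda.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReverseOrderExample
{
    // Creates a dictionary whose keys enumerate from newest to oldest.
    public static SortedDictionary<DateTime, string> CreateNewestFirst()
    {
        // Swapping the arguments of CompareTo reverses the sort order.
        var newestFirst = Comparer<DateTime>.Create((a, b) => b.CompareTo(a));
        return new SortedDictionary<DateTime, string>(newestFirst);
    }

    // Returns the second-newest key, or null if there are fewer than two.
    // Assumes the dictionary was created with the reversed comparer above.
    public static DateTime? SecondNewestKey(SortedDictionary<DateTime, string> newestFirstDictionary)
    {
        if (newestFirstDictionary.Count < 2)
            return null;

        return newestFirstDictionary.Keys.Skip(1).First();
    }
}
```

Because only the first two keys are enumerated, this avoids walking the whole key collection.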

C# Determining whether any element in a string array contains a given string anywhere

I have a string array:
string[] Animals = {"Cat", "Dog", "Fish"};
I then want to determine which element contains the sequence "is" and return that entire element; in this case "Fish".
If I want to find "gh", it does not exist in the list, so it should return the first element, in this case "Cat".
I've tried this LINQ code, but I don't think I'm doing the lambda part right.
int index = Animals.Where(x => x.IndexOf("is") >= 0).First().IndexOf("is");
string result = index > 0 ? Animals[index] : Animals[0];
This code throws this error:
Exception Details: System.ArgumentNullException: Value cannot be null.
Parameter name: value
I think I'm close, I just can't seem to get it.
This method obviously isn't fool proof, it should return the first instance of "is" which could be problematic. My potential list is fairly small and the index word is always unique.

Try this:
string result = Animals.FirstOrDefault(x => x.Contains("is")) ?? Animals.First();
(This will fail if the array contains no elements; what do you want to do in this case? You could try FirstOrDefault for the fallback expression as well - this will return null if the sequence is empty.)
Given your requirements, the code you posted has 2 issues:
It uses Enumerable.First, which will throw an exception on an empty sequence i.e. if no item exists that matches the original predicate.
The index you are using in the second statement is the index of the "is" substring in the result of the first query, not the index of the result in the original array. Consequently, it does not make sense to use that number to index the original array.
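If you do want to work with an index into the original array, as the posted code attempted, Array.FindIndex returns the position of the first matching element, or -1 if there is none. The following is a sketch under that assumption; AnimalSearch and FindContainingOrFirst are hypothetical names, and returning null for an empty array is just one possible choice.

```csharp
using System;

public static class AnimalSearch
{
    // Returns the first element containing the fragment, the first element
    // of the array if nothing matches, or null if the array is empty.
    public static string FindContainingOrFirst(string[] items, string fragment)
    {
        if (items == null || items.Length == 0)
            return null;

        // Index of the matching element in the array itself,
        // not the position of the fragment inside a string.
        int index = Array.FindIndex(
            items, x => x.IndexOf(fragment, StringComparison.Ordinal) >= 0);

        return index >= 0 ? items[index] : items[0];
    }
}
```

The check is index >= 0 rather than index > 0, so a match in the very first element is still treated as a match.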
