I'm looking for a simple way to determine whether a string contains any part of another string (be that a regex, a built-in function I don't know about, etc.). For example:
string a = "unicorn";
string b = "cornholio";
string c = "ornament";
string d = "elephant";
if (a <comparison> b)
{
// match found ("corn" from 'unicorn' matched "corn" from 'cornholio')
}
if (a <comparison> c)
{
// match found ("orn" from 'unicorn' matched "orn" from 'ornament')
}
if (a <comparison> d)
{
// this will not match
}
Something like if (a.ContainsAnyPartOf(b)) would be too much to hope for.
Also, I only have access to .NET 2.0.
Thanks in advance!
This method should work. You'll want to specify a minimum length for the "part" that might match. I'd assume you'd want to look for something of at least 2, but with this you can set it as high or low as you want. Note: error checking not included.
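public static bool ContainsPartOf(string s1, string s2, int minsize)
{
    // Slide a window of length minsize across s2 and check whether s1
    // contains that chunk. Any longer common substring contains such a
    // chunk, so checking exactly minsize characters is enough.
    for (int i = 0; i <= s2.Length - minsize; i++)
    {
        if (s1.Contains(s2.Substring(i, minsize)))
            return true;
    }
    // No chunk of s2 with length minsize occurs in s1.
    return false;
}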
I think you're looking for this implementation of longest common substring?
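For reference, here is a minimal dynamic-programming sketch of longest common substring (a swapped-in illustration, not the linked implementation). It runs in O(n·m) time, keeps only two rows of the table, and sticks to .NET 2.0-era APIs given the asker's constraint; the class and method names are hypothetical.

```csharp
using System;

public static class CommonSubstring
{
    // Returns the longest substring shared by a and b (empty string if none).
    // curr[j] holds the length of the common run ending at a[i - 1] and b[j - 1];
    // only the previous and current rows of the DP table are kept.
    public static string Longest(string a, string b)
    {
        if (a == null || b == null) throw new ArgumentNullException(a == null ? "a" : "b");

        int[] prev = new int[b.Length + 1];
        int[] curr = new int[b.Length + 1];
        int bestLength = 0;
        int bestEndInA = 0; // exclusive end index of the best run in a

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                if (a[i - 1] == b[j - 1])
                {
                    curr[j] = prev[j - 1] + 1; // extend the diagonal run
                    if (curr[j] > bestLength)
                    {
                        bestLength = curr[j];
                        bestEndInA = i;
                    }
                }
                else
                {
                    curr[j] = 0; // run broken
                }
            }
            // Swap rows so curr becomes the previous row for the next i.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return a.Substring(bestEndInA - bestLength, bestLength);
    }
}
```

Comparing the result's Length against a minimum such as 2 reproduces the question's three cases: "corn" for cornholio, "orn" for ornament, and only the single letter "n" for elephant.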
Your best bet, according to my understanding of the question, is to compute the Levenshtein (or related values) distance and compare that against a threshold.
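A minimal sketch of the Levenshtein distance using two-row dynamic programming (names are hypothetical). Note that it counts how many single-character edits turn one whole string into the other rather than how much the strings share, so the threshold to compare against is an assumption you would have to tune for your data.

```csharp
using System;

public static class EditDistance
{
    // Minimum number of single-character insertions, deletions or
    // substitutions needed to turn s into t.
    public static int Levenshtein(string s, string t)
    {
        if (s == null || t == null) throw new ArgumentNullException(s == null ? "s" : "t");

        int[] prev = new int[t.Length + 1];
        int[] curr = new int[t.Length + 1];
        for (int j = 0; j <= t.Length; j++) prev[j] = j; // "" -> first j chars of t

        for (int i = 1; i <= s.Length; i++)
        {
            curr[0] = i; // first i chars of s -> ""
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution
            }
            int[] tmp = prev; prev = curr; curr = tmp; // last row ends up in prev
        }
        return prev[t.Length];
    }
}
```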
Your requirements are a little vague.
You need to define a minimum length for the match...but implementing an algorithm shouldn't be too difficult when you figure that part out.
I'd suggest breaking down the string into character arrays and then using tail recursion to find matches for the parts.
Related
I am aware this question has been asked, and I am not really looking for a function to do it. I was hoping to get some tips on improving a little method I made. Basically: take a long string and search for a smaller string inside it. I am aware that there are literally always a million ways to do things better, and that is what brought me here.
Please take a look at the code snippet and let me know what you think. No, it's not very complex; yes, it does work for my needs. But I am more interested in learning where the pain points would be: cases where I would assume it works, but it doesn't for such-and-such a reason. I hope that makes sense. To give this question a way to be answered on SO: is this a strong way to perform this task? (I somewhat know the answer :) )
I'm super interested in constructive criticism, not just "that's bad". I implore you to elaborate on such a thought so I can get the most out of the responses.
public static Boolean FindTextInString(string strTextToSearch, string strTextToLookFor)
{
//put the string to search into lower case
string strTextToSearchLower = strTextToSearch.ToLower();
//put the text to look for to lower case
string strTextToLookForLower = strTextToLookFor.ToLower();
//get the length of both of the strings
int intTextToLookForLength = strTextToLookForLower.Length;
int intTextToSearch = strTextToSearchLower.Length;
//loop over every start position where the text to look for still fits;
//going past that point would make Substring throw ArgumentOutOfRangeException
for(int i = 0; i <= intTextToSearch - intTextToLookForLength; i++)
{
//substring at multiple positions and see if it can be found
if (strTextToSearchLower.Substring(i,intTextToLookForLength) == strTextToLookForLower)
{
//return true if we found a matching string within the search in text
return true;
}
}
//otherwise we will return false
return false;
}
If you only care about finding a substring inside a string, just use String.Contains()
Example:
string string_to_search = "the cat jumped onto the table";
string string_to_find = "jumped onto";
return string_to_search.ToLower().Contains(string_to_find.ToLower());
You can reuse VB's Like operator this way:
1) Add a reference to the Microsoft.VisualBasic.dll library.
2) Use the following code.
using Microsoft.VisualBasic;
using Microsoft.VisualBasic.CompilerServices;
if (LikeOperator.LikeString(Source: "11", Pattern: "11*", CompareOption: CompareMethod.Text))
{
// Your code here...
}
To implement your function in a case-insensitive way, it may be more appropriate to use IndexOf instead of the combination of two ToLower() calls with Contains. This is both because ToLower() will generate a new string, and because of the Turkish İ Problem.
Something like the following should do the trick. It returns false if either term is null, and otherwise uses a case-insensitive ordinal IndexOf call to determine whether the search term exists in the source string (the ?. operator requires C# 6 or later):
public static bool SourceContainsSearch(string source, string search)
{
return search != null &&
source?.IndexOf(search, StringComparison.OrdinalIgnoreCase) > -1;
}
I have to write a C# equivalent of this C++ code:
string val_in;
float val;
char unit[100];
val_in = NoSpace(val_in);
int nscan = sscanf(val_in.c_str(), "%f%99s", &val, unit); // unit (not &unit) decays to char*; %99s keeps the write inside the buffer
if (nscan < 2) {
return val_in; //do nothing if scan fail
}
where the NoSpace() method trims and removes all spaces in val_in.
I have looked around here on SO, and most of the similar questions involve strings that contain delimiters such as spaces or commas, which don't apply to this case. So I turned to RegEx.
So far, I have this,
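string val_in;
float val;
char[] unit = new char[100];
string[] val_arr;
val_in = NoSpace(val_in);
val_arr = Regex.Split(val_in, @"([-]?\d*\.?\d+)([a-zA-Z]+)");
// Without a match, Split returns the input as a single element, so check
// that val_arr[1] (number) and val_arr[2] (unit) exist before parsing.
if (val_arr.Length < 3) {
return val_in; //do nothing if scan fail
}
val = Single.Parse(val_arr[1]);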
It works so far, but I was wondering if there is another way to do this. I'm a bit wary of RegEx because, according to the accepted answer on this question, having ([-]?\d*\.?\d+) instead of ([-]?(\d*\.)?\d+) is potentially dangerous because of evil RegEx. But if I include those extra parentheses, then I have an extra group. This causes Split() to split something like 123.456miles into an array with the elements
{emptystr, 123.456, 123., miles}
This way, I can't be sure that the unit, miles in this case, will be in val_arr[2], which is a problem.
I tested this on this .NET RegEx tester. I also tried to break my RegEx pattern, ([-]?\d*\.?\d+), but it seems to be fine and "evil RegEx safe". So I'm not sure whether I should stick with what I've done so far or find a more elegant solution, if one exists.
Not very elegant, but can't you just look for the first letter in the string to know where your unit starts?
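static bool SplitValAndUnit(string unsplitData, out string value, out string unit)
{
    for (int x = 0; x < unsplitData.Length; x++)
    {
        // The first letter marks where the unit starts.
        if (Char.IsLetter(unsplitData[x]))
        {
            value = unsplitData.Substring(0, x);
            // TryParse value to whatever data type
            unit = unsplitData.Substring(x);
            return true; // stop here; later letters belong to the unit
        }
    }
    // No letter found, so there is no unit to split off.
    value = unsplitData;
    unit = string.Empty;
    return false;
}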
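An alternative not taken from the thread: use Regex.Match with named groups instead of Regex.Split. Wrapping the optional part in a non-capturing group, (?:\d*\.)?, adds no extra captured group, so the value and unit are always looked up by name rather than by array index. This is a sketch with hypothetical names; parsing with CultureInfo.InvariantCulture is an assumption so that '.' is always the decimal separator, and unlike sscanf's %s the unit here is restricted to letters.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class ValueWithUnit
{
    // Anchored pattern: optional minus, optional "digits." prefix in a
    // non-capturing group, required digits, then one or more letters.
    // Only the two named groups capture, so nothing shifts.
    private static readonly Regex Pattern =
        new Regex(@"^(?<value>-?(?:\d*\.)?\d+)(?<unit>[a-zA-Z]+)$");

    // Rough analogue of sscanf("%f%s"): returns false if the input
    // does not look like <number><letters>.
    public static bool TryParse(string input, out float value, out string unit)
    {
        value = 0f;
        unit = null;
        if (input == null) return false;

        Match m = Pattern.Match(input);
        if (!m.Success) return false;

        if (!float.TryParse(m.Groups["value"].Value, NumberStyles.Float,
                            CultureInfo.InvariantCulture, out value))
            return false;

        unit = m.Groups["unit"].Value;
        return true;
    }
}
```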
I have a function that walks through a string looking for a pattern and changing parts of it. I could optimize it by inserting
if (!text.Contains(pattern)) return;
But I am already walking through the whole string and comparing parts of it with the pattern, so the question is: how does String.Contains() actually work? I know there was such a question (How does String.Contains work?), but the answer is rather unclear. If String.Contains() also walks through the whole array of chars and compares them to the pattern I am looking for, it wouldn't really make my function faster, but slower.
So, is it a good idea to attempt such optimizations? And is it possible for String.Contains() to be even faster than a function that just walks through the whole array and compares every single character with some constant one?
Here is the code:
public static char colorchar = (char)3;
public static Client.RichTBox.ContentText color(string text, Client.RichTBox SBAB)
{
if (text.Contains(colorchar.ToString()))
{
int color = 0;
bool closed = false;
int position = 0;
while (text.Length > position)
{
if (text[position] == colorchar)
{
if (closed)
{
text = text.Substring(position, text.Length - position);
Client.RichTBox.ContentText Link = new Client.RichTBox.ContentText(ProtocolIrc.decode_text(text), SBAB, Configuration.CurrentSkin.mrcl[color]);
return Link;
}
if (!closed)
{
if (!int.TryParse(text[position + 1].ToString() + text[position + 2].ToString(), out color))
{
if (!int.TryParse(text[position + 1].ToString(), out color))
{
color = 0;
}
}
if (color > 9)
{
text = text.Remove(position, 3);
}
else
{
text = text.Remove(position, 2);
}
closed = true;
if (color < 16)
{
text = text.Substring(position);
break;
}
}
}
position++;
}
}
return null;
}
Short answer: your optimization is no optimization at all.
Basically, String.Contains(...) just returns String.IndexOf(...) >= 0 (an ordinal search), so checking Contains first only adds a scan in front of your own loop.
You could improve your algorithm to:
int position = text.IndexOf(colorchar.ToString()...);
if (-1 < position)
{ /* Do it */ }
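To illustrate the IndexOf suggestion: the char overload String.IndexOf(char, int) does an ordinal search, so a loop can jump from one control character to the next instead of testing every character itself. The helper below is hypothetical and only shows the scanning pattern, not a rewrite of color().

```csharp
using System.Collections.Generic;

public static class ControlCharScanner
{
    // Hypothetical helper: collects every index at which marker occurs,
    // letting String.IndexOf do the per-character scanning.
    public static List<int> FindAll(string text, char marker)
    {
        List<int> positions = new List<int>();
        int position = text.IndexOf(marker);
        while (position >= 0)
        {
            positions.Add(position);
            // Resume just after the last hit; startIndex == text.Length is allowed and yields -1.
            position = text.IndexOf(marker, position + 1);
        }
        return positions;
    }
}
```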
Yes.
And doesn't have a bug (ahhm...).
There are better ways of looking for multiple substrings in very long texts, but for most common usages String.Contains (or IndexOf) is the best.
Also, IIRC the source of String.Contains is available in the .NET shared sources.
Oh, and if you want a performance comparison, you can just measure it for your exact use case.
Check this similar post How does string.contains work
I don't think you will be able to do anything faster than String.Contains, unless you want to call the standard CRT function wcsstr from msvcrt.dll, which is not so easy.
Unless you have profiled your application and determined that the line with String.Contains is a bottle-neck, you should not do any such premature optimizations. It is way more important to keep your code's intention clear.
And while there are many ways to implement the methods in the .NET base classes, you should assume the default implementations are optimal enough for most people's use cases. For example, any (future) implementation of .NET might use x86-specific instructions for string comparisons. That would very likely be faster than anything you can write in C#.
If you really want to be sure whether your custom string comparison code is faster than String.Contains, you need to measure them both using many iterations, each with a different string. For example using the Stopwatch class to measure the time.
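A rough sketch of such a measurement (hypothetical names; the hand-written loop stands in for "your custom string comparison code"). Absolute numbers depend on the runtime, JIT, hardware and inputs, so treat them as indicative only.

```csharp
using System;
using System.Diagnostics;

public static class ContainsBenchmark
{
    // Hand-written ordinal substring search, for comparison with String.Contains.
    public static bool NaiveContains(string text, string pattern)
    {
        for (int i = 0; i <= text.Length - pattern.Length; i++)
        {
            int k = 0;
            while (k < pattern.Length && text[i + k] == pattern[k]) k++;
            if (k == pattern.Length) return true;
        }
        return false;
    }

    // Runs search iterations times, cycling through inputs so successive calls
    // see different strings, and returns the elapsed time. hits is returned so
    // the result of every call is actually used.
    public static TimeSpan Time(Func<string, string, bool> search, string[] inputs,
                                string pattern, int iterations, out int hits)
    {
        hits = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if (search(inputs[i % inputs.Length], pattern)) hits++;
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

You would time (t, p) => t.Contains(p) against ContainsBenchmark.NaiveContains on the same inputs, run each once beforehand as a warm-up so JIT compilation is not counted, and build in Release mode.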
If you know details you can exploit for optimization (beyond a simple contains check), then sure, you can make your method faster than string.Contains; otherwise, no.
How do I force double x = 3 / 2; to return 1.5 in x without the D suffix or casting? Is there any kind of operator overload that can be done? Or some compiler option?
Amazingly, it's not so simple to add the casting or suffix for the following reason:
Business users need to write and debug their own formulas. Presently C# is getting used like a DSL (domain specific language) in that these users aren't computer science engineers. So all they know is how to edit and create a few types of classes to hold their "business rules" which are generally just math formulas.
But they always assume that double x = 3 / 2; will return x = 1.5
However, in C# that returns 1, because 3 / 2 is evaluated as integer division before the result is converted to double.
A. they always forget this, waste time debugging, call me for support and we fix it.
B. they think it's very ugly and hurts the readability of their business rules.
As you know, DSLs need to be more like natural language.
Yes. We are planning to move to Boo and build a DSL based on it but that's down the road.
Is there a simple solution to make double x = 3 / 2; return 1.5 by something external to the class so it's invisible to the users?
Thanks!
Wayne
No, there's no solution that can make 3 / 2 return 1.5.
The only workaround, given your constraints, is to discourage users from using literals in formulas. Encourage them to use constants, or, if they really need literals, to write them with a decimal point.
never say never...
The (double)3/2 solution looks nice...
but it failed for 4+5/6
try this:
donated to the public domain to be used freely by SymbolicComputation.com.
It's alpha, but you can try it out; I've only run it on a few tests. My site and software should be up soon.
It uses Microsoft's Roslyn, it'll put a 'd' after every number if all goes well. Roslyn is alpha too, but it will parse a fair bit of C#.
public static String AddDSuffixesToEquation(String inEquation)
{
SyntaxNode syntaxNode = EquationToSyntaxNode(inEquation);
List<SyntaxNode> branches = syntaxNode.DescendentNodesAndSelf().ToList();
List<Int32> numericBranchIndexes = new List<int>();
List<SyntaxNode> replacements = new List<SyntaxNode>();
SyntaxNode replacement;
String lStr;
Int32 L;
for (L = 0; L < branches.Count; L++)
{
if (branches[L].Kind == SyntaxKind.NumericLiteralExpression)
{
numericBranchIndexes.Add(L);
lStr = branches[L].ToString() + "d";
replacement = EquationToSyntaxNode(lStr);
replacements.Add(replacement);
}
}
replacement = EquationToSyntaxNode(inEquation);
List<SyntaxNode> replaceMeBranches;
for (L = numericBranchIndexes.Count - 1; L >= 0; L--)
{
replaceMeBranches = replacement.DescendentNodesAndSelf().ToList();
replacement = replacement.ReplaceNode(replaceMeBranches[numericBranchIndexes[L]],replacements[L]);
}
return replacement.ToString();
}
public static SyntaxNode EquationToSyntaxNode(String inEquation)
{
SyntaxTree tree = EquationToSyntaxTree(inEquation);
return EquationSyntaxTreeToEquationSyntaxNode(tree);
}
public static SyntaxTree EquationToSyntaxTree(String inEquation)
{
return SyntaxTree.ParseCompilationUnit("using System; class Calc { public static object Eval() { return " + inEquation + "; } }");
}
public static SyntaxNode EquationSyntaxTreeToEquationSyntaxNode(SyntaxTree syntaxTree)
{
SyntaxNode syntaxNode = syntaxTree.Root.DescendentNodes().First(x => x.Kind == SyntaxKind.ReturnStatement);
return syntaxNode.ChildNodes().First();
}
Simple, if I'm not mistaken:
double x = 3D / 2D;
One solution would be writing a method that does this for them and teach them to use it. Your method would always take in doubles and the answer will always have the correct number of decimals.
I'm not entirely sure, but I believe you can get a double using 3.0/2.0
But if you think .0 just as another way of suffixing then it's not the answer too :-)
Maybe you can try RPN Expression Parser Class for now or bcParser? These are very small expression parsing libraries.
I like strong, statically typed languages for my own work, but I don't think they're suited for beginners who have no interest in becoming professionals.
So I'd have to say that, unfortunately, C# might not have been the best choice for that audience.
Boo seems to be statically typed too. Have you thought about embedding a JavaScript engine, Python, or some other dynamically typed engine? These are usually not that hard to plug into an existing application, and you have the benefit of lots of existing documentation.
Perhaps an extension method on Int32?
Preprocess formulas before passing them to the c# compiler. Do something like:
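// \d in the lookahead stops backtracking from turning "12.5" into "1D2.5"; "(" lets "(3/2)" be converted too
formula = Regex.Replace(formula, @"(^|[\^\s\+\*\/\(-])(\d+)(?![\d\.DFdf])", "$1$2D");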
To convert integer literals to double literals.
Alternately, you could use a simple state machine to track whether or not you're in a string literal or comment rather than blindly replacing, but for simple formulas I think a regex will suffice.
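The state-machine alternative might look roughly like the following sketch (hypothetical class and method names). It tracks only double-quoted string literals with backslash escapes; comments, char literals and verbatim @"..." strings are not handled, so treat it as a starting point rather than a complete C# lexer.

```csharp
using System.Text;

public static class FormulaPreprocessor
{
    // Hypothetical helper: appends a 'd' suffix to integer literals that appear
    // outside double-quoted string literals, so 3 / 2 becomes 3d / 2d.
    public static string AddDoubleSuffixes(string formula)
    {
        StringBuilder sb = new StringBuilder(formula.Length + 8);
        bool inString = false; // the only state tracked: inside "..." or not
        int i = 0;
        while (i < formula.Length)
        {
            char c = formula[i];
            if (inString)
            {
                sb.Append(c);
                if (c == '\\' && i + 1 < formula.Length)
                {
                    sb.Append(formula[i + 1]); // copy the escaped character verbatim
                    i += 2;
                    continue;
                }
                if (c == '"') inString = false;
                i++;
                continue;
            }
            if (c == '"')
            {
                inString = true;
                sb.Append(c);
                i++;
                continue;
            }
            // A number starts at a digit not preceded by an identifier character
            // or '.', so x2 and the 5 in .5 are left alone.
            bool startsNumber = char.IsDigit(c) &&
                (i == 0 || !(char.IsLetterOrDigit(formula[i - 1]) || formula[i - 1] == '_' || formula[i - 1] == '.'));
            if (startsNumber)
            {
                int start = i;
                while (i < formula.Length && (char.IsDigit(formula[i]) || formula[i] == '.')) i++;
                string token = formula.Substring(start, i - start);
                sb.Append(token);
                // Leave tokens that already have a decimal point or a suffix/exponent (1.5, 10m, 2e3).
                bool followedByLetter = i < formula.Length && char.IsLetter(formula[i]);
                if (token.IndexOf('.') < 0 && !followedByLetter) sb.Append('d');
                continue;
            }
            sb.Append(c);
            i++;
        }
        return sb.ToString();
    }
}
```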
Try doing it like this:
double result = (double) 3 / 2;
result = 1.5
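Why the cast only appears to work: a cast binds more tightly than /, and it applies only to the operand immediately after it. That is why (double)3 / 2 gives 1.5 but the same trick fails for 4+5/6, as noted in an earlier answer. A small illustration (hypothetical names):

```csharp
using System;

public static class CastPrecedenceDemo
{
    // The cast applies to 3 first, so this is 3.0 / 2, i.e. 1.5.
    public static double CastThenDivide() { return (double)3 / 2; }

    // The cast touches only the 4; 5 / 6 is still integer division (0),
    // so the result is 4.0 rather than about 4.833.
    public static double CastOnlyFirstTerm() { return (double)4 + 5 / 6; }

    // Writing every literal as a double literal avoids the problem.
    public static double AllDoubleLiterals() { return 4d + 5d / 6d; }
}
```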
Can anyone think of a nicer way to do the following:
public string ShortDescription
{
get { return this.Description.Length <= 25 ? this.Description : this.Description.Substring(0, 25) + "..."; }
}
I would have liked to just do string.Substring(0, 25), but it throws an exception if the string is shorter than the length supplied.
I needed this so often, I wrote an extension method for it:
public static class StringExtensions
{
public static string SafeSubstring(this string input, int startIndex, int length, string suffix)
{
// Todo: Check that startIndex + length does not cause an arithmetic overflow - not that this is likely, but still...
// Use > rather than >= so the suffix is only appended when characters were actually cut off.
if (input.Length > (startIndex + length))
{
if (suffix == null) suffix = string.Empty;
return input.Substring(startIndex, length) + suffix;
}
else
{
if (input.Length > startIndex)
{
return input.Substring(startIndex);
}
else
{
return string.Empty;
}
}
}
}
If you only need it once, that is overkill, but if you need it more often it can come in handy.
Edit: Added support for a string suffix. Pass in "..." and you get an ellipsis on truncated strings, or pass in string.Empty for no suffix.
return this.Description.Substring(0, Math.Min(this.Description.Length, 25));
Doesn't have the ... part. Your way is probably the best, actually.
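// Extension methods must live in a non-generic static class (class name is illustrative).
public static class StringTruncation
{
    public static string Take(this string s, int i)
    {
        if (s.Length <= i)
            return s;
        else
            return s.Substring(0, i) + "...";
    }
}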
public string ShortDescription
{
get { return this.Description.Take(25); }
}
The way you've done it seems fine to me, except that I wouldn't use the magic number 25; I'd have that as a constant.
Do you really want to store this in your bean, though? Presumably this is for display somewhere, so your renderer should be the thing doing the truncating instead of the data object.
Well, I know there's an accepted answer already, and I may get crucified for throwing out a regular expression here, but this is how I usually do it:
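//may return more than 25 characters depending on where in the string 25 characters is at
public string ShortDescription(string val)
{
    // Keep the first 25 characters plus the rest of the word they end in;
    // strings with no whitespace after that point are returned unchanged.
    return Regex.Replace(val, @"^(.{25}\S*)\s.*", "$1...", RegexOptions.Singleline);
}
// stricter version that only returns 25 characters, plus 3 for ...
public string ShortDescriptionStrict(string val)
{
    // .+ requires at least one extra character, so a 25-character string gets no "..."
    return Regex.Replace(val, @"^(.{25}).+", "$1...", RegexOptions.Singleline);
}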
It has the nice side benefit of not cutting a word in half, as it always stops at the first whitespace character past 25 characters. (Of course, if you need it to truncate text going into a database, that might be a problem.)
Downside: well, I'm sure it's not the fastest solution possible.
EDIT: replaced &hellip; with "..." since I'm not sure whether this solution is for the web!
Without the "...", this should be the shortest (it needs a reference to Microsoft.VisualBasic.dll):
public string ShortDescription
{
    get { return Microsoft.VisualBasic.Strings.Left(this.Description, 25); }
}
I think the approach is sound, though I'd recommend a few adjustments:
Move the magic number to a const or configuration value
Use a regular if conditional rather than the ternary operator
Use a string.Format("{0}...") rather than + "..."
Have just one return point from the function
So:
public string ShortDescription
{
get
{
const int SHORT_DESCRIPTION_LENGTH = 25;
string _shortDescription = Description;
if (Description.Length > SHORT_DESCRIPTION_LENGTH)
{
_shortDescription = string.Format("{0}...", Description.Substring(0, SHORT_DESCRIPTION_LENGTH));
}
return _shortDescription;
}
}
For a more general approach, you might like to move the logic to an extension method:
public static string ToTruncated(this string s, int truncateAt)
{
string truncated = s;
if (s.Length > truncateAt)
{
truncated = string.Format("{0}...", s.Substring(0, truncateAt));
}
return truncated;
}
Edit
I use the ternary operator extensively, but prefer to avoid it if the code becomes sufficiently verbose that it starts to extend past 120 characters or so. In that case I'd like to wrap it onto multiple lines, so find that a regular if conditional is more readable.
Edit2
For typographical correctness you could also consider using the ellipsis character (…) as opposed to three dots/periods/full stops (...).
One way to do it:
int length = Math.Min(Description.Length, 25);
return Description.Substring(0, length) + "...";
There are two lines instead of one, but shorter ones :).
Edit:
As pointed out in the comments, this gets you the ... all the time, so the answer was wrong. Correcting it means we go back to the original solution.
At this point, I think using string extensions is the only option to shorten the code. And that makes sense only when that code is repeated in at least a few places...
Looks fine to me. Being really picky, I would replace "..." with the entity reference "&hellip;".
I can't think of one, but your approach might not be the best. Are you adding presentation logic to your data object? If so, I suggest you put that logic elsewhere, for example in a static StringDisplayUtils class with a GetShortString(int maxCharsToDisplay, string stringToShorten) method.
However, that approach might not be great either. What about different fonts and character sets? You'd have to start measuring the actual string width in pixels. Check out the AutoEllipsis property on the WinForms Label class (you'll probably need to set AutoSize to false when using it). When AutoEllipsis is true, the label shortens the text and adds the "..." for you.
I'd stick with what you have tbh, but just as an alternative, if you have LINQ to objects you could
new string(this.Description.ToCharArray().Take(25).ToArray())
//And to maintain the ...
+ (this.Description.Length <= 25 ? String.Empty : "...")
As others have said, you'd likely want to store 25 in a constant
You should see if you can reference the Microsoft.VisualBasic DLL in your app so you can use its "Left" function (Microsoft.VisualBasic.Strings.Left).