Is there a way to use less than on Strings? - c#

Using string.CompareTo(string) I can get around this to some extent, but it is not easy to read, and I have read that locale settings might influence the result.
Is there a way to simply use < or > on two strings in a more straightforward way?

You can overload operators but you seldom should. To me "stringA" > "stringB" wouldn't mean a damn thing, it's not helping readability IMO. That's why operator overloading guidelines advise not to overload operators if the meaning is not obvious.
EDIT: Operator Overloading Usage Guidelines
Also, in the case of String I'm afraid you can't do it, since an operator overload can only be declared inside one of the types it operates on, and you can't add members to String.
If the syntax of CompareTo bothers you, maybe wrapping it in extension method will solve your problem?
Like that:
public static bool IsLessThan(this string str, string str2) {
    // CompareTo returns a negative number when str sorts before str2
    return str.CompareTo(str2) < 0;
}
I still find it confusing for reader though.
The bottom line is, you can't overload operators for String. For your own types you could declare the class partial and put the overloads in a separate file, but String is a sealed framework class whose source you don't control, so not this time. I think an extension method with a reasonable name is your best bet. You can put CompareTo or some custom logic inside it.

CompareTo is the proper way in my opinion. If you need culture-specific behaviour, the static string.Compare overloads let you specify the culture and comparison options...

You mention in a comment that you're comparing two strings with values of the form "A100" and "B001". This works in your legacy VB 6 code with the < and > operators because of the way that VB 6 implements string comparison.
The algorithm is quite simple. It walks through the strings one character at a time and compares the ASCII values of the characters at each position. As soon as a character in one string has a lower ASCII code than the corresponding character in the other, the comparison stops and the string with the lower character is declared to be "less than" the other. (VB 6 can be forced to perform a case-insensitive comparison based on the system's current locale by placing the Option Compare Text statement at the top of the relevant code module, but this is not the default setting.)
Simple, of course, but not entirely logical. Comparing ASCII values skips over all sorts of interesting things you might find in strings nowadays; namely non-ASCII characters. Since you appear to be dealing with strings whose contents have pre-defined limits, this may not be a problem in your particular case. But more generally, writing code like strA < strB is going to look like complete nonsense to anyone else who has to maintain your code (it seems like you're already having this experience), and I encourage you to do the "right thing" even when you're dealing with a fixed set of possible inputs.
There is nothing "straightforward" about using < or > on string values. If you need to implement this functionality, you're going to have to do it yourself. Following the algorithm that I described VB 6 as using above, you could write your own comparison function and call that in your code, instead. Walk through each character in the string, determine if it is a character or a number, and convert it to the appropriate data type. From there, you can compare the two parsed values, and either move on to the next index in the string or return an "equality" value.
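For values like "A100" and "B001", a character-by-character comparison of code values is already available in the framework as string.CompareOrdinal, so a hand-written loop may not be needed. The following is only a sketch: the class and method names are hypothetical, and it assumes ordinal (code-unit) ordering is close enough to the VB 6 binary comparison for your data.

```csharp
using System;

// Hypothetical helper names for illustration; not part of the framework.
public static class OrdinalStringComparison
{
    // True when left sorts before right by raw character code (culture-independent).
    public static bool IsOrdinalLessThan(this string left, string right)
    {
        return string.CompareOrdinal(left, right) < 0;
    }

    // True when left sorts after right by raw character code (culture-independent).
    public static bool IsOrdinalGreaterThan(this string left, string right)
    {
        return string.CompareOrdinal(left, right) > 0;
    }
}
```

With this, "A100".IsOrdinalLessThan("B001") is true. Note that "10".IsOrdinalLessThan("2") is also true, because '1' has a lower code than '2'; the comparison is textual, not numeric.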

There is another problem with that, I think:
Assert.IsFalse(10 < 2);
Assert.IsTrue("10" < "2");
(The second Assert assumes you did an overload for the < operator on the string class.)
But the operator suggests otherwise!!
I agree with Dyppl: you shouldn't do it!

Related

IndexOf for char array ignoring casing

I'm developing a PDF file viewer. A PDF file stores its characters as bytes, and a PDF file can be several megabytes in size. Using strings for this scenario is a bad idea, because the storage space of a string cannot be reused for another string. Therefore I store these PDF bytes in a char array. When reading the next big PDF file, I can reuse the char array.
Now I need to support a search functionality, so that the user can find a certain text in this huge file. When I am searching, I usually don't want to have to enter proper upper and lower case letters, I might even not remember the correct casing, meaning the search should succeed regardless of casing. When using
string.IndexOf(String, StringComparison)
one can choose InvariantCultureIgnoreCase to get both upper and lower case matches.
However, converting the megabyte char array into an equally big string is a bad idea.
Unfortunately, IndexOf for an Array is not helpful:
public static int IndexOf<T> (T[] array, T value);
This only allows searching for a single char in a char array, and it does not support IgnoreCase either, which obviously wouldn't make sense for other arrays, like an integer array.
So the question is:
Which method from DotNet can be used to search for a string in a character array?
Please read this before marking this question as a duplicate
I am aware that there are already similar questions regarding searching. But the ones I have seen all convert the character array into a string in one way or another, which I definitely do not want.
Also note that many of those solutions don't support ignoring the casing. The solution should also handle exotic Unicode characters correctly.
And last but not least, best would be an existing method from DotNet.
I came to the conclusion that I need to implement my own IndexOf method for character arrays. However, programming that proved rather challenging, so I checked in the DotNet source code how string.IndexOf is doing it.
It's a bit confusing because one method is calling another which calls another, each doing not much. Finally, one arrives at:
public unsafe int IndexOf(ReadOnlySpan<char> source, ReadOnlySpan<char> value,
CompareOptions options = CompareOptions.None)
Lo and behold, that was exactly the functionality I was looking for, because it is very easy to convert a char[] into a ReadOnlySpan<char>. This method belongs to the CompareInfo class. To call it, one has to write something like this:
var index = CultureInfo.InvariantCulture.CompareInfo.IndexOf(bigCharArray,
searchString, CompareOptions.IgnoreCase);
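Because the char array is reused between files, it will usually be only partly filled, so it may be worth searching just the valid portion. The sketch below assumes .NET 5 or later (where the span-based CompareInfo.IndexOf overload is public); the class name, method name, and the length parameter are hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration.
public static class CharBufferSearch
{
    // Searches only the first `length` chars of a reusable buffer, ignoring case.
    // No string is allocated for the buffer; both arguments are passed as spans.
    public static int IndexOfIgnoreCase(char[] buffer, int length, string value)
    {
        ReadOnlySpan<char> source = buffer.AsSpan(0, length);
        return CultureInfo.InvariantCulture.CompareInfo.IndexOf(
            source, value.AsSpan(), CompareOptions.IgnoreCase);
    }
}
```

Slicing with AsSpan(0, length) keeps stale characters left over from a previous, larger file out of the search.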

solving a math expression

I want to evaluate a math expression which the user enters in a textbox. I have done this so far
string equation, finalString;
equation = textBox1.Text;
StringBuilder stringEvaluate = new StringBuilder(equation);
stringEvaluate.Replace("sin", "math.sin");
stringEvaluate.Replace("cos", "math.cos");
stringEvaluate.Replace("tan", "math.tan");
stringEvaluate.Replace("log", "math.log10");
stringEvaluate.Replace("e^", "math.exp");
finalString = stringEvaluate.ToString();
StringBuilder replaceI = new StringBuilder(finalString);
replaceI.Replace("x", "i");
double a;
for (int i = 0; i<5 ; i++)
{
a = double.Parse(finalStringI);
if(a<0)
break;
}
When I run this program it gives the error "Input string was not in a correct format." and highlights a=double.Parse(finalStringI);
I used a predefined expression a=i*math.log10(i)-1.2 and it works, but when I enter the same thing in the textbox it doesn't.
I did some searching and found that it has something to do with compiling the code at runtime.
Any ideas how to do this?
I'm an absolute beginner.
thanks :)
The issue is with your stringEvaluate StringBuilder. When you replace "sin" with "math.sin", the content of stringEvaluate is still just a string of text, not code. You've got the right idea, but that is why you're getting the error.
Math.Sin is a method of the Math class, so its name inside a string cannot be evaluated by your a = double.Parse(finalStringI); call.
It would be a pretty big undertaking to accomplish your goal, but I would go about it this way:
Create a class (perhaps call it Expression).
Members of the Expression class could include Lists of operators and operands, and perhaps a double called solution.
Pass this class the string at instantiation, and tear it apart using the StringBuilder class. For example, if you encounter a "sin", add Math.Sin to the operator collection (for which I'd use type object).
Each operator and operand within said string should be placed within the two collections.
Create a method that evaluates the elements within the operator and operand collection accordingly. This could get sticky for complex calculations with more than 2 operators, as you would have to implement a PEMDAS-esque algorithm to re-order the collections to obey the order of operations (and thus achieve correct solutions).
Hope this helps :)
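One way to get the order of operations right without re-ordering collections is a small recursive-descent parser, which is a different technique from the two-collection design described above. The following is a minimal sketch under stated assumptions: the class name ExpressionEvaluator is hypothetical, the only variable is x, and only +, -, *, /, parentheses and sin, cos, tan, log (base 10) and exp are supported.

```csharp
using System;
using System.Globalization;

// Hypothetical class for illustration; one method per precedence level.
public sealed class ExpressionEvaluator
{
    private readonly string _text;
    private readonly double _x;
    private int _pos;

    private ExpressionEvaluator(string text, double x)
    {
        _text = text.Replace(" ", "");
        _x = x;
    }

    public static double Evaluate(string text, double x = 0)
    {
        var parser = new ExpressionEvaluator(text, x);
        double result = parser.ParseSum();
        if (parser._pos != parser._text.Length)
            throw new FormatException("Unexpected character at position " + parser._pos);
        return result;
    }

    // sum := product (('+' | '-') product)*
    private double ParseSum()
    {
        double value = ParseProduct();
        while (true)
        {
            if (Accept('+')) value += ParseProduct();
            else if (Accept('-')) value -= ParseProduct();
            else return value;
        }
    }

    // product := unary (('*' | '/') unary)*
    private double ParseProduct()
    {
        double value = ParseUnary();
        while (true)
        {
            if (Accept('*')) value *= ParseUnary();
            else if (Accept('/')) value /= ParseUnary();
            else return value;
        }
    }

    // unary := '-' unary | primary
    private double ParseUnary()
    {
        return Accept('-') ? -ParseUnary() : ParsePrimary();
    }

    // primary := number | 'x' | name '(' sum ')' | '(' sum ')'
    private double ParsePrimary()
    {
        if (Accept('('))
        {
            double inner = ParseSum();
            Expect(')');
            return inner;
        }
        int start = _pos;
        if (_pos < _text.Length && char.IsLetter(_text[_pos]))
        {
            while (_pos < _text.Length && char.IsLetter(_text[_pos])) _pos++;
            string name = _text.Substring(start, _pos - start);
            if (name == "x") return _x;
            Expect('(');
            double arg = ParseSum();
            Expect(')');
            switch (name)
            {
                case "sin": return Math.Sin(arg);
                case "cos": return Math.Cos(arg);
                case "tan": return Math.Tan(arg);
                case "log": return Math.Log10(arg);
                case "exp": return Math.Exp(arg);
                default: throw new FormatException("Unknown function: " + name);
            }
        }
        while (_pos < _text.Length && (char.IsDigit(_text[_pos]) || _text[_pos] == '.')) _pos++;
        if (_pos == start) throw new FormatException("Number expected at position " + start);
        return double.Parse(_text.Substring(start, _pos - start), CultureInfo.InvariantCulture);
    }

    // Consumes c if it is the next character.
    private bool Accept(char c)
    {
        if (_pos < _text.Length && _text[_pos] == c) { _pos++; return true; }
        return false;
    }

    private void Expect(char c)
    {
        if (!Accept(c)) throw new FormatException("Expected '" + c + "' at position " + _pos);
    }
}
```

A call such as ExpressionEvaluator.Evaluate("x*log(x)-1.2", i) inside the loop would then stand in for the double.Parse call from the question. Multiplication binds tighter than addition simply because ParseSum calls ParseProduct, so no explicit re-ordering step is needed.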
The .Parse methods (int.Parse, double.Parse, etc.) will only take a string such as "25" or "3.141" and convert it to the matching value type (int 25, or double 3.141). They will not evaluate math expressions!
You'll pretty much have to write your own text-parser and parse-tree evaluator, or explore run-time code-generation, or MSIL code-emission.
Neither topic can really be covered in the Q&A format of StackOverflow answers.
Take a look at this blog post:
http://www.c-sharpcorner.com/UploadFile/mgold/CodeDomCalculator08082005003253AM/CodeDomCalculator.aspx
It sounds like it does pretty much what you're trying to do. Evaluating math expressions is not as simple as just parsing a double (which is really only going to work for strings like "1.234", not "1 + 2.34"), but apparently it is possible.
You can use the eval function that the framework includes for JScript.NET code.
More details: http://odetocode.com/code/80.aspx
Or, if you're not scared to use classes marked "deprecated", it's really easy:
static string EvalExpression(string s)
{
return Microsoft.JScript.Eval.JScriptEvaluate(s, null, Microsoft.JScript.Vsa.VsaEngine.CreateEngine()).ToString();
}
For example, input "Math.cos(Math.PI / 3)" and the result is "0.5" (which is the correct cosine of 60 degrees)

strtoul Equivalent in C#

I'm trying to read-in a bunch of unsigned integers from a configuration file into a class. These numbers may be specified in either base-10 (eg: 1234) or in base-16 (eg: 0xAB31). Therefore looking for the strtoul equivalent in C# 2.0.
More specifically, I'm interested in a C# function which mimics the behaviour of this function when the argument indicating the base or radix is passed in as zero. (In C++, strtoul will attempt to 'guess' the base or radix from the first couple of characters in the string and then convert the number accordingly.)
Currently I'm manually checking the first two characters (using string.Substring() method) of the string and then calling Convert.ToUInt32(hex, 10) or Convert.ToUInt32(hex, 16) as needed.
I'm sure that there has to be a better way to deal with this problem and hence this post. More elegant ideas/solutions or work-arounds would be great help.
Well, you don't need to use Substring unless it's in hex, but it sounds like you're basically doing it the right way:
return text.StartsWith("0x") ? Convert.ToUInt32(text.Substring(2), 16)
: Convert.ToUInt32(text, 10);
Obviously this will create an extra object for the Substring call, and you could write your own hex parsing code to cope with this - but unless you've actually run into performance problems with this approach, I'd keep it simple.
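Wrapped up as a reusable helper, the same approach might look like the sketch below. The class and method names are hypothetical, and this version also accepts an uppercase "0X" prefix and surrounding whitespace, which goes slightly beyond the snippet above. Unlike strtoul with base 0, it does not treat a leading "0" as octal.

```csharp
using System;

// Hypothetical helper for illustration.
public static class NumberParsing
{
    // Parses "1234" as decimal and "0xAB31" / "0XAB31" as hexadecimal.
    // Throws FormatException or OverflowException on bad input, like Convert.ToUInt32.
    public static uint ParseUInt32AutoBase(string text)
    {
        string s = text.Trim();
        if (s.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
            return Convert.ToUInt32(s.Substring(2), 16);
        return Convert.ToUInt32(s, 10);
    }
}
```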

C# better way to do this?

Hi I have this code below and am looking for a prettier/faster way to do this.
Thanks!
string value = "HelloGoodByeSeeYouLater";
string[] y = new string[]{"Hello", "You"};
foreach(string x in y)
{
value = value.Replace(x, "");
}
You could do:
y.ToList().ForEach(x => value = value.Replace(x, ""));
Although I think your variant is more readable.
Forgive me, but someone's gotta say it,
value = Regex.Replace( value, string.Join("|", y.Select(Regex.Escape)), "" );
Possibly faster, since it creates fewer strings.
EDIT: Credit to Gabe and lasseespeholt for Escape and Select.
While not any prettier, there are other ways to express the same thing.
In LINQ:
value = y.Aggregate(value, (acc, x) => acc.Replace(x, ""));
With String methods:
value = String.Join("", value.Split(y, StringSplitOptions.None));
I don't think anything is going to be faster in managed code than a simple Replace in a foreach though.
It depends on the size of the string you are searching. The foreach example is perfectly fine for small operations but creates a new instance of the string each time it operates because the string is immutable. It also requires searching the whole string over and over again in a linear fashion.
The basic solutions have all been proposed. The LINQ examples provided are good if you are comfortable with that syntax; I also liked the suggestion of an extension method, although that is probably the slowest of the proposed solutions. I would avoid a Regex unless you have an extremely specific need.
So let's explore more elaborate solutions and assume you needed to handle a string that was thousands of characters in length and had many possible words to be replaced. If this doesn't apply to the OP's need, maybe it will help someone else.
Method #1 is geared towards large strings with few possible matches.
Method #2 is geared towards short strings with numerous matches.
Method #1
I have handled large-scale parsing in C# using char arrays and pointer math with intelligent seek operations that are optimized for the length and potential frequency of the term being searched for. It follows the methodology of:
Extremely cheap Peeks one character at a time
Only investigate potential matches
Modify output when match is found
For example, you might read through the whole source array and only add words to the output when they are NOT found. This would remove the need to keep redimensioning strings.
A simple example of this technique is looking for a closing HTML tag in a DOM parser. For example, I may read an opening STYLE tag and want to skip through (or buffer) thousands of characters until I find a closing STYLE tag.
This approach provides incredibly high performance, but it's also incredibly complicated if you don't need it (plus you need to be well-versed in memory manipulation/management or you will create all sorts of bugs and instability).
I should note that the .Net string libraries are already incredibly efficient but you can optimize this approach for your own specific needs and achieve better performance (and I have validated this firsthand).
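A much simpler managed-code version of the single-pass idea (no pointers or unsafe code) might look like the sketch below. It is not the implementation described above: the class and method names are hypothetical, matching is ordinal, and the output is built once in a StringBuilder instead of creating a new string per search term.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration.
public static class SinglePassRemover
{
    // Walks the source once; at each position it does a cheap first-character
    // check and only then compares the full term. Unmatched characters are copied.
    public static string RemoveAll(string source, params string[] terms)
    {
        var output = new StringBuilder(source.Length);
        int i = 0;
        while (i < source.Length)
        {
            int matched = 0;
            foreach (string term in terms)
            {
                if (term.Length > 0
                    && term.Length <= source.Length - i
                    && term[0] == source[i]
                    && string.CompareOrdinal(source, i, term, 0, term.Length) == 0)
                {
                    matched = term.Length;
                    break;
                }
            }
            if (matched > 0) i += matched;        // skip the matched term
            else output.Append(source[i++]);      // keep this character
        }
        return output.ToString();
    }
}
```

One behavioural difference from the foreach/Replace loop: because the input is scanned only once, text that becomes a match only after another term has been removed (for example "YHelloou" with the terms "Hello" and "You") is left in place.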
Method #2
Another alternative involves storing search terms in a Dictionary containing Lists of strings. Basically, you decide how long your search prefix needs to be, and read characters from the source string into a buffer until you meet that length. Then, you search your dictionary for all terms that match that string. If a match is found, you explore further by iterating through that List, if not, you know that you can discard the buffer and continue.
Because the Dictionary looks strings up by hash, each lookup takes roughly constant time instead of a linear scan, which makes it ideal for handling a large number of possible matches.
I'm using this methodology to allow instantaneous (<1ms) searching of every airfield in the US by name, state, city, FAA code, etc. There are 13K airfields in the US, and I've created a map of about 300K permutations (again, a Dictionary with prefixes of varying lengths, each corresponding to a list of matches).
For example, Phoenix, Arizona's main airfield is called Sky Harbor with the short ID of KPHX. I store:
KP
KPH
KPHX
Ph
Pho
Phoe
Ar
Ari
Ariz
Sk
Sky
Ha
Har
Harb
There is a cost in terms of memory usage, but string interning probably reduces this somewhat and the resulting speed justifies the memory usage on data sets of this size. Searching happens as the user types and is so fast that I have actually introduced an artificial delay to smooth out the experience.
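The prefix map described above can be sketched roughly as follows. This is not the actual airfield code: the class name, the 2 to 4 character prefix range, and the case-insensitive comparer are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a prefix-to-matches dictionary.
public sealed class PrefixIndex
{
    private readonly Dictionary<string, List<string>> _map =
        new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);

    // Registers every prefix of `term` from minLength up to maxLength characters,
    // each pointing at `item` (e.g. the airfield the term belongs to).
    public void Add(string term, string item, int minLength = 2, int maxLength = 4)
    {
        int longest = Math.Min(maxLength, term.Length);
        for (int len = minLength; len <= longest; len++)
        {
            string prefix = term.Substring(0, len);
            List<string> items;
            if (!_map.TryGetValue(prefix, out items))
            {
                items = new List<string>();
                _map[prefix] = items;
            }
            if (!items.Contains(item)) items.Add(item);
        }
    }

    // One hash lookup per query; returns an empty list when nothing is stored
    // under exactly the typed text.
    public IReadOnlyList<string> Find(string typed)
    {
        List<string> items;
        if (_map.TryGetValue(typed, out items)) return items;
        return new List<string>();
    }
}
```

Adding "KPHX", "Phoenix", "Arizona", "Sky" and "Harbor" for the same airfield produces the keys listed above (KP, KPH, KPHX, Ph, Pho, and so on). Input longer than the longest stored prefix would still need to be filtered against the returned list, as the answer describes.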
Send me a message if you have the need to dig into these methodologies.
Extension method for elegance
(arguably "prettier" at the call level)
I'll implement an extension method that allows you to call your implementation directly on the original string as seen here.
value = value.Remove(y);
// or
value = value.Remove("Hello", "You");
// effectively
string value = "HelloGoodByeSeeYouLater".Remove("Hello", "You");
The extension method is callable on any string value in fact, and therefore easily reusable.
Implementation of Extension method:
I'm going to wrap your own implementation (shown in your question) in an extension method for pretty or elegant points and also employ the params keyword to provide some flexibility in passing the arguments. You can substitute somebody else's faster implementation body into this method.
static class EXTENSIONS {
    static public string Remove(this string thisString, params string[] arrItems) {
        // Whatever implementation you like:
        if (thisString == null)
            return null;
        var temp = thisString;
        foreach(string x in arrItems)
            temp = temp.Replace(x, "");
        return temp;
    }
}
That's the brightest idea I can come up with right now that nobody else has touched on.

.NET C# switch statement string compare versus enum compare

I'm interested in both style and performance considerations. My choice is to do either of the following ( sorry for the poor formatting but the interface for this site is not WYSIWYG ):
One:
string value = "ALPHA";
switch ( value.ToUpper() )
{
case "ALPHA":
// do something
break;
case "BETA":
// do something else
break;
default:
break;
}
Two:
public enum GreekLetters
{
UNKNOWN= 0,
ALPHA= 1,
BETA = 2,
etc...
}
string value = "Alpha";
GreekLetters letter = (GreekLetters)Enum.Parse( typeof( GreekLetters ), value.ToUpper() );
switch( letter )
{
case GreekLetters.ALPHA:
// do something
break;
case GreekLetters.BETA:
// do something else
break;
default:
break;
}
Personally, I prefer option two, but I don't have any real reason other than basic style reasons. However, I'm not even sure there really is a style reason. Thanks for your input.
The second option is marginally faster, as the first option may require a full string comparison. The difference will be too small to measure in most circumstances, though.
The real advantage of the second option is that you've made it explicit that the valid values for value fall into a narrow range. In fact, it will throw an exception at Enum.Parse if the string value isn't in the expected range, which is often exactly what you want.
Option #1 is faster because if you look at the code for Enum.Parse, you'll see that it goes through each item one by one, looking for a match. In addition, there is less code to maintain and keep consistent.
One word of caution is that you shouldn't use ToUpper, but rather ToUpperInvariant() because of Turkey Test issues.
If you insist on Option #2, at least use the overload that allows you to specify to ignore case. This will be faster than converting to uppercase yourself. In addition, be advised that the Framework Design Guidelines encourage that all enum values be PascalCase instead of SCREAMING_CAPS.
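Put together, the case-insensitive parse with PascalCase values might look like the sketch below. It assumes .NET Framework 4 or later, where Enum.TryParse is available; the GreekLetter enum is a trimmed-down, renamed version of the one in the question, and the helper name is hypothetical.

```csharp
using System;

// PascalCase variant of the enum from the question (trimmed for illustration).
public enum GreekLetter
{
    Unknown = 0,
    Alpha = 1,
    Beta = 2
}

// Hypothetical helper for illustration.
public static class GreekLetterParser
{
    // Case-insensitive parse that falls back to Unknown instead of throwing.
    // The IsDefined check rejects numeric strings such as "7", which TryParse
    // would otherwise accept as an undefined enum value.
    public static GreekLetter ParseOrUnknown(string value)
    {
        GreekLetter letter;
        if (Enum.TryParse(value, true, out letter)
            && Enum.IsDefined(typeof(GreekLetter), letter))
        {
            return letter;
        }
        return GreekLetter.Unknown;
    }
}
```

The switch then works on the returned GreekLetter value, and no ToUpper or ToUpperInvariant call is needed before parsing.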
I can't comment on the performance part of the question but as for style I prefer option #2. Whenever I have a known set of values and the set is reasonably small (less than a couple of dozen or so) I prefer to use an enum. I find an enum is a lot easier to work with than a collection of string values and anyone looking at the code can quickly see what the set of allowed values is.
This actually depends on the number of items in the enum, and you would have to test it for each specific scenario - not that it is likely to make a big difference. But it is a great question.
With very few values, the Enum.Parse is going to take more time than anything else in either example, so the second should be slower.
With enough values, the switch statement will be implemented as a hashtable, which should work the same speed with strings and enums, so again, Enum.Parse will probably make the second solution slower, but not by relatively as much.
Somewhere in the middle, I would expect the cost of comparing strings being higher than comparing enums would make the first solution faster.
I wouldn't even be surprised if it were different on different compiler versions or different options.
I would definitely say #1. Enum.Parse() uses reflection, which is relatively expensive. Plus, Enum.Parse() will throw an exception if the value is not defined, and since there's no TryParse() (Enum.TryParse only arrived in .NET Framework 4) you'd need to wrap it in a try/catch block.
Not sure if there is a performance difference when switching on a string value versus an enum.
One thing to consider is would you need the values used for the case statements elsewhere in your code. If so, then using an enum would make more sense as you have a singular definition of the values. Const strings could also be used.
