Separating number and unit in a string in C#

Separating number and unit in a string in C# - c#

I have to write an equivalent of this in C++ in C#,
string val_in;
float val;
char unit[100];
val_in = NoSpace(val_in);
int nscan = sscanf(val_in.c_str(), "%f%s", &val, &unit);
if (nscan < 2) {
return val_in; //do nothing if scan fail
}
where the NoSpace() method trims and removes all spaces in val_in.
I have looked around here on SO and most of the similar questions involves strings that contain delimiters such as spaces or commas, but don't apply to this case. So I turned to RegEx.
So far, I have this,
string val_in;
float val;
char[] unit = new char[100];
string[] val_arr;
val_in = NoSpace(val_in);
val_arr = Regex.Split(val_in, #"([-]?\d*\.?\d+)([a-zA-Z]+)");
val = Single.Parse(val_arr[1]);
if (val_arr.Length < 2) {
return val_in; //do nothing if scan fail
}
It works so far, but I was wondering if there is another way to do this? I a bit wary of RegEx, because according the accepted answer on this question, having ([-]?\d*\.?\d+) instead of ([-]?(\d*\.)?\d+) is potentially dangerous because of evil RegEx. But if I include those extra parenthesis, then I have an extra group. This causes Split() to split something like 123.456miles into an array with the elements,
{emptystr, 123.456, 123., miles}
This way, I can't be sure that the unit, miles in this case, will be in val_arr[2], which is a problem.
I tested this on this .NET RegEx tester. I also tried to break my RegEx pattern, ([-]?\d*\.?\d+), but it seems to be fine and "evil RegEx safe". So I'm not sure if I should stick to what I've done so far, or find a more elegant solution, if one exist.

Not very elegant, but can't you just look for the first letter in the string to know where your unit starts?
static void SplitValAndUnit(string unsplitData)
{
for (int x = 0; x < unsplitData.Length; x++)
{
if (Char.IsLetter(unsplitData[x]))
{
string value = unsplitData.Substring(0, x);
// TryParse value to whatever data type
string unit = unsplitData.Substring(x, unsplitData.Length - x);
}
}
}

Related

Method adds not a necessary line to a list when it should not do so

I'm a beginner in c# and I am working with text exercises. I made a method to filter vehicle's plate numbers. It should consist of 3 letters and 3 integers ( AAA:152 ). My method sends the wrong plate numbers to a file, but also it adds that bad number to a good ones list.
private static string[] InvalidPlates(string[] csvLines, int fieldToCorrect)
{
var toReturn = new List<string>();
var toSend = new List<string>();
int wrongCount = 0;
for (int i = 0; i < csvLines.Length; i++)
{
string[] stringFields = csvLines[i].Split(csvSeparator[0]);
string[] values = stringFields[fieldToCorrect].Split(':');
if(Regex.IsMatch(values[0], #"^[a-zA-Z]+$") && Regex.IsMatch(values[1], "^[0-9]+$"))
{
toReturn.Add(string.Join(csvSeparator, stringFields));
}
else
{
toSend.Add(string.Join(csvSeparator, stringFields));
wrongCount++;
}
}
WriteLinesToFile(OutputFile, toSend.ToArray(), wrongCount);
return toReturn.ToArray();
}
Can somebody help me to fix that?

You need to constrain the possible length using quantifiers:
^[a-zA-Z]{3}\:\d{3}$
which literally means the following, in the strict order:
the strings begins from exactly 3 lowercase or uppercase English alphabet letters, continues with semicolon (:), and ends with exactly three digits
Remember that \ should be escaped in C#.
Also, there is no need to join stringFields back into a string, when you can use non-splitted csvLines[i]:
if (Regex.IsMatch(stringFields, #"^[a-zA-Z]{3}\\:\\d{3}$"))
toReturn.Add(csvLines[i]);
}
else
{
toSend.Add(csvLines[i]);
wrongCount++;
}
Another important thing is that your code is incorrect in terms of OOP. It is pretty inobvious that your method called InvalidPlates will save something to a file. It may confuse you after some time or other developers. There should be no "hidden" functionality, and all methods should actually do only the one thing.
Here is how I would do this using LINQ:
private static bool IsACorrectPlate(string p) => Regex.IsMatch(p, #"^[a-zA-Z]{3}\:\d{3}$");
private static void SortPlatesOut(string[] csvLines, int column, out string[] correct, out string[] incorrect)
{
var isCorrect = csvLines
.GroupBy(l => IsACorrectPlate(l.Split(';')[column]))
.ToDictionary(g => g.Key, g => g.ToArray());
correct = isCorrect[true];
incorrect = isCorrect[false];
}
// Usage:
string[] incorrect, correct;
SortPlatesOut(csvLines, 1, out correct, out incorrect);
File.WriteAllLines("", incorrect);
// do whatever you need with correct
Now, SortPlatesOut method has an expectable behavior without side effects. The code has also become two times shorter. At the same time, it looks more readable for me. If it looks non-readable for you, you can unpack LINQ and split some things other things up.

Fastest way to trim a string and convert it to lower case

I've written a class for processing strings and I have the following problem: the string passed in can come with spaces at the beginning and at the end of the string.
I need to trim the spaces from the strings and convert them to lower case letters. My code so far:
var searchStr = wordToSearchReplacemntsFor.ToLower();
searchStr = searchStr.Trim();
I couldn't find any function to help me in StringBuilder. The problem is that this class is supposed to process a lot of strings as quickly as possible. So I don't want to be creating 2 new strings for each string the class processes.
If this isn't possible, I'll go deeper into the processing algorithm.

Try method chaining.
Ex:
var s = " YoUr StRiNg".Trim().ToLower();

Cyberdrew has the right idea. With string being immutable, you'll be allocating memory during both of those calls regardless. One thing I'd like to suggest, if you're going to call string.Trim().ToLower() in many locations in your code, is to simplify your calls with extension methods. For example:
public static class MyExtensions
{
public static string TrimAndLower(this String str)
{
return str.Trim().ToLower();
}
}

Here's my attempt. But before I would check this in, I would ask two very important questions.
Are sequential "String.Trim" and "String.ToLower" calls really impacting the performance of my app? Would anyone notice if this algorithm was twice as slow or twice as fast? The only way to know is to measure the performance of my code and compare against pre-set performance goals. Otherwise, micro-optimizations will generate micro-performance gains.
Just because I wrote an implementation that appears faster, doesn't mean that it really is. The compiler and run-time may have optimizations around common operations that I don't know about. I should compare the running time of my code to what already exists.
static public string TrimAndLower(string str)
{
if (str == null)
{
return null;
}
int i = 0;
int j = str.Length - 1;
StringBuilder sb;
while (i < str.Length)
{
if (Char.IsWhiteSpace(str[i])) // or say "if (str[i] == ' ')" if you only care about spaces
{
i++;
}
else
{
break;
}
}
while (j > i)
{
if (Char.IsWhiteSpace(str[j])) // or say "if (str[j] == ' ')" if you only care about spaces
{
j--;
}
else
{
break;
}
}
if (i > j)
{
return "";
}
sb = new StringBuilder(j - i + 1);
while (i <= j)
{
// I was originally check for IsUpper before calling ToLower, probably not needed
sb.Append(Char.ToLower(str[i]));
i++;
}
return sb.ToString();
}

If the strings use only ASCII characters, you can look at the C# ToLower Optimization. You could also try a lookup table if you know the character set ahead of time

So first of all, trim first and replace second, so you have to iterate over a smaller string with your ToLower()
other than that, i think your best algorithm would look like this:
Iterate over the string once, and check
whether there's any upper case characters
whether there's whitespace in beginning and end (and count how many chars you're talking about)
if none of the above, return the original string
if upper case but no whitespace: do ToLower and return
if whitespace:
allocate a new string with the right size (original length - number of white chars)
fill it in while doing the ToLower

You can try this:
public static void Main (string[] args) {
var str = "fr, En, gB";
Console.WriteLine(str.Replace(" ","").ToLower());
}

String cannot contain any part of another string .NET 2.0

I'm looking for a simple way to discern if a string contains any part of another string (be that regex, built in function I don't know about, etc...). For Example:
string a = "unicorn";
string b = "cornholio";
string c = "ornament";
string d = "elephant";
if (a <comparison> b)
{
// match found ("corn" from 'unicorn' matched "corn" from 'cornholio')
}
if (a <comparison> c)
{
// match found ("orn" from 'unicorn' matched "orn" from 'ornament')
}
if (a <comparison> d)
{
// this will not match
}
something like if (a.ContainsAnyPartOf(b)) would be too much to hope for.
Also, I only have access to .NET 2.0.
Thanks in advance!

This method should work. You'll want to specify a minimum length for the "part" that might match. I'd assume you'd want to look for something of at least 2, but with this you can set it as high or low as you want. Note: error checking not included.
public static bool ContainsPartOf(string s1, string s2, int minsize)
{
for (int i = 0; i <= s2.Length - minsize; i++)
{
if (s1.Contains(s2.Substring(i, minsize)))
return true;
}
return false;
}

I think you're looking for this implementation of longest common substring?

Your best bet, according to my understanding of the question, is to compute the Levenshtein (or related values) distance and compare that against a threshold.

Your requirements are a little vague.
You need to define a minimum length for the match...but implementing an algorithm shouldn't be too difficult when you figure that part out.
I'd suggest breaking down the string into character arrays and then using tail recursion to find matches for the parts.

C# - fastest way to compare two strings using wildcards

Is there a fastest way to compare two strings (using the space for a wildcard) than this function?
public static bool CustomCompare(this string word, string mask)
{
for (int index = 0; index < mask.Length; index++)
{
if (mask[index] != ' ') && (mask[index]!= word[index]))
{
return false;
}
}
return true;
}
Example: "S nt nce" comparing with "Sentence" will return true. (The two being compared would need to be the same length)

If mask.length is less than word.length, this function will stop comparing at the end of mask. A word/mask length compare in the beginning would prevent that, also it would quick-eliminate some obvious mismatches.

The loop is pretty simple and I'm not sure you can do much better. You might be able to micro optimize the order of the expression in the if statement. For example due to short circuiting of the && it might be faster to order the if statement this way
if (mask[index]!= word[index])) && (mask[index] != ' ')
Assuming that matching characters is more common that matching the wildcard. Of course this is just theory I wouldn't believe it made a difference without benchmarking it.
And as others have pointed out the routine fails if the mask and string are not the same length.

That looks like a pretty good implementation - I don't think you will get much faster than that.
Have you profiled this code and found it to be a bottleneck in your application? I think this should be fine for most purposes.

If you used . instead of , you could do a simple regex match.

Variable Length comparison:
I used your code as a starting place for my own application which assumes the mask length is shorter or equal to the comparison text length. allowing for a variable length wildcard spot in the mask. ie: "concat" would match a mask of "c ncat" or "c t" or even "c nc t"
private bool CustomCompare(string word, string mask)
{
int lengthDifference = word.Length - mask.Length;
int wordOffset = 0;
for (int index = 0; index < mask.Length; index++)
{
if ((mask[index] != ' ') && (mask[index]!= word[index+wordOffset]))
{
if (lengthDifference <= 0)
{
return false;
}
else
{
lengthDifference += -1;
wordOffset += 1;
}
}
}
return true;
}

Not sure if this is any faster but it looks neat:
public static bool CustomCompare(this string word, string mask)
{
return !mask.Where((c, index) => c != word[index] && c != ' ').Any();
}

I think you're doing a little injustice by not giving a little bit of context to your code. Sure, if you want to search only one string of characters of the same length as your pattern, then yes this is fine.
However, if you are using this as the heart of a pattern matcher where there are several other patterns you will be looking for, this is a poor method. There are other known methods, the best of which depends on your exact application. The phrase "inexact pattern matching" is the phrase you are concerned with.

Tiny way to get the first 25 characters

Can anyone think of a nicer way to do the following:
public string ShortDescription
{
get { return this.Description.Length <= 25 ? this.Description : this.Description.Substring(0, 25) + "..."; }
}
I would have liked to just do string.Substring(0, 25) but it throws an exception if the string is less than the length supplied.

I needed this so often, I wrote an extension method for it:
public static class StringExtensions
{
public static string SafeSubstring(this string input, int startIndex, int length, string suffix)
{
// Todo: Check that startIndex + length does not cause an arithmetic overflow - not that this is likely, but still...
if (input.Length >= (startIndex + length))
{
if (suffix == null) suffix = string.Empty;
return input.Substring(startIndex, length) + suffix;
}
else
{
if (input.Length > startIndex)
{
return input.Substring(startIndex);
}
else
{
return string.Empty;
}
}
}
}
if you only need it once, that is overkill, but if you need it more often then it can come in handy.
Edit: Added support for a string suffix. Pass in "..." and you get your ellipses on shorter strings, or pass in string.Empty for no special suffixes.

return this.Description.Substring(0, Math.Min(this.Description.Length, 25));
Doesn't have the ... part. Your way is probably the best, actually.

public static Take(this string s, int i)
{
if(s.Length <= i)
return s
else
return s.Substring(0, i) + "..."
}
public string ShortDescription
{
get { return this.Description.Take(25); }
}

The way you've done it seems fine to me, with the exception that I would use the magic number 25, I'd have that as a constant.
Do you really want to store this in your bean though? Presumably this is for display somewhere, so your renderer should be the thing doing the truncating instead of the data object

Well I know there's answer accepted already and I may get crucified for throwing out a regular expression here but this is how I usually do it:
//may return more than 25 characters depending on where in the string 25 characters is at
public string ShortDescription(string val)
{
return Regex.Replace(val, #"(.{25})[^\s]*.*","$1...");
}
// stricter version that only returns 25 characters, plus 3 for ...
public string ShortDescriptionStrict(string val)
{
return Regex.Replace(val, #"(.{25}).*","$1...");
}
It has the nice side benefit of not cutting a word in half as it always stops after the first whitespace character past 25 characters. (Of course if you need it to truncate text going into a database, that might be a problem.
Downside, well I'm sure it's not the fastest solution possible.
EDIT: replaced … with "..." since not sure if this solution is for the web!

without .... this should be the shortest :
public string ShortDescription
{
get { return Microsoft.VisualBasic.Left(this.Description;}
}

I think the approach is sound, though I'd recommend a few adjustments
Move the magic number to a const or configuration value
Use a regular if conditional rather than the ternary operator
Use a string.Format("{0}...") rather than + "..."
Have just one return point from the function
So:
public string ShortDescription
{
get
{
const int SHORT_DESCRIPTION_LENGTH = 25;
string _shortDescription = Description;
if (Description.Length > SHORT_DESCRIPTION_LENGTH)
{
_shortDescription = string.Format("{0}...", Description.Substring(0, SHORT_DESCRIPTION_LENGTH));
}
return _shortDescription;
}
}
For a more general approach, you might like to move the logic to an extension method:
public static string ToTruncated(this string s, int truncateAt)
{
string truncated = s;
if (s.Length > truncateAt)
{
truncated = string.Format("{0}...", s.Substring(0, truncateAt));
}
return truncated;
}
Edit
I use the ternary operator extensively, but prefer to avoid it if the code becomes sufficiently verbose that it starts to extend past 120 characters or so. In that case I'd like to wrap it onto multiple lines, so find that a regular if conditional is more readable.
Edit2
For typographical correctness you could also consider using the ellipsis character (…) as opposed to three dots/periods/full stops (...).

One way to do it:
int length = Math.Min(Description.Length, 25);
return Description.Substring(0, length) + "...";
There are two lines instead of one, but shorter ones :).
Edit:
As pointed out in the comments, this gets you the ... all the time, so the answer was wrong. Correcting it means we go back to the original solution.
At this point, I think using string extensions is the only option to shorten the code. And that makes sense only when that code is repeated in at least a few places...

Looks fine to me, being really picky I would replace "..." with the entity reference "…"

I can't think of any but your approach might not be the best. Are you adding presentation logic into your data object? If so then I suggest you put that logic elsewhere, for example a static StringDisplayUtils class with a GetShortStringMethod( int maxCharsToDisplay, string stringToShorten).
However, that approach might not be great either. What about different fonts and character sets? You'd have to start measuring the actual string length in terms of pixels. Check out the AutoEllipsis property on the winform's Label class (you'll prob need to set AutoSize to false if using this). The AutoEllipsis property, when true, will shorten a string and add the '...' chars for you.

I'd stick with what you have tbh, but just as an alternative, if you have LINQ to objects you could
new string(this.Description.ToCharArray().Take(25).ToArray())
//And to maintain the ...
+ (this.Description.Length <= 25 ? String.Empty : "...")
As others have said, you'd likely want to store 25 in a constant

You should see if you can reference the Microsoft.VisualBasic DLL into your app so you can make use of the "Left" function.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Separating number and unit in a string in C# - c#

Related

Method adds not a necessary line to a list when it should not do so

Fastest way to trim a string and convert it to lower case

String cannot contain any part of another string .NET 2.0

C# - fastest way to compare two strings using wildcards

Tiny way to get the first 25 characters

Categories

Resources