Compare two values using RegEx

If I have two values, e.g. ABC001 and ABC100, or A0B0C1 and A1B0C0, is there a RegEx I can use to make sure the two values have the same pattern?

Well, here's my shot at it. This doesn't use regular expressions, and assumes s1 and s2 only contain letters or digits:
public static bool SamePattern(string s1, string s2)
{
    if (s1.Length == s2.Length)
    {
        char[] chars1 = s1.ToCharArray();
        char[] chars2 = s2.ToCharArray();
        for (int i = 0; i < chars1.Length; i++)
        {
            if (!Char.IsDigit(chars1[i]) && chars1[i] != chars2[i])
            {
                return false;
            }
            else if (Char.IsDigit(chars1[i]) != Char.IsDigit(chars2[i]))
            {
                return false;
            }
        }
        return true;
    }
    else
    {
        return false;
    }
}
A description of the algorithm is as follows:
If the strings have different lengths, return false.
Otherwise, check the characters in the same position in both strings:
If they are both digits or are the same letter, move on to the next iteration.
If the first isn't a digit and the two characters differ, return false.
If one is a digit and one is a letter, return false.
If all characters in both strings were checked successfully, return true.

If you don't know the pattern in advance, but are only going to encounter two groups of characters (alpha and digits), then you could do the following:
Write some C# that parses the first string, looks at each char to determine whether it's alpha or digit, and then generates a regex from that pattern.
You may find that there's no point writing code to generate a regex, as it could be just as simple to check the second string against the first.
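As a rough sketch of the regex-generating idea, assuming digits may vary while every other character must match exactly (the names BuildPattern and SamePattern are made up for illustration):

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class PatternRegexSketch
{
    // Builds a regex from a sample string: each digit becomes \d,
    // every other character has to match literally.
    public static Regex BuildPattern(string sample)
    {
        var sb = new StringBuilder("^");
        foreach (char c in sample)
        {
            sb.Append(char.IsDigit(c) ? @"\d" : Regex.Escape(c.ToString()));
        }
        sb.Append(@"\z"); // end of input, so the lengths must agree
        return new Regex(sb.ToString());
    }

    // Checks the second string against the pattern derived from the first.
    public static bool SamePattern(string s1, string s2)
    {
        return BuildPattern(s1).IsMatch(s2);
    }
}
```

With this sketch, SamePattern("ABC001", "ABC100") and SamePattern("A0B0C1", "A1B0C0") both match, while a changed letter or a different length does not.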
Alternatively, without regex:
First check the strings are the same length.
Then loop through both strings at the same time, char by char. If char[x] from string 1 is alpha and char[x] from string 2 is the same character, or both are digits, the patterns match at that position.
Try this, it should cope if a string sneaks in some symbols. Edited to compare character values ... and use Char.IsLetter and Char.IsDigit
private bool matchPattern(string string1, string string2)
{
    // Bail out early: different lengths never match, and the loop below
    // would otherwise index past the end of the shorter string.
    if (string1.Length != string2.Length)
    {
        return false;
    }
    bool result = true;
    char[] chars1 = string1.ToCharArray();
    char[] chars2 = string2.ToCharArray();
    for (int i = 0; i < string1.Length; i++)
    {
        if (Char.IsLetter(chars1[i]) != Char.IsLetter(chars2[i]))
        {
            result = false;
        }
        if (Char.IsLetter(chars1[i]) && (chars1[i] != chars2[i]))
        {
            //Characters must be identical
            result = false;
        }
        if (Char.IsDigit(chars1[i]) != Char.IsDigit(chars2[i]))
            result = false;
    }
    return result;
}

Consider using Char.GetUnicodeCategory
You can write a helper class for this task:
public class Mask
{
public Mask(string originalString)
{
OriginalString = originalString;
CharCategories = originalString.Select(Char.GetUnicodeCategory).ToList();
}
public string OriginalString { get; private set; }
public IEnumerable<UnicodeCategory> CharCategories { get; private set; }
public bool HasSameCharCategories(Mask other)
{
//null checks
return CharCategories.SequenceEqual(other.CharCategories);
}
}
Use as
Mask mask1 = new Mask("ab12c3");
Mask mask2 = new Mask("ds124d");
MessageBox.Show(mask1.HasSameCharCategories(mask2).ToString());

I don't know C# syntax but here is a pseudo code:
split the strings on ''
sort the 2 arrays
join each arrays with ''
compare the 2 strings

A general-purpose solution with LINQ can be achieved quite easily. The idea is:
Sort the two strings (reordering the characters).
Compare each sorted string as a character sequence using SequenceEqual.
This scheme enables a short, graceful and configurable solution, for example:
// We will be using this in SequenceEqual
class MyComparer : IEqualityComparer<char>
{
public bool Equals(char x, char y)
{
return x.Equals(y);
}
public int GetHashCode(char obj)
{
return obj.GetHashCode();
}
}
// and then:
var s1 = "ABC0102";
var s2 = "AC201B0";
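Func<char, char> orderFunction = c => c; // sort by the character itself; char.GetNumericValue returns -1 for every letter, which would leave the letters unsorted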
var comparer = new MyComparer();
var result = s1.OrderBy(orderFunction).SequenceEqual(s2.OrderBy(orderFunction), comparer);
Console.WriteLine("result = " + result);
As you can see, it's all in 3 lines of code (not counting the comparer class). It's also easily configurable.
The code as it stands checks if s1 is a permutation of s2.
Do you want to check if s1 has the same number and kind of characters with s2, but not necessarily the same characters (e.g. "ABC" to be equal to "ABB")? No problem, change MyComparer.Equals to return char.GetUnicodeCategory(x).Equals(char.GetUnicodeCategory(y));.
By changing the values of orderFunction and comparer you can configure a multitude of other comparison options.
And finally, since I don't find it very elegant to define a MyComparer class just to enable this scenario, you can also use the technique described in this question:
Wrap a delegate in an IEqualityComparer
to define your comparer as an inline lambda. This would result in a configurable solution contained in 2-3 lines of code.

Related

How to check for variable character in string and match it with another string of same length?

I have a rather complex issue that I'm unable to figure out.
I'm getting a set of strings every 10 seconds from another process. In the first set, the first 5 characters are constant and the next 3 are variable and can change. In the second set, the first 3 characters are variable and the next 3 are constant.
I want to compare these values to a fixed string to check that the first 5 chars match in the first set (ABCDE*** == ABCDEFGH), ignoring the last 3 variable characters while making sure the length is the same. E.g. if (ABCDE*** == ABCDEDEF) the condition is true, but if (ABCDE*** == ABCDDEFG) the condition is false because the first 5 chars are not the same; likewise, if (ABCDE*** == ABCDEFV) the condition should be false because one char is missing.
I'm using the * in fixed string to try to make the length same while comparing.

Does this solve your requirements?
private static bool MatchesPattern(string input)
{
const string fixedString = "ABCDExyz";
return fixedString.Length == input.Length && fixedString.Substring(0, 5).Equals(input.Substring(0, 5));
}
In recent versions of C# (8.0 and later) you can also use ranges:
private static bool MatchesPattern(string input)
{
const string fixedString = "ABCDExyz";
return fixedString.Length == input.Length && fixedString[..5].Equals(input[..5]);
}
See this fiddle.
BTW: You could probably achieve the same using regex.

It's always a good idea to make an abstraction. Here I've made a simple function that takes the pattern and the value and makes a check:
bool PatternMatches(string pattern, string value)
{
// The null string doesn't match any pattern
if (value == null)
{
return false;
}
// If the value has a different length than the pattern, it doesn't match.
if (pattern.Length != value.Length)
{
return false;
}
// If both strings are zero-length, it's considered a match
bool result = true;
// Check every character against the pattern
for (int i = 0; i< pattern.Length; i++)
{
// Logical and the result, * matches everything
result&= (pattern[i]== '*') ? true: value[i] == pattern[i];
}
return result;
}
You can then call it like this:
bool b1 = PatternMatches("ABCDE***", "ABCDEFGH");
bool b2 = PatternMatches("ABC***", "ABCDEF");
You could use regular expressions, but this is fairly readable, RegExes aren't always.
Here is a link to a dotnetfiddle: https://dotnetfiddle.net/4x1U1E

If the string you match against is known at compile time, your best bet is probably using regular expressions. In the first case, match against ^ABCDE...$. In the second case, match against ^...DEF$.
Another way, probably better if the match string is unknown, uses Length, StartsWith and EndsWith:
String prefix = "ABCDE";
if (str.Length == 8 && str.StartsWith(prefix)) {
// do something
}
Then similarly for the second case, but using EndsWith instead of StartsWith.
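The two regular expressions mentioned above could be wrapped up roughly as follows. This is only a sketch: the class and method names are invented, `.{3}` is just a more explicit spelling of `...`, and `\z` is used instead of `$` so that a trailing newline is not accepted.

```csharp
using System.Text.RegularExpressions;

public static class FixedPatternSketch
{
    // First case: "ABCDE" followed by exactly three arbitrary characters.
    private static readonly Regex PrefixPattern = new Regex(@"^ABCDE.{3}\z");

    // Second case: exactly three arbitrary characters followed by "DEF".
    private static readonly Regex SuffixPattern = new Regex(@"^.{3}DEF\z");

    public static bool MatchesPrefix(string input)
    {
        return PrefixPattern.IsMatch(input);
    }

    public static bool MatchesSuffix(string input)
    {
        return SuffixPattern.IsMatch(input);
    }
}
```

Because the patterns are anchored at both ends, the length check comes for free: MatchesPrefix("ABCDEFV") fails since only two characters follow the prefix.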

Check this. Note that it only works when the * wildcards are at the end of the pattern:
public bool Comparing(string str1, string str2)
    => str2.StartsWith(str1.Replace("*", "")) && str1.Length == str2.Length;

Using unicode characters bigger than 2 bytes with .Net

I'm using this code to generate U+10FFFC
var s = Encoding.UTF8.GetString(new byte[] {0xF4,0x8F,0xBF,0xBC});
I know it's for private-use and such, but it does display a single character as I'd expect when displaying it. The problems come when manipulating this unicode character.
If I later do this:
foreach(var ch in s)
{
Console.WriteLine(ch);
}
Instead of it printing just the single character, it prints two characters (i.e. the string is apparently composed of two characters). If I alter my loop to add these characters back to an empty string like so:
string tmp="";
foreach(var ch in s)
{
Console.WriteLine(ch);
tmp += ch;
}
At the end of this, tmp will print just a single character.
What exactly is going on here? I thought that char contains a single unicode character and I never had to worry about how many bytes a character is unless I'm doing conversion to bytes. My real use case is I need to be able to detect when very large unicode characters are used in a string. Currently I have something like this:
foreach(var ch in s)
{
if(ch>=0x100000 && ch<=0x10FFFF)
{
Console.WriteLine("special character!");
}
}
However, because of this splitting of very large characters, this doesn't work. How can I modify this to make it work?

U+10FFFC is one Unicode code point, but string's interface does not expose a sequence of Unicode code points directly. Its interface exposes a sequence of UTF-16 code units. That is a very low-level view of text. It is quite unfortunate that such a low-level view of text was grafted onto the most obvious and intuitive interface available... I'll try not to rant much about how I don't like this design, and just say that no matter how unfortunate, it is just a (sad) fact you have to live with.
First off, I will suggest using char.ConvertFromUtf32 to get your initial string. Much simpler, much more readable:
var s = char.ConvertFromUtf32(0x10FFFC);
So, this string's Length is not 1, because, as I said, the interface deals in UTF-16 code units, not Unicode code points. U+10FFFC uses two UTF-16 code units, so s.Length is 2. All code points above U+FFFF require two UTF-16 code units for their representation.
You should note that ConvertFromUtf32 doesn't return a char: char is a UTF-16 code unit, not a Unicode code point. To be able to return all Unicode code points, that method cannot return a single char. Sometimes it needs to return two, and that's why it makes it a string. Sometimes you will find some APIs dealing in ints instead of char because int can be used to handle all code points too (that's what ConvertFromUtf32 takes as argument, and what ConvertToUtf32 produces as result).
string implements IEnumerable<char>, which means that when you iterate over a string you get one UTF-16 code unit per iteration. That's why iterating your string and printing it out yields some broken output with two "things" in it. Those are the two UTF-16 code units that make up the representation of U+10FFFC. They are called "surrogates". The first one is a high/lead surrogate and the second one is a low/trail surrogate. When you print them individually they do not produce meaningful output because lone surrogates are not even valid in UTF-16, and they are not considered Unicode characters either.
When you append those two surrogates to the string in the loop, you effectively reconstruct the surrogate pair, and printing that pair later as one gets you the right output.
And on the ranting front, note how nothing complains that you used a malformed UTF-16 sequence in that loop. It creates a string with a lone surrogate, and yet everything carries on as if nothing happened: the string type is not even the type of well-formed UTF-16 code unit sequences, but the type of any UTF-16 code unit sequence.
The char structure provides static methods to deal with surrogates: IsHighSurrogate, IsLowSurrogate, IsSurrogatePair, ConvertToUtf32, and ConvertFromUtf32. If you want you can write an iterator that iterates over Unicode characters instead of UTF-16 code units:
static IEnumerable<int> AsCodePoints(this string s)
{
for(int i = 0; i < s.Length; ++i)
{
yield return char.ConvertToUtf32(s, i);
if(char.IsHighSurrogate(s, i))
i++;
}
}
Then you can iterate like:
foreach(int codePoint in s.AsCodePoints())
{
// do stuff. codePoint will be an int with value 0x10FFFC in your example
}
If you prefer to get each code point as a string instead change the return type to IEnumerable<string> and the yield line to:
yield return char.ConvertFromUtf32(char.ConvertToUtf32(s, i));
With that version, the following works as-is:
foreach(string codePoint in s.AsCodePoints())
{
Console.WriteLine(codePoint);
}
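For the original use case of spotting code points in the U+100000 to U+10FFFF range, a minimal sketch along the same lines might look like this. The class and method names are made up for illustration, and lone surrogates are simply skipped here, which is an assumption rather than something the answer prescribes:

```csharp
using System;

public static class SupplementaryCheckSketch
{
    // Returns true if s contains a code point in U+100000..U+10FFFF.
    public static bool ContainsPlane16(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            // Only a well-formed surrogate pair can encode a code point above U+FFFF.
            if (char.IsHighSurrogate(s[i]) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                int codePoint = char.ConvertToUtf32(s[i], s[i + 1]);
                if (codePoint >= 0x100000 && codePoint <= 0x10FFFF)
                {
                    return true;
                }
                i++; // skip the low surrogate that was just consumed
            }
        }
        return false;
    }
}
```

Checking for the pair before calling char.ConvertToUtf32 avoids the exception that method throws on a lone surrogate.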

As posted already by Martinho, it is much easier to create the string with this private codepoint that way:
var s = char.ConvertFromUtf32(0x10FFFC);
But to loop through the two char elements of that string is senseless:
foreach(var ch in s)
{
Console.WriteLine(ch);
}
What for? You will just get the high and low surrogate that encode the codepoint. Remember a char is a 16 bit type so it can hold just a max value of 0xFFFF. Your codepoint doesn't fit into a 16 bit type, indeed for the highest codepoint you'll need 21 bits (0x10FFFF) so the next wider type would just be a 32 bit type. The two char elements are not characters, but a surrogate pair. The value of 0x10FFFC is encoded into the two surrogates.
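To make the encoding concrete, here is a small sketch of the arithmetic that maps a code point above U+FFFF to its two surrogates. The helper name Encode is hypothetical; in real code char.ConvertFromUtf32 already does this for you:

```csharp
using System;

public static class SurrogateMathSketch
{
    // Splits a code point in U+10000..U+10FFFF into its UTF-16 surrogates.
    public static void Encode(int codePoint, out char high, out char low)
    {
        int offset = codePoint - 0x10000;         // leaves a 20-bit value
        high = (char)(0xD800 + (offset >> 10));   // top 10 bits
        low = (char)(0xDC00 + (offset & 0x3FF));  // bottom 10 bits
    }
}
```

For 0x10FFFC the offset is 0xFFFFC, which gives the high surrogate 0xDBFF and the low surrogate 0xDFFC, the same two char values you see when iterating over the string.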

While @R. Martinho Fernandes's answer is correct, his AsCodePoints extension method has two issues:
It will throw an ArgumentException on invalid code points (high surrogate without low surrogate or vice versa).
You can't use char static methods that take (char) or (string, int) (such as char.IsNumber()) if you only have int code points.
I've split the code into two methods, one similar to the original but returns the Unicode Replacement Character on invalid code points. The second method returns a struct IEnumerable with more useful fields:
StringCodePointExtensions.cs
public static class StringCodePointExtensions {
const char ReplacementCharacter = '\ufffd';
public static IEnumerable<CodePointIndex> CodePointIndexes(this string s) {
for (int i = 0; i < s.Length; i++) {
if (char.IsHighSurrogate(s, i)) {
if (i + 1 < s.Length && char.IsLowSurrogate(s, i + 1)) {
yield return CodePointIndex.Create(i, true, true);
i++;
continue;
} else {
// High surrogate without low surrogate
yield return CodePointIndex.Create(i, false, false);
continue;
}
} else if (char.IsLowSurrogate(s, i)) {
// Low surrogate without high surrogate
yield return CodePointIndex.Create(i, false, false);
continue;
}
yield return CodePointIndex.Create(i, true, false);
}
}
public static IEnumerable<int> CodePointInts(this string s) {
return s
.CodePointIndexes()
.Select(
cpi => {
if (cpi.Valid) {
return char.ConvertToUtf32(s, cpi.Index);
} else {
return (int)ReplacementCharacter;
}
});
}
}
CodePointIndex.cs:
public struct CodePointIndex {
public int Index;
public bool Valid;
public bool IsSurrogatePair;
public static CodePointIndex Create(int index, bool valid, bool isSurrogatePair) {
return new CodePointIndex {
Index = index,
Valid = valid,
IsSurrogatePair = isSurrogatePair,
};
}
}
To the extent possible under law, the person who associated CC0 with this work has waived all copyright and related or neighboring rights to this work.

Yet another alternative to enumerate the UTF32 characters in a C# string is to use the System.Globalization.StringInfo.GetTextElementEnumerator method, as in the code below.
public static class StringExtensions
{
public static System.Collections.Generic.IEnumerable<UTF32Char> GetUTF32Chars(this string s)
{
var tee = System.Globalization.StringInfo.GetTextElementEnumerator(s);
while (tee.MoveNext())
{
yield return new UTF32Char(s, tee.ElementIndex);
}
}
}
public struct UTF32Char
{
private string s;
private int index;
public UTF32Char(string s, int index)
{
this.s = s;
this.index = index;
}
public override string ToString()
{
return char.ConvertFromUtf32(this.UTF32Code);
}
public int UTF32Code { get { return char.ConvertToUtf32(s, index); } }
public double NumericValue { get { return char.GetNumericValue(s, index); } }
public UnicodeCategory UnicodeCategory { get { return char.GetUnicodeCategory(s, index); } }
public bool IsControl { get { return char.IsControl(s, index); } }
public bool IsDigit { get { return char.IsDigit(s, index); } }
public bool IsLetter { get { return char.IsLetter(s, index); } }
public bool IsLetterOrDigit { get { return char.IsLetterOrDigit(s, index); } }
public bool IsLower { get { return char.IsLower(s, index); } }
public bool IsNumber { get { return char.IsNumber(s, index); } }
public bool IsPunctuation { get { return char.IsPunctuation(s, index); } }
public bool IsSeparator { get { return char.IsSeparator(s, index); } }
public bool IsSurrogatePair { get { return char.IsSurrogatePair(s, index); } }
public bool IsSymbol { get { return char.IsSymbol(s, index); } }
public bool IsUpper { get { return char.IsUpper(s, index); } }
public bool IsWhiteSpace { get { return char.IsWhiteSpace(s, index); } }
}

How to check if a string has at least 1 alphabetic character? [duplicate]

My ASP.NET application requires me to generate a huge number of random strings, each of which must be alphanumeric overall and contain at least 1 alphabetic and 1 numeric character.
For this, my logic is to generate the code again if the random string is numeric:
public static string GenerateCode(int length)
{
if (length < 2 || length > 32)
{
throw new RSGException("Length cannot be less than 2 or greater than 32.");
}
string newcode = Guid.NewGuid().ToString("n").Substring(0, length).ToUpper();
return newcode;
}
public static string GenerateNonNumericCode(int length)
{
string newcode = string.Empty;
try
{
newcode = GenerateCode(length);
}
catch (Exception)
{
throw;
}
while (IsNumeric(newcode))
{
return GenerateNonNumericCode(length);
}
return newcode;
}
public static bool IsNumeric(string str)
{
bool isNumeric = false;
try
{
long number = Convert.ToInt64(str);
isNumeric = true;
}
catch (Exception)
{
isNumeric = false;
}
return isNumeric;
}
While debugging, it works properly, but when I ask it to create 10,000 random strings, it isn't able to handle it properly. When I export that data to Excel, I find on average at least 20 strings that are numeric.
Is it a problem with my code or C#? - Mine.
If anyone's looking for code,
public static string GenerateCode(int length)
{
if (length < 2)
{
throw new A1Exception("Length cannot be less than 2.");
}
var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
var random = new Random();
var result = new string(
Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)])
.ToArray());
return result;
}
public static string GenerateAlphaNumericCode(int length)
{
string newcode = string.Empty;
try
{
newcode = GenerateCode(length);
while (!IsAlphaNumeric(newcode))
{
newcode = GenerateCode(length);
}
}
catch (Exception)
{
throw;
}
return newcode;
}
public static bool IsAlphaNumeric(string str)
{
bool isAlphaNumeric = false;
Regex reg = new Regex("[0-9A-Z]+");
isAlphaNumeric = reg.IsMatch(str);
return isAlphaNumeric;
}
Thanks to all for your ideas.

If you want to stick with the Guid as the generator, you could always validate using a Regex
This will only return true if at least one alpha is present
Regex reg = new Regex("[a-zA-Z]+");
Then just use the IsMatch method to see if your string is valid
That way you don't need the (IMHO rather ugly) try..catch around the Convert.
Update : I see your subsequent comment about actually making your code slower. Are you instantiating the Regex object only once, or every time that the test is being done? If the latter then this will be rather inefficient, and you should consider using a "lazy-loaded" property on your class, e.g.
private Regex reg;
private Regex AlphaRegex
{
get
{
if (reg == null) reg = new Regex("[a-zA-Z]+");
return reg;
}
}
Then just use AlphaRegex.IsMatch() in your method. I would expect this to make a difference.
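Combining the two points above (validate with a regex, and build the Regex only once), a sketch of a check for "at least one letter and at least one digit" could look like the following. The class name, the method name, and the exact pattern are assumptions for illustration, and the pattern assumes upper-case codes as produced by the question's generator:

```csharp
using System.Text.RegularExpressions;

public static class CodeValidationSketch
{
    // The whole string must consist of A-Z and 0-9; the two lookaheads
    // additionally require at least one letter and at least one digit.
    private static readonly Regex LetterAndDigit =
        new Regex(@"^(?=.*[A-Z])(?=.*[0-9])[A-Z0-9]+\z");

    public static bool HasLetterAndDigit(string code)
    {
        return LetterAndDigit.IsMatch(code);
    }
}
```

A static readonly field gives the same "create once" effect as the lazy property, without the null check.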

Add using System.Linq; then use Any on an ordinary string to check whether it contains at least one letter or at least one digit.
using System.Linq;
string StrCheck = "abcd123";
// true if the string contains at least one letter
bool hasLetter = StrCheck.Any(char.IsLetter);
// true if the string contains at least one digit
bool hasDigit = StrCheck.Any(char.IsDigit);
if (StrCheck.Any(char.IsLetter) && StrCheck.Any(char.IsDigit))
{
//statement goes here.....
}
sorry for the late reply ...

I didn't quite understand what you want in the string except letters (abc etc) - lets say numbers.
You can generate a random character as following:
Random r = new Random();
r.Next('a', 'z' + 1); //For lowercase (the upper bound is exclusive)
r.Next('A', 'Z' + 1); //For capitals
//or you can convert lowercase to capital:
char c = (char)('k' + ('A' - 'a'));
If you want to create a string:
var s = new StringBuilder();
for(int i = 0; i < length; ++i)
s.Append((char)r.Next('a', 'z' + 1)); //Changed to char
return s.ToString();
Note: I don't know ASP.NET so much, so I just act like it's C#.

To answer your question strictly, using your existing code: there is a problem with your recursion logic, which can be avoided by not using recursion (there is absolutely no reason to use recursion in GenerateNonNumericCode). Do the following instead:
public static string GenerateNonNumericCode(int length)
{
string newcode = GenerateCode(length);
while (IsNumeric(newcode))
{
newcode = GenerateCode(length);
}
return newcode;
}
Other General Notes
Your code is very inefficient--throwing exceptions is expensive, so using try/catch in a loop is therefore slow and pointless. As others have suggested, regex makes more sense (System.Text.RegularExpressions namespace).
Is it a problem with my code or C#?
When in doubt, the problem is almost never C#.

So, I would change the code to this:
static Random r = new Random();
public static string GenerateNonNumericCodeFaster(int length) {
var firstLength = r.Next(0, length - 1);
var secondLength = length - 1 - firstLength;
return GenerateCode(firstLength)
+ (char) r.Next((int)'A', (int)'G')
+ GenerateCode(secondLength);
}
You can keep your GenerateCode function, but relax its length check, since the two parts generated here can be shorter than 2 characters. Everything else you toss out. The idea here, of course, is that rather than testing whether the string contains an alphabetic character, you explicitly put one in. In my tests, this code could generate 10,000 8-character strings in 0.0172963 seconds, compared to your code, which takes around 52 seconds. So, yeah, this is about 3000 times faster :)
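A variation on the same "put it in explicitly" idea, sketched under the assumption that the code should contain at least one letter and at least one digit drawn from A-Z and 0-9: place one of each, fill the rest randomly, then shuffle. All names are hypothetical, System.Random is not suitable for security-sensitive codes, and the resulting distribution is not perfectly uniform over all valid strings:

```csharp
using System;

public static class GuaranteedCodeSketch
{
    private const string Letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private const string Digits = "0123456789";
    private static readonly Random Rng = new Random();

    public static string Generate(int length)
    {
        if (length < 2) throw new ArgumentOutOfRangeException("length");
        char[] result = new char[length];
        result[0] = Letters[Rng.Next(Letters.Length)]; // guaranteed letter
        result[1] = Digits[Rng.Next(Digits.Length)];   // guaranteed digit
        string all = Letters + Digits;
        for (int i = 2; i < length; i++)
        {
            result[i] = all[Rng.Next(all.Length)];
        }
        // Fisher-Yates shuffle so the guaranteed characters are not always in front.
        for (int i = length - 1; i > 0; i--)
        {
            int j = Rng.Next(i + 1);
            char tmp = result[i];
            result[i] = result[j];
            result[j] = tmp;
        }
        return new string(result);
    }
}
```

No retry loop is needed, because every generated string satisfies the requirement by construction.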

How can I use C# to sort values numerically?

I have a string that contains numbers separated by periods. When I sort it appears like this since it is a string: (ascii char order)
3.9.5.2.1.1
3.9.5.2.1.10
3.9.5.2.1.11
3.9.5.2.1.12
3.9.5.2.1.2
3.9.5.2.1.3
3.9.5.2.1.4
etc.
I want it to sort like this: (in numeric order)
3.9.5.2.1.1
3.9.5.2.1.2
3.9.5.2.1.3
...
3.9.5.2.1.9
3.9.5.2.1.10
3.9.5.2.1.11
3.9.5.2.1.12
I know that I can:
Use the Split function to get the individual numbers
Put the values into an object
Sort the object
I prefer to avoid all of that work if it is duplicating existing functionality. Is there a method in the .NET Framework that does this already?

Here's my working solution that also takes care of strings that are not in the right format (e.g. contain text).
The idea is to get the first number within both strings and compare these numbers. If they match, continue with the next number. If they don't, we have a winner. If one of these numbers isn't a number at all, do a string comparison of the part that wasn't already compared.
It would be easy to make the comparer fully compatible to natural sort order by changing the way to determine the next number.
Look at that.. just found this question.
The Comparer:
class StringNumberComparer : IComparer<string>
{
public int Compare(string x, string y)
{
int compareResult;
int xIndex = 0, yIndex = 0;
int xIndexLast = 0, yIndexLast = 0;
int xNumber, yNumber;
int xLength = x.Length;
int yLength = y.Length;
do
{
bool xHasNextNumber = TryGetNextNumber(x, ref xIndex, out xNumber);
bool yHasNextNumber = TryGetNextNumber(y, ref yIndex, out yNumber);
if (!(xHasNextNumber && yHasNextNumber))
{
// At least one of the strings has either no more numbers or contains non-numeric chars
// In this case do a string comparison of that last part
return x.Substring(xIndexLast).CompareTo(y.Substring(yIndexLast));
}
xIndexLast = xIndex;
yIndexLast = yIndex;
compareResult = xNumber.CompareTo(yNumber);
}
while (compareResult == 0
&& xIndex < xLength
&& yIndex < yLength);
return compareResult;
}
private bool TryGetNextNumber(string text, ref int startIndex, out int number)
{
number = 0;
int pos = text.IndexOf('.', startIndex);
if (pos < 0) pos = text.Length;
if (!int.TryParse(text.Substring(startIndex, pos - startIndex), out number))
return false;
startIndex = pos + 1;
return true;
}
}
Usage:
public static void Main()
{
var comparer = new StringNumberComparer();
List<string> testStrings = new List<string>{
"3.9.5.2.1.1",
"3.9.5.2.1.10",
"3.9.5.2.1.11",
"3.9.test2",
"3.9.test",
"3.9.5.2.1.12",
"3.9.5.2.1.2",
"blabla",
"....",
"3.9.5.2.1.3",
"3.9.5.2.1.4"};
testStrings.Sort(comparer);
DumpArray(testStrings);
Console.Read();
}
private static void DumpArray(List<string> values)
{
foreach (string value in values)
{
Console.WriteLine(value);
}
}
Output:
....
3.9.5.2.1.1
3.9.5.2.1.2
3.9.5.2.1.3
3.9.5.2.1.4
3.9.5.2.1.10
3.9.5.2.1.11
3.9.5.2.1.12
3.9.test
3.9.test2
blabla

No, I don't believe there's anything in the framework which does this automatically. You could write your own IComparer<string> implementation which doesn't do any splitting, but instead iterates over both strings, only comparing as much as is required (i.e. parsing just the first number of each, then continuing if necessary etc) but it would be quite fiddly I suspect. It would also need to make assumptions about how "1.2.3.4.5" compared with "1.3" for example (i.e. where the values contain different numbers of numbers).

Since the comparison you want to do on the strings is different from how strings are normally compared in .NET, you will have to use a custom string comparer:
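class MyStringComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        // split the strings using the '.' separator
        string[] xs = x.Split('.');
        string[] ys = y.Split('.');
        // parse each item into an int and compare from left to right
        for (int i = 0; i < Math.Min(xs.Length, ys.Length); i++)
        {
            int c = int.Parse(xs[i]).CompareTo(int.Parse(ys[i]));
            if (c != 0) return c;
        }
        // all shared parts are equal: the shorter string sorts first (an assumption)
        return xs.Length.CompareTo(ys.Length);
    }
}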
Then you can use the comparer in methods like OrderBy and Sort
var sorted = lst.OrderBy(s => s, new MyStringComparer());
lst.Sort(new MyStringComparer());
This will give you the desired result. If not then just tweak the comparer.

What you are looking for is natural sort order. Jeff Atwood blogged about it and links to implementations in different languages. The .NET Framework does not contain an implementation.

Is it possible for you to pad your fields to the same length on the front with 0? If so, then you can just use straight lexicographic sorting on the strings. Otherwise, there is no such method built in to the framework that does this automatically. You'll have to implement your own IComparer<string> if padding is not an option.
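A minimal sketch of the padding idea, assuming every dot-separated part is a non-negative integer with at most `width` digits (the names PadKey and SortNumerically are invented here). The padded form is used only as a sort key, so the original strings are returned unchanged:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PaddedSortSketch
{
    // Left-pads every dot-separated part with zeros so that ordinal
    // string order agrees with numeric order.
    public static string PadKey(string value, int width)
    {
        return string.Join(".", value.Split('.').Select(p => p.PadLeft(width, '0')));
    }

    public static List<string> SortNumerically(IEnumerable<string> values, int width)
    {
        return values.OrderBy(v => PadKey(v, width), StringComparer.Ordinal).ToList();
    }
}
```

With a width of 4, "3.9.5.2.1.2" gets the key "0003.0009.0005.0002.0001.0002", which sorts before the key for "3.9.5.2.1.10".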

Not really, though you may be able to use Regexes or Linq to avoid too much wheel-reinventing. Keep in mind it will cost you much the same computationally to use something built-in as to roll your own.
Try this:
List<string> myList = GetNumberStrings();
List<string> sorted = myList
    .Select(s => s.Split('.'))
    .OrderBy(parts => parts, Comparer<string[]>.Create(RecursiveCompare))
    .Select(parts => string.Join(".", parts))
    .ToList();
...
public int RecursiveCompare(string[] a, string[] b)
{
    return RecursiveCompare(a, b, 0);
}
public int RecursiveCompare(string[] a, string[] b, int index)
{
    // Once either array runs out, treat the two as equal
    if (index == a.Length || index == b.Length)
        return 0;
    // Compare the parts as numbers, not as strings
    int result = int.Parse(a[index]).CompareTo(int.Parse(b[index]));
    return result != 0
        ? result
        : RecursiveCompare(a, b, index + 1);
}
Not the most compact, but it should work and you could use a y-combinator to make the comparison a lambda.

Split each string by '.', iterate through the components and compare them numerically.
This code also assumes that the number of components is significant (a string '1.1.1' will be greater than '2.1'). This can be adjusted by altering the first if statement in the Compare method below.
int Compare(string a, string b)
{
string[] aParts = a.Split('.');
string[] bParts = b.Split('.');
/// if A has more components than B, it must be larger.
if (aParts.Length != bParts.Length)
return (aParts.Length > bParts.Length) ? 1 : -1;
int result = 0;
/// iterate through each numerical component
for (int i = 0; i < aParts.Length; i++)
if ( (result = int.Parse(aParts[i]).CompareTo(int.Parse(bParts[i]))) !=0 )
return result;
/// all components are equal.
return 0;
}
public string[] sort()
{
/// initialize test data
string l = "3.9.5.2.1.1\n"
+ "3.9.5.2.1.10\n"
+ "3.9.5.2.1.11\n"
+ "3.9.5.2.1.12\n"
+ "3.9.5.2.1.2\n"
+ "3.9.5.2.1.3\n"
+ "3.9.5.2.1.4\n";
/// split the large string into lines
string[] arr = l.Split(new char[] { '\n' },StringSplitOptions.RemoveEmptyEntries);
/// create a list from the array
List<string> strings = new List<string>(arr);
/// sort using our custom sort routine
strings.Sort(Compare);
/// concatenate the list back to an array.
return strings.ToArray();
}

You can use the awesome AlphanumComparator Alphanum natural sort algorithm by David Koelle.
Code:
OrderBy(o => o.MyString, new AlphanumComparator())
If you're gonna use the C# version change it to:
AlphanumComparator : IComparer<string>
and
public int Compare(string x, string y)

In addition to implementing your own IComparer as Jon mentions, if you call ToList() on your array, you can call the .Sort() method and pass in a function parameter that compares two values, as shown here: http://msdn.microsoft.com/en-us/library/w56d4y5z.aspx

Identify if a string is a number

If I have these strings:
"abc" = false
"123" = true
"ab2" = false
Is there a command, like IsNumeric() or something else, that can identify if a string is a valid number?

int n;
bool isNumeric = int.TryParse("123", out n);
Update As of C# 7:
var isNumeric = int.TryParse("123", out int n);
or if you don't need the number you can discard the out parameter
var isNumeric = int.TryParse("123", out _);
The vars can be replaced by their respective types.

This will return true if input is all numbers. Don't know if it's any better than TryParse, but it will work.
Regex.IsMatch(input, @"^\d+$")
If you just want to know if it has one or more numbers mixed in with characters, leave off the ^ + and $.
Regex.IsMatch(input, @"\d")
Edit:
Actually I think it is better than TryParse, because a very long string of digits overflows the target type, in which case TryParse returns false.
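A small sketch of the difference, with made-up helper names. It uses `\z` rather than `$` so that a trailing newline is not accepted, which is a slight tightening of the pattern above:

```csharp
using System.Text.RegularExpressions;

public static class DigitCheckSketch
{
    // True if the input consists of one or more digits and nothing else.
    public static bool IsAllDigits(string input)
    {
        return Regex.IsMatch(input, @"^\d+\z");
    }

    // True only if the input also fits into a 32-bit int.
    public static bool FitsInInt(string input)
    {
        int ignored;
        return int.TryParse(input, out ignored);
    }
}
```

For a 20-digit string such as "99999999999999999999", IsAllDigits returns true while FitsInInt returns false, so which one is "better" depends on whether you need the value or only the shape.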

You can also use:
using System.Linq;
stringTest.All(char.IsDigit);
It will return true for all Numeric Digits (not float) and false if input string is any sort of alphanumeric.
Test case   Return value   Test result
"1234"      true           ✅ Pass
"1"         true           ✅ Pass
"0"         true           ✅ Pass
""          true           ⚠️ Fail (known edge case)
"12.34"     false          ✅ Pass
"+1234"     false          ✅ Pass
"-13"       false          ✅ Pass
"3E14"      false          ✅ Pass
"0x10"      false          ✅ Pass
Please note: stringTest should not be an empty string as this would pass the test of being numeric.
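One way to close that edge case is to guard against null and empty input before calling All. The following is a sketch with an invented method name, not part of the original answer:

```csharp
using System.Linq;

public static class DigitsOnlySketch
{
    // True only for a non-empty string made up entirely of digits.
    public static bool IsDigitsOnly(string value)
    {
        return !string.IsNullOrEmpty(value) && value.All(char.IsDigit);
    }
}
```

The guard is needed because All returns true for an empty sequence.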

I've used this function several times:
public static bool IsNumeric(object Expression)
{
double retNum;
bool isNum = Double.TryParse(Convert.ToString(Expression), System.Globalization.NumberStyles.Any, System.Globalization.NumberFormatInfo.InvariantInfo, out retNum);
return isNum;
}
But you can also use;
bool b1 = Microsoft.VisualBasic.Information.IsNumeric("1"); //true
bool b2 = Microsoft.VisualBasic.Information.IsNumeric("1aa"); // false
From Benchmarking IsNumeric Options
(source: aspalliance.com)

This is probably the best option in C#.
If you want to know if the string contains a whole number (integer):
string someString;
// ...
int myInt;
bool isNumerical = int.TryParse(someString, out myInt);
The TryParse method will try to convert the string to a number (integer) and if it succeeds it will return true and place the corresponding number in myInt. If it can't, it returns false.
Solutions using the int.Parse(someString) alternative shown in other responses work, but they are much slower on invalid input because throwing exceptions is very expensive. int.TryParse(...) was added in .NET Framework 2.0, and until then you didn't have a choice. Now you do: you should therefore avoid the Parse() alternative.
If you want to accept decimal numbers, the decimal type also has a .TryParse(...) method. Replace int with decimal in the above discussion, and the same principles apply.
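As a rough illustration of the decimal variant, the sketch below wraps decimal.TryParse in a helper. The class and method names are hypothetical, and pinning the culture to InvariantCulture is an assumption made here so that '.' is always the decimal separator, whatever the machine's settings.

```csharp
using System.Globalization;

public static class DecimalCheckSketch
{
    // True when the string parses as a decimal using the invariant culture.
    // NumberStyles.Number allows a sign, a decimal point and thousands
    // separators, but not an exponent.
    public static bool IsDecimal(string s)
    {
        decimal ignored;
        return decimal.TryParse(s, NumberStyles.Number, CultureInfo.InvariantCulture, out ignored);
    }
}
```

With this helper, "12.34" and "1,234.5" are accepted, while "abc", "" and "1e5" are rejected.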

You can always use the built in TryParse methods for many datatypes to see if the string in question will pass.
Example:
decimal myDec;
var Result = decimal.TryParse("123", out myDec);
// Result is true
Result = decimal.TryParse("abc", out myDec);
// Result is false

In case you don't want to use int.Parse or double.Parse, you can roll your own with something like this (note that this simple check also accepts an empty string and strings with more than one '.'):
public static class Extensions
{
public static bool IsNumeric(this string s)
{
foreach (char c in s)
{
if (!char.IsDigit(c) && c != '.')
{
return false;
}
}
return true;
}
}

If you want to catch a broader spectrum of numbers, à la PHP's is_numeric, you can use the following:
// From PHP documentation for is_numeric
// (http://php.net/manual/en/function.is-numeric.php)
// Finds whether the given variable is numeric.
// Numeric strings consist of optional sign, any number of digits, optional decimal part and optional
// exponential part. Thus +0123.45e6 is a valid numeric value.
// Hexadecimal (e.g. 0xf4c3b00c), Binary (e.g. 0b10100111001), Octal (e.g. 0777) notation is allowed too but
// only without sign, decimal and exponential part.
static readonly Regex _isNumericRegex =
new Regex( "^(" +
/*Hex*/ @"0x[0-9a-f]+" + "|" +
/*Bin*/ @"0b[01]+" + "|" +
/*Oct*/ @"0[0-7]*" + "|" +
/*Dec*/ @"((?!0)|[-+]|(?=0+\.))(\d*\.)?\d+(e\d+)?" +
")$" );
static bool IsNumeric( string value )
{
return _isNumericRegex.IsMatch( value );
}
Unit Test:
static void IsNumericTest()
{
string[] l_unitTests = new string[] {
"123", /* TRUE */
"abc", /* FALSE */
"12.3", /* TRUE */
"+12.3", /* TRUE */
"-12.3", /* TRUE */
"1.23e2", /* TRUE */
"-1e23", /* TRUE */
"1.2ef", /* FALSE */
"0x0", /* TRUE */
"0xfff", /* TRUE */
"0xf1f", /* TRUE */
"0xf1g", /* FALSE */
"0123", /* TRUE */
"0999", /* FALSE (not octal) */
"+0999", /* TRUE (forced decimal) */
"0b0101", /* TRUE */
"0b0102" /* FALSE */
};
foreach ( string l_unitTest in l_unitTests )
Console.WriteLine( l_unitTest + " => " + IsNumeric( l_unitTest ).ToString() );
Console.ReadKey( true );
}
Keep in mind that just because a value is numeric doesn't mean it can be converted to a numeric type. For example, "999999999999999999999999999999.9999999999" is a perfectly valid numeric value, but it won't fit into a .NET numeric type (not one defined in the standard library, that is).
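The point about "numeric but not convertible" can be checked with a small sketch. The helper names are invented for illustration and the invariant culture is an assumption. The example value exceeds the range of decimal, and double accepts it only by rounding, so the exact value is lost.

```csharp
using System.Globalization;

public static class RangeSketch
{
    // True when the string fits into a decimal without overflow.
    public static bool FitsInDecimal(string s)
    {
        decimal ignored;
        return decimal.TryParse(s, NumberStyles.Number, CultureInfo.InvariantCulture, out ignored);
    }

    // True when the string parses as a double; the parsed value is
    // the nearest representable double, not necessarily the exact input.
    public static bool ParsesAsDouble(string s, out double value)
    {
        return double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out value);
    }
}
```

For "999999999999999999999999999999.9999999999", FitsInDecimal returns false, and ParsesAsDouble returns true with a value that has been rounded to the nearest double.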

I know this is an old thread, but none of the answers really did it for me: they were either inefficient or not encapsulated for easy reuse. I also wanted to ensure it returns false if the string is empty, null, or whitespace. (Double.TryParse already returns false for such input rather than throwing, so the explicit check below is only a safeguard.) So, here's my string extension method:
public static class Extensions
{
/// <summary>
/// Returns true if string is numeric and not empty or null or whitespace.
/// Determines if string is numeric by parsing as Double
/// </summary>
/// <param name="str"></param>
/// <param name="style">Optional style - defaults to NumberStyles.Number (leading and trailing whitespace, leading and trailing sign, decimal point and thousands separator) </param>
/// <param name="culture">Optional CultureInfo - defaults to InvariantCulture</param>
/// <returns></returns>
public static bool IsNumeric(this string str, NumberStyles style = NumberStyles.Number,
CultureInfo culture = null)
{
double num;
if (culture == null) culture = CultureInfo.InvariantCulture;
return Double.TryParse(str, style, culture, out num) && !String.IsNullOrWhiteSpace(str);
}
}
Simple to use:
var mystring = "1234.56789";
var test = mystring.IsNumeric();
Or, if you want to test other types of number, you can specify the 'style'.
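So, to test a number with an exponent, you could use NumberStyles.Float (AllowExponent on its own does not permit the decimal point):
var mystring = "5.2453232E6";
var test = mystring.IsNumeric(style: NumberStyles.Float);
Hex strings cannot be tested this way: Double.TryParse throws an ArgumentException for NumberStyles.HexNumber, and the integer parsers that accept that style do not allow a "0x" prefix. For hex, strip the prefix and use an integer type:
var mystring = "F67AB2";
int hexValue;
var test = int.TryParse(mystring, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out hexValue);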
The optional 'culture' parameter can be used in much the same way.
It is limited by not being able to convert strings that are too big to be contained in a double, but that is a limited requirement and I think if you are working with numbers larger than this, then you'll probably need additional specialised number handling functions anyway.
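To show how the style argument changes the outcome, here is a minimal sketch. The helper name IsDouble is hypothetical, and it always uses the invariant culture, which is an assumption made for the example.

```csharp
using System.Globalization;

public static class NumberStylesSketch
{
    // Parses with the invariant culture so '.' is always the decimal separator.
    public static bool IsDouble(string s, NumberStyles style)
    {
        double ignored;
        return double.TryParse(s, style, CultureInfo.InvariantCulture, out ignored);
    }
}
```

With this helper, "5.2453232E6" is accepted with NumberStyles.Float but rejected with NumberStyles.Number (no exponent allowed) and with NumberStyles.AllowExponent alone (no decimal point allowed). Conversely, "1,234.5" is accepted with NumberStyles.Number but rejected with NumberStyles.Float, which does not allow thousands separators.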

UPDATE of Kunal Noel Answer
stringTest.All(char.IsDigit);
// This returns true if all characters of the string are digits.
However, an empty string also passes that test, so check for it first:
if (!string.IsNullOrEmpty(stringTest) && stringTest.All(char.IsDigit)){
// Do your logic here
}

You can use TryParse to determine if the string can be parsed into an integer.
int i;
bool bNum = int.TryParse(str, out i);
The boolean will tell you if it worked or not.

If you want to know if a string is a number, you could always try parsing it:
var numberString = "123";
int number;
int.TryParse(numberString , out number);
Note that TryParse returns a bool, which you can use to check if your parsing succeeded.

I guess this answer will just be lost in between all the other ones, but anyway, here goes.
I ended up on this question via Google because I wanted to check if a string was numeric so that I could just use double.Parse("123") instead of the TryParse() method.
Why? Because it's annoying to have to declare an out variable and check the result of TryParse() before you know if the parse failed or not. I want to use the ternary operator to check if the string is numerical and then just parse it in the first ternary expression or provide a default value in the second ternary expression.
Like this:
var doubleValue = IsNumeric(numberAsString) ? double.Parse(numberAsString) : 0;
It's just a lot cleaner than:
double doubleValue = 0;
if (double.TryParse(numberAsString, out doubleValue)) {
//whatever you want to do with doubleValue
}
I made a couple extension methods for these cases:
Extension method one
public static bool IsParseableAs<TInput>(this string value) {
var type = typeof(TInput);
var tryParseMethod = type.GetMethod("TryParse", BindingFlags.Static | BindingFlags.Public, Type.DefaultBinder,
new[] { typeof(string), type.MakeByRefType() }, null);
if (tryParseMethod == null) return false;
var arguments = new[] { value, Activator.CreateInstance(type) };
return (bool) tryParseMethod.Invoke(null, arguments);
}
Example:
var sNumber = "123";
var dValue = sNumber.IsParseableAs<double>() ? double.Parse(sNumber) : 0;
Because IsParseableAs() tries to parse the string as the appropriate type instead of just checking if the string is "numeric" it should be pretty safe. And you can even use it for non numeric types that have a TryParse() method, like DateTime.
The method uses reflection, and you end up parsing the string twice (once via TryParse() and once via Parse()), which of course isn't as efficient. But not everything has to be fully optimized; sometimes convenience is just more important.
This method can also be used to easily parse a list of numeric strings into a list of double or some other type with a default value without having to catch any exceptions:
var sNumbers = new[] {"10", "20", "30"};
var dValues = sNumbers.Select(s => s.IsParseableAs<double>() ? double.Parse(s) : 0);
Extension method two
public static TOutput ParseAs<TOutput>(this string value, TOutput defaultValue) {
var type = typeof(TOutput);
var tryParseMethod = type.GetMethod("TryParse", BindingFlags.Static | BindingFlags.Public, Type.DefaultBinder,
new[] { typeof(string), type.MakeByRefType() }, null);
if (tryParseMethod == null) return defaultValue;
var arguments = new object[] { value, null };
return ((bool) tryParseMethod.Invoke(null, arguments)) ? (TOutput) arguments[1] : defaultValue;
}
This extension method lets you parse a string as any type that has a TryParse() method and it also lets you specify a default value to return if the conversion fails.
This is better than using the ternary operator with the extension method above as it only does the conversion once. It still uses reflection though...
Examples:
"123".ParseAs<int>(10);
"abc".ParseAs<int>(25);
"123,78".ParseAs<double>(10);
"abc".ParseAs<double>(107.4);
"2014-10-28".ParseAs<DateTime>(DateTime.MinValue);
"monday".ParseAs<DateTime>(DateTime.MinValue);
Outputs (on a machine whose culture uses a decimal comma):
123
25
123,78
107,4
28.10.2014 00:00:00
01.01.0001 00:00:00

If you want to check if a string is a number (I'm assuming it's a string, since if it's a number, duh, you know it's one), without regex and using Microsoft's code as much as possible, you could also do:
public static bool IsNumber(this string aNumber)
{
BigInteger temp_big_int;
var is_number = BigInteger.TryParse(aNumber, out temp_big_int);
return is_number;
}
This will take care of the usual nasties:
Minus (-) or plus (+) at the beginning.
Decimal character: BigInteger won't parse numbers with decimal points (so BigInteger.Parse("3.3") will throw an exception, and TryParse for the same input will return false).
No funny non-digits.
Covers cases where the number is bigger than what Double.TryParse is usually used for.
You'll have to add a reference to System.Numerics and have using System.Numerics; on top of your class (well, the second is a bonus I guess :)

Double.TryParse
bool Double.TryParse(string s, out double result)

A flexible solution is the built-in .NET method char.IsDigit. It works with numbers of any length and returns true only if every character is a digit. I have used it many times without issues, and it is the cleanest solution I have found. Here is an example method that is ready to use; it also validates null and empty input.
// Requires: using System.Linq;
public static bool IsNumeric(string strNumber)
{
// Reject null and empty input; otherwise require every character to be a digit.
return !string.IsNullOrEmpty(strNumber) && strNumber.All(char.IsDigit);
}

Try the regexes defined below:
new Regex(@"^\d{4}").IsMatch("6") // false
new Regex(@"^\d{4}").IsMatch("68ab") // false
new Regex(@"^\d{4}").IsMatch("1111abcdefg") // true (starts with four digits; there is no $ anchor)
new Regex(@"^\d+").IsMatch("6") // true (any length but at least one digit)

With C# 7 you can inline the out variable:
if(int.TryParse(str, out int v))
{
}

Use these extension methods to clearly distinguish between a check if the string is numerical and if the string only contains 0-9 digits
public static class ExtensionMethods
{
/// <summary>
/// Returns true if string could represent a valid number, including decimals and local culture symbols
/// </summary>
public static bool IsNumeric(this string s)
{
decimal d;
return decimal.TryParse(s, System.Globalization.NumberStyles.Any, System.Globalization.CultureInfo.CurrentCulture, out d);
}
/// <summary>
/// Returns true only if string is wholly comprised of numerical digits
/// </summary>
public static bool IsNumbersOnly(this string s)
{
if (s == null || s == string.Empty)
return false;
foreach (char c in s)
{
if (c < '0' || c > '9') // Avoid using char.IsDigit or char.IsNumber as they will return true for other (non-ASCII) characters
return false;
}
return true;
}
}
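To illustrate why the explicit '0' to '9' range check matters, the sketch below compares it with char.IsDigit. The class and method names are made up for the example. char.IsDigit accepts any Unicode decimal digit, such as the Arabic-Indic digits, which the ASCII range check rejects.

```csharp
public static class AsciiDigitSketch
{
    // Uses char.IsDigit, which accepts any Unicode decimal digit.
    public static bool AllUnicodeDigits(string s)
    {
        if (string.IsNullOrEmpty(s)) return false;
        foreach (char c in s)
        {
            if (!char.IsDigit(c)) return false;
        }
        return true;
    }

    // Accepts only the ASCII characters '0' through '9'.
    public static bool AllAsciiDigits(string s)
    {
        if (string.IsNullOrEmpty(s)) return false;
        foreach (char c in s)
        {
            if (c < '0' || c > '9') return false;
        }
        return true;
    }
}
```

Both methods accept "123", but for "\u0661\u0662\u0663" (Arabic-Indic one, two, three) AllUnicodeDigits returns true while AllAsciiDigits returns false.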

public static bool IsNumeric(this string input)
{
int n;
if (!string.IsNullOrEmpty(input)) //.Replace('.',null).Replace(',',null)
{
foreach (var i in input)
{
if (!int.TryParse(i.ToString(), out n))
{
return false;
}
}
return true;
}
return false;
}

Regex rx = new Regex(@"^([1-9]\d*(\.)\d*|0?(\.)\d*[1-9]\d*|[1-9]\d*)$");
string text = "12.0";
var result = rx.IsMatch(text);
Console.WriteLine(result);
This checks whether the string is an unsigned integer (uint, ulong) or consists only of digits, a single . (dot), and more digits.
Sample inputs
123 => True
123.1 => True
0.123 => True
.123 => True
0.2 => True
3452.434.43 => False
2342f43.34 => False
svasad.324 => False
3215.afa => False

Hope this helps
string myString = "abc";
double num;
bool isNumber = double.TryParse(myString , out num);
if (isNumber)
{
//string is number
}
else
{
//string is not a number
}

Add a reference to Visual Basic (Microsoft.VisualBasic) in your project and use its Information.IsNumeric method, as shown below. Unlike the answer above, which only catches ints, it recognizes floats as well as integers.
// Requires: using Microsoft.VisualBasic;
var txt = "ABCDEFG";
if (Information.IsNumeric(txt))
Console.WriteLine("Numeric");
Information.IsNumeric("12.3"); // true
Information.IsNumeric("1"); // true
Information.IsNumeric("abc"); // false

All the answers are useful. But while searching for a solution for numeric values of 12 digits or more (my case), I found the following useful while debugging:
double tempInt = 0;
bool result = double.TryParse("Your_12_Digit_Or_more_StringValue", out tempInt);
The result variable will give you true or false.

Here is the C# method.
Int32.TryParse Method (String, Int32)

bool is_number(string str, char delimiter = '.')
{
if(str.Length==0) //Empty
{
return false;
}
bool is_delimetered = false;
foreach (char c in str)
{
if ((c < '0' || c > '9') && (c != delimiter)) // ASCII range check: not a digit and not the delimiter
{
return false;
}
if (c == delimiter)
{
if (is_delimetered) //more than 1 delimiter
{
return false;
}
else //first time delimiter
{
is_delimetered = true;
}
}
}
return true;
}

Compare two values using RegEx - c#

If I have two values eg/ABC001 and ABC100 or A0B0C1 and A1B0C0, is there a RegEx I can use to make sure the two values have the same pattern?

I don't know C# syntax, but here is some pseudocode:
split each string on ''
sort the two arrays
join each array with ''
compare the two strings