Regex string for validating Decimal excluding zero - c#

I have a function which creates Regex string for validating decimal.
public static string DecimalWithPlaces(int wholePart, int fractionalPart)
{
return @"^-?[0-9]{0," + wholePart + @"}\.[0-9]{1," + fractionalPart + @"}$";
}
Could anyone let me know how I can exclude zero from this?
For example: "0", "0.0","0.00" etc should not be matched.
Thanks

Amy's right - examples would be good.
However, I'm going in blind.
Did you want to exclude all zeros, or just the value of zero?
To exclude zeros, try this and see what happens.
public static string DecimalWithPlaces(int wholePart, int fractionalPart)
{
return @"^-?[1-9]{0," + wholePart + @"}\.[1-9]{1," + fractionalPart + @"}$";
}
To exclude the number zero... That's actually not a good check for regex. That's a value check.
In my books you should be using TWO validation steps on the value. One to check that it meets your precision/scale requirements as per your regex above, the second to then check it for nonzero value.
I do believe that using regex for the zero value thing is possible. But I strongly advise against it. That said, if you're committed to ignoring that recommendation then you'll probably want to look into negative lookahead regex structures.
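To make the two options above concrete, here is a rough sketch. The names IsNonZeroDecimal and NonZeroDecimalWithPlaces are made up for illustration, and I'm assuming invariant-culture input (a dot as the decimal separator), as the original pattern implies. The first method does the two-step check (format via regex, then value via decimal.TryParse); the second is the regex-only route with a negative lookahead.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class DecimalValidation
{
    // Same pattern as in the question: optional sign, up to wholePart digits,
    // a dot, then 1..fractionalPart digits.
    public static string DecimalWithPlaces(int wholePart, int fractionalPart)
    {
        return @"^-?[0-9]{0," + wholePart + @"}\.[0-9]{1," + fractionalPart + @"}$";
    }

    // Two-step validation: step 1 checks the format, step 2 checks the value.
    public static bool IsNonZeroDecimal(string input, int wholePart, int fractionalPart)
    {
        if (!Regex.IsMatch(input, DecimalWithPlaces(wholePart, fractionalPart)))
            return false;

        const NumberStyles styles = NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint;
        return decimal.TryParse(input, styles, CultureInfo.InvariantCulture, out decimal value)
            && value != 0m;
    }

    // Regex-only variant: the negative lookahead (?!-?[0.]*$) rejects any input
    // that consists solely of an optional sign, zeros and dots.
    public static string NonZeroDecimalWithPlaces(int wholePart, int fractionalPart)
    {
        return @"^(?!-?[0.]*$)-?[0-9]{0," + wholePart + @"}\.[0-9]{1," + fractionalPart + @"}$";
    }
}
```

Note that the original pattern requires a decimal point, so a bare "0" never matched in the first place; only forms like "0.0" and "0.00" need the extra check.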

How do you find a delimited/isolated substring with string.contains?

I am trying to parse out and identify some values from strings that I have in a list.
I am using string.Contains to identify the value I'm looking for, but I am getting hits even if the value is surrounded by other text. How can I make sure I only get a hit if the value is isolated?
Example parse:
Looking for value = "302"
string sale =
"199708. (30), italiano, delim fabricata modella, serialNumber302. tnr F18529302E.";
var result = sale.ToLower().Contains("302");
In this example I will get a hit for "serialNumber302" and "F18529302E", which in the context is incorrect since I only want a hit if it finds “302” isolated, like “dontfind302 shouldfind 302”.
Any ideas on how to do this?
If you try Regex, you can define a word boundary using \b:
string sale =
"199708. (30), italiano, delim fabricata modella, serialNumber302. tnr F18529302E.";
bool result = Regex.IsMatch(sale, @"\b302\b"); // false
sale = "A string with 302 isolated";
result = Regex.IsMatch(sale, @"\b302\b"); // true
So 302 will only be found if it is at the start of the string, at the end of the string, or if it is surrounded by non-word characters i.e. not a-z A-Z 0-9 or _
EDIT: From the comments I realised that it wasn't clear whether or not "serialNum302" should get a hit. I assumed so in this answer.
I see a few easy ways you could do this:
1) If the input is always a number as in the example, one option would be to only search for substrings not surrounded by more numbers, by examining all the results of an initial search and comparing their neighboring characters against the string "0123456789". I really don't think this is the best option though, because sooner or later it's going to break when it misinterprets one of the other bits of data.
2) If the string sale always has the serial number in the format "serialNumber[Num]", instead of just looking for Num, look for "serialNumber" + Num, as this is less likely to be messed up with the other data.
3) From your string, it looks like you have a standardized format that's being introduced to the system. In this case, parse it in a standardized way, e.g. by splitting it into substrings at the commas, then parsing each substring differently as it requires.
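For comparison, the two interpretations of "isolated" discussed above can be sketched side by side. The helper names are hypothetical. ContainsWholeWord is the \b approach (no letter, digit or underscore on either side); ContainsWholeNumber is roughly option 1, written with regex lookarounds instead of a manual neighbor check, so it only rejects hits that have a digit directly before or after. Regex.Escape is there in case the search value ever contains regex metacharacters; the \b variant assumes the value starts and ends with a word character.

```csharp
using System.Text.RegularExpressions;

public static class IsolatedValueSearch
{
    // True when value occurs with no word character (letter, digit, underscore)
    // immediately before or after it.
    public static bool ContainsWholeWord(string text, string value)
    {
        return Regex.IsMatch(text, @"\b" + Regex.Escape(value) + @"\b");
    }

    // True when value occurs with no digit immediately before or after it.
    // Letters next to the value are allowed, so "serialNumber302." is a hit.
    public static bool ContainsWholeNumber(string text, string value)
    {
        return Regex.IsMatch(text, @"(?<!\d)" + Regex.Escape(value) + @"(?!\d)");
    }
}
```

With the sale string from the question, ContainsWholeWord finds nothing, while ContainsWholeNumber matches the "302" in "serialNumber302." but not the one inside "F18529302E".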

Regex/Method to remove namespace from a Type.FullName - C#

I am working on writing a method to remove the namespace from a System.Type.FullName (not XML).
I started off googling and didn't get too far so switched to trying to write a Regex I could use with a Regex.Replace(). But I am far from a master of the Regex arts, so I present myself humbly before the regex gods.
Given the following inputs:
name.space.class
name.space.class<other.name.space.class1>
name.space.class<other.name.space.class1, shortSpace.class2>
I need to remove the namespaces so I get:
class
class<class1>
class<class1, class2>
Alternatively, if anyone knows of an existing library that has this functionality, all the better!
Note: I know System.Type has a Namespace property that I could use to remove the namespace (ie System.Type.FullName - System.Type.Namespace), but my method takes a type name as a string and needs to work with type names that the run-time does not know about (can't resolve).
How about this...
[.\w]+\.(\w+)
...and substituting with $1. See it in action on regex101.
From looking at some C# examples it seems you would do
string output = Regex.Replace(input, @"[.\w]+\.(\w+)", "$1");
Try this:
public static string RemoveNamespaces(string typename)
{
return string.Join("",
Regex.Split(typename,
@"([^\w\.])").Select(p =>
p.Substring(p.LastIndexOf('.') + 1)));
}
I wouldn't even consider using regexes for this. Imperative code is pretty trivial here, although it requires a bit of string-fu:
public string RemoveNamespace(string typename)
{
if (typename.Contains("<"))
{
var genericArguments =
typename.
// in reality, we need the substring between the
// first occurrence of "<" and the last occurrence of ">"
SubstringBetween("<", ">").
Split(',').
Select(s => s.Trim()).
Select(RemoveNamespace);
return
RemoveNamespace(typename.SubstringBefore("<")) +
"<" +
string.Join(", ", genericArguments) +
">";
}
else
{
return typename.Trim().SubstringAfterLastOccurenceOf(".");
}
}
Sounds like a good situation to use positive lookahead:
(\w+[.+])+(?=\w+)
This pattern will match any number of words separated by periods or plusses, except the last one in a sequence (the short name of the type). Replacing the matches by the empty string will remove all namespace prefixes.
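In C# that would look something like the sketch below (the class and method names are just placeholders of mine). It applies the lookahead pattern above with Regex.Replace and an empty replacement.

```csharp
using System.Text.RegularExpressions;

public static class TypeNameShortener
{
    // Removes every run of "word." or "word+" prefixes that is followed by
    // another word, leaving only the short type names.
    public static string RemoveNamespaces(string typeName)
    {
        return Regex.Replace(typeName, @"(\w+[.+])+(?=\w+)", string.Empty);
    }
}
```

For the three inputs in the question this gives class, class<class1> and class<class1, class2>.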
Why not split by dot (.) and take only the last string?

Most efficient way to parse a delimited string in C#

This has been asked a few different ways but I am debating on "my way" vs "your way" with another developer. Language is C#.
I want to parse a pipe delimited string where the first 2 characters of each chunk are my tag.
The rules. Not my rules but rules I have been given and must follow.
I can't change the format of the string.
This function will be called possibly many times so efficiency is key.
I need to keep it simple.
The input string and tag I am looking for may/will change during runtime.
Example input string: AOVALUE1|ABVALUE2|ACVALUE3|ADVALUE4
Example tag I may need value for: AB
I split the string into an array on the delimiter and loop through the array each time the function is called. I then look at the first 2 characters of each element and return the value minus those 2 characters.
The "other guy's" way is to use a combination of IndexOf and Substring to find the start and end of the field I am looking for, then use Substring again to pull out the value minus the first 2 characters. So he would call IndexOf("|AB"), then find the next pipe in the string. These would be the start and end. Then Substring that out.
Now I would think that IndexOf and Substring parse the string character by character each time, so this would be less efficient than splitting into large chunks and reading each chunk minus the first 2 characters. Or is there another way that is better than what either of us has proposed?
The other guy's approach is going to be more efficient in time, given that the input string needs to be reevaluated each time. If the input string is long, it also won't require the extra memory that splitting the string would.
If I'm trying to code a really tight loop I prefer to directly use array/string operators rather than LINQ to avoid that additional overhead:
static string inputString = "AOVALUE1|ABVALUE2|ACVALUE3|ADVALUE4";
static string FindString(string tag)
{
int startIndex;
if (inputString.StartsWith(tag))
{
startIndex = tag.Length;
}
else
{
startIndex = inputString.IndexOf(string.Format("|{0}", tag));
if (startIndex == -1)
return string.Empty;
startIndex += tag.Length + 1;
}
int endIndex = inputString.IndexOf('|', startIndex);
if (endIndex == -1)
endIndex = inputString.Length;
return inputString.Substring(startIndex, endIndex - startIndex);
}
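Since the question says the input string can change at runtime, here is a variant sketch of the same idea that takes the input as a parameter. The names TagLookup and FindValue are mine. It also passes StringComparison.Ordinal, because the string overloads of StartsWith and IndexOf are culture-sensitive by default and an ordinal comparison is what you want for fixed tags like these.

```csharp
using System;

public static class TagLookup
{
    // Returns the value stored under the given tag, or an empty string
    // when the tag does not occur in the pipe-delimited input.
    public static string FindValue(string input, string tag)
    {
        int start;
        if (input.StartsWith(tag, StringComparison.Ordinal))
        {
            // The tag belongs to the first field, which has no leading pipe.
            start = tag.Length;
        }
        else
        {
            int hit = input.IndexOf("|" + tag, StringComparison.Ordinal);
            if (hit == -1)
                return string.Empty;
            start = hit + 1 + tag.Length; // skip the pipe and the tag
        }

        // The value runs up to the next pipe, or to the end of the input.
        int end = input.IndexOf('|', start);
        if (end == -1)
            end = input.Length;

        return input.Substring(start, end - start);
    }
}
```

Usage would be along the lines of TagLookup.FindValue("AOVALUE1|ABVALUE2|ACVALUE3|ADVALUE4", "AB"), which returns "VALUE2".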
I've done a lot of parsing in C# and I would probably take the approach suggested by the "other guys" just because it would be a bit lighter on resources used and likely to be a little faster as well.
That said, as long as the data isn't too big, there's nothing wrong with the first approach and it will be much easier to program.
Something like this may work ok
string myString = "AOVALUE1|ABVALUE2|ACVALUE3|ADVALUE4";
string selector = "AB";
var results = myString.Split('|').Where(x => x.StartsWith(selector)).Select(x => x.Substring(selector.Length));
Returns: list of the matches, in this case just one "VALUE2"
If you are just looking for the first or only match this will work.
string result = myString.Split('|').Where(x => x.StartsWith(selector)).Select(x => x.Substring(selector.Length)).FirstOrDefault();
SubString does not parse the string.
IndexOf does parse the string.
My preference would be the Split method, primarily for coding efficiency:
string[] inputArr = input.Split("|".ToCharArray()).Select(s => s.Substring(2)).ToArray();
is pretty concise. How many LoC does the substring/indexof method take?

Remove last characters from a string in C#. An elegant way?

I have a numeric string like this 2223,00. I would like to transform it to 2223. This is: without the information after the ",". Assume that there will be only two decimals after the ",".
I did:
str = str.Remove(str.Length - 3, 3);
Is there a more elegant solution? Maybe using another function? (I don't like putting in explicit numbers.)
You can actually just use the Remove overload that takes one parameter:
str = str.Remove(str.Length - 3);
However, if you're trying to avoid hard coding the length, you can use:
str = str.Remove(str.IndexOf(','));
Perhaps this:
str = str.Split(',').First();
This will return to you a string excluding everything after the comma
str = str.Substring(0, str.IndexOf(','));
Of course, this assumes your string actually has a comma with decimals. The above code will fail if it doesn't. You'd want to do more checks:
int commaPos = str.IndexOf(',');
if (commaPos != -1)
str = str.Substring(0, commaPos);
I'm assuming you're working with a string to begin with. Ideally, if you're working with a number to begin with, like a float or double, you could just cast it to an int, then do myInt.ToString() like:
int myInt = (int)double.Parse(myString);
This parses the double using the current culture (here in the US, we use . for decimal points). However, this again assumes that your input string can be parsed.
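If you go the parsing route and can't be sure the input is well formed, a TryParse-based sketch might look like this. The names are hypothetical, and I'm assuming the input always uses a comma as the decimal separator (as in 2223,00), so the separator is set explicitly instead of relying on the current culture.

```csharp
using System.Globalization;

public static class NumericStringTruncation
{
    // Tries to read text as a decimal that uses ',' as the decimal separator and,
    // on success, returns the part before the separator as a string.
    public static bool TryGetWholePart(string text, out string wholePart)
    {
        var format = new NumberFormatInfo
        {
            NumberDecimalSeparator = ",",
            NumberGroupSeparator = "."
        };
        const NumberStyles styles = NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint;

        if (decimal.TryParse(text, styles, format, out decimal value))
        {
            // Truncate drops the fractional digits without rounding.
            wholePart = decimal.Truncate(value).ToString(CultureInfo.InvariantCulture);
            return true;
        }

        wholePart = string.Empty;
        return false;
    }
}
```

Unlike the Substring approaches, this rejects input that isn't a number at all, at the cost of a parse.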
String.Format("{0:0}", 123.4567); // "123"
If your initial value is a decimal stored in a string, you will need to convert it first. Note that this format rounds rather than truncates:
String.Format("{0:0}", double.Parse("3.5", CultureInfo.InvariantCulture)) // "4"
In this example, I choose Invariant culture but you could use the one you want.
I prefer using the formatting function because you never know whether the value may contain 2 or 3 digits after the separator in the future.
Edit: You can also use Truncate to remove all after the , or .
Console.WriteLine(Decimal.Truncate(Convert.ToDecimal("3,5")));
Use:
public static class StringExtensions
{
/// <summary>
/// Cut End. "12".SubstringFromEnd(1) -> "1"
/// </summary>
public static string SubstringFromEnd(this string value, int startindex)
{
if (string.IsNullOrEmpty(value)) return value;
return value.Substring(0, value.Length - startindex);
}
}
I prefer an extension method here for two reasons:
I can chain it with Substring.
Example: f1.Substring(directorypathLength).SubstringFromEnd(1)
Speed.
You could use LastIndexOf and Substring combined to get all characters to the left of the last comma within the string.
str = str.Substring(0, str.LastIndexOf(','));
You can use TrimEnd. It's efficient as well and looks clean.
"Name,".TrimEnd(',');
Try the following. It worked for me:
str = str.Split(',').First();
Since C# 8.0 it has been possible to do this with a range operator.
string textValue = "2223,00";
textValue = textValue[0..^3];
Console.WriteLine(textValue);
This would output the string 2223.
The 0 says that it should start from the zeroth position in the string
The .. says that it should take the range between the operands on either side
The ^ says that it should take the operand relative to the end of the sequence
The 3 says that it should end at the third position from the end of the string
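The same range syntax can be combined with IndexOf if you'd rather not hard-code the 3. A small sketch follows; the helper names are made up, count is assumed to be non-negative, and string ranges assume C# 8 on a runtime that supports them, e.g. .NET Core 3.0 or later.

```csharp
using System;

public static class RangeExamples
{
    // Drops a fixed number of trailing characters; returns an empty string
    // when count is at least the length of the text.
    public static string DropLast(string text, int count)
    {
        return count >= text.Length ? string.Empty : text[..^count];
    }

    // Keeps everything before the first comma, or the whole text
    // when there is no comma at all.
    public static string BeforeComma(string text)
    {
        int comma = text.IndexOf(',');
        return comma < 0 ? text : text[..comma];
    }
}
```

Both DropLast("2223,00", 3) and BeforeComma("2223,00") give 2223.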
Use LastIndexOf together with Substring. Like:
str = str.Substring(0, str.LastIndexOf(','));

CSV Parsing with double quotes

I am trying to use C# to parse CSV. I used regular expressions to find "," and read string if my header counts were equal to my match count.
Now this will not work if I have a value like:
"a",""b","x","y"","c"
then my output is:
'a'
'"b'
'x'
'y"'
'c'
but what I want is:
'a'
'"b","x","y"'
'c'
Is there any regex or any other logic I can use for this ?
CSV, when dealing with things like multi-line, quoted, different delimiters* etc - can get trickier than you might think... perhaps consider a pre-rolled answer? I use this, and it works very well.
*=remember that some locales use [tab] as the C in CSV...
CSV is a great example for code reuse - No matter which one of the csv parsers you choose, don't choose your own. Stop Rolling your own CSV parser
I would use FileHelpers if I were you. Regular Expressions are fine but hard to read, especially if you go back, after a while, for a quick fix.
Just for sake of exercising my mind, quick & dirty working C# procedure:
public static List<string> SplitCSV(string line)
{
if (string.IsNullOrEmpty(line))
throw new ArgumentException();
List<string> result = new List<string>();
bool inQuote = false;
StringBuilder val = new StringBuilder();
// parse line
foreach (var t in line.Split(','))
{
int count = t.Count(c => c == '"');
if (count > 2 && !inQuote)
{
inQuote = true;
val.Append(t);
val.Append(',');
continue;
}
if (count > 2 && inQuote)
{
inQuote = false;
val.Append(t);
result.Add(val.ToString());
val.Clear(); // reset the buffer so the next quoted value starts empty
continue;
}
if (count == 2 && !inQuote)
{
result.Add(t);
continue;
}
if (count == 2 && inQuote)
{
val.Append(t);
val.Append(',');
continue;
}
}
// remove quotation
for (int i = 0; i < result.Count; i++)
{
string t = result[i];
result[i] = t.Substring(1, t.Length - 2);
}
return result;
}
There's an oft quoted saying:
Some people, when confronted with a
problem, think "I know, I'll use
regular expressions." Now they have
two problems. (Jamie Zawinski)
Given that there's no official standard for CSV files (instead there are a large number of slightly incompatible styles), you need to make sure that what you implement suits the files you will be receiving. No point in implementing anything fancier than what you need - and I'm pretty sure you don't need Regular Expressions.
Here's my stab at a simple method to extract the terms - basically, it loops through the line looking for commas, keeping track of whether the current index is within a string or not:
public IEnumerable<string> SplitCSV(string line)
{
int index = 0;
int start = 0;
bool inString = false;
foreach (char c in line)
{
switch (c)
{
case '"':
inString = !inString;
break;
case ',':
if (!inString)
{
yield return line.Substring(start, index - start);
start = index + 1;
}
break;
}
index++;
}
if (start < index)
yield return line.Substring(start, index - start);
}
Standard caveat - untested code, there may be off-by-one errors.
Limitations
The quotes around a value aren't removed automatically.
To do this, add a check just before the yield return statement near the end.
Single quotes aren't supported in the same way as double quotes
You could add a separate boolean inSingleQuotedString, renaming the existing boolean to inDoubleQuotedString and treating both the same way. (You can't make the existing boolean do double work because you need the string to end with the same quote that started it.)
Whitespace isn't automatically removed
Some tools introduce whitespace around the commas in CSV files to "pretty" the file; it then becomes difficult to tell intentional whitespace from formatting whitespace.
In order to have a parseable CSV file, any double quotes inside a value need to be properly escaped somehow. The two standard ways to do this are by representing a double quote either as two double quotes back to back, or a backslash double quote. That is one of the following two forms:
""
\"
In the second form your initial string would look like this:
"a","\"b\",\"x\",\"y\"","c"
If your input string is not formatted against some rigorous format like this then you have very little chance of successfully parsing it in an automated environment.
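For input that does follow the first convention (a quote inside a value written as two quotes), a character-by-character splitter is fairly short. This is only a sketch with names I made up. It handles a single line, treats the comma as the separator, and does not trim whitespace or support multi-line fields, so a library is still the safer choice for real files.

```csharp
using System.Collections.Generic;
using System.Text;

public static class SimpleCsv
{
    // Splits one CSV line into fields. Quoted fields may contain commas,
    // and a doubled quote ("") inside a quoted field means a literal quote.
    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c != '"')
                {
                    current.Append(c);
                }
                else if (i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote: keep one, skip the other
                    i++;
                }
                else
                {
                    inQuotes = false; // closing quote of the field
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote of a field
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // last field has no trailing comma
        return fields;
    }
}
```

With the middle value escaped properly, i.e. "a","""b"",""x"",""y""","c", this yields the three fields a, "b","x","y" and c.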
If all your values are guaranteed to be in quotes, look for values, not for commas:
("".*?""|"[^"]*")
This takes advantage of the fact that "the earliest longest match wins" - it looks for double quoted values first, and with a lower priority for normal quoted values.
If you don't want the enclosing quote to be part of the match, use:
"(".*?"|[^"]*)"
and go for the value in match group 1.
As I said: Prerequisite for this to work is well-formed input with guaranteed quotes or double quotes around each value. Empty values must be quoted as well! A nice side-effect is that it does not care for the separator char. Commas, TABs, semi-colons, spaces, you name it. All will work.
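In C#, using the second pattern and reading match group 1 could be sketched like this (the class and method names are placeholders; the pattern is the one above, with each quote doubled for the verbatim string literal):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedValueExtractor
{
    // Regex source: "(".*?"|[^"]*)"
    private static readonly Regex QuotedValue = new Regex(@"""("".*?""|[^""]*)""");

    // Returns the content of every quoted value, without the enclosing quotes.
    public static List<string> ExtractValues(string line)
    {
        var values = new List<string>();
        foreach (Match match in QuotedValue.Matches(line))
        {
            values.Add(match.Groups[1].Value);
        }
        return values;
    }
}
```

For the line from the question, "a",""b","x","y"","c", this returns a, "b","x","y" and c.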
FileHelpers supports multiline fields.
You could parse files like these:
a,"line 1
line 2
line 3"
b,"line 1
line 2
line 3"
Here is the datatype declaration:
[DelimitedRecord(",")]
public class MyRecord
{
public string field1;
[FieldQuoted('"', QuoteMode.OptionalForRead, MultilineMode.AllowForRead)]
public string field2;
}
Here is the usage:
static void Main()
{
FileHelperEngine engine = new FileHelperEngine(typeof(MyRecord));
MyRecord[] res = engine.ReadFile("file.csv");
}
Try CsvHelper (a library I maintain) or FastCsvReader. Both work well. CsvHelper does writing also. Like everyone else has been saying, don't roll your own. :P
FileHelpers for .Net is your friend.
See the link "Regex fun with CSV" at:
http://snippets.dzone.com/posts/show/4430
The Lumenworks CSV parser (open source, free but needs a codeproject login) is by far the best one I've used. It'll save you having to write the regex and is intuitive to use.
Well, I'm no regex wiz, but I'm certain they have an answer for this.
Procedurally it's going through letter by letter. Set a variable, say dontMatch, to FALSE.
Each time you run into a quote toggle dontMatch.
each time you run into a comma, check dontMatch. If it's TRUE, ignore the comma. If it's FALSE, split at the comma.
This works for the example you give, but the logic you use for quotation marks is fundamentally faulty - you must escape them or use another delimiter (single quotes, for instance) to set major quotations apart from minor quotations.
For instance,
"a", ""b", ""c", "d"", "e""
will yield bad results.
This can be fixed with another patch. Rather than simply keeping a true false you have to match quotes.
To match quotes you have to know what was last seen, which gets into pretty deep parsing territory. You'll probably, at that point, want to make sure your language is designed well, and if it is you can use a compiler tool to create a parser for you.
-Adam
I have just tried your regular expression in my code. It works fine for formatted text with quotes,
but I am wondering if we can parse the value below with a regex:
"First_Bat7679",""NAME","ENAME","FILE"","","","From: "DDD,_Ala%as"#sib.com"
I am looking for result as:
'First_Bat7679'
'"NAME","ENAME","FILE"'
''
''
'From: "DDD,_Ala%as"#sib.com'
Thanx