Converting a String of input into Inches - c#

What I would like to do is take an input string in architectural format and convert it to a double in inches.
For example:
Input (String)    Output (Double)
1'-2"             14
1'-2 1/2"         14.5
1'2 3/16"         14.1875
1'                12
12                12
12"               12
1'0.5             12.5
1'0.5"            12.5
1'-0.5            12.5
1'-0.5"           12.5
I know I could iterate through every character in the string and test a bunch of cases, but I don't know whether there is a built-in function in C#, or some other resource, that could do this for me so that I don't have to reinvent the wheel.

Regex for the win!
Okay, if you're new to Regex, it's basically a way of parsing strings. So, realistically, what does your input consist of?
At a high level, you've got one of these three possibilities:
Composite: Number, followed by ', followed by either a - or space, followed by a number, and optionally ending with a "
Feet Only: Number, followed by a '
Inches Only: Number, optionally followed by a "
And those 'Number's?
Possibilities:
1+ digits (aka, "23")
1+ digits, a '.', and 1+ digits (aka, "32.43")
1+ digits, a space, 1+ digits, a slash, and 1+ digits (aka, "32 13/16")
1+ digits, a slash, and 1+ digits (aka, "13/16")
Okay, so first up, we need a regex for one of your "numbers":
\d+|\d+\.\d+|\d+ \d+\/\d+|\d+\/\d+
(Looks complicated, but see these two pages for reference: http://www.rexegg.com/regex-quickstart.html and https://regex101.com/)
Now, just so our regexes don't get too complicated, you could do something like this:
string regexSnippetForNumber = @"\d+|\d+\.\d+|\d+ \d+\/\d+|\d+\/\d+";
string regexForComposite =
    "^(" + regexSnippetForNumber + ")'[ -]" +
    "(" + regexSnippetForNumber + ")\"?$";
... and then, if the input matches regexForComposite, you use the two capturing groups to get the two numbers. (Which you'd have to parse to get the numerical value.)
Hopefully that makes sense and can get you close enough to the finish line. If you've never used Regexes before, I highly suggest you read up on them. They're incredibly handy when you need to do string parsing that can otherwise be really annoying (like this exact problem!)
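A minimal sketch of how the pieces above could fit together in C#. The class name ArchitecturalLength and its methods are hypothetical. Two choices here are assumptions that go beyond the answer: the separator between feet and inches is optional, so that inputs such as 1'2 3/16" and 1'0.5 from the question are accepted, and the alternatives in the number snippet are ordered longest first.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class ArchitecturalLength
{
    // One "number": mixed fraction, plain fraction, decimal, or integer.
    private const string Number = @"\d+ \d+/\d+|\d+/\d+|\d+\.\d+|\d+";

    // Optional feet part, optional space or hyphen, optional inches part.
    private static readonly Regex Pattern = new Regex(
        @"^\s*(?:(?<feet>" + Number + @")')?[ -]?(?:(?<inches>" + Number + @")""?)?\s*$");

    public static double ToInches(string input)
    {
        Match m = Pattern.Match(input);
        Group feet = m.Groups["feet"];
        Group inches = m.Groups["inches"];

        // Reject input that has neither a feet part nor an inches part.
        if (!m.Success || (!feet.Success && !inches.Success))
            throw new FormatException("Not an architectural length: " + input);

        double total = 0;
        if (feet.Success) total += 12 * ParseNumber(feet.Value);
        if (inches.Success) total += ParseNumber(inches.Value);
        return total;
    }

    // Parses "2", "0.5", "3/16" or "2 3/16" into a double.
    private static double ParseNumber(string text)
    {
        double total = 0;
        foreach (string part in text.Split(' '))
        {
            string[] fraction = part.Split('/');
            double value = double.Parse(fraction[0], CultureInfo.InvariantCulture);
            if (fraction.Length == 2)
                value /= double.Parse(fraction[1], CultureInfo.InvariantCulture);
            total += value;
        }
        return total;
    }
}
```

Calling ArchitecturalLength.ToInches("1'-2 1/2\"") would go through the feet group ("1") and the inches group ("2 1/2"); input that matches neither part throws a FormatException.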

Related

How to validate Regex

I'm having a hard time grouping parts of a regex. I want to validate a few things in a string that follows this format: I-XXXXXX.XX.XX.XX
Validate that the first set of 6 X's (I-xxxxxx.XX.XX.XX) does not contain letters and its length is no more than 6.
Validate that the third set of X's (I-XXXXXX.XX.xx.XX) does not contain letters and is only 1 or 2.
I already have validation on the last set of XX's to make sure the numbers are 1-8, using
string pattern1 = @"^.+\.(0?[1-8])$";
Match match = Regex.Match(TxtWBS.Text, pattern1);
if (!match.Success)
{
    errMessage += "WBS invalid";
    errMessage += Environment.NewLine;
}
I just can't figure out how to target specific parts of the string. Any help would be greatly appreciated, and thank you in advance!
You're having some trouble adding new validation to this string because it's very generic. Let's take a look at what you're doing:
^.+\.(0?[1-8])$
This finds the following:
^ the start of the string
.+ everything it can, other than a newline, basically jumping the engine's cursor to the end of your line
\. the last period in the string, because of the greedy quantifier in the .+ that comes before it
0? a zero, if it can
[1-8] a number between 1 and 8
()$ stores the two previous things in a group; if the end of the string doesn't come right after them, the engine may even backtrack and try the same thing from the second-to-last period instead, which we know isn't a great strategy.
This ends up matching a lot of weird stuff, for example the string "The number 0.1".
Let's try patterning something more specific, if we can:
^I-(\d{6})\.(\d{2})\.(\d{1,2})\.([1-8]{2})$
This will match:
^I- an I and a hyphen at the start of the string
(\d{6}) six digits, which it stores in a capture group
\. a period. By now, if there was any other number of digits than six, the match fails instead of trying to backtrack all over the place.
(\d{2})\. Same thing, but two digits instead of six.
(\d{1,2})\. Same thing, the comma here meaning it can match between one and two digits.
([1-8]{2}) Two digits that are each between 1 and 8.
$ The end of the string.
I hope I understood what exactly you're trying to match here. Let me know if this isn't what you had in mind.
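To target specific parts of the string, one option is to name the capture groups and read them back after a match. This is only a sketch built on the pattern above: the class name WbsValidator and the group names first, second, third and last are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WbsValidator
{
    // Same pattern as above, with each capture group given a name.
    private static readonly Regex Pattern = new Regex(
        @"^I-(?<first>\d{6})\.(?<second>\d{2})\.(?<third>\d{1,2})\.(?<last>[1-8]{2})$");

    public static bool IsValid(string wbs) => Pattern.IsMatch(wbs);

    // Returns the text of one named part, or an empty string
    // when the input does not match the pattern at all.
    public static string GetPart(string wbs, string name)
    {
        Match m = Pattern.Match(wbs);
        return m.Success ? m.Groups[name].Value : string.Empty;
    }
}
```

With this, WbsValidator.GetPart(TxtWBS.Text, "third") would give you just the third set, which you can then check with ordinary code if the rule is easier to express outside the regex.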
This regex:
^.-[0-9]{6}(\.[1-8]{1,2}){3}$
will validate the following:
The first character can be any character, but is of length 1
It is followed by a dash
The dash is followed by exactly 6 numbers 0 - 9. (If this could be less than 6 characters - for example, between 3 and 6 characters - just replace {6} with {3,6}).
This is followed by 3 groups of characters. Each of these groups is preceded by a period, is of length 1 or 2, and can contain only the digits 1 - 8.
An example of a valid string is:
I-587954.12.34.56
This is also valid:
I-587954.1.3.5
But this isn't:
I-587954.12.80.356
because the second-to-last group contains a 0, and because the last group is of length 3.
Please let me know if I have misunderstood any of the rules.
^I-([0-9]{1,6})\.(.{1,2})\.(0[1-2])\.(.{1,2})$
groups delimited by . (\.) :
([0-9]{1,6}) - 1-6 digits
(.{1,2}) - 1 or 2 characters of any kind
(0[1-2]) - 01 or 02
(.{1,2}) - 1 or 2 characters of any kind
You can write and easily test a regex against your input data; just google "regex online".

Regular expression which ignores the few character until it finds a pattern mentioned

I have to find a decimal number in a PDF, which appears under the column "Charge".
I found a regular expression for the decimal that works fine, but one of the PDFs has the text in the format below.
Pdf Text - Charge (country) Eighteen Thousand one hundred Eighty One and 75/100 18,181.75
Expected - 18,181.75
Regular expression used to find the decimal after the text "Charge": (Charge ([0-9]*)(\,?[ ]?[0-9])+(.[0-9]+))
So I want to ignore whatever comes between "Charge" and the decimal, and display the decimal number. Any help?
Case 2: "18,181.75" may sometimes come before "Charge" as well, like "18,181.75 Charge some text here..."
You may make use of .NET regex unlimited-width lookbehinds:
Regex.Match(s, @"(?<=\bCharge\b.*)\d[\d,]*\.\d+|\d[\d,]*\.\d+(?=.*?\bCharge\b)")
Details
(?<=\bCharge\b.*)\d[\d,]*\.\d+ - at a location preceded by Charge as a whole word and then any 0+ chars other than newline, matches a digit followed by 0+ commas or digits, then a dot and 1+ digits
| - or
\d[\d,]*\.\d+(?=.*?\bCharge\b) - a digit followed by 0+ commas or digits, then a dot and 1+ digits, which must be followed by any 0+ chars other than newline, as few as possible, and then Charge as a whole word
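As a rough sketch of how that pattern might be used on both cases from the question, the helper below returns the amount as a decimal. The class name ChargeExtractor and the method FindCharge are hypothetical, and parsing with the invariant culture is an assumption about the number format in the PDF text.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class ChargeExtractor
{
    // First alternative: amount after "Charge" (unlimited-width lookbehind).
    // Second alternative: amount before "Charge" (lookahead).
    private static readonly Regex Pattern = new Regex(
        @"(?<=\bCharge\b.*)\d[\d,]*\.\d+|\d[\d,]*\.\d+(?=.*?\bCharge\b)");

    // Returns the first matching amount, or null when none is found.
    public static decimal? FindCharge(string text)
    {
        Match m = Pattern.Match(text);
        if (!m.Success) return null;

        NumberStyles styles = NumberStyles.AllowThousands | NumberStyles.AllowDecimalPoint;
        return decimal.TryParse(m.Value, styles, CultureInfo.InvariantCulture, out decimal value)
            ? value
            : (decimal?)null;
    }
}
```

Note that "75/100" in the sample text is skipped because neither "75" nor "100" is followed by a dot and more digits.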
The regular expression below should help you.
Charge.*[0-9]+([,]?[0-9]+)*\.([0-9]){0,2}$
Hope this works.
What about this:
(?<=[Cc]harge.)([0-9],[0-9].[0-9])|[0-9],[0-9].[0-9](?=\s[Cc]harge)

Regular Expression to match a group of alphanumerics followed by a group of spaces, making a fixed total of characters

I'm trying to write a regular expression using C#/.Net that matches 1-4 alphanumerics followed by spaces, followed by 10 digits. The catch is the number of spaces plus the number of alphanumerics must equal 4, and the spaces must follow the alphanumerics, not be interspersed.
I'm at a total loss as to how to do this. I can do ^[A-Za-z\d\s]{1,4}[\d]{10}$, but that lets the spaces fall anywhere in the first four characters. Or I could do ^[A-Za-z\d]{1,4}[\s]{0,3}[\d]{10}$ to keep the spaces together, but that would allow more than a total of four characters before the 10 digit number.
Valid:
A12B1234567890
AB1 1234567890
AB  1234567890
Invalid:
AB1  1234567890 (more than 4 characters before the numbers)
A1B1234567890 (less than 4 characters before the numbers)
A1 B1234567890 (space amidst the first 4 characters instead of at the end)
You can force the check with a look-behind (?<=^[\p{L}\d\s]{4}) that will ensure there are exactly four allowed characters before the 10-digit number:
^[\p{L}\d]{1,4}\s{0,3}(?<=^[\p{L}\d\s]{4})\d{10}$
                      ^^^^^^^^^^^^^^^^^^^^
If you do not plan to support all Unicode letters, just replace \p{L} with [a-z] and use RegexOptions.IgnoreCase.
Here's the regex you need:
^(?=[A-Za-z0-9 ]{4}\d{10}$)[A-Za-z0-9]{1,4} *\d{10}$
It uses a lookahead (?= ) to test whether the string consists of 4 chars, each either alphanumeric or a space, followed by 10 digits. The engine then goes back to where it was (the beginning of the string), without consuming any chars.
Once that condition is met, the rest is an expression quite similar to what you were trying ([A-Za-z0-9]{1,4} *\d{10}).
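A quick way to check the lookahead pattern against the examples from the question is a small helper like the one below. The class name PrefixedNumber is hypothetical; the pattern is the one from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixedNumber
{
    // Lookahead: exactly 4 alphanumerics/spaces, then 10 digits, then end.
    // Main pattern: 1-4 alphanumerics, trailing spaces, then 10 digits.
    private static readonly Regex Pattern = new Regex(
        @"^(?=[A-Za-z0-9 ]{4}\d{10}$)[A-Za-z0-9]{1,4} *\d{10}$");

    public static bool IsValid(string input) => Pattern.IsMatch(input);
}
```

The lookahead fixes the total width of the prefix at four characters, while the main pattern keeps the spaces after the alphanumerics, so "A1 B1234567890" passes the first check but fails the second.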
I know this is dumb, but it must work exactly as required.
^[A-Za-z\d]([A-Za-z\d]{3}|[A-Za-z\d]{2}\s|[A-Za-z\d]\s{2}|\s{3})[\d]{10}$
Not sure what you are looking for, but perhaps:
^(?=.{14}$)[A-Za-z0-9]{1,4} *\d{10}
Try this:
Doesn't allow char/space/char combination and starts with a char:
/\b(?!\w\s{1,2}\w+)\w(\w|\s){3}\d{10}/gm
https://regex101.com/r/fF2tR8/2

Regular expression for numbers with only one space in undefined positions

I am trying to write a regular expression to validate numbers with only one space in an undefined place.
Maximum of 12 characters with one space, or maximum of 11 characters without spaces.
Ex: '25897 569874','5674','65783987665','435 6523'
I have tried ^[0-9]{0,12}$. This is not right because I don't know how to handle the space and the character count.
You can use this regex:
^(?:\d{1,11}|(?=\d+ \d+$)[\d ]{3,12})$
\d{1,11} will match from 1 to 11 digits without space.
(?=\d+ \d+$)[\d ]{3,12} will match up to 11 digits with one space somewhere in the middle. The space cannot be leading or trailing, so ' 23' will be rejected.
(?=\d+ \d+$) is a look-ahead that matches one or more digits, then a space, then one or more digits, and then anchors to the end of the string. It guarantees only one space will appear and the space will not be leading or trailing. The look-ahead also implicitly confirms that there are at least 3 characters in the string.
[\d ]{3,12} will guarantee the string only contains digits or space, and up to 12 of them. The lower bound of number of repetition can be set to 3 or lower, since it has been implied by the look-ahead.
The 2 constraints together guarantee that the text contains from 1 to 11 digits and an optional space at an arbitrary position in between the digits.
To allow leading space, but reject single space, empty string and trailing spaces:
^(?:\d{1,11}|(?=\d* \d+$)[\d ]{2,12})$
Again, the look-ahead implies at least 2 characters, so the number of repetitions can be set to 2 or lower.
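For reference, here is a small sketch that runs the first pattern (no leading or trailing space) against the examples from the question. The class name SpacedNumber is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpacedNumber
{
    // Either 1-11 digits, or up to 12 chars made of digits with exactly
    // one space that is neither leading nor trailing.
    private static readonly Regex Pattern = new Regex(
        @"^(?:\d{1,11}|(?=\d+ \d+$)[\d ]{3,12})$");

    public static bool IsValid(string input) => Pattern.IsMatch(input);
}
```

Inputs with two spaces, such as "12 34 56", fail in the look-ahead, and twelve digits without a space fail because neither alternative can cover them.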
^[0-9 ]{0,12}$ will match a string of up to 12 characters, with or without spaces.
If you need multiple criteria, try the OR operator (pipe): |
^[0-9 ]{0,12}$|another condition

Discount mask with regex

Is it possible to create a 'dynamic' discount mask that takes % or numbers as discount values? What is the simple way to do this?
Samples of valid input: -25%, 0.25, or -5$. The value must not be 0 and has two digits after the dot.
Try
@"(\+|-)?(\d+(\.\d*)?|\.\d+)%?"
It will find:
123.23
12.4%
.34
.34%
45.
45.%
8
7%
34
34%
+2.55%
-1.75%
UPDATE
and with ...
@"(\+|-)?(\d+(,\d{3})*(?!\d)(\.\d*)?|\.\d+)%?"
... you can include thousands separators as well.
I must confess that my second regex looks like a cat had walked across my keyboard. Here is the explanation:
(\+|-)? optionally ? a plus or a minus sign.
\d+(,\d{3})*(?!\d)(\.\d*)? one or more digits \d+ followed by any number of thousands separators plus three digits (,\d{3})*, not followed by any digit (?!\d) in order to disallow four digits in sequence, optionally followed by a decimal point and any number of digits (\.\d*)?.
|\.\d+ or alternatively a decimal point followed by at least one digit.
%? finally an optional percent sign.
If I understand your question right, you want something like this:
@"^[+-]?(?:\d*\.)?\d+[%$]?$"
That's partly based on your example of -5$. Usually, though, the $ would go in front, so you'd want something like:
@"^(?:\$(?!.*%))?[+-]?(?:\d*\.)?\d+%?$"
That would allow $-5.00, 10, or +20%, but block $5%.
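A small sketch for trying the leading-$ pattern against those examples. The class name DiscountMask is hypothetical; the pattern is the one given just above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DiscountMask
{
    // Optional leading $ (only when no % follows), optional sign,
    // optional "digits." prefix, required digits, optional trailing %.
    private static readonly Regex Pattern = new Regex(
        @"^(?:\$(?!.*%))?[+-]?(?:\d*\.)?\d+%?$");

    public static bool IsValid(string input) => Pattern.IsMatch(input);
}
```

The negative lookahead (?!.*%) is what rejects "$5%": the $ can only be consumed when no % appears later, and without consuming it the rest of the pattern cannot start at the $ character.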
Edit:
Running with Olivier's idea of allowing commas:
@"^(\$(?!.*%))?[+-]?(\d{1,3}((,\d{3})*|\d*))?(\.\d+)?\b%?$"
Expanded to make it easier to understand:
@"^                  #Require matching from the beginning of the line
(\$(?!.*%))?         #Optionally allow a $ here, but only if there's no % later on.
[+-]?                #Optionally allow + or - at the beginning
(
    \d{1,3}          #Covers the first three numerals
    ((,\d{3})*|\d*)  #Allow numbers in 1,234,567 format, or simply a long string of numerals with no commas
)?                   #Allow for a decimal with no leading digits
(\.\d+)?             #Optionally allow a period, but only with numerals behind it
\b                   #Word break (a sneaky way to require at least one numeral before this position, thus preventing an empty string)
%?                   #Optionally allow %
$                    #End of line
"
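For the expanded form to compile as a regex, .NET needs RegexOptions.IgnorePatternWhitespace, which ignores unescaped whitespace in the pattern and treats # as the start of a comment. A sketch, with the hypothetical class name CommentedDiscountMask and shortened comments:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentedDiscountMask
{
    // Same pattern as the one-line version above, spread over several
    // lines with comments inside the pattern itself.
    private static readonly Regex Pattern = new Regex(@"
        ^                    # start of line
        (\$(?!.*%))?         # optional $, only if there is no % later
        [+-]?                # optional sign
        (
            \d{1,3}          # first one to three numerals
            ((,\d{3})*|\d*)  # comma groups, or plain numerals
        )?                   # whole integer part is optional
        (\.\d+)?             # optional decimal part
        \b                   # rejects an empty string
        %?                   # optional percent sign
        $                    # end of line
        ", RegexOptions.IgnorePatternWhitespace);

    public static bool IsValid(string input) => Pattern.IsMatch(input);
}
```

Because whitespace is ignored in this mode, a literal space in the pattern would have to be written as \x20 or [ ]; this pattern does not need one.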
