Example:
I want to get the "2" that appears before "- 60000 rupiah".
So I tried to extract it with Substring:
string s = "Ayam Bakar - 30000 x 2 - 60000 rupiah";
string qtyMenu = s.Substring(s.IndexOf("x") + 1, s.IndexOf("-") - 1);
But the second argument of Substring did not work as I expected, maybe because the sentence contains more than one "-" character. Is it possible to get the different indexes of the same character in that sentence?
This is a good situation for Regex:
string s = "Ayam Bakar - 30000 x 2 - 60000 rupiah";
// "x" followed by (maybe) whitespaces followed by at least one digit followed by (maybe) whitespaces followed by "-".
// Capture the digits with the (...)
var match = Regex.Match(s, @"x\s*(\d+)\s*\-");
if (match.Success)
{
// Groups[1] is the captured group
string foo = match.Groups[1].Value;
}
You can easily achieve this with the following technique:
string s = "Ayam Bakar - 30000 x 2 - 60000 rupiah";
string qtyMenu = s.Substring(s.IndexOf("x") + 1, (s.LastIndexOf("-")) - (s.IndexOf("x") + 1));
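For the second parameter, the length of the string to extract is the last index of "-" minus the index of the first character after "x". The result still includes the surrounding spaces, so call Trim() on it if you need only the digit.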
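To address the original question about finding a later occurrence of the same character: IndexOf has an overload that takes a start index, so the search for "-" can begin after the "x". The following is a minimal sketch, assuming the string always contains an "x" followed later by a "-"; the class name QuantityExample is only for illustration.

```csharp
using System;

class QuantityExample
{
    static void Main()
    {
        string s = "Ayam Bakar - 30000 x 2 - 60000 rupiah";

        // Position just after the "x" that precedes the quantity.
        int start = s.IndexOf('x') + 1;

        // Search for the next '-' starting from that position,
        // so the earlier '-' after the product name is skipped.
        int end = s.IndexOf('-', start);

        // Substring takes (startIndex, length), not (startIndex, endIndex).
        string qtyMenu = s.Substring(start, end - start).Trim();

        Console.WriteLine(qtyMenu); // prints "2"
    }
}
```

If either character might be missing, IndexOf returns -1, so real code would need to check for that before calling Substring.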
From the message, I can derive the template format:
"{Product Name} - {Price x Qty} - {Subtotal}"
So you can implement this solution:
// Split message by '-'
var messages = s.Split('-');
// Result: messages[0] = "Ayam Bakar "
// Result: messages[1] = " 30000 x 2 "
// Result: messages[2] = " 60000 rupiah"
// Obtain {Price x Qty} in messages[1] and get the value after 'x'
var qtyMenu = messages[1].Substring(messages[1].IndexOf("x") + 1).Trim();
Related
I have the following string
"98225-2077 Bellingham WA"
I need to use Regex to separate Zip Code, City and State.
the groups should return
(98225-2077)(Bellingham) and (WA).
The State is optional and will always be at the end and will consist of two uppercase characters.
I am able to filter out the following using regex
Zip Code : (^([\S]+-)?\d+(-\d+)?) - Group[1]
City: ((^([\S]+-)?\d+(-\d+)?)\s)?(\S.*) = Group[5].
Can there be a single regex to filter out all the three using the same regex and return blank in case the state is not there?
I would opt for just splitting the string on spaces and then using the various parts as you need. Because your city name may consist of multiple words, I iterate from the second to the next-to-last element to build the city name. This solution assumes that the zip code and the two-letter state abbreviation will always be single words, and that the state is present.
string address = "98225-2077 Bellingham WA";
string[] tokens = address.Split(' ');
string city = "";
for (int i=1; i < tokens.Length-1; i++)
{
if (i > 1)
{
city += " ";
}
city += tokens[i];
}
Console.WriteLine("zip code: {0}", tokens[0]);
Console.WriteLine("city: {0}", city);
Console.WriteLine("state: {0}", tokens[tokens.Length-1]);
Easy!
^([\d-]+)\s+(.+?)\s*([A-Z]{2})?$
https://regex101.com/r/tL4tN5/1
Explanation:
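^([\d-]+): ^ anchors the match at the very beginning of the string. [\d-]+ matches one or more digits or hyphens (the ZIP code).
\s+(.+?)\s*: captures, as lazily as possible, everything in the middle between the ZIP code and the state.
([A-Z]{2})?$: {2} means exactly 2 characters in the specified range [A-Z]. ? means the group occurs 1 or 0 times, and $ anchors the match at the end of the string.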
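As a rough illustration of how this pattern could be used from C#, the sketch below runs it against an address with and without a state. The class name ZipCityState and the second sample address are assumptions made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

class ZipCityState
{
    static void Main()
    {
        var pattern = new Regex(@"^([\d-]+)\s+(.+?)\s*([A-Z]{2})?$");
        var addresses = new[] { "98225-2077 Bellingham WA", "98225-2077 Bellingham" };

        foreach (var address in addresses)
        {
            Match m = pattern.Match(address);
            if (!m.Success) continue;

            // An optional group that did not take part in the match
            // reports Success == false and an empty Value.
            Console.WriteLine("zip='{0}' city='{1}' state='{2}'",
                m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value);
        }
    }
}
```

Note that because the whitespace before the state is written as \s*, a city that ends in two uppercase letters would have them captured as the state, so this is only suitable when city names are not written in all caps.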
This will also work
^(\d[\d-]+)\s+(.*?)(?:\s+([A-Z]{2}))?$
Regex Demo
Ideone Demo
I really think you can do it without a regex. Here are two solutions:
Non-regex solution:
/// <summary>
/// Split address into ZIP, Description/Street/anything, [A-Z]{2} state
/// </summary>
/// <returns>null if no space is found</returns>
public static List<string> SplitZipAnyStateAddress(this string s)
{
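var first = s.IndexOf(' ');
if (first < 0) return null;
var last = s.LastIndexOf(' ');
var zip = s.Substring(0, first);
var state = s.Substring(last + 1);
// With a single space there is no middle part; computing it would pass a negative length to Substring
if (first == last) return new List<string>() { zip, state };
var middle = s.Substring(first + 1, last - first - 1);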
return state.Length == 2 && state.All(p => Char.IsUpper(p)) ?
new List<string>() { zip, middle, state } :
new List<string>() { zip, string.Format("{0} {1}", middle, state) };
}
Results:
StringRegUtils.SplitZipAnyStateAddress("98225-2077 Bellingham WA");
// => [0] 98225-2077 [1] Bellingham [2] WA
StringRegUtils.SplitZipAnyStateAddress("98225-2077 Bellin gham");
// => [0] 98225-2077 [1] Bellin gham
StringRegUtils.SplitZipAnyStateAddress("98225-2077 New Delhi CA");
// => [0] 98225-2077 [1] New Delhi [2] CA
REGEX
If not, you can use my initial regex suggestion (I think a ? got lost):
^(?<zip>\d+-\d+)\s+(?<city>.*?)(?:\s+(?<state>[A-Z]{2}))?$
See the regex demo
Details:
^ - start of string
(?<zip>\d+-\d+) - 1+ digits followed with - followed with 1+ digits
\s+ - 1+ whitespaces
(?<city>.*?) - 0+ characters other than a newline as few as possible up to the
(?:\s+(?<state>[A-Z]{2}))? - optional (1 or 0) occurrences of
\s+ - 1+ whitespaces
(?<state>[A-Z]{2}) - exactly 2 uppercase ASCII letters
$ - end of string
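A short sketch of reading the named groups from C# follows, here with a multi-word city. The class name NamedGroupsExample is hypothetical; the pattern is the one described above.

```csharp
using System;
using System.Text.RegularExpressions;

class NamedGroupsExample
{
    static void Main()
    {
        var regex = new Regex(@"^(?<zip>\d+-\d+)\s+(?<city>.*?)(?:\s+(?<state>[A-Z]{2}))?$");
        Match m = regex.Match("98225-2077 New Delhi CA");

        if (m.Success)
        {
            Console.WriteLine(m.Groups["zip"].Value);   // 98225-2077
            Console.WriteLine(m.Groups["city"].Value);  // New Delhi

            // The state group is optional, so check Success before relying on it.
            Group state = m.Groups["state"];
            Console.WriteLine(state.Success ? state.Value : "(no state)"); // CA
        }
    }
}
```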
I have regex code to parse the following format out of my email body.
Building: {building number} // new line
Level: {level of building} // new line
Phase: {phase or room number} // new line
Request: {your request}
Example:
Building: 1
Level: 2
Phase: 20
Request: Get 4 chairs
Here's my regex:
string re1 = "(Building)"; // Word 1
string re2 = "(:)"; // Any Single Character 1
string re3 = "(\\s+)"; // White Space 1
string re4 = "(\\d)"; // Any Single Digit 1
string re5 = "(\\n)"; // White Space 2
string re6 = "(Level)"; // Word 2
string re7 = "(:)"; // Any Single Character 2
string re8 = "(\\s+)"; // White Space 3
string re9 = "(\\d)"; // Any Single Digit 2
string re10 = "(\\n)"; // White Space 4
string re11 = "(Phase)"; // Word 3
string re12 = "(:)"; // Any Single Character 3
string re13 = "(\\s+)"; // White Space 5
string re14 = "(\\d+)"; // Integer Number 1
string re15 = "(\\n)"; // White Space 6
string re16 = "(Request)"; // Word 4
string re17 = "(:)"; // Any Single Character 4
string re18 = "(\\s+)"; // White Space 7
string re19 = "(\\s+)"; // Match Any
Regex r = new Regex(re1 + re2 + re3 + re4 + re5 + re6 + re7 + re8 + re9 + re10 + re11 + re12 + re13 + re14 + re15 + re16 + re17 + re18 + re19, RegexOptions.Multiline);
Match m = r.Match(body);
if (m.Success) {
blah blah blah
} else {
blah blah
}
The problem is that even if the format of the email body is correct, it still does not match my regex, so nothing is stored in my database.
Is my regex correct?
First, there are some needless complications that prevent the pattern from matching. This answer sums up the suggestions made in the comments to improve your regexp.
Then, your regexp puts everything into groups because of the parentheses. This is not a problem in itself, but it is unnecessary. It is more useful to capture only the values passed in the mail, although this is optional. This would be the resulting regex:
Building:\s(\d)\s*Level:\s(\d)\s*Phase:\s(\d+)\s*Request:\s(.*)
You can try it at Regex101 and see the groups captured by the regular expression.
If you want to retrieve the values, you can use a Matcher.
The resulting Java code, with escaped characters, would be the following:
String regex = "Building:\\s(\\d)\\s*Level:\\s(\\d)\\s*Phase:\\s(\\d+)\\s*Request:\\s(.*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(body);
if (matcher.matches()) {
// There could be exceptions here at runtime if values in the mail
// are not numbers, handle it any way you want
Integer building = Integer.valueOf(matcher.group(1));
Integer level = Integer.valueOf(matcher.group(2));
Integer phase = Integer.valueOf(matcher.group(3));
String request = matcher.group(4);
}
I would STRONGLY recommend being very careful with the last input (the free-text request) to avoid any kind of SQL injection.
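Since the question's code is C# while the snippet above is Java, here is a hedged sketch of the same pattern with .NET's Regex class. The sample body and the class name EmailBodyParser are assumptions for illustration; a verbatim string (@"...") avoids the doubled backslashes.

```csharp
using System;
using System.Text.RegularExpressions;

class EmailBodyParser
{
    static void Main()
    {
        // Hypothetical sample body; real e-mail often uses \r\n line endings.
        string body = "Building: 1\r\nLevel: 2\r\nPhase: 20\r\nRequest: Get 4 chairs\r\n";

        var regex = new Regex(@"Building:\s(\d)\s*Level:\s(\d)\s*Phase:\s(\d+)\s*Request:\s(.*)");
        Match m = regex.Match(body);

        if (m.Success)
        {
            int building = int.Parse(m.Groups[1].Value);
            int level = int.Parse(m.Groups[2].Value);
            int phase = int.Parse(m.Groups[3].Value);

            // '.' does not match '\n' but does match '\r', so trim the tail.
            string request = m.Groups[4].Value.Trim();

            Console.WriteLine("{0} / {1} / {2} / {3}", building, level, phase, request);
            // prints "1 / 2 / 20 / Get 4 chairs"
        }
    }
}
```

Unlike Java's matches(), Regex.Match does not require the pattern to cover the whole input, so add ^ and $ anchors if extra text around the four fields should be rejected.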
I'm trying to split a string at the index of the first character that is neither a letter nor whitespace.
My Regex is really rusty and I'm trying to find some direction on getting this to work.
Example: "Payments Received by 08/14/2015 $0.00" is the string, and I'm able to find the first digit:
string alphabet = String.Empty;
string digit = String.Empty;
int digitStartIndex;
Match regexMatch = Regex.Match("Payments Received by 08/14/2015 $0.00", "\\d");
digitStartIndex = regexMatch.Index;
alphabet = line.Substring(0, digitStartIndex);
digit = line.Substring(digitStartIndex);
The problem arises with a string like "Amount This Period + $57.00":
I end up with "Amount This Period + $".
Using Regex in C#, how can I also check for specific non-alphanumeric characters such as $, + and -?
Edit: for the example I'm struggling with, the output I'm looking for (variables alphabet and digit) is:
"Amount This Period"
"+ $57.00"
To split a string the way you mention, use a regular expression to find the initial alpha/space chars and then the rest.
var s = "Payments Received by 08/14/2015 $0.00";
var re = new Regex("^([a-z ]+)(.+)", RegexOptions.IgnoreCase);
var m = re.Match(s);
if (m.Success)
{
Console.WriteLine(m.Groups[1]);
Console.WriteLine(m.Groups[2]);
}
The ^ is important to find characters at the start.
Ah, then you want this I think:
void Main()
{
var regex = new Regex(@"(.*?)([\$\+\-].*)");
var a = "Payments Received by 08/14/2015 $0.00";
var b = "Amount This Period + $57.00";
Console.WriteLine(regex.Match(a).Groups[1].Value);
Console.WriteLine(regex.Match(a).Groups[2].Value);
Console.WriteLine(regex.Match(b).Groups[1].Value);
Console.WriteLine(regex.Match(b).Groups[2].Value);
}
Outputs:
Payments Received by 08/14/2015
$0.00
Amount This Period
+ $57.00
I have a string that I need to split in an array of string. All the values are delimited by a pipe | and are separated by a comma.
|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||
The array should have the following 8 values after the split
111
2,2
room 1
13'2'' x 13'8''
""
""
""
""
by "" I simply mean an empty string. Please note that the value can also have a comma e.g 2,2. I think probably the best way to do this is through Regex.Split but I am not sure how to write the correct regular expression. Any suggestions or any better way of achieving this will be really appreciated.
You can use Match() to get the values instead of split() as long as the values between the pipe characters don't contain the pipe character itself:
(?<=\|)[^|]*(?=\|)
This will match zero or more non-pipe characters [^|]* which are preceded (?<=\|) and followed by a pipe (?=\|).
In C#:
var input = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
var results = Regex.Matches(input, @"(?<=\|)[^|]*(?=\|)");
foreach (Match match in results)
Console.WriteLine("Found '{0}' at position {1}",
match.Value, match.Index);
EDIT: Since a comma always separates the values that are between pipe characters |, the pattern also matches each separator comma (it too sits between two pipes). Those separator matches always land at the odd indexes of the result, so we can walk only the even indexes to get the true values, like this:
var input = "|room 1|,|,|,||,||,||,||,||,||";
var results = Regex.Matches(input, @"(?<=\|)[^|]*(?=\|)");
for (int i = 0; i < results.Count; i+=2)
Console.WriteLine("Found '{0}'", results[i].Value);
This can be also used in the first example above.
Assuming all fields are enclosed by a pipe and delimited by a comma you can use |,| as the delimiter, removing the leading and trailing |
Dim data = "|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||"
Dim delim = New String() {"|,|"}
Dim results = data.Substring(1, data.Length - 2).Split(delim, StringSplitOptions.None)
For Each s In results
Console.WriteLine(s)
Next
Output:
111
2,2
room 1
13'2'' x 13'8''
""
""
""
""
No need to use a regex, remove the pipes and split the string on the comma:
var input = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
var parts = input.Split(',').Select(x => x.Replace("|", string.Empty));
or
var parts = input.Replace("|", string.Empty).Split(',');
EDIT: OK, in that case, use a while loop to parse the string:
var values = new List<string>();
var str = @"|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
while (str.Length > 0)
{
var open = str.IndexOf('|');
var close = str.IndexOf('|', open + 1);
var value = str.Substring(open + 1, open + close - 1);
values.Add(value);
str = open + close < str.Length - 1
? str.Substring(open + close + 2)
: string.Empty;
}
You could try this:
string a = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
string[] result = a.Split('|').Where(s => !s.Contains(",")).Select(s => s.Replace("|",String.Empty)).ToArray();
Maybe this works for you:
var data = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
var resultArray = data.Replace("|", "").Split(',');
Regards.,
k
EDIT: You can use a placeholder character that does not occur in the data:
string data = "|111|,|2,2|,|,3|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
var resultArray = data.Replace("|,|", "¬").Replace("|", "").Split('¬');
Regards.,
k
Check, if this fits your needs...
var str = "|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
//Iterate through all your matches (we're looking for anything between | and |, non-greedy)
foreach (Match m in Regex.Matches(str, @"\|(.*?)\|"))
{
//Groups[0] is entire match, with || symbols, but [1] - something between ()
Console.WriteLine(m.Groups[1].Value);
}
That said, to match anything between | and |, you could, and probably should, use the negated class [^|] instead of the . character.
At least for the specified use case, it gives the result you're expecting.
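A minimal sketch of that negated-class variant, collecting the captured values into a list, might look like the following. The class name PipeValues is only for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class PipeValues
{
    static void Main()
    {
        var str = "|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";

        // [^|]* can never run past a closing pipe, so no lazy quantifier is needed.
        // Each match consumes both pipes, so the separator commas are skipped.
        var values = Regex.Matches(str, @"\|([^|]*)\|")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();

        Console.WriteLine(values.Count); // 8
        foreach (var value in values)
        {
            Console.WriteLine("'{0}'", value);
        }
    }
}
```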
I am very new to C#.
I want a program that, given input like this,
input : There are 4 numbers in this string 40, 30, and 10
produces this output:
there = string
are = string
4 = number
numbers = string
in = string
this = string
40 = number
, = symbol
30 = number
, = symbol
and = string
10 = number
I tried this:
{
class Program
{
static void Main(string[] args)
{
string input = "There are 4 numbers in this string 40, 30, and 10.";
// Split on one or more non-digit characters.
string[] numbers = Regex.Split(input, @"(\D+)(\s+)");
foreach (string value in numbers)
{
Console.WriteLine(value);
}
}
}
}
But the output is different from what I want. Please help me, I am stuck.
The regex parser has an if conditional and the ability to group items into named capture groups, both of which I will demonstrate.
Here is an example where the pattern looks for symbols first (only a comma here; add more symbols to the set [,]), then numbers, and drops the rest into words.
string text = @"There are 4 numbers in this string 40, 30, and 10";
string pattern = @"
(?([,]) # If a comma (or other then add it) is found its a symbol
(?<Symbol>[,]) # Then match the symbol
| # else its not a symbol
(?(\d+) # If a number
(?<Number>\d+) # Then match the numbers
| # else its not a number
(?<Word>[^\s]+) # So it must be a word.
)
)
";
// Ignore pattern white space allows us to comment the pattern only, does not affect
// the processing of the text!
Regex.Matches(text, pattern, RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt =>
{
if (mt.Groups["Symbol"].Success)
return "Symbol found: " + mt.Groups["Symbol"].Value;
if (mt.Groups["Number"].Success)
return "Number found: " + mt.Groups["Number"].Value;
return "Word found: " + mt.Groups["Word"].Value;
}
)
.ToList() // To show the result only remove
.ForEach(rs => Console.WriteLine (rs));
/* Result
Word found: There
Word found: are
Number found: 4
Word found: numbers
Word found: in
Word found: this
Word found: string
Number found: 40
Symbol found: ,
Number found: 30
Symbol found: ,
Word found: and
Number found: 10
*/
Once the regex has tokenized the text into matches, we use LINQ to extract those tokens by identifying which named capture group succeeded. In this example we take the successful capture group and project it into a string to print out for viewing.
I discuss the regex if conditional on my blog Regular Expressions and the If Conditional for more information.
You could split using this pattern: @"(,)\s?|\s"
This splits on a comma, but preserves it since it is within a group. The \s? serves to match an optional space but excludes it from the result. Without it, the split would include the space that occurred after a comma. Next, there's an alternation to split on whitespace in general.
To categorize the values, we can take the first character of the string and check for the type using the static Char methods.
string input = "There are 4 numbers in this string 40, 30, and 10";
var query = Regex.Split(input, @"(,)\s?|\s")
.Select(s => new
{
Value = s,
Type = Char.IsLetter(s[0]) ?
"String" : Char.IsDigit(s[0]) ?
"Number" : "Symbol"
});
foreach (var item in query)
{
Console.WriteLine("{0} : {1}", item.Value, item.Type);
}
To use the Regex.Matches method instead, this pattern can be used: @"\w+|,"
var query = Regex.Matches(input, @"\w+|,").Cast<Match>()
.Select(m => new
{
Value = m.Value,
Type = Char.IsLetter(m.Value[0]) ?
"String" : Char.IsDigit(m.Value[0]) ?
"Number" : "Symbol"
});
Well to match all numbers you could do:
[\d]+
For the strings:
[a-zA-Z]+
And for some of the symbols for example
[,.?\[\]\\\/;:!\*]+
You can very easily do this like so:
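string[] tokens = Regex.Split(input, " ");
foreach (string token in tokens)
{
    // Consecutive spaces produce empty tokens; skip them
    if (token.Length == 0) continue;

    int number;
    if (Int32.TryParse(token, out number))
    {
        Console.WriteLine(token + " = number");
    }
    else if (token.Length == 1 && !Char.IsLetter(token[0]) && !Char.IsDigit(token[0]))
    {
        Console.WriteLine(token + " = symbol");
    }
    else
    {
        // Note: a token such as "40," is not split further and is reported as a string
        Console.WriteLine(token + " = string");
    }
}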
I do not have an IDE handy to test that this compiles. Essentially what you are doing is splitting the input on spaces and then performing some comparisons to determine whether each token is a symbol, a string, or a number.
If you want to get the numbers
var reg = new Regex(@"\d+");
var matches = reg.Matches(input );
var numbers = matches
.Cast<Match>()
.Select(m=>Int32.Parse(m.Groups[0].Value));
To get your output:
var regSymbols = new Regex(@"(?<number>\d+)|(?<string>\w+)|(?<symbol>(,))");
var sMatches = regSymbols.Matches(input );
var symbols = sMatches
.Cast<Match>()
.Select(m=> new
{
Number = m.Groups["number"].Value,
String = m.Groups["string"].Value,
Symbol = m.Groups["symbol"].Value
})
.Select(
m => new
{
Match = !String.IsNullOrEmpty(m.Number) ?
m.Number : !String.IsNullOrEmpty(m.String)
? m.String : m.Symbol,
MatchType = !String.IsNullOrEmpty(m.Number) ?
"Number" : !String.IsNullOrEmpty(m.String)
? "String" : "Symbol"
}
);
edit
If there are more symbols than a comma, you can group them in a character class, like @Bogdan Emil Mariesan did, and the regex will be:
@"(?<number>\d+)|(?<string>\w+)|(?<symbol>[,.\?!])"
edit2
To get the strings with =
var outputLines = symbols.Select(m=>
String.Format("{0} = {1}", m.Match, m.MatchType));