Regex for Multiple Key:Value Search Terms - c#

I need a .NET (C#) regular expression for parsing a string of search terms. The terms are key:value pairs and are delimited by spaces. The thing that's throwing me for a loop is the fact that the key:value pairs may have spaces in the value.
Here's an example string:
f:john l:smith c:san francisco st:ca
I expect to get back the following terms:
f:john
l:smith
c:san francisco
st:ca
Any help? Thanks.

I think that this one will work. It uses a negative lookahead to make sure that the last word of the match isn't followed by a :, so the match stops before the next key.
\b\w+:[\w\s]+\b(?!:)
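A minimal C# sketch of applying this pattern with Regex.Matches. A match can end on the word boundary just before the next key, so a match such as "f:john " may carry a trailing space; the sketch trims each match. SearchTermSplitter and Split are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the lookahead answer above.
public static class SearchTermSplitter
{
    // \b\w+:     a key followed by a colon
    // [\w\s]+\b  a value made of word characters and spaces, ending on a word boundary
    // (?!:)      the match may not stop right before a colon (i.e. inside the next key)
    private static readonly Regex TermPattern = new Regex(@"\b\w+:[\w\s]+\b(?!:)");

    public static List<string> Split(string input)
    {
        var terms = new List<string>();
        foreach (Match m in TermPattern.Matches(input))
        {
            // The boundary before the next key comes after the separating space, so trim it.
            terms.Add(m.Value.Trim());
        }
        return terms;
    }
}
```

For the sample string, Split returns f:john, l:smith, c:san francisco and st:ca.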

This is my try:
([\w]+)\:([\w\s]+)\s(?=([\w]+)\:)?
2 caveats:
Each match will have three captures in it: the key, the value, and the following key (captured inside the lookahead). Ignore the last one.
The input text must have a space at the end.

Related

Regex lookbehind only when contains colon

Currently I use the C# Regex.IsMatch function to match a key:value format.
I have some code that checks whether a string has the format key:value (for example, H:15).
The Regex pattern that I am using today is: [D,H,M,S]:[1-9]+\d?
I want to add an option for a default key: when the input is 15, I would like to treat it as H:15.
So I need to improve my Regex to support either key:value or only a value (without a colon): H:15 is good, and 15 is also good.
I tried using alternation (|), something like: ([D,H,M,S]:[1-9]+\d?)|([1-9]+\d?)
But now it also matches things like :1 and H:01, which are invalid input for me.
I also tried using a lookbehind, without success.
Any help would be greatly appreciated,
Nadav.
This should do the trick:
\b(?:[DHMS]:|(?<!:))[1-9][0-9]*\b
So, either match [DHMS]: or a word boundary not preceded by :.
Also, [1-9]+\d? looks very suspicious to me (it only allows a 0 as the final digit, so a number such as 105 can never be matched in full), so I replaced it with [1-9][0-9]*. Note that in .NET \d is not equivalent to [0-9] because it also matches other Unicode digits.
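A small C# sketch that wraps this pattern in Regex.IsMatch to check the inputs from the question. DurationCheck and IsValid are hypothetical names. The pattern is unanchored, so IsMatch also returns true when a valid token appears anywhere inside a longer string (for example "x 15"); wrap it in ^...$ if the whole input must be validated.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the pattern from the answer above.
public static class DurationCheck
{
    // After a word boundary, either "D:", "H:", "M:" or "S:",
    // or a position that is not preceded by a colon.
    // The number must start with 1-9; [0-9] accepts only ASCII digits, unlike \d.
    private static readonly Regex Pattern = new Regex(@"\b(?:[DHMS]:|(?<!:))[1-9][0-9]*\b");

    public static bool IsValid(string input) => Pattern.IsMatch(input);
}
```

With this pattern, H:15 and 15 are accepted, while :1 and H:01 are rejected.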
Looks like Avinash just beat me to it, but I added word boundaries with this expression, which works well in tests.
\b(?<=[DHMS]:)?[1-9]\d*\b
It seems like you want something like this:
@"^(?:[DHMS]:)?[1-9]\d*$"
[DHMS] matches a single character from the given list. The ? after the non-capturing group makes the key part optional. \d* matches zero or more digit characters.
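The question also asks to treat a bare 15 as H:15. A minimal C# sketch, assuming the anchored pattern above with named groups added so the key can be defaulted; DurationParser and TryNormalize are hypothetical names, and "H" as the default key comes from the question.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: the anchored pattern from the answer, with named groups added,
// plus the asker's default key "H" for input that has no key.
public static class DurationParser
{
    private static readonly Regex Pattern = new Regex(@"^(?:(?<key>[DHMS]):)?(?<value>[1-9]\d*)$");

    // Returns true and sets normalized to e.g. "H:15" for "15";
    // returns false (normalized = "") for input such as ":1" or "H:01".
    public static bool TryNormalize(string input, out string normalized)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
        {
            normalized = string.Empty;
            return false;
        }
        Group key = m.Groups["key"];
        normalized = (key.Success ? key.Value : "H") + ":" + m.Groups["value"].Value;
        return true;
    }
}
```

One detail worth knowing: in .NET, $ also matches before a final \n, so "15\n" would be accepted; use \z instead of $ if a trailing newline must be rejected.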

Regular expression for tokenizing connection string

I have 2 different types of connection strings (for legacy reasons that I can't fix everywhere, which are irrelevant here). I need to break them up into key/value pairs. Here are the sample connection strings:
1. Server=SomeServer;Database=SomeDatabase;Something=Hello
2. Server=SomeServer,Database=SomeDatabase;Something=Hello
3. Server=SomeServer,1111;Database=SomeDatabase;Something=Hello
For the first 2 cases, I can use the regex:
(?<Key>[0-9A-z\s]+)=(?<Val>[0-9A-z\s,]+?[0-9A-z\s]+)
For the third one, I can use the regex:
(?<Key>[0-9A-z\s]+)=(?<Val>[0-9A-z\s]+?[0-9A-z\s,]+)
How do I turn this into one regex that would work for all cases?
You can just use the regex below:
(?<Key>[^=;]+)=(?<Val>[^;]+)
It uses a negated character class: [^;]+ matches everything up to the first ; it encounters.
(In the demo I removed the named groups for testing; the pattern works with them in C#.)
Here is my suggestion.
(?<key>[^=;,]+)=(?<val>[^;,]+(,\d+)?)
The semicolon is a delimiter, as is the comma unless it is immediately followed by digits.
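A minimal C# sketch that applies this comma-aware pattern with Regex.Matches and collects the pairs into a dictionary. LegacyConnectionString and Parse are hypothetical names; letting later duplicate keys overwrite earlier ones is an assumption.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper applying the pattern from the answer above.
public static class LegacyConnectionString
{
    // key: anything except '=', ';' and ','
    // val: anything except ';' and ',', optionally followed by ",<digits>" (e.g. a port)
    private static readonly Regex Pair = new Regex(@"(?<key>[^=;,]+)=(?<val>[^;,]+(,\d+)?)");

    public static Dictionary<string, string> Parse(string connectionString)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Pair.Matches(connectionString))
        {
            // Assumption: a later duplicate key overwrites an earlier one.
            result[m.Groups["key"].Value] = m.Groups["val"].Value;
        }
        return result;
    }
}
```

For sample 3, Server maps to "SomeServer,1111"; for sample 2, the comma before Database ends the Server value because it is not followed by digits.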

how to match multiple words in string using regex?

I am trying to match 3 words that can appear anywhere in the string:
Win
Enter
Now
All 3 words must exist in the string for it to return a match. But I am having trouble getting a match even when all 3 words do exist.
Below is the regex I am using: http://regexr.com/39b83
^(?=.*?win)(?=.*?(enter))(?=.*?(now)).*
The regex works when all three words are on the same line, but when they are spread across different lines of the string, it fails to match.
Any direction or help is appreciated.
Since you don't want to match words like center (with the word "enter"), I would use:
/(\benter\b)|(\bwin\b)|(\bnow\b)/
C# supports the (?s) inline option (Singleline, often called DOTALL), so you could try the regex below:
(?i)(?s)win.*?enter.*?now
How about...
/(win|enter|now)/gi
It sounds like you want to match the lines on which these words appear, across up to three lines. That’s not really easy, but:
/^.*win.*(?:\s+.*)?enter.*(?:\s+.*)?now.*|^.*win.*(?:\s+.*)?now.*(?:\s+.*)?enter.*|^.*enter.*(?:\s+.*)?win.*(?:\s+.*)?now.*|^.*enter.*(?:\s+.*)?now.*(?:\s+.*)?win.*|^.*now.*(?:\s+.*)?win.*(?:\s+.*)?enter.*|^.*now.*(?:\s+.*)?enter.*(?:\s+.*)?win.*/igm
should do it.
It's because the dot doesn't match the newline character. There are two ways to change this. The first is to use the s modifier (which allows the dot to match newlines):
(?s)^(?=.*\bwin\b)(?=.*\benter\b)(?=.*\bnow\b).*
But this modifier isn't always available (for example, JavaScript did not support it before ES2018). The second way is to replace the dot with [\s\S] (a character class that matches any character, including newlines):
^(?=[\s\S]*\bwin\b)(?=[\s\S]*\benter\b)(?=[\s\S]*\bnow\b)[\s\S]+
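In C#, the same idea can be sketched with RegexOptions.Singleline, which has the same effect as the inline (?s). KeywordCheck and ContainsAll are hypothetical names, and RegexOptions.IgnoreCase is an assumption, since the question lists the words capitalized (Win, Enter, Now) while the patterns use lowercase. The trailing .* is dropped because IsMatch only needs to know whether the lookaheads succeed.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper based on the lookahead answer above.
public static class KeywordCheck
{
    // Each lookahead scans the whole input from the start, so the words may appear in any order.
    // Singleline lets '.' match '\n'; IgnoreCase is an assumption (see the lead-in).
    // \b keeps "enter" from matching inside "center".
    private static readonly Regex AllThree = new Regex(
        @"^(?=.*\bwin\b)(?=.*\benter\b)(?=.*\bnow\b)",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public static bool ContainsAll(string text) => AllThree.IsMatch(text);
}
```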

Regex for key value pairs

I am not great with regular expressions, and I need to parse key/value pairs out of a string. An example of the string would be:
Event Name CallingNumber:+15555555555 CallID:12345 CallingName:Doe, John CallingTime:12-26-2013 14:27:41.645497
The result I'm looking for would be something like this:
CallingNumber=+15555555555
CallID=12345
CallingName=Doe, John
CallingTime=12-26-2013 14:27:41.645497
The key/value pairs are delimited by a space, but the value is allowed to have a space in it (ex: Doe, John). It would be nice if the values were surrounded by quotes or something, but they are not. Essentially I'm trying to match a word without a space followed by a colon and then any character after the colon until it reaches another word without a space followed by a colon.
A perfect match is impossible: the keys and values are separated with :, but your date/time value contains : as well, and a regex can't easily distinguish the two.
Still, this is what I came up with:
(.+?):(.+?)(?=(?:[^\s]+:)|(?:$))
Again, because of the date/time, this won't work perfectly.
Here's a fiddle to demonstrate: http://www.rexfiddle.net/Wm3NiK0
Edit: If your "keys" are only letters (not numbers), which avoids the time/date problem, then this will work:
([A-Za-z]+?):(.+?)\s?(?=(?:[A-Za-z]+:)|(?:$))
Here's another fiddle to demonstrate this: http://www.rexfiddle.net/sGQs7YV
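A minimal C# sketch applying the letters-only pattern with Regex.Matches and collecting the key/value pairs. CallRecordParser and Parse are hypothetical names; the pattern is the one from the edit above, unchanged.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper applying the letters-only pattern from the answer above.
public static class CallRecordParser
{
    // ([A-Za-z]+?):  key made of letters only, so "14:" inside the time is never a key
    // (.+?)           value: as few characters as possible...
    // \s?(?=...)      ...up to an optional space followed by the next "Key:" or the end of input
    private static readonly Regex Pair = new Regex(@"([A-Za-z]+?):(.+?)\s?(?=(?:[A-Za-z]+:)|(?:$))");

    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in Pair.Matches(input))
        {
            pairs.Add(new KeyValuePair<string, string>(m.Groups[1].Value, m.Groups[2].Value));
        }
        return pairs;
    }
}
```

For the example string, this yields the four pairs from the question; "Event Name" is skipped because neither word is followed by a colon.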
You can apply the regex repeatedly, with a (.*) to return the "yet to be parsed" remainder
In pseudocode form, this might be:
match string to "^(([^:]*\s)*[^:]*)\s+(.*)$"
should grab "Event Name" and leave the rest as $3
loop:
keep only $3 as new base string
match new base string to "^(\w+)[:](.+?)\s+(\w+[:].*)$"
key = $1, value = $2, new remainder = $3
repeat until no $1, $2 values are returned
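The pseudocode above can be sketched in C# roughly as follows. This is a hypothetical rendering (RemainderParser and Parse are made-up names) that deliberately differs in two places: keys are matched with [A-Za-z]+ instead of \w+, because with \w+ the loop would take "14:" inside the time as a key, and an extra pattern handles the last pair, which has no key after it and therefore never matches the loop pattern.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical C# rendering of the pseudocode above (see the lead-in for the two changes).
public static class RemainderParser
{
    // Leading text without colons ("Event Name"), then the rest of the input as group 3.
    private static readonly Regex Header = new Regex(@"^(([^:]*\s)*[^:]*)\s+(.*)$");
    // One pair, then the yet-to-be-parsed remainder starting at the next key.
    private static readonly Regex PairThenRest = new Regex(@"^([A-Za-z]+):(.+?)\s+([A-Za-z]+:.*)$");
    // The last pair on its own.
    private static readonly Regex LastPair = new Regex(@"^([A-Za-z]+):(.*)$");

    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        Match header = Header.Match(input);
        // If there is no leading text, start parsing at the beginning of the input.
        string rest = header.Success ? header.Groups[3].Value : input;

        while (true)
        {
            Match m = PairThenRest.Match(rest);
            if (!m.Success)
            {
                break;
            }
            pairs.Add(new KeyValuePair<string, string>(m.Groups[1].Value, m.Groups[2].Value));
            rest = m.Groups[3].Value; // keep only the remainder as the new base string
        }

        Match last = LastPair.Match(rest);
        if (last.Success)
        {
            pairs.Add(new KeyValuePair<string, string>(last.Groups[1].Value, last.Groups[2].Value));
        }
        return pairs;
    }
}
```

The header pattern keeps the nested quantifier from the pseudocode, so on long inputs that do not match it may backtrack heavily.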
"I'm suing .NET (c#)," good idea! :) Microsoft needs to be put in its place!
Do you have a fixed number of fields, or could they vary in number? Do you expect the same fields each time? In the same order? If a fixed number, you could hard code the number of fields in the regexp, but I still think that trying to do it with just one regexp is asking for a headache. Use some scripting code and break it down piece by piece, first of all splitting it on :\s+. The last word in a group is then stripped off as the name of the next group, and the remainder is the value of the previous group. The first and last groups have to have some special treatment. I think that would be a lot easier and more understandable than trying to do it in one ugly regexp. As a bonus, any number of fields in any order could be handled.
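A minimal sketch of this split-based approach in C#. Assumptions: SplitParser, Parse and LastWord are hypothetical names; the input is split with Regex.Split only on colons that directly follow a letter (rather than on every colon), so the time 14:27:41 stays intact; and the text before the first key ("Event Name") is discarded.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch of the split-based approach described above.
public static class SplitParser
{
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        // Split only on colons preceded by a letter (assumption), giving
        // "Event Name CallingNumber", "+15555555555 CallID", ..., "12-26-2013 14:27:41.645497"
        string[] groups = Regex.Split(input, @"(?<=[A-Za-z]):");
        var pairs = new List<KeyValuePair<string, string>>();
        if (groups.Length < 2)
        {
            return pairs; // no key/value pair found
        }

        // First group: its last word is the first key; the text before it is ignored.
        string key = LastWord(groups[0], out _);

        // Middle groups: the last word is the next key, the remainder is the previous value.
        for (int i = 1; i < groups.Length - 1; i++)
        {
            string nextKey = LastWord(groups[i], out string value);
            pairs.Add(new KeyValuePair<string, string>(key, value));
            key = nextKey;
        }

        // Last group: the whole text is the value of the last key.
        pairs.Add(new KeyValuePair<string, string>(key, groups[groups.Length - 1]));
        return pairs;
    }

    // Splits "Doe, John CallingTime" into "CallingTime" (returned) and "Doe, John" (rest).
    private static string LastWord(string group, out string rest)
    {
        string trimmed = group.TrimEnd();
        int space = trimmed.LastIndexOf(' ');
        rest = space < 0 ? string.Empty : trimmed.Substring(0, space).TrimEnd();
        return trimmed.Substring(space + 1);
    }
}
```

As the answer notes, this handles any number of fields in any order, as long as keys are single words made of letters.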

The simple regex expression [0-9]* does not work with {e|24} in C#

I've been working on my own simple Wikipedia parser in C#. So far I'm not getting very far because of this problem.
I extract the string {e|24} but it could contain any number. All I want to do is simply extract the number from this string.
This is the code I am using currently:
Match num = Regex.Match(exp.Value, "[0-9]*");
Console.WriteLine(num.Value);
However, num.Value is empty.
Can someone please explain why this is not working and how I can fix it?
You would want to use [0-9]+ to require at least one digit. [0-9]* can match zero digits, so Regex.Match succeeds immediately with an empty match at the start of the string (before the {), which is why num.Value is blank.
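A short C# sketch of the corrected call. TemplateNumber and Extract are hypothetical names; the only change from the question's code is the + quantifier.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper showing the fix suggested above.
public static class TemplateNumber
{
    // [0-9]+ requires at least one digit, so the engine moves past "{e|"
    // and returns the first run of digits instead of an empty match at index 0.
    // If the text contains no digits, Match fails and Value is an empty string.
    public static string Extract(string text) => Regex.Match(text, "[0-9]+").Value;
}
```

For example, Extract("{e|24}") returns "24".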
My suggestion, make the regexp: \d+
Works. Simpler. Shorter, uses no groups or ranges.
