Google-like Search query splitting Regex


I have a search query:
string input = "FirstName=\"xy z\" LastName=\"Huber\"";
I would like to use Regex to split it into a string array with the following tokens:
FirstName=\"xy z\"
LastName=\"Huber\"
As you can see, the tokens preserve the spaces within double quotes.
My regex:
("[^"]+"|\w+)\s*
It comes close, but it is not quite what I want and needs more fixing. It gets:
FirstName= \"xy z\" LastName = \"Huber\"

Altering I4V's answer to match the OP requirements
It seems the OP wants the strings FirstName=\"xy z\" and LastName=\"Huber\" rather than key/value pairs, so the solution is simply to use the matches from I4V's regex.
string input = "FirstName=\"xy z\" LastName=\"Huber\"";
var matches = Regex.Matches(input, @"(\w+?)=\""(.+?)\""")
.OfType<Match>()
.Select(x => x.Value)
.ToArray();
This will give you a string array of the values.
EDIT: For the specific case asked by the OP:
string input = "FirstName=\"xy z\" LastName=\"Huber\"";
var matches = Regex.Matches(input, @"[^\s\""]+(?:\"".*?\"")?")
.OfType<Match>()
.Select(x => x.Value)
.ToArray();
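To check the EDIT's pattern end to end, here is a minimal self-contained console sketch. The class and method names (QueryTokenizer, Tokenize) are hypothetical, and the pattern drops the \" escapes from the original, since a double quote needs no escaping in a .NET regex; it should match the same tokens.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QueryTokenizer
{
    // A run of characters that are neither whitespace nor quotes,
    // optionally followed by a complete "quoted" section.
    // Equivalent to the EDIT's pattern without the \" escapes.
    private static readonly Regex TokenPattern = new Regex(@"[^\s""]+(?:"".*?"")?");

    public static string[] Tokenize(string input) =>
        TokenPattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
}

public static class Program
{
    public static void Main()
    {
        string input = "FirstName=\"xy z\" LastName=\"Huber\"";
        foreach (string token in QueryTokenizer.Tokenize(input))
        {
            Console.WriteLine(token); // FirstName="xy z", then LastName="Huber"
        }
    }
}
```

One limitation worth knowing: the quoted part is only kept when it directly follows a non-space run, so a standalone phrase such as "xy z" on its own would be split into xy and z.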

Related

Split on numeric to letters excluding comma

I have a string containing "0,35mA". I currently have the code below, which splits "0,35mA" into
"0"
","
"35"
"mA"
List<string> splittedString = new List<string>();
foreach (string strItem in strList) // strList: the list of input strings
{
    splittedString.AddRange(Regex.Matches(strItem, @"\D+|\d+")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToList());
}
What I want is for the string to be split into
"0,35"
"mA"
How do I achieve this?

It looks like you want to tokenize the string into numbers and everything else.
A better regex approach is to split with a number matching pattern while wrapping the whole pattern into a capturing group so as to also get the matching parts into the resulting array.
Since you have , as a decimal separator, you may use
var results = Regex.Split(s, @"([-+]?[0-9]*,?[0-9]+(?:[eE][-+]?[0-9]+)?)")
.Where(x => !string.IsNullOrEmpty(x))
.ToList();
See the regex demo.
The regex is based on the pattern described in Matching Floating Point Numbers with a Regular Expression.
The .Where(x => !string.IsNullOrEmpty(x)) is necessary to get rid of empty items (if any).
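A runnable sketch of the Regex.Split approach, assuming ',' is the only decimal separator (as in the question). NumberSplitter is a hypothetical name. Because the pattern is wrapped in a capturing group, Regex.Split returns the matched numbers alongside the text between them, and the empty strings it produces (for example, before a leading number) are filtered out.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberSplitter
{
    // Number pattern from the answer, with ',' as the decimal separator.
    // The outer capturing group makes Regex.Split keep each matched number
    // in the result instead of discarding it as a delimiter.
    private const string NumberPattern = @"([-+]?[0-9]*,?[0-9]+(?:[eE][-+]?[0-9]+)?)";

    public static string[] Split(string s) =>
        Regex.Split(s, NumberPattern)
            .Where(x => !string.IsNullOrEmpty(x)) // drop empty pieces, e.g. before a leading number
            .ToArray();
}

public static class Program
{
    public static void Main()
    {
        // Prints: 0,35 | mA
        Console.WriteLine(string.Join(" | ", NumberSplitter.Split("0,35mA")));
    }
}
```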

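I assume that all your strings have the same format (four digit or comma characters followed by two word characters).
If so, try using this regex:
string regex = "([\\d,]{4})|\\w{2}";
It should work as long as every string follows that format.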

var st = "0,35mA";
var li = Regex.Matches(st, @"([,\d]+)([a-zA-Z]+)").Cast<Match>().ToList();
foreach (var t in li)
{
    Console.WriteLine($"Group 1 {t.Groups[1]}");
    Console.WriteLine($"Group 2 {t.Groups[2]}");
}
Output:
Group 1 0,35
Group 2 mA

Way to not include something in regex capture group

Given:
var input = "test <123>";
Regex.Matches(input, "<.*?>");
Result:
<123>
This gives me the result I want, but it includes the angle brackets. That is OK because I can easily do a search and replace, but I was wondering whether there is a way to leave them out in the expression itself.

You need to use a capturing group:
var input = "test <123>";
var results = Regex.Matches(input, "<(.*?)>")
.Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToList();
m.Groups[1].Value returns the value of capturing group #1.
A better, more efficient regex is <([^>]*)>: it matches <, then captures zero or more characters other than > into Group 1, and then matches >. See the regex demo.
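A minimal sketch of the capturing-group approach using the more efficient <([^>]*)> pattern; AngleBracketExtractor is a hypothetical helper name. Empty brackets (<>) yield an empty string rather than being skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AngleBracketExtractor
{
    // <([^>]*)> : match '<', capture any run of non-'>' characters into
    // Group 1, then match '>'. Returning Group 1 leaves the brackets out.
    public static List<string> Extract(string input) =>
        Regex.Matches(input, "<([^>]*)>")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
}

public static class Program
{
    public static void Main()
    {
        foreach (string value in AngleBracketExtractor.Extract("test <123>"))
        {
            Console.WriteLine(value); // 123
        }
    }
}
```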

c# regex replace everything including a word base64, with nothing and keeping rest of the string

I want to take a string, find "base64,", and remove it along with everything before it.
Example:
"asdfjljlkjaldf_base64,234u0909230948098234082304802384023094"
Notice "base64," ... I want ONLY everything after "base64,"
Desired: "234u0909230948098234082304802384023094"
I was looking at this code:
string test = "hello, base64, matching";
string regexStrTest;
regexStrTest = @"test\s\w+";
MatchCollection m1 = Regex.Matches(test, regexStrTest);
//gets the second matched value
string value = m1[1].Value;
but that is not quite what I want.

Why use regular expressions? IndexOf + Substring seems to be enough:
string source = "asdfjljlkjaldf_base64,234u0909230948098234082304802384023094";
string tag = "base64,";
string result = source.Substring(source.IndexOf(tag) + tag.Length);
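One caveat with the IndexOf + Substring approach: if the tag is missing, IndexOf returns -1, so the Substring call starts at the wrong position (or throws for very short inputs). A guarded sketch follows; the helper name Base64Prefix.After and the choice to return null are assumptions, not part of the answer.

```csharp
using System;

public static class Base64Prefix
{
    // Returns the text after the first occurrence of tag, or null when the
    // tag is absent. Ordinal comparison avoids culture-sensitive matching.
    public static string After(string source, string tag)
    {
        int index = source.IndexOf(tag, StringComparison.Ordinal);
        if (index < 0)
        {
            return null; // tag not found: do not guess a substring
        }
        return source.Substring(index + tag.Length);
    }
}

public static class Program
{
    public static void Main()
    {
        string source = "asdfjljlkjaldf_base64,234u0909230948098234082304802384023094";
        Console.WriteLine(Base64Prefix.After(source, "base64,") ?? "(tag not found)");
    }
}
```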

Your regex matches test, one whitespace character, and one or more word characters; the input string simply does not contain that sequence.
You may use
var results = Regex.Matches(s, @"base64,(\w+)")
.Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToList();
See the regex demo.
The pattern matches the base64, substring and then captures one or more word characters into Group 1 with the (\w+) pattern. The captured value is stored in match.Groups[1].Value, which is exactly what .Select(m => m.Groups[1].Value) retrieves.

Some of the other answers are good. Here is a very simple regex:
string yourData = "asdfjljlkjaldf_base64,234u0909230948098234082304802384023094";
var newString = Regex.Replace(yourData, "^.*base64,", "");
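A quick sketch of how the ^.*base64, replacement behaves in two edge cases (Base64Strip is a hypothetical name): because .* is greedy, everything up to the last "base64," is removed, and an input without "base64," comes back unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Base64Strip
{
    // ^.*base64, : from the start of the string, the greedy .* backtracks to the
    // LAST "base64," so everything up to and including it is removed.
    // Without RegexOptions.Singleline, '.' does not match '\n'.
    public static string StripPrefix(string data) =>
        Regex.Replace(data, "^.*base64,", "");
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(Base64Strip.StripPrefix("asdfjljlkjaldf_base64,234u0909230948098234082304802384023094"));
        Console.WriteLine(Base64Strip.StripPrefix("a_base64,b_base64,c")); // c
        Console.WriteLine(Base64Strip.StripPrefix("nothing here"));        // unchanged
    }
}
```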

Parse a string, keeping all of the matches in between given strings (multi-character delimiters)

This is very similar to the question "How do I extract text that lies between parentheses (round brackets)?", where I found this Regex code:
var matches = Regex.Matches("User name [[sales]] and [[anotherthing]]", @"\[\[([^)]*)\]\]");
But that doesn't seem to work with multi-character delimiters. This might not even be the correct approach, but I am sure I am not the first to try this and I am drawing a blank here. Anyone?

Your @"\[\[([^)]*)\]\]" pattern matches two consecutive [ characters, followed by zero or more characters other than ), and then two ] characters. That means if you have a ) inside [[...]], there won't be a match.
To deal with multicharacter-delimited substrings, you can use 2 things: either lazy dot matching, or unrolled patterns.
Note: to get multiple matches, use Regex.Matches as I wrote in my other answer.
1. Lazy dot solution:
var s = "User name [[sales]] and [[anotherthing]]";
var matches = Regex.Matches(s, @"\[{2}(.*?)]{2}", RegexOptions.Singleline)
.Cast<Match>()
.Select(p => p.Groups[1].Value)
.ToList();
See the regex demo. The RegexOptions.Singleline modifier is necessary for . to match newline characters.
2. Unrolled regex solution:
var s = "User name [[sales]] and [[anotherthing]]";
var matches = Regex.Matches(s, @"\[{2}([^]]*(?:](?!])[^]]*)*)]{2}")
.Cast<Match>()
.Select(p => p.Groups[1].Value)
.ToList();
With this pattern, RegexOptions.Singleline is not necessary, and it is much more efficient.
See the regex demo.
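A self-contained sketch that runs both patterns from this answer on the same inputs; DoubleBracketParser and its method names are hypothetical. It also shows that a ) or a single ] inside [[...]] is handled by both patterns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DoubleBracketParser
{
    // 1. Lazy dot: [[, then as few characters as possible, then ]].
    //    Singleline lets '.' match newlines too.
    private static readonly Regex Lazy =
        new Regex(@"\[{2}(.*?)]{2}", RegexOptions.Singleline);

    // 2. Unrolled: [[, then runs of non-']' characters, where any single ']'
    //    must not be followed by another ']', then ]].
    private static readonly Regex Unrolled =
        new Regex(@"\[{2}([^]]*(?:](?!])[^]]*)*)]{2}");

    private static List<string> Group1(Regex regex, string input) =>
        regex.Matches(input)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();

    public static List<string> ParseLazy(string input) => Group1(Lazy, input);

    public static List<string> ParseUnrolled(string input) => Group1(Unrolled, input);
}

public static class Program
{
    public static void Main()
    {
        var s = "User name [[sales]] and [[anotherthing]]";
        Console.WriteLine(string.Join(", ", DoubleBracketParser.ParseLazy(s)));     // sales, anotherthing
        Console.WriteLine(string.Join(", ", DoubleBracketParser.ParseUnrolled(s))); // sales, anotherthing
    }
}
```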

Use Regex.Matches:
Searches the specified input string for all occurrences of a specified regular expression.
Sample code:
var matches = Regex.Matches("User name (sales) and (anotherthing)", @"\(([^)]*)\)")
.Cast<Match>()
.Select(p => p.Groups[1].Value)
.ToList();

one long string => array/list<string> from text within quotes

3>>asdf3424"THIS TEXT".,.<<<>>3asfdf"THISTOO"6575tsdfbxbxcv"ANDTHIS",,p-01fa
I want to convert this to an array or list of { "THIS TEXT", "THISTOO", "ANDTHIS" }.
Does anyone have an idea on how to efficiently do this?

var result = Regex.Matches(input, @"\"".+?\""")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
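A runnable sketch of the pattern above on the question's input; QuotedText is a hypothetical name. Note that m.Value keeps the surrounding quotes. The second method is a variant that is not part of the original answer: it captures the inner text in Group 1 (as in the capturing-group answers above) so the quotes are dropped.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedText
{
    // Pattern from the answer: a quote, one or more characters (lazily), a quote.
    // m.Value includes the surrounding quotes.
    public static string[] WithQuotes(string input) =>
        Regex.Matches(input, @""".+?""")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();

    // Variant (not in the original answer): capture the inner text in Group 1
    // so only the text between the quotes is returned.
    public static string[] WithoutQuotes(string input) =>
        Regex.Matches(input, @"""(.+?)""")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToArray();
}

public static class Program
{
    public static void Main()
    {
        const string input = "3>>asdf3424\"THIS TEXT\".,.<<<>>3asfdf\"THISTOO\"6575tsdfbxbxcv\"ANDTHIS\",,p-01fa";
        Console.WriteLine(string.Join(" | ", QuotedText.WithQuotes(input)));
        Console.WriteLine(string.Join(" | ", QuotedText.WithoutQuotes(input)));
    }
}
```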

If you read one character at a time and look for a quotation mark, then read the following characters into a char array until you find another quotation mark, and then continue looking for the next one, you end up with a list of char arrays that are easily converted to strings.
It should just be a simple while loop that runs while there are still characters to be read.
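The character-by-character approach described above could look like this sketch. It uses a StringBuilder instead of char arrays (a substitution, but the idea is the same), QuoteScanner is a hypothetical name, and an unterminated final quote is simply dropped, which is an assumption since the answer does not cover that case.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuoteScanner
{
    // Walks the string once. Outside quotes, it looks for an opening '"';
    // inside quotes, it collects characters until the closing '"'.
    // Text after an unterminated opening quote is discarded (assumption).
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        int i = 0;

        while (i < input.Length) // while there are still characters to be read
        {
            char c = input[i];
            if (c == '"')
            {
                if (inQuotes)
                {
                    // Closing quote: store the collected text and reset.
                    results.Add(current.ToString());
                    current.Clear();
                }
                inQuotes = !inQuotes;
            }
            else if (inQuotes)
            {
                current.Append(c);
            }
            i++;
        }

        return results;
    }
}

public static class Program
{
    public static void Main()
    {
        const string input = "3>>asdf3424\"THIS TEXT\".,.<<<>>3asfdf\"THISTOO\"6575tsdfbxbxcv\"ANDTHIS\",,p-01fa";
        Console.WriteLine(string.Join(" | ", QuoteScanner.Extract(input))); // THIS TEXT | THISTOO | ANDTHIS
    }
}
```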

If you have a big string like this:
string str = "hello,hi,bye";
you can split it on commas like this:
string[] breakups = str.Split(new[] {',' });
