How to Split OData multi-level expand query string in C#? - c#

I have a URL: Expand=User($select=Firstname,Lastname),Organisation,Contract($Expand=MyOrganisation($select=Name,Status),Organisation),List
I need to split this string in the below format:
User($select=Firstname,Lastname)
Organisation
Contract($Expand=MyOrganisation($select=Name,Status),Organisation)
List
How to achieve this functionality in C#?

You need to split the string while keeping track of the parentheses at the same time. That is not possible with a plain classic regex. See this post.
However, the split can be done with a more advanced regex: .NET supports balancing groups, which let you keep track of the parentheses. This answer was quite helpful in coming up with a solution. For readability, I have split the regex into multiple lines and used RegexOptions.IgnorePatternWhitespace:
string url = "User($select=Firstname,Lastname),Organisation,Contract($Expand=MyOrganisation($select=Name,Status),Organisation),List";
Regex rgx = new Regex(
@"(.+?)
(
(
\(
(?:
[^()]
|
(?'open'\()
|
(?'close-open'\))
)+
(?(open)(?!))
\)
)
,
|
,
|
\b$
)",
RegexOptions.IgnorePatternWhitespace);
foreach (Match match in rgx.Matches(url))
{
Console.WriteLine($"{match.Groups[1]} {match.Groups[3]}");
}
The field will be available as match.Groups[1] and the parameters, if any, will be available as match.Groups[3] (this will be an empty string if there are no parameters). You can access match.Groups[0] to get the entire match.
Regex Breakdown
Regex = Description
(.+?) = Non-greedily match one or more characters
\( and \) = Match an actual ( and )
[^()] = Match any character that is not a ( or )
(?'open'\() = Create a named group with the name "open" and match a ( character
(?'close-open'\)) = Match a ) character, store the text between the most recent "open" capture and this ) in the group "close", and delete that "open" capture
(?(open)(?!)) = Conditional: if the "open" group still holds a capture (an unmatched parenthesis), force the match to fail
(?:[^()]|(?'open'\()|(?'close-open'\)))+ = Create a non-capturing group and match one or more characters that match one of the expressions between |
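As an alternative to the balancing-group regex, the same split can be sketched without any regex by counting parenthesis depth and cutting only at commas where the depth is zero. This is a minimal sketch, not part of the answer above: the names ExpandSplitter and SplitTopLevel are made up for illustration, and it assumes the parentheses are balanced and that no commas or parentheses appear inside quoted string literals.

```csharp
using System;
using System.Collections.Generic;

public static class ExpandSplitter
{
    // Hypothetical helper: splits on commas that are not inside parentheses.
    public static List<string> SplitTopLevel(string input)
    {
        var parts = new List<string>();
        int depth = 0; // current parenthesis nesting level
        int start = 0; // index where the current segment begins

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '(')
            {
                depth++;
            }
            else if (c == ')')
            {
                depth--;
            }
            else if (c == ',' && depth == 0)
            {
                // Top-level comma: close the current segment here.
                parts.Add(input.Substring(start, i - start));
                start = i + 1;
            }
        }

        parts.Add(input.Substring(start)); // last segment has no trailing comma
        return parts;
    }

    public static void Main()
    {
        string url = "User($select=Firstname,Lastname),Organisation,Contract($Expand=MyOrganisation($select=Name,Status),Organisation),List";
        foreach (string part in SplitTopLevel(url))
        {
            Console.WriteLine(part);
        }
    }
}
```

On the sample string this prints the four entries from the question, one per line. The loop is easier to debug than the regex, but it knows nothing about OData syntax beyond parentheses and commas.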

More likely you should use ODataLib, which has a built-in URI parser:
Uri requestUri = new Uri("Products?$select=ID&$expand=ProductDetail" +
"&$filter=Categories/any(d:d/ID%20gt%201)&$orderby=ID%20desc" +
"&$top=1&$count=true&$search=tom",
UriKind.Relative);
ODataUriParser parser = new ODataUriParser(model, serviceRoot, requestUri);
SelectExpandClause expand = parser.ParseSelectAndExpand(); // parse $select, $expand
FilterClause filter = parser.ParseFilter(); // parse $filter
OrderByClause orderby = parser.ParseOrderBy(); // parse $orderby
SearchClause search = parser.ParseSearch(); // parse $search
long? top = parser.ParseTop(); // parse $top
long? skip = parser.ParseSkip(); // parse $skip
bool? count = parser.ParseCount(); // parse $count
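Adding a simpler regex option (a fixed version of what Amal has provided). Note that the lazy \(.*?\) stops at the first ")," it finds, so this only handles one level of parentheses; on the sample input, the nested Contract entry is cut after "Status)".
string url = "User($select=Firstname,Lastname),Organisation,Contract($Expand=MyOrganisation($select=Name,Status),Organisation),List";
Regex rgx = new Regex(@"(.+?)(?:(\(.*?\)),|,)");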
foreach (var match in rgx.Matches($"{url},"))
{
Console.WriteLine(match.ToString()[..^1]);
}

Related

Negative lookahead in Regex to exclude two words

I have the following regex:
(?!SELECT|FROM|WHERE|AND|OR|AS|[0-9])(?<= |^|\()([a-zA-Z0-9_]+)
that I'm matching against a string like this:
SELECT Static AS My_alias FROM Table WHERE Id = 400 AND Name = 'Something';
This already does 90% of what I want. What I also like to do is to exclude AS My_alias, where the alias can be any word.
I tried to add this to my regex, but this didn't work:
(?!SELECT|FROM|WHERE|AND|OR|AS [a-zA-Z0-9_]+|[0-9])(?<= |^|\()([a-zA-Z0-9_]+)
^^^^^^^^^^^^^^^^
this is the new part
How can I exclude this part of the string using my regex?
Demo of the regex can be found here
This excludes the AS and gets the tokens you seek. It also handles multiple select values, along with zero to many WHERE conditions.
The idea is to use explicitly named captures and tell the regex engine to disregard any unnamed capture groups (a "match but don't capture" feature).
We will also collect all the wanted "tokens" into a single named group, (?<Token> ... ).
var data = "SELECT Static AS My_alias FROM Table WHERE Id = 400 AND Name = 'Something';";
var pattern = @"
^
SELECT\s+
(
(?<Token>[^\s]+)
(\sAS\s[^\s]+)?
[\s,]+
)+ # One to many statements
FROM\s+
(?<Token>[^\s]+) # Table name
(
\s+WHERE\s+
(
(?<Token>[^\s]+)
(.+?AND\s+)?
)+ # One to many conditions
)? # Optional Where
";
var tokens =
Regex.Matches(data, pattern,
RegexOptions.IgnorePatternWhitespace // Lets us space out/comment pattern
| RegexOptions.ExplicitCapture) // Only capture named groups.
.OfType<Match>()
.SelectMany(mt => mt.Groups["Token"].Captures // Get the captures inserted into `Token`
.OfType<Capture>()
.Select(cp => cp.Value.ToString()))
;
tokens is a sequence of these strings: { "Static", "Table", "Id", "Name" }
This should cover most of the cases you will come across. Use similar logic if you need to process selects with joins; either way, this is a good base to work from.

C# RegEx - get only first match in string

I've got an input string that looks like this:
level=<device[195].level>&name=<device[195].name>
I want to create a RegEx that will parse out each of the <device> tags, for example, I'd expect two items to be matched from my input string: <device[195].level> and <device[195].name>.
So far I've had some luck with this pattern and code, but it always finds both of the device tags as a single match:
var pattern = "<device\\[[0-9]*\\]\\.\\S*>";
Regex rgx = new Regex(pattern);
var matches = rgx.Matches(httpData);
The result is that matches will contain a single result with the value <device[195].level>&name=<device[195].name>
I'm guessing there must be a way to 'terminate' the pattern, but I'm not sure what it is.
Use non-greedy quantifiers:
<device\[\d+\]\.\S+?>
Also, use verbatim strings for escaping regexes, it makes them much more readable:
var pattern = @"<device\[\d+\]\.\S+?>";
As a side note, I guess in your case using \w instead of \S would be more in line with what you intended, but I left the \S because I can't know that.
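To see the difference between the greedy pattern from the question and the non-greedy one suggested here, the two can be run side by side. This is a small sketch; the class name GreedyVsLazy and the helper Find are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] Find(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "level=<device[195].level>&name=<device[195].name>";

        // Greedy \S* runs to the end of the input and backtracks to the LAST '>'.
        string[] greedy = Find(input, @"<device\[[0-9]*\]\.\S*>");

        // Lazy \S+? stops at the FIRST '>' it can reach.
        string[] lazy = Find(input, @"<device\[\d+\]\.\S+?>");

        Console.WriteLine(greedy.Length); // 1
        Console.WriteLine(lazy.Length);   // 2
        foreach (string tag in lazy)
        {
            Console.WriteLine(tag);
        }
    }
}
```

The greedy version returns the whole span from the first tag to the end of the second as one match, which is the behaviour described in the question.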
It depends on how much of the structure of the angle-bracket blocks you need to match, but you can do:
"\\<device.+?\\>"
I want to create a RegEx that will parse out each of the <device> tags
I'd expect two items to be matched from my input string:
1. <device[195].level>
2. <device[195].name>
This should work. Get the matched group from index 1
(<device[^>]*>)
Live demo
String literals for use in programs:
@"(<device[^>]*>)"
Change your repetition operator and use \w instead of \S
var pattern = @"<device\[[0-9]+\]\.\w+>";
String s = @"level=<device[195].level>&name=<device[195].name>";
foreach (Match m in Regex.Matches(s, @"<device\[[0-9]+\]\.\w+>"))
Console.WriteLine(m.Value);
Output
<device[195].level>
<device[195].name>
Use named match groups and create a linq entity projection. There will be two matches, thus separating the individual items:
string data = "level=<device[195].level>&name=<device[195].name>";
string pattern = @"
(?<variable>[^=]+) # get the variable name
(?:=<device\[) # static '=<device'
(?<index>[^\]]+) # device number index
(?:]\.) # static ].
(?<sub>[^>]+) # Get the sub command
(?:>&?) # Match but don't capture the > and possible &
";
// IgnorePatternWhitespace only lets us lay out and comment the pattern; it does not affect matching.
var items = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt => new
{
Variable = mt.Groups["variable"].Value,
Index = mt.Groups["index"].Value,
Sub = mt.Groups["sub"].Value
})
.ToList();
items.ForEach(itm => Console.WriteLine ("{0}:{1}:{2}", itm.Variable, itm.Index, itm.Sub));
/* Output
level:195:level
name:195:name
*/

C# regex pattern to extract urls from given string - not full html urls but bare links as well

I need a regex that will do the following:
Extract all strings which start with http://
Extract all strings which start with www.
So I need to extract these two.
For example there is this given string text below
house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue
So from the given above string i will get
www.monstermmorpg.com
http://www.monstermmorpg.com
http://www.monstermmorpg.commerged
Looking for regex or another way. Thank you.
C# 4.0
You can write some pretty simple regular expressions to handle this, or go via more traditional string splitting + LINQ methodology.
Regex
var linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
var rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
foreach(Match m in linkParser.Matches(rawString))
MessageBox.Show(m.Value);
Explanation
Pattern:
\b -matches a word boundary (spaces, periods..etc)
(?: -define the beginning of a group, the ?: specifies not to capture the data within this group.
https?:// - Match http or https (the '?' after the "s" makes it optional)
| -OR
www\. -literal string, match www. (the \. means a literal ".")
) -end group
\S+ -match a series of non-whitespace characters.
\b -match the closing word boundary.
Basically the pattern looks for strings that start with http:// OR https:// OR www. (?:https?://|www\.) and then matches all the characters up to the next whitespace.
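The same pattern can be tried outside a WinForms project by collecting the matches into a list instead of showing message boxes. This is a sketch only; the class name LinkExtractor and the method Extract are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Same pattern as above: http://, https:// or www. followed by non-whitespace.
    private static readonly Regex LinkParser = new Regex(
        @"\b(?:https?://|www\.)\S+\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Hypothetical helper: returns every link-like token found in text.
    public static List<string> Extract(string text)
    {
        var links = new List<string>();
        foreach (Match m in LinkParser.Matches(text))
        {
            links.Add(m.Value);
        }
        return links;
    }

    public static void Main()
    {
        string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
        foreach (string link in Extract(rawString))
        {
            Console.WriteLine(link);
        }
    }
}
```

One thing to be aware of: because the pattern ends with \b, a trailing non-word character such as the final "/" in "https://google.com/" is left out of the match.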
Traditional String Options
var rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
var links = rawString.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Where(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));
foreach (string s in links)
MessageBox.Show(s);
Using Nikita's reply, I can get the URL out of a string very easily:
using System.Text.RegularExpressions;
string myString = "test =) https://google.com/";
Match url = Regex.Match(myString, @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?");
string finalUrl = url.ToString();
The patterns above do not work when the URL is embedded in HTML, for example:
<table><tr><td class="sub-img car-sm" rowspan ="1"><img src="https://{s3bucket}/abc/xyzxyzxyz/subject/jkljlk757cc617-a560-48f5-bea1-f7c066a24350_202008210836495252.jpg?X-Amz-Expires=1800&X-Amz-Algorithm=abcabcabc&X-Amz-Credential=AKIAVCAFR2PUOE4WV6ZX/20210107/ap-south-1/s3/aws4_request&X-Amz-Date=20210107T134049Z&X-Amz-SignedHeaders=host&X-Amz-Signature=3cc6301wrwersdf25fb13sdfcfe8c26d88ca1949e77d9e1d9af4bba126aa5fa91a308f7883e"></td><td class="icon"></td></tr></table>
For that case, use the regular expression below:
Regex regx = new Regex("http://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?", RegexOptions.IgnoreCase);

RegEx replace query to pick out wiki syntax

I've got a string of HTML that I need to grab the "[Title|http://www.test.com]" pattern out of e.g.
"dafasdfasdf, adfasd. [Test|http://www.test.com/] adf ddasfasdf [SDAF|http://www.madee.com/] assg ad"
I need to replace "[Title|http://www.test.com]" with "<a href='http://www.test.com/'>Title</a>".
What is the best way to approach this?
I was getting close with:
string test = "dafasdfasdf adfasd [Test|http://www.test.com/] adf ddasfasdf [SDAF|http://www.madee.com/] assg ad ";
string p18 = @"(\[.*?|.*?\])";
MatchCollection mc18 = Regex.Matches(test, p18, RegexOptions.Singleline | RegexOptions.IgnoreCase);
foreach (Match m in mc18)
{
string value = m.Groups[1].Value;
string fulltag = value.Substring(value.IndexOf("["), value.Length - value.IndexOf("["));
Console.WriteLine("text=" + fulltag);
}
There must be a cleaner way of getting the two values out e.g. the "Title" bit and the url itself.
Any suggestions?
Replace the pattern:
\[([^|]+)\|[^]]*]
with:
$1
A short explanation:
\[ # match the character '['
( # start capture group 1
[^|]+ # match any character except '|' and repeat it one or more times
) # end capture group 1
\| # match the character '|'
[^]]* # match any character except ']' and repeat it zero or more times
] # match the character ']'
A C# demo would look like:
string test = "dafasdfasdf adfasd [Test|http://www.test.com/] adf ddasfasdf [SDAF|http://www.madee.com/] assg ad ";
string adjusted = Regex.Replace(test, @"\[([^|]+)\|[^]]*]", "$1");
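The replacement above keeps only the title. If the goal is to turn each entry into an HTML link (the replacement target in the question looks like the remains of an anchor tag, which is an assumption here), a second capture group for the URL can be added and both groups used in the replacement string. This is a sketch under that assumption; the names WikiLinks and ToAnchors are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WikiLinks
{
    // Hypothetical helper: rewrites [Title|url] as <a href='url'>Title</a>.
    public static string ToAnchors(string input)
    {
        // Group 1 = title (anything up to '|'), group 2 = URL (anything up to ']').
        return Regex.Replace(
            input,
            @"\[([^|\]]+)\|([^\]]+)\]",
            "<a href='$2'>$1</a>");
    }

    public static void Main()
    {
        string test = "dafasdfasdf adfasd [Test|http://www.test.com/] adf ddasfasdf [SDAF|http://www.madee.com/] assg ad ";
        Console.WriteLine(ToAnchors(test));
    }
}
```

The title and URL are inserted as-is; if they can contain characters that are special in HTML, they would need encoding before being written into markup.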

How to write a regex with group matching?

Here is the data source, lines stored in a txt file:
servers[i]=["name1", type1, location3];
servers[i]=["name2", type2, location3];
servers[i]=["name3", type1, location7];
Here is my code:
string servers = File.ReadAllText("servers.txt");
string pattern = "^servers[i]=[\"(?<name>.*)\", (.*), (?<location>.*)];$";
Regex reg = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline);
Match m;
for (m = reg.Match(servers); m.Success; m = m.NextMatch()) {
string name = m.Groups["name"].Value;
string location = m.Groups["location"].Value;
}
No lines are matching. What am I doing wrong?
If you don't care about anything except the servername and location, you don't need to specify the rest of the input in your regex. That lets you avoid having to escape the brackets, as Graeme correctly points out. Try something like:
string pattern = "\"(?<name>.+)\".+\\s(?<location>[^ ]+)];$";
That's
\" = quote mark,
(?<name> = start capture group 'name',
.+ = match one or more chars (could use \w+ here for 1+ word chars)
) = end the capture group
\" = ending quote mark
.+\s = one or more chars, ending with a space
(?<location> = start capture group 'location',
[^ ]+ = one or more non-space chars
) = end the capture group
];$ = immediately followed by ]; and end of string (or end of line with RegexOptions.Multiline)
I tested this using your sample data in Rad Software's free Regex Designer, which uses the .NET regex engine.
I don't know if C# regexes are the same as Perl's, but if so, you probably want to escape the [ and ] characters (with doubled backslashes inside a regular C# string literal). Also, there are extra characters in there. Try this:
string pattern = "^servers\\[i\\]=\\[\"(?<name>.*)\", (.*), (?<location>.*)\\];$";
Edited to add: After wondering why my answer was downvoted and then looking at Val's answer, I realized that the "extra characters" were there for a reason. They are what perl calls "named capture buffers", which I have never used but the original code fragment does. I have updated my answer to include them.
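Putting the escaping advice together, a complete version might look like the sketch below. It uses a verbatim string so the regex backslashes do not need doubling (a quote is written as ""), and it adds \r? before $ on the assumption that the file may have Windows line endings, since $ in Multiline mode only matches before \n. The class name ServerLineParser, the method Parse, and the inline sample text (used instead of reading servers.txt) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ServerLineParser
{
    // Brackets are escaped so they match literally instead of opening a character class.
    private static readonly Regex LineRegex = new Regex(
        @"^servers\[i\]=\[""(?<name>[^""]*)"", (?<type>[^,]*), (?<location>[^\]]*)\];\r?$",
        RegexOptions.Multiline);

    // Hypothetical helper: returns the name and location found on each matching line.
    public static List<(string Name, string Location)> Parse(string text)
    {
        var result = new List<(string Name, string Location)>();
        foreach (Match m in LineRegex.Matches(text))
        {
            result.Add((m.Groups["name"].Value, m.Groups["location"].Value));
        }
        return result;
    }

    public static void Main()
    {
        // Inline sample standing in for File.ReadAllText("servers.txt").
        string text =
            "servers[i]=[\"name1\", type1, location3];\r\n" +
            "servers[i]=[\"name2\", type2, location3];\r\n" +
            "servers[i]=[\"name3\", type1, location7];";

        foreach (var (name, location) in Parse(text))
        {
            Console.WriteLine($"{name} -> {location}");
        }
    }
}
```

Using [^""]*, [^,]* and [^\]]* instead of .* keeps each group from running past its own field.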
Try this:
string pattern = "servers\\[i\\]=\\[\"(?<name>.*)\", (.*), (?<location>.*)\\];$";
