Parsing syntactic string in C#


I am trying to parse a header and create method stubs from the interface/method declarations.
I want to take C++ COM method declarations like this:
STDMETHOD(GetCubeMapSurface)(THIS_ D3DCUBEMAP_FACES FaceType,UINT Level,IDirect3DSurface9** ppCubeMapSurface) PURE;
Then modify it to generate a C++ method stub from it like this:
HRESULT __stdcall WrapIDirect3DCubeTexture9::GetCubeMapSurface(D3DCUBEMAP_FACES FaceType, UINT Level, IDirect3DSurface9 * * ppCubeMapSurface)
{
}
I am a little unsure whether I should use regex or .NET string functions for this, and I am confused about how exactly to implement it either way.
I have quite a few methods to do, so creating a tool seems like the right thing to do.
Can anyone help guide me in the right direction?
EDIT: I should have added that I was looking for help with how to implement it. I wasn't sure whether I should tokenize all words, special characters, and whitespace with a regex like the one below, and then parse and process the resulting pieces.
"(\d[x0-9a-fA-F.UL]*|\w+|\s+|"[^"]*"|.)"
Now that seems like overkill, and I think I was overanalyzing the whole thing. I quickly created an implementation with .NET string functions, and then saw that Caesay had helped me out in the regex direction. So I came up with two implementations.
I have decided to go with the regex implementation, since I will be doing some other advanced processing and parsing, and regex will make that easier. The implementations are below.
String based implementation:
if (line.StartsWith(" STDMETHOD"))
{
    string newstr = line.Replace(" STDMETHOD(", "HRESULT __stdcall WrapIDirect3DCubeTexture9::");
    newstr = StringExtensions.RemoveFirst(newstr, ")");
    newstr = newstr.Replace("THIS_ ", "");
    newstr = newstr.Replace(" PURE;", Environment.NewLine + "{ " + Environment.NewLine + Environment.NewLine + "}");
    textBox2.AppendText(newstr + Environment.NewLine);
}
String extension class, taken from "C# - Simplest way to remove first occurrence of a substring from another string":
public static class StringExtensions
{
    public static string RemoveFirst(this string source, string remove)
    {
        int index = source.IndexOf(remove);
        return (index < 0)
            ? source
            : source.Remove(index, remove.Length);
    }
}
Now for the Regex implementation:
if (line.StartsWith(" STDMETHOD"))
{
    Regex regex = new Regex(@"\(.*?\)");
    MatchCollection matches = regex.Matches(line);
    string newstr = String.Format(@"HRESULT __stdcall WrapIDirect3DCubeTexture9::{0}({1})", matches[0].Value.Trim('(', ')'), matches[1].Value.Trim('(', ')'));
    newstr = newstr.Replace("THIS_ ", "");
    textBox2.AppendText(newstr + Environment.NewLine + "{" + Environment.NewLine + Environment.NewLine + "}" + Environment.NewLine);
}

I will write you some code to help get you started.
If you start with a minimal output string containing the variables, it will be easier to see what needs to be done, so:
String.Format(@"HRESULT __stdcall WrapIDirect3DCubeTexture9::{0}({1})
{{
}}", "methodName", "arguments");
Here we can see there are two items we need to extract from the original string: the method name and the arguments. I would suggest using a regex to match what is inside the parentheses in the original string. This will give you two matches, the method name and the arguments. You will need to do some post-processing on the arguments string, but this should give you an idea.
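The extraction described above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the StubGenerator class, the ToStub method, and the className parameter are hypothetical names, and the pattern assumes each declaration sits on one line and ends in PURE;.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StubGenerator
{
    // Captures the method name and the raw argument list of one STDMETHOD line.
    // Assumption: the declaration is on a single line and ends with "PURE;".
    static readonly Regex StdMethod = new Regex(
        @"STDMETHOD\((?<name>\w+)\)\s*\((?<args>.*)\)\s*PURE\s*;");

    // className is a hypothetical parameter; the question hard-codes the wrapper class.
    public static string ToStub(string line, string className)
    {
        Match m = StdMethod.Match(line);
        if (!m.Success) return null; // not a STDMETHOD declaration

        // Post-processing: drop the leading THIS / THIS_ macro from the arguments.
        string args = Regex.Replace(m.Groups["args"].Value.Trim(), @"^THIS_?(?!\w)\s*", "");

        return "HRESULT __stdcall " + className + "::" + m.Groups["name"].Value
            + "(" + args + ")" + Environment.NewLine
            + "{" + Environment.NewLine + "}";
    }
}
```

Calling StubGenerator.ToStub(line, "WrapIDirect3DCubeTexture9") on each line of the header and skipping null results would produce one stub per declaration. Named groups are used here instead of indexing into a MatchCollection so that a malformed line fails visibly rather than throwing an index error.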

Related

how to convert char @"\" to Escape String \ by C#

I have grabbed some data from a website. One string in the data, named urlresult, is "http:\/\/www.cnopyright.com.cn\/index.php?com=com_noticeQuery&method=wareList&optionid=1221&obligee=\u5317\u4eac\u6c83\u534e\u521b\u65b0\u79d1\u6280\u6709\u9650\u516c\u53f8&softwareType=1".
What I want to do is get rid of the first three '\' characters in the string urlresult above. I have tried the function below:
public string ConvertDataToUrl(string urlresult)
{
    var url = urlresult.Split('?')[0].Replace(@"\", "") + "?" + urlresult.Split('?')[1];
    return url;
}
It returns "http://www.cnopyright.com.cn/index.php?com=com_noticeQuery&method=wareList&optionid=1221&obligee=\\u5317\\u4eac\\u6c83\\u534e\\u521b\\u65b0\\u79d1\\u6280\\u6709\\u9650\\u516c\\u53f8&softwareType=1" which is incorrect.
The correct result is "http://www.cnopyright.com.cn/index.php?com=com_noticeQuery&method=wareList&optionid=1221&obligee=北京沃华创新科技有限公司&softwareType=1"
I have tried many ways, but none of them worked. I have no idea how to get the correct result.

I think you may be misled by the debugger because there's no reason that extra "\" characters should get inserted by the code you provided. Often times the debugger will show extra "\" in a quoted string so that you can tell which "\" characters are really there versus which are there to represent other special characters. I would suggest writing the string out with Debug.WriteLine or putting it in a log file. I don't think the information you provided in the question is correct.
As proof of this, I compiled and ran this code:
static void Main(string[] args)
{
    var url = @"http:\/\/www.cnopyright.com.cn\/index.php?com=com_noticeQuery&method=wareList&optionid=1221&obligee=\u5317\u4eac\u6c83\u534e\u521b\u65b0\u79d1\u6280\u6709\u9650\u516c\u53f8&softwareType=1";
    Console.WriteLine("{0}{1}{2}", url, Environment.NewLine,
        url.Split('?')[0].Replace(@"\", "") + "?" + url.Split('?')[1]);
}
The output is:
http:\/\/www.cnopyright.com.cn\/index.php?com=com_noticeQuery&method=wareList&optionid=1221&obligee=\u5317\u4eac\u6c83\u534e\u521b\u65b0\u79d1\u6280\u6709\u9650\u516c\u53f8&softwareType=1
http://www.cnopyright.com.cn/index.php?com=com_noticeQuery&method=wareList&optionid=1221&obligee=\u5317\u4eac\u6c83\u534e\u521b\u65b0\u79d1\u6280\u6709\u9650\u516c\u53f8&softwareType=1

You can use the System.Text.RegularExpressions.Regex.Unescape method:
var input = @"\u5317\u4eac\u6c83\u534e\u521b\u65b0\u79d1\u6280\u6709\u9650\u516c\u53f8";
string escapedText = System.Text.RegularExpressions.Regex.Unescape(input);
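Putting the two steps together, a minimal sketch of the whole conversion might look like this. The UrlCleaner class name is an assumption; the method name is taken from the question. It assumes the only backslash sequences in the input are \/ and \uXXXX, because Regex.Unescape also interprets other regex escapes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlCleaner
{
    // Assumption: the input contains only "\/" and "\uXXXX" escape sequences.
    public static string ConvertDataToUrl(string urlResult)
    {
        // Turn the escaped "\/" back into a plain "/".
        string withoutSlashEscapes = urlResult.Replace(@"\/", "/");

        // Regex.Unescape converts the remaining \uXXXX sequences into characters.
        return Regex.Unescape(withoutSlashEscapes);
    }
}
```

If the data is actually JSON, decoding it with a JSON parser is likely the more robust route, since that handles every escape the format allows.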

How to Regex match a pattern with parentheses in C#

Background: I'm doing some complicated code generation that requires me to extract the methods within a C# interface file. I cannot simply use reflection because this code will feed a T4 template which will not have the compiled code to reflect upon. Thus I am attempting parsing. I can easily make my own parser, but it would be nice if there was a regular expression solution.
Question: Is-there/What regex pattern would match the method declarations (including the return types and parameters) of the string below using C#'s Regular Expressions library?
string testing = @"
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication1
{
public interface Service
{
int Test1(int a);
int Test2(int a, int b);
int Test3(
int a,
int b);
int Test4(out int a);
}
}
";
The regex pattern I desire should make four matches:
"int Test1(int a);"
"int Test2(int a, int b);"
"int Test3( int a, int b);" [note: #3 would be multi-line]
"int Test4(out int a);"
Solution Attempt: Here is possibly the closest I have come to a regex solution thus far:
string WhiteSpacePattern = @"\s+";
string PossibleWhiteSpacePattern = @"\s*";
string CsharpWordPattern = @"[a-zA-Z_]+";
string ParenthesesPattern = @"[(][\s\S]*?[)]";
string DoubleCsharpWordPattern = CsharpWordPattern + WhiteSpacePattern + CsharpWordPattern;
string MethodDeclarationPattern =
    DoubleCsharpWordPattern +
    PossibleWhiteSpacePattern +
    ParenthesesPattern;
Pattern usage example:
MatchCollection tests = Regex.Matches(testing, MethodDeclarationPattern);
The individual patterns work perfectly (CsharpWordPattern, ParenthesesPattern, WhiteSpacePattern, and PossibleWhiteSpacePattern). However, when I put them all together into a single pattern (MethodDeclarationPattern), the full pattern fails.
How does MethodDeclarationPattern or my usage example need to be altered so that it will start matching the method declarations in the interface code?

To match literal parens, escape them with backslashes:
string ParenthesesPattern = @"\([\s\S]*?\)";
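That regex snippet matches an open paren, then any characters at all (including newlines), as few as possible, up to the first close paren. It behaves the same as your [(][\s\S]*?[)]. You're putting it at the end of your overall regex.
Your complete concatenated regex looks like this:
[a-zA-Z_]+\s+[a-zA-Z_]+\s*[(][\s\S]*?[)]
Identifier, space, identifier, optional space, open paren, anything, close paren.
The identifier part allows only letters and underscores, so names containing digits (Test1, Test2, ...) cannot match. For that to match, the method declaration will have to look like this:
"int foo ()"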
I believe you'll have better success with something like this:
string openParenPattern = @"\([\s\S]*?";
string closeParenPattern = @"[\s\S]*?\)";
What you really need, conceptually, is this (leaving out space -- no need to clutter it up with that):
identifier
identifier
open paren
((ref|out)? identifier identifier comma)*
((ref|out)? identifier identifier)?
close paren
You know all the syntax for that, I think. You'll have nested groups. Looking at it, I'm really starting to warm up to your idea of putting sub-regexes in string variables and then concatenating them.
The following code matches all four method declarations in your test string:
// This has one bug: It matches "int foo(int a,)"
// Somebody good with regexes could fix that.
var methodPattern =
// return type
identPattern + spacePattern
// method name
+ identPattern + spacePattern
// open paren
+ openParenPattern + spacePattern
// Zero or more parameters followed by commas
+ "(" + paramPattern + spacePattern + "," + spacePattern + ")*" + spacePattern
// Final (or only) parameter not followed by a comma
+ "(" + paramPattern + spacePattern + ")?" + spacePattern
// Close paren
+ closeParenPattern;
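The snippet above does not show how identPattern, spacePattern, or paramPattern are defined, so here is one possible self-contained sketch of the same idea. All names in it (InterfaceMethodFinder, Ident, Space, Param, FindMethods) are assumptions, not the answerer's definitions. It allows digits in identifiers, which the question's [a-zA-Z_]+ pattern does not, and it places the comma before each additional parameter so that a trailing comma such as "int foo(int a,)" is rejected. It makes no attempt to handle generics, arrays, attributes, or default values.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class InterfaceMethodFinder
{
    // Identifier: a letter or underscore, then letters, digits, or underscores.
    const string Ident = @"[A-Za-z_][A-Za-z0-9_]*";
    // Optional whitespace, including newlines.
    const string Space = @"\s*";
    // One parameter: optional ref/out, then a type and a name.
    const string Param = @"(?:(?:ref|out)\s+)?" + Ident + @"\s+" + Ident;

    static readonly string MethodPattern =
        // return type and method name
        Ident + @"\s+" + Ident + Space
        // open paren
        + @"\(" + Space
        // optional parameter list: first parameter, then ", parameter" repeated
        + "(?:" + Param + "(?:" + Space + "," + Space + Param + ")*)?"
        // close paren and terminating semicolon
        + Space + @"\)" + Space + ";";

    // Returns the text of every simple method declaration found in source.
    public static List<string> FindMethods(string source)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(source, MethodPattern))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

For anything beyond simple signatures like these, a real C# parser is likely a safer choice than growing the regex further.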

Extract sub-string between two certain words right to left side

Example String
This is an important example about regex for my work.
I can extract "important example about regex" with the pattern (?<=an).*?(?=for).
But I would like to extract the string from right to left. Using this question's example, the first position must be (for) and the second position must be (an).
I mean that the extraction process should work backwards.
I tried to do this in the else if case of the code below, but it doesn't work.
public string FnExtractString(string _QsString, string _QsStart, string _QsEnd, string _QsWay = "LR")
{
    if (_QsWay == "LR")
        return Regex.Match(_QsString, @"(?<=" + _QsStart + ").*?(?=" + _QsEnd + ")").Value;
    else if (_QsWay == "RL")
        return Regex.Match(_QsString, @"(?=" + _QsStart + ").*?(<=" + _QsEnd + ")").Value;
    else
        return _QsString;
}
Thanks in advance.
EDIT
My real example as below
#Var|First String|ID_303#Var|Second String|ID_304#Var|Third String|DI_t55
When I pass two strings to my method (for example "|ID_304" and "#Var|"), I would like to extract "Second String". This example is only a small piece of my real string, and my string can change.

No need for lookahead or lookbehind! You could just use:
\san\s(.*?)\sfor\s
and read the text from capture group 1. The \s demands whitespace, so you don't match the "an" inside "important".

One potential problem in your current solution is that the strings passed in may contain special characters, which need to be escaped with Regex.Escape before concatenation:
return Regex.Match(_QsString, @"(?<=" + Regex.Escape(_QsStart) + ").*?(?=" + Regex.Escape(_QsEnd) + ")").Value;
As for your other requirement of matching RL, I don't understand what you are asking for.
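One possible reading of the RL requirement, based on the "#Var|...|ID_304" example in the question's EDIT, is: find the right-hand marker, then walk left to the nearest left-hand marker. Under that assumption, a sketch using RegexOptions.RightToLeft could look like this. The Extractor class and the ExtractRightToLeft name are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Extractor
{
    // Assumed meaning of "RL": the text between rightMarker and the closest
    // leftMarker to its left. Returns an empty string when nothing matches.
    public static string ExtractRightToLeft(string input, string rightMarker, string leftMarker)
    {
        // Same lookbehind/lookahead shape as the LR version, with escaped markers.
        string pattern = "(?<=" + Regex.Escape(leftMarker) + ").*?(?=" + Regex.Escape(rightMarker) + ")";

        // RightToLeft makes the engine scan from the end of the input, so the
        // lazy .*? grows leftwards from the right-hand marker.
        return Regex.Match(input, pattern, RegexOptions.RightToLeft).Value;
    }
}
```

With the left-to-right version, the lazy quantifier starts at the first "#Var|" and so would also swallow "First String|ID_303#Var|" before reaching "|ID_304". Scanning from the right avoids that.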

Add to string if string non empty

Sometimes I want to join two strings with a space in between. But if the second string is null, I don't want the space.
Consider the following code:
void AssertFoo(bool cond, string message = null) {
    ...
    Assert.Fail("Something is foo.{0}", message != null ? " " + message : "");
    ...
}
Is there a more elegant way to do that?

Here is one option that I like. It's better if you already have an IEnumerable<string> with your data, but it's easy enough even if you don't. It also clearly scales well to n strings being joined, not just one or two.
string[] myStrings = new string[]{"Hello", "World", null};
string result = string.Join(" ", myStrings.Where(str => !string.IsNullOrEmpty(str)));
Here is another option. It's a bit shorter for this one case, but it's uglier, harder to read, and not as extensible, so I would probably avoid it personally:
//note space added before {0}
Assert.Fail("Something is foo. {0}", message ?? "\b");
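In this case we add the space to the format string itself, but if message is null we instead use the backspace character to erase the space that we know is before it in the message. Note that the space and the backspace are both still in the string; the space only disappears when the output is rendered somewhere that interprets backspace, such as a console.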
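The first option can be wrapped in a small helper so the call site stays short. This is just a sketch. StringJoiner and JoinNonEmpty are made-up names, not part of the framework.

```csharp
using System;
using System.Linq;

public static class StringJoiner
{
    // Joins the parts with the separator, skipping null and empty entries.
    public static string JoinNonEmpty(string separator, params string[] parts)
    {
        return string.Join(separator, parts.Where(p => !string.IsNullOrEmpty(p)));
    }
}
```

The assertion from the question could then be written as Assert.Fail(StringJoiner.JoinNonEmpty(" ", "Something is foo.", message)), which yields no trailing space when message is null or empty.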

For newer versions of C# you can use the following extension method:
public static string Prepend(this string value, string prepend) => prepend + value;
It can be used like this:
Assert.Fail("Something is foo.{0}", message?.Prepend(" "));
Added in 2020:
Today I use this:
public static string Surround(this object value, string prepend, string append = null) => prepend + value + append;

Try this:
string joinedString = string.IsNullOrEmpty(message2) ? message1 : message1 + " " + message2;

Assert.Fail("Something is foo.{0}", (" " + message).TrimEnd());
Sure, this will result in a few string object creations, but it's unlikely such micro-optimization issues would matter in the vast majority of programs. It might be considered an advantage of this method that it handles not just null message, but a message of all whitespace as well.

Assert.Fail("Something is foo.{0}", message?.PadLeft(message.Length + 1, ' '));

Since C# 6 you can use string interpolation like this:
$"Something is foo. {mssg}".TrimEnd();

The most elegant way is to use the built-in static method of the String class:
String.IsNullOrEmpty
Check the second string with it before adding the space, and you won't have a problem.

Regular expression to split string into equal length chunks

I have a string which would be delivered to my application in the format below:
ece4241692a1c7434da51fc1399ea2fa155d4fc983084ea59d1455afc79fafed
What I need to do is format it for my database so it reads as follows:
<ece42416 92a1c743 4da51fc1 399ea2fa 155d4fc9 83084ea5 9d1455af c79fafed>
I assume the easiest way to do this would be using regular expressions, but I have never used them before, and this is the first time I have ever needed to, and to be honest, I simply don't have the time to read up on them at the moment, so if anyone could help me with this I would be eternally grateful.

What about:
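string input = "ece4241692a1c7434da51fc1399ea2fa155d4fc983084ea59d1455afc79fafed";
string target = "<" + Regex.Replace(input, "(.{8})", "$1 ").Trim() + ">";
Or (this needs System.Linq; Regex.Split also returns the empty strings between the captured chunks, so they are filtered out):
string another = "<" + String.Join(" ", Regex.Split(input, "(.{8})").Where(s => s.Length > 0)) + ">";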

You might just be better served having a small static string parsing method to handle it. A regular expression might get it done, but unless you're doing a bunch in a batch you won't save enough in system resources for it to be worth the maintenance of a RegEx (if you're not already familiar with them I mean). Something like:
private string parseIt(string str)
{
    if (str.Length % 8 != 0) throw new ArgumentException("Bad string length");
    StringBuilder retVal = new StringBuilder(str);
    // Walk backwards so that earlier insert positions stay valid.
    for (int i = str.Length - 8; i > 0; i -= 8)
    {
        retVal.Insert(i, " ");
    }
    return "<" + retVal.ToString() + ">";
}

Try
Regex.Replace(YOURTEXT, "(.{8})", "$1 ");
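Regex.Replace as used above leaves a trailing space after the last chunk. A variant that collects the chunks with Regex.Matches and joins them avoids that. The sketch below is one way to do it. Chunker, FormatChunks, and the size parameter are assumed names, and the angle brackets follow the format requested in the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class Chunker
{
    // Splits input into groups of up to `size` characters, joins them with
    // single spaces, and wraps the result in angle brackets.
    public static string FormatChunks(string input, int size = 8)
    {
        var chunks = Regex.Matches(input, ".{1," + size + "}")
                          .Cast<Match>()
                          .Select(m => m.Value);
        return "<" + string.Join(" ", chunks) + ">";
    }
}
```

Because the pattern is .{1,8} rather than .{8}, an input whose length is not a multiple of eight keeps its shorter final chunk instead of dropping it.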
