Extract comments from .cs file - c#

Is it possible to use a Regular Expression to extract only comments from a C# file?
If so how would you do that?

See the article "Finding Comments in Source Code Using Regular Expressions".
After reading the article, your final regex would be:
(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|(//.*)
For C#:
^/[/*](.+)$ (for single-line comments; inside a character class | is a literal character, so it is left out here)
(^\/\/.*?$|\/\*.*?\*\/) (for multi-line comments)
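As a minimal sketch of how the first pattern above could be applied in C#: the class and method names (CommentExtractor, Extract) are made up for illustration. Note that a regex alone cannot tell code from string literals, so comment-like text inside a string such as "http://example" is matched as well.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class CommentExtractor
{
    // The pattern from the answer: block comments first, then // comments.
    private static readonly Regex CommentPattern =
        new Regex(@"(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|(//.*)");

    public static List<string> Extract(string source)
    {
        var comments = new List<string>();
        foreach (Match m in CommentPattern.Matches(source))
        {
            // In .NET, '.' matches '\r', so trim it for files with CRLF line endings.
            comments.Add(m.Value.TrimEnd('\r'));
        }
        return comments;
    }
}
```

For a real file, the input could come from something like CommentExtractor.Extract(File.ReadAllText("Program.cs")), where the file name is only a placeholder.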

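This regex finds the text of comments in a C# file. Because [^(\*/)] is a character class, block comments whose text contains (, ), * or / are not matched:
(((?<=//).+(?=\n))|((?<=/\*)[^(\*/)]+(?=\*/)))+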

Related

C# Regex filter problems

I posted a question earlier about the same kind of regex problem. It has given me headaches: I have looked through loads of documentation on how to use regex, but I still cannot put my finger on it. I don't want to waste another 6 hours trying to filter what are (I think) simple expressions.
What I want to do is filter all file types with HTML extensions (the '*' stars come from a WinForms TabControl and signify that the file has been modified). I also need the match to ignore case:
.html, .htm, .shtml, .shtm, .xhtml
.html*, .htm*, .shtml*, .shtm*, .xhtml*
Also filtering some CSS files:
.css
.css*
And some SQL Files:
.sql, .ddl, .dml
.sql*, .ddl*, .dml*
My previous question got an answer for filtering Python files:
.py, .py3, .pyi, .pyx, .pyw
Expression would be: \.py[3ixw]?\*?$
But when I tried to adapt the expression above, I always ended up matching .xhtml only; the rest were not accepted.
For the HTML expression, I currently have this: \.html|.html|.shtml|.shtm|.xhtml\*?$ with RegexOptions.IgnoreCase. But it only accepts .xhtml, regardless of case; .html, .htm and the rest do not match. I would really appreciate an explanation of each expression you provide (so I don't have to ask the same question ever again).
Thank you.
For such cases you can start with a simple, verbose regex and simplify it step by step into a good one.
In C#, with IgnoreCase, this would basically be:
Regex myRegex = new Regex("PATTERN", RegexOptions.IgnoreCase);
Now the pattern. The easiest one simply joins all valid endings with OR (|), escaping the dots:
\.html|\.htm|\.shtml|\.shtm|\.xhtml|\.html*|\.htm*|\.shtml*|\.shtm*|\.xhtml*
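If by .html* you mean .html followed by anything, "anything" is written as .* (any character, zero or more times) in regex. (If the star is a literal character, as the question describes, \*? would be used instead.)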
\.html|\.htm|\.shtml|\.shtm|\.xhtml|\.html.*|\.htm.*|\.shtml.*|\.shtm.*|\.xhtml.*
Then you can group the repeating parts. Every ending starts with a dot, and since .* can match zero characters, ending.* also covers the plain ending, so the separate alternatives can be dropped:
\.(html|htm|shtml|shtm|xhtml).*
Then, htm appears in every alternative, so I factor it out and collect the optional characters before and after it (? means zero or one occurrence; note that this form also accepts .xhtm):
\.(s|x)?(htm)l?.*
I always check that the pattern still works in Regex Storm, a regex tester for .NET.
In the same way you can derive regular expressions for the other two groups and join them all together at the end.
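If the trailing star is meant literally (the modified-file marker from the tab title), a stricter variant is possible. The sketch below is one way to write it, not the answer's own pattern: the class and method names are hypothetical, and the patterns are anchored with $ so that only the end of the name counts. Alternation (|) binds more loosely than everything else, which is why the alternatives are wrapped in a group so that \*?$ applies to all of them.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class FileFilters
{
    // .html, .htm, .shtml, .shtm, .xhtml, each optionally followed by a literal '*'.
    private static readonly Regex HtmlFile =
        new Regex(@"\.(?:s?html?|xhtml)\*?$", RegexOptions.IgnoreCase);

    // .css or .css*
    private static readonly Regex CssFile =
        new Regex(@"\.css\*?$", RegexOptions.IgnoreCase);

    // .sql, .ddl, .dml, each optionally followed by a literal '*'.
    private static readonly Regex SqlFile =
        new Regex(@"\.(?:sql|ddl|dml)\*?$", RegexOptions.IgnoreCase);

    public static bool IsHtmlFile(string tabTitle) => HtmlFile.IsMatch(tabTitle);
    public static bool IsCssFile(string tabTitle) => CssFile.IsMatch(tabTitle);
    public static bool IsSqlFile(string tabTitle) => SqlFile.IsMatch(tabTitle);
}
```

Reading the HTML pattern piece by piece: \. is a literal dot, s? is an optional s, htm is fixed, l? is an optional l, xhtml is listed separately so that .xhtm is not accepted, \*? is an optional literal star, and $ requires the end of the input.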

Using regular expression on string for use in C#

I'm trying to extract a URL from a string.
{ns:"images",k:"5127",mid:"A04F21EB77CF61E10E43BA33CF1986CA44357448"
,md5:"e2987d19c953bd836ec8fd2e0aa8492",surl:"http://someURLIdontwant/"
,imgurl:"http://THISISTHEURLINEED.jpg",tid:"OIP.Me2987d199c953bd836ec8fd2e0aa8492H0"
,ow:"300", docid:"608010036892077154",oh:"225",tft:"49"}
So it is located after "imgurl:". I am no expert on Regex and all I could produce is:
imgurl:'(.*)',tid
which worked in an online regex tester, but apparently not the way I'm using it in C#:
webClient.DownloadFile(System.Text.RegularExpressions.Regex.Match
(stringWithText, "imgurl:'(.*)',tid").Groups[1].Value,"path\file.jpg");
Can it be done? Thanks
As @WiktorStribiżew already pointed out, the expression is almost correct. Use this instead:
Regex.Match(stringWithText, "imgurl:\"(.*)\",tid").Groups[1].Value
Example on dotNetFiddle
And as I mentioned earlier in a comment: you should parse the JSON data instead.
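A slightly more defensive variant of the same idea is sketched below, under the assumption that the URL itself never contains a double quote. It uses a negated character class instead of the greedy (.*), so the match stops at the closing quote and does not depend on ,tid following. The class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class ImgUrlExtractor
{
    // Returns the value of imgurl:"...", or null when the key is not present.
    public static string Extract(string text)
    {
        // [^"]* matches everything up to, but not including, the closing quote.
        Match m = Regex.Match(text, "imgurl:\"([^\"]*)\"");
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Checking m.Success before reading the group avoids passing an empty string to DownloadFile when the key is missing.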

Using regex to split a formatted string to URL like StackOverFlow

I'm trying to write a parser that will create links found in posted text that are formatted like so:
[Site Description](http://www.stackoverflow.com)
to be rendered as a standard HTML link like this:
<a href="http://www.stackoverflow.com">Site Description</a>
So far what I have is the expression listed below. It works on the example above, but it will not work if the URL has anything after the ".com". Obviously no single regex will find every URL, but I would like to match as many as I can.
(\[)([A-Za-z0-9 -_]*)(\])(\()((http|https|ftp)\://[A-Za-z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?)(\))
Any help would be greatly appreciated. Thanks.
Darn. It seems @Jerry and @MikeH beat me to it. My answer is best, however, as the link tags are all uppercase ;)
Find what: \[([^]]+)\]\(([^)]+)\)
Replace with: <A HREF="$2">$1</A>
http://regex101.com/r/cY7lF0
Well, you could try negated character classes, so you don't have to worry about parsing the URL itself:
\[([^]]+)\]\(([^)]+)\)
And replace with:
<a href="$2">$1</a>
regex101 demo
Or maybe use only the scheme at the beginning to identify a URL:
\[([^]]+)\]\(((?:https?|ftp)://[^)]+)\)
The replace is the same.
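Put together in C#, the second pattern could be used roughly as follows. This is a sketch: the class and method names are hypothetical, the replacement string (an anchor tag built from group 2 as the URL and group 1 as the link text) is an assumption based on the question's description, and the closing bracket inside the character class is escaped to keep the pattern unambiguous. The captured text is inserted as-is, so real user input would still need HTML encoding.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class MarkdownLinks
{
    // Group 1: link text (anything but ']'); group 2: URL starting with http, https or ftp.
    private static readonly Regex LinkPattern =
        new Regex(@"\[([^\]]+)\]\(((?:https?|ftp)://[^)]+)\)");

    public static string ToHtml(string text)
    {
        // $2 is the URL and $1 is the link text.
        return LinkPattern.Replace(text, "<a href=\"$2\">$1</a>");
    }
}
```

Because the URL part is just "everything up to the closing parenthesis", paths and query strings after the ".com" are kept.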

Using Regex to remove css comments

How can I remove comments from CSS using Regex.Replace()?
Note: I'm not able to use the regex from the question "Regular expression to remove CSS comments" in C#.
Normally this would be enough (assuming cssLines is a string containing all lines of your CSS file):
Regex.Replace(cssLines, @"/\*.+?\*/", string.Empty, RegexOptions.Singleline)
Please note that the Singleline option lets . match newlines, so the pattern also matches multi-line comments.
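As a self-contained sketch of this approach (the class and method names are made up for illustration): it uses .*? rather than .+? so that an empty comment /**/ is removed too, which is a small deviation from the line above. Like that line, it does not protect comment-like text inside string literals.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class CssComments
{
    public static string Strip(string css)
    {
        // Singleline makes '.' match '\n', so comments spanning several lines are removed.
        // The lazy quantifier stops at the first closing "*/".
        return Regex.Replace(css, @"/\*.*?\*/", string.Empty, RegexOptions.Singleline);
    }
}
```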
Use the regex from the linked question like so:
var rx = new Regex(@"(?<!"")\/\*.+?\*\/(?!"")");
I wonder if the following version of Maxim's solution would be faster.
"/\*[^*]*.*?\*/"
As the discussion shows, this will also eliminate comments within string literals.
A very late reply, but I thought it would be useful for some:
"(?:/*(.|[\r\n])?/)|(?:(?([^)])//.)"
This helps remove both single-line and multi-line CSS comments.

Parser using RegEx and XML, in C#

I am making an application where I need to verify the syntax of each line; every line contains a command whose first word is a keyword.
Also, if the syntax is correct, I need to check the types of the variables used with the keyword.
For example, for a print command:
print "string" < variable < "some another string" //some comments
print\s".*"((\s?<\s?".*")*\s?<\s?(?'string1'\w+))?(\s*//.*)?
So I made the following regex:
\s*[<>]\s*((?'variant'\w+)(\[\d+\])*)
This is to access all words in the variant group, so I can extract the variables used and verify their types.
My tool has many keywords like this, and currently I am crudely writing a regex for each keyword. If something changes tomorrow, I would have to make the same change everywhere, in every keyword.
I am storing a regex for each keyword in an XML file. However, I would like to make this extensible, so that if the specification changes tomorrow I only need to change it once and the change is reflected everywhere. For example, I would transform the print regex to:
print %string% (%<% %string%|%variable%)* %comments%
In this way I would write a specification for each keyword, and keep the definitions of string, variable and comments in another file that stores their regexes. Then I would write a parser that reads the specification and builds the regex string for me.
Is this possible?
Is there any better way of doing this or is there any way I can do this in XML?
Last time I asked a question like this, someone pointed me to http://www.antlr.org/. Enjoy. :-)
I got an idea and made my own replacer. I used %myname% tags to define my regular expressions, and I wrote the definition of each %myname% tag separately as a regex. Then I scanned the string recursively and replaced each occurrence of a %myname% tag with its definition. It did the job. Thanks anyway.
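The replacer described above might look roughly like the sketch below. Everything in it is an assumption made for illustration: the tag names (string, variable, lt, operand, comments), their definitions, the print template and the class name are not from the original tool, and in practice the definitions would be loaded from the XML file rather than hard-coded.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch of a %name% placeholder expander.
public static class PatternTemplates
{
    // Illustrative building blocks; a definition may itself contain %name% tags.
    private static readonly Dictionary<string, string> Definitions =
        new Dictionary<string, string>
        {
            ["string"]   = "\"[^\"]*\"",
            ["variable"] = @"\w+(?:\[\d+\])*",
            ["lt"]       = @"\s*<\s*",
            ["operand"]  = "(?:%string%|%variable%)",
            ["comments"] = @"(?:\s*//.*)?",
        };

    private static readonly Regex Placeholder = new Regex(@"%(\w+)%");

    // Recursively replaces every %name% tag with its definition.
    public static string Expand(string template, int depth = 0)
    {
        // Guard against definitions that refer to each other in a cycle.
        if (depth > 20)
            throw new InvalidOperationException("Placeholder definitions appear to be cyclic.");

        return Placeholder.Replace(template, m =>
        {
            string name = m.Groups[1].Value;
            if (!Definitions.TryGetValue(name, out string definition))
                throw new KeyNotFoundException("No definition for %" + name + "%");
            return Expand(definition, depth + 1);
        });
    }

    // Builds the regex for a hypothetical print command specification.
    public static Regex BuildPrintRegex()
    {
        string template = @"^\s*print\s+%operand%(?:%lt%%operand%)*%comments%$";
        return new Regex(Expand(template));
    }
}
```

With this layout, changing the definition of a tag such as string in one place changes every keyword regex built from it. One limitation of the sketch is that a literal %word% can no longer appear in a template, because it would be treated as a tag.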
