How can I strip in-line comments from a text reader - c#

Hi, I'm trying to remove comments from a text file by iterating through a StreamReader and checking whether each line starts with /*:
private void StripComments()
{
_list = new List<string>();
using (_reader = new StreamReader(_path))
{
while ((_line = _reader.ReadLine()) != null)
{
var temp =_line.Trim();
if (!temp.StartsWith(@"/*"))
{
_list.Add(temp);
}
}
}
}
I need to remove comments in the format /* I AM A COMMENT */. I thought the file only had whole-line comments, but on closer inspection some comments sit at the ends of lines. EndsWith(@"*/") can't be used, as that would also remove the code preceding the comment.
Thanks.

If you are comfortable with regex
string pattern="(?s)/[*].*?[*]/";
var output=Regex.Replace(File.ReadAllText(path),pattern,"");
. matches any character other than a newline.
(?s) turns on single-line mode, in which . also matches newlines.
.* matches zero or more characters, where * is a quantifier.
.*? matches lazily, i.e. it matches as few characters as possible.
NOTE
This won't work if a string literal within "" contains /*. In that case you should use a parser instead!
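To illustrate the caveat above, here is a minimal sketch of a hand-written scanner that skips /* ... */ comments but copies double-quoted strings through untouched. The class and method names (CommentStripper, Strip) are made up for this example, and it assumes that strings use backslash escapes and that comments do not nest; it is not a full parser.

```csharp
using System;
using System.Text;

public static class CommentStripper
{
    // Removes /* ... */ comments, but leaves any "/*" that appears inside a
    // double-quoted string untouched. Assumes backslash escapes inside strings.
    public static string Strip(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool inString = false;
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (inString)
            {
                sb.Append(c);
                if (c == '\\' && i + 1 < input.Length)
                {
                    // Copy the escaped character so that \" does not end the string.
                    sb.Append(input[i + 1]);
                    i += 2;
                    continue;
                }
                if (c == '"') inString = false;
                i++;
            }
            else if (c == '/' && i + 1 < input.Length && input[i + 1] == '*')
            {
                // Skip to the closing */; an unterminated comment runs to the end.
                int end = input.IndexOf("*/", i + 2, StringComparison.Ordinal);
                i = end < 0 ? input.Length : end + 2;
            }
            else
            {
                if (c == '"') inString = true;
                sb.Append(c);
                i++;
            }
        }
        return sb.ToString();
    }
}
```

Calling CommentStripper.Strip(File.ReadAllText(path)) would then handle whole-line, end-of-line and multi-line comments in one pass, at the cost of reading the whole file into memory.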

Regex is a good fit for this.
string START = Regex.Escape("/*");
string END = Regex.Escape("*/");
string input = @"aaa/* bcd
de */ f";
var str = Regex.Replace(input, START + ".+?" + END, "",RegexOptions.Singleline);

List<string> _list = new List<string>();
Regex r = new Regex("/[*]");
string temp = @"sadf/*slkdj*/";
if (temp.StartsWith(@"/*")) { }
else if (temp.EndsWith(@"*/") && temp.Contains(@"/*"))
{
string pre = temp.Substring(0, r.Match(temp).Index);
_list.Add(pre);
}
else
{
_list.Add(temp);
}

Related

C# Write Word at the end, if string pattern contains

I am trying to write code that appends the word 'water' to the end of any line
that contains the word 'ocean'.
How would I do this with Regex?
Sample:
test1
abcdocean123
test2
test3
Result (keeps all other spacing in file):
test1
abcdocean123 water
test2
test3
Code Attempt:
public string FileRead(string path)
{
content = File.ReadAllText(path);
return content;
}
public string FileChange()
{
var lines = content.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.Select(line => Regex.Replace(line, @"\bocean\b\n", "water \n"));
content = String.Join("\n", lines);
return content;
}
You need to check if a line contains ocean, and, if yes, append the water to that line only:
var content = "test1\n\nabcdocean123 \n\n\ntest2\ntest3";
var lines = content.Split(new[] { "\n" }, StringSplitOptions.None)
.Select(line => line.Contains("ocean") ? $"{line}water" : line);
return string.Join("\n", lines);
See the C# demo
If you still need a regex, replace line.Contains("ocean") with Regex.IsMatch(line, @"\bocean\b"), or whatever regex you need there. Just note that \b is a word boundary, so \bocean\b will match only when ocean is not enclosed by word characters (digits, letters or underscores).
Note that you should split on the newline without removing empty entries, so that no empty lines are lost when you join the lines back together.
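As a rough sketch of the split-and-join approach for files that may use either LF or CRLF line endings: the helper below splits on '\n' only and, when a line ends in '\r', puts the appended word before it. The names OceanTagger and AppendWater are invented for this example, and the " water" suffix with a leading space is an assumption based on the expected output in the question.

```csharp
using System;
using System.Linq;

public static class OceanTagger
{
    // Appends " water" to every line containing "ocean".
    // Empty lines are preserved because no entries are removed when splitting.
    public static string AppendWater(string content)
    {
        var lines = content.Split('\n').Select(line =>
        {
            if (!line.Contains("ocean")) return line;

            // For CRLF files each piece still ends in '\r'; keep it last.
            bool hasCr = line.Length > 0 && line[line.Length - 1] == '\r';
            return hasCr
                ? line.Substring(0, line.Length - 1) + " water\r"
                : line + " water";
        });
        return string.Join("\n", lines);
    }
}
```

Because the original separators are reused when joining, the file keeps whatever mix of line endings it started with.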
If you really want to continue your journey with regex, you may use
var content = "test1\n\nabcdocean123 \n\n\ntest2\ntest3";
content = Regex.Replace(content, @"ocean.*", "$&water");
// If your line endings are CRLF, use
// content = Regex.Replace(content, @"ocean[^\r\n]*", "$&water");
Console.WriteLine(content);
See this C# demo
Here, ocean.* matches the substring ocean followed by the rest of the line, and the replacement $& re-inserts the whole match, after which water is appended. Because . matches CR, [^\r\n] (any character except CR and LF) is the safer choice if your line endings may include CR.
Check this
Regex.Replace(line, @"(ocean)(\w+)", "$1water $2\n");
Working Fiddle
There is no need to use Regex at all in your case, if I got your question right.
You can just check whether a string contains the phrase ocean and, if so, append the word water.
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
private static readonly string Token = "ocean";
private static readonly string AppendToken = "water";
public static void Main()
{
var mylist = new List<string>(new string[] { "firststring", "asdsadsaoceansadsadas", "onemoreocean", "notOcccean" });
var newList = mylist.Select(str => {
if(str.Contains(Program.Token)) {
return str + " " +Program.AppendToken;
}
return str;
});
foreach (object o in newList)
{
Console.WriteLine(o);
}
}
}
You can run this code on DotnetFiddle

C# "between strings" run several times

Here is my code to find a string between { }:
var text = "Hello this is a {Testvar}...";
int tagFrom = text.IndexOf("{") + "{".Length;
int tagTo = text.LastIndexOf("}");
String tagResult = text.Substring(tagFrom, tagTo - tagFrom);
tagResult Output: Testvar
This only works for one time use.
How can I apply this for several Tags? (eg in a While loop)
For example:
var text = "Hello this is a {Testvar}... and we have more {Tagvar} in this string {Endvar}.";
tagResult[] Output (eg Array): Testvar, Tagvar, Endvar
IndexOf() has an overload that takes the index at which to start searching. If you omit it, the search always starts from the beginning and always finds the first occurrence.
var text = "Hello this is a {Testvar}...";
int start = 0, end = -1;
List<string> results = new List<string>();
while(true)
{
start = text.IndexOf("{", start) + 1;
if(start != 0)
end = text.IndexOf("}", start);
else
break;
if(end==-1) break;
results.Add(text.Substring(start, end - start));
start = end + 1;
}
I strongly recommend using regular expressions for the task.
using System;
using System.Text.RegularExpressions;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
var regex = new Regex(@"(\{(?<var>\w*)\})+", RegexOptions.IgnoreCase);
var text = "Hello this is a {Testvar}... and we have more {Tagvar} in this string {Endvar}.";
var matches = regex.Matches(text);
foreach (Match match in matches)
{
var variable = match.Groups["var"];
Console.WriteLine($"Found {variable.Value} from position {variable.Index} to {variable.Index + variable.Length}");
}
}
}
}
Output:
Found Testvar from position 17 to 24
Found Tagvar from position 47 to 53
Found Endvar from position 71 to 77
For more information about regular expression visit the MSDN reference page:
https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference
and this tool may be great to start testing your own expressions:
http://regexstorm.net/tester
Hope this helps!
I would use Regex pattern {(\\w+)} to get the value.
Regex reg = new Regex("{(\\w+)}");
var text = "Hello this is a {Testvar}... and we have more {Tagvar} in this string {Endvar}.";
string[] tagResult = reg.Matches(text)
.Cast<Match>()
.Select(match => match.Groups[1].Value).ToArray();
foreach (var item in tagResult)
{
Console.WriteLine(item);
}
c# online
Result
Testvar
Tagvar
Endvar
Many ways to skin this cat, here are a few:
Split it on { then loop through, splitting each result on } and taking element 0 each time
Split on { or } then loop through taking only odd numbered elements
Adjust your existing logic so you use IndexOf twice (instead of lastindexof). When you’re looking for a } pass the index of the { as the start index of the search
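The second option in the list above (split on { or }, then keep the odd-numbered elements) could look roughly like the following. TagSplitter and ExtractTags are names made up for this sketch, and it assumes the braces are balanced and not nested; with an unmatched { the text after it would be returned as a tag.

```csharp
using System;
using System.Collections.Generic;

public static class TagSplitter
{
    // Splits on both brace characters; text between a '{' and the next '}'
    // always lands at an odd index of the resulting array.
    public static List<string> ExtractTags(string text)
    {
        string[] parts = text.Split('{', '}');
        var tags = new List<string>();
        for (int i = 1; i < parts.Length; i += 2)
        {
            tags.Add(parts[i]);
        }
        return tags;
    }
}
```

For the sample sentence from the question, ExtractTags returns Testvar, Tagvar and Endvar, in that order.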
This is easy with regular expressions, using a simple pattern like {([\d\w]+)}.
See the example below:
using System.Text.RegularExpressions;
...
MatchCollection matches = Regex.Matches("Hello this is a {Testvar}... and we have more {Tagvar} in this string {Endvar}.", @"{([\d\w]+)}");
foreach(Match match in matches){
Console.WriteLine("match : {0}, index : {1}", match.Groups[1], match.Index);
}
It finds every series of letters or digits inside the braces, one by one.

Matching any word enclosed in parentheses in a sentence

I am trying to find a regex to match any word enclosed in parentheses in a sentence.
Suppose, I have a sentence.
"Welcome, (Hello, All of you) to the Stack Over flow."
Say my matching word is Hello,, All, of or you: it should return true.
A word can contain anything (numbers, symbols) but is separated from the others by whitespace.
I tried \(([^)]*)\), but this returns all the words enclosed by the parentheses:
static void Main(string[] args)
{
string ss = "Welcome, (Hello, All of you) to the Stack Over flow.";
Regex _regex = new Regex(@"\(([^)]*)\)");
Match match = _regex.Match(ss.ToLower());
if (match.Success)
{
ss = match.Groups[0].Value;
}
}
Help and Guidance is very much appreciated.
Thanks.
Thanks, everyone, for your time and answers. I finally solved it by changing my code as suggested in Tim's reply.
For people with a similar problem, here is my final code:
static void Main(string[] args)
{
string ss = "Welcome, (Hello, All of you) to the Stack Over flow.";
Regex _regex = new Regex(@"[^\s()]+(?=[^()]*\))");
Match match = _regex.Match(ss.ToLower());
while (match.Success)
{
ss = match.Groups[0].Value;
Console.WriteLine(ss);
match = match.NextMatch();
}
}
OK, so it seems that a "word" is anything that's not whitespace and doesn't contain parentheses, and that you want to match a word if the next parenthesis character that follows is a closing parenthesis.
So you can use
[^\s()]+(?=[^()]*\))
Explanation:
[^\s()]+ matches a "word" (should be easy to understand), and
(?=[^()]*\)) makes sure that a closing parenthesis follows:
(?= # Look ahead to make sure the following regex matches here:
[^()]* # Any number of characters except parentheses
\) # followed by a closing parenthesis.
) # (End of lookahead assertion)
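Since the question asks for a true/false answer for a given word, the lookahead pattern explained above can be wrapped in a small helper. This is only a sketch: ParenWords and Contains are hypothetical names, the comparison is case-insensitive by assumption (the question lower-cases its input), and the pattern assumes parentheses are balanced and not nested.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ParenWords
{
    // A "word" is a run of characters that are neither whitespace nor
    // parentheses, and the next parenthesis after it must be a closing one.
    private static readonly Regex WordInParens =
        new Regex(@"[^\s()]+(?=[^()]*\))");

    // Returns true if 'word' occurs as a whole word inside parentheses.
    public static bool Contains(string sentence, string word)
    {
        return WordInParens.Matches(sentence)
            .Cast<Match>()
            .Any(m => string.Equals(m.Value, word, StringComparison.OrdinalIgnoreCase));
    }
}
```

With the sentence from the question, Contains(sentence, "all") is true and Contains(sentence, "Stack") is false. Note that punctuation is part of the word, so "Hello," matches while "Hello" does not.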
I've developed a c# function for you, if you are interested.
public static class WordsHelper
{
public static List<string> GetWordsInsideParenthesis(string s)
{
List<int> StartIndices = new List<int>();
var rtn = new List<string>();
var numOfOpen = s.Where(m => m == '(').ToList().Count;
var numOfClose = s.Where(m => m == ')').ToList().Count;
if (numOfClose == numOfOpen)
{
for (int i = 0; i < numOfOpen; i++)
{
int ss = 0, sss = 0;
if (StartIndices.Count == 0)
{
ss = s.IndexOf('(') + 1; StartIndices.Add(ss);
sss = s.IndexOf(')');
}
else
{
ss = s.IndexOf('(', StartIndices.Last()) + 1; StartIndices.Add(ss); // remember this start so a third group is found too
sss = s.IndexOf(')', ss);
}
var words = s.Substring(ss, sss - ss).Split(' ');
foreach (string ssss in words)
{
rtn.Add(ssss);
}
}
}
return rtn;
}
}
Just call it this way:
var text = "Welcome, (Hello, All of you) to the (Stack Over flow).";
var words = WordsHelper.GetWordsInsideParenthesis(text);
Now you'll have a list of words in words variable.
Generally, I would opt for plain C# code rather than regex, because it tends to be more readable and to perform better.
But if you want to stick with regex, keep the pattern from Tim Pietzcker, [^\s()]+(?=[^()]*\)), and use it this way:
var text="Welcome, (Hello, All of you) to the (Stack Over flow).";
var values= Regex.Matches(text,@"[^\s()]+(?=[^()]*\))");
Now values contains a MatchCollection.
You can access each match by index and read its Value property,
something like this:
string word=values[0].Value;
(?<=[(])[^)]+(?=[)])
Matches all words in parentheses
(?<=[(]) Checks for (
[^)]+ Matches everything up to but not including a )
(?=[)]) Checks for )

C# Regex match all occurrences

I'm trying to make a Regular Expression in C# that will match strings like"", but my Regex stops at the first match, and I'd like to match the whole string.
I've been trying with a lot of ways to do this, currently, my code looks like this:
string sPattern = @"/&#\d{2};/";
Regex rExp = new Regex(sPattern);
MatchCollection mcMatches = rExp.Matches(txtInput.Text);
foreach (Match m in mcMatches) {
if (!m.Success) {
//Give Warning
}
}
And also tried lblDebug.Text = Regex.IsMatch(txtInput.Text, "(&#[0-9]{2};)+").ToString(); but it also only finds the first match.
Any tips?
Edit:
The end result I'm seeking is that strings like &# are labeled as incorrect. As it is now, since only the first match is made, my code marks this as a correct string.
Second Edit:
I changed my code to this
string sPattern = @"&#\d{2};";
Regex rExp = new Regex(sPattern);
MatchCollection mcMatches = rExp.Matches(txtInput.Text);
int iMatchCount = 0;
foreach (Match m in mcMatches) {
if (m.Success) {
iMatchCount++;
}
}
int iTotalStrings = txtInput.Text.Length / 5;
int iVerify = txtInput.Text.Length % 5;
if (iTotalStrings == iMatchCount && iVerify == 0) {
lblDebug.Text = "True";
} else {
lblDebug.Text = "False";
}
And this works the way I expected, but I still think this can be achieved in a better way.
Third Edit:
As @devundef suggests, the expression "^(&#\d{2};)+$" does the work I was hoping for, so with this, my final code looks like this:
string sPattern = @"^(&#\d{2};)+$";
Regex rExp = new Regex(sPattern);
lblDebug.Text = rExp.IsMatch(txtInput.Text).ToString();
I always neglect the start and end of string characters (^ / $).
Remove the / at the start and end of the expression.
string sPattern = @"&#\d{2};";
EDIT
I tested the pattern and it works as expected. Not sure what you want.
Two options:
&#\d{2}; => will give N matches in the string. On the string  it will match 2 groups,  and 
(&#\d{2};)+ => will match the whole string as one single group. On the string  it will match 1 group, 
Edit 2:
What you want is not get the groups but know if the string is in the right format. This is the pattern:
Regex rExp = new Regex(@"^(&#\d{2};)+$");
var isValid = rExp.IsMatch("") // isValid = true
var isValid = rExp.IsMatch("xyz") // isValid = false
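To make the difference between the anchored and unanchored patterns concrete, here is a small sketch that puts both side by side. EntityValidator, IsValid and ContainsEntity are names invented for this example, and the sample inputs in the note below (such as "&#12;&#34;") are hypothetical, not taken from the original question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EntityValidator
{
    // Anchored: the whole input must consist of one or more &#NN; groups.
    private static readonly Regex WholeString = new Regex(@"^(&#\d{2};)+$");

    // Unanchored: succeeds as soon as one &#NN; group appears anywhere.
    private static readonly Regex Anywhere = new Regex(@"(&#\d{2};)+");

    public static bool IsValid(string input) => WholeString.IsMatch(input);

    public static bool ContainsEntity(string input) => Anywhere.IsMatch(input);
}
```

For a partly malformed input such as "&#12;&#3", ContainsEntity returns true because the first group matches, while IsValid returns false, which is the behaviour the question was after. One detail to be aware of: in .NET, $ also matches just before a final newline, so \z can be used instead if a trailing newline should be rejected.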
Here you go: (&#\d{2};)+ This should work for one occurrence or more
(&#\d{2};)*
Recommend: http://www.weitz.de/regex-coach/

Using regex or string manipulation when creating permalinks

I have the following method (which looks expensive, too) for creating permalinks, but it lacks a few things that are quite important for a nice permalink:
public string createPermalink(string text)
{
text = text.ToLower().TrimStart().TrimEnd();
foreach (char c in text.ToCharArray())
{
if (!char.IsLetterOrDigit(c) && !char.IsWhiteSpace(c))
{
text = text.Replace(c.ToString(), "");
}
if (char.IsWhiteSpace(c))
{
text = text.Replace(c, '-');
}
}
if (text.Length > 200)
{
text = text.Remove(200);
}
return text;
}
A few things it lacks:
If someone enters text like
"My choiches are:foo,bar", it is returned as "my-choiches-arefoobar",
when it should be "my-choiches-are-foo-bar".
If someone enters multiple white spaces, they are returned as "---", which is not nice to have in a URL.
Is there a better way to do this with regex (I have only used it a few times)?
UPDATE:
Requirement was:
Any non-digit, non-letter characters at the beginning or end are not allowed
Any non-digit, non-letter characters should be replaced by "-"
The replacement "-" characters should not repeat, like "---"
Finally, the string is cut at index 200 to ensure it's not too long
Change to
public string createPermalink(string text)
{
text = text.ToLower();
StringBuilder sb = new StringBuilder(text.Length);
// We want to skip the first hyphenable characters and go to the "meat" of the string
bool lastHyphen = true;
// You can enumerate directly a string
foreach (char c in text)
{
if (char.IsLetterOrDigit(c))
{
sb.Append(c);
lastHyphen = false;
}
else if (!lastHyphen)
{
// We use lastHyphen to not put two hyphens consecutively
sb.Append('-');
lastHyphen = true;
}
if (sb.Length == 200)
{
break;
}
}
// Remove the last hyphen
if (sb.Length > 0 && sb[sb.Length - 1] == '-')
{
sb.Length--;
}
return sb.ToString();
}
If you really want to use regexes, you can do something like this (based on the code of Justin)
Regex rgx = new Regex(@"^\W+|\W+$");
Regex rgx2 = new Regex(@"\W+");
return rgx2.Replace(rgx.Replace(text.ToLower(), string.Empty), "-");
The first regex searches for non-word characters (1 or more) at the beginning (^) or at the end of the string ($) and removes them. The second one replaces one or more non-word characters with -.
This should solve the problem that you have explained. Please let me know if it needs any further explanation.
Just as an FYI, the regex makes use of lookarounds to get it done in one run
//This will find any run of non-word characters, lumping them into one match if there is more than one
//It will ignore non-word characters at the beginning or end of the string
Regex rgx = new Regex(@"(?!\W+$)\W+(?<!^\W+)");
//This will then replace those matches with a -
string result = rgx.Replace(input, "-");
To keep the string from going beyond 200 characters, you will have to use substring. If you do this before the regex, then you will be ok, but if you do it after, then you run the risk of having a trailing dash again, FYI.
example:
myString.Substring(0,200)
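As a sketch of how the regex replacement and the length limit can be combined without leaving a trailing dash: the helper below replaces runs of non-letter, non-digit characters, trims the ends, and trims again after cutting. Permalink and Create are hypothetical names, and the character class [^\p{L}\p{Nd}] is a substitution made for this example (unlike \W, it also treats the underscore as a separator).

```csharp
using System;
using System.Text.RegularExpressions;

public static class Permalink
{
    // One or more characters that are neither letters nor decimal digits.
    private static readonly Regex Separators = new Regex(@"[^\p{L}\p{Nd}]+");

    public static string Create(string text, int maxLength = 200)
    {
        // Collapse every separator run into a single '-', then drop
        // dashes left at the very beginning or end.
        string slug = Separators.Replace(text.ToLowerInvariant(), "-").Trim('-');

        if (slug.Length > maxLength)
        {
            // Cutting can expose a dash at the end, so trim once more.
            slug = slug.Substring(0, maxLength).TrimEnd('-');
        }
        return slug;
    }
}
```

For the input from the question, Create("My choiches are:foo,bar") gives "my-choiches-are-foo-bar". Trimming after the cut means the result can be slightly shorter than maxLength, which seemed preferable here to ending on a dash.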
I use an iterative approach for this - because in some cases you might want certain characters to be turned into words instead of having them turned into '-' characters - e.g. '&' -> 'and'.
But when you're done you'll also end up with a string that potentially contains multiple '-' - so you have a final regex that collapses all multiple '-' characters into one.
So I would suggest using an ordered list of regexes, and then run them all in order. This code is written to go in a static class that is then exposed as a single extension method for System.String - and is probably best merged into the System namespace.
I've hacked it from code I use, which had extensibility points (e.g. you could pass in a MatchEvaluator on construction of the replacement object for more intelligent replacements; and you could pass in your own IEnumerable of replacements, as the class was public), and therefore it might seem unnecessarily complicated - judging by the other answers I'm guessing everybody will think so (but I have specific requirements for the SEO of the strings that are created).
The list of replacements I use might not be exactly correct for your uses - if not, you can just add more.
private class SEOSymbolReplacement
{
private Regex _rx;
private string _replacementString;
public SEOSymbolReplacement(Regex r, string replacement)
{
//null-checks required.
_rx = r;
_replacementString = replacement;
}
public string Execute(string input)
{
//null-check required
return _rx.Replace(input, _replacementString);
}
}
private static readonly SEOSymbolReplacement[] Replacements = {
new SEOSymbolReplacement(new Regex(@"#", RegexOptions.Compiled), "Sharp"),
new SEOSymbolReplacement(new Regex(@"\+", RegexOptions.Compiled), "Plus"),
new SEOSymbolReplacement(new Regex(@"&", RegexOptions.Compiled), " And "),
new SEOSymbolReplacement(new Regex(@"[|:'\\/,_]", RegexOptions.Compiled), "-"),
new SEOSymbolReplacement(new Regex(@"\s+", RegexOptions.Compiled), "-"),
new SEOSymbolReplacement(new Regex(@"[^\p{L}\d-]",
RegexOptions.IgnoreCase | RegexOptions.Compiled), ""),
new SEOSymbolReplacement(new Regex(@"-{2,}", RegexOptions.Compiled), "-")};
/// <summary>
/// Transforms the string into an SEO-friendly string.
/// </summary>
/// <param name="str"></param>
public static string ToSEOPathString(this string str)
{
if (str == null)
return null;
string toReturn = str;
foreach (var replacement in Replacements)
{
toReturn = replacement.Execute(toReturn);
}
return toReturn;
}
