How can I get only the first line of multiline text using regular expressions?
string test = @"just take this first line
even there is
some more
lines here";
Match m = Regex.Match(test, "^", RegexOptions.Multiline);
if (m.Success)
Console.Write(m.Groups[0].Value);
If you just need the first line, you can do it without using a regex like this
var firstline = test.Substring(0, test.IndexOf(Environment.NewLine));
As much as I like regexes, you don't really need them for everything, so unless this is part of some larger regex exercise, I would go for the simpler solution in this case.
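One caveat with the Substring approach: IndexOf returns -1 when the string contains no line break, and Substring then throws an ArgumentOutOfRangeException. A minimal sketch of a more defensive version follows; FirstLine is a hypothetical helper name, and treating both '\r' and '\n' as line breaks is an assumption about the input.

```csharp
using System;

static class FirstLineDemo
{
    // Hypothetical helper: returns the text before the first line break,
    // or the whole string when there is no line break at all.
    static string FirstLine(string text)
    {
        int end = text.IndexOfAny(new[] { '\r', '\n' });
        return end < 0 ? text : text.Substring(0, end);
    }

    static void Main()
    {
        Console.WriteLine(FirstLine("just take this first line\nsome more")); // just take this first line
        Console.WriteLine(FirstLine("only one line"));                        // only one line
    }
}
```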
string test = @"just take this first line
even there is
some more
lines here";
Match m = Regex.Match(test, "^(.*)", RegexOptions.Multiline);
if (m.Success)
Console.Write(m.Groups[0].Value);
. is often touted to match any character, while this isn't totally true. . matches any character only if you use the RegexOptions.Singleline option. Without this option, it matches any character except for '\n' (end of line).
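To make the difference concrete, here is a small sketch (the sample string is made up for illustration) that compares the length of the match with and without RegexOptions.Singleline. Note that if the input used "\r\n" line endings, the default match would still include the trailing '\r', since only '\n' is excluded.

```csharp
using System;
using System.Text.RegularExpressions;

static class DotDemo
{
    static void Main()
    {
        string test = "first line\nsecond line";

        // Default: . stops at '\n', so only the first line is matched.
        Console.WriteLine(Regex.Match(test, "^.*").Value.Length);                           // 10

        // Singleline: . also matches '\n', so the match runs to the end of the string.
        Console.WriteLine(Regex.Match(test, "^.*", RegexOptions.Singleline).Value.Length);  // 22
    }
}
```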
That said, a better option is likely to be:
string test = @"just take this first line
even there is
some more
lines here";
string firstLine = test.Split(new string[] {Environment.NewLine}, StringSplitOptions.None)[0];
And better yet, is Brian Rasmussen's version:
string firstline = test.Substring(0, test.IndexOf(Environment.NewLine));
Try this one:
Match m = Regex.Match(test, @".*\n", RegexOptions.Multiline);
This line replaces everything from the first linefeed onward with an empty string.
test = Regex.Replace(test, "(\n.*)$", "", RegexOptions.Singleline);
It also works properly if the string does not contain a linefeed: in that case no replacement is made.
Related
I need to find words in a string that start and end with white space. I am having trouble matching the white space. However, I managed to get the version below working, which matches words that start and end with ##. Any help with whitespace would be great.
string input = "##12## ##13##";
foreach (Match match in Regex.Matches(input, @"##\b\S+?\b##"))
{
MessageBox.Show(match.Groups[1].Value);
}
From MSDN doc:
// Define a regular expression for repeated words.
Regex rx = new Regex(@"\b(?<word>\w+)\s+(\k<word>)\b",
RegexOptions.Compiled | RegexOptions.IgnoreCase);
\s+(?=</)
is the expression you're after. It means one or more white-space characters followed by the characters </ (the lookahead checks for them without consuming them).
In my opinion it is better to use string.Split() instead of Regex:
var wordsArray = s.Split(new []{' '},StringSplitOptions.RemoveEmptyEntries);
It is better to avoid regex if you can achieve the same result more easily with standard string methods.
I can't tell exactly what you have in mind, but I hope this code helps:
string[] ha = input.Split(new[] { '#' }, StringSplitOptions.RemoveEmptyEntries);
I want to remove a dot (.) if it appears at the start of a line, so for example:
hi,
<new line> .
<new line> How are you.
How can I remove this line?
Remove a dot at the start of a line:
resultString = Regex.Replace(subjectString, @"^\.", "", RegexOptions.Multiline);
Remove an entire line if it starts with a dot:
resultString = Regex.Replace(subjectString, @"^\..*\r\n", "", RegexOptions.Multiline);
Remove an entire line if it contains only a dot:
resultString = Regex.Replace(subjectString, @"^\.\r\n", "", RegexOptions.Multiline);
Remove an entire line if it starts with a dot and possibly contains trailing whitespace:
resultString = Regex.Replace(subjectString, @"^\.[^\r\n\S]*\r\n", "", RegexOptions.Multiline);
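The patterns above hard-code \r\n, so they will not match a string that uses bare \n line endings. As a sketch, assuming the input may use either convention, \r?\n accepts both; the sample string is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class DotLineDemo
{
    static void Main()
    {
        string subject = "hi,\n.\nHow are you.";

        // ^\. anchors the dot to the start of a line (Multiline),
        // [^\S\r\n]* allows trailing spaces or tabs, and \r?\n accepts
        // both "\n" and "\r\n" line endings.
        string result = Regex.Replace(subject, @"^\.[^\S\r\n]*\r?\n", "", RegexOptions.Multiline);

        Console.WriteLine(result == "hi,\nHow are you.");   // True
    }
}
```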
string result = "hi,\n.\nHow are you.".Replace("\n.\n", "\n");
You could use the built-in string comparisons - there are plenty available.
You could use RegEx.
But it all comes to one point --> what have you tried? and please read the docs.
Short possible answer:
// allLines is your List/Array/Enumerable of all lines that need checking
foreach (string line in allLines) {
string trimmed = line.TrimStart();
if (trimmed.StartsWith(".")) {
// Do whatever you like with the found string, e.g. drop the leading dot:
string withoutDot = trimmed.Substring(1);
}
}
What does "detect" mean in your case? If it's just to return true/false if a dot occurs, you can just search for "\n." - a line break followed by a dot. You don't even need regex for it:
bool weHaveDot = myString.StartsWith(".") || (myString.IndexOf("\n.") > -1);
What about:
MyLine.Substring(MyLine.IndexOf("<new line> ") + 11).StartsWith(".")
I'm afraid the answer depends on what the question means.
My guess is that you have several lines, and you want to remove those lines that consist of only a dot. Right?
Then the solution would be:
Split the string into a string array, each of which contains one line
Remove all array elements that consist of just a dot
Join the string array back together.
And you don't need a regex for that. (Unless you want to get some experience in regexes; in that case, sorry.)
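The three steps above can be sketched as follows. This assumes '\n' line endings and treats a line as "just a dot" even when it has surrounding whitespace; both are assumptions, not something the question states.

```csharp
using System;
using System.Linq;

static class RemoveDotLines
{
    static void Main()
    {
        string text = "hi,\n.\nHow are you.";

        // 1. Split the string into lines (assumes '\n' line endings).
        string[] lines = text.Split('\n');

        // 2. Drop the lines that consist of just a dot, ignoring surrounding whitespace.
        var kept = lines.Where(line => line.Trim() != ".");

        // 3. Join the remaining lines back together.
        string result = string.Join("\n", kept);

        Console.WriteLine(result == "hi,\nHow are you.");   // True
    }
}
```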
I need some help with Regex. I need to find a word that is surrounded by some delimiter, for example *. But I need to match it only if it has spaces or nothing on either side. For example, if it is at the start of the text there can't be a space before it, and the same goes for the end.
Here is what I came up with:
string myString = "You will find *me*, and *me* also!";
string findString = @"(\*(.*?)\*)";
string foundText;
MatchCollection matchCollection = Regex.Matches(myString, findString);
foreach (Match match in matchCollection)
{
foundText = match.Value.Replace("*", "");
myString = myString.Replace(match.Value, "->" + foundText + "<-");
match.NextMatch();
}
Console.WriteLine(myString);
You will find ->me<-, and ->me<- also!
This works correctly. The problem is when I add a * in the middle of the text; I don't want it to match then.
Example: You will find *m*e*, and *me* also!
Output: You will find ->m<-e->, and <-me* also!
How can I fix that?
Try the following pattern:
string findString = @"(?<=\s|^)\*(.*?)\*(?=\s|$)";
(?<=\s|^)X will match any X only if preceded by a space-char (\s), or the start-of-input, and
X(?=\s|$) matches any X if followed by a space-char (\s), or the end-of-input.
Note that it will not match *me* in foo *me*, bar since the second * has a , after it! If you want to match that too, you need to include the comma like this:
string findString = @"(?<=[\s,]|^)\*(.*?)\*(?=[\s,]|$)";
You'll need to expand the set [\s,] as you see fit, of course. You might want to add !, ? and . at the very least: [\s,!?.] (and no, . and ? do not need to be escaped inside a character-set!).
EDIT
A small demo:
string Txt = "foo *m*e*, bar";
string Pattern = @"(?<=[\s,]|^)\*(.*?)\*(?=[\s,]|$)";
Console.WriteLine(Regex.Replace(Txt, Pattern, ">$1<"));
which would print:
foo >m*e<, bar
You can add "beginning of line or space" and "space or end of line" around your match:
(^|\s)\*(.*?)\*(\s|$)
You'll now need to refer to the middle capture group for the match string.
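As a sketch of what "refer to the middle capture group" looks like in a replacement: $2 is the text between the asterisks, while $1 and $3 put back the whitespace that the outer groups consumed. The input string is made up for illustration. One limitation to be aware of: because the outer groups consume the whitespace, two marked words separated by a single space (such as "*a* *b*") will not both match, which the lookaround version avoids.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupDemo
{
    static void Main()
    {
        string input = "You will find *me* and *m*e* also!";
        string pattern = @"(^|\s)\*(.*?)\*(\s|$)";

        // $1 and $3 restore the surrounding whitespace; $2 is the text between the asterisks.
        string result = Regex.Replace(input, pattern, "$1->$2<-$3");

        Console.WriteLine(result);   // You will find ->me<- and ->m*e<- also!
    }
}
```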
I have a regular expression
Regex r = new Regex(@"(\s*)([A|B|C|E|G|H|J|K|L|M|N|P|R|S|T|V|Y|X]\d(?!.*[DFIOQU])(?:[A-Z](\s?)\d[A-Z]\d))(\s*)",RegexOptions.IgnoreCase);
and a string
string test="LJHLJHL HJGJKDGKJ JGJK C1C 1C1 LKJLKJ";
I have to fetch C1C 1C1. This works fine.
But if I modify the test string to
string test="LJHLJHL HJGJKDGKJ JGJK C1C 1C1 ON";
then it is unable to find the pattern, i.e. C1C 1C1.
Any idea why this expression is failing?
You have a negative look ahead:
(?!.*[DFIOQU])
That matches the "O" in "ON" and since it is a negative look ahead, the whole pattern fails. And, as an aside, I think you want to replace this:
[A|B|C|E|G|H|J|K|L|M|N|P|R|S|T|V|Y|X]
With this:
[A-CEGHJ-NPR-TVYX]
A pipe (|) is a literal character inside a character class, not an alternation, and you can use ranges to help highlight the characters that you're leaving out.
A single regex might not be the best way to parse that string. Or perhaps you just need a looser regex.
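A quick sketch to illustrate the point about pipes inside a character class: the class [A|B] accepts a literal | as well as A and B, and the range form accepts the same letters as the original list without the stray pipe.

```csharp
using System;
using System.Text.RegularExpressions;

static class CharClassDemo
{
    static void Main()
    {
        // Inside [...] the pipe is just another character to match.
        Console.WriteLine(Regex.IsMatch("|", "^[A|B]$"));   // True
        Console.WriteLine(Regex.IsMatch("|", "^[AB]$"));    // False

        // The range form covers the intended letters and skips D, F, I, O, Q, U.
        string allowed = "^[A-CEGHJ-NPR-TVYX]$";
        Console.WriteLine(Regex.IsMatch("K", allowed));     // True
        Console.WriteLine(Regex.IsMatch("D", allowed));     // False
    }
}
```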
Your negative lookahead (?!.*[DFIOQU]) requires that none of the characters DFIOQU appears anywhere after that position.
In your second string there is an O at the end, in ON, so the match fails.
If you remove the .* from your negative lookahead, it will only check the character directly following and not the complete string to the end (is this what you want?).
\s*([ABCEGHJKLMNPRSTVYX]\d(?![DFIOQU])(?:[A-Z]\s?\d[A-Z]\d))\s*
then it works, see it here on Regexr. It is now checking if there is not one of the characters in the class directly after the digit, I don't know if this is intended.
Btw, I removed the | from your first character class, since it's not needed, and also some brackets around your whitespace, which are also not needed.
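The effect of the .* inside the lookahead can be checked with a short sketch. The two patterns below are simplified from the ones discussed above (outer groups and whitespace dropped, no IgnoreCase), so treat them as an illustration rather than the exact expressions.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaheadDemo
{
    static void Main()
    {
        string test = "LJHLJHL HJGJKDGKJ JGJK C1C 1C1 ON";

        // With .* the lookahead scans the rest of the string and finds the O in "ON".
        string withDotStar = @"[ABCEGHJKLMNPRSTVYX]\d(?!.*[DFIOQU])[A-Z]\s?\d[A-Z]\d";
        Console.WriteLine(Regex.IsMatch(test, withDotStar));     // False

        // Without .* only the character right after the digit is checked.
        string withoutDotStar = @"[ABCEGHJKLMNPRSTVYX]\d(?![DFIOQU])[A-Z]\s?\d[A-Z]\d";
        Console.WriteLine(Regex.Match(test, withoutDotStar).Value);   // C1C 1C1
    }
}
```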
As I understand it, you need to find the C1C 1C1 text in your string.
I used this regex to do it:
string strRegex = @"^.*(?<c1c>C1C)\s*(?<c1c2>1C1).*$";
after that you can extract text from named groups
string strRegex = @"^.*(?<c1c>C1C)\s*(?<c1c2>1C1).*$";
RegexOptions myRegexOptions = RegexOptions.Multiline;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = @"LJHLJHL HJGJKDGKJ JGJK C1C 1C1 LKJLKJ";
string secondStr = "LJHLJHL HJGJKDGKJ JGJK C1C 1C1 ON";
Match match = myRegex.Match(strTargetString);
string c1c = match.Groups["c1c"].Value;
string c1c2 = match.Groups["c1c2"].Value;
Console.WriteLine(c1c + " " +c1c2);
Below is a simple code snippet that demonstrates the seemingly buggy behavior of end of line matching ("$") in .Net regular expressions. Am I missing something obvious?
string input = "Hello\nWorld\n";
string regex = @"^Hello\n^World\n"; //Match
//regex = @"^Hello\nWorld\n"; //Match
//regex = @"^Hello$"; //Match
//regex = @"^Hello$World$"; //No match!!!
//regex = @"^Hello$^World$"; //No match!!!
Match m = Regex.Match(input, regex, RegexOptions.Multiline | RegexOptions.CultureInvariant);
Console.WriteLine(m.Success);
$ does not consume the newline character(s). @"^Hello$\s+^World$" should match.
The $ doesn't match a newline. It matches at the end of the string to which the pattern is applied (or, when multiline mode is enabled, at the end of each line, just before the newline). There isn't much sense in having two ends in a string.
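A short sketch that reproduces the behaviour with the input from the question: $ is a zero-width assertion, so the '\n' still has to be matched by something before World can follow.

```csharp
using System;
using System.Text.RegularExpressions;

static class DollarDemo
{
    static void Main()
    {
        string input = "Hello\nWorld\n";
        RegexOptions opts = RegexOptions.Multiline;

        // $ asserts "end of line" but leaves the '\n' unconsumed, so W cannot follow directly.
        Console.WriteLine(Regex.IsMatch(input, @"^Hello$World$", opts));      // False

        // Consuming the newline explicitly makes the pattern match.
        Console.WriteLine(Regex.IsMatch(input, @"^Hello$\nWorld$", opts));    // True
        Console.WriteLine(Regex.IsMatch(input, @"^Hello$\s+^World$", opts));  // True
    }
}
```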