How to detect dot (.) at the start of a line in C#?

I want to remove a dot (.) if it appears at the start of a line, for example:
hi,
<new line> .
<new line> How are you.
How can I remove this line?

Remove a dot at the start of a line:
resultString = Regex.Replace(subjectString, @"^\.", "", RegexOptions.Multiline);
Remove an entire line if it starts with a dot:
resultString = Regex.Replace(subjectString, @"^\..*\r\n", "", RegexOptions.Multiline);
Remove an entire line if it contains only a dot:
resultString = Regex.Replace(subjectString, @"^\.\r\n", "", RegexOptions.Multiline);
Remove an entire line if it contains only a dot, possibly followed by trailing whitespace:
resultString = Regex.Replace(subjectString, @"^\.[^\r\n\S]*\r\n", "", RegexOptions.Multiline);
These patterns assume Windows line endings (\r\n); use \r?\n instead to match Unix line endings as well.
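A minimal sketch of two of these patterns applied to the question's sample text. The sample string and variable names are assumptions, and \r?\n is swapped in for \r\n so the second pattern also matches Unix line endings. It is written as top-level statements (C# 9 or later); in older projects the statements would go inside Main.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample text from the question, assumed here to use Unix line endings.
string subjectString = "hi,\n.\nHow are you.";

// Remove only a dot at the start of a line; the now-empty line remains.
string dotsRemoved = Regex.Replace(subjectString, @"^\.", "", RegexOptions.Multiline);

// Remove whole lines that contain only a dot, including their line break.
string linesRemoved = Regex.Replace(subjectString, @"^\.\r?\n", "", RegexOptions.Multiline);

Console.WriteLine(dotsRemoved);
Console.WriteLine(linesRemoved);
```

The dot that ends "How are you." is left alone in both cases, because ^ only matches at the start of a line.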

string result = "hi,\n.\nHow are you.".Replace("\n.\n", "\n");

You could use the built-in string comparisons; there are plenty available.
You could use Regex.
But it all comes down to one point: what have you tried? And please read the docs.
Shortest possible answer:
// allLines is your List/Array/Enumerable of all lines that need checking
var cleanedLines = new List<string>();
foreach (string line in allLines)
{
    // Ignore leading whitespace when checking for the dot.
    string trimmed = line.TrimStart();
    // Strip the leading dot if there is one; otherwise keep the line unchanged.
    cleanedLines.Add(trimmed.StartsWith(".") ? trimmed.Substring(1) : line);
}

What does "detect" mean in your case? If it's just to return true/false if a dot occurs, you can just search for "\n." - a line break followed by a dot. You don't even need regex for it:
bool weHaveDot = (myString.Length > 0 && myString[0] == '.') || (myString.IndexOf("\n.") > -1);

What about:
MyLine.Substring(MyLine.IndexOf("<new line> ") + 11).StartsWith(".")

I'm afraid the answer depends on what the question means.
My guess is that you have several lines, and you want to remove those lines that consist of only a dot. Right?
Then the solution would be:
Split the string into a string array, each of which contains one line
Remove all array elements that consist of just a dot
Join the string array back together.
And you don't need a regex for that. (Unless you want to get some experience in regexes; in that case, sorry.)
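The three steps above can be sketched as follows. The sample text and variable names are assumptions, and the sketch treats a line as "just a dot" even when it has surrounding whitespace.

```csharp
using System;
using System.Linq;

// Sample text from the question (an assumption about the exact input).
string text = "hi,\n.\nHow are you.";

// 1. Split the string into lines (handles both \r\n and \n).
string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);

// 2. Drop the lines that consist of just a dot.
var kept = lines.Where(line => line.Trim() != ".");

// 3. Join the remaining lines back together.
string result = string.Join("\n", kept);

Console.WriteLine(result);
```

Note that joining with "\n" normalizes the line endings; join with Environment.NewLine instead if the output should use the platform's convention.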

Related

Removing numbers from text using C#

I have a text file for processing, which has some numbers. I want JUST text in it, and nothing else. I managed to remove the punctuation marks, but how do I remove the numbers? I want this using C# code.
Also, I want to remove words with length greater than 10. How do I do that using Reg Expressions?

You can do this with a regex:
string withNumbers = "abc 123 def"; // any string with numbers
string withoutNumbers = Regex.Replace(withNumbers, "[0-9]", "");
Use this regex to match words with more than 10 characters:
\b\w{11,}\b
{11,} sets the minimum length to 11 with no upper limit, so no maximum needs to be specified. The \b anchors keep the pattern from matching only part of a word.
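A short sketch that combines both steps, assuming "more than 10 characters" means 11 or more and using the open-ended quantifier {11,}. The sample sentence is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input containing digits and one 15-letter word.
string withNumbers = "call 911 about the extraordinarily loud party";

// Remove every digit.
string withoutNumbers = Regex.Replace(withNumbers, "[0-9]", "");

// Remove whole words of 11 or more word characters.
string withoutLongWords = Regex.Replace(withoutNumbers, @"\b\w{11,}\b", "");

Console.WriteLine(withoutLongWords);
```

Each removal leaves its surrounding spaces behind, so the result contains doubled spaces that may need collapsing afterwards.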

Only letters and nothing else (because I see you also want to remove the punctuation marks)
Regex.IsMatch(input, @"^[a-zA-Z]+$");

You can also use string.Join:
string s = "asdasdad34534t3sdf43534";
s = string.Join(null, System.Text.RegularExpressions.Regex.Split(s, "[\\d]"));

The Regex.Replace method should do the trick.
// regex to match any digit
var regex = new Regex(@"\d");
// replace all matches in input with empty string
var output = regex.Replace(input, String.Empty);

Regex - Find from both sides only if it has spaces

I need some help with regex. I need to find a word that is surrounded by some delimiter, for example *. But I need to match it only if it has spaces or nothing on either side. For example, at the start of the text there can't be a space before it, and the same goes for the end.
Here is what I came up with:
string myString = "You will find *me*, and *me* also!";
string findString = @"(\*(.*?)\*)";
string foundText;
MatchCollection matchCollection = Regex.Matches(myString, findString);
foreach (Match match in matchCollection)
{
foundText = match.Value.Replace("*", "");
myString = myString.Replace(match.Value, "->" + foundText + "<-");
match.NextMatch();
}
Console.WriteLine(myString);
You will find ->me<-, and ->me<- also!
This works correctly; the problem is when I add * in the middle of the text. I don't want it to match then.
Example: You will find *m*e*, and *me* also!
Output: You will find ->m<-e->, and <-me* also!
How can I fix that?

Try the following pattern:
string findString = @"(?<=\s|^)\*(.*?)\*(?=\s|$)";
(?<=\s|^)X will match any X only if preceded by a space-char (\s), or the start-of-input, and
X(?=\s|$) matches any X if followed by a space-char (\s), or the end-of-input.
Note that it will not match *me* in foo *me*, bar since the second * has a , after it! If you want to match that too, you need to include the comma like this:
string findString = @"(?<=[\s,]|^)\*(.*?)\*(?=[\s,]|$)";
You'll need to expand the set [\s,] as you see fit, of course. You might want to add !, ? and . at the very least: [\s,!?.] (and no, . and ? do not need to be escaped inside a character-set!).
EDIT
A small demo:
string Txt = "foo *m*e*, bar";
string Pattern = @"(?<=[\s,]|^)\*(.*?)\*(?=[\s,]|$)";
Console.WriteLine(Regex.Replace(Txt, Pattern, ">$1<"));
which would print:
foo >m*e<, bar

You can add "beginning of line or space" and "space or end of line" around your match:
(^|\s)\*(.*?)\*(\s|$)
You'll now need to refer to the middle capture group for the match string.
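A sketch of how that pattern could be used in a replacement. Because the surrounding whitespace is now part of the match, the replacement puts groups 1 and 3 back around the middle group. The sample sentence and the replacement string are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample with a * in the middle of the first word.
string myString = "You will find *m*e* and *me* also!";
string pattern = @"(^|\s)\*(.*?)\*(\s|$)";

// $2 is the text between the asterisks; $1 and $3 restore the whitespace.
string result = Regex.Replace(myString, pattern, "$1->$2<-$3");

Console.WriteLine(result);
```

One limitation of consuming the whitespace: two marked words separated by a single space, such as "*a* *b*", share that space, so the second one is not matched. Lookarounds avoid this because they do not consume anything.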

How to remove extra returns and spaces in a string by regex?

I convert HTML code to plain text. But there are many extra returns and spaces. How can I remove them?

string new_string = Regex.Replace(orig_string, @"\s", ""); // removes all whitespace
string new_string = Regex.Replace(orig_string, @"\s+", " "); // collapses each run of whitespace into a single space

I'm assuming that you want to
find two or more consecutive spaces and replace them with a single space, and
find two or more consecutive newlines and replace them with a single newline.
If that's correct, then you could use
resultString = Regex.Replace(subjectString, @"( |\r?\n)\1+", "$1");
This keeps the original "type" of whitespace intact and also preserves Windows line endings correctly. If you also want to "condense" multiple tabs into one, use
resultString = Regex.Replace(subjectString, @"( |\t|\r?\n)\1+", "$1");
To condense a string of newlines and spaces (any number of each) into a single newline, use
resultString = Regex.Replace(subjectString, @"(?:(?:\r?\n)+ +){2,}", "\n");
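A small sketch comparing the type-preserving pattern with the simpler \s+ approach on a made-up input (the sample string and variable names are assumptions):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample: doubled spaces and a run of three newlines.
string subjectString = "a  b\n\n\nc";

// Collapse repeats of the same whitespace sequence, keeping its type.
string sameType = Regex.Replace(subjectString, @"( |\r?\n)\1+", "$1");

// Collapse any run of whitespace into one space.
string singleSpaces = Regex.Replace(subjectString, @"\s+", " ");

Console.WriteLine(sameType);
Console.WriteLine(singleSpaces);
```

The first result keeps the line break between "b" and "c"; the second flattens everything onto one line.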

I tried a lot of approaches for this. Loops worked, but this was the clearest.
//define what you want to remove as char
char tb = (char)9; //Tab char ascii code
char spc = (char)32; //space char ascii code
char nwln = (char)10; //New line char ascii code
//Replace returns a new string, so assign the result back
yourstring = yourstring.Replace(tb.ToString(), "");
yourstring = yourstring.Replace(spc.ToString(), "");
yourstring = yourstring.Replace(nwln.ToString(), "");
//by defining chars, result was better.

You can use Trim() to remove leading and trailing spaces and returns; it does not touch whitespace inside the string. In HTML, surrounding whitespace is not significant, so you can drop it with the Trim() method of the System.String class.

How to take only first line from the multiline text

How can I get only the first line of multiline text using regular expressions?
string test = @"just take this first line
even there is
some more
lines here";
Match m = Regex.Match(test, "^", RegexOptions.Multiline);
if (m.Success)
Console.Write(m.Groups[0].Value);

If you just need the first line, you can do it without using a regex like this
var firstline = test.Substring(0, test.IndexOf(Environment.NewLine));
As much as I like regexes, you don't need them for everything, so unless this is part of some larger regex exercise, I would go for the simpler solution in this case.

string test = @"just take this first line
even there is
some more
lines here";
Match m = Regex.Match(test, "^(.*)", RegexOptions.Multiline);
if (m.Success)
Console.Write(m.Groups[0].Value);
. is often touted to match any character, while this isn't totally true. . matches any character only if you use the RegexOptions.Singleline option. Without this option, it matches any character except for '\n' (end of line).
That said, a better option is likely to be:
string test = @"just take this first line
even there is
some more
lines here";
string firstLine = test.Split(new string[] {Environment.NewLine}, StringSplitOptions.None)[0];
And better yet, is Brian Rasmussen's version:
string firstline = test.Substring(0, test.IndexOf(Environment.NewLine));

Try this one:
Match m = Regex.Match(test, @".*\n", RegexOptions.Multiline);

This line replaces everything from the first linefeed onward with an empty string.
test = Regex.Replace(test, "(\n.*)$", "", RegexOptions.Singleline);
It also works properly if the string has no linefeed; then no replacement is done.

Regex to match alphanumeric and spaces

What am I doing wrong here?
string q = "john s!";
string clean = Regex.Replace(q, @"([^a-zA-Z0-9]|^\s)", string.Empty);
// clean == "johns". I want "john s";

just a FYI
string clean = Regex.Replace(q, @"[^a-zA-Z0-9\s]", string.Empty);
would actually be better like
string clean = Regex.Replace(q, @"[^\w\s]", string.Empty);

This:
string clean = Regex.Replace(dirty, "[^a-zA-Z0-9\x20]", String.Empty);
\x20 is ascii hex for 'space' character
you can add more individual characters that you want to be allowed.
If you want for example "?" to be ok in the return string add \x3f.

I got it:
string clean = Regex.Replace(q, @"[^a-zA-Z0-9\s]", string.Empty);
Didn't know you could put \s in the brackets

The following regex is for space inclusion in textbox.
Regex r = new Regex("^[a-zA-Z\\s]+");
r.IsMatch(textbox1.Text);
This works fine for me.

I suspect ^ doesn't work the way you think it does outside of a character class.
What you're telling it to do is replace everything that isn't an alphanumeric with an empty string, OR any leading space. I think what you mean to say is that spaces are ok to not replace - try moving the \s into the [] class.
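The difference can be seen by running both patterns on the question's input. This is a minimal sketch; the variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string q = "john s!";

// Original pattern: removes every non-alphanumeric (including the space),
// OR a whitespace character at the start of the input.
string original = Regex.Replace(q, @"([^a-zA-Z0-9]|^\s)", string.Empty);

// \s moved inside the negated class: whitespace is now allowed to stay.
string corrected = Regex.Replace(q, @"[^a-zA-Z0-9\s]", string.Empty);

Console.WriteLine(original);
Console.WriteLine(corrected);
```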

There appear to be two problems.
You're using the ^ outside a [] which matches the start of the line
You're not using a * or + which means you will only match a single character.
I think you want the following regex @"([^a-zA-Z0-9\s])+"

bottom regex with space, supports all keyboard letters from different culture
string input = "78-selim güzel667.,?";
Regex regex = new Regex(@"[^\w\x20]|[\d]");
var result= regex.Replace(input,"");
//selim güzel

The circumflex inside the square brackets means all characters except the subsequent range. You want a circumflex outside of square brackets.

This regex will help you to filter if there is at least one alphanumeric character and zero or more special characters i.e. _ (underscore), \s whitespace, -(hyphen)
string comparer = "string you want to compare";
Regex r = new Regex(@"^([a-zA-Z0-9]+[_\s-]*)+$");
if (!r.IsMatch(comparer))
{
return false;
}
return true;
[a-zA-Z0-9]+ is a set of alphanumeric characters; the + quantifier requires at least one of them.
[_\s-]* is a set of the allowed special characters; the * quantifier allows zero or more of them.
Wrapping both in a repeated group, ([a-zA-Z0-9]+[_\s-]*)+, between ^ and $ means the whole comparer string must consist of such runs, so it has to start with an alphanumeric character.
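A few sample inputs show what this pattern accepts and rejects. The test strings and variable names are assumptions chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

Regex r = new Regex(@"^([a-zA-Z0-9]+[_\s-]*)+$");

// Alphanumeric runs separated by whitespace, underscore, or hyphen match.
bool plainWords = r.IsMatch("hello world");
bool mixed = r.IsMatch("user_name-42");

// The string must begin with an alphanumeric character.
bool leadingUnderscore = r.IsMatch("_hello");

// Any character outside the two sets makes the whole match fail.
bool punctuation = r.IsMatch("hello!");

Console.WriteLine($"{plainWords} {mixed} {leadingUnderscore} {punctuation}");
```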

[RegularExpression(@"^[A-Z]+[a-zA-Z""'\s-]*$")]
Above syntax also accepts space
