Extract string that contains only letters in C#

Extract string that contains only letters in C# - c#

string input = "5991 Duncan Road";
var onlyLetters = new String(input.Where(Char.IsLetter).ToArray());
Output: DuncanRoad
But I am expecting output is Duncan Road. What need to change ?

For the input like yours, you do not need a regex, just skip all non-letter symbols at the beginning with SkipWhile():
Bypasses elements in a sequence as long as a specified condition is true and then returns the remaining elements.
C# code:
var input = "5991 Duncan Road";
var onlyLetters = new String(input.SkipWhile(p => !Char.IsLetter(p)).ToArray());
Console.WriteLine(onlyLetters);
See IDEONE demo
A regx solution that will remove numbers that are not part of words and also adjoining whitespace:
var res = Regex.Replace(str, #"\s+(?<!\p{L})\d+(?!\p{L})|(?<!\p{L})\d+(?!\p{L})\s+", string.Empty);

You can use this lookaround based regex:
repl = Regex.Replace(input, #"(?<![a-zA-Z])[^a-zA-Z]|[^a-zA-Z](?![a-zA-Z])", "");
//=> Duncan Road
(?<![a-zA-Z])[^a-zA-Z] matches a non-letter that is not preceded by another letter.
| is regex alternation
[^a-zA-Z](?![a-zA-Z]) matches a non-letter that is not followed by another letter.
RegEx Demo

You can still use LINQ filtering with Char.IsLetter || Char.IsWhiteSpace. To remove all leading and trailing whitespace chars you can call String.Trim:
string input = "5991 Duncan Road";
string res = String.Join("", input.Where(c => Char.IsLetter(c) || Char.IsWhiteSpace(c)))
.Trim();
Console.WriteLine(res); // Duncan Road

Related

Regex only letters except set of numbers

I'm using Replace(#"[^a-zA-Z]+", "");
leave only letters, but I have a set of numbers or characters that I want to keep as well, ex: 122456 and 112466. But I'm having trouble leaving it only if it's this sequence:
ex input:
abc 1239 asm122456000
I want to:
abscasm122456
tried this: ([^a-zA-Z])+|(?!122456)

My answer doesn't applying Replace(), but achieves a similar result:
(?:[a-zA-Z]+|\d{6})
which captures the group (non-capturing group) with the alphabetic character(s) or a set of digits with 6 occurrences.
Regex 101 & Test Result
Join all the matching values into a single string.
using System.Linq;
Regex regex = new Regex("(?:[a-zA-Z]+|\\d{6})");
string input = "abc 1239 asm12245600";
string output = "";
var matches = regex.Matches(input);
if (matches.Count > 0)
output = String.Join("", matches.Select(x => x.Value));
Sample .NET Fiddle

Alternate way,
using .Split() and .All(),
string input = "abc 1239 asm122456000";
string output = string.Join("", input.Split().Where(x => !x.All(char.IsDigit)));
.NET Fiddle

It is very simple: you need to match and capture what you need to keep, and just match what you need to remove, and then utilize a backreference to the captured group value in the replacement pattern to put it back into the resulting string.
Here is the regex:
(122456|112466)|[^a-zA-Z]
See the regex demo. Details:
(122456|112466) - Capturing group with ID 1: either of the two alternatives
| - or
[^a-zA-Z] - a char other than an ASCII letter (use \P{L} if you need to match any char other than any Unicode letter).
Note the removed + quantifier as [^A-Za-z] also matches digits.
You need to use $1 in the replacement:
var result = Regex.Replace(text, #"(122456|112466)|[^a-zA-Z]", "$1");

Regular expression to replace string except in sqaure brackets

Need to replace all forward-slash (/) with > except for the ones in the square brackets
input string:
string str = "//div[1]/li/a[#href='https://www.facebook.com/']";
Tried pattern (did not work):
string regex = #"\/(?=$|[^]]+\||\[[^]]+\]\/)";
var pattern = Regex.Replace(str, regex, ">");
Expected Result:
">>div[1]>li>a[#href='https://www.facebook.com/']"

Your thinking was good with lookbehind but instead positive use negative.
(?<!\[[^\]]*)(\/)
Demo
After updating your c# code
string pattern = #"(?<!\[[^\]]*)(\/)";
string input = "//div[1]/li/a[#href='https://www.facebook.com/']";
var result = Regex.Replace(input, pattern, ">");
You will get
>>div[1]>li>a[#href='https://www.facebook.com/']

If you're willing to also use String.Replace you can do the following:
string input = "//div[1]/li/a[#href='https://www.facebook.com/']";
string expected = ">>div[1]>li>a[#href='https://www.facebook.com/']";
var groups = Regex.Match(input, #"^(.*)(\[.*\])$")
.Groups
.Cast<Group>()
.Select(g => g.Value)
.Skip(1);
var left = groups.First().Replace('/', '>');
var right = groups.Last();
var actual = left + right;
Assert.Equal(expected, actual);
What this does is split the string into two groups, where for the first group the / is replaced by > as you describe. The second group is appended as is. Basically, you don't care what is between square brackets.
(The Assert is from an xUnit unit test.)

You could either match from an opening till a closing square bracket or capture the / in a capturing group.
In the replacement replace the / with a <
Pattern
\[[^]]+\]|(/)
\[[^]]+\] Match from opening [ till closing ]
| Or
(/) Capture / in group 1
Regex demo | C# demo
For example
string str = "//div[1]/li/a[#href='https://www.facebook.com/']";
string regex = #"\[[^]]+\]|(/)";
str = Regex.Replace(str, regex, m => m.Groups[1].Success ? ">" : m.Value);
Console.WriteLine(str);
Output
>>div[1]>li>a[#href='https://www.facebook.com/']

Regex - Removing specific characters before the final occurance of #

So, I'm trying to remove certain characters [.&#] before the final occurance of an #, but after that final #, those characters should be allowed.
This is what I have so far.
string pattern = #"\.|\&|\#(?![^#]+$)|[^a-zA-Z#]";
string input = "username#middle&something.else#company.com";
// input, pattern, replacement
string result = Regex.Replace(input, pattern, string.Empty);
Console.WriteLine(result);
Output: usernamemiddlesomethingelse#companycom
This currently removes all occurances of the specified characters, apart from the final #. I'm not sure how to get this to work, help please?

You may use
[.&#]+(?=.*#)
Or, equivalent [.&#]+(?![^#]*$). See the regex demo.
Details
[.&#]+ - 1 or more ., & or # chars
(?=.*#) - followed with any 0+ chars (other than LF) as many as possible and then a #.
See the C# demo:
string pattern = #"[.&#]+(?=.*#)";
string input = "username#middle&something.else#company.com";
string result = Regex.Replace(input, pattern, string.Empty);
Console.WriteLine(result);
// => usernamemiddlesomethingelse#company.com

Just a simple solution (and alternative to complex regex) using Substring and LastIndexOf:
string pattern = #"[.#&]";
string input = "username#middle&something.else#company.com";
string inputBeforeLastAt = input.Substring(0, input.LastIndexOf('#'));
// input, pattern, replacement
string result = Regex.Replace(inputBeforeLastAt, pattern, string.Empty) + input.Substring(input.LastIndexOf('#'));
Console.WriteLine(result);
Try it with this fiddle.

c# regex replace everything including a word base64, with nothing and keeping rest of the string

I am wanting to take a string and find base64, and get rid of that and everything prior to that
example
"asdfjljlkjaldf_base64,234u0909230948098234082304802384023094"
Notice "base64," ... I want ONLY everything after "base64,"
Desired: "234u0909230948098234082304802384023094"
I was looking at this code
"string test = "hello, base64, matching";
string regexStrTest;
regexStrTest = #"test\s\w+";
MatchCollection m1 = Regex.Matches(base64,, regexStrTest);
//gets the second matched value
string value = m1[1].Value;
but that is not quite what I want..

Why regular expressions? IndexOf + Substring seems to be quite enough:
string source = "asdfjljlkjaldf_base64,234u0909230948098234082304802384023094";
string tag = "base64,";
string result = source.Substring(source.IndexOf(tag) + tag.Length);

You tried a regex that matches test, a whitespace, and 1+ word chars. The input string just did not match it.
You may use
var results = Regex.Matches(s, #"base64,(\w+)")
.Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToList();
See the regex demo.
The pattern matches base64, substring and then captures into Group 1 one or more word chars with (\w+) pattern. The captured value is stored inside match.Groups[1].Value, just what you get with .Select(m => m.Groups[1].Value).

Some of the other answers are good. Here is a very simple regex
string yourData = "asdfjljlkjaldf_base64,234u0909230948098234082304802384023094";
var newString = Regex.Replace(yourData, "^.*base64,", "");

Remove numbers in specific part of string (within parentheses)

I have a string Test123(45) and I want to remove the numbers within the parenthesis. How would I go about doing that?
So far I have tried the following:
string str = "Test123(45)";
string result = Regex.Replace(str, "(\\d)", string.Empty);
This however leads to the result Test(), when it should be Test123().

tis replaces all parenthesis, filled with digits by parenthesis
string str = "Test123(45)";
string result = Regex.Replace(str, #"\(\d+\)", "()");

\d+(?=[^(]*\))
Try this.Use with verbatinum mode #.The lookahead will make sure number have ) without ( before it.Replace by empty string.
See demo.
https://regex101.com/r/uE3cC4/4

string str = "Test123(45)";
string result = Regex.Replace(str, #"\(\d+\)", "()");

you can also try this way:
string str = "Test123(45)";
string[] delimiters ={#"("};;
string[] split = str.Split(delimiters, StringSplitOptions.None);
var b=split[0]+"()";

Remove a number that is in fact inside parentheses BUT not the parentheses and keep anything else inside them that is not a number with C# Regex.Replace means matching all parenthetical substrings with \([^()]+\) and then removing all digits inside the MatchEvaluator.
Here is a C# sample program:
var str = "Test123(45) and More (5 numbers inside parentheses 123)";
var result = Regex.Replace(str, #"\([^()]+\)", m => Regex.Replace(m.Value, #"\d+", string.Empty));
// => Test123() and More ( numbers inside parentheses )
To remove digits that are enclosed in ( and ) symbols, the ASh's \(\d+\) solution will work well: \( matches a literal (, \d+ matches 1+ digits, \) matches a literal ).

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Extract string that contains only letters in C# - c#

string input = "5991 Duncan Road"; var onlyLetters = new String(input.Where(Char.IsLetter).ToArray()); Output: DuncanRoad But I am expecting output is Duncan Road. What need to change ?

Related

Regex only letters except set of numbers

Regular expression to replace string except in sqaure brackets

Regex - Removing specific characters before the final occurance of #

c# regex replace everything including a word base64, with nothing and keeping rest of the string

Remove numbers in specific part of string (within parentheses)

Categories

Resources