I am having difficulty figuring out a regular expression.
I have sentences:
"1A11 - Vehicle Engine Control Unit (VECU) (Behind Plate)"
"1A1K5 - Vehicle Rear View (Front View)"
I want to trim the trailing (----) part from each sentence. I have this regular expression to do so: @"\s*\([^\)]*\)". The problem is that in my first sentence (VECU) is an abbreviation, so I need to keep it, but this regular expression removes both groups when the sentence has two sets of parentheses, () (). How can I modify my regular expression to trim only the last () from the sentence?
if (!reportMode)
{
//Look line by line for Title
stream = GetStream(files);
List<String> fileContent = new List<String>();
using (StreamReader sr = new StreamReader(stream))
{
String line = "";
Boolean isInThere = false;
while (!sr.EndOfStream)
{
line = sr.ReadLine();
if (line.Contains(title))
{
//check for exact match
Int32 index = line.IndexOf(" - ");
String revisedLine = line.Substring(index + 3).Trim();
String str = Regex.Replace(revisedLine, @"\s*\([^\)]*\)", "").Trim();
if (Regex.IsMatch(str, String.Format("^{0}$", title)))
isInThere = true;
}
fileContent.Add(line);
}
You could anchor the regexp at the end of the line. This is usually done by adding a '$' sign at the end: @"\s*\([^\)]*\)$". If the closing parenthesis is the last character of the string, this should do. Otherwise, you can extend the expression to ignore trailing whitespace, for example by putting \s* before the $.
(Fixed regexp syntax, thanks Patrick)
--
MaxP
In case you need to remove a parenthetical expression that is last but can appear not only at the end, you may use
Regex rx = new Regex(@"\s*\([^()]*\)(?=[^()]*$)");
String str = rx.Replace(revisedLine, "").Trim();
REGEX:
\s* - zero or more whitespace characters
\([^()]*\) - an opening round bracket, any number of characters other than ( or ), and a closing round bracket
(?=[^()]*$) - a lookahead that checks that there are no ( or ) characters between the match and the end of the string
Note that you do not need to escape the round brackets inside character classes.
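To compare the two approaches above, here is a minimal sketch that runs both patterns on the sentences from the question. The class and method names (ParentheticalTrimmer, TrimLast, TrimAtEnd) and the third test string are made up for illustration; only the patterns come from the answers.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParentheticalTrimmer
{
    // Removes only the last "(...)" group, wherever it sits in the string.
    private static readonly Regex LastGroup =
        new Regex(@"\s*\([^()]*\)(?=[^()]*$)");

    // Removes a "(...)" group only when it ends the string
    // (trailing whitespace is tolerated).
    private static readonly Regex GroupAtEnd =
        new Regex(@"\s*\([^()]*\)\s*$");

    public static string TrimLast(string s) => LastGroup.Replace(s, "").Trim();

    public static string TrimAtEnd(string s) => GroupAtEnd.Replace(s, "").Trim();

    public static void Main()
    {
        // 1A11 - Vehicle Engine Control Unit (VECU)
        Console.WriteLine(TrimLast("1A11 - Vehicle Engine Control Unit (VECU) (Behind Plate)"));
        // 1A1K5 - Vehicle Rear View
        Console.WriteLine(TrimAtEnd("1A1K5 - Vehicle Rear View (Front View)"));

        // The two patterns differ when text follows the last group:
        // Unit (VECU) rev B
        Console.WriteLine(TrimLast("Unit (VECU) (Behind Plate) rev B"));
        // Unit (VECU) (Behind Plate) rev B   (unchanged, nothing at the end)
        Console.WriteLine(TrimAtEnd("Unit (VECU) (Behind Plate) rev B"));
    }
}
```

If the parenthetical is always the very last thing on the line, the anchored pattern is the simpler choice; the lookahead version is only needed when other text may follow it.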
Using a regex to remove spaces adjacent to a replacement character
replacementCharacter = '-'
this._adjacentSpace = new Regex($@"\s*([\{replacementCharacter}])\s*");
MatchCollection replaceCharacterMatch = this._adjacentSpace.Matches(extractedText);
foreach (Match replaceCharacter in replaceCharacterMatch)
{
if (replaceCharacter.Success)
{
cleanedText = Extactedtext.Replace(replaceCharacter.Value, replaceCharacter.Value.Trim());
}
}
Extractedtext = - whi, - ch
cleanedtext = -whi, -ch
expected result : cleanedtext = -whi,-ch
You can use
var Extactedtext = "- whi, - ch";
var replacementCharacter = "-";
var _adjacentSpace = new Regex($@"\s*({Regex.Escape(replacementCharacter)})\s*");
var cleanedText = _adjacentSpace.Replace(Extactedtext, "$1");
Console.WriteLine(cleanedText); // => -whi,-ch
NOTE:
replacementCharacter is of type string in the code above
$@"\s*({Regex.Escape(replacementCharacter)})\s*" creates a regex like \s*(-)\s*. Regex.Escape() escapes any regex-special character (such as + or an opening parenthesis) so that it is treated literally in the pattern. The whole regex matches the replacementCharacter, captured into Group 1 by the capturing parentheses, together with zero or more whitespace characters on either side.
There is no need to use Regex.Matches: Regex.Replace replaces all matches, if there are any.
_adjacentSpace is the constructed Regex object; to replace, just call the .Replace() method on that instance.
The replacement is a backreference to the Group 1 value, the - char here.
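To illustrate why Regex.Escape matters here, the following sketch wraps the same pattern in a helper and calls it with "-" and with "+", which is a quantifier in regex syntax. The names AdjacentSpaceCleaner and Clean are hypothetical, and the "+" input is an invented example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AdjacentSpaceCleaner
{
    // Hypothetical helper: strips whitespace on both sides of every
    // occurrence of 'separator' and keeps the separator itself.
    public static string Clean(string text, string separator)
    {
        // Regex.Escape makes characters such as "+" match literally.
        var pattern = $@"\s*({Regex.Escape(separator)})\s*";
        return Regex.Replace(text, pattern, "$1");
    }

    public static void Main()
    {
        Console.WriteLine(Clean("- whi, - ch", "-"));  // -whi,-ch
        Console.WriteLine(Clean("a  +  b + c", "+"));   // a+b+c
    }
}
```

Without the Regex.Escape call, a separator such as "+" would be read as a quantifier with nothing to repeat, and the pattern would be rejected when the regex is constructed.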
I am working on a Xamarin.Forms PCL project in C# and would like to detect all the hashtags.
I tried splitting at spaces and checking whether each word begins with a #, but the problem is that if the post contains two consecutive spaces, as in "Hello #World Test", the double space would be lost.
string body = "Example string with a #hashtag in it";
string newbody = "";
foreach (var word in body.Split(' '))
{
if (word.StartsWith("#"))
newbody += "[" + word + "]";
newbody += word;
}
Goal output:
Example string with a [#hashtag] in it
I also only want it to have A-Z a-z 0-9 and _ stopping at any other character
Test #H3ll0_W0rld$%Test => Test [#H3ll0_W0rld]$%Test
Other Stack questions try to detect the string and extract it. I would like to work with it and put it back in the string without losing anything that methods such as splitting by certain characters would lose.
You can use Regex with #\w+ and $&
Explanation
# matches the character # literally (case sensitive)
\w matches any word character (roughly [a-zA-Z0-9_]; in .NET it also matches other Unicode letters and digits unless RegexOptions.ECMAScript is set)
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
$& Includes a copy of the entire match in the replacement string.
Example
var input = "asdads sdfdsf #burgers, #rabbits dsfsdfds #sdf #dfgdfg";
var regex = new Regex(@"#\w+");
var matches = regex.Matches(input);
foreach (var match in matches)
{
Console.WriteLine(match);
}
or
var result = regex.Replace(input, "[$&]" );
Console.WriteLine(result);
Output
#burgers
#rabbits
#sdf
#dfgdfg
asdads sdfdsf [#burgers], [#rabbits] dsfsdfds [#sdf] [#dfgdfg]
Another Example
Use a regular expression: \#\w*
string pattern = @"\#\w*";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(input);
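Since the question asks for hashtags limited to A-Z, a-z, 0-9 and _, and .NET's \w also accepts other Unicode letters, an explicit character class may be closer to the stated goal. The sketch below is one way to do the in-place wrapping with $&; HashtagMarker and Mark are hypothetical names, and the double-space input is a made-up test case.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HashtagMarker
{
    // '#' followed by one or more ASCII letters, digits or underscores.
    private static readonly Regex Hashtag = new Regex(@"#[A-Za-z0-9_]+");

    // Wraps every hashtag in square brackets; everything else,
    // including runs of spaces, is left exactly as it was.
    public static string Mark(string body) => Hashtag.Replace(body, "[$&]");

    public static void Main()
    {
        // Example string with a [#hashtag] in it
        Console.WriteLine(Mark("Example string with a #hashtag in it"));
        // Test [#H3ll0_W0rld]$%Test
        Console.WriteLine(Mark("Test #H3ll0_W0rld$%Test"));
        // Hello  [#World]  Test   (double spaces preserved)
        Console.WriteLine(Mark("Hello  #World  Test"));
    }
}
```

Because Regex.Replace only touches the matched spans, nothing is lost the way it is when the text is split on spaces and rejoined.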
I have one problem in this code. I want to remove all special characters but the square brackets are not getting removed.
string regExp = "[\\\"]";
string tmp = Regex.Replace(str, regExp," ");
string[] strArray = tmp.Split(',');
obj.amcid = db.Execute("select MAX(amcid)+1 from sca_amcmaster");
foreach (string i in strArray)
{
// int myInts = int.Parse(i);
db.Execute(";EXEC insertitems1 @0,@1", i, obj.invoiceno);
}
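Square brackets are metacharacters in regular expressions: they delimit a character class, i.e. a list of characters to match. So if you want to match them literally using a regex, you need to change your expression to:
string regExp = @"\[\\""\]";
That is, you simply need to put a backslash before each square bracket to match it too. (A verbatim @"..." string is used so that the C# compiler accepts the backslashes; inside it, a double quote is written as "".) Note that, written this way, the pattern matches the literal sequence [\"] rather than each character on its own.
If none of them is required in the expression, you can group them using parentheses and follow each group with the ? character (zero or one match):
string regExp = @"(\[)?(\\)?("")?(\])?";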
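If the goal is to remove each of these characters wherever it occurs, one option is to keep the character class and escape the brackets inside it. This is a sketch under assumptions: the input shape (a bracketed, quoted, comma-separated list) is a guess at what the question's str holds, the characters are deleted rather than replaced with a space, and SpecialCharStripper and Strip are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpecialCharStripper
{
    // Character class listing the characters to drop: [ ] \ and "
    // In a verbatim string, "" stands for a single double quote.
    private static readonly Regex Unwanted = new Regex(@"[\[\]\\""]");

    public static string Strip(string s) => Unwanted.Replace(s, "");

    public static void Main()
    {
        // Assumed input shape: ["1","2","3"]
        string cleaned = Strip("[\"1\",\"2\",\"3\"]");
        Console.WriteLine(cleaned);  // 1,2,3

        foreach (string item in cleaned.Split(','))
        {
            Console.WriteLine(item);
        }
    }
}
```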
I'm looking for a way to search a string for everything before a set of characters in C#. For Example, if this is my string value:
This is is a test.... 12345
I want to build a new string with all of the characters before "12345".
So my new string would equal "This is is a test.... "
Is there a way to do this?
I've found Regex examples where you can focus on one character but not a sequence of characters.
You don't need to use a Regex:
public string GetBitBefore(string text, string end)
{
var index = text.IndexOf(end);
if (index == -1) return text;
return text.Substring(0, index);
}
You can use a lazy quantifier to match anything, followed by a lookahead:
var match = Regex.Match("This is is a test.... 12345", @".*?(?=\d{5})");
where:
.*? lazily matches everything (up to the lookahead)
(?=…) is a positive lookahead: the pattern must be matched, but is not included in the result
\d{5} matches exactly five digits. I'm assuming this is your lookahead; you can replace it
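As a sketch of how the lazy quantifier and lookahead could be wrapped for an arbitrary literal marker, assuming the marker should be matched literally (hence Regex.Escape) and that the whole text should be returned when the marker is absent. PrefixExtractor and Before are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixExtractor
{
    // Returns everything before the first occurrence of 'marker',
    // or the whole text when the marker does not occur.
    public static string Before(string text, string marker)
    {
        // .*? expands one character at a time until the lookahead succeeds.
        // Singleline lets '.' cross line breaks as well.
        var pattern = ".*?(?=" + Regex.Escape(marker) + ")";
        Match match = Regex.Match(text, pattern, RegexOptions.Singleline);
        return match.Success ? match.Value : text;
    }

    public static void Main()
    {
        // Prints the text up to the marker, inside brackets to show the trailing space.
        Console.WriteLine("[" + Before("This is is a test.... 12345", "12345") + "]");
        // Marker absent: the input comes back unchanged.
        Console.WriteLine("[" + Before("no marker here", "12345") + "]");
    }
}
```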
You can do so with help of regex lookahead.
.*(?=12345)
Example:
var data = "This is is a test.... 12345";
var rxStr = ".*(?=12345)";
var rx = new System.Text.RegularExpressions.Regex (rxStr,
System.Text.RegularExpressions.RegexOptions.IgnoreCase);
var match = rx.Match(data);
if (match.Success) {
Console.WriteLine (match.Value);
}
The code snippet above prints everything up to 12345:
This is is a test....
For more detail, see the documentation on regex positive lookahead.
This should get you started:
var reg = new Regex("^(.+)12345$");
var match = reg.Match("This is is a test.... 12345");
var group = match.Groups[1]; // This is is a test....
Of course you'd want to do some additional validation, but this is the basic idea.
^ means start of string
$ means end of string
The asterisk (*) tells the engine to attempt to match the preceding token zero or more times. The plus (+) tells the engine to attempt to match the preceding token one or more times.
{min,max} indicates the minimum and maximum number of repetitions.
\d matches a single character that is a digit, \w matches a "word character" (alphanumeric characters plus underscore), and \s matches a whitespace character (includes tabs and line breaks).
[^a] is a negated character class: it matches any character except a.
The dot matches any single character except line break characters.
In your case there are many ways to accomplish the task.
E.g. excluding digits: ^[^\d]*
If you know the exact set of characters and it is not just "any digits", don't use a regex but IndexOf(). If you know the separator between the first and second part is "...", you can use Split().
Take a look at this snippet:
class Program
{
static void Main(string[] args)
{
string input = "This is is a test.... 12345";
// Here we call Regex.Matches.
MatchCollection matches = Regex.Matches(input, @"(?<MySentence>(\w+\s*)*)(?<MyNumberPart>\d*)");
foreach (Match item in matches)
{
Console.WriteLine(item.Groups["MySentence"]);
Console.WriteLine("******");
Console.WriteLine(item.Groups["MyNumberPart"]);
}
Console.ReadKey();
}
}
You could just split; it is not as optimal as the IndexOf solution:
string value = "oiasjdoiasj12345";
string end = "12345";
// Take the first part of the result; not the quickest, but fairly simple.
string result = value.Split(new string[] { end }, StringSplitOptions.None)[0];
Here is my string:
1-1 This is my first string. 1-2 This is my second string. 1-3 This is my third string.
How can I split it in C# so that I get:
result[0] = This is my first string.
result[1] = This is my second string.
result[2] = This is my third string.
IEnumerable<string> lines = Regex.Split(text, "(?:^|[\r\n]+)[0-9-]+ ").Skip(1);
EDIT: If you want the result in an array you can do string[] result = lines.ToArray();
Regex regex = new Regex("^(?:[0-9]+-[0-9]+ )(.*?)$", RegexOptions.Multiline);
var str = "1-1 This is my first string.\n1-2 This is my second string.\n1-3 This is my third string.";
var matches = regex.Matches(str);
List<string> strings = matches.Cast<Match>().Select(p => p.Groups[1].Value).ToList();
foreach (var s in strings)
{
Console.WriteLine(s);
}
We use a multiline Regex, so that ^ and $ are the beginning and end of the line. We skip one or more numbers, a -, one or more numbers and a space (?:[0-9]+-[0-9]+ ). We lazily (*?) take everything (.) else until the end of the line (.*?)$, lazily so that the end of the line $ is more "important" than any character .
Then we put the matches in a List<string> using Linq.
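Both snippets above expect each numbered item to start on its own line. If the items share one line, as the string in the question is written, a possible variant is to split on the number markers themselves. This is a sketch: NumberedItemSplitter and Split are hypothetical names, and it assumes that a digits-dash-digits token followed by whitespace never occurs inside an item.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberedItemSplitter
{
    // Splits on markers such as "1-1 " or " 1-2 " and drops the empty
    // piece that precedes the first marker.
    public static string[] Split(string text) =>
        Regex.Split(text, @"\s*\d+-\d+\s+")
             .Where(part => part.Length > 0)
             .ToArray();

    public static void Main()
    {
        string text = "1-1 This is my first string. 1-2 This is my second string. 1-3 This is my third string.";
        string[] result = Split(text);
        for (int i = 0; i < result.Length; i++)
        {
            // result[0] = This is my first string.   (and so on)
            Console.WriteLine("result[" + i + "] = " + result[i]);
        }
    }
}
```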
Lines end with a newline, a carriage return, or both. This splits the string into lines, handling all of these line endings:
using System.Text.RegularExpressions;
...
var lines = Regex.Split( input, "[\r\n]+" );
Then you can do what you want with each line.
var words = Regex.Split( lines[i], @"\s" );