C# removing string before/after delimiter

I've seen a lot of posts that use RegEx to remove the part of a string before or after some delimiter. The fact is, I don't understand RegEx, and my case is a little strange. Here is the situation:
I have a string that can be any of:
string test1 = (something)...keepThisOne
string test2 = keepThisOne...(something)
string test3 = (something)...keepThisOne...(somethingelse)
So far I got :
string test = testx.Substring(testx.LastIndexOf('.')+1);
but it does not work, even for the string test1.
I know RegEx can be used to remove everything between parentheses and all the "..." in this string. My question is: how can I achieve that with RegEx without knowing in advance which kind of test string I will get, and what does the RegEx mean?
The output needed is only:
string result = keepThisOne
whichever test string is used.

Try it with Regex:
Regex rgx = new Regex(@"\.*\(\w*\)\.*");
string result = rgx.Replace(input, string.Empty);
Regex will generate the output as
keepThisOne
keepThisOne
keepThisOne
You can run the various scenarios in this fiddle.
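Since the question also asks what the pattern means, here is a minimal sketch that spells it out piece by piece. The class name RegexBreakdown and the helper KeepMiddle are made up for illustration; the pattern is the one from the answer above, written as a verbatim string. Note that \w* will not match spaces or punctuation inside the parentheses, so this assumes the parenthesised parts are single words.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexBreakdown
{
    // \.*   zero or more literal dots (the "..." before a parenthesised part)
    // \(    a literal opening parenthesis
    // \w*   zero or more word characters (letters, digits, underscore)
    // \)    a literal closing parenthesis
    // \.*   zero or more literal dots (the "..." after a parenthesised part)
    private static readonly Regex Parenthesised = new Regex(@"\.*\(\w*\)\.*");

    // Hypothetical helper: removes every "(word)" together with its adjacent dots.
    public static string KeepMiddle(string input)
    {
        return Parenthesised.Replace(input, string.Empty);
    }

    public static void Main()
    {
        Console.WriteLine(KeepMiddle("(something)...keepThisOne"));
        Console.WriteLine(KeepMiddle("keepThisOne...(something)"));
        Console.WriteLine(KeepMiddle("(something)...keepThisOne...(somethingelse)"));
    }
}
```

Each call should print keepThisOne, because whatever is left after removing the parenthesised parts and their dots is the piece to keep.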

This does not need RegEx:
string test = testx.Split(new string[] { "..." }, StringSplitOptions.RemoveEmptyEntries)
.Single(s => !s.StartsWith("(") && !s.EndsWith(")"));
This splits the original string by the dots and only returns the part that does not start and end with parentheses.

You can use this code (adapted from another answer):
using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        // The pattern also accepts the single-character ellipsis (…) next to the parentheses.
        Regex rgx = new Regex(@"…*\.*\(\w*\)\.*…*");
        Console.WriteLine(rgx.Replace("(something)…keepThisOne", string.Empty));
        Console.WriteLine(rgx.Replace("keepThisOne…(something)", string.Empty));
        Console.WriteLine(rgx.Replace("(something)...keepThisOne…(somethingelse)", string.Empty));
    }
}
Try it in a fiddle.

This is a LINQ solution working in all 3 cases:
var res = String.Join("", Char.IsLetter(input.First()) ?
input.TakeWhile(c => Char.IsLetter(c)) :
input.SkipWhile(c => c != '.')
.SkipWhile(c => c == '.')
.TakeWhile(c => Char.IsLetter(c)));

Related

How can I replace "XX,XXX" with "XX XXX"?

I need to replace string like "XX,XXX" with "XX XXX". The string "XX,XXX" is in another string, e.g:
"-1299-5,"XXX,XXXX",trft,4,0,10800"
The string is fetched from a text file. I want to split the string by ",", but the comma inside the quoted substring leads to the wrong result.
The X represents a char. I think regex can help. Can anyone give me the right regular expression?
This expression,
(.*"[^,]*),([^,]*".*)
with a replacement of $1 $2 might work.
Demo
Example
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"(.*""[^,]*),([^,]*"".*)";
        // .NET replacement strings refer to groups as $1 and $2, not \1 and \2.
        string substitution = "$1 $2";
        string input = @"-1299-5,""XXX,XXXX"",trft,4,0,10800";
        RegexOptions options = RegexOptions.Multiline;
        Regex regex = new Regex(pattern, options);
        string result = regex.Replace(input, substitution);
        Console.WriteLine(result);
    }
}
Simply use Replace to swap the character in your string.
var test = "XXX,XXXX";
var filtered = test.Replace(',', ' ');
Console.WriteLine(filtered);
Output :
XXX XXXX
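A plain Replace on the whole line would also turn the separator commas into spaces, so it only helps once the quoted substring has been isolated. As a minimal sketch of doing both steps at once (the class QuotedCommaDemo and the helper ReplaceCommasInQuotes are hypothetical names, and the code assumes quotes are balanced and never escaped inside a field), a match evaluator can rewrite only the quoted parts:

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedCommaDemo
{
    // Hypothetical helper: replaces commas with spaces only inside "..." fields.
    // Assumes quotes are balanced and never escaped inside a field.
    public static string ReplaceCommasInQuotes(string line)
    {
        return Regex.Replace(line, "\"[^\"]*\"", m => m.Value.Replace(',', ' '));
    }

    public static void Main()
    {
        string input = "-1299-5,\"XXX,XXXX\",trft,4,0,10800";
        string cleaned = ReplaceCommasInQuotes(input);
        Console.WriteLine(cleaned);
        // After the rewrite, splitting on ',' yields one entry per field.
        Console.WriteLine(cleaned.Split(',').Length);
    }
}
```

With the sample line from the question, the split afterwards should give six fields instead of seven.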

C# Write Word at the end, if string pattern contains

I am trying to write code that appends 'water' to the end of any line that contains the word 'ocean'.
How would I do this with RegEx?
Sample:
test1
abcdocean123
test2
test3
Result (keeps all other spacing in file):
test1
abcdocean123 water
test2
test3
Code Attempt:
public string FileRead(string path)
{
content = File.ReadAllText(path);
return content;
}
public string FileChange()
{
var lines = content.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.Select(line => Regex.Replace(line, @"\bocean\b\n", "water \n"));
content = String.Join("\n", lines);
return content;
}
You need to check if a line contains ocean, and, if yes, append the water to that line only:
var content = "test1\n\nabcdocean123 \n\n\ntest2\ntest3";
var lines = content.Split(new[] { "\n" }, StringSplitOptions.None)
.Select(line => line.Contains("ocean") ? $"{line}water" : line);
return string.Join("\n", lines);
See the C# demo
If you still need to use a regex, replace line.Contains("ocean") with Regex.IsMatch(line, @"\bocean\b"), or whatever regex you need there. Just note that \b is a word boundary, so \bocean\b will match only when ocean is not enclosed by word chars (digits, letters or underscores).
Note you should rely on splitting with a newline without removing any empty lines, and when joining the lines back you won't lose any empty ones.
If you really want to continue your journey with regex, you may use
var content = "test1\n\nabcdocean123 \n\n\ntest2\ntest3";
content = Regex.Replace(content, @"ocean.*", "$&water");
// If your line endings are CRLF, use
// content = Regex.Replace(content, @"ocean[^\r\n]*", "$&water");
Console.WriteLine(content);
See this C# demo
Here, ocean matches the literal substring, .* matches the rest of the line, and $& in the replacement inserts the whole match, after which water is appended. If your line endings may include CR, [^\r\n]* (any char but CR and LF) is safer, because . matches CR.
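To make the CR issue concrete, here is a small sketch that runs both patterns on a CRLF string. The class LineEndingDemo and the helper AppendWater are illustrative names only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineEndingDemo
{
    // Hypothetical helper: appends "water" after the rest of any line containing
    // "ocean", stopping before CR or LF so the suffix stays on the same line.
    public static string AppendWater(string content)
    {
        return Regex.Replace(content, @"ocean[^\r\n]*", "$&water");
    }

    public static void Main()
    {
        string crlf = "test1\r\nabcdocean123 \r\ntest2";

        // With "ocean.*" the match swallows the '\r', so "water" lands after it.
        string naive = Regex.Replace(crlf, @"ocean.*", "$&water");
        Console.WriteLine(naive == "test1\r\nabcdocean123 \rwater\ntest2");

        // With [^\r\n]* the line ending stays intact.
        Console.WriteLine(AppendWater(crlf) == "test1\r\nabcdocean123 water\r\ntest2");
    }
}
```

Both comparisons should print True, which shows why the character class is the safer choice for files with Windows line endings.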
Check this
Regex.Replace(line, @"(ocean)(\w+)", "$1water $2\n");
Working Fiddle
There is no need to use Regex at all in your case, if I got your question right.
You can just check whether a string contains the ocean phrase and then append the water word.
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
private static readonly string Token = "ocean";
private static readonly string AppendToken = "water";
public static void Main()
{
var mylist = new List<string>(new string[] { "firststring", "asdsadsaoceansadsadas", "onemoreocean", "notOcccean" });
var newList = mylist.Select(str => {
if(str.Contains(Program.Token)) {
return str + " " +Program.AppendToken;
}
return str;
});
foreach (object o in newList)
{
Console.WriteLine(o);
}
}
}
You can run this code on DotnetFiddle

Splitting text in C# by tag

I am splitting string in my code like this:
var lines = myString == null
? new string[] { }
: myString.Split(new[] { "\n", "<br />" }, StringSplitOptions.RemoveEmptyEntries);
The trouble is that sometimes the text looks like this:
sdjkgjkdgjk<br />asdfsdg
And in this case my code works. However, other times the text looks like this:
sdjkgjkdgjk<br style="someAttribute: someProperty;"/>asdfsdg
And in this case, I don't get the result I want. How can I split this string by the whole br tag, along with all its attributes?
I hope the following code will help you.
var items = Regex.Split("sdjkgjkdgjk<br style='someAttribute: someProperty;'/>asdfsdg", @"<.*?>");
If you only need to split by br tags and newline, regex is a good option:
var lines = myString == null ?
new string[] { } :
Regex.Split(myString, @"<br[^>]*>|\r\n?|\n");
But if your requirements get more complex, I'd suggest using an HTML parser.
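One thing to watch with Regex.Split: capture groups in the pattern make the captured delimiters show up in the result, and a greedy .+ can run across several tags. A minimal sketch that avoids both (the class BrSplitDemo and the helper SplitLines are hypothetical names, and dropping empty pieces is an assumption carried over from the RemoveEmptyEntries option in the question):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BrSplitDemo
{
    // Hypothetical helper: splits on any <br ...> tag or newline and drops empty pieces.
    // [^>]* keeps each match inside a single tag, and the pattern has no capture
    // groups, so the delimiters themselves are not returned.
    public static string[] SplitLines(string text)
    {
        if (text == null) return new string[0];
        return Regex.Split(text, @"<br\b[^>]*>|\r\n?|\n", RegexOptions.IgnoreCase)
                    .Where(part => part.Length > 0)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "one<br />two<br style=\"someAttribute: someProperty;\"/>three\nfour";
        foreach (string part in SplitLines(input))
        {
            Console.WriteLine(part);
        }
    }
}
```

For the sample input this should print the four pieces one, two, three and four on separate lines.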
You can try this one:
var parts = Regex.Split(value, @"(<b>[\s\S]+?<\/b>)").Where(l => l != string.Empty).ToArray();
Use Regex.Split(). Below is an example:
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string input = "sdjkgjkdgjk<br />asdfsdg";
string pattern = "<br.*\\/>"; // Split on <br/>
DisplayByRegex(input, pattern);
input = "sdjkgjkdgjk<br style=\"someAttribute: someProperty;\"/>asdfsdg";
DisplayByRegex(input, pattern);
Console.Read();
}
private static void DisplayByRegex(string input, string pattern)
{
string[] substrings = Regex.Split(input, pattern);
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
}
}
You should use a regular expression.
Here you can find a good tutorial for your purpose.

C# Get string between two characters in a string

I have a string like below:
{{"textA","textB","textC"}}
And currently, I'm using below code to split them:
string stringIWantToSplit = "{{\"textA\",\"textB\",\"textC\"}}";
string[] result = stringIWantToSplit.Split(',');
And I can get the below result:
{{"textA"
"textB"
"textC"}}
After that, I can manually trim out the '{' and '}' to get the final result, but here is the problem:
If the string is like below:
`{{"textA","textB,textD","textC"}}`
Then the result will be different from Expected result
Expected result:
"textA"
"textB,textD"
"textC"
Actual result:
{{"textA"
"textB
textD"
"textC"}}
How can I get the string between two double quotes?
Updated:
Just now when I checked the data, I found that some of it contains numbers and decimals, e.g.
{{"textA","textB","",0,9.384,"textC"}}
Currently, I'm trying to use Jenish Rabadiya's approach, and the regex I'm using is
(["'])(?:(?=(\\?))\2.)*?\1
but with this regex the numbers aren't selected. How can I modify it so that the numbers and decimals are selected as well?
Try using a regex like the following.
Regex regex = new Regex(@"([""'])(?:(?=(\\?))\2.)*?\1");
foreach (var match in regex.Matches("{{\"textA\",\"textB\",\"textC\"}}"))
{
Console.WriteLine(match);
}
Here is working dotnet fiddle => Link
Assuming your string will always look like your examples, you can use a simple regular expression to get your strings out:
string s = "{{\"textA\",\"textB,textD\",\"textC\"}}";
foreach (Match m in Regex.Matches(s, "\\\".*?\\\""))
{
//do stuff
}
I think this will help you,
List<string> specialChars = new List<string>() {",", "{{","}}" };
string stringIWantToSplit = "{{\"textA\",\"textB,textD\",\"textC\"}}";
string[] result = stringIWantToSplit.Split(new char[] {'"'}, StringSplitOptions.RemoveEmptyEntries)
.Where(text => !specialChars.Contains(text)).ToArray();
Using this regex makes it simple:
text = Regex.Replace(text, @"^[\s,]+|[\s,]+$", "");
I finally modified the regex to this:
(["'])(?:(?=(\\?))\2.)*?\1|(\d*\.?\d*)[^"' {},]
And this finally works:
Sample:
https://dotnetfiddle.net/vg4jUh
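For comparison, here is a simpler sketch of the same idea: match either a double-quoted string or a bare number. This is not the regex from the post above but a reduced variant, and it assumes fields never contain escaped quotes. The class FieldExtractDemo and the helper ExtractFields are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldExtractDemo
{
    // Hypothetical helper: returns quoted strings (quotes included) and bare numbers.
    // The quoted alternative comes first, so digits inside quotes stay part of the string.
    public static List<string> ExtractFields(string text)
    {
        var fields = new List<string>();
        foreach (Match m in Regex.Matches(text, "\"[^\"]*\"|-?\\d+(?:\\.\\d+)?"))
        {
            fields.Add(m.Value);
        }
        return fields;
    }

    public static void Main()
    {
        string input = "{{\"textA\",\"textB,textD\",\"\",0,9.384,\"textC\"}}";
        foreach (string field in ExtractFields(input))
        {
            Console.WriteLine(field);
        }
    }
}
```

On the sample input this should yield six fields: three non-empty quoted strings, one empty quoted string, and the numbers 0 and 9.384.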

Regex starting with a string

I want to filter the following string with the regular expressions:
TEST^AB^^HOUSE-1234~STR2255
I want to get only the string "HOUSE-1234". The input always begins with "TEST^AB^^", and the part I need ends at the "~".
Can you please help me with what the regex should look like?
You can use \^\^(.*?)\~ pattern which matches start with ^^ and ends with ~
string s = @"TEST^AB^^HOUSE-1234~STR2255";
Match match = Regex.Match(s, @"\^\^(.*?)\~", RegexOptions.IgnoreCase);
if (match.Success)
{
string key = match.Groups[1].Value;
Console.WriteLine(key);
}
Output will be:
HOUSE-1234
Here is a DEMO.
string input = "TEST^AB^^HOUSE-1234~STR2255";
var matches = Regex.Matches(input, @"TEST\^AB\^\^(.+?)~").Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToList();
string pattern=@"\^\^(.*)\~";
Regex re=new Regex(pattern);
With the little information you've given us (and assuming that the TEST^AB isn't necessarily constant), this might work:
(?:\^\^).*(?:~)
See here
Or if TEST^AB is constant, you can throw it in too
(?:TEST\^AB\^\^).*(?:~)
The important part is to remember that you need to escape the ^
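Note that non-capturing groups still consume text, so the whole match for the two patterns above includes the ^^ and ~ delimiters. If only the middle part is wanted as the match value, lookarounds are one option. The following is a sketch under the assumption that the prefix is always TEST^AB^^ at the start of the string; the class LookaroundDemo and the helper ExtractKey are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Hypothetical helper: returns the text between a leading "TEST^AB^^" and the
    // next '~', or null when the input does not have that shape.
    // (?<=...) and (?=...) check the surrounding text without including it in the match.
    public static string ExtractKey(string input)
    {
        Match m = Regex.Match(input, @"(?<=^TEST\^AB\^\^)[^~]*(?=~)");
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractKey("TEST^AB^^HOUSE-1234~STR2255"));
        Console.WriteLine(ExtractKey("OTHER^AB^^HOUSE-1234~STR2255") == null);
    }
}
```

The first call should print HOUSE-1234, and the second should print True because the prefix does not match.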
Don't even need the RegEx for something that well defined. If you want to simplify:
string[] splitString;
if (yourstring.StartsWith("TEST^AB^^"))
{
yourstring = yourstring.Remove(0, 9);
splitString = yourstring.Split('~');
return splitString[0];
}
return null;
(TEST\^AB\^\^)((\w)+-(\w+))(\~.+)
There are three main groups:
(TEST\^AB\^\^) : matches your TEST^AB^^
((\w)+-(\w+)) : matches your HOUSE-1234
(\~.+) : matches the rest
You should do this without regex:
var str = "TEST^AB^^HOUSE-1234~STR2255";
var result = (str.StartsWith("TEST^AB^^") && str.IndexOf('~') > -1)
? new string(str.Skip(9).TakeWhile(c=>c!='~').ToArray())
: null;
Console.WriteLine(result);
