Splitting text in C# by tag

I am splitting a string in my code like this:
var lines = myString == null
? new string[] { }
: myString.Split(new[] { "\n", "<br />" }, StringSplitOptions.RemoveEmptyEntries);
The trouble is that sometimes the text looks like this:
sdjkgjkdgjk<br />asdfsdg
In this case my code works. However, other times the text looks like this:
sdjkgjkdgjk<br style="someAttribute: someProperty;"/>asdfsdg
In this case I don't get the result I want. How can I split this string by the whole br tag, including all of its attributes?

I hope the following code will help you.
var items = Regex.Split("sdjkgjkdgjk<br style='someAttribute: someProperty;'/>asdfsdg", @"<.*?>");

If you only need to split by br tags and newline, regex is a good option:
var lines = myString == null ?
    new string[] { } :
    Regex.Split(myString, @"<br[^>]*>|\r\n?|\n"); // no capture groups, so the delimiters are not returned
But if your requirements get more complex, I'd suggest using an HTML parser.
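
As a minimal sketch of the regex approach, assuming the goal is to split on any br tag (with or without attributes) and on line breaks, and to drop empty entries as the question's original code does. The names BrSplitter and SplitLines are made up for illustration, and the pattern assumes no attribute value contains a > character:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BrSplitter
{
    // <br\b[^>]*> matches <br>, <br/>, <br /> and <br ...attributes.../>,
    // but not other tags that merely start with "br" (the \b prevents that).
    // The remaining alternatives match CRLF, CR or LF line endings.
    // No capture groups are used, because Regex.Split would otherwise
    // include the captured delimiters in its result.
    private static readonly Regex Delimiter =
        new Regex(@"<br\b[^>]*>|\r\n?|\n", RegexOptions.IgnoreCase);

    public static string[] SplitLines(string text)
    {
        if (text == null)
            return new string[0];

        return Delimiter.Split(text)
                        .Where(part => part.Length > 0) // drop empty entries
                        .ToArray();
    }
}
```

Calling BrSplitter.SplitLines on the question's second sample returns the two pieces sdjkgjkdgjk and asdfsdg. For anything beyond simple br tags, an HTML parser remains the more robust choice.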

You can try this one:
var parts = Regex.Split(value, @"(<b>[\s\S]+?<\/b>)").Where(l => l != string.Empty).ToArray();

Use Regex.Split(). Below is an example:
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string input = "sdjkgjkdgjk<br />asdfsdg";
        // Split on <br/>, with or without attributes. [^>]* stops at the
        // end of the tag instead of running on to a later "/>".
        string pattern = @"<br[^>]*/>";
        DisplayByRegex(input, pattern);
        input = "sdjkgjkdgjk<br style=\"someAttribute: someProperty;\"/>asdfsdg";
        DisplayByRegex(input, pattern);
        Console.Read();
    }

    private static void DisplayByRegex(string input, string pattern)
    {
        string[] substrings = Regex.Split(input, pattern);
        foreach (string match in substrings)
        {
            Console.WriteLine("'{0}'", match);
        }
    }
}

You should use a regular expression.
Here you can find a good tutorial for your purpose.

Related

C# Write Word at the end, if string pattern contains

I am trying to write code that appends 'water' to the end of any line containing the word 'ocean'.
How would I do this with Regex?
Sample:
test1
abcdocean123
test2
test3
Result (keeps all other spacing in file):
test1
abcdocean123 water
test2
test3
Code Attempt:
public string FileRead(string path)
{
    content = File.ReadAllText(path);
    return content;
}

public string FileChange()
{
    var lines = content.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
        .Select(line => Regex.Replace(line, @"\bocean\b\n", "water \n"));
    content = String.Join("\n", lines);
    return content;
}

You need to check if a line contains ocean, and, if yes, append the water to that line only:
var content = "test1\n\nabcdocean123 \n\n\ntest2\ntest3";
var lines = content.Split(new[] { "\n" }, StringSplitOptions.None)
.Select(line => line.Contains("ocean") ? $"{line}water" : line);
return string.Join("\n", lines);
See the C# demo
If you still need a regex, replace line.Contains("ocean") with Regex.IsMatch(line, @"\bocean\b"), or whatever pattern you need there. Just note that \b is a word boundary, so \bocean\b matches only when ocean is not enclosed by word characters (digits, letters or underscores); it would not match abcdocean123.
Note that you should split on the newline without removing empty entries, so that no empty lines are lost when you join the lines back together.
If you really want to continue your journey with regex, you may use
var content = "test1\n\nabcdocean123 \n\n\ntest2\ntest3";
content = Regex.Replace(content, @"ocean.*", "$&water");
// If your line endings are CRLF, use
// content = Regex.Replace(content, @"ocean[^\r\n]*", "$&water");
Console.WriteLine(content);
See this C# demo
Here, ocean matches the literal substring, .* matches the rest of the line, and $& in the replacement inserts the whole match, after which water is appended. Since . matches CR, [^\r\n]* (any characters other than CR and LF) is the safer choice if your line endings may include CR.

Check this
Regex.Replace(line, @"(ocean)(\w+)", "$1water $2\n");
Working Fiddle

There is no need to use Regex at all in your case, if I understood your question correctly.
You can just check whether a string contains ocean and, if so, append water to it.
using System;
using System.Collections.Generic;
using System.Linq;

public class Program
{
    private static readonly string Token = "ocean";
    private static readonly string AppendToken = "water";

    public static void Main()
    {
        var mylist = new List<string>(new string[] { "firststring", "asdsadsaoceansadsadas", "onemoreocean", "notOcccean" });
        // Append the token only to the strings that contain "ocean".
        var newList = mylist.Select(str => {
            if (str.Contains(Program.Token)) {
                return str + " " + Program.AppendToken;
            }
            return str;
        });
        foreach (object o in newList)
        {
            Console.WriteLine(o);
        }
    }
}
You can run this code on DotnetFiddle

Split string by character in C#

I need to split this string by ',' in C#.
Sample string:
'DC0''008_','23802.76','23802.76','23802.76','Comm,erc,','2f17','3f44c0ba-daf1-44f0-a361-'
I can use string.Split(','), but as you can see, 'Comm,erc,' then gets split into
Comm
erc
Also, 'DC0''008_' should come out as
'DC0''008_'
not as
'DC0'
'008_'
The expected output should be like this:
'DC0''008_'
'23802.76'
'23802.76'
'23802.76'
'Comm,erc,'
'2f17'
'3f44c0ba-daf1-44f0-a361-'

Splitting can do it, but the regex for that will be more complex.
You can use Regex.Matches instead, with this simpler regex:
'[^']*'
and get all quoted strings in a collection.
Code:
MatchCollection matches = Regex.Matches(input, @"'[^']*'");
To print all the matched values:
foreach (Match match in Regex.Matches(input, @"'[^']*'"))
    Console.WriteLine("Found {0}", match.Value);
To store all matched values in an ArrayList (requires using System.Collections;):
ArrayList list = new ArrayList();
foreach (Match match in Regex.Matches(input, @"'[^']*'")) {
    list.Add(match.Value);
}
EDIT: As per the comments below, if the OP wants to keep '' inside the captured string, use this lookaround regex:
'.*?(?<!')'(?!')
(?<!')'(?!') means match a single quote that is not surrounded by another single quote.
RegEx Demo
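
As an alternative sketch (not the regex from the answer above), a doubled quote can also be consumed explicitly inside the field, assuming that '' is the only way a quote appears inside a value. The names QuotedFields and Extract are made up for illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedFields
{
    // '(?:[^']|'')*' reads: an opening quote, then any run of
    // non-quote characters or doubled quotes (''), then the closing quote.
    // Commas inside a field are ordinary non-quote characters.
    private static readonly Regex Field = new Regex(@"'(?:[^']|'')*'");

    public static string[] Extract(string input)
    {
        return Field.Matches(input)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

Applied to the sample string from the question, this returns seven fields, with 'DC0''008_' and 'Comm,erc,' each kept in one piece.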

You can use this Regex to get all the things inside the commas and apostrophes:
(?<=')[^,].*?(?=')
Regex101 Explanation
To convert it into a string array, you can use the following:
var matches = Regex.Matches(strInput, "(?<=')[^,].*?(?=')");
var array = matches.Cast<Match>().Select(x => x.Value).ToArray();
EDIT: If you want it to be able to capture double quotes as well, a regex that matches in every case becomes unwieldy. At that point, it's better to just use a simpler pattern with Regex.Split:
var matches = Regex.Split(strInput, "^'|'$|','")
.Where(x => !string.IsNullOrEmpty(x))
.ToArray();

It is better to preprocess the string before splitting it, so that you get what you want. Something like this:
string data = "'DC0008_','23802.76','23802.76','23802.76','Comm,erc,','2f17','3f44c0ba-daf1-44f0-a361-'";
data = Process(data); // before splitting, replace every comma outside quotes with '#'
string[] result = data.Split('#'); // now the split works (not thoroughly tested)
The Process() function is below:
private string Process(string input)
{
    bool flag = false; // true while we are inside a quoted value
    string temp = "";
    foreach (char ch in input)
    {
        // Every quote character toggles the inside/outside state.
        if (ch == '\'' || ch == '"')
            flag = !flag;

        if (ch == ',' && !flag)
            temp += "#"; // a comma outside quotes is a separator
        else
            temp += ch;  // everything else is copied unchanged
    }
    return temp;
}
see output here http://rextester.com/COAH43918

using System;
using System.Linq;
using System.Text.RegularExpressions;

namespace ConsoleApplication15
{
    class Program
    {
        static void Main(string[] args)
        {
            string str = "'DC0008_','23802.76','23802.76','23802.76','Comm,erc,','2f17','3f44c0ba-daf1-44f0-a361-'";
            var matches = Regex.Matches(str, "(?<=')[^,].*?(?=')");
            var array = matches.Cast<Match>().Select(x => x.Value).ToArray();
            foreach (var item in array)
                Console.WriteLine("'" + item + "'");
        }
    }
}

C# removing string before/after delimiter

I've seen a lot of posts using RegEx to remove the part of a string before or after some delimiter. The thing is, I don't understand RegEx, and my case is a little strange. Here is the situation:
I have a string that can be :
string test1 = (something)...keepThisOne
string test2 = keepThisOne...(something)
string test3 = (something)...keepThisOne...(somethingelse)
So far I have:
string test = testx.Substring(testx.LastIndexOf('.')+1);
but it only works for test1.
I know RegEx can be used to remove everything between parentheses and all the "..." in this string. My question is: how can I achieve that with RegEx without knowing in advance which kind of test string I will get, and what does the RegEx mean?
The output needed is only:
string result = keepThisOne
whichever test string is used.

Try with Regex:
Regex rgx = new Regex(@"\.*\(\w*\)\.*");
string result = rgx.Replace(input, string.Empty);
For the three test strings, the regex produces
keepThisOne
keepThisOne
keepThisOne
You can run the various scenarios in this fiddle.

This does not need RegEx:
string test = testx.Split(new string[] { "..." }, StringSplitOptions.RemoveEmptyEntries)
.Single(s => !s.StartsWith("(") && !s.EndsWith(")"));
This splits the original string by the dots and only returns the part that does not start and end with parentheses.
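
The split-based idea can be sketched as a small helper, assuming the separator is always three ASCII dots and the unwanted parts are always wrapped in parentheses. KeepPart and Extract are illustrative names, and FirstOrDefault is swapped in for Single so that an input with no qualifying part yields null instead of throwing:

```csharp
using System;
using System.Linq;

public static class KeepPart
{
    public static string Extract(string input)
    {
        return input
            // Break the string on the "..." separators.
            .Split(new[] { "..." }, StringSplitOptions.RemoveEmptyEntries)
            // Keep the first piece that is not wrapped in parentheses;
            // returns null when every piece is parenthesized.
            .FirstOrDefault(part => !part.StartsWith("(") && !part.EndsWith(")"));
    }
}
```

With this helper, all three shapes from the question ("(something)...keepThisOne", "keepThisOne...(something)" and "(something)...keepThisOne...(somethingelse)") give keepThisOne.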

You can use this code (adapted from another answer):
using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        Regex rgx = new Regex(@"…*\.*\(\w*\)\.*…*");
        Console.WriteLine(rgx.Replace("(something)…keepThisOne", string.Empty));
        Console.WriteLine(rgx.Replace("keepThisOne…(something)", string.Empty));
        Console.WriteLine(rgx.Replace("(something)...keepThisOne…(somethingelse)", string.Empty));
    }
}
Try it in a fiddle.

This is a LINQ solution working in all 3 cases:
var res = String.Join("", Char.IsLetter(input.First()) ?
input.TakeWhile(c => Char.IsLetter(c)) :
input.SkipWhile(c => c != '.')
.SkipWhile(c => c == '.')
.TakeWhile(c => Char.IsLetter(c)));

C# Get string between two characters in a string

I have a string like below:
{{"textA","textB","textC"}}
And currently, I'm using below code to split them:
string stringIWantToSplit = "{{\"textA\",\"textB\",\"textC\"}}";
string[] result = stringIWantToSplit.Split(',');
And I can get the below result:
{{"textA"
"textB"
"textC"}}
After that, I can manually trim out the '{' and '}' to get the final result, but here is the problem:
If the string is like below:
{{"textA","textB,textD","textC"}}
Then the result will be different from Expected result
Expected result:
"textA"
"textB,textD"
"textC"
Actual result:
{{"textA"
"textB
textD"
"textC"}}
How can I get the string between two double quotes?
Updated:
When I checked the data just now, I found that some of it contains numbers and decimals, e.g.
{{"textA","textB","",0,9.384,"textC"}}
Currently, I'm trying to use Jenish Rabadiya's approach, and the regex I'm using is
(["'])(?:(?=(\\?))\2.)*?\1
but with this regex the numbers aren't selected. How can I modify it so that numbers and decimals are selected as well?

Try using a regex like the following.
Regex regex = new Regex(@"([""'])(?:(?=(\\?))\2.)*?\1");
foreach (var match in regex.Matches("{{\"textA\",\"textB\",\"textC\"}}"))
{
    Console.WriteLine(match);
}
Here is a working dotnet fiddle => Link

Assuming your string will always look like your examples, you can use a simple regular expression to get your strings out:
string s = "{{\"textA\",\"textB,textD\",\"textC\"}}";
foreach (Match m in Regex.Matches(s, "\\\".*?\\\""))
{
    //do stuff
}

I think this will help you,
List<string> specialChars = new List<string>() {",", "{{","}}" };
string stringIWantToSplit = "{{\"textA\",\"textB,textD\",\"textC\"}}";
string[] result = stringIWantToSplit.Split(new char[] {'"'}, StringSplitOptions.RemoveEmptyEntries)
.Where(text => !specialChars.Contains(text)).ToArray();

Using this regex makes it simple:
text = Regex.Replace(text, @"^[\s,]+|[\s,]+$", "");

I finally modified the regex to this:
(["'])(?:(?=(\\?))\2.)*?\1|(\d*\.?\d*)[^"' {},]
And this finally works:
Sample:
https://dotnetfiddle.net/vg4jUh
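
For comparison, a simpler pattern (a swapped-in alternative, not the regex posted above) can pick up both the quoted strings and the bare numbers, assuming the values never contain escaped double quotes and numbers are plain integers or decimals. TokenExtractor and Extract are illustrative names:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenExtractor
{
    // "[^"]*"          : a double-quoted string (possibly empty, may contain commas)
    // -?\d+(?:\.\d+)?  : an integer or decimal number outside quotes
    // The quoted alternative comes first, so digits inside a quoted
    // string are consumed as part of that string.
    private static readonly Regex Token =
        new Regex(@"""[^""]*""|-?\d+(?:\.\d+)?");

    public static string[] Extract(string input)
    {
        return Token.Matches(input)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

For the updated sample {{"textA","textB","",0,9.384,"textC"}} this yields six tokens: the four quoted strings (including the empty one) plus 0 and 9.384.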

Remove formatting on string literal

Given the C# code:
string foo = @"
abcde
fghijk";
I am trying to remove all formatting, including whitespaces between the lines.
So far the code
foo = foo.Replace("\n","").Replace("\r", "");
works, but the whitespace between lines 2 and 3 is still kept.
I assume a regular expression is the only solution?
Thanks.

I'm assuming you want to keep multiple lines; if not, I'd choose CAbbott's answer.
var fooNoWhiteSpace = string.Join(
Environment.NewLine,
foo.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.Select(fooline => fooline.Trim())
);
What this does is split the string into lines (foo.Split),
trim whitespace from the start and end of each line (.Select(fooline => fooline.Trim())),
then combine them back together with a new line in between (string.Join).

You could use a regular expression:
foo = Regex.Replace(foo, @"\s+", "");

How about this?
string input = @"
abcde
fghijk";
string output = "";
string[] parts = input.Split('\n');
foreach (var part in parts)
{
    // If you want everything on one line... else just + "\n" to it
    output += part.Trim();
}
This should remove everything.

If the whitespace is all spaces, you could use
foo = foo.Replace(" ", "");
For any other whitespace that may be in there, do the same. Example:
foo = foo.Replace("\t", "");

Just add a Replace(" ", ""). You're dealing with a string literal, which means all the whitespace is part of the string.

Try something like this:
string test = @"
abcde
fghijk";
EDIT: Added code to filter out only the whitespace characters.
string newString = new string(test.Where(c => !Char.IsWhiteSpace(c)).ToArray());
Produces the following: abcdefghijk

I've written something similar to George Duckett's answer, but put the logic into a string extension method so it is easier for others to read and consume:
public static class Extensions
{
    public static string RemoveTabbing(this string fmt)
    {
        return string.Join(
            System.Environment.NewLine,
            fmt.Split(new string[] { System.Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
                .Select(fooline => fooline.Trim()));
    }
}
You can then call it like this:
string foo = @"
abcde
fghijk".RemoveTabbing();
I hope that helps someone.
