C#: Append a word at the end of a line if the line contains a pattern

I am trying to write code that appends 'water' at the end of any line that contains the word 'ocean'.
How would I do this with Regex?
Sample:
test1
abcdocean123
test2
test3
Result (keeps all other spacing in file):
test1
abcdocean123 water
test2
test3
Code Attempt:
public string FileRead(string path)
{
    content = File.ReadAllText(path);
    return content;
}
public string FileChange()
{
    var lines = content.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
        .Select(line => Regex.Replace(line, @"\bocean\b\n", "water \n"));
    content = String.Join("\n", lines);
    return content;
}

You need to check whether a line contains ocean and, if so, append water to that line only:
var content = "test1\n\nabcdocean123 \n\n\ntest2\ntest3";
var lines = content.Split(new[] { "\n" }, StringSplitOptions.None)
.Select(line => line.Contains("ocean") ? $"{line}water" : line);
return string.Join("\n", lines);
If you still need to use a regex, replace line.Contains("ocean") with Regex.IsMatch(line, @"\bocean\b"), or whatever regex you need there. Just note that \b is a word boundary, so \bocean\b matches only when ocean is not enclosed in word characters (letters, digits or underscores); it will not match abcdocean123, for example.
Note that you should split on newlines without removing empty entries (StringSplitOptions.None); that way, when you join the lines back, you won't lose any empty lines.
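A minimal sketch of the line-by-line approach described above, assuming LF-only line endings. OceanLines and AppendWhereMatch are hypothetical names; the regex is passed in so you can compare a plain "ocean" with @"\bocean\b":

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class OceanLines
{
    // Appends " " + word to each line that matches pattern.
    // Assumes "\n" line endings; with CRLF the word would land after the '\r'.
    public static string AppendWhereMatch(string content, string pattern, string word)
    {
        // string.Split(char) keeps empty entries, so blank lines survive the round trip.
        var lines = content.Split('\n')
            .Select(line => Regex.IsMatch(line, pattern) ? line + " " + word : line);
        return string.Join("\n", lines);
    }
}
```

With @"\bocean\b" as the pattern, the sample line abcdocean123 is left alone, because ocean is surrounded by word characters there; with "ocean" it gets " water" appended.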
If you really want to continue your journey with regex, you may use
var content = "test1\n\nabcdocean123 \n\n\ntest2\ntest3";
content = Regex.Replace(content, @"ocean.*", "$&water");
// If your line endings are CRLF, use
// content = Regex.Replace(content, @"ocean[^\r\n]*", "$&water");
Console.WriteLine(content);
Here, ocean matches the substring ocean and .* matches the rest of the line; in the replacement, $& inserts the whole match and water is appended after it. If your line endings may include CR, prefer [^\r\n]*: since . matches CR (it only excludes LF), ocean.* would also consume the \r and water would be inserted after it. [^\r\n] matches any character except CR and LF.
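If you also want the space before water shown in the expected result, and need LF or CRLF endings left exactly as they are, a sketch along these lines may work. OceanAppender and AppendToMatchingLines are hypothetical names; the pattern relies on RegexOptions.Multiline, so that ^ and $ anchor each line (in .NET, $ in that mode matches just before a \n, not before a \r):

```csharp
using System.Text.RegularExpressions;

public static class OceanAppender
{
    // Appends " " + word to every line containing token, keeping LF or CRLF endings intact.
    public static string AppendToMatchingLines(string content, string token, string word)
    {
        // Group 1 is the line text; group 2 captures an optional CR before the LF,
        // so the word is inserted in front of the CR instead of after it.
        string pattern = "^(.*" + Regex.Escape(token) + @".*?)(\r?)$";
        // A literal '$' in the appended word must be doubled in a .NET replacement string.
        string replacement = "$1 " + word.Replace("$", "$$") + "$2";
        return Regex.Replace(content, pattern, replacement, RegexOptions.Multiline);
    }
}
```

Because each match is anchored to one line, a line containing ocean twice still gets water appended only once.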

Check this:
Regex.Replace(line, @"(ocean)(\w+)", "$1water $2\n");

If I understood your question correctly, there is no need to use Regex at all in your case.
You can just check whether a string contains the ocean phrase and then append the water word.
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
    private static readonly string Token = "ocean";
    private static readonly string AppendToken = "water";
    public static void Main()
    {
        var mylist = new List<string>(new string[] { "firststring", "asdsadsaoceansadsadas", "onemoreocean", "notOcccean" });
        var newList = mylist.Select(str => {
            if (str.Contains(Program.Token)) {
                return str + " " + Program.AppendToken;
            }
            return str;
        });
        foreach (var item in newList)
        {
            Console.WriteLine(item);
        }
    }
}

Related

How can I replace special characters

I've got a string value with a lot of different characters.
I want to:
replace TAB and ENTER with a space
replace Arabic ي with Persian ی
replace Arabic ك with Persian ک
remove newlines from both sides of the string
replace multiple spaces with one space
trim spaces
The following function cleans the data, and it works correctly.
Does anyone have any ideas for better performance and less code to maintain? :)
static void Main(string[] args)
{
    var output = "كgeeks 01$سهيلاطريقي03. اشك!#!!.ي";
    //output = output.Replace("\u064A", "\u0649");//ي
    output = output.Replace("\u064A", "\u06CC");//replace arabic ي with persian ی
    output = output.Replace("\u0643", "\u06A9");//replace arabic ك with persian ک
    output = output.Trim('\r', '\n');//remove newlines from both sides of a string
    output = output.Replace("\n", "").Replace("\r", " ");//replace newline with space
    RegexOptions options = RegexOptions.None;
    Regex regex = new Regex("[ ]{2,}", options);//replace multiple space with one space
    output = regex.Replace(output, " ");
    char tab = '\u0009';
    output = output.Replace(tab.ToString(), "");
    Console.WriteLine(output);
}
You can refactor using two lists: one for the trim process and one for the replace process.
var itemsTrimChars = new List<char>()
{
    '\r',
    '\n'
};
var itemsReplaceStrings = new Dictionary<string, string>()
{
    { "\n", "" },
    { "\r", " " },
    { "\u064A", "\u06CC" },
    { "\u0643", "\u06A9" },
    { "\u0009", "" }
}.ToList();
These are maintainable tables that can live wherever suits you: local variables as in this example, fields declared at class level, tables in a database, text files on disk...
Use them like this:
output = output.Trim(itemsTrimChars.ToArray()); // trim all the characters in one call, so their order does not matter
itemsReplaceStrings.ForEach(p => output = output.Replace(p.Key, p.Value));
As for the regex that replaces multiple spaces, I don't have a suggestion; but if you need to replace other doubled characters, you can create a third list.
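The "third list" could hold regex rules in the same table-driven style. A minimal sketch, assuming each rule is stored as a Regex paired with its replacement text; RegexRules and Apply are hypothetical names, and the question's multiple-space rule is the only entry:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexRules
{
    // Hypothetical "third list": each entry pairs a regex with its replacement text.
    private static readonly List<KeyValuePair<Regex, string>> Rules =
        new List<KeyValuePair<Regex, string>>
        {
            // Collapse two or more spaces into one, as in the question.
            new KeyValuePair<Regex, string>(new Regex("[ ]{2,}"), " ")
        };

    // Applies every rule in list order.
    public static string Apply(string input)
    {
        string output = input;
        foreach (var rule in Rules)
            output = rule.Key.Replace(output, rule.Value);
        return output;
    }
}
```

You would call RegexRules.Apply(output) after the trim and replace steps; new rules then become extra list entries rather than extra code.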
You can do this by iterating over each character once and applying those rules, building a new output string in the format you want. It should be faster than all those string.Replace and Regex.Replace calls.
Use a StringBuilder for performance when appending; don't use string += string.
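A minimal single-pass sketch of those two suggestions (one loop over the characters, a StringBuilder for the output). It follows the rules as listed in the question, so TAB, CR and LF become spaces; note that the question's own code deletes tabs and \n instead. TextCleaner, Clean and Map are hypothetical names:

```csharp
using System.Text;

public static class TextCleaner
{
    // One pass: map each character, collapse runs of spaces, and trim both ends.
    public static string Clean(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool pendingSpace = false; // a space was seen but not written yet

        foreach (char ch in input)
        {
            char c = Map(ch);
            if (c == ' ')
            {
                // Defer writing: a run of spaces becomes one, trailing spaces are dropped.
                pendingSpace = true;
                continue;
            }
            // sb.Length > 0 skips spaces before the first visible character (leading trim).
            if (pendingSpace && sb.Length > 0)
                sb.Append(' ');
            pendingSpace = false;
            sb.Append(c);
        }
        return sb.ToString();
    }

    private static char Map(char ch)
    {
        switch (ch)
        {
            case '\t':
            case '\r':
            case '\n':
                return ' ';      // TAB and ENTER become a space
            case '\u064A':
                return '\u06CC'; // Arabic yeh -> Farsi yeh
            case '\u0643':
                return '\u06A9'; // Arabic kaf -> keheh (Persian kaf)
            default:
                return ch;
        }
    }
}
```

Each character is visited once and no intermediate strings are created, which is where any speed-up over chained Replace calls would come from; it is worth measuring on your own data before relying on it.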
First find the character in your string, then remove it and insert the new character at the same index:
private string ReplaceChars(string Source, string Find, string Replace)
{
    int Place = Source.IndexOf(Find);
    // IndexOf returns -1 when Find is absent; Remove would throw in that case
    if (Place < 0)
        return Source;
    // Only the first occurrence is replaced
    string result = Source.Remove(Place, Find.Length).Insert(Place, Replace);
    return result;
}
Usage:
var text = "كgeeks 01$سهيلاطريقي03. اشك!#!!.ي";
var result = ReplaceChars(text, "ي", "ی");

Splitting text in C# by tag

I am splitting a string in my code like this:
var lines = myString == null
    ? new string[] { }
    : myString.Split(new[] { "\n", "<br />" }, StringSplitOptions.RemoveEmptyEntries);
The trouble is that sometimes the text looks like this:
sdjkgjkdgjk<br />asdfsdg
In this case my code works. However, other times the text looks like this:
sdjkgjkdgjk<br style="someAttribute: someProperty;"/>asdfsdg
In this case, I don't get the result I want. How can I split this string on the whole br tag, including all of its attributes?
I hope the following code will help you.
var items = Regex.Split("sdjkgjkdgjk<br style='someAttribute: someProperty;'/>asdfsdg", @"<.*?>");
If you only need to split by br tags and newline, regex is a good option:
var lines = myString == null
    ? new string[] { }
    // No capturing groups: Regex.Split would otherwise add the captured separators to the result
    : Regex.Split(myString, @"<br[^>]*>|\r\n?|\n");
But if your requirements get more complex, I'd suggest using an HTML parser.
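A sketch that keeps the original RemoveEmptyEntries behavior and also accepts <BR>, assuming attribute values never contain '>' (for anything more irregular, the HTML parser suggestion applies). BrSplitter and SplitLines are hypothetical names:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BrSplitter
{
    // <br\b[^>]*> matches <br>, <br/>, <br /> and <br style="..."/>;
    // [^>]* stops at the first '>', so two tags on one line are not merged into one match.
    private static readonly Regex Separator =
        new Regex(@"<br\b[^>]*>|\r\n|\r|\n", RegexOptions.IgnoreCase);

    public static string[] SplitLines(string text)
    {
        if (text == null)
            return Array.Empty<string>();

        // Regex.Split has no RemoveEmptyEntries option, so empty pieces are filtered afterwards.
        return Separator.Split(text).Where(part => part.Length > 0).ToArray();
    }
}
```

The \b after br keeps tags such as <bread> from being treated as separators.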
You can try this one:
var parts = Regex.Split(value, @"(<b>[\s\S]+?<\/b>)").Where(l => l != string.Empty).ToArray();
Use Regex.Split(). Below is an example:
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string input = "sdjkgjkdgjk<br />asdfsdg";
        string pattern = "<br[^>]*/>"; // Split on <br/> with any attributes; [^>]* stops at the first '>'
        DisplayByRegex(input, pattern);
        input = "sdjkgjkdgjk<br style=\"someAttribute: someProperty;\"/>asdfsdg";
        DisplayByRegex(input, pattern);
        Console.Read();
    }
    private static void DisplayByRegex(string input, string pattern)
    {
        string[] substrings = Regex.Split(input, pattern);
        foreach (string match in substrings)
        {
            Console.WriteLine("'{0}'", match);
        }
    }
}
You should use a regular expression.

Get only Whole Words from a .Contains() statement

I've used .Contains() to check whether a sentence contains a specific word; however, I found something weird:
I wanted to find whether the word "hi" was present in the following sentences:
The child wanted to play in the mud
Hi there
Hector had a hip problem
if (sentence.Contains("hi"))
{
//
}
I only want the SECOND sentence to be filtered; however, all 3 get filtered, since "child" has "hi" in it and "hip" has "hi" in it. How do I use .Contains() so that only whole words are picked out?
Try using Regex:
if (Regex.Match(sentence, @"\bhi\b", RegexOptions.IgnoreCase).Success)
{
    //
}
This works just fine for me on your input text.
Here's a Regex solution:
Regex has a word boundary anchor, \b.
Also, if the search string might come from user input, you might consider escaping the string using Regex.Escape
This example should filter a list of strings the way you want.
string findme = "hi";
string pattern = @"\b" + Regex.Escape(findme) + @"\b";
Regex re = new Regex(pattern, RegexOptions.IgnoreCase);
List<string> data = new List<string> {
    "The child wanted to play in the mud",
    "Hi there",
    "Hector had a hip problem"
};
var filtered = data.Where(d => re.IsMatch(d));
You could split your sentence into words (split at each space and then trim any punctuation), then check whether any of these words is 'hi':
var punctuation = sentence.Where(char.IsPunctuation).Distinct().ToArray();
var words = sentence.Split().Select(x => x.Trim(punctuation));
var containsHi = words.Contains("hi", StringComparer.OrdinalIgnoreCase);
See a working demo here: https://dotnetfiddle.net/AomXWx
You could write your own extension method for string, like:
using System;
static class StringExtension
{
    // Splits on spaces only, so a word followed by punctuation (such as "Hi,") is not found
    public static bool ContainsWord(this string s, string word)
    {
        string[] ar = s.Split(' ');
        foreach (string str in ar)
        {
            if (string.Equals(str, word, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }
}
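Combining the \b anchor and Regex.Escape suggestions above with the extension-method idea, a sketch might look like this. WordSearchExtensions and ContainsWholeWord are hypothetical names. Unlike splitting on spaces, it also finds a word followed by punctuation, such as "hi!":

```csharp
using System.Text.RegularExpressions;

public static class WordSearchExtensions
{
    // True when word occurs in text as a whole word, ignoring case.
    // Regex.Escape keeps characters such as '.' or '+' in word literal.
    // Caveat: \b only acts as a boundary next to word characters, so this is
    // meant for search words that start and end with a letter, digit or underscore.
    public static bool ContainsWholeWord(this string text, string word)
    {
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase);
    }
}
```

Usage: if (sentence.ContainsWholeWord("hi")) { ... } keeps "Hi there" and skips both "child" and "hip".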

How can I strip in-line comments from a text reader

Hi, I'm trying to remove comments from a text file by iterating through a StreamReader and checking whether each line starts with /*:
private void StripComments()
{
    _list = new List<string>();
    using (_reader = new StreamReader(_path))
    {
        while ((_line = _reader.ReadLine()) != null)
        {
            var temp = _line.Trim();
            if (!temp.StartsWith(@"/*"))
            {
                _list.Add(temp);
            }
        }
    }
}
I need to remove comments of the form /* I AM A COMMENT */. I thought the file only had whole-line comments, but on closer inspection there are also comments at the ends of some lines. .EndsWith(@"*/") can't be used, as that would remove the code preceding the comment.
Thanks.
If you are comfortable with regex:
string pattern = "(?s)/[*].*?[*]/";
var output = Regex.Replace(File.ReadAllText(path), pattern, "");
. matches any character other than newline.
(?s) turns on single-line mode, in which . also matches newlines.
.* matches zero or more characters; * is a quantifier.
.*? matches lazily, i.e. it matches as few characters as possible.
NOTE
This won't work if a string literal within "" contains /*. You should use a parser instead!
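Following the advice to use a parser, here is a minimal character-scanner sketch that skips /* ... */ comments (including multi-line ones) but copies double-quoted strings through unchanged. It assumes strings use backslash escapes and that there are no other comment or quoting styles (for example // comments or single-quoted strings); CommentStripper and Strip are hypothetical names:

```csharp
using System;
using System.Text;

public static class CommentStripper
{
    // Removes /* ... */ comments but leaves text inside "..." strings untouched.
    // A sketch, not a full tokenizer: only double-quoted strings are recognized.
    public static string Strip(string source)
    {
        var sb = new StringBuilder(source.Length);
        bool inString = false;
        int i = 0;

        while (i < source.Length)
        {
            char c = source[i];
            if (inString)
            {
                sb.Append(c);
                if (c == '\\' && i + 1 < source.Length)
                {
                    sb.Append(source[i + 1]); // copy the escaped character verbatim
                    i += 2;
                    continue;
                }
                if (c == '"')
                    inString = false; // closing quote
                i++;
            }
            else if (c == '"')
            {
                inString = true; // opening quote
                sb.Append(c);
                i++;
            }
            else if (c == '/' && i + 1 < source.Length && source[i + 1] == '*')
            {
                int end = source.IndexOf("*/", i + 2, StringComparison.Ordinal);
                // An unterminated comment runs to the end of the input.
                i = end < 0 ? source.Length : end + 2;
            }
            else
            {
                sb.Append(c);
                i++;
            }
        }
        return sb.ToString();
    }
}
```

To use it with the StreamReader code from the question, read the whole content first (for example with ReadToEnd) and strip that, since a comment can span several lines.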
Regex is a good fit for this.
string START = Regex.Escape("/*");
string END = Regex.Escape("*/");
string input = @"aaa/* bcd
de */ f";
var str = Regex.Replace(input, START + ".+?" + END, "",RegexOptions.Singleline);
List<string> _list = new List<string>();
Regex r = new Regex("/[*]");
string temp = @"sadf/*slkdj*/";
if (temp.StartsWith(@"/*")) { }
else if (temp.EndsWith(@"*/") && temp.Contains(@"/*"))
{
    string pre = temp.Substring(0, r.Match(temp).Index);
    _list.Add(pre);
}
else
{
    _list.Add(temp);
}

Remove formatting on string literal

Given the C# code:
string foo = @"
abcde
fghijk";
I am trying to remove all formatting, including whitespaces between the lines.
So far the code
foo = foo.Replace("\n","").Replace("\r", "");
works, but the whitespace between lines 2 and 3 is still kept.
I assume a regular expression is the only solution?
Thanks.
I'm assuming you want to keep multiple lines; if not, I'd choose CAbbott's answer.
var fooNoWhiteSpace = string.Join(
    Environment.NewLine,
    foo.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
        .Select(fooline => fooline.Trim())
);
What this does is split the string into lines (foo.Split),
trim whitespace from the start and end of each line (.Select(fooline => fooline.Trim())),
then combine them back together with a newline in between (string.Join).
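One caveat with splitting on Environment.NewLine: the line breaks inside a verbatim string are whatever the source file contains. On Windows, for example, where Environment.NewLine is "\r\n", a source file saved with LF-only endings yields a string with no "\r\n" in it, so nothing gets split. A sketch that splits on any of CRLF, CR or LF and also drops lines that are only whitespace; LiteralFormatting and TrimLines are hypothetical names:

```csharp
using System;
using System.Linq;

public static class LiteralFormatting
{
    public static string TrimLines(string text)
    {
        // "\r\n" is listed first so a CRLF pair is treated as one separator.
        var lines = text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim())
            // RemoveEmptyEntries keeps lines made only of spaces; drop them after trimming.
            .Where(line => line.Length > 0);
        return string.Join(Environment.NewLine, lines);
    }
}
```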
You could use a regular expression:
foo = Regex.Replace(foo, @"\s+", "");
How about this?
string input = @"
abcde
fghijk";
string output = "";
string[] parts = input.Split('\n');
foreach (var part in parts)
{
    // Puts everything on one line; append "\n" after each part to keep the line breaks
    output += part.Trim();
}
This should remove everything.
If the whitespace is all spaces, you could use
foo = foo.Replace(" ", "");
For any other whitespace that may be in there, do the same. Example:
foo = foo.Replace("\t", "");
Just add a Replace(" ", ""): you're dealing with a verbatim string literal, which means all the whitespace is part of the string.
Try something like this:
string test = @"
abcde
fghijk";
EDIT: Added code to filter out only whitespace.
string newString = new string(test.Where(c => !char.IsWhiteSpace(c)).ToArray());
Produces the following: abcdefghijk
I've written something similar to George Duckett's answer, but put the logic into a string extension method so it's easier for others to read and consume:
public static class Extensions
{
    public static string RemoveTabbing(this string fmt)
    {
        return string.Join(
            System.Environment.NewLine,
            fmt.Split(new string[] { System.Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
                .Select(fooline => fooline.Trim()));
    }
}
You can then call it like this:
string foo = @"
abcde
fghijk".RemoveTabbing();
I hope that helps someone
