How can I Replace special characters - c#

I've got a string value containing many different characters.
I want to:
replace TAB and ENTER with a space
replace Arabic ي with Persian ی
replace Arabic ك with Persian ک
remove newlines from both ends of the string
replace multiple spaces with one space
trim spaces
The following function cleans the data, and it works correctly.
Does anyone have an idea for better performance and less code to maintain? :)
static void Main(string[] args)
{
var output = "كgeeks 01$سهيلاطريقي03. اشك!#!!.ي";
//output = output.Replace("\u064A", "\u0649");//ي
output = output.Replace("\u064A", "\u06CC");//replace arabic ي with persian ی
output = output.Replace("\u0643", "\u06A9");//replace arabic ك with persian ک
output = output.Trim('\r', '\n');//remove newlines from both sides of a string
output = output.Replace("\n", "").Replace("\r", " ");//replace newline with space
RegexOptions options = RegexOptions.None;
Regex regex = new Regex("[ ]{2,}", options);//replace multiple space with one space
output = regex.Replace(output, " ");
char tab = '\u0009';
output = output.Replace(tab.ToString(), "");
Console.WriteLine(output);
}

You can refactor using two lists: one for the trim process and one for the replace process.
var itemsTrimChars = new List<char>()
{
'\r',
'\n'
};
var itemsReplaceStrings = new Dictionary<string, string>()
{
{ "\n", "" },
{ "\r", " " },
{ "\u064A", "\u06CC" },
{ "\u0643", "\u06A9" },
{ "\u0009", "" }
}.ToList();
This gives you maintainable tables, stored however you like: local variables as in this example, class-level fields, database tables, text files on disk...
They are used like this:
output = output.Trim(itemsTrimChars.ToArray()); // trim all listed characters in one call
itemsReplaceStrings.ForEach(p => output = output.Replace(p.Key, p.Value));
I don't know enough about the regex that collapses repeated spaces to comment on it, but if you need to collapse other repeated characters, you can create a third list.

You can do this by iterating over each character and applying those rules, building a new output string in the format you want. It should be faster than all those string.Replace and Regex.Replace calls.
Use a StringBuilder for performance when appending; don't use string += string.
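A minimal sketch of that single-pass idea, assuming the rules listed in the question (tabs and newlines count as spaces, runs of whitespace collapse to one space, leading and trailing whitespace is dropped). The class and method names are hypothetical, and the whitespace handling follows the stated list rather than the exact behavior of the original code:

```csharp
using System;
using System.Text;

public static class TextCleaner
{
    // Hypothetical helper: applies all of the question's rules in one pass.
    public static string Clean(string input)
    {
        if (string.IsNullOrEmpty(input)) return string.Empty;

        var sb = new StringBuilder(input.Length);
        bool pendingSpace = false; // a run of whitespace is waiting to be emitted

        foreach (char c in input)
        {
            // Space, tab, CR and LF are all treated as the same separator.
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
            {
                pendingSpace = sb.Length > 0; // ignore leading whitespace
                continue;
            }

            if (pendingSpace)
            {
                sb.Append(' '); // emit exactly one space per whitespace run
                pendingSpace = false;
            }

            switch (c)
            {
                case '\u064A': sb.Append('\u06CC'); break; // Arabic yeh -> Persian yeh
                case '\u0643': sb.Append('\u06A9'); break; // Arabic kaf -> Persian kaf
                default: sb.Append(c); break;
            }
        }

        // A trailing whitespace run is never flushed, so the result is already trimmed.
        return sb.ToString();
    }
}
```

Adding another character mapping is then a matter of adding one more case to the switch.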

First find the text in your string, then remove it and insert the new text at the same index. Note that this replaces only the first occurrence.
private string ReplaceChars(string Source, string Find, string Replace)
{
int Place = Source.IndexOf(Find, StringComparison.Ordinal);
if (Place < 0) return Source; // nothing to replace
string result = Source.Remove(Place, Find.Length).Insert(Place, Replace);
return result;
}
Usage:
string text = "كgeeks 01$سهيلاطريقي03. اشك!#!!.ي";
var result = ReplaceChars(text, "ي", "ی");

Related

How can I remove the spaces that appear between the words even after splitting the string? [duplicate]

I have the following input:
string txt = " i am a string ";
I want to remove the spaces from the start and the end of the string.
The result should be: "i am a string"
How can I do this in c#?
String.Trim
Removes all leading and trailing white-space characters from the current String object.
Usage:
txt = txt.Trim();
If this isn't working, it is highly likely that the "spaces" aren't spaces but some other non-printing or whitespace character, possibly tabs. In that case you need to use the String.Trim overload that takes an array of characters:
char[] charsToTrim = { ' ', '\t' };
string result = txt.Trim(charsToTrim);
Source
You can add to this list as and when you come across more space-like characters in your input data. Storing this list of characters in your database or configuration file also means that you don't have to rebuild your application each time you come across a new character to check for.
NOTE
As of .NET 4 .Trim() removes any character that Char.IsWhiteSpace returns true for so it should work for most cases you come across. Given this, it's probably not a good idea to replace this call with the one that takes a list of characters you have to maintain.
It would be better to call the default .Trim() and then call the method with your list of characters.
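As a rough sketch of that combination: call the default Trim() first, then trim a custom list, and repeat until nothing changes so that mixed sequences are handled. The helper name and the two extra characters (zero-width space and byte order mark, which Char.IsWhiteSpace does not report as white space) are assumptions for illustration; substitute whatever shows up in your own data.

```csharp
using System;

public static class TrimHelper
{
    // Assumed extra characters that the default Trim() leaves in place.
    private static readonly char[] ExtraTrimChars = { '\u200B', '\uFEFF' };

    // Hypothetical helper: default Trim() plus a maintained list of extra characters.
    public static string TrimAll(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;

        string previous;
        do
        {
            previous = input;
            // Default white-space trim first, then the custom list.
            input = input.Trim().Trim(ExtraTrimChars);
        }
        while (input.Length != previous.Length); // repeat while something was removed

        return input;
    }
}
```

The loop matters for inputs where the two kinds of characters alternate at the edges, since a single Trim() call stops at the first character it does not recognize.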
You can use:
String.TrimStart - Removes all leading occurrences of a set of characters specified in an array from the current String object.
String.TrimEnd - Removes all trailing occurrences of a set of characters specified in an array from the current String object.
String.Trim - combination of the two functions above
Usage:
string txt = " i am a string ";
char[] charsToTrim = { ' ' };
txt = txt.Trim(charsToTrim); // txt = "i am a string"
EDIT:
txt = txt.Replace(" ", ""); // txt = "iamastring"
I really don't understand some of the hoops the other answers are jumping through.
var myString = " this is my String ";
var newstring = myString.Trim(); // results in "this is my String"
var noSpaceString = myString.Replace(" ", ""); // results in "thisismyString";
It's not rocket science.
txt = txt.Trim();
Or you can split your string into a string array, splitting by space, and then append every item of the array to an empty string.
This may not be the best or fastest method, but you can try it if the other answers aren't what you want.
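A small sketch of the split-and-rejoin idea, with hypothetical helper names. Appending the pieces to an empty string removes every space, so a variant that re-joins with a single space is shown alongside it, since that matches the "i am a string" result the question asks for:

```csharp
using System;

public static class SpaceTools
{
    // Split on spaces, drop the empty pieces, and re-join with one space.
    public static string CollapseSpaces(string input)
    {
        string[] words = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words);
    }

    // Same split, but the pieces are concatenated with nothing in between.
    public static string RemoveSpaces(string input)
    {
        string[] words = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Concat(words);
    }
}
```

Both helpers only look at the space character; tabs and newlines would need to be added to the separator array.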
text.Trim() is to be used
string txt = " i am a string ";
txt = txt.Trim();
Use the Trim method.
static void Main()
{
// A.
// Example strings with multiple whitespaces.
string s1 = "He saw a cute\tdog.";
string s2 = "There\n\twas another sentence.";
// B.
// Create the Regex.
Regex r = new Regex(@"\s+");
// C.
// Strip multiple spaces.
string s3 = r.Replace(s1, @" ");
Console.WriteLine(s3);
// D.
// Strip multiple spaces.
string s4 = r.Replace(s2, @" ");
Console.WriteLine(s4);
Console.ReadLine();
}
OUTPUT:
He saw a cute dog.
There was another sentence.
You Can Use
string txt = " i am a string ";
txt = txt.TrimStart().TrimEnd();
Output is "i am a string"

Compare and extract common words between 2 strings

In ASP.NET C#, assume I have a string containing comma-separated words:
string strOne = "word,WordTwo,another word, a third long word, and so on";
How do I split it and then compare it with another paragraph that may or may not contain these words:
string strTwo = " when search a word or try another word you may find that WordTwo is there with others";
Then how do I output the common words, separated by commas, in a third string?
string strThree = "output1, output2, output3";
To get a result like: "word, WordTwo, another word,"
You will need to split strOne by comma, and use a contains against strTwo.
Note: You can't split strTwo by space and use intersect because your items may have spaces. i.e. "another word"
string strOne = "word,WordTwo,another word, a third long word, and so on";
string strTwo = " when search a word or try another word you may find that WordTwo is there with others";
var tokensOne = strOne.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
var list = tokensOne.Where(x => strTwo.Contains(x));
var result = string.Join(", ",list);
You could do something like this:
string strOne = "word,WordTwo,another word, a third long word, and so on";
string strTwo = " when search a word or try another word you may find that WordTwo is there with others";
string finalString = string.Empty;
foreach (var line in strOne.Split(','))
{
if (strTwo.Contains(line))
finalString += (line + ",");
}
finalString = finalString.TrimEnd(','); // drop the trailing comma; safe when nothing matched
Console.WriteLine(finalString);

C# String.Trim() not removing characters from MailMessage.Subject

I'm trying to eliminate certain symbols from the Subject property of a MailMessage object, but nothing happens. Even after assigning Subject to a string and trimming that, the final Subject still has the symbols in it (not shown in the example).
MailMessage mailMessage = new MailMessage
{
From = new MailAddress(mail.SenderEmailAddress),
SubjectEncoding = System.Text.Encoding.UTF8,
Subject = mail.Subject.Trim(new char[] {}), //symbol list, like ":", "~", ">"
Body = mail.Body
};
String path = @"C:\Users\" + Environment.UserName + @"\Documents\EML\";
if (!Directory.Exists(path))
{
Directory.CreateDirectory(path);
}
path = @"C:\Users\" + Environment.UserName + @"\Documents\EML\"
+ mailMessage.Subject + ".eml";
MessageBox.Show(path);
The message box is just to see whether the symbol gets removed or not at the moment, path will be put into a method later.
For example, when mail has the subject RE: dog and .Trim tries to remove :,
the MessageBox still shows C:\Users\user\Documents\EML\RE: dog.eml.
The String.Trim(Char[]) method, as per official MSDN documentation, removes all leading and trailing occurrences of a set of characters specified in an array from the current string object. If you want to remove all the occurrences of a specified list of characters from the string, even when they don't appear at the beginning or at the end of it, you may want to use a different approach.
Given the following example string and the following replacements:
String text = "This is: the~ mail sub~ject!";
Char[] replacements = new Char[] { ':', '~' };
you can perform this operation using various approaches. Here is a list containing a few of them:
1) Using String.Split and String.Join
text = String.Join(String.Empty, text.Split(replacements));
2) Using LINQ
text = new String
(
(from c in text
where !replacements.Contains(c)
select c).ToArray()
);
or:
text = new String(text.Where(c => !replacements.Contains(c)).ToArray());
3) Using Regular Expressions
text = Regex.Replace(text, "[:~]", String.Empty);
4) Using a Loop and String.Replace
foreach (Char c in replacements)
text = text.Replace(c.ToString(), String.Empty);
5) Using an Extension Method
public static String RemoveChars(this String input, params Char[] chars)
{
StringBuilder builder = new StringBuilder();
for (Int32 i = 0; i < input.Length; ++i)
{
if (!chars.Contains(input[i]))
builder.Append(input[i]);
}
return builder.ToString();
}
text = text.RemoveChars(replacements);
The final output is always the same:
This is the mail subject!
From MSDN:
String.Trim Method () - Removes all leading and trailing white-space characters from the current String object.
So, Trim isn't going to remove characters from the middle of a String. Commenters suggested using Replace instead, but there isn't a signature that takes an array of characters like you are using. An easy way around that is Extension methods.
class Program
{
static void Main(string[] args)
{
string text = "This:is~a>test";
string subject = text.ReplaceFromCollection(new char[] { ':', '~', '>'}); //symbol list, like ":", "~", ">"
Console.WriteLine($"{text}\n{subject}");
Console.ReadLine();
}
}
static class Extensions
{
public static String ReplaceFromCollection(this string text, IEnumerable<char> characters)
{
foreach (var chr in characters)
{
text = text.Replace(chr.ToString(), String.Empty);
}
return text;
}
}
Using this, each character in your string that matches a character in the array is replaced with the empty String one by one. The result is then passed back.
More reading on Extension Methods.
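Since the subject in the question ends up in a file name, the Split/Join approach can be combined with Path.GetInvalidFileNameChars. This is only a sketch: the helper name is hypothetical, and the set returned by GetInvalidFileNameChars depends on the platform, so any characters you always want removed are passed explicitly.

```csharp
using System;
using System.IO;
using System.Linq;

public static class FileNameHelper
{
    // Hypothetical helper: strips platform-invalid file name characters
    // plus any extra characters the caller wants removed.
    public static string ToFileName(string subject, params char[] extraChars)
    {
        char[] unwanted = Path.GetInvalidFileNameChars().Concat(extraChars).ToArray();

        // Split on every unwanted character and glue the pieces back together.
        return string.Concat(subject.Split(unwanted)).Trim();
    }
}
```

With this, FileNameHelper.ToFileName("RE: dog", ':', '~', '>') yields "RE dog", which can then be appended to the directory path.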

Complex string split C#

I have input file like this:
input.txt
aa@aa.com bb@bb.com "Information" "Hi there"
cc@cc.com dd@dd.com "Follow up" "Interview"
I have used this method:
string[] words = item.Split(' ');
However, it splits on every space. I also have spaces inside the quoted strings, and I don't want to split on those.
Basically I want to parse this input from file to this output:
From = aa@aa.com
To = bb@bb.com
Subject = Information
Body = Hi there
How do I split these strings in C#?
You can simply use Regex, as described in this question:
var stringValue = "aa@aa.com bb@bb.com \"Information\" \"Hi there\"";
var parts = Regex.Matches(stringValue, @"[\""].+?[\""]|[^ ]+")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
// parts: aa@aa.com
//        bb@bb.com
//        "Information"
//        "Hi there"
You can also use the Replace function to remove the " characters.
The String.Split() method has an overload that allows you to specify the maximum number of substrings to return. You can get what you want like this:
Read one line at a time
Call input.Split(new string[] { " " }, 3, StringSplitOptions.None) - this returns an array of strings with 3 parts. Since email addresses don't have spaces in them, the first two strings will be the from/to addresses, and the third string will be the subject and message. Assume the result of this call is stored in firstSplit[], then firstSplit[0] is the from address, firstSplit[1] is the to address, and firstSplit[2] is the subject and message combined.
Call firstSplit[2].Split(new string[] { "\" \"" }, 2, StringSplitOptions.None) - this searches for the separator " " (closing quote, space, opening quote) in the concatenated subject+message from the previous call, which should pinpoint the boundary between the end of the subject and the start of the message. This will give you the subject and message in another array, with one leftover quote at the start of the subject and one at the end of the message to trim off. (The double quotes inside the string literal are escaped with backslashes.)
This assumes you disallow double quotes in your subject and message. If you do allow double quotes, then you need to ensure you escape them before putting it in the file in the first place.
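The two Split calls described above can be sketched as follows. The class and method names are hypothetical, and the code assumes every line is well formed: two space-free addresses followed by exactly two quoted fields with no embedded double quotes.

```csharp
using System;

public static class MailLineParser
{
    // Hypothetical helper: returns { from, to, subject, body } for one input line.
    public static string[] Parse(string line)
    {
        // Step 1: at most three parts: from, to, and the quoted remainder.
        string[] firstSplit = line.Split(new[] { " " }, 3, StringSplitOptions.None);
        if (firstSplit.Length < 3)
            throw new FormatException("Expected two addresses followed by quoted fields.");

        // Step 2: split the remainder on the quote-space-quote separator.
        string[] rest = firstSplit[2].Split(new[] { "\" \"" }, 2, StringSplitOptions.None);
        if (rest.Length < 2)
            throw new FormatException("Expected two quoted fields.");

        // Remove the leftover opening and closing quotes.
        return new[]
        {
            firstSplit[0],
            firstSplit[1],
            rest[0].TrimStart('"'),
            rest[1].TrimEnd('"')
        };
    }
}
```

Calling Parse on each line returned by File.ReadAllLines gives the From, To, Subject and Body values in order.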
You can do this without regex by using just IndexOf and Substring; put it in a loop if you have multiple lines to parse.
It's not pretty, but it should be faster than Regex if you're doing a lot of them.
string content = @"aa@aa.com bb@bb.com ""Information"" ""Hi there""";
int firstSpace = content.IndexOf(" ", StringComparison.Ordinal);
string firstEmail = content.Substring(0, firstSpace);
// Start after the first space so the second address has no leading space.
int secondSpace = content.IndexOf(" ", firstSpace + 1, StringComparison.Ordinal);
string secondEmail = content.Substring(firstSpace + 1, secondSpace - firstSpace - 1);
int firstQuote = content.IndexOf("\"", StringComparison.Ordinal);
string subjectandMessage = content.Substring(firstQuote);
String[] words = subjectandMessage.Split(new string[] { "\" \"" }, StringSplitOptions.None);
Console.WriteLine(firstEmail);
Console.WriteLine(secondEmail);
Console.WriteLine(words[0].Remove(0,1));
Console.WriteLine(words[1].Remove(words[1].Length -1));
Output:
aa@aa.com
bb@bb.com
Information
Hi there
As Spencer pointed out, read this file line by line using the File.ReadAllLines() method and then apply the String.Split() method to each line, splitting on whitespace with something like this:
string[] elements = line.Split(new char[0]);
UPDATE
Not a pretty solution, but this is how I think it can work:
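string[] readText = File.ReadAllLines("input.txt");
foreach (string line in readText)
{
string[] elements = line.Split(new char[0]);
// Take the first 3 fields directly: elements[index] (index: 0-2)
string temp = "";
for (int i = 3; i < elements.Length; i++)
{
temp += elements[i] + " "; // re-join the remaining pieces
}
}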
Requires reference to Microsoft.VisualBasic, but a bit more reliable than Regex:
using (var tfp = new Microsoft.VisualBasic.FileIO.TextFieldParser("input.txt")) {
for (tfp.SetDelimiters(" "); !tfp.EndOfData;) {
string[] fields = tfp.ReadFields();
Debug.Print(string.Join(",", fields)); // "aa@aa.com,bb@bb.com,Information,Hi there"
}
}

Remove formatting on string literal

Given the c# code:
string foo = @"
abcde
fghijk";
I am trying to remove all formatting, including the whitespace between the lines.
So far the code
foo = foo.Replace("\n","").Replace("\r", "");
works, but the whitespace between lines 2 and 3 is still kept.
I assume a regular expression is the only solution?
Thanks.
I'm assuming you want to keep multiple lines; if not, I'd choose CAbbott's answer.
var fooNoWhiteSpace = string.Join(
Environment.NewLine,
foo.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.Select(fooline => fooline.Trim())
);
What this does is split the string into lines (foo.Split),
trim whitespace from the start and end of each line (.Select(fooline => fooline.Trim())),
then combine them back together with a new line in between (string.Join).
You could use a regular expression:
foo = Regex.Replace(foo, #"\s+", "");
How about this?
string input = @"
abcde
fghijk";
string output = "";
string[] parts = input.Split('\n');
foreach (var part in parts)
{
// If you want everything on one line... else just + "\n" to it
output += part.Trim();
}
This should remove everything.
If the whitespace is all spaces, you could use
foo = foo.Replace(" ", "");
For any other whitespace that may be in there, do the same. Example:
foo = foo.Replace("\t", "");
Just add a Replace(" ", ""). You're dealing with a string literal, which means all the whitespace is part of the string.
Try something like this:
string test = @"
abcde
fghijk";
EDIT: Added code to filter out only whitespace.
string newString = new string(test.Where(c => Char.IsWhiteSpace(c) == false).ToArray());
Produces the following: abcdefghijk
I've written something similar to George Duckett's answer, but put my logic into a string extension method so it's easier for others to read/consume:
public static class Extensions
{
public static string RemoveTabbing(this string fmt)
{
return string.Join(
System.Environment.NewLine,
fmt.Split(new string[] { System.Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.Select(fooline => fooline.Trim()));
}
}
You can then call it like this:
string foo = @"
abcde
fghijk".RemoveTabbing();
I hope that helps someone
