Remove all symbols different from numbers and letters in string [duplicate] - c#

This question already has answers here:
c# Regex non letter characters from a string
(2 answers)
Closed 6 years ago.
I need a regex that removes every character that is not a letter or a digit from a string. Example:
string address = "TEXT 3 !##$%^&*()_}|{:?> REMOVE ALL SYMBOLS 45";
string result = "TEXT 3 REMOVE ALL SYMBOLS 45";
Any ideas?

Try this:
string address = "TEXT 3 !##$%^&*()_}|{\":?> REMOVE ALL SYMBOLS 45";
var sb = new StringBuilder();
foreach (var c in address)
{
    if (Char.IsLetterOrDigit(c) || Char.IsWhiteSpace(c))
        sb.Append(c);
}
var result = sb.ToString();
It should be faster than regex.

This should work:
var result = new Regex("[^a-zA-Z0-9 ]").Replace(address, string.Empty);
This keeps only the characters a-z, A-Z, 0-9 and the space character.
You can also use LINQ:
var result2 = new String(address.Where(x => char.IsLetterOrDigit(x)
|| char.IsWhiteSpace(x)).ToArray());
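The two approaches are not strictly equivalent: the character class [^a-zA-Z0-9 ] keeps ASCII letters only, whereas char.IsLetterOrDigit also accepts letters from other alphabets. A minimal sketch of the difference, where the class and method names are made up for illustration:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
public static class SymbolStripper
{
    // Keeps ASCII letters, ASCII digits and the space character only.
    public static string StripAscii(string input) =>
        Regex.Replace(input, "[^a-zA-Z0-9 ]", string.Empty);

    // Keeps anything Unicode classifies as a letter, a decimal digit or white space.
    public static string StripUnicode(string input) =>
        new string(input.Where(c => char.IsLetterOrDigit(c) || char.IsWhiteSpace(c)).ToArray());
}
```

For the input "Café 42!", StripAscii drops the "é" along with the "!", while StripUnicode drops only the "!".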

Both worked for me.
My final code:
var addressWithoutEmtySpacesMoreThanOne = Regex.Replace(address, @"\s+", " ");
var result = new Regex("[^a-zA-Zа-яА-Я0-9 -]").Replace(addressWithoutEmtySpacesMoreThanOne, "");
customer.Address = result;
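Collapsing the spaces before removing the symbols can leave a double space where the symbols used to be. A sketch of the reverse order, assuming the same Latin and Cyrillic character class as the final code above (AddressCleaner is a hypothetical name):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
public static class AddressCleaner
{
    public static string Clean(string address)
    {
        // 1. Drop everything except Latin/Cyrillic letters, digits, spaces and hyphens.
        string lettersOnly = Regex.Replace(address, "[^a-zA-Zа-яА-Я0-9 -]", string.Empty);
        // 2. Collapse the runs of spaces left behind, then trim the ends.
        return Regex.Replace(lettersOnly, @"\s+", " ").Trim();
    }
}
```

With the address from the question, Clean returns "TEXT 3 REMOVE ALL SYMBOLS 45" with single spaces throughout.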

Related

Read just [Brackets] string from a text file [duplicate]

This question already has answers here:
C# Regex Split - everything inside square brackets
(2 answers)
Closed 4 years ago.
I have a text file named hello.txt with the following text:
[Hello] this is stack overflow and I Love [THIS] a lot. I use [Stack]
for help.
I want only the bracketed strings (such as [Hello]) in a list box.
I tried:
using (StringReader reader = new StringReader(File Location))
{
string line;
while ((line = reader.ReadLine()) != null)
{
string input = line;
string output = input.Split('[', ']')[1];
MessageBox.Show(output);
}
}
But this doesn't work for me.
This is what you are looking for:
string a = "Someone is [here]";
string b = Regex.Match(a, @"\[.*?\]").Groups[0].Value;
Console.WriteLine(b);
// or, if you need all occurrences
foreach (Match match in Regex.Matches(a, @"\[.*?\]"))
{
    Console.WriteLine(match.Groups[0].Value);
}
You can create a function that takes three parameters (the input string, a start delimiter and an end delimiter) and returns the list of substrings enclosed by those delimiters (the delimiters are included in each result):
private static IEnumerable<string> GetListOfString(string input, string start, string end)
{
    var regex = new Regex(Regex.Escape(start) + "(.*?)" + Regex.Escape(end));
    var matches = regex.Matches(input);
    return (from object match in matches select match.ToString()).ToList();
}
You can use a regular expression like:
var pattern = @"\[[^\]]*]";
while ((line = reader.ReadLine()) != null) {
    var matches = Regex.Matches(line, pattern);
    foreach (Match m in matches) {
        MessageBox.Show(m.Value);
    }
}
This pattern matches an opening square bracket, then any run of characters other than a closing square bracket, then the closing bracket.
If you want the string between the brackets without the brackets themselves, you can trim the brackets from each match:
MessageBox.Show(m.Value.Substring(1, m.Value.Length - 2));
Or you can use this pattern, which captures the inner text in group 1:
var pattern = @"\[([^\]]*)]";
while ((line = reader.ReadLine()) != null) {
    var matches = Regex.Matches(line, pattern);
    foreach (Match m in matches) {
        MessageBox.Show(m.Groups[1].Value);
    }
}
Here is another way to do that using LINQ (it only finds bracketed tokens that contain no spaces):
string[] text = "[Hello] this is stack overflow and I Love [THIS] a lot. I use [Stack] for help.".Split(' ');
var wantedString = text.Where(s => s.StartsWith("[") && s.EndsWith("]"));
foreach (string word in wantedString)
{
    Console.WriteLine(word);
}
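The regex answers can be wrapped in one helper that returns every bracketed token as a list, ready to be added to a list box. This is only a sketch: BracketExtractor and Extract are hypothetical names, and the text is assumed to have been read into a string already (for example with File.ReadAllText).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
public static class BracketExtractor
{
    // Returns every "[...]" token found in the text, brackets included.
    public static List<string> Extract(string text) =>
        Regex.Matches(text, @"\[[^\]]*\]")
             .Cast<Match>()          // MatchCollection is non-generic on older frameworks
             .Select(m => m.Value)
             .ToList();
}
```

Unlike splitting on spaces, this also finds tokens that are followed by punctuation, such as "[Stack]." at the end of a sentence.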

split string and store it in another variable in c# [duplicate]

This question already has answers here:
Split string using backslash
(3 answers)
Closed 5 years ago.
How do I read the characters after '\'?
string PrName = "software\Plan Mobile";
First of all, do not forget the @:
string PrName = @"software\Plan Mobile";
Next, if you want only the tail (i.e. "Plan Mobile"), then Substring will do:
// if no '\' is found, the entire string will be returned
string tail = PrName.Substring(PrName.IndexOf('\\') + 1);
If you want both (all parts), try Split:
// parts[0] == "software"
// parts[1] == "Plan Mobile"
string[] parts = PrName.Split('\\');
Try this:
char charToFind = '\\';
string PrName = @"software\Plan Mobile";
int indexOfChar = PrName.IndexOf(charToFind);
if (indexOfChar >= 0)
{
    string result = PrName.Substring(indexOfChar + 1);
}
Output: result = "Plan Mobile"
I think you want to split the string:
string s = @"software\Plan Mobile";
// Split string on '\'.
string[] words = s.Split('\\');
foreach (string word in words)
{
    Console.WriteLine(word);
}
Output:
software
Plan Mobile
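The backslash has to be written either as '\\' in a regular literal or unescaped inside a verbatim (@"...") string; both spellings produce the same characters. A small sketch of the "tail" logic as a reusable method, with BackslashSplit and Tail as hypothetical names:

```csharp
// Hypothetical helper, not part of the original answers.
public static class BackslashSplit
{
    // Returns the text after the first backslash, or the whole string if there is none.
    public static string Tail(string value)
    {
        int index = value.IndexOf('\\');   // '\\' is a single backslash character
        return index >= 0 ? value.Substring(index + 1) : value;
    }
}
```

Tail(@"software\Plan Mobile") and Tail("software\\Plan Mobile") both return "Plan Mobile".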

Regex replace all occurrences with something that is "derived" from the part to be replaced

I have the following line from an RTF document
10 \u8314?\u8805? 0
(which reads 10 ⁺≥ 0 in plain text). Each special character is escaped as \u followed by its decimal Unicode code point and a question mark (the replacement character to print when the special character cannot be displayed). I want the text in a C# string variable equivalent to the following:
string expected = "10 \u207A\u2265 0";
In the debugger I want the variable to show the value 10 ⁺≥ 0. I therefore need to replace every occurrence with the corresponding hexadecimal Unicode escape (0x207A = 8314 and 0x2265 = 8805). What is the simplest way to accomplish this with regular expressions?
The code is:
string str = @"10 \u8314?\u8805? 0";
string replaced = Regex.Replace(str, @"\\u([0-9]+)\?", match => {
    string value = match.Groups[1].Value;
    string hex = @"\u" + int.Parse(value).ToString("X4");
    return hex;
});
This will return
string line = @"10 \u207A\u2265 0";
so the \u207A\u2265 won't be unescaped.
Note that the value is first converted to a number (int.Parse(value)) and then converted to a fixed-notation 4-digit hex number (ToString("X4")).
Or
string replaced = Regex.Replace(str, @"\\u([0-9]+)\?", match => {
    string value = match.Groups[1].Value;
    char ch = (char)int.Parse(value);
    return ch.ToString();
});
This will return
string line = @"10 ⁺≥ 0";
If I understood your question correctly, you want to parse the unicode representation of the RTF to a C# string.
So, the one-liner solution looks like this
string result = Regex.Replace(line, @"\\u(\d+?)\?", new MatchEvaluator(m => ((char)Convert.ToInt32(m.Groups[1].Value)).ToString()));
But I suggest using cleaner code:
private static string ReplaceRtfUnicodeChar(Match match) {
    int number = Convert.ToInt32(match.Groups[1].Value);
    char chr = (char)number;
    return chr.ToString();
}
public static void Main(string[] args)
{
    string line = @"10 \u8314?\u8805? 0";
    var r = new Regex(@"\\u(\d+?)\?");
    string result = r.Replace(line, new MatchEvaluator(ReplaceRtfUnicodeChar));
    Console.WriteLine(result); // Displays 10 ⁺≥ 0
}
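You have to use MatchEvaluator:
string input = @"10 \u8314?\u8805? 0";
// RTF writes the code point in decimal, so only digits are matched
Regex reg = new Regex(@"\\u([0-9]+)\?", RegexOptions.Multiline);
string result = reg.Replace(input, delegate(Match m) {
    // ConvertToWhatYouWant is a placeholder for your own conversion
    return ConvertToWhatYouWant(m.Value);
});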
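The answers above can be condensed into one self-contained helper. This is a sketch under stated assumptions rather than a full RTF parser: RtfUnicode and Decode are hypothetical names, the replacement character is assumed to always be "?", and negative numbers are assumed to be signed 16-bit values (a convention some RTF writers use for code units above 32767).

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
public static class RtfUnicode
{
    // Matches \uN? where N is a (possibly negative) decimal number.
    private static readonly Regex Escape = new Regex(@"\\u(-?[0-9]+)\?");

    public static string Decode(string rtf) =>
        Escape.Replace(rtf, m =>
        {
            int code = int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
            if (code < 0) code += 65536;   // assumption: negative values are signed 16-bit
            return ((char)code).ToString();
        });
}
```

Decode(@"10 \u8314?\u8805? 0") yields the string "10 ⁺≥ 0"; text without escapes passes through unchanged.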

How to check if a string contains only # [duplicate]

This question already has answers here:
Check If String Contains All "?"
(8 answers)
Closed 9 years ago.
I have a string like "# # # # #"
and another string like "123 # abc # xyz".
I need to check whether a string contains only #. How can I achieve this?
I tried using Contains, but that does not work.
Provided that the string is not null, a possible solution is:
String text = "123#abc#xyz";
Boolean result = text.All((x) => x == '#');
If white space should be ignored (e.g. "# # # # #" counts as a valid string):
String text = "123#abc#xyz";
Boolean result = text.All((x) => x == '#' || Char.IsWhiteSpace(x));
bool IsSharpOnly(string str)
{
    for (int i = 0; i < str.Length; i++)
    {
        if (str[i] != '#')
            return false;
    }
    return true;
}
Another solution with a Regex:
Regex r = new Regex("^#+$");
bool b1 = r.IsMatch("asdas#asdas");
bool b2 = r.IsMatch("#####");
Edit
I was not sure whether white space should be ignored or not; if so:
Regex r = new Regex("^[\\s#]+$");
You could use a regular expression such as "([0-9]+)|([a-z]+)" and check that the input string does not match it.
For instance, to check that the string contains only '#':
String text = "123#abc#xyz";
Boolean result = Regex.Match(text, "^#*$").Success;
Try this:
string ss = "##g#";
if ((ss.Split('#').Length - 1).Equals(ss.Length))
{
    // Contains only #
}
You can also try this:
private bool CheckIfStringContainsOnlyHash(string value)
{
    return !value.Where(a => !string.IsNullOrWhiteSpace(a.ToString()) && a != '#').Select(a => true).FirstOrDefault();
}
Try the code below:
string txt = "123#abc#xyz";
if (!txt.Any((X) => X != '#'))
{
    // Contains only '#'
}
Dmitry's example is probably the most elegant, but something like this could work too (again assuming the input has been null checked):
string test = "#####";
return test.Replace("#", "").Length == 0;
EDIT: picking up on the discussion about ignoring whitespace too, we could use:
string test = "#####";
return String.IsNullOrWhiteSpace(test.Replace("#", ""));
This is also a solution:
String data = "###########";
bool isAllSame = data.All(d => d == '#');
if (isAllSame)
{
    // code for when the string contains only #
}
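One detail worth keeping in mind with the All-based answers: All returns true for an empty sequence, so an empty string would count as "only #". A minimal sketch that also guards against null and empty input, with HashCheck and IsHashOnly as hypothetical names:

```csharp
using System.Linq;

// Hypothetical helper, not part of the original answers.
public static class HashCheck
{
    // True when the string has at least one '#' and nothing else
    // (optionally ignoring white space). Null and empty strings give false.
    public static bool IsHashOnly(string value, bool ignoreWhiteSpace = false)
    {
        if (string.IsNullOrEmpty(value)) return false;
        return value.Contains('#')
            && value.All(c => c == '#' || (ignoreWhiteSpace && char.IsWhiteSpace(c)));
    }
}
```

IsHashOnly("# # # # #") is false by default and true when ignoreWhiteSpace is set; "123 # abc # xyz" is false either way.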

How to replace special characters with their equivalent (such as " á " for " a") in C#?

I need to get the Portuguese text content out of an Excel file and create an XML file that is going to be used by an application that doesn't support characters such as "ç", "á", "é", and others. I can't just remove the characters; I have to replace them with their equivalents ("c", "a", "e", for example).
I assume there's a better way to do it than checking each character individually and replacing it with its counterpart. Any suggestions on how to do it?
You could try something like
var decomposed = "áéö".Normalize(NormalizationForm.FormD);
var filtered = decomposed.Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
var newString = new String(filtered.ToArray());
This decomposes the accents from the text, filters them out and creates a new string. Combining diacritics belong to the NonSpacingMark Unicode category.
string text = {text to replace characters in};
Dictionary<char, char> replacements = new Dictionary<char, char>();
// add your characters to the replacements dictionary,
// key: char to replace
// value: replacement char
replacements.Add('ç', 'c');
...
System.Text.StringBuilder replaced = new System.Text.StringBuilder();
for (int i = 0; i < text.Length; i++)
{
    char character = text[i];
    if (replacements.ContainsKey(character))
    {
        replaced.Append(replacements[character]);
    }
    else
    {
        replaced.Append(character);
    }
}
// 'replaced' is now your converted text
For future reference, this is exactly what I ended up with:
temp = stringToConvert.Normalize(NormalizationForm.FormD);
IEnumerable<char> filtered = temp;
filtered = filtered.Where(c => char.GetUnicodeCategory(c) != System.Globalization.UnicodeCategory.NonSpacingMark);
final = new string(filtered.ToArray());
This solution might perform better, but note that it also strips digits and punctuation:
string test = "áéíóúç";
string result = Regex.Replace(test.Normalize(NormalizationForm.FormD), "[^A-Za-z ]", string.Empty);
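The normalization-based answers can be packaged as a single method. The sketch below follows the decompose-and-filter idea from above and, as an extra step not present in the answers, recomposes the result to FormC at the end; Diacritics and Remove are hypothetical names.

```csharp
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of the original answers.
public static class Diacritics
{
    public static string Remove(string text)
    {
        // Split each accented letter into a base letter plus combining marks.
        string decomposed = text.Normalize(NormalizationForm.FormD);
        // Drop the combining marks and keep everything else untouched.
        var kept = decomposed.Where(c =>
            CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
        // Recompose whatever is left into the usual precomposed form.
        return new string(kept.ToArray()).Normalize(NormalizationForm.FormC);
    }
}
```

Remove("áéíóúç") returns "aeiouc", and digits, spaces and punctuation are left as they are.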
