Removing Specified Punctuation From Strings - c#

I have a String that in need to convert into a String[] of each word in the string. However I do not need any white space or any punctuation EXCEPT hyphens and Apostrophes that belong in the word.
Example Input:
Hello! This is a test and it's a short-er 1. - [ ] { } ___)
Example of the Array made from Input:
[ "Hello", "this", "is", "a", "test", "and", "it's", "a", "short-er", "1" ]
Currently this is the code I have tried
(Note: the 2nd gives an error later in the program when string.First() is called):
private string[] ConvertWordsFromFile(String NewFileText)
{
char[] delimiterChars = { ' ', ',', '.', ':', '/', '|', '<', '>', '/', '#', '#', '$', '%', '^', '&', '*', '"', '(', ')', ';' };
string[] words = NewFileText.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
return words;
}
or
private string[] ConvertWordsFromFile(String NewFileText)
{
return Regex.Split(NewFileText, #"\W+");
}
The second example crashes with the following code
private string GroupWordsByFirstLetter(List<String> words)
{
var groups =
from w in words
group w by w.First();
return FormatGroupsByAlphabet(groups);
}
specifically, when w.First() is called.

To remove unwanted characters from a String
string randomString = "thi$ is h#ving s*me inva!id ch#rs";
string excpList ="$#*!";
LINQ Option 1
var chRemoved = randomString
.Select(ch => excpList.Contains(ch) ? (char?)null : ch);
var Result = string.Concat(chRemoved.ToArray());
LINQ Option 2
var Result = randomString.Split().Select(x => x.Except(excList.ToArray()))
.Select(c => new string(c.ToArray()))
.ToArray();

Here is a little something I worked up. Splits on \n and removes any unwanted characters.
private string ValidChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ123456789'-";
private IEnumerable<string> SplitRemoveInvalid(string input)
{
string tmp = "";
foreach(char c in input)
{
if(c == '\n')
{
if(!String.IsNullOrEmpty(tmp))
{
yield return tmp;
tmp = "";
}
continue;
}
if(ValidChars.Contains(c))
{
tmp += tmp;
}
}
if (!String.IsNullOrEmpty(tmp)) yield return tmp;
}
Usage could be something like this:
string[] array = SplitRemoveInvalid("Hello! This is a test and it's a short-er 1. - [ ] { } _)")
.ToArray();
I didnt actually test it, but it should work. If it doesnt, it should be easy enough to fix.

Use string.Split(char [])
string strings = "4,6,8\n9,4";
string [] split = strings .Split(new Char [] {',' , '\n' });
OR
Try below if you get any unwanted empty items. String.Split Method (String[], StringSplitOptions)
string [] split = strings .Split(new Char [] {',' , '\n' },
StringSplitOptions.RemoveEmptyEntries);

This can be done quite easily with a RegEx, by matching words. I am using the following RegEx, which will allow hyphens and apostrophes in the middle of words, but will strip them out if they occur at a word boundary.
\w(?:[\w'-]*\w)?
See it in action here.
In C# it could look like this:
private string[] ConvertWordsFromFile(String NewFileText)
{
return (from m in new Regex(#"\w(?:[\w'-]*\w)?").Matches(NewFileText)
select m.Value).ToArray();
}
I am using LINQ to get an array of words from the MatchCollection returned by Matches.

Related

Split word from string

I use this method for splitting words from string, but \n doesn't consider. How can I solve it?
public string SplitXWord(string text, int wordCount)
{
string output = "";
IEnumerable<string> words = text.Split().Take(wordCount);
foreach (string word in words)
{
output += " " + word;
}
return output;
}
Well, string.Split() splits by white-spaces only
https://learn.microsoft.com/en-us/dotnet/api/system.string.split?view=net-6.0
Split is used to break a delimited string into substrings. You can use either a character array or a string array to specify zero or more delimiting characters or strings. If no delimiting characters are specified, the string is split at white-space characters.
bold is mine.
So far so good, string.Split() splits on spaces ' ', tabulation '\t', new line '\n', carriage return '\r' etc.:
Console.Write(string.Join(", ", "a\nb\rc\td e".Split()));
produces
a, b, c, d, e
If you want to split on your cown delimiters, you should prvide them:
Console.Write(string.Join(", ", "a\nb\rc\td e".Split(new char[] {' ', '\t'})));
note that \r and \n are preserved, when splitted on ' ' and 't'
a
b
c, d, e
So, it seems that your method should be something like this:
using System.Linq;
...
//DONE: static - we don't want this here
public static string SplitXWord(string text, int wordCount) {
//DONE: don't forget about degenerated cases
if (string.IsNullOrWhiteSpace(text) || wordCount <= 0)
return "";
//TODO: specify delimiters on which you want to split
return string.Join(" ", text
.Split(
new char[] { ' ', '\t' },
wordCount + 1,
StringSplitOptions.RemoveEmptyEntries)
.Take(wordCount));
}
Use the overload of Split method which accepts an array of char separators and clears the empty entries
string str = "my test \n\r string \n is here";
string[] words = str.Split(new []{' ', '\r', '\n'}, StringSplitOptions.RemoveEmptyEntries);
UPDATE:
Another solution with regex and keeping line characters:
string str = "my test\r\n string\n is here";
var wordsByRegex = Regex.Split(str, #"(?= ).+?(\r|\n|\r\n)?").ToList();
fiddle
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApp17
{
class Program
{
static void Main(string[] args)
{
string myStr = "hello my friend \n whats up \n bro";
string[] mySplitStr = myStr.Split("\n");
mySplitStr.ToList().ForEach(str=>{
Console.WriteLine(str);
//to remove the white spaces
//Console.WriteLine(str.Replace(" ",""));
});
Console.ReadLine();
}
}
}

c# split string and remove empty string

I want to remove empty and null string in the split operation:
string number = "9811456789, ";
List<string> mobileNos = number.Split(new string[] { "," }, StringSplitOptions.RemoveEmptyEntries).Select(mobile => mobile.Trim()).ToList();
I tried this but this is not removing the empty space entry
var mobileNos = number.Replace(" ", "")
.Split(new string[] { "," }, StringSplitOptions.RemoveEmptyEntries).ToList();
As I understand it can help to you;
string number = "9811456789, ";
List<string> mobileNos = number.Split(',').Where(x => !string.IsNullOrWhiteSpace(x)).ToList();
the result only one element in list as [0] = "9811456789".
Hope it helps to you.
a string extension can do this in neat way as below
the extension :
public static IEnumerable<string> SplitAndTrim(this string value, params char[] separators)
{
Ensure.Argument.NotNull(value, "source");
return value.Trim().Split(separators, StringSplitOptions.RemoveEmptyEntries).Select(s => s.Trim());
}
then you can use it with any string as below
char[] separator = { ' ', '-' };
var mobileNos = number.SplitAndTrim(separator);
I know it's an old question, but the following works just fine:
string number = "9811456789, ";
List<string> mobileNos = number.Split(new char[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();
No need for extension methods or whatsoever.
"string,,,,string2".Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
return ["string"],["string2"]
The easiest and best solution is to use both StringSplitOptions.TrimEntries to trim the results and StringSplitOptions.RemoveEmptyEntries to remove empty entries, fed in through the pipe operator (|).
string number = "9811456789, ";
List<string> mobileNos = number
.Split(',', StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries)
.ToList();
Checkout the below test results to compare how each option works,

Extracting Formula from String

I have to extract all variables from Formula
Fiddle for below problem
eg. (FB+AB+ESI) / 12
Output {FB,AB,ESI}
Code written so far
var length = formula.Length;
List<string> variables = new List<string>();
List<char> operators = new List<char> { '+', '-', '*', '/', ')', '(', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
int count = 0;
string character = string.Empty;
for (int i = 0; i < length; i++)
{
if (!operators.Contains(formula[i]))
character += formula[i];
else
{
if (!string.IsNullOrWhiteSpace(character))
variables.Add(character);
character = string.Empty;
count = i;
}
}
if (!string.IsNullOrWhiteSpace(character))
variables.Add(character);
return variables;
Output of the Method is {FB,AB,ESI} which is correct
My problem is where Varaible contains numeric field i.e
eg. (FB1+AB1)/100
Expected Output : {FB1,AB1}
But My method return {FB,AB}
If variable's names must start with
letter A..Z, a..z
and if variable's names can contain
letters A..Z, a..z
digits 0..1
underscopes _
you can use regular expressions:
String source = "(FB2+a_3B+EsI) / 12";
String pattern = #"([A-Z]|[a-z])+([A-z]|[a-z]|\d|_)*";
// output will be "{FB2,a_3B,EsI}"
String output = "{" + String.Join(",",
Regex.Matches(source, pattern)
.OfType<Match>()
.Select(item => item.Value)) + "}";
In case you need a collection, say an array of variable's names, just modify the Linq:
String names[] = Regex.Matches(source, pattern)
.OfType<Match>()
.Select(item => item.Value)
.ToArray();
However, what is implemented is just a kind of naive tokenizer: you have to separate "variable names" found from function names, class names, check if they are commented out etc.
Have changed your code to do what you asked, but not sure about the approach of the solution, seeing that bracket and operator precedence is not taken into consideration.
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
string formula = "AB1+FB+100";
var length = formula.Length;
List<string> variables = new List<string>();
List<char> operators = new List<char>{'+', '-', '*', '/', ')', '('};
List<char> numerals = new List<char>{'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
int count = 0;
string character = string.Empty;
char prev_char = '\0';
for (int i = 0; i < length; i++)
{
bool is_operator = operators.Contains(formula[i]);
bool is_numeral = numerals.Contains(formula[i]);
bool is_variable = !(is_operator || is_numeral);
bool was_variable = character.Contains(prev_char);
if (is_variable || (was_variable && is_numeral) )
character += formula[i];
else
{
if (!string.IsNullOrWhiteSpace(character))
variables.Add(character);
character = string.Empty;
count = i;
}
prev_char = formula[i];
}
if (!string.IsNullOrWhiteSpace(character))
variables.Add(character);
foreach (var item in variables)
Console.WriteLine(item);
Console.WriteLine();
Console.WriteLine();
}
}
Maybe also consider something like Math-Expression-Evaluator (on nuget)
Here is how you could do it with Regular Expressions.
Regex regex = new Regex(#"([A-Z])\w+");
List<string> matchedStrings = new List<string>();
foreach (Match match in regex.Matches("(FB1+AB1)/100"))
{
matchedStrings.Add(match.Value);
}
This will create a list of strings of all the matches.
Without regex, you can split on the actual operators (not numbers), and then remove any items that begin with a number:
public static List<string> GetVariables(string formula)
{
if (string.IsNullOrWhitespace(formula)) return new List<string>();
var operators = new List<char> { '+', '-', '*', '/', '^', '%', '(', ')' };
int temp;
return formula
.Split(operators.ToArray(), StringSplitOptions.RemoveEmptyEntries)
.Where(operand => !int.TryParse(operand[0].ToString(), out temp))
.ToList();
}
You can do it this way, just optimize the code as you want.
string ss = "(FB+AB+ESI) / 12";
string[] spl = ss.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
string final = spl[0].Replace("(", "").Replace(")", "").Trim();
string[] entries = final.Split(new char[] {'+'}, StringSplitOptions.RemoveEmptyEntries);
StringBuilder sbFinal = new StringBuilder();
sbFinal.Append("{");
foreach(string en in entries)
{
sbFinal.Append(en + ",");
}
string finalString = sbFinal.ToString().TrimEnd(',');
finalString += "}";
What you are trying to do is an interpreter.
I can't give you the whole code but what I can give you is a head start (it will require a lot of coding).
First, learn about reverse polish notation.
Second, you need to learn about stacks.
Third, you have to apply both to get what you want to interpret.

manipulating strings

I am trying to remove some special characters from a string.
I have got the following string
[_fesd][009] Statement
and I want to get rid of all '_' '[' and ']'
I managed to remove the first characters with TrimStart and I get fesd][009] Statement
How should I remove the special characters from the middle of my string?
Currently Im using the following code
string newStr = str.Trim(new Char[] { '[', ']', '_' });
where str is the strin that should be manupulated and the result should be stored in newStr
string newStr = str.Replace("[", "").Replace("]", "").Replace("_", "");
var newStr = Regex.Replace("[_fesd][009] Statement", "(\\[)|(\\])|(_)", string.Empty);
Use string.Replace with string.Empty as the string to replace with.
You could use Linq for it:
static void Main(string[] args)
{
var s = #"[_fesd][009] Statement";
var unwanted = #"_[]";
var sanitizedS = s
.Where(i => !unwanted.Contains(i))
.Aggregate<char, string>("", (a, b) => a + b);
Console.WriteLine(sanitizedS);
// output: fesd009 Statement
}
var chars = new Char[] { '[', ']', '_' };
var newValue = new String(str.Where(x => !chars.Contains(x)).ToArray());

How do I split a string by a multi-character delimiter in C#?

What if I want to split a string using a delimiter that is a word?
For example, This is a sentence.
I want to split on is and get This and a sentence.
In Java, I can send in a string as a delimiter, but how do I accomplish this in C#?
http://msdn.microsoft.com/en-us/library/system.string.split.aspx
Example from the docs:
string source = "[stop]ONE[stop][stop]TWO[stop][stop][stop]THREE[stop][stop]";
string[] stringSeparators = new string[] {"[stop]"};
string[] result;
// ...
result = source.Split(stringSeparators, StringSplitOptions.None);
foreach (string s in result)
{
Console.Write("'{0}' ", String.IsNullOrEmpty(s) ? "<>" : s);
}
You can use the Regex.Split method, something like this:
Regex regex = new Regex(#"\bis\b");
string[] substrings = regex.Split("This is a sentence");
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
Edit: This satisfies the example you gave. Note that an ordinary String.Split will also split on the "is" at the end of the word "This", hence why I used the Regex method and included the word boundaries around the "is". Note, however, that if you just wrote this example in error, then String.Split will probably suffice.
Based on existing responses on this post, this simplify the implementation :)
namespace System
{
public static class BaseTypesExtensions
{
/// <summary>
/// Just a simple wrapper to simplify the process of splitting a string using another string as a separator
/// </summary>
/// <param name="s"></param>
/// <param name="pattern"></param>
/// <returns></returns>
public static string[] Split(this string s, string separator)
{
return s.Split(new string[] { separator }, StringSplitOptions.None);
}
}
}
string s = "This is a sentence.";
string[] res = s.Split(new string[]{ " is " }, StringSplitOptions.None);
for(int i=0; i<res.length; i++)
Console.Write(res[i]);
EDIT: The "is" is padded on both sides with spaces in the array in order to preserve the fact that you only want the word "is" removed from the sentence and the word "this" to remain intact.
...In short:
string[] arr = "This is a sentence".Split(new string[] { "is" }, StringSplitOptions.None);
Or use this code; ( same : new String[] )
.Split(new[] { "Test Test" }, StringSplitOptions.None)
You can use String.Replace() to replace your desired split string with a character that does not occur in the string and then use String.Split on that character to split the resultant string for the same effect.
Here is an extension function to do the split with a string separator:
public static string[] Split(this string value, string seperator)
{
return value.Split(new string[] { seperator }, StringSplitOptions.None);
}
Example of usage:
string mystring = "one[split on me]two[split on me]three[split on me]four";
var splitStrings = mystring.Split("[split on me]");
var dict = File.ReadLines("test.txt")
.Where(line => !string.IsNullOrWhitespace(line))
.Select(line => line.Split(new char[] { '=' }, 2, 0))
.ToDictionary(parts => parts[0], parts => parts[1]);
or
enter code here
line="to=xxx#gmail.com=yyy#yahoo.co.in";
string[] tokens = line.Split(new char[] { '=' }, 2, 0);
ans:
tokens[0]=to
token[1]=xxx#gmail.com=yyy#yahoo.co.in
string strData = "This is much easier"
int intDelimiterIndx = strData.IndexOf("is");
int intDelimiterLength = "is".Length;
str1 = strData.Substring(0, intDelimiterIndx);
str2 = strData.Substring(intDelimiterIndx + intDelimiterLength, strData.Length - (intDelimiterIndx + intDelimiterLength));

Categories