Split word from string - c#

I use this method to split a string into words, but it doesn't take \n into account. How can I solve it?
public string SplitXWord(string text, int wordCount)
{
string output = "";
IEnumerable<string> words = text.Split().Take(wordCount);
foreach (string word in words)
{
output += " " + word;
}
return output;
}

Well, string.Split() splits by white-spaces only
https://learn.microsoft.com/en-us/dotnet/api/system.string.split?view=net-6.0
Split is used to break a delimited string into substrings. You can use either a character array or a string array to specify zero or more delimiting characters or strings. If no delimiting characters are specified, the string is split at white-space characters.
bold is mine.
So far so good, string.Split() splits on spaces ' ', tabulation '\t', new line '\n', carriage return '\r' etc.:
Console.Write(string.Join(", ", "a\nb\rc\td e".Split()));
produces
a, b, c, d, e
If you want to split on your own delimiters, you should provide them:
Console.Write(string.Join(", ", "a\nb\rc\td e".Split(new char[] {' ', '\t'})));
Note that \r and \n are preserved when splitting on ' ' and '\t':
a
b
c, d, e
So, it seems that your method should be something like this:
using System.Linq;
...
//DONE: static - the method doesn't use "this"
public static string SplitXWord(string text, int wordCount) {
//DONE: don't forget about degenerated cases
if (string.IsNullOrWhiteSpace(text) || wordCount <= 0)
return "";
//TODO: specify delimiters on which you want to split
return string.Join(" ", text
.Split(
new char[] { ' ', '\t' },
wordCount + 1,
StringSplitOptions.RemoveEmptyEntries)
.Take(wordCount));
}
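If the goal is instead to treat every white-space character, including \n and \r, as a word boundary, a minimal sketch could look like the following. The class name WordSplitSketch and the method name FirstWords are made up for this illustration. It relies on the documented behaviour that a null separator array makes Split use white-space characters as delimiters.

```csharp
using System;
using System.Linq;

public static class WordSplitSketch
{
    // Returns the first wordCount words of text, joined by single spaces.
    // A null separator array tells Split to treat every white-space
    // character (space, tab, '\n', '\r', ...) as a delimiter.
    public static string FirstWords(string text, int wordCount)
    {
        if (string.IsNullOrWhiteSpace(text) || wordCount <= 0)
            return "";

        return string.Join(" ", text
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Take(wordCount));
    }
}
```

With this sketch, WordSplitSketch.FirstWords("one\ntwo\r\nthree  four", 3) returns "one two three". Line breaks in the input are not preserved; if they should be, use the explicit delimiter list shown above.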

Use the overload of Split method which accepts an array of char separators and clears the empty entries
string str = "my test \n\r string \n is here";
string[] words = str.Split(new []{' ', '\r', '\n'}, StringSplitOptions.RemoveEmptyEntries);
UPDATE:
Another solution with regex and keeping line characters:
string str = "my test\r\n string\n is here";
var wordsByRegex = Regex.Split(str, @"(?= ).+?(\r|\n|\r\n)?").ToList();

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApp17
{
class Program
{
static void Main(string[] args)
{
string myStr = "hello my friend \n whats up \n bro";
string[] mySplitStr = myStr.Split("\n");
mySplitStr.ToList().ForEach(str=>{
Console.WriteLine(str);
//to remove the white spaces
//Console.WriteLine(str.Replace(" ",""));
});
Console.ReadLine();
}
}
}

Related

reverse words BUT dot should be at the end

I have sentence:
"I love Marry."
and I would like to get:
"Marry love I." (dot at the end)
How can I do that?
public static string ReverseWords(string originalString)
{
return string.Join(" ", originalString.Split(' ').Where(x => !string.IsNullOrEmpty(x)).Reverse());
}
You can remove the last '.' before the split.
Demo:
public static string ReverseWords(string originalString)
{
var input = originalString.EndsWith(".") ? originalString.Remove(originalString.Length - 1) : originalString; // will trim ending '.'
return string.Join(" ", input.Split().Reverse()) + ".";
}
Try this. I am making it into several statements for readability.
var words = originalString.Split(new [] {' ', '.'}, StringSplitOptions.RemoveEmptyEntries).Reverse();
That gets your words in reverse order, and avoids the need for your Where clause. Then join them back with the period:
return string.Join(' ', words) + '.';
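Put together as one method, the two statements above might look like this sketch. The class name ReverseSketch is hypothetical. The words are held as IEnumerable<string> so that Reverse() resolves to the LINQ method, and because every '.' is treated as a separator, this only suits a single sentence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReverseSketch
{
    // Reverses the word order of a single sentence and puts one dot at the end.
    public static string ReverseWords(string originalString)
    {
        // Splitting on both ' ' and '.' drops the trailing dot together with
        // any empty entries caused by repeated spaces.
        IEnumerable<string> words = originalString
            .Split(new[] { ' ', '.' }, StringSplitOptions.RemoveEmptyEntries);

        return string.Join(" ", words.Reverse()) + ".";
    }
}
```

ReverseSketch.ReverseWords("I love Marry.") returns "Marry love I.".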
Do it in two steps where you split on . first;
return
string.Join(".",
originalString.Split('.')
.ToList()
.Select(s => string.Join(" ", s.Split(' ').Where(x => !string.IsNullOrEmpty(x)).Reverse())));
For single sentences, remove the dot and append it again in the end.
To remove the dot you can use TrimEnd which will remove all dots from the end of the string. If there is none, nothing is removed:
public static string ReverseWords(string originalString)
{
originalString = originalString.TrimEnd('.');
originalString = string.Join(" ", originalString.Split(' ').Where(x => !string.IsNullOrEmpty(x)).Reverse());
return originalString + ".";
}
For multiple sentences you can split the input string at the ., which will give you an array of sentences without dots. Then you simply reverse each part, append a dot and put them back together (I used a StringBuilder to do that):
public static string ReverseWordsMultiple(string originalString)
{
String[] sentences = originalString.Split(new char[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
StringBuilder builder = new StringBuilder();
foreach (String sentence in sentences)
{
builder.Append(string.Join(" ", sentence.Split(' ').Where(x => !string.IsNullOrEmpty(x)).Reverse()));
builder.Append(". ");
}
return builder.ToString().TrimEnd();
}

C# - Identify the matching character when using String.Split(CharArray)

If I use the Split() function on a string, passing in various split characters as a char[] parameter, and given that the matching split character is removed from the string, how can I identify which character it matched & split on?
string inputString = "Hello, there| world";
char[] splitChars = new char[] { ',', '|' };
foreach (string section in inputString.Split(splitChars))
{
Console.WriteLine(section); // [0] Hello [1] there [2] world (no splitChars)
}
I understand that perhaps it's not possible to retain this information with my approach. If that's the case, could you suggest an alternative approach?
The C# Regex.Split() method can return the split characters as well as the words between them, provided the delimiters are wrapped in capturing groups.
string inputString = "Hello, there| world";
string pattern = @"(,)|([|])";
foreach (string result in Regex.Split(inputString, pattern))
{
Console.WriteLine("'{0}'", result);
}
the result is:
'Hello'
','
' there'
'|'
' world'
Use the Regex.Split() method. I have wrapped this method in the following extension method that is as easy to use as string.Split() itself:
public static string[] ExtendedSplit(this string input, char[] splitChars)
{
string pattern = string.Join("|", splitChars.Select(x => "(" + Regex.Escape(x.ToString()) + ")"));
return Regex.Split(input, pattern);
}
Usage:
string inputString = "Hello, there| world";
char[] splitChars = new char[]{',', '|'};
foreach (string result in inputString.ExtendedSplit(splitChars))
{
Console.WriteLine("'{0}'", result);
}
Output:
'Hello'
','
' there'
'|'
' world'
No, but it's rather trivial to write one yourself. Remember, framework methods aren't magic, somebody wrote them. If something doesn't exactly match your needs, write one that does!
static IEnumerable<(string Sector, char Separator)> Split(
this string s,
IEnumerable<char> separators,
bool removeEmptyEntries)
{
var buffer = new StringBuilder();
var separatorsSet = new HashSet<char>(separators);
foreach (var c in s)
{
if (separatorsSet.Contains(c))
{
if (!removeEmptyEntries || buffer.Length > 0)
yield return (buffer.ToString(), c);
buffer.Clear();
}
else
buffer.Append(c);
}
if (buffer.Length > 0)
yield return (buffer.ToString(), default(char));
}
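To show how such a method might be consumed, here is a self-contained sketch. It uses a simplified variant of the method above without the removeEmptyEntries switch, under the hypothetical name SplitWithSeparators so that it cannot be confused with string.Split. The Describe helper is also made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SeparatorSplitSketch
{
    // Yields each section together with the separator that ended it.
    // The last section, if any, is paired with default(char).
    public static IEnumerable<(string Section, char Separator)> SplitWithSeparators(
        this string s, IEnumerable<char> separators)
    {
        var separatorSet = new HashSet<char>(separators);
        var buffer = new StringBuilder();

        foreach (var c in s)
        {
            if (separatorSet.Contains(c))
            {
                yield return (buffer.ToString(), c);
                buffer.Clear();
            }
            else
            {
                buffer.Append(c);
            }
        }

        if (buffer.Length > 0)
            yield return (buffer.ToString(), default(char));
    }

    // Turns the (section, separator) pairs into readable lines.
    public static List<string> Describe(string input, char[] separators)
    {
        var lines = new List<string>();
        foreach (var (section, separator) in input.SplitWithSeparators(separators))
        {
            lines.Add(separator == default(char)
                ? $"'{section}' (end of input)"
                : $"'{section}' followed by '{separator}'");
        }
        return lines;
    }
}
```

For "Hello, there| world" with the separators ',' and '|', Describe returns three lines: 'Hello' followed by ',', then ' there' followed by '|', then ' world' (end of input).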

How to split a string with non-numbers as delimiter?

I want to split a string in C# on the text it contains. For example, given the string "41sugar1100", I want to split on the text in it, which is "sugar". How can I do this?
NOTE: I cannot pass "sugar" directly as a delimiter, because the text can change in the next iteration. Wherever text is found in the string, the string should be split on that text.
Use Regex.Split:
string input = "44sugar1100";
string pattern = "[a-zA-Z]+"; // Split on any group of letters
string[] substrings = Regex.Split(input, pattern);
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
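A variation on the same idea, as a sketch: split on runs of non-digit characters (\D+) rather than only on ASCII letters, and drop the empty entries that Regex.Split produces when the input starts or ends with a delimiter. The names NumberSplitSketch and SplitOnNonDigits are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberSplitSketch
{
    // Splits on any run of non-digit characters and keeps only the
    // non-empty pieces, i.e. the numbers embedded in the input.
    public static string[] SplitOnNonDigits(string input)
    {
        return Regex.Split(input, @"\D+")
            .Where(part => part.Length > 0)
            .ToArray();
    }
}
```

NumberSplitSketch.SplitOnNonDigits("41sugar1100") returns { "41", "1100" }, and an input such as "sugar41salt" returns { "41" } instead of an array padded with empty strings.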
char[] array = "41sugar1100".ToCharArray();
StringBuilder sb = new StringBuilder();
// Append digits as they are, and the special char '#' for every non-digit, so we can split on '#' later
foreach (char c in array)
sb.Append(Char.IsNumber(c) ? c : '#');
// Split on special char '#' and remove empty string items
string[] items = sb.ToString().Split('#').Where(s => s != string.Empty).ToArray();
foreach (string item in items)
Console.WriteLine(item);
// Output:
// 41
// 1100
Use a char[] array of delimiters to split the string:
string s = "44sugar1100";
char[] c = new char[] { 's', 'u', 'g', 'a', 'r' };
string[] s1 = s.Split(c, StringSplitOptions.RemoveEmptyEntries); // { "44", "1100" }
string s2 = string.Join(" ", s1); // "44 1100"; s1.ToString() would only return the type name "System.String[]"
Regex regex = new Regex(@"(?<firstNumber>\d+)(?<word>[^\d]+)+(?<secondNumber>\d+)", RegexOptions.CultureInvariant);
string s = "41sugar1100";
Match match = regex.Match(s);
if (match.Success)
{
string firstNumber = match.Groups["firstNumber"].Value;
string word = match.Groups["word"].Value;
string secondNumber = match.Groups["secondNumber"].Value;
}
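I would iterate over the characters of the string and call int.TryParse on each one, for example:
string myString = "44sugar1100";
int num = 0; // for storage
string newString = ""; // for rebuilding
foreach (char ch in myString)
{
// int.TryParse takes a string, so convert the char first
if (int.TryParse(ch.ToString(), out num))
{
newString += num.ToString();
}
}
// newString is now "441100": the digits are kept, but the two numbers are not separated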
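string text = "41sugar1100";
// 'sugar' is not a valid char literal; pass the delimiter as a string instead
string[] array = text.Split(new[] { "sugar" }, StringSplitOptions.None);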

manipulating strings

I am trying to remove some special characters from a string.
I have got the following string
[_fesd][009] Statement
and I want to get rid of all '_' '[' and ']'
I managed to remove the first characters with TrimStart and I get fesd][009] Statement
How should I remove the special characters from the middle of my string?
Currently I'm using the following code
string newStr = str.Trim(new Char[] { '[', ']', '_' });
where str is the string that should be manipulated and the result should be stored in newStr
string newStr = str.Replace("[", "").Replace("]", "").Replace("_", "");
var newStr = Regex.Replace("[_fesd][009] Statement", "(\\[)|(\\])|(_)", string.Empty);
Use string.Replace with string.Empty as the string to replace with.
You could use Linq for it:
static void Main(string[] args)
{
var s = @"[_fesd][009] Statement";
var unwanted = @"_[]";
var sanitizedS = s
.Where(i => !unwanted.Contains(i))
.Aggregate<char, string>("", (a, b) => a + b);
Console.WriteLine(sanitizedS);
// output: fesd009 Statement
}
var chars = new Char[] { '[', ']', '_' };
var newValue = new String(str.Where(x => !chars.Contains(x)).ToArray());
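The reason Trim is not enough is that it only removes the listed characters from the start and the end of the string and stops at the first character that is not in the list. The sketch below contrasts the two behaviours; the class and method names are invented for this illustration.

```csharp
using System;
using System.Linq;

public static class TrimVersusRemoveSketch
{
    // Trim only strips matching characters from both ends of the string.
    public static string TrimEnds(string str)
    {
        return str.Trim('[', ']', '_');
    }

    // Filtering character by character removes the unwanted ones everywhere.
    public static string RemoveEverywhere(string str, string unwanted)
    {
        return new string(str.Where(c => unwanted.IndexOf(c) < 0).ToArray());
    }
}
```

TrimVersusRemoveSketch.TrimEnds("[_fesd][009] Statement") returns "fesd][009] Statement", which matches what the question observed, while RemoveEverywhere("[_fesd][009] Statement", "_[]") returns "fesd009 Statement".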

Removing Specified Punctuation From Strings

I have a String that I need to convert into a String[] of each word in the string. However I do not need any white space or any punctuation EXCEPT hyphens and Apostrophes that belong in the word.
Example Input:
Hello! This is a test and it's a short-er 1. - [ ] { } ___)
Example of the Array made from Input:
[ "Hello", "this", "is", "a", "test", "and", "it's", "a", "short-er", "1" ]
Currently this is the code I have tried
(Note: the 2nd gives an error later in the program when string.First() is called):
private string[] ConvertWordsFromFile(String NewFileText)
{
char[] delimiterChars = { ' ', ',', '.', ':', '/', '|', '<', '>', '/', '@', '#', '$', '%', '^', '&', '*', '"', '(', ')', ';' };
string[] words = NewFileText.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
return words;
}
or
private string[] ConvertWordsFromFile(String NewFileText)
{
return Regex.Split(NewFileText, @"\W+");
}
The second example crashes with the following code
private string GroupWordsByFirstLetter(List<String> words)
{
var groups =
from w in words
group w by w.First();
return FormatGroupsByAlphabet(groups);
}
specifically, when w.First() is called.
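A likely cause of that crash, shown as a sketch: Regex.Split keeps an empty string in its result when the input starts or ends with a delimiter, and calling First() on an empty string throws an InvalidOperationException. The example input ends with ")", so the last element of the array is empty. Filtering out empty entries avoids the exception; the names below are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmptyEntrySketch
{
    // The plain split from the question: may contain empty strings.
    public static string[] SplitRaw(string text)
    {
        return Regex.Split(text, @"\W+");
    }

    // Same split, with empty entries removed so that w.First() is safe.
    public static string[] SplitNonEmpty(string text)
    {
        return Regex.Split(text, @"\W+")
            .Where(w => w.Length > 0)
            .ToArray();
    }
}
```

For "Hello! World." the raw split returns { "Hello", "World", "" }, while SplitNonEmpty returns { "Hello", "World" }. Note that \W+ also splits on apostrophes and hyphens, so it still does not meet the requirement of keeping "it's" and "short-er" intact.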
To remove unwanted characters from a String
string randomString = "thi$ is h#ving s*me inva!id ch#rs";
string excpList ="$#*!";
LINQ Option 1
var chRemoved = randomString
.Select(ch => excpList.Contains(ch) ? (char?)null : ch);
var Result = string.Concat(chRemoved.ToArray());
LINQ Option 2
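// Where is used instead of Except, because Except is a set operation and would also drop repeated letters
var Result = randomString.Split()
.Select(x => new string(x.Where(ch => !excpList.Contains(ch)).ToArray()))
.ToArray();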
Here is a little something I worked up. Splits on \n and removes any unwanted characters.
private string ValidChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'-";
private IEnumerable<string> SplitRemoveInvalid(string input)
{
string tmp = "";
foreach(char c in input)
{
if(c == '\n')
{
if(!String.IsNullOrEmpty(tmp))
{
yield return tmp;
tmp = "";
}
continue;
}
if(ValidChars.Contains(c))
{
tmp += c; // append the current valid character
}
}
if (!String.IsNullOrEmpty(tmp)) yield return tmp;
}
Usage could be something like this:
string[] array = SplitRemoveInvalid("Hello! This is a test and it's a short-er 1. - [ ] { } _)")
.ToArray();
I didn't actually test it, but it should work. If it doesn't, it should be easy enough to fix.
Use string.Split(char [])
string strings = "4,6,8\n9,4";
string[] split = strings.Split(new char[] { ',', '\n' });
OR
Try the overload below if you get any unwanted empty items: String.Split Method (Char[], StringSplitOptions)
string[] split = strings.Split(new char[] { ',', '\n' },
StringSplitOptions.RemoveEmptyEntries);
This can be done quite easily with a RegEx, by matching words. I am using the following RegEx, which will allow hyphens and apostrophes in the middle of words, but will strip them out if they occur at a word boundary.
\w(?:[\w'-]*\w)?
In C# it could look like this:
private string[] ConvertWordsFromFile(String NewFileText)
{
return (from Match m in new Regex(@"\w(?:[\w'-]*\w)?").Matches(NewFileText)
select m.Value).ToArray();
}
I am using LINQ to get an array of words from the MatchCollection returned by Matches.
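The same pattern can be tried on a shorter input with the sketch below. The class name WordMatchSketch and the method name ExtractWords are hypothetical. One caveat worth knowing: \w also matches the underscore, so a run such as "___" would still be returned as a word and may need separate filtering.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordMatchSketch
{
    // A word starts and ends with a word character; hyphens and
    // apostrophes are only allowed in between.
    private static readonly Regex WordPattern = new Regex(@"\w(?:[\w'-]*\w)?");

    public static string[] ExtractWords(string text)
    {
        return WordPattern.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }
}
```

WordMatchSketch.ExtractWords("Hello! it's a short-er test - 1.") returns { "Hello", "it's", "a", "short-er", "test", "1" }.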
