Related
This question already has answers here:
Finding longest word in string
(4 answers)
Closed 2 years ago.
I have .txt file.
I have to remove all the longest words for each line.
The main method where I'm looking for longest word is :
/// <summary>
/// Finds longest word in line
/// </summary>
/// <param name="eil">Line</param>
/// <param name="skyr">Punctuation</param>
/// <returns>Returns longest word for line</returns>
static string[] RastiIlgZodiEil(string eil, char[] skyr)
{
string[] zodIlg = new string[100];
for (int k = 0; k < 100; k++)
{
zodIlg[k] = " ";
}
int kiek = 0;
string[] parts = eil.Split(skyr,
StringSplitOptions.RemoveEmptyEntries);
int i = 0;
foreach (string zodis in parts)
{
if (zodis.Length > zodIlg[i].Length)
{
zodIlg[kiek] = zodis;
kiek++;
i++;
}
else
{
i++;
}
}
return zodIlg;
}
EDIT : Method that reads the .txt file and uses the previous method to replace line with a new line that is configured (by replacing the word that has to be deleted with an empty string).
/// <summary>
/// Finds longest words for each line and then replaces them with
/// emptry string
/// </summary>
/// <param name="fv">File name</param>
/// <param name="skyr">Punctuation</param>
static void RastiIlgZodiFaile(string fv, string fvr, char[] skyr)
{
using (var fr = new StreamWriter(fvr, true,
System.Text.Encoding.GetEncoding(1257)))
{
using (StreamReader reader = new StreamReader(fv,
Encoding.GetEncoding(1257)))
{
int n = 0;
string line;
while (((line = reader.ReadLine()) != null))
{
n++;
if (line.Length > 0)
{
string[] temp = RastiIlgZodiEil(line, skyr);
foreach (string t in temp)
{
line = line.Replace(t, "");
}
fr.WriteLine(line);
}
}
}
}
}
You could remove the longest word(s) from each line with:
static string RemoveLongestWord(string eil, char[] skyr)
{
string[] parts = eil.Split(skyr, StringSplitOptions.RemoveEmptyEntries);
int longestLength = parts.OrderByDescending(s => s.Length).First().Length;
var longestWords = parts.Where(s => s.Length == longestLength);
foreach(string word in longestWords)
{
eil = eil.Replace(word, "");
}
return eil;
}
Just pass each line to the function and you'll get that line back with the longest word removed.
Here's an approach that more closely resembles what you were doing before:
static string[] RastiIlgZodiEil(string eil, char[] skyr)
{
List<string> zodIlg = new List<string>();
string[] parts = eil.Split(skyr, StringSplitOptions.RemoveEmptyEntries);
int maxLength = -1;
foreach (string zodis in parts)
{
if (zodis.Length > maxLength)
{
maxLength = zodis.Length;
}
}
foreach (string zodis in parts)
{
if (zodis.Length == maxLength)
{
zodIlg.Add(zodis);
}
}
return zodIlg.Distinct().ToArray();
}
The first pass finds the longest length. The second pass adds all word that match that length to a List<string>. Finally, we call Distinct() to remove duplicates from the list and return an array version of it.
Here is another solution. Read all lines from a file and split each line using a loop. Aggregate function is used to get the highest length to filter the data further.
static void Main(string[] args)
{
var data = File.ReadAllLines(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "test.txt"));
for (int i = 0; i < data.Length; i++)
{
Console.WriteLine(data[i]);
var split = data[i].Split(' ');
int length = split.Aggregate((a, b) => a.Length >= b.Length ? a : b).Length;
data[i] = string.Join(' ', split.Where(w => w.Length < length));
Console.WriteLine(data[i]);
}
Console.Read();
}
I would like to split a string with delimiters but keep the delimiters in the result.
How would I do this in C#?
If the split chars were ,, ., and ;, I'd try:
using System.Text.RegularExpressions;
...
string[] parts = Regex.Split(originalString, #"(?<=[.,;])")
(?<=PATTERN) is positive look-behind for PATTERN. It should match at any place where the preceding text fits PATTERN so there should be a match (and a split) after each occurrence of any of the characters.
If you want the delimiter to be its "own split", you can use Regex.Split e.g.:
string input = "plum-pear";
string pattern = "(-)";
string[] substrings = Regex.Split(input, pattern); // Split on hyphens
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
// 'plum'
// '-'
// 'pear'
So if you are looking for splitting a mathematical formula, you can use the following Regex
#"([*()\^\/]|(?<!E)[\+\-])"
This will ensure you can also use constants like 1E-02 and avoid having them split into 1E, - and 02
So:
Regex.Split("10E-02*x+sin(x)^2", #"([*()\^\/]|(?<!E)[\+\-])")
Yields:
10E-02
*
x
+
sin
(
x
)
^
2
Building off from BFree's answer, I had the same goal, but I wanted to split on an array of characters similar to the original Split method, and I also have multiple splits per string:
public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
{
int start = 0, index;
while ((index = s.IndexOfAny(delims, start)) != -1)
{
if(index-start > 0)
yield return s.Substring(start, index - start);
yield return s.Substring(index, 1);
start = index + 1;
}
if (start < s.Length)
{
yield return s.Substring(start);
}
}
Just in case anyone wants this answer aswell...
Instead of string[] parts = Regex.Split(originalString, #"(?<=[.,;])") you could use string[] parts = Regex.Split(originalString, #"(?=yourmatch)") where yourmatch is whatever your separator is.
Supposing the original string was
777- cat
777 - dog
777 - mouse
777 - rat
777 - wolf
Regex.Split(originalString, #"(?=777)") would return
777 - cat
777 - dog
and so on
This version does not use LINQ or Regex and so it's probably relatively efficient. I think it might be easier to use than the Regex because you don't have to worry about escaping special delimiters. It returns an IList<string> which is more efficient than always converting to an array. It's an extension method, which is convenient. You can pass in the delimiters as either an array or as multiple parameters.
/// <summary>
/// Splits the given string into a list of substrings, while outputting the splitting
/// delimiters (each in its own string) as well. It's just like String.Split() except
/// the delimiters are preserved. No empty strings are output.</summary>
/// <param name="s">String to parse. Can be null or empty.</param>
/// <param name="delimiters">The delimiting characters. Can be an empty array.</param>
/// <returns></returns>
public static IList<string> SplitAndKeepDelimiters(this string s, params char[] delimiters)
{
var parts = new List<string>();
if (!string.IsNullOrEmpty(s))
{
int iFirst = 0;
do
{
int iLast = s.IndexOfAny(delimiters, iFirst);
if (iLast >= 0)
{
if (iLast > iFirst)
parts.Add(s.Substring(iFirst, iLast - iFirst)); //part before the delimiter
parts.Add(new string(s[iLast], 1));//the delimiter
iFirst = iLast + 1;
continue;
}
//No delimiters were found, but at least one character remains. Add the rest and stop.
parts.Add(s.Substring(iFirst, s.Length - iFirst));
break;
} while (iFirst < s.Length);
}
return parts;
}
Some unit tests:
text = "[a link|http://www.google.com]";
result = text.SplitAndKeepDelimiters('[', '|', ']');
Assert.IsTrue(result.Count == 5);
Assert.AreEqual(result[0], "[");
Assert.AreEqual(result[1], "a link");
Assert.AreEqual(result[2], "|");
Assert.AreEqual(result[3], "http://www.google.com");
Assert.AreEqual(result[4], "]");
A lot of answers to this! One I knocked up to split by various strings (the original answer caters for just characters i.e. length of 1). This hasn't been fully tested.
public static IEnumerable<string> SplitAndKeep(string s, params string[] delims)
{
var rows = new List<string>() { s };
foreach (string delim in delims)//delimiter counter
{
for (int i = 0; i < rows.Count; i++)//row counter
{
int index = rows[i].IndexOf(delim);
if (index > -1
&& rows[i].Length > index + 1)
{
string leftPart = rows[i].Substring(0, index + delim.Length);
string rightPart = rows[i].Substring(index + delim.Length);
rows[i] = leftPart;
rows.Insert(i + 1, rightPart);
}
}
}
return rows;
}
This seems to work, but its not been tested much.
public static string[] SplitAndKeepSeparators(string value, char[] separators, StringSplitOptions splitOptions)
{
List<string> splitValues = new List<string>();
int itemStart = 0;
for (int pos = 0; pos < value.Length; pos++)
{
for (int sepIndex = 0; sepIndex < separators.Length; sepIndex++)
{
if (separators[sepIndex] == value[pos])
{
// add the section of string before the separator
// (unless its empty and we are discarding empty sections)
if (itemStart != pos || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, pos - itemStart));
}
itemStart = pos + 1;
// add the separator
splitValues.Add(separators[sepIndex].ToString());
break;
}
}
}
// add anything after the final separator
// (unless its empty and we are discarding empty sections)
if (itemStart != value.Length || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, value.Length - itemStart));
}
return splitValues.ToArray();
}
Recently I wrote an extension method do to this:
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}
I'd say the easiest way to accomplish this (except for the argument Hans Kesting brought up) is to split the string the regular way, then iterate over the array and add the delimiter to every element but the last.
To avoid adding character to new line try this :
string[] substrings = Regex.Split(input,#"(?<=[-])");
result = originalString.Split(separator);
for(int i = 0; i < result.Length - 1; i++)
result[i] += separator;
(EDIT - this is a bad answer - I misread his question and didn't see that he was splitting by multiple characters.)
(EDIT - a correct LINQ version is awkward, since the separator shouldn't get concatenated onto the final string in the split array.)
Iterate through the string character by character (which is what regex does anyway.
When you find a splitter, then spin off a substring.
pseudo code
int hold, counter;
List<String> afterSplit;
string toSplit
for(hold = 0, counter = 0; counter < toSplit.Length; counter++)
{
if(toSplit[counter] = /*split charaters*/)
{
afterSplit.Add(toSplit.Substring(hold, counter));
hold = counter;
}
}
That's sort of C# but not really. Obviously, choose the appropriate function names.
Also, I think there might be an off-by-1 error in there.
But that will do what you're asking.
veggerby's answer modified to
have no string items in the list
have fixed string as delimiter like "ab" instead of single character
var delimiter = "ab";
var text = "ab33ab9ab"
var parts = Regex.Split(text, $#"({Regex.Escape(delimiter)})")
.Where(p => p != string.Empty)
.ToList();
// parts = "ab", "33", "ab", "9", "ab"
The Regex.Escape() is there just in case your delimiter contains characters which regex interprets as special pattern commands (like *, () and thus have to be escaped.
using System.Collections.Generic;
using System.Text.RegularExpressions;
namespace ConsoleApplication9
{
class Program
{
static void Main(string[] args)
{
string input = #"This;is:a.test";
char sep0 = ';', sep1 = ':', sep2 = '.';
string pattern = string.Format("[{0}{1}{2}]|[^{0}{1}{2}]+", sep0, sep1, sep2);
Regex regex = new Regex(pattern);
MatchCollection matches = regex.Matches(input);
List<string> parts=new List<string>();
foreach (Match match in matches)
{
parts.Add(match.ToString());
}
}
}
}
I wanted to do a multiline string like this but needed to keep the line breaks so I did this
string x =
#"line 1 {0}
line 2 {1}
";
foreach(var line in string.Format(x, "one", "two")
.Split("\n")
.Select(x => x.Contains('\r') ? x + '\n' : x)
.AsEnumerable()
) {
Console.Write(line);
}
yields
line 1 one
line 2 two
I came across same problem but with multiple delimiters. Here's my solution:
public static string[] SplitLeft(this string #this, char[] delimiters, int count)
{
var splits = new List<string>();
int next = -1;
while (splits.Count + 1 < count && (next = #this.IndexOfAny(delimiters, next + 1)) >= 0)
{
splits.Add(#this.Substring(0, next));
#this = new string(#this.Skip(next).ToArray());
}
splits.Add(#this);
return splits.ToArray();
}
Sample with separating CamelCase variable names:
var variableSplit = variableName.SplitLeft(
Enumerable.Range('A', 26).Select(i => (char)i).ToArray());
I wrote this code to split and keep delimiters:
private static string[] SplitKeepDelimiters(string toSplit, char[] delimiters, StringSplitOptions splitOptions = StringSplitOptions.None)
{
var tokens = new List<string>();
int idx = 0;
for (int i = 0; i < toSplit.Length; ++i)
{
if (delimiters.Contains(toSplit[i]))
{
tokens.Add(toSplit.Substring(idx, i - idx)); // token found
tokens.Add(toSplit[i].ToString()); // delimiter
idx = i + 1; // start idx for the next token
}
}
// last token
tokens.Add(toSplit.Substring(idx));
if (splitOptions == StringSplitOptions.RemoveEmptyEntries)
{
tokens = tokens.Where(token => token.Length > 0).ToList();
}
return tokens.ToArray();
}
Usage example:
string toSplit = "AAA,BBB,CCC;DD;,EE,";
char[] delimiters = new char[] {',', ';'};
string[] tokens = SplitKeepDelimiters(toSplit, delimiters, StringSplitOptions.RemoveEmptyEntries);
foreach (var token in tokens)
{
Console.WriteLine(token);
}
I'm trying to count the number of words from a rich textbox in C# the code that I have below only works if it is a single line. How do I do this without relying on regex or any other special functions.
string whole_text = richTextBox1.Text;
string trimmed_text = whole_text.Trim();
string[] split_text = trimmed_text.Split(' ');
int space_count = 0;
string new_text = "";
foreach(string av in split_text)
{
if (av == "")
{
space_count++;
}
else
{
new_text = new_text + av + ",";
}
}
new_text = new_text.TrimEnd(',');
split_text = new_text.Split(',');
MessageBox.Show(split_text.Length.ToString ());
char[] delimiters = new char[] {' ', '\r', '\n' };
whole_text.Split(delimiters,StringSplitOptions.RemoveEmptyEntries).Length;
Since you are only interested in word count, and you don't care about individual words, String.Split could be avoided. String.Split is handy, but it unnecessarily generates a (potentially) large number of String objects, which in turn creates an unnecessary burden on the garbage collector. For each word in your text, a new String object needs to be instantiated, and then soon collected since you are not using it.
For a homework assignment, this may not matter, but if your text box contents change often and you do this calculation inside an event handler, it may be wiser to simply iterate through characters manually. If you really want to use String.Split, then go for a simpler version like Yonix recommended.
Otherwise, use an algorithm similar to this:
int wordCount = 0, index = 0;
// skip whitespace until first word
while (index < text.Length && char.IsWhiteSpace(text[index]))
index++;
while (index < text.Length)
{
// check if current char is part of a word
while (index < text.Length && !char.IsWhiteSpace(text[index]))
index++;
wordCount++;
// skip whitespace until next word
while (index < text.Length && char.IsWhiteSpace(text[index]))
index++;
}
This code should work better with cases where you have multiple spaces between each word, you can test the code online.
There are some better ways to do this, but in keeping with what you've got, try the following:
string whole_text = richTextBox1.Text;
string trimmed_text = whole_text.Trim();
// new line split here
string[] lines = trimmed_text.Split(Environment.NewLine.ToCharArray());
// don't need this here now...
//string[] split_text = trimmed_text.Split(' ');
int space_count = 0;
string new_text = "";
Now make two foreach loops. One for each line and one for counting words within the lines.
foreach (string line in lines)
{
// Modify the inner foreach to do the split on ' ' here
// instead of split_text
foreach (string av in line.Split(' '))
{
if (av == "")
{
space_count++;
}
else
{
new_text = new_text + av + ",";
}
}
}
new_text = new_text.TrimEnd(',');
// use lines here instead of split_text
lines = new_text.Split(',');
MessageBox.Show(lines.Length.ToString());
}
This was a phone screening interview question that I just took (by a large company located in CA who sells all kinds of devices that starts with a letter "i"), and I think I franked... after I got offline, I wrote this. I wish I were able to do it during interview..
static void Main(string[] args)
{
Debug.Assert(CountWords("Hello world") == 2);
Debug.Assert(CountWords(" Hello world") == 2);
Debug.Assert(CountWords("Hello world ") == 2);
Debug.Assert(CountWords("Hello world") == 2);
}
public static int CountWords(string test)
{
int count = 0;
bool wasInWord = false;
bool inWord = false;
for (int i = 0; i < test.Length; i++)
{
if (inWord)
{
wasInWord = true;
}
if (Char.IsWhiteSpace(test[i]))
{
if (wasInWord)
{
count++;
wasInWord = false;
}
inWord = false;
}
else
{
inWord = true;
}
}
// Check to see if we got out with seeing a word
if (wasInWord)
{
count++;
}
return count;
}
Have a look at the Lines property mentioned in #Jay Riggs comment, along with this overload of String.Split to make the code much simpler. Then the simplest approach would be to loop over each line in the Lines property, call String.Split on it, and add the length of the array it returns to a running count.
EDIT: Also, is there any reason you're using a RichTextBox instead of a TextBox with Multiline set to True?
I use an extension method for grabbing word count in a string. Do note, however, that double spaces will mess the count up.
public static int CountWords(this string line)
{
var wordCount = 0;
for (var i = 0; i < line.Length; i++)
if (line[i] == ' ' || i == line.Length - 1)
wordCount++;
return wordCount;
}
}
Your approach is on the right path. I would do something like, passing the text property of richTextBox1 into the method. This however won't be accurate if your rich textbox is formatting HTML, so you'll need to strip out any HTML tags prior to running the word count:
public static int CountWords(string s)
{
int c = 0;
for (int i = 1; i < s.Length; i++)
{
if (char.IsWhiteSpace(s[i - 1]) == true)
{
if (char.IsLetterOrDigit(s[i]) == true ||
char.IsPunctuation(s[i]))
{
c++;
}
}
}
if (s.Length > 2)
{
c++;
}
return c;
}
We used an adapted form of Yoshi's answer, where we fixed the bug where it would not count the last word in a string if there was no white-space after it:
public static int CountWords(string test)
{
int count = 0;
bool inWord = false;
foreach (char t in test)
{
if (char.IsWhiteSpace(t))
{
inWord = false;
}
else
{
if (!inWord) count++;
inWord = true;
}
}
return count;
}
using System.Collections;
using System;
class Program{
public static void Main(string[] args){
//Enter the value of n
int n = Convert.ToInt32(Console.ReadLine());
string[] s = new string[n];
ArrayList arr = new ArrayList();
//enter the elements
for(int i=0;i<n;i++){
s[i] = Console.ReadLine();
}
string str = "";
//Filter out duplicate values and store in arr
foreach(string i in s){
if(str.Contains(i)){
}else{
arr.Add(i);
}
str += i;
}
//Count the string with arr and s variables
foreach(string i in arr){
int count = 0;
foreach(string j in s){
if(i.Equals(j)){
count++;
}
}
Console.WriteLine(i+" - "+count);
}
}
}
int wordCount = 0;
bool previousLetterWasWhiteSpace = false;
foreach (char letter in keyword)
{
if (char.IsWhiteSpace(letter))
{
previousLetterWasWhiteSpace = true;
}
else
{
if (previousLetterWasWhiteSpace)
{
previousLetterWasWhiteSpace = false;
wordCount++;
}
}
}
public static int WordCount(string str)
{
int num=0;
bool wasInaWord=true;;
if (string.IsNullOrEmpty(str))
{
return num;
}
for (int i=0;i< str.Length;i++)
{
if (i!=0)
{
if (str[i]==' ' && str[i-1]!=' ')
{
num++;
wasInaWord=false;
}
}
if (str[i]!=' ')
{
wasInaWord=true;
}
}
if (wasInaWord)
{
num++;
}
return num;
}
class Program
{
static void Main(string[] args)
{
string str;
int i, wrd, l;
StringBuilder sb = new StringBuilder();
Console.Write("\n\nCount the total number of words in a string
:\n");
Console.Write("---------------------------------------------------
---\n");
Console.Write("Input the string : ");
str = Console.ReadLine();
l = 0;
wrd = 1;
foreach (var a in str)
{
sb.Append(a);
if (str[l] == ' ' || str[l] == '\n' || str[l] == '\t')
{
wrd++;
}
l++;
}
Console.WriteLine(sb.Replace(' ', '\n'));
Console.Write("Total number of words in the string is : {0}\n",
wrd);
Console.ReadLine();
}
This should work
input.Split(' ').ToList().Count;
This can show you the number of words in a line
string line = Console.ReadLine();
string[] word = line.Split(' ');
Console.WriteLine("Words " + word.Length);
You can also do it in this way!! Add this method to your extension methods.
public static int WordsCount(this string str)
{
return Regex.Matches(str, #"((\w+(\s?)))").Count;
}
And call it like this.
string someString = "Let me show how I do it!";
int wc = someString.WordsCount();
I have a field in my database that holds input from an html input. So I have in my db column data. What I need is to be able to extract this and display a short version of it as an intro. Maybe even the first paragraph if possible.
Something like this maybe?
public string Get(string text, int maxWordCount)
{
int wordCounter = 0;
int stringIndex = 0;
char[] delimiters = new[] { '\n', ' ', ',', '.' };
while (wordCounter < maxWordCount)
{
stringIndex = text.IndexOfAny(delimiters, stringIndex + 1);
if (stringIndex == -1)
return text;
++wordCounter;
}
return text.Substring(0, stringIndex);
}
It's quite simplified and doesnt handle if multiple delimiters comes after each other (for instance ", "). you might just want to use space as a delimiter.
If you want to get just the first paragraph, simply search after "\r\n\r\n" <-- two line breaks:
public string GetFirstParagraph(string text)
{
int pos = text.IndexOf("\r\n\r\n");
return pos == -1 ? text : text.Substring(0, pos);
}
Edit:
A very simplistic way to strip HTML:
return Regex.Replace(text, #”<(.|\n)*?>”, string.Empty);
The Html Agility Pack is usually the recommended way to strip out the HTML. After that it would just be a matter of doing a String.Substring to get the bit of it that you want.
If you need to get out the 2000 first words I suppose you could either use IndexOf to find a whitespace 2000 times and loop through it until then to get the index to use in the call to Substring.
Edit: Add sample method
public int GetIndex(string str, int numberWanted)
{
int count = 0;
int index = 1;
for (; index < str.Length; index++)
{
if (char.IsWhiteSpace(str[index - 1]) == true)
{
if (char.IsLetterOrDigit(str[index]) == true ||
char.IsPunctuation(str[index]))
{
count++;
if (count >= numberWanted)
break;
}
}
}
return index;
}
And call it like:
string wordList = "This is a list of a lot of words";
int i = GetIndex(wordList, 5);
string result = wordList.Substring(0, i);
Once you have your string you would have to count your words. I assume space is a delimiter for words, so the following code should find the first 2000 words in a string (or break out if there are fewer words).
string myString = "la la la";
int lastPosition = 0;
for (int i = 0; i < 2000; i++)
{
int position = myString.IndexOf(' ', lastPosition + 1);
if (position == -1) break;
lastPosition = position;
}
string firstThousandWords = myString.Substring(0, lastPosition);
You can change indexOf to indexOfAny to support more characters as delimiters.
I had the same problem and combined a few Stack Overflow answers into this class. It uses the HtmlAgilityPack which is a better tool for the job. Call:
Words(string html, int n)
To get n words
using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace UmbracoUtilities
{
public class Text
{
/// <summary>
/// Return the first n words in the html
/// </summary>
/// <param name="html"></param>
/// <param name="n"></param>
/// <returns></returns>
public static string Words(string html, int n)
{
string words = html, n_words;
words = StripHtml(html);
n_words = GetNWords(words, n);
return n_words;
}
/// <summary>
/// Returns the first n words in text
/// Assumes text is not a html string
/// http://stackoverflow.com/questions/13368345/get-first-250-words-of-a-string
/// </summary>
/// <param name="text"></param>
/// <param name="n"></param>
/// <returns></returns>
public static string GetNWords(string text, int n)
{
StringBuilder builder = new StringBuilder();
//remove multiple spaces
//http://stackoverflow.com/questions/1279859/how-to-replace-multiple-white-spaces-with-one-white-space
string cleanedString = System.Text.RegularExpressions.Regex.Replace(text, #"\s+", " ");
IEnumerable<string> words = cleanedString.Split().Take(n + 1);
foreach (string word in words)
builder.Append(" " + word);
return builder.ToString();
}
/// <summary>
/// Returns a string of html with tags removed
/// </summary>
/// <param name="html"></param>
/// <returns></returns>
public static string StripHtml(string html)
{
HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);
var root = document.DocumentNode;
var stringBuilder = new StringBuilder();
foreach (var node in root.DescendantsAndSelf())
{
if (!node.HasChildNodes)
{
string text = node.InnerText;
if (!string.IsNullOrEmpty(text))
stringBuilder.Append(" " + text.Trim());
}
}
return stringBuilder.ToString();
}
}
}
Merry Christmas!
What is the best way to convert from Pascal Case (upper Camel Case) to a sentence.
For example starting with
"AwaitingFeedback"
and converting that to
"Awaiting feedback"
C# preferable but I could convert it from Java or similar.
public static string ToSentenceCase(this string str)
{
return Regex.Replace(str, "[a-z][A-Z]", m => m.Value[0] + " " + char.ToLower(m.Value[1]));
}
In versions of visual studio after 2015, you can do
public static string ToSentenceCase(this string str)
{
return Regex.Replace(str, "[a-z][A-Z]", m => $"{m.Value[0]} {char.ToLower(m.Value[1])}");
}
Based on: Converting Pascal case to sentences using regular expression
I will prefer to use Humanizer for this. Humanizer is a Portable Class Library that meets all your .NET needs for manipulating and displaying strings, enums, dates, times, timespans, numbers and quantities.
Short Answer
"AwaitingFeedback".Humanize() => Awaiting feedback
Long and Descriptive Answer
Humanizer can do a lot more work other examples are:
"PascalCaseInputStringIsTurnedIntoSentence".Humanize() => "Pascal case input string is turned into sentence"
"Underscored_input_string_is_turned_into_sentence".Humanize() => "Underscored input string is turned into sentence"
"Can_return_title_Case".Humanize(LetterCasing.Title) => "Can Return Title Case"
"CanReturnLowerCase".Humanize(LetterCasing.LowerCase) => "can return lower case"
Complete code is :
using Humanizer;
using static System.Console;
namespace HumanizerConsoleApp
{
class Program
{
static void Main(string[] args)
{
WriteLine("AwaitingFeedback".Humanize());
WriteLine("PascalCaseInputStringIsTurnedIntoSentence".Humanize());
WriteLine("Underscored_input_string_is_turned_into_sentence".Humanize());
WriteLine("Can_return_title_Case".Humanize(LetterCasing.Title));
WriteLine("CanReturnLowerCase".Humanize(LetterCasing.LowerCase));
}
}
}
Output
Awaiting feedback
Pascal case input string is turned into sentence
Underscored input string is turned into sentence Can Return Title Case
can return lower case
If you prefer to write your own C# code you can achieve this by writing some C# code stuff as answered by others already.
Here you go...
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace CamelCaseToString
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(CamelCaseToString("ThisIsYourMasterCallingYou"));
}
private static string CamelCaseToString(string str)
{
if (str == null || str.Length == 0)
return null;
StringBuilder retVal = new StringBuilder(32);
retVal.Append(char.ToUpper(str[0]));
for (int i = 1; i < str.Length; i++ )
{
if (char.IsLower(str[i]))
{
retVal.Append(str[i]);
}
else
{
retVal.Append(" ");
retVal.Append(char.ToLower(str[i]));
}
}
return retVal.ToString();
}
}
}
This works for me:
Regex.Replace(strIn, "([A-Z]{1,2}|[0-9]+)", " $1").TrimStart()
This is just like #SSTA, but is more efficient than calling TrimStart.
Regex.Replace("ThisIsMyCapsDelimitedString", "(\\B[A-Z])", " $1")
Found this in the MvcContrib source, doesn't seem to be mentioned here yet.
return Regex.Replace(input, "([A-Z])", " $1", RegexOptions.Compiled).Trim();
Just because everyone has been using Regex (except this guy), here's an implementation with StringBuilder that was about 5x faster in my tests. Includes checking for numbers too.
"SomeBunchOfCamelCase2".FromCamelCaseToSentence == "Some Bunch Of Camel Case 2"
public static string FromCamelCaseToSentence(this string input) {
if(string.IsNullOrEmpty(input)) return input;
var sb = new StringBuilder();
// start with the first character -- consistent camelcase and pascal case
sb.Append(char.ToUpper(input[0]));
// march through the rest of it
for(var i = 1; i < input.Length; i++) {
// any time we hit an uppercase OR number, it's a new word
if(char.IsUpper(input[i]) || char.IsDigit(input[i])) sb.Append(' ');
// add regularly
sb.Append(input[i]);
}
return sb.ToString();
}
Here's a basic way of doing it that I came up with using Regex
public static string CamelCaseToSentence(this string value)
{
var sb = new StringBuilder();
var firstWord = true;
foreach (var match in Regex.Matches(value, "([A-Z][a-z]+)|[0-9]+"))
{
if (firstWord)
{
sb.Append(match.ToString());
firstWord = false;
}
else
{
sb.Append(" ");
sb.Append(match.ToString().ToLower());
}
}
return sb.ToString();
}
It will also split off numbers which I didn't specify but would be useful.
string camel = "MyCamelCaseString";
string s = Regex.Replace(camel, "([A-Z])", " $1").ToLower().Trim();
Console.WriteLine(s.Substring(0,1).ToUpper() + s.Substring(1));
Edit: didn't notice your casing requirements, modifed accordingly. You could use a matchevaluator to do the casing, but I think a substring is easier. You could also wrap it in a 2nd regex replace where you change the first character
"^\w"
to upper
\U (i think)
I'd use a regex, inserting a space before each upper case character, then lowering all the string.
string spacedString = System.Text.RegularExpressions.Regex.Replace(yourString, "\B([A-Z])", " \k");
spacedString = spacedString.ToLower();
It is easy to do in JavaScript (or PHP, etc.) where you can define a function in the replace call:
var camel = "AwaitingFeedbackDearMaster";
var sentence = camel.replace(/([A-Z].)/g, function (c) { return ' ' + c.toLowerCase(); });
alert(sentence);
Although I haven't solved the initial cap problem... :-)
Now, for the Java solution:
String ToSentence(String camel)
{
if (camel == null) return ""; // Or null...
String[] words = camel.split("(?=[A-Z])");
if (words == null) return "";
if (words.length == 1) return words[0];
StringBuilder sentence = new StringBuilder(camel.length());
if (words[0].length() > 0) // Just in case of camelCase instead of CamelCase
{
sentence.append(words[0] + " " + words[1].toLowerCase());
}
else
{
sentence.append(words[1]);
}
for (int i = 2; i < words.length; i++)
{
sentence.append(" " + words[i].toLowerCase());
}
return sentence.toString();
}
System.out.println(ToSentence("AwaitingAFeedbackDearMaster"));
System.out.println(ToSentence(null));
System.out.println(ToSentence(""));
System.out.println(ToSentence("A"));
System.out.println(ToSentence("Aaagh!"));
System.out.println(ToSentence("stackoverflow"));
System.out.println(ToSentence("disableGPS"));
System.out.println(ToSentence("Ahh89Boo"));
System.out.println(ToSentence("ABC"));
Note the trick to split the sentence without loosing any character...
Pseudo-code:
NewString = "";
Loop through every char of the string (skip the first one)
If char is upper-case ('A'-'Z')
NewString = NewString + ' ' + lowercase(char)
Else
NewString = NewString + char
Better ways can perhaps be done by using regex or by string replacement routines (replace 'X' with ' x')
An xquery solution that works for both UpperCamel and lowerCamel case:
To output sentence case (only the first character of the first word is capitalized):
declare function content:sentenceCase($string)
{
let $firstCharacter := substring($string, 1, 1)
let $remainingCharacters := substring-after($string, $firstCharacter)
return
concat(upper-case($firstCharacter),lower-case(replace($remainingCharacters, '([A-Z])', ' $1')))
};
To output title case (first character of each word capitalized):
declare function content:titleCase($string)
{
let $firstCharacter := substring($string, 1, 1)
let $remainingCharacters := substring-after($string, $firstCharacter)
return
concat(upper-case($firstCharacter),replace($remainingCharacters, '([A-Z])', ' $1'))
};
Found myself doing something similar, and I appreciate having a point-of-departure with this discussion. This is my solution, placed as an extension method to the string class in the context of a console application.
using System;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string piratese = "avastTharMatey";
string ivyese = "CheerioPipPip";
Console.WriteLine("{0}\n{1}\n", piratese.CamelCaseToString(), ivyese.CamelCaseToString());
Console.WriteLine("For Pete\'s sake, man, hit ENTER!");
string strExit = Console.ReadLine();
}
}
public static class StringExtension
{
public static string CamelCaseToString(this string str)
{
StringBuilder retVal = new StringBuilder(32);
if (!string.IsNullOrEmpty(str))
{
string strTrimmed = str.Trim();
if (!string.IsNullOrEmpty(strTrimmed))
{
retVal.Append(char.ToUpper(strTrimmed[0]));
if (strTrimmed.Length > 1)
{
for (int i = 1; i < strTrimmed.Length; i++)
{
if (char.IsUpper(strTrimmed[i])) retVal.Append(" ");
retVal.Append(char.ToLower(strTrimmed[i]));
}
}
}
}
return retVal.ToString();
}
}
}
Most of the preceding answers split acronyms and numbers, adding a space in front of each character. I wanted acronyms and numbers to be kept together so I have a simple state machine that emits a space every time the input transitions from one state to the other.
/// <summary>
/// Add a space before any capitalized letter (but not for a run of capitals or numbers)
/// </summary>
internal static string FromCamelCaseToSentence(string input)
{
if (string.IsNullOrEmpty(input)) return String.Empty;
var sb = new StringBuilder();
bool upper = true;
for (var i = 0; i < input.Length; i++)
{
bool isUpperOrDigit = char.IsUpper(input[i]) || char.IsDigit(input[i]);
// any time we transition to upper or digits, it's a new word
if (!upper && isUpperOrDigit)
{
sb.Append(' ');
}
sb.Append(input[i]);
upper = isUpperOrDigit;
}
return sb.ToString();
}
And here's some tests:
[TestCase(null, ExpectedResult = "")]
[TestCase("", ExpectedResult = "")]
[TestCase("ABC", ExpectedResult = "ABC")]
[TestCase("abc", ExpectedResult = "abc")]
[TestCase("camelCase", ExpectedResult = "camel Case")]
[TestCase("PascalCase", ExpectedResult = "Pascal Case")]
[TestCase("Pascal123", ExpectedResult = "Pascal 123")]
[TestCase("CustomerID", ExpectedResult = "Customer ID")]
[TestCase("CustomABC123", ExpectedResult = "Custom ABC123")]
public string CanSplitCamelCase(string input)
{
return FromCamelCaseToSentence(input);
}
Mostly already answered here
Small chage to the accepted answer, to convert the second and subsequent Capitalised letters to lower case, so change
if (char.IsUpper(text[i]))
newText.Append(' ');
newText.Append(text[i]);
to
if (char.IsUpper(text[i]))
{
newText.Append(' ');
newText.Append(char.ToLower(text[i]));
}
else
newText.Append(text[i]);
Here is my implementation. This is the fastest that I got while avoiding creating spaces for abbreviations.
public static string PascalCaseToSentence(string input)
{
if (string.IsNullOrEmpty(input) || input.Length < 2)
return input;
var sb = new char[input.Length + ((input.Length + 1) / 2)];
var len = 0;
var lastIsLower = false;
for (int i = 0; i < input.Length; i++)
{
var current = input[i];
if (current < 97)
{
if (lastIsLower)
{
sb[len] = ' ';
len++;
}
lastIsLower = false;
}
else
{
lastIsLower = true;
}
sb[len] = current;
len++;
}
return new string(sb, 0, len);
}