Accidentally splitting unicode chars when truncating strings

I'm saving some strings from a third party into my database (postgres). Sometimes these strings are too long and need to be truncated to fit into the column in my table.
On some random occasions I accidentally truncate the string right where there is a Unicode character, which gives me a "broken" string that I cannot save into the database. I get the following error: Unable to translate Unicode character \uD83D at index XXX to specified code page.
I've created a minimal example to show you what I mean. Here I have a string that contains a Unicode character ("Small blue diamond" 🔹 U+1F539). Depending on where I truncate, it gives me a valid string or not.
var myString = @"This is a string before an emoji:🔹 This is after the emoji.";
var brokenString = myString.Substring(0, 34);
// Gives: "This is a string before an emoji:☐"
var test3 = myString.Substring(0, 35);
// Gives: "This is a string before an emoji:🔹"
Is there a way for me to truncate the string without accidentally breaking any Unicode chars?

A Unicode character may be represented by several chars (UTF-16 code units); that is why string.Substring can cut one in half.
You can convert your string to a StringInfo object and then use its SubstringByTextElements() method, which takes the substring by text-element (Unicode character) count rather than by char count.
See a C# demo:
Console.WriteLine("🔹".Length); // => 2
Console.WriteLine(new StringInfo("🔹").LengthInTextElements); // => 1
var myString = @"This is a string before an emoji:🔹This is after the emoji.";
var teMyString = new StringInfo(myString);
Console.WriteLine(teMyString.SubstringByTextElements(0, 33));
// => "This is a string before an emoji:"
Console.WriteLine(teMyString.SubstringByTextElements(0, 34));
// => This is a string before an emoji:🔹
Console.WriteLine(teMyString.SubstringByTextElements(0, 35));
// => This is a string before an emoji:🔹T

I ended up using a modification of xanatos's answer here. The difference is that this version drops the last grapheme if appending it would make the result longer than length.
public static class StringExtensions
{
public static string UnicodeSafeSubstring(this string str, int startIndex, int length)
{
if (str == null)
{
throw new ArgumentNullException(nameof(str));
}
if (startIndex < 0 || startIndex > str.Length)
{
throw new ArgumentOutOfRangeException(nameof(startIndex));
}
if (length < 0)
{
throw new ArgumentOutOfRangeException(nameof(length));
}
if (startIndex + length > str.Length)
{
throw new ArgumentOutOfRangeException(nameof(length));
}
if (length == 0)
{
return string.Empty;
}
var stringBuilder = new StringBuilder(length);
var enumerator = StringInfo.GetTextElementEnumerator(str, startIndex);
while (enumerator.MoveNext())
{
var grapheme = enumerator.GetTextElement();
startIndex += grapheme.Length;
if (startIndex > str.Length)
{
break;
}
// Skip initial Low Surrogates/Combining Marks
if (stringBuilder.Length == 0)
{
if (char.IsLowSurrogate(grapheme[0]))
{
continue;
}
var cat = char.GetUnicodeCategory(grapheme, 0);
if (cat == UnicodeCategory.NonSpacingMark || cat == UnicodeCategory.SpacingCombiningMark || cat == UnicodeCategory.EnclosingMark)
{
continue;
}
}
// Do not append the grapheme if the resulting string would be longer than the required length
if (stringBuilder.Length + grapheme.Length <= length)
{
stringBuilder.Append(grapheme);
}
if (stringBuilder.Length >= length)
{
break;
}
}
return stringBuilder.ToString();
}
}

Here is an example for truncate (startIndex = 0):
string truncatedStr = (str.Length > maxLength)
? str.Substring(0, maxLength - (char.IsLowSurrogate(str[maxLength]) ? 1 : 0))
: str;
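The one-liner above only protects surrogate pairs. As a hedged sketch, assuming you also want to keep combining sequences together, the cut point can be moved back to the nearest text-element boundary with StringInfo.ParseCombiningCharacters, which returns the start index of every text element. The names TruncateSketch and TruncateAtTextElement are made up for illustration.

```csharp
using System;
using System.Globalization;

public static class TruncateSketch
{
    // Hypothetical helper: returns at most maxLength chars of str,
    // cutting only at a text-element boundary.
    public static string TruncateAtTextElement(string str, int maxLength)
    {
        if (str == null) throw new ArgumentNullException(nameof(str));
        if (maxLength < 0) throw new ArgumentOutOfRangeException(nameof(maxLength));
        if (str.Length <= maxLength) return str;

        // Start indexes of all text elements, in ascending order.
        int cut = 0;
        foreach (int start in StringInfo.ParseCombiningCharacters(str))
        {
            if (start > maxLength) break;
            cut = start; // last boundary that still fits
        }
        return str.Substring(0, cut);
    }
}
```

With the string from the question and maxLength = 34, the cut falls back to index 33, so the emoji is dropped whole instead of being split.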

It is better to truncate by the number of bytes than by string length:
public static string TruncateByBytes(this string text, int maxBytes)
{
if (string.IsNullOrEmpty(text) || Encoding.UTF8.GetByteCount(text) <= maxBytes)
{
return text;
}
var enumerator = StringInfo.GetTextElementEnumerator(text);
var newStr = string.Empty;
do
{
enumerator.MoveNext();
if (Encoding.UTF8.GetByteCount(newStr + enumerator.Current) <= maxBytes)
{
newStr += enumerator.Current;
}
else
{
break;
}
} while (true);
return newStr;
}
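The method above re-encodes the growing string on every iteration. A possible variant, sketched under the assumption that the byte limit applies to UTF-8, adds up the byte count of each text element instead and takes a single Substring at the end. TruncateToUtf8Bytes is a hypothetical name, not part of the answer.

```csharp
using System.Globalization;
using System.Text;

public static class ByteTruncateSketch
{
    // Hypothetical helper: keeps whole text elements while the
    // UTF-8 size of the result stays within maxBytes.
    public static string TruncateToUtf8Bytes(string text, int maxBytes)
    {
        if (string.IsNullOrEmpty(text)) return text;

        int usedBytes = 0; // UTF-8 bytes accepted so far
        int cut = 0;       // char index where the result ends
        var enumerator = StringInfo.GetTextElementEnumerator(text);
        while (enumerator.MoveNext())
        {
            string element = enumerator.GetTextElement();
            int elementBytes = Encoding.UTF8.GetByteCount(element);
            if (usedBytes + elementBytes > maxBytes) break;
            usedBytes += elementBytes;
            cut += element.Length;
        }
        return text.Substring(0, cut);
    }
}
```

For example, "ab🔹cd" with a limit of 5 bytes yields "ab", because the emoji alone needs 4 bytes in UTF-8.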

Related

I want to write a function that takes a string and returns the string with at least 3 digits at the end

If the string passed in already has 3 digits at the end then return unchanged. If the string passed in does not have 3 digits at the end then need to insert zeros before any digits at the end to have 3 digits.
I put some logic in private static string stringCleaner(string inputString) to implement this, but it gives these errors:
Test:'A12' Expected:'A012' Exception:Index was outside the bounds of the array.
Test:'A12345' Expected:'A12345' Exception:Index was outside the bounds of the array.
Test:'A1B3' Expected:'A1B003' Exception:Index was outside the bounds of the array.
Test:'' Expected:'000' Exception:Object reference not set to an instance of an object.
Test:'' Expected:'000' Actual:'000' Result:Pass
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConvertToCamelCaseCS
{
class Program
{
static void Main(string[] args)
{
List<string[]> testValues = new List<string[]>()
{
new string[]{"A12","A012"},
new string[]{"A12345","A12345"},
new string[]{"A1B3","A1B003"},
new string[]{null, "000"},
new string[]{"", "000"}
};
foreach (string[] testValue in testValues)
{
testStringCleaner(testValue[0], testValue[1]);
}
Console.ReadLine();
}
private static void testStringCleaner(string inputString, string expectedString)
{
try
{
String actualString = stringCleaner(inputString);
String passOrFail = (actualString == expectedString) ? "Pass" : "Fail";
Console.WriteLine("Test:'{0}' Expected:'{1}' Actual:'{2}' Result:{3}", inputString, expectedString, actualString, passOrFail);
}
catch (Exception ex)
{
Console.WriteLine("Test:'{0}' Expected:'{1}' Exception:{2}", inputString, expectedString, ex.Message);
}
}
private static string stringCleaner(string inputString)
{
string result = inputString;
int lengthOfString = result.Length;
int changeIndex = 0;
if (lengthOfString == 0)
{
result = "000";
}
else
{
for (int i = lengthOfString; i >= lengthOfString - 2; i--)
{
char StrTOChar = (char)result[i];
int CharToInt = (int)StrTOChar;
if (CharToInt >= 65 && CharToInt <= 122)
{
changeIndex = i;
break;
}
}
if (lengthOfString == changeIndex + 3)
{
return result;
}
else
{
if (changeIndex == lengthOfString)
{
return result = result + "000";
}
else if (changeIndex + 1 == lengthOfString)
{
return result = result.Substring(0, changeIndex) + "00" + result.Substring(changeIndex + 1, lengthOfString);
}
else if(changeIndex+2==lengthOfString)
{
return result = result.Substring(0, changeIndex) + "0" + result.Substring(changeIndex + 1, lengthOfString);
}
}
}
return result;
}
}
}

You are overcomplicating this a lot from what I can tell among this somewhat confusing question and code.
I would use substring to extract the last 3 characters, and then check that string from the back whether it is a digit using Char.IsDigit. Depending on when you run into a non-digit you add a certain amount of zero using simple string concatenation.
Perhaps try to rewrite your code from scratch now that you probably have a better idea of how to do this.

char StrTOChar = (char)result[i];
Your problem is in this line (line 60).
The loop variable i starts at result.Length, and result[result.Length] is outside the bounds of the string. Valid indexes run from 0 to result.Length - 1, so i must always be lower than the length.

Let's implement:
private static string stringCleaner(string value) {
// let's not hardcode magic values (3, "000", etc.) but have a constant
const int digits_at_least = 3;
// special case: null or empty string
if (string.IsNullOrEmpty(value))
return new string('0', digits_at_least);
int digits = 0;
// let's count digits starting from the end
// && digits < digits_at_least - do not loop if we have enough digits
// (value[i] >= '0' && value[i] <= '9') - we want 0..9 digits only,
// not unicode digits (e.g. Persian ones) - char.IsDigit
for (int i = value.Length - 1; i >= 0 && digits < digits_at_least; --i)
if (value[i] >= '0' && value[i] <= '9')
digits += 1;
else
break;
if (digits >= digits_at_least) // we have enough digits, return as it is
return value;
else
return value.Substring(0, value.Length - digits) +
new string('0', digits_at_least - digits) + // inserting zeros
value.Substring(value.Length - digits);
}
Tests:
using System.Linq;
...
var testValues = new string[][] {
new string[]{"A12","A012"},
new string[]{"A12345","A12345"},
new string[]{"A1B3","A1B003"},
new string[]{null, "000"},
new string[]{"", "000"}
};
// Failed tests
var failed = testValues
.Where(test => test[1] != stringCleaner(test[0]))
.Select(test =>
$"stringCleaner ({test[0]}) == {stringCleaner(test[0])} expected {test[1]}");
string failedReport = string.Join(Environment.NewLine, failed);
// All failed tests
Console.WriteLine(failedReport);
// All tests and their results
var allTests = testValues
.Select(test => new {
argument = test[0],
expected = test[1],
actual = stringCleaner(test[0]),
})
.Select(test => $"{(test.expected == test.actual ? "passed" : $"failed: f({test.argument}) = {test.actual} expected {test.expected}")}");
string allReport = string.Join(Environment.NewLine, allTests);
Console.WriteLine(allReport);
Outcome (no failedReport and all tests passed):
passed
passed
passed
passed
passed

toTitleCase to ignore ordinals in C#

I'm trying to figure out a way to use toTitleCase to ignore ordinals. It works as I want it to for all string except for ordinals (e.g. 1st, 2nd, 3rd becomes 1St, 2Nd, 3Rd).
Any help would be appreciated. A regular expression may be the way to handle this, I'm just not sure how such a regex would be constructed.
Update: Here is the solution I used (based on John's answer, I wrote the extension method below):
public static string ToTitleCaseIgnoreOrdinals(this string text)
{
string input = System.Globalization.CultureInfo.CurrentCulture.TextInfo.ToTitleCase(text);
string result = System.Text.RegularExpressions.Regex.Replace(input, "([0-9]st)|([0-9]th)|([0-9]rd)|([0-9]nd)", new System.Text.RegularExpressions.MatchEvaluator((m) => m.Captures[0].Value.ToLower()), System.Text.RegularExpressions.RegexOptions.IgnoreCase);
return result;
}

string input = System.Globalization.CultureInfo.CurrentCulture.TextInfo.ToTitleCase("hello there, this is the 1st");
string result = System.Text.RegularExpressions.Regex.Replace(input, "([0-9]st)|([0-9]th)|([0-9]rd)|([0-9]nd)", new System.Text.RegularExpressions.MatchEvaluator((m) =>
{
return m.Captures[0].Value.ToLower();
}), System.Text.RegularExpressions.RegexOptions.IgnoreCase);
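The pattern above matches any single digit followed by a suffix, wherever it occurs. As a hedged sketch, a stricter variant could anchor the match to whole words and accept multi-digit numbers. The class and method names below are invented for illustration, and the invariant culture is an assumption made to keep the result independent of the machine's settings.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class OrdinalTitleCaseSketch
{
    // One or more digits followed by st/nd/rd/th, as a whole word.
    private static readonly Regex Ordinal =
        new Regex(@"\b\d+(?:st|nd|rd|th)\b", RegexOptions.IgnoreCase);

    // Hypothetical helper: title-cases the text, then lowercases
    // the suffix of anything that looks like an ordinal.
    public static string ToTitleCaseKeepOrdinals(string text)
    {
        string titled = CultureInfo.InvariantCulture.TextInfo.ToTitleCase(text);
        return Ordinal.Replace(titled, m => m.Value.ToLowerInvariant());
    }
}
```

Note that this still accepts malformed ordinals such as 31th; it only checks the shape, not the grammar.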

You can use regular expressions to check if the string starts with a digit before you convert to Title Case, like this:
if (!Regex.IsMatch(text, @"^\d+"))
{
text = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(text);
}
Edit: forgot to reverse the conditional... changed so it will apply ToTitleCase if it DOESN'T match.
2nd edit: added loop to check all words in a sentence:
string text = "150 east 40th street";
string[] array = text.Split(' ');
for (int i = 0; i < array.Length; i++)
{
if (!Regex.IsMatch(array[i], @"^\d+"))
{
array[i] = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(array[i]);
}
}
string newText = string.Join(" ",array);

I would split the text up and iterate through the resulting array, skipping things that don't start with a letter.
using System.Globalization;
TextInfo textInfo = new CultureInfo("en-US", false).TextInfo;
string[] text = myString.Split();
for(int i = 0; i < text.Length; i++)
{ //Check for zero-length strings, because these will throw an
//index out of range exception in Char.IsLetter
if (text[i].Length > 0 && Char.IsLetter(text[i][0]))
{
text[i] = textInfo.ToTitleCase(text[i]);
}
}

You could simply use String.Replace (or StringBuilder.Replace):
string[] ordinals = { "1St", "2Nd", "3Rd" }; // add all others
string text = "This is just sample text which contains some ordinals, the 1st, the 2nd and the third.";
var sb = new StringBuilder(CultureInfo.InvariantCulture.TextInfo.ToTitleCase(text));
foreach (string ordinal in ordinals)
sb.Replace(ordinal, ordinal.ToLowerInvariant());
text = sb.ToString();
Comment: "This is not elegant at all. It requires you to maintain an infinite list of ordinals on the first line. I'm assuming that's why someone downvoted you."
It's not elegant, but it works better than other simple approaches like the regex. You want to title-case words in longer text, but only words which are not ordinal numbers. An ordinal number is e.g. 1st, 2nd, 3rd or 31st, but not 31th, so the simple regex solutions will fail fast. You also want to title-case words like 10m to 10M (where M could be the abbreviation for million).
So I don't understand why it's so bad to maintain a list of ordinal numbers.
You could even generate them automatically with an upper-limit, for example:
public static IEnumerable<string> GetTitleCaseOrdinalNumbers()
{
for (int num = 1; num < int.MaxValue; num++)
{
switch (num % 100)
{
case 11:
case 12:
case 13:
yield return num + "Th";
continue; // 11Th, 12Th, 13Th: skip the last-digit rule below
}
switch (num % 10)
{
case 1:
yield return num + "St"; break;
case 2:
yield return num + "Nd"; break;
case 3:
yield return num + "Rd"; break;
default:
yield return num + "Th"; break;
}
}
}
So if you want to check for the first 1000 ordinal numbers:
foreach (string ordinal in GetTitleCaseOrdinalNumbers().Take(1000))
sb.Replace(ordinal, ordinal.ToLowerInvariant());
Update
For what it's worth, here is my try to provide an efficient way that really checks words (and not only substrings) and skips ToTitleCase on words which really represent ordinal numbers(so not 31th but 31st for example). It also takes care of separator chars that are not white-spaces (like dots or commas):
private static readonly char[] separator = { '.', ',', ';', ':', '-', '(', ')', '\\', '{', '}', '[', ']', '/', '\\', '\'', '"', '"', '?', '!', '|' };
public static bool IsOrdinalNumber(string word)
{
if (word.Any(char.IsWhiteSpace))
return false; // white-spaces are not allowed
if (word.Length < 3)
return false;
var numericPart = word.TakeWhile(char.IsDigit);
string numberText = string.Join("", numericPart);
if (numberText.Length == 0)
return false;
int number;
if (!int.TryParse(numberText, out number))
return false; // handle unicode digits which are not really numeric like ۵
string ordinalNumber;
int lastTwoDigits = number % 100;
if (lastTwoDigits >= 11 && lastTwoDigits <= 13)
{
// 11th, 12th and 13th are exceptions to the last-digit rule
ordinalNumber = number + "th";
}
else
{
switch (number % 10)
{
case 1:
ordinalNumber = number + "st"; break;
case 2:
ordinalNumber = number + "nd"; break;
case 3:
ordinalNumber = number + "rd"; break;
default:
ordinalNumber = number + "th"; break;
}
}
string checkForOrdinalNum = numberText + word.Substring(numberText.Length);
return checkForOrdinalNum.Equals(ordinalNumber, StringComparison.CurrentCultureIgnoreCase);
}
public static string ToTitleCaseIgnoreOrdinalNumbers(string text, TextInfo info)
{
if(text.Trim().Length < 3)
return info.ToTitleCase(text);
int whiteSpaceIndex = FindWhiteSpaceIndex(text, 0, separator);
if(whiteSpaceIndex == -1)
{
if(IsOrdinalNumber(text.Trim()))
return text;
else
return info.ToTitleCase(text);
}
StringBuilder sb = new StringBuilder();
int wordStartIndex = 0;
if(whiteSpaceIndex == 0)
{
// starts with space, find word
wordStartIndex = FindNonWhiteSpaceIndex(text, 1, separator);
sb.Append(text.Remove(wordStartIndex)); // append leading spaces
}
while(wordStartIndex >= 0)
{
whiteSpaceIndex = FindWhiteSpaceIndex(text, wordStartIndex + 1, separator);
string word;
if(whiteSpaceIndex == -1)
word = text.Substring(wordStartIndex);
else
word = text.Substring(wordStartIndex, whiteSpaceIndex - wordStartIndex);
if(IsOrdinalNumber(word))
sb.Append(word);
else
sb.Append(info.ToTitleCase(word));
if(whiteSpaceIndex == -1)
break; // last word reached, nothing left to scan
wordStartIndex = FindNonWhiteSpaceIndex(text, whiteSpaceIndex + 1, separator);
string whiteSpaces;
if(wordStartIndex >= 0)
whiteSpaces = text.Substring(whiteSpaceIndex, wordStartIndex - whiteSpaceIndex);
else
whiteSpaces = text.Substring(whiteSpaceIndex);
sb.Append(whiteSpaces); // append spaces between words
}
return sb.ToString();
}
public static int FindWhiteSpaceIndex(string text, int startIndex = 0, params char[] separator)
{
bool checkSeparator = separator != null && separator.Any();
for (int i = startIndex; i < text.Length; i++)
{
char c = text[i];
if (char.IsWhiteSpace(c) || (checkSeparator && separator.Contains(c)))
return i;
}
return -1;
}
public static int FindNonWhiteSpaceIndex(string text, int startIndex = 0, params char[] separator)
{
bool checkSeparator = separator != null && separator.Any();
for (int i = startIndex; i < text.Length; i++)
{
char c = text[i];
if (!char.IsWhiteSpace(text[i]) && (!checkSeparator || !separator.Contains(c)))
return i;
}
return -1;
}
Note that this is really not tested yet but should give you an idea.

This would work for those strings; you could wrap ToTitleCase() in an extension method.
string s = "1st";
if ( s[0] >= '0' && s[0] <= '9' ) {
//this string starts with a number
//so don't call ToTitleCase()
}
else {
//call ToTitleCase()
}

How can I take a substring and the length of a string in C# if that string might be null?

I have the following strings:
string temp = null;
string temp = "";
string temp = "12345678";
string temp = "1234567890";
What I need is a function that gives me the last four digits of the input variable if it is 8 or 10 characters long. Otherwise it should return "".
Is there an easy way to do this in C#? I am not sure how to deal with null, because I think getting the length of null will give me an error.

int length = (temp ?? "").Length;
string subString = "";
if(length == 8 || length == 10)
{
subString = temp.Substring(length - 4);
}

You can use IsNullOrEmpty, because taking a substring is not possible if the string is null or empty.
if (!String.IsNullOrEmpty(str) && (str.Length == 8 || str.Length == 10))
{
string substr = str.Substring(str.Length-4);
}

Try this
string YourFunctionName(string input)
{
string rVal = "";
if (string.IsNullOrEmpty(input))
return rVal;
if (input.Length == 8 || input.Length == 10)
rVal = input.Substring(input.Length - 4);
return rVal;
}

I would likely write it like this, if writing a function:
string GiveMeABetterName (string input) {
if (input != null
&& (input.Length == 8 || input.Length == 10)) {
return input.Substring(input.Length - 4);
} else {
return "";
}
}
As can be seen from all the answers, there are multiple ways to accomplish this.
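One more variant, offered as a sketch rather than a recommendation: assuming C# 6 or later, the null-conditional operator (?.) combined with ?? treats a null string as having length 0 without a separate null check. LastFour and NullSafeSketch are hypothetical names.

```csharp
public static class NullSafeSketch
{
    // Hypothetical helper: last four chars when the input is exactly
    // 8 or 10 chars long, otherwise an empty string.
    public static string LastFour(string input)
    {
        // input?.Length is null when input is null; ?? turns that into 0.
        int length = input?.Length ?? 0;
        return (length == 8 || length == 10)
            ? input.Substring(length - 4)
            : "";
    }
}
```

Because length is 0 for null, the Substring branch is never reached with a null input.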

split a string with max character limit

I'm attempting to split a string into many strings (List) with each one having a maximum limit of characters. So say if I had a string of 500 characters, and I want each string to have a max of 75, there would be 7 strings, and the last one would not have a full 75.
I've tried some of the examples I have found on stackoverflow, but they 'truncate' the results. Any ideas?

You can write your own extension method to do something like that
static class StringExtensions
{
public static IEnumerable<string> SplitOnLength(this string input, int length)
{
int index = 0;
while (index < input.Length)
{
if (index + length < input.Length)
yield return input.Substring(index, length);
else
yield return input.Substring(index);
index += length;
}
}
}
And then you could call it like this
string temp = new string('#', 500);
string[] array = temp.SplitOnLength(75).ToArray();
foreach (string x in array)
Console.WriteLine(x);

I think this is a little cleaner than the other answers:
public static IEnumerable<string> SplitByLength(string s, int length)
{
while (s.Length > length)
{
yield return s.Substring(0, length);
s = s.Substring(length);
}
if (s.Length > 0) yield return s;
}

I would tackle this with a loop using C# String.Substring method.
Note that this isn't exact code, but you get the idea.
var myString = "hello world";
var list = new List<string>();
int maxSize = 75; // maximum length of each piece (example value)
int index = 0;
while (index < myString.Length)
{
if (index + maxSize > myString.Length)
{
// handle last case
list.Add(myString.Substring(index));
break;
}
else
{
list.Add(myString.Substring(index,maxSize));
index+= maxSize;
}
}

When you say split, are you referring to the split function? If not, something like this will work:
List<string> list = new List<string>();
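string s = new string('#', 500); // sample input
int num = 75;
while (s.Length > 0)
{
int take = Math.Min(num, s.Length); // the last piece may be shorter than num
list.Add(s.Substring(0, take));
s = s.Remove(0, take);
}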

I am assuming you may want to split on a delimiter, such as the space character.
Search the string (IndexOf) for the position of the next delimiter.
If the text up to that position still fits within your maximum length (75), append it to the current substring.
If not, start a new substring.
Special case: if a chunk contains no delimiter at all, you need to define what happens, for example add a '-' and continue.
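The steps above could be sketched roughly as follows. This is only one reading of the description: it splits on spaces, and for the special case it hard-splits an over-long word without inserting a '-', which is an assumption of this sketch. WrapAtWords and WordWrapSketch are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class WordWrapSketch
{
    // Hypothetical helper: breaks text into pieces of at most maxLength
    // chars, preferring to break at spaces.
    public static List<string> WrapAtWords(string text, int maxLength)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (maxLength <= 0) throw new ArgumentOutOfRangeException(nameof(maxLength));

        var pieces = new List<string>();
        var current = new StringBuilder();
        foreach (string word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string remaining = word;
            // Special case: a single word longer than the limit is cut hard.
            while (remaining.Length > maxLength)
            {
                if (current.Length > 0) { pieces.Add(current.ToString()); current.Clear(); }
                pieces.Add(remaining.Substring(0, maxLength));
                remaining = remaining.Substring(maxLength);
            }
            if (remaining.Length == 0) continue;
            // Start a new piece when the word plus a space would not fit.
            if (current.Length > 0 && current.Length + 1 + remaining.Length > maxLength)
            {
                pieces.Add(current.ToString());
                current.Clear();
            }
            if (current.Length > 0) current.Append(' ');
            current.Append(remaining);
        }
        if (current.Length > 0) pieces.Add(current.ToString());
        return pieces;
    }
}
```

Runs of several spaces collapse to one in this sketch, because empty entries are removed when splitting.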

public static string SplitByLength(string s, int length)
{
ArrayList sArrReturn = new ArrayList();
String[] sArr = s.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string sconcat in sArr)
{
if (((String.Join(" ", sArrReturn.ToArray()).Length + sconcat.Length)+1) < length)
sArrReturn.Add(sconcat);
else
break;
}
return String.Join(" ", sArrReturn.ToArray());
}
public static string SplitByLengthOld(string s, int length)
{
try
{
string sret = string.Empty;
String[] sArr = s.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string sconcat in sArr)
{
if ((sret.Length + sconcat.Length + 1) < length)
sret = string.Format("{0}{1}{2}", sret, string.IsNullOrEmpty(sret) ? string.Empty : " ", sconcat);
}
return sret;
}
catch
{
return string.Empty;
}
}

Truncate string on whole words in .NET C#

I am trying to truncate some long text in C#, but I don't want my string to be cut off part way through a word. Does anyone have a function that I can use to truncate my string at the end of a word?
E.g:
"This was a long string..."
Not:
"This was a long st..."

Try the following. It is pretty rudimentary. Just finds the first space starting at the desired length.
public static string TruncateAtWord(this string value, int length) {
if (value == null || value.Length < length || value.IndexOf(" ", length) == -1)
return value;
return value.Substring(0, value.IndexOf(" ", length));
}

Thanks for your answer Dave. I've tweaked the function a bit and this is what I'm using ... unless there are any more comments ;)
public static string TruncateAtWord(this string input, int length)
{
if (input == null || input.Length < length)
return input;
int iNextSpace = input.LastIndexOf(" ", length, StringComparison.Ordinal);
return string.Format("{0}…", input.Substring(0, (iNextSpace > 0) ? iNextSpace : length).Trim());
}

My contribution:
public static string TruncateAtWord(string text, int maxCharacters, string trailingStringIfTextCut = "…")
{
if (text == null || (text = text.Trim()).Length <= maxCharacters)
return text;
int trailLength = trailingStringIfTextCut.StartsWith("&") ? 1
: trailingStringIfTextCut.Length;
maxCharacters = maxCharacters - trailLength >= 0 ? maxCharacters - trailLength
: 0;
int pos = text.LastIndexOf(" ", maxCharacters);
if (pos >= 0)
return text.Substring(0, pos) + trailingStringIfTextCut;
return string.Empty;
}
This is what I use in my projects, with optional trailing. Text will never exceed the maxCharacters + trailing text length.

If you are using windows forms, in the Graphics.DrawString method, there is an option in StringFormat to specify if the string should be truncated, if it does not fit into the area specified. This will handle adding the ellipsis as necessary.
http://msdn.microsoft.com/en-us/library/system.drawing.stringtrimming.aspx

I took your approach a little further:
public string TruncateAtWord(string value, int length)
{
if (value == null || value.Trim().Length <= length)
return value;
int index = value.Trim().LastIndexOf(" ");
while ((index + 3) > length)
index = value.Substring(0, index).Trim().LastIndexOf(" ");
if (index > 0)
return value.Substring(0, index) + "...";
return value.Substring(0, length - 3) + "...";
}
I'm using this to truncate tweets.

This solution works too (takes first 10 words from myString):
String.Join(" ", myString.Split(' ').Take(10))

Taking into account more than just a blank space separator (e.g. words can be separated by periods followed by newlines, followed by tabs, etc.), and several other edge cases, here is an appropriate extension method:
public static string GetMaxWords(this string input, int maxWords, string truncateWith = "...", string additionalSeparators = ",-_:")
{
int words = 1;
bool IsSeparator(char c) => Char.IsSeparator(c) || additionalSeparators.Contains(c);
IEnumerable<char> IterateChars()
{
yield return input[0];
for (int i = 1; i < input.Length; i++)
{
if (IsSeparator(input[i]) && !IsSeparator(input[i - 1]))
if (words == maxWords)
{
foreach (char c in truncateWith)
yield return c;
break;
}
else
words++;
yield return input[i];
}
}
return !string.IsNullOrEmpty(input)
? new String(IterateChars().ToArray())
: String.Empty;
}

Simplified, added a truncation-character option and made it an extension method.
public static string TruncateAtWord(this string value, int maxLength)
{
if (value == null || value.Trim().Length <= maxLength)
return value;
string ellipse = "...";
char[] truncateChars = new char[] { ' ', ',' };
int index = value.Trim().LastIndexOfAny(truncateChars);
while ((index + ellipse.Length) > maxLength)
index = value.Substring(0, index).Trim().LastIndexOfAny(truncateChars);
if (index > 0)
return value.Substring(0, index) + ellipse;
return value.Substring(0, maxLength - ellipse.Length) + ellipse;
}

Here's what I came up with. It also returns the rest of the sentence, in chunks.
public static List<string> SplitTheSentenceAtWord(this string originalString, int length)
{
try
{
List<string> truncatedStrings = new List<string>();
if (originalString == null || originalString.Trim().Length <= length)
{
truncatedStrings.Add(originalString);
return truncatedStrings;
}
int index = originalString.Trim().LastIndexOf(" ");
while ((index + 3) > length)
index = originalString.Substring(0, index).Trim().LastIndexOf(" ");
if (index > 0)
{
string retValue = originalString.Substring(0, index) + "...";
truncatedStrings.Add(retValue);
string shortWord2 = originalString;
if (retValue.EndsWith("..."))
{
shortWord2 = retValue.Replace("...", "");
}
shortWord2 = originalString.Substring(shortWord2.Length);
if (shortWord2.Length > length) //truncate it further
{
List<string> retValues = SplitTheSentenceAtWord(shortWord2.TrimStart(), length);
truncatedStrings.AddRange(retValues);
}
else
{
truncatedStrings.Add(shortWord2.TrimStart());
}
return truncatedStrings;
}
var retVal_Last = originalString.Substring(0, length - 3);
truncatedStrings.Add(retVal_Last + "...");
if (originalString.Length > length)//truncate it further
{
string shortWord3 = originalString;
if (originalString.EndsWith("..."))
{
shortWord3 = originalString.Replace("...", "");
}
shortWord3 = originalString.Substring(retVal_Last.Length);
List<string> retValues = SplitTheSentenceAtWord(shortWord3.TrimStart(), length);
truncatedStrings.AddRange(retValues);
}
else
{
truncatedStrings.Add(retVal_Last + "...");
}
return truncatedStrings;
}
catch
{
return new List<string> { originalString };
}
}

I use this
public string Truncate(string content, int length)
{
try
{
return content.Substring(0,content.IndexOf(" ",length)) + "...";
}
catch
{
return content;
}
}
