Truncate string on whole words in .NET C# - c#

I am trying to truncate some long text in C#, but I don't want my string to be cut off part way through a word. Does anyone have a function that I can use to truncate my string at the end of a word?
E.g:
"This was a long string..."
Not:
"This was a long st..."

Try the following. It is pretty rudimentary. Just finds the first space starting at the desired length.
public static string TruncateAtWord(this string value, int length) {
if (value == null || value.Length < length || value.IndexOf(" ", length) == -1)
return value;
return value.Substring(0, value.IndexOf(" ", length));
}

Thanks for your answer Dave. I've tweaked the function a bit and this is what I'm using ... unless there are any more comments ;)
public static string TruncateAtWord(this string input, int length)
{
if (input == null || input.Length < length)
return input;
int iNextSpace = input.LastIndexOf(" ", length, StringComparison.Ordinal);
return string.Format("{0}…", input.Substring(0, (iNextSpace > 0) ? iNextSpace : length).Trim());
}

My contribution:
public static string TruncateAtWord(string text, int maxCharacters, string trailingStringIfTextCut = "…")
{
if (text == null || (text = text.Trim()).Length <= maxCharacters)
return text;
int trailLength = trailingStringIfTextCut.StartsWith("&") ? 1
: trailingStringIfTextCut.Length;
maxCharacters = maxCharacters - trailLength >= 0 ? maxCharacters - trailLength
: 0;
int pos = text.LastIndexOf(" ", maxCharacters);
if (pos >= 0)
return text.Substring(0, pos) + trailingStringIfTextCut;
return string.Empty;
}
This is what I use in my projects, with optional trailing. Text will never exceed the maxCharacters + trailing text length.

If you are using windows forms, in the Graphics.DrawString method, there is an option in StringFormat to specify if the string should be truncated, if it does not fit into the area specified. This will handle adding the ellipsis as necessary.
http://msdn.microsoft.com/en-us/library/system.drawing.stringtrimming.aspx

I took your approach a little further:
public string TruncateAtWord(string value, int length)
{
if (value == null || value.Trim().Length <= length)
return value;
int index = value.Trim().LastIndexOf(" ");
while ((index + 3) > length)
index = value.Substring(0, index).Trim().LastIndexOf(" ");
if (index > 0)
return value.Substring(0, index) + "...";
return value.Substring(0, length - 3) + "...";
}
I'm using this to truncate tweets.

This solution works too (takes first 10 words from myString):
String.Join(" ", myString.Split(' ').Take(10))

Taking into account more than just a blank space separator (e.g. words can be separated by periods followed by newlines, followed by tabs, etc.), and several other edge cases, here is an appropriate extension method:
public static string GetMaxWords(this string input, int maxWords, string truncateWith = "...", string additionalSeparators = ",-_:")
{
int words = 1;
bool IsSeparator(char c) => Char.IsSeparator(c) || additionalSeparators.Contains(c);
IEnumerable<char> IterateChars()
{
yield return input[0];
for (int i = 1; i < input.Length; i++)
{
if (IsSeparator(input[i]) && !IsSeparator(input[i - 1]))
if (words == maxWords)
{
foreach (char c in truncateWith)
yield return c;
break;
}
else
words++;
yield return input[i];
}
}
return !input.IsNullOrEmpty()
? new String(IterateChars().ToArray())
: String.Empty;
}

simplified, added trunking character option and made it an extension.
public static string TruncateAtWord(this string value, int maxLength)
{
if (value == null || value.Trim().Length <= maxLength)
return value;
string ellipse = "...";
char[] truncateChars = new char[] { ' ', ',' };
int index = value.Trim().LastIndexOfAny(truncateChars);
while ((index + ellipse.Length) > maxLength)
index = value.Substring(0, index).Trim().LastIndexOfAny(truncateChars);
if (index > 0)
return value.Substring(0, index) + ellipse;
return value.Substring(0, maxLength - ellipse.Length) + ellipse;
}

Heres what i came up with. This is to get the rest of the sentence also in chunks.
public static List<string> SplitTheSentenceAtWord(this string originalString, int length)
{
try
{
List<string> truncatedStrings = new List<string>();
if (originalString == null || originalString.Trim().Length <= length)
{
truncatedStrings.Add(originalString);
return truncatedStrings;
}
int index = originalString.Trim().LastIndexOf(" ");
while ((index + 3) > length)
index = originalString.Substring(0, index).Trim().LastIndexOf(" ");
if (index > 0)
{
string retValue = originalString.Substring(0, index) + "...";
truncatedStrings.Add(retValue);
string shortWord2 = originalString;
if (retValue.EndsWith("..."))
{
shortWord2 = retValue.Replace("...", "");
}
shortWord2 = originalString.Substring(shortWord2.Length);
if (shortWord2.Length > length) //truncate it further
{
List<string> retValues = SplitTheSentenceAtWord(shortWord2.TrimStart(), length);
truncatedStrings.AddRange(retValues);
}
else
{
truncatedStrings.Add(shortWord2.TrimStart());
}
return truncatedStrings;
}
var retVal_Last = originalString.Substring(0, length - 3);
truncatedStrings.Add(retVal_Last + "...");
if (originalString.Length > length)//truncate it further
{
string shortWord3 = originalString;
if (originalString.EndsWith("..."))
{
shortWord3 = originalString.Replace("...", "");
}
shortWord3 = originalString.Substring(retVal_Last.Length);
List<string> retValues = SplitTheSentenceAtWord(shortWord3.TrimStart(), length);
truncatedStrings.AddRange(retValues);
}
else
{
truncatedStrings.Add(retVal_Last + "...");
}
return truncatedStrings;
}
catch
{
return new List<string> { originalString };
}
}

I use this
public string Truncate(string content, int length)
{
try
{
return content.Substring(0,content.IndexOf(" ",length)) + "...";
}
catch
{
return content;
}
}

Related

Index out of bounds of array but only sometimes [duplicate]

Suppose I had a string:
string str = "1111222233334444";
How can I break this string into chunks of some size?
e.g., breaking this into sizes of 4 would return strings:
"1111"
"2222"
"3333"
"4444"
static IEnumerable<string> Split(string str, int chunkSize)
{
return Enumerable.Range(0, str.Length / chunkSize)
.Select(i => str.Substring(i * chunkSize, chunkSize));
}
Please note that additional code might be required to gracefully handle edge cases (null or empty input string, chunkSize == 0, input string length not divisible by chunkSize, etc.). The original question doesn't specify any requirements for these edge cases and in real life the requirements might vary so they are out of scope of this answer.
In a combination of dove+Konstatin's answers...
static IEnumerable<string> WholeChunks(string str, int chunkSize) {
for (int i = 0; i < str.Length; i += chunkSize)
yield return str.Substring(i, chunkSize);
}
This will work for all strings that can be split into a whole number of chunks, and will throw an exception otherwise.
If you want to support strings of any length you could use the following code:
static IEnumerable<string> ChunksUpto(string str, int maxChunkSize) {
for (int i = 0; i < str.Length; i += maxChunkSize)
yield return str.Substring(i, Math.Min(maxChunkSize, str.Length-i));
}
However, the the OP explicitly stated he does not need this; it's somewhat longer and harder to read, slightly slower. In the spirit of KISS and YAGNI, I'd go with the first option: it's probably the most efficient implementation possible, and it's very short, readable, and, importantly, throws an exception for nonconforming input.
Why not loops? Here's something that would do it quite well:
string str = "111122223333444455";
int chunkSize = 4;
int stringLength = str.Length;
for (int i = 0; i < stringLength ; i += chunkSize)
{
if (i + chunkSize > stringLength) chunkSize = stringLength - i;
Console.WriteLine(str.Substring(i, chunkSize));
}
Console.ReadLine();
I don't know how you'd deal with case where the string is not factor of 4, but not saying you're idea is not possible, just wondering the motivation for it if a simple for loop does it very well? Obviously the above could be cleaned and even put in as an extension method.
Or as mentioned in comments, you know it's /4 then
str = "1111222233334444";
for (int i = 0; i < stringLength; i += chunkSize)
{Console.WriteLine(str.Substring(i, chunkSize));}
This is based on #dove solution but implemented as an extension method.
Benefits:
Extension method
Covers corner cases
Splits string with any chars: numbers, letters, other symbols
Code
public static class EnumerableEx
{
public static IEnumerable<string> SplitBy(this string str, int chunkLength)
{
if (String.IsNullOrEmpty(str)) throw new ArgumentException();
if (chunkLength < 1) throw new ArgumentException();
for (int i = 0; i < str.Length; i += chunkLength)
{
if (chunkLength + i > str.Length)
chunkLength = str.Length - i;
yield return str.Substring(i, chunkLength);
}
}
}
Usage
var result = "bobjoecat".SplitBy(3); // bob, joe, cat
Unit tests removed for brevity (see previous revision)
Using regular expressions and Linq:
List<string> groups = (from Match m in Regex.Matches(str, #"\d{4}")
select m.Value).ToList();
I find this to be more readable, but it's just a personal opinion. It can also be a one-liner : ).
How's this for a one-liner?
List<string> result = new List<string>(Regex.Split(target, #"(?<=\G.{4})", RegexOptions.Singleline));
With this regex it doesn't matter if the last chunk is less than four characters, because it only ever looks at the characters behind it.
I'm sure this isn't the most efficient solution, but I just had to toss it out there.
Starting with .NET 6, we can also use the Chunk method:
var result = str
.Chunk(4)
.Select(x => new string(x))
.ToList();
I recently had to write something that accomplishes this at work, so I thought I would post my solution to this problem. As an added bonus, the functionality of this solution provides a way to split the string in the opposite direction and it does correctly handle unicode characters as previously mentioned by Marvin Pinto above. So, here it is:
using System;
using Extensions;
namespace TestCSharp
{
class Program
{
static void Main(string[] args)
{
string asciiStr = "This is a string.";
string unicodeStr = "これは文字列です。";
string[] array1 = asciiStr.Split(4);
string[] array2 = asciiStr.Split(-4);
string[] array3 = asciiStr.Split(7);
string[] array4 = asciiStr.Split(-7);
string[] array5 = unicodeStr.Split(5);
string[] array6 = unicodeStr.Split(-5);
}
}
}
namespace Extensions
{
public static class StringExtensions
{
/// <summary>Returns a string array that contains the substrings in this string that are seperated a given fixed length.</summary>
/// <param name="s">This string object.</param>
/// <param name="length">Size of each substring.
/// <para>CASE: length > 0 , RESULT: String is split from left to right.</para>
/// <para>CASE: length == 0 , RESULT: String is returned as the only entry in the array.</para>
/// <para>CASE: length < 0 , RESULT: String is split from right to left.</para>
/// </param>
/// <returns>String array that has been split into substrings of equal length.</returns>
/// <example>
/// <code>
/// string s = "1234567890";
/// string[] a = s.Split(4); // a == { "1234", "5678", "90" }
/// </code>
/// </example>
public static string[] Split(this string s, int length)
{
System.Globalization.StringInfo str = new System.Globalization.StringInfo(s);
int lengthAbs = Math.Abs(length);
if (str == null || str.LengthInTextElements == 0 || lengthAbs == 0 || str.LengthInTextElements <= lengthAbs)
return new string[] { str.ToString() };
string[] array = new string[(str.LengthInTextElements % lengthAbs == 0 ? str.LengthInTextElements / lengthAbs: (str.LengthInTextElements / lengthAbs) + 1)];
if (length > 0)
for (int iStr = 0, iArray = 0; iStr < str.LengthInTextElements && iArray < array.Length; iStr += lengthAbs, iArray++)
array[iArray] = str.SubstringByTextElements(iStr, (str.LengthInTextElements - iStr < lengthAbs ? str.LengthInTextElements - iStr : lengthAbs));
else // if (length < 0)
for (int iStr = str.LengthInTextElements - 1, iArray = array.Length - 1; iStr >= 0 && iArray >= 0; iStr -= lengthAbs, iArray--)
array[iArray] = str.SubstringByTextElements((iStr - lengthAbs < 0 ? 0 : iStr - lengthAbs + 1), (iStr - lengthAbs < 0 ? iStr + 1 : lengthAbs));
return array;
}
}
}
Also, here is an image link to the results of running this code: http://i.imgur.com/16Iih.png
It's not pretty and it's not fast, but it works, it's a one-liner and it's LINQy:
List<string> a = text.Select((c, i) => new { Char = c, Index = i }).GroupBy(o => o.Index / 4).Select(g => new String(g.Select(o => o.Char).ToArray())).ToList();
This should be much faster and more efficient than using LINQ or other approaches used here.
public static IEnumerable<string> Splice(this string s, int spliceLength)
{
if (s == null)
throw new ArgumentNullException("s");
if (spliceLength < 1)
throw new ArgumentOutOfRangeException("spliceLength");
if (s.Length == 0)
yield break;
var start = 0;
for (var end = spliceLength; end < s.Length; end += spliceLength)
{
yield return s.Substring(start, spliceLength);
start = end;
}
yield return s.Substring(start);
}
You can use morelinq by Jon Skeet. Use Batch like:
string str = "1111222233334444";
int chunkSize = 4;
var chunks = str.Batch(chunkSize).Select(r => new String(r.ToArray()));
This will return 4 chunks for the string "1111222233334444". If the string length is less than or equal to the chunk size Batch will return the string as the only element of IEnumerable<string>
For output:
foreach (var chunk in chunks)
{
Console.WriteLine(chunk);
}
and it will give:
1111
2222
3333
4444
Personally I prefer my solution :-)
It handles:
String lengths that are a multiple of the chunk size.
String lengths that are NOT a multiple of the chunk size.
String lengths that are smaller than the chunk size.
NULL and empty strings (throws an exception).
Chunk sizes smaller than 1 (throws an exception).
It is implemented as a extension method, and it calculates the number of chunks is going to generate beforehand. It checks the last chunk because in case the text length is not a multiple it needs to be shorter. Clean, short, easy to understand... and works!
public static string[] Split(this string value, int chunkSize)
{
if (string.IsNullOrEmpty(value)) throw new ArgumentException("The string cannot be null.");
if (chunkSize < 1) throw new ArgumentException("The chunk size should be equal or greater than one.");
int remainder;
int divResult = Math.DivRem(value.Length, chunkSize, out remainder);
int numberOfChunks = remainder > 0 ? divResult + 1 : divResult;
var result = new string[numberOfChunks];
int i = 0;
while (i < numberOfChunks - 1)
{
result[i] = value.Substring(i * chunkSize, chunkSize);
i++;
}
int lastChunkSize = remainder > 0 ? remainder : chunkSize;
result[i] = value.Substring(i * chunkSize, lastChunkSize);
return result;
}
Simple and short:
// this means match a space or not a space (anything) up to 4 characters
var lines = Regex.Matches(str, #"[\s\S]{0,4}").Cast<Match>().Select(x => x.Value);
I know question is years old, but here is a Rx implementation. It handles the length % chunkSize != 0 problem out of the box:
public static IEnumerable<string> Chunkify(this string input, int size)
{
if(size < 1)
throw new ArgumentException("size must be greater than 0");
return input.ToCharArray()
.ToObservable()
.Buffer(size)
.Select(x => new string(x.ToArray()))
.ToEnumerable();
}
public static IEnumerable<IEnumerable<T>> SplitEvery<T>(this IEnumerable<T> values, int n)
{
var ls = values.Take(n);
var rs = values.Skip(n);
return ls.Any() ?
Cons(ls, SplitEvery(rs, n)) :
Enumerable.Empty<IEnumerable<T>>();
}
public static IEnumerable<T> Cons<T>(T x, IEnumerable<T> xs)
{
yield return x;
foreach (var xi in xs)
yield return xi;
}
Best , Easiest and Generic Answer :).
string originalString = "1111222233334444";
List<string> test = new List<string>();
int chunkSize = 4; // change 4 with the size of strings you want.
for (int i = 0; i < originalString.Length; i = i + chunkSize)
{
if (originalString.Length - i >= chunkSize)
test.Add(originalString.Substring(i, chunkSize));
else
test.Add(originalString.Substring(i,((originalString.Length - i))));
}
static IEnumerable<string> Split(string str, int chunkSize)
{
IEnumerable<string> retVal = Enumerable.Range(0, str.Length / chunkSize)
.Select(i => str.Substring(i * chunkSize, chunkSize))
if (str.Length % chunkSize > 0)
retVal = retVal.Append(str.Substring(str.Length / chunkSize * chunkSize, str.Length % chunkSize));
return retVal;
}
It correctly handles input string length not divisible by chunkSize.
Please note that additional code might be required to gracefully handle edge cases (null or empty input string, chunkSize == 0).
static IEnumerable<string> Split(string str, double chunkSize)
{
return Enumerable.Range(0, (int) Math.Ceiling(str.Length/chunkSize))
.Select(i => new string(str
.Skip(i * (int)chunkSize)
.Take((int)chunkSize)
.ToArray()));
}
and another approach:
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
public static void Main()
{
var x = "Hello World";
foreach(var i in x.ChunkString(2)) Console.WriteLine(i);
}
}
public static class Ext{
public static IEnumerable<string> ChunkString(this string val, int chunkSize){
return val.Select((x,i) => new {Index = i, Value = x})
.GroupBy(x => x.Index/chunkSize, x => x.Value)
.Select(x => string.Join("",x));
}
}
Six years later o_O
Just because
public static IEnumerable<string> Split(this string str, int chunkSize, bool remainingInFront)
{
var count = (int) Math.Ceiling(str.Length/(double) chunkSize);
Func<int, int> start = index => remainingInFront ? str.Length - (count - index)*chunkSize : index*chunkSize;
Func<int, int> end = index => Math.Min(str.Length - Math.Max(start(index), 0), Math.Min(start(index) + chunkSize - Math.Max(start(index), 0), chunkSize));
return Enumerable.Range(0, count).Select(i => str.Substring(Math.Max(start(i), 0),end(i)));
}
or
private static Func<bool, int, int, int, int, int> start = (remainingInFront, length, count, index, size) =>
remainingInFront ? length - (count - index) * size : index * size;
private static Func<bool, int, int, int, int, int, int> end = (remainingInFront, length, count, index, size, start) =>
Math.Min(length - Math.Max(start, 0), Math.Min(start + size - Math.Max(start, 0), size));
public static IEnumerable<string> Split(this string str, int chunkSize, bool remainingInFront)
{
var count = (int)Math.Ceiling(str.Length / (double)chunkSize);
return Enumerable.Range(0, count).Select(i => str.Substring(
Math.Max(start(remainingInFront, str.Length, count, i, chunkSize), 0),
end(remainingInFront, str.Length, count, i, chunkSize, start(remainingInFront, str.Length, count, i, chunkSize))
));
}
AFAIK all edge cases are handled.
Console.WriteLine(string.Join(" ", "abc".Split(2, false))); // ab c
Console.WriteLine(string.Join(" ", "abc".Split(2, true))); // a bc
Console.WriteLine(string.Join(" ", "a".Split(2, true))); // a
Console.WriteLine(string.Join(" ", "a".Split(2, false))); // a
List<string> SplitString(int chunk, string input)
{
List<string> list = new List<string>();
int cycles = input.Length / chunk;
if (input.Length % chunk != 0)
cycles++;
for (int i = 0; i < cycles; i++)
{
try
{
list.Add(input.Substring(i * chunk, chunk));
}
catch
{
list.Add(input.Substring(i * chunk));
}
}
return list;
}
I took this to another level. Chucking is an easy one liner, but in my case I needed whole words as well. Figured I would post it, just in case someone else needs something similar.
static IEnumerable<string> Split(string orgString, int chunkSize, bool wholeWords = true)
{
if (wholeWords)
{
List<string> result = new List<string>();
StringBuilder sb = new StringBuilder();
if (orgString.Length > chunkSize)
{
string[] newSplit = orgString.Split(' ');
foreach (string str in newSplit)
{
if (sb.Length != 0)
sb.Append(" ");
if (sb.Length + str.Length > chunkSize)
{
result.Add(sb.ToString());
sb.Clear();
}
sb.Append(str);
}
result.Add(sb.ToString());
}
else
result.Add(orgString);
return result;
}
else
return new List<string>(Regex.Split(orgString, #"(?<=\G.{" + chunkSize + "})", RegexOptions.Singleline));
}
Results based on below comment:
string msg = "336699AABBCCDDEEFF";
foreach (string newMsg in Split(msg, 2, false))
{
Console.WriteLine($">>{newMsg}<<");
}
Console.ReadKey();
Results:
>>33<<
>>66<<
>>99<<
>>AA<<
>>BB<<
>>CC<<
>>DD<<
>>EE<<
>>FF<<
>><<
Another way to pull it:
List<string> splitData = (List<string>)Split(msg, 2, false);
for (int i = 0; i < splitData.Count - 1; i++)
{
Console.WriteLine($">>{splitData[i]}<<");
}
Console.ReadKey();
New Results:
>>33<<
>>66<<
>>99<<
>>AA<<
>>BB<<
>>CC<<
>>DD<<
>>EE<<
>>FF<<
An important tip if the string that is being chunked needs to support all Unicode characters.
If the string is to support international characters like 𠀋, then split up the string using the System.Globalization.StringInfo class. Using StringInfo, you can split up the string based on number of text elements.
string internationalString = '𠀋';
The above string has a Length of 2, because the String.Length property returns the number of Char objects in this instance, not the number of Unicode characters.
Changed slightly to return parts whose size not equal to chunkSize
public static IEnumerable<string> Split(this string str, int chunkSize)
{
var splits = new List<string>();
if (str.Length < chunkSize) { chunkSize = str.Length; }
splits.AddRange(Enumerable.Range(0, str.Length / chunkSize).Select(i => str.Substring(i * chunkSize, chunkSize)));
splits.Add(str.Length % chunkSize > 0 ? str.Substring((str.Length / chunkSize) * chunkSize, str.Length - ((str.Length / chunkSize) * chunkSize)) : string.Empty);
return (IEnumerable<string>)splits;
}
I think this is an straight forward answer:
public static IEnumerable<string> Split(this string str, int chunkSize)
{
if(string.IsNullOrEmpty(str) || chunkSize<1)
throw new ArgumentException("String can not be null or empty and chunk size should be greater than zero.");
var chunkCount = str.Length / chunkSize + (str.Length % chunkSize != 0 ? 1 : 0);
for (var i = 0; i < chunkCount; i++)
{
var startIndex = i * chunkSize;
if (startIndex + chunkSize >= str.Length)
yield return str.Substring(startIndex);
else
yield return str.Substring(startIndex, chunkSize);
}
}
And it covers edge cases.
static List<string> GetChunks(string value, int chunkLength)
{
var res = new List<string>();
int count = (value.Length / chunkLength) + (value.Length % chunkLength > 0 ? 1 : 0);
Enumerable.Range(0, count).ToList().ForEach(f => res.Add(value.Skip(f * chunkLength).Take(chunkLength).Select(z => z.ToString()).Aggregate((a,b) => a+b)));
return res;
}
demo
Here's my 2 cents:
IEnumerable<string> Split(string str, int chunkSize)
{
while (!string.IsNullOrWhiteSpace(str))
{
var chunk = str.Take(chunkSize).ToArray();
str = str.Substring(chunk.Length);
yield return new string(chunk);
}
}//Split
I've slightly build up on João's solution.
What I've done differently is in my method you can actually specify whether you want to return the array with remaining characters or whether you want to truncate them if the end characters do not match your required chunk length, I think it's pretty flexible and the code is fairly straight forward:
using System;
using System.Linq;
using System.Text.RegularExpressions;
namespace SplitFunction
{
class Program
{
static void Main(string[] args)
{
string text = "hello, how are you doing today?";
string[] chunks = SplitIntoChunks(text, 3,false);
if (chunks != null)
{
chunks.ToList().ForEach(e => Console.WriteLine(e));
}
Console.ReadKey();
}
private static string[] SplitIntoChunks(string text, int chunkSize, bool truncateRemaining)
{
string chunk = chunkSize.ToString();
string pattern = truncateRemaining ? ".{" + chunk + "}" : ".{1," + chunk + "}";
string[] chunks = null;
if (chunkSize > 0 && !String.IsNullOrEmpty(text))
chunks = (from Match m in Regex.Matches(text,pattern)select m.Value).ToArray();
return chunks;
}
}
}
public static List<string> SplitByMaxLength(this string str)
{
List<string> splitString = new List<string>();
for (int index = 0; index < str.Length; index += MaxLength)
{
splitString.Add(str.Substring(index, Math.Min(MaxLength, str.Length - index)));
}
return splitString;
}
I can't remember who gave me this, but it works great. I speed tested a number of ways to break Enumerable types into groups. The usage would just be like this...
List<string> Divided = Source3.Chunk(24).Select(Piece => string.Concat<char>(Piece)).ToList();
The extention code would look like this...
#region Chunk Logic
private class ChunkedEnumerable<T> : IEnumerable<T>
{
class ChildEnumerator : IEnumerator<T>
{
ChunkedEnumerable<T> parent;
int position;
bool done = false;
T current;
public ChildEnumerator(ChunkedEnumerable<T> parent)
{
this.parent = parent;
position = -1;
parent.wrapper.AddRef();
}
public T Current
{
get
{
if (position == -1 || done)
{
throw new InvalidOperationException();
}
return current;
}
}
public void Dispose()
{
if (!done)
{
done = true;
parent.wrapper.RemoveRef();
}
}
object System.Collections.IEnumerator.Current
{
get { return Current; }
}
public bool MoveNext()
{
position++;
if (position + 1 > parent.chunkSize)
{
done = true;
}
if (!done)
{
done = !parent.wrapper.Get(position + parent.start, out current);
}
return !done;
}
public void Reset()
{
// per http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.reset.aspx
throw new NotSupportedException();
}
}
EnumeratorWrapper<T> wrapper;
int chunkSize;
int start;
public ChunkedEnumerable(EnumeratorWrapper<T> wrapper, int chunkSize, int start)
{
this.wrapper = wrapper;
this.chunkSize = chunkSize;
this.start = start;
}
public IEnumerator<T> GetEnumerator()
{
return new ChildEnumerator(this);
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
private class EnumeratorWrapper<T>
{
public EnumeratorWrapper(IEnumerable<T> source)
{
SourceEumerable = source;
}
IEnumerable<T> SourceEumerable { get; set; }
Enumeration currentEnumeration;
class Enumeration
{
public IEnumerator<T> Source { get; set; }
public int Position { get; set; }
public bool AtEnd { get; set; }
}
public bool Get(int pos, out T item)
{
if (currentEnumeration != null && currentEnumeration.Position > pos)
{
currentEnumeration.Source.Dispose();
currentEnumeration = null;
}
if (currentEnumeration == null)
{
currentEnumeration = new Enumeration { Position = -1, Source = SourceEumerable.GetEnumerator(), AtEnd = false };
}
item = default(T);
if (currentEnumeration.AtEnd)
{
return false;
}
while (currentEnumeration.Position < pos)
{
currentEnumeration.AtEnd = !currentEnumeration.Source.MoveNext();
currentEnumeration.Position++;
if (currentEnumeration.AtEnd)
{
return false;
}
}
item = currentEnumeration.Source.Current;
return true;
}
int refs = 0;
// needed for dispose semantics
public void AddRef()
{
refs++;
}
public void RemoveRef()
{
refs--;
if (refs == 0 && currentEnumeration != null)
{
var copy = currentEnumeration;
currentEnumeration = null;
copy.Source.Dispose();
}
}
}
/// <summary>Speed Checked. Works Great!</summary>
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, int chunksize)
{
if (chunksize < 1) throw new InvalidOperationException();
var wrapper = new EnumeratorWrapper<T>(source);
int currentPos = 0;
T ignore;
try
{
wrapper.AddRef();
while (wrapper.Get(currentPos, out ignore))
{
yield return new ChunkedEnumerable<T>(wrapper, chunksize, currentPos);
currentPos += chunksize;
}
}
finally
{
wrapper.RemoveRef();
}
}
#endregion
class StringHelper
{
static void Main(string[] args)
{
string str = "Hi my name is vikas bansal and my email id is bansal.vks#gmail.com";
int offSet = 10;
List<string> chunks = chunkMyStr(str, offSet);
Console.Read();
}
static List<string> chunkMyStr(string str, int offSet)
{
List<string> resultChunks = new List<string>();
for (int i = 0; i < str.Length; i += offSet)
{
string temp = str.Substring(i, (str.Length - i) > offSet ? offSet : (str.Length - i));
Console.WriteLine(temp);
resultChunks.Add(temp);
}
return resultChunks;
}
}

Accidentally splitting unicode chars when truncating strings

I'm saving some strings from a third party into my database (postgres). Sometimes these strings are too long and need to be truncated to fit into the column in my table.
On some random occasions I accidentally truncate the string right where there is a Unicode character, which gives me a "broken" string that I cannot save into the database. I get the following error: Unable to translate Unicode character \uD83D at index XXX to specified code page.
I've created a minimal example to show you what I mean. Here I have a string that contains a Unicode character ("Small blue diamond" 🔹 U+1F539). Depending on where I truncate, it gives me a valid string or not.
var myString = #"This is a string before an emoji:🔹 This is after the emoji.";
var brokenString = myString.Substring(0, 34);
// Gives: "This is a string before an emoji:☐"
var test3 = myString.Substring(0, 35);
// Gives: "This is a string before an emoji:🔹"
Is there a way for me to truncate the string without accidentally breaking any Unicode chars?
A Unicode character may be represented with several chars, that is the problem with string.Substring you are having.
You may convert your string to a StringInfo object and then use SubstringByTextElements() method to get the substring based on the Unicode character count, not a char count.
See a C# demo:
Console.WriteLine("🔹".Length); // => 2
Console.WriteLine(new StringInfo("🔹").LengthInTextElements); // => 1
var myString = #"This is a string before an emoji:🔹This is after the emoji.";
var teMyString = new StringInfo(myString);
Console.WriteLine(teMyString.SubstringByTextElements(0, 33));
// => "This is a string before an emoji:"
Console.WriteLine(teMyString.SubstringByTextElements(0, 34));
// => This is a string before an emoji:🔹
Console.WriteLine(teMyString.SubstringByTextElements(0, 35));
// => This is a string before an emoji:🔹T
I ended up using a modification of xanatos answer here. The difference is that this version will strip the last grapheme, if adding it would give a string longer than length.
public static string UnicodeSafeSubstring(this string str, int startIndex, int length)
{
if (str == null)
{
throw new ArgumentNullException(nameof(str));
}
if (startIndex < 0 || startIndex > str.Length)
{
throw new ArgumentOutOfRangeException(nameof(startIndex));
}
if (length < 0)
{
throw new ArgumentOutOfRangeException(nameof(length));
}
if (startIndex + length > str.Length)
{
throw new ArgumentOutOfRangeException(nameof(length));
}
if (length == 0)
{
return string.Empty;
}
var stringBuilder = new StringBuilder(length);
var enumerator = StringInfo.GetTextElementEnumerator(str, startIndex);
while (enumerator.MoveNext())
{
var grapheme = enumerator.GetTextElement();
startIndex += grapheme.Length;
if (startIndex > str.Length)
{
break;
}
// Skip initial Low Surrogates/Combining Marks
if (stringBuilder.Length == 0)
{
if (char.IsLowSurrogate(grapheme[0]))
{
continue;
}
var cat = char.GetUnicodeCategory(grapheme, 0);
if (cat == UnicodeCategory.NonSpacingMark || cat == UnicodeCategory.SpacingCombiningMark || cat == UnicodeCategory.EnclosingMark)
{
continue;
}
}
// Do not append the grapheme if the resulting string would be longer than the required length
if (stringBuilder.Length + grapheme.Length <= length)
{
stringBuilder.Append(grapheme);
}
if (stringBuilder.Length >= length)
{
break;
}
}
return stringBuilder.ToString();
}
}
Here is an example for truncate (startIndex = 0):
string truncatedStr = (str.Length > maxLength)
? str.Substring(0, maxLength - (char.IsLowSurrogate(str[maxLength]) ? 1 : 0))
: str;
Better truncate by the number of bytes not string length
public static string TruncateByBytes(this string text, int maxBytes)
{
if (string.IsNullOrEmpty(text) || Encoding.UTF8.GetByteCount(text) <= maxBytes)
{
return text;
}
var enumerator = StringInfo.GetTextElementEnumerator(text);
var newStr = string.Empty;
do
{
enumerator.MoveNext();
if (Encoding.UTF8.GetByteCount(newStr + enumerator.Current) <= maxBytes)
{
newStr += enumerator.Current;
}
else
{
break;
}
} while (true);
return newStr;
}

Matching strings with wildcard

I would like to match strings with a wildcard (*), where the wildcard means "any". For example:
*X = string must end with X
X* = string must start with X
*X* = string must contain X
Also, some compound uses such as:
*X*YZ* = string contains X and contains YZ
X*YZ*P = string starts with X, contains YZ and ends with P.
Is there a simple algorithm to do this? I'm unsure about using regex (though it is a possibility).
To clarify, the users will type in the above to a filter box (as simple a filter as possible), I don't want them to have to write regular expressions themselves. So something I can easily transform from the above notation would be good.
Often, wild cards operate with two type of jokers:
? - any character (one and only one)
* - any characters (zero or more)
so you can easily convert these rules into appropriate regular expression:
// If you want to implement both "*" and "?"
private static String WildCardToRegular(String value) {
return "^" + Regex.Escape(value).Replace("\\?", ".").Replace("\\*", ".*") + "$";
}
// If you want to implement "*" only
private static String WildCardToRegular(String value) {
return "^" + Regex.Escape(value).Replace("\\*", ".*") + "$";
}
And then you can use Regex as usual:
String test = "Some Data X";
Boolean endsWithEx = Regex.IsMatch(test, WildCardToRegular("*X"));
Boolean startsWithS = Regex.IsMatch(test, WildCardToRegular("S*"));
Boolean containsD = Regex.IsMatch(test, WildCardToRegular("*D*"));
// Starts with S, ends with X, contains "me" and "a" (in that order)
Boolean complex = Regex.IsMatch(test, WildCardToRegular("S*me*a*X"));
You could use the VB.NET Like-Operator:
string text = "x is not the same as X and yz not the same as YZ";
bool contains = LikeOperator.LikeString(text,"*X*YZ*", Microsoft.VisualBasic.CompareMethod.Binary);
Use CompareMethod.Text if you want to ignore the case.
You need to add using Microsoft.VisualBasic.CompilerServices; and add a reference to the Microsoft.VisualBasic.dll.
Since it's part of the .NET framework and will always be, it's not a problem to use this class.
For those using .NET Core 2.1+ or .NET 5+, you can use the FileSystemName.MatchesSimpleExpression method in the System.IO.Enumeration namespace.
string text = "X is a string with ZY in the middle and at the end is P";
bool isMatch = FileSystemName.MatchesSimpleExpression("X*ZY*P", text);
Both parameters are actually ReadOnlySpan<char> but you can use string arguments too. There's also an overloaded method if you want to turn on/off case matching. It is case insensitive by default as Chris mentioned in the comments.
Using of WildcardPattern from System.Management.Automation may be an option.
pattern = new WildcardPattern(patternString);
pattern.IsMatch(stringToMatch);
Visual Studio UI may not allow you to add System.Management.Automation assembly to References of your project. Feel free to add it manually, as described here.
A wildcard * can be translated as .* or .*? regex pattern.
You might need to use a singleline mode to match newline symbols, and in this case, you can use (?s) as part of the regex pattern.
You can set it for the whole or part of the pattern:
X* = > #"X(?s:.*)"
*X = > #"(?s:.*)X"
*X* = > #"(?s).*X.*"
*X*YZ* = > #"(?s).*X.*YZ.*"
X*YZ*P = > #"(?s:X.*YZ.*P)"
*X*YZ* = string contains X and contains YZ
#".*X.*YZ"
X*YZ*P = string starts with X, contains YZ and ends with P.
#"^X.*YZ.*P$"
It is necessary to take into consideration, that Regex IsMatch gives true with XYZ, when checking match with Y*. To avoid it, I use "^" anchor
isMatch(str1, "^" + str2.Replace("*", ".*?"));
So, full code to solve your problem is
bool isMatchStr(string str1, string str2)
{
string s1 = str1.Replace("*", ".*?");
string s2 = str2.Replace("*", ".*?");
bool r1 = Regex.IsMatch(s1, "^" + s2);
bool r2 = Regex.IsMatch(s2, "^" + s1);
return r1 || r2;
}
This is kind of an improvement on the popular answer from #Dmitry Bychenko above (https://stackoverflow.com/a/30300521/4491768). In order to support ? and * as a matching characters we have to escape them. Use \\? or \\* to escape them.
Also a pre compiled regex will improve the performance (on reuse).
public class WildcardPattern
{
private readonly string _expression;
private readonly Regex _regex;
public WildcardPattern(string pattern)
{
if (string.IsNullOrEmpty(pattern)) throw new ArgumentNullException(nameof(pattern));
_expression = "^" + Regex.Escape(pattern)
.Replace("\\\\\\?","??").Replace("\\?", ".").Replace("??","\\?")
.Replace("\\\\\\*","**").Replace("\\*", ".*").Replace("**","\\*") + "$";
_regex = new Regex(_expression, RegexOptions.Compiled);
}
public bool IsMatch(string value)
{
return _regex.IsMatch(value);
}
}
usage
new WildcardPattern("Hello *\\**\\?").IsMatch("Hello W*rld?");
new WildcardPattern(#"Hello *\**\?").IsMatch("Hello W*rld?");
To support those one with C#+Excel (for partial known WS name) but not only - here's my code with wildcard (ddd*).
Briefly: the code gets all WS names and if today's weekday(ddd) matches the first 3 letters of WS name (bool=true) then it turn it to string that gets extracted out of the loop.
using System;
using Microsoft.Office.Interop.Excel;
using System.Runtime.InteropServices;
using Range = Microsoft.Office.Interop.Excel.Range;
using System.Diagnostics;
using System.Reflection;
using System.IO;
using System.Text.RegularExpressions;
...
string weekDay = DateTime.Now.ToString("ddd*");
Workbook sourceWorkbook4 = xlApp.Workbooks.Open(LrsIdWorkbook, 0, false, 5, "", "", true, XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
Workbook destinationWorkbook = xlApp.Workbooks.Open(masterWB, 0, false, 5, "", "", true, XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
static String WildCardToRegular(String value)
{
return "^" + Regex.Escape(value).Replace("\\*", ".*") + "$";
}
string wsName = null;
foreach (Worksheet works in sourceWorkbook4.Worksheets)
{
Boolean startsWithddd = Regex.IsMatch(works.Name, WildCardToRegular(weekDay + "*"));
if (startsWithddd == true)
{
wsName = works.Name.ToString();
}
}
Worksheet sourceWorksheet4 = (Worksheet)sourceWorkbook4.Worksheets.get_Item(wsName);
...
public class Wildcard
{
private readonly string _pattern;
public Wildcard(string pattern)
{
_pattern = pattern;
}
public static bool Match(string value, string pattern)
{
int start = -1;
int end = -1;
return Match(value, pattern, ref start, ref end);
}
public static bool Match(string value, string pattern, char[] toLowerTable)
{
int start = -1;
int end = -1;
return Match(value, pattern, ref start, ref end, toLowerTable);
}
public static bool Match(string value, string pattern, ref int start, ref int end)
{
return new Wildcard(pattern).IsMatch(value, ref start, ref end);
}
public static bool Match(string value, string pattern, ref int start, ref int end, char[] toLowerTable)
{
return new Wildcard(pattern).IsMatch(value, ref start, ref end, toLowerTable);
}
public bool IsMatch(string str)
{
int start = -1;
int end = -1;
return IsMatch(str, ref start, ref end);
}
public bool IsMatch(string str, char[] toLowerTable)
{
int start = -1;
int end = -1;
return IsMatch(str, ref start, ref end, toLowerTable);
}
public bool IsMatch(string str, ref int start, ref int end)
{
if (_pattern.Length == 0) return false;
int pindex = 0;
int sindex = 0;
int pattern_len = _pattern.Length;
int str_len = str.Length;
start = -1;
while (true)
{
bool star = false;
if (_pattern[pindex] == '*')
{
star = true;
do
{
pindex++;
}
while (pindex < pattern_len && _pattern[pindex] == '*');
}
end = sindex;
int i;
while (true)
{
int si = 0;
bool breakLoops = false;
for (i = 0; pindex + i < pattern_len && _pattern[pindex + i] != '*'; i++)
{
si = sindex + i;
if (si == str_len)
{
return false;
}
if (str[si] == _pattern[pindex + i])
{
continue;
}
if (si == str_len)
{
return false;
}
if (_pattern[pindex + i] == '?' && str[si] != '.')
{
continue;
}
breakLoops = true;
break;
}
if (breakLoops)
{
if (!star)
{
return false;
}
sindex++;
if (si == str_len)
{
return false;
}
}
else
{
if (start == -1)
{
start = sindex;
}
if (pindex + i < pattern_len && _pattern[pindex + i] == '*')
{
break;
}
if (sindex + i == str_len)
{
if (end <= start)
{
end = str_len;
}
return true;
}
if (i != 0 && _pattern[pindex + i - 1] == '*')
{
return true;
}
if (!star)
{
return false;
}
sindex++;
}
}
sindex += i;
pindex += i;
if (start == -1)
{
start = sindex;
}
}
}
public bool IsMatch(string str, ref int start, ref int end, char[] toLowerTable)
{
if (_pattern.Length == 0) return false;
int pindex = 0;
int sindex = 0;
int pattern_len = _pattern.Length;
int str_len = str.Length;
start = -1;
while (true)
{
bool star = false;
if (_pattern[pindex] == '*')
{
star = true;
do
{
pindex++;
}
while (pindex < pattern_len && _pattern[pindex] == '*');
}
end = sindex;
int i;
while (true)
{
int si = 0;
bool breakLoops = false;
for (i = 0; pindex + i < pattern_len && _pattern[pindex + i] != '*'; i++)
{
si = sindex + i;
if (si == str_len)
{
return false;
}
char c = toLowerTable[str[si]];
if (c == _pattern[pindex + i])
{
continue;
}
if (si == str_len)
{
return false;
}
if (_pattern[pindex + i] == '?' && c != '.')
{
continue;
}
breakLoops = true;
break;
}
if (breakLoops)
{
if (!star)
{
return false;
}
sindex++;
if (si == str_len)
{
return false;
}
}
else
{
if (start == -1)
{
start = sindex;
}
if (pindex + i < pattern_len && _pattern[pindex + i] == '*')
{
break;
}
if (sindex + i == str_len)
{
if (end <= start)
{
end = str_len;
}
return true;
}
if (i != 0 && _pattern[pindex + i - 1] == '*')
{
return true;
}
if (!star)
{
return false;
}
sindex++;
continue;
}
}
sindex += i;
pindex += i;
if (start == -1)
{
start = sindex;
}
}
}
}
C# Console application sample
Command line Sample:
C:/> App_Exe -Opy PythonFile.py 1 2 3
Console output:
Argument list: -Opy PythonFile.py 1 2 3
Found python filename: PythonFile.py
using System;
using System.Text.RegularExpressions; //Regex
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
string cmdLine = String.Join(" ", args);
bool bFileExtFlag = false;
int argIndex = 0;
Regex regex;
foreach (string s in args)
{
//Search for the 1st occurrence of the "*.py" pattern
regex = new Regex(#"(?s:.*)\056py", RegexOptions.IgnoreCase);
bFileExtFlag = regex.IsMatch(s);
if (bFileExtFlag == true)
break;
argIndex++;
};
Console.WriteLine("Argument list: " + cmdLine);
if (bFileExtFlag == true)
Console.WriteLine("Found python filename: " + args[argIndex]);
else
Console.WriteLine("Python file with extension <.py> not found!");
}
}
}

Right of a character in C#

How do I get right(string) in C#?
if user="MyDomain\jKing"
I want just jking from the above string.
int index;
string user;
index = User.Identity.Name.IndexOf("\\");
user = (index > 0 ? User.Identity.Name.Substring(0, index) : "");
string user = User.Identity.Name
user= user.Remove(0, user.IndexOf(#"\")+ 1);
var user = User.Identity.Name;
var index = user.IndexOf("\\");
if (index < 0 || index == user.Length - 1)
{
user = string.Empty;
}
else
{
user = user.Substring(index + 1);
}
User.Identity.Name.Split(#"\")[1]
Might as well make it re-usable. Here are some extension methods patterned after the XLST functions substring-before and substring-after to make it generic.
Usage: var userNm = (User.Identity.Name.Contains(#"\") ? User.Identity.Name.SubstringAfter(#"\") : User.Identity.Name);
public static class StringExt {
public static string SubstringAfter(this string s, string searchString) {
if (String.IsNullOrEmpty(searchString)) return s;
var idx = s.IndexOf(searchString);
return (idx < 0 ? "" : s.Substring(idx + searchString.Length));
}
public static string SubstringBefore(this string s, string searchString) {
if (String.IsNullOrEmpty(searchString)) return s;
var idx = s.IndexOf(searchString);
return (idx < 0 ? "" : s.Substring(0, idx));
}
}

split a string with max character limit

I'm attempting to split a string into many strings (List) with each one having a maximum limit of characters. So say if I had a string of 500 characters, and I want each string to have a max of 75, there would be 7 strings, and the last one would not have a full 75.
I've tried some of the examples I have found on stackoverflow, but they 'truncate' the results. Any ideas?
You can write your own extension method to do something like that
static class StringExtensions
{
public static IEnumerable<string> SplitOnLength(this string input, int length)
{
int index = 0;
while (index < input.Length)
{
if (index + length < input.Length)
yield return input.Substring(index, length);
else
yield return input.Substring(index);
index += length;
}
}
}
And then you could call it like this
string temp = new string('#', 500);
string[] array = temp.SplitOnLength(75).ToArray();
foreach (string x in array)
Console.WriteLine(x);
I think this is a little cleaner than the other answers:
public static IEnumerable<string> SplitByLength(string s, int length)
{
while (s.Length > length)
{
yield return s.Substring(0, length);
s = s.Substring(length);
}
if (s.Length > 0) yield return s;
}
I would tackle this with a loop using C# String.Substring method.
Note that this isn't exact code, but you get the idea.
var myString = "hello world";
List<string> list = new List();
int maxSize
while(index < myString.Length())
{
if(index + maxSize > myString.Length())
{
// handle last case
list.Add(myString.Substring(index));
break;
}
else
{
list.Add(myString.Substring(index,maxSize));
index+= maxSize;
}
}
When you say split, are you referring to the split function? If not, something like this will work:
List<string> list = new List<string>();
string s = "";
int num = 75;
while (s.Length > 0)
{
list.Add(s.Substring(0, num));
s = s.Remove(0, num);
}
i am assuming maybe a delimiter - like space character.
search on the string (instr) until you find the next position of the delimiter.
if that is < your substring length (75) then append to the current substring.
if not, start a new substring.
special case - if there is no delimiter in the entire substring - then you need to define what happens - like add a '-' then continue.
public static string SplitByLength(string s, int length)
{
ArrayList sArrReturn = new ArrayList();
String[] sArr = s.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string sconcat in sArr)
{
if (((String.Join(" ", sArrReturn.ToArray()).Length + sconcat.Length)+1) < length)
sArrReturn.Add(sconcat);
else
break;
}
return String.Join(" ", sArrReturn.ToArray());
}
public static string SplitByLengthOld(string s, int length)
{
try
{
string sret = string.Empty;
String[] sArr = s.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string sconcat in sArr)
{
if ((sret.Length + sconcat.Length + 1) < length)
sret = string.Format("{0}{1}{2}", sret, string.IsNullOrEmpty(sret) ? string.Empty : " ", sconcat);
}
return sret;
}
catch
{
return string.Empty;
}
}

Categories