Tiny way to get the first 25 characters - c#

Can anyone think of a nicer way to do the following:
public string ShortDescription
{
get { return this.Description.Length <= 25 ? this.Description : this.Description.Substring(0, 25) + "..."; }
}
I would have liked to just do string.Substring(0, 25) but it throws an exception if the string is less than the length supplied.

I needed this so often, I wrote an extension method for it:
public static class StringExtensions
{
public static string SafeSubstring(this string input, int startIndex, int length, string suffix)
{
// Todo: Check that startIndex + length does not cause an arithmetic overflow - not that this is likely, but still...
if (input.Length >= (startIndex + length))
{
if (suffix == null) suffix = string.Empty;
return input.Substring(startIndex, length) + suffix;
}
else
{
if (input.Length > startIndex)
{
return input.Substring(startIndex);
}
else
{
return string.Empty;
}
}
}
}
if you only need it once, that is overkill, but if you need it more often then it can come in handy.
Edit: Added support for a string suffix. Pass in "..." and you get your ellipses on shorter strings, or pass in string.Empty for no special suffixes.

return this.Description.Substring(0, Math.Min(this.Description.Length, 25));
Doesn't have the ... part. Your way is probably the best, actually.

public static Take(this string s, int i)
{
if(s.Length <= i)
return s
else
return s.Substring(0, i) + "..."
}
public string ShortDescription
{
get { return this.Description.Take(25); }
}

The way you've done it seems fine to me, with the exception that I would use the magic number 25, I'd have that as a constant.
Do you really want to store this in your bean though? Presumably this is for display somewhere, so your renderer should be the thing doing the truncating instead of the data object

Well I know there's answer accepted already and I may get crucified for throwing out a regular expression here but this is how I usually do it:
//may return more than 25 characters depending on where in the string 25 characters is at
public string ShortDescription(string val)
{
return Regex.Replace(val, #"(.{25})[^\s]*.*","$1...");
}
// stricter version that only returns 25 characters, plus 3 for ...
public string ShortDescriptionStrict(string val)
{
return Regex.Replace(val, #"(.{25}).*","$1...");
}
It has the nice side benefit of not cutting a word in half as it always stops after the first whitespace character past 25 characters. (Of course if you need it to truncate text going into a database, that might be a problem.
Downside, well I'm sure it's not the fastest solution possible.
EDIT: replaced … with "..." since not sure if this solution is for the web!

without .... this should be the shortest :
public string ShortDescription
{
get { return Microsoft.VisualBasic.Left(this.Description;}
}

I think the approach is sound, though I'd recommend a few adjustments
Move the magic number to a const or configuration value
Use a regular if conditional rather than the ternary operator
Use a string.Format("{0}...") rather than + "..."
Have just one return point from the function
So:
public string ShortDescription
{
get
{
const int SHORT_DESCRIPTION_LENGTH = 25;
string _shortDescription = Description;
if (Description.Length > SHORT_DESCRIPTION_LENGTH)
{
_shortDescription = string.Format("{0}...", Description.Substring(0, SHORT_DESCRIPTION_LENGTH));
}
return _shortDescription;
}
}
For a more general approach, you might like to move the logic to an extension method:
public static string ToTruncated(this string s, int truncateAt)
{
string truncated = s;
if (s.Length > truncateAt)
{
truncated = string.Format("{0}...", s.Substring(0, truncateAt));
}
return truncated;
}
Edit
I use the ternary operator extensively, but prefer to avoid it if the code becomes sufficiently verbose that it starts to extend past 120 characters or so. In that case I'd like to wrap it onto multiple lines, so find that a regular if conditional is more readable.
Edit2
For typographical correctness you could also consider using the ellipsis character (…) as opposed to three dots/periods/full stops (...).

One way to do it:
int length = Math.Min(Description.Length, 25);
return Description.Substring(0, length) + "...";
There are two lines instead of one, but shorter ones :).
Edit:
As pointed out in the comments, this gets you the ... all the time, so the answer was wrong. Correcting it means we go back to the original solution.
At this point, I think using string extensions is the only option to shorten the code. And that makes sense only when that code is repeated in at least a few places...

Looks fine to me, being really picky I would replace "..." with the entity reference "…"

I can't think of any but your approach might not be the best. Are you adding presentation logic into your data object? If so then I suggest you put that logic elsewhere, for example a static StringDisplayUtils class with a GetShortStringMethod( int maxCharsToDisplay, string stringToShorten).
However, that approach might not be great either. What about different fonts and character sets? You'd have to start measuring the actual string length in terms of pixels. Check out the AutoEllipsis property on the winform's Label class (you'll prob need to set AutoSize to false if using this). The AutoEllipsis property, when true, will shorten a string and add the '...' chars for you.

I'd stick with what you have tbh, but just as an alternative, if you have LINQ to objects you could
new string(this.Description.ToCharArray().Take(25).ToArray())
//And to maintain the ...
+ (this.Description.Length <= 25 ? String.Empty : "...")
As others have said, you'd likely want to store 25 in a constant

You should see if you can reference the Microsoft.VisualBasic DLL into your app so you can make use of the "Left" function.

Related

is String.Contains() faster than walking through whole array of char in string?

I have a function that is walking through the string looking for pattern and changing parts of it. I could optimize it by inserting
if (!text.Contains(pattern)) return;
But, I am actually walking through the whole string and comparing parts of it with the pattern, so the question is, how String.Contains() actually works? I know there was such a question - How does String.Contains work? but answer is rather unclear. So, if String.Contains() walks through the whole array of chars as well and compare them to pattern I am looking for as well, it wouldn't really make my function faster, but slower.
So, is it a good idea to attempt such an optimizations? And - is it possible for String.Contains() to be even faster than function that just walk through the whole array and compare every single character with some constant one?
Here is the code:
public static char colorchar = (char)3;
public static Client.RichTBox.ContentText color(string text, Client.RichTBox SBAB)
{
if (text.Contains(colorchar.ToString()))
{
int color = 0;
bool closed = false;
int position = 0;
while (text.Length > position)
{
if (text[position] == colorchar)
{
if (closed)
{
text = text.Substring(position, text.Length - position);
Client.RichTBox.ContentText Link = new Client.RichTBox.ContentText(ProtocolIrc.decode_text(text), SBAB, Configuration.CurrentSkin.mrcl[color]);
return Link;
}
if (!closed)
{
if (!int.TryParse(text[position + 1].ToString() + text[position + 2].ToString(), out color))
{
if (!int.TryParse(text[position + 1].ToString(), out color))
{
color = 0;
}
}
if (color > 9)
{
text = text.Remove(position, 3);
}
else
{
text = text.Remove(position, 2);
}
closed = true;
if (color < 16)
{
text = text.Substring(position);
break;
}
}
}
position++;
}
}
return null;
}
Short answer is that your optimization is no optimization at all.
Basically, String.Contains(...) just returns String.IndexOf(..) >= 0
You could improve your alogrithm to:
int position = text.IndexOf(colorchar.ToString()...);
if (-1 < position)
{ /* Do it */ }
Yes.
And doesn't have a bug (ahhm...).
There are better ways of looking for multiple substrings in very long texts, but for most common usages String.Contains (or IndexOf) is the best.
Also IIRC the source of String.Contains is available in the .Net shared sources
Oh, and if you want a performance comparison you can just measure for your exact use-case
Check this similar post How does string.contains work
I think that you will not be able to simply do anything faster than String.Contains, unless you want to use standard CRT function wcsstr, available in msvcrt.dll, which is not so easy
Unless you have profiled your application and determined that the line with String.Contains is a bottle-neck, you should not do any such premature optimizations. It is way more important to keep your code's intention clear.
Ans while there are many ways to implement the methods in the .NET base classes, you should assume the default implementations are optimal enough for most people's use cases. For example, any (future) implementation of .NET might use the x86-specific instructions for string comparisons. That would then always be faster than what you can do in C#.
If you really want to be sure whether your custom string comparison code is faster than String.Contains, you need to measure them both using many iterations, each with a different string. For example using the Stopwatch class to measure the time.
If you now the details which you can use for optimizations (not just simple contains check) sure you can make your method faster than string.Contains, otherwise - not.

Fastest way to trim a string and convert it to lower case

I've written a class for processing strings and I have the following problem: the string passed in can come with spaces at the beginning and at the end of the string.
I need to trim the spaces from the strings and convert them to lower case letters. My code so far:
var searchStr = wordToSearchReplacemntsFor.ToLower();
searchStr = searchStr.Trim();
I couldn't find any function to help me in StringBuilder. The problem is that this class is supposed to process a lot of strings as quickly as possible. So I don't want to be creating 2 new strings for each string the class processes.
If this isn't possible, I'll go deeper into the processing algorithm.
Try method chaining.
Ex:
var s = " YoUr StRiNg".Trim().ToLower();
Cyberdrew has the right idea. With string being immutable, you'll be allocating memory during both of those calls regardless. One thing I'd like to suggest, if you're going to call string.Trim().ToLower() in many locations in your code, is to simplify your calls with extension methods. For example:
public static class MyExtensions
{
public static string TrimAndLower(this String str)
{
return str.Trim().ToLower();
}
}
Here's my attempt. But before I would check this in, I would ask two very important questions.
Are sequential "String.Trim" and "String.ToLower" calls really impacting the performance of my app? Would anyone notice if this algorithm was twice as slow or twice as fast? The only way to know is to measure the performance of my code and compare against pre-set performance goals. Otherwise, micro-optimizations will generate micro-performance gains.
Just because I wrote an implementation that appears faster, doesn't mean that it really is. The compiler and run-time may have optimizations around common operations that I don't know about. I should compare the running time of my code to what already exists.
static public string TrimAndLower(string str)
{
if (str == null)
{
return null;
}
int i = 0;
int j = str.Length - 1;
StringBuilder sb;
while (i < str.Length)
{
if (Char.IsWhiteSpace(str[i])) // or say "if (str[i] == ' ')" if you only care about spaces
{
i++;
}
else
{
break;
}
}
while (j > i)
{
if (Char.IsWhiteSpace(str[j])) // or say "if (str[j] == ' ')" if you only care about spaces
{
j--;
}
else
{
break;
}
}
if (i > j)
{
return "";
}
sb = new StringBuilder(j - i + 1);
while (i <= j)
{
// I was originally check for IsUpper before calling ToLower, probably not needed
sb.Append(Char.ToLower(str[i]));
i++;
}
return sb.ToString();
}
If the strings use only ASCII characters, you can look at the C# ToLower Optimization. You could also try a lookup table if you know the character set ahead of time
So first of all, trim first and replace second, so you have to iterate over a smaller string with your ToLower()
other than that, i think your best algorithm would look like this:
Iterate over the string once, and check
whether there's any upper case characters
whether there's whitespace in beginning and end (and count how many chars you're talking about)
if none of the above, return the original string
if upper case but no whitespace: do ToLower and return
if whitespace:
allocate a new string with the right size (original length - number of white chars)
fill it in while doing the ToLower
You can try this:
public static void Main (string[] args) {
var str = "fr, En, gB";
Console.WriteLine(str.Replace(" ","").ToLower());
}

substring help in c#

I have string qty__c which may or may not have a decimal point
The code below gives me a System.ArgumentOutOfRangeException: Length cannot be less than zero.
qty__c = qty__c.Substring(0, qty__c.IndexOf("."));
How do i cater to if there is no "."?
Thanks
The simplest way is just to test it separately:
int dotIndex = quantity.IndexOf('.');
if (dotIndex != -1)
{
quantity = quantity.Substring(0, dotIndex);
}
There are alternatives though... for example if you really wanted to do it in a single statement, you could either use a conditional operator above, or:
quantity = quantity.Split('.')[0];
or slightly more efficiently:
// C# 4 version
quantity = quantity.Split(new[] {'.'}, 2)[0];
// Pre-C# 4 version
quantity = quantity.Split(new char[] {'.'}, 2)[0];
(The second form effectively stops splitting after finding the first dot.)
Another option would be to use regular expressions.
On the whole, I think the first approach is the most sensible though. If you find you need to do this often, consider writing a method to encapsulate it:
// TODO: Think of a better name :)
public static string SubstringBeforeFirst(string text, string delimiter)
{
int index = text.IndexOf(delimiter);
return index == -1 ? text : text.Substring(0, index);
}
You just have to test if qty__c have a point in it before calling Substring :
var pointPos = qty__c.IndexOf('.');
if (pointPos != -1)
{
qty__c = qty__c.Substring(0, pointPos);
}
Use IndexOf method on that string. If returned value is -1, there is no character that was searched.
int index = mystring.IndexOf('.');
In your code, you are not checking the returned value of IndexOf. In the case where '.' is not present in the string, an exception will be thrown because SubString has been passed -1 as second parameter.
var indexofstring=quantity.Indexof('.');
if(indexofstring!=-1)
{
quantity=quantity.SubString(0,indexofstring);
}
Assuming you are looking for the decimal point in a number, which is at offset 3 both in
'123.456' and '123', then one solution is
var index = (mystring & ".").IndexOf(".")
(Sorry about the VB, I'm not sure of C# syntax)

String cannot contain any part of another string .NET 2.0

I'm looking for a simple way to discern if a string contains any part of another string (be that regex, built in function I don't know about, etc...). For Example:
string a = "unicorn";
string b = "cornholio";
string c = "ornament";
string d = "elephant";
if (a <comparison> b)
{
// match found ("corn" from 'unicorn' matched "corn" from 'cornholio')
}
if (a <comparison> c)
{
// match found ("orn" from 'unicorn' matched "orn" from 'ornament')
}
if (a <comparison> d)
{
// this will not match
}
something like if (a.ContainsAnyPartOf(b)) would be too much to hope for.
Also, I only have access to .NET 2.0.
Thanks in advance!
This method should work. You'll want to specify a minimum length for the "part" that might match. I'd assume you'd want to look for something of at least 2, but with this you can set it as high or low as you want. Note: error checking not included.
public static bool ContainsPartOf(string s1, string s2, int minsize)
{
for (int i = 0; i <= s2.Length - minsize; i++)
{
if (s1.Contains(s2.Substring(i, minsize)))
return true;
}
return false;
}
I think you're looking for this implementation of longest common substring?
Your best bet, according to my understanding of the question, is to compute the Levenshtein (or related values) distance and compare that against a threshold.
Your requirements are a little vague.
You need to define a minimum length for the match...but implementing an algorithm shouldn't be too difficult when you figure that part out.
I'd suggest breaking down the string into character arrays and then using tail recursion to find matches for the parts.

Formatting double as string in C#

I have a Double which could have a value from around 0.000001 to 1,000,000,000.000
I wish to format this number as a string but conditionally depending on its size. So if it's very small I want to format it with something like:
String.Format("{0:.000000000}", number);
if it's not that small, say 0.001 then I want to use something like
String.Format("{0:.00000}", number);
and if it's over, say 1,000 then format it as:
String.Format("{0:.0}", number);
Is there a clever way to construct this format string based on the size of the value I'm going to format?
Use Math.Log10 of the absolute value of the double to figure out how many 0's you need either left (if positive) or right (if negative) of the decimal place. Choose the format string based on this value. You'll need handle zero values separately.
string s;
double epislon = 0.0000001; // or however near zero you want to consider as zero
if (Math.Abs(value) < epislon) {
int digits = Math.Log10( Math.Abs( value ));
// if (digits >= 0) ++digits; // if you care about the exact number
if (digits < -5) {
s = string.Format( "{0:0.000000000}", value );
}
else if (digits < 0) {
s = string.Format( "{0:0.00000})", value );
}
else {
s = string.Format( "{0:#,###,###,##0.000}", value );
}
}
else {
s = "0";
}
Or construct it dynamically based on the number of digits.
Use the # character for optional positions in the string:
string.Format("{0:#,###,##0.000}", number);
I don't think you can control the number of decimal places like that as the precision of the double will likely mess things up.
To encapsulate the logic of deciding how many decimal places to output you could look at creating a custom formatter.
The first two String.Format in your question can be solved by automatically removing trailing zeros:
String.Format("{0:#,##0.########}", number);
And the last one you could solve by calling Math.Round(number,1) for values over 1000 and then use the same String.Format.
Something like:
String.Format("{0:#,##0.########}", number<1000 ? number : Math.Round(number,1));
Following up on OwenP's (and by "extension" tvanfosson):
If it's common enough, and you're on C# 3.0, I'd turn it into an extension method on the double:
class MyExtensions
{
public static string ToFormmatedString(this double d)
{
// Take d and implement tvanfosson's code
}
}
Now anywhere you have a double you can do:
double d = 1.005343;
string d_formatted = d.ToFormattedString();
If it were me, I'd write a custom wrapper class and put tvanfosson's code into its ToString method. That way you could still work with the double value, but you'd get the right string representation in just about all cases. It'd look something like this:
class FormattedDouble
{
public double Value { get; set; }
protected overrides void ToString()
{
// tvanfosson's code to produce the right string
}
}
Maybe it might be better to make it a struct, but I doubt it would make a big difference. You could use the class like this:
var myDouble = new FormattedDouble();
myDouble.Value = Math.Pi;
Console.WriteLine(myDouble);

Categories