c# Lambda, LINQ .... improve this method - c#

I am in the process of learning more about LINQ and Lambda expressions but at this stage, I simply don't "Get" Lambda expressions.
Yes ... I am a newbie to these new concepts.
I mean, every example I see illustrates how to add or subtract two parameters.
What about something a little more complex?
To help me gain a better understanding I have posted a small challenge for anyone who wishes to participate. I have the following method which will take any string and will put spaces in between any upper case characters and their preceding neighbour (as shown below).
i.e.
"SampleText" = "Sample Text"
"DoesNotMatterHowManyWords" = "Does Not Matter How Many Words"
Here is the code;
public static string ProperSpace(string text)
{
var sb = new StringBuilder();
var lowered = text.ToLower();
for (var i = 0; i < text.Length; i++)
{
var a = text.Substring(i, 1);
var b = lowered.Substring(i, 1);
if (a != b) sb.Append(" ");
sb.Append(a);
}
return sb.ToString().Trim();
}
I am sure that the method above can be re-written to use with LINQ or a Lambda expression. I am hoping that this exercise will help open my eyes to these new concepts.
Also, if you have any good links to LINQ or Lambda tutorials, please provide.
EDIT
Thanks to everyone who has contributed. Although the current method does do the job, I am happy to see it can be modified to utilize a lambda expression. I also acknowledge that this is perhaps not the best example for LINQ.
Here is the newly updated method using a Lambda expression (tested to work);
public static string ProperSpace(string text)
{
return text.Aggregate(new StringBuilder(), (sb, c) =>
{
if (Char.IsUpper(c)) sb.Append(" ");
sb.Append(c);
return sb;
}).ToString().Trim();
}
I also appreciate the many links to other (similar) topics.
In particular this topic which is so true.

This is doing the same as the original code and even avoids the generation of the second (lower case) string.
var result = text.Aggregate(new StringBuilder(),
(sb, c) => (Char.IsUpper(c) ? sb.Append(' ') : sb).Append(c));

Personally, I think your method is simple and clear, and I would stick with it (I think I might have even written the exact same code somewhere along the lines).
UPDATE:
How about this as a starting point?
public IEnumerable<char> MakeNice(IEnumerable<char> str)
{
foreach (var chr in str)
{
if (char.IsUpper(chr)) // ToUpper(chr) == chr would also be true for digits and punctuation
{
yield return ' ';
}
yield return chr;
}
}
public string MakeNiceString(string str)
{
return new string(MakeNice(str).ToArray()).Trim(); // string has no IEnumerable<char> constructor; needs System.Linq
}

Like leppie, I'm not sure this is a good candidate for LINQ. You could force it, of course, but that wouldn't be a useful example. A minor tweak would be to compare text[i] against lowered[i] to avoid some unnecessary strings, and maybe initialize sb as new StringBuilder(text.Length) (or a slightly higher capacity):
if (text[i] != lowered[i]) sb.Append(' ');
sb.Append(text[i]);
Other than that - I'd leave it alone;
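For reference, here is how the whole method might look with both tweaks applied. This is only a sketch: the class name ProperSpaceLoop and the +8 capacity headroom are my own assumptions, not something from the question.

```csharp
using System.Text;

public static class ProperSpaceLoop
{
    // Same behaviour as the original loop, but compares chars directly
    // instead of allocating one-character substrings.
    public static string ProperSpace(string text)
    {
        // Pre-size the builder; the +8 headroom for inserted spaces is an arbitrary guess.
        var sb = new StringBuilder(text.Length + 8);
        var lowered = text.ToLower();
        for (var i = 0; i < text.Length; i++)
        {
            // A char that differs from its lower-cased form is upper case.
            if (text[i] != lowered[i]) sb.Append(' ');
            sb.Append(text[i]);
        }
        return sb.ToString().Trim();
    }
}
```

Calling ProperSpaceLoop.ProperSpace("SampleText") gives "Sample Text", the same as the original method.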

public static string ProperSpace(string text)
{
return text.Aggregate(new StringBuilder(), (sb, c) =>
{
if (Char.IsUpper(c) && sb.Length > 0)
sb.Append(" ");
sb.Append(c);
return sb;
}).ToString();
}

I would use RegularExpressions for this case.
public static string ProperSpace(string text)
{
var expression = new Regex("[A-Z]");
return expression.Replace(text, " $0");
}
If you want to use a lambda you could use:
public static string ManipulateString(string text, Func<string, string> manipulator)
{
return manipulator(text);
}
// then
var expression = new Regex("[A-Z]");
ManipulateString("DoesNotMatterHowManyWords", s => expression.Replace(s, " $0"));
Which is essentially the same as using an anonymous delegate of
var expression = new Regex("[A-Z]");
ManipulateString("DoesNotMatterHowManyWords", delegate(string s) {
return expression.Replace(s, " $0");
});

Here is a way of doing it:
string.Join("", text.Select((c, i) => (i > 0 && char.IsUpper(c)) ? " " + c : c.ToString()).ToArray());
But I don't see where the improvement is. Just check this very recent question...
EDIT: For those who are wondering: yes, I intentionally picked an ugly solution.

I've got a Regex solution that's only 8 times slower than your current loop[1], and also harder to read than your solution[2].
return Regex.Replace(text, @"(\P{Lu})(\p{Lu})", "$1 $2");
It matches unicode character groups, in this case non-uppercase followed by an uppercase, and then adds a space between them. This solution works better than other regex-based solutions that only look for [A-Z].
[1] With reservations that my quickly made up test may suck.
[2] Anyone actually know the unicode character groups without googling? ;)
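To make the difference from the [A-Z] approach concrete, here is a small sketch that puts the two patterns side by side. The class and method names are made up for illustration, and the sample word is written with \u escapes so that the file encoding does not matter.

```csharp
using System.Text.RegularExpressions;

public static class ProperSpaceRegex
{
    // \P{Lu} matches any character that is NOT an uppercase letter,
    // \p{Lu} matches any uppercase letter (Unicode category Lu).
    public static string Unicode(string text) =>
        Regex.Replace(text, @"(\P{Lu})(\p{Lu})", "$1 $2");

    // ASCII-only variant as in the other regex answers; it puts a space
    // before every A-Z, so the leading space has to be trimmed afterwards.
    public static string AsciiOnly(string text) =>
        Regex.Replace(text, "[A-Z]", " $0").Trim();
}
```

Both give "Sample Text" for "SampleText". For "\u00C9cole\u00C9l\u00E8ve" (École + Élève run together) only the Unicode version inserts the space, because É is not in [A-Z].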

You can use existing LINQ functions to make this work but it's probably not the best approach. The following LINQ expression would work but is inefficient because it generates a lot of extra strings:
public static string ProperCase(string text)
{
return text.Aggregate(
string.Empty,
(acc, c) => Char.ToLower(c) != c ? acc + " " + c.ToString() : acc + c.ToString())
.Trim();
}

For the usefulness of LINQ (if you need convincing), you could check out this question.
I think one first step is to get used to the dot syntax, and only then move on to the 'sql' syntax. Otherwise it just hurts your eyes to start with. I do wonder whether Microsoft didn't slow uptake of linq by pushing the sql syntax, which made a lot of people think 'yuck, DB code in my C#'.
As for lambdas, try doing some code with anonymous delegates first, because if you haven't done that, you won't really understand what the fuss is all about.
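To illustrate that progression, here is the same filter written three ways: with an anonymous delegate, with a lambda in dot syntax, and in the 'sql' query syntax. The class and method names are invented for this sketch; all three return the same result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SyntaxComparison
{
    // 1. Anonymous delegate (C# 2.0 style): verbose, but shows what a lambda really is.
    public static List<char> UpperWithDelegate(string text)
    {
        Func<char, bool> isUpper = delegate(char c) { return char.IsUpper(c); };
        return text.Where(isUpper).ToList();
    }

    // 2. Lambda with dot (method) syntax: the same delegate, written inline.
    public static List<char> UpperWithLambda(string text)
    {
        return text.Where(c => char.IsUpper(c)).ToList();
    }

    // 3. Query ('sql') syntax: the compiler translates this into the Where call above.
    public static List<char> UpperWithQuery(string text)
    {
        return (from c in text where char.IsUpper(c) select c).ToList();
    }
}
```

For "SampleText" each method returns the characters 'S' and 'T'.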

I'm curious why a simple regular expression replace wouldn't suffice. I wrote one for someone else that does exactly this:
"[AI](?![A-Z]{2,})[a-z]*|[A-Z][a-z]+|[A-Z]{2,}(?=[A-Z]|$)"
I already posted this on another bulletin board here: http://bytes.com/topic/c-sharp/answers/864056-string-manupulation-net-c. There's one bug that requires a post-regex trim that I haven't had the opportunity to address yet, but maybe someone else can post a fix for that.
Using the replace pattern: "$0[space]" where you replace [space] with an actual space would cut the code down immensely.
It handles some special cases which might be outside the scope of what you're trying to do but the bulletin board thread will give you the info on those.
Edit: P.S. A great way to start learning some of the applications of LINQ is to check out the GOLF and CODE-GOLF tags and look for the C# posts. There's a bunch of different and more complex uses of LINQ-to-Objects which should help you to recognise some of the more useful(?) and amusing applications of this technology.

Have you ever thought of using the Aggregate function ...
For instance, let’s say I have an array called routes and I want to set all the Active fields to false. This can be done as follows:
routes.Aggregate(false, (value, route) => route.Active = false);
- Routes is the name of the table.
- The first false is simply the seed value and needs to be the same type as the value that is being set. It’s kind of… redundant.
- value is also redundant and is basically the first value.
- route is the aggregate value (each individual element from the sequence)
No more redundant foreach loops…
I don't know lambda expressions all that well either... but I'm sure there is a genius out there somewhere that can abuse this to do that...

Related

String search in C# somewhat similar to LIKE operator in say VB

I am aware this question has been asked, and I am not really looking for a function to do so. I was hoping to get some tips on making a little method I made better. Basically, take a long string, and search for a smaller string inside of it. I am aware that there are always a million ways to do things better, and that is what brought me here.
Please take a look at the code snippet, and let me know what you think. No, it's not very complex; yes, it does work for my needs. But I am more interested in learning where the pain points would be if I used it for something I assume it would work for, but where it would fail for such and such reason. I hope that makes sense. But to give this question a way to be answered for SO: is this a strong way to perform this task? (I somewhat know the answer :) )
Super interested in constructive criticism, not just in "that's bad". I implore you to elaborate on such a thought so I can get the most out of the responses.
public static Boolean FindTextInString(string strTextToSearch, string strTextToLookFor)
{
//put the string to search into lower case
string strTextToSearchLower = strTextToSearch.ToLower();
//put the text to look for to lower case
string strTextToLookForLower = strTextToLookFor.ToLower();
//get the length of both of the strings
int intTextToLookForLength = strTextToLookForLower.Length;
int intTextToSearch = strTextToSearchLower.Length;
//loop through the division amount so we can check each part of the search text
for(int i = 0; i < intTextToSearch; i++)
{
//substring at multiple positions and see if it can be found
if (strTextToSearchLower.Substring(i,intTextToLookForLength) == strTextToLookForLower)
{
//return true if we found a matching string within the search in text
return true;
}
}
//otherwise we will return false
return false;
}
If you only care about finding a substring inside a string, just use String.Contains()
Example:
string string_to_search = "the cat jumped onto the table";
string string_to_find = "jumped onto";
return string_to_search.ToLower().Contains(string_to_find.ToLower());
You can reuse VB's Like operator this way:
1) Make a reference to Microsoft.VisualBasic.dll library.
2) Use the following code.
using Microsoft.VisualBasic;
using Microsoft.VisualBasic.CompilerServices;
if (LikeOperator.LikeString(Source: "11", Pattern: "11*", CompareOption: CompareMethod.Text))
{
// Your code here...
}
To implement your function in a case-insensitive way, it may be more appropriate to use IndexOf instead of the combination of two ToLower() calls with Contains. This is both because ToLower() will generate a new string, and because of the Turkish İ Problem.
Something like the following should do the trick, where it returns False if either term is null, otherwise uses a case-insensitive IndexOf call to determine if the search term exists in the source string:
public static bool SourceContainsSearch(string source, string search)
{
return search != null &&
source?.IndexOf(search, StringComparison.OrdinalIgnoreCase) > -1;
}
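A quick sketch of how that method behaves for a few inputs. The method body is copied from above; the wrapper class name SearchSketch and the sample strings are only there to make the snippet self-contained.

```csharp
using System;

public static class SearchSketch
{
    // Case-insensitive "contains" without allocating lower-cased copies.
    public static bool SourceContainsSearch(string source, string search)
    {
        // source?.IndexOf(...) is an int?; comparing null with > -1 yields false.
        return search != null &&
               source?.IndexOf(search, StringComparison.OrdinalIgnoreCase) > -1;
    }
}
```

With this, SourceContainsSearch("the cat jumped onto the table", "JUMPED ONTO") is true, while a null source or a null search term gives false instead of throwing.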

Is there a better (faster, more convenient) way to concatenate multiple (nullable) objects?

I've ended up writing my own helper-class to concatenate objects: ConcatHelper.cs.
You see some examples in the gist, but also in the following snippet:
model.Summary = new ConcatHelper(", ")
.Concat(diploma.CodeProfession /* can be any object, will be null checked and ToString() called */)
.BraceStart() // if everything between the braces is null or empty, the braces will not appear
.Concat(diploma.CodeDiplomaType)
.Concat(label: DiplomaMessage.SrkRegisterId, labelSeparator: " ", valueDecorator: string.Empty, valueToAdd: diploma.SrkRegisterId)
.BraceEnd()
.Concat(diploma.CodeCountry)
.BraceStart()
.Concat(diploma.DateOfIssue?.Year.ToString(CultureInfo.InvariantCulture)) // no separator will be added if concatenated string is null or empty (no ", ,")
.BraceEnd()
.Concat(DiplomaMessage.Recognition, " ", string.Empty, diploma.DateOfRecognition?.Year.ToString(CultureInfo.InvariantCulture))
.ToString(); // results in something like: Drogist / Drogistin (Eidgenössischer Abschluss, SRK-Registrierungsnummer 3099-239), Irland (1991)
Benefits:
Does the null checks for you, avoids if/else branches.
Supports labeling, decorating and delimiting values. Doesn't add a label if the value will be null.
Joins everything, fluent notation - less code
Good to do summaries of domain-objects.
Contra:
Rather slow:
I measured 7ms for the above example
I measured 0.01026ms per concatenation in a real-life example (see unit-test gist)
It's not static (could it be?)
Needs a list to keep track of everything.
Probably an overkill.
So as I am now starting to override a lot of ToString() methods of domain objects, I am unsure, if there is a better way.
By better I basically mean:
Is there a library that already does the stuff I need?
If not, can it be sped up without losing the convenient fluent notation?
So I would be happy if you show me either a convenient way to achieve the same result without my helper, or helping me improving this class.
Greetings,
flo
Update:
Look at this gist for a real-life UnitTest.
I do not see any real problem with your code. But I would prefer a more streamlined syntax. It may look like this in the end:
string result = ConcatHelper.Concat(
diploma.CodeProfession,
new Brace(
diploma.CodeDiplomaType,
new LabeledItem(label: DiplomaMessage.SrkRegisterId, labelSeparator: " ",
valueDecorator: string.Empty, valueToAdd: diploma.SrkRegisterId)
),
diploma.CodeCountry,
new Brace(
diploma.DateOfIssue?.Year.ToString(CultureInfo.InvariantCulture)
),
DiplomaMessage.Recognition
).ToString();
no wall of Text
you do not have to repeat Concat over and over again
no chance to mix up the braces
Concat() would be of the type static ConcatHelper Concat(params object[] objs) in this case. Brace and LabeledItem need to be handled by ConcatHelper of course (if (obj is LabeledItem) { ... }).
Regarding your contras:
It should be fast enough (10us/ call should be okay). If you really need it faster, you probably should use a single String.Format()
Concat can be static. Just create the ConcatHelper-object inside the Concat-call.
Yes, it needs a list. Is there a problem?
It may be overkill, it may not. If you use this type of code regularly, Utility classes can save you much time and make the code more readable.
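A rough sketch of what such a static, params-based Concat could look like. Everything here is an assumption made for illustration: the class name ConcatSketch, the simplified Brace and LabeledItem types (no decorators or configurable label separator), and returning a plain string instead of a ConcatHelper. It is not the code from the gist.

```csharp
using System.Collections.Generic;

// Hypothetical marker type: a group of items that is rendered in parentheses.
public sealed class Brace
{
    public object[] Items { get; }
    public Brace(params object[] items) { Items = items ?? new object[0]; }
}

// Hypothetical marker type: a value with a label that is dropped when the value is empty.
public sealed class LabeledItem
{
    public string Label { get; }
    public object Value { get; }
    public LabeledItem(string label, object value) { Label = label; Value = value; }
}

public static class ConcatSketch
{
    public static string Concat(string separator, params object[] items)
    {
        var parts = new List<string>();
        foreach (var item in items)
        {
            var text = Render(separator, item);
            if (string.IsNullOrEmpty(text)) continue; // nulls and empty groups vanish

            // A braced group attaches to the previous part with a space, not the separator.
            if (item is Brace && parts.Count > 0)
                parts[parts.Count - 1] += " " + text;
            else
                parts.Add(text);
        }
        return string.Join(separator, parts);
    }

    private static string Render(string separator, object item)
    {
        switch (item)
        {
            case null:
                return null;
            case Brace brace:
                var inner = Concat(separator, brace.Items);
                return inner.Length == 0 ? null : "(" + inner + ")";
            case LabeledItem labeled:
                var value = labeled.Value?.ToString();
                return string.IsNullOrEmpty(value) ? null : labeled.Label + " " + value;
            default:
                return item.ToString();
        }
    }
}
```

For example, ConcatSketch.Concat(", ", "Drogist", new Brace("Abschluss", new LabeledItem("SRK", "3099-239")), "Irland", new Brace((string)null)) produces "Drogist (Abschluss, SRK 3099-239), Irland". The empty second brace disappears, and the caller never has to pair up BraceStart and BraceEnd.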
Variant 1 - stream like
var sb = new StringBuilder();
const string delimiter = ", ";
var first = true;
Action<object> append = _ => {
if(null!=_){
if(!first){ sb.Append(delimiter);}
first = false;
sb.Append(_.ToString());
}
};
append(diploma.X);
append(diploma.Y);
...
Another one - with collection
var data = new List<object>();
data.Add(diploma.X);
data.Add(diploma.Y);
...
var result = string.Join(", ",data.Where(_=>null!=_).Select(_=>_.ToString()));
It's not very efficient, but it allows an additional step between data preparation and joining to do something with the collection itself.

Using LINQ in a string array to improve efficient C#

I have an equation string and when I split it with my pattern I get the following string array.
string[] equationList = {"code1","+","code2","-","code3"};
Then from this I create a list which only contains the codes.
var codeList = new List<string> {"code1","code2","code3"};
Then the existing code loops through the codeList, retrieves the value of each code, and replaces the code in the equationList with that value, using the code below.
foreach (var code in codeList ){
var codeVal = GetCodeValue(code);
for (var i = 0; i < equationList.Length; i++){
if (!equationList[i].Equals(code,StringComparison.InvariantCultureIgnoreCase)) continue;
equationList[i] = codeVal;
break;
}
}
I am trying to improve the efficiency and I believe I can get rid of the for loop within the foreach by using linq.
My question is would it be any better if I do in terms of speeding up the process?
If yes then can you please help with the linq statement?
Before jumping to LINQ... which doesn't solve any problems you've described, let's look at the logic you have here.
We split a string with a 'pattern'. How?
We then create a new list of codes. How?
We then loop through those codes and decode them. How?
But since we forgot to keep track of where those code came from, we now loop through the equationList (which is an array, not a List<T>) to substitute the results.
Seems a little convoluted to me.
Maybe a simpler solution would be:
Take in a string, and return IEnumerable<string> of words (similar to what you do now).
Take in a IEnumerable<string> of words, and return a IEnumerable<?> of values.
That is to say with this second step iterate over the strings, and simply return the value you want to return - rather than trying to extract certain values out, parsing them, and then inserting them back into a collection.
//Ideally we return something more specific eg, IEnumerable<Tokens>
public IEnumerable<string> ParseEquation(IEnumerable<string> words)
{
foreach (var word in words)
{
if (IsOperator(word)) yield return ToOperator(word);
else if (IsCode(word)) yield return ToCode(word);
else ...;
}
}
This is quite similar to the LINQ Select Statement... if one insisted I would suggest writing something like so:
var tokens = equationList.Select(ToToken);
...
public Token ToToken(string word)
{
if (IsOperator(word)) return ToOperator(word);
else if (IsCode(word)) return ToCode(word);
else ...;
}
If GetCodeValue(code) doesn't already, I suggest it probably could use some sort of caching/dictionary in its implementation - though the specifics dictate this.
The benefits of this approach are that it is flexible (we can easily add more processing steps), simple to follow (we put in these values and get these as a result, no mutating state) and easy to write. It also breaks the problem down into nice little chunks that solve their own task, which will help immensely when trying to refactor, or find niggly bugs/performance issues.
If your array always alternates code then operator, this LINQ should do what you want:
string[] equationList = { "code1", "+", "code2", "-", "code3" };
var processedList = equationList.Select((s,j) => (j % 2 == 1) ? s :GetCodeValue(s)).ToArray();
You will need to check if it is faster
I think the fastest solution will be this:
var codeCache = new Dictionary<string, string>();
for (var i = equationList.Length - 1; i >= 0; --i)
{
var item = equationList[i];
if (! < item is valid >) // you know this because you created the codeList
continue;
string codeVal;
if (!codeCache.TryGetValue(item, out codeVal))
{
codeVal = GetCodeValue(item);
codeCache.Add(item, codeVal);
}
equationList[i] = codeVal;
}
You don't need a codeList. If every code is unique you can remove the codeCache.

is String.Contains() faster than walking through whole array of char in string?

I have a function that is walking through the string looking for pattern and changing parts of it. I could optimize it by inserting
if (!text.Contains(pattern)) return;
But I am actually walking through the whole string and comparing parts of it with the pattern, so the question is: how does String.Contains() actually work? I know there was such a question - How does String.Contains work? - but the answer is rather unclear. So, if String.Contains() also walks through the whole array of chars and compares them to the pattern I am looking for, it wouldn't really make my function faster, but slower.
So, is it a good idea to attempt such optimizations? And is it possible for String.Contains() to be even faster than a function that just walks through the whole array and compares every single character with some constant one?
Here is the code:
public static char colorchar = (char)3;
public static Client.RichTBox.ContentText color(string text, Client.RichTBox SBAB)
{
if (text.Contains(colorchar.ToString()))
{
int color = 0;
bool closed = false;
int position = 0;
while (text.Length > position)
{
if (text[position] == colorchar)
{
if (closed)
{
text = text.Substring(position, text.Length - position);
Client.RichTBox.ContentText Link = new Client.RichTBox.ContentText(ProtocolIrc.decode_text(text), SBAB, Configuration.CurrentSkin.mrcl[color]);
return Link;
}
if (!closed)
{
if (!int.TryParse(text[position + 1].ToString() + text[position + 2].ToString(), out color))
{
if (!int.TryParse(text[position + 1].ToString(), out color))
{
color = 0;
}
}
if (color > 9)
{
text = text.Remove(position, 3);
}
else
{
text = text.Remove(position, 2);
}
closed = true;
if (color < 16)
{
text = text.Substring(position);
break;
}
}
}
position++;
}
}
return null;
}
Short answer is that your optimization is no optimization at all.
Basically, String.Contains(...) just returns String.IndexOf(..) >= 0
You could improve your algorithm to:
int position = text.IndexOf(colorchar.ToString()...);
if (-1 < position)
{ /* Do it */ }
Yes.
And doesn't have a bug (ahhm...).
There are better ways of looking for multiple substrings in very long texts, but for most common usages String.Contains (or IndexOf) is the best.
Also IIRC the source of String.Contains is available in the .Net shared sources
Oh, and if you want a performance comparison you can just measure for your exact use-case
Check this similar post How does string.contains work
I think that you will not be able to simply do anything faster than String.Contains, unless you want to use standard CRT function wcsstr, available in msvcrt.dll, which is not so easy
Unless you have profiled your application and determined that the line with String.Contains is a bottle-neck, you should not do any such premature optimizations. It is way more important to keep your code's intention clear.
And while there are many ways to implement the methods in the .NET base classes, you should assume the default implementations are optimal enough for most people's use cases. For example, any (future) implementation of .NET might use the x86-specific instructions for string comparisons. That would then always be faster than what you can do in C#.
If you really want to be sure whether your custom string comparison code is faster than String.Contains, you need to measure them both using many iterations, each with a different string. For example using the Stopwatch class to measure the time.
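A minimal sketch of such a measurement, assuming a hand-written single-character scan like the one in the question. ContainsBenchmark, ManualContains and Measure are names I made up; only Stopwatch comes from the framework.

```csharp
using System;
using System.Diagnostics;

public static class ContainsBenchmark
{
    // Hand-written scan for a single character, the kind of loop the question describes.
    public static bool ManualContains(string text, char target)
    {
        for (var i = 0; i < text.Length; i++)
        {
            if (text[i] == target) return true;
        }
        return false;
    }

    // Runs the action the given number of times and returns the elapsed milliseconds.
    public static long Measure(Action action, int iterations)
    {
        var watch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++) action();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

You would call Measure once with () => text.Contains("\u0003") and once with () => ContainsBenchmark.ManualContains(text, (char)3), using the same iteration count and a representative set of strings, and compare the two numbers. Run each a few times first so that JIT compilation does not distort the result.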
If you know details about your data that you can exploit for optimization (beyond a simple contains check), then sure, you can make your method faster than string.Contains; otherwise you can't.

Fastest way to trim a string and convert it to lower case

I've written a class for processing strings and I have the following problem: the string passed in can come with spaces at the beginning and at the end of the string.
I need to trim the spaces from the strings and convert them to lower case letters. My code so far:
var searchStr = wordToSearchReplacemntsFor.ToLower();
searchStr = searchStr.Trim();
I couldn't find any function to help me in StringBuilder. The problem is that this class is supposed to process a lot of strings as quickly as possible. So I don't want to be creating 2 new strings for each string the class processes.
If this isn't possible, I'll go deeper into the processing algorithm.
Try method chaining.
Ex:
var s = " YoUr StRiNg".Trim().ToLower();
Cyberdrew has the right idea. With string being immutable, you'll be allocating memory during both of those calls regardless. One thing I'd like to suggest, if you're going to call string.Trim().ToLower() in many locations in your code, is to simplify your calls with extension methods. For example:
public static class MyExtensions
{
public static string TrimAndLower(this String str)
{
return str.Trim().ToLower();
}
}
Here's my attempt. But before I would check this in, I would ask two very important questions.
Are sequential "String.Trim" and "String.ToLower" calls really impacting the performance of my app? Would anyone notice if this algorithm was twice as slow or twice as fast? The only way to know is to measure the performance of my code and compare against pre-set performance goals. Otherwise, micro-optimizations will generate micro-performance gains.
Just because I wrote an implementation that appears faster, doesn't mean that it really is. The compiler and run-time may have optimizations around common operations that I don't know about. I should compare the running time of my code to what already exists.
static public string TrimAndLower(string str)
{
if (str == null)
{
return null;
}
int i = 0;
int j = str.Length - 1;
StringBuilder sb;
while (i < str.Length)
{
if (Char.IsWhiteSpace(str[i])) // or say "if (str[i] == ' ')" if you only care about spaces
{
i++;
}
else
{
break;
}
}
while (j > i)
{
if (Char.IsWhiteSpace(str[j])) // or say "if (str[j] == ' ')" if you only care about spaces
{
j--;
}
else
{
break;
}
}
if (i > j)
{
return "";
}
sb = new StringBuilder(j - i + 1);
while (i <= j)
{
// I was originally checking IsUpper before calling ToLower, probably not needed
sb.Append(Char.ToLower(str[i]));
i++;
}
return sb.ToString();
}
If the strings use only ASCII characters, you can look at the C# ToLower Optimization. You could also try a lookup table if you know the character set ahead of time
So first of all, trim first and lower-case second, so your ToLower() has to iterate over a smaller string.
Other than that, I think your best algorithm would look like this:
Iterate over the string once, and check
whether there's any upper case characters
whether there's whitespace in beginning and end (and count how many chars you're talking about)
if none of the above, return the original string
if upper case but no whitespace: do ToLower and return
if whitespace:
allocate a new string with the right size (original length - number of white chars)
fill it in while doing the ToLower
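The steps above could be sketched roughly as follows. This is an untested-for-speed illustration, not a benchmarked implementation: the class name TrimLower is invented, it uses ToLowerInvariant rather than the culture-sensitive ToLower from the question, and the char[] buffer is itself a temporary allocation, so measure before assuming it beats Trim().ToLower().

```csharp
using System;

public static class TrimLower
{
    public static string TrimAndLower(string str)
    {
        if (string.IsNullOrEmpty(str)) return str;

        // Find the range [start, end] that remains after trimming whitespace.
        int start = 0, end = str.Length - 1;
        while (start <= end && char.IsWhiteSpace(str[start])) start++;
        while (end >= start && char.IsWhiteSpace(str[end])) end--;
        if (start > end) return string.Empty;

        // Check whether any character in that range changes when lower-cased.
        bool needsLower = false;
        for (int i = start; i <= end; i++)
        {
            if (char.ToLowerInvariant(str[i]) != str[i]) { needsLower = true; break; }
        }

        bool trimmed = start > 0 || end < str.Length - 1;
        if (!needsLower)
        {
            // Nothing to lower-case: return the original, or just the trimmed part.
            return trimmed ? str.Substring(start, end - start + 1) : str;
        }

        // Allocate the result buffer once and fill it while lower-casing.
        var buffer = new char[end - start + 1];
        for (int i = start; i <= end; i++)
        {
            buffer[i - start] = char.ToLowerInvariant(str[i]);
        }
        return new string(buffer);
    }
}
```

The main win is the early exit: a string that is already trimmed and lower case is returned as the same instance, with no allocation at all.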
You can try this:
public static void Main (string[] args) {
var str = "fr, En, gB";
Console.WriteLine(str.Replace(" ","").ToLower());
}
