I have an equation string, and when I split it with my pattern I get the following string array.
string[] equationList = {"code1","+","code2","-","code3"};
Then from this I create a list which only contains the codes.
List<string> codeList = new List<string> {"code1","code2","code3"};
The existing code then loops through codeList, retrieves the value of each code, and replaces the matching entry in equationList using the code below.
foreach (var code in codeList ){
var codeVal = GetCodeValue(code);
for (var i = 0; i < equationList.Length; i++){
if (!equationList[i].Equals(code,StringComparison.InvariantCultureIgnoreCase)) continue;
equationList[i] = codeVal;
break;
}
}
I am trying to improve the efficiency, and I believe I can get rid of the for loop inside the foreach by using LINQ.
My question is: would doing so actually speed up the process?
If yes, can you please help with the LINQ statement?
Before jumping to LINQ... which doesn't solve any problems you've described, let's look at the logic you have here.
We split a string with a 'pattern'. How?
We then create a new list of codes. How?
We then loop through those codes and decode them. How?
But since we forgot to keep track of where those codes came from, we now loop through the equationList (which is an array, not a List<T>) to substitute the results.
Seems a little convoluted to me.
Maybe a simpler solution would be:
Take in a string, and return IEnumerable<string> of words (similar to what you do now).
Take in an IEnumerable<string> of words, and return an IEnumerable<?> of values.
That is to say with this second step iterate over the strings, and simply return the value you want to return - rather than trying to extract certain values out, parsing them, and then inserting them back into a collection.
//Ideally we return something more specific eg, IEnumerable<Tokens>
public IEnumerable<string> ParseEquation(IEnumerable<string> words)
{
foreach (var word in words)
{
if (IsOperator(word)) yield return ToOperator(word);
else if (IsCode(word)) yield return ToCode(word);
else ...;
}
}
This is quite similar to the LINQ Select Statement... if one insisted I would suggest writing something like so:
var tokens = equationList.Select(ToToken);
...
public Token ToToken(string word)
{
if (IsOperator(word)) return ToOperator(word);
else if (IsCode(word)) return ToCode(word);
else ...;
}
If GetCodeValue(code) doesn't already, I suggest it probably could use some sort of caching/dictionary in its implementation - though the specifics dictate this.
The benefits of this approach are that it is flexible (we can easily add more processing steps), simple to follow (we put in these values and get these as a result, with no mutating state) and easy to write. It also breaks the problem down into nice little chunks that each solve their own task, which will help immensely when trying to refactor, or to find niggly bugs and performance issues.
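The caching idea mentioned above could be sketched roughly as follows. This is only an illustration: CachedCodeLookup and its constructor parameter are hypothetical names, the real GetCodeValue is assumed to be expensive and to return the same value for the same code, and the case-insensitive comparer is an assumption based on the comparison used in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper that remembers values already fetched for a code.
public sealed class CachedCodeLookup
{
    private readonly Func<string, string> _lookup;

    // Assumption: codes are compared case-insensitively, as in the question.
    private readonly Dictionary<string, string> _cache =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    public CachedCodeLookup(Func<string, string> lookup)
    {
        _lookup = lookup;
    }

    public string GetCodeValue(string code)
    {
        string value;
        if (!_cache.TryGetValue(code, out value))
        {
            // Only call the underlying (assumed expensive) lookup on a cache miss.
            value = _lookup(code);
            _cache[code] = value;
        }
        return value;
    }
}
```

Whether this helps depends on how often the same code repeats and how costly the underlying lookup really is.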
If your array always alternates code then operator, this LINQ should do what you want:
string[] equationList = { "code1", "+", "code2", "-", "code3" };
var processedList = equationList.Select((s,j) => (j % 2 == 1) ? s :GetCodeValue(s)).ToArray();
You will need to measure whether it is actually faster.
I think the fastest solution will be this:
var codeCache = new Dictionary<string, string>();
for (var i = equationList.Length - 1; i >= 0; --i)
{
var item = equationList[i];
if (!IsCode(item)) // IsCode is a placeholder for your own validity check; you know which items are codes because you created the codeList
continue;
string codeVal;
if (!codeCache.TryGetValue(item, out codeVal))
{
codeVal = GetCodeValue(item);
codeCache.Add(item, codeVal);
}
equationList[i] = codeVal;
}
You don't need a codeList. If every code is unique you can remove the codeCache.
I have a case where I have the name of an object, and a bunch of file names. I need to match the correct file name with the object. The file name can contain numbers and words, separated by either hyphen(-) or underscore(_). I have no control of either file name or object name. For example:
10-11-12_001_002_003_13001_13002_this_is_an_example.svg
The object name in this case is just a string representing a number:
10001
I need to return true or false if the file name is a match for the object name. The different segments of the file name can match on their own, or any combination of two segments. In the example above, it should be true for the following cases (not every true case, just examples):
10001
10002
10003
11001
11002
11003
12001
12002
12003
13001
13002
And, we should return false for this case (among others):
13003
What I've come up with so far is this:
public bool IsMatch(string filename, string objectname)
{
var namesegments = GetNameSegments(filename);
var match = namesegments.Contains(objectname);
return match;
}
public static List<string> GetNameSegments(string filename)
{
var segments = filename.Split('_', '-').ToList();
var newSegments = new List<string>();
foreach (var segment in segments)
{
foreach (var segment2 in segments)
{
if (segment == segment2)
continue;
var newToken = segment + segment2;
newSegments.Add(newToken);
}
}
return segments.Concat(newSegments).ToList();
}
One or two segments combined can make a match, and that is enough. Three or more segments combined should not be considered.
This does work so far, but is there a better way to do it, perhaps without nesting foreach loops?
First: don't change debugged, working, sufficiently efficient code for no reason. Your solution looks good.
However, we can make some improvements to your solution.
public static List<string> GetNameSegments(string filename)
Making the output a list puts restrictions on the implementation that are not required by the caller. It should be IEnumerable<String>. Particularly since the caller in this case only cares about the first match.
var segments = filename.Split('_', '-').ToList();
Why ToList? A list is array-backed. You've already got an array in hand. Just use the array.
Since there is no longer a need to build up a list, we can transform your two-loop solution into an iterator block:
public static IEnumerable<string> GetNameSegments(string filename)
{
var segments = filename.Split('_', '-');
foreach (var segment in segments)
yield return segment;
foreach (var s1 in segments)
foreach (var s2 in segments)
if (s1 != s2)
yield return s1 + s2;
}
Much nicer. Alternatively we could notice that this has the structure of a query and simply return the query:
public static IEnumerable<string> GetNameSegments(string filename)
{
var q1= filename.Split('_', '-');
var q2 = from s1 in q1
from s2 in q1
where s1 != s2
select s1 + s2;
return q1.Concat(q2);
}
Again, much nicer in this form.
Now let's talk about efficiency. As is often the case, we can achieve greater efficiency at a cost of increased complication. This code looks like it should be plenty fast enough. Your example has nine segments. Let's suppose that nine or ten is typical. Our solutions thus far consider the ten or so singletons first, and then the hundred or so combinations. That's nothing; this code is probably fine. But what if we had thousands of segments and were considering millions of possibilities?
In that case we should restructure the algorithm. One possibility would be this general solution:
public bool IsMatch(HashSet<string> segments, string name)
{
if (segments.Contains(name))
return true;
var q = from s1 in segments
where name.StartsWith(s1)
let s2 = name.Substring(s1.Length)
where s1 != s2
where segments.Contains(s2)
select 1; // Dummy. All we care about is if there is one.
return q.Any();
}
Your original solution is quadratic in the number of segments. This one is linear; we rely on the constant order contains operation. (This assumes of course that string operations are constant time because strings are short. If that's not true then we have a whole other kettle of fish to fry.)
How else could we extract wins in the asymptotic case?
If we happened to have the property that the collection was not a hash set but rather a sorted list then we could do even better; we could binary search the list to find the start and end of the range of possible prefix matches, and then pour the list into a hashset to do the suffix matches. That's still linear, but could have a smaller constant factor.
If we happened to know that the target string was small compared to the number of segments, we could attack the problem from the other end. Generate all possible combinations of partitions of the target string and check if both halves are in the segment set. The problem with this solution is that it is quadratic in memory usage in the size of the string. So what we'd want to do there is construct a special hash on character sequences and use that to populate the hash table, rather than the standard string hash. I'm sure you can see how the solution would go from there; I shan't spell out the details.
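A minimal sketch of the "attack it from the other end" idea, assuming the segments are already in a HashSet<string>. PartitionMatcher is a hypothetical name. This is the naive version that allocates a substring for every split point, so it does not include the special character-sequence hash described above.

```csharp
using System;
using System.Collections.Generic;

public static class PartitionMatcher
{
    // Returns true if name is a single segment, or the concatenation
    // of two different segments from the set.
    public static bool IsMatch(HashSet<string> segments, string name)
    {
        if (segments.Contains(name))
            return true;

        // Try every way of cutting the name into a non-empty head and tail.
        for (int cut = 1; cut < name.Length; cut++)
        {
            string head = name.Substring(0, cut);
            string tail = name.Substring(cut);

            // head != tail mirrors the s1 != s2 condition used earlier.
            if (head != tail && segments.Contains(head) && segments.Contains(tail))
                return true;
        }
        return false;
    }
}
```

The loop runs once per character of the target string, so its cost depends on the length of the name rather than on the number of segments.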
Efficiency is very much dependent on the business problem that you're attempting to solve. Without knowing the full context/usage it's difficult to define the most efficient solution. What works for one situation won't always work for others.
I would always advocate to write working code and then solve any performance issues later down the line (or throw more tin at the problem as it's usually cheaper!) If you're having specific performance issues then please do tell us more...
I'm going to go out on a limb here and say (hope) that you're only going to be matching the filename against the object name once per execution. If that's the case I reckon this approach will be just about the fastest. In a circumstance where you're matching a single filename against multiple object names then the obvious choice is to build up an index of sorts and match against that as you were already doing, although I'd consider different types of collection depending on your expected execution/usage.
public static bool IsMatch(string filename, string objectName)
{
var segments = filename.Split('-', '_');
for (int i = 0; i < segments.Length; i++)
{
if (string.Equals(segments[i], objectName)) return true;
for (int ii = 0; ii < segments.Length; ii++)
{
if (ii == i) continue;
if (string.Equals($"{segments[i]}{segments[ii]}", objectName)) return true;
}
}
return false;
}
If you are willing to use the MoreLINQ NuGet package then this may be worth considering:
public static HashSet<string> GetNameSegments(string filename)
{
var segments = filename.Split(new char[] {'_', '-'}, StringSplitOptions.RemoveEmptyEntries).ToList();
var matches = segments
.Cartesian(segments, (x, y) => x == y ? null : x + y)
.Where(z => z != null)
.Concat(segments);
return new HashSet<string>(matches);
}
StringSplitOptions.RemoveEmptyEntries handles adjacent separators (e.g. --). Cartesian is roughly equivalent to your existing nested for loops. The Where is to remove null entries (i.e. if x == y). Concat is the same as your existing Concat. The use of HashSet allows for your Contains calls (in IsMatch) to be faster.
I'm a student and I was wondering what the most efficient way is to check whether a certain value is present in an array.
My second attempt:
string value = "pow";
string[] array = new string[] { "pong", "ping", "pow" };
bool valueIsInArray = false;
foreach(var s in array) if (s == value) valueIsInArray = true;
if (valueIsInArray)
{
// code here
}
I've researched and found if I were to use LINQ the code would look like this:
string value = "oink"; // value given to the method
string[] array = new string[] { "oink", "oink", "baboinkadoink" };
if (array.Contains(value))
{
//code here
}
The question is whether using LINQ in any way negatively impacts the speed or consistency of the code, and whether there is an even better way to go about doing this.
Use LINQ's Any(). The enumeration of the source stops as soon as the result can be determined.
string value = "pow";
string[] array = new string[] { "pong", "ping", "pow" };
bool isValuePresent = array.Any(x => x == value);
https://msdn.microsoft.com/en-us/library/bb534972(v=vs.110).aspx
As a commenter said, LINQ won't really trouble you here. The difference is microscopic (even on larger collections). However, if you must use an alternative, use IndexOf. See: https://msdn.microsoft.com/en-us/library/system.array.indexof(v=vs.110).aspx
Example:
string value = "oink"; // value given to the method
string[] array = new string[] { "oink", "oink", "baboinkadoink" };
if (Array.IndexOf(array, value) > -1)
{
//code here
}
I'm not sure what Contains ends up doing under the hood, but it probably makes a call to IndexOf as well.
Willy-nilly, you have to scan the array up to the first match (or the entire array if there is no match). You can use a foreach loop:
bool valueIsInArray = false;
foreach (var item in array)
if (item == value) {
valueIsInArray = true;
break;
}
or use a for loop:
bool valueIsInArray = false;
for (int i = 0; i < array.Length; ++i)
if (array[i] == value) {
valueIsInArray = true;
break;
}
or call Contains on the array (this typically resolves to the Enumerable.Contains extension method from System.Linq):
bool valueIsInArray = array.Contains(value);
or implement the check with the help of LINQ's Any:
bool valueIsInArray = array.Any(item => item == value);
The difference between these methods is a matter of microseconds (if any), so use whichever version is most readable to you. My own choice is array.Contains(value): let the system work for you and hide unwanted details (e.g. the break in the loop).
You have to iterate through the array to check for the value (up to the first match, or all of it if there is none),
using either LINQ or a conventional loop. You can also use
Array.Find()
for the same purpose. It is better to go with LINQ, which makes the code simpler.
Happy coding
So, I am creating a word filter for a game server in C#, and basically I am trying to scour the sentence for banned words and replace them with clean words. I've already done so, but now I'm up to the part where I want to scan the sentence for a list of sentence banned words. I'm hopeless at this bit, and I can't seem to wrap my head around it.
Basically I am calling CheckSentence(Message) in the ChatManager, and I need the following code to count the matches and continue; if the count is more than 5. So far I have:
public bool CheckSentence(string Message)
{
foreach (WordFilter Filter in this._filteredWords.ToList())
{
if (Message.ToLower().Contains(Filter.Word) && Filter.IsSentence)
{
// count Message, if message contains >5
// from (Message.Contains(Filter.Word))
// continue; else (ignore)
}
}
return false;
}
I'm not too sure if that makes much sense, but I want it to continue; if there are more than 5 Message.Contains(Filter.Word)
public bool CheckSentence(string rawMessage)
{
var lower = rawMessage.ToLower();
var count = 0;
foreach (WordFilter Filter in this._filteredWords.ToList())
{
if (lower.Contains(Filter.Word) && Filter.IsSentence)
{
count++;
}
}
return count > 5; // "more than 5", as the question asks
}
If this becomes too slow, you may be better off caching the list of filtered words in a HashSet and iterating over each word in the message, checking whether it exists in the HashSet. That would give you O(n) speed, where n is the number of words in the message.
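A rough sketch of that HashSet idea, under some assumptions: SentenceFilter, CountBannedWords and the separator list are hypothetical, and the banned words are assumed to be plain strings. Note that this matches whole words only, which is not the same behavior as the substring check done by Contains in the loop above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SentenceFilter
{
    // Assumed word separators; adjust to whatever the chat messages need.
    private static readonly char[] Separators =
        { ' ', '\t', ',', '.', '!', '?', ';', ':' };

    // Counts how many distinct banned words occur as whole words in the message.
    // bannedWords should be built with a case-insensitive comparer.
    public static int CountBannedWords(string rawMessage, HashSet<string> bannedWords)
    {
        return rawMessage
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => bannedWords.Contains(word))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .Count();
    }
}
```

The set would be built once, for example with new HashSet<string>(words, StringComparer.OrdinalIgnoreCase), and reused for every message.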
LINQ Version
public bool CheckSentenceLinq(string rawMessage)
{
var lower = rawMessage.ToLower();
return _filteredWords
.Where(x => x.IsSentence)
.Count(x => lower.Contains(x.Word)) > 5;
}
EDIT 2: LINQ updated as per @S.C.'s comment
By @S.C.:
For the linq version, there's no need to count past the first five. return _filteredWords.Where(x => x.IsSentence && lower.Contains(x.Word)).Skip(5).Any();
public bool CheckSentenceLinq(string rawMessage)
{
var lower = rawMessage.ToLower();
return _filteredWords
.Where(x => x.IsSentence)
.Where(x => lower.Contains(x.Word))
.Skip(5)
.Any();
}
ToUpper vs ToLower
As @DevEstacion mentioned, and per Microsoft's best-practice recommendations for using strings, it is best to use ToUpperInvariant() rather than ToLowerInvariant() when normalizing strings for comparison.
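Another option, sketched here as an assumption rather than something taken from the answers above, is to skip the case conversion entirely and let the framework compare case-insensitively through the IndexOf overload that takes a StringComparison. CaseInsensitiveSearch and ContainsIgnoreCase are hypothetical names.

```csharp
using System;

public static class CaseInsensitiveSearch
{
    // True if text contains word, ignoring case, without creating
    // an upper- or lower-cased copy of either string.
    public static bool ContainsIgnoreCase(string text, string word)
    {
        return text.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

In the filter loop this would replace lower.Contains(Filter.Word) with ContainsIgnoreCase(rawMessage, Filter.Word).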
EDIT:Using Continue
public bool CheckSentenceWithContinue(string rawMessage)
{
var lower = rawMessage.ToLower();
var count = 0;
foreach (WordFilter Filter in this._filteredWords.ToList())
{
if (!Filter.IsSentence)
continue; // Move on to the next filter, as this is not a sentence word filter
if (!lower.Contains(Filter.Word))
continue; // Move on to the next filter, as the message does not contain this word
// If you are here it means filter is a Sentence filter, and the message contains the word, so increment the counter
count++;
}
return count > 5; // "more than 5", as the question asks
}
I believe someone already posted a correct answer, I'm just here to provide an alternative.
So instead of using a for loop or foreach, I'll provide you with a Regex solution.
public bool CheckSentence(string rawMessage)
{
/*
The string.Join("|", ...) call creates the pattern for the Regex.
'|' means "or", so the pattern matches any word from the list of
filtered words, and Matches returns every occurrence in the raw message.
Regex.Escape keeps special characters in a word from being read as pattern syntax.
*/
var pattern = string.Join("|", _filteredWords.Where(x => x.IsSentence).Select(x => Regex.Escape(x.Word)));
return new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled).Matches(rawMessage).Count > 5;
}
Benefits? Much shorter, avoids the explicit loop, and could be faster :)
Don't forget to add these two using directives at the top of the .cs file
using System.Linq;
using System.Text.RegularExpressions;
Currently I have a collection of 231,556 words, and I run the loop below to check every word for duplicates.
I am using this function:
public bool IsContainStringCIAI(string wordIn, HybridDictionary hd, out string wordOut)
{
int iValue = 1;
foreach (DictionaryEntry de2 in hd)
{
iValue = CultureInfo.CurrentCulture.CompareInfo.Compare(wordIn.ToLower(), de2.Key.ToString().ToLower(), CompareOptions.IgnoreNonSpace);
if (iValue == 0)
{
wordOut = de2.Key.ToString(); //Assign the existing word
return true;
}
}
wordOut = wordIn;
return false;
}
It takes around 20 hours to finish looping, because each word is added to the dictionary after the comparison if no match is found. Is there anything I can do to improve this loop? Thanks in advance.
Can you convert your HybridDictionary to a Dictionary<string, string> where all the keys are already converted into a format you can compare (lower case, unwanted characters stripped out, whatever)? Then your method pretty much becomes this:
return hd.TryGetValue(wordIn.ToLower(), out wordOut);
And Dictionary is pretty fast ;]
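A sketch of what such a pre-normalized dictionary might look like. WordIndex, NormalizeKey and TryFindOrAdd are hypothetical names. Stripping non-spacing marks after Unicode decomposition is only an approximation of the culture-sensitive CompareOptions.IgnoreNonSpace comparison in the question, so results may differ for some words.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

public static class WordIndex
{
    // Builds a comparison key: decompose, drop accent marks, lower-case.
    public static string NormalizeKey(string word)
    {
        var decomposed = word.Normalize(NormalizationForm.FormD);
        var kept = decomposed.Where(c =>
            CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
        return new string(kept.ToArray()).ToLowerInvariant();
    }

    // Returns true and the stored word if an equivalent word already exists;
    // otherwise stores wordIn and returns false.
    public static bool TryFindOrAdd(Dictionary<string, string> index, string wordIn, out string wordOut)
    {
        var key = NormalizeKey(wordIn);
        if (index.TryGetValue(key, out wordOut))
            return true;

        index[key] = wordIn;
        wordOut = wordIn;
        return false;
    }
}
```

Each word then costs one hash lookup instead of a scan over every existing entry.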
I am in the process of learning more about LINQ and Lambda expressions but at this stage, I simply don't "Get" Lambda expressions.
Yes ... I am a newbie to these new concepts.
I mean, every example I see illustrates how to add or subtract two parameters.
What about something a little more complex?
To help me gain a better understanding I have posted a small challenge for anyone who wishes to participate. I have the following method which will take any string and will put spaces in between any upper case characters and their preceding neighbour (as shown below).
i.e.
"SampleText" = "Sample Text"
"DoesNotMatterHowManyWords" = "Does Not Matter How Many Words"
Here is the code;
public static string ProperSpace(string text)
{
var sb = new StringBuilder();
var lowered = text.ToLower();
for (var i = 0; i < text.Length; i++)
{
var a = text.Substring(i, 1);
var b = lowered.Substring(i, 1);
if (a != b) sb.Append(" ");
sb.Append(a);
}
return sb.ToString().Trim();
}
I am sure that the method above can be re-written to use with LINQ or a Lambda expression. I am hoping that this exercise will help open my eyes to these new concepts.
Also, if you have any good links to LINQ or Lambda tutorials, please provide.
EDIT
Thanks to everyone who has contributed. Although the current method does do the job, I am happy to see it can be modified to utilize a lambda expression. I also acknowledge that this is perhaps not the best example for LINQ.
Here is the newly updated method using a Lambda expression (tested to work);
public static string ProperSpace(string text)
{
return text.Aggregate(new StringBuilder(), (sb, c) =>
{
if (Char.IsUpper(c)) sb.Append(" ");
sb.Append(c);
return sb;
}).ToString().Trim();
}
I also appreciate the many links to other (similar) topics.
In particular this topic which is so true.
This is doing the same as the original code and even avoids the generation of the second (lower case) string.
var result = text.Aggregate(new StringBuilder(),
(sb, c) => (Char.IsUpper(c) ? sb.Append(' ') : sb).Append(c));
Personally, I think your method is simple and clear, and I would stick with it (I think I might have even written the exact same code somewhere along the lines).
UPDATE:
How about this as a starting point?
public IEnumerable<char> MakeNice(IEnumerable<char> str)
{
foreach (var chr in str)
{
if (char.ToUpper(chr) == chr)
{
yield return ' ';
}
yield return chr;
}
}
public string MakeNiceString(string str)
{
return new string(MakeNice(str).ToArray()).Trim(); // ToArray() needs using System.Linq; string has no IEnumerable<char> constructor
}
Like leppie, I'm not sure this is a good candidate for LINQ. You could force it, of course, but that wouldn't be a useful example. A minor tweak would be to compare text[i] against lowered[i] to avoid some unnecessary strings - and maybe default the sb to new StringBuilder(text.Length) (or a small amount higher):
if (text[i] != lowered[i]) sb.Append(' ');
sb.Append(text[i]);
Other than that - I'd leave it alone;
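Put together, the tweaked loop might look like the sketch below. The class name Spacing and the extra capacity of 8 are arbitrary choices for illustration; everything else follows the original method.

```csharp
using System.Text;

public static class Spacing
{
    public static string ProperSpace(string text)
    {
        var lowered = text.ToLower();

        // Pre-size the builder: the original length plus room for a few spaces.
        var sb = new StringBuilder(text.Length + 8);

        for (var i = 0; i < text.Length; i++)
        {
            // A character that changes when lower-cased is an upper-case letter.
            if (text[i] != lowered[i]) sb.Append(' ');
            sb.Append(text[i]);
        }
        return sb.ToString().Trim();
    }
}
```

Comparing chars by index avoids the two Substring allocations per character in the original.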
public static string ProperSpace(string text)
{
return text.Aggregate(new StringBuilder(), (sb, c) =>
{
if (Char.IsUpper(c) && sb.Length > 0)
sb.Append(" ");
sb.Append(c);
return sb;
}).ToString();
}
I would use RegularExpressions for this case.
public static string ProperSpace(string text)
{
var expression = new Regex("[A-Z]");
return expression.Replace(text, " $0");
}
If you want to use a lambda you could use:
public static string ManipulateString(string text, Func<string, string> manipulator)
{
return manipulator(text);
}
// then
var expression = new Regex("[A-Z]");
ManipulateString("DoesNotMatterHowManyWords", s => expression.Replace(s, " $0"));
Which is essentially the same as using an anonymous delegate:
var expression = new Regex("[A-Z]");
ManipulateString("DoesNotMatterHowManyWords", delegate(string s) {
return expression.Replace(s, " $0");
});
Here is a way of doing it:
string.Join("", text.Select((c, i) => (i > 0 && char.IsUpper(c)) ? " " + c : c.ToString()).ToArray());
But I don't see where the improvement is. Just check this very recent question...
EDIT: For those who are wondering: yes, I intentionally picked an ugly solution.
I've got a Regex solution that's only 8 times slower than your current loop[1], and also harder to read than your solution[2].
return Regex.Replace(text, @"(\P{Lu})(\p{Lu})", "$1 $2");
It matches unicode character groups, in this case non-uppercase followed by an uppercase, and then adds a space between them. This solution works better than other regex-based solutions that only look for [A-Z].
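As a self-contained sketch, the Unicode-aware pattern can be wrapped in a method like this. UnicodeSpacing is a hypothetical class name; the pattern is the one described above.

```csharp
using System.Text.RegularExpressions;

public static class UnicodeSpacing
{
    // Inserts a space wherever a non-upper-case character
    // is immediately followed by an upper-case letter.
    public static string ProperSpace(string text)
    {
        return Regex.Replace(text, @"(\P{Lu})(\p{Lu})", "$1 $2");
    }
}
```

Because the pattern needs a non-upper-case character before each upper-case one, a leading capital gets no space and runs of capitals such as "ABC" are left together.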
[1] With reservations that my quickly made up test may suck.
[2] Anyone actually know the unicode character groups without googling? ;)
You can use existing LINQ functions to make this work, but it's probably not the best approach. The following LINQ expression would work but is inefficient because it generates a lot of extra strings:
public static string ProperCase(string text)
{
return text.Aggregate(
string.Empty,
(acc, c) => Char.ToLower(c) != c ? acc + " " + c.ToString() : acc + c.ToString())
.Trim();
}
For the usefulness of LINQ (if you need convincing), you could check out this question.
I think one first step is to get used to the dot syntax, and only then move on to the 'sql' syntax. Otherwise it just hurts your eyes to start with. I do wonder whether Microsoft didn't slow uptake of linq by pushing the sql syntax, which made a lot of people think 'yuck, DB code in my C#'.
As for lambdas, try doing some code with anonymous delegates first, because if you haven't done that, you won't really understand what the fuss is all about.
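To make that comparison concrete, here is the same predicate written both ways. This is a small illustrative sketch; DelegateVsLambda and its members are hypothetical names.

```csharp
using System;
using System.Linq;

public static class DelegateVsLambda
{
    // C# 2.0 style: an anonymous delegate with an explicit parameter type and body.
    public static readonly Func<char, bool> IsUpperDelegate =
        delegate(char c) { return char.IsUpper(c); };

    // C# 3.0 style: the same function as a lambda expression.
    public static readonly Func<char, bool> IsUpperLambda =
        c => char.IsUpper(c);

    // Either form can be passed wherever a Func<char, bool> is expected.
    public static int CountUpper(string text, Func<char, bool> isUpper)
    {
        return text.Count(isUpper);
    }
}
```

Both fields hold the same kind of object; the lambda is simply a shorter way of writing the delegate.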
I'm curious why a simple regular expression replace wouldn't suffice. I wrote one for someone else that does exactly this:
"[AI](?![A-Z]{2,})[a-z]*|[A-Z][a-z]+|[A-Z]{2,}(?=[A-Z]|$)"
I already posted this on another bulletin board here: http://bytes.com/topic/c-sharp/answers/864056-string-manupulation-net-c. There's one bug that requires a post-regex trim that I haven't had the opportunity to address yet, but maybe someone else can post a fix for that.
Using the replace pattern: "$0[space]" where you replace [space] with an actual space would cut the code down immensely.
It handles some special cases which might be outside the scope of what you're trying to do but the bulletin board thread will give you the info on those.
Edit: P.S. A great way to start learning some of the applications of LINQ is to check out the GOLF and CODE-GOLF tags and look for the C# posts. There's a bunch of different and more complex uses of LINQ-to-Objects which should help you to recognise some of the more useful(?) and amusing applications of this technology.
Have you ever thought of using the Aggregate function ...
For instance, let’s say I have an array called routes and I want to set all the Active fields to false. This can be done as follow:
routes.Aggregate(false, (value, route) => route.Active = false);
- Routes is the name of the table.
- The first false is simply the seed value and needs to be the same type as the value that is being set. It’s kind of… redundant.
- value is also redundant and is basically the first value.
- route is the aggregate value (each individual element from the sequence)
No more redundant foreach loops…
I don't know lambda expressions all that well either... but I'm sure there is a genius out there somewhere who can abuse this to do that...