I'm currently trying to solve a Title Capitalization problem. I have a method that takes in a sentence, splits it into words, and compares the words against a check list of words.
Based on this check list, I lowercase the words that are in the list and capitalize any words not in the list. The first and last words are always capitalized.
Here is my method:
public string TitleCase(string title)
{
LinkedList<string> wordsList = new LinkedList<string>();
string[] listToCheck = { "a", "the", "to", "in", "with", "and", "but", "or" };
string[] words = title.Split(null);
var last = words.Length - 1;
var firstWord = CapitalizeWord(words[0]);
var lastWord = CapitalizeWord(words[last]);
wordsList.AddFirst(firstWord);
for (var i = 1; i <= last - 1; i++)
{
foreach (var s in listToCheck)
{
if (words[i].Equals(s))
{
wordsList.AddLast(LowercaseWord(words[i]));
}
else
{
wordsList.AddLast(CapitalizeWord(words[i]));
}
}
}
wordsList.AddLast(lastWord);
var sentence = string.Join(" ", wordsList);
return sentence;
}
Running this with the example and expecting the result:
var result = TitleCase("i love solving problems and it is fun");
Assert.AreEqual("I Love Solving Problems and It Is Fun", result);
I get instead:
"I Love Love Love Love Love Love Love Love Solving Solving Solving Solving Solving Solving Solving Solving Problems Problems Problems Problems Problems Problems Problems Problems And And And And And and And And It It It It It It It It Is Is Is Is Is Is Is Is Fun"
If you look closely, only one "and" is lowercased. Any tips on how to solve this?
You're doing some extra looping when you go through each of the words to check, and you're not exiting the loop as soon as you find a match (so you're adding the word on each check). To fix this issue in your specific code, you would do something like:
for (var i = 1; i <= last - 1; i++)
{
bool foundMatch = false;
foreach (var s in listToCheck)
{
if (words[i].Equals(s))
{
foundMatch = true;
break;
}
}
if (foundMatch)
{
wordsList.AddLast(LowercaseWord(words[i]));
}
else
{
wordsList.AddLast(CapitalizeWord(words[i]));
}
}
However, there is a much easier way, which other answers have provided. I also wanted to point out a couple of other things:
You are creating an unnecessary LinkedList. You already have a list of the words you can manipulate in the words array, so you'll save some memory by just using that.
I think there is a bug in your code (and in some of the answers): if someone passes in a string with a capitalized "A" word in the middle, it will not be converted to lowercase, because the Equals method (or, in the case of other answers, the Contains method) does a case-sensitive comparison by default. So you might want to pass a case-insensitive comparer to that method.
You don't need to do separate checks for the first and last word. You can just have a single if statement with these checks in the body of your loop.
So, here's what I would do:
public static string TitleCase(string title)
{
var listToCheck = new[]{ "a", "the", "to", "in", "with", "and", "but", "or" };
var words = title.Split(null);
// Loop through all words in the array
for (int i = 0; i < words.Length; i++)
{
// If we're on the first or last index, or if
// the word is not in our list, Capitalize it
if (i == 0 || i == (words.Length - 1) ||
!listToCheck.Contains(words[i], StringComparer.OrdinalIgnoreCase))
{
words[i] = CapitalizeWord(words[i]);
}
else
{
words[i] = LowercaseWord(words[i]);
}
}
return string.Join(" ", words);
}
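The question and answers call CapitalizeWord and LowercaseWord without ever showing them. The following is a minimal, self-contained sketch of the single-loop approach, assuming those helpers only change letter case (upper-case first letter, lower-case remainder). The class name TitleCaseSketch and both helper bodies are assumptions, not the original poster's code.

```csharp
using System;
using System.Linq;

public static class TitleCaseSketch
{
    private static readonly string[] ListToCheck =
        { "a", "the", "to", "in", "with", "and", "but", "or" };

    // Assumed helper: upper-case the first letter, lower-case the rest.
    public static string CapitalizeWord(string word)
    {
        if (string.IsNullOrEmpty(word)) return word;
        return char.ToUpperInvariant(word[0]) + word.Substring(1).ToLowerInvariant();
    }

    // Assumed helper: lower-case the whole word.
    public static string LowercaseWord(string word)
    {
        return word == null ? null : word.ToLowerInvariant();
    }

    public static string TitleCase(string title)
    {
        // A null separator array splits on whitespace; empty entries are dropped.
        string[] words = title.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < words.Length; i++)
        {
            bool isFirstOrLast = i == 0 || i == words.Length - 1;
            bool isInList = ListToCheck.Contains(words[i], StringComparer.OrdinalIgnoreCase);
            words[i] = (isFirstOrLast || !isInList)
                ? CapitalizeWord(words[i])
                : LowercaseWord(words[i]);
        }
        return string.Join(" ", words);
    }
}
```

Calling TitleCaseSketch.TitleCase("i love solving problems and it is fun") yields the string expected by the assertion in the question.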
You have a loop within a loop, which messes things up. Simplify the code to have just one loop:
for (var i = 1; i <= last - 1; i++)
{
// No inner loop
// Use the .Contains() method to see if it's a key word
if (listToCheck.Contains(words[i]))
{
wordsList.AddLast(LowercaseWord(words[i]));
}
else
{
wordsList.AddLast(CapitalizeWord(words[i]));
}
}
Output:
I Love Solving Problems and It Is Fun
The problem is in the foreach loop: you are doing eight checks (the length of the listToCheck array) for each word, and adding the word to the list each time. I'd also recommend using the LINQ Contains method, so it should look like this:
for (var i = 1; i <= last - 1; i++) {
if(listToCheck.Contains(words[i]))
wordsList.AddLast(LowercaseWord(words[i]));
else
wordsList.AddLast(CapitalizeWord(words[i]));
}
Also, the reason the sixth "and" is lowercased is that "and" is the sixth word in the listToCheck array. On the sixth pass through the foreach loop the comparison succeeds and the word is written in lowercase; on every other pass it fails, so the word is capitalized.
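To see this effect in isolation, here is a small sketch that runs the original inner loop for a single word and collects what would be added on each pass. TraceWord is a hypothetical name, and the capitalization is inlined as a stand-in for the question's CapitalizeWord helper.

```csharp
using System;
using System.Collections.Generic;

public static class NestedLoopTrace
{
    // Mimics the question's inner foreach: one output entry per list item,
    // lowercased only on the pass where the word equals that item.
    public static string TraceWord(string word)
    {
        string[] listToCheck = { "a", "the", "to", "in", "with", "and", "but", "or" };
        var output = new List<string>();
        foreach (var s in listToCheck)
        {
            string capitalized = char.ToUpperInvariant(word[0]) + word.Substring(1);
            output.Add(word.Equals(s) ? word.ToLowerInvariant() : capitalized);
        }
        return string.Join(" ", output);
    }
}
```

NestedLoopTrace.TraceWord("and") returns "And And And And And and And And", which is the run of eight entries seen in the question's output.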
As mentioned in the other answers the loop within the loop doesn't exit.
Just a suggestion, with Linq you could combine checking for the first and last word (through index) and check the ListToCheck together:
public string TitleCase(string title)
{
string[] listToCheck = { "a", "the", "to", "in", "with", "and", "but", "or" };
string[] words = title.Split(null);
var last = words.Length - 1;
return string.Join(" ", words.Select(w=>w.ToLower()).Select(((w,i) => i == 0 || i == last || !listToCheck.Contains(w) ? CapitalizeWord(w) : w)));
}
Note, in this solution the first Select makes sure all words are in lowercase, so the lookup in listToCheck can be done without special comparisons. Because the words are already in lowercase, that doesn't have to be done any more if the word doesn't have to be capitalized.
In my C# program (I'm new to C# so I hope that I'm doing things correctly), I'm trying to read in all of the lines from a text file, which will look something along the lines of this, but with more entries (these are fictional people so don't worry about privacy):
Logan Babbleton ID #: 0000011 108 Crest Circle Mr. Logan M. Babbleton
Pittsburgh PA 15668 SSN: XXX-XX-XXXX
Current Program(s): Bachelor of Science in Cybersecurity
Mr. Carter J. Bairn ID #: 0000012 21340 North Drive Mr. Carter Joseph Bairn
Pittsburgh PA 15668 SSN: XXX-XX-XXXX
Current Program(s): Bachelor of Science in Computer Science
I have these lines read into an array, concentrationArray and want to find the lines that contain the word "Current", split them at the "(s): " in "Program(s): " and print the words that follow. I've done this earlier in my program, but splitting at an ID instead, like this:
nameLine = nameIDLine.Split(new string[] { "ID" }, StringSplitOptions.None)[1];
However, whenever I attempt to do this, I get an error that my index is out of the bounds of my split array (not my concentrationArray). Here's what I currently have:
for (int i = 0; i < concentrationArray.Length; i++)
{
if (concentrationArray[i].Contains("Current"))
{
lstTest.Items.Add(concentrationArray[i].Split(new string[] { "(s): " }, StringSplitOptions.None)[1]);
}
}
Where I'm confused is that if I change the index to 0 instead of 1, it will print everything out perfectly, but it will print out the first half, instead of the second half, which is what I want. What am I doing wrong? Any feedback is greatly appreciated since I'm fairly new at C# and would love to learn what I can. Thanks!
Edit - The only thing that I could think of was that maybe sometimes there wasn't anything after the string that I used to separate each element, but when I checked my text file, I found that was not the case and there is always something following the string used to separate.
You should check the result of Split before trying to read at index 1.
If your line doesn't contain "(s): ", your code will crash with the exception given:
for (int i = 0; i < concentrationArray.Length; i++)
{
if (concentrationArray[i].Contains("Current"))
{
string[] result = concentrationArray[i].Split(new string[] { "(s): " }, StringSplitOptions.None);
if(result.Length > 1)
lstTest.Items.Add(result[1]);
else
Console.WriteLine($"Line {i} has no (s): followed by a space");
}
}
To complete the answer: if you always use index 0, there is no error, because when no separator is present in the input string the output is an array with a single element containing the whole unsplit string.
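The single-element behavior is easy to confirm in isolation. A minimal sketch, with the hypothetical helper AfterSeparator returning null instead of throwing when the separator is missing:

```csharp
using System;

public static class SplitDemo
{
    // Returns the text after the first occurrence of the separator,
    // or null when the separator does not occur in the line.
    public static string AfterSeparator(string line, string separator)
    {
        string[] parts = line.Split(new[] { separator }, StringSplitOptions.None);
        return parts.Length > 1 ? parts[1] : null;
    }
}
```

A line such as "Current Program(s):Bachelor" (no space after the colon) produces a one-element array, so the helper returns null where indexing [1] directly would throw.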
If the line always starts with
Current Program(s):
then why don't you just replace it with an empty string, like this:
concentrationArray[i].Replace("Current Program(s): ", "")
It is perhaps a little easier to understand and more reusable if you separate the concerns. It will also be easier to test. An example might be...
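var allLines = File.ReadLines(@"C:\your\file\path\data.txt");
// Materialize once so the file is not enumerated twice,
// and because Items.AddRange expects an array
var currentPrograms = ExtractCurrentPrograms(allLines).ToArray();
if (currentPrograms.Any())
{
lstTest.Items.AddRange(currentPrograms);
}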
...
private static IEnumerable<string> ExtractCurrentPrograms(IEnumerable<string> lines)
{
const string targetPhrase = "Current Program(s):";
foreach (var line in lines.Where(l => !string.IsNullOrWhiteSpace(l)))
{
var index = line.IndexOf(targetPhrase);
if (index >= 0)
{
var programIndex = index + targetPhrase.Length;
var text = line.Substring(programIndex).Trim();
if (!string.IsNullOrWhiteSpace(text))
{
yield return text;
}
}
}
}
Here is a slightly different approach:
List<string> test = new List<string>();
string pattern = "Current Program(s):";
string[] allLines = File.ReadAllLines(@"C:\Users\xyz\Source\demo.txt");
foreach (var line in allLines)
{
if (line.Contains(pattern))
{
test.Add(line.Substring(line.IndexOf(pattern) + pattern.Length));
}
}
or
string pattern = "Current Program(s):";
lstTest.Items.AddRange(File.ReadLines(@"C:\Users\ODuritsyn\Source\demo.xml")
.Where(line => line.Contains(pattern))
.Select(line => line.Substring(line.IndexOf(pattern) + pattern.Length))
.ToArray()); // AddRange expects an array
I am preparing for an interview. One of the questions is to reverse a sentence, such as "its a awesome day" to "day awesome a its". After this, they asked whether I could also remove duplicates, such as turning "I am good, Is he good" into "good he is, am I".
For reversing the sentence, I have written the following method:
public static string reversesentence(string one)
{
StringBuilder builder = new StringBuilder();
string[] split = one.Split(' ');
for (int i = split.Length-1; i >= 0; i--)
{
builder.Append(split[i]);
builder.Append(" ");
}
return builder.ToString();
}
But I am not getting any ideas for removing the duplicates. Can I get some help here?
This works:
public static string reversesentence(string one)
{
Regex reg = new Regex("\\w+");
bool isFirst = true;
var usedWords = new HashSet<String>(StringComparer.InvariantCultureIgnoreCase);
return String.Join("", one.Split(' ').Reverse().Select((w => {
var trimmedWord = reg.Match(w).Value;
if (trimmedWord != null) {
var wasFirst = isFirst;
isFirst = false;
if (usedWords.Contains(trimmedWord)) //Is it duplicate?
return w.Replace(trimmedWord, ""); //Remove the duplicate phrase but keep punctuation
usedWords.Add(trimmedWord);
if (!wasFirst) //If it's the first word, don't add a leading space
return " " + w;
return w;
}
return null;
})));
}
Basically, we decide if it's distinct based on the word without punctuation. If it already exists, just return the punctuation. If it doesn't exist, print out the whole word including punctuation.
Punctuation also removes the space in your example, which is why we can't just do String.Join(" ", ...) (otherwise the result would be "good he Is , am I" instead of "good he Is, am I").
Test:
reversesentence("I am good, Is he good").Dump();
Result:
good he Is, am I
For plain reversal:
String.Join(" ", text.Split(' ').Reverse())
For reversal with duplicate removal:
String.Join(" ", text.Split(' ').Reverse().Distinct())
Both work fine for strings containing just spaces as the separator. When you introduce the "," the problem becomes more difficult, so much so that you need to specify how it should be handled. For example, should "I am good, Is he good" become "good he Is am I" or "good he Is , am I"? Your example in the question changes the case of "Is" and groups the "," with it too. That seems wrong to me.
The other answer points to using abstractions but interviewers usually want to see implementation.
For the reversal, the usual trick is to reverse the sentence first and then reverse each word as you travel from left to right. A space will tell you that you have reached the end of a word. (See Programming Interviews Exposed for a solution to this or just google it. This used to be a VERY popular interview question.) Your approach works but is frowned upon because you are using extra space (O(n)).
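The reverse-everything-then-reverse-each-word trick can be sketched as follows. Because C# strings are immutable, this version copies the sentence into a char array first and applies the in-place swaps to that buffer. The names InPlaceReverse and ReverseWords are hypothetical, and words are assumed to be separated by single spaces.

```csharp
using System;

public static class InPlaceReverse
{
    // Reverses chars[start..end] (inclusive) by swapping from both ends.
    private static void Reverse(char[] chars, int start, int end)
    {
        while (start < end)
        {
            char tmp = chars[start];
            chars[start] = chars[end];
            chars[end] = tmp;
            start++;
            end--;
        }
    }

    public static string ReverseWords(string sentence)
    {
        char[] chars = sentence.ToCharArray();

        // Step 1: reverse the whole buffer ("its a" becomes "a sti").
        Reverse(chars, 0, chars.Length - 1);

        // Step 2: walk left to right and reverse each word back.
        int wordStart = 0;
        for (int i = 0; i <= chars.Length; i++)
        {
            if (i == chars.Length || chars[i] == ' ')
            {
                Reverse(chars, wordStart, i - 1);
                wordStart = i + 1;
            }
        }
        return new string(chars);
    }
}
```

InPlaceReverse.ReverseWords("its a awesome day") returns "day awesome a its".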
For removing duplicates, if you're only working with ASCII, you can do the following:
bool[] seenChars = new bool[128];
var sb = new StringBuilder();
foreach(char c in stringOne)
{
if(!seenChars[c]){
seenChars[c] = true;
sb.Append(c);
}
}
return sb.ToString();
The idea is to use the value of the char as an index in the array to tell you whether you've seen this character before or not. With this approach, you will be using O(1) space!
Edit: If you want to de-duplicate words, you probably want to use a HashSet and skip adding it if it already exists.
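A minimal sketch of the HashSet idea for words, assuming that punctuation can simply be dropped and that duplicates are compared case-insensitively. ReverseAndDeduplicate is a hypothetical name. HashSet.Add returns false when the item is already present, which is what skips the duplicates.

```csharp
using System;
using System.Collections.Generic;

public static class ReverseDistinctWords
{
    public static string ReverseAndDeduplicate(string sentence)
    {
        // Assumption: spaces and commas both act as separators and are discarded.
        string[] words = sentence.Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);

        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var kept = new List<string>();

        // Walk backwards so the output is in reversed order; keep only
        // the first occurrence of each word in that order.
        for (int i = words.Length - 1; i >= 0; i--)
        {
            if (seen.Add(words[i]))
            {
                kept.Add(words[i]);
            }
        }
        return string.Join(" ", kept);
    }
}
```

For "I am good, Is he good" this returns "good he Is am I". Keeping the comma attached to a word, as in the question's expected output, would need extra rules.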
Try this:
string sentence = "I am good, Is he good";
var words = sentence.Split(new char[]{' ',','}).Distinct(StringComparer.CurrentCultureIgnoreCase);
var stringBuilder = new StringBuilder();
foreach(var item in words)
{
stringBuilder.Append(item);
stringBuilder.Append(" ");
}
Console.Write(stringBuilder);
Console.ReadLine();
My question is part curiosity and part help, so bear with me.
My previous question had to do with passing text files as an argument to a function, which I managed to figure out with help, so thank you to all who helped previously.
So, consider this code bit:
protected bool FindWordInFile(StreamReader wordlist, string word_to_find)
{
// Read the first line.
string line = wordlist.ReadLine();
while (line != null)
{
if(line.Contains(word_to_find))
{
return true;
}
// Read the next line.
line = wordlist.ReadLine();
}
return false;
}
What happens with this particular function if you call it in the following way:
temp_sentence_string = post_words[i]; //Takes the first string in the array FROM the array and binds it to a temporary string variable
WordCount.Text = WordCount.Text + " ||| " + temp_sentence_string;
word_count = temp_sentence_string.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
for (int word_pos = 0; word_pos < word_count.Length; word_pos++)
{
bool WhatEver = FindWordInFile(goodwords_string, word_count[word_pos]);
if (WhatEver == true)
{
WordTest.Text = WordTest.Text + "{" + WhatEver + "} ";
}
WordTest.Text = WordTest.Text + "{" + WhatEver + "}";
}
AND:
The string passed is "good times are good" and the text file has the word "good" in it. The output is this:
good{True}times{False}are{False}good{False}
Pretty strange. It looks like what happened is that:
1. The sentence "good times are good" got put into an array, split by the detection of a space. This happened correctly.
2. The first array element, "good" was compared against the text file and returned True. So that worked.
3. It then went to the next word "times", compared it, came up False.
4. Went to the next word "are", compared it, came up False.
5. THEN it got to the final word, "good", BUT it evaluated to False. This should NOT have happened.
So, my question is - what happened? It looks like the function of FindWordInFile was perhaps not coded right on my end, and somehow it kept returning False even though the word "good" was in the text file.
Second Part: Repeaters in ASP.NET and C#
So I have a repeater object bound to an array that is INSIDE a for loop. This particular algorithm takes an array of sentences and then breaks them down into a temp array of words. The temp array of words is bound to the Repeater.
But what happens is, let's say I have two sentences to do stuff to...
And so it's inside a loop. It does the stuff to the first array of words, and then does it to the second array of words, but what happens in the displaying the contents of the array, it only shows the contents of the LAST array that was generated and populated. Even though it's in the for loop, my expectation was that it would show all the word arrays, one after the other. But it only shows the last one. So if there's 5 sentences to break up, it only shows the 5th sentence that was populated by words.
Any ideas why?
for (int i = 0; i < num_sentences; i++) //num_sentences is an int that counted the number of elements in the array of sentences that was populated before. It populates the array by splitting based on punctuation.
{
temp_sentence_string = post_words[i]; //Takes the first string in the array FROM the sentence array and binds it to a temporary string variable
WordCount.Text = WordCount.Text + " ||| " + temp_sentence_string;
word_count = temp_sentence_string.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries); //create a word count array with one word as a singular element in the array
//We have the Word Count Array now. We can go through it with a loop, right?
for (int j = 0; j < word_count.Length; j++)
{
Boolean FoundIt = File
.ReadLines(@"c:\wordfilelib\goodwords.txt") // <- Your file name
.Any(line => line.Contains(word_count[j]));
WordTest.Text = WordTest.Text + FoundIt + "(" + word_count[j] + ")";
}
Repeater2.DataSource = word_count;
Repeater2.DataBind();
}
First Part
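You are passing a StreamReader into the Find function. A StreamReader only reads forward: each call continues from wherever the previous call stopped, so once the reader has moved past the matching line (or reached the end of the file), later searches fail. It must be reset to the start before it can be used again. Test with the following sentence and you will see the result.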
"good good good good"
you will get
good{True}good{False}good{False}good{False}
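If you do want to keep a single StreamReader, one way to reset it is to seek the underlying stream back to the start and discard the reader's internal buffer. The sketch below assumes the underlying stream is seekable (true for files and for the MemoryStream used here). ReaderRewind, FindWord and ReaderOver are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReaderRewind
{
    public static bool FindWord(StreamReader reader, string word)
    {
        // Rewind before every search; without these two lines the reader
        // would continue from where the previous call stopped.
        reader.BaseStream.Seek(0, SeekOrigin.Begin);
        reader.DiscardBufferedData();

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Contains(word))
            {
                return true;
            }
        }
        return false;
    }

    // Helper for trying this out without a file on disk.
    public static StreamReader ReaderOver(string text)
    {
        return new StreamReader(new MemoryStream(Encoding.UTF8.GetBytes(text)));
    }
}
```

With the rewind in place, searching for "good" twice on the same reader succeeds both times.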
I would suggest reading the file into an array only one time and then do your processing over the array.
using System.Linq;
using System.Collections.Generic;
public class WordFinder
{
private static bool FindWordInLines(string word_to_find, string[] lines)
{
foreach(var line in lines)
{
if(line.Contains(word_to_find))
return true;
}
return false;
}
public string SearchForWordsInFile(string path, string sentence)
{
// https://msdn.microsoft.com/en-us/library/s2tte0y1(v=vs.110).aspx
var lines = System.IO.File.ReadAllLines(path);
var words = sentence.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
var result = string.Empty;
foreach(var word in words)
{
var found = FindWordInLines(word, lines);
// {{ in string.Format outputs {
// }} in string.Format outputs }
// {0} says use first parameter's ToString() method
result += string.Format("{{{0}}}", found);
}
return result;
}
}
Second Part:
If you bind it in the for loop like that it will only bind to the last result. If you accumulate the results in the outer loop you can pass the accumulated results to the repeater and bind outside the loop.
I created a sample loop class below that has two loops. The "resultList" is the variable that accumulates the results.
using System.Collections.Generic;
public class LoopExample
{
public void RunLoopExample()
{
var outerList = new string[]{"the", "quick", "brown", "fox"};
var innerList = new string[]{"jumps", "over", "the", "lazy", "dog"};
// define the resultList variable outside the outer loop
var resultList = new List<string>();
for(int outerIndex = 0; outerIndex < outerList.Length; outerIndex ++)
{
var outerValue = outerList[outerIndex];
for(int innerIndex = 0; innerIndex < innerList.Length; innerIndex++)
{
var innerValue = innerList[innerIndex];
resultList.Add(string.Format("{0}->{1}; ", outerValue, innerValue));
}
}
// use the resultList variable outside the outer loop
foreach(var result in resultList )
{
Console.WriteLine(result);
}
}
}
In your example, you would set the DataSource to the resultList:
Repeater2.DataSource = resultList;
Repeater2.DataBind();
I would like to check some strings for invalid characters. By invalid characters I mean characters that should not be there. Which characters are these? That varies, but I think that's not that important. What matters is how I should do this, and what the easiest and best (in terms of performance) way to do it is.
Let's say I just want strings that contain only 'A-Z', 'empty', '.', '$', '0-9'.
So if I have a string like "HELLO STaCKOVERFLOW" => invalid, because of the 'a'.
OK, now how to do that? I could make a List<char>, put every char in it that is not allowed, and check the string against this list. Maybe not a good idea, because there are a lot of chars then. But I could make a list that contains all of the allowed chars, right? And then? For every char in the string I have to compare against the List<char>? Any smart code for this? And another question: if I add A-Z to the List<char>, I have to add 26 chars manually, but as far as I know these chars are 65-90 in the ASCII table. Can I add them more easily? Any suggestions? Thank you.
You can use a regular expression for this:
// Matches any single character outside the allowed set, anywhere in the string
Regex r = new Regex("[^A-Z0-9.$ ]");
if (r.IsMatch(SomeString)) {
// validation failed
}
To create a list of characters from A-Z or 0-9 you would use a simple loop:
for (char c = 'A'; c <= 'Z'; c++) {
// c or c.ToString() depending on what you need
}
But you don't need that with the Regex - pretty much every regex engine understands the range syntax (A-Z).
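The same check can also be written positively, by requiring that the whole string consists of allowed characters. A minimal sketch, assuming that 'empty' in the question means a space character. AllowedCharsRegex and IsValid are hypothetical names. The pattern uses \A and \z rather than ^ and $ because in .NET $ also matches just before a trailing newline.

```csharp
using System.Text.RegularExpressions;

public static class AllowedCharsRegex
{
    // \A and \z anchor to the absolute start and end of the input.
    // Inside a character class, '.' and '$' are literal characters.
    private static readonly Regex OnlyAllowed = new Regex(@"\A[A-Z0-9.$ ]*\z");

    public static bool IsValid(string input)
    {
        return OnlyAllowed.IsMatch(input);
    }
}
```

AllowedCharsRegex.IsValid("HELLO STaCKOVERFLOW") returns false because of the lowercase 'a'.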
I have only just written such a function, along with an extended version that restricts the first and last characters when needed. The original function checks whether the string consists of valid characters only. The extended function takes two extra integers: the number of characters at the beginning of the valid-character list to skip when checking the first and the last character, respectively. In practice it simply calls the original function three times. In the example below, it ensures that the string begins with a letter and doesn't end with an underscore.
StrChr(String, "_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
StrChrEx(String, "_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ", 11, 1);
BOOL __cdecl StrChr(CHAR* str, CHAR* chars)
{
for (int s = 0; str[s] != 0; s++)
{
int c = 0;
while (true)
{
if (chars[c] == 0)
{
return false;
}
else if (str[s] == chars[c])
{
break;
}
else
{
c++;
}
}
}
return true;
}
BOOL __cdecl StrChrEx(CHAR* str, CHAR* chars, UINT excl_first, UINT excl_last)
{
char first[2] = {str[0], 0};
char last[2] = {str[strlen(str) - 1], 0};
if (!StrChr(str, chars))
{
return false;
}
if (excl_first != 0)
{
if (!StrChr(first, chars + excl_first))
{
return false;
}
}
if (excl_last != 0)
{
if (!StrChr(last, chars + excl_last))
{
return false;
}
}
return true;
}
If you are using C#, you can do this easily using a List and Contains. You can do this with single characters (in a string) or a multi-character string just the same:
var pn = "The String To ChecK";
var badStrings = new List<string>()
{
" ","\t","\n","\r"
};
foreach(var badString in badStrings)
{
if(pn.Contains(badString))
{
//Do something
}
}
If you're not super good with regular expressions, then there is another way to go about this in C#. Here is a block of code I wrote to test a string variable named notifName:
var alphabet = "a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z";
var numbers = "0,1,2,3,4,5,6,7,8,9";
var specialChars = " ,(,),_,[,],!,*,-,.,+,-";
var validChars = (alphabet + "," + alphabet.ToUpper() + "," + numbers + "," + specialChars).Split(',');
for (int i = 0; i < notifName.Length; i++)
{
if (Array.IndexOf(validChars, notifName[i].ToString()) < 0) {
errorFound = $"Invalid character '{notifName[i]}' found in notification name.";
break;
}
}
You can change the characters added to the array as needed. The Array IndexOf method is the key to the whole thing. Of course if you want commas to be valid, then you would need to choose a different split character.
Not enough reps to comment directly, but I recommend the Regex approach. One small caveat: you probably need to anchor both ends of the input string, and you will want at least one character to match. So (with thanks to ThiefMaster), here's my regex to validate user input for a simple arithmetical calculator (plus, minus, multiply, divide):
Regex r = new Regex(@"^[0-9\.\-\+\*\/ ]+$");
I'd go with a regex, but still need to add my 2 cents here, because all the proposed non-regex solutions are O(MN) in the worst case (string is valid) which I find repulsive for religious reasons.
Even more so when LINQ offers a simpler and more efficient solution than nesting loops:
var isInvalid = "The String To Test".Intersect("ALL_INVALID_CHARS").Any();
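For the question's case, where the allowed characters are known rather than the invalid ones, a similar non-regex sketch is a HashSet of allowed characters built with range loops, checked with All. The names AllowedCharsSet and IsValid are hypothetical, and 'empty' is again assumed to mean a space. Each lookup is a hash lookup, so the check is linear in the length of the string.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AllowedCharsSet
{
    private static readonly HashSet<char> Allowed = BuildAllowed();

    private static HashSet<char> BuildAllowed()
    {
        var set = new HashSet<char> { ' ', '.', '$' };
        // Ranges are added with loops instead of typing each character.
        for (char c = 'A'; c <= 'Z'; c++) set.Add(c);
        for (char c = '0'; c <= '9'; c++) set.Add(c);
        return set;
    }

    public static bool IsValid(string input)
    {
        // True only if every character is in the allowed set.
        return input.All(c => Allowed.Contains(c));
    }
}
```

AllowedCharsSet.IsValid("HELLO STaCKOVERFLOW") returns false, and an empty string is treated as valid.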
I have a string that I am reading from another system. It's basically a long string that represents a list of key value pairs that are separated by a space in between. It looks like this:
key:value[space]key:value[space]key:value[space]
So I wrote this code to parse it:
string myString = ReadinString();
string[] tokens = myString.Split(' ');
foreach (string token in tokens) {
string key = token.Split(':')[0];
string value = token.Split(':')[1];
. . . .
}
The issue now is that some of the values have spaces in them so my "simplistic" split at the top no longer works. I wanted to see how I could still parse out the list of key value pairs (given space as a separator character) now that I know there also could be spaces in the value field as split doesn't seem like it's going to be able to work anymore.
NOTE: I now confirmed that KEYs will NOT have spaces in them so I only have to worry about the values. Apologies for the confusion.
Use this regular expression:
\w+:[\w\s]+(?![\w+:])
I tested it on
test:testvalue test2:test value test3:testvalue3
It returns three matches:
test:testvalue
test2:test value
test3:testvalue3
You can change \w to any character set that can occur in your input.
Code for testing this:
var regex = new Regex(@"\w+:[\w\s]+(?![\w+:])");
var test = "test:testvalue test2:test value test3:testvalue3";
foreach (Match match in regex.Matches(test))
{
var key = match.Value.Split(':')[0];
var value = match.Value.Split(':')[1];
Console.WriteLine("{0}:{1}", key, value);
}
Console.ReadLine();
As Wonko the Sane pointed out, this regular expression will fail on values containing ":". If you expect such a situation, use \w+:[\w: ]+?(?![\w+:]) as the regular expression. This will still fail when a colon in a value is preceded by a space, though... I'll think about a solution to this.
This cannot work without changing your split from a space to something else such as a "|".
Consider this:
Alfred Bester:Alfred Bester Alfred:Alfred Bester
Is this key "Alfred Bester" & value "Alfred", or key "Alfred" & value "Bester Alfred"?
string input = "foo:Foobarius Maximus Tiberius Kirk bar:Barforama zap:Zip Brannigan";
foreach (Match match in Regex.Matches(input, @"(\w+):([^:]+)(?![\w+:])"))
{
Console.WriteLine("{0} = {1}",
match.Groups[1].Value,
match.Groups[2].Value
);
}
Gives you:
foo = Foobarius Maximus Tiberius Kirk
bar = Barforama
zap = Zip Brannigan
You could try to Url encode the content between the space (The keys and the values not the : symbol) but this would require that you have control over the Input Method.
Or you could simply use another format (Like XML or JSON), but again you will need control over the Input Format.
If you can't control the input format, you could always use a regular expression that searches for single spaces followed by a word plus ":".
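One way to express "a space followed by a word plus a colon" is a lookahead passed to Regex.Split, so the string is cut only at spaces that start a new key. This is a sketch under the assumptions that keys consist of word characters only and that no value contains a word immediately followed by a colon. KeyValueSplitter and Parse are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyValueSplitter
{
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();

        // Split at a space only when it is followed by "word:".
        // The lookahead (?=...) does not consume the key itself.
        foreach (string token in Regex.Split(input.Trim(), @" (?=\w+:)"))
        {
            int colon = token.IndexOf(':');
            if (colon < 0)
            {
                continue; // no key in this token, skip it
            }
            string key = token.Substring(0, colon);
            string value = token.Substring(colon + 1);
            pairs.Add(new KeyValuePair<string, string>(key, value));
        }
        return pairs;
    }
}
```

For "test:testvalue test2:test value test3:testvalue3" this produces three pairs, with "test value" kept intact as the second value.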
Update (Thanks Jon Grant)
It appears that you can have spaces in the key and the value. If this is the case you will need to seriously rethink your strategy as even Regex won't help.
string input = "key1:value key2:value key3:value";
Dictionary<string, string> dic = input.Split(' ').Select(x => x.Split(':')).ToDictionary(x => x[0], x => x[1]);
The first Split will produce an array:
"key:value", "key:value"
Then an array of arrays:
{ "key", "value" }, { "key", "value" }
And then a dictionary:
"key" => "value", "key" => "value"
Note, that Dictionary<K,V> doesn't allow duplicated keys, it will raise an exception in such a case. If such a scenario is possible, use ToLookup().
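A minimal sketch of the ToLookup variant, assuming values contain no spaces (the same limitation as the Dictionary version above). LookupSketch and Parse are hypothetical names. A lookup maps each key to a sequence of values, and indexing a missing key yields an empty sequence rather than throwing.

```csharp
using System;
using System.Linq;

public static class LookupSketch
{
    public static ILookup<string, string> Parse(string input)
    {
        return input
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            // Split each pair at the first colon only.
            .Select(pair => pair.Split(new[] { ':' }, 2))
            // Ignore tokens that have no colon at all.
            .Where(parts => parts.Length == 2)
            .ToLookup(parts => parts[0], parts => parts[1]);
    }
}
```

For "key1:a key2:b key1:c", the lookup holds two values under "key1" and one under "key2".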
Using a regular expression can solve your problem:
private void DoSplit(string str)
{
str = str.Trim() + " ";
string patterns = #"\w+:([\w+\s*])+[^!\w+:]";
var r = new System.Text.RegularExpressions.Regex(patterns);
var ms = r.Matches(str);
foreach (System.Text.RegularExpressions.Match item in ms)
{
string[] s = item.Value.Split(new char[] { ':' });
//Do something
}
}
This code will do it (given the rules below). It parses the keys and values and returns them in a Dictionary<string, string> data structure. I have added some code at the end that assumes, given your example, that the last value of the entire string/stream will be followed by a [space]:
private Dictionary<string, string> ParseKeyValues(string input)
{
Dictionary<string, string> items = new Dictionary<string, string>();
string[] parts = input.Split(':');
string key = parts[0];
string value;
int currentIndex = 1;
while (currentIndex < parts.Length-1)
{
int indexOfLastSpace=parts[currentIndex].LastIndexOf(' ');
value = parts[currentIndex].Substring(0, indexOfLastSpace);
items.Add(key, value);
key = parts[currentIndex].Substring(indexOfLastSpace + 1);
currentIndex++;
}
// Drop the trailing space after the final value, if there is one
value = parts[parts.Length - 1].TrimEnd(' ');
items.Add(key, value);
return items;
}
Note: this algorithm assumes the following rules:
No spaces in the keys
No colons in the keys
No colons in the values
Without any Regex nor string concat, and as an enumerable (it supposes keys don't have spaces, but values can):
public static IEnumerable<KeyValuePair<string, string>> Split(string text)
{
if (text == null)
yield break;
int keyStart = 0;
int keyEnd = -1;
int lastSpace = -1;
for(int i = 0; i < text.Length; i++)
{
if (text[i] == ' ')
{
lastSpace = i;
continue;
}
if (text[i] == ':')
{
if (lastSpace >= 0)
{
yield return new KeyValuePair<string, string>(text.Substring(keyStart, keyEnd - keyStart), text.Substring(keyEnd + 1, lastSpace - keyEnd - 1));
keyStart = lastSpace + 1;
}
keyEnd = i;
continue;
}
}
if (keyEnd >= 0)
yield return new KeyValuePair<string, string>(text.Substring(keyStart, keyEnd - keyStart), text.Substring(keyEnd + 1));
}
I guess you could take your method and expand upon it slightly to deal with this stuff...
Kind of pseudocode:
List<string> parsedTokens = new List<String>();
string[] tokens = myString.Split(' ');
for(int i = 0; i < tokens.Length; i++)
{
// A token is complete if it is the last item, or if the
// following item contains a colon (i.e. it starts a new key).
if(i == tokens.Length - 1 || tokens[i+1].IndexOf(':') > -1)
{
parsedTokens.Add(tokens[i]);
}
else
{
// This bit needs to be refined to deal with values with multiple spaces...
parsedTokens.Add(tokens[i] + " " + tokens[i+1]);
}
}
Another approach would be to split on the colon... That way, your first array item would be the name of the first key, second item would be the value of the first key and then name of the second key (can use LastIndexOf to split it out), and so on. This would obviously get very messy if the values can include colons, or the keys can contain spaces, but in that case you'd be pretty much out of luck...