Extract values from a string based on a pattern - c#

I need to pull a bunch of key value pairs based on a predefined pattern from a string. An example of what I would need is this:
Pattern: {value1}-{value2}
String: Example-String
Result KVP:
{ Key: value1, value: Example },
{ Key: value2, value: String }
The catch is that the pattern could be pretty much anything (although the values I need to extract will always be surrounded by curly brackets), e.g.:
Pattern: {test1}\{test2}={value}
String: Example\Value=Result
Result KVP:
{ Key: test1, value: Example },
{ Key: test2, value: Value },
{ Key: value, value: Result }
What I have so far isn't quite working, and I'm fairly sure there is a more elegant way to do this than my solution, so I thought I'd ask whether anyone here has a good idea.
EDIT:
Here is essentially what I have so far (it's working, but IMO it's really ugly):
public List<KeyValuePair<string, string>> Example(string pattern, string input)
{
var values = new List<KeyValuePair<string, string>>();
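var r1 = Regex.Matches(pattern, @"(\{[A-Za-z0-9]*\})"); // find the {placeholders} in the pattern (digits allowed, e.g. {test1})
string newregex = pattern; // start from the pattern so the Replace calls below have something to rewrite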
foreach (Match item in r1)
{
newregex = newregex.Replace(item.Value, "(.*?)"); //updates regex so that it adds this as a group for use later, ie: "{item1}-{item2}" will become "(.*?)-{item2}"
string field = item.Value.Substring(1, item.Value.Length - 2); // {test1} will return "test1"
values.Add(new KeyValuePair<string, string>(field, string.Empty));
}
newregex = $"{newregex}\\z"; // ensures that it matches to end of input
var r2 = Regex.Match(input, newregex);
// KVP index (used below)
int val = 0;
foreach (Group g in r2.Groups)
{
if (g.Value == input)
continue; // first group will be equal to input, ignore
values[val] = new KeyValuePair<string, string>(values[val].Key, g.Value); // update KVP at index with new KVP with the value
val++;
}
return values;
}

Unfortunately I don't know regular expressions very well, but one way to solve this is to walk through each character of the pattern string and build a list of keys and a list of delimiters. We can then walk through the search string, find the index of each delimiter to get the current value, and add a new KeyValuePair to a list.
Here's a rough sample that assumes good input:
public static List<KeyValuePair<string, string>> GetKVPs(string pattern, string search)
{
var results = new List<KeyValuePair<string, string>>();
var keys = new List<string>();
var delimeters = new List<string>();
var currentKey = string.Empty;
var currentDelimeter = string.Empty;
var processingKey = false;
// Populate our lists of Keys and Delimeters
foreach (var chr in pattern)
{
switch (chr)
{
case '}':
{
if (currentKey.Length > 0)
{
keys.Add(currentKey);
currentKey = string.Empty;
}
processingKey = false;
break;
}
case '{':
{
if (currentDelimeter.Length > 0)
{
delimeters.Add(currentDelimeter);
currentDelimeter = string.Empty;
}
processingKey = true;
break;
}
default:
{
if (processingKey)
{
currentKey += chr;
}
else
{
currentDelimeter += chr;
}
break;
}
}
}
if (currentDelimeter.Length > 0) delimeters.Add(currentDelimeter);
var lastDelim = -1;
// Find our Values based on the delimeter positions in the search string
for (int i = 0; i < delimeters.Count; i++)
{
var delimIndex = search.IndexOf(delimeters[i], lastDelim + 1);
if (delimIndex > -1)
{
var value = search.Substring(lastDelim + 1, delimIndex - lastDelim - 1);
results.Add(new KeyValuePair<string, string>(keys[i], value));
lastDelim = delimIndex + delimeters[i].Length - 1;
}
}
// Add the item after the final delimeter if it exists:
if (lastDelim > -1 && lastDelim < search.Length - 1)
{
results.Add(new KeyValuePair<string, string>(keys.Last(),
search.Substring(lastDelim + 1)));
}
return results;
}
And an example of it in action:
public static void Main(string[] args)
{
var results = GetKVPs(
"{greeting}, {recipient}, this is {sender}.",
"Hello, Dolly, this is Louis.");
foreach (var kvp in results)
{
Console.WriteLine($"{kvp.Key} = {kvp.Value}");
}
Console.WriteLine("\nDone! Press any key to exit...");
Console.ReadKey();
}
Output (the three extracted pairs):
greeting = Hello
recipient = Dolly
sender = Louis
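Since the question asked for something more elegant, here is a minimal regex-based sketch of the same idea. It is not the asker's or the answerer's code: the class name PatternExtractor, the method name Extract, and the choice of \w+ for placeholder names are assumptions made for illustration. The literal text between placeholders is passed through Regex.Escape so that characters such as \ stay literal, and each placeholder becomes a lazy capture group.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class PatternExtractor
{
    // Matches a placeholder such as {value1}; group 1 is the key name.
    // Restricting names to \w+ is an assumption, not something the question states.
    private static readonly Regex Placeholder = new Regex(@"\{(\w+)\}");

    public static List<KeyValuePair<string, string>> Extract(string pattern, string input)
    {
        var keys = new List<string>();
        var regex = new StringBuilder("^");
        int last = 0;
        foreach (Match m in Placeholder.Matches(pattern))
        {
            // Escape the literal text between placeholders so '\' or '.' stay literal.
            regex.Append(Regex.Escape(pattern.Substring(last, m.Index - last)));
            regex.Append("(.+?)");
            keys.Add(m.Groups[1].Value);
            last = m.Index + m.Length;
        }
        regex.Append(Regex.Escape(pattern.Substring(last))).Append("$");

        var results = new List<KeyValuePair<string, string>>();
        Match match = Regex.Match(input, regex.ToString(), RegexOptions.Singleline);
        if (!match.Success) return results; // input does not fit the pattern

        // Capture groups are numbered from 1 in the order the placeholders appeared.
        for (int i = 0; i < keys.Count; i++)
            results.Add(new KeyValuePair<string, string>(keys[i], match.Groups[i + 1].Value));
        return results;
    }
}
```

Calling PatternExtractor.Extract(@"{test1}\{test2}={value}", @"Example\Value=Result") should give the three pairs from the question. Numbered groups are used instead of named groups so that placeholder names do not have to be valid regex group names.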

How To Get Count of element in List<> without linq [closed]

I want to count how many times each element occurs in a list, but without LINQ.
Example:
string a = "cat";
string b = "dog";
string c = "cat";
string d = "horse";
var list = new List<string>();
list.Add(a);
list.Add(b);
list.Add(c);
list.Add(d);
The desired result is: cat=2, dog=1, horse=1
Here's one way I could think of using a Dictionary<string, int>:
public static Dictionary<string, int> GetObjectCount(List<string> items)
{
// Dictionary object to return
Dictionary<string, int> keysAndCount = new Dictionary<string, int>();
// Iterate your string values
foreach(string s in items)
{
// Check if dictionary contains the key, if so, add to count
if (keysAndCount.ContainsKey(s))
{
keysAndCount[s]++;
}
else
{
// Add key to dictionary with initial count of 1
keysAndCount.Add(s, 1);
}
}
return keysAndCount;
}
Then get the result back and print to console:
Dictionary<string, int> dic = GetObjectCount(list);
//Print to Console
foreach(string s in dic.Keys)
{
Console.WriteLine(s + " has a count of: " + dic[s]);
}
I am not sure why you are looking for a LINQ-free solution, as this can be done very easily and efficiently with LINQ. I strongly suggest you use it, like below:
var _group = list.GroupBy(i => i);
string result = "";
foreach (var grp in _group)
result += grp.Key + ": " + grp.Count() + Environment.NewLine;
MessageBox.Show(result);
Otherwise, if you really are unable to use LINQ, you can do it like this:
Dictionary<string, int> listCount = new Dictionary<string, int>();
foreach (string item in list)
if (!listCount.ContainsKey(item))
listCount.Add(item, 1);
else
listCount[item]++;
string result2 = "";
foreach (KeyValuePair<string, int> item in listCount)
result2 += item.Key + ": " + item.Value + Environment.NewLine;
MessageBox.Show(result2);
The simple solution to your issue is a foreach loop.
string[] myStrings = new string[] { "Cat", "Dog", "Horse", "CaT", "cat", "DOG" };
int cats = GetCount(myStrings, "cat"); // == is case-sensitive, so only the exact string "cat" is counted
Console.WriteLine($"There are {cats} cats.");
static int GetCount(string[] strings, string searchTerm) {
int result = 0;
foreach (string s in strings)
if (s == searchTerm)
result++;
return result;
}
LINQ does this under the hood. Unless you are optimizing for very large lists or doing this as a learning exercise, LINQ should be your preferred choice if you know how to use it. It exists to make your life easier.
Another implementation of this would be to simplify the number of calls you need and just write the output in the method:
string[] myStrings = new string[] { "Cat", "Dog", "Horse", "CaT", "cat", "DOG" };
CountTerms(myStrings, "cat", "dog");
Console.ReadKey();
static void CountTerms(string[] strings, params string[] terms) {
foreach (string term in terms) {
int result = 0;
foreach (string s in strings)
if (s == term)
result++;
Console.WriteLine($"There are {result} instances of {term}");
}
}
With that said, I heavily recommend Ryan Wilson's answer. His version simplifies the task at hand. The only downside to his implementation is that it counts every distinct item, which is more than you need if you only want the count of a single value, the way List<string>.Count(c => c == "cat") would give it to you.
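One thing to watch in the two snippets above: == on strings is case-sensitive, so with the sample array only the exact entry "cat" is counted, not "Cat" or "CaT". If case-insensitive counts are what you want (that is an assumption on my part), a rough sketch without LINQ could look like this. The names WordCounter and CountIgnoreCase are made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class WordCounter
{
    // Counts every distinct string, treating "Cat", "CaT" and "cat" as the same key.
    public static Dictionary<string, int> CountIgnoreCase(IEnumerable<string> items)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string item in items)
        {
            int current;
            counts.TryGetValue(item, out current); // current stays 0 when the key is absent
            counts[item] = current + 1;
        }
        return counts;
    }
}
```

The comparer passed to the dictionary does the case folding, so a lookup such as counts["cat"] finds the entry no matter which casing was seen first.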
You could try something like:
public int countOccurances(List<string> inputList, string countFor)
{
// Identifiers used are:
int countSoFar = 0;
// Go through your list to count
foreach (string listItem in inputList)
{
// Check your condition
if (listItem == countFor)
{
countSoFar++;
}
}
// Return the results
return countSoFar;
}
This will give you the count for any string you give it. As always, there are better ways, but this is a good start.
Or if you want:
public string countOccurances(List<string> inputList, string countFor)
{
// Identifiers used are:
int countSoFar = 0;
string result = countFor;
// Go through your list to count
foreach (string listItem in inputList)
{
// Check your condition
if (listItem == countFor)
{
countSoFar++;
}
}
// Return the results
return countFor + " = " + countSoFar;
}
Or an even better option:
private static void CountOccurances(List<string> inputList, string countFor)
{
int result = 0;
foreach (string s in inputList)
{
if (s == countFor)
{
result++;
}
}
Console.WriteLine($"There are {result} occurrences of {countFor}.");
}
LINQ is supposed to make a developer's life easier. Anyway, you could do something like this:
string a = "cat";
string b = "dog";
string c = "cat";
string d = "horse";
var list = new List<string>();
list.Add(a);
list.Add(b);
list.Add(c);
list.Add(d);
var result = GetCount(list);
Console.WriteLine(result);
Console.ReadLine();
static string GetCount(List<string> obj)
{
string result = string.Empty;
int cat = 0;
int dog = 0;
int horse = 0;
foreach (var item in obj)
{
switch (item)
{
case "dog":
dog++;
break;
case "cat":
cat++;
break;
case "horse":
horse++;
break;
}
}
result = "cat = " + cat.ToString() + " dog = " + dog.ToString() + " horse = " + horse.ToString();
return result;
}

Cannot store csv file into a dictionary in its original format

I have a CSV file in the following format:
a,b
The goal is to store this CSV file in a dictionary.
Problem: csvOne has this as its first field:
kfjdfdsdsd, second value
aaaaaaa
sdasdasdaasdasdfffw
As a result, it does not get stored in its original format, i.e. only the part below gets stored:
key: "", value: kfjdfdsdsd
My code:
public void StoreInDictionary(string[] file, Dictionary<string, string> dictionary)
{
foreach (var line in file)
{
var cleansedLine = Regex.Replace(line, @"\s+", "");
var commaIndex = cleansedLine.IndexOf(',');
var valueOne = cleansedLine.Substring(0, commaIndex + 1);
var valueTwo = cleansedLine.Substring(commaIndex + 1);
if (!dictionary.ContainsKey(valueOne))
{
dictionary.Add(valueOne, valueTwo);
}
}
}
P.S. I tried replacing \r\n too; it did not work.
Thanks a lot
To get output like this:
key value
kfjdfdsdsd second value
aaaaaaa (blank)
sdasdasdaasdasdfffw (blank)
use the code below:
public void StoreInDictionary(string[] file, Dictionary<string, string> dictionary)
{
foreach (var line in file)
{
var cleansedLine = line.Trim(); // trim only the ends so "second value" keeps its inner space
var commaIndex = cleansedLine.IndexOf(',');
string valueOne = cleansedLine; // no comma: the whole line is the key
string valueTwo = String.Empty;
if (commaIndex >= 0)
{
valueOne = cleansedLine.Substring(0, commaIndex).Trim(); // stop before the comma
valueTwo = cleansedLine.Substring(commaIndex + 1).Trim();
}
if (!dictionary.ContainsKey(valueOne))
{
dictionary.Add(valueOne, valueTwo);
}
}
}
This should give you what you want. Just a quick note: you might have lines with multiple commas, and my solution doesn't treat those specially (everything after the first comma ends up in the value).
public void StoreInDictionary(string[] file, Dictionary<string, string> dictionary)
{
foreach (var line in file)
{
if (!string.IsNullOrWhiteSpace(line))
{
string valueOne, valueTwo;
var idx = line.IndexOf(',');
if (idx >= 0)
{
valueOne = line.Substring(0, idx);
valueTwo = line.Substring(idx + 1);
}
else
{
valueOne = line;
valueTwo = string.Empty;
}
if (!dictionary.ContainsKey(valueOne))
{
dictionary.Add(valueOne, valueTwo);
}
}
}
}
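As a variation on the same idea, String.Split has an overload that takes a maximum number of parts, which keeps the "first comma wins" rule explicit. This is only a sketch: the names CsvLoader and ToDictionary are invented for the example, and it assumes that trimming whitespace around each field is acceptable and that no field is quoted.

```csharp
using System;
using System.Collections.Generic;

public static class CsvLoader
{
    public static Dictionary<string, string> ToDictionary(IEnumerable<string> lines)
    {
        var result = new Dictionary<string, string>();
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue; // skip blank lines

            // Split into at most two parts: the key, and everything after the first comma.
            string[] parts = line.Split(new[] { ',' }, 2);
            string key = parts[0].Trim();
            string value = parts.Length > 1 ? parts[1].Trim() : string.Empty;

            // Keep the first occurrence of a key, like the answers above.
            if (!result.ContainsKey(key)) result.Add(key, value);
        }
        return result;
    }
}
```

A line with no comma ends up as a key with an empty value, and a line such as "a,b,c" is stored as key "a" with value "b,c". For real CSV with quoted fields a proper CSV parser is the safer choice.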

Need to check a key in a dictionary for the letters of a string

So I have a dictionary and need to check each key to see whether it can be made from the letters of a string (let's call it LETTERS). If a key has any letter that LETTERS does not have, or has more of a letter than LETTERS has, it has to be removed. (LETTERS is not known beforehand.)
This is the code involved
Dictionary<string, int> wordHolder = new Dictionary<string, int>();
string LETTERS = Console.ReadLine();
for (int i = 0; i < LETTERS.Count(); i++)
{
for (int j = 0; j < wordHolder.Count; j++)
{
}
}
I think I have it this time, if you still need an answer.
The example below has 5 keys, 3 of which contain letters that are not in LETTERS.
Dictionary<string, int> wordHolder = new Dictionary<string, int>();
wordHolder.Add("CEFBA",1);
wordHolder.Add("ZDFEEG",2);
wordHolder.Add("TYHRFG", 3);
wordHolder.Add("FFFFBBDD", 4);
wordHolder.Add("PCDATTY", 5);
var keysToRemove = new List<string>();
string myLetters = "ABCDEF";
var myLettersArray = myLetters.ToCharArray();
foreach (var keyToCheck in wordHolder)
{
var keyCannotBeCreatedFromLetters = false;
var keyArray = keyToCheck.Key.ToCharArray();
foreach (var letterExists in
from keyLetterToCheck in keyArray
where !keyCannotBeCreatedFromLetters
select myLettersArray.Any(a => a == keyLetterToCheck)
into letterExists
where !letterExists select letterExists)
{
keysToRemove.Add(keyToCheck.Key);
keyCannotBeCreatedFromLetters = true;
}
}
foreach (var key in keysToRemove)
{
wordHolder.Remove(key);
}
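It correctly identifies the 2nd, 3rd and 5th keys as not creatable. Note that it only tests whether each letter is present, so the 4th key (FFFFBBDD), which uses more F's than LETTERS has, is kept.
Below is the same logic written as foreach loops. I often find this useful, so you can see what's happening internally.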
foreach (var keyToCheck in wordHolder)
{
var keyCannotBeCreatedFromLetters = false;
var keyArray = keyToCheck.Key.ToCharArray();
foreach (var keyLetterToCheck in keyArray)
{
if (keyCannotBeCreatedFromLetters)
continue;
var letterExists = myLettersArray.Any(a => a == keyLetterToCheck);
if (letterExists) continue;
keysToRemove.Add(keyToCheck.Key);
keyCannotBeCreatedFromLetters = true;
}
}
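The question also asks to remove keys that use more of a letter than LETTERS has, which a pure membership test does not cover. A possible sketch that counts letters is below. The names LetterFilter, CanBuild and RemoveUnbuildable are mine, and the comparison is case-sensitive, which is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class LetterFilter
{
    // True when every character of word is available in letters, counting repeats.
    public static bool CanBuild(string word, string letters)
    {
        var available = new Dictionary<char, int>();
        foreach (char c in letters)
        {
            int n;
            available.TryGetValue(c, out n); // n stays 0 for a letter not seen yet
            available[c] = n + 1;
        }
        foreach (char c in word)
        {
            int n;
            if (!available.TryGetValue(c, out n) || n == 0) return false; // missing or used up
            available[c] = n - 1;
        }
        return true;
    }

    // Removes every key that cannot be built from letters.
    public static void RemoveUnbuildable(Dictionary<string, int> words, string letters)
    {
        var toRemove = new List<string>();
        foreach (string key in words.Keys)
            if (!CanBuild(key, letters)) toRemove.Add(key);
        foreach (string key in toRemove) words.Remove(key); // remove after enumerating
    }
}
```

With the five sample keys and "ABCDEF" as the letters, only CEFBA should survive, because FFFFBBDD needs four F's and LETTERS has one.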
Assuming you want to return every KeyValuePair whose Key consists of exactly the letters in LETTERS, it would look like this:
// Assuming
// Dictionary<string, int> wordHolder = new Dictionary<string, int>(); // Something
// string LETTERS = ""; // Something
List<char> listLettersToHave = LETTERS.ToList();
Dictionary<string, int> researchResult = new Dictionary<string, int>();
foreach (KeyValuePair<string, int> pair in wordHolder)
{
List<char> listLettersYouHave = pair.Key.ToList();
bool ok = true;
// If not the same count go to next KeyValuePair
if (listLettersToHave.Count != listLettersYouHave.Count)
continue;
foreach (char toCheck in listLettersToHave)
{
// Search for the first occurrence
if (!listLettersYouHave.Contains(toCheck))
{
ok = false;
break;
}
// Remove the first occurrence
listLettersYouHave.Remove(toCheck);
}
if (ok)
// If all letters contained then Add to result
researchResult.Add(pair.Key, pair.Value);
}
// if it's a function
// return researchResult;
It's only an example and you can improve it, but the idea is there.
EDIT:
If the dictionary has the keys abc, cbda and dca,
and the input is bac,
the resulting key would be abc.
The solution is case-sensitive, but calling .ToUpper() on both sides will solve that.
Using the previous example, if you want cbda to match as well, you can remove the check on Count.

How can I speed this loop up? Is there a class for replacing multiple terms at a time?

The loop:
var pattern = _dict[key];
string before;
do
{
before = pattern;
foreach (var pair in _dict)
if (key != pair.Key)
pattern = pattern.Replace(string.Concat("{", pair.Key, "}"), string.Concat("(", pair.Value, ")"));
} while (pattern != before);
return pattern;
It just does a repeated find-and-replace on a bunch of keys. The dictionary is just <string,string>.
I can see 2 improvements to this.
Every time we do pattern.Replace it searches from the beginning of the string again. It would be better if when it hit the first {, it would just look through the list of keys for a match (perhaps using a binary search), and then replace the appropriate one.
The pattern != before bit is how I check whether anything was replaced during that iteration. If the pattern.Replace function returned how many replacements (if any) actually occurred, I wouldn't need this.
However... I don't really want to write a big nasty class to do all that. This must be a fairly common scenario? Are there any existing solutions?
Full Class
Thanks to Elian Ebbing and ChrisWue.
class FlexDict : IEnumerable<KeyValuePair<string,string>>
{
private Dictionary<string, string> _dict = new Dictionary<string, string>();
private static readonly Regex _re = new Regex(@"{([_a-z][_a-z0-9-]*)}", RegexOptions.Compiled | RegexOptions.IgnoreCase);
public void Add(string key, string pattern)
{
_dict[key] = pattern;
}
public string Expand(string pattern)
{
pattern = _re.Replace(pattern, match =>
{
string key = match.Groups[1].Value;
if (_dict.ContainsKey(key))
return "(" + Expand(_dict[key]) + ")";
return match.Value;
});
return pattern;
}
public string this[string key]
{
get { return Expand(_dict[key]); }
}
public IEnumerator<KeyValuePair<string, string>> GetEnumerator()
{
foreach (var p in _dict)
yield return new KeyValuePair<string,string>(p.Key, this[p.Key]);
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
Example Usage
class Program
{
static void Main(string[] args)
{
var flex = new FlexDict
{
{"h", @"[0-9a-f]"},
{"nonascii", @"[\200-\377]"},
{"unicode", @"\\{h}{1,6}(\r\n|[ \t\r\n\f])?"},
{"escape", @"{unicode}|\\[^\r\n\f0-9a-f]"},
{"nmstart", @"[_a-z]|{nonascii}|{escape}"},
{"nmchar", @"[_a-z0-9-]|{nonascii}|{escape}"},
{"string1", @"""([^\n\r\f\\""]|\\{nl}|{escape})*"""},
{"string2", @"'([^\n\r\f\\']|\\{nl}|{escape})*'"},
{"badstring1", @"""([^\n\r\f\\""]|\\{nl}|{escape})*\\?"},
{"badstring2", @"'([^\n\r\f\\']|\\{nl}|{escape})*\\?"},
{"badcomment1", @"/\*[^*]*\*+([^/*][^*]*\*+)*"},
{"badcomment2", @"/\*[^*]*(\*+[^/*][^*]*)*"},
{"baduri1", @"url\({w}([!#$%&*-\[\]-~]|{nonascii}|{escape})*{w}"},
{"baduri2", @"url\({w}{string}{w}"},
{"baduri3", @"url\({w}{badstring}"},
{"comment", @"/\*[^*]*\*+([^/*][^*]*\*+)*/"},
{"ident", @"-?{nmstart}{nmchar}*"},
{"name", @"{nmchar}+"},
{"num", @"[0-9]+|[0-9]*\.[0-9]+"},
{"string", @"{string1}|{string2}"},
{"badstring", @"{badstring1}|{badstring2}"},
{"badcomment", @"{badcomment1}|{badcomment2}"},
{"baduri", @"{baduri1}|{baduri2}|{baduri3}"},
{"url", @"([!#$%&*-~]|{nonascii}|{escape})*"},
{"s", @"[ \t\r\n\f]+"},
{"w", @"{s}?"},
{"nl", @"\n|\r\n|\r|\f"},
{"A", @"a|\\0{0,4}(41|61)(\r\n|[ \t\r\n\f])?"},
{"C", @"c|\\0{0,4}(43|63)(\r\n|[ \t\r\n\f])?"},
{"D", @"d|\\0{0,4}(44|64)(\r\n|[ \t\r\n\f])?"},
{"E", @"e|\\0{0,4}(45|65)(\r\n|[ \t\r\n\f])?"},
{"G", @"g|\\0{0,4}(47|67)(\r\n|[ \t\r\n\f])?|\\g"},
{"H", @"h|\\0{0,4}(48|68)(\r\n|[ \t\r\n\f])?|\\h"},
{"I", @"i|\\0{0,4}(49|69)(\r\n|[ \t\r\n\f])?|\\i"},
{"K", @"k|\\0{0,4}(4b|6b)(\r\n|[ \t\r\n\f])?|\\k"},
{"L", @"l|\\0{0,4}(4c|6c)(\r\n|[ \t\r\n\f])?|\\l"},
{"M", @"m|\\0{0,4}(4d|6d)(\r\n|[ \t\r\n\f])?|\\m"},
{"N", @"n|\\0{0,4}(4e|6e)(\r\n|[ \t\r\n\f])?|\\n"},
{"O", @"o|\\0{0,4}(4f|6f)(\r\n|[ \t\r\n\f])?|\\o"},
{"P", @"p|\\0{0,4}(50|70)(\r\n|[ \t\r\n\f])?|\\p"},
{"R", @"r|\\0{0,4}(52|72)(\r\n|[ \t\r\n\f])?|\\r"},
{"S", @"s|\\0{0,4}(53|73)(\r\n|[ \t\r\n\f])?|\\s"},
{"T", @"t|\\0{0,4}(54|74)(\r\n|[ \t\r\n\f])?|\\t"},
{"U", @"u|\\0{0,4}(55|75)(\r\n|[ \t\r\n\f])?|\\u"},
{"X", @"x|\\0{0,4}(58|78)(\r\n|[ \t\r\n\f])?|\\x"},
{"Z", @"z|\\0{0,4}(5a|7a)(\r\n|[ \t\r\n\f])?|\\z"},
{"CDO", @"<!--"},
{"CDC", @"-->"},
{"INCLUDES", @"~="},
{"DASHMATCH", @"\|="},
{"STRING", @"{string}"},
{"BAD_STRING", @"{badstring}"},
{"IDENT", @"{ident}"},
{"HASH", @"#{name}"},
{"IMPORT_SYM", @"@{I}{M}{P}{O}{R}{T}"},
{"PAGE_SYM", @"@{P}{A}{G}{E}"},
{"MEDIA_SYM", @"@{M}{E}{D}{I}{A}"},
{"CHARSET_SYM", @"@charset\b"},
{"IMPORTANT_SYM", @"!({w}|{comment})*{I}{M}{P}{O}{R}{T}{A}{N}{T}"},
{"EMS", @"{num}{E}{M}"},
{"EXS", @"{num}{E}{X}"},
{"LENGTH", @"{num}({P}{X}|{C}{M}|{M}{M}|{I}{N}|{P}{T}|{P}{C})"},
{"ANGLE", @"{num}({D}{E}{G}|{R}{A}{D}|{G}{R}{A}{D})"},
{"TIME", @"{num}({M}{S}|{S})"},
{"PERCENTAGE", @"{num}%"},
{"NUMBER", @"{num}"},
{"URI", @"{U}{R}{L}\({w}{string}{w}\)|{U}{R}{L}\({w}{url}{w}\)"},
{"BAD_URI", @"{baduri}"},
{"FUNCTION", @"{ident}\("},
};
var testStrings = new[] { @"""str""", @"'str'", "5", "5.", "5.0", "a", "alpha", "url(hello)",
"url(\"hello\")", "url(\"blah)", @"\g", @"/*comment*/", @"/**/", @"<!--", @"-->", @"~=",
"|=", @"#hash", "@import", "@page", "@media", "@charset", "!/*iehack*/important"};
foreach (var pair in flex)
{
Console.WriteLine("{0}\n\t{1}\n", pair.Key, pair.Value);
}
var sw = Stopwatch.StartNew();
foreach (var str in testStrings)
{
Console.WriteLine("{0} matches: ", str);
foreach (var pair in flex)
{
if (Regex.IsMatch(str, "^(" + pair.Value + ")$", RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture))
Console.WriteLine(" {0}", pair.Key);
}
}
Console.WriteLine("\nRan in {0} ms", sw.ElapsedMilliseconds);
Console.ReadLine();
}
}
Purpose
For building complex regular expressions that may extend each other. Namely, I'm trying to implement the CSS spec.
I think it would be faster to look for any occurrence of {foo} using a regular expression, and then use a MatchEvaluator that replaces the {foo} if foo happens to be a key in the dictionary.
I don't have Visual Studio here at the moment, but I guess this is functionally equivalent to your code example:
var pattern = _dict[key];
bool isChanged = false;
do
{
isChanged = false;
pattern = Regex.Replace(pattern, "{([^}]+)}", match => {
string matchKey = match.Groups[1].Value;
if (matchKey != key && _dict.ContainsKey(matchKey))
{
isChanged = true;
return "(" + _dict[matchKey] + ")";
}
return match.Value;
});
} while (isChanged);
Can I ask you why you need the do/while loop? Can the value of a key in the dictionary again contain {placeholders} that have to be replaced? Can you be sure you don't get stuck in an infinite loop where key "A" contains "Blahblah {B}" and key "B" contains "Blahblah {A}"?
Edit: further improvements would be:
Using a precompiled Regex.
Using recursion instead of a loop (see ChrisWue's comment).
Using _dict.TryGetValue(), as in Guffa's code.
You will end up with an O(n) algorithm where n is the size of the output, so you can't do much better than this.
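To make the recursion suggestion and the circular-reference concern a bit more concrete, here is one possible sketch. It is not the asker's FlexDict: the class name PlaceholderExpander and the choice to throw an InvalidOperationException on a cycle are assumptions for illustration. It keeps a set of keys that are currently being expanded and fails as soon as one of them is seen again.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PlaceholderExpander
{
    // Matches {key}; group 1 is the text between the braces.
    private static readonly Regex Placeholder =
        new Regex(@"\{([^{}]+)\}", RegexOptions.Compiled);

    public static string Expand(string pattern, IDictionary<string, string> dict)
    {
        return Expand(pattern, dict, new HashSet<string>());
    }

    private static string Expand(string pattern, IDictionary<string, string> dict,
                                 HashSet<string> inProgress)
    {
        return Placeholder.Replace(pattern, match =>
        {
            string key = match.Groups[1].Value;
            string value;
            // Unknown keys (for example the quantifier {1,6}) are left untouched.
            if (!dict.TryGetValue(key, out value)) return match.Value;

            // Add returns false when the key is already being expanded: a cycle.
            if (!inProgress.Add(key))
                throw new InvalidOperationException("Circular reference at {" + key + "}");

            string expanded = "(" + Expand(value, dict, inProgress) + ")";
            inProgress.Remove(key); // done with this key; it may appear again elsewhere
            return expanded;
        });
    }
}
```

Each placeholder is looked up once with TryGetValue and expanded depth-first, so there is no outer do/while loop and no need to compare the string before and after.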
You should be able to use a regular expression to find the matches. Then you can also make use of the fast lookup of the dictionary and not just use it as a list.
var pattern = _dict[key];
bool replaced;
do {
replaced = false; // reset on every pass, otherwise the loop never ends after the first replacement
pattern = Regex.Replace(pattern, @"\{([^\}]+)\}", m => {
string k = m.Groups[1].Value;
string value;
if (k != key && _dict.TryGetValue(k, out value)) {
replaced = true;
return "(" + value + ")";
} else {
return "{" + k + "}";
}
});
} while (replaced);
return pattern;
You can implement the following algorithm:
1. Search for { in the source string.
2. Copy everything up to the { into a StringBuilder.
3. Find the matching } (searching from the last found position).
4. Compare the value between { and } to the keys in your dictionary.
5. If it matches, append ( + Value + ) to the StringBuilder.
6. Otherwise, copy the original text from the source string.
7. If the end of the source string has not been reached, go to step 1.
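The steps above could be sketched roughly as follows. This is one reading of the algorithm, not tested against the asker's data: it performs a single pass (replaced values are not expanded again), and the names SinglePassReplacer and ReplaceOnce are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SinglePassReplacer
{
    public static string ReplaceOnce(string source, IDictionary<string, string> dict)
    {
        var sb = new StringBuilder(source.Length);
        int pos = 0;
        while (pos < source.Length)
        {
            int open = source.IndexOf('{', pos);
            if (open < 0) break;                              // no more placeholders
            sb.Append(source, pos, open - pos);               // text before '{'

            int close = source.IndexOf('}', open + 1);
            if (close < 0) { pos = open; break; }             // unmatched '{': keep the rest verbatim

            string key = source.Substring(open + 1, close - open - 1);
            string value;
            if (dict.TryGetValue(key, out value))
                sb.Append('(').Append(value).Append(')');     // known key: "(value)"
            else
                sb.Append(source, open, close - open + 1);    // unknown key: keep "{key}"
            pos = close + 1;
        }
        if (pos < source.Length)
            sb.Append(source, pos, source.Length - pos);      // tail after the last placeholder
        return sb.ToString();
    }
}
```

Because the source string is only scanned forward, each character is looked at a bounded number of times, and the dictionary lookup replaces the loop over all keys.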
Could you use PLINQ at all?
Something along the lines of:
var keys = dict.Keys.Where(k => k != key).ToList();
bool replacementMade = keys.Any();
keys.AsParallel().ForAll(k => { /* replacement code; note that writes to a shared pattern string would need synchronization */ });

How to iterate over a string word by word in C#?

I want to iterate over a string word by word.
If I have a string "incidentno and fintype or unitno", I would like to read every word one by one as "incidentno", "and", "fintype", "or", and "unitno".
foreach (string word in "incidentno and fintype or unitno".Split(' ')) {
...
}
var regex = new Regex(@"\b[\s,\.-:;]*");
var phrase = "incidentno and fintype or unitno";
var words = regex.Split(phrase).Where(x => !string.IsNullOrEmpty(x));
This works even if you have ".,; tabs and new lines" between your words.
Slightly twisted I know, but you could define an iterator block as an extension method on strings. e.g.
/// <summary>
/// Sweep over text
/// </summary>
/// <param name="Text"></param>
/// <returns></returns>
public static IEnumerable<string> WordList(this string Text)
{
int cIndex = 0;
int nIndex;
while ((nIndex = Text.IndexOf(' ', cIndex + 1)) != -1)
{
int sIndex = (cIndex == 0 ? 0 : cIndex + 1);
yield return Text.Substring(sIndex, nIndex - sIndex);
cIndex = nIndex;
}
yield return Text.Substring(cIndex == 0 ? 0 : cIndex + 1); // a text without spaces is returned whole
}
foreach (string word in "incidentno and fintype or unitno".WordList())
System.Console.WriteLine("'" + word + "'");
Which has the advantage of not creating a big array for long strings.
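A variant of the same lazy idea that also copes with runs of whitespace, tabs, and single-word input might look like the sketch below. It is written as a plain static method rather than an extension method, and the names WordEnumerator and Words are made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class WordEnumerator
{
    // Lazily yields each whitespace-separated word without building an array first.
    public static IEnumerable<string> Words(string text)
    {
        int start = -1; // index where the current word began, or -1 when between words
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsWhiteSpace(text[i]))
            {
                if (start >= 0)
                {
                    yield return text.Substring(start, i - start);
                    start = -1;
                }
            }
            else if (start < 0)
            {
                start = i; // first character of a new word
            }
        }
        if (start >= 0) yield return text.Substring(start); // last word, if any
    }
}
```

Usage is the same as before: foreach (string word in WordEnumerator.Words("incidentno and fintype or unitno")) { ... }. Empty entries never appear because a word is only started on a non-whitespace character.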
Use the Split method of the string class
string[] words = "incidentno and fintype or unitno".Split(' ');
This will split on spaces, so "words" will contain [incidentno, and, fintype, or, unitno].
Assuming the words are always separated by a blank, you could use String.Split() to get an Array of your words.
There are multiple ways to accomplish this. Two of the most convenient methods (in my opinion) are:
Using string.Split() to create an array. I would probably use this method, because it is the most self-explanatory.
example:
string startingSentence = "incidentno and fintype or unitno";
string[] seperatedWords = startingSentence.Split(' ');
Alternatively, you could use (this is what I would use):
string[] seperatedWords = startingSentence.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
StringSplitOptions.RemoveEmptyEntries will remove any empty entries from your array that may occur due to extra whitespace and other minor problems.
Next - to process the words, you would use:
foreach (string word in seperatedWords)
{
//Do something
}
Or, you can use regular expressions to solve this problem, as Darin demonstrated (a copy is below).
example:
var regex = new Regex(@"\b[\s,\.-:;]*");
var phrase = "incidentno and fintype or unitno";
var words = regex.Split(phrase).Where(x => !string.IsNullOrEmpty(x));
For processing, you can use similar code to the first option.
foreach (string word in words)
{
//Do something
}
Of course, there are many ways to solve this problem, but I think that these two would be the simplest to implement and maintain. I would go with the first option (using string.Split()) just because regex can sometimes become quite confusing, while a split will function correctly most of the time.
When using split, what about checking for empty entries?
string sentence = "incidentno and fintype or unitno";
string[] words = sentence.Split(new char[] { ' ', ',' ,';','\t','\n', '\r'}, StringSplitOptions.RemoveEmptyEntries);
foreach (string word in words)
{
// Process
}
EDIT:
I can't comment so I'm posting here but this (posted above) works:
foreach (string word in "incidentno and fintype or unitno".Split(' '))
{
...
}
My understanding of foreach is that it first calls GetEnumerator() and then calls .MoveNext() until false is returned. So the .Split won't be re-evaluated on each iteration.
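If you want to convince yourself of that, a small experiment along these lines should do. The class ForeachDemo and its SplitCalls counter are hypothetical and exist only to count how often the collection expression runs. (For arrays the compiler actually emits an indexed loop rather than calling GetEnumerator(), but the expression after "in" is still evaluated only once.)

```csharp
using System;

public static class ForeachDemo
{
    public static int SplitCalls; // how many times the collection expression was evaluated

    private static string[] SplitWords(string s)
    {
        SplitCalls++;
        return s.Split(' ');
    }

    public static int CountWords(string s)
    {
        int n = 0;
        // SplitWords(s) is evaluated once, before the first iteration.
        foreach (string word in SplitWords(s))
        {
            if (word.Length > 0) n++;
        }
        return n;
    }
}
```

After ForeachDemo.CountWords("incidentno and fintype or unitno") has returned, SplitCalls has been incremented exactly once, regardless of how many words were iterated.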
public static string[] MyTest(string inword, string regstr)
{
var regex = new Regex(regstr);
var words = regex.Split(inword); // split the string that was passed in, not a hard-coded phrase
return words;
}
? MyTest("incidentno, and .fintype- or; :unitno",@"[^\w]+")
[0]: "incidentno"
[1]: "and"
[2]: "fintype"
[3]: "or"
[4]: "unitno"
I'd like to add some information to JDunkerley's answer.
You can easily make this method more flexible if you pass the string or char to search for as a parameter.
public static IEnumerable<string> WordList(this string Text,string Word)
{
int start = 0; // index where the next word begins
int next;
while ((next = Text.IndexOf(Word, start, StringComparison.Ordinal)) != -1)
{
yield return Text.Substring(start, next - start);
start = next + Word.Length; // skip the whole separator, not just one character
}
yield return Text.Substring(start);
}
public static IEnumerable<string> WordList(this string Text, char c)
{
int cIndex = 0;
int nIndex;
while ((nIndex = Text.IndexOf(c, cIndex + 1)) != -1)
{
int sIndex = (cIndex == 0 ? 0 : cIndex + 1);
yield return Text.Substring(sIndex, nIndex - sIndex);
cIndex = nIndex;
}
yield return Text.Substring(cIndex == 0 ? 0 : cIndex + 1); // a text without the separator is returned whole
}
I wrote a string processor class. You can use it.
Example:
metaKeywords = bodyText.Process(prepositions).OrderByDescending().TakeTop().GetWords().AsString();
Class:
public static class StringProcessor
{
private static List<String> PrepositionList;
public static string ToNormalString(this string strText)
{
if (String.IsNullOrEmpty(strText)) return String.Empty;
char chNormalKaf = (char)1603;
char chNormalYah = (char)1610;
char chNonNormalKaf = (char)1705;
char chNonNormalYah = (char)1740;
string result = strText.Replace(chNonNormalKaf, chNormalKaf);
result = result.Replace(chNonNormalYah, chNormalYah);
return result;
}
public static List<KeyValuePair<String, Int32>> Process(this String bodyText,
List<String> blackListWords = null,
int minimumWordLength = 3,
char splitor = ' ',
bool perWordIsLowerCase = true)
{
string[] btArray = bodyText.ToNormalString().Split(splitor);
long numberOfWords = btArray.LongLength;
Dictionary<String, Int32> wordsDic = new Dictionary<String, Int32>(1);
foreach (string word in btArray)
{
if (word != null)
{
string lowerWord = word;
if (perWordIsLowerCase)
lowerWord = word.ToLower();
var normalWord = lowerWord.Replace(".", "").Replace("(", "").Replace(")", "")
.Replace("?", "").Replace("!", "").Replace(",", "")
.Replace("<br>", "").Replace(":", "").Replace(";", "")
.Replace("،", "").Replace("-", "").Replace("\n", "").Trim();
if ((normalWord.Length > minimumWordLength && !normalWord.IsMemberOfBlackListWords(blackListWords)))
{
if (wordsDic.ContainsKey(normalWord))
{
var cnt = wordsDic[normalWord];
wordsDic[normalWord] = ++cnt;
}
else
{
wordsDic.Add(normalWord, 1);
}
}
}
}
List<KeyValuePair<String, Int32>> keywords = wordsDic.ToList();
return keywords;
}
public static List<KeyValuePair<String, Int32>> OrderByDescending(this List<KeyValuePair<String, Int32>> list, bool isBasedOnFrequency = true)
{
List<KeyValuePair<String, Int32>> result = null;
if (isBasedOnFrequency)
result = list.OrderByDescending(q => q.Value).ToList();
else
result = list.OrderByDescending(q => q.Key).ToList();
return result;
}
public static List<KeyValuePair<String, Int32>> TakeTop(this List<KeyValuePair<String, Int32>> list, Int32 n = 10)
{
List<KeyValuePair<String, Int32>> result = list.Take(n).ToList();
return result;
}
public static List<String> GetWords(this List<KeyValuePair<String, Int32>> list)
{
List<String> result = new List<String>();
foreach (var item in list)
{
result.Add(item.Key);
}
return result;
}
public static List<Int32> GetFrequency(this List<KeyValuePair<String, Int32>> list)
{
List<Int32> result = new List<Int32>();
foreach (var item in list)
{
result.Add(item.Value);
}
return result;
}
public static String AsString<T>(this List<T> list, string seprator = ", ")
{
String result = string.Empty;
foreach (var item in list)
{
result += string.Format("{0}{1}", item, seprator);
}
return result;
}
private static bool IsMemberOfBlackListWords(this String word, List<String> blackListWords)
{
bool result = false;
if (blackListWords == null) return false;
foreach (var w in blackListWords)
{
if (w.ToNormalString().Equals(word))
{
result = true;
break;
}
}
return result;
}
}
