What is the best way to parse this string in C#? - c#

I have a string that I am reading from another system. It's basically a long string that represents a list of key value pairs that are separated by a space in between. It looks like this:
key:value[space]key:value[space]key:value[space]
So I wrote this code to parse it:
string myString = ReadinString();
string[] tokens = myString.split(' ');
foreach (string token in tokens) {
string key = token.split(':')[0];
string value = token.split(':')[1];
. . . .
}
The issue now is that some of the values have spaces in them so my "simplistic" split at the top no longer works. I wanted to see how I could still parse out the list of key value pairs (given space as a separator character) now that I know there also could be spaces in the value field as split doesn't seem like it's going to be able to work anymore.
NOTE: I now confirmed that KEYs will NOT have spaces in them so I only have to worry about the values. Apologies for the confusion.

Use this regular expression:
\w+:[\w\s]+(?![\w+:])
I tested it on
test:testvalue test2:test value test3:testvalue3
It returns three matches:
test:testvalue
test2:test value
test3:testvalue3
You can change \w to any character set that can occur in your input.
Code for testing this:
var regex = new Regex(#"\w+:[\w\s]+(?![\w+:])");
var test = "test:testvalue test2:test value test3:testvalue3";
foreach (Match match in regex.Matches(test))
{
var key = match.Value.Split(':')[0];
var value = match.Value.Split(':')[1];
Console.WriteLine("{0}:{1}", key, value);
}
Console.ReadLine();
As Wonko the Sane pointed out, this regular expression will fail on values with :. If you predict such situation, use \w+:[\w: ]+?(?![\w+:]) as the regular expression. This will still fail when a colon in value is preceded by space though... I'll think about solution to this.

This cannot work without changing your split from a space to something else such as a "|".
Consider this:
Alfred Bester:Alfred Bester Alfred:Alfred Bester
Is this Key "Alfred Bester" & value Alfred" or Key "Alfred" & value "Bester Alfred"?

string input = "foo:Foobarius Maximus Tiberius Kirk bar:Barforama zap:Zip Brannigan";
foreach (Match match in Regex.Matches(input, #"(\w+):([^:]+)(?![\w+:])"))
{
Console.WriteLine("{0} = {1}",
match.Groups[1].Value,
match.Groups[2].Value
);
}
Gives you:
foo = Foobarius Maximus Tiberius Kirk
bar = Barforama
zap = Zip Brannigan

You could try to Url encode the content between the space (The keys and the values not the : symbol) but this would require that you have control over the Input Method.
Or you could simply use another format (Like XML or JSON), but again you will need control over the Input Format.
If you can't control the input format you could always use a Regular expression and that searches for single spaces where a word plus : follows.
Update (Thanks Jon Grant)
It appears that you can have spaces in the key and the value. If this is the case you will need to seriously rethink your strategy as even Regex won't help.

string input = "key1:value key2:value key3:value";
Dictionary<string, string> dic = input.Split(' ').Select(x => x.Split(':')).ToDictionary(x => x[0], x => x[1]);
The first will produce an array:
"key:value", "key:value"
Then an array of arrays:
{ "key", "value" }, { "key", "value" }
And then a dictionary:
"key" => "value", "key" => "value"
Note, that Dictionary<K,V> doesn't allow duplicated keys, it will raise an exception in such a case. If such a scenario is possible, use ToLookup().

Using a regular expression can solve your problem:
private void DoSplit(string str)
{
str += str.Trim() + " ";
string patterns = #"\w+:([\w+\s*])+[^!\w+:]";
var r = new System.Text.RegularExpressions.Regex(patterns);
var ms = r.Matches(str);
foreach (System.Text.RegularExpressions.Match item in ms)
{
string[] s = item.Value.Split(new char[] { ':' });
//Do something
}
}

This code will do it (given the rules below). It parses the keys and values and returns them in a Dictonary<string, string> data structure. I have added some code at the end that assumes given your example that the last value of the entire string/stream will be appended with a [space]:
private Dictionary<string, string> ParseKeyValues(string input)
{
Dictionary<string, string> items = new Dictionary<string, string>();
string[] parts = input.Split(':');
string key = parts[0];
string value;
int currentIndex = 1;
while (currentIndex < parts.Length-1)
{
int indexOfLastSpace=parts[currentIndex].LastIndexOf(' ');
value = parts[currentIndex].Substring(0, indexOfLastSpace);
items.Add(key, value);
key = parts[currentIndex].Substring(indexOfLastSpace + 1);
currentIndex++;
}
value = parts[parts.Length - 1].Substring(0,parts[parts.Length - 1].Length-1);
items.Add(key, parts[parts.Length-1]);
return items;
}
Note: this algorithm assumes the following rules:
No spaces in the values
No colons in the keys
No colons in the values

Without any Regex nor string concat, and as an enumerable (it supposes keys don't have spaces, but values can):
public static IEnumerable<KeyValuePair<string, string>> Split(string text)
{
if (text == null)
yield break;
int keyStart = 0;
int keyEnd = -1;
int lastSpace = -1;
for(int i = 0; i < text.Length; i++)
{
if (text[i] == ' ')
{
lastSpace = i;
continue;
}
if (text[i] == ':')
{
if (lastSpace >= 0)
{
yield return new KeyValuePair<string, string>(text.Substring(keyStart, keyEnd - keyStart), text.Substring(keyEnd + 1, lastSpace - keyEnd - 1));
keyStart = lastSpace + 1;
}
keyEnd = i;
continue;
}
}
if (keyEnd >= 0)
yield return new KeyValuePair<string, string>(text.Substring(keyStart, keyEnd - keyStart), text.Substring(keyEnd + 1));
}

I guess you could take your method and expand upon it slightly to deal with this stuff...
Kind of pseudocode:
List<string> parsedTokens = new List<String>();
string[] tokens = myString.split(' ');
for(int i = 0; i < tokens.Length; i++)
{
// We need to deal with the special case of the last item,
// or if the following item does not contain a colon.
if(i == tokens.Length - 1 || tokens[i+1].IndexOf(':' > -1)
{
parsedTokens.Add(tokens[i]);
}
else
{
// This bit needs to be refined to deal with values with multiple spaces...
parsedTokens.Add(tokens[i] + " " + tokens[i+1]);
}
}
Another approach would be to split on the colon... That way, your first array item would be the name of the first key, second item would be the value of the first key and then name of the second key (can use LastIndexOf to split it out), and so on. This would obviously get very messy if the values can include colons, or the keys can contain spaces, but in that case you'd be pretty much out of luck...

Related

Parse multiple values from string c#

Suppose I have written "5 and 6" or "5+6". How can I assign 5 and 6 to two different variables in c# ?
P.S. I also want to do certain work if certain chars are found in string. Suppose I have written 5+5. Will this code do that ?
if(string.Contains("+"))
{
sum=x+y;
}
string input="5+5";
var numbers = Regex.Matches(input, #"\d+")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
Personally, I would vote against doing some splitting and regular expression stuff.
Instead I would (and did in the past) use one of the many Expression Evaluation libraries, like e.g. this one over at Code Project (and the updated version over at CodePlex).
Using the parser/tool above, you could do things like:
A simple expression evaluation then could look like:
Expression e = new Expression("5 + 6");
Debug.Assert(11 == e.Evaluate());
To me this is much more error-proof than doing the parsing all by myself, including regular expressions and the like.
You should use another name for your string than string
var numbers = yourString.Split("+");
var sum = Convert.ToInt32(numbers[0]) + Convert.ToInt32(numbers[1]);
Note: Thats an implementation without any error checking or error handling...
If you want to assign numbers from string to variables, you will have to parse string and make conversion.
Simple example, if you have text with only one number
string text = "500";
int num = int.Parse(text);
Now, if you want to parse something more complicated, you can use split() and/or regex to get all numbers and operators between them. Than you just iterate array and assign numbers to variables.
string text = "500+400";
if (text.Contains("+"))
{
String[] data = text.Split("+");
int a = int.Parse(data[0]);
int b = int.Parse(data[1]);
int res = a + b;
}
Basicly, if you have just 2 numbers and operazor between them, its ok. If you want to make "calculator" you will need something more, like Binary Trees or Stack.
Use the String.Split method. It splits your string rom the given character and returns a string array containing the value that is broken down into multiple pieces depending on the character to break, in this case, its "+".
int x = 0;
int y = 0;
int z = 0;
string value = "5+6";
if (value.Contains("+"))
{
string[] returnedArray = value.Split('+');
x = Convert.ToInt32(returnedArray[0]);
y = Convert.ToInt32(returnedArray[1]);
z = x + y;
}
Something like this may helpful
string strMy = "5&6";
char[] arr = strMy.ToCharArray();
List<int> list = new List<int>();
foreach (char item in arr)
{
int value;
if (int.TryParse(item.ToString(), out value))
{
list.Add(item);
}
}
list will contains all the integer values
You can use String.Split method like;
string s = "5 and 6";
string[] a = s.Split(new string[] { "and", "+" }, StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(a[0].Trim());
Console.WriteLine(a[1].Trim());
Here is a DEMO.
Use regex to get those value and then switch on the operand to do the calculation
string str = "51 + 6";
str = str.Replace(" ", "");
Regex regex = new Regex(#"(?<rightHand>\d+)(?<operand>\+|and)(?<leftHand>\d+)");
var match = regex.Match(str);
int rightHand = int.Parse(match.Groups["rightHand"].Value);
int leftHand = int.Parse(match.Groups["leftHand"].Value);
string op = match.Groups["operand"].Value;
switch (op)
{
case "+":
.
.
.
}
Split function maybe is comfortable in use but it is space inefficient
because it needs array of strings
Maybe Trim(), IndexOf(), Substring() can replace Split() function

Parse Text File Into Dictionary

I have a text file that has several hundred configuration values. The general format of the configuration data is "Label:Value". Using C# .net, I would like to read these configurations, and use the Values in other portions of the code. My first thought is that I would use a string search to look for the Labels then parse out the values following the labels and add them to a dictionary, but this seems rather tedious considering the number of labels/values that I would have to search for. I am interested to hear some thoughts on a possible architecture to perform this task. I have included a small section of a sample text file that contains some of the labels and values (below). A couple of notes: The Values are not always numeric (as seen in the AUX Serial Number); For whatever reason, the text files were formatted using spaces (\s) rather than tabs (\t). Thanks in advance for any time you spend thinking about this.
Sample Text:
AUX Serial Number: 445P000023 AUX Hardware Rev: 1
Barometric Pressure Slope: -1.452153E-02
Barometric Pressure Intercept: 9.524336E+02
This is a nice little brain tickler. I think this code might be able to point you in the right direction. Keep in mind, this fills a Dictionary<string, string>, so there are no conversions of values into ints or the like. Also, please excuse the mess (and the poor naming conventions). It was a quick write-up based on my train of thought.
Dictionary<string, string> allTheThings = new Dictionary<string, string>();
public void ReadIt()
{
// Open the file into a streamreader
using (System.IO.StreamReader sr = new System.IO.StreamReader("text_path_here.txt"))
{
while (!sr.EndOfStream) // Keep reading until we get to the end
{
string splitMe = sr.ReadLine();
string[] bananaSplits = splitMe.Split(new char[] { ':' }); //Split at the colons
if (bananaSplits.Length < 2) // If we get less than 2 results, discard them
continue;
else if (bananaSplits.Length == 2) // Easy part. If there are 2 results, add them to the dictionary
allTheThings.Add(bananaSplits[0].Trim(), bananaSplits[1].Trim());
else if (bananaSplits.Length > 2)
SplitItGood(splitMe, allTheThings); // Hard part. If there are more than 2 results, use the method below.
}
}
}
public void SplitItGood(string stringInput, Dictionary<string, string> dictInput)
{
StringBuilder sb = new StringBuilder();
List<string> fish = new List<string>(); // This list will hold the keys and values as we find them
bool hasFirstValue = false;
foreach (char c in stringInput) // Iterate through each character in the input
{
if (c != ':') // Keep building the string until we reach a colon
sb.Append(c);
else if (c == ':' && !hasFirstValue)
{
fish.Add(sb.ToString().Trim());
sb.Clear();
hasFirstValue = true;
}
else if (c == ':' && hasFirstValue)
{
// Below, the StringBuilder currently has something like this:
// " 235235 Some Text Here"
// We trim the leading whitespace, then split at the first sign of a double space
string[] bananaSplit = sb.ToString()
.Trim()
.Split(new string[] { " " },
StringSplitOptions.RemoveEmptyEntries);
// Add both results to the list
fish.Add(bananaSplit[0].Trim());
fish.Add(bananaSplit[1].Trim());
sb.Clear();
}
}
fish.Add(sb.ToString().Trim()); // Add the last result to the list
for (int i = 0; i < fish.Count; i += 2)
{
// This for loop assumes that the amount of keys and values added together
// is an even number. If it comes out odd, then one of the lines on the input
// text file wasn't parsed correctly or wasn't generated correctly.
dictInput.Add(fish[i], fish[i + 1]);
}
}
So the only general approach that I can think of, given the format that you're limited to, is to first find the first colon on the line and take everything before it as the label. Skip all whilespace characters until you get to the first non-whitespace character. Take all non-whitespace characters as the value of the label. If there is a colon after the end of that value take everything after the end of the previous value to the colon as the next value and repeat. You'll also probably need to trim whitespace around the labels.
You might be able to capture that meaning with a regex, but it wouldn't likely be a pretty one if you could; I'd avoid it for something this complex unless you're entire development team is very proficient with them.
I would try something like this:
While string contains triple space, replace it with double space.
Replace all ": " and ": " (: with double space) with ":".
Replace all " " (double space) with '\n' (new line).
If line don't contain ':' than skip the line. Else, use string.Split(':'). This way you receive arrays of 2 strings (key and value). Some of them may contain empty characters at the beginning or at the end.
Use string.Trim() to get rid of those empty characters.
Add received key and value to Dictionary.
I am not sure if it solves all your cases but it's a general clue how I would try to do it.
If it works you could think about performance (use StringBuilder instead of string wherever it is possible etc.).
This is probably the dirtiest function I´ve ever written, but it works.
StreamReader reader = new StreamReader("c:/yourFile.txt");
Dictionary<string, string> yourDic = new Dictionary<string, string>();
StreamReader reader = new StreamReader("c:/yourFile.txt");
Dictionary<string, string> yourDic = new Dictionary<string, string>();
while (reader.Peek() >= 0)
{
string line = reader.ReadLine();
string[] data = line.Split(':');
if (line != String.Empty)
{
for (int i = 0; i < data.Length - 1; i++)
{
if (i != 0)
{
bool isPair;
if (i % 2 == 0)
{
isPair = true;
}
else
{
isPair = false;
}
if (isPair)
{
string keyOdd = data[i].Trim();
try { keyOdd = keyOdd.Substring(keyOdd.IndexOf(' ')).TrimStart(); }
catch { }
string valueOdd = data[i + 1].TrimStart();
try { valueOdd = valueOdd.Remove(valueOdd.IndexOf(' ')); } catch{}
yourDic.Add(keyOdd, valueOdd);
}
else
{
string keyPair = data[i].TrimStart();
keyPair = keyPair.Substring(keyPair.IndexOf(' ')).Trim();
string valuePair = data[i + 1].TrimStart();
try { valuePair = valuePair.Remove(valuePair.IndexOf(' ')); } catch { }
yourDic.Add(keyPair, valuePair);
}
}
else
{
string key = data[i].Trim();
string value = data[i + 1].TrimStart();
try { value = value.Remove(value.IndexOf(' ')); } catch{}
yourDic.Add(key, value);
}
}
}
}
How does it works?, well splitting the line you can know what you can get in every position of the array, so I just play with the even and odd values.
You will understand me when you debug this function :D. It fills the Dictionary that you need.
I have another idea. Does values contain spaces? If not you could do like this:
Ignore white spaces until you read some other char (first char of key).
Read string until ':' occures.
Trim key that you get.
Ignore white spaces until you read some other char (first char of value).
Read until you get empty char.
Trim value that you get.
If it is the end than stop. Else, go back to step 1.
Good luck.
Maybe something like this would work, be careful with the ':' character
StreamReader reader = new StreamReader("c:/yourFile.txt");
Dictionary<string, string> yourDic = new Dictionary<string, string>();
while (reader.Peek() >= 0)
{
string line = reader.ReadLine();
yourDic.Add(line.Split(':')[0], line.Split(':')[1]);
}
Anyway, I recommend to organize that file in some way that you´ll always know in what format it comes.

How to remove a duplicate set of characters in a string

For example a string contains the following (the string is variable):
http://www.google.comhttp://www.google.com
What would be the most efficient way of removing the duplicate url here - e.g. output would be:
http://www.google.com
I assume that input contains only urls.
string input = "http://www.google.comhttp://www.google.com";
// this will get you distinct URLs but without "http://" at the beginning
IEnumerable<string> distinctAddresses = input
.Split(new[] {"http://"}, StringSplitOptions.RemoveEmptyEntries)
.Distinct();
StringBuilder output = new StringBuilder();
foreach (string distinctAddress in distinctAddresses)
{
// when building the output, insert "http://" before each address so
// that it resembles the original
output.Append("http://");
output.Append(distinctAddress);
}
Console.WriteLine(output);
Efficiency has various definitions: code size, total execution time, CPU usage, space usage, time to write the code, etc. If you want to be "efficient", you should know which one of these you're trying for.
I'd do something like this:
string url = "http://www.google.comhttp://www.google.com";
if (url.Length % 2 == 0)
{
string secondHalf = url.Substring(url.Length / 2);
if (url.StartsWith(secondHalf))
{
url = secondHalf;
}
}
Depending on the kinds of duplicates you need to remove, this may or may not work for you.
collect strings into list and use distinct, if your string has http address you can apply regex http:.+?(?=((http:)|($)) with RegexOptions.SingleLine
var distinctList = list.Distinct(StringComparer.CurrentCultureIgnoreCase).ToList();
Given you don't know the length of the string, you don't know if something is double and you don't know what is double:
string yourprimarystring = "http://www.google.comhttp://www.google.com";
int firstCharacter;
string temp;
for(int i = 0; i <= yourprimarystring.length; i++)
{
for(int j = 0; j <= yourprimarystring.length; j++)
{
string search = yourprimarystring.substring(i,j);
firstCharacter = yourprimaryString.IndexOf(search);
if(firstCharacter != -1)
{
temp = yourprimarystring.substring(0,firstCharacter) + yourprimarystring.substring(firstCharacter + j - i,yourprimarystring.length)
yourprimarystring = temp;
}
}
This itterates through all your elements, takes all out from first to last letter and searches for them like this:
ABCDA - searches for A finds A exludes A, thats the problem, you need to specify how long the duplication needs to be if you want to make it variable, but maybe my code helps you.

Finding the number of occurences strings in a specific format occur in a given text

I have a large string, where there can be specific words (text followed by a single colon, like "test:") occurring more than once. For example, like this:
word:
TEST:
word:
TEST:
TEST: // random text
"word" occurs twice and "TEST" occurs thrice, but the amount can be variable. Also, these words don't have to be in the same order and there can be more text in the same line as the word (as shown in the last example of "TEST"). What I need to do is append the occurrence number to each word, for example the output string needs to be this:
word_ONE:
TEST_ONE:
word_TWO:
TEST_TWO:
TEST_THREE: // random text
The RegEx for getting these words which I've written is ^\b[A-Za-z0-9_]{4,}\b:. However, I don't know how to accomplish the above in a fast way. Any ideas?
Regex is perfect for this job - using Replace with a match evaluator:
This example is not tested nor compiled:
public class Fix
{
public static String Execute(string largeText)
{
return Regex.Replace(largeText, "^(\w{4,}):", new Fix().Evaluator);
}
private Dictionary<String, int> counters = new Dictionary<String, int>();
private static String[] numbers = {"ONE", "TWO", "THREE",...};
public String Evaluator(Match m)
{
String word = m.Groups[1].Value;
int count;
if (!counters.TryGetValue(word, out count))
count = 0;
count++;
counters[word] = count;
return word + "_" + numbers[count-1] + ":";
}
}
This should return what you requested when calling:
result = Fix.Execute(largeText);
i think you can do this with Regax.Replace(string, string, MatchEvaluator) and a dictionary.
Dictionary<string, int> wordCount=new Dictionary<string,int>();
string AppendIndex(Match m)
{
string matchedString = m.ToString();
if(wordCount.Contains(matchedString))
wordCount[matchedString]=wordCount[matchedString]+1;
else
wordCount.Add(matchedString, 1);
return matchedString + "_"+ wordCount.ToString();// in the format: word_1, word_2
}
string inputText = "....";
string regexText = #"";
static void Main()
{
string text = "....";
string result = Regex.Replace(text, #"^\b[A-Za-z0-9_]{4,}\b:",
new MatchEvaluator(AppendIndex));
}
see this:
http://msdn.microsoft.com/en-US/library/cft8645c(v=VS.80).aspx
If I understand you correctly, regex is not necessary here.
You can split your large string by the ':' character. Maybe you also need to read line by line (split by '\n'). After that you just create a dictionary (IDictionary<string, int>), which counts the occurrences of certain words. Every time you find word x, you increase the counter in the dictionary.
EDIT
Read your file line by line OR split the string by '\n'
Check if your delimiter is present. Either by splitting by ':' OR using regex.
Get the first item from the split array OR the first match of your regex.
Use a dictionary to count your occurrences.
if (dictionary.Contains(key)) dictionary[key]++;
else dictionary.Add(key, 1);
If you need words instead of numbers, then create another dictionary for these. So that dictionary[key] equals one if key equals 1. Mabye there is another solution for that.
Look at this example (I know it's not perfect and not so nice)
lets leave the exact argument for the Split function, I think it can help
static void Main(string[] args)
{
string a = "word:word:test:-1+234=567:test:test:";
string[] tks = a.Split(':');
Regex re = new Regex(#"^\b[A-Za-z0-9_]{4,}\b");
var res = from x in tks
where re.Matches(x).Count > 0
select x + DecodeNO(tks.Count(y=>y.Equals(x)));
foreach (var item in res)
{
Console.WriteLine(item);
}
Console.ReadLine();
}
private static string DecodeNO(int n)
{
switch (n)
{
case 1:
return "_one";
case 2:
return "_two";
case 3:
return "_three";
}
return "";
}

How can I split this string into an array?

My string is as follows:
smtp:jblack#test.com;SMTP:jb#test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;
I need back:
smtp:jblack#test.com
SMTP:jb#test.com
X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;
The problem is the semi-colons seperate the addresses and also part of the X400 address. Can anyone suggest how best to split this?
PS I should mentioned the order differs so it could be:
X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;;smtp:jblack#test.com;SMTP:jb#test.com
There can be more than 3 address, 4, 5.. 10 etc including an X500 address, however they do all start with either smtp: SMTP: X400 or X500.
EDIT: With the updated information, this answer certainly won't do the trick - but it's still potentially useful, so I'll leave it here.
Will you always have three parts, and you just want to split on the first two semi-colons?
If so, just use the overload of Split which lets you specify the number of substrings to return:
string[] bits = text.Split(new char[]{';'}, 3);
May I suggest building a regular expression
(smtp|SMTP|X400|X500):((?!smtp:|SMTP:|X400:|X500:).)*;?
or protocol-less
.*?:((?![^:;]*:).)*;?
in other words find anything that starts with one of your protocols. Match the colon. Then continue matching characters as long as you're not matching one of your protocols. Finish with a semicolon (optionally).
You can then parse through the list of matches splitting on ':' and you'll have your protocols. Additionally if you want to add protocols, just add them to the list.
Likely however you're going to want to specify the whole thing as case-insensitive and only list the protocols in their uppercase or lowercase versions.
The protocol-less version doesn't care what the names of the protocols are. It just finds them all the same, by matching everything up to, but excluding a string followed by a colon or a semi-colon.
Split by the following regex pattern
string[] items = System.Text.RegularExpressions.Split(text, ";(?=\w+:)");
EDIT: better one can accept more special chars in the protocol name.
string[] items = System.Text.RegularExpressions.Split(text, ";(?=[^;:]+:)");
http://msdn.microsoft.com/en-us/library/c1bs0eda.aspx
check there, you can specify the number of splits you want. so in your case you would do
string.split(new char[]{';'}, 3);
Not the fastest if you are doing this a lot but it will work for all cases I believe.
string input1 = "smtp:jblack#test.com;SMTP:jb#test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";
string input2 = "X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;;smtp:jblack#test.com;SMTP:jb#test.com";
Regex splitEmailRegex = new Regex(#"(?<key>\w+?):(?<value>.*?)(\w+:|$)");
List<string> sets = new List<string>();
while (input2.Length > 0)
{
Match m1 = splitEmailRegex.Matches(input2)[0];
string s1 = m1.Groups["key"].Value + ":" + m1.Groups["value"].Value;
sets.Add(s1);
input2 = input2.Substring(s1.Length);
}
foreach (var set in sets)
{
Console.WriteLine(set);
}
Console.ReadLine();
Of course many will claim Regex: Now you have two problems. There may even be a better regex answer than this.
You could always split on the colon and have a little logic to grab the key and value.
string[] bits = text.Split(':');
List<string> values = new List<string>();
for (int i = 1; i < bits.Length; i++)
{
string value = bits[i].Contains(';') ? bits[i].Substring(0, bits[i].LastIndexOf(';') + 1) : bits[i];
string key = bits[i - 1].Contains(';') ? bits[i - 1].Substring(bits[i - 1].LastIndexOf(';') + 1) : bits[i - 1];
values.Add(String.Concat(key, ":", value));
}
Tested it with both of your samples and it works fine.
This caught my curiosity .... So this code actually does the job, but again, wants tidying :)
My final attempt - stop changing what you need ;=)
static void Main(string[] args)
{
string fneh = "X400:C=US400;A= ;P=Test;O=Exchange;S=Jack;G=Black;x400:C=US400l;A= l;P=Testl;O=Exchangel;S=Jackl;G=Blackl;smtp:jblack#test.com;X500:C=US500;A= ;P=Test;O=Exchange;S=Jack;G=Black;SMTP:jb#test.com;";
string[] parts = fneh.Split(new char[] { ';' });
List<string> addresses = new List<string>();
StringBuilder address = new StringBuilder();
foreach (string part in parts)
{
if (part.Contains(":"))
{
if (address.Length > 0)
{
addresses.Add(semiColonCorrection(address.ToString()));
}
address = new StringBuilder();
address.Append(part);
}
else
{
address.AppendFormat(";{0}", part);
}
}
addresses.Add(semiColonCorrection(address.ToString()));
foreach (string emailAddress in addresses)
{
Console.WriteLine(emailAddress);
}
Console.ReadKey();
}
private static string semiColonCorrection(string address)
{
if ((address.StartsWith("x", StringComparison.InvariantCultureIgnoreCase)) && (!address.EndsWith(";")))
{
return string.Format("{0};", address);
}
else
{
return address;
}
}
Try these regexes. You can extract what you're looking for using named groups.
X400:(?<X400>.*?)(?:smtp|SMTP|$)
smtp:(?<smtp>.*?)(?:;+|$)
SMTP:(?<SMTP>.*?)(?:;+|$)
Make sure when constructing them you specify case insensitive. They seem to work with the samples you gave
Lots of attempts. Here is mine ;)
string src = "smtp:jblack#test.com;SMTP:jb#test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";
Regex r = new Regex(#"
(?:^|;)smtp:(?<smtp>([^;]*(?=;|$)))|
(?:^|;)x400:(?<X400>.*?)(?=;x400|;x500|;smtp|$)|
(?:^|;)x500:(?<X500>.*?)(?=;x400|;x500|;smtp|$)",
RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
foreach (Match m in r.Matches(src))
{
if (m.Groups["smtp"].Captures.Count != 0)
Console.WriteLine("smtp: {0}", m.Groups["smtp"]);
else if (m.Groups["X400"].Captures.Count != 0)
Console.WriteLine("X400: {0}", m.Groups["X400"]);
else if (m.Groups["X500"].Captures.Count != 0)
Console.WriteLine("X500: {0}", m.Groups["X500"]);
}
This finds all smtp, x400 or x500 addresses in the string in any order of appearance. It also identifies the type of address ready for further processing. The appearance of the text smtp, x400 or x500 in the addresses themselves will not upset the pattern.
This works!
string input =
"smtp:jblack#test.com;SMTP:jb#test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";
string[] parts = input.Split(';');
List<string> output = new List<string>();
foreach(string part in parts)
{
if (part.Contains(":"))
{
output.Add(part + ";");
}
else if (part.Length > 0)
{
output[output.Count - 1] += part + ";";
}
}
foreach(string s in output)
{
Console.WriteLine(s);
}
Do the semicolon (;) split and then loop over the result, re-combining each element where there is no colon (:) with the previous element.
string input = "X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G="
+"Black;;smtp:jblack#test.com;SMTP:jb#test.com";
string[] rawSplit = input.Split(';');
List<string> result = new List<string>();
//now the fun begins
string buffer = string.Empty;
foreach (string s in rawSplit)
{
if (buffer == string.Empty)
{
buffer = s;
}
else if (s.Contains(':'))
{
result.Add(buffer);
buffer = s;
}
else
{
buffer += ";" + s;
}
}
result.Add(buffer);
foreach (string s in result)
Console.WriteLine(s);
here is another possible solution.
string[] bits = text.Replace(";smtp", "|smtp").Replace(";SMTP", "|SMTP").Replace(";X400", "|X400").Split(new char[] { '|' });
bits[0],
bits[1], and
bits[2]
will then contains the three parts in the order from your original string.

Categories