How can I remove duplicate substrings within a string? For instance, if I have a string like smith:rodgers:someone:smith:white, how can I get a new string with the extra smith removed, like smith:rodgers:someone:white? Also, I'd like to keep the colons even though they are duplicated.
Many thanks
string input = "smith:rodgers:someone:smith:white";
string output = string.Join(":", input.Split(':').Distinct().ToArray());
Of course this code assumes that you're only looking for duplicate "field" values. That won't remove "smithsmith" in the following string:
"smith:rodgers:someone:smithsmith:white"
It would be possible to write an algorithm to do that, but quite difficult to make it efficient...
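For the narrower case where a whole field is just one value repeated back to back (such as "smithsmith"), a regex backreference can collapse the field before the Distinct() call. This is only a sketch under that assumption: the class and method names are made up for illustration, and it would also collapse legitimate values such as "mama" to "ma".

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class FieldCollapser
{
    // Hypothetical helper: collapses a field that consists solely of one
    // substring repeated two or more times ("smithsmith" -> "smith").
    // Fields that are not a pure repetition are returned unchanged.
    public static string CollapseRepeats(string field)
    {
        return Regex.Replace(field, @"^(.+?)\1+$", "$1");
    }

    // Collapse each field first, then drop duplicate fields (first one wins).
    public static string RemoveDuplicates(string input)
    {
        return string.Join(":", input.Split(':')
            .Select(f => CollapseRepeats(f))
            .Distinct()
            .ToArray());
    }
}
```

Repetitions buried inside a longer field (for example "xsmithsmithy") are left alone by this sketch; handling those is the harder, more expensive problem mentioned above.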
Something like this:
string withoutDuplicates = String.Join(":", myString.Split(':').Distinct().ToArray());
Assuming the format of that string:
var theString = "smith:rodgers:someone:smith:white";
var subStrings = theString.Split(new char[] { ':' });
var uniqueEntries = new List<string>();
foreach(var item in subStrings)
{
if (!uniqueEntries.Contains(item))
{
uniqueEntries.Add(item);
}
}
var uniquifiedStringBuilder = new StringBuilder();
foreach(var item in uniqueEntries)
{
uniquifiedStringBuilder.AppendFormat("{0}:", item);
}
var uniqueString = uniquifiedStringBuilder.ToString().Substring(0, uniquifiedStringBuilder.Length - 1);
This is rather long-winded, but it shows the process of getting from one to the other.
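The same step-by-step process can be sketched with a HashSet<string>, which avoids the linear List.Contains scan on every field. The class and method names here are illustrative, not from the answer above.

```csharp
using System.Collections.Generic;
using System.Text;

public static class OrderedDedupe
{
    // Keeps the first occurrence of each field and preserves the original order.
    public static string RemoveDuplicateFields(string input, char separator)
    {
        var seen = new HashSet<string>();
        var builder = new StringBuilder();
        bool first = true;
        foreach (string field in input.Split(separator))
        {
            // HashSet<T>.Add returns false when the value is already present.
            if (!seen.Add(field)) continue;
            if (!first) builder.Append(separator);
            builder.Append(field);
            first = false;
        }
        return builder.ToString();
    }
}
```

Writing the separator before every field except the first also removes the need to trim a trailing colon afterwards.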
Not sure why you want to keep the duplicate colons. If you are expecting the output to be "smith:rodgers:someone::white", try this code:
public static string RemoveDuplicates(string input)
{
string output = string.Empty;
System.Collections.Specialized.StringCollection unique = new System.Collections.Specialized.StringCollection();
string[] parts = input.Split(':');
foreach (string part in parts)
{
output += ":";
if (!unique.Contains(part))
{
unique.Add(part);
output += part;
}
}
output = output.Substring(1);
return output;
}
Of course, I've not checked for null input, but I'm sure you'll do that ;)
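A possible variant with the null check added, sketched with LINQ instead of string concatenation in a loop. The names are hypothetical, and throwing ArgumentNullException is just one reasonable choice for null input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SlotPreservingDedupe
{
    public static string BlankOutDuplicates(string input)
    {
        if (input == null) throw new ArgumentNullException("input");

        var seen = new HashSet<string>();
        // Keep the first occurrence of each part; later occurrences become
        // empty fields, so the number of colons is unchanged.
        return string.Join(":", input.Split(':')
            .Select(part => seen.Add(part) ? part : string.Empty)
            .ToArray());
    }
}
```

For "smith:rodgers:someone:smith:white" this yields "smith:rodgers:someone::white", the same shape as the method above.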
Related
Hello. Please could you suggest better ways of writing this C# code?
Basically, when NumberList has missing values between the '-' characters, I am trying to rebuild the string with default values.
The final result should be "123-10-45-9-09"
As you can see, the value from "second-10" is filled in as the second item in the string.
10, 9 and 09 are filled in from the values in valuestring.
This is the bad string which is missing some values.
string NumberList = "123--45--";
I have stored this string value in my app.config file.
string valuestring = "first-12,second-10,third-99,fourth-9,fifth-09";
protected string MissingNumberString(string Number)
{
string NumberList = "123--45--";
string valuestring = "first-12,second-10,third-99,fourth-9,fifth-09";
var companyAccountList = valuestring.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
var result = NumberList.Split('-');
int counter = 0;
var builder = new System.Text.StringBuilder();
foreach (string s in companyAccountList)
{
string t = s.Substring(s.IndexOf('-') + 1);
if (string.IsNullOrEmpty(result[counter]))
builder.Append(t).Append("-");
else
{
if (companyAccountList.Length == counter)
builder.Append(result[counter]);
else
builder.Append(result[counter]).Append("-");
}
counter++;
}
return builder.ToString();
}
One way to achieve this (assuming valuestring is in order and is not missing any defaults) would be:
string MissingNumber(string Number)
{
string valuestring = "first-12,second-10,third-99,fourth-9,fifth-09";
var regex = Regex.Matches(valuestring,@"(?<=-)(\d*)(?<=,)?");
var defaults = regex.Cast<Match>().Select(x=>x.Value).ToList();
var newArray = Number.Split('-').Select((x,index)=>string.IsNullOrEmpty(x)?defaults[index]:x);
return string.Join("-",newArray);
}
The code uses Regular Expression to break the ValueString and read the default values.
Regex.Matches(valuestring,@"(?<=-)(\d*)(?<=,)?");
The regular expression uses a lookbehind, (?<=-), to match the run of digits that follows each "-" character. The trailing (?<=,)? is an optional lookbehind and has no effect on the match.
Once the defaults are parsed into a List (assuming that the positions are in order and do not miss any values), we loop through the input string (which has been split based on delimiter), check if it is Empty, and if so, use the value from the Defaults (based on our assumption, it should have same index).
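The same idea can be sketched without a regular expression by splitting valuestring on ',' and taking the text after the first '-' in each entry. The class name, method name and two-parameter signature are assumptions made for illustration; the approach still relies on the defaults being in positional order.

```csharp
using System;
using System.Linq;

public static class DefaultFiller
{
    // valueString has the form "name-default,name-default,..." as in the question.
    public static string FillMissing(string numberList, string valueString)
    {
        // Take everything after the first '-' of each entry as the default.
        string[] defaults = valueString
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(pair => pair.Substring(pair.IndexOf('-') + 1))
            .ToArray();

        string[] parts = numberList.Split('-');
        // Fill each empty slot with the default at the same position.
        for (int i = 0; i < parts.Length && i < defaults.Length; i++)
        {
            if (string.IsNullOrEmpty(parts[i])) parts[i] = defaults[i];
        }
        return string.Join("-", parts);
    }
}
```

With the question's inputs, "123--45--" and "first-12,second-10,third-99,fourth-9,fifth-09", this returns "123-10-45-9-09".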
Update
Based on the comments, it looks like you have other data in the original string, so the relevant substring has to be captured first.
We could update the MissingNumber method as follows:
static string MissingNumber(string Number)
{
string valuestring = "first-12,second-10,third-99,fourth-9,fifth-09";
var regexDefaultValues = Regex.Matches(valuestring,@"(?<=-)(\d*)(?<=,)?");
var defaults = regexDefaultValues.Cast<Match>().Select(x=>x.Value).ToList();
var regexNumberToParse = new Regex(@"(\d)*-(\d)*-(\d)*-(\d)*-(\d)*");
var capturedNumberFormat = regexNumberToParse.Match(Number).Value;
var newArray = capturedNumberFormat.Split('-').Select((x,index)=>string.IsNullOrEmpty(x)?defaults[index]:x);
var ValueWithDefaults = string.Join("-",newArray);
return regexNumberToParse.Replace(Number,ValueWithDefaults);
}
I have a large string, where there can be specific words (text followed by a single colon, like "test:") occurring more than once. For example, like this:
word:
TEST:
word:
TEST:
TEST: // random text
"word" occurs twice and "TEST" occurs thrice, but the amount can be variable. Also, these words don't have to be in the same order and there can be more text in the same line as the word (as shown in the last example of "TEST"). What I need to do is append the occurrence number to each word, for example the output string needs to be this:
word_ONE:
TEST_ONE:
word_TWO:
TEST_TWO:
TEST_THREE: // random text
The RegEx for getting these words which I've written is ^\b[A-Za-z0-9_]{4,}\b:. However, I don't know how to accomplish the above in a fast way. Any ideas?
Regex is perfect for this job - using Replace with a match evaluator:
This example is not tested nor compiled:
public class Fix
{
public static String Execute(string largeText)
{
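// Verbatim string so \w is a regex escape; Multiline lets ^ match at each line start.
return Regex.Replace(largeText, @"^(\w{4,}):", new Fix().Evaluator, RegexOptions.Multiline);
}
private Dictionary<String, int> counters = new Dictionary<String, int>();
private static String[] numbers = {"ONE", "TWO", "THREE" /* ... */};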
public String Evaluator(Match m)
{
String word = m.Groups[1].Value;
int count;
if (!counters.TryGetValue(word, out count))
count = 0;
count++;
counters[word] = count;
return word + "_" + numbers[count-1] + ":";
}
}
This should return what you requested when calling:
result = Fix.Execute(largeText);
I think you can do this with Regex.Replace(string, string, MatchEvaluator) and a dictionary.
static Dictionary<string, int> wordCount = new Dictionary<string, int>();
static string AppendIndex(Match m)
{
    // Group 1 is the word without its trailing colon.
    string word = m.Groups[1].Value;
    if (wordCount.ContainsKey(word))
        wordCount[word] = wordCount[word] + 1;
    else
        wordCount.Add(word, 1);
    return word + "_" + wordCount[word] + ":"; // in the format: word_1:, word_2:
}
static void Main()
{
    string text = "....";
    // RegexOptions.Multiline lets ^ match at the start of every line.
    string result = Regex.Replace(text, @"^\b([A-Za-z0-9_]{4,})\b:",
        new MatchEvaluator(AppendIndex), RegexOptions.Multiline);
}
See this:
http://msdn.microsoft.com/en-US/library/cft8645c(v=VS.80).aspx
If I understand you correctly, regex is not necessary here.
You can split your large string by the ':' character. Maybe you also need to read line by line (split by '\n'). After that you just create a dictionary (IDictionary<string, int>), which counts the occurrences of certain words. Every time you find word x, you increase the counter in the dictionary.
EDIT
Read your file line by line OR split the string by '\n'
Check if your delimiter is present. Either by splitting by ':' OR using regex.
Get the first item from the split array OR the first match of your regex.
Use a dictionary to count your occurrences.
if (dictionary.ContainsKey(key)) dictionary[key]++;
else dictionary.Add(key, 1);
If you need words instead of numbers, create another dictionary that maps each count to its word, so that looking up 1 gives "one". Maybe there is another solution for that.
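The steps above can be sketched as follows. The class name and the short word list are placeholders, and the sketch treats everything before the first ':' on a line as the key, without the four-character minimum from the question's regex.

```csharp
using System.Collections.Generic;

public static class OccurrenceTagger
{
    // Hypothetical word list; extend it to cover the highest count expected.
    private static readonly string[] Words = { "ONE", "TWO", "THREE", "FOUR", "FIVE" };

    public static string Tag(string text)
    {
        var counts = new Dictionary<string, int>();
        string[] lines = text.Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            int colon = lines[i].IndexOf(':');
            if (colon <= 0) continue; // no key on this line

            string key = lines[i].Substring(0, colon);
            int count;
            counts.TryGetValue(key, out count); // count stays 0 for a new key
            counts[key] = ++count;

            // Fall back to digits when the word list is too short.
            string suffix = count <= Words.Length ? Words[count - 1] : count.ToString();
            lines[i] = key + "_" + suffix + lines[i].Substring(colon);
        }
        return string.Join("\n", lines);
    }
}
```

Any text after the colon on a line (such as "// random text") is carried over unchanged.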
Look at this example (I know it's not perfect and not so nice).
Let's leave aside the exact argument for the Split function; I think this can help:
static void Main(string[] args)
{
string a = "word:word:test:-1+234=567:test:test:";
string[] tks = a.Split(':');
Regex re = new Regex(@"^\b[A-Za-z0-9_]{4,}\b");
var res = from x in tks
where re.Matches(x).Count > 0
select x + DecodeNO(tks.Count(y=>y.Equals(x)));
foreach (var item in res)
{
Console.WriteLine(item);
}
Console.ReadLine();
}
private static string DecodeNO(int n)
{
switch (n)
{
case 1:
return "_one";
case 2:
return "_two";
case 3:
return "_three";
}
return "";
}
Is there any method that returns a fixed-length array after splitting a string on some delimiter, filling the rest with a default string?
Eg.
string fullName = "Jhon Doe";
string[] names = fullName.SpecialSplit(some parameters); //This should always return a string array of length 3, with the second element set to empty if there is no middle name.
Next time specify the language you're asking about. We're no guessers.
In Java:
fullName.split(" ");
And anyway, no method will "return a string array of length 3, with the second element set to empty if there is no middle name". As far as the method is concerned, there are just two elements. You have to write that method yourself, wrapping the standard split() method.
You should read over Jon Skeet's Writing the perfect question. It will be beneficial to you in the future when posting questions on Stack Overflow.
There is no method in C# that does what you are asking, but you can easily write an extension method to do what I think you are asking.
Here is a quick example:
public static class AbreviatorExtention
{
public static string[] GetInitials(this String str, char splitChar)
{
string[] initialArray = new string[3];
var nameArray = str.Split(new char[] { splitChar },
StringSplitOptions.RemoveEmptyEntries);
if (nameArray.Length == 2)
{
var charArrayFirstName = nameArray[0].ToCharArray();
var charArrayLastName = nameArray[1].ToCharArray();
initialArray[0] = charArrayFirstName[0].ToString().ToUpper();
initialArray[1] = string.Empty;
initialArray[2] = charArrayLastName[0].ToString().ToUpper();
}
else
{
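// Stop at the array length so names with more than three parts do not overflow.
for (int i = 0; i < nameArray.Length && i < initialArray.Length; i++)
{
// Index 0 is the first letter of each name part.
initialArray[i] = (nameArray[i].ToCharArray())[0]
.ToString().ToUpper();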
}
}
return initialArray;
}
}
class Program
{
static void Main(string[] args)
{
string FullName = "john doe";
//Extension method in use
string[] names = FullName.GetInitials(' ');
foreach (var item in names)
{
Console.WriteLine(item);
}
Console.ReadLine();
}
}
Output (the middle element is an empty string, so it prints as a blank line):
J

D
I would keep splitting the string separate from the fixed array. If you still want a fixed array, then set up an array of size three and populate it. This is not the best method, however, as the positions have no meaning. Better, set up a person or user class and then populate it, via rules, from the split string.
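A minimal sketch of that suggestion, assuming a hypothetical PersonName class and one possible set of rules (one part is a first name, two parts are first and last, anything between first and last is the middle name):

```csharp
using System;

public class PersonName
{
    public string First { get; private set; }
    public string Middle { get; private set; }
    public string Last { get; private set; }

    // Assumed rules: missing parts are empty strings, never null.
    public static PersonName Parse(string fullName)
    {
        string[] parts = (fullName ?? string.Empty)
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        var name = new PersonName { First = string.Empty, Middle = string.Empty, Last = string.Empty };
        if (parts.Length == 0) return name;

        name.First = parts[0];
        if (parts.Length > 1) name.Last = parts[parts.Length - 1];
        // Everything between the first and last parts forms the middle name.
        if (parts.Length > 2) name.Middle = string.Join(" ", parts, 1, parts.Length - 2);
        return name;
    }

    // Fixed-length view for callers that still want a three-element array.
    public string[] ToArray()
    {
        return new[] { First, Middle, Last };
    }
}
```

PersonName.Parse("Jhon Doe").ToArray() then gives { "Jhon", "", "Doe" }, which matches the fixed-length array asked for in the question while keeping named properties for everything else.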
I have a string that I am reading from another system. It's basically a long string that represents a list of key value pairs that are separated by a space in between. It looks like this:
key:value[space]key:value[space]key:value[space]
So I wrote this code to parse it:
string myString = ReadinString();
string[] tokens = myString.Split(' ');
foreach (string token in tokens) {
string key = token.Split(':')[0];
string value = token.Split(':')[1];
. . . .
}
The issue now is that some of the values have spaces in them so my "simplistic" split at the top no longer works. I wanted to see how I could still parse out the list of key value pairs (given space as a separator character) now that I know there also could be spaces in the value field as split doesn't seem like it's going to be able to work anymore.
NOTE: I now confirmed that KEYs will NOT have spaces in them so I only have to worry about the values. Apologies for the confusion.
Use this regular expression:
\w+:[\w\s]+(?![\w+:])
I tested it on
test:testvalue test2:test value test3:testvalue3
It returns three matches:
test:testvalue
test2:test value
test3:testvalue3
You can change \w to any character set that can occur in your input.
Code for testing this:
var regex = new Regex(@"\w+:[\w\s]+(?![\w+:])");
var test = "test:testvalue test2:test value test3:testvalue3";
foreach (Match match in regex.Matches(test))
{
var key = match.Value.Split(':')[0];
var value = match.Value.Split(':')[1];
Console.WriteLine("{0}:{1}", key, value);
}
Console.ReadLine();
As Wonko the Sane pointed out, this regular expression will fail on values with :. If you predict such situation, use \w+:[\w: ]+?(?![\w+:]) as the regular expression. This will still fail when a colon in value is preceded by space though... I'll think about solution to this.
This cannot work without changing your split from a space to something else such as a "|".
Consider this:
Alfred Bester:Alfred Bester Alfred:Alfred Bester
Is this key "Alfred Bester" & value "Alfred", or key "Alfred" & value "Bester Alfred"?
string input = "foo:Foobarius Maximus Tiberius Kirk bar:Barforama zap:Zip Brannigan";
foreach (Match match in Regex.Matches(input, @"(\w+):([^:]+)(?![\w+:])"))
{
Console.WriteLine("{0} = {1}",
match.Groups[1].Value,
match.Groups[2].Value
);
}
Gives you:
foo = Foobarius Maximus Tiberius Kirk
bar = Barforama
zap = Zip Brannigan
You could try to URL-encode the content between the spaces (the keys and the values, not the : symbol), but this would require that you have control over the input method.
Or you could simply use another format (like XML or JSON), but again you will need control over the input format.
If you can't control the input format, you could always use a regular expression that searches for single spaces that are followed by a word plus a colon.
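That last option might look like the sketch below, which uses Regex.Split with a lookahead so the string is cut only at a space followed by "key:". The class and method names are illustrative, and it assumes keys consist of word characters and that values never contain a word directly followed by a colon (a value such as "at 10:30" would be split incorrectly).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PairSplitter
{
    public static Dictionary<string, string> Parse(string input)
    {
        var result = new Dictionary<string, string>();
        // Split only at a space that is immediately followed by "key:".
        foreach (string pair in Regex.Split(input.Trim(), @" (?=\w+:)"))
        {
            int colon = pair.IndexOf(':');
            if (colon < 0) continue; // not a key:value pair

            // The indexer overwrites on a repeated key instead of throwing.
            result[pair.Substring(0, colon)] = pair.Substring(colon + 1);
        }
        return result;
    }
}
```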
Update (Thanks Jon Grant)
It appears that you can have spaces in the key and the value. If this is the case you will need to seriously rethink your strategy as even Regex won't help.
string input = "key1:value key2:value key3:value";
Dictionary<string, string> dic = input.Split(' ').Select(x => x.Split(':')).ToDictionary(x => x[0], x => x[1]);
The first Split will produce an array:
"key:value", "key:value"
Then an array of arrays:
{ "key", "value" }, { "key", "value" }
And then a dictionary:
"key" => "value", "key" => "value"
Note that Dictionary<K,V> doesn't allow duplicate keys; ToDictionary will throw an exception in such a case. If that scenario is possible, use ToLookup().
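A short sketch of the ToLookup() variant. The class name is made up, and the Where clause that skips tokens without exactly one colon is an added assumption rather than part of the one-liner above.

```csharp
using System.Linq;

public static class LookupExample
{
    public static ILookup<string, string> Parse(string input)
    {
        return input.Split(' ')
            .Select(x => x.Split(':'))
            .Where(x => x.Length == 2)        // skip malformed tokens
            .ToLookup(x => x[0], x => x[1]);  // repeated keys are grouped, not rejected
    }
}
```

Indexing the lookup with a key returns all of its values in input order, and a key that is not present returns an empty sequence rather than throwing.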
Using a regular expression can solve your problem:
private void DoSplit(string str)
{
// Normalise to exactly one trailing space (the original += duplicated the input).
str = str.Trim() + " ";
string patterns = @"\w+:([\w+\s*])+[^!\w+:]";
var r = new System.Text.RegularExpressions.Regex(patterns);
var ms = r.Matches(str);
foreach (System.Text.RegularExpressions.Match item in ms)
{
string[] s = item.Value.Split(new char[] { ':' });
//Do something
}
}
This code will do it (given the rules below). It parses the keys and values and returns them in a Dictionary<string, string> data structure. I have added some code at the end that assumes, given your example, that the last value of the entire string/stream is followed by a [space]:
private Dictionary<string, string> ParseKeyValues(string input)
{
Dictionary<string, string> items = new Dictionary<string, string>();
string[] parts = input.Split(':');
string key = parts[0];
string value;
int currentIndex = 1;
while (currentIndex < parts.Length-1)
{
int indexOfLastSpace=parts[currentIndex].LastIndexOf(' ');
value = parts[currentIndex].Substring(0, indexOfLastSpace);
items.Add(key, value);
key = parts[currentIndex].Substring(indexOfLastSpace + 1);
currentIndex++;
}
// Drop the trailing space that is assumed to end the input.
value = parts[parts.Length - 1].Substring(0,parts[parts.Length - 1].Length-1);
items.Add(key, value);
return items;
}
Note: this algorithm assumes the following rules:
No spaces in the keys
No colons in the keys
No colons in the values
Without any Regex nor string concat, and as an enumerable (it supposes keys don't have spaces, but values can):
public static IEnumerable<KeyValuePair<string, string>> Split(string text)
{
if (text == null)
yield break;
int keyStart = 0;
int keyEnd = -1;
int lastSpace = -1;
for(int i = 0; i < text.Length; i++)
{
if (text[i] == ' ')
{
lastSpace = i;
continue;
}
if (text[i] == ':')
{
if (lastSpace >= 0)
{
yield return new KeyValuePair<string, string>(text.Substring(keyStart, keyEnd - keyStart), text.Substring(keyEnd + 1, lastSpace - keyEnd - 1));
keyStart = lastSpace + 1;
}
keyEnd = i;
continue;
}
}
if (keyEnd >= 0)
yield return new KeyValuePair<string, string>(text.Substring(keyStart, keyEnd - keyStart), text.Substring(keyEnd + 1));
}
I guess you could take your method and expand upon it slightly to deal with this stuff...
Kind of pseudocode:
List<string> parsedTokens = new List<string>();
string[] tokens = myString.Split(' ');
foreach (string token in tokens)
{
    // A token containing a colon starts a new key:value pair
    // (the very first token is always treated as a start).
    if (parsedTokens.Count == 0 || token.IndexOf(':') > -1)
    {
        parsedTokens.Add(token);
    }
    else
    {
        // No colon: this token continues the previous value, so values
        // with any number of spaces are re-joined here.
        parsedTokens[parsedTokens.Count - 1] += " " + token;
    }
}
Another approach would be to split on the colon... That way, your first array item would be the name of the first key, second item would be the value of the first key and then name of the second key (can use LastIndexOf to split it out), and so on. This would obviously get very messy if the values can include colons, or the keys can contain spaces, but in that case you'd be pretty much out of luck...
My string is as follows:
smtp:jblack@test.com;SMTP:jb@test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;
I need back:
smtp:jblack@test.com
SMTP:jb@test.com
X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;
The problem is that the semicolons separate the addresses and are also part of the X400 address. Can anyone suggest how best to split this?
PS: I should have mentioned that the order differs, so it could be:
X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;;smtp:jblack@test.com;SMTP:jb@test.com
There can be more than 3 addresses (4, 5, 10, etc.), including an X500 address; however, they all start with smtp:, SMTP:, X400 or X500.
EDIT: With the updated information, this answer certainly won't do the trick - but it's still potentially useful, so I'll leave it here.
Will you always have three parts, and you just want to split on the first two semi-colons?
If so, just use the overload of Split which lets you specify the number of substrings to return:
string[] bits = text.Split(new char[]{';'}, 3);
May I suggest building a regular expression:
(smtp|SMTP|X400|X500):((?!smtp:|SMTP:|X400:|X500:).)*;?
or protocol-less
.*?:((?![^:;]*:).)*;?
In other words, find anything that starts with one of your protocols. Match the colon. Then continue matching characters as long as you're not at the start of another protocol. Finish with a semicolon (optionally).
You can then go through the list of matches, splitting each on ':', and you'll have your protocols. Additionally, if you want to support more protocols, just add them to the list.
However, you will likely want to make the whole pattern case-insensitive and list each protocol only once, in either its uppercase or lowercase form.
The protocol-less version doesn't care what the names of the protocols are. It finds them all the same, by matching everything up to, but excluding, a string followed by a colon or a semicolon.
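A sketch of the protocol-list version in C#, with the protocols listed once in lowercase and RegexOptions.IgnoreCase handling "SMTP" versus "smtp". The class and method names are illustrative only.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AddressSplitter
{
    // A protocol name and colon, then any characters up to (but not
    // including) the start of the next protocol.
    private static readonly Regex AddressPattern = new Regex(
        @"(smtp|x400|x500):((?!(?:smtp|x400|x500):).)*",
        RegexOptions.IgnoreCase);

    public static List<string> Split(string text)
    {
        var addresses = new List<string>();
        foreach (Match match in AddressPattern.Matches(text))
        {
            addresses.Add(match.Value);
        }
        return addresses;
    }
}
```

Each match keeps the semicolon that separates it from the next address, so smtp entries may need a TrimEnd(';') afterwards if the trailing separator is unwanted.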
Split by the following regex pattern:
string[] items = System.Text.RegularExpressions.Regex.Split(text, @";(?=\w+:)");
EDIT: a better one that accepts more special characters in the protocol name:
string[] items = System.Text.RegularExpressions.Regex.Split(text, @";(?=[^;:]+:)");
http://msdn.microsoft.com/en-us/library/c1bs0eda.aspx
Check there: you can specify the number of splits you want. So in your case you would do
text.Split(new char[]{';'}, 3);
Not the fastest if you are doing this a lot but it will work for all cases I believe.
string input1 = "smtp:jblack@test.com;SMTP:jb@test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";
string input2 = "X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;;smtp:jblack@test.com;SMTP:jb@test.com";
Regex splitEmailRegex = new Regex(@"(?<key>\w+?):(?<value>.*?)(\w+:|$)");
List<string> sets = new List<string>();
while (input2.Length > 0)
{
Match m1 = splitEmailRegex.Matches(input2)[0];
string s1 = m1.Groups["key"].Value + ":" + m1.Groups["value"].Value;
sets.Add(s1);
input2 = input2.Substring(s1.Length);
}
foreach (var set in sets)
{
Console.WriteLine(set);
}
Console.ReadLine();
Of course many will claim Regex: Now you have two problems. There may even be a better regex answer than this.
You could always split on the colon and have a little logic to grab the key and value.
string[] bits = text.Split(':');
List<string> values = new List<string>();
for (int i = 1; i < bits.Length; i++)
{
string value = bits[i].Contains(';') ? bits[i].Substring(0, bits[i].LastIndexOf(';') + 1) : bits[i];
string key = bits[i - 1].Contains(';') ? bits[i - 1].Substring(bits[i - 1].LastIndexOf(';') + 1) : bits[i - 1];
values.Add(String.Concat(key, ":", value));
}
Tested it with both of your samples and it works fine.
This caught my curiosity .... So this code actually does the job, but again, wants tidying :)
My final attempt - stop changing what you need ;=)
static void Main(string[] args)
{
string fneh = "X400:C=US400;A= ;P=Test;O=Exchange;S=Jack;G=Black;x400:C=US400l;A= l;P=Testl;O=Exchangel;S=Jackl;G=Blackl;smtp:jblack@test.com;X500:C=US500;A= ;P=Test;O=Exchange;S=Jack;G=Black;SMTP:jb@test.com;";
string[] parts = fneh.Split(new char[] { ';' });
List<string> addresses = new List<string>();
StringBuilder address = new StringBuilder();
foreach (string part in parts)
{
if (part.Contains(":"))
{
if (address.Length > 0)
{
addresses.Add(semiColonCorrection(address.ToString()));
}
address = new StringBuilder();
address.Append(part);
}
else
{
address.AppendFormat(";{0}", part);
}
}
addresses.Add(semiColonCorrection(address.ToString()));
foreach (string emailAddress in addresses)
{
Console.WriteLine(emailAddress);
}
Console.ReadKey();
}
private static string semiColonCorrection(string address)
{
if ((address.StartsWith("x", StringComparison.InvariantCultureIgnoreCase)) && (!address.EndsWith(";")))
{
return string.Format("{0};", address);
}
else
{
return address;
}
}
Try these regexes. You can extract what you're looking for using named groups.
X400:(?<X400>.*?)(?:smtp|SMTP|$)
smtp:(?<smtp>.*?)(?:;+|$)
SMTP:(?<SMTP>.*?)(?:;+|$)
Make sure you specify case-insensitive matching when constructing them. They seem to work with the samples you gave.
Lots of attempts. Here is mine ;)
string src = "smtp:jblack@test.com;SMTP:jb@test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";
Regex r = new Regex(@"
(?:^|;)smtp:(?<smtp>([^;]*(?=;|$)))|
(?:^|;)x400:(?<X400>.*?)(?=;x400|;x500|;smtp|$)|
(?:^|;)x500:(?<X500>.*?)(?=;x400|;x500|;smtp|$)",
RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
foreach (Match m in r.Matches(src))
{
if (m.Groups["smtp"].Captures.Count != 0)
Console.WriteLine("smtp: {0}", m.Groups["smtp"]);
else if (m.Groups["X400"].Captures.Count != 0)
Console.WriteLine("X400: {0}", m.Groups["X400"]);
else if (m.Groups["X500"].Captures.Count != 0)
Console.WriteLine("X500: {0}", m.Groups["X500"]);
}
This finds all smtp, x400 or x500 addresses in the string in any order of appearance. It also identifies the type of address ready for further processing. The appearance of the text smtp, x400 or x500 in the addresses themselves will not upset the pattern.
This works!
string input =
"smtp:jblack@test.com;SMTP:jb@test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";
string[] parts = input.Split(';');
List<string> output = new List<string>();
foreach(string part in parts)
{
if (part.Contains(":"))
{
output.Add(part + ";");
}
else if (part.Length > 0)
{
output[output.Count - 1] += part + ";";
}
}
foreach(string s in output)
{
Console.WriteLine(s);
}
Do the semicolon (;) split and then loop over the result, re-combining each element where there is no colon (:) with the previous element.
string input = "X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G="
+"Black;;smtp:jblack@test.com;SMTP:jb@test.com";
string[] rawSplit = input.Split(';');
List<string> result = new List<string>();
//now the fun begins
string buffer = string.Empty;
foreach (string s in rawSplit)
{
if (buffer == string.Empty)
{
buffer = s;
}
else if (s.Contains(':'))
{
result.Add(buffer);
buffer = s;
}
else
{
buffer += ";" + s;
}
}
result.Add(buffer);
foreach (string s in result)
Console.WriteLine(s);
Here is another possible solution.
string[] bits = text.Replace(";smtp", "|smtp").Replace(";SMTP", "|SMTP").Replace(";X400", "|X400").Split(new char[] { '|' });
bits[0], bits[1], and bits[2] will then contain the three parts, in the order in which they appear in your original string.