Group string values - c#

I have this code
var myBuilder = new StringBuilder();
foreach (var item in myList)
{
myBuilder.Append(item.Number).Append(" - ").Append(item.SecondNumber).Append(", ");
}
var text = myBuilder;
and using that I'm getting this text
AAA08 - BB08, AAA09 - BB09, AAA09 - BB10,
myList returns this:
{ Number = "AAA08", SecondNumber = "BB08" }
{ Number = "AAA09", SecondNumber = "BB09" }
{ Number = "AAA09", SecondNumber = "BB10" }
How can I concatenate it to string to display this:
AAA08 - BB08; AAA09 - BB09, BB10
I can replace , with ; but I can't work out how to group the Number values so that each one is shown only once, followed by all of its SecondNumber values.

You can do this with Linq and string.Join
var grouped = myList
.GroupBy(x => x.Number)
.Select(g => g.Key + " - " + string.Join(", ", g.Select(x => x.SecondNumber)));
var text = string.Join("; ", grouped);

You can use GroupBy and String.Join:
var groupedNumbers= myList
.GroupBy(x => x.Number)
.Select(g => $"{g.Key} - {String.Join(", ", g.Select(x => x.SecondNumber))}");
string result = String.Join("; ", groupedNumbers);
I'm using string interpolation which is a C#6 feature, if you aren't using it replace $"{g.Key}..." with String.Format("{0}...", g.Key, ..).
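For a compiler older than C# 6, the String.Format variant could look roughly like the sketch below. The Item class and the FormatGrouping.Build name are assumptions made for illustration, since the question does not show the real types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type; the question only shows the two property names.
public class Item
{
    public string Number { get; set; }
    public string SecondNumber { get; set; }
}

public static class FormatGrouping
{
    // Same GroupBy + Join pipeline, with String.Format instead of interpolation.
    public static string Build(IEnumerable<Item> items)
    {
        var groupedNumbers = items
            .GroupBy(x => x.Number)
            .Select(g => String.Format("{0} - {1}",
                g.Key,
                String.Join(", ", g.Select(x => x.SecondNumber))));

        return String.Join("; ", groupedNumbers);
    }
}
```

With the three items from the question, FormatGrouping.Build should return "AAA08 - BB08; AAA09 - BB09, BB10".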

When you get Number, iterate through myList to find all instances of Number, then get each corresponding SecondNumber and append it to your string.
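A minimal sketch of that manual approach, without LINQ. The Row class and the ManualGrouping.Build name are hypothetical, and the HashSet used to skip Numbers that were already written is my own addition rather than something the answer spells out.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical item type; the question only shows the two property names.
public class Row
{
    public string Number { get; set; }
    public string SecondNumber { get; set; }
}

public static class ManualGrouping
{
    public static string Build(IList<Row> rows)
    {
        var seen = new HashSet<string>();   // Numbers that were already written
        var builder = new StringBuilder();

        foreach (var row in rows)
        {
            // Add returns false when the Number was handled by an earlier pass.
            if (!seen.Add(row.Number))
                continue;

            if (builder.Length > 0)
                builder.Append("; ");
            builder.Append(row.Number).Append(" - ");

            // Inner pass: append every SecondNumber that shares this Number.
            var first = true;
            foreach (var other in rows)
            {
                if (other.Number != row.Number)
                    continue;
                if (!first)
                    builder.Append(", ");
                builder.Append(other.SecondNumber);
                first = false;
            }
        }

        return builder.ToString();
    }
}
```

The nested loop makes this quadratic in the list length, which is fine for short lists; the GroupBy answers avoid the repeated scan.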

Before building the final string, you would probably need to group your list first:
var groupedList = myList.GroupBy(x => x.Number);
and then use that grouped list to build the string (each group exposes its Number as Key):
foreach (var item in groupedList)
{
myBuilder.Append(item.Key).Append(" - ").Append(string.Join(", ", item.Select(x => x.SecondNumber))).Append("; ");
}

Related

c# how to sort nested collections

I am probably overthinking this all and I'm lost. I'm new to C# and don't know what is the best way to solve this.
string[] input = new string[] { "FR_Paris", "UK_London", "UK_Bristol" };
The desired console output is ordered by the number of cities per country, with the cities of each country sorted alphabetically.
In this situation it is:
UK 2x Bristol, London
FR 1x Paris
I'm not going to lie, this is my homework. I know how to parse the input and I think for cities has to be used a collection which can be sorted but don't know which type. I'm kind of lost when it comes to nested collections.
Please give me at least a direction.
Thanks a lot!
Try This
// Init Array
string[] input = new string[] { "FR_Paris", "UK_London", "UK_Bristol" };
//Get All Codes
List<string> Codes = new List<string>();
foreach (var CityName in input)
{
var Name = CityName.Split('_')[0]; // Get the country code before '_', e.g. "FR"
if (!Codes.Contains(Name))
Codes.Add(Name);
}
// Print
foreach (var Code in Codes.OrderBy(x => x))
{
var AllNames = input.Where(x => x.StartsWith(Code + "_")).Select(x => x.Split('_')[1]);
Console.WriteLine(Code + " " + AllNames.Count() + "x " + string.Join(",", AllNames.OrderBy(x => x)));
}
You can try with linq
var input = new List<string> { "FR_Paris", "UK_London", "UK_Bristol" };
var result = input.Select(x => new { Country = x.Split('_')[0], City = x.Split('_')[1] })
.GroupBy(x => x.Country)
.Select(g => $"{g.Key} {g.Count()}x {String.Join(", ", g.OrderBy(c => c.City).Select(c => c.City))} ");
foreach (var item in result)
{
Console.WriteLine(item);
}
OUTPUT
FR 1x Paris
UK 2x Bristol, London
Using LINQ:
var grouped = input
.Select(x => x.Split('_')) // Split all strings into array[2]
.GroupBy(x => x[0]) // Group by country
.OrderByDescending(x => x.Count()); // Order by number of cities
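The grouping above stops before the output step. One way to finish it, as a sketch only: the CountryReport.Build name is hypothetical, and the input is assumed to always have the form "CODE_City".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountryReport
{
    // Returns one line per country, most cities first, cities sorted by name.
    public static List<string> Build(IEnumerable<string> input)
    {
        return input
            .Select(x => x.Split('_'))                        // ["UK", "London"]
            .GroupBy(parts => parts[0], parts => parts[1])    // country -> cities
            .OrderByDescending(g => g.Count())                // most cities first
            .Select(g => g.Key + " " + g.Count() + "x "
                         + string.Join(", ", g.OrderBy(city => city)))
            .ToList();
    }
}
```

For the array in the question, Build should give "UK 2x Bristol, London" followed by "FR 1x Paris"; print each line with Console.WriteLine.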

Show duplicates from lists c#

I'm working on a WPF application, and at one point I have to show all the duplicates from a list of strings, together with each duplicated string and the number of times it occurs in the list. For example: "The list contains the String 'Hello' 3 times." So far I'm getting each string successfully, but I can't get the correct number of times it appears in the list.
This is my code so far:
List<String> answerData = new List<String>();
using (MySqlCommand command = new MySqlCommand(query2, conn))
{
using (MySqlDataReader reader = command.ExecuteReader())
{
while (reader.Read())
{
answerData.Add(reader.GetString(0));
}
}
}
var duplicates = answerData
.GroupBy(i => i)
.Where(g => g.Count() > 1)
.Select(g => g.Key);
foreach (var d in duplicates)
{
MessageBox.Show(""+ d + duplicates.Count().ToString()); //Here I tried to get the number
//with Count() but it doesn't work as I thought it would.
}
What should I add/change to get the result I want?
EDIT
As suggested changed my code to the following:
var duplicates = answerData
.GroupBy(i => i)
.Where(g => g.Count() > 1);
foreach (var d in duplicates)
{
MessageBox.Show(d.Key + " " + d.Count().ToString());
}
And now it works smoothly.
Thank you everyone!
Store the actual groups instead of the keys in duplicates:
var duplicates = answerData
.GroupBy(i => i)
.Where(g => g.Count() > 1);
You could then iterate through the groups:
foreach (var d in duplicates)
{
MessageBox.Show(d.Key + " " + d.Count().ToString());
}
This example counts, i.e. iterates, each group twice. Alternatively, you could store objects that contain both the Key and the Count, as suggested by @HimBromBeere.
You just need to return the number within your Select:
var duplicates = answerData
.GroupBy(i => i)
.Select(g => new { Key = g.Key, Count = g.Count() })
.Where(x => x.Count > 1);
Notice that I changed the order of your statements to avoid a duplicate execution of g.Count().
You can do something like this.
Using a Dictionary lets you count every string in a single pass over the list:
List<String> answerData = new List<String>();
Dictionary<string,int> map = new Dictionary<string, int>();
foreach (var data in answerData)
{
if (map.ContainsKey(data))
{
map[data]++;
}
else
{
map.Add(data, 1);
}
}
foreach (var item in map)
{
if (item.Value > 1)
{
Console.WriteLine("{0} - {1}", item.Key, item.Value);
}
}

Split the string and join all first elements then second element and so on in c#

I have a string like this -
var roleDetails = "09A880C2-8732-408C-BA09-4AD6F0A65CE9^Z:WB:SELECT_DOWNLOAD:0000^Product Delivery - Download^1,24B11B23-1669-403F-A24D-74CE72DFD42A^Z:WB:TRAINING_SUBSCRIBER:0000^Training Subscriber^1,6A4A6543-DB9F-46F2-B3C9-62D69D28A0B6^Z:WB:LIC_MGR_HOME_REDL:0000^License Manager - Home use^1,76B3B165-0BB4-4E3E-B61F-0C0292342CE2^Account Admin^Account Admin^1,B3C0CE51-00EE-4A0A-B208-98653E21AE11^Z:WB:1BENTLEY_ISA_ADMIN:0000^Co-Administrator^1,CBA225BC-680C-4627-A4F6-BED401682816^ReadOnly^ReadOnly^1,D80CF5CF-CB6E-4424-9D8F-E29F96EBD4C9^Z:WB:MY_SELECT_CD:0000^Product Delivery - DVD^1,E0275936-FBBB-4775-97D3-9A7D19D3E1B4^Z:WB:LICENSE_MANAGER:0000^License Manager^1";
Splitting it with "," returns this -
[0] "09A880C2-8732-408C-BA09-4AD6F0A65CE9^Z:WB:SELECT_DOWNLOAD:0000^Product Delivery - Download^1"
[1] "24B11B23-1669-403F-A24D-74CE72DFD42A^Z:WB:TRAINING_SUBSCRIBER:0000^Training Subscriber^1"
[2] "6A4A6543-DB9F-46F2-B3C9-62D69D28A0B6^Z:WB:LIC_MGR_HOME_REDL:0000^License Manager - Home use^1"
[3] "76B3B165-0BB4-4E3E-B61F-0C0292342CE2^Account Admin^Account Admin^1"
[4] "B3C0CE51-00EE-4A0A-B208-98653E21AE11^Z:WB:1BENTLEY_ISA_ADMIN:0000^Co-Administrator^1"
[5] "CBA225BC-680C-4627-A4F6-BED401682816^ReadOnly^ReadOnly^1"
[6] "D80CF5CF-CB6E-4424-9D8F-E29F96EBD4C9^Z:WB:MY_SELECT_CD:0000^Product Delivery - DVD^1"
[7] "E0275936-FBBB-4775-97D3-9A7D19D3E1B4^Z:WB:LICENSE_MANAGER:0000^License Manager^1"
All elements contain a caret (^), so splitting each element further on the ^ symbol returns four elements.
But I want to join all the first elements, then all the second elements, then the third, and so on, to get a result like this -
[0]: 09A880C2-8732-408C-BA09-4AD6F0A65CE9, 24B11B23-1669-403F-A24D-74CE72DFD42A, 6A4A6543-DB9F-46F2-B3C9-62D69D28A0B6, 76B3B165-0BB4-4E3E-B61F-0C0292342CE2, B3C0CE51-00EE-4A0A-B208-98653E21AE11, CBA225BC-680C-4627-A4F6-BED401682816, D80CF5CF-CB6E-4424-9D8F-E29F96EBD4C9, E0275936-FBBB-4775-97D3-9A7D19D3E1B4
[1]: Z:WB:SELECT_DOWNLOAD:0000,Z:WB:TRAINING_SUBSCRIBER:0000, Z:WB:LIC_MGR_HOME_REDL:0000,Account Admin, Z:WB:1BENTLEY_ISA_ADMIN:0000, ReadOnly, Z:WB:MY_SELECT_CD:0000, Z:WB:LICENSE_MANAGER
[2]: Product Delivery - Download, Training Subscriber, License Manager - Home use, Account Admin, Co-Administrator, ReadOnly, Product Delivery - DVD, License Manager
[3]: 1,1,1,1,1,1,1,1
What is the quickest and simplest way of achieving this?
EDIT
This is what I tried so far -
var rolearray = roleDetails.Split(',').Select(s => s.Split('^')).Select(a => new { RoleId = a[0], RoleNme = a[1], FriendlyName = a[2], IsUserInRole = a[3] });
but this does not return the data the way I need it. I want to join all the a[0]s, then all the a[1]s, and so on.
SOLUTION:
After comparing the solutions and running each one 10 times in a loop to check performance, I found that the solution suggested by Jamiec takes the least time, so I am selecting it.
Pure LINQ solution:
roleDetails.Split(',')
.SelectMany(x => x.Split('^').Select((str, idx) => new {str, idx}))
.GroupBy(x => x.idx)
.Select(grp => string.Join(", ", grp.Select(x => x.str)))
The easiest way to do this, is to simply do:
var split = roleDetails.Split(',')
.Select(x => x.Split('^').ToArray())
.ToArray();
You would then access the elements as in a jagged array:
Console.WriteLine(split[0][0]);
// result: 09A880C2-8732-408C-BA09-4AD6F0A65CE9
Live example: http://rextester.com/NEUVOR15080
And if you then want all the elements grouped
Console.WriteLine(String.Join(",",split.Select(x => x[0])));
Console.WriteLine(String.Join(",",split.Select(x => x[1])));
Console.WriteLine(String.Join(",",split.Select(x => x[2])));
Console.WriteLine(String.Join(",",split.Select(x => x[3])));
Live example: http://rextester.com/BZXLG67151
Here you can use the Aggregate and Zip extension methods of LINQ.
Aggregate: Performs a specified operation to each element in a collection, while carrying the result forward.
Zip: The Zip extension method acts upon two collections. It processes each element in two series together.
var roleDetails = "09A880C2-8732-408C-BA09-4AD6F0A65CE9^Z:WB:SELECT_DOWNLOAD:0000^Product Delivery - Download^1,24B11B23-1669-403F-A24D-74CE72DFD42A^Z:WB:TRAINING_SUBSCRIBER:0000^Training Subscriber^1,6A4A6543-DB9F-46F2-B3C9-62D69D28A0B6^Z:WB:LIC_MGR_HOME_REDL:0000^License Manager - Home use^1,76B3B165-0BB4-4E3E-B61F-0C0292342CE2^Account Admin^Account Admin^1,B3C0CE51-00EE-4A0A-B208-98653E21AE11^Z:WB:1BENTLEY_ISA_ADMIN:0000^Co-Administrator^1,CBA225BC-680C-4627-A4F6-BED401682816^ReadOnly^ReadOnly^1,D80CF5CF-CB6E-4424-9D8F-E29F96EBD4C9^Z:WB:MY_SELECT_CD:0000^Product Delivery - DVD^1,E0275936-FBBB-4775-97D3-9A7D19D3E1B4^Z:WB:LICENSE_MANAGER:0000^License Manager^1";
var rolearray = roleDetails.Split(',')
.Select(s => s.Split('^'))
.Aggregate((s1Array, s2Array) => s1Array.Zip(s2Array, (s1, s2) => s1 + "," + s2).ToArray());
string roleDetails = "09A880C2-8732-408C-BA09-4AD6F0A65CE9^Z:WB:SELECT_DOWNLOAD:0000^Product Delivery - Download^1,24B11B23-1669-403F-A24D-74CE72DFD42A^Z:WB:TRAINING_SUBSCRIBER:0000^Training Subscriber^1,6A4A6543-DB9F-46F2-B3C9-62D69D28A0B6^Z:WB:LIC_MGR_HOME_REDL:0000^License Manager - Home use^1,76B3B165-0BB4-4E3E-B61F-0C0292342CE2^Account Admin^Account Admin^1,B3C0CE51-00EE-4A0A-B208-98653E21AE11^Z:WB:1BENTLEY_ISA_ADMIN:0000^Co-Administrator^1,CBA225BC-680C-4627-A4F6-BED401682816^ReadOnly^ReadOnly^1,D80CF5CF-CB6E-4424-9D8F-E29F96EBD4C9^Z:WB:MY_SELECT_CD:0000^Product Delivery - DVD^1,E0275936-FBBB-4775-97D3-9A7D19D3E1B4^Z:WB:LICENSE_MANAGER:0000^License Manager^1";
var RawItems = roleDetails.Split(',').Select(x=> x.Split('^'));
var Items1 = RawItems.Select(x=> x.ElementAt(0));
var Items2 = RawItems.Select(x=> x.ElementAt(1));
var Items3 = RawItems.Select(x=> x.ElementAt(2));
var Items4 = RawItems.Select(x=> x.ElementAt(3));
If you don't like the LINQ solutions, here's a solution without:
var result = new string[4];
var i = 0;
foreach(var line in roleDetails.Split(','))
foreach(var piece in line.Split('^'))
result[i++ % 4] += (i <= 4 ? "" : ",") + piece;
Basically, you split on commas and carets, and foreach on each, using a counter that tells us which array element to concatenate in, and whether to use a comma separator or not.
If your initial string is much bigger than in this example, consider first creating an array of StringBuilders, as these perform better for repeated concatenation:
var stringBuilders = new StringBuilder[4];
var result = new string[4];
for (var j = 0; j < 4; j++)
stringBuilders[j] = new StringBuilder();
var i = 0;
foreach(var line in roleDetails.Split(','))
foreach(var piece in line.Split('^'))
stringBuilders[i++ % 4].Append((i <= 4 ? "" : ",") + piece);
for (var j = 0; j < 4; j++)
result[j] = stringBuilders[j].ToString();
One more LINQ solution, though not as clean as @Pavel's:
string a = "", b = "", c = "", d = "";
roleDetails.Split(',').ToList().ForEach(x =>
{
a += x.Split('^')[0] + ',';
b += x.Split('^')[1] + ',';
c += x.Split('^')[2] + ',';
d += x.Split('^')[3] + ',';
});
MessageBox.Show(a.Trim(','));
MessageBox.Show(b.Trim(','));
MessageBox.Show(c.Trim(','));
MessageBox.Show(d.Trim(','));
OUTPUT:
a = 09A880C2-8732-408C-BA09-4AD6F0A65CE9,24B11B23-1669-403F-A24D-74CE72DFD42A,6A4A6543-DB9F-46F2-B3C9-62D69D28A0B6,76B3B165-0BB4-4E3E-B61F-0C0292342CE2,B3C0CE51-00EE-4A0A-B208-98653E21AE11,CBA225BC-680C-4627-A4F6-BED401682816,D80CF5CF-CB6E-4424-9D8F-E29F96EBD4C9,E0275936-FBBB-4775-97D3-9A7D19D3E1B4
b = Z:WB:SELECT_DOWNLOAD:0000,Z:WB:TRAINING_SUBSCRIBER:0000,Z:WB:LIC_MGR_HOME_REDL:0000,Account Admin,Z:WB:1BENTLEY_ISA_ADMIN:0000,ReadOnly,Z:WB:MY_SELECT_CD:0000,Z:WB:LICENSE_MANAGER:0000
c = Product Delivery - Download,Training Subscriber,License Manager - Home use,Account Admin,Co-Administrator,ReadOnly,Product Delivery - DVD,License Manager
d = 1,1,1,1,1,1,1,1
Fairly clean and fast...
var sets = new[]
{
new List<string>(),
new List<string>(),
new List<string>(),
new List<string>(),
};
foreach (var role in roleDetails.Split(','))
{
var details = role.Split('^');
sets[0].Add(details[0]);
sets[1].Add(details[1]);
sets[2].Add(details[2]);
sets[3].Add(details[3]);
}
var lines = sets.Select(set => string.Join(",", set)).ToArray();
... little nuts to understand and doesn't really save anything on performance ...
var ret = roleDetails.Split(',')
.Aggregate(seed: new { SBS = new[] { new StringBuilder(), new StringBuilder(),
new StringBuilder(), new StringBuilder(), },
Start = true },
func: (seed, role) =>
{
var details = role.Split('^');
if (seed.Start)
{
seed.SBS[0].Append(details[0]);
seed.SBS[1].Append(details[1]);
seed.SBS[2].Append(details[2]);
seed.SBS[3].Append(details[3]);
return new
{
seed.SBS,
Start = false,
};
}
else
{
seed.SBS[0].Append(',').Append(details[0]);
seed.SBS[1].Append(',').Append(details[1]);
seed.SBS[2].Append(',').Append(details[2]);
seed.SBS[3].Append(',').Append(details[3]);
return seed;
}
},
resultSelector: result => result.SBS.Select(sb => sb.ToString()).ToArray()
);
You can use Tuple here
var roles = roleDetails.Split(',')
.Select(x => x.Split('^'))
.Where(x=>x.Length==4)
.Select(x=>
new Tuple<string, string, string, string>(x[0], x[1], x[2], x[3]))
.ToList();
var item1 = string.Join(",", roles.Select(x=>x.Item1).ToArray());
var item2 = string.Join(",", roles.Select(x => x.Item2).ToArray());
var item3 = string.Join(",", roles.Select(x => x.Item3).ToArray());
var item4 = string.Join(",", roles.Select(x => x.Item4).ToArray());
Your attempt tries to do everything in a single line, which is making it much harder for you to understand what's happening.
You're already using all the tools you need (Select() and Split()). If you make your code more readable by separating everything into separate lines of code, then it becomes much easier to find your way:
//Your data string
string myDataString = "...";
//Your data string, separated into a list of rows (each row is a string)
var myDataRows = myDataString.Split(',');
//Your data string, separated into a list of rows (each row is a STRING ARRAY)
var myDataRowsAsStringArrays = myDataRows.Select(row => row.Split('^'));
And now, all you need to do is retrieve the correct data.
var firstColumnValues = myDataRowsAsStringArrays.Select(row => row[0]);
var secondColumnValues = myDataRowsAsStringArrays.Select(row => row[1]);
var thirdColumnValues = myDataRowsAsStringArrays.Select(row => row[2]);
var fourthColumnValues = myDataRowsAsStringArrays.Select(row => row[3]);
And if you so choose, you can join the values into a single comma separated string:
var firstColumnString = String.Join(", ", firstColumnValues);
var secondColumnString = String.Join(", ", secondColumnValues);
var thirdColumnString = String.Join(", ", thirdColumnValues);
var fourthColumnString = String.Join(", ", fourthColumnValues);

How to find the duplicates in the given string in c#

I want to find the duplicates in a given string. I tried it for collections and it works fine, but I don't know how to do it for a string.
Here is the code I tried for collections,
string name = "this is a a program program";
string[] arr = name.Split(' ');
var myList = new List<string>();
var duplicates = new List<string>();
foreach(string res in arr)
{
if (!myList.Contains(res))
{
myList.Add(res);
}
else
{
duplicates.Add(res);
}
}
foreach(string result in duplicates)
{
Console.WriteLine(result);
}
Console.ReadLine();
But I want to find the duplicates for the below string and to store it in an array. How to do that?
eg:- string aa = "elements";
In the above string i want to find the duplicate characters and store it in an array
Can anyone help me?
Linq solution:
string name = "this is a a program program";
String[] result = name.Split(' ')
.GroupBy(word => word)
.Where(chunk => chunk.Count() > 1)
.Select(chunk => chunk.Key)
.ToArray();
Console.Write(String.Join(Environment.NewLine, result));
The same principle applies to duplicate characters within a string:
String source = "elements";
Char[] result = source
.GroupBy(c => c)
.Where(chunk => chunk.Count() > 1)
.Select(chunk => chunk.Key)
.ToArray();
// result = ['e']
Console.Write(String.Join(Environment.NewLine, result));
string name = "elements";
var myList = new List<char>();
var duplicates = new List<char>();
foreach (char res in name)
{
if (!myList.Contains(res))
{
myList.Add(res);
}
else if (!duplicates.Contains(res))
{
duplicates.Add(res);
}
}
foreach (char result in duplicates)
{
Console.WriteLine(result);
}
Console.ReadLine();
A string is a sequence of chars, so you can use your collection approach.
But I would recommend a typed HashSet. Just load it with the string and you'll get the distinct chars. Note that HashSet<T> does not guarantee any particular enumeration order.
take a look:
string s = "aaabbcdaaee";
HashSet<char> hash = new HashSet<char>(s);
HashSet<char> hashDup = new HashSet<char>();
foreach (var c in s)
if (hash.Contains(c))
hash.Remove(c);
else
hashDup.Add(c);
foreach (var x in hashDup)
Console.WriteLine(x);
Console.ReadKey();
Instead of a List<> I'd use a HashSet<> because it doesn't allow duplicates, and Add returns false in that case. It's more efficient. I'd also use a Dictionary<TKey,TValue> instead of the list to track the count of each char:
string text = "elements";
var duplicates = new HashSet<char>();
var duplicateCounts = new Dictionary<char, int>();
foreach (char c in text)
{
int charCount = 0;
bool isDuplicate = duplicateCounts.TryGetValue(c, out charCount);
duplicateCounts[c] = ++charCount;
if (isDuplicate)
duplicates.Add(c);
}
Now you have all unique duplicate chars in the HashSet and the count of each unique char in the dictionary. In this example the set only contains e because it's three times in the string.
So you could output it in the following way:
foreach(char dup in duplicates)
Console.WriteLine("Duplicate char {0} appears {1} times in the text."
, dup
, duplicateCounts[dup]);
For what it's worth, here's a LINQ one-liner which also creates a Dictionary that only contains the duplicate chars and their count:
Dictionary<char, int> duplicateCounts = text
.GroupBy(c => c)
.Where(g => g.Count() > 1)
.ToDictionary(g => g.Key, g => g.Count());
I've shown it as second approach because you should first understand the standard way.
string name = "this is a a program program";
var arr = name.Split(' ').ToArray();
var dup = arr.Where(p => arr.Count(q => q == p) > 1).Select(p => p);
HashSet<string> hash = new HashSet<string>(dup);
string duplicate = string.Join(" ", hash);
You can do this through LINQ:
string name = "this is a a program program";
var d = name.Split(' ').GroupBy(x => x).Select(y => new { word = y.Key, Wordcount = y.Count() }).Where(z => z.Wordcount > 1).ToList();
Use LINQ to group values:
public static IEnumerable<T> GetDuplicates<T>(this IEnumerable<T> list)
{
return list.GroupBy(item => item).SelectMany(group => group.Skip(1));
}
public static bool HasDuplicates<T>(this IEnumerable<T> list)
{
return list.GetDuplicates().Any();
}
Then you use these extensions like this:
var list = new List<string> { "a", "b", "b", "c" };
var duplicatedValues = list.GetDuplicates();

Best way to parse a string into Dictionary of terms

Input - string: "TAG1xxxTAG2yyyTAG3zzzTAG1tttTAG1bbb"
Expected result: pairs TAG1 = {xxx, ttt, bbb}, TAG2 = {yyy}, TAG3 = {zzz}.
I did it using regexps, but I'm really confused by using Regex.Replace and not using return value. I want to improve this code, so how can it be realized?
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
namespace TermsTest
{
class Program
{
static void Main(string[] args)
{
string[] tags = { "TAG1", "TAG2", "TAG3", "TAG4", "TAG5", "TAG6", "TAG7", "TAG8" };
string file = "TAG2jjfjfjndbfdjTAG1qqqqqqqTAG3uytygh fhdjdfTAG5hgjdhfghTAG6trgfmxc hdfhdTAG2jfksksdhjskTAG3kdjbjvbsjTAG2jskjdjdvjvbxjkvbjdTAG2jkxcndjcjbkjn";
string tag = "(" + string.Join("|", tags) + ")";
var dictionary = new Dictionary<string, List<string>>(tags.Length);
Regex.Replace(file, string.Format(@"({0})(.+?)(?={0}|$)", tag), match =>
{
string key = match.Groups[1].Value, value = match.Groups[3].Value;
if (dictionary.ContainsKey(key))
dictionary[key].Add(value);
else
dictionary[key] = new List<string> {value};
return "";
});
foreach (var pair in dictionary)
{
Console.Write(pair.Key + " =\t");
foreach (var entry in pair.Value)
{
Console.Write(entry + " ");
}
Console.WriteLine();
Console.WriteLine();
}
}
}
}
string input = "TAG1xxxTAG2yyyTAG3zzzTAG1tttTAG1bbb";
var lookup = Regex.Matches(input, @"(TAG\d)(.+?)(?=TAG|$)")
.Cast<Match>()
.ToLookup(m => m.Groups[1].Value, m => m.Groups[2].Value);
foreach (var kv in lookup)
{
Console.WriteLine(kv.Key + " => " + String.Join(", ", kv));
}
OUTPUT:
TAG1 => xxx, ttt, bbb
TAG2 => yyy
TAG3 => zzz
What you are trying to do is simply group the values that share a tag, so it should be easier to use the GroupBy method:
string input = "TAG1xxxTAG2yyyTAG3zzzTAG1tttTAG1bbb";
var list = Regex.Matches(input, @"(TAG\d+)(.+?)(?=TAG\d+|$)")
.Cast<Match>()
.GroupBy(m => m.Groups[1].Value,
(key, values) => string.Format("{0} = {{{1}}}",
key,
string.Join(", ",
values.Select(v => v.Groups[2]))));
var output = string.Join(", ", list);
This produces the output string "TAG1 = {xxx, ttt, bbb}, TAG2 = {yyy}, TAG3 = {zzz}"
I'm not sure that I'm aware of all your assumptions and conventions in this problem; but this gave me similar result:
var tagColl = string.Join("|", tags);
var tagGroup = string.Format("(?<tag>{0})(?<val>[a-z]*)", tagColl);
var result = from x in Regex.Matches(file, tagGroup).Cast<Match>()
where x.Success
let pair = new { fst = x.Groups["tag"].Value, snd = x.Groups["val"].Value }
group pair by pair.fst into g
select g;
And a simple test would be:
Console.WriteLine(string.Join("\r\n", from g in result
let coll = string.Join(", ", from item in g select item.snd)
select string.Format("{0}: {{{1}}}", g.Key, coll)));
This is a perfect job for the .NET CaptureCollection object—a unique .NET feature that lets you reuse the same capture group multiple times.
Use this regex and use Matches to create a MatchCollection:
(?:TAG1(.*?(?=TAG|$)))?(?:TAG2(.*?(?=TAG|$)))?(?:TAG3(.*?(?=TAG|$)))?
Then inspect the captures:
Groups[1].Captures will contain all the TAG1
Groups[2].Captures will contain all the TAG2
Groups[3].Captures will contain all the TAG3
From there it's a short step to your final data structure.
To reduce the potential for backtracking, you can make the tokens atomic:
(?>(?:TAG1(.*?(?=TAG|$)))?)(?>(?:TAG2(.*?(?=TAG|$)))?)(?>(?:TAG3(.*?(?=TAG|$)))?)
For details about how this works, see Capture Groups that can be Quantified.
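A rough sketch of reading the captures into a dictionary. Note that it uses a variation on the pattern above, not the pattern itself: the three alternatives are placed inside one repeated group with named captures (t1, t2, t3 are names I made up), so that a single Match accumulates every value in Groups[name].Captures. The TagCaptures.Parse name is also hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagCaptures
{
    // Assumed variation: one repeated group, so each named group is captured
    // once per loop iteration and keeps all of its captures.
    private static readonly Regex Pattern = new Regex(
        @"^(?:TAG1(?<t1>.*?(?=TAG|$))|TAG2(?<t2>.*?(?=TAG|$))|TAG3(?<t3>.*?(?=TAG|$)))*$");

    public static Dictionary<string, List<string>> Parse(string input)
    {
        var result = new Dictionary<string, List<string>>();
        Match match = Pattern.Match(input);
        if (!match.Success)
            return result;   // input is not a plain sequence of TAGn values

        foreach (var name in new[] { "t1", "t2", "t3" })
        {
            // CaptureCollection is non-generic on older frameworks, hence Cast.
            var values = match.Groups[name].Captures
                .Cast<Capture>()
                .Select(c => c.Value)
                .ToList();

            if (values.Count > 0)
                result["TAG" + name.Substring(1)] = values;
        }

        return result;
    }
}
```

For "TAG1xxxTAG2yyyTAG3zzzTAG1tttTAG1bbb" this should map TAG1 to xxx, ttt, bbb, TAG2 to yyy and TAG3 to zzz.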
