String parsing C# creating segments? - c#

I have a string in the form of:
"company=ABCorp, location=New York, revenue=10million, type=informationTechnology"
I want to parse this string and get name/value pairs of the form:
company = ABCorp
location = New York, etc.
Any suitable data structure would do for storing them. I was thinking of a Dictionary<string, string>, but I'm open to suggestions.
Is there a suitable way of doing this in C#?
EDIT: My final goal here is to have something like this:
Array[company] = ABCorp
Array[location] = New York
What data structure could we use to achieve the above? My first thought is a Dictionary, but I am not sure if I'm missing anything.
thanks

Using String.Split and ToDictionary, you could do:
var original = "company=ABCorp, location=New York, revenue=10million, type=informationTechnology";
var split = original.Split(',').Select(s => s.Trim().Split('='));
Dictionary<string,string> results = split.ToDictionary(s => s[0], s => s[1]);

string s = "company=ABCorp, location=New York, revenue=10million, type=informationTechnology";
var pairs = s.Split(',')
.Select(x => x.Split('='))
.ToDictionary(x => x[0], x => x[1]);
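pairs is a Dictionary<string, string> holding the key/value pairs. The only caveat is the white space after each comma: as written, every key except the first keeps a leading space (" location" rather than "location"), so you will probably want to Trim() each piece before splitting on '='.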

It depends a lot on the expected syntax. One way to do this is to use String.Split:
http://msdn.microsoft.com/en-us/library/system.string.split(v=vs.110).aspx
First split on the comma, then iterate over the returned items and split each one on the equals sign.
However, this requires that commas and equals signs are never present in the values.

I'm assuming a weak RegEx/LINQ background so here's a way to do it without anything "special".
string text = "company=ABCorp, location=New York, revenue=10million, type=informationTechnology";
string[] pairs = text.Split(',');
Dictionary<string, string> dictData = new Dictionary<string, string>();
foreach (string currPair in pairs)
{
string[] data = currPair.Trim().Split('=');
dictData.Add(data[0], data[1]);
}
This has the requirement that a comma (,) and an equal-sign (=) never exist in the data other than as delimiters.
This relies heavily on String.Split.
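The restriction on '=' mentioned above can be loosened a little. Here is a minimal sketch, assuming that keys never contain '=' and that commas still only appear as pair separators. The KeyValueParser class and its Parse method are hypothetical names used for illustration, not part of any library:

```csharp
using System;
using System.Collections.Generic;

public static class KeyValueParser
{
    // Hypothetical helper: parses "k1=v1, k2=v2" into a dictionary.
    public static Dictionary<string, string> Parse(string text)
    {
        var result = new Dictionary<string, string>();
        char[] comma = { ',' };
        char[] equals = { '=' };

        foreach (string pair in text.Split(comma, StringSplitOptions.RemoveEmptyEntries))
        {
            // Limit the split to 2 parts so any later '=' stays in the value.
            string[] parts = pair.Split(equals, 2);
            if (parts.Length != 2)
            {
                continue; // assumption: silently skip segments without '='
            }

            // The indexer overwrites a repeated key, whereas Add would throw.
            result[parts[0].Trim()] = parts[1].Trim();
        }

        return result;
    }
}
```

Calling KeyValueParser.Parse on the string from the question gives a dictionary you can index as pairs["company"]. Whether malformed segments should be skipped or reported, and whether a repeated key should overwrite or throw, depends on your data.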

Related

How to compare list of strings to a string where elements in the list might have letters be scrambled up?

I'm trying to write a lambda expression to compare members of a list to a string, but I also need to catch elements in the list whose letters might be scrambled.
Here's the code I have right now:
List<string> listOfWords = new List<String>() { "abc", "test", "teest", "tset"};
var word = "test";
var results = listOfWords.Where(s => s == word);
foreach (var i in results)
{
Console.Write(i);
}
So this code will find the string "test" in the list and print it out, but I also want it to catch cases like "tset". Is this easy to do with LINQ, or do I have to use loops?
How about sorting the letters and seeing if the resulting sorted sequences of chars are equal?
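// Materialize the sorted letters once so they are not re-sorted for every comparison.
var wordSorted = word.OrderBy(c => c).ToArray();
var matches = listOfWords.Where(w => w.OrderBy(c => c).SequenceEqual(wordSorted));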
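The same sorted-letters idea can be wrapped into a small helper. This is only a sketch: the AnagramFilter class and the SortLetters and FindScrambled names are made up for illustration, and the comparison is case-sensitive, which may or may not be what you want:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnagramFilter
{
    // Hypothetical helper: returns the characters of a word in sorted order.
    public static string SortLetters(string word)
    {
        char[] letters = word.ToCharArray();
        Array.Sort(letters);
        return new string(letters);
    }

    // Returns every word that uses exactly the same letters as the target.
    public static List<string> FindScrambled(IEnumerable<string> words, string target)
    {
        string key = SortLetters(target); // computed once, reused for every word
        return words
            .Where(w => w.Length == target.Length && SortLetters(w) == key)
            .ToList();
    }
}
```

With the list from the question and the word "test", FindScrambled returns "test" and "tset", while "teest" is rejected by the length check before any sorting happens.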

Find specific part of string based on condition

I have below comma separated string. This string contains relation which is needed in the application for processing.
string userInputColRela = "input1:Student_Name, input2:Student_Age";
Now, I need to extract Student_Name if I provide input1 as the input, and Student_Age if the input provided is input2.
How can I achieve this? I know I can use a loop, but that would be a rather lengthy solution. Is there another way?
You could parse the input string by splitting first on the comma, then again on the colon, to get the key-value pairs it contains in dictionary form. For example:
string userInputColRela = "input1: Student_Name, input2: Student_Age";
var inputLookup = userInputColRela
.Split(',')
.Select(a => a.Split(':'))
.ToDictionary(a => a[0].Trim(), a => a[1].Trim());
var studentName = inputLookup["input1"];
If your strings are always in the format input1:Student_Name, input2:Student_Age, you can use a Dictionary<string, string> together with Split(), like this:
string userInputColRela = "input1:Student_Name, input2:Student_Age";
string input = "input1";
var args = userInputColRela.Split(',');
Dictionary<string, string> inputs = new Dictionary<string, string>();
foreach (var item in args)
{
var data = item.Split(':');
// Trim so that " input2" is stored under the key "input2".
inputs.Add(data[0].Trim(), data[1].Trim());
}
Console.WriteLine(inputs[input]);

Best way to hold sets of related elements dynamically added

I have a list of files, and the filenames for those files contain some characters then an underscore, then anything else like so:
test_123.txt
What I'm trying to do is loop through these files, pull out the 'prefix' (the characters up to but not including the _), add the prefix to a list if it's not already in the list, and then add the whole filename as an element of that prefix.
That might be confusing so here's an example:
List of file names:
A_ieie.txt
B_ldld.txt
C_test.txt
A_232.txt
B_file2.txt
C_345.txt
So I am looping through these files and get the prefix like so:
string prefix = fileName.Substring(0, fileName.IndexOf('_'));
Now, I check if that prefix is already in a list of prefixes, and if not, add it:
List<string> prefixes = new List<string>();
if (!prefixes.Contains(prefix))
{
prefixes.Add(prefix);
}
So here's the prefixes that would be added to that list:
A //not yet seen, add it to list
B //not yet seen, add it to list
C //not yet seen, add it to list
A //already seen, don't add
B //already seen, don't add
C //already seen, don't add
Okay the above is easy to do, but what about when I want to add the filenames that share a prefix to a list?
Since these are going to be dynamically added and could be anything, I can't make several lists beforehand. I thought about having a List of lists, but is that really the best way to do this? Would a class be ideal?
The end goal of the above example would be something like :
[0][0] = A_ieie.txt //This is the 'A' list
[0][1] = A_232.txt
[1][0] = B_ldld.txt //This is the 'B' list
[1][1] = B_file2.txt
[2][0] = C_test.txt //This is the 'C' list
[2][1] = C_345.txt
Sounds like you want a Dictionary:
var list = new Dictionary<string, List<string>>();
The Key would be the "prefix" and the Value would be a list of strings (the filenames).
EDIT
If you want the list of filenames to be unique, perhaps a HashSet is a better option:
var list = new Dictionary<string, HashSet<string>>();
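The declaration above does not show how the dictionary gets filled. One way to do it is sketched below; the PrefixGrouper class and GroupByPrefix method are hypothetical names, and skipping file names that contain no underscore is an assumption, not something the question specifies:

```csharp
using System;
using System.Collections.Generic;

public static class PrefixGrouper
{
    // Hypothetical helper: groups file names by the text before the first '_'.
    public static Dictionary<string, List<string>> GroupByPrefix(IEnumerable<string> fileNames)
    {
        var groups = new Dictionary<string, List<string>>();

        foreach (string fileName in fileNames)
        {
            int underscore = fileName.IndexOf('_');
            if (underscore < 0)
            {
                continue; // assumption: ignore names that have no prefix
            }

            string prefix = fileName.Substring(0, underscore);

            // Create the list the first time a prefix is seen, then reuse it.
            List<string> files;
            if (!groups.TryGetValue(prefix, out files))
            {
                files = new List<string>();
                groups.Add(prefix, files);
            }

            files.Add(fileName);
        }

        return groups;
    }
}
```

For the six file names in the question this produces three keys (A, B and C), each holding its two file names in the order they were seen. Swapping List<string> for HashSet<string> gives the unique-names variant.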
Sounds like you want a Dictionary<string, List<string>>.
Then each list is referenced by a key (an integer, or, as here, a string that "names" the list):
public Dictionary<string, List<string>> myBookList = new Dictionary<string, List<string>>();
private void addList(string listName, List<string> contents)
{
myBookList.Add(listName, contents);
}
// direct add (the key must match exactly on later lookups)
List<string> science_Fiction_Books = new List<string>();
myBookList.Add("Science_Fiction", science_Fiction_Books);
myBookList["Science_Fiction"].Add("mytitle.txt");
myBookList["Science_Fiction"][0] = "My book title.txt";
string fileLocation = @"c:\mydirectory\mylists\myBookTitle.txt";
myBookList["Science_Fiction"].Add(System.IO.Path.GetFileName(fileLocation));
// etc.
You can use linq to achieve this.
List<string> List = new List<string>() { "A_ieie.txt", "B_ldld.txt", "C_test.txt", "A_232.txt", "B_file2.txt", "C_345.txt" };
Dictionary<string, List<string>> Dict = new Dictionary<string, List<string>>();
Dict = List.GroupBy(x => x.Split('_')[0]).ToDictionary(x => x.Key, x => x.ToList());
How about this:
var textFileNameList =
new List<string>{"A_ieie.txt","B_ldld.txt","C_test.txt",
"A_232.txt","B_file2.txt","C_345.txt"};
var groupedList = textFileNameList.GroupBy(t => t.Split('_')[0])
.Select( t=> new {
Prefix = t.Key,
Files = t.Select( file=> file).ToList()
}).ToList();

Find all string parts starting with [ and ends with ] in long string

I have an interesting problem for which I want to find the best solution. I have tried my best with regex. What I want is to find all the col_x values in this string using C#, with a regular expression or any other method.
[col_5] is a central heating boiler manufacturer produce boilers under [col_6]
brand name . Your selected [col_7] model name is a [col_6] [col_15] boiler.
[col_6] [col_15] boiler [col_7] model [col_10] came in production untill
[col_11]. [col_6] model product index number is [col_1] given by SEDBUK
'Seasonal Efficiency of a Domestic Boiler in the UK'. [col_6] model have
qualifier [col_8] and GCN [col_9] 'Boiler Gas Council No'. [col_7] model
source of heat for a boiler combustion is a [col_12].
The output expected is an array
var data =["col_5","col_10","etc..."]
Edit
my attempt :
string text = "[col_1]cc[col_2]asdfsd[col_3]";
var matches = Regex.Matches(text, @"[[^#]*]");
var uniques = matches.Cast<Match>().Select(match => match.Value).ToList().Distinct();
foreach(string m in uniques)
{
Console.WriteLine(m);
}
but no success.
Try something like this:
string[] result = Regex.Matches(input, @"\[(col_\d+)\]").
Cast<Match>().
Select(x => x.Groups[1].Value).
ToArray();
I think that's what you need:
string pattern = @"\[(col_\d+)\]";
MatchCollection matches = Regex.Matches(input, pattern);
string[] results = matches.Cast<Match>().Select(x => x.Groups[1].Value).ToArray();
Replace input with your input string.
I hope it helps
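The sample text repeats some tokens (col_6 appears several times), so if you only want each name once, a Distinct() call can be added to the same pipeline. A sketch, where the ColumnTokens class and Extract method are hypothetical names used for illustration:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class ColumnTokens
{
    // Hypothetical helper: returns each col_N name found between square
    // brackets, without the brackets and without duplicates.
    public static string[] Extract(string input)
    {
        return Regex.Matches(input, @"\[(col_\d+)\]")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value) // group 1 is the part inside the brackets
            .Distinct()
            .ToArray();
    }
}
```

The pattern only matches col_ followed by digits, so any other bracketed text is ignored.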
This is a little hacky but you could do this.
var myMessage = @"[col_5] is a central heating boiler..."; //etc.
var values = Enumerable.Range(1, 100)
.Select(x => "[col_" + x + "]")
.Where(x => myMessage.Contains(x))
.ToList();
This assumes there is a known maximum x for col_x (100 here). It tries every candidate by brute force and keeps only the ones found in the text. Note that the results keep their brackets ("[col_5]") and come back in numeric order rather than in order of appearance.
If you know there are only so many columns to hunt for, I would personally try this instead of Regex, as I have burned too many hours on Regex.

Transforming List<string> into a tokenised string

I have a list of strings in a List container class that look like the following:
MainMenuItem|MenuItem|subItemX
..
..
..
..
MainMenuItem|MenuItem|subItem99
What I am trying to do is transform the string, using LINQ, so that the first item for each of the tokenised string is removed.
This is the code I already have:
protected static List<string> _menuItems = GetMenuItemsFromXMLFile();
_menuItems.Where(x => x.Contains(menuItemToSearch)).ToList();
First line of code is returning an entire XML file with all the menu items that exist within an application in a tokenised form;
The second line is saying 'get me all menu items that belong to menuItemToSearch'.
menuItemToSearch is contained in the delimited string that is returned. How do I remove it using linq?
EXAMPLE
Before transform: MainMenuItem|MenuItem|subItem99
After transform : MenuItem|subItem99
Hope the example illustrates my intentions
Thanks
You can take a substring from the first position of the pipe symbol '|' to remove the first item from a string, like this:
var str = "MainMenuItem|MenuItem|subItemX";
var dropFirst = str.Substring(str.IndexOf('|')+1);
Apply this to all strings from the list in a LINQ Select to produce the desired result:
var res = _menuItems
.Where(x => x.Contains(menuItemToSearch))
.Select(str => str.Substring(str.IndexOf('|')+1))
.ToList();
Maybe something like this can help you.
var regex = new Regex("[^\\|]+\\|(.+)");
var list = new List<string>(new string[] { "MainMenuItem|MenuItem|subItem99", "MainMenuItem|MenuItem|subItem99" });
// Take .Value so the result is a List<string> rather than a list of Group objects.
var result = list.Where(p => regex.IsMatch(p)).Select(p => regex.Match(p).Groups[1].Value).ToList();
This should work correctly.
