LINQ - turn List<string> into Dictionary<string,string> - c#

I'm just working on a Kata on my lunch and I've come unstuck...
Here are the steps I'm trying to follow:
Given an input string, split the string by the new line character
Given the string array result of the previous step, skip the first element in the array
Given the collection of strings resulting from the previous step, create a collection consisting of every 2 elements
In that last statement what I mean is, given this collection of 4 strings:
{
"string1",
"string2",
"string3",
"string4"
}
I should end up with this collection of pairs (is 'tuples' the right term?):
{
{ "string1","string2" },
{ "string3","string4" }
}
I started looking at ToDictionary, then moved over to selecting an anonymous type but I'm not sure how to say "return the next two strings as a pair".
My code looks similar to this at the time of writing:
public void myMethod() {
    var splitInputString = input.Split('\n');
    var dic = splitInputString.Skip(1).Select( /* each two elements */ );
}
Cheers for the help!
James

Well, you could use (untested):
var dic = splitInputString.Zip(splitInputString.Skip(1),
                               (key, value) => new { key, value })
                          .Where((pair, index) => index % 2 == 0)
                          .ToDictionary(pair => pair.key, pair => pair.value);
The Zip part will end up with:
{ "string1", "string2" }
{ "string2", "string3" }
{ "string3", "string4" }
... and the Where part using the index will skip every other entry (which would be "value with the next key").
Of course if you really know you've got a List<string> to start with, you could just access the pairs by index, but that's boring...
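For comparison, the "boring" index-based version might look like the sketch below. It assumes the header line is skipped first (as in the question), that keys are unique, and that a trailing unpaired element can be ignored; the names `lines` and `pairs` and the sample input are illustrative, not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input: a header line followed by alternating key/value lines.
var input = "header\nstring1\nstring2\nstring3\nstring4";

// Skip the header, then materialise the sequence so it can be indexed.
var lines = input.Split('\n').Skip(1).ToList();

var pairs = new Dictionary<string, string>();
// Step by two: element i is the key, element i + 1 is its value.
// A trailing unpaired element is ignored; a duplicate key would throw.
for (var i = 0; i + 1 < lines.Count; i += 2)
{
    pairs.Add(lines[i], lines[i + 1]);
}

foreach (var pair in pairs)
{
    Console.WriteLine($"{pair.Key} -> {pair.Value}");
}
```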


Replace the string in a list that are same with start string

I don't know if my title is correct.
I have a list
\1925\10\04\issue1
\1925\10\05\issue1
\1925\10\07\issue1
\1925\10\10\issue1
\1925\10\11\issue1
\1925\10\12\issue1
\1925\10\13\issue1
\1925\10\14\issue1
\1925\10\15\issue1
\1925\10\17\issue1
\1925\10\18\issue1
\1925\10\19\issue1
And this is what I want the list to become:
\1925\10\04\issue1
\05\issue1
\07\issue1
\10\issue1
\11\issue1
\12\issue1
\13\issue1
\14\issue1
\15\issue1
\17\issue1
\18\issue1
\19\issue1
I need it to be dynamic.
There may be cases where I have a list like this
\1925\10\04\issue1
\1925\10\04\issue2
\1925\10\04\issue3
\1925\10\04\issue4
And the output is like this
\1925\10\04\issue1
\issue2
\issue3
\issue4
So far I'm using diff-match-patch.
https://github.com/google/diff-match-patch/wiki/Language:-C%23
And here is my code.
diff_match_patch dmp = new diff_match_patch();
var d = dmp.diff_main(@"\1925\10\14\issue1", @"\1925\10\05\issue1");
//dmp.diff_cleanupEfficiency(d);
foreach (var item in d)
{
    Console.WriteLine($"text {item.text} operation {item.operation}");
}
But is there a better or faster way of doing this?
Assuming you have the input as a List<string> named input, this code should work:
var splittet = input.Select(i => i.Split("\\".ToCharArray(), StringSplitOptions.RemoveEmptyEntries));
Action<string[], int> print = (string[] lst, int index) => Console.WriteLine("\\" + string.Join("\\", lst.Skip(index)));
splittet.Aggregate(new string[] { },
    (common, item) =>
    {
        var index = Enumerable.Range(0, Math.Min(common.Length, item.Length)).FirstOrDefault(i => common[i] != item[i]);
        print(item, index);
        return item;
    }
);
So given the input
var input = new List<string> { @"\1925\10\04\issue1",
    @"\1925\10\05\issue1",
    @"\1925\10\07\issue1",
    @"\1925\10\10\issue1",
    @"\1925\10\11\issue1",
    @"\1925\10\12\issue1",
    @"\1925\10\04\issue1",
    @"\1925\10\04\issue2",
    @"\1925\10\04\issue3",
    @"\1925\10\04\issue4"};
this is the output:
\1925\10\04\issue1
\05\issue1
\07\issue1
\10\issue1
\11\issue1
\12\issue1
\04\issue1
\issue2
\issue3
\issue4
Some explanation:
First, instead of working with a list of strings, I split each string into an array of tokens.
Then I defined a print action. You could instead add the result to an output list or do whatever you need; in this case it just writes to the console.
Then the list is aggregated. The aggregator starts with an empty string array. For the first item there is nothing to compare against, so the index is 0 and the whole item is printed. The first item is then returned to the aggregator, which compares it with the second item, finds the first index where the parts differ, prints the parts from there on, and so on.
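If you would rather collect the results than print them, the same idea can be sketched with a plain loop. This is a variation, not the code above: `StripSharedPrefix` is a hypothetical helper name, and unlike the `FirstOrDefault` version it always keeps at least the last segment, so an entry identical to its predecessor is reduced to its final segment rather than printed in full.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Returns each path with the leading segments it shares with the previous path removed.
List<string> StripSharedPrefix(IEnumerable<string> paths)
{
    var result = new List<string>();
    var previous = Array.Empty<string>();
    foreach (var path in paths)
    {
        var current = path.Split('\\', StringSplitOptions.RemoveEmptyEntries);
        // Count leading segments equal to the previous path's segments,
        // but always keep at least the last segment of the current path.
        var limit = Math.Min(previous.Length, current.Length - 1);
        var shared = 0;
        while (shared < limit && previous[shared] == current[shared])
        {
            shared++;
        }
        result.Add("\\" + string.Join("\\", current.Skip(shared)));
        previous = current;
    }
    return result;
}

var sample = new List<string>
{
    @"\1925\10\04\issue1",
    @"\1925\10\05\issue1",
    @"\1925\10\05\issue2",
};
var stripped = StripSharedPrefix(sample);
foreach (var line in stripped)
{
    Console.WriteLine(line);
}
```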

Using List<string>.Any() to find if a string contains an item as well as find the matching item?

I have a list of strings, which can be considered 'filters'.
For example:
List<string> filters = new List<string>();
filters.Add("Apple");
filters.Add("Orange");
filters.Add("Banana");
I have another list of strings, which contains sentences.
Example:
List<string> msgList = new List<string>();
msgList.Add("This sentence contains the word Apple.");
msgList.Add("This doesn't contain any fruits.");
msgList.Add("This does. It's a banana.");
Now I want to find out which items in msgList contain a fruit. For that, I use the following code:
foreach(string msg in msgList)
{
    if(filters.Any(msg.Contains))
    {
        // Do something.
    }
}
I'm wondering, is there a way in Linq where I can use something similar to List.Any() where I can check if msgList contains a fruit, and if it does, also get the fruit which matched the inquiry. If I can get the matching index in 'filters' that should be fine. That is, for the first iteration of the loop it should return 0 (index of 'Apple'), for the second iteration return null or something like a negative value, for the third iteration it should return 2 (index of 'Banana').
I checked around in SO as well as Google but couldn't find exactly what I'm looking for.
You want FirstOrDefault instead of Any.
FirstOrDefault will return the first object that matches, if found, or the default value (usually null) if not found.
You could use the List<T>.Find method:
foreach (string msg in msgList)
{
    var fruit = filters.Find(msg.Contains);
    if (fruit != null)
    {
        // Do something.
    }
}
List<string> filters = new List<string>() { "Apple", "Orange", "Banana" };
string msg = "This sentence contains the word Apple.";
var fruit = Regex.Matches(msg, @"\w+", RegexOptions.IgnoreCase)
                 .Cast<Match>()
                 .Select(x => x.Value)
                 .FirstOrDefault(s => filters.Contains(s));
A possible approach to return the indexes of the elements:
foreach (string msg in msgList)
{
    var found = filters.Select((x, i) => new {Key = x, Idx = i})
                       .FirstOrDefault(x => msg.Contains(x.Key));
    Console.WriteLine(found?.Idx);
}
Note also that Contains is case-sensitive, so the string banana is not matched against Banana. If you want a case-insensitive match, you could use IndexOf with a StringComparison argument.
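A minimal sketch of the case-insensitive variant, using the lists from the question. It swaps in `List<T>.FindIndex`, which returns -1 when nothing matches, and assumes `StringComparison.OrdinalIgnoreCase` is the comparison you want; the `matches` list is only there to hold the results.

```csharp
using System;
using System.Collections.Generic;

var filters = new List<string> { "Apple", "Orange", "Banana" };
var msgList = new List<string>
{
    "This sentence contains the word Apple.",
    "This doesn't contain any fruits.",
    "This does. It's a banana.",
};

var matches = new List<int>();
foreach (var msg in msgList)
{
    // Index of the first filter found in msg (ignoring case), or -1 if none is found.
    var idx = filters.FindIndex(f => msg.IndexOf(f, StringComparison.OrdinalIgnoreCase) >= 0);
    matches.Add(idx);
    Console.WriteLine(idx >= 0 ? $"{idx}: {filters[idx]}" : "no match");
}
```

With this comparison the third message matches index 2 (Banana), and the second yields -1.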

Check whether a string is in a list at any order in C#

If we have a list of strings like in the following code:
List<string> XAll = new List<string>();
XAll.Add("#10#20");
XAll.Add("#20#30#40");
string S = "#30#20"; // same as "#20#30", which is part of "#20#30#40", so S exists in the list
// Check the unordered string S = "#30#20":
// if its items are contained in any order (#30#20 or #20#30), return true: it exists
if (XAll.Contains(S))
{
    Console.WriteLine("Your String is exist");
}
I would prefer to use LINQ to check that S exists in this sense: regardless of order, some entry in XAll contains at least both (#30) and (#20) together.
I am using:
var c = item2.Intersect(item1);
if (c.Count() == item1.Length)
{
    return true;
}
You should represent your data in a more meaningful way. Don't rely on strings.
For example I would suggest creating a type to represent a set of these numbers and write some code to populate it.
But there are already set types such as HashSet which is possibly a good match with built in functions for testing for sub sets.
This should get you started:
var input = "#20#30#40";
var hashSetOfNumbers = new HashSet<int>(input
    .Split(new []{'#'}, StringSplitOptions.RemoveEmptyEntries)
    .Select(s=>int.Parse(s)));
This works for me:
Func<string, string[]> split =
    x => x.Split(new [] { '#' }, StringSplitOptions.RemoveEmptyEntries);
if (XAll.Any(x => split(x).Intersect(split(S)).Count() == split(S).Count()))
{
    Console.WriteLine("Your String is exist");
}
Now, depending on how you want to handle duplicates, this might even be a better solution:
Func<string, HashSet<string>> split =
    x => new HashSet<string>(x.Split(
        new [] { '#' },
        StringSplitOptions.RemoveEmptyEntries));
if (XAll.Any(x => split(S).IsSubsetOf(split(x))))
{
    Console.WriteLine("Your String is exist");
}
This second approach uses pure set theory so it strips duplicates.
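To make the difference concrete, here is a small sketch comparing the two checks on a query that repeats a token. The query string `"#30#30#20"` and the helper name `Tokens` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: splits "#20#30" into ["20", "30"].
string[] Tokens(string text) =>
    text.Split(new[] { '#' }, StringSplitOptions.RemoveEmptyEntries);

var xAll = new List<string> { "#10#20", "#20#30#40" };
var query = "#30#30#20"; // "30" appears twice

// Intersect yields distinct items (at most 2 here), but the query has 3 tokens,
// so the counts never match.
var byIntersect = xAll.Any(x => Tokens(x).Intersect(Tokens(query)).Count() == Tokens(query).Length);

// The HashSet collapses the duplicate, so {30, 20} is a subset of {20, 30, 40}.
var bySubset = xAll.Any(x => new HashSet<string>(Tokens(query)).IsSubsetOf(Tokens(x)));

Console.WriteLine($"Intersect/Count: {byIntersect}, IsSubsetOf: {bySubset}");
// prints "Intersect/Count: False, IsSubsetOf: True"
```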

How do I order this list of site URLs in C#?

I have a list of site URLs,
/node1
/node1/sub-node1
/node2
/node2/sub-node1
The list is given to me in a random order. I need to order it so that the top level comes first, followed by sub-levels and so on (because I cannot create /node2/sub-node1 without /node2 existing). Is there a clean way to do this?
Right now I'm just making a recursive call: if I can't create sub-node1 because node2 doesn't exist, create node2 first. I'd like the order of the list to determine the creation order and get rid of my recursive call.
My first thought was ordering by length of the string... but then I thought of a list like this, that might include something like aliases for short names:
/longsitename/
/a
/a/b/c/
/a
/a/b/
/otherlongsitename/
... and I thought a better option was to order by the number of level-separator characters first:
IEnumerable<string> SortURLs(IEnumerable<string> urls)
{
    return urls.OrderBy(s => s.Count(c => c == '/')).ThenBy(s => s);
}
Then I thought about it some more and I saw this line in your question:
I cannot create /node2/sub-node1 without /node2 existing
Aha! The order of sections or within a section does not really matter, as long as children are always listed after parents. With that in mind, my original thought was okay and ordering by length of the string alone should be just fine:
IEnumerable<string> SortURLs(IEnumerable<string> urls)
{
    return urls.OrderBy(s => s.Length);
}
Which led me at last to wondering why I cared about the length at all. If I just sort the strings, regardless of length, strings with the same beginning will always sort the shorter string first. Thus, at last:
IEnumerable<string> SortURLs(IEnumerable<string> urls)
{
    return urls.OrderBy(s => s);
}
I'll leave the first sample up because it may be useful if, at some point in the future, you need a more lexical or logical sort order.
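A quick way to convince yourself of the "shorter string first" property is to sort and then check that every parent appears before its children. The sketch below does that under a few assumptions: it uses `StringComparer.Ordinal` explicitly (where a string always sorts before any longer string starting with it), the sample URLs have no trailing slashes, and `parentsFirst` is just an illustrative check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var urls = new[] { "/a/b/c", "/longsitename", "/a/b", "/otherlongsitename", "/a" };

// Plain string sort with an explicit ordinal comparer.
var ordered = urls.OrderBy(u => u, StringComparer.Ordinal).ToList();

// Check: every parent that is present in the list was seen before its child.
var seen = new HashSet<string>();
var parentsFirst = true;
foreach (var url in ordered)
{
    var parent = url.Substring(0, url.LastIndexOf('/'));
    if (parent.Length > 0 && urls.Contains(parent) && !seen.Contains(parent))
    {
        parentsFirst = false;
    }
    seen.Add(url);
}

Console.WriteLine(string.Join(", ", ordered));
Console.WriteLine(parentsFirst);
```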
Is there a clean way to do this?
Just sorting the list of URIs using a standard string sort should get you what you need. In general, "a" will order before "aa" in a string sort, so "/node1" should end up before "/node1/sub-node".
For example:
List<string> test = new List<string> { "/node1/sub-node1", "/node2/sub-node1", "/node1", "/node2" };
foreach(var uri in test.OrderBy(s => s))
    Console.WriteLine(uri);
This will print:
/node1
/node1/sub-node1
/node2
/node2/sub-node1
Perhaps this works for you:
var nodes = new[] { "/node1", "/node1/sub-node1", "/node2", "/node2/sub-node1" };
var orderedNodes = nodes
    .Select(n => new { Levels = Path.GetFullPath(n).Split('\\').Length, Node = n })
    .OrderBy(p => p.Levels).ThenBy(p => p.Node);
Result:
foreach(var nodeInfo in orderedNodes)
{
    Console.WriteLine("Path:{0} Depth:{1}", nodeInfo.Node, nodeInfo.Levels);
}
Path:/node1 Depth:2
Path:/node2 Depth:2
Path:/node1/sub-node1 Depth:3
Path:/node2/sub-node1 Depth:3
var values = new string[]{"/node1", "/node1/sub-node1" ,"/node2", "/node2/sub-node1"};
foreach(var val in values.OrderBy(e => e))
{
    Console.WriteLine(val);
}
The best approach is to use natural sorting, since your strings mix text and numbers. If you use other sorting methods and you have an example like this:
List<string> test = new List<string> { "/node1/sub-node1" ,"/node13","/node10","/node2/sub-node1", "/node1", "/node2" };
the output will be:
/node1
/node1/sub-node1
/node10
/node13
/node2
/node2/sub-node1
which is not in numeric order (/node10 and /node13 come before /node2).
You can look at this implementation.
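As a rough idea of what a natural sort does, here is a minimal sketch. It is not the implementation referred to above: `NaturalCompare` is a hypothetical helper that splits each string into digit and non-digit chunks, compares digit chunks as numbers and everything else ordinally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Compares two strings chunk by chunk; digit runs compare as numbers, other text ordinally.
int NaturalCompare(string left, string right)
{
    // Split into alternating chunks, e.g. "/node10" -> "/node", "10".
    var a = Regex.Matches(left, @"\d+|\D+").Select(m => m.Value).ToArray();
    var b = Regex.Matches(right, @"\d+|\D+").Select(m => m.Value).ToArray();
    for (var i = 0; i < Math.Min(a.Length, b.Length); i++)
    {
        int result;
        if (long.TryParse(a[i], out var x) && long.TryParse(b[i], out var y))
            result = x.CompareTo(y);                   // both numeric: compare by value
        else
            result = string.CompareOrdinal(a[i], b[i]); // otherwise: compare as text
        if (result != 0) return result;
    }
    // All shared chunks are equal: the string with fewer chunks comes first.
    return a.Length.CompareTo(b.Length);
}

var test = new List<string> { "/node1/sub-node1", "/node13", "/node10", "/node2/sub-node1", "/node1", "/node2" };
test.Sort(NaturalCompare);
foreach (var uri in test)
{
    Console.WriteLine(uri);
}
```

With this comparer /node2 sorts before /node10 and /node13, and each parent still precedes its children.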
If you mean you need all the first level nodes before all the second level nodes, sort by the number of slashes /:
string[] array = {"/node1","/node1/sub-node1", "/node2", "/node2/sub-node1"};
array = array.OrderBy(s => s.Count(c => c == '/')).ToArray();
foreach(string s in array)
System.Console.WriteLine(s);
Result:
/node1
/node2
/node1/sub-node1
/node2/sub-node1
If you just need parent nodes before child nodes, it doesn't get much simpler than
Array.Sort(array);
Result:
/node1
/node1/sub-node1
/node2
/node2/sub-node1
Recursion is actually exactly what you should use, since this is most easily represented by a tree structure.
public class PathNode {
    public readonly string Name;
    private readonly IDictionary<string, PathNode> _children;
    public PathNode(string name) {
        Name = name;
        _children = new Dictionary<string, PathNode>(StringComparer.InvariantCultureIgnoreCase);
    }
    public PathNode AddChild(string name) {
        PathNode child;
        if (_children.TryGetValue(name, out child)) {
            return child;
        }
        child = new PathNode(name);
        _children.Add(name, child);
        return child;
    }
    public void Traverse(Action<PathNode> action) {
        action(this);
        foreach (var pathNode in _children.OrderBy(kvp => kvp.Key)) {
            pathNode.Value.Traverse(action);
        }
    }
}
Which you can then use like this:
var root = new PathNode(String.Empty);
var links = new[] { "/node1/sub-node1", "/node1", "/node2/sub-node-2", "/node2", "/node2/sub-node-1" };
foreach (var link in links) {
    if (String.IsNullOrWhiteSpace(link)) {
        continue;
    }
    var node = root;
    var lastIndex = link.IndexOf("/", StringComparison.InvariantCultureIgnoreCase);
    if (lastIndex < 0) {
        node.AddChild(link);
        continue;
    }
    while (lastIndex >= 0) {
        lastIndex = link.IndexOf("/", lastIndex + 1, StringComparison.InvariantCultureIgnoreCase);
        node = node.AddChild(lastIndex > 0
            ? link.Substring(0, lastIndex) // Still inside the link
            : link // No more slashies
        );
    }
}
var orderedLinks = new List<string>();
root.Traverse(pn => orderedLinks.Add(pn.Name));
foreach (var orderedLink in orderedLinks.Where(l => !String.IsNullOrWhiteSpace(l))) {
    Console.Out.WriteLine(orderedLink);
}
Which should print:
/node1
/node1/sub-node1
/node2
/node2/sub-node-1
/node2/sub-node-2

Value lookup using key or vice versa

First of all, apologies for the nasty title. I will correct it later.
I have some data like below,
"BOULEVARD","BOUL","BOULV", "BLVD"
I need a data structure that is O(1) for looking up any of these words by any other. For example, if I use a dictionary I would need to store these keys/values like this, which looks odd to me,
abbr.Add("BLVD", new List<string> { "BOULEVARD","BOUL","BOULV", "BLVD" });
abbr.Add("BOUL", new List<string> { "BOULEVARD", "BOUL", "BOULV", "BLVD" });
abbr.Add("BOULV", new List<string> { "BOULEVARD", "BOUL", "BOULV", "BLVD" });
abbr.Add("BOULEVARD", new List<string> { "BOULEVARD", "BOUL", "BOULV", "BLVD" });
Which data structure should I use to hold this data in a way that suits my queries?
Thanks in advance
Create two HashMaps: one maps a word to a group number, and the other maps a group number to a list of words. This way you save some memory.
Map<String, Integer> - Word to Group Number
Map<Integer, List<String>> - Group Number to a list of words
You need two O(1) lookups - first to get the group number and then by it - get the list of words.
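In C#, where the counterpart of a HashMap is `Dictionary<TKey, TValue>`, the two-map idea could be sketched as follows. The names `groupOfWord`, `wordsOfGroup`, and `AddGroup` are invented for this example, and group numbers are simply assigned in insertion order.

```csharp
using System;
using System.Collections.Generic;

// Word -> group number.
var groupOfWord = new Dictionary<string, int>();
// Group number -> all words in that group (each list is stored only once).
var wordsOfGroup = new Dictionary<int, List<string>>();

// Hypothetical helper: registers one group of equivalent words.
void AddGroup(params string[] words)
{
    var groupId = wordsOfGroup.Count;
    wordsOfGroup[groupId] = new List<string>(words);
    foreach (var word in words)
    {
        groupOfWord[word] = groupId;
    }
}

AddGroup("BOULEVARD", "BOUL", "BOULV", "BLVD");
AddGroup("STREET", "ST", "STR");

// Two hash lookups: word -> group number -> list of words.
var synonyms = wordsOfGroup[groupOfWord["BOUL"]];
Console.WriteLine(string.Join(", ", synonyms));
```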
Assuming abbr is a Dictionary<String, IEnumerable<String>>, you could use the following function:
public static void IndexAbbreviations(IEnumerable<String> abbreviations) {
    foreach (var a in abbreviations)
        abbr.Add(a, abbreviations);
}
This will populate the dictionary with the provided list of abbreviations such that when any of them is looked up in the dictionary, the whole list is returned. It is slightly better than the example code you provided, because I am not creating a new object for each value.
From the documentation, "Retrieving a value by using its key is very fast, close to O(1), because the Dictionary(Of TKey, TValue) class is implemented as a hash table."
The choice of dictionary looks fine to me. As mentioned above, you should use the same list to be referenced in the dictionary. The code could go something like this:
var allAbrList = new List<List<string>>
{
    new List<string> {"BOULEVARD", "BOUL", "BOULV", "BLVD"},
    new List<string> {"STREET", "ST", "STR"},
    // ...
};
var allAbrLookup = new Dictionary<string, List<string>>();
foreach (List<string> list in allAbrList)
{
    foreach (string abbr in list)
    {
        allAbrLookup.Add(abbr, list);
    }
}
The last part could be converted into LINQ to have less code, but this way it is easier to understand.
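One possible LINQ form of that last part is sketched below, using `SelectMany` with a result selector followed by `ToDictionary`. It keeps the same behaviour as the loops, including throwing if an abbreviation appears in more than one list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var allAbrList = new List<List<string>>
{
    new List<string> { "BOULEVARD", "BOUL", "BOULV", "BLVD" },
    new List<string> { "STREET", "ST", "STR" },
};

// Pair every abbreviation with the list it belongs to, then key the dictionary by abbreviation.
// Every key of a group points at the same list instance, so no list is copied.
var allAbrLookup = allAbrList
    .SelectMany(list => list, (list, abbr) => new { abbr, list })
    .ToDictionary(x => x.abbr, x => x.list);

Console.WriteLine(string.Join(", ", allAbrLookup["ST"]));
```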
If you don't create a new list for each key, then a Dictionary<string, List<string>> will be fast and reasonably memory-efficient as long as the amount of data isn't enormous. You might also be able to get a little extra benefit from reusing the strings themselves, though the optimizer might take care of that for you anyway.
var abbr = new Dictionary<string, List<string>>();
var values = new List<string> { "BOULEVARD","BOUL","BOULV", "BLVD" };
foreach(var aValue in values) abbr.Add(aValue, values);
As Petar Minchev already said, you can split your list into a list of groups and a list of keys that point to those groups. To simplify usage, you can write your own implementation of IDictionary and use the Add method to build those groups. I gave it a try and it seems to work. Here are the important parts of the implementation:
public class GroupedDictionary<T> : IDictionary<T,IList<T>>
{
    private Dictionary<T, int> _keys;
    private Dictionary<int, IList<T>> _valueGroups;
    public GroupedDictionary()
    {
        _keys = new Dictionary<T, int>();
        _valueGroups = new Dictionary<int, IList<T>>();
    }
    public void Add(KeyValuePair<T, IList<T>> item)
    {
        Add(item.Key, item.Value);
    }
    public void Add(T key, IList<T> value)
    {
        // look if some of the values already exist
        int existingGroupKey = -1;
        foreach (T v in value)
        {
            if (_keys.Keys.Contains(v))
            {
                existingGroupKey = _keys[v];
                break;
            }
        }
        if (existingGroupKey == -1)
        {
            // new group
            int newGroupKey = _valueGroups.Count;
            _valueGroups.Add(newGroupKey, new List<T>(value));
            _valueGroups[newGroupKey].Add(key);
            foreach (T v in value)
            {
                _keys.Add(v, newGroupKey);
            }
            _keys.Add(key, newGroupKey);
        }
        else
        {
            // existing group
            _valueGroups[existingGroupKey].Add(key);
            // add items that are new
            foreach (T v in value)
            {
                if(!_valueGroups[existingGroupKey].Contains(v))
                {
                    _valueGroups[existingGroupKey].Add(v);
                }
            }
            // add new keys
            _keys.Add(key, existingGroupKey);
            foreach (T v in value)
            {
                if (!_keys.Keys.Contains(v))
                {
                    _keys.Add(v, existingGroupKey);
                }
            }
        }
    }
    public IList<T> this[T key]
    {
        get { return _valueGroups[_keys[key]]; }
        set { throw new NotImplementedException(); }
    }
}
The usage could look like this:
var groupedDictionary = new GroupedDictionary<string>();
groupedDictionary.Add("BLVD", new List<string> {"BOUL", "BOULV"}); // after that three keys exist and one list of three items
groupedDictionary.Add("BOULEVARD", new List<string> {"BLVD"}); // now there is a fourth key and the key is added to the existing list instance
var items = groupedDictionary["BOULV"]; // will give you the list with four items
Sure, it is a lot of work to implement the whole interface, but it will give you an encapsulated class that you don't have to worry about once it is finished.
I don't see a reason to define the value part of your dictionary as a List<string> object, but perhaps that is your requirement. This answer assumes that you just want to know whether the word essentially means "Boulevard".
I would pick one value as the "official" value and map all of the other values to it, like this:
var abbr = new Dictionary<string, string>(StringComparer.CurrentCultureIgnoreCase);
abbr.Add("BLVD", "BLVD"); // this line may be optional
abbr.Add("BOUL", "BLVD");
abbr.Add("BOULV", "BLVD");
abbr.Add("BOULEVARD", "BLVD");
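A lookup against such a dictionary might then look like the sketch below. `Normalize` is a hypothetical helper, and returning the input unchanged for unknown suffixes is an assumption made for the example, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;

var abbr = new Dictionary<string, string>(StringComparer.CurrentCultureIgnoreCase)
{
    ["BLVD"] = "BLVD",
    ["BOUL"] = "BLVD",
    ["BOULV"] = "BLVD",
    ["BOULEVARD"] = "BLVD",
};

// Hypothetical helper: maps any known spelling to the "official" value,
// and returns the input unchanged when the suffix is not known.
string Normalize(string suffix) =>
    abbr.TryGetValue(suffix, out var canonical) ? canonical : suffix;

Console.WriteLine(Normalize("boulevard"));
Console.WriteLine(Normalize("LANE"));
```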
Alternatively, you could define an enum for the value part of the dictionary, as shown below:
enum AddressLine1Suffix
{
    Road,
    Street,
    Avenue,
    Boulevard,
}
var abbr = new Dictionary<string, AddressLine1Suffix>(StringComparer.CurrentCultureIgnoreCase);
abbr.Add("BLVD", AddressLine1Suffix.Boulevard);
abbr.Add("BOUL", AddressLine1Suffix.Boulevard);
abbr.Add("BOULV", AddressLine1Suffix.Boulevard);
abbr.Add("BOULEVARD", AddressLine1Suffix.Boulevard);
