How to Read Lines from File and Turn Array of Strings into Dictionary in C# with line Number as Key? - c#

I'm trying to read lines from a file (whose size is unknown) and convert the array values that start with "A" into a Dictionary with the line number as the key and the line contents as the value. I'm trying to think of a way of doing this without iterating over the array as that would be slow for large files. I'm currently putting in the line's hashcode (assuming no duplicates, but I can use .Distinct if needed) as the key but get an error as shown below the code:
Code:
Dictionary<int, string> lines = File
.ReadLines(inputFilePath)
.Select(x => x.StartsWith("A"))
.ToDictionary(x => x.GetHashCode, x => x.ToString());
The ToDictionary has the red error line with the following message:
"The Type arguments cannot be inferred from the usage. Try specifying the type arguments explicitly"
In that line, I'm just trying to get some distinct value for each Key, but I would like to get the line number if possible. I feel it's something obvious I'm missing.

Let's start from the example. You have a file like this:
Abracadabra
Boo-zoo
Sausages
Alakazam
And you want to obtain a dictionary whose values are the lines that start with "A"
and whose keys are 1, 2, 3, ...:
{1, "Abracadabra"}
{2, "Alakazam"}
So we can do it like this:
Dictionary<int, string> lines = File
.ReadLines(inputFilePath)
.Where(line => line.StartsWith("A"))
.Select((value, index) => (value, index))
.ToDictionary(pair => pair.index + 1, pair => pair.value);
The little trick is getting the index, which we can do with the help of the Select overload that also passes each element's index:
.Select((value, index) => (value, index))
If you want the line numbers before filtering, i.e. the positions in the original file:
{1, "Abracadabra"}
{4, "Alakazam"}
move the Select ahead of the Where:
Dictionary<int, string> lines = File
.ReadLines(inputFilePath)
.Select((value, index) => (value, index))
.Where(pair => pair.value.StartsWith("A"))
.ToDictionary(pair => pair.index + 1, pair => pair.value);
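To try both variants end to end, here is a small self-contained sketch. It assumes a throwaway temp file (from Path.GetTempFileName) standing in for inputFilePath, filled with the four example lines; the two queries are the ones shown above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical sample file holding the four example lines.
string inputFilePath = Path.GetTempFileName();
File.WriteAllLines(inputFilePath, new[] { "Abracadabra", "Boo-zoo", "Sausages", "Alakazam" });

// Variant 1: filter first, then number the surviving lines 1, 2, 3, ...
Dictionary<int, string> ranked = File
    .ReadLines(inputFilePath)
    .Where(line => line.StartsWith("A"))
    .Select((value, index) => (value, index))
    .ToDictionary(pair => pair.index + 1, pair => pair.value);

// Variant 2: number every line first, then filter, so keys are 1-based line numbers in the file.
Dictionary<int, string> byLineNumber = File
    .ReadLines(inputFilePath)
    .Select((value, index) => (value, index))
    .Where(pair => pair.value.StartsWith("A"))
    .ToDictionary(pair => pair.index + 1, pair => pair.value);

foreach (var (lineNo, text) in byLineNumber)
    Console.WriteLine($"{lineNo}: {text}");

// ToDictionary has fully enumerated (and closed) the file, so it can be removed now.
File.Delete(inputFilePath);
```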

Dictionary<int, string> lines =
File.ReadLines(inputFilePath).
Select((value, number) => (value, number)).
Where(x => x.value.StartsWith("A")).
ToDictionary(x => x.number + 1, x => x.value);
Explanation:
Select((value, number) => (value, number))
Turns the line number and the line into a tuple.
Where(x => x.value.StartsWith("A")).
Applies the filter, filtering out the lines that don't start with A
ToDictionary(x => x.number + 1, x => x.value)
Turns the sequence of tuples into a dictionary. We add 1 to the number because the index passed by Select is 0-based.
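If you prefer to avoid LINQ, a traditional while loop does the same work explicitly. Note that File.ReadLines already reads the file lazily, one line at a time, and the (value, number) pairs are value tuples, so the LINQ version does not load the whole file into memory either; in both versions memory use is dominated by the dictionary of matching lines, and the loop mainly saves a little iterator and delegate overhead.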
Dictionary<int, string> dict = new Dictionary<int, string>();
using (StreamReader r = new StreamReader(inputFilePath))
{
    string line;
    int lineNumber = 0;
    while ((line = r.ReadLine()) != null)
    {
        lineNumber++;
        if (line.StartsWith("A"))
        {
            dict.Add(lineNumber, line);
        }
    }
}
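The same loop can be wrapped in a helper so it is reusable and easy to test. LinesStartingWith is a hypothetical name; it takes any TextReader, so a StreamReader works for files and a StringReader for in-memory text. The one deliberate change is StartsWith(prefix, StringComparison.Ordinal), which avoids culture-sensitive comparison; treat this as a sketch rather than a drop-in replacement.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: the loop above, reading from any TextReader.
Dictionary<int, string> LinesStartingWith(TextReader reader, string prefix)
{
    var dict = new Dictionary<int, string>();
    string? line;
    int lineNumber = 0;
    while ((line = reader.ReadLine()) != null)
    {
        lineNumber++;                                   // 1-based number of the current line
        if (line.StartsWith(prefix, StringComparison.Ordinal))
            dict.Add(lineNumber, line);                 // line numbers are unique, so Add cannot throw
    }
    return dict;
}

// In-memory usage; for a file, pass a StreamReader created inside a using block instead.
using var sample = new StringReader("Abracadabra\nBoo-zoo\nSausages\nAlakazam");
Dictionary<int, string> result = LinesStartingWith(sample, "A");
Console.WriteLine(string.Join(", ", result.Keys));
```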

Related

Efficient way to create a new list based of the differences in values in 2 dictionaries?

I currently have two strings formatted as XML that are later converted into dictionaries for comparison.
So I have two Dictionary<string, object> objects, dict1 and dict2, that I need to compare. I need to:
Add the key to a list of strings if the values of these two dictionaries do not match
Add the key of dict2 to the list if dict1 does not contain this key
Currently, I have a simple foreach loop
foreach (string propName in dict2.Keys)
{
    if (dict1.ContainsKey(propName))
    {
        // read dict1 only after confirming the key exists, otherwise dict1[propName] throws
        string oldDictValue = dict1[propName].ToString();
        string newDictValue = dict2[propName].ToString();
        if (oldDictValue != newDictValue)
            list.Add(propName);
    }
    else
    {
        list.Add(propName);
    }
}
I would like a faster solution to this problem, if possible.
I don't claim that this is any faster, but it should be on par and it's less code:
List<string> list =
dict2
.Keys
.Where(k => !(dict1.ContainsKey(k) && dict1[k].Equals(dict2[k])))
.ToList();
I did do some testing with this:
List<string> list =
dict2
.Keys
.AsParallel()
.Where(k => !(dict1.ContainsKey(k) && dict1[k].Equals(dict2[k])))
.ToList();
That produced a significantly faster run.
Here's how I produced my test data:
var dict1 = Enumerable.Range(0, 10000000).Select(x => Random.Shared.Next(2000000)).Distinct().ToDictionary(x => x.ToString(), x => (object)Random.Shared.Next(20));
var dict2 = Enumerable.Range(0, 10000000).Select(x => Random.Shared.Next(2000000)).Distinct().ToDictionary(x => x.ToString(), x => (object)Random.Shared.Next(20));
You could make it faster by avoiding separate lookups of dict1[propName] and dict2[propName]. You can get the value along with the key, either by enumerating the KeyValuePairs stored in the dictionary directly, or by calling the TryGetValue method:
foreach (var (key, value2) in dict2)
{
    if (!dict1.TryGetValue(key, out var value1)
        || value1.ToString() != value2.ToString())
    {
        list.Add(key);
    }
}
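A runnable version of the TryGetValue loop, as a sketch: ChangedKeys and the sample dictionaries are made up for illustration, and values are compared through ToString() as in the question. Keys that exist only in dict1 are not reported, matching the stated requirements.

```csharp
using System;
using System.Collections.Generic;

// Keys of dict2 whose value is missing from dict1 or differs from it (compared via ToString()).
List<string> ChangedKeys(Dictionary<string, object> dict1, Dictionary<string, object> dict2)
{
    var list = new List<string>();
    foreach (var (key, value2) in dict2)                  // key and value from a single enumeration
    {
        if (!dict1.TryGetValue(key, out var value1)       // one lookup in dict1 instead of two
            || value1.ToString() != value2.ToString())
        {
            list.Add(key);
        }
    }
    return list;
}

// Hypothetical sample data: "Age" changed, "Email" is new, "City" exists only in the old set.
var oldValues = new Dictionary<string, object> { ["Name"] = "Bob", ["Age"] = 30, ["City"] = "Oslo" };
var newValues = new Dictionary<string, object> { ["Name"] = "Bob", ["Age"] = 31, ["Email"] = "bob@example.com" };

List<string> changed = ChangedKeys(oldValues, newValues);
Console.WriteLine(string.Join(", ", changed));
```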

Saving a split string to an arraylist using LINQ

I have some code that takes a string and processes it by splitting it into words, and giving the count of each word.
The trouble is that the method returns void, because all I can do once the processing is done is print to the screen. Is there any way I can save the results in an ArrayList, so that I can return them to the method that called it?
The current code:
message.Split(' ').Where(messagestr => !string.IsNullOrEmpty(messagestr))
.GroupBy(messagestr => messagestr).OrderByDescending(groupCount => groupCount.Count())
.Take(20).ToList().ForEach(groupCount => Console.WriteLine("{0}\t{1}", groupCount.Key, groupCount.Count()));
Thank you.
Try this code
var wordCountList = message.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
.GroupBy(messagestr => messagestr)
.OrderByDescending(grp => grp.Count())
.Take(20) //or take the whole
.Select(grp => new KeyValuePair<string, int>(grp.Key, grp.Count()))
.ToList(); //return wordCountList
//usage
wordCountList.ForEach(item => Console.WriteLine("{0}\t{1}", item.Key, item.Value));
If you want, you can return the wordCountList which is a List<KeyValuePair<string, int>> containing all the words and their counts in descending order.
How to use it is also shown in the last line.
And if you want all the words rather than the first 20, remove the .Take(20) part.
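To actually return the counts to the caller, the query can live in a method that returns the list. A minimal sketch, assuming a hypothetical CountWords helper whose optional top parameter replaces the hard-coded Take(20):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Returns (word, count) pairs, most frequent first; pass top = null to keep every word.
List<KeyValuePair<string, int>> CountWords(string message, int? top = null)
{
    IEnumerable<KeyValuePair<string, int>> counts = message
        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
        .GroupBy(word => word)
        .OrderByDescending(grp => grp.Count())
        .Select(grp => new KeyValuePair<string, int>(grp.Key, grp.Count()));

    return top is int n ? counts.Take(n).ToList() : counts.ToList();
}

List<KeyValuePair<string, int>> wordCountList = CountWords("the cat and the hat and the bat");
wordCountList.ForEach(item => Console.WriteLine("{0}\t{1}", item.Key, item.Value));
```

Because OrderByDescending is a stable sort and GroupBy yields groups in order of first appearance, words with equal counts keep the order in which they first occurred.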
First of all, by calling Take(20) you keep only the first 20 words and discard the rest. So, if you want all the results, remove it.
After that, you can do it like this:
var words = message.Split(' ').
Where(messagestr => !string.IsNullOrEmpty(messagestr)).
GroupBy(messagestr => messagestr).
OrderByDescending(groupCount => groupCount.Count()).
ToList();
words.ForEach(groupCount => Console.WriteLine("{0}\t{1}", groupCount.Key, groupCount.Count()));
To put the results into some other data structure, you can use one of these ways:
var w = words.Select(x => x.Key).ToList(); //Add this line to get all the distinct words in a list
// OR Use Dictionary
var dic = new Dictionary<string, int>();
foreach (var item in words)
{
    dic.Add(item.Key, item.Count());
}
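The foreach/Add loop can also be replaced by a single ToDictionary call, and the list of distinct words is simply the group keys. A sketch on a made-up input string:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string message = "to be or not to be";   // hypothetical input

// The grouped query from above, without Take(20).
var words = message.Split(' ')
    .Where(messagestr => !string.IsNullOrEmpty(messagestr))
    .GroupBy(messagestr => messagestr)
    .OrderByDescending(groupCount => groupCount.Count())
    .ToList();

// One call instead of the foreach/Add loop: word -> number of occurrences.
Dictionary<string, int> dic = words.ToDictionary(g => g.Key, g => g.Count());

// The distinct words themselves, most frequent first.
List<string> distinctWords = words.Select(g => g.Key).ToList();

Console.WriteLine(string.Join(", ", dic.Select(kv => $"{kv.Key}={kv.Value}")));
```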

Dictionary get max key values else print all

I've checked many solutions on different sites but couldn't find what I was looking for. I'm working on a dictionary object with different Values against Keys. The structure is as follows:
Key Value
6 4
3 4
2 2
1 1
If the dictionary contains elements like this, the output should be 6 and 3; if key 6 alone had the highest value, it should print only 6. However, if all the values are the same, it should print all the keys.
Trying to use the following but it only prints the highest Value.
var Keys_ = dicCommon.GroupBy(x => x.Value).Max(p => p.Key);
Any ideas?
Instead of using Max(x => x.Key), use .OrderByDescending(x => x.Key) and .FirstOrDefault(); that gives you the group with the max value. You can then iterate over the group and display whatever you need.
var dicCommon = new Dictionary<int, int>();
dicCommon.Add(6, 4);
dicCommon.Add(3, 4);
dicCommon.Add(2, 2);
dicCommon.Add(1, 1);
var maxGroup = dicCommon.GroupBy(x => x.Value).OrderByDescending(x => x.Key).FirstOrDefault();
foreach (var keyValuePair in maxGroup)
{
    Console.WriteLine("Key: {0}, Value {1}", keyValuePair.Key, keyValuePair.Value);
}
First off, a query can't return a single result and multiple results at the same time, so you need to pick one.
In this case, if you want all keys that have the highest corresponding value, you can sort the groups by value and then take the first group, which has the highest value:
var Keys_ = dicCommon.GroupBy(x => x.Value)
.OrderByDescending(g => g.Key)
.First()
.Select(x => x.Key)
.ToList();
var keys = String.Join(",", dicCommon
.OrderByDescending(x=>x.Value)
.GroupBy(x => x.Value)
.First()
.Select(x=>x.Key));
You’re almost there:
dicCommon.GroupBy(x => x.Value)
.OrderByDescending(pair => pair.First().Value)
.First().Select(pair => pair.Key).ToList()
GroupBy returns an enumerable of IGrouping. So sort these in descending order by value, take the first one, and select the key of each element it contains.
Since this requires sorting, the running time is not linear, although a linear solution is easy: find the maximum value first, then get all the keys whose value equals it:
int maxValue = dicCommon.Max(x => x.Value);
List<int> maxKeys = dicCommon.Where(x => x.Value == maxValue).Select(x => x.Key).ToList();
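A self-contained sketch of the linear approach, wrapped in a hypothetical KeysWithMaxValue helper. It also guards the empty case, because Max throws InvalidOperationException on an empty sequence:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Two passes over the dictionary, O(n): find the largest value, then collect the keys that have it.
List<int> KeysWithMaxValue(Dictionary<int, int> dict)
{
    if (dict.Count == 0)
        return new List<int>();                 // Max() would throw on an empty dictionary
    int maxValue = dict.Max(x => x.Value);
    return dict.Where(x => x.Value == maxValue).Select(x => x.Key).ToList();
}

var dicCommon = new Dictionary<int, int> { [6] = 4, [3] = 4, [2] = 2, [1] = 1 };
Console.WriteLine(string.Join(",", KeysWithMaxValue(dicCommon)));
```

If all values are equal, every key ties for the maximum, so all keys are returned, which covers the "print all" case from the question without a special branch.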

IEnumerable<string> to Dictionary<char, IEnumerable<string>>

I suppose this question might partially duplicate other similar questions, but I'm having trouble with the following situation:
I want to extract words from a string of sentences.
For example from
`string sentence = "We can store these chars in separate variables. We can also test against other string characters.";`
I want to build an IEnumerable<string> of words:
var separators = new[] {',', ' ', '.'};
IEnumerable<string> words = sentence.Split(separators, StringSplitOptions.RemoveEmptyEntries);
After that, go through all these words and take the first character of each into a distinct, ascending-ordered collection of characters.
var firstChars = words.Select(x => x.ToCharArray().First()).OrderBy(x => x).Distinct();
After that, go through both collections and, for each character in firstChars, get all items from words whose first character equals the current character, and create a Dictionary<char, IEnumerable<string>>.
I'm doing this way:
var dictionary = (from k in firstChars
from v in words
where v.ToCharArray().First().Equals(k)
select new { k, v })
.ToDictionary(x => x);
and here is the problem: An item with the same key has already been added.
This is because the query tries to add a key that is already in the dictionary.
I included a GroupBy extension into my query
var dictionary = (from k in firstChars
from v in words
where v.ToCharArray().First().Equals(k)
select new { k, v })
.GroupBy(x => x)
.ToDictionary(x => x);
With that, the query runs, but it gives me a different type than I need.
What should I do to get a Dictionary<char, IEnumerable<string>> as the result rather than a Dictionary<IGrouping<'a, 'a>, IGrouping<'a, 'a>>?
The result I want maps each first character to the words that start with it.
But here I have to iterate with two foreach loops to show what I want, and I don't really understand how this works...
Any suggestions and advice are welcome. Thank you.
As the relation is one to many, you can use a lookup instead of a dictionary:
var lookup = words.ToLookup(word => word[0]);
lookup['s'] -> store, separate... as an IEnumerable<string>
And if you want to display the key/values sorted by first char:
foreach (var sortedEntry in lookup.OrderBy(entry => entry.Key))
{
    Console.WriteLine(string.Format("First letter: {0}", sortedEntry.Key));
    foreach (string word in sortedEntry)
    {
        Console.WriteLine(word);
    }
}
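A runnable version of the lookup approach, using the sentence from the question. Indexing a lookup with a key it does not contain returns an empty sequence instead of throwing, which is one practical difference from a dictionary. Char ordering is ordinal, so the uppercase 'W' sorts before the lowercase letters.

```csharp
using System;
using System.Linq;

string sentence = "We can store these chars in separate variables. We can also test against other string characters.";
var separators = new[] { ',', ' ', '.' };
string[] words = sentence.Split(separators, StringSplitOptions.RemoveEmptyEntries);

// One key (the first character) maps to many words, which is exactly what ILookup models.
ILookup<char, string> lookup = words.ToLookup(w => w[0]);

foreach (var sortedEntry in lookup.OrderBy(entry => entry.Key))
{
    Console.WriteLine($"First letter: {sortedEntry.Key}");
    foreach (string word in sortedEntry)
        Console.WriteLine(word);
}
```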
You can do this:
var words = ...
var dictionary = words.GroupBy(w => w[0])
.ToDictionary(g => g.Key, g => g.AsEnumerable());
But for that matter, why not use an ILookup?
var lookup = words.ToLookup(w => w[0]);
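The query from the question can also be kept, as long as it groups by the character and projects the word as the group element (GroupBy(x => x.k, x => x.v)) instead of grouping the whole anonymous object. The sketch below builds the dictionary both ways from the question's sentence; it is only an illustration, and the direct GroupBy remains the simpler choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string sentence = "We can store these chars in separate variables. We can also test against other string characters.";
string[] words = sentence.Split(new[] { ',', ' ', '.' }, StringSplitOptions.RemoveEmptyEntries);
var firstChars = words.Select(x => x[0]).OrderBy(x => x).Distinct();

// The question's query: group by the character k and keep only the word v as the element,
// so each character becomes one dictionary entry instead of one entry per (k, v) pair.
Dictionary<char, IEnumerable<string>> fromQuery =
    (from k in firstChars
     from v in words
     where v[0] == k
     select new { k, v })
    .GroupBy(x => x.k, x => x.v)
    .ToDictionary(g => g.Key, g => g.AsEnumerable());

// The shorter form: group the words directly by their first character.
Dictionary<char, IEnumerable<string>> direct = words
    .GroupBy(w => w[0])
    .ToDictionary(g => g.Key, g => g.AsEnumerable());

Console.WriteLine(string.Join(", ", direct['s']));
```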

Converting Collection of Strings to Dictionary

This is probably a simple question, but the answer is eluding me.
I have a collection of strings that I'm trying to convert to a dictionary.
Each string in the collection is a comma-separated list of values that I obtained from a regex match. I would like the key for each entry in the dictionary to be the fourth element in the comma-separated list, and the corresponding value to be the second element in the comma-separated list.
When I attempt a direct call to ToDictionary, I end up in some kind of loop that appears to kick me off the BackgroundWorker thread I'm in:
var MoveFromItems = matches.Cast<Match>()
.SelectMany(m => m.Groups["args"].Captures
.Cast<Capture>().Select(c => c.Value));
var dictionary1 = MoveFromItems.ToDictionary(s => s.Split(',')[3],
s => s.Split(',')[1]);
When I create the dictionary manually, everything works fine:
var MoveFroms = new Dictionary<string, string>();
foreach (string sItem in MoveFromItems)
{
    string sKey = sItem.Split(',')[3];
    string sVal = sItem.Split(',')[1];
    if (!MoveFroms.ContainsKey(sKey))
        MoveFroms[sKey.ToUpper()] = sVal;
}
I appreciate any help you might be able to provide.
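The problem is most likely that the keys have duplicates: ToDictionary throws an ArgumentException as soon as it meets a key that is already present, and an exception thrown in a BackgroundWorker's DoWork handler is caught and passed to RunWorkerCompleted (as e.Error) instead of crashing the program, so the work just seems to stop. You have three options.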
Keep First Entry (This is what you're currently doing in the foreach loop)
Keys only have one entry, the first one that shows up - meaning you can have a Dictionary:
var first = MoveFromItems.Select(x => x.Split(','))
.GroupBy(x => x[3])
.ToDictionary(x => x.Key, x => x.First()[1]);
Keep All Entries, Grouped
Keys will have more than one entry (each key returns an Enumerable), and you use a Lookup instead of a Dictionary:
var lookup = MoveFromItems.Select(x => x.Split(','))
.ToLookup(x => x[3], x => x[1]);
Keep All Entries, Flattened
No such thing as a key, simply a flattened list of entries:
var flat = MoveFromItems.Select(x => x.Split(','))
.Select(x => new KeyValuePair<string,string>(x[3], x[1]));
You could also use a tuple here (Tuple.Create(x[3], x[1])) instead.
Note: You will need to decide where/if you want the keys to be upper or lower case in these cases. I haven't done anything related to that yet. If you want to store the key as upper, just change x[3] to x[3].ToUpper() in everything above.
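A small sketch of the three options side by side. The input strings are hypothetical stand-ins for the regex captures, with the key "K1" appearing twice so the difference shows; the last part confirms that a plain ToDictionary throws ArgumentException on the duplicate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical captures: field [3] is the key, field [1] the value; "K1" occurs twice.
string[] MoveFromItems = { "a,v1,b,K1", "a,v2,b,K2", "a,v3,b,K1" };

// Keep first entry: one value per key.
Dictionary<string, string> first = MoveFromItems.Select(x => x.Split(','))
    .GroupBy(x => x[3])
    .ToDictionary(x => x.Key, x => x.First()[1]);

// Keep all entries, grouped: one key, many values.
ILookup<string, string> lookup = MoveFromItems.Select(x => x.Split(','))
    .ToLookup(x => x[3], x => x[1]);

// Keep all entries, flattened: the duplicate key simply appears twice.
List<KeyValuePair<string, string>> flat = MoveFromItems.Select(x => x.Split(','))
    .Select(x => new KeyValuePair<string, string>(x[3], x[1]))
    .ToList();

// The direct call from the question fails on the second "K1".
bool plainToDictionaryThrew = false;
try
{
    MoveFromItems.ToDictionary(s => s.Split(',')[3], s => s.Split(',')[1]);
}
catch (ArgumentException)
{
    plainToDictionaryThrew = true;
}
Console.WriteLine(plainToDictionaryThrew);
```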
This splits each item and selects key out of the 4th split-value, and value out of the 2nd split-value, all into a dictionary.
var dictionary = MoveFromItems.Select(s => s.Split(','))
.ToDictionary(split => split[3],
split => split[1]);
There is no point in splitting the string twice, just to use different indices.
This would be just like saving the split results into a local variable, then using it to access index 3 and 1.
However, if you don't know whether keys might recur, I would go for the simple loop you've implemented, without a doubt.
Although you have a small bug in your loop:
MoveFroms = new Dictionary<string, string>();
foreach (string sItem in MoveFromItems)
{
    string sKey = sItem.Split(',')[3];
    string sVal = sItem.Split(',')[1];
    // sKey might not exist as a key
    if (!MoveFroms.ContainsKey(sKey))
    //if (!MoveFroms.ContainsKey(sKey.ToUpper()))
    {
        // but sKey.ToUpper() might exist!
        MoveFroms[sKey.ToUpper()] = sVal;
    }
}
Should do ContainsKey(sKey.ToUpper()) in your condition as well, if you really want the key all upper cases.
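A sketch of the corrected loop, assuming the goal is upper-case keys with the first value winning. It splits each item once, normalizes the key before both the check and the insert, and uses Dictionary.TryAdd (available in .NET Core 2.0 and later) instead of ContainsKey plus the indexer. ToUpperInvariant is swapped in for ToUpper so the result does not depend on the current culture; the input strings are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical input: "k1" and "K1" differ only by case.
string[] MoveFromItems = { "a,v1,b,k1", "a,v2,b,K2", "a,v3,b,K1" };

var MoveFroms = new Dictionary<string, string>();
foreach (string sItem in MoveFromItems)
{
    string[] parts = sItem.Split(',');           // split once, reuse both fields
    string sKey = parts[3].ToUpperInvariant();   // normalize before checking AND storing
    MoveFroms.TryAdd(sKey, parts[1]);            // keeps the first value; later duplicates are ignored
}

Console.WriteLine(string.Join(", ", MoveFroms));
```

If the stored keys don't need to be upper case, new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) gives case-insensitive lookups without changing the keys.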
This splits each string in MoveFromItems on ',' and builds a lookup with the 4th item (index 3) as the key and the 2nd item (index 1) as the value; unlike ToDictionary, ToLookup accepts duplicate keys.
var dict = MoveFromItems.Select(x => x.Split(','))
.ToLookup(x => x[3], x => x[1]);
