Comparing Names in array to themselves


I have an array of names that I sorted alphabetically. Many names repeat, and I track the number of occurrences of each name as its popularity. I have been trying to figure out how to compare each index with the one next to it to see whether it holds the same name. Each time the same name appears, a counter ticks up; when a different name is reached, the counter is checked against "foundNamePop", stored in a separate variable, and reset. The problem is that some input arrays have the same name repeating at the end of the array (i.e. Lane, Lane, Lane \0). That last run never reaches my if statement and is not stored, because the only thing that happens for it is "nameCounter++". I can't find a way to make sure every name is read and stored, whether the array ends with a repeated name or with single, different names (i.e. Lane, Dane, Bane \0).
Let me also add that these .txt files can contain ~50 thousand names and I have no idea which names are in there.
Why does the if statement in the loop below not work? Its body runs every time. I ran it under the debugger and watched it step right into the block even when .ElementAt(i).Value is greater than foundNamePop (5 in this instance).
var dict = new ConcurrentDictionary<string,int>(StringComparer.OrdinalIgnoreCase);
foreach (var name in updatedName)
{
dict.AddOrUpdate(name, 1, (_, count) => ++count);
}
for (int i = 0; i < dict.Count; i++)
{
if (dict.ElementAt(i).Value <= foundNamePop);
{
lessPopNameSum += dict.ElementAt(i).Value;
}
}
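A likely reason the block runs every time: the if line above ends with a semicolon, so the condition controls an empty statement and the braces that follow are an ordinary block that always executes. Below is a minimal sketch of the same sum without the semicolon. The class and method names (PopularitySketch, SumLessPopular) are made up so the snippet can stand alone; only dict, foundNamePop and lessPopNameSum come from the code above.

```csharp
using System;
using System.Collections.Generic;

static class PopularitySketch
{
    // Hypothetical helper: sums the counts that are at or below foundNamePop.
    public static int SumLessPopular(IDictionary<string, int> dict, int foundNamePop)
    {
        int lessPopNameSum = 0;
        // Enumerate the pairs directly instead of calling ElementAt(i),
        // which walks the dictionary from the start on every call.
        foreach (var pair in dict)
        {
            if (pair.Value <= foundNamePop) // no semicolon after the condition
            {
                lessPopNameSum += pair.Value;
            }
        }
        return lessPopNameSum;
    }
}
```

With counts of 3, 1 and 6 and foundNamePop set to 5, only the 3 and the 1 are added.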

The simple solution is to add a check after the loop:
if (foundNamePop >= nameCounter)
{
lessPopNameSum += nameCounter;
}
But it is not clear to me what you are actually computing. It looks like you are summing the counts of the names that occur at most foundNamePop times, but it is not clear what value this has or what the result actually means.
You should be able to use LINQ to get something similar with less code:
var lessPopNameSum = sameLengthName
.GroupBy(n => n)
.Select(group => group.Count())
.Where(c => c <= foundNamePop)
.Sum();
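To show where the check after the loop sits, here is a minimal sketch of a full pass over a sorted array. The asker's loop is not shown in the question, so the class name, method name and parameters are assumptions; only nameCounter, foundNamePop and lessPopNameSum are taken from the post.

```csharp
using System;

static class RunCounter
{
    // Assumes sortedNames is already sorted, so equal names sit next to each other.
    public static int SumLessPopular(string[] sortedNames, int foundNamePop)
    {
        int lessPopNameSum = 0;
        if (sortedNames.Length == 0) return lessPopNameSum;

        int nameCounter = 1; // occurrences in the current run of equal names
        for (int i = 1; i < sortedNames.Length; i++)
        {
            if (string.Equals(sortedNames[i], sortedNames[i - 1], StringComparison.OrdinalIgnoreCase))
            {
                nameCounter++;
            }
            else
            {
                // The run just ended: record it, then start counting the new name.
                if (foundNamePop >= nameCounter) lessPopNameSum += nameCounter;
                nameCounter = 1;
            }
        }
        // The last run never meets a different name, so it is recorded here.
        if (foundNamePop >= nameCounter) lessPopNameSum += nameCounter;
        return lessPopNameSum;
    }
}
```

Without the final check, an input ending in Lane, Lane, Lane would lose that last run, which matches the symptom described in the question.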

Although I like the elegance of the other posted solution, an alternative is to use a dictionary to store a count of each of the names.
const int FoundNamePop = 2;
var names = new string[] { "Bill", "Jane", "Jeff", "Rebecca", "Bill" };
var count = FindPopularNames(names)
.Where(kvp => kvp.Value < FoundNamePop)
.Sum(kvp => kvp.Value);
// With 'FoundNamePop' set to two, the below line will print '3'.
Console.WriteLine($"Count: {count}");
static IDictionary<string, int> FindPopularNames(IEnumerable<string> names)
{
var dict = new ConcurrentDictionary<string, int>
(StringComparer.OrdinalIgnoreCase);
foreach (var name in names)
{
dict.AddOrUpdate(name, 1, (_, count) => ++count);
}
return dict;
}

Related

Modify duplicate values with duplication index suffix (using Linq)

I have a list:
List<string> myList = new List<string>{ "dog", "cat", "dog", "bird" };
I want the output to be list of:
"dog (1)", "cat", "dog (2)", "bird"
I've already looked through this question, but it only covers counting the duplicates; my output should include each duplicate's index, like
duplicate (index)
I've tried this code:
var q = list.GroupBy(x => x)
.Where(y => y.Count()>1)
.Select(g => new {Value = g.Key + "(" + g.Index + ")"})
but it does not seem to work because:
I need to get the whole list back, or just modify the existing one (my attempt returns only the duplicates).
Duplicate values need a suffix based on their "duplicate index".
How can I do this in C#? Is there a way using LINQ?

The accepted solution works but is extremely inefficient when the size of the list grows large.
What you want to do is first get the information you need in an efficient data structure. Can you implement a class:
sealed class Counter<T>
{
public void Add(T item) { }
public int Count(T item) { }
}
where Count returns the number of times (possibly zero) that Add has been called with that item. (Hint: you could use a Dictionary<T, int> to good effect.)
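One possible way to fill in that skeleton, following the hint. This is a sketch, not the answer author's own implementation; the notnull constraint and the private field name are choices made here.

```csharp
using System.Collections.Generic;

sealed class Counter<T> where T : notnull
{
    private readonly Dictionary<T, int> counts = new Dictionary<T, int>();

    // Records one more occurrence of item.
    public void Add(T item)
    {
        counts.TryGetValue(item, out int current); // current is 0 when item is new
        counts[item] = current + 1;
    }

    // Returns how many times Add has been called with item (zero if never).
    public int Count(T item)
    {
        return counts.TryGetValue(item, out int current) ? current : 0;
    }
}
```

Both operations are dictionary lookups, so counting a whole list takes a single pass.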
All right. Now that we have our useful helper we can:
var c1 = new Counter<string>();
foreach(string item in myList)
c1.Add(item);
Great. Now we can construct our new list by making use of a second counter:
var result = new List<String>();
var c2 = new Counter<String>();
foreach(string item in myList)
{
c2.Add(item);
if (c1.Count(item) == 1)
result.Add(item);
else
result.Add($"{item} ({c2.Count(item)})");
}
And we're done. Or, if you want to modify the list in place:
var c2 = new Counter<String>();
// It's a bad practice to mutate a list in a foreach, so
// we'll be sticklers and use a for.
for (int i = 0; i < myList.Count; i = i + 1)
{
var item = myList[i];
c2.Add(item);
if (c1.Count(item) != 1)
myList[i] = $"{item} ({c2.Count(item)})";
}
The lesson here is: create a useful helper class that solves one problem extremely well, and then use that helper class to make the solution to your actual problem more elegant. You need to count things to solve a problem? Make a thing-counter class.

This solution is not quadratic with respect to the size of the list, and it modifies the list in place, as the question prefers.
Any efficient solution will involve a pre-pass in order to find and count the duplicates.
List<string> myList = new List<string>{ "dog", "cat", "dog", "bird" };
//map out a count of all the duplicate words in dictionary.
var counts = myList
.GroupBy(s => s)
.Where(p => p.Count() > 1)
.ToDictionary(p => p.Key, p => p.Count());
//modify the list, going backwards so we can take advantage of our counts.
for (int i = myList.Count - 1; i >= 0; i--)
{
string s = myList[i];
if (counts.ContainsKey(s))
{
//add the suffix and decrement the number of duplicates left to tag.
myList[i] += $" ({counts[s]--})";
}
}

One way to do this is to simply create a new list that contains the additional text for each item that appears more than once. When we find these items, we can create our formatted string using a counter variable, and increment the counter if the list of formatted strings contains that counter already.
Note that this is NOT a good performing solution. It was just the first thing that came to my head. But it's a place to start...
private static void Main()
{
var myList = new List<string> { "dog", "cat", "dog", "bird" };
var formattedItems = new List<string>();
foreach (var item in myList)
{
if (myList.Count(i => i == item) > 1)
{
int counter = 1;
while (formattedItems.Contains($"{item} ({counter})")) counter++;
formattedItems.Add($"{item} ({counter})");
}
else
{
formattedItems.Add(item);
}
}
Console.WriteLine(string.Join(", ", formattedItems));
Console.Write("\nDone!\nPress any key to exit...");
Console.ReadKey();
}
Output:
dog (1), cat, dog (2), bird

Ok, @EricLippert challenged me and I couldn't let it go. Here's my second attempt, which I believe performs much better and modifies the original list as requested. Basically we create a second list that contains all the duplicate entries in the first. Then we walk backwards through the first list, modifying any entries that have a counterpart in the duplicates list, and remove the item from the duplicates list each time we encounter one:
private static void Main()
{
var myList = new List<string> {"dog", "cat", "dog", "bird"};
var duplicates = myList.Where(item => myList.Count(i => i == item) > 1).ToList();
for (var i = myList.Count - 1; i >= 0; i--)
{
var numDupes = duplicates.Count(item => item == myList[i]);
if (numDupes <= 0) continue;
duplicates.Remove(myList[i]);
myList[i] += $" ({numDupes})";
}
Console.WriteLine(string.Join(", ", myList));
Console.Write("\nDone!\nPress any key to exit...");
Console.ReadKey();
}
Output:
dog (1), cat, dog (2), bird

Loop to check for duplicate strings

I want to create a loop to check a list of titles for duplicates.
I currently have this:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach (var x in productTitles)
{
var title = x.Text;
productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach (var y in productTitles.Skip(productTitles.IndexOf(x) + 1))
{
if (title == y.Text)
{
Assert.Fail("Found duplicate product in the table");
}
}
}
But this seems to take the item I skip out of the array for the next loop, so item 2 is never checked against item 1; it moves straight to item 3.
I was under the impression that Skip just passes over the index you give it rather than removing it from the list.

You can use GroupBy:
var anyDuplicates = SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.GroupBy(p => p.Text, p => p)
.Any(g => g.Count() > 1);
Assert.That(anyDuplicates, Is.False);
or Distinct:
var productTitles = SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.Select(p => p.Text)
.ToArray();
var distinctProductTitles = productTitles.Distinct().ToArray();
Assert.AreEqual(productTitles.Length, distinctProductTitles.Length);
Or, if it is enough to find a first duplicate without counting all of them it's better to use a HashSet<T>:
var titles = new HashSet<string>();
foreach (var title in SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.Select(p => p.Text))
{
if (!titles.Add(title))
{
Assert.Fail("Found duplicate product in the table");
}
}
All approaches are better in terms of computational complexity (O(n)) than what you propose (O(n^2)).

You don't need a nested loop. Simply use the Where() function to find all items with the same title; if there is more than one, they're duplicates:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach(var x in productTitles) {
if (productTitles.Where(y => x.Text == y.Text).Count() > 1) {
Assert.Fail("Found duplicate product in the table");
}
}

I would try a slightly different way since you only need to check for duplicates in a one-dimensional array.
You only have to compare each element with the elements that come after it in the array/collection, so using LINQ to iterate through all of the items seems a bit unnecessary.
Here's a piece of code to better understand:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
for ( int i = 0; i < productTitles.Count; i++ )
{
var currentObject = productTitles[i];
// Only look at later elements; earlier pairs have already been compared.
for ( int j = i + 1; j < productTitles.Count; j++ )
{
if ( currentObject.Text == productTitles[j].Text )
{
// here's your duplicate
}
}
}
Since you've checked that item at index 0 is not the same as item placed at index 3 there's no need to check that again when you're at index 3. The items will remain the same.

The Skip(IEnumerable, n) method returns an IEnumerable that doesn't "contain" the first n elements of the IEnumerable it's called on. The original collection is not modified.
Also, I don't know what sort of behaviour could arise from this, but I wouldn't assign a new IEnumerable to the variable over which the foreach is being executed.
Here's another possible solution with LINQ:
int i = 0;
foreach (var x in productTitles)
{
// Skip x itself and everything before it, then look for a later element with the same text.
var possibleDuplicate = productTitles.Skip(++i).FirstOrDefault(y => y.Text == x.Text);
if (possibleDuplicate != null)
{
//do stuff here
}
}
This goes without saying, but the best solution for you will depend on what you are trying to do. Also, I think the Skip method call is more trouble than it's worth, as I'm fairly sure it makes the search less efficient.

Get every nth element or last

I'm hitting a brick wall with this, and I just can't seem to wrap my head around it.
Given a List of objects, how can I get every third element starting from the end (so the third to last, sixth to last, etc.), but if only 1 or 2 elements are left over when it finishes, also return the first element?
I'm essentially trying to simulate drawing three cards from the Stock and checking for valid moves in a game of patience, but for some reason I'm struggling with this one concept.
EDIT:
So far I've looked into using a standard for loop with an increased step. That leads me to the second requirement, which is to get the first element if there are fewer than three left on the final iteration.
I've tried other suggestions on Stack Overflow for getting the nth element from a list; however, none of them cover the second requirement.
I'm not entirely sure what code I could post that wouldn't be a simple for loop, as my problem is the logic for the code, not the code itself.
For example:
Given the list
1,2,3,4,5,6,7,8,9,10
I would like to get a list with
8, 5, 2, 1
as the return.

pseudocode:
List<object> filtered = new List<object>();
// Copy and reverse, so index 0 is the last element of myList.
List<object> reversedList = Enumerable.Reverse(myList).ToList();
// Every third element of the reversed list: indexes 2, 5, 8, ...
for (int i = 2; i < reversedList.Count; i = i + 3)
{
filtered.Add(reversedList[i]);
}
// If 1 or 2 elements are left over, add the first element of the original list.
if (reversedList.Count % 3 != 0)
{
filtered.Add(reversedList.Last());
}

Try using this code -
List<int> list = new List<int>();
List<int> resultList = new List<int>();
int count = 1;
for (;count<=20;count++) {
list.Add(count);
}
for (count=list.Count-3;count>=0;count-=3)
{
Debug.Log(list[count]);
resultList.Add(list[count]);
}
if(list.Count % 3 > 0)
{
Debug.Log(list[0]);
resultList.Add(list[0]);
}

Had to try and do it with linq.
Not sure if it lives up to your requirements, but it works with your example.
var list = Enumerable.Range(1, 10).ToList();
//Start with reversing the order.
var result = list.OrderByDescending(x => x)
//Run a select overload with index so we can use position
.Select((number, index) => new { number, index })
//Only include items that are in the right intervals OR is the last item
.Where(x => ((x.index + 1) % 3 == 0) || x.index == list.Count() - 1)
//Select only the number to get rid of the index.
.Select(x => x.number)
.ToList();
Assert.AreEqual(8, result[0]);
Assert.AreEqual(5, result[1]);
Assert.AreEqual(2, result[2]);
Assert.AreEqual(1, result[3]);
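Because OrderByDescending sorts by value, the query above matches the example only when the list is already in ascending order. Here is a sketch that works on positions instead, under the assumption that "first element" means index 0 of the original list. The names ListSketch and EveryThirdFromEnd are made up for illustration.

```csharp
using System.Collections.Generic;

static class ListSketch
{
    public static List<T> EveryThirdFromEnd<T>(IReadOnlyList<T> source)
    {
        var result = new List<T>();
        // Third to last, sixth to last, and so on.
        for (int i = source.Count - 3; i >= 0; i -= 3)
        {
            result.Add(source[i]);
        }
        // One or two elements left over at the front: take the first one.
        if (source.Count % 3 != 0)
        {
            result.Add(source[0]);
        }
        return result;
    }
}
```

For the list 1 to 10 this yields 8, 5, 2, 1, and it does not depend on the values being sorted.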

C# take a duplicate entry in a CSV file and remove the duplicate by taking an average

My program creates a .csv file with a person's name and an integer next to it.
Occasionally there are two entries of the same name in the file, but with a different time. I only want one instance of each person.
I would like to take the mean of the two numbers to produce just one row for the name, where the number will be the average of the two existing.
So here Alex Pitt has two numbers. How can I take the mean of 105 and 71 (in this case) to produce a row that just includes Alex Pitt, 88?
Here is how I am creating my CSV file if reference is required.
public void CreateCsvFile()
{
PaceCalculator ListGather = new PaceCalculator();
List<string> NList = ListGather.NameGain();
List<int> PList = ListGather.PaceGain();
List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b).ToList();
string filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
using (var file = File.CreateText(filepath))
{
foreach (var arr in nAndPList)
{
if (arr == null || arr.Length == 0) continue;
file.Write(arr[0]);
for (int i = 1; i < arr.Length; i++)
{
file.Write(arr[i]);
}
file.WriteLine();
}
}
}

To start with, you can write your current CreateCsvFile much more simply like this:
public void CreateCsvFile()
{
var filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
var ListGather = new PaceCalculator();
var records =
ListGather.NameGain()
.Zip(ListGather.PaceGain(),
(a, b) => String.Format("{0},{1}", a, b));
File.WriteAllLines(filepath, records);
}
Now, it can easily be changed to work out the average pace if you have duplicate names, like this:
public void CreateCsvFile()
{
var filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
var ListGather = new PaceCalculator();
var records =
from record in ListGather.NameGain()
.Zip(ListGather.PaceGain(),
(a, b) => new { Name = a, Pace = b })
group record.Pace by record.Name into grs
select String.Format("{0},{1}", grs.Key, grs.Average());
File.WriteAllLines(filepath, records);
}

I would recommend merging the duplicates before you put everything into the CSV file.
Use:
// The list of distinct names
List<string> duplicateChecker = new List<string>();
// Takes each name once and puts it in a new list. I'm using NList because I assume the names are the important part.
duplicateChecker = NList.Distinct().ToList();
Now you can simply iterate through the new list and search for its values in your NList. Use a foreach loop that looks up the index of each name in NList. After that you can use the index to merge the integers with a simple math method.
//Something like this:
Make a foreach loop over every entry in your duplicateChecker =>
Use Distinct on duplicateChecker to make sure you won't go through the same name twice =>
Take the current string and search for it in NList =>
Take the index of the current element in NList and look up the same index in PList =>
Take the integer from PList and store it in an array =>
// Make sure your math method runs before a new name starts. After that, store the new values in your nAndPList.
Once the loop is through with the first name, use a math method.
I hope you understand what I was trying to say. However, I would recommend using a unique identifier for your persons. Sooner or later two persons will appear with the same name (as in a huge company).
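The steps above could look roughly like the following sketch. It replaces the repeated index searches with a single pass that keeps a running sum and count per name. The two parameters stand in for NList and PList from the question; the class name, the method name and the output format are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class PaceMerger
{
    // Returns one "name, meanPace" line per distinct name, in first-seen order.
    public static List<string> MergeByAverage(IList<string> names, IList<int> paces)
    {
        var totals = new Dictionary<string, (int Sum, int Count)>();
        var order = new List<string>();
        int n = Math.Min(names.Count, paces.Count);
        for (int i = 0; i < n; i++)
        {
            // running is (0, 0) when the name has not been seen yet.
            if (!totals.TryGetValue(names[i], out var running))
            {
                order.Add(names[i]);
            }
            totals[names[i]] = (running.Sum + paces[i], running.Count + 1);
        }
        var lines = new List<string>();
        foreach (string name in order)
        {
            var (sum, count) = totals[name];
            double mean = (double)sum / count;
            lines.Add(name + ", " + mean.ToString(CultureInfo.InvariantCulture));
        }
        return lines;
    }
}
```

For the two Alex Pitt entries from the question (105 and 71) this produces the single line "Alex Pitt, 88".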

Change the code below:
List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b).ToList();
To
List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b)
.GroupBy(x => x.[The field you want to group by])
.Select(y => y.First())
.ToList();

Count numbers in a List

In C# I have a List which contains numbers in string format. What is the best way to count all these numbers? For example, to be able to say that I have the number ten three times.
I mean, in Unix awk you can say something like
tempArray["5"] +=1
It is similar to a KeyValuePair, but that is read-only.
Any fast and smart way?

Very easy with LINQ:
var occurrenciesByNumber = list.GroupBy(x => x)
.ToDictionary(x => x.Key, x => x.Count());
Of course, since your numbers are represented as strings, this code distinguishes, for instance, between "001" and "1" even though conceptually they are the same number.
To count numbers that have the same value, you could do for example:
var occurrenciesByNumber = list.GroupBy(x => int.Parse(x))
.ToDictionary(x => x.Key, x => x.Count());

(As noted in digEmAll's answer, I'm assuming you don't really care that they're numbers - everything here assumes that you wanted to treat them as strings.)
The simplest way to do this is to use LINQ:
var dictionary = values.GroupBy(x => x)
.ToDictionary(group => group.Key, group => group.Count());
You could build the dictionary yourself, like this:
var map = new Dictionary<string, int>();
foreach (string number in list)
{
int count;
// You'd normally want to check the return value, but in this case you
// don't care.
map.TryGetValue(number, out count);
map[number] = count + 1;
}
... but I prefer the conciseness of the LINQ approach :) It will be a bit less efficient, mind you - if that's a problem, I'd personally probably create a generic "counting" extension method:
public static Dictionary<T, int> GroupCount<T>(this IEnumerable<T> source)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
var map = new Dictionary<T, int>();
foreach (T value in source)
{
int count;
map.TryGetValue(value, out count);
map[value] = count + 1;
}
return map;
}
(You might want another overload accepting an IEqualityComparer<T>.) Having written this once, you can reuse it any time you need to get the counts for items:
var counts = list.GroupCount();
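The overload taking an IEqualityComparer<T>, mentioned above, might look like this sketch. The class name CountingExtensions is arbitrary, and passing null for the comparer falls back to the default comparer, as the Dictionary constructor does.

```csharp
using System;
using System.Collections.Generic;

public static class CountingExtensions
{
    public static Dictionary<T, int> GroupCount<T>(
        this IEnumerable<T> source, IEqualityComparer<T> comparer) where T : notnull
    {
        if (source == null)
        {
            throw new ArgumentNullException(nameof(source));
        }
        // The comparer decides which values count as the same key.
        var map = new Dictionary<T, int>(comparer);
        foreach (T value in source)
        {
            map.TryGetValue(value, out int count); // count is 0 for a new key
            map[value] = count + 1;
        }
        return map;
    }
}
```

Called as list.GroupCount(StringComparer.OrdinalIgnoreCase), it would count "a" and "A" under one key.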
