How to improve my classic FindDuplicate algorithm - C#

I have a classic find-duplicates algorithm like this:
int n = int.Parse(Console.ReadLine());
Console.WriteLine();
List<int> tempArr = new List<int>();
List<int> array = new List<int>();
for (int i = 0; i < n; i++)
{
    Console.Write("input number {0}: ", i + 1);
    tempArr.Add(int.Parse(Console.ReadLine()));
}
tempArr.Sort();
for (int i = 0; i < n; i++)
{
    for (int j = i + 1; j < n; j++)
    {
        if (tempArr[i] == tempArr[j])
        {
            array.Add(tempArr[i]);
        }
    }
}
Everything works okay, but if I have just two duplicate numbers, like (1, 2, 2, 3, 4, 5), how can I add them both to the List<int> array in one clean pass of the loop?

Instead of lists you could use a data structure with better search capability (a hash table or a binary tree, for example). Even if you have just one duplicate, you need to check whether you have already added the element to the list, so the key operation in your algorithm is the search: the faster the search, the faster the algorithm. With binary search on the sorted list you get O(n log n) overall (n searches of O(log n) each); with a hash table the lookups are O(1) on average.
An even better way is to use an array the same size as your input range and "tick" each value you have already seen. This lookup runs in constant time, but it becomes memory-inefficient if the range of possible inputs is large.
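As a rough sketch of the hash-based idea (my illustration, not the asker's code): a HashSet<int> detects duplicates in one pass, because Add returns false when the value is already present.

var seen = new HashSet<int>();
var duplicates = new List<int>();
foreach (int value in tempArr)
{
    // Add returns false if the value was already seen,
    // so every repeated occurrence lands in duplicates.
    if (!seen.Add(value))
        duplicates.Add(value);
}

For (1, 2, 2, 3, 4, 5) this yields duplicates == [2]; for an input containing 2 three times it would yield [2, 2].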

You can use Distinct:
array = tempArr.Distinct().ToList();
Note that Distinct gives you the distinct values rather than the duplicates, so on its own it doesn't quite answer the "one clean shot" question. If you know more about the input you may be able to find the duplicates in linear time; for example, if you know the integers fall within a certain range, a counting array does it in one pass.
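A hedged sketch of that range-based idea, assuming the values are known to lie in [0, maxValue):

int maxValue = 100; // assumed upper bound on the input values
int[] counts = new int[maxValue];
foreach (int value in tempArr)
    counts[value]++;

var duplicates = new List<int>();
for (int v = 0; v < maxValue; v++)
    if (counts[v] > 1)
        duplicates.Add(v); // v occurred counts[v] times

This runs in O(n + maxValue): one pass over the input plus one over the range.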

To extract all the duplicates you can use LINQ:
List<int> tempList = new List<int>() { 1, 2, 2, 3, 4, 5 };

// array == [2, 2]
List<int> array = tempList
    .GroupBy(x => x)
    .Where(g => g.Count() > 1)
    .SelectMany(g => Enumerable.Repeat(g.Key, g.Count()))
    .ToList();

Related

How to make a one dimensional array to display multiples of seven

I am attempting to use a one-dimensional array to display multiples of seven, but I'm not sure how to go about this. Thank you.
I hope I understood your question. You can generate multiples of 7 using LINQ as follows:
var result = Enumerable.Range(1, 100).Select(x => x * 7).ToArray();
Enumerable.Range generates a sequence of values in a specified range (the first parameter is the first number in the sequence, the second is the number of items), while the Select statement (x => x * 7, multiplying each value in the generated sequence by 7) ensures you get the multiples of 7.
Complete Code:
var result = Enumerable.Range(1, 100).Select(x => x * 7).ToArray();
foreach (var item in result)
{
    Console.WriteLine(item);
}
Console.ReadLine();
Due to the vagueness of the question, my answer may not be applicable, but I will attempt to answer based on my assumption of what you are asking.
If you have an array of int and you want to multiply the individual array elements by 7, you would do something like this:
int[] myArray = { 3, 5, 8 };
for (int i = 0; i < myArray.Length; i++)
{
    Console.WriteLine(myArray[i] * 7);
}
// outputs 21, 35, 56
If you want to multiply based on the index of the array element, you would do it like this:
int[] myArray = { 3, 5, 8 };
for (int i = 0; i < myArray.Length; i++)
{
    Console.WriteLine(i * 7);
}
// outputs 0, 7, 14
// or, if you need to start with an index of 1 instead of 0:
int[] myArray = { 3, 5, 8 };
for (int i = 0; i < myArray.Length; i++)
{
    Console.WriteLine((i + 1) * 7);
}
// outputs 7, 14, 21
Anu Viswan also has a good answer, but depending on what it is you are trying to do, it may be better to rely on loops. Hope my answer helps.

Writing list to CSV file

I have a List of Lists of int called NN which I would like to write to a CSV file:
List<List<int>> NN = new List<List<int>>();
The NN list:
1,2,3,4,5,6
2,5,6,3,1,0
0,9,2,6,7,8
And the output csv file should look like this:
1,2,0
2,5,9
3,6,2
4,3,6
5,1,7
6,0,8
What is the best way to achieve that?
If there is a better representation you would recommend instead of the nested list, I'll be glad to know.
(The purpose is that each inner list of int holds the weights between one layer and the next in the neural network.)
Here is how you can do it:
List<List<int>> NN = new List<List<int>>
{
    new List<int> { 1, 2, 3, 4, 5, 6 },
    new List<int> { 2, 5, 6, 3, 1, 0 },
    new List<int> { 0, 9, 2, 6, 7, 8 }
};

// Get the expected number of rows
var numberOfRows = NN[0].Count;

var rows =
    Enumerable.Range(0, numberOfRows)                          // for each row
        .Select(row => NN.Select(list => list[row]).ToList()) // get row data from all columns
        .ToList();

StringBuilder sb = new StringBuilder();
foreach (var row in rows)
{
    sb.AppendLine(string.Join(",", row));
}
var result = sb.ToString();
What you want to achieve is basically a matrix transpose and then write the data to a file.
The most efficient way to transpose a matrix is a complex question, and really depends on your architecture.
If you're not concerned about super-optimizing it for your processor (or accelerator), I would go for a simple nested for loop, accumulating the data in an intermediate in-memory representation:
string[] lines = new string[NN[0].Count]; // assume all lines have equal length
for (int i = 0; i < NN.Count; ++i)
{
    for (int j = 0; j < NN[i].Count; ++j)
    {
        lines[j] += NN[i][j] + ((i == NN.Count - 1) ? "" : ",");
    }
}
File.WriteAllLines("path.csv", lines);
As a first optimization pass, I wouldn't recommend using a list of lists, since element access is relatively expensive. A two-dimensional array does a better job:
int[,] NN = new int[3, 6] { { 1, 2, 3, 4, 5, 6 }, { 2, 5, 6, 3, 1, 0 }, { 0, 9, 2, 6, 7, 8 } };
string[] lines = new string[NN.GetLength(1)];
for (int i = 0; i < NN.GetLength(0); ++i)
{
    for (int j = 0; j < NN.GetLength(1); ++j)
    {
        lines[j] += NN[i, j] + ((i == NN.GetLength(0) - 1) ? "" : ",");
    }
}
File.WriteAllLines("path.csv", lines);
[Figure: performance test for 500x500 elements, without counting the write to file]
To improve this solution I would first do the transpose in memory (without writing anything to files or strings), then perform a string.Join(",") per row and write the result to the file in a single call.
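A minimal sketch of that idea (my own illustration; the file name is arbitrary):

int rows = NN.Count, cols = NN[0].Count;

// Transpose in memory first; no string work yet.
int[,] t = new int[cols, rows];
for (int i = 0; i < rows; i++)
    for (int j = 0; j < cols; j++)
        t[j, i] = NN[i][j];

// Then one string.Join per row and a single write call.
var lines = new string[cols];
for (int j = 0; j < cols; j++)
{
    var row = new int[rows];
    for (int i = 0; i < rows; i++)
        row[i] = t[j, i];
    lines[j] = string.Join(",", row);
}
File.WriteAllLines("path.csv", lines);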
If you want to optimize further, there is always room for it :)
For example, on x86 the best approach depends on the instruction set you have available, and CUDA-enabled devices have their own well-studied transpose techniques.
Anyway, a good solution will always involve aligned memory, sub-blocking, and close-to-the-metal code (intrinsics or assembly).

Given a list of length n select k random elements using C#

I found this post:
Efficiently selecting a set of random elements from a linked list
But this means that in order to approach true randomness in the sample I have to iterate over all the elements, pair each with a random number in memory, and then sort. I have a very large set of items here (millions); is there a more efficient approach to this problem?
I would suggest simply shuffling elements as if you were writing a modified Fisher-Yates shuffle, but only bother shuffling the first k elements. For example:
public static void PartialShuffle<T>(IList<T> source, int count, Random random)
{
    for (int i = 0; i < count; i++)
    {
        // Pick a random element out of the remaining elements,
        // and swap it into place.
        int index = i + random.Next(source.Count - i);
        T tmp = source[index];
        source[index] = source[i];
        source[i] = tmp;
    }
}
After calling this method, the first count elements will be randomly picked elements from the original list.
Note that I've specified the Random as a parameter, so that you can use the same one repeatedly. Be careful about threading though - see my article on randomness for more information.
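As a usage sketch (the list contents and sample size here are arbitrary):

var items = Enumerable.Range(1, 1000000).ToList();
var random = new Random();
PartialShuffle(items, 10, random);
var sample = items.Take(10).ToList(); // 10 uniformly chosen elements

Only count swaps are performed, regardless of how large the list is.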
Take a look at this extension method: http://extensionmethod.net/csharp/ienumerable-t/shuffle. You could then add Skip()/Take() calls to page values out of the final list.
If the elements fit in memory, load them into memory first:
List<Element> elements = dbContext.Select<Element>();
Now you know the number of elements. Create a set of unique indexes.
var random = new Random();
var indexes = new HashSet<int>();
while (indexes.Count < k)
{
    indexes.Add(random.Next(elements.Count));
}
Now you can read the elements from the list
var randomElements = indexes.Select(i => elements[i]);
I assume that the DB contains unique elements. If that is not the case, you will have to build a HashSet<Element> instead, or append .Distinct() when querying the DB.
UPDATE
As Patricia Shanahan says, this method will work well if k is small compared to n. If that is not the case, I suggest selecting a set of n - k indexes to exclude:
var random = new Random();
var indexes = new HashSet<int>();
IEnumerable<Element> randomElements;
if (k <= elements.Count / 2)
{
    while (indexes.Count < k)
    {
        indexes.Add(random.Next(elements.Count));
    }
    randomElements = indexes.Select(i => elements[i]);
}
else
{
    while (indexes.Count < elements.Count - k)
    {
        indexes.Add(random.Next(elements.Count));
    }
    randomElements = elements
        .Select((e, i) => indexes.Contains(i) ? null : e)
        .Where(e => e != null);
}

How to merge 2 sorted lists into one shuffled list while keeping internal order in C#

I want to generate a shuffled merged list that will keep the internal order of the lists.
For example:
list A: 11 22 33
list B: 6 7 8
valid result: 11 22 6 33 7 8
invalid result: 22 11 7 6 33 8
Just randomly select a list (e.g. generate a random number between 0 and 1; if it is < 0.5 pick list A, otherwise list B), then take the next element from that list and add it to your new list. Repeat until you have no elements left in either list.
Generate A.Length random integers in the interval [0, B.Length). Sort the random numbers, then iterate i from 0..A.Length, inserting A[i] into position r[i]+i in B. The +i is because you're shifting the original values in B to the right as you insert values from A.
This will be as random as your RNG.
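A small C# sketch of that description (the method name and signature are mine; the insertion interval follows the text above):

static List<T> RandomMerge<T>(IList<T> A, IList<T> B, Random random)
{
    // r[i] is a random position in B, per the interval [0, B.Count) above.
    var r = new int[A.Count];
    for (int i = 0; i < r.Length; i++)
        r[i] = random.Next(B.Count);
    Array.Sort(r);

    var result = new List<T>(B);
    for (int i = 0; i < A.Count; i++)
        result.Insert(r[i] + i, A[i]); // +i compensates for earlier inserts
    return result;
}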
None of the answers provided on this page work if you need the outputs to be uniformly distributed.
To illustrate my examples, assume we are merging two lists A=[1,2,3], B=[a,b,c]
In the approach mentioned in most answers (i.e. merging two lists a la mergesort, but choosing a list head randomly each time), the output [1 a 2 b 3 c] is far less likely than [1 2 3 a b c]. Intuitively, this happens because when you run out of elements in one list, the elements of the other list are appended at the end with no further random choices. For [1 2 3 a b c] only 3 random heads have to be picked before A runs out, giving a probability of 0.5*0.5*0.5 = 0.5^3 = 0.125, but for [1 a 2 b 3 c] a random head has to be picked 5 times, leaving us with a probability of 0.5^5 = 0.03125. An empirical evaluation also easily validates these results, as in the sketch below.
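For instance, a quick simulation sketch (using 4, 5, 6 in place of a, b, c so both lists hold ints) reproduces those numbers:

var rng = new Random();
int interleaved = 0, grouped = 0, trials = 100000;
for (int t = 0; t < trials; t++)
{
    var a = new Queue<int>(new[] { 1, 2, 3 });
    var b = new Queue<int>(new[] { 4, 5, 6 });
    var merged = new List<int>();
    // Naive merge: flip a coin while both lists are non-empty...
    while (a.Count > 0 && b.Count > 0)
        merged.Add((rng.Next(2) == 0 ? a : b).Dequeue());
    // ...then append whatever is left, with no randomness involved.
    merged.AddRange(a);
    merged.AddRange(b);
    if (merged.SequenceEqual(new[] { 1, 4, 2, 5, 3, 6 })) interleaved++;
    if (merged.SequenceEqual(new[] { 1, 2, 3, 4, 5, 6 })) grouped++;
}
Console.WriteLine((double)interleaved / trials); // ~0.03125
Console.WriteLine((double)grouped / trials);     // ~0.125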
The answer suggested by #marcog is almost correct. However, there is an issue: the distribution of r is not uniform after sorting. This happens because many original draws, e.g. [0,1,2], [1,2,0] and [2,1,0], all get sorted into [0,1,2], making this sorted r more likely than, for example, [0,0,0], which can be produced in only one way.
There is a clever way of generating the list r in such a way that it is uniformly distributed, as seen in this Math StackExchange question: https://math.stackexchange.com/questions/3218854/randomly-generate-a-sorted-set-with-uniform-distribution
To summarize the answer to that question: sample |B| elements uniformly at random, without repetition, from the set {0, 1, ..., |A|+|B|-1}, sort the result, and then subtract from each element its index in the sorted list. The result is the list r that can be used in place of the one in #marcog's answer.
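A sketch of that sampling step (the method name and signature are mine):

static int[] UniformSortedR(int aLen, int bLen, Random random)
{
    // Sample bLen distinct values from {0, ..., aLen + bLen - 1}.
    var pool = Enumerable.Range(0, aLen + bLen).ToList();
    var sample = new List<int>();
    for (int i = 0; i < bLen; i++)
    {
        int k = random.Next(pool.Count);
        sample.Add(pool[k]);
        pool.RemoveAt(k);
    }
    sample.Sort();
    // Subtract each element's index in the sorted list to get r.
    return sample.Select((v, idx) => v - idx).ToArray();
}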
Original Answer:
static IEnumerable<T> MergeShuffle<T>(IEnumerable<T> lista, IEnumerable<T> listb)
{
    var first = lista.GetEnumerator();
    var second = listb.GetEnumerator();
    var rand = new Random();
    bool exhaustedA = false;
    bool exhaustedB = false;
    while (!(exhaustedA && exhaustedB))
    {
        bool found = false;
        if (!exhaustedB && (exhaustedA || rand.Next(0, 2) == 0))
        {
            exhaustedB = !(found = second.MoveNext());
            if (found)
                yield return second.Current;
        }
        if (!found && !exhaustedA)
        {
            exhaustedA = !(found = first.MoveNext());
            if (found)
                yield return first.Current;
        }
    }
}
Second answer based on marcog's answer
static IEnumerable<T> MergeShuffle<T>(IEnumerable<T> lista, IEnumerable<T> listb)
{
    int total = lista.Count() + listb.Count();
    var random = new Random();
    // Range must produce all `total` indexes (0..total-1);
    // Range(0, total - 1) would leave the last index out.
    var indexes = Enumerable.Range(0, total)
        .OrderBy(_ => random.NextDouble())
        .Take(lista.Count())
        .OrderBy(x => x)
        .ToList();

    var first = lista.GetEnumerator();
    var second = listb.GetEnumerator();
    for (int i = 0; i < total; i++)
    {
        if (indexes.Contains(i))
        {
            first.MoveNext();
            yield return first.Current;
        }
        else
        {
            second.MoveNext();
            yield return second.Current;
        }
    }
}
Rather than generating a list of indices, this can be done by adjusting the probabilities based on the number of elements left in each list. On each iteration, A will have A_size elements remaining, and B will have B_size elements remaining. Choose a random number R from 1..(A_size + B_size). If R <= A_size, then use an element from A as the next element in the output. Otherwise use an element from B.
int A[] = {11, 22, 33}, A_pos = 0, A_remaining = 3;
int B[] = {6, 7, 8}, B_pos = 0, B_remaining = 3;

while (A_remaining || B_remaining) {
    int r = rand() % (A_remaining + B_remaining);
    if (r < A_remaining) {
        printf("%d ", A[A_pos++]);
        A_remaining--;
    } else {
        printf("%d ", B[B_pos++]);
        B_remaining--;
    }
}
printf("\n");
As a list gets smaller, the probability an element gets chosen from it will decrease.
This can be scaled to multiple lists. For example, given lists A, B, and C with sizes A_size, B_size, and C_size, choose R in 1..(A_size+B_size+C_size). If R <= A_size, use an element from A. Otherwise, if R <= A_size+B_size use an element from B. Otherwise C.
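A hedged sketch of that multi-list generalization (the method name and signature are mine):

static List<T> WeightedMerge<T>(Random random, params IList<T>[] lists)
{
    var pos = new int[lists.Length];          // next unread index per list
    int remaining = lists.Sum(l => l.Count);  // total elements left
    var result = new List<T>(remaining);
    while (remaining > 0)
    {
        // Pick r in [0, remaining) and walk the lists, so each list is
        // chosen with probability proportional to its remaining size.
        int r = random.Next(remaining);
        for (int k = 0; k < lists.Length; k++)
        {
            int left = lists[k].Count - pos[k];
            if (r < left)
            {
                result.Add(lists[k][pos[k]++]);
                break;
            }
            r -= left;
        }
        remaining--;
    }
    return result;
}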
Here is a solution that ensures a uniformly distributed output, and whose correctness is easy to reason about. The idea is first to generate a list of tokens, where each token represents an element of a specific list, but not a specific element. For example, for two lists having 3 elements each, we generate this list of tokens: 0, 0, 0, 1, 1, 1. Then we shuffle the tokens. Finally we yield an element for each token, selecting the next element from the corresponding original list.
public static IEnumerable<T> MergeShufflePreservingOrder<T>(
    params IEnumerable<T>[] sources)
{
    var random = new Random();
    var queues = sources
        .Select(source => new Queue<T>(source))
        .ToArray();
    var tokens = queues
        .SelectMany((queue, i) => Enumerable.Repeat(i, queue.Count))
        .ToArray();
    Shuffle(tokens);
    return tokens.Select(token => queues[token].Dequeue());

    void Shuffle(int[] array)
    {
        for (int i = 0; i < array.Length; i++)
        {
            int j = random.Next(i, array.Length);
            if (i == j) continue;
            if (array[i] == array[j]) continue;
            var temp = array[i];
            array[i] = array[j];
            array[j] = temp;
        }
    }
}
Usage example:
var list1 = "ABCDEFGHIJKL".ToCharArray();
var list2 = "abcd".ToCharArray();
var list3 = "#".ToCharArray();
var merged = MergeShufflePreservingOrder(list1, list2, list3);
Console.WriteLine(String.Join("", merged));
Output:
ABCDaEFGHIb#cJKLd
This might be easier, assuming you have a list of three values in order that match 3 values in another table.
You can sequence the rows using an identity column, identity(1,1):
CREATE TABLE #tmp1 (ID int identity(1,1), firstvalue char(2), secondvalue char(2))
CREATE TABLE #tmp2 (ID int identity(1,1), firstvalue char(2), secondvalue char(2))

INSERT INTO #tmp1 (firstvalue, secondvalue) SELECT firstvalue, null secondvalue FROM firsttable
INSERT INTO #tmp2 (firstvalue, secondvalue) SELECT null firstvalue, secondvalue FROM secondtable

SELECT a.firstvalue, b.secondvalue FROM #tmp1 a JOIN #tmp2 b ON a.id = b.id

DROP TABLE #tmp1
DROP TABLE #tmp2

How can I count the unique numbers in an array without rearranging the array elements?

I am having trouble counting the unique values in an array, and I need to do so without rearranging the array elements.
How can I accomplish this?
If you have .NET 3.5 you can easily achieve this with LINQ via:
int numberOfElements = myArray.Distinct().Count();
Non-LINQ:
List<int> uniqueValues = new List<int>();
for (int i = 0; i < myArray.Length; ++i)
{
    if (!uniqueValues.Contains(myArray[i]))
        uniqueValues.Add(myArray[i]);
}
int numberOfElements = uniqueValues.Count;
This is a far more efficient non-LINQ implementation:
var array = new int[] { 1, 2, 3, 3, 3, 4 };

// .NET 3.0 - use Dictionary<int, bool>
// .NET 1.1 - use Hashtable
var set = new HashSet<int>();
foreach (var item in array)
{
    set.Add(item); // Add is a no-op if the item is already present
}
Console.WriteLine("There are {0} distinct values.", set.Count);
O(n) running time, O(max_value) memory usage:
bool[] data = new bool[maxValue];
int uniqueCount = 0;
foreach (int n in list)
{
    if (!data[n])
    {
        data[n] = true; // first time we see n: tick it and count it
        uniqueCount++;
    }
}
Should only the distinct values be counted or should each number in the array be counted (e.g. "number 5 is contained 3 times")?
The second requirement can be fulfilled with the starting steps of the counting sort algorithm.
It would be something like this: build a map where the key is the element to be counted and the value holds the number of occurrences of that key; then iterate the array, incrementing the value for key array[index].
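A short C# sketch of that counting idea, using a Dictionary so no assumption about the value range is needed (myArray is a stand-in for your input):

var counts = new Dictionary<int, int>();
foreach (int value in myArray)
{
    counts.TryGetValue(value, out int c); // c is 0 if the key is absent
    counts[value] = c + 1;
}
int uniqueCount = counts.Count; // number of distinct values
// counts[5] == 3 would mean "number 5 is contained 3 times".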
Regards
