Note: There is quite a bit of text here but its additional information that may or may not be required. If I don't include this, then responders ask for more information or make suggestions for approaches that may not work, due to my entire implementation. If there is a better way to structure this so that all information is available, please let me know the best way to provide this supporting information. The gist is that I am trying to is here:
I am building a program that will read files and determine an organization (internal org code) to associate each file to, based on a predefined list of orgs and how often each org shows up in the file. The basic premise and current manual solution is:
Manual Process
Receive files in an unusual but overall consistent data format.
Each file has data pertaining to an organization's transactions (inbound and outbound types)
Each row in the file represents a single transaction; however, the fields are not always in the same order.
Goal is to determine which org "owns" the transactions (the entire file will be associated with this org at the end)
To be determined as the transaction "owner", the org in question must show up the same amount of times on each row (e.g. if Org1 shows up twice on Row 1 and once on Row 2, it can't be our Org due to inconsistency).
Nearly 100% of the time, the correctly associated Org has the highest frequency in the file, consistent across every row checked so programming this will give us a 99% solution. We can refine it further, if needed, once these rules are implemented
So far in my project, I have implemented a class with the following members:
My class thus far (summary of all logic at this point):
Properties/fields
_orgArray - list of orgs to look for in each file
_orgLength - length of org code (used for iterating over character in file line and creating a buffer of specified length to check if buffer value contains org)
Methods
GetDictIntersects - accepts two dictionaries and returns a dictionary containing keys and values that exist in both input dictionaries
GetLinesFromFile - iterator that reads a file and yields a single lines from the file each iteration
GetOrgCounts - accepts a line of text (returned from GetLinesFromFile) and returns a dictionary containing each Org and its count of occurrences in the line of text
ReduceOrgArray - accepts a two dictionaries containing Org Counts (returned by GetOrgCounts) as well as an array of Orgs. Returns a new array of orgs having removed any entries that did not have consistencies amongst them (using GetDictIntersects) as well as any orgs that did not have any values > 0 (indicating no occurrences in lines of text)
What the DetermineOrg method should do
To determine an Org for each file, I'm working on a method named DetermineOrg. My goal is to utilize the above methods and properties to
starts with org array containing all potential orgs
iterates the specified file (using GetLinesFromFile)
reduces org array by removing orgs that don't meet criteria on each line by comparing the current line's org counts (returned from GetOrgCounts) with the previous line's org counts (using GetDictIntersects and ReduceOrgArray)
Once file is iterated, if only a single Org remains in array, that is the return value
If multiple Orgs remain, Org with highest count is returned. If there is a tie, then Org (that tied with Max count) that shows up first in _orgArray is the return value
However, I can't seem to figure out the logic of how structure this so that I can recursively implement this. My code for the method alone in question (in case you don't want the entire class) is below. Hopefully someone can point out what I'm doing incorrectly and point me in the right direction:
Stuck on this method:
/// <summary>
/// Determine the appropriate organization for specified file
/// Recursively apply the following business rules:
/// <list type="number">
/// <item>Initial run uses all Orgs</item>
/// <item>Get Org Counts for each org on a line-by-line basis</item>
/// <item>Compare each line's Org Counts with the previous line's Org Counts, removing any orgs from potential org list that do not have same counts on both lines</item>
/// <item>After entire file has been read, determine a single Org by identifying which Org has the most occurences (highest value in dict)</item>
/// <item>In case of ties, class member org array lists order of precedence. Org with lowest index takes precedence.</item>
/// </list>
/// </summary>
/// <param name="filePath"><c>string</c> - file to be processed</param>
/// <param name="numLines"><c>int</c>:
/// Number of lines to be read from file in order to determine associated org.
/// Value less than 1 indicates to read the entire file.
/// Default value is -1 (read entire file to determine associated org).
/// </param>
/// <param name="orgArray"><c>string[]</c> representing potential orgs that file may be associated with</param>
/// <param name="streamReader"><c>StreamReader</c> stream to specified file (read-only)</param>
/// <param name="prevOrgCounts"><c>int</c> representing Org Counts for previous line of text</param>
/// <returns><c>string</c> - represents org that file is associated with</returns>
public static string DetermineOrg(string filePath, int numLines = -1, string[] orgArray = null, IEnumerable<string> streamReader = null, Dictionary<string, int> prevOrgCounts = null)
{
// base condition - no streamreader exists yet
if (streamReader == null)
{
streamReader = GetLinesFromFile(filePath, numLines);
// if no orgArray value is set, then use class member as starting value
if (orgArray == null)
{
orgArray = _orgArray;
}
}
else
{
// get org counts from iterator
foreach (string line in streamReader)
{
Dictionary<string, int> currentOrgCounts = GetOrgCounts(line, orgArray);
// if we have previous and current counts, then get reduce orgs
if (prevOrgCounts != null)
{
orgArray = ReduceOrgArray(currentOrgCounts, prevOrgCounts, orgArray);
}
else
{
}
}
}
// base condition - if no counts yet, then get counts from filePath
if (prevOrgCounts == null)
{
foreach (string line in GetLinesFromFile(filePath, numLines))
{
prevOrgCounts = GetOrgCounts(filePath, _orgArray);
}
}
}
Entire Class / All Code
The entire class can be found below.
using System;
namespace OrgProcessor
{
internal class OrgProcessor
{
/// <summary>
/// <c>string[]</c> containing list of orgs in order of precedence. In case of ties between counts, order with lowest index takes precedence in org determination.
/// </summary>
static string[] _orgArray = { "Org1", "Org2", "Org3"};
/// <summary>
/// Length of Org (is consistent amongst orgs as each org is an "org code" representing an org that can be found in a lookup table.
/// </summary>
static byte _orgLength = 4;
/// <summary>
/// Compare 2 dictionaries and return a dictionary containing only keys and values that exist in both dictionaries.
/// </summary>
/// <param name="dictionary1"><c>Dictionary<string, int></c></param>
/// <param name="dictionary2"><c>Dictionary<string, int></param>
/// <returns><c>Dictionary<string, int> - New dictionary containing key-value pairs that exist in both input dictionaries</returns>
public static Dictionary<string, int> GetDictIntersects(Dictionary<string, int> dictionary1, Dictionary<string, int> dictionary2)
{
// only return entries that exist in both dictionaries
Dictionary<string, int> returnDict = new Dictionary<string, int>();
foreach (dynamic key in dictionary1.Keys)
{
// ensure key exists in other dictionary's keys AND values match in both dictionaries
if (dictionary2.ContainsKey(key))
{
if (dictionary2[key] == dictionary1[key])
{
returnDict.Add(key, dictionary1[key]);
}
}
}
return returnDict;
}
/// <summary>
/// Iterator method returning file content, line by line
/// </summary>
/// <param name="filePath"><c>string</c> - path to file to read from</param>
/// <param name="numLines"><c>int</c> - number of lines to read from the file. Negative numbers will be interpreted as "All Lines". Default value is -1 (Read "All" lines)</param>
/// <returns><c>IEnumerable</c><<c>string</c>></returns>
public static IEnumerable<string> GetLinesFromFile(string filePath, int numLines = -1)
{
// TODO: Make more generic so iterator can take instructions to manipulate lines in file
// and optionally write to file (would need to write to temp, then delete orig and rename temp
// track lines iterated
int i = 0;
// create reader
using (StreamReader reader = new StreamReader(filePath))
{
// yield line if not reached end of file AND
// num lines is not specified (i == -1) OR
// i (num lines return) has not exceeded num lines specified
while (!reader.EndOfStream && (i < numLines || numLines == -1))
{
yield return reader.ReadLine();
i++;
}
}
}
/// <summary>
/// Get number of times each org occurs in specified text
/// </summary>
/// <param name="lineOfText"><c>string</c> of text to process</param>
/// <param name="orgArray"><c>string[]</c> containing orgs to be counted</param>
/// <returns><c>Dictionary<string, int></c> containing each Org and number of occurences in specified text</returns>
public static Dictionary<string, int> GetOrgCounts(string lineOfText, string[] orgArray)
{
// instantiate return value
Dictionary<string, int> orgCounts = new Dictionary<string, int>();
// get length of line of text as it will be referenced multiple times
int textLength = lineOfText.Length;
foreach (string org in orgArray)
{
//// set matchCount to 0
//// int matchCount = 0;
// since orgs are 4 characters long, iterate each character and compare with next 3 characters for each
for(int i = 0; i < textLength; i ++)
{
// get character at index
char c = lineOfText[i];
// calculate remaining characters
int remainingChars = textLength - i;
// char can only be part of an org if enough characters remain in lineOfText
if (remainingChars >= _orgLength)
{
// Get amount of chars that equals org length
string curCharBuffer = lineOfText.Substring(i, _orgLength);
// if characters match current org, then increment count dictionary
// or add to dict with count of 1
if (curCharBuffer == org)
{
if (orgCounts.ContainsKey(org))
{
orgCounts[org] += 1;
}
else
{
orgCounts.Add(org, 1);
}
// no need to evaluate other characters that were part of org so adjust loop incrementer to start next iteration after buffer
i += curCharBuffer.Length - 1;
}
}
}
//// orgCounts[org] = matchCount;
}
return orgCounts;
}
/// <summary>
/// Accepts 2 dictionaries (containing orgs and associated counts) and an array of strings (representing potential orgs) and returns a new string array having removed any "invalid" strings based on specified business rules:
/// <list type="number">
/// <item>If no orgs exist in array, return empty array</item>
/// </list>
/// </summary>
/// <param name="dictA"><c>Dictionary<string, int> containing orgs and counts</c></param>
/// <param name="dictB"><c>Dictionary<string, int> containing orgs and counts</c></param>
/// <param name="orgArray"><c>string[]</c> portential orgs</param>
/// <returns><c>string[]</c> - represents potential orgs</returns>
public static string[] ReduceOrgArray(Dictionary<string, int> dictA, Dictionary<string, int> dictB, string[] orgArray)
{
// base condition - if orgArray is empty, then return as is
if (orgArray.Length == 0)
{
return orgArray;
}
// Return value
List<string> remainingOrgs= new List<string>();
// business rules require that org counts be same from record to record
// we can rule out potential orgs by removing those with inconsistent counts amongst records
Dictionary<string, int> remainingDict = GetDictIntersects(dictA, dictB);
// Iterate over remaining orgs
// remove those that don't have matching keys in remaining dict (showing inconsistency amongst recs)
// remove those with 0 counts (org must exist on all rows)
foreach (string org in orgArray)
{
if (remainingDict.ContainsKey(org))
{
if (remainingDict[org] > 0)
{
remainingOrgs.Add(org);
}
}
}
// return remainingOrgs as array
return remainingOrgs.ToArray();
}
/// <summary>
/// Determine the appropriate organization for specified file
/// Recursively apply the following business rules:
/// <list type="number">
/// <item>Initial run uses all Orgs</item>
/// <item>Get Org Counts for each org on a line-by-line basis</item>
/// <item>Compare each line's Org Counts with the previous line's Org Counts, removing any orgs from potential org list that do not have same counts on both lines</item>
/// <item>After entire file has been read, determine a single Org by identifying which Org has the most occurences (highest value in dict)</item>
/// <item>In case of ties, class member org array lists order of precedence. Org with lowest index takes precedence.</item>
/// </list>
/// </summary>
/// <param name="filePath"><c>string</c> - file to be processed</param>
/// <param name="numLines"><c>int</c>:
/// Number of lines to be read from file in order to determine associated org.
/// Value less than 1 indicates to read the entire file.
/// Default value is -1 (read entire file to determine associated org).
/// </param>
/// <param name="orgArray"><c>string[]</c> representing potential orgs that file may be associated with</param>
/// <param name="streamReader"><c>StreamReader</c> stream to specified file (read-only)</param>
/// <param name="prevOrgCounts"><c>int</c> representing Org Counts for previous line of text</param>
/// <returns><c>string</c> - represents org that file is associated with</returns>
public static string DetermineOrg(string filePath, int numLines = -1, string[] orgArray = null, IEnumerable<string> streamReader = null, Dictionary<string, int> prevOrgCounts = null)
{
// base condition - no streamreader exists yet
if (streamReader == null)
{
streamReader = GetLinesFromFile(filePath, numLines);
// if no orgArray value is set, then use class member as starting value
if (orgArray == null)
{
orgArray = _orgArray;
}
}
else
{
// get org counts from iterator
foreach (string line in streamReader)
{
Dictionary<string, int> currentOrgCounts = GetOrgCounts(line, orgArray);
// if we have previous and current counts, then get reduce orgs
if (prevOrgCounts != null)
{
orgArray = ReduceOrgArray(currentOrgCounts, prevOrgCounts, orgArray);
}
else
{
}
}
}
// base condition - if no counts yet, then get counts from filePath
if (prevOrgCounts == null)
{
foreach (string line in GetLinesFromFile(filePath, numLines))
{
prevOrgCounts = GetOrgCounts(filePath, _orgArray);
}
}
}
}
}
If I understood your code correctly, you only keep track of the current list of candidates, but not the ocurrence count so you have nothing to compare to.
Here is my solution:
string[] orgs = { "ORG1", "ORG2", "ORG3"};
// This array will hold
// - either the number of times the corresponding organisation name occurs in each line
// - or 0 if the corresponding organisation name does not occur in any line
// or occurs with different count in some lines
var orgCounts = new int[orgs.Length];
var readLines = 0; // the number of lines read so far (used to recognize the first line)
foreach (var line in File.ReadLines(fileName))
{
for (var i = 0; i < orgs.Length; i++)
{
// count the occurrences of orgs[i] in the read line
var count = CountOccurrences(line, orgs[i]);
if (readLines == 0)
{
// first line: just remember the count of occurrences
orgCounts[i] = count;
}
else if (orgCounts[i] != count)
{
// mismatch, set count to 0
orgCounts[i] = 0;
}
readLines++;
}
}
// helper function to count the occurrences of toFind in str
int CountOccurrences(string str, string toFind)
{
var count = 0;
var index = str.IndexOf(toFind);
while (index >= 0)
{
count++;
index = str.IndexOf(toFind, index+1);
}
return count;
}
This is the solution that worked. This was basically the final piece to the puzzle and now everything fits together:
public static string DetermineOrg(string filePath)
{
// Initialize variables to store previous line's org counts and current list of potential orgs
Dictionary<string, int> previousLineOrgCounts = null;
string[] potentialOrgs = _orgArray;
// Iteratve over the file's lines using the GetLinesFromFile method
foreach(string line in GetLinesFromFile(filePath))
{
// Get the org counts for the current line
Dictionary<string, int> currentLineOrgCounts = GetOrgCounts(line, potentialOrgs);
// Check if this is not the first line of the file
if (previousLineOrgCounts != null)
{
// reduce the list of potential orgs using the current and previous line org counts
potentialOrgs = ReduceOrgArray(previousLineOrgCounts, currentLineOrgCounts, potentialOrgs);
}
// update the previous line or g counts for the next iteration
previousLineOrgCounts = currentLineOrgCounts;
}
// Check if only one org remains in the potential orgs list
if(potentialOrgs.Length == 1)
{
// return the remaining org
return potentialOrgs[0];
}
else
{
// Initialize variables to store the max org count and the org with the max count
int maxOrgCount = 0;
string orgWithMaxCount = null;
// Iterate over the remaining potential orgs
foreach (string org in potentialOrgs)
{
// Check if the current org has a higher count that the previous max count
if (previousLineOrgCounts[org] > maxOrgCount)
{
// update the max org count and the org with the max count
maxOrgCount = previousLineOrgCounts[org];
orgWithMaxCount = org;
}
}
// Return the org with the max count
return orgWithMaxCount;
}
}
Related
I have a method that yields an IEnumerable, but it uses the yield keyword to return elements when executed. I don't always know how big the total collection is. It's kind of similar to the standard Fibonacci example when you go to Try .NET except that it will yield a finite number of elements. That being said, because there's no way of knowing how many elements it will return beforehand, it could keep yielding pretty much forever if there are too many.
When I looked for other questions about this topic here, one of the answers provided a clean LINQ query to randomly sample N elements from a collection. However, the assumption here was that the collection was static. If you go to the Try .NET website and modify the code to use the random sampling implementation from that answer, you will get an infinite loop.
public static void Main()
{
foreach (var i in Fibonacci().OrderBy(f => Guid.NewGuid()).Take(20))
{
Console.WriteLine(i);
}
}
The query tries to order all the elements in the returned IEnumerable, but to order all the elements it must first calculate all the elements, of which there are an infinite number, which means it will keep going on and on and never return an ordered collection.
So what would be a good strategyfor randomly sampling an IEnumerable with an unknown number of contained elements? Is it even possible?
If a sequence is infinite in length, you can't select N elements from it in less than infinite time where the chance of each element in the sequence being selected is the same.
However it IS possible to select N items from a sequence of unknown, but finite, length. You can do that using reservoir sampling.
Here's an example implementation:
/// <summary>
/// This uses Reservoir Sampling to select <paramref name="n"/> items from a sequence of items of unknown length.
/// The sequence must contain at least <paramref name="n"/> items.
/// </summary>
/// <typeparam name="T">The type of items in the sequence from which to randomly choose.</typeparam>
/// <param name="items">The sequence of items from which to randomly choose.</param>
/// <param name="n">The number of items to randomly choose from <paramref name="items"/>.</param>
/// <param name="rng">A random number generator.</param>
/// <returns>The randomly chosen items.</returns>
public static T[] RandomlySelectedItems<T>(IEnumerable<T> items, int n, System.Random rng)
{
// See http://en.wikipedia.org/wiki/Reservoir_sampling for details.
var result = new T[n];
int index = 0;
int count = 0;
foreach (var item in items)
{
if (index < n)
{
result[count++] = item;
}
else
{
int r = rng.Next(0, index + 1);
if (r < n)
result[r] = item;
}
++index;
}
if (index < n)
throw new ArgumentException("Input sequence too short");
return result;
}
This must still iterate over the entire sequence, however, so it does NOT work for an infinitely long sequence.
If you want to support input sequences longer than 2^31, you can use longs in the implementation like so:
public static T[] RandomlySelectedItems<T>(IEnumerable<T> items, int n, System.Random rng)
{
// See http://en.wikipedia.org/wiki/Reservoir_sampling for details.
var result = new T[n];
long index = 0;
int count = 0;
foreach (var item in items)
{
if (index < n)
{
result[count++] = item;
}
else
{
long r = rng.NextInt64(0, index + 1);
if (r < n)
result[r] = item;
}
++index;
}
if (index < n)
throw new ArgumentException("Input sequence too short");
return result;
}
Note that this implementation requires .Net 6.0 or higher because of the rng.NextInt64().
Also note that there's no point in making n long because you can't have an array that exceeds ~2^31 elements, so it wouldn't be possible to fill it. You could in theory fix that by returning multiple arrays, but I'll leave that as an exercise for the reader. ;)
Ok so, I have a list of lists, like the title says and I want to make combinations of k lists in which every list has different elements than the rest.
Example:
I have the following list of lists:
{ {1,2,3} , {1,11} , {2,3,6} , {6,5,7} , {4,8,9} }
A valid 3-sized combination of these lists could be:
{ {1,11}, {4,8,9} ,{6,5,7} }
This is only ONE of the valid combinations, what I want to return is a list of all the valid combinations of K lists.
An invalid combination would be:
{ {1,11} ,{2, 3, 6}, {6, 5, 7} }
because the element 6 is present in the second and third list.
I already have a code that does this but it just finds all possible combinations and checks if they are valid before addding it to a final result list. As this list of lists is quite large (153 lists) when K gets bigger, the time taken is ridiculously big too (at K = 5 it takes me about 10 minutes.)
I want to see if there's an efficient way of doing this.
Below is my current code (the lists I want to combine are attribute of the class Item):
public void recursiveComb(List<Item> arr, int len, int startPosition, Item[] result)
{
if (len == 0)
{
if (valid(result.ToList()))
{
//Here I add the result to final list
//valid is just a function that checks if any list has repeated elements in other
}
return;
}
for (int i = startPosition; i <= arr.Count - len; i++)
{
result[result.Length - len] = arr[i];
recursiveComb(arr, len - 1, i + 1, result);
}
}
Use a HashSet
https://msdn.microsoft.com/en-us/library/bb359438(v=vs.110).aspx
to keep track of distinct elements as you build the output from the candidates in the input list of lists/tuples
accumulate an output list of non overlapping tuples by Iterating across the input list of tuples and evaluate each tuple as a candidate as follows:
For each input tuple, insert each tuple element into the HashSet. If the element you are trying to insert is already in the set, then the tuple fails the constraint and should be skipped, otherwise the tuple elements are all distinct from ones already in the output.
The hashset object effectively maintains a registry of distinct items in your accepted list of tuples.
If I understood your code correctly then, you are passing each list<int> from your input to recursiveComb() function. which look like this
for(int i = 0; i < inputnestedList.Count; i++)
{
recursiveComb();
// Inside of recursiveComb() you are using one more for loop with recursion.
// This I observed from your first parameter i.e. List<int>
}
Correct me if I am wrong
This leads to time complexity more than O(n^2)
Here is my simplest solution, with two forloops without recursion.
List<List<int>> x = new List<List<int>>{ new List<int>(){1,2,3} , new List<int>(){1,11} , new List<int>(){2,3,6} , new List<int>(){6,5,7} , new List<int>(){4,8,9} };
List<List<int>> result = new List<List<int>>();
var watch = Stopwatch.StartNew();
for (int i = 0; i < x.Count;i++)
{
int temp = 0;
for (int j = 0; j < x.Count; j++)
{
if (i != j && x[i].Intersect(x[j]).Any())
temp++;
}
// This condition decides, that elements of ith list are available in other lists
if (temp <= 1)
result.Add(x[i]);
}
watch.Stop();
var elapsedMs = watch.Elapsed.TotalMilliseconds;
Console.WriteLine(elapsedMs);
Now when I print execution time then output is
Execution Time: 11.4628
Check execution time of your code. If execution time of your code is higher than mine, then you can consider it as efficient code
Proof of code: DotNetFiddler
Happy coding
If I understood your problem correctly then this will work:
/// <summary>
/// Get Unique List sets
/// </summary>
/// <param name="sets"></param>
/// <returns></returns>
public List<List<T>> GetUniqueSets<T>(List<List<T>> sets )
{
List<List<T>> cache = new List<List<T>>();
for (int i = 0; i < sets.Count; i++)
{
// add to cache if it's empty
if (cache.Count == 0)
{
cache.Add(sets[i]);
continue;
}
else
{
//check whether current item is in the cache and also whether current item intersects with any of the items in cache
var cacheItems = from item in cache where (item != sets[i] && item.Intersect(sets[i]).Count() == 0) select item;
//if not add to cache
if (cacheItems.Count() == cache.Count)
{
cache.Add(sets[i]);
}
}
}
return cache;
}
Tested, it's fast and took 00:00:00.0186033 for finding sets.
I am currently working on a recursively parallel Quicksort extension function for the List class. The code below represents the most basic thread distribution criteria I've considered because it should be the simplest to conceptually explain. It branches to the depth of the base-2 logarithm of the number of detected processors, and proceeds sequentially from there. Thus, each CPU should get one thread with a (roughly) equal, large share of data to process, avoiding excessive overhead time. The basic sequential algorithm is provided for comparison.
public static class Quicksort
{
/// <summary>
/// Helper class to hold information about when to parallelize
/// </summary>
/// <attribute name="maxThreads">Maximum number of supported threads</attribute>
/// <attribute name="threadDepth">The depth to which new threads should
/// automatically be made</attribute>
private class ThreadInfo
{
internal int maxThreads;
internal int threadDepth;
public ThreadInfo(int length)
{
maxThreads = Environment.ProcessorCount;
threadDepth = (int)Math.Log(maxThreads, 2);
}
}
/// <summary>
/// Helper function to perform the partitioning step of quicksort
/// </summary>
/// <param name="list">The list to partition</param>
/// <param name="start">The starting index</param>
/// <param name="end">The ending index/param>
/// <returns>The final index of the pivot</returns>
public static int Partition<T>(this List<T> list, int start, int end) where T: IComparable
{
int middle = (int)(start + end) / 2;
// Swap pivot and first item.
var temp = list[start];
list[start] = list[middle];
list[middle] = temp;
var pivot = list[start];
var swapPtr = start;
for (var cursor = start + 1; cursor <= end; cursor++)
{
if (list[cursor].CompareTo(pivot) < 0)
{
// Swap cursor element and designated swap element
temp = list[cursor];
list[cursor] = list[++swapPtr];
list[swapPtr] = temp;
}
}
// Swap pivot with final lower item
temp = list[start];
list[start] = list[swapPtr];
list[swapPtr] = temp;
return swapPtr;
}
/// <summary>
/// Method to begin parallel quicksort algorithm on a Comparable list
/// </summary>
/// <param name="list">The list to sort</param>
public static void QuicksortParallel<T>(this List<T> list) where T : IComparable
{
if (list.Count < 2048)
list.QuicksortSequential();
else
{
var info = new ThreadInfo(list.Count);
list.QuicksortRecurseP(0, list.Count - 1, 0, info);
}
}
/// <summary>
/// Method to implement parallel quicksort recursion on a Comparable list
/// </summary>
/// <param name="list">The list to sort</param>
/// <param name="start">The starting index of the partition</param>
/// <param name="end">The ending index of the partition (inclusive)</param>
/// <param name="depth">The current recursive depth</param>
/// <param name="info">Structure holding decision-making info for threads</param>
private static void QuicksortRecurseP<T>(this List<T> list, int start, int end, int depth,
ThreadInfo info)
where T : IComparable
{
if (start >= end)
return;
int middle = list.Partition(start, end);
if (depth < info.threadDepth)
{
var t = Task.Run(() =>
{
list.QuicksortRecurseP(start, middle - 1, depth + 1, info);
});
list.QuicksortRecurseP(middle + 1, end, depth + 1, info);
t.Wait();
}
else
{
list.QuicksortRecurseS(start, middle - 1);
list.QuicksortRecurseS(middle + 1, end);
}
}
/// <summary>
/// Method to begin sequential quicksort algorithm on a Comparable list
/// </summary>
/// <param name="list">The list to sort</param>
public static void QuicksortSequential<T>(this List<T> list) where T : IComparable
{
list.QuicksortRecurseS(0, list.Count - 1);
}
/// <summary>
/// Method to implement sequential quicksort recursion on a Comparable list
/// </summary>
/// <param name="list">The list to sort</param>
/// <param name="start">The starting index of the partition</param>
/// <param name="end">The ending index of the partition (inclusive)</param>
private static void QuicksortRecurseS<T>(this List<T> list, int start, int end) where T : IComparable
{
if (start >= end)
return;
int middle = list.Partition(start, end);
// Now recursively sort the (approximate) halves.
list.QuicksortRecurseS(start, middle - 1);
list.QuicksortRecurseS(middle + 1, end);
}
}
As far as I understand, this methodology should produce a one-time startup cost, then proceed in sorting the rest of the data significantly faster than sequential method. However, the parallel method takes significantly more time than the sequential method, which actually increases as the load increases. Benchmarked at a list of ten million items on a 4-core CPU, the sequential method averages around 18 seconds to run to completion, while the parallel method takes upwards of 26 seconds. Increasing the allowed thread depth rapidly exacerbates the problem.
Any help in finding the performance hog is appreciated. Thanks!
The problem is CPU cache conflict, also known as "false sharing."
Unless the memory address of the pivot point happens to fall on a cache line, one of the threads will get a lock on the L1 or L2 cache and the other one will have to wait. This can make performance even worse than a serial solution. The problem is described well in this article:
...where threads use different objects but those objects happen to be close enough in memory that they fall on the same cache line, and the cache system treats them as a single lump that is effectively protected by a hardware write lock that only one core can hold at a time. [1,2] This causes real but invisible performance contention; whichever thread currently has exclusive ownership so that it can physically perform an update to the cache line will silently throttle other threads that are trying to use different (but, alas, nearby) data that sits on the same line.
(snip)
In most cases, the parallel code ran actually ran slower than the sequential code, and in no case did we get any better than a 42% speedup no matter how many cores we threw at the problem.
Grasping at straws.... you may be able to do better if you separate the list into two objects, and pin them in memory (or even just pad them within a struct) so they are far enough apart that they won't have a cache conflict.
I try to get the previous cells of a given range. So far my code looks like this:
Get the selected range and pass it to another method
Microsoft.Office.Interop.Excel._Application app = this.ExcelAppObj as Microsoft.Office.Interop.Excel._Application;
Microsoft.Office.Interop.Excel.Range range = null;
range = app.get_Range(this.DataRangeTextBox.Text);
var caption = ExcelHelper.GetRangeHeaderCaption(range);
The following method is executed
/// <summary>
/// Gets the range header caption of a given range.
/// The methode goes through every previous cell of the given range to determine the caption.
/// If the caption can not be determined the method returns an random string.
/// </summary>
/// <param name="selectedRange">The selected range.</param>
/// <returns></returns>
public static string GetRangeHeaderCaption(Range selectedRange, Microsoft.Office.Interop.Excel._Application excelApp)
{
// The caption of the range. The default value is a random string
var rangeCaption = ExcelHelper.getRandomString(5);
// Check if the provided range is valid
if (selectedRange != null && excelApp.WorksheetFunction.CountA(selectedRange) > 0)
{
var captionNotFound = true;
Range rangeToCheck = selectedRange.Previous;
// Go to each previous cell of the provided range
// to determine the caption of the range
do
{
// No further previous cells found
// We can stop further processing
if (rangeToCheck.Cells.Count == 0)
{
break;
}
//System.Array myvalues = (System.Array)rangeToCheck.Cells.Value;
System.Array myvalues = (System.Array)rangeToCheck.Cells.Value;
rangeToCheck = rangeToCheck.Previous;
} while (captionNotFound);
}
return rangeCaption;
}
At the point
var rangeToCheck = selectedRange.Previous;
the property access throws the following exception:
A first chance exception of type 'System.Runtime.InteropServices.COMException' occurred in mscorlib.dll
Additional information: Die Previous-Eigenschaft des Range-Objektes kann nicht zugeordnet werden.
What i want to achieve:
Go through all previous cells of the given data range containing double or int values and get the header caption by check if the previous cell is a numeric value or a string. If the cell contains a string return the string to the caller.
Edit #1
Maybe this is an important information. The method GetRangeHeaderCaption is implemented in another class. It is not included in the class where i get the the range by using the Excel Interop.
Edit #2
Found the problem.
The property Previous returns the previous LEFT cell of the given range. For example, if my range has the address B2:B16 the previous property returns the address A2. So if i try to access the Previous property of A2:A16 i get the exception because there is no column before the column A.
But what i need is that if i have the range B2:B16 i need to get the content of B1. Can you follow me so far?
Okay then, i used my brain and worked out a solution for this problem. Its a workaround and i don't know if there is another way or better way to solve this - see my code here:
using XLS = Microsoft.Office.Interop.Excel;
/// <summary>
/// Gets the range header caption of a given range.
/// The methode goes through every previous cell of the given range to determine the caption.
/// If the caption can not be determined the method returns an random string.
/// </summary>
/// <param name="selectedRange">The selected range.</param>
/// <returns></returns>
public static string GetRangeHeaderCaption(Range selectedRange, XLS._Application excelApp)
{
// The caption of the range. The default value is a random string
var rangeCaption = ExcelHelper.getRandomString(5);
// Check if the provided range is valid
if (selectedRange != null && excelApp.WorksheetFunction.CountA(selectedRange) > 0)
{
// Get the included cells of the range
var rangeCells = selectedRange.Address.Split(new char[] { ':' });
// Get the beginning cell of the range
var beginCell = rangeCells[0].Trim();
// Get the column and row data of the cell
var cellColumnRow = beginCell.Split(new char[] { '$' });
// Split the beginning cell into the column and the row
var cellColumn = cellColumnRow[1];
var cellRow = Convert.ToInt32(cellColumnRow[2]);
var captionNotFound = true;
int i = 0;
// Go to each previous cell of the provided range
// to determine the caption of the range
do
{
// Check if the next cell would be invalid
var nextCellRow = cellRow - i;
if (nextCellRow == 0)
break;
// Create the cell coordinates to look at
var cellToLook = string.Format("{0}{1}",
cellColumn,
nextCellRow);
// Get the value out of the cell
var cellRangeValue = ((XLS.Worksheet)((XLS.Workbook)excelApp.ActiveWorkbook).ActiveSheet).get_Range(cellToLook).Value;
// ...just to be sure it does not crash
if (cellRangeValue != null)
{
// Convert the determined value to an string
var cellValue = cellRangeValue.ToString();
double value;
// Check if the cell value is not a numeric value
// Check if the cell value is not empty or null
if (!double.TryParse(cellValue, out value) &&
!string.IsNullOrWhiteSpace(cellValue) &&
!string.IsNullOrEmpty(cellValue))
{
// In this case we found the caption
rangeCaption = cellValue;
captionNotFound = false;
}
}
i++;
} while (captionNotFound);
}
return rangeCaption;
}
I want to make a simple program (lottery numbers generator) that takes numbers within a specific range and shuffles them "n" number of times, after each shuffle it selects one random number and moves it from the list of a given range to a new list, and does this for "n" number of times (until it selects specific amount of numbers, 7 to be exact). I have found an algorithm that does exactly that (an extension method or shuffling generic lists). But I'm not that into programming and I have a problem with displaying the results (the list with the drawn numbers) to a TextBox or Label, however I have got it to work with a MessageBox. But with TextBox/Label I get the error "The name * does not exist in current context". I've googled for a solution but no help what so ever.
Here's the code:
private void button1_Click(object sender, EventArgs e)
{
List<int> numbers;
numbers = Enumerable.Range(1, 39).ToList();
numbers.Shuffle();
}
private void brojevi_TextChanged(object sender, EventArgs e)
{
}
}
}
/// <summary>
/// Class for shuffling lists
/// </summary>
/// <typeparam name="T">The type of list to shuffle</typeparam>
public static class ListShufflerExtensionMethods
{
//for getting random values
private static Random _rnd = new Random();
/// <summary>
/// Shuffles the contents of a list
/// </summary>
/// <typeparam name="T">The type of the list to sort</typeparam>
/// <param name="listToShuffle">The list to shuffle</param>
/// <param name="numberOfTimesToShuffle">How many times to shuffle the list, by default this is 5 times</param>
public static void Shuffle<T>(this List<T> listToShuffle, int numberOfTimesToShuffle = 7)
{
//make a new list of the wanted type
List<T> newList = new List<T>();
//for each time we want to shuffle
for (int i = 0; i < numberOfTimesToShuffle; i++)
{
//while there are still items in our list
while (listToShuffle.Count >= 33)
{
//get a random number within the list
int index = _rnd.Next(listToShuffle.Count);
//add the item at that position to the new list
newList.Add(listToShuffle[index]);
//and remove it from the old list
listToShuffle.RemoveAt(index);
}
//then copy all the items back in the old list again
listToShuffle.AddRange(newList);
//display contents of a list
string line = string.Join(",", newList.ToArray());
brojevi.Text = line;
//and clear the new list
//to make ready for next shuffling
newList.Clear();
break;
}
}
}
}
The problem is that brojevi (either TextBox or Label) isn't defined in the scope of the extension method, it is a Control so it should be defined in your Form. So, when you shuffle your numbers, put them in the TextBox during the execution of the button1_Click event handler
Remove the lines:
string line = string.Join(",", newList.ToArray());
brojevi.Text = line;
EDIT:
You could change the extension method like this to return the string of drawn items or list of drawn items. Lets go for the list because you might want to use the numbers for other things. Also, I don't see the point in shuffling 7 times because you will be able to see only last shuffling. Therefore I think that one is enough. Check the code:
public static List<T> Shuffle<T>(this List<T> listToShuffle)
{
//make a new list of the wanted type
List<T> newList = new List<T>();
//while there are still items in our list
while (listToShuffle.Count >= 33)
{
//get a random number within the list
int index = _rnd.Next(listToShuffle.Count);
//add the item at that position to the new list
newList.Add(listToShuffle[index]);
//and remove it from the old list
listToShuffle.RemoveAt(index);
}
//then copy all the items back in the old list again
listToShuffle.AddRange(newList);
return newList;
}
And in button1_Click1 event handler we can have:
List<int> numbers;
numbers = Enumerable.Range(1, 39).ToList();
List<int> drawnNumbers = numbers.Shuffle();
string line = string.Join(",", drawnNumbers.ToArray());
brojevi.Text = line;
ListShufflerExtensionMethods doesn't know about your textbox (brojevi) because it's out of scope. You could restructure and make Shuffle return a value, then set the value of the textbox' text in callers scope.