I have strings like 1,2|3,4 and 1|2,3|4 and need to get the following permutations out of them (as an array/list).
Given 1,2|3,4 need to get 2 strings:
1,2,4
1,3,4
Given 1|2,3|4 need to get 4 strings:
1,3
1,4
2,3
2,4
It is basically splitting on the commas and then if those elements have a pipe create permutations for every pipe delimited sub-element (of the remaining elements). The solution needs to handle the general case of an unknown number of elements with pipes.
Interested in any solution that uses standard C# libraries.
Getting stuck on this one so searching for some thoughts from the community. I can't seem to get past the element with pipes... it's almost like a "look ahead" is needed or something as I need to complete the string with the remaining comma separated elements (of which some may have pipes, which makes me think recursion but still can't wrap my head around it yet).
Ultimately order does not matter. The comma and pipe delimited elements are numbers (stored as strings) and the final string order does not matter so 1,2,4 = 1,4,2
And no, this is not homework. School ended over a decade ago.
We can do this in a fancy way with LINQ. First, we'll need Eric Lippert's CartesianProduct extension method:
static IEnumerable<IEnumerable<T>> CartesianProduct<T>( this IEnumerable<IEnumerable<T>> sequences )
{
IEnumerable<IEnumerable<T>> emptyProduct =
new[] { Enumerable.Empty<T>() };
return sequences.Aggregate(
emptyProduct,
( accumulator, sequence ) =>
from accseq in accumulator
from item in sequence
select accseq.Concat( new[] { item } ) );
}
Then we can simply do:
var a = "1|2,3|4".Split( ',' );
var b = a.Select( x => x.Split( '|' ) );
var res = b.CartesianProduct().Select( x => string.Join( ",", x ) );
And we're done!
I can't think of any edge cases right now, but this works for both of your examples:
public IEnumerable<string> GetPermutation(string pattern)
{
var sets = pattern.Split(',');
var permutations = new[] { new string[] { } };
foreach(var set in sets)
{
permutations = set.Split('|')
.SelectMany(s => permutations.Select(x => x.Concat(new [] { s }).ToArray()))
.ToArray();
}
return permutations.Select(x => string.Join(",", x));
}
Looks like the LINQ solutions won out, at least as far as conciseness is concerned.
Here's my first attempt at the problem with plain C# code.
protected List<string> Results = new List<string>();
void GetPermutations(string s)
{
Results = new List<string>();
string[] values = s.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
GetPermutationsRecursive(String.Empty, values, 0);
}
void GetPermutationsRecursive(string soFar, string[] values, int index)
{
if (index < values.Length)
{
foreach (var y in GetVariations(values[index]))
{
string s = String.Format("{0}{1}{2}", soFar, soFar.Length > 0 ? "," : String.Empty, y);
GetPermutationsRecursive(s, values, index + 1);
}
}
else
{
Results.Add(soFar);
}
}
IEnumerable<string> GetVariations(string s)
{
// Split on every pipe so an element with more than two
// alternatives (e.g. "1|2|3") is handled as well.
foreach (var part in s.Split('|'))
{
yield return part;
}
}
I want to trim all the white-spaces and empty strings only from the starting and ending of an array without converting it into a string in C#.
This is what I've done so far to solve my problem, but I'm looking for a more efficient solution as I don't want to be stuck with a "just works" solution to the problem.
static public string[] Trim(string[] arr)
{
List<string> TrimmedArray = new List<string>(arr);
foreach (string i in TrimmedArray.ToArray())
{
if (String.IsEmpty(i)) TrimmedArray.RemoveAt(TrimmedArray.IndexOf(i));
else break;
}
foreach (string i in TrimmedArray.ToArray().Reverse())
{
if (String.IsEmpty(i)) TrimmedArray.RemoveAt(TrimmedArray.IndexOf(i));
else break;
}
return TrimmedArray.ToArray();
}
NOTE: String.IsEmpty is a custom function which check whether a string was NULL, Empty or just a White-Space.
Your code allocates a lot of new arrays unnecessarily. When you instantiate a list from an array, the list creates a new backing array to store the items, and every time you call ToArray() on the resulting list, you're also allocating yet another copy.
The second problem is with TrimmedArray.RemoveAt(TrimmedArray.IndexOf(i)) - if the array contains the same string value both in the middle and at the end, you might end up removing strings from the middle, because IndexOf() returns the first match.
My advice would be to split the problem into two distinct steps:
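To make the IndexOf() pitfall concrete, here is a minimal sketch of the question's approach. The class and method names are made up for illustration, and string.IsNullOrWhiteSpace stands in for the custom String.IsEmpty helper, which is an assumption about what that helper does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class IndexOfPitfallDemo
{
    // Hypothetical re-creation of the question's Trim, for illustration only.
    public static string[] TrimWithIndexOf(string[] arr)
    {
        var list = new List<string>(arr);

        // Walk a snapshot from the front and drop leading blanks.
        foreach (var item in list.ToArray())
        {
            if (string.IsNullOrWhiteSpace(item)) list.RemoveAt(list.IndexOf(item));
            else break;
        }

        // Walk a snapshot from the back. IndexOf returns the FIRST match,
        // so a blank in the middle can be removed instead of the trailing one.
        foreach (var item in Enumerable.Reverse(list.ToArray()))
        {
            if (string.IsNullOrWhiteSpace(item)) list.RemoveAt(list.IndexOf(item));
            else break;
        }

        return list.ToArray();
    }
}
```

With the input { "a", "", "b", "" } this returns { "a", "b", "" }: the empty string in the middle is removed and the trailing one survives, which is the opposite of what was intended.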
Find both boundary indices (the first and last non-empty strings in the array)
Copy only the relevant middle-section to a new array.
To locate the boundary indices you can use Array.FindIndex() and Array.FindLastIndex():
static public string[] Trim(string[] arr)
{
if(arr == null || arr.Length == 0)
// no need to search through nothing
return Array.Empty<string>();
// define predicate to test for non-empty strings
Predicate<string> IsNotEmpty = s => !String.IsEmpty(s);
var firstIndex = Array.FindIndex(arr, IsNotEmpty);
if(firstIndex < 0)
// nothing to return if it's all whitespace anyway
return Array.Empty<string>();
var lastIndex = Array.FindLastIndex(arr, IsNotEmpty);
// calculate size of the relevant middle-section from the indices
var newArraySize = lastIndex - firstIndex + 1;
// create new array and copy items to it
var results = new string[newArraySize];
Array.Copy(arr, firstIndex, results, 0, newArraySize);
return results;
}
I like the answer by Mathias R. Jessen as it is efficient and clean.
Just thought I'd show how to do it using the List<> as in your original attempt:
static public string[] Trim(string[] arr)
{
List<string> TrimmedArray = new List<string>(arr);
while (TrimmedArray.Count>0 && String.IsEmpty(TrimmedArray[0]))
{
TrimmedArray.RemoveAt(0);
}
while (TrimmedArray.Count>0 && String.IsEmpty(TrimmedArray[TrimmedArray.Count - 1]))
{
TrimmedArray.RemoveAt(TrimmedArray.Count - 1);
}
return TrimmedArray.ToArray();
}
This is not as efficient as the other answer since the internal array within the List<> has to shift all its elements to the left each time an element is deleted from the front.
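If you want to stay with List<>, one way to reduce the shifting is to count the blank items first and remove each run with a single RemoveRange() call. This is only a sketch: the class name is made up, string.IsNullOrWhiteSpace stands in for the custom String.IsEmpty helper, and the input array is assumed to be non-null.

```csharp
using System;
using System.Collections.Generic;

static class ListTrimSketch
{
    public static string[] Trim(string[] arr)
    {
        var list = new List<string>(arr);

        // Count the leading blanks, then remove them in one operation
        // so the remaining items are shifted only once.
        int leading = 0;
        while (leading < list.Count && string.IsNullOrWhiteSpace(list[leading]))
            leading++;
        list.RemoveRange(0, leading);

        // Same idea for the trailing blanks; removing from the end
        // does not shift anything.
        int trailing = 0;
        while (trailing < list.Count && string.IsNullOrWhiteSpace(list[list.Count - 1 - trailing]))
            trailing++;
        list.RemoveRange(list.Count - trailing, trailing);

        return list.ToArray();
    }
}
```

It still copies the array twice (once into the list, once out of it), so the Array.Copy approach remains the leaner option.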
I am using HashSet, LINQ Intersect() and Count() to find the intersection of two lists of strings.
Code being used
private HashSet<string> Words { get; }
public Sentence(IEnumerable<string> words)
{
Words = words.ToHashSet();
}
public int GetSameWordCount(Sentence sentence)
{
return Words.Intersect(sentence.Words).Count();
}
The GetSameWordCount method is taking > 90% of the program's runtime, as there are millions of Sentences to compare with each other.
Is there any faster way to do this?
I am using .net core 3.1.1 / C# 8 so any recent features can be used.
More info:
Input data is coming from text file (e.g. book excerpt, articles from web).
Sentences are then unaccented, lowercased and split to words by whitespace regex.
Short words (<3 length) are ignored.
I am creating groups of sentences which have N words in common and ordering these groups by number of shared words.
The below code will utilize HashSet<T>.Contains method which is more performant. Time complexity of HashSet<T>.Contains is O(1).
public int GetSameWordCount(Sentence sentence)
{
var count = 0;
foreach(var word in sentence.Words)
{
if(Words.Contains(word))
count++;
}
return count;
}
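A small refinement you could try, sketched below, is to always loop over the smaller of the two sets and probe the larger one, so the number of Contains() calls is bounded by the smaller set's size. The class and method names are hypothetical, and the sketch assumes both sets were built with the same comparer.

```csharp
using System;
using System.Collections.Generic;

static class OverlapSketch
{
    // Counts the words present in both sets by iterating the smaller set.
    public static int CountCommon(HashSet<string> a, HashSet<string> b)
    {
        var (small, large) = a.Count <= b.Count ? (a, b) : (b, a);

        int count = 0;
        foreach (var word in small)
        {
            if (large.Contains(word))
                count++;
        }
        return count;
    }
}
```

Whether this helps in practice depends on how different the sentence lengths are, so it is worth measuring on your own data.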
Note
If the list of the words is sorted you can use below approach.
var enumerator1 = set1.GetEnumerator();
var enumerator2 = set2.GetEnumerator();
var count = 0;
if (enumerator1.MoveNext() && enumerator2.MoveNext())
{
while (true)
{
var value = enumerator1.Current.CompareTo(enumerator2.Current);
if (value == 0)
{
count++;
if (!enumerator1.MoveNext() || !enumerator2.MoveNext())
break;
}
else if (value < 0)
{
if (!enumerator1.MoveNext())
break;
}
else
{
if (!enumerator2.MoveNext())
break;
}
}
}
I have a list of strings, each containing a number substring, that I'd like to be reordered based on the numerical value of that substring. The set will look something like this, but much larger:
List<string> strings= new List<string>
{
"some-name-(1).jpg",
"some-name-(5).jpg",
"some-name-(5.1).jpg",
"some-name-(6).jpg",
"some-name-(12).jpg"
};
The number will always be surrounded by parentheses, which are the only parentheses in the string, so using String.IndexOf is reliable. Notice that not only may there be missing numbers, but there can also be decimals, not just integers.
I'm having a really tough time getting a reordered list of those same strings that has been ordered on the numerical value of that substring. Does anyone have a way of doing this, hopefully one that performs well? Thanks.
This will check whether the text between the parentheses can be converted to a double; if not, it will return -1 for that item.
var numbers = strings.Select( x => x.Substring( x.IndexOf( "(" ) + 1,
x.IndexOf( ")" ) - x.IndexOf( "(" ) - 1 ) ).Select( x =>
{
double val;
if( double.TryParse( x, out val ) ) {
return val;
}
// Or whatever you want to do
return -1;
} ).OrderBy( x => x ); // Or use OrderByDescending
If you are sure there will always be a number between the parenthesis, then use this as it is shorter:
var numbers = strings.Select(
x => x.Substring( x.IndexOf( "(" ) + 1, x.IndexOf( ")" ) - x.IndexOf( "(" ) - 1 ) )
.Select( x => double.Parse(x))
.OrderBy( x => x ); // Or use OrderByDescending
EDIT
I need the original strings, just ordered on those numbers.
Basically what you need to do is to pass a key selector to OrderBy and tell it to order by the number:
var items = strings.OrderBy(
x => double.Parse( x.Substring( x.IndexOf( "(" ) + 1,
x.IndexOf( ")" ) - x.IndexOf( "(" ) - 1 ) ));
How about an OO approach?
We are ordering string but we need to treat them like numbers. Wouldn't it be nice if there was a way we can just call OrderBy and it does the ordering for us? Well there is. The OrderBy method will use the IComparable<T> if there is one. Let's create a class to hold our jpg paths and implement the IComparable<T> interface.
public class CustomJpg : IComparable<CustomJpg>
{
public CustomJpg(string path)
{
this.Path = path;
}
public string Path { get; private set; }
private double number = -1;
// You can even make this public if you want.
private double Number
{
get
{
// Let's cache the number for subsequent calls
if (this.number == -1)
{
int myStart = this.Path.IndexOf("(") + 1;
int myEnd = this.Path.IndexOf(")");
string myNumber = this.Path.Substring(myStart, myEnd - myStart);
double myVal;
if (double.TryParse(myNumber, out myVal))
{
this.number = myVal;
}
else
{
throw new ArgumentException(string.Format("{0} has no parenthesis or a number between parenthesis.", this.Path));
}
}
return this.number;
}
}
public int CompareTo(CustomJpg other)
{
if (other == null)
{
return 1;
}
return this.Number.CompareTo(other.Number);
}
}
What is nice about the above approach is if we keep calling OrderBy, it will not have to search for the opening ( and ending ) and doing the parsing of the number every time. It caches it the first time it is called and then keeps using it. The other nice thing is that we can bind to the Path property and also to the Number (we would have to change the access modifier from private). We can even introduce a new property to hold the thumbnail image and bind to that as well. As you can see, this approach is far more flexible, clean and an OO approach. Plus the code for finding the number is in one place so if we switch from () to another symbol, we would just change it in one place. Or we can modify to look for () first and if not found look for another symbol.
Here is the usage:
List<CustomJpg> jpgs = new List<CustomJpg>
{
new CustomJpg("some-name-(1).jpg"),
new CustomJpg("some-name-(5).jpg"),
new CustomJpg("some-name-(5.1).jpg"),
new CustomJpg("some-name-(6).jpg"),
new CustomJpg("some-name-(12).jpg")
};
var ordered = jpgs.OrderBy(x => x).ToList();
You can use this approach for any object.
The example code above returns a list of numbers, ordered numerically. If you want a list of file names ordered by name instead, it is better to pad the numbers with leading zeros, like "some-name-(001).jpg", and then you can simply order the strings:
List<string> strings = new List<string>
{
"some-name-(001).jpg",
"some-name-(005.1).jpg",
"some-name-(005).jpg",
"some-name-(004).jpg",
"some-name-(006).jpg",
"some-name-(012).jpg"
};
var orderedByName = strings.OrderBy(s => s);
You can make the Substring selection easier, if you first cut off the part starting at the closing parenthesis ")". I.e., from "some-name-(5.1).jpg" you first get "some-name-(5.1". Then take the part after "(". This saves a length calculation, as the 2nd Substring automatically takes everything up to the end of string.
strings = strings
.OrderBy(x => Decimal.Parse(
x.Substring(0, x.IndexOf(")"))
.Substring(x.IndexOf("(") + 1)
)
)
.ToList();
This is probably not of great importance here, but generally, decimal stores numbers given in decimal notation more accurately than double. A double may end up storing 17.2 as 17.19999999999999.
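A quick way to see the difference is to compare a sum in both types. The helper names below are made up for this sketch.

```csharp
using System;

static class PrecisionSketch
{
    // double is a binary floating-point type, so values such as 0.1
    // are stored as the nearest representable binary fraction.
    public static bool SumsTo(double a, double b, double expected) => a + b == expected;

    // decimal is a base-10 floating-point type, so 0.1m, 0.2m and 0.3m
    // are all stored exactly.
    public static bool SumsTo(decimal a, decimal b, decimal expected) => a + b == expected;
}
```

On a current .NET runtime, SumsTo(0.1, 0.2, 0.3) is false while SumsTo(0.1m, 0.2m, 0.3m) is true. For sorting file names this rarely matters, since both types preserve the ordering of values like 5 and 5.1.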
I have a list of integers that represent US ZIP codes, and I want to get unique values based on the first three digits of the ZIP code. For example this is my list:
10433
30549
10456
54933
60594
30569
30659
My result should contain only:
10433
30549
54933
60594
30659
The US ZIP codes excluded from my list are 10456 and 30569, because I already have ZIPs that start with 104xx and 305xx.
I really don't know how to get this done. I guess it's not that hard, but I have no idea. I've made a function that saves the unique first three digits, and I've added two random digits at the end of each prefix. But it didn't work out, because I got, for example, 10423, which is not in my list, and there is no specific pattern or range that the last two digits of my numbers follow.
A little Linq should work. If using a list of ints:
var zips = new[] { 10433, 30549, 10456, 54933, 60594, 30569, 30659 };
var results = zips.GroupBy(z => z / 100).Select(g => g.First());
Or if using a list of strings:
var zips = new[] { "10433", "30549", "10456", "54933", "60594", "30569", "30659" };
var results = zips.GroupBy(z => z.Remove(3)).Select(g => g.First());
Another solution would be to use a custom IEqualityComparer<T>. For ints:
class ZipComparer : IEqualityComparer<int> {
public bool Equals(int x, int y) {
return x / 100 == y / 100;
}
public int GetHashCode(int x) {
return x / 100;
}
}
For strings:
class ZipComparer : IEqualityComparer<string> {
public bool Equals(string x, string y) {
return x.Remove(3) == y.Remove(3);
}
public int GetHashCode(string x) {
return x.Remove(3).GetHashCode();
}
}
Then to use it, you can simply call Distinct:
var result = zips.Distinct(new ZipComparer());
Finally, you can also use MoreLINQ's DistinctBy extension method (also available on NuGet):
var results = zips.DistinctBy(z => z / 100);
// or
var results = zips.DistinctBy(z => z.Remove(3));
The popular answer here gives code, but doesn't really solve the problem. The problem is that you don't seem to have an algorithm. So...
How would you solve this problem on paper?
I would imagine the process would be something like this:
For each number, determine the 3 digit prefix
If you don't already have a zip with that prefix, keep the number
If you do already have a zip with that prefix, discard the number
How would you write this in code?
Well, there's a couple things you need:
A bucket to keep track of what prefixes you have found, and which values you've kept.
A loop over all of the items
A way to determine the prefix
Here's one way to write this (you can convert it to using strings instead as an exercise):
ICollection<int> GetUniqueZipcodes(int[] zips)
{
Dictionary<int, int> bucket = new Dictionary<int,int>();
foreach (var zip in zips)
{
int prefix = GetPrefix(zip);
if(!bucket.ContainsKey(prefix))
{
bucket.Add(prefix, zip);
}
}
return bucket.Values;
}
int GetPrefix(int zip)
{
return zip / 100;
}
Getting concise
Now, many programmers these days would say "OMG so many lines of code this could be a one liner". And they're right, it could. Borrowing from p.s.w.g's answer, this can be condensed (in a very readable manner) to:
var results = zips.GroupBy(z => z / 100).Select(g => g.First());
So how does this work? zips.GroupBy(z => z/100) does the bucketing operation. You end up with a collection of groups that looks like this:
{
104: { 10433, 10456 },
305: { 30549, 30569 },
549: { 54933 },
605: { 60594 },
306: { 30659 }
}
Then we use .Select(g => g.First()), which takes the first item from each group.
I just created a simple method to count occurrences of each character within a string, without taking caps into account.
static List<int> charactercount(string input)
{
char[] characters = "abcdefghijklmnopqrstuvwxyz".ToCharArray();
input = input.ToLower();
List<int> counts = new List<int>();
foreach (char c in characters)
{
int count = 0;
foreach (char c2 in input) if (c2 == c)
{
count++;
}
counts.Add(count);
}
return counts;
}
Is there a cleaner way to do this (i.e. without creating a character array to hold every character in the alphabet) that would also take into account numbers, other characters, caps, etc?
Conceptually, I would prefer to return a Dictionary<string,int> of counts. I'll assume that it's OK to know by omission, rather than by an explicit count of 0, that a character occurs zero times. You can do it via LINQ: @Oded's given you a good start on how to do that. All you would need to do is replace the Select() with ToDictionary( k => k.Key, v => v.Count() ). See my comment on his answer about doing the case insensitive grouping. Note: you should decide if you care about cultural differences in characters or not and adjust the ToLower method accordingly.
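Put together, the LINQ version might look roughly like the sketch below. The class and method names are made up, and the choice of char.ToLowerInvariant for the case-insensitive grouping is an assumption; swap in a culture-aware call if that matters for your input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LinqCharCount
{
    // Groups the characters case-insensitively and counts each group.
    // Characters that never occur are simply absent from the dictionary.
    public static Dictionary<string, int> Count(string input)
    {
        return input
            .GroupBy(c => char.ToLowerInvariant(c).ToString())
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

For example, Count("Hello") gives a dictionary with four keys, where "l" maps to 2 and "h", "e" and "o" each map to 1.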
You can also do this without LINQ;
public static Dictionary<string,int> CountCharacters(string input)
{
var counts = new Dictionary<string,int>(StringComparer.OrdinalIgnoreCase);
foreach (var c in input)
{
// The dictionary is keyed by string, so convert each char first
var key = c.ToString();
int count = 0;
if (counts.ContainsKey(key))
{
count = counts[key];
}
counts[key] = count + 1;
}
return counts;
}
Note if you wanted a Dictionary<char,int>, you could easily do that by creating a case invariant character comparer and using that as the IEqualityComparer<T> for a dictionary of the required type. I've used string for simplicity in the example.
Again, adjust the type of the comparer to be consistent with how you want to handle culture.
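For the Dictionary<char,int> variant, a comparer along these lines should work. It is only a sketch: the type names are made up, and folding case with char.ToUpperInvariant is one possible choice among several.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer that treats 'a' and 'A' as the same key.
sealed class CaseInsensitiveCharComparer : IEqualityComparer<char>
{
    public bool Equals(char x, char y) =>
        char.ToUpperInvariant(x) == char.ToUpperInvariant(y);

    // Equal characters must produce equal hash codes, so hash the folded form.
    public int GetHashCode(char c) =>
        char.ToUpperInvariant(c).GetHashCode();
}

static class CharCountSketch
{
    public static Dictionary<char, int> Count(string input)
    {
        var counts = new Dictionary<char, int>(new CaseInsensitiveCharComparer());
        foreach (var c in input)
        {
            // TryGetValue leaves count at 0 when the key is not present yet.
            counts.TryGetValue(c, out var count);
            counts[c] = count + 1;
        }
        return counts;
    }
}
```

Note that the dictionary keeps whichever casing it saw first as the stored key, but lookups such as counts['L'] and counts['l'] return the same value.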
Using GroupBy and Select:
aString.GroupBy(c => c).Select(g => new { Character = g.Key, Num = g.Count() })
The returned anonymous type list will contain each character and the number of times it appears in the string.
You can then filter it in any way you wish, using the static methods defined on Char.
Your code is kind of slow because you are looping through the range a-z instead of just looping through the input.
If you only need to count letters (like your code suggests), the fastest way to do it would be:
int[] CountCharacters(string text)
{
var counts = new int[26];
for (var i = 0; i < text.Length; i++)
{
var charIndex = text[i] - (int)'a';
counts[charIndex] = counts[charIndex] + 1;
}
return counts;
}
Note that you need to add some checks, such as verifying that the character is in the range and converting it to lowercase when needed, or this code might throw exceptions. I'll leave those for you to add. :)
Based on @Ran's answer, but avoiding an IndexOutOfRangeException:
static readonly int differ = 'a';
int[] CountCharacters(string text) {
text = text.ToLower();
var counts = new int[26];
for (var i = 0; i < text.Length; i++) {
var charIndex = text[i] - differ;
// only count chars between 'a' and 'z'; skip everything else:
if(charIndex >= 0 && charIndex < 26)
counts[charIndex] += 1;
}
return counts;
}
Actually, using a Dictionary and/or LINQ is not as efficient as counting chars with a low-level array.