Get unique 3 digit zips from a list C# - c#

I have a list of integers that represent US ZIP codes, and I want to get unique values based on the first three digits of the ZIP code. For example this is my list:
10433
30549
10456
54933
60594
30569
30659
My result should contain only:
10433
30549
54933
60594
30659
The ZIP codes excluded from my list are 10456 and 30569, because I already have ZIPs that match 104xx and 305xx.
I really don't know how to get this done. I guess it's not that hard, but I have no idea. I wrote a function that saves the unique first three digits and appends two random digits to each one, but that didn't work out: I got, for example, 10423, which is not in my list, and the last two digits of my numbers don't follow any specific pattern or range.

A little LINQ should work. If using a list of ints:
var zips = new[] { 10433, 30549, 10456, 54933, 60594, 30569, 30659 };
var results = zips.GroupBy(z => z / 100).Select(g => g.First());
Or if using a list of strings:
var zips = new[] { "10433", "30549", "10456", "54933", "60594", "30569", "30659" };
var results = zips.GroupBy(z => z.Remove(3)).Select(g => g.First());
Another solution would be to use a custom IEqualityComparer<T>. For ints:
class ZipComparer : IEqualityComparer<int> {
public bool Equals(int x, int y) {
return x / 100 == y / 100;
}
public int GetHashCode(int x) {
return x / 100;
}
}
For strings:
class ZipComparer : IEqualityComparer<string> {
public bool Equals(string x, string y) {
return x.Remove(3) == y.Remove(3);
}
public int GetHashCode(string x) {
return x.Remove(3).GetHashCode();
}
}
Then to use it, you can simply call Distinct:
var result = zips.Distinct(new ZipComparer());
Finally, you can also use MoreLINQ's DistinctBy extension method (also available on NuGet):
var results = zips.DistinctBy(z => z / 100);
// or
var results = zips.DistinctBy(z => z.Remove(3));

The popular answer here gives code, but doesn't solve the underlying problem, which is that you don't seem to have an algorithm. So...
How would you solve this problem on paper?
I would imagine the process would be something like this:
For each number, determine the 3 digit prefix
If you don't already have a zip with that prefix, keep the number
If you do already have a zip with that prefix, discard the number
How would you write this in code?
Well, there's a couple things you need:
A bucket to keep track of what prefixes you have found, and which values you've kept.
A loop over all of the items
A way to determine the prefix
Here's one way to write this (you can convert it to using strings instead as an exercise):
ICollection<int> GetUniqueZipcodes(int[] zips)
{
    Dictionary<int, int> bucket = new Dictionary<int, int>();
    foreach (var zip in zips)
    {
        int prefix = GetPrefix(zip);
        if (!bucket.ContainsKey(prefix))
        {
            bucket.Add(prefix, zip);
        }
    }
    return bucket.Values;
}

int GetPrefix(int zip)
{
    return zip / 100;
}
Getting concise
Now, many programmers these days would say "OMG so many lines of code this could be a one liner". And they're right, it could. Borrowing from p.s.w.g's answer, this can be condensed (in a very readable manner) to:
var results = zips.GroupBy(z => z / 100).Select(g => g.First());
So how does this work? zips.GroupBy(z => z/100) does the bucketing operation. You end up with a collection of groups that looks like this:
{
104: { 10433, 10456 },
305: { 30549, 30569 },
549: { 54933 },
605: { 60594 },
306: { 30659 }
}
Then we use .Select(g => g.First()), which takes the first item from each group.
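The paper algorithm described earlier (keep a number only if its prefix has not been seen yet) can also be sketched with a HashSet<int> instead of a dictionary or GroupBy. This is only an illustrative variant, not part of the original answers; the names seenPrefixes and unique are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] zips = { 10433, 30549, 10456, 54933, 60594, 30569, 30659 };

// Prefixes that have already been kept.
var seenPrefixes = new HashSet<int>();

// HashSet<T>.Add returns false when the value is already present,
// so only the first ZIP for each 3-digit prefix passes the filter.
List<int> unique = zips.Where(z => seenPrefixes.Add(z / 100)).ToList();

Console.WriteLine(string.Join(", ", unique));
// prints: 10433, 30549, 54933, 60594, 30659
```

Note that the filter relies on a side effect inside the lambda, so the query should be materialized once (here with ToList) rather than enumerated repeatedly.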

Related

How to order list of strings on number substring using C#

I have a list of strings, each containing a number substring, that I'd like to be reordered based on the numerical value of that substring. The set will look something like this, but much larger:
List<string> strings= new List<string>
{
"some-name-(1).jpg",
"some-name-(5).jpg",
"some-name-(5.1).jpg",
"some-name-(6).jpg",
"some-name-(12).jpg"
};
The number will always be surrounded by parentheses, which are the only parentheses in the string, so using String.IndexOf is reliable. Notice that not only may there be missing numbers, but there can also be decimals, not just integers.
I'm having a really tough time getting a reordered list of those same strings that has been ordered on the numerical value of that substring. Does anyone have a way of doing this, hopefully one that performs well? Thanks.
This checks whether the text between the parentheses can be converted to a double; if it cannot, it returns -1 for that item.
var numbers = strings.Select( x => x.Substring( x.IndexOf( "(" ) + 1,
x.IndexOf( ")" ) - x.IndexOf( "(" ) - 1 ) ).Select( x =>
{
double val;
if( double.TryParse( x, out val ) ) {
return val;
}
// Or whatever you want to do
return -1;
} ).OrderBy( x => x ); // Or use OrderByDescending
If you are sure there will always be a number between the parentheses, then use this instead, as it is shorter:
var numbers = strings.Select(
x => x.Substring( x.IndexOf( "(" ) + 1, x.IndexOf( ")" ) - x.IndexOf( "(" ) - 1 ) )
.Select( x => double.Parse(x))
.OrderBy( x => x ); // Or use OrderByDescending
EDIT
I need the original strings, just ordered on those numbers.
Basically, what you need to do is pass a key selector to OrderBy that tells it to order by the number:
var items = strings.OrderBy(
x => double.Parse( x.Substring( x.IndexOf( "(" ) + 1,
x.IndexOf( ")" ) - x.IndexOf( "(" ) - 1 ) ));
How about an OO approach?
We are ordering strings, but we need to treat them like numbers. Wouldn't it be nice if we could just call OrderBy and have it do the ordering for us? Well, there is a way. The OrderBy method will use IComparable<T> if the key type implements it. Let's create a class to hold our jpg paths and implement the IComparable<T> interface.
public class CustomJpg : IComparable<CustomJpg>
{
public CustomJpg(string path)
{
this.Path = path;
}
public string Path { get; private set; }
private double number = -1;
// You can even make this public if you want.
private double Number
{
get
{
// Let's cache the number for subsequent calls
if (this.number == -1)
{
int myStart = this.Path.IndexOf("(") + 1;
int myEnd = this.Path.IndexOf(")");
string myNumber = this.Path.Substring(myStart, myEnd - myStart);
double myVal;
if (double.TryParse(myNumber, out myVal))
{
this.number = myVal;
}
else
{
throw new ArgumentException(string.Format("{0} has no parenthesis or a number between parenthesis.", this.Path));
}
}
return this.number;
}
}
public int CompareTo(CustomJpg other)
{
if (other == null)
{
return 1;
}
return this.Number.CompareTo(other.Number);
}
}
What is nice about this approach is that repeated calls to OrderBy do not have to search for the opening ( and closing ) and parse the number every time: the value is cached the first time it is read and reused afterwards. The other nice thing is that we can bind to the Path property and also to Number (we would have to change its access modifier from private). We can even introduce a new property to hold the thumbnail image and bind to that as well. As you can see, this approach is more flexible, cleaner, and more object-oriented. Plus, the code for finding the number is in one place, so if we switch from () to another symbol, we only change it in one place. Or we can modify it to look for () first and, if that is not found, look for another symbol.
Here is the usage:
List<CustomJpg> jpgs = new List<CustomJpg>
{
new CustomJpg("some-name-(1).jpg"),
new CustomJpg("some-name-(5).jpg"),
new CustomJpg("some-name-(5.1).jpg"),
new CustomJpg("some-name-(6).jpg"),
new CustomJpg("some-name-(12).jpg")
};
var ordered = jpgs.OrderBy(x => x).ToList();
You can use this approach for any object.
The code above returns a list of numbers ordered numerically. If you want the file names themselves ordered by name, it is easier to pad the numbers with leading zeros, like "some-name-(001).jpg", and then simply sort the strings:
List<string> strings = new List<string>
{
"some-name-(001).jpg",
"some-name-(005.1).jpg",
"some-name-(005).jpg",
"some-name-(004).jpg",
"some-name-(006).jpg",
"some-name-(012).jpg"
};
var orderedByName = strings.OrderBy(s => s);
You can make the Substring selection easier, if you first cut off the part starting at the closing parenthesis ")". I.e., from "some-name-(5.1).jpg" you first get "some-name-(5.1". Then take the part after "(". This saves a length calculation, as the 2nd Substring automatically takes everything up to the end of string.
strings = strings
.OrderBy(x => Decimal.Parse(
x.Substring(0, x.IndexOf(")"))
.Substring(x.IndexOf("(") + 1)
)
)
.ToList();
This is probably not of great importance here, but generally, decimal stores numbers given in decimal notation more accurately than double. A double may store 17.2 as approximately 17.199999999999999.
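One thing the snippets above leave implicit is that double.Parse and decimal.Parse use the current culture by default, so "5.1" may be read differently on a machine whose decimal separator is a comma. A minimal sketch of ordering the original strings with the invariant culture follows; the helper name NumberInParentheses is made up for this example, and it assumes every string contains exactly one well-formed "(number)" part.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

var strings = new List<string>
{
    "some-name-(12).jpg",
    "some-name-(5.1).jpg",
    "some-name-(1).jpg",
    "some-name-(6).jpg",
    "some-name-(5).jpg"
};

// Hypothetical helper: extracts the text between "(" and ")" and parses it
// with the invariant culture, so "." is always the decimal separator.
decimal NumberInParentheses(string s)
{
    int open = s.IndexOf('(');
    int close = s.IndexOf(')');
    string digits = s.Substring(open + 1, close - open - 1);
    return decimal.Parse(digits, CultureInfo.InvariantCulture);
}

// Order the original strings by the parsed number.
List<string> ordered = strings.OrderBy(s => NumberInParentheses(s)).ToList();

foreach (string s in ordered)
    Console.WriteLine(s);
// prints the names in the order (1), (5), (5.1), (6), (12)
```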

C# Permutations of String

I have strings like 1,2|3,4 and 1|2,3|4 and need to get the following permutations out of them (as an array/list).
Given 1,2|3,4 need to get 2 strings:
1,2,4
1,3,4
Given 1|2,3|4 need to get 4 strings:
1,3
1,4
2,3
2,4
It is basically splitting on the commas and then if those elements have a pipe create permutations for every pipe delimited sub-element (of the remaining elements). The solution needs to handle the general case of an unknown number of elements with pipes.
Interested in any solution that uses standard C# libraries.
Getting stuck on this one, so I'm searching for some thoughts from the community. I can't seem to get past the element with pipes... it's almost like a "look ahead" is needed or something, as I need to complete the string with the remaining comma-separated elements (some of which may have pipes, which makes me think recursion, but I still can't wrap my head around it yet).
Ultimately, order does not matter. The comma- and pipe-delimited elements are numbers (stored as strings), and the final string order does not matter, so 1,2,4 = 1,4,2
And no, this is not homework. School ended over a decade ago.
We can do this in a fancy way with LINQ. First, we'll need Eric Lippert's CartesianProduct extension method:
static IEnumerable<IEnumerable<T>> CartesianProduct<T>( this IEnumerable<IEnumerable<T>> sequences )
{
IEnumerable<IEnumerable<T>> emptyProduct =
new[] { Enumerable.Empty<T>() };
return sequences.Aggregate(
emptyProduct,
( accumulator, sequence ) =>
from accseq in accumulator
from item in sequence
select accseq.Concat( new[] { item } ) );
}
Then we can simply do:
var a = "1|2,3|4".Split( ',' );
var b = a.Select( x => x.Split( '|' ) );
var res = b.CartesianProduct().Select( x => string.Join( ",", x ) );
And we're done!
I couldn't think of any edge cases now, but this works for both of your examples :
public IEnumerable<string> GetPermutation(string pattern)
{
var sets = pattern.Split(',');
var permutations = new[] { new string[] { } };
foreach(var set in sets)
{
permutations = set.Split('|')
.SelectMany(s => permutations.Select(x => x.Concat(new [] { s }).ToArray()))
.ToArray();
}
return permutations.Select(x => string.Join(",", x));
}
Looks like the LINQ solutions won out, at least as far as conciseness is concerned.
Here's my first attempt at the problem with plain C# code.
protected List<string> Results = new List<string>();
void GetPermutations(string s)
{
Results = new List<string>();
string[] values = s.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
GetPermutationsRecursive(String.Empty, values, 0);
}
void GetPermutationsRecursive(string soFar, string[] values, int index)
{
if (index < values.Length)
{
foreach (var y in GetVariations(values[index]))
{
string s = String.Format("{0}{1}{2}", soFar, soFar.Length > 0 ? "," : String.Empty, y);
GetPermutationsRecursive(s, values, index + 1);
}
}
else
{
Results.Add(soFar);
}
}
IEnumerable<string> GetVariations(string s)
{
    // Split handles any number of pipes, e.g. "1|2|3" yields 1, 2 and 3;
    // a string without a pipe yields just itself.
    return s.Split('|');
}

Faster way to count number of sets an item appears in?

I've got a list of bookmarks. Each bookmark has a list of keywords (stored as a HashSet). I also have a set of all possible keywords ("universe").
I want to find the keyword that appears in the most bookmarks.
I have 1356 bookmarks with a combined total of 698,539 keywords, with 187,358 unique.
If I iterate through every keyword in the universe and count the number of bookmarks it appears in, I'm doing 254,057,448 checks. This takes 35 seconds on my machine.
The algorithm is pretty simple:
var biggest = universe.MaxBy(kw => bookmarks.Count(bm => bm.Keywords.Contains(kw)));
Using Jon Skeet's MaxBy.
I'm not sure it's possible to speed this up much, but is there anything I can do? Perhaps parallelize it somehow?
dtb's solution takes under 200 ms to both build the universe and find the biggest element. So simple.
var freq = new FreqDict();
foreach(var bm in bookmarks) {
freq.Add(bm.Keywords);
}
var biggest2 = freq.MaxBy(kvp => kvp.Value);
FreqDict is just a little class I made built on top of a Dictionary<string,int>.
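FreqDict itself is not shown above. As a rough idea of what such a frequency counter might look like, here is a minimal sketch built directly on a Dictionary<string, int>. The sample data, the AddAll helper, and the variable names are all hypothetical, and this is not the author's actual class.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data standing in for bookmark keyword sets.
var bookmarks = new List<HashSet<string>>
{
    new HashSet<string> { "linq", "csharp", "perf" },
    new HashSet<string> { "csharp", "sets" },
    new HashSet<string> { "csharp", "perf" }
};

// keyword -> number of bookmarks containing it
var freq = new Dictionary<string, int>();

// Hypothetical stand-in for FreqDict.Add: bump the count of every keyword.
void AddAll(IEnumerable<string> keywords)
{
    foreach (string kw in keywords)
    {
        freq.TryGetValue(kw, out int count); // count is 0 when kw is new
        freq[kw] = count + 1;
    }
}

foreach (var keywords in bookmarks)
    AddAll(keywords);

// One pass over the counts to find the most frequent keyword.
string biggest = "";
int best = 0;
foreach (var kvp in freq)
{
    if (kvp.Value > best)
    {
        best = kvp.Value;
        biggest = kvp.Key;
    }
}

Console.WriteLine($"{biggest}: {best}"); // prints: csharp: 3
```

Each keyword occurrence is touched once, so the work is proportional to the combined number of keywords rather than to the universe size times the number of bookmarks.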
You can get all keywords, group them, and get the biggest group. This uses more memory, but should be faster.
I tried this, and in my test it was about 80 times faster:
string biggest =
bookmarks
.SelectMany(m => m.Keywords)
.GroupBy(k => k)
.OrderByDescending(g => g.Count())
.First()
.Key;
Test run:
1536 bookmarks
153600 keywords
74245 unique keywords
Original:
12098 ms.
biggest = "18541"
New:
148 ms.
biggest = "18541"
I don't have your sample data nor have I done any benchmarking, but I'll take a stab. One problem that could be improved upon is that most of the bm.Keywords.Contains(kw) checks are misses, and I think those can be avoided. The most constraining is the set of keywords any one given bookmark has (ie: it will typically be much smaller than universe) so we should start in that direction instead of the other way.
I'm thinking something along these lines. The memory requirement is much higher and since I haven't benchmarked anything, it could be slower, or not helpful, but I'll just delete my answer if it doesn't work out for you.
Dictionary<string, int> keywordCounts = new Dictionary<string, int>(universe.Length);
foreach (var keyword in universe)
{
keywordCounts.Add(keyword, 0);
}
foreach (var bookmark in bookmarks)
{
foreach (var keyword in bookmark.Keywords)
{
keywordCounts[keyword] += 1;
}
}
var mostCommonKeyword = keywordCounts.MaxBy(x => x.Value).Key;
You don't need to iterate through whole universe. Idea is to create a lookup and track max.
public Keyword GetMaxKeyword(IEnumerable<Bookmark> bookmarks)
{
int max = 0;
Keyword maxkw = null;
Dictionary<Keyword, int> lookup = new Dictionary<Keyword, int>();
foreach (var item in bookmarks)
{
foreach (var kw in item.Keywords)
{
int val = 1;
if (lookup.ContainsKey(kw))
{
val = ++lookup[kw];
}
else
{
lookup.Add(kw, 1);
}
if (max < val)
{
max = val;
maxkw = kw;
}
}
}
return maxkw;
}
50ms in python:
>>> import random
>>> universe = set()
>>> bookmarks = []
>>> for i in range(1356):
... bookmark = []
... for j in range(698539//1356):
... key_word = random.randint(1000, 1000000000)
... universe.add(key_word)
... bookmark.append(key_word)
... bookmarks.append(bookmark)
...
>>> key_word_count = {}
>>> for bookmark in bookmarks:
... for key_word in bookmark:
... key_word_count[key_word] = key_word_count.get(key_word, 0) + 1
...
>>> print(max(key_word_count, key=key_word_count.__getitem__))
408530590
>>> print(key_word_count[408530590])
3
>>>

Whats the most concise way to pick a random element by weight in c#?

Let's assume a List<Element>, where Element is:
public class Element {
public int Weight { get; set; }
}
What I want to achieve is to select an element randomly, by weight.
For example:
Element_1.Weight = 100;
Element_2.Weight = 50;
Element_3.Weight = 200;
So
the chance Element_1 got selected is 100/(100+50+200)=28.57%
the chance Element_2 got selected is 50/(100+50+200)=14.29%
the chance Element_3 got selected is 200/(100+50+200)=57.14%
I know I can create a loop, calculate total, etc...
What I want to learn is: what's the best way to do this with LINQ in ONE line (or as short as possible)? Thanks.
UPDATE
I found my answer below. The first thing I learned is that LINQ is NOT magic; it's slower than a well-designed loop.
So my question becomes: how do I find a random element by weight (without the "as short as possible" requirement :)
If you want a generic version (useful with a (singleton) randomizer helper; consider whether you need a constant seed or not):
usage:
randomizer.GetRandomItem(items, x => x.Weight)
code:
public T GetRandomItem<T>(IEnumerable<T> itemsEnumerable, Func<T, int> weightKey)
{
var items = itemsEnumerable.ToList();
var totalWeight = items.Sum(x => weightKey(x));
var randomWeightedIndex = _random.Next(totalWeight);
var itemWeightedIndex = 0;
foreach(var item in items)
{
itemWeightedIndex += weightKey(item);
if(randomWeightedIndex < itemWeightedIndex)
return item;
}
throw new ArgumentException("Collection count and weights must be greater than 0");
}
// assuming rnd is an already instantiated instance of the Random class
var max = list.Sum(y => y.Weight);
var rand = rnd.Next(max);
var res = list
.FirstOrDefault(x => rand >= (max -= x.Weight));
This is a fast solution with precomputation. The precomputation takes O(n), the search O(log(n)).
Precompute:
int[] lookup=new int[elements.Length];
lookup[0]=elements[0].Weight-1;
for(int i=1;i<lookup.Length;i++)
{
lookup[i]=lookup[i-1]+elements[i].Weight;
}
To generate:
int total=lookup[lookup.Length-1]+1; // lookup stores (running sum - 1), so add 1 back
int chosen=random.Next(total);
int index=Array.BinarySearch(lookup,chosen);
if(index<0)
index=~index;
return elements[index];
But if the list changes between each search, you can instead use a simple O(n) linear search:
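int total=elements.Sum(e=>e.Weight);
int chosen=random.Next(total);
int runningSum=0;
foreach(var element in elements)
{
runningSum+=element.Weight;
if(chosen<runningSum)
return element;
}
// Only reachable when elements is empty or all weights are 0
throw new InvalidOperationException("No element with a positive weight.");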
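To make the precomputed lookup from this answer easier to check, here is a small self-contained sketch using the weights from the question (100, 50, 200). The PickIndex helper is a hypothetical name introduced for the example; it takes the roll as a parameter so the boundaries can be verified without randomness. It assumes all weights are positive.

```csharp
using System;

int[] weights = { 100, 50, 200 };

// Precompute: lookup[i] is the largest roll that still selects element i.
int[] lookup = new int[weights.Length];
lookup[0] = weights[0] - 1;
for (int i = 1; i < lookup.Length; i++)
    lookup[i] = lookup[i - 1] + weights[i];
// lookup is now { 99, 149, 349 }

// Hypothetical helper: maps a roll in [0, total) to an element index.
int PickIndex(int roll)
{
    int index = Array.BinarySearch(lookup, roll);
    // A negative result is the bitwise complement of the index of the
    // first entry greater than roll, which is the element we want.
    return index < 0 ? ~index : index;
}

int total = lookup[lookup.Length - 1] + 1; // 350, the sum of all weights
var random = new Random();
int chosen = PickIndex(random.Next(total));
Console.WriteLine($"Picked element {chosen} (weight {weights[chosen]})");
```

Rolls 0 to 99 map to the first element, 100 to 149 to the second, and 150 to 349 to the third, which matches the 100 : 50 : 200 ratio.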
This could work:
int weightsSum = list.Sum(element => element.Weight);
int start = 1;
var partitions = list.Select(element =>
{
var oldStart = start;
start += element.Weight;
return new { Element = element, End = oldStart + element.Weight - 1};
});
var randomWeight = random.Next(weightsSum);
var randomElement = partitions.First(partition => partition.End > randomWeight).Element;
Basically, for each element a partition is created with an end weight.
In your example, Element_1 would be associated with (1-->100), Element_2 with (101-->150), and so on.
Then a random weight sum is calculated and we look for the range which is associated to it.
You could also compute the sum in the method group but that would introduce another side effect...
Note that I'm not saying this is elegant or fast. But it does use linq (not in one line...)
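.NET 6 introduced MaxBy, which makes a one-liner possible. Give each element a random key of u^(1/Weight), with u drawn uniformly from [0, 1), and take the element with the largest key:
list.MaxBy(x => Math.Pow(rng.NextDouble(), 1.0 / x.Weight));
Note that the simpler-looking list.MaxBy(x => rng.Next(x.Weight)) does not pick elements in proportion to their weights, and its integer keys also produce ties. The version above assumes rng is a Random instance and every Weight is greater than 0.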

C#: How to use the Enumerable.Aggregate method

Let's say I have this amputated Person class:
class Person
{
public int Age { get; set; }
public string Country { get; set; }
public int SOReputation { get; set; }
public TimeSpan TimeSpentOnSO { get; set; }
...
}
I can then group on Age and Country like this:
var groups = aListOfPeople.GroupBy(x => new { x.Country, x.Age });
Then I can output all the groups with their reputation totals like this:
foreach(var g in groups)
Console.WriteLine("{0}, {1}:{2}",
g.Key.Country,
g.Key.Age,
g.Sum(x => x.SOReputation));
My question is, how can I get a sum of the TimeSpentOnSO property? The Sum method won't work in this case since it is only for int and such. I thought I could use the Aggregate method, but I just seriously can't figure out how to use it... I'm trying all kinds of properties and types in various combinations, but the compiler just won't recognize it.
foreach(var g in groups)
Console.WriteLine("{0}, {1}:{2}",
g.Key.Country,
g.Key.Age,
g.Aggregate( what goes here?? ));
Have I completely misunderstood the Aggregate method? Or what is going on? Is it some other method I should use instead? Or do I have to write my own Sum variant for TimeSpans?
And to add to the mess, what if Person is an anonymous class, a result from for example a Select or a GroupJoin statement?
Just figured out that I could make the Aggregate method work if I did a Select on the TimeSpan property first... but I find that kind of annoying... Still don't feel I understand this method at all...
foreach(var g in groups)
Console.WriteLine("{0}, {1}:{2}",
g.Key.Country,
g.Key.Age,
g.Select(x => x.TimeSpentOnSO)
.Aggregate((sum, x) => sum + x));
List<TimeSpan> list = new List<TimeSpan>
{
new TimeSpan(1),
new TimeSpan(2),
new TimeSpan(3)
};
TimeSpan total = list.Aggregate(TimeSpan.Zero, (sum, value) => sum.Add(value));
Debug.Assert(total.Ticks == 6);
g.Aggregate(TimeSpan.Zero, (i, p) => i + p.TimeSpentOnSO)
Basically, the first argument to Aggregate is an initializer, which is used as the first value of "i" in the function passed in the second argument. It'll iterate over the list, and each time, "i" will contain the total so far.
For example:
List<int> nums = new List<int>{1,2,3,4,5};
nums.Aggregate(0, (x,y) => x + y); // sums up the numbers, starting with 0 => 15
nums.Aggregate(0, (x,y) => x * y); // multiplies the numbers, starting with 0 => 0, because anything multiplied by 0 is 0
nums.Aggregate(1, (x,y) => x * y); // multiplies the numbers, starting with 1 => 120
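The same seeded Aggregate call can be applied to each group from the question. The sketch below uses made-up sample data and anonymous types (the question also asks about that case); the property names follow the Person class, but the values and the totals variable are only for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical sample data; anonymous types work the same as a Person class.
var people = new[]
{
    new { Country = "NO", Age = 30, TimeSpentOnSO = TimeSpan.FromHours(2) },
    new { Country = "NO", Age = 30, TimeSpentOnSO = TimeSpan.FromHours(3) },
    new { Country = "SE", Age = 25, TimeSpentOnSO = TimeSpan.FromMinutes(90) }
};

var totals = people
    .GroupBy(p => new { p.Country, p.Age })
    .Select(g => new
    {
        g.Key.Country,
        g.Key.Age,
        // Seed with TimeSpan.Zero, then add each person's time to the running total.
        Total = g.Aggregate(TimeSpan.Zero, (sum, p) => sum + p.TimeSpentOnSO)
    })
    .ToList();

foreach (var t in totals)
    Console.WriteLine("{0}, {1}: {2}", t.Country, t.Age, t.Total);
// prints:
// NO, 30: 05:00:00
// SE, 25: 01:30:00
```

Because the seed fixes the accumulator type to TimeSpan, the lambda can take the whole element as its second parameter, and no separate Select is needed.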
A combination of Chris's and Daniel's answers solved it for me. I needed to initialize the TimeSpan, and I did things in the wrong order. The solution is:
foreach(var g in groups)
Console.WriteLine("{0}, {1}:{2}",
g.Key.Country,
g.Key.Age,
g.Aggregate(TimeSpan.Zero, (sum, x) => sum + x.TimeSpentOnSO));
Thanks!
And also... D'oh!
You could write a TimeSpan Sum method...
public static TimeSpan Sum(this IEnumerable<TimeSpan> times)
{
return TimeSpan.FromTicks(times.Sum(t => t.Ticks));
}
public static TimeSpan Sum<TSource>(this IEnumerable<TSource> source,
Func<TSource, TimeSpan> selector)
{
return TimeSpan.FromTicks(source.Sum(t => selector(t).Ticks));
}
Alternatively, MiscUtil has generic-enabled Sum methods, so Sum should work on a TimeSpan just fine (since there is a TimeSpan+TimeSpan=>TimeSpan operator defined).
Just please don't tell me the number... it would scare me...
You could sum one of the Total properties of the TimeSpan. For instance, you could get the total hours spent on SO like this:
g.Sum(x => x.TimeSpentOnSO.TotalHours)
I believe this would give you the result you're looking for, with the caveat that you'd have to pick the unit of measure you need (hours, minutes, seconds, days, etc.).
