LINQ to SQL: remove URLs that start with the same domain - C#

I have a list of URLs in a DataTable. I want to remove the rows whose URL has the same domain as an earlier row. Right now I have this code:
List<int> toRemove = new List<int>();
toRemove.Clear();
string initialDomain;
string compareDomainName;
for(int i = 0; i<UrlList.Rows.Count -1; i++)
{
if (toRemove.Contains(i))
continue;
initialDomain = new Uri(UrlList.Rows[i][0] as String).Host;
for(int j = i + 1; j < UrlList.Rows.Count; j++)
{
compareDomainName = new Uri(UrlList.Rows[j][0] as String).Host;
if (String.Compare(initialDomain, compareDomainName, true) == 0)
{
toRemove.Add(j);
}
}
percent = i * 100 / total;
if (percent > lastPercent)
{
progress.EditValue = percent;
Application.DoEvents();
lastPercent = percent;
}
}
for(int i = toRemove.Count-1; i>=0; i--)
{
UrlList.Rows.RemoveAt(toRemove[i]);
}
It works well for a small amount of data, but when I load a long list of URLs it is very slow. I would like to move to LINQ, but I do not know how to implement this with LINQ. Any help?
Update:
I do not need to remove exact duplicate rows; I already know how to do that. My problem is different.
I have a simple list of URLs:
http://centroid.steven.centricagency.com/forms/contact-us?page=1544
http://chirp.wildcenter.org/poll
http://itdiscover.com/links/
http://itdiscover.com/links/?page=132
http://itdiscover.com/links/?page=2
http://itdiscover.com/links/?page=3
http://itdiscover.com/links/?page=4
http://itdiscover.com/links/?page=6
http://itdiscover.com/links/?page=8
http://www.foreignpolicy.com/articles/2010/06/21/la_vie_en
http://www.foreignpolicy.com/articles/2010/06/21/the_worst_of_the_worst
http://www.foreignpolicy.com/articles/2011/04/25/think_again_dictators
http://www.foreignpolicy.com/articles/2011/08/22/the_dictators_survival_guide
http://www.gsioutdoors.com/activities/pdp/glacier_ss_nesting_wine_glass/gourmet_backpacking/
http://www.gsioutdoors.com/products/pdp/telescoping_foon_orange/
http://www.gsioutdoors.com/products/pdp/telescoping_spoon_blue/
Now I want this list:
http://centroid.steven.centricagency.com/forms/contact-us?page=1544
http://chirp.wildcenter.org/poll
http://itdiscover.com/links/
http://www.foreignpolicy.com/articles/2010/06/21/la_vie_en
http://www.gsioutdoors.com/activities/pdp/glacier_ss_nesting_wine_glass/gourmet_backpacking/

var result = urls.Distinct(new UrlComparer());
public class UrlComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
return new Uri(x).Host == new Uri(y).Host;
}
public int GetHashCode(string obj)
{
return new Uri(obj).Host.GetHashCode();
}
}
You can also implement a DistinctBy extension method:
public static partial class MyExtensions
{
public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> source, Func<T, TKey> keySelector)
{
HashSet<TKey> knownKeys = new HashSet<TKey>();
return source.Where(x => knownKeys.Add(keySelector(x)));
}
}
var result = urls.DistinctBy(url => new Uri(url).Host);

Try to use this:
IEnumerable<string> DeleteDuplicates(IEnumerable<string> source)
{
var hosts = new HashSet<string>();
foreach (var s in source)
{
var host = new Uri(s).Host.ToLower();
if (hosts.Contains(host))
continue;
hosts.Add(host);
yield return s;
}
}

Implement this function to remove the duplicate rows. It assumes the table has a string "Host" column and an integer "ID" column:
public DataTable FilterURLS(DataTable urllist)
{
return
(from urlrow in urllist.Rows.OfType<DataRow>()
group urlrow by urlrow.Field<string>("Host") into g
select g
.OrderBy(r => r.Field<int>("ID"))
.First()).CopyToDataTable();
}
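The table in the question appears to hold only the URL itself in its first column, so here is a minimal sketch that works directly on that layout. The class and method names (UrlTableFilter, RemoveRepeatedHosts) are hypothetical, and it assumes every row holds an absolute URL string in column 0. It makes one pass with a HashSet instead of comparing every pair of rows.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class UrlTableFilter
{
    // Hypothetical helper: keeps the first row seen for each host and removes the rest.
    // Assumption: column 0 of every row contains an absolute URL string.
    public static void RemoveRepeatedHosts(DataTable urlList)
    {
        var seenHosts = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var toRemove = new List<DataRow>();

        foreach (DataRow row in urlList.Rows)
        {
            string host = new Uri((string)row[0]).Host;
            // HashSet.Add returns false when the host was already present.
            if (!seenHosts.Add(host))
                toRemove.Add(row);
        }

        // Remove only after the scan, so Rows is not modified while it is enumerated.
        foreach (DataRow row in toRemove)
            urlList.Rows.Remove(row);
    }
}
```

Removing by DataRow reference rather than by index avoids the index shifting that RemoveAt causes when the collected indexes are not in ascending order.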


What is the best, more practical way to write such an entry with nested dictionaries? Which design pattern to use? C#

There is a lot of data in the database, and I need to produce statistics (the average number of each operation per day for each user of the application) using C# collections. In my opinion, it is necessary to use dictionaries:
var dict = new Dictionary<long?, Dictionary<DateTime, Dictionary<OperationsGroupType, int>>>();
Please advise a more practical way to write this, as it looks strange. Thank you.
I wrote a function:
public void D()
{
var dict = new Dictionary<long?, Dictionary<DateTime, Dictionary<OperationsGroupType, int>>>();
int pageNumber = 0;
int pageSize = 5;
int pageCount = 1;
while (pageNumber < pageCount)
{
int count;
foreach (OperationData op in OperationService.GetPage(pageNumber, pageSize, out count))
if(op.PerformedBy.HasValue)
if(op.PerformedDate.HasValue)
if (dict.ContainsKey(op.PerformedBy))
if (dict[op.PerformedBy].ContainsKey(op.PerformedDate.Value.Date.Date))
if (dict[op.PerformedBy][op.PerformedDate.Value.Date.Date.Date.Date].ContainsKey(op.Type)) dict[op.PerformedBy][op.PerformedDate.Value.Date.Date.Date.Date][op.Type]++;
else dict[op.PerformedBy][op.PerformedDate.Value.Date.Date.Date.Date].Add(op.Type, 1);
else dict[op.PerformedBy].Add(op.PerformedDate.Value.Date.Date.Date.Date, new Dictionary<OperationsGroupType, int> { { op.Type, 1 } });
else dict.Add(op.PerformedBy, new Dictionary<DateTime, Dictionary<OperationsGroupType, int>> { { op.PerformedDate.Value.Date.Date.Date.Date, new Dictionary<OperationsGroupType, int> { { op.Type, 1 } } } });
pageCount = (count - 1) / pageSize + 1;
pageNumber++;
}
foreach (var item in dict)
{
var opDateDict = new Dictionary<DateTime, int>();
foreach (var operDate in item.Value) opDateDict.Add(operDate.Key, operDate.Value.Sum(count => count.Value));
SystemLogger.Instance.WriteErrorTrace(String.Format("Average number of user operations {0} per day: {1}\n", item.Key, opDateDict.Values.Sum() / opDateDict.Count));
}
}
OperationsGroupType is an enum.
Please tell me how to replace the dictionary with a more practical design?
Which pattern is best for solving this problem?
It's terribly difficult to say what's best or most practical - and that's because you didn't really define what you mean by "best" or "practical".
I'm going to define them as minimal code and minimal repetition.
To start with I created these extension methods:
public static class Ex
{
// Returns the value stored under key, creating it with new R() if it is missing.
public static R Ensure<T, R>(this Dictionary<T, R> @this, T key) where R : new()
{
if (@this.ContainsKey(key))
return @this[key];
else
{
var r = new R();
@this[key] = r;
return r;
}
}
// Same as above, but the missing value is produced by the supplied factory.
public static R Ensure<T, R>(this Dictionary<T, R> @this, T key, Func<R> factory)
{
if (@this.ContainsKey(key))
return @this[key];
else
{
var r = factory();
@this[key] = r;
return r;
}
}
}
With those I can rewrite your code like this:
foreach (OperationData op in OperationService.GetPage(pageNumber, pageSize, out count))
{
if (op.PerformedBy.HasValue)
if (op.PerformedDate.HasValue)
{
dict.Ensure(op.PerformedBy).Ensure(op.PerformedDate.Value.Date).Ensure(op.Type, () => 0);
dict[op.PerformedBy][op.PerformedDate.Value.Date][op.Type]++;
}
}
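If the nested dictionary is only an intermediate step towards the per-user average, another option is to let LINQ do the grouping. The following is a sketch under assumptions, not a drop-in replacement: OperationData and OperationsGroupType are minimal stand-ins for the types in the question (the enum members are invented), OperationStats and AveragePerDay are hypothetical names, and the paged results are assumed to be available as one sequence. It returns a double, whereas the code in the question uses integer division.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the enum in the question; the member names are placeholders.
public enum OperationsGroupType { Create, Update, Delete }

// Minimal stand-in for the OperationData type in the question.
public class OperationData
{
    public long? PerformedBy { get; set; }
    public DateTime? PerformedDate { get; set; }
    public OperationsGroupType Type { get; set; }
}

public static class OperationStats
{
    // For each user, the average number of operations per day on which
    // that user performed at least one operation.
    public static Dictionary<long, double> AveragePerDay(IEnumerable<OperationData> ops)
    {
        return ops
            .Where(op => op.PerformedBy.HasValue && op.PerformedDate.HasValue)
            .GroupBy(op => op.PerformedBy.Value)
            .ToDictionary(
                byUser => byUser.Key,
                byUser => byUser
                    .GroupBy(op => op.PerformedDate.Value.Date)
                    .Average(day => day.Count()));
    }
}
```

GroupBy buffers the whole sequence, so this trades memory for brevity; for very large tables the paged dictionary approach may still be preferable.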

Sort Array on Value Difference

I have an array, for example:
string[] stArr= new string[5] { "1#3", "19#24", "10#12", "13#18", "20#21" };
I want to sort this array on the difference between the two numbers in each element:
3-1=2;
24-19=5;
12-10=2;
18-13=5;
21-20=1;
and the sorted result should be:
string[] stArr= new string[5] { "20#21", "1#3", "10#12", "13#18", "19#24" };
I have to find a solution that covers all possible cases:
1. The length of the array (the number of elements) is not fixed.
2. In each x#y element, y is always greater than x.
3. I cannot use a List.
You can use LINQ:
var sorted = stArr.OrderBy(s => s.Split('#')
.Select(n => Int32.Parse(n))
.Reverse()
.Aggregate((first,second) => first - second));
For your case:
stArr = stArr.OrderBy(s => s.Split('#')
.Select(n => Int32.Parse(n))
.Reverse()
.Aggregate((first,second) => first - second)).ToArray();
Try this:
string[] stArr = new string[5] { "1#3", "19#24", "10#12", "13#18", "20#21" };
Array.Sort(stArr, new Comparison<string>(compare));
int compare(string z, string t)
{
var xarr = z.Split('#');
var yarr = t.Split('#');
var x1 = int.Parse(xarr[0]);
var y1 = int.Parse(xarr[1]);
var x2 = int.Parse(yarr[0]);
var y2 = int.Parse(yarr[1]);
return (y1 - x1).CompareTo(y2 - x2);
}
Solving this problem is identical to solving any other sorting problem where the order is to be specified by your code - you have to write a custom comparison method, and pass it to the built-in sorter.
In your situation, it means writing something like this:
private static int FindDiff(string s) {
// Split the string at #
// Parse both sides as int
// return rightSide-leftSide
}
private static int CompareDiff(string a, string b) {
return FindDiff(a).CompareTo(FindDiff(b));
}
public static void Main() {
... // Prepare your array
string[] stArr = ...
Array.Sort(stArr, CompareDiff);
}
This approach uses Array.Sort overload with the Comparison<T> delegate implemented in the CompareDiff method. The heart of the solution is the FindDiff method, which takes a string, and produces a numeric value which must be used for comparison.
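To make the outline above concrete, here is one way the FindDiff body could be filled in. This is a sketch, assuming every element contains exactly one '#' with a valid integer on each side; the DiffSort class and SortByDiff method are hypothetical names added for illustration.

```csharp
using System;

public static class DiffSort
{
    // Parses "x#y" and returns y - x.
    // Assumption: exactly one '#' and a valid integer on each side.
    public static int FindDiff(string s)
    {
        string[] parts = s.Split('#');
        return int.Parse(parts[1]) - int.Parse(parts[0]);
    }

    private static int CompareDiff(string a, string b)
    {
        return FindDiff(a).CompareTo(FindDiff(b));
    }

    // Sorts the array in place by ascending difference.
    public static void SortByDiff(string[] items)
    {
        Array.Sort(items, CompareDiff);
    }
}
```

Array.Sort is not a stable sort, so elements with equal differences (such as "1#3" and "10#12") may come out in either order.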
You can try the following (the traditional way):
public class Program
{
public static void Main()
{
string[] strArr= new string[5] { "1#3", "19#24", "10#12", "13#18", "20#21" };
var list = new List<Item>();
foreach(var item in strArr){
list.Add(new Item(item));
}
strArr = list.OrderBy(t=>t.Sort).Select(t=>t.Value).ToArray();
foreach(var item in strArr)
Console.WriteLine(item);
}
}
public class Item
{
public Item(string str)
{
var split = str.Split('#');
A = Convert.ToInt32(split[0]);
B = Convert.ToInt32(split[1]);
}
public int A{get; set;}
public int B{get; set;}
public int Sort { get { return Math.Abs(B - A);}}
public string Value { get { return string.Format("{0}#{1}",A,B); }}
}
hope it will help you
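Without LINQ and lists, old school. Note that this writes each item into the slot given by its difference, so it relies on every difference being between 1 and the array length, and colliding differences can overwrite earlier items: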
static void Sort(string [] strArray)
{
try
{
string[] order = new string[strArray.Length];
string[] sortedarray = new string[strArray.Length];
for (int i = 0; i < strArray.Length; i++)
{
string[] values = strArray[i].ToString().Split('#');
int index=int.Parse(values[1].ToString()) - int.Parse(values[0].ToString());
order[i] = strArray[i].ToString() + "," + index;
}
for (int i = 0; i < order.Length; i++)
{
string[] values2 = order[i].ToString().Split(',');
if (sortedarray[int.Parse(values2[1].ToString())-1] == null)
{
sortedarray[int.Parse(values2[1].ToString())-1] = values2[0].ToString();
}
else
{
if ((int.Parse(values2[1].ToString())) >= sortedarray.Length)
{
sortedarray[(int.Parse(values2[1].ToString())-1) - 1] = values2[0].ToString();
}
else if ((int.Parse(values2[1].ToString())) < sortedarray.Length)
{
sortedarray[(int.Parse(values2[1].ToString())-1) + 1] = values2[0].ToString();
}
}
}
for (int i = 0; i < sortedarray.Length; i++)
{
Console.WriteLine(sortedarray[i]);
}
Console.Read();
}
catch (Exception ex)
{
throw;
}
finally
{
}
}

How to select last value from each run of similar items?

I have a list. I'd like to take the last value from each run of similar elements.
What do I mean? Let me give a simple example. Given the list of words
['golf', 'hip', 'hop', 'hotel', 'grass', 'world', 'wee']
And the similarity function 'starting with the same letter', the function would return the shorter list
['golf', 'hotel', 'grass', 'wee']
Why? The original list has a 1-run of G words, a 3-run of H words, a 1-run of G words, and a 2-run of W words. The function returns the last word from each run.
How can I do this?
Hypothetical C# syntax (in reality I'm working with customer objects but I wanted to share something you could run and test yourself)
> var words = new List<string>{"golf", "hip", "hop", "hotel", "grass", "world", "wee"};
> words.LastDistinct(x => x[0])
["golf", "hotel", "grass", "wee"]
Edit: I tried .GroupBy(x => x[0]).Select(g => g.Last()) but that gives ['grass', 'hotel', 'wee'] which is not what I want. Read the example carefully.
Edit. Another example.
['apples', 'armies', 'black', 'beer', 'bastion', 'cat', 'cart', 'able', 'art', 'bark']
Here there are 5 runs (a run of A's, a run of B's, a run of C's, a new run of A's, a new run of B's). The last word from each run would be:
['armies', 'bastion', 'cart', 'art', 'bark']
The important thing to understand is that each run is independent. Don't mix-up the run of A's at the start with the run of A's near the end.
There's nothing too complicated with just doing it the old-fashioned way:
Func<string, object> groupingFunction = s => s.Substring(0, 1);
IEnumerable<string> input = new List<string>() {"golf", "hip", "..." };
var output = new List<string>();
if (!input.Any())
{
return output;
}
var lastItem = input.First();
var lastKey = groupingFunction(lastItem);
foreach (var currentItem in input.Skip(1))
{
var currentKey = groupingFunction(currentItem);
if (!currentKey.Equals(lastKey))
{
output.Add(lastItem);
}
lastKey = currentKey;
lastItem = currentItem;
}
output.Add(lastItem);
You could also turn this into a generic extension method as Tim Schmelter has done; I have already taken a couple steps to generalize the code on purpose (using object as the key type and IEnumerable<T> as the input type).
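As an illustration of that generalization, the loop above could be wrapped up roughly as follows. This is a sketch rather than the code from any answer here: LastOfEachRun and RunExtensions are hypothetical names, and keys are compared with the default equality comparer for the key type.

```csharp
using System;
using System.Collections.Generic;

public static class RunExtensions
{
    // Hypothetical name. Yields the last element of each run of
    // consecutive items that share the same key.
    public static IEnumerable<T> LastOfEachRun<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var comparer = EqualityComparer<TKey>.Default;
        bool haveLast = false;
        T lastItem = default(T);
        TKey lastKey = default(TKey);

        foreach (T item in source)
        {
            TKey key = keySelector(item);
            // A key change means the previous item closed its run.
            if (haveLast && !comparer.Equals(key, lastKey))
                yield return lastItem;
            lastItem = item;
            lastKey = key;
            haveLast = true;
        }

        // The final run is closed by the end of the sequence.
        if (haveLast)
            yield return lastItem;
    }
}
```

With the words from the question, words.LastOfEachRun(w => w[0]) yields golf, hotel, grass, wee. It enumerates the source once and keeps only one item in memory at a time.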
You could use this extension that can group by adjacent/consecutive elements:
public static IEnumerable<IGrouping<TKey, TSource>> GroupAdjacent<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector)
{
TKey last = default(TKey);
bool haveLast = false;
List<TSource> list = new List<TSource>();
foreach (TSource s in source)
{
TKey k = keySelector(s);
if (haveLast)
{
if (!k.Equals(last))
{
yield return new GroupOfAdjacent<TSource, TKey>(list, last);
list = new List<TSource>();
list.Add(s);
last = k;
}
else
{
list.Add(s);
last = k;
}
}
else
{
list.Add(s);
last = k;
haveLast = true;
}
}
if (haveLast)
yield return new GroupOfAdjacent<TSource, TKey>(list, last);
}
public class GroupOfAdjacent<TSource, TKey> : IEnumerable<TSource>, IGrouping<TKey, TSource>
{
public TKey Key { get; set; }
private List<TSource> GroupList { get; set; }
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return ((System.Collections.Generic.IEnumerable<TSource>)this).GetEnumerator();
}
System.Collections.Generic.IEnumerator<TSource> System.Collections.Generic.IEnumerable<TSource>.GetEnumerator()
{
foreach (var s in GroupList)
yield return s;
}
public GroupOfAdjacent(List<TSource> source, TKey key)
{
GroupList = source;
Key = key;
}
}
Then it's easy:
var words = new List<string>{"golf", "hip", "hop", "hotel", "grass", "world", "wee"};
IEnumerable<string> lastWordOfConsecutiveFirstCharGroups = words
.GroupAdjacent(str => str[0])
.Select(g => g.Last());
Output:
string.Join(",", lastWordOfConsecutiveFirstCharGroups); // golf,hotel,grass,wee
Your other sample:
words=new List<string>{"apples", "armies", "black", "beer", "bastion", "cat", "cart", "able", "art", "bark"};
lastWordOfConsecutiveFirstCharGroups = words
.GroupAdjacent(str => str[0])
.Select(g => g.Last());
Output:
string.Join(",", lastWordOfConsecutiveFirstCharGroups); // armies,bastion,cart,art,bark
Try this algorithm:
var words = new List<string> { "golf", "hip", "hop", "hotel", "grass", "world", "wee" };
var newList = new List<string>();
int i = 0;
while (i < words.Count)
{
// Advance j to the last word of the run that starts at i.
var j = i;
while (j < words.Count - 1 && words[j][0] == words[j + 1][0])
{
j++;
}
newList.Add(words[j]);
i = j + 1;
}
You can use the following extension method to split your sequence into groups (i.e. sub-sequences) by some condition:
public static IEnumerable<IEnumerable<T>> Split<T, TKey>(
this IEnumerable<T> source, Func<T, TKey> keySelector)
{
var group = new List<T>();
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
yield break;
else
{
TKey currentKey = keySelector(iterator.Current);
// An equality comparer also works for key types that are not IComparable.
var keyComparer = EqualityComparer<TKey>.Default;
group.Add(iterator.Current);
while (iterator.MoveNext())
{
var key = keySelector(iterator.Current);
if (!keyComparer.Equals(currentKey, key))
{
yield return group;
currentKey = key;
group = new List<T>();
}
group.Add(iterator.Current);
}
}
}
if (group.Any())
yield return group;
}
And getting your expected results looks like:
string[] words = { "golf", "hip", "hop", "hotel", "grass", "world", "wee" };
var result = words.Split(w => w[0])
.Select(g => g.Last());
Result:
golf
hotel
grass
wee
Because your input is a List<>, I think this should work for you with acceptable performance, and it is very concise:
var result = words.Where((x, i) => i == words.Count - 1 ||
words[i][0] != words[i + 1][0]);
You can append ToList() on the result to get a List<string> if you want.
I went with
/// <summary>
/// Given a list, return the last value from each run of similar items.
/// </summary>
public static IEnumerable<T> WithoutDuplicates<T>(this IEnumerable<T> source, Func<T, T, bool> similar)
{
Contract.Requires(source != null);
Contract.Requires(similar != null);
Contract.Ensures(Contract.Result<IEnumerable<T>>().Count() <= source.Count(), "Result should be at most as long as original list");
T last = default(T);
bool first = true;
foreach (var item in source)
{
if (!first && !similar(item, last))
yield return last;
last = item;
first = false;
}
if (!first)
yield return last;
}

Is there a better way to use Lambda with groups of N?

I have the method Process(IEnumerable<Record> records) which can take UP TO but NO MORE THAN 3 records at a time. I have hundreds of records, so I need to pass in groups. I do this:
var _Records = Enumerable.Range(1, 16).ToArray();
for (int i = 0; i < int.MaxValue; i += 3)
{
var _ShortList = _Records.Skip(i).Take(3);
if (!_ShortList.Any())
break;
Process(_ShortList);
}
// TODO: finish
It works, but... is there a better way?
You can use MoreLINQ's Batch:
var result=Enumerable.Range(1, 16).Batch(3);
or
var arrayOfArrays = Enumerable.Range(1, 16).Batch(3).Select(x => x.ToArray()).ToArray();
And here is the source if you want to take a look at it.
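If the project targets .NET 6 or later (an assumption, since the question does not state a framework version), the built-in Enumerable.Chunk method does the same job without an extra library. A minimal sketch, with ChunkDemo and InGroupsOfThree as hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkDemo
{
    // Splits the input into arrays of at most three items each.
    // Requires .NET 6 or later, where Enumerable.Chunk was introduced.
    public static List<int[]> InGroupsOfThree(IEnumerable<int> records)
    {
        return records.Chunk(3).ToList();
    }
}
```

For Enumerable.Range(1, 16) this produces six arrays, the last of which holds only the value 16. In the question's code each array could then be passed to Process in a foreach loop.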
You may use this extension method:
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> source, int chunkSize)
{
return source
.Select((value, i) => new { Index = i, Value = value })
.GroupBy(item => item.Index / chunkSize)
.Select(chunk => chunk.Select(item => item.Value));
}
It splits a source collection of items into several chunks with given size.
So your code will look like this:
foreach (var chunk in Enumerable.Range(1, 16).Split(3))
{
Process(chunk);
}
Here's another LINQ-y way to do it:
var batchSize = 3;
Enumerable.Range(0, (_Records.Length - 1)/batchSize + 1)
.ToList()
.ForEach(i => Process(_Records.Skip(i * batchSize).Take(batchSize)));
In case you need "pagination" multiple times in your solution, you may consider using an extension method.
Hacked one together in LINQPad, it will do the trick.
public static class MyExtensions {
public static IEnumerable<IEnumerable<T>> Paginate<T>(this IEnumerable<T> source, int pageSize) {
T[] buffer = new T[pageSize];
int index = 0;
foreach (var item in source) {
buffer[index++] = item;
if (index >= pageSize) {
yield return buffer;
// Start a fresh buffer so a page handed out earlier is not overwritten.
buffer = new T[pageSize];
index = 0;
}
}
if (index > 0) {
yield return buffer.Take(index);
}
}
}
Basically, it pre-fills a buffer of size pageSize and yields it just when it's full. If there are < pageSize elements left, we yield them as well. So,
Enumerable.Range(1, 10).Paginate(3).Dump(); // Dump is a LINQPad extension
will yield
{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10}}
You can create your own extension method:
static class Extensions {
public static IEnumerable<IEnumerable<T>> ToBlocks<T>(this IEnumerable<T> source, int blockSize) {
var count = 0;
T[] block = null;
foreach (var item in source) {
if (block == null)
block = new T[blockSize];
block[count++] = item;
if (count == blockSize) {
yield return block;
block = null;
count = 0;
}
}
if (count > 0)
yield return block.Take(count);
}
}
public static void ChunkProcess<T>(IEnumerable<T> source, int size, Action<IEnumerable<T>> action)
{
var chunk = source.Take(size);
while (chunk.Any())
{
action(chunk);
source = source.Skip(size);
chunk = source.Take(size);
}
}
and your code would be
ChunkProcess(_Records, 3, Process);
var _Records = Enumerable.Range(1, 16).ToArray();
int index = 0;
foreach (var group in _Records.GroupBy(element => index++ / 3))
Process(group);
NOTE: The code above is short and relatively efficient, but is still not as efficient as it can be (it will essentially build a hashtable behind the scenes). A slightly more cumbersome, but faster way would be:
var _Records = Enumerable.Range(1, 16).ToArray();
var buff = new int[3];
int index = 0;
foreach (var element in _Records) {
if (index == buff.Length) {
Process(buff);
index = 0;
}
buff[index++] = element;
}
if (index > 0)
Process(buff.Take(index));
Or, pack it to a more reusable form:
public static class EnumerableEx {
public static void Paginate<T>(this IEnumerable<T> elements, int page_size, Action<IEnumerable<T>> process_page) {
var buff = new T[page_size];
int index = 0;
foreach (var element in elements) {
if (index == buff.Length) {
process_page(buff);
index = 0;
}
buff[index++] = element;
}
if (index > 0)
process_page(buff.Take(index));
}
}
// ...
var _Records = Enumerable.Range(1, 16).ToArray();
_Records.Paginate(3, Process);
This extension method works properly:
public static class EnumerableExtentions
{
public static IEnumerable<IEnumerable<T>> Chunks<T>(this IEnumerable<T> items, int size)
{
return
items.Select((member, index) => new { Index = index, Value = member })
.GroupBy(item => (int)item.Index / size)
.Select(chunk => chunk.Select(item => item.Value));
}
}

Creating multiple arrays using a foreach loop and a select statement

I have a database table and I select all of its contents. It has 18000+ items. I have a method that calls a web service, and it accepts an array of up to ten elements. Right now I am passing the items one by one instead of as an array. I want to build arrays of ten and call the function with each one. But what happens if the final group is not full, for example if I have three extra records?
public static void Main()
{
inventoryBLL inv = new inventoryBLL();
DataSet1.sDataTable dtsku = inv.SelectEverything();
foreach (DataSet1.Row row in dtsku)
{
webservicefunction(row.item);
}
}
My question is: how would I transform this?
A generic solution to your problem could look like this:
static class LinqHelper
{
public static IEnumerable<T[]> SplitIntoGroups<T>(this IEnumerable<T> items, int N)
{
if (items == null || N < 1)
yield break;
T[] group = new T[N];
int size = 0;
var iter = items.GetEnumerator();
while (iter.MoveNext())
{
group[size++] = iter.Current;
if (size == N)
{
yield return group;
size = 0;
group = new T[N];
}
}
if (size > 0)
yield return group.Take(size).ToArray();
}
}
So your Main function becomes:
public static void Main()
{
inventoryBLL inv = new inventoryBLL();
DataSet1.sDataTable dtsku = inv.SelectEverything();
foreach (var items in dtsku.Select(r => r.item).SplitIntoGroups(10))
{
webservicefunction(items);
}
}
var taken = 0;
var takecount = 10;
while(list.Count() > taken)
{
callWebService(list.Skip(taken).Take(takecount));
taken += takecount;
}
Generic Extension Method version:
public static void AtATime<T>(this IEnumerable<T> list, int eachTime, Action<IEnumerable<T>> action)
{
var taken = 0;
while(list.Count() > taken)
{
action(list.Skip(taken).Take(eachTime));
taken += eachTime;
}
}
Usage:
inv.SelectEverything().AtATime<Row>(10, webservicefunction);
