I'm trying to search "everything" in an index for a search term, and display the context with the terms highlighted. I get an appropriate set of documents returned, but cannot figure out how I'm supposed to handle the highlighting in code.
At this point I'm just trying to dump it into a literal, and the code below "kinda sorta" works, but it doesn't seem to produce highlights for every document, and it just doesn't feel right. I have found many examples of how to run the query with highlights, but I haven't found any example of how to actually display the results. Any suggestions? Thanks!
var searchResults = client.Search<Document>(s => s
    .Query(qs => qs
        .QueryString(q => q.Query(stringsearch)))
    .Highlight(h => h
        .PreTags("<b>")
        .PostTags("</b>")
        .OnFields(f => f
            .OnField("*")
            .PreTags("<em>")
            .PostTags("</em>"))));
Literal1.Text = "";
foreach(var h in searchResults.Hits)
{
foreach(var hh in h.Highlights)
{
foreach(var hhh in hh.Value.Highlights)
{
Literal1.Text += hhh + @"<br>";
}
}
}
Edit: the solution below has only been tested on Elasticsearch 2.x, not Elasticsearch 5.x/6.x.
The highlights can either be accessed in searchResults.Highlights (for all highlights), or in the IHit<T>.Highlights for that hit.
Is this along the lines of what you're trying to achieve?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using Elasticsearch.Net.ConnectionPool;
using Nest;
namespace ESTester
{
internal class Program
{
private static void Main(string[] args)
{
const string indexName = "testindex";
var connectionSettings = new ConnectionSettings(new SingleNodeConnectionPool(new Uri("http://127.0.0.1:9200")));
var client = new ElasticClient(connectionSettings);
var existResponse = client.IndexExists(descriptor => descriptor.Index(indexName));
if (existResponse.Exists)
client.DeleteIndex(descriptor => descriptor.Index(indexName));
// Making sure the refresh interval is low, since it's boring to have to wait for things to catch up
client.PutTemplate("", descriptor => descriptor.Name("testindex").Template("testindex").Settings(objects => objects.Add("index.refresh_interval", "1s")));
client.CreateIndex(descriptor => descriptor.Index(indexName));
var docs = new List<Document>
{
new Document{Text = "This is the first document" },
new Document{Text = "This is the second document" },
new Document{Text = "This is the third document" }
};
var bulkDescriptor = new BulkDescriptor().IndexMany(docs, (descriptor, document) => descriptor.Index(indexName));
client.Bulk(bulkDescriptor);
// Making sure ES has indexed the documents
Thread.Sleep(TimeSpan.FromSeconds(2));
var searchDescriptor = new SearchDescriptor<Document>()
.Index(indexName)
.Query(q => q
.Match(m => m
.OnField(d => d.Text)
.Query("the second")))
.Highlight(h => h
.OnFields(f => f
.OnField(d => d.Text)
.PreTags("<em>")
.PostTags("</em>")));
var result = client.Search<Document>(searchDescriptor);
if (result.Hits.Any())
{
foreach (var hit in result.Hits)
{
Console.WriteLine("Found match: {0}", hit.Source.Text);
if (!hit.Highlights.Any()) continue;
foreach (var highlight in hit.Highlights.SelectMany(highlight => highlight.Value.Highlights))
{
Console.WriteLine("Found highlight: {0}", highlight);
}
}
}
Console.WriteLine("Press any key to exit!");
Console.ReadLine();
}
}
internal class Document
{
public string Text { get; set; }
}
}
Edit for comments:
In this example there's no real reason for the if (!hit.Highlights.Any()) continue; other than being safe, but if you were to run the following query instead, you could end up with hits that have no highlights:
var docs = new List<Document>
{
new Document{Text = "This is the first document", Number = 1 },
new Document{Text = "This is the second document", Number =500 },
new Document{Text = "This is the third document", Number = 1000 }
};
var searchDescriptor = new SearchDescriptor<Document>()
.Index(indexName)
.Query(q => q
.Bool(b => b
.Should(s1 => s1
.Match(m => m
.Query("second")
.OnField(f => f.Text)),
s2 => s2
.Range(r =>r
.OnField(f => f.Number)
.Greater(750)))
.MinimumShouldMatch(1)))
.Highlight(h => h
.OnFields(f => f
.OnField(d => d.Text)
.PreTags("<em>")
.PostTags("</em>")));
internal class Document
{
public string Text { get; set; }
public int Number { get; set; }
}
In this case, you could get a hit on the range query, but that wouldn't have any highlights.
For number 2, I just explored the object I got back from the search, using Quick Watch, the Object Browser, and IntelliSense in VS.
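To tie this back to the Literal1 code in the question: the same per-hit access shown above can be flattened into a single display string. This is only a minimal sketch (it assumes the question's searchResults and Literal1, and needs using System.Linq; and using System.Text;):
// Flatten every highlight fragment of every hit into one HTML string.
var sb = new StringBuilder();
foreach (var hit in searchResults.Hits)
{
    foreach (var fragment in hit.Highlights.SelectMany(h => h.Value.Highlights))
    {
        sb.Append(fragment).Append("<br>");
    }
}
Literal1.Text = sb.ToString();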
Related
I must be doing something fundamentally wrong here. I'm trying to get a "More Like This" query working in a search engine project we have that uses Elastic Search. The idea is that the CMS can write tags (like categories) to the page in a Meta tag or something, and we would read those into Elastic and use them to drive a "more like this" search based upon an input document id.
So if the input document has tags of catfish, chicken, goat I would expect Elastic Search to find other documents that share those tags and not return ones for racecar and airplane.
I've built a proof of concept console app by:
Getting a local Elastic Search 6.6.1 instance running in Docker by following the instructions on https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
Creating a new .NET Framework 4.6.1 Console App
Adding the NuGet packages for NEST 6.5.0 and ElasticSearch.Net 6.5.0
Then I created a new Elastic index that contains objects (type "MyThing") with a "Tags" property. The property holds a random comma-delimited set of words drawn from a pool of possible values. I've inserted anywhere from 100 to 5000 items into the index in testing, and I've tried more and fewer possible words in the pool.
No matter what I try the MoreLikeThis query never returns anything, and I don't understand why.
Query that isn't returning results:
var result = EsClient.Search<MyThing>(s => s
.Index(DEFAULT_INDEX)
.Query(esQuery =>
{
var mainQuery = esQuery
.MoreLikeThis(mlt => mlt
.Include(true)
.Fields(f => f.Field(ff => ff.Tags, 5))
.Like(l => l.Document(d => d.Id(id)))
);
return mainQuery;
}
));
Full "program.cs" source:
using Nest;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace Test_MoreLikeThis_ES6
{
class Program
{
public class MyThing
{
public string Tags { get; set; }
}
const string ELASTIC_SERVER = "http://localhost:9200";
const string DEFAULT_INDEX = "my_index";
const int NUM_RECORDS = 1000;
private static Uri es_node = new Uri(ELASTIC_SERVER);
private static ConnectionSettings settings = new ConnectionSettings(es_node).DefaultIndex(DEFAULT_INDEX);
private static ElasticClient EsClient = new ElasticClient(settings);
private static Random rnd = new Random();
static void Main(string[] args)
{
Console.WriteLine("Rebuild index? (y):");
var answer = Console.ReadLine().ToLower();
if (answer == "y")
{
RebuildIndex();
for (int i = 0; i < NUM_RECORDS; i++)
{
AddToIndex();
}
}
Console.WriteLine("");
Console.WriteLine("Getting a Thing...");
var aThingId = GetARandomThingId();
Console.WriteLine("");
Console.WriteLine("Looking for something similar to document with id " + aThingId);
Console.WriteLine("");
Console.WriteLine("");
GetMoreLikeAThing(aThingId);
}
private static string GetARandomThingId()
{
var firstdocQuery = EsClient
.Search<MyThing>(s =>
s.Size(1)
.Query(q => {
return q.FunctionScore(fs => fs.Functions(fn => fn.RandomScore(rs => rs.Seed(DateTime.Now.Ticks).Field("_seq_no"))));
})
);
if (!firstdocQuery.IsValid || firstdocQuery.Hits.Count == 0) return null;
var hit = firstdocQuery.Hits.First();
Console.WriteLine("Found a thing with id '" + hit.Id + "' and tags: " + hit.Source.Tags);
return hit.Id;
}
private static void GetMoreLikeAThing(string id)
{
var result = EsClient.Search<MyThing>(s => s
.Index(DEFAULT_INDEX)
.Query(esQuery =>
{
var mainQuery = esQuery
.MoreLikeThis(mlt => mlt
.Include(true)
.Fields(f => f.Field(ff => ff.Tags, 5))
.Like(l => l.Document(d => d.Id(id)))
);
return mainQuery;
}
));
if (result.IsValid)
{
if (result.Hits.Count > 0)
{
Console.WriteLine("These things are similar:");
foreach (var hit in result.Hits)
{
Console.WriteLine(" " + hit.Id + " : " + hit.Source.Tags);
}
}
else
{
Console.WriteLine("No similar things found.");
}
}
else
{
Console.WriteLine("There was an error running the ES query.");
}
Console.WriteLine("");
Console.WriteLine("Enter (y) to get another thing, or anything else to exit");
var y = Console.ReadLine().ToLower();
if (y == "y")
{
var aThingId = GetARandomThingId();
GetMoreLikeAThing(aThingId);
}
Console.WriteLine("");
Console.WriteLine("Any key to exit...");
Console.ReadKey();
}
private static void RebuildIndex()
{
var existsResponse = EsClient.IndexExists(DEFAULT_INDEX);
if (existsResponse.Exists) //delete existing mapping (and data)
{
EsClient.DeleteIndex(DEFAULT_INDEX);
}
var rebuildResponse = EsClient.CreateIndex(DEFAULT_INDEX, c => c.Settings(s => s.NumberOfReplicas(1).NumberOfShards(5)));
var response2 = EsClient.Map<MyThing>(m => m.AutoMap());
}
private static void AddToIndex()
{
var myThing = new MyThing();
var tags = new List<string> {
"catfish",
"tractor",
"racecar",
"airplane",
"chicken",
"goat",
"pig",
"horse",
"goose",
"duck"
};
var randNum = rnd.Next(0, tags.Count);
//get randNum random tags
var rand = tags.OrderBy(o => Guid.NewGuid().ToString()).Take(randNum);
myThing.Tags = string.Join(", ", rand);
var ir = new IndexRequest<MyThing>(myThing);
var indexResponse = EsClient.Index(ir);
Console.WriteLine("Index response: " + indexResponse.Id + " : " + string.Join(" " , myThing.Tags));
}
}
}
The issue here is that the default min_term_freq value of 2 will never be satisfied for any of the terms of the prototype document, because every document contains each tag (term) only once. If you drop min_term_freq to 1, you'll get results. You might also want to set min_doc_freq to 1, and combine the more-like-this query with a query that excludes the prototype document.
Here's an example to play with:
const string ELASTIC_SERVER = "http://localhost:9200";
const string DEFAULT_INDEX = "my_index";
const int NUM_RECORDS = 1000;
private static readonly Random _random = new Random();
private static readonly IReadOnlyList<string> Tags =
new List<string>
{
"catfish",
"tractor",
"racecar",
"airplane",
"chicken",
"goat",
"pig",
"horse",
"goose",
"duck"
};
private static ElasticClient _client;
private static void Main()
{
var pool = new SingleNodeConnectionPool(new Uri(ELASTIC_SERVER));
var settings = new ConnectionSettings(pool)
.DefaultIndex(DEFAULT_INDEX);
_client = new ElasticClient(settings);
Console.WriteLine("Rebuild index? (y):");
var answer = Console.ReadLine().ToLower();
if (answer == "y")
{
RebuildIndex();
AddToIndex();
}
Console.WriteLine();
Console.WriteLine("Getting a Thing...");
var aThingId = GetARandomThingId();
Console.WriteLine();
Console.WriteLine("Looking for something similar to document with id " + aThingId);
Console.WriteLine();
Console.WriteLine();
GetMoreLikeAThing(aThingId);
}
public class MyThing
{
public List<string> Tags { get; set; }
}
private static string GetARandomThingId()
{
var firstdocQuery = _client
.Search<MyThing>(s =>
s.Size(1)
.Query(q => q
.FunctionScore(fs => fs
.Functions(fn => fn
.RandomScore(rs => rs
.Seed(DateTime.Now.Ticks)
.Field("_seq_no")
)
)
)
)
);
if (!firstdocQuery.IsValid || firstdocQuery.Hits.Count == 0) return null;
var hit = firstdocQuery.Hits.First();
Console.WriteLine($"Found a thing with id '{hit.Id}' and tags: {string.Join(", ", hit.Source.Tags)}");
return hit.Id;
}
private static void GetMoreLikeAThing(string id)
{
var result = _client.Search<MyThing>(s => s
.Index(DEFAULT_INDEX)
.Query(esQuery => esQuery
.MoreLikeThis(mlt => mlt
.Include(true)
.Fields(f => f.Field(ff => ff.Tags))
.Like(l => l.Document(d => d.Id(id)))
.MinTermFrequency(1)
.MinDocumentFrequency(1)
) && !esQuery
.Ids(ids => ids
.Values(id)
)
)
);
if (result.IsValid)
{
if (result.Hits.Count > 0)
{
Console.WriteLine("These things are similar:");
foreach (var hit in result.Hits)
{
Console.WriteLine($" {hit.Id}: {string.Join(", ", hit.Source.Tags)}");
}
}
else
{
Console.WriteLine("No similar things found.");
}
}
else
{
Console.WriteLine("There was an error running the ES query.");
}
Console.WriteLine();
Console.WriteLine("Enter (y) to get another thing, or anything else to exit");
var y = Console.ReadLine().ToLower();
if (y == "y")
{
var aThingId = GetARandomThingId();
GetMoreLikeAThing(aThingId);
}
Console.WriteLine();
Console.WriteLine("Any key to exit...");
}
private static void RebuildIndex()
{
var existsResponse = _client.IndexExists(DEFAULT_INDEX);
if (existsResponse.Exists) //delete existing mapping (and data)
{
_client.DeleteIndex(DEFAULT_INDEX);
}
var rebuildResponse = _client.CreateIndex(DEFAULT_INDEX, c => c
.Settings(s => s
.NumberOfShards(1)
)
.Mappings(m => m
.Map<MyThing>(mm => mm.AutoMap())
)
);
}
private static void AddToIndex()
{
var bulkAllObservable = _client.BulkAll(GetMyThings(), b => b
.RefreshOnCompleted()
.Size(1000));
var waitHandle = new ManualResetEvent(false);
Exception exception = null;
var bulkAllObserver = new BulkAllObserver(
onNext: r =>
{
Console.WriteLine($"Indexed page {r.Page}");
},
onError: e =>
{
exception = e;
waitHandle.Set();
},
onCompleted: () => waitHandle.Set());
bulkAllObservable.Subscribe(bulkAllObserver);
waitHandle.WaitOne();
if (exception != null)
{
throw exception;
}
}
private static IEnumerable<MyThing> GetMyThings()
{
for (int i = 0; i < NUM_RECORDS; i++)
{
var randomTags = Tags.OrderBy(o => Guid.NewGuid().ToString())
.Take(_random.Next(0, Tags.Count))
.OrderBy(t => t)
.ToList();
yield return new MyThing { Tags = randomTags };
}
}
And here's an example output
Found a thing with id 'Ugg9LGkBPK3n91HQD1d5' and tags: airplane, goat
These things are similar:
4wg9LGkBPK3n91HQD1l5: airplane, goat
9Ag9LGkBPK3n91HQD1l5: airplane, goat
Vgg9LGkBPK3n91HQD1d5: airplane, goat, goose
sQg9LGkBPK3n91HQD1d5: airplane, duck, goat
lQg9LGkBPK3n91HQD1h5: airplane, catfish, goat
9gg9LGkBPK3n91HQD1l5: airplane, catfish, goat
FQg9LGkBPK3n91HQD1p5: airplane, goat, goose
Jwg9LGkBPK3n91HQD1p5: airplane, goat, goose
Fwg9LGkBPK3n91HQD1d5: airplane, duck, goat, tractor
Kwg9LGkBPK3n91HQD1d5: airplane, goat, goose, horse
I've got a stream of tokens that are produced very quickly and a processor that is relatively slow. The tokens are of three sub-types, and I would prefer them to be processed according to their priority. So I would like the tokens to be buffered after they've been produced, while they're waiting to be processed, and to have that buffer sorted by priority.
Here are my classes:
public enum Priority
{
High = 3,
Medium = 2,
Low = 1
}
public class Base : IComparable<Base>
{
public int Id { get; set; }
public int CompareTo(Base other)
{
return Id.CompareTo(other.Id);
}
}
public class Foo : Base { }
public class Bar : Base { }
public class Baz : Base { }
public class Token : IComparable<Token>
{
private readonly string _toString;
public Foo Foo { get; }
public Bar Bar { get; }
public Baz Baz { get; }
public Priority Priority =>
Baz == null
? Bar == null
? Priority.High
: Priority.Medium
: Priority.Low;
public int CompareTo(Token other)
{
if (Priority > other.Priority)
{
return -1;
}
if (Priority < other.Priority)
{
return 1;
}
switch (Priority)
{
case Priority.High:
return Foo.CompareTo(other.Foo);
case Priority.Medium:
return Bar.CompareTo(other.Bar);
case Priority.Low:
return Baz.CompareTo(other.Baz);
default:
throw new ArgumentOutOfRangeException();
}
}
public override string ToString()
{
return _toString;
}
public Token(Foo foo)
{
_toString = $"{nameof(Foo)}:{foo.Id}";
Foo = foo;
}
public Token(Foo foo, Bar bar) : this(foo)
{
_toString += $":{nameof(Bar)}:{bar.Id}";
Bar = bar;
}
public Token(Foo foo, Baz baz) : this(foo)
{
_toString += $":{nameof(Baz)}:{baz.Id}";
Baz = baz;
}
}
And here is my producer code:
var random = new Random();
var bazId = 0;
var barId = 0;
var fooTokens = (from id in Observable.Interval(TimeSpan.FromSeconds(1))
.Select(Convert.ToInt32)
.Take(3)
select new Token(new Foo { Id = id }))
.Publish();
var barTokens = (from fooToken in fooTokens
from id in Observable.Range(0, random.Next(5, 10))
.Select(_ => Interlocked.Increment(ref barId))
select new Token(fooToken.Foo, new Bar { Id = id }))
.Publish();
var bazTokens = (from barToken in barTokens
from id in Observable.Range(0, random.Next(1, 5))
.Select(_ => Interlocked.Increment(ref bazId))
select new Token(barToken.Foo, new Baz { Id = id }))
.Publish();
var tokens = bazTokens.Merge(barTokens)
.Merge(fooTokens)
.Do(dt =>
{
Console.ForegroundColor = ConsoleColor.Red;
Console.WriteLine($"{DateTime.Now:mm:ss.fff}:{dt}");
});
// Subscription
bazTokens.Connect();
barTokens.Connect();
fooTokens.Connect();
However I'm a bit stuck as to how to buffer and sort the tokens. If I do this, the tokens appear to be produced and consumed at the same time, which suggests that there's some buffering going on behind the scenes, but I can't control it.
tokens.Subscribe(dt =>
{
Thread.Sleep(TimeSpan.FromMilliseconds(250));
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine($"{DateTime.Now:mm:ss.fff}:{dt}");
});
If I use a TPL Dataflow ActionBlock, I can see the tokens being produced correctly and processed correctly, but I'm still not sure how to do the sorting.
var proc = new ActionBlock<Token>(dt =>
{
Thread.Sleep(TimeSpan.FromMilliseconds(250));
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine($"{DateTime.Now:mm:ss.fff}:{dt}");
});
tokens.Subscribe(dt => proc.Post(dt));
Any ideas or pointers where to go next would be appreciated!
Update:
I got something to work. I added a helper to clean up the code for displaying the test data:
private static void Display(Token dt, ConsoleColor col, int? wait = null)
{
if (wait.HasValue)
{
Thread.Sleep(TimeSpan.FromMilliseconds(wait.Value));
}
Console.ForegroundColor = col;
Console.WriteLine($"{DateTime.Now:mm:ss.fff}:{dt}");
}
I added a SortedSet:
var set = new SortedSet<Token>();
var tokens = bazTokens
.Merge(barTokens)
.Merge(fooTokens)
.Do(dt => Display(dt, ConsoleColor.Red));
tokens.Subscribe(dt => set.Add(dt));
And I also added a consumer, although I'm not a fan of my implementation:
var source = new CancellationTokenSource();
Task.Run(() =>
{
while (!source.IsCancellationRequested)
{
var dt = set.FirstOrDefault();
if (dt == null)
{
continue;
}
if (set.Remove(dt))
{
Display(dt, ConsoleColor.Green, 250);
}
}
}, source.Token);
So now I'm getting exactly the results I'm looking for, but (a) I'm not happy with the polling while loop, and (b) if I want multiple consumers, I'm going to run into race conditions. I'm still looking for better implementations if anyone has one!
The container you want is a priority queue. Unfortunately there is no implementation in the .NET runtime (there is one in the C++ STL/CLI, but priority_queue is not exposed to other languages from there).
There are existing non-Microsoft containers that fill this role; you would need to search and evaluate the candidates to pick one that meets your needs.
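If you want to stay close to what's already in the question, a small lock-protected wrapper around SortedSet<Token> gives you basic priority-queue semantics. This is only a sketch, assuming the Token.CompareTo shown in the question (which sorts higher-priority tokens first) and using System.Collections.Generic; plus using System.Linq;:
// Minimal sketch: a thread-safe "priority queue" built on the question's SortedSet<Token>.
public class PrioritySet
{
    private readonly SortedSet<Token> _set = new SortedSet<Token>();
    private readonly object _gate = new object();

    public void Add(Token token)
    {
        lock (_gate) { _set.Add(token); }
    }

    // Removes and returns the highest-priority token; returns false if the set is empty.
    public bool TryTake(out Token token)
    {
        lock (_gate)
        {
            token = _set.FirstOrDefault();
            return token != null && _set.Remove(token);
        }
    }
}
Consumers then loop on TryTake instead of touching the set directly, which is essentially what the lock-based update further down ends up doing by hand.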
Using Dataflow, you can filter the tokens so that each priority level goes down a different path in your pipeline. The tokens are filtered by a predicate on each priority-typed link. Then it's up to you how to give preference based on priority.
Sorting:
var highPriority = new ActionBlock<Token>(dt =>
{
Thread.Sleep(TimeSpan.FromMilliseconds(250));
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine($"{DateTime.Now:mm:ss.fff}:{dt}");
});
var midPriority = new ActionBlock<Token>(dt =>
{
Thread.Sleep(TimeSpan.FromMilliseconds(250));
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine($"{DateTime.Now:mm:ss.fff}:{dt}");
});
var lowPriority = new ActionBlock<Token>(dt =>
{
Thread.Sleep(TimeSpan.FromMilliseconds(250));
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine($"{DateTime.Now:mm:ss.fff}:{dt}");
});
var proc = new BufferBlock<Token>();
proc.LinkTo(highPriority, dt => dt.Priority == Priority.High);
proc.LinkTo(midPriority, dt => dt.Priority == Priority.Medium);
proc.LinkTo(lowPriority, dt => dt.Priority == Priority.Low);
tokens.Subscribe(dt => proc.Post(dt));
One way to give preference to higher priority items would be to allow more than the default sequential processing. You can do that by setting the MaxDegreeOfParallelism for each priority block.
Giving Preference:
var highPriOptions = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 3 };
var highPriority = new ActionBlock<Token>(dt =>
{
Thread.Sleep(TimeSpan.FromMilliseconds(250));
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine($"{DateTime.Now:mm:ss.fff}:{dt}");
}, highPriOptions);
var midPriOptions = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 2 };
var midPriority = new ActionBlock<Token>(dt =>
{
Thread.Sleep(TimeSpan.FromMilliseconds(250));
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine($"{DateTime.Now:mm:ss.fff}:{dt}");
}, midPriOptions);
var lowPriority = new ActionBlock<Token>(dt =>
{
Thread.Sleep(TimeSpan.FromMilliseconds(250));
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine($"{DateTime.Now:mm:ss.fff}:{dt}");
});
var proc = new BufferBlock<Token>();
proc.LinkTo(highPriority, dt => dt.Priority == Priority.High);
proc.LinkTo(midPriority, dt => dt.Priority == Priority.Medium);
proc.LinkTo(lowPriority, dt => dt.Priority == Priority.Low);
tokens.Subscribe(dt => proc.Post(dt));
These samples are by no means complete but should at least give you the idea.
Okay, so I used a normal lock for accessing the SortedSet and then increased the number of consumers, and it seems to be working fine. Although I haven't been able to come up with a full Rx solution or a split Rx / TPL Dataflow solution, this now does what I want, so I'll just show the changes I made in addition to the update in the original question and leave it there.
var set = new SortedSet<Token>();
var locker = new object();
var tokens = bazTokens
.Merge(barTokens)
.Merge(fooTokens)
.Do(dt => Display(dt, ConsoleColor.Red));
tokens.Subscribe(dt =>
{
lock (locker)
{
set.Add(dt);
}
});
for (var i = 0; i < Environment.ProcessorCount; i++)
{
Task.Run(() =>
{
while (!source.IsCancellationRequested)
{
Token dt;
lock (locker)
{
dt = set.FirstOrDefault();
}
if (dt == null)
{
continue;
}
bool removed;
lock (locker)
{
removed = set.Remove(dt);
}
if (removed)
{
Display(dt, ConsoleColor.Green, 750);
}
}
}, source.Token);
}
Thank you to the people who posted solutions, I appreciate the time you spent.
I think the conundrum here is that what you seem to be really after is the results of a pull model, based on fast, hot, push sources. What you seem to want is the "highest" priority yet received, but the question is "received by what?" If you had multiple subscribers, operating at different paces, they could each have their own view of what "highest" was.
So the way I see it is that you want to merge the sources into a kind of reactive, prioritized (sorted) queue, from which you pull results when the observer is ready.
I approached that by using a signal back to the Buffer, saying "my one observer is now ready to see the state of the prioritized list". This is achieved by using the Buffer overload that takes in an observable closing signal. That buffer contains the new list of elements received, which I just merge into the last list, sans 'highest'.
The code is just demo code knocked up for the purposes of this question; there are probably bugs:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Reactive.Subjects;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
namespace RxTests
{
class Program
{
static void Main(string[] args)
{
var p = new Program();
p.TestPrioritisedBuffer();
Console.ReadKey();
}
void TestPrioritisedBuffer()
{
var source1 = Observable.Interval(TimeSpan.FromSeconds(1)).Do((source) => Console.WriteLine("Source1:"+source));
var source2 = Observable.Interval(TimeSpan.FromSeconds(5)).Scan((x, y) => x + 100).Do((source) => Console.WriteLine("Source2:" + source));
BehaviorSubject<bool> closingSelector = new BehaviorSubject<bool>(true);
var m = Observable.Merge(source1, source2).
Buffer(closingSelector).
Select(s => new { list =s.ToList(), max=(long)0 }).
Scan((x, y) =>
{
var list = x.list.Union(y.list).OrderBy(k=>k);
var max = list.LastOrDefault();
var res = new
{
list = list.Take(list.Count()-1).ToList(),
max= max
};
return res;
}
).
Do((sorted) => Console.WriteLine("Sorted max:" + sorted.max + ". Priority queue length:" + sorted.list.Count)).
ObserveOn(Scheduler.Default); //observe on other thread
m.Subscribe(v => { Console.WriteLine("Observed: " + v.max); Thread.Sleep(3000); closingSelector.OnNext(true); });
}
}
}
I have looked into this Q/A; it works to some extent, but not as expected. I want the assignment to happen sequentially. How can I do that?
Thanks in advance.
You can use Enumerable.Zip to combine the agents and accounts together (after repeating the list of agents to match or exceed the number of accounts). Then GroupBy agent.
var repeatCount = lstAccounts.Count / lstAgents.Count + 1;
var agents = Enumerable.Repeat(lstAgents, repeatCount).SelectMany(x => x);
// agents = { "Agent1", "Agent2", "Agent3", "Agent1", "Agent2", "Agent3" }
// lstAccounts = { "1001" , "1002" , "1003" , "1004" , "1005" }
var result = agents
.Zip(lstAccounts, (agent, account) => new { Agent = agent, Account = account })
.GroupBy(x => x.Agent)
.Select(g => new { Agent = g.Key, Accounts = g.Select(x => x.Account).ToList() })
.ToList();
It might not be the fastest way to do it, but it's short and readable.
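With the sample agents and accounts shown in the comments above, the grouping works out to:
// Agent1 -> { "1001", "1004" }
// Agent2 -> { "1002", "1005" }
// Agent3 -> { "1003" }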
Edit
Another way (probably nicer) to achieve the same result is to start by mapping each account to an agent index using index % lstAgents.Count.
var result = lstAccounts
.Select((acc, index) => new { AgentIndex = index % lstAgents.Count, Account = acc })
.GroupBy(x => x.AgentIndex)
.Select(g => new { Agent = lstAgents[g.Key], Accounts = g.Select(x => x.Account).ToList() })
.ToList();
The algorithm is very similar to the one proposed by varocarbas, but expressed in a functional (not imperative) way.
I think that conventional loops are the best approach here: easy to build, clear, and very friendly to later modification and scaling. For example:
Dictionary<string, List<string>> results = new Dictionary<string, List<string>>();
int i = -1;
while (i < lstAccounts.Count - 1)
{
for (int i2 = 0; i2 < lstAgents.Count; i2++)
{
i = i + 1;
string curAccount = lstAccounts[i];
string curAgent = lstAgents[i2];
if (!results.ContainsKey(curAgent)) results.Add(curAgent, new List<string>());
results[curAgent].Add(curAccount);
if (i >= lstAccounts.Count - 1) break;
}
}
Additionally, note that this approach is quite fast. As a reference, it came out around 4-5 times faster (in a simplistic test with one of the provided inputs and a Stopwatch) than the alternative proposed by Jakub in his answer.
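A comparison like that can be set up along these lines. This is only a rough sketch, not the original benchmark; RunLoopVersion and RunLinqVersion are hypothetical wrappers around the two approaches in this thread:
// Rough timing sketch; RunLoopVersion/RunLinqVersion are hypothetical wrappers
// around the loop-based and LINQ-based approaches shown above.
static long TimeIt(Action action, int repetitions = 100000)
{
    var sw = System.Diagnostics.Stopwatch.StartNew();
    for (int i = 0; i < repetitions; i++) action();
    sw.Stop();
    return sw.ElapsedMilliseconds;
}
// Usage:
// Console.WriteLine("Loop: {0} ms", TimeIt(() => RunLoopVersion(lstAgents, lstAccounts)));
// Console.WriteLine("LINQ: {0} ms", TimeIt(() => RunLinqVersion(lstAgents, lstAccounts)));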
You can try this approach with a LINQ extension. The Split extension method below splits the accounts list into "n" parts (the number of agents) so that you can assign each part to an agent.
class Program
{
static void Main(string[] args)
{
List<string> lstAgents = new List<string>() { "Agent1", "Agent2","Agent3" };
List<string> lstAccounts = new List<string>() { "1001", "1002" ,"1003", "1004", "1005" };
var op = lstAccounts.Split(lstAgents.Count);
int i = 0;
foreach (var accounts in op)
{
//Get agent
Console.WriteLine("Account(s) for Agent: ", lstAgents[i]);
foreach (var acc in accounts)
{
Console.WriteLine(acc);
}
Console.WriteLine(Environment.NewLine);
i++;
}
Console.ReadKey();
}
}
static class LinqExtensions
{
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> list, int parts)
{
int i = 0;
var splits = from item in list
group item by i++ % parts into part
select part.AsEnumerable();
return splits;
}
}
This is a stripped down version of code I am working on. The purpose of the code is to take a string of information, break it down, and parse it into key value pairs.
Using the info in the example below, a string might look like:
"DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567"
One further point about the above example: at least three of the features we have to parse out will occasionally include additional values. Here is an updated fake example string:
"DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568"
The problem with this is that the code refuses to split out DIVIDE and DIV information separately. Instead, it keeps splitting at DIV and then assigning the rest of the information as the value.
Is there a way to tell my code that DIVIDE and DIV need to be parsed out as two separate values, and to not turn DIVIDE into DIV?
public List<string> FeatureFilterStrings
{
// All possible feature types from the EWSD switch.
get
{
return new List<string>() { "DIVIDE", "DIV", "CLACOS", "INT"};
}
}
public void Parse(string input){
Func<string, bool> queryFilter = delegate(string line) { return FeatureFilterStrings.Any(s => line.Contains(s)); };
Regex regex = new Regex(@"(?=\bDIVIDE|DIV|CLACOS|INT)");
string[] ms = regex.Split(input);
List<string> queryLines = new List<string>();
// takes the parsed out data and assigns it to the queryLines List<string>
foreach (string m in ms)
{
queryLines.Add(m);
}
var features = queryLines.Where(queryFilter);
foreach (string feature in features)
{
foreach (Match m in Regex.Matches(workLine, valueExpression))
{
string key = m.Groups["key"].Value.Trim();
string value = String.Empty;
value = Regex.Replace(m.Groups["value"].Value.Trim(), @"\s", String.Empty);
AddKeyValue(key, value);
}
}
private void AddKeyValue(string key, string value)
{
try
{
// Check if key already exists. If it does, remove the key and add the new key with updated value.
// Value information appends to what is already there so no data is lost.
if (this.ContainsKey(key))
{
this.Remove(key);
this.Add(key, value.Split('&'));
}
else
{
this.Add(key, value.Split('&'));
}
}
catch (ArgumentException)
{
// Already added to the dictionary.
}
}
}
Further information: the strings do not have a set number of spaces between each key/value pair, each string may not include all of the values, and the features aren't always in the same order. Welcome to parsing old telephone switch information.
I would create a dictionary from your input string
string input = "DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567";
var dict = Regex.Matches(input, @"(\w+?) = (.+?)( |$)").Cast<Match>()
.ToDictionary(m => m.Groups[1].Value, m => m.Groups[2].Value);
Test the code:
foreach(var kv in dict)
{
Console.WriteLine(kv.Key + "=" + kv.Value);
}
This might be a simple alternative for you.
Try this code:
var input = "DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567";
var parts = input.Split(new [] { '=', ' ' }, StringSplitOptions.RemoveEmptyEntries);
var dictionary =
parts.Select((x, n) => new { x, n })
.GroupBy(xn => xn.n / 2, xn => xn.x)
.Select(xs => xs.ToArray())
.ToDictionary(xs => xs[0], xs => xs[1]);
I then get the following dictionary: DIVIDE = "KE48", CLACOS = "4556D", DIV = "3466", INT = "4567".
Based on your updated input, things get more complicated, but this works:
var input = "DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568";
Func<string, char, string> tighten =
(i, c) => String.Join(c.ToString(), i.Split(c).Select(x => x.Trim()));
var parts =
tighten(tighten(input, '&'), ',')
.Split(new[] { '=', ' ' }, StringSplitOptions.RemoveEmptyEntries);
var dictionary =
parts
.Select((x, n) => new { x, n })
.GroupBy(xn => xn.n / 2, xn => xn.x)
.Select(xs => xs.ToArray())
.ToDictionary(
xs => xs[0],
xs => xs
.Skip(1)
.SelectMany(x => x.Split(','))
.SelectMany(x => x.Split('&'))
.ToArray());
I get this dictionary: DIVIDE = ["KE48", "KE49", "KE50"], CLACOS = ["4566D"], DIV = ["3466"], INT = ["4567", "4568"].
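On the original DIVIDE vs DIV concern: anchoring on the known keys with a longest-first alternation and word boundaries keeps DIVIDE from being cut at DIV. A minimal sketch against the question's updated sample string (it needs using System.Linq; and using System.Text.RegularExpressions;):
var text = "DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568";
// Longest-first alternation plus word boundaries, so "DIVIDE" never matches as "DIV";
// each value runs up to the next key (or the end of the string).
var pairs = Regex.Matches(text, @"\b(?<key>DIVIDE|CLACOS|INT|DIV)\b\s*=\s*(?<value>.*?)(?=\s*\b(?:DIVIDE|CLACOS|INT|DIV)\b\s*=|$)")
    .Cast<Match>()
    .ToDictionary(m => m.Groups["key"].Value, m => m.Groups["value"].Value.Trim());
// pairs["DIVIDE"] == "KE48, KE49, KE50"; pairs["INT"] == "4567 & 4568"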
This is really holding up the process.
I have a db of marks stored as short, and I want to extract a single student's entry and create summary statistics on it. It seems simple, but I seem to have to jump through a ton of hoops to get there, so I must be missing something basic.
Here is the basic method, and this works happily. However, all I can do with it is pass it to the DataGridView.
private void MarksSummary(string StudentID)
{
int ID = Convert.ToInt32(StudentID);
//get the average of the marks using entity
using (var context = new collegestudentsEntities1())
{
var StudentMarks = (from m in context.Marks
where m.StudIDFK == ID
select new
{
m.Marks1,
m.marks2,
m.Marks3,
m.Marks4
});
dataGridView1.DataSource = StudentMarks.ToList();
Anything else seems to be ridiculously long-winded.
E.g. I can't do this:
var Marklist = new List<Int16>();
StudentMarks.ToList().ForEach(m => Marklist.Add(m));
as I get "cannot convert from 'AnonymousType#1' to 'short'"
or this
Marklist = StudentMarks.ToList();
or this
double av = Marklist.Average();
Yet I can do a foreach, which is silly on one row of data:
foreach (var s in StudentMarks)
{
Marklist.Add(s.Marks1);
Marklist.Add(s.marks2);
Marklist.Add(s.Marks3);
Marklist.Add(s.Marks4);
}
and this works, outputting happily:
txtMarksOverFifty.Text = Marklist.Count(s => s > 50).ToString();
txtMarksFailed.Text = Marklist.Count(s => s < 50).ToString();
So what am I missing to get the values out of the query easily?
Thanks for your help :-)
Your foreach is trying to add an anonymous type
select new
{
m.Marks1,
m.marks2,
m.Marks3,
m.Marks4
} //...
to a List<Int16>, so it's not surprising that it fails. What it looks like you want to do with that is:
StudentMarks.ToList().ForEach(m => Marklist.AddRange(new [] { m.Marks1, m.marks2, m.Marks3, m.Marks4 }));
Edit: If you're just looking for a solution with less code you might try:
using (var context = new collegestudentsEntities1())
{
var StudentMarks = (from m in context.Marks
where m.StudIDFK == ID
select new[]
{
m.Marks1,
m.marks2,
m.Marks3,
m.Marks4
}).SelectMany(mark => mark).ToList();
}
or simply:
List<Int16> Marklist = context.Marks.Where(mark => mark.StudIDFK == ID)
.SelectMany(m => new [] { m.Marks1, m.marks2, m.Marks3, m.Marks4 })
.ToList();
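As a side note, Enumerable.Average has no overload for short, which is why the Marklist.Average() attempt in the question doesn't compile. Widening in a selector works; a small sketch, assuming Marklist is the List<Int16> built above:
// Average needs an int/long/float/double/decimal source, so cast the shorts up.
double av = Marklist.Average(m => (int)m);
int over50 = Marklist.Count(m => m > 50);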
Look at what you are creating here:
select new
{
m.Marks1,
m.marks2,
m.Marks3,
m.Marks4
});
This is an anonymous object that contains shorts, not a short itself.
StudentMarks.ToList().ForEach(m => Marklist.Add(m));
Here you are trying to add an object to a list of shorts. Try:
StudentMarks.ToList().ForEach(m => {
Marklist.Add(m.Marks1);
Marklist.Add(m.marks2);
Marklist.Add(m.Marks3);
Marklist.Add(m.Marks4);
}
);