How to implement a watch operation on multiple MongoDB databases in C# using change streams

var options = new ChangeStreamOptions
{
    FullDocument = ChangeStreamFullDocumentOption.UpdateLookup
};
var enumerator = client
    .GetDatabase(databaseName)
    .Watch(options)
    .ToEnumerable()
    .GetEnumerator();
enumerator.MoveNext();
The code above watches a single database; I need to watch for updates in multiple databases in parallel.

I would suggest using channels for this, as they are designed for synchronising parallel sources for processing.
It would look something like this:
using MongoDB.Bson;
using MongoDB.Driver;
using System.Threading.Channels;

var dblist = new string[]
{
    "db1",
    "db2",
    "db3"
};
var options = new ChangeStreamOptions
{
    FullDocument = ChangeStreamFullDocumentOption.UpdateLookup
};
var client = new MongoClient();
var watcher = Channel.CreateUnbounded<ChangeStreamDocument<BsonDocument>>();
var tokenSource = new CancellationTokenSource();
List<Task> tasks = new List<Task>();
foreach (var db in dblist)
{
    // create a monitor for each database; Task.Run keeps the blocking change-stream
    // enumeration off the caller's thread so all monitors start immediately
    tasks.Add(
        Task.Run(() => monitorDB(db, client, watcher.Writer, tokenSource.Token))
    );
}
// create the processor for your changes
var processor = processChanges(watcher.Reader);
// wait for all the monitors to complete
await Task.WhenAll(tasks);
// when all monitors have ended, mark the channel as completed
watcher.Writer.Complete();
// wait for the processor to complete
await processor;
async Task monitorDB(string name, IMongoClient mongoClient, ChannelWriter<ChangeStreamDocument<BsonDocument>> writer, CancellationToken token)
{
    var changes = mongoClient
        .GetDatabase(name)
        .Watch(options, token)
        .ToEnumerable();
    foreach (var change in changes)
    {
        // wait for the channel to be ready to write
        await writer.WaitToWriteAsync();
        // write the change to the channel
        await writer.WriteAsync(change);
        // if cancelled, exit
        if (token.IsCancellationRequested)
            break;
    }
}

async Task processChanges(ChannelReader<ChangeStreamDocument<BsonDocument>> reader)
{
    // while the channel isn't complete, wait for a new document to be received
    while (await reader.WaitToReadAsync())
    {
        // get the waiting document
        ChangeStreamDocument<BsonDocument> doc = await reader.ReadAsync();
        // do something with doc ...
    }
}
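As a minimal sketch of what the "do something with doc" placeholder could do (it assumes only the standard ChangeStreamDocument properties such as CollectionNamespace, OperationType and FullDocument; the logging is purely illustrative), the body of processChanges might look like this:
// illustrative only: report where the change came from and what happened
Console.WriteLine($"{doc.CollectionNamespace}: {doc.OperationType}");
if (doc.OperationType == ChangeStreamOperationType.Update && doc.FullDocument != null)
{
    // FullDocument is populated because of ChangeStreamFullDocumentOption.UpdateLookup
    var id = doc.FullDocument.GetValue("_id", BsonNull.Value);
    // ... apply your own per-database/per-collection handling here
}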

Related

Why is using tasks with HttpClient synchronously so much slower?

So I was trying to do a quick performance test against a web API to see how it would handle multiple synchronous HTTP requests at once. I did this by spinning up 30 tasks and having each of them send an HTTP request with HttpClient. To my surprise, it was extremely slow. I thought it was due to the lack of async/await or a slow web API, but it turns out it only happens when I'm using tasks with synchronous HTTP calls (see TestSynchronousWithParallelTasks() below).
So I compared not using tasks at all, using async/await with tasks, and Parallel.ForEach in some simple tests. All of these finished quickly, around 10-20 milliseconds, except the original case, which takes around 20 seconds!
Class: HttpClientTest  Passed (5)  19.2 sec
  TestProject.HttpClientTest.TestAsyncWithParallelTasks          Passed   12 ms
  TestProject.HttpClientTest.TestIterativeAndSynchronous         Passed   22 ms
  TestProject.HttpClientTest.TestParallelForEach                 Passed   15 ms
  TestProject.HttpClientTest.TestSynchronousWithParallelTasks    Passed   19.1 sec
  TestProject.HttpClientTest.TestSynchronousWithParallelThreads  Passed   10 ms
public class HttpClientTest
{
    private HttpClient httpClient;
    private readonly ITestOutputHelper _testOutputHelper;

    public HttpClientTest(ITestOutputHelper testOutputHelper)
    {
        _testOutputHelper = testOutputHelper;
        ServicePointManager.DefaultConnectionLimit = 100;
        httpClient = new HttpClient(new HttpClientHandler { MaxConnectionsPerServer = 100 });
    }

    [Fact]
    public async Task TestSynchronousWithParallelTasks()
    {
        var tasks = new List<Task>();
        var url = "https://localhost:44388/api/values";
        for (var i = 0; i < 30; i++)
        {
            var task = Task.Run(() =>
            {
                var response = httpClient.GetAsync(url).Result;
                var content = response.Content.ReadAsStringAsync().Result;
            });
            tasks.Add(task);
        }
        await Task.WhenAll(tasks);
    }

    [Fact]
    public void TestIterativeAndSynchronous()
    {
        var url = "https://localhost:44388/api/values";
        for (var i = 0; i < 30; i++)
        {
            var response = httpClient.GetAsync(url).Result;
            var content = response.Content.ReadAsStringAsync().Result;
        }
    }

    [Fact]
    public async Task TestAsyncWithParallelTasks()
    {
        var url = "https://localhost:44388/api/values";
        var tasks = new List<Task>();
        for (var i = 0; i < 30; i++)
        {
            var task = Task.Run(async () =>
            {
                var response = await httpClient.GetAsync(url);
                var content = await response.Content.ReadAsStringAsync();
            });
            tasks.Add(task);
        }
        await Task.WhenAll(tasks);
    }

    [Fact]
    public void TestParallelForEach()
    {
        var url = "https://localhost:44388/api/values";
        var n = new int[30];
        Parallel.ForEach(n, new ParallelOptions { MaxDegreeOfParallelism = 2 }, (i) =>
        {
            var response = httpClient.GetAsync(url).Result;
            var content = response.Content.ReadAsStringAsync().Result;
        });
    }

    [Fact]
    public async Task TestSynchronousWithParallelThreads()
    {
        var tasks = new List<Task>();
        var url = "https://localhost:44388/api/values";
        var threads = new List<Thread>();
        for (var i = 0; i < 30; i++)
        {
            var thread = new Thread(() =>
            {
                var response = httpClient.GetAsync(url).Result;
                var content = response.Content.ReadAsStringAsync().Result;
            });
            thread.Start();
            threads.Add(thread);
        }
        foreach (var thread in threads)
        {
            thread.Join();
        }
    }
}
So any idea what's causing this performance hit?
I would have expected TestSynchronousWithParallelTasks() to be faster than TestIterativeAndSynchronous(), as you'd be starting more requests at once even though the work is I/O bound, while the latter waits for each request to finish before starting a new one. So it seems like the tasks are somehow blocking each other?
Edit: Added a test case to use threads instead and it's quick like the rest.
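Not part of the original question, but one way to probe the hypothesis that the tasks block each other is to raise the thread pool's minimum worker count before running the slow test and see whether the timing changes; this is only a diagnostic sketch:
// diagnostic sketch: pre-grow the pool so blocked Task.Run bodies don't have to
// wait for the pool's gradual thread injection; 64 is an arbitrary value
ThreadPool.GetMinThreads(out _, out var minIoc);
ThreadPool.SetMinThreads(64, minIoc);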

Task.WhenAll: get the list of completed tasks when sending bulk messages to Azure Service Bus

I have written the code below to send bulk messages to Azure Service Bus, but I am getting an error partway through, and I want to know the list of IDs that were actually pushed to Azure Service Bus.
public async Task ProcessProfile(NIFFUNDbContext dbContext)
{
    List<Task> concurrentTasks = new List<Task>();
    queueClient = new QueueClient(sbConnectionString, sbQueueName, ReceiveMode.ReceiveAndDelete, RetryPolicy.Default);
    List<int> personIds = new List<int>();
    for (int i = 0; i < 10; i++)
    {
        queueClient.OperationTimeout = TimeSpan.FromSeconds(10);
        var list = dbContext.TpPerson.Include("TpPersonContactInformation").Take(10000).ToList();
        foreach (var item in list)
        {
            var dfObjectMessage = NIF.FUN.Framework.DAL.MSSQL.Converter.TransformToDFMessage(item, ISDObjectTypeEnum.TpPerson, EntityChangeStateEnum.Added, DateTime.Now);
            var message = SB.Converter.ToMessage(dfObjectMessage);
            concurrentTasks.Add(queueClient.SendAsync(message));
        }
        await Task.WhenAll(concurrentTasks);
        //await queueClient.CloseAsync();
    }
}
var concurrentTasks = new List<Task>();
var succeededMessages = new ConcurrentBag<your entity type>();
var failedMessages = new ConcurrentBag<your entity type>();
for (int i = 0; i < 10; i++) // just for the sake of showing iteration
{
    var entity = <your logic of getting entity>;
    concurrentTasks.Add(Task.Run(async () =>
    {
        try
        {
            var message = <message from your entity>;
            await queueClient.SendAsync(message);
            succeededMessages.Add(entity);
        }
        catch
        {
            failedMessages.Add(entity);
        }
    }));
}
await Task.WhenAll(concurrentTasks);
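Once Task.WhenAll completes, the two bags tell you which entities reached the queue; a short illustrative follow-up (the logging and retry comment are examples, not part of the original answer):
// illustrative follow-up: report or retry based on the two bags
Console.WriteLine($"Sent: {succeededMessages.Count}, failed: {failedMessages.Count}");
foreach (var entity in failedMessages)
{
    // e.g. persist the failed IDs, or re-queue these entities for another attempt
}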
You can use the following snippet to send messages in a batch, so that if sending fails, it fails for the entire batch and you don't have to worry about which individual message failed; it is also the suggested approach for sending messages in bulk.
var messagelist = new List<Message>();
messagelist.Add(new Message(Encoding.ASCII.GetBytes("Test1")));
messagelist.Add(new Message(Encoding.ASCII.GetBytes("Test2")));
var sender = new QueueClient("connectionstring", "entitypath");
await sender.SendAsync(messagelist);
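One caveat, not stated above: a single batched SendAsync is still subject to the Service Bus maximum batch size, so for thousands of messages you would likely split the list into smaller batches. A hedged sketch reusing the same sender and messagelist (the batch size of 100 is an arbitrary example, and Skip/Take need System.Linq):
const int batchSize = 100; // arbitrary example value, tune to stay under the size limit
for (int offset = 0; offset < messagelist.Count; offset += batchSize)
{
    var batch = messagelist.Skip(offset).Take(batchSize).ToList();
    await sender.SendAsync(batch); // SendAsync accepts an IList<Message>
}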

Parallel.ForEach blocking calling method

I am having a problem with Parallel.ForEach. I have written a simple application that adds file names to a download queue, then iterates through the queue in a while loop, downloading one file at a time; when a file has been downloaded, another async method is called to create an object from the downloaded MemoryStream. The returned result of that method is not awaited, it is discarded, so the next download starts immediately. Everything works fine if I use a simple foreach for the object creation: objects are created while the download continues. But if I try to speed up object creation by using Parallel.ForEach, it stops the download process until the object is created. The UI is fully responsive, but it just won't download the next object. I don't understand why this is happening; Parallel.ForEach is inside await Task.Run(), and to my limited knowledge of asynchronous programming that should do the trick. Can anyone help me understand why it blocks the first method and how to avoid it?
Here is a small sample:
public async Task DownloadFromCloud(List<string> constructNames)
{
    _downloadDataQueue = new Queue<string>();
    var _gcsClient = StorageClient.Create();
    foreach (var item in constructNames)
    {
        _downloadDataQueue.Enqueue(item);
    }
    while (_downloadDataQueue.Count > 0)
    {
        var memoryStream = new MemoryStream();
        await _gcsClient.DownloadObjectAsync("companyprojects",
            _downloadDataQueue.Peek(), memoryStream);
        memoryStream.Position = 0;
        _ = ReadFileXml(memoryStream);
        _downloadDataQueue.Dequeue();
    }
}

private async Task ReadFileXml(MemoryStream memoryStream)
{
    var reader = new XmlReader();
    var properties = reader.ReadXmlTest(memoryStream);
    await Task.Run(() =>
    {
        var entityList = new List<Entity>();
        foreach (var item in properties)
        {
            entityList.Add(CreateObjectsFromDownloadedProperties(item));
        }
        //Parallel.ForEach(properties, item =>
        //{
        //    entityList.Add(CreateObjectsFromDownloadedProperties(item));
        //});
    });
}
EDIT
This is the simplified object creation method:
public Entity CreateObjectsFromDownloadedProperties(RebarProperties properties)
{
    var path = new LinearPath(properties.Path);
    var section = new Region(properties.Region);
    var sweep = section.SweepAsMesh(path, 1);
    return sweep;
}
Returned result of this method is not awaited, it is discarded, so the next download starts immediately.
This is also dangerous. "Fire and forget" means "I don't care when this operation completes, or if it completes. Just discard all exceptions because I don't care." So fire-and-forget should be extremely rare in practice. It's not appropriate here.
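As a hedged illustration of that point (a rework of the question's download loop, assuming the same _downloadDataQueue and _gcsClient from above), the caller could keep each ReadFileXml task and await them all so that exceptions are observed:
// sketch only: collect the read tasks instead of discarding them
var readTasks = new List<Task>();
while (_downloadDataQueue.Count > 0)
{
    var memoryStream = new MemoryStream();
    await _gcsClient.DownloadObjectAsync("companyprojects",
        _downloadDataQueue.Dequeue(), memoryStream);
    memoryStream.Position = 0;
    readTasks.Add(ReadFileXml(memoryStream));
}
await Task.WhenAll(readTasks); // any exception from ReadFileXml surfaces here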
UI is fully responsive, but it just won't download the next object.
I have no idea why it would block the downloads, but there's a definite problem in switching to Parallel.ForEach: List<T>.Add is not threadsafe.
private async Task ReadFileXml(MemoryStream memoryStream)
{
    var reader = new XmlReader();
    var properties = reader.ReadXmlTest(memoryStream);
    await Task.Run(() =>
    {
        var entityList = new List<Entity>();
        Parallel.ForEach(properties, item =>
        {
            var itemToAdd = CreateObjectsFromDownloadedProperties(item);
            lock (entityList) { entityList.Add(itemToAdd); }
        });
    });
}
One tip: if you have a result value, PLINQ is often cleaner than Parallel:
private async Task ReadFileXml(MemoryStream memoryStream)
{
    var reader = new XmlReader();
    var properties = reader.ReadXmlTest(memoryStream);
    await Task.Run(() =>
    {
        var entityList = properties
            .AsParallel()
            .Select(CreateObjectsFromDownloadedProperties)
            .ToList();
    });
}
However, the code still suffers from the fire-and-forget problem.
For a better fix, I'd recommend taking a step back and using something more suited to "pipeline"-style processing. E.g., TPL Dataflow:
public async Task DownloadFromCloud(List<string> constructNames)
{
    // Set up the pipeline.
    var gcsClient = StorageClient.Create();
    var downloadBlock = new TransformBlock<string, MemoryStream>(async constructName =>
    {
        var memoryStream = new MemoryStream();
        await gcsClient.DownloadObjectAsync("companyprojects", constructName, memoryStream);
        memoryStream.Position = 0;
        return memoryStream;
    });
    var processBlock = new TransformBlock<MemoryStream, List<Entity>>(memoryStream =>
    {
        var reader = new XmlReader();
        var properties = reader.ReadXmlTest(memoryStream);
        return properties
            .AsParallel()
            .Select(CreateObjectsFromDownloadedProperties)
            .ToList();
    });
    var resultsBlock = new ActionBlock<List<Entity>>(entities => { /* TODO */ });
    downloadBlock.LinkTo(processBlock, new DataflowLinkOptions { PropagateCompletion = true });
    processBlock.LinkTo(resultsBlock, new DataflowLinkOptions { PropagateCompletion = true });

    // Push data into the pipeline.
    foreach (var constructName in constructNames)
        await downloadBlock.SendAsync(constructName);
    downloadBlock.Complete();

    // Wait for the pipeline to complete.
    await resultsBlock.Completion;
}
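A possible refinement, not part of the answer above: the dataflow blocks accept ExecutionDataflowBlockOptions, so the pipeline can be bounded and its parallelism capped; the values below are illustrative only.
// illustrative: cap buffered items and concurrency for a block
var blockOptions = new ExecutionDataflowBlockOptions
{
    BoundedCapacity = 4,          // at most 4 items queued in the block
    MaxDegreeOfParallelism = 2    // at most 2 items processed at once
};
// pass it as the second constructor argument, e.g.
// new TransformBlock<MemoryStream, List<Entity>>(memoryStream => { ... }, blockOptions);
With a bounded capacity, the SendAsync calls in the push loop apply backpressure instead of buffering every download in memory.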

Processing large number of tasks concurrently and asynchronously

I would like to process a list of 50,000 URLs through a web service. The provider of this service allows 5 connections per second.
I need to process these URLs in parallel while adhering to the provider's rules.
This is my current code:
static void Main(string[] args)
{
    process_urls().GetAwaiter().GetResult();
}

public static async Task process_urls()
{
    // let's say there is a list of 50,000+ URLs
    var urls = System.IO.File.ReadAllLines("urls.txt");
    var allTasks = new List<Task>();
    var throttler = new SemaphoreSlim(initialCount: 5);
    foreach (var url in urls)
    {
        await throttler.WaitAsync();
        allTasks.Add(
            Task.Run(async () =>
            {
                try
                {
                    Console.WriteLine(String.Format("Starting {0}", url));
                    var client = new HttpClient();
                    var xml = await client.GetStringAsync(url);
                    //do some processing on xml output
                    client.Dispose();
                }
                finally
                {
                    throttler.Release();
                }
            }));
    }
    await Task.WhenAll(allTasks);
}
Instead of var client = new HttpClient(); I will create a new object for the target web service, but this is just to keep the code generic.
Is this the correct approach for handling and processing a huge list of connections? And is there any way I can limit the number of established connections per second to 5? The current implementation does not take any timeframe into account.
Thanks
Reading values from a web service is an I/O operation, which can be done asynchronously without multithreading.
The threads do nothing but wait for the response in this case, so using parallelism here is just a waste of resources.
public static async Task process_urls()
{
    var urls = System.IO.File.ReadAllLines("urls.txt");
    foreach (var urlGroup in SplitToGroupsOfFive(urls))
    {
        var tasks = new List<Task>();
        foreach (var url in urlGroup)
        {
            var task = ProcessUrl(url);
            tasks.Add(task);
        }
        // This delay makes sure that the next 5 urls are started only after 1 second
        tasks.Add(Task.Delay(1000));
        await Task.WhenAll(tasks.ToArray());
    }
}

private static async Task ProcessUrl(string url)
{
    using (var client = new HttpClient())
    {
        var xml = await client.GetStringAsync(url);
        //do some processing on xml output
    }
}

private static IEnumerable<IEnumerable<string>> SplitToGroupsOfFive(IEnumerable<string> urls)
{
    const int GROUP_SIZE = 5;
    string[] group = null;
    int count = 0;
    foreach (var url in urls)
    {
        if (group == null)
            group = new string[GROUP_SIZE];
        group[count] = url;
        count++;
        if (count < GROUP_SIZE)
            continue;
        yield return group;
        group = null;
        count = 0;
    }
    if (group != null && count > 0)
    {
        // only return the urls that were actually filled into the last partial group
        yield return group.Take(count);
    }
}
Because you mention that "processing" the response is also an I/O operation, the async/await approach is the most efficient: it uses only one thread and processes other tasks while previous tasks are waiting for a response from the web service or for file-writing I/O.
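One extra caveat the answer does not mention: creating a new HttpClient per URL can exhaust sockets under heavy load. A commonly recommended variant is to share a single instance; a minimal sketch, assuming ProcessUrl lives in the same class:
private static readonly HttpClient _client = new HttpClient(); // shared across all requests

private static async Task ProcessUrl(string url)
{
    var xml = await _client.GetStringAsync(url);
    //do some processing on xml output
}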

How to use HttpClient PostAsync() with threadpool in C#?

I'm using the following code to post an image to a server.
var image = Image.FromFile(@"C:\Image.jpg");
Task<string> upload = Upload(image);
upload.Wait();

public static async Task<string> Upload(Image image)
{
    var uriBuilder = new UriBuilder
    {
        Host = "somewhere.net",
        Path = "/path/",
        Port = 443,
        Scheme = "https",
        Query = "process=false"
    };
    using (var client = new HttpClient())
    {
        client.DefaultRequestHeaders.Add("locale", "en_US");
        client.DefaultRequestHeaders.Add("country", "US");
        var content = ConvertToHttpContent(image);
        content.Headers.ContentType = MediaTypeHeaderValue.Parse("image/jpeg");
        using (var mpcontent = new MultipartFormDataContent("--myFakeDividerText--")
            {
                { content, "fakeImage", "myFakeImageName.jpg" }
            })
        {
            using (var message = await client.PostAsync(uriBuilder.Uri, mpcontent))
            {
                var input = await message.Content.ReadAsStringAsync();
                return "nothing for now";
            }
        }
    }
}
I'd like to modify this code to run multiple threads. I've used "ThreadPool.QueueUserWorkItem" before and started to modify the code to leverage it.
private void UseThreadPool()
{
    int minWorker, minIOC;
    ThreadPool.GetMinThreads(out minWorker, out minIOC);
    ThreadPool.SetMinThreads(1, minIOC);

    int maxWorker, maxIOC;
    ThreadPool.GetMaxThreads(out maxWorker, out maxIOC);
    ThreadPool.SetMinThreads(4, maxIOC);

    var events = new List<ManualResetEvent>();
    foreach (var image in ImageCollection)
    {
        var resetEvent = new ManualResetEvent(false);
        ThreadPool.QueueUserWorkItem(
            arg =>
            {
                var img = Image.FromFile(image.getPath());
                Task<string> upload = Upload(img);
                upload.Wait();
                resetEvent.Set();
            });
        events.Add(resetEvent);
        if (events.Count <= 0) continue;
        foreach (ManualResetEvent e in events) e.WaitOne();
    }
}
The problem is that only one thread executes at a time due to the call to "upload.Wait()". So I'm still executing each thread in sequence. It's not clear to me how I can use PostAsync with a thread-pool.
How can I post images to a server using multiple threads by tweaking the code above? Is HttpClient PostAsync the best way to do this?
I'd like to modify this code to run multiple threads.
Why? The thread pool should only be used for CPU-bound work (and I/O completions, of course).
You can do concurrency just fine with async:
var tasks = ImageCollection.Select(image =>
{
    var img = Image.FromFile(image.getPath());
    return Upload(img);
});
await Task.WhenAll(tasks);
Note that I removed your Wait. You should avoid using Wait or Result with async tasks; use await instead. Yes, this will cause async to grow through your code, and you should use async "all the way".
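If you also need to cap how many uploads run at once (an assumption about your requirements, not something the answer calls for), a SemaphoreSlim throttle can be layered onto the same pattern:
// sketch: limit concurrent uploads; the limit of 4 is an arbitrary example
var throttler = new SemaphoreSlim(4);
var tasks = ImageCollection.Select(async image =>
{
    await throttler.WaitAsync();
    try
    {
        var img = Image.FromFile(image.getPath());
        return await Upload(img);
    }
    finally
    {
        throttler.Release();
    }
});
var results = await Task.WhenAll(tasks);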
