CosmosDB Query Performance - C#

I wrote my latest update, and then got the following error from Stack Overflow: "Body is limited to 30000 characters; you entered 38676."
It's fair to say I have been very verbose in documenting my adventures, so I've rewritten what I have here to be more concise.
I have stored my (long) original post and updates on pastebin. I don't think many people will read them, but I put a lot of effort into them, so it'd be nice not to have them lost.
I have a collection which contains 100,000 documents for learning how to use CosmosDB and for things like performance testing.
Each of these documents has a Location property which is a GeoJSON Point.
According to the documentation, a GeoJSON point should be automatically indexed.
Azure Cosmos DB supports automatic indexing of Points, Polygons, and LineStrings
I've checked the Indexing Policy for my collection, and it has the entry for automatic point indexing:
{
    "automatic": true,
    "indexingMode": "Consistent",
    "includedPaths": [
        {
            "path": "/*",
            "indexes": [
                ...
                {
                    "kind": "Spatial",
                    "dataType": "Point"
                },
                ...
            ]
        }
    ],
    "excludedPaths": []
}
I've been looking for a way to list, or otherwise interrogate, the indexes that have been created, but I haven't found such a thing yet, so I haven't been able to confirm that this property is definitely being indexed.
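(For what it's worth, the collection definition can at least be read back through the SDK and its IndexingPolicy dumped; that confirms the policy the service holds, though not which indexes a given query actually uses. A minimal sketch, assuming the same client and documentCollectionUri used below:)
// Sketch: read the collection back and print the indexing policy the service reports.
var collectionResponse = await client.ReadDocumentCollectionAsync(documentCollectionUri);
var indexingPolicy = collectionResponse.Resource.IndexingPolicy;
Console.WriteLine(JsonConvert.SerializeObject(indexingPolicy, Formatting.Indented));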
I created a GeoJSON Polygon, and then used that to query my documents.
This is my query:
var query = client
.CreateDocumentQuery<TestDocument>(documentCollectionUri)
.Where(document => document.Type == this.documentType && document.Location.Intersects(target.Area));
And I then pass that query object to the following method so I can get the results while tracking the Request Units used:
protected async Task<IEnumerable<T>> QueryTrackingUsedRUsAsync(IQueryable<T> query)
{
var documentQuery = query.AsDocumentQuery();
var documents = new List<T>();
while (documentQuery.HasMoreResults)
{
var response = await documentQuery.ExecuteNextAsync<T>();
this.AddUsedRUs(response.RequestCharge);
documents.AddRange(response);
}
return documents;
}
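For completeness, the call site is then just the following (illustrative; matchingDocuments is a placeholder name):
// Drain the LINQ query via the helper so each page's RequestCharge is accumulated.
var matchingDocuments = await this.QueryTrackingUsedRUsAsync(query);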
The point locations are randomly chosen from 10s of millions of UK addresses, so they should have a fairly realistic spread.
The polygon is made up of 16 points (with the first and last point being the same), so it's not very complex. It covers most of the southernmost part of the UK, from London down.
An example run of this query returned 8728 documents, using 3917.92 RU, in 170717.151 ms, which is just under 171 seconds, or just under 3 minutes.
3918 RU / 171 s = 22.91 RU/s
I currently have the Throughput (RU/s) set to the lowest value, at 400 RU/s.
It was my understanding that this is the reserved level you are guaranteed to get. You can "burst" above that level at times, but do that too frequently and you'll be throttled back to your reserved level.
The "query speed" of 23 RU/s is, obviously, much much lower than the Throughput setting of 400 RU/s.
I am running the client "locally" i.e. in my office, and not up in the Azure data center.
Each document is roughly 500 bytes (0.5 KB) in size.
So what's happening?
Am I doing something wrong?
Am I misunderstanding how my query is being throttled with regard to RU/s?
Is this the speed at which the GeoSpatial indexes operate, and so the best performance I'll get?
Is the GeoSpatial index not being used?
Is there a way I can view the created indexes?
Is there a way I can check if the index is being used?
Is there a way I can profile the query and get metrics about where time is being spent? e.g. x seconds were spent looking up documents by their type, y seconds filtering them geospatially, and z seconds transferring the data.
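(On that last point: depending on the SDK version, FeedOptions has a PopulateQueryMetrics flag and FeedResponse exposes a QueryMetrics dictionary that breaks down where the query engine spent its time. I haven't confirmed it against the exact SDK version I'm on, so treat this as a sketch:)
// Sketch: request per-query execution metrics (needs an SDK/service version that supports it).
var metricsQuery = client
    .CreateDocumentQuery<TestDocument>(documentCollectionUri, new FeedOptions { PopulateQueryMetrics = true })
    .Where(document => document.Type == this.documentType && document.Location.Intersects(target.Area))
    .AsDocumentQuery();
while (metricsQuery.HasMoreResults)
{
    var response = await metricsQuery.ExecuteNextAsync<TestDocument>();
    foreach (var partitionMetrics in response.QueryMetrics) // keyed by partition key range id
    {
        Trace.TraceInformation($"{partitionMetrics.Key}: {partitionMetrics.Value}");
    }
}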
UPDATE 1
Here's the polygon I'm using in the query:
Area = new Polygon(new List<LinearRing>()
{
new LinearRing(new List<Position>()
{
new Position(1.8567 ,51.3814),
new Position(0.5329 ,51.4618),
new Position(0.2477 ,51.2588),
new Position(-0.5329 ,51.2579),
new Position(-1.17 ,51.2173),
new Position(-1.9062 ,51.1958),
new Position(-2.5434 ,51.1614),
new Position(-3.8672 ,51.139 ),
new Position(-4.1578 ,50.9137),
new Position(-4.5373 ,50.694 ),
new Position(-5.1496 ,50.3282),
new Position(-5.2212 ,49.9586),
new Position(-3.7049 ,50.142 ),
new Position(-2.1698 ,50.314 ),
new Position(0.4669 ,50.6976),
new Position(1.8567 ,51.3814)
})
})
I have also tried reversing it (since ring orientation matters), but the query with the reversed polygon took significantly longer (I don't have the exact timing to hand) and returned 91272 items.
Also, the coordinates are specified as Longitude/Latitude, as this is how GeoJSON expects them (i.e. as X/Y), rather than the traditional order used when speaking of Latitude/Longitude.
The GeoJSON specification specifies longitude first and latitude second.
UPDATE 2
Here's the JSON for one of my documents:
{
    "GeoTrigger": null,
    "SeverityTrigger": -1,
    "TypeTrigger": -1,
    "Name": "13, LONSDALE SQUARE, LONDON, N1 1EN",
    "IsEnabled": true,
    "Type": 2,
    "Location": {
        "$type": "Microsoft.Azure.Documents.Spatial.Point, Microsoft.Azure.Documents.Client",
        "type": "Point",
        "coordinates": [
            -0.1076407397346815,
            51.53970315059827
        ]
    },
    "id": "0dc2c03e-082b-4aea-93a8-79d89546c12b",
    "_rid": "EQttAMGhSQDWPwAAAAAAAA==",
    "_self": "dbs/EQttAA==/colls/EQttAMGhSQA=/docs/EQttAMGhSQDWPwAAAAAAAA==/",
    "_etag": "\"42001028-0000-0000-0000-594943fe0000\"",
    "_attachments": "attachments/",
    "_ts": 1497973747
}
UPDATE 3
I created a minimal reproduction of the issue, and found that the issue no longer occurred.
This indicated that the problem was indeed in my own code.
I set out to check all the differences between the original and the reproduction code, and eventually found that something that had looked fairly innocent to me was in fact having a big impact. Thankfully, that code wasn't needed at all, so the fix was simply to stop using it.
At one point I was using a custom ContractResolver and I hadn't removed it once it was no longer needed.
Here's the offending reproduction code:
using System;
using System.Collections.Generic;
using System.Configuration;
using System.Diagnostics;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Spatial;
using Newtonsoft.Json;
using Newtonsoft.Json.Serialization;
namespace Repro.Cli
{
public class Program
{
static void Main(string[] args)
{
JsonConvert.DefaultSettings = () =>
{
return new JsonSerializerSettings
{
ContractResolver = new PropertyNameMapContractResolver(new Dictionary<string, string>()
{
{ "ID", "id" }
})
};
};
//AJ: Init logging
Trace.AutoFlush = true;
Trace.Listeners.Add(new ConsoleTraceListener());
Trace.Listeners.Add(new TextWriterTraceListener("trace.log"));
//AJ: Increase available threads
//AJ: https://learn.microsoft.com/en-us/azure/storage/storage-performance-checklist#subheading10
//AJ: https://github.com/Azure/azure-documentdb-dotnet/blob/master/samples/documentdb-benchmark/Program.cs
var minThreadPoolSize = 100;
ThreadPool.SetMinThreads(minThreadPoolSize, minThreadPoolSize);
//AJ: https://learn.microsoft.com/en-us/azure/cosmos-db/performance-tips
//AJ: gcServer enabled in app.config
//AJ: Prefer 32-bit disabled in project properties
//AJ: DO IT
var program = new Program();
Trace.TraceInformation($"Starting # {DateTime.UtcNow}");
program.RunAsync().Wait();
Trace.TraceInformation($"Finished # {DateTime.UtcNow}");
//AJ: Wait for user to exit
Console.WriteLine();
Console.WriteLine("Hit enter to exit...");
Console.ReadLine();
}
public async Task RunAsync()
{
using (new CodeTimer())
{
var client = await this.GetDocumentClientAsync();
var documentCollectionUri = UriFactory.CreateDocumentCollectionUri(ConfigurationManager.AppSettings["databaseID"], ConfigurationManager.AppSettings["collectionID"]);
//AJ: Prepare Test Documents
var documentCount = 10000; //AJ: 10,000
var documentsForUpsert = this.GetDocuments(documentCount);
await this.UpsertDocumentsAsync(client, documentCollectionUri, documentsForUpsert);
var allDocuments = this.GetAllDocuments(client, documentCollectionUri);
var area = this.GetArea();
var documentsInArea = this.GetDocumentsInArea(client, documentCollectionUri, area);
}
}
private async Task<DocumentClient> GetDocumentClientAsync()
{
using (new CodeTimer())
{
var serviceEndpointUri = new Uri(ConfigurationManager.AppSettings["serviceEndpoint"]);
var authKey = ConfigurationManager.AppSettings["authKey"];
var connectionPolicy = new ConnectionPolicy
{
ConnectionMode = ConnectionMode.Direct,
ConnectionProtocol = Protocol.Tcp,
RequestTimeout = new TimeSpan(1, 0, 0),
RetryOptions = new RetryOptions
{
MaxRetryAttemptsOnThrottledRequests = 10,
MaxRetryWaitTimeInSeconds = 60
}
};
var client = new DocumentClient(serviceEndpointUri, authKey, connectionPolicy);
await client.OpenAsync();
return client;
}
}
private List<TestDocument> GetDocuments(int count)
{
using (new CodeTimer())
{
return External.CreateDocuments(count);
}
}
private async Task UpsertDocumentsAsync(DocumentClient client, Uri documentCollectionUri, List<TestDocument> documents)
{
using (new CodeTimer())
{
//TODO: AJ: Parallelise
foreach (var document in documents)
{
await client.UpsertDocumentAsync(documentCollectionUri, document);
}
}
}
private List<TestDocument> GetAllDocuments(DocumentClient client, Uri documentCollectionUri)
{
using (new CodeTimer())
{
var query = client
.CreateDocumentQuery<TestDocument>(documentCollectionUri, new FeedOptions()
{
MaxItemCount = 1000
});
var documents = query.ToList();
return documents;
}
}
private Polygon GetArea()
{
//AJ: Longitude,Latitude i.e. X/Y
//AJ: Ring orientation matters
return new Polygon(new List<LinearRing>()
{
new LinearRing(new List<Position>()
{
new Position(1.8567 ,51.3814),
new Position(0.5329 ,51.4618),
new Position(0.2477 ,51.2588),
new Position(-0.5329 ,51.2579),
new Position(-1.17 ,51.2173),
new Position(-1.9062 ,51.1958),
new Position(-2.5434 ,51.1614),
new Position(-3.8672 ,51.139 ),
new Position(-4.1578 ,50.9137),
new Position(-4.5373 ,50.694 ),
new Position(-5.1496 ,50.3282),
new Position(-5.2212 ,49.9586),
new Position(-3.7049 ,50.142 ),
new Position(-2.1698 ,50.314 ),
new Position(0.4669 ,50.6976),
//AJ: Last point must be the same as first point
new Position(1.8567 ,51.3814)
})
});
}
private List<TestDocument> GetDocumentsInArea(DocumentClient client, Uri documentCollectionUri, Polygon area)
{
using (new CodeTimer())
{
var query = client
.CreateDocumentQuery<TestDocument>(documentCollectionUri, new FeedOptions()
{
MaxItemCount = 1000
})
.Where(document => document.Location.Intersects(area));
var documents = query.ToList();
return documents;
}
}
}
public class TestDocument : Resource
{
public string Name { get; set; }
public Point Location { get; set; } //AJ: Longitude,Latitude i.e. X/Y
public TestDocument()
{
this.Id = Guid.NewGuid().ToString("N");
}
}
//AJ: This should be "good enough". The times being recorded are seconds or minutes.
public class CodeTimer : IDisposable
{
private Action<TimeSpan> reportFunction;
private Stopwatch stopwatch = new Stopwatch();
public CodeTimer([CallerMemberName]string name = "")
: this((elapsed) =>
{
Trace.TraceInformation($"{name} took {elapsed}, or {elapsed.TotalMilliseconds} ms.");
})
{ }
public CodeTimer(Action<TimeSpan> report)
{
this.reportFunction = report;
this.stopwatch.Start();
}
public void Dispose()
{
this.stopwatch.Stop();
this.reportFunction(this.stopwatch.Elapsed);
}
}
public class PropertyNameMapContractResolver : DefaultContractResolver
{
private Dictionary<string, string> propertyNameMap;
public PropertyNameMapContractResolver(Dictionary<string, string> propertyNameMap)
{
this.propertyNameMap = propertyNameMap;
}
protected override string ResolvePropertyName(string propertyName)
{
if (this.propertyNameMap.TryGetValue(propertyName, out string resolvedName))
return resolvedName;
return base.ResolvePropertyName(propertyName);
}
}
}

I was using a custom ContractResolver and that was evidently having a big impact on the performance of the DocumentDB classes from the .Net SDK.
This was how I was setting the ContractResolver:
JsonConvert.DefaultSettings = () =>
{
return new JsonSerializerSettings
{
ContractResolver = new PropertyNameMapContractResolver(new Dictionary<string, string>()
{
{ "ID", "id" }
})
};
};
And this is how it was implemented:
public class PropertyNameMapContractResolver : DefaultContractResolver
{
private Dictionary<string, string> propertyNameMap;
public PropertyNameMapContractResolver(Dictionary<string, string> propertyNameMap)
{
this.propertyNameMap = propertyNameMap;
}
protected override string ResolvePropertyName(string propertyName)
{
if (this.propertyNameMap.TryGetValue(propertyName, out string resolvedName))
return resolvedName;
return base.ResolvePropertyName(propertyName);
}
}
The solution was easy: don't set JsonConvert.DefaultSettings, so the ContractResolver isn't used.
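If the ID-to-id mapping is still needed somewhere, a safer pattern (a sketch, not what I actually shipped; someObject is just a placeholder) is to pass the settings explicitly to the calls that need them, rather than installing them globally where the SDK's own serialization picks them up:
// Sketch: keep the custom resolver scoped to our own serialization instead of JsonConvert.DefaultSettings.
var mappingSettings = new JsonSerializerSettings
{
    ContractResolver = new PropertyNameMapContractResolver(new Dictionary<string, string>()
    {
        { "ID", "id" }
    })
};
// Only calls that opt in use the mapping; the DocumentDB SDK's serialization is untouched.
var json = JsonConvert.SerializeObject(someObject, mappingSettings);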
Results:
I was able to perform my spatial query in 21799.0221 ms, which is 22 seconds.
Previously it took 170717.151 ms, which is 2 minutes 50 seconds.
That's about 8x faster!

Related

Akka.Net in-memory persistence calls `Recover` after app restart

I'm trying to test a persistence actor, but the behavior is weird.
My tested actor:
public class PredictionManager : ReceivePersistentActor
{
public override string PersistenceId => _persistanceId;
public PredictionManager(string persistenceId)
{
_persistanceId = persistenceId;
Command<AddPredictionRequest>(OnPrediction);
Recover<SnapshotOffer>(x => OnRecover((PredictionManagerState)x.Snapshot), x => x.Snapshot is PredictionManagerState);
}
private void OnPrediction(AddPredictionRequest request)
{
/* some code */
_state.Add(request);
SaveSnapshot(_state);
}
private void OnRecover(PredictionManagerState state)
{
foreach(var request in state.RequestMap)
{
OnPrediction(request.Value);
}
}
}
My state saves all messages and deletes them after the manager actor receives a certain message. When I debug my test, the Recover function is called first, and after that OnPrediction is called. My question is: how is that possible? If the data is stored in memory, why is there a SnapshotOffer? I have also tried generating a new persistenceId from Guid.NewGuid(), but it doesn't work.
public void AddPrediction_PassToChild_CreateNewManager_PassToChild()
{
var sender = CreateTestProbe(Sys);
var persistanceId = "AddPrediction_PassToChild_CreateNewManager_PassToChild";
var props = Props.Create(() => new PredictionManager(Mock.Of<IEventBus>(), persistanceId));
var predictionManager = ActorOf(props);
var message = new PredictionManager.AddPredictionRequest(Props.Create(() => new ChildTestActor(sender.Ref)),
new StartPrediction<IPredictionParameter>("a", 1, "a", new Param() ));
//Act
predictionManager.Tell(message, sender);
sender.ExpectMsg<string>(x => x == "ok", TimeSpan.FromSeconds(15));
Sys.Stop(predictionManager);
predictionManager = Sys.ActorOf(props);
sender.ExpectMsg<string>(x => x == "ok", TimeSpan.FromSeconds(15));
Sys.Stop(predictionManager);
}
I found out that the default snapshot storage is the local (file-based) snapshot store, not an in-memory one. It stores snapshots in files, which is why there is a SnapshotOffer after the app restarts. But I still can't see why using Guid.NewGuid() as the persistenceId is not working.
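If you stay on the default local (file-based) snapshot store, one way to keep tests isolated is to point it at a fresh directory via HOCON. A sketch, assuming the xUnit TestKit and the standard akka.persistence.snapshot-store.local.dir setting:
using System;
using Akka.Configuration;
using Akka.TestKit.Xunit2;

public class PredictionManagerSpecs : TestKit
{
    // Give each test class instance its own snapshot directory, so snapshot files
    // left over from earlier runs can't be replayed as a SnapshotOffer.
    public PredictionManagerSpecs()
        : base(ConfigurationFactory.ParseString(
            $@"akka.persistence.snapshot-store.local.dir = ""snapshots-{Guid.NewGuid():N}"""))
    {
    }

    // ... tests as in the question ...
}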

How to get ram and cpu info using AmazonEC2Client in Asp.Net Core?

In my ASP.NET Core 3.1 project I am using AmazonEC2Client to get info about AWS instances.
I implemented a helper method for getting the instance list. The method looks like this:
public static async Task<List<string>> AwsList(string awsAccessKeyId, string awsSecretAccessKey)
{
    AmazonEC2Client client = new AmazonEC2Client(awsAccessKeyId, awsSecretAccessKey, RegionEndpoint.EUWest1);
    bool done = false;
    var instanceIds = new List<string>();
    DescribeInstancesRequest request = new DescribeInstancesRequest();
    while (!done)
    {
        DescribeInstancesResponse response = await client.DescribeInstancesAsync(request);
        foreach (Reservation reservation in response.Reservations)
        {
            foreach (Instance instance in reservation.Instances)
            {
                instanceIds.Add(instance.InstanceType);
            }
        }
        request.NextToken = response.NextToken;
        if (response.NextToken == null)
        {
            done = true;
        }
    }
    return instanceIds;
}
Json result is:
[
"t3a.xlarge",
"t2.medium",
"t2.medium",
"t2.micro",
"t3a.xlarge",
"t2.medium",
"t3a.xlarge",
"t3a.xlarge",
"t3a.xlarge"
]
I don't know whether the RAM and CPU info is part of the instance type or not; I have no experience with AWS.
I would like to get the CPU and RAM info according to the instance type.
Later I would like to create a method that accepts a string instanceType and returns the RAM and CPU for it.
For example: GetRam("t2.micro") -> 2gb
Instead of using DescribeInstancesRequest, you need to use DescribeInstanceTypesRequest and the corresponding response.
foreach (var instanceType in response.InstanceTypes.Where(x => x.InstanceType == name))
{
    instanceIds.Add(instanceType.MemoryInfo.SizeInMiB.ToString());  // RAM in MiB
    instanceIds.Add(instanceType.VCpuInfo.DefaultVCpus.ToString()); // vCPU count
}
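A fuller sketch of what that could look like (the method name GetInstanceTypeInfoAsync is just illustrative; the request/response types are from the AWS SDK for .NET):
// Sketch: look up RAM (MiB) and default vCPU count for an instance type name such as "t2.micro".
public static async Task<(long MemoryMiB, int VCpus)> GetInstanceTypeInfoAsync(
    AmazonEC2Client client, string instanceTypeName)
{
    var request = new DescribeInstanceTypesRequest
    {
        InstanceTypes = new List<string> { instanceTypeName }
    };
    DescribeInstanceTypesResponse response = await client.DescribeInstanceTypesAsync(request);
    var info = response.InstanceTypes.Single(); // we asked for exactly one type
    return (info.MemoryInfo.SizeInMiB, info.VCpuInfo.DefaultVCpus);
}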

C# - Get total usage of GPU in percentage

I am adding a couple of new features to my program that currently sends the CPU usage and RAM usage to an Arduino via a serial connection (see this). I am trying to add GPU and disk usage as well. Disk usage is not a problem, but fetching GPU usage from Windows has become a real problem.
I've tried using PerformanceCounter but that doesn't seem to work at all! See the code below.
PerformanceCounter gpuCounter = new PerformanceCounter("GPU Engine", "Utilization Percentage");
var gpuUsage = gpuCounter.NextValue();
I want the GPU usage in percentage like this:
GPU usage: #.#%
Is there any possible way I can achieve this?
You need to create a PerformanceCounterCategory for "GPU Engine", then call GetInstanceNames() on the category. You can then iterate over the returned name array, calling the GetCounters(name) method for each.
public List<PerformanceCounter> Sample()
{
    var list = new List<PerformanceCounter>();
    var category = new PerformanceCounterCategory("GPU Engine");
    var names = category.GetInstanceNames();
    foreach (var name in names)
        list.AddRange(category.GetCounters(name));
    return list;
}
There will be both "Running Time" and "Utilization Percentage" counters in your list. You could filter based on the CounterName property for the counters returned by GetCounters.
I use the following:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Diagnostics;
using System.Threading;
public class GetCPUUsage
{
static void Main(string[] args)
{
while (true)
{
try
{
var gpuCounters = GetGPUCounters();
var gpuUsage = GetGPUUsage(gpuCounters);
Console.WriteLine(gpuUsage);
continue;
} catch {}
Thread.Sleep(1000);
}
}
public static List<PerformanceCounter> GetGPUCounters()
{
var category = new PerformanceCounterCategory("GPU Engine");
var counterNames = category.GetInstanceNames();
var gpuCounters = counterNames
.Where(counterName => counterName.EndsWith("engtype_3D"))
.SelectMany(counterName => category.GetCounters(counterName))
.Where(counter => counter.CounterName.Equals("Utilization Percentage"))
.ToList();
return gpuCounters;
}
public static float GetGPUUsage(List<PerformanceCounter> gpuCounters)
{
gpuCounters.ForEach(x => x.NextValue());
Thread.Sleep(1000);
var result = gpuCounters.Sum(x => x.NextValue());
return result;
}
}
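If you want the output in the GPU usage: #.#% format from the question, the Console.WriteLine in the loop above can be changed to something like:
// Format the summed "Utilization Percentage" value to one decimal place.
Console.WriteLine($"GPU usage: {gpuUsage:0.0}%");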

C# - OutOfMemoryException saving a List on a JSON file

I'm trying to save the streaming data of a pressure map.
Basically I have a pressure matrix defined as:
double[,] pressureMatrix = new double[e.Data.GetLength(0), e.Data.GetLength(1)];
Basically, I'm getting one of these pressureMatrix instances every 10 milliseconds and I want to save all the information in a JSON file to be able to reproduce it later.
What I do is, first of all, write what I call the header with all the settings used to do the recording like this:
recordedData.softwareVersion = Assembly.GetExecutingAssembly().GetName().Version.Major.ToString() + "." + Assembly.GetExecutingAssembly().GetName().Version.Minor.ToString();
recordedData.calibrationConfiguration = calibrationConfiguration;
recordedData.representationConfiguration = representationSettings;
recordedData.pressureData = new List<PressureMap>();
var json = JsonConvert.SerializeObject(csvRecordedData, Formatting.None);
File.WriteAllText(this.filePath, json);
Then, every time I get a new pressure map I create a new Thread to add the new PressureMatrix and re-write the file:
var newPressureMatrix = new PressureMap(datos, DateTime.Now);
recordedData.pressureData.Add(newPressureMatrix);
var json = JsonConvert.SerializeObject(recordedData, Formatting.None);
File.WriteAllText(this.filePath, json);
After about 20-30 minutes I get an OutOfMemoryException because the system cannot hold the recordedData variable: the List<PressureMap> inside it has become too big.
How can I handle this so I can save the data? I would like to save 24-48 hours of information.
Your basic problem is that you are holding all of your pressure map samples in memory rather than writing each one individually and then allowing it to be garbage collected. What's worse, you are doing this in two different places:
You serialize your entire list of samples to a JSON string json before writing the string to a file.
Instead, as explained in Performance Tips: Optimize Memory Usage, you should serialize and deserialize directly to and from your file in such situations. For instructions on how to do this see this answer to Can Json.NET serialize / deserialize to / from a stream? and also Serialize JSON to a file.
The recordedData.pressureData = new List<PressureMap>(); accumulates all pressure map samples, then writes all of them every time a sample is made.
A better solution would be to write each sample once and forget it, but the requirement for each sample to be nested inside some container objects in the JSON makes it nonobvious how to do that.
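To dispose of issue #1 first: serialize straight to the file instead of building the whole JSON string in memory. A minimal sketch with the same Json.NET types used later in this answer:
// Sketch: stream the object graph directly to disk; no intermediate JSON string is allocated.
using (var stream = new FileStream(this.filePath, FileMode.Create))
using (var textWriter = new StreamWriter(stream))
using (var jsonWriter = new JsonTextWriter(textWriter))
{
    JsonSerializer.CreateDefault().Serialize(jsonWriter, recordedData);
}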
So, how to attack issue #2?
First, let's modify your data model as follows, partitioning the header data into a separate class:
public class PressureMap
{
public double[,] PressureMatrix { get; set; }
}
public class CalibrationConfiguration
{
// Data model not included in question
}
public class RepresentationConfiguration
{
// Data model not included in question
}
public class RecordedDataHeader
{
public string SoftwareVersion { get; set; }
public CalibrationConfiguration CalibrationConfiguration { get; set; }
public RepresentationConfiguration RepresentationConfiguration { get; set; }
}
public class RecordedData
{
// Ensure the header is serialized first.
[JsonProperty(Order = 1)]
public RecordedDataHeader RecordedDataHeader { get; set; }
// Ensure the pressure data is serialized last.
[JsonProperty(Order = 2)]
public IEnumerable<PressureMap> PressureData { get; set; }
}
Option #1 is a version of the producer-consumer pattern. It involves spinning up two threads: one to generate PressureData samples, and one to serialize the RecordedData. The first thread will generate samples and add them to a BlockingCollection<PressureMap> collection that is passed to the second thread. The second thread will then serialize BlockingCollection<PressureMap>.GetConsumingEnumerable()
as the value of RecordedData.PressureData.
The following code gives a skeleton for how to do this:
var sampleCount = 400; // Or whatever stopping criterion you prefer
var sampleInterval = 10; // in ms
using (var pressureData = new BlockingCollection<PressureMap>())
{
// Adapted from
// https://learn.microsoft.com/en-us/dotnet/standard/collections/thread-safe/blockingcollection-overview
// https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.blockingcollection-1?view=netframework-4.7.2
// Spin up a Task to sample the pressure maps
using (Task t1 = Task.Factory.StartNew(() =>
{
for (int i = 0; i < sampleCount; i++)
{
var data = GetPressureMap(i);
Console.WriteLine("Generated sample {0}", i);
pressureData.Add(data);
System.Threading.Thread.Sleep(sampleInterval);
}
pressureData.CompleteAdding();
}))
{
// Spin up a Task to consume the BlockingCollection
using (Task t2 = Task.Factory.StartNew(() =>
{
var recordedDataHeader = new RecordedDataHeader
{
SoftwareVersion = softwareVersion,
CalibrationConfiguration = calibrationConfiguration,
RepresentationConfiguration = representationConfiguration,
};
var settings = new JsonSerializerSettings
{
ContractResolver = new CamelCasePropertyNamesContractResolver(),
};
using (var stream = new FileStream(this.filePath, FileMode.Create))
using (var textWriter = new StreamWriter(stream))
using (var jsonWriter = new JsonTextWriter(textWriter))
{
int j = 0;
var query = pressureData
.GetConsumingEnumerable()
.Select(p =>
{
// Flush the writer periodically in case the process terminates abnormally
jsonWriter.Flush();
Console.WriteLine("Serializing item {0}", j++);
return p;
});
var recordedData = new RecordedData
{
RecordedDataHeader = recordedDataHeader,
// Since PressureData is declared as IEnumerable<PressureMap>, evaluation will be lazy.
PressureData = query,
};
Console.WriteLine("Beginning serialization of {0} to {1}:", recordedData, this.filePath);
JsonSerializer.CreateDefault(settings).Serialize(textWriter, recordedData);
Console.WriteLine("Finished serialization of {0} to {1}.", recordedData, this.filePath);
}
}))
{
Task.WaitAll(t1, t2);
}
}
}
Notes:
This solution uses the fact that, when serializing an IEnumerable<T>, Json.NET will not materialize the enumerable as a list. Instead it will take full advantage of lazy evaluation and simply enumerate through it, writing then forgetting each individual item encountered.
The first thread samples pressure maps and adds them to the blocking collection.
The second thread wraps the blocking collection in an IEnumerable<PressureMap> and then serializes that as RecordedData.PressureData.
During serialization, the serializer will enumerate through the IEnumerable<PressureMap>, streaming each item to the JSON file and then proceeding to the next -- effectively blocking until one becomes available.
You will need to do some experimentation to make sure that the serialization thread can "keep up" with the sampling thread, possibly by setting a BoundedCapacity during construction. If not, you may need to adopt a different strategy.
PressureMap GetPressureMap(int count) should be some method of yours (not shown in the question) that returns the current pressure map sample.
In this technique the JSON file remains open for the duration of the sampling session. If sampling terminates abnormally the file may be truncated. I make some attempt to ameliorate the problem by flushing the writer periodically.
While data serialization will no longer require unbounded amounts of memory, deserializing a RecordedData later will deserialize the PressureData array into a concrete List<PressureMap>. This may possibly cause memory issues during downstream processing.
Demo fiddle #1 here.
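Regarding the last note above: if downstream memory also matters, the samples can be streamed back out of the option #1 file with a JsonTextReader instead of deserializing the whole RecordedData. A rough sketch (it assumes the camel-cased property names produced by the settings above):
// Sketch: walk the JSON with a reader and deserialize one PressureMap at a time
// from the "pressureData" array, so the full list is never materialized.
using (var stream = File.OpenRead(this.filePath))
using (var textReader = new StreamReader(stream))
using (var jsonReader = new JsonTextReader(textReader))
{
    var serializer = JsonSerializer.CreateDefault();
    while (jsonReader.Read())
    {
        if (jsonReader.TokenType == JsonToken.PropertyName && (string)jsonReader.Value == "pressureData")
        {
            jsonReader.Read(); // move onto the StartArray token
            while (jsonReader.Read() && jsonReader.TokenType != JsonToken.EndArray)
            {
                // The reader is positioned on StartObject; deserialize just this sample.
                var map = serializer.Deserialize<PressureMap>(jsonReader);
                // ... process map here, then let it go out of scope ...
            }
        }
    }
}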
Option #2 would be to switch from a JSON file to a Newline Delimited JSON file. Such a file consists of sequences of JSON objects separated by newline characters. In your case, you would make the first object contain the RecordedDataHeader information, and the subsequent objects be of type PressureMap:
var sampleCount = 100; // Or whatever
var sampleInterval = 10;
var recordedDataHeader = new RecordedDataHeader
{
SoftwareVersion = softwareVersion,
CalibrationConfiguration = calibrationConfiguration,
RepresentationConfiguration = representationConfiguration,
};
var settings = new JsonSerializerSettings
{
ContractResolver = new CamelCasePropertyNamesContractResolver(),
};
// Write the header
Console.WriteLine("Beginning serialization of sample data to {0}.", this.filePath);
using (var stream = new FileStream(this.filePath, FileMode.Create))
{
JsonExtensions.ToNewlineDelimitedJson(stream, new[] { recordedDataHeader });
}
// Write each sample incrementally
for (int i = 0; i < sampleCount; i++)
{
Thread.Sleep(sampleInterval);
Console.WriteLine("Performing sample {0} of {1}", i, sampleCount);
var map = GetPressureMap(i);
using (var stream = new FileStream(this.filePath, FileMode.Append))
{
JsonExtensions.ToNewlineDelimitedJson(stream, new[] { map });
}
}
Console.WriteLine("Finished serialization of sample data to {0}.", this.filePath);
Using the extension methods:
public static partial class JsonExtensions
{
// Adapted from the answer to
// https://stackoverflow.com/questions/44787652/serialize-as-ndjson-using-json-net
// by dbc https://stackoverflow.com/users/3744182/dbc
public static void ToNewlineDelimitedJson<T>(Stream stream, IEnumerable<T> items)
{
// Let caller dispose the underlying stream
using (var textWriter = new StreamWriter(stream, new UTF8Encoding(false, true), 1024, true))
{
ToNewlineDelimitedJson(textWriter, items);
}
}
public static void ToNewlineDelimitedJson<T>(TextWriter textWriter, IEnumerable<T> items)
{
var serializer = JsonSerializer.CreateDefault();
foreach (var item in items)
{
// Formatting.None is the default; I set it here for clarity.
using (var writer = new JsonTextWriter(textWriter) { Formatting = Formatting.None, CloseOutput = false })
{
serializer.Serialize(writer, item);
}
// http://specs.okfnlabs.org/ndjson/
// Each JSON text MUST conform to the [RFC7159] standard and MUST be written to the stream followed by the newline character \n (0x0A).
// The newline character MAY be preceded by a carriage return \r (0x0D). The JSON texts MUST NOT contain newlines or carriage returns.
textWriter.Write("\n");
}
}
// Adapted from the answer to
// https://stackoverflow.com/questions/29729063/line-delimited-json-serializing-and-de-serializing
// by Yuval Itzchakov https://stackoverflow.com/users/1870803/yuval-itzchakov
public static IEnumerable<TBase> FromNewlineDelimitedJson<TBase, THeader, TRow>(TextReader reader)
where THeader : TBase
where TRow : TBase
{
bool first = true;
using (var jsonReader = new JsonTextReader(reader) { CloseInput = false, SupportMultipleContent = true })
{
var serializer = JsonSerializer.CreateDefault();
while (jsonReader.Read())
{
if (jsonReader.TokenType == JsonToken.Comment)
continue;
if (first)
{
yield return serializer.Deserialize<THeader>(jsonReader);
first = false;
}
else
{
yield return serializer.Deserialize<TRow>(jsonReader);
}
}
}
}
}
Later, you can process the newline delimited JSON file as follows:
using (var stream = File.OpenRead(filePath))
using (var textReader = new StreamReader(stream))
{
foreach (var obj in JsonExtensions.FromNewlineDelimitedJson<object, RecordedDataHeader, PressureMap>(textReader))
{
if (obj is RecordedDataHeader)
{
var header = (RecordedDataHeader)obj;
// Process the header
Console.WriteLine(JsonConvert.SerializeObject(header));
}
else
{
var row = (PressureMap)obj;
// Process the row.
Console.WriteLine(JsonConvert.SerializeObject(row));
}
}
}
Notes:
This approach looks simpler because the samples are added incrementally to the end of the file, rather than inserted inside some overall JSON container.
With this approach both serialization and downstream processing can be done with bounded memory use.
The sample file does not remain open for the duration of sampling, so is less likely to be truncated.
Downstream applications may not have built-in tools for processing newline delimited JSON.
This strategy may integrate more simply with your current threading code.
Demo fiddle #2 here.

Any examples of GetNextPage usage in the Twilio API for C#?

The old code I've inherited for Twilio retrieves messages using the absolute PageNumber property of the MessageListRequest but according to the documentation this is obsolete and I should be using GetNextPage and GetPrevPage.
The API metadata shows this as obsolete with the message "Use GetNextPage and GetPreviousPage for paging. Page parameter is scheduled for end of life https://www.twilio.com/engineering/2015/04/16/replacing-absolute-paging-with-relative-paging".
Are there any examples of this usage? I couldn't find any in the documentation except in one of the API test methods and I'm not sure how well I can get to processing multiple pages with this example as a guide.
public class Foo : TwilioBase
{
public string Bar { get; set; }
}
public class FooResult : TwilioListBase
{
public List<Foo> Foos { get; set; }
}
[Test]
public void ShouldGetNextPage()
{
IRestRequest savedRequest = null;
FooResult firstPage = new FooResult();
firstPage.NextPageUri = new Uri("/Foos?PageToken=abc123", UriKind.Relative);
mockClient.Setup(trc => trc.Execute<FooResult>(It.IsAny<IRestRequest>()))
.Callback<IRestRequest>((request) => savedRequest = request)
.Returns(new FooResult());
var client = mockClient.Object;
var response = client.GetNextPage<FooResult>(firstPage);
mockClient.Verify(trc => trc.Execute<FooResult>(It.IsAny<IRestRequest>()), Times.Once);
Assert.IsNotNull(savedRequest);
Assert.AreEqual("/Foos?PageToken=abc123", savedRequest.Resource);
Assert.AreEqual(Method.GET, savedRequest.Method);
Assert.IsNotNull(response);
}
The old usage might look something like this:
var twilio = new TwilioRestClient(config.AccountSid, config.AuthToken);
var result = new List<Message>();
MessageResult tempResult;
int page = 0;
do
{
var request = new MessageListRequest();
request = new MessageListRequest { Count = 1000, DateSent = newestDate, DateSentComparison = ComparisonType.GreaterThanOrEqualTo, PageNumber = page++, To = config.FromNumber };
tempResult = twilio.ListMessages(request);
result.AddRange(tempResult.Messages);
} while (tempResult.NextPageUri != null);
Finally, I built the Twilio API 3.4.1.0 from the twilio-csharp GitHub project instead of NuGet since I need to update it to use the MessagingServiceSid which isn't included in the API yet.
Thanks for any pointers. I'll post a solution if I can figure it out on my own.
Actually, I got it to work now!
MessageResult messages = twilio.ListMessages(request);
do
{
if (messages.Messages != null)
{
foreach (var message in messages.Messages)
{
... process results
}
if (messages.NextPageUri != null)
{
messages = twilio.GetNextPage<MessageResult>(messages);
}
}
} while (messages.NextPageUri != null);
Did you try the example from the API Explorer?
https://www.twilio.com/console/dev-tools/api-explorer/sms/sms-mms-list
var twilio = new TwilioRestClient(AccountSid, AuthToken);
// Build the parameters
var options = new MessageListRequest();
var messages = twilio.ListMessages(options);
foreach (var message in messages.Messages)
{
Console.WriteLine(message.Body);
}
The helper library will automatically fetch from the API as you loop over the list until all records matching your criteria are processed.
You can limit the results with MessageListRequest.
Please give that a try and let me know how it goes.
