Saving Disk information from Azure:
var credentials = SdkContext.AzureCredentialsFactory.FromServicePrincipal("myclientId", "mytenant", "mysecretId", AzureEnvironment.AzureGlobalCloud);
var azure = Azure.Authenticate(credentials).WithSubscription("mySubscription");
var groupName = "myResourceGroup";
var vmName = "myVM";
var location = Region.USWest;
var vm = azure.Disks.List();
Console.WriteLine("Getting information about the virtual machine...");
MongoClient client = new MongoClient("mylocalhost");
IMongoDatabase database = client.GetDatabase("VM");
var collection = database.GetCollection<IDisk>("Disk ");
collection.InsertManyAsync(vm);
When I save them to MongoDB, I get an error:
maximum serialization depth exceeded (does the object being serialized have a circular reference?).
What am I doing wrong here?
It sounds like the IDisk that you're getting back from that API resolves to a circular object graph that MongoDB isn't very happy with. The simplest fix, then, is: don't serialize the IDisk - after all, it isn't your type, and you can't control it. Instead, create a type of your own that has exactly what you want on it, and serialize that. For example:
sealed class MyDisk // TODO: rename me
{
public string Key {get;set;}
public string Name {get;set;}
public string RegionName {get;set;}
public int SizeGB {get;set;}
// etc, but *only* the information you actually want, and *only* using
// primitive types or types you control
}
...
var disks = (from disk in azure.Disks.List()
             select new MyDisk {
                 Key = disk.Key,
                 Name = disk.Name,
                 RegionName = disk.RegionName,
                 SizeGB = disk.SizeInGB,
                 // etc
             }).ToList();
And store disks, which we know doesn't have a circular reference, because we control it.
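To actually store them, a minimal sketch (assuming the same database as above and a collection named "Disks" - adjust the names and connection string to your setup):
var client = new MongoClient("mongodb://localhost:27017"); // assumed connection string
IMongoDatabase database = client.GetDatabase("VM");
IMongoCollection<MyDisk> collection = database.GetCollection<MyDisk>("Disks");
await collection.InsertManyAsync(disks); // await the insert so any failure surfaces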
In my project, I am forced to do some Machine Learning with C#. Unfortunately, ML.NET is much less intuitive than the equivalents in other languages, and I fail to execute a RegressionExperiment.
First, here are my data classes:
public class DataPoint
{
[ColumnName("Label")]
public float y { get; set; }
[ColumnName("catFeature")]
public string str { get; set; }
[ColumnName("smth")]
public float smth { get; set; }
}
public class MLOutput
{
[ColumnName("Score")]
public float score { get; set; }
}
I think my problem lies in the encoding of a categorical variable. For a single model, the code below works fine.
//Create an ML Context
var ctx = new MLContext();
IDataView trainingData = ctx.Data.LoadFromEnumerable(data: data as IEnumerable<DataPoint>);
// Build your data processing and training pipeline
var pipeline = ctx.Transforms.Categorical.OneHotEncoding(outputColumnName: "catFeatureEnc", inputColumnName: "catFeature")
.Append(ctx.Transforms.Concatenate("Features", new[] {"catFeatureEnc","smth"}))
.Append(ctx.Regression.Trainers.FastForest());
// Train your model ????
var trainedModel = pipeline.Fit(trainingData); // shouldn't we transform before fit?????
IDataView transformedData = trainedModel.Transform(trainingData);
Now, when I remove the FastForest model from the pipeline and add the AutoML code, Microsoft.ML cannot handle the encoding:
// Build your data processing and training pipeline
var pipeline = ctx.Transforms.Categorical.OneHotEncoding(outputColumnName: "catFeatureEnc", inputColumnName: "catFeature")
.Append(ctx.Transforms.Concatenate("Features", new[] {"catFeatureEnc","smth"}))
.Append(ctx.Transforms.Conversion.ConvertType("Features", "Features", DataKind.Single));
// do smth ???
var trainedModel = pipeline.Fit(trainingData); // nothing there to be fitted???
IDataView transformedData = trainedModel.Transform(trainingData); // shouldn't we transform before fit?????
var experimentSettings = new RegressionExperimentSettings();
experimentSettings.MaxExperimentTimeInSeconds = 60;
// Cancel experiment after the user presses any key
var cts = new CancellationTokenSource();
experimentSettings.CancellationToken = cts.Token;
RegressionExperiment experiment = ctx.Auto().CreateRegressionExperiment(experimentSettings);
ExperimentResult<RegressionMetrics> experimentResult = experiment.Execute(transformedData, "Label");
Now, I get the following exception:
Only supported feature column types are Boolean, Single, and String. Please change the feature column catFeatureEnc of type Key<UInt32, 0-2> to one of the supported types.
If I remove catFeatureEnc from the Concatenate call, the code works fine. Alternatively, I tried to create a new pipeline for the training with the transformed data. Unfortunately, this approach doesn't work in the slightest, as the new pipeline expects arbitrary data types for many features.
Another alternative approach:
ExperimentResult<RegressionMetrics> experimentResult = experiment.Execute(trainingData, "Label");
throws the exception:
Training failed with the exception: System.InvalidOperationException: Concatenated columns should have the same type. Column 'smth' has type of Single, but the expected column type is Byte
I don't know... why is a Byte expected?
How can I use the encoded feature with Microsoft Auto.ML?
It looks like you're using the old version of the API. I would recommend trying the latest version.
To use it, you'll have to add the ML.NET daily feed.
https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json
Here are a few samples using the new API:
AutoML with column inference and Auto featurizer
You can also take a look at this other sample, which includes OneHotEncoding: AutoML with data processing pipeline
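If upgrading isn't an option, another thing you could try (an untested sketch against the API version shown in the question) is to drop the manual OneHotEncoding and describe the columns to AutoML via ColumnInformation, letting it build the categorical featurization itself:
// Untested sketch: let AutoML handle the categorical column instead of pre-encoding it
var columnInformation = new ColumnInformation { LabelColumnName = "Label" };
columnInformation.CategoricalColumnNames.Add("catFeature");
columnInformation.NumericColumnNames.Add("smth");
RegressionExperiment experiment = ctx.Auto().CreateRegressionExperiment(experimentSettings);
ExperimentResult<RegressionMetrics> experimentResult = experiment.Execute(trainingData, columnInformation);
That way the experiment never sees the Key<UInt32> column it complains about.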
I am using ksqlDB with a table of the following form:
ksqlDB query
create table currency (id integer,name varchar) with (kafka_topic='currency',partitions=1,value_format='avro');
C# model
public class Currency
{
public int Id{get;set;}
public string Name{get;set;}
}
Now I want to know how I should write/read data from this topic in C# using the Confluent library:
Writing
IProducer<int, Currency> producer = ....
Currency cur = new Currency();
Message<int, Currency> message = new Message<int, Currency>
{
    Key = cur.Id,
    Timestamp = new Timestamp(DateTime.UtcNow, TimestampType.CreateTime),
    Value = cur
};
DeliveryResult<int, Currency> delivery = await producer.ProduceAsync(topic, message);
Reading
IConsumer<int, Currency> consumer = new ConsumerBuilder<int, Currency>(config)
    .SetKeyDeserializer(Deserializers.Int32) // I assume I need to use the id from my DTO
    .SetValueDeserializer(...) // what deserializer?
    .Build();
ConsumeResult<int, Currency> result = consumer.Consume();
Currency message = ...; // what deserializer? JsonSerializer.Deserialize<Currency>(result.Message.Value)?
I am not sure how to go about this, so I tried looking for a serializer. I found this library, AvroSerializer, but I do not get where the author fetches the schema.
Any help on how to read/write to a specific topic so that it matches my ksqlDB models?
Update
After some research and some answers here, I have started using the schema registry:
var config = new ConsumerConfig
{
GroupId = kafkaConfig.ConsumerGroup,
BootstrapServers = kafkaConfig.ServerUrl,
AutoOffsetReset = AutoOffsetReset.Earliest
};
var schemaRegistryConfig = new SchemaRegistryConfig
{
Url = kafkaConfig.SchemaRegistryUrl
};
var schemaRegistry = new CachedSchemaRegistryClient(schemaRegistryConfig);
IConsumer<int,Currency> consumer = new ConsumerBuilder<int, Currency>(config)
.SetKeyDeserializer(new AvroDeserializer<int>(schemaRegistry).AsSyncOverAsync())
.SetValueDeserializer(new AvroDeserializer<Currency>(schemaRegistry).AsSyncOverAsync())
.Build();
ConsumeResult<int, Currency> result = consumer.Consume();
Now I am getting another error:
Expecting data framing of length 5 bytes or more but total data size
is 4 bytes
As someone kindly pointed out, it seems I am retrieving only the id from the schema registry.
How can I just run insert into currency (id,name) values (1,3) and retrieve it in C# as a POCO (listed above)?
Update 2
After finding this source program, it seems I am not able to publish messages to tables for some reason.
There is no error when sending the message but it is not published to Kafka.
I found this library AvroSerializer, but I do not get where the author fetches the schema.
Unclear why you need to use a library other than the Confluent one, but they get it from the Schema Registry. You can use CachedSchemaRegistryClient to get the schema string easily, however you shouldn't need this in the code as the deserializer will download from the registry on its own.
If you refer to the examples/ in the confluent-kafka-dotnet repo for specific Avro consumption, you can see they generate the User class from the User.avsc file, which seems to be exactly what you want to do here for Currency rather than writing it yourself.
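If you just want to inspect what ksqlDB registered, a minimal sketch (assuming the default subject naming of <topic>-value, i.e. currency-value, and your own registry URL):
var schemaRegistry = new CachedSchemaRegistryClient(new SchemaRegistryConfig { Url = "http://localhost:8081" }); // assumed URL
var latest = await schemaRegistry.GetLatestSchemaAsync("currency-value"); // fetch the value schema for the topic
Console.WriteLine(latest.SchemaString);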
I have solved the problem by defining my own custom serializer, implementing the ISerializer<T> and IDeserializer<T> interfaces, which under the hood are just wrappers over System.Text.Json.JsonSerializer (or Newtonsoft.Json).
Serializer
public class MySerializer<T> : ISerializer<T>
{
    public byte[] Serialize(T data, SerializationContext context)
    {
        var str = System.Text.Json.JsonSerializer.Serialize(data); // you can also use Newtonsoft here
        var bytes = Encoding.UTF8.GetBytes(str);
        return bytes;
    }
}
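For the read side, the matching deserializer (MyDeserializer is just my name for it; same wrapper idea in reverse):
public class MyDeserializer<T> : IDeserializer<T>
{
    public T Deserialize(ReadOnlySpan<byte> data, bool isNull, SerializationContext context)
    {
        if (isNull) return default;
        var str = Encoding.UTF8.GetString(data);
        return System.Text.Json.JsonSerializer.Deserialize<T>(str); // or Newtonsoft, as above
    }
}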
Usage
var config = new ConsumerConfig
{
GroupId = kafkaConfig.ConsumerGroup,
BootstrapServers = kafkaConfig.ServerUrl,
AutoOffsetReset = AutoOffsetReset.Earliest
};
IConsumer<int,Currency> consumer = new ConsumerBuilder<int, Currency>(config)
    .SetValueDeserializer(new MyDeserializer<Currency>())
.Build();
ConsumeResult<int, Currency> result = consumer.Consume();
P.S.
I am not even using the schema registry here after I implemented the interfaces.
I am working with the new Cosmos DB SDK v3 (https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-sdk-dotnet-standard) and a very simple insert. I have verified that all the objects are indeed not null and have reasonable values, but I still get the error message:
[1/12/2019 10:35:04] System.Private.CoreLib: Exception while executing function: HAPI_HM_Seasons. Microsoft.Azure.Cosmos.Direct: Object reference not set to an instance of an object.
I don't see why this is; I must be missing something really basic here, but I can't put my finger on it.
The code is as below:
List<SeasonInformation> seasonInformationList = new List<SeasonInformation>();
foreach(JObject document in listOfSeasons)
{
SeasonInformation seasonInformation = new SeasonInformation
{
id = Guid.NewGuid().ToString(),
Brand = brand,
IntegrationSource = source,
DocumentType = Enums.DocumentType.Season,
UpdatedBy = "HAPI_HM_Seasons",
UpdatedDate = DateTime.Now.ToString(),
UpdatedDateUtc = string.Format("{0:yyyy-MM-ddTHH:mm:ss.FFFZ}", DateTime.UtcNow),
OriginalData = document
};
seasonInformationList.Add(seasonInformation);
}
database = cosmosClient.GetDatabase(cosmosDBName);
container = database.GetContainer(cosmosDBCollectionNameRawData);
log.LogInformation(string.Format("HAPI_HM_Seasons BASIC setup done at {0:yyyy-MM-ddTHH:mm:ss.FFFZ}", DateTime.UtcNow));
log.LogInformation(string.Format("HAPI_HM_Seasons import {1} items BEGIN at {0:yyyy-MM-ddTHH:mm:ss.FFFZ}", DateTime.UtcNow, seasonInformationList.Count));
foreach(var season in seasonInformationList)
{
ItemResponse<SeasonInformation> response = await container.CreateItemAsync(season);
}
I have verified that the List is populated and that the season variable in the loop contains the correct data, so I am a bit stuck here.
The exception happens in the last foreach loop, where I call CreateItemAsync against Cosmos DB.
As a best practice, use the async methods with await for all the Cosmos DB calls, just to make sure that they actually execute and you get the response, and modify your CreateItemAsync call as follows:
ItemResponse<SeasonInformation> response = await container.CreateItemAsync(season, new PartitionKey(season.whatever));
Here is the Sample Repository
I've just completed a round of refactoring of my application, which has resulted in my removing a project that was no longer required and moving its classes into a different project. A side effect of this is that my User class, which is stored in RavenDB, has a collection property of a type moved to the new assembly. As soon as I attempt to query the session for the User class I get a Json deserialisation error. The issue is touched upon here but the answers don't address my issue. Here's the offending property:
{
"OAuthAccounts": {
"$type": "System.Collections.ObjectModel.Collection`1[
[Friendorsement.Contracts.Membership.IOAuthAccount,
Friendorsement.Contracts]], mscorlib",
"$values": []
},
}
OAuthAccounts is a collection property of User that used to map here:
System.Collections.ObjectModel.Collection`1[[Friendorsement.Contracts.Membership.IOAuthAccount, Friendorsement.Contracts]]
It now maps here:
System.Collections.ObjectModel.Collection`1[[Friendorsement.Domain.Membership.IOAuthAccount, Friendorsement.Domain]]
Friendorsement.Contracts no longer exists. All of its types are now in Friendorsement.Domain
I've tried using store.DatabaseCommands.StartsWith("User", "", 0, 128) but that didn't return anything.
I've tried looking at UpdateByIndex but not got very far with it:
store.DatabaseCommands.UpdateByIndex("Raven/DocumentsByEntityName",
new IndexQuery {Query = "Tag:Users"},
new[]
{
new PatchRequest { // unsure what to set here }
});
I'm using Raven 2.0
Below is a simple sample application that shows patching the metadata. While your example is a little different, this should be a good starting point:
namespace SO19941925
{
internal class Program
{
private static void Main(string[] args)
{
IDocumentStore store = new DocumentStore
{
Url = "http://localhost:8080",
DefaultDatabase = "SO19941925"
}.Initialize();
using (IDocumentSession session = store.OpenSession())
{
for (int i = 0; i < 10; i++)
{
session.Store(new User {Name = "User" + i});
}
session.SaveChanges();
}
using (IDocumentSession session = store.OpenSession())
{
List<User> users = session.Query<User>().Customize(x => x.WaitForNonStaleResultsAsOfNow()).ToList();
Console.WriteLine("{0} SO19941925.Users", users.Count);
}
Operation s = store.DatabaseCommands.UpdateByIndex("Raven/DocumentsByEntityName",
new IndexQuery {Query = "Tag:Users"},
new ScriptedPatchRequest
{
Script = #"this['#metadata']['Raven-Clr-Type'] = 'SO19941925.Models.User, SO19941925';"
}, true
);
s.WaitForCompletion();
using (IDocumentSession session = store.OpenSession())
{
List<Models.User> users =
session.Query<Models.User>().Customize(x => x.WaitForNonStaleResultsAsOfNow()).ToList();
Console.WriteLine("{0} SO19941925.Models.Users", users.Count);
}
Console.ReadLine();
}
}
internal class User
{
public string Name { get; set; }
}
}
namespace SO19941925.Models
{
internal class User
{
public string Name { get; set; }
}
}
UPDATE: Based on the initial answer above, here is the code that actually solves the OP's question:
store.DatabaseCommands.UpdateByIndex("Raven/DocumentsByEntityName",
new IndexQuery {Query = "Tag:Users"},
new ScriptedPatchRequest
{
Script = #"this['OAuthAccounts']['$type'] =
'System.Collections.ObjectModel.Collection`1[
[Friendorsement.Domain.Membership.IFlexOAuthAccount,
Friendorsement.Domain]], mscorlib';",
}, true
);
Here are two possible solutions:
Option 1: Depending on what state your project is in, for example if you are still in development, you could easily just delete that collection out of RavenDB from the Raven Studio and recreate all those User documents. All the new User documents should then have the correct class name and assembly and should then deserialize correctly. Obviously, if you are already in production, this probably won't be a good option.
Option 2: Depending on how many User documents you have, you should be able to manually edit each one to specify the correct C# class name and assembly, so that they will be deserialized correctly. Again, if you have too many objects to modify manually, this may not be a good option; however, if there are just a few, it shouldn't be too bad to open each one up, go to the metadata tab, and paste the correct values for "Raven-Entity-Name" and "Raven-Clr-Type".
I ended up doing this:
Advanced.DatabaseCommands.UpdateByIndex(
"Raven/DocumentsByEntityName",
new IndexQuery {Query = "Tag:Album"},
new []{ new PatchRequest() {
Type = PatchCommandType.Modify,
Name = "#metadata",
Nested= new []{
new PatchRequest{
Name= "Raven-Clr-Type",
Type = PatchCommandType.Set,
Value = "Core.Model.Album, Core" }}}},
false);
I would like to migrate my previously serialized objects in the database to a new schema.
My previous object:
public interface MyReport
{
string Id { get; set;}
string Name { get; set;}
Dictionary<string, string> PropColl { get; set;}
}
But for some reason we had to make interface changes:
public interface IMarkme
{
}
public interface MyReport<T> where T : IMarkme
{
string Id { get; set;}
string Name { get; set;}
T ExtendedProp { get; set;}
}
public class NewProp : IMarkme
{
/// some code here
}
So as you can see, my interface has been modified, and I would like to migrate my serialized objects, which were serialized based on the old MyReport, to the new MyReport<T>.
Can someone provide me some input on what kind of utility I should aim to write to help me migrate my serialized objects to the new, modified interface version?
Thanks,
AG
I have actually done something similar recently, where I created a simple console application to transform some serialized objects from one version to another. I simply used both versions of the DLLs and reflection to read and write the values of the different properties. Probably you'll find this helpful as inspiration ;)
static void Main(string[] args)
{
object test;
AppDomain.CurrentDomain.AssemblyResolve += domain_AssemblyResolve;
using (var con = new SqlConnection(connectionString))
{
using (var cmd = new SqlCommand())
{
cmd.CommandText = "select top 1 Data_Blob from dbo.Serialized";
cmd.CommandType = CommandType.Text;
cmd.Connection = con;
con.Open();
var blob = (byte[])cmd.ExecuteScalar();
var bf = new BinaryFormatter();
var stream = new MemoryStream(blob);
bf.AssemblyFormat = FormatterAssemblyStyle.Full;
test = bf.Deserialize(stream);
}
}
var objNewVersion = Activator.CreateInstance(Type.GetType("ObjectGraphLibrary.Test, ObjectGraphLibrary, Version=1.0.0.10, Culture=neutral, PublicKeyToken=33c7c38cf0d65826"));
var oldType = test.GetType();
var newType = objNewVersion.GetType();
var oldName = (string) oldType.GetProperty("Name").GetValue(test, null);
var oldAge = (int) oldType.GetProperty("Age").GetValue(test, null);
newType.GetProperty("Name").SetValue(objNewVersion, oldName, null);
newType.GetProperty("DateOfBirth").SetValue(objNewVersion, DateTime.Now.AddYears(-oldAge), null);
Console.Read();
}
static Assembly domain_AssemblyResolve(object sender, ResolveEventArgs args)
{
var assName = new AssemblyName(args.Name);
var uriBuilder = new UriBuilder(Assembly.GetExecutingAssembly().CodeBase);
var assemblyPath = Uri.UnescapeDataString(uriBuilder.Path);
var codeBase = Path.GetDirectoryName(assemblyPath);
var assPath = Path.Combine(codeBase, string.Format("old\\{0}.{1}.{2}.{3}\\{4}.dll", assName.Version.Major,
assName.Version.Minor, assName.Version.Build,
assName.Version.Revision, assName.Name));
return File.Exists(assPath) ? Assembly.LoadFile(assPath) : null;
}
1) Write a utility that reads the serialized objects in the old object definition.
2) The utility writes your objects into the DB in a non-serialized manner (i.e., with one piece of data in every field, etc.).
Don't get into the habit of serializing objects and storing them somewhere in persistent storage for retrieval (much) later. Serialization was not built for that.
You have run into the problem of C programmers in the old days: they would create a struct in memory and save that struct into a file. Then the struct's members would change, and they would wonder how to read it back, since the data was encoded differently.
Then along came database formats, INI files, and so on, specifically to address this need: saving data in one format and then being able to read it back without error.
So don't repeat errors of the past. Serialization was created to facilitate short-term binary storage and the ability to, say, transmit an object over TCP/IP.
At worst, store your data as XML, not as a serialized binary stream. Also, there is no assurance that I know of from MS that serialized data from one version of .NET will be readable from another. Convert your data to a legible format while you can.
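For example, a minimal sketch of writing a plain DTO out as XML instead of a binary stream (MyReportDto and report are just placeholders for a type and instance you control):
var serializer = new System.Xml.Serialization.XmlSerializer(typeof(MyReportDto));
using (var stream = File.Create("report.xml"))
{
    serializer.Serialize(stream, report); // human-readable, and survives assembly/version changes far better
}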