I have a bunch of information saved inside an array in a MongoDB document; the structure is something like this:
{
  "Ticker": "TSLA34",
  "History": [
    {
      "Price": 26.36,
      "UpdatedAt": "10/22/2015 10:12:00 AM"
    },
    {
      "Price": 26.37,
      "UpdatedAt": "10/22/2015 10:13:00 AM"
    }
  ]
}
I'm saving the information inside the "History" array, and this information is time based; I need to insert it minute by minute.
Sometimes the same minute arrives two or three times in a row, so I need to check whether the minute I received is already in the array before pushing it and saving to MongoDB. And if the whole document does not exist yet, I need to create it and insert the information.
I'm using MongoDB.Driver in .NET 6, and I built this code to insert the information only if the "UpdatedAt" value is not already saved in the "History" array:
// var item = new Chart("TSLA34", new HistoryEntry(26.36, DateTime.UtcNow))

// Match the document by ticker...
var filterByTicker = Builders<Chart>.Filter.Eq(g => g.Ticker, item.Ticker);
// ...but only when no History element already has this UpdatedAt value.
var filterByDate = Builders<Chart>.Filter.ElemMatch(g => g.History,
    Builders<HistoryEntry>.Filter.Eq(x => x.UpdatedAt, item.History.First().UpdatedAt));
var filter = filterByTicker & Builders<Chart>.Filter.Not(filterByDate);
// Push the new entry; IsUpsert creates the document when nothing matches.
var update = Builders<Chart>.Update.Push(asset => asset.History, item.History.First());
var updateOneModel = new UpdateOneModel<Chart>(filter, update) { IsUpsert = true };
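For reference, a minimal sketch of actually executing that model (assuming a variable of type IMongoCollection<Chart> named collection; that name is my assumption, not from the original code):

// Execute the single model; BulkWriteAsync accepts one or many write models.
await collection.BulkWriteAsync(new[] { updateOneModel });

// Equivalent without the bulk API, reusing the same filter and update:
await collection.UpdateOneAsync(filter, update, new UpdateOptions { IsUpsert = true });

One caveat worth knowing: if the document exists but the minute is already in History, the combined filter matches nothing, so the upsert path will try to insert a brand-new document for the same ticker; a unique index on Ticker is the usual guard against that.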
I need to create the document if it doesn't exist yet, but if it already exists I need to verify whether this minute's information has already been inserted into the history.
I could do this with a Find query and check the result with LINQ before running the update, but I was wondering if I can do this more efficiently within MongoDB.
Does anyone have an idea?
Related
I'd like to do a bulk upsert in Mongo. Basically I'm getting a list of objects from a vendor, but I don't know which ones I've gotten before (and need to be updated) versus which ones are new. One by one I could do an upsert, but UpdateMany doesn't work here, since each document needs its own filter and update.
So I've resorted to selecting the documents, updating in C#, and doing a bulk insert.
public async Task BulkUpsertData(List<MyObject> newUpsertDatas)
{
    var usernames = newUpsertDatas.Select(p => p.Username);
    var filter = Builders<MyObject>.Filter.In(p => p.Username, usernames);
    // Find all records that are in the list of newUpsertDatas (these need to be updated)
    var collection = Db.GetCollection<MyObject>("MyCollection");
    var existingDatas = await collection.Find(filter).ToListAsync();

    // Loop through all of the new data,
    foreach (var newUpsertData in newUpsertDatas)
    {
        // and find the matching existing data
        var existingData = existingDatas.FirstOrDefault(p => p.Id == newUpsertData.Id);

        // If there is existing data, preserve the date created (there are other fields I preserve)
        if (existingData == null)
        {
            newUpsertData.DateCreated = DateTime.Now;
        }
        else
        {
            newUpsertData.Id = existingData.Id;
            newUpsertData.DateCreated = existingData.DateCreated;
        }
    }

    await collection.DeleteManyAsync(filter);
    await collection.InsertManyAsync(newUpsertDatas);
}
Is there a more efficient way to do this?
EDIT:
I did some speed tests.
In preparation I inserted 100,000 records of a pretty simple object. Then I upserted 200,000 records into the collection.
Method 1 is as outlined in the question: find the existing records, update in code, DeleteMany, InsertMany. This took approximately 5 seconds.
Method 2 was making a list of UpdateOneModel with IsUpsert = true and then doing one BulkWriteAsync. This was super slow. I could see the count in the Mongo collection increasing, so I knew it was working. But after about 5 minutes it had only climbed to 107,000, so I canceled it.
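One plausible explanation for that difference, offered as an assumption rather than something I measured: without an index on Username, each of the 200,000 upsert filters is a full collection scan. A sketch of adding one with the driver:

// Hypothetical fix for the slow Method 2: index the field the upsert filters on.
var indexKeys = Builders<MyObject>.IndexKeys.Ascending(p => p.Username);
await collection.Indexes.CreateOneAsync(new CreateIndexModel<MyObject>(indexKeys));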
I'm still interested if anyone else has a potential solution.
Given that you've said you could do a one-by-one upsert, you can achieve what you want with BulkWriteAsync. This allows you to create one or more instances of the abstract WriteModel, which in your case would be instances of UpdateOneModel.
In order to achieve this, you could do something like the following:
var listOfUpdateModels = new List<UpdateOneModel<T>>();
// ...
var updateOneModel = new UpdateOneModel<T>(
    Builders<T>.Filter. /* etc. */,
    Builders<T>.Update. /* etc. */)
{
    IsUpsert = true
};
listOfUpdateModels.Add(updateOneModel);
// ...
await mongoCollection.BulkWriteAsync(listOfUpdateModels);
The key to all of this is the IsUpsert property on UpdateOneModel.
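Applied to the objects from the question above, a sketch might look like this (Username and DateCreated come from the question; SomeField is a hypothetical payload property):

var models = new List<WriteModel<MyObject>>();
foreach (var data in newUpsertDatas)
{
    var filter = Builders<MyObject>.Filter.Eq(p => p.Username, data.Username);
    var update = Builders<MyObject>.Update
        .Set(p => p.SomeField, data.SomeField)           // hypothetical payload field
        .SetOnInsert(p => p.DateCreated, DateTime.Now);  // only written on insert
    models.Add(new UpdateOneModel<MyObject>(filter, update) { IsUpsert = true });
}
await collection.BulkWriteAsync(models);

SetOnInsert neatly replaces the "preserve the date created" logic from the question, since it only takes effect when the upsert actually inserts.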
I'm trying to implement a technique known as bucketing in MongoDB (or so it was referred to in a MongoDB workshop); it uses $push and $slice to achieve this. The goal is a user feed system similar to Twitter's or Facebook's.
Essentially I have a document with an array of items (feed items). I want to create a new document when the number of items reaches a certain limit for a user.
So, if the latest UserFeed document's array has 50 items, I want a new document to be created and the new item inserted into the Items array of the newly created document.
This is the code I have thus far:
var update = Builders<UserFeed>
    .Update
    .CurrentDate(x => x.DateLastUpdated)
    .PushEach(
        x => x.Items,
        new List<FeedItemBase> { feedItem },
        50);

var result = await Collection.UpdateOneAsync(
    x => x.User.Id == userFeedToWriteTo,
    update,
    new UpdateOptions { IsUpsert = true }
).ConfigureAwait(false);
...
But it does not appear to create a new document, or even to insert the item into the existing document's array. I thought the creation of the new document would be handled by this:
new UpdateOptions { IsUpsert = true }
but apparently not. Any help would be greatly appreciated.
So after having written out the problem and saying it out loud a few times, I realised what the problem was.
I needed a counter on the main UserFeed document, incremented every time an item was added (i.e. a feed item was posted). And in my query performing the update/upsert, I just needed to check for Count < 50. After that, everything works as expected. Here is the corrected code:
var update = Builders<UserFeed>
    .Update
    .CurrentDate(x => x.DateLastUpdated)
    .PushEach(
        x => x.Items,
        new List<FeedItemBase> { feedItem },
        50)
    .Inc(x => x.Count, 1);

var result = await Collection.UpdateOneAsync(
    x => x.User.Id == userFeedToWriteTo && x.Count < 50,
    update,
    new UpdateOptions { IsUpsert = true }
).ConfigureAwait(false);
As long as the count is NOT corrected when a feed item is deleted from the Items array, everything should work as expected. Problems will arise if you decrement the count on removal, because new items will then be added to previous documents, and you will need to sort after you unwind the data, which at present I do not need to do. It does mean some documents will end up with fewer than 50 items in the array, but to me that doesn't really matter.
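For reference, a minimal sketch of the document class these snippets assume (the property names are taken from the code above; UserRef is my guess at the embedded user type):

public class UserFeed
{
    public ObjectId Id { get; set; }
    public UserRef User { get; set; }             // carries the Id matched in the filter
    public int Count { get; set; }                // incremented on each push; the gate is Count < 50
    public DateTime DateLastUpdated { get; set; }
    public List<FeedItemBase> Items { get; set; } = new List<FeedItemBase>();
}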
I hope this helps someone trying to implement a similar solution in C#.
I'm using Entity Framework to build a database. There are two models, Worker and Skill; each Worker has zero or more Skills. I initially read this data into memory from a CSV file, storing it in a dictionary called allWorkers. Then I write the data to the database like so:
// Populate database
using (var db = new SolverDbContext())
{
    // Add all distinct skills to database
    db.Skills.AddRange(allSkills
        .Distinct(StringComparer.InvariantCultureIgnoreCase)
        .Select(s => new Skill
        {
            Reference = s
        }));
    db.SaveChanges(); // Very quick

    var dbSkills = db.Skills.ToDictionary(k => k.Reference, v => v);

    // Add all workers to database
    var workforce = allWorkers.Values
        .Select(i => new Worker
        {
            Reference = i.EMPLOYEE_REF,
            Skills = i.GetSkills().Select(s => dbSkills[s]).ToArray(),
            DefaultRegion = "wa",
            DefaultEfficiency = i.TECH_EFFICIENCY
        });
    db.Workers.AddRange(workforce);
    db.SaveChanges(); // This call takes 00:05:00.0482197
}
The last db.SaveChanges(); call takes over five minutes to execute, which I feel is far too long. I ran SQL Server Profiler while the call was executing, and essentially what I found was thousands of calls to:
INSERT [dbo].[SkillWorkers]([Skill_SkillId], [Worker_WorkerId])
VALUES (@0, @1)
There are 16,027 rows being added to SkillWorkers, which is a fair amount of data but not huge by any means. Is there any way to optimize this code so it doesn't take five minutes to run?
Update: I've looked at other possible duplicates, such as this one, but I don't think they apply. First, I'm not bulk adding anything in a loop; I'm making a single call to db.SaveChanges(); after every row has been added to db.Workers, which should be the fastest way to bulk insert. Second, I've set db.Configuration.AutoDetectChangesEnabled to false; the SaveChanges() call now takes 00:05:11.2273888 (in other words, about the same). I don't think this really matters, since every row is new and there are no changes to detect.
I think what I'm looking for is a way to issue a single INSERT statement containing all 16,000 rows.
One easy method is to use the EntityFramework.BulkInsert extension.
You can then do:
// Add all workers to database
var workforce = allWorkers.Values
    .Select(i => new Worker
    {
        Reference = i.EMPLOYEE_REF,
        Skills = i.GetSkills().Select(s => dbSkills[s]).ToArray(),
        DefaultRegion = "wa",
        DefaultEfficiency = i.TECH_EFFICIENCY
    });

db.BulkInsert(workforce);
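If adding a dependency isn't an option, the profiler output above suggests the join table is the hot spot, so a hypothetical alternative is to load dbo.SkillWorkers directly with SqlBulkCopy (which, as far as I know, is the same mechanism the extension uses under the hood). This sketch assumes the entities expose SkillId/WorkerId keys already populated by the earlier SaveChanges calls, and a connectionString variable:

using System.Data;
using System.Data.SqlClient;

// Bulk-load the join table directly; column names taken from the profiler output.
var table = new DataTable("SkillWorkers");
table.Columns.Add("Skill_SkillId", typeof(int));
table.Columns.Add("Worker_WorkerId", typeof(int));
foreach (var worker in workforce)
    foreach (var skill in worker.Skills)
        table.Rows.Add(skill.SkillId, worker.WorkerId); // assumes keys are already populated

using (var bulk = new SqlBulkCopy(connectionString)) // assumed connection string
{
    bulk.DestinationTableName = "dbo.SkillWorkers";
    bulk.WriteToServer(table);
}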
I am new to Deedle, and I have searched everywhere for examples that could help me complete the following tasks:
1. Index a data frame using multiple columns (three in the example: Date, ID, and Title)
2. Add the numeric columns of multiple data frames together (the Sales column in the example)
3. Group and add together sales that occurred on the same day
My current approach is given below. First of all, it does not work because of the missing values, and I don't know how to handle them easily while adding the data frames. Second, I wonder if there is a better, more elegant way to do it.
// Remove unused columns
var df = dfRaw.Columns[new[] { "Date", "ID", "Title", "Sales" }];

// Index data frame using 3 columns
var dfIndexed = df.IndexRowsUsing(r => Tuple.Create(
    r.GetAs<DateTime>("Date"),
    r.GetAs<string>("ID"),
    r.GetAs<string>("Title")));

// Remove indexed columns
dfIndexed.DropColumn("Date");
dfIndexed.DropColumn("ID");
dfIndexed.DropColumn("Title");

// Add data frames. Does not work, as it will add only
// keys existing in both data frames.
dfTotal += dfIndexed;
Table 1
Date,ID,Title,Sales,Market
2014-03-01,ID1,Title1,1,US
2014-03-01,ID1,Title1,2,CA
2014-03-03,ID2,Title2,3,CA
Table 2
Date,ID,Title,Sales,Market
2014-03-02,ID1,Title1,2,US
2014-03-03,ID2,Title2,2,CA
Expected Results
Date,ID,Title,Sales
2014-03-01,ID1,Title1,3
2014-03-02,ID1,Title1,2
2014-03-03,ID2,Title2,5
I think that your approach of using tuples makes sense.
It is a bit unfortunate that there is no easy way to specify default values when adding!
The easiest solution I can think of is to realign both series to the same set of keys and use FillMissing to provide defaults. Using simple series as an example, something like this should do the trick:
var allKeys = series1.Keys.Union(series2.Keys);
var aligned1 = series1.Realign(allKeys).FillMissing(0.0);
var aligned2 = series2.Realign(allKeys).FillMissing(0.0);
var res = aligned1 + aligned2;
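To make that concrete against the sample tables, here is a self-contained sketch with the (Date, ID, Title) tuple keys built via Deedle's SeriesBuilder (the values are the per-day sums from the tables above):

using System;
using System.Linq;
using Deedle;

// Series built from Table 1, already grouped by day: 1 (US) + 2 (CA) = 3 on 2014-03-01.
var b1 = new SeriesBuilder<Tuple<DateTime, string, string>, double>();
b1.Add(Tuple.Create(new DateTime(2014, 3, 1), "ID1", "Title1"), 3.0);
b1.Add(Tuple.Create(new DateTime(2014, 3, 3), "ID2", "Title2"), 3.0);
var series1 = b1.Series;

// Series built from Table 2.
var b2 = new SeriesBuilder<Tuple<DateTime, string, string>, double>();
b2.Add(Tuple.Create(new DateTime(2014, 3, 2), "ID1", "Title1"), 2.0);
b2.Add(Tuple.Create(new DateTime(2014, 3, 3), "ID2", "Title2"), 2.0);
var series2 = b2.Series;

// Realign to the union of keys, fill the gaps with 0, then add:
// yields 3, 2, and 5 - the expected results from the question.
var allKeys = series1.Keys.Union(series2.Keys);
var res = series1.Realign(allKeys).FillMissing(0.0)
        + series2.Realign(allKeys).FillMissing(0.0);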
I am in the process of improving a console app, and at the moment I can't get it to update rows instead of just creating a new row with the newer information.
class Program
{
    List<DriveInfo> driveList = DriveInfo.GetDrives().Where(x => x.IsReady).ToList(); // Get all the drive info
    Server server = new Server(); // Create the server object

    public static void Main()
    {
        Program c = new Program();
        c.RealDriveInfo();
        c.WriteInToDB();
    }

    public void RealDriveInfo()
    {
        // Insert information of one server
        server.ServerID = 0; // (PK) ID auto-assigned by SQL
        server.ServerName = System.Environment.MachineName;

        // Inserts ServerDrives information.
        for (int i = 0; i < driveList.Count; i++)
        {
            // All information used in dbo.ServerDrives.
            // A new ServerDrive per iteration, so each drive gets its own row.
            var serverDrives = new ServerDrive();
            serverDrives.DriveLetter = driveList[i].Name;
            serverDrives.TotalSpace = driveList[i].TotalSize;
            serverDrives.DriveLabel = driveList[i].VolumeLabel;
            serverDrives.FreeSpace = driveList[i].TotalFreeSpace;
            serverDrives.DriveType = driveList[i].DriveFormat;
            server.ServerDrives.Add(serverDrives);
        }
    }

    public void WriteInToDB()
    {
        // Add the information to an SQL database using LINQ.
        DataClasses1DataContext db = new DataClasses1DataContext(@"sqlserver");
        db.Servers.InsertOnSubmit(server);
        db.SubmitChanges();
    }
}
What I would like is for RealDriveInfo() to be used to update the stored information: run the method, update the rows that already exist with the values it gathered, and only insert a new row when one is actually needed, instead of inserting new rows every time it has newer information.
At the moment it runs the method, gathers the relevant data, and then enters it as a new row in both tables.
Any help would be appreciated :)
It's creating a new DB entry each time because you are making a new server object each time, then calling InsertOnSubmit(), which inserts (creates) a new record.
I'm not entirely sure what you are trying to do, but a DB update would involve selecting an existing record, modifying it, then attaching it back to the data context and calling SubmitChanges().
This article on Updating Entities (LINQ to SQL) might help.
The problem is that you are trying to achieve update functionality with a tool that is designed to provide object-oriented querying. LINQ allows for updating existing records, but you have to use it in the proper way to achieve this.
The proper way is to fetch the data you want to update from the DB, perform your modifications, and then flush it back to the DB. So, assuming there is a table named Servers in your data context, here's an abstract example:
DataClasses1DataContext db = new DataClasses1DataContext(@"sqlserver");
var servers = db.Servers.Where(srv => srv.ID > 1000); // extract all servers with ID > 1000 using a lambda expression
foreach (var server in servers)
{
    server.Memory *= 2; // let's feed them up with memory
}
db.SubmitChanges();
Another way to achieve this is to create an entity and then attach it to the DataContext using the Table.Attach method, but that's quite a slippery slope, so I wouldn't recommend taking it unless you have improved your LINQ skills.
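For what it's worth, a hedged sketch of that Attach approach (LINQ to SQL only accepts Attach(entity, true) when the entity has a version/timestamp column or members marked UpdateCheck.Never, which this sketch assumes; the values are hypothetical):

// Attach a detached entity as modified; SubmitChanges then issues an UPDATE.
using (var db = new DataClasses1DataContext(@"sqlserver"))
{
    var detached = new Server { ServerID = 42, ServerName = "WEB01" }; // hypothetical values
    db.Servers.Attach(detached, true);
    db.SubmitChanges();
}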
For a detailed description, see
SubmitChanges
Lambda Expressions
I understand what is being asked, and I do not have an easy answer.
For example, you have a form of values; several of the values are changed, maybe some calculated. Or the form can contain a new record.
You create a record for the values:
myrecord = new MyRecord();
Then you fill in myrecord, doing whatever validation/calculations you want before you even touch the database itself.
// GetID either returns an existing ID, or it returns zero if this is a new record.
myrecord.id = GetIDForRecordOrZeroIfANewRecord(uniqueName);
myrecord.value1 = txtValue1.Text;
myrecord.value2 = (DateTime)dtDate.Value;
and so on through the fields.
You now have a record; if id is zero you can add it as a new record. But if id refers to an existing record, you seem to have no choice with LINQ except to write a function that copies each value from myrecord onto the fetched entity, something like this:
var thisRecord = (from n in mydatacontext.MyTable
                  where n.id == myrecord.id
                  select n).Single();

thisRecord.value1 = myrecord.value1;
thisRecord.value2 = myrecord.value2;
and so on through all fields.
I do it, but it seems long-winded when I already have all of the information ready in myrecord. A simple function like
mydatacontext.MyTable.Update(myrecord);
would be ideal; similar, in fact, to what I do with stored SQL functions in other databases. It simplifies the transfer of a record that is an update rather than new.
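Since LINQ to SQL has no such method, here is a hypothetical extension that approximates it, under the assumptions that match identifies the row by its primary key and that the entity's public read/write properties are plain columns (no associations):

using System;
using System.Data.Linq;
using System.Linq;
using System.Linq.Expressions;

public static class TableExtensions
{
    public static void UpdateOrInsert<T>(this Table<T> table, T record,
        Expression<Func<T, bool>> match) where T : class
    {
        var existing = table.SingleOrDefault(match);
        if (existing == null)
        {
            table.InsertOnSubmit(record); // brand-new row
            return;
        }
        // Copy every public read/write property from the detached record.
        foreach (var prop in typeof(T).GetProperties())
        {
            if (prop.CanRead && prop.CanWrite)
                prop.SetValue(existing, prop.GetValue(record, null), null);
        }
        // The caller still calls SubmitChanges() on the DataContext.
    }
}

Usage would then be roughly mydatacontext.MyTable.UpdateOrInsert(myrecord, n => n.id == myrecord.id); followed by mydatacontext.SubmitChanges();.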