Delete documents from MongoDB by _id range in C#

My MongoDB (on Azure Cosmos DB) reached its maximum size of 20 GB. I didn't realize that, and now the app is not working. We were planning to delete old records (last 2 years), but there is no date field in the document. I was hoping _ts was maintained internally, but it looks like it is not. So the only option is to use the _id (which is an ObjectId). Can someone help with how to delete based on a date range using C#?

You can use the getTimestamp method.
In the shell, you can find the creation date like this:
db.c.find()[0]["_id"].getTimestamp()
The equivalent C# code will look similar to this:
var client = new MongoClient();
var d = client.GetDatabase("d");
var c = d.GetCollection<BsonDocument>("c");
c.InsertOne(new BsonDocument());

// The creation time is encoded in the first four bytes of the ObjectId
var result = c.AsQueryable().First()["_id"].AsObjectId;
Console.WriteLine(result.CreationTime);

I managed to make it work using this:
// Build ObjectIds whose embedded timestamps bound the date range;
// the machine/pid/increment components are zeroed out
var startId = new ObjectId(startDateTime, 0, 0, 0);
var endId = new ObjectId(endDateTime, 0, 0, 0);

var rangeFilter = Builders<BsonDocument>.Filter.Gt("_id", startId) &
                  Builders<BsonDocument>.Filter.Lt("_id", endId);

using (var cursor = await collection.Find(rangeFilter).ToCursorAsync())
{
    while (await cursor.MoveNextAsync())
    {
        foreach (var doc in cursor.Current)
        {
            var creationTime = doc["_id"].AsObjectId.CreationTime;
            var filter = Builders<BsonDocument>.Filter.Eq("_id", doc["_id"].AsObjectId);
            try
            {
                // Deletes one document per iteration; see the single-call variant below
                var deleteResult = collection.DeleteMany(filter);
            }
            catch (Exception ex)
            {
                // Swallowing exceptions hides failed deletes; log ex at a minimum
            }
        }
    }
}
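If the _id range filter is really all you need, the per-document loop above can be collapsed into a single server-side delete. This is only a sketch (it assumes the same startDateTime/endDateTime bounds as above, and Cosmos DB request-unit throttling may still force you to delete in smaller batches):
// Sketch: delete the whole _id range in one call instead of one delete per document
var startId = new ObjectId(startDateTime, 0, 0, 0);
var endId = new ObjectId(endDateTime, 0, 0, 0);

var rangeFilter = Builders<BsonDocument>.Filter.Gt("_id", startId) &
                  Builders<BsonDocument>.Filter.Lt("_id", endId);

var deleteResult = await collection.DeleteManyAsync(rangeFilter);
Console.WriteLine($"Deleted {deleteResult.DeletedCount} documents");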

Related

Lucene.Net 4.8.0-beta00016 throws an exception when I try to read FastTaxonomyFacetCounts from a file-system Directory

I'm using Lucene.Net version 4.8.0-beta00016 with .NET 6.0.
When I write to a RAMDirectory I'm able to fetch FastTaxonomyFacetCounts, but if I try to fetch from a file-system Directory it throws "Index Corrupted. Missing parent data for category 0".
Below is the code where I'm facing the issue. IndexDirectory and TaxoDirectory are the physical file-system locations where the Lucene indexes are generated.
using (DirectoryReader indexReader = DirectoryReader.Open(IndexDirectory))
using (DirectoryTaxonomyReader taxoReader = new DirectoryTaxonomyReader(TaxoDirectory))
{
    IndexSearcher searcher = new IndexSearcher(indexReader);
    FacetsCollector fc = new FacetsCollector();
    Query q = new WildcardQuery(new Term("Brand", "*Ji*"));
    TopScoreDocCollector tdc = TopScoreDocCollector.Create(10, true);
    var topDocs = FacetsCollector.Search(searcher, q, 10, fc);
    var topHits = topDocs.ScoreDocs;
    var hits = searcher.Search(q, 10, Sort.INDEXORDER).ScoreDocs;
    if (hits != null)
    {
        foreach (var hit in hits)
        {
            var document = searcher.Doc(hit.Doc);
        }
    }
    // config is the FacetsConfig used when the index was built (defined elsewhere)
    Facets facets = new FastTaxonomyFacetCounts(taxoReader, config, fc);
    var result = facets.GetAllDims(1000);
}
If anyone has a solution, please guide me.
It should return the facets.

EventLogQuery ignoring TimeCreated criteria

I am using the following helper function:
public List<EventRecord> GetEvents(DateTime afterTime)
{
    var formattedDateTime = $"{afterTime:yyyy-MM-dd}T{afterTime:HH:mm:ss}.000000000Z";
    var query = $"*[(System/Provider/@Name='.Net Runtime') and (System/EventID=1000) and (System/TimeCreated/@SystemTime >= '{formattedDateTime}')]";
    var queryResult = new EventLogQuery("Application", PathType.LogName, query);
    var reader = new EventLogReader(queryResult);
    var events = new List<EventRecord>();
    while (true)
    {
        var rec = reader.ReadEvent();
        if (rec == null)
        {
            break;
        }
        events.Add(rec);
    }
    return events;
}
This code almost works, except the query seems to be ignoring TimeCreated entirely. It returns all events with the given ProviderName and EventID. I have tried all sorts of different things to get this to work, but no matter what, TimeCreated is ignored.
Anyone see what I'm doing wrong?
Edit 1
Even replacing the query line with:
var query = $"*[System[TimeCreated[@SystemTime >= '{formattedDateTime}']]]";
doesn't work; it still returns all events regardless of when they were created.
Edit 2
So I tried using the 'custom view' builder to generate an XML query for me, and what I found was even more perplexing.
Currently the time displayed on my machine is 2:42 pm, which is 14:42 in 24-hour time.
When I create a query using the custom view and select From: 'Events On' 03/18/2021 2:42pm, it creates the following:
<QueryList>
  <Query Id="0" Path="Application">
    <Select Path="Application">*[System[Provider[@Name='.NET Runtime'] and (EventID=1000) and TimeCreated[@SystemTime>='2021-03-18T20:42:13.000Z']]]</Select>
  </Query>
</QueryList>
Why on god's green earth did it convert 2:42 pm to 20:42? Because the SystemTime attribute is stored in UTC (local time here is six hours behind UTC), you apparently need to convert your time to universal time for this to work.
Here is a working sample:
public List<EventRecord> GetEvents(DateTime afterTime)
{
    // "o" (round-trip) format on a UTC DateTime produces e.g. 2021-03-18T20:42:13.0000000Z
    var formattedDateTime = afterTime.ToUniversalTime().ToString("o");
    var query = $"*[System[Provider[@Name='.NET Runtime'] and (EventID=1000) and TimeCreated[@SystemTime>='{formattedDateTime}']]]";
    var queryResult = new EventLogQuery("Application", PathType.LogName, query);
    var reader = new EventLogReader(queryResult);
    var events = new List<EventRecord>();
    while (true)
    {
        var rec = reader.ReadEvent();
        if (rec == null)
        {
            break;
        }
        events.Add(rec);
    }
    return events;
}
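For reference, a minimal call site might look like this (the one-day window is just an illustrative assumption, not from the original post):
// Hypothetical usage: fetch .NET Runtime error events (EventID 1000) from the last 24 hours
var recentEvents = GetEvents(DateTime.Now.AddDays(-1));
Console.WriteLine($"Found {recentEvents.Count} matching events");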

C# MVC Loop through list and update each record efficiently

I have a list of 'Sites' that are stored in my database. The list is VERY big and contains around 50,000+ records.
I am trying to loop through each record and update it. This takes ages; is there a better, more efficient way of doing this?
using (IRISInSiteLiveEntities DB = new IRISInSiteLiveEntities())
{
    var allsites = DB.Sites.ToList();
    foreach (var sitedata in allsites)
    {
        var siterecord = DB.Sites.Find(sitedata.Id);
        siterecord.CabinOOB = "Test";
        siterecord.TowerOOB = "Test";
        siterecord.ManagedOOB = "Test";
        siterecord.IssueDescription = "Test";
        siterecord.TargetResolutionDate = "Test";
        DB.Entry(siterecord).State = EntityState.Modified;
    }
    DB.SaveChanges();
}
I have cut most of the code out to get to the point. The full function pulls a list out of Excel, matches the records against the sites list, and updates each record that matches. The DB.Find is slowing the loop down dramatically.
[HttpPost]
public ActionResult UploadUpdateOOBList()
{
    CheckPermissions("UpdateOOBList");
    string[] typesallowed = new string[] { ".xls", ".xlsx" };
    HttpPostedFileBase file = Request.Files[0];
    var fname = file.FileName;
    if (!typesallowed.Any(fname.Contains))
    {
        return Json("NotAllowed");
    }
    file.SaveAs(Server.MapPath("~/Uploads/OOB List/") + fname);
    //Create empty OOB data list
    List<OOBList.OOBDetails> oob_data = new List<OOBList.OOBDetails>();
    //Using ClosedXML rather than Interop Excel....
    //Interop Excel: 30 seconds for 750 rows
    //ClosedXML: 3 seconds for 750 rows
    string fileName = Server.MapPath("~/Uploads/OOB List/") + fname;
    using (var excelWorkbook = new XLWorkbook(fileName))
    {
        var nonEmptyDataRows = excelWorkbook.Worksheet(2).RowsUsed();
        foreach (var dataRow in nonEmptyDataRows)
        {
            //for row number check
            if (dataRow.RowNumber() >= 4)
            {
                string siteno = dataRow.Cell(1).GetValue<string>();
                string sitename = dataRow.Cell(2).GetValue<string>();
                string description = dataRow.Cell(4).GetValue<string>();
                string cabinoob = dataRow.Cell(5).GetValue<string>();
                string toweroob = dataRow.Cell(6).GetValue<string>();
                string manageoob = dataRow.Cell(7).GetValue<string>();
                string resolutiondate = dataRow.Cell(8).GetValue<string>();
                string resolutiondate_converted = resolutiondate.Substring(resolutiondate.Length - 9);
                oob_data.Add(new OOBList.OOBDetails
                {
                    SiteNo = siteno,
                    SiteName = sitename,
                    Description = description,
                    CabinOOB = cabinoob,
                    TowerOOB = toweroob,
                    ManageOOB = manageoob,
                    TargetResolutionDate = resolutiondate_converted
                });
            }
        }
    }
    //Now delete file.
    System.IO.File.Delete(Server.MapPath("~/Uploads/OOB List/") + fname);
    Debug.Write("DOWNLOADING LIST ETC....\n");
    using (IRISInSiteLiveEntities DB = new IRISInSiteLiveEntities())
    {
        var allsites = DB.Sites.ToList();
        //Loop through sites and the OOB list and if they match then tell us
        foreach (var oobdata in oob_data)
        {
            foreach (var sitedata in allsites)
            {
                var indexof = sitedata.SiteName.IndexOf(' ');
                if (indexof > 0)
                {
                    var OOBNo = oobdata.SiteNo;
                    var OOBName = oobdata.SiteName;
                    var SiteNo = sitedata.SiteName;
                    var split = SiteNo.Substring(0, indexof);
                    if (OOBNo == split && SiteNo.Contains(OOBName))
                    {
                        var siterecord = DB.Sites.Find(sitedata.Id);
                        siterecord.CabinOOB = oobdata.CabinOOB;
                        siterecord.TowerOOB = oobdata.TowerOOB;
                        siterecord.ManagedOOB = oobdata.ManageOOB;
                        siterecord.IssueDescription = oobdata.Description;
                        siterecord.TargetResolutionDate = oobdata.TargetResolutionDate;
                        DB.Entry(siterecord).State = EntityState.Modified;
                        Debug.Write("Updated Site ID/Name Record: " + sitedata.Id + "/" + sitedata.SiteName);
                    }
                }
            }
        }
        DB.SaveChanges();
    }
    var nowdate = DateTime.Now.ToString("dd/MM/yyyy");
    System.IO.File.WriteAllText(Server.MapPath("~/Uploads/OOB List/lastupdated.txt"), nowdate);
    return Json("Success");
}
Looks like you are using Entity Framework (6 or Core). In either case, both
var siterecord = DB.Sites.Find(sitedata.Id);
and
DB.Entry(siterecord).State = EntityState.Modified;
are redundant, because the sitedata variable is coming from
var allsites = DB.Sites.ToList();
This not only loads the whole Sites table into memory, but the EF change tracker also keeps a reference to every object in that list. You can easily verify that with
var siterecord = DB.Sites.Find(sitedata.Id);
Debug.Assert(siterecord == sitedata);
The Find (when the data is already in memory) and Entry methods themselves are fast. The problem is that, by default, they trigger automatic DetectChanges, which leads to quadratic time complexity - in simple words, very slow.
With that being said, simply remove them:
if (OOBNo == split && SiteNo.Contains(OOBName))
{
    sitedata.CabinOOB = oobdata.CabinOOB;
    sitedata.TowerOOB = oobdata.TowerOOB;
    sitedata.ManagedOOB = oobdata.ManageOOB;
    sitedata.IssueDescription = oobdata.Description;
    sitedata.TargetResolutionDate = oobdata.TargetResolutionDate;
    Debug.Write("Updated Site ID/Name Record: " + sitedata.Id + "/" + sitedata.SiteName);
}
This way, EF will detect changes just once (before SaveChanges) and will also update only the modified record fields.
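If you do need to keep explicit Find/Entry calls for some reason, another common EF6 mitigation (a sketch only, not part of the original answer) is to switch off automatic change detection for the duration of the bulk update and run it once yourself:
// Sketch: disable automatic DetectChanges during a bulk update (EF6)
DB.Configuration.AutoDetectChangesEnabled = false;
try
{
    foreach (var sitedata in allsites)
    {
        // ... apply the updates to sitedata as above ...
    }
    DB.ChangeTracker.DetectChanges(); // detect all changes once, instead of on every call
    DB.SaveChanges();
}
finally
{
    DB.Configuration.AutoDetectChangesEnabled = true;
}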
I have followed Ivan Stoev's suggestion and changed the code by removing the DB.Find and the EntityState.Modified call. It now takes about a minute and a half compared to 15 minutes beforehand. Very surprising, as I didn't know that you don't actually require that to update the records. Clever. The code is now:
using (IRISInSiteLiveEntities DB = new IRISInSiteLiveEntities())
{
    var allsites = DB.Sites.ToList();
    Debug.Write("Starting Site Update loop...");
    //Loop through sites and the OOB list and if they match then tell us
    //750 records takes around 15-20 minutes.
    foreach (var oobdata in oob_data)
    {
        foreach (var sitedata in allsites)
        {
            var indexof = sitedata.SiteName.IndexOf(' ');
            if (indexof > 0)
            {
                var OOBNo = oobdata.SiteNo;
                var OOBName = oobdata.SiteName;
                var SiteNo = sitedata.SiteName;
                var split = SiteNo.Substring(0, indexof);
                if (OOBNo == split && SiteNo.Contains(OOBName))
                {
                    sitedata.CabinOOB = oobdata.CabinOOB;
                    sitedata.TowerOOB = oobdata.TowerOOB;
                    sitedata.ManagedOOB = oobdata.ManageOOB;
                    sitedata.IssueDescription = oobdata.Description;
                    sitedata.TargetResolutionDate = oobdata.TargetResolutionDate;
                    Debug.Write("Thank you, next: " + sitedata.Id + "\n");
                }
            }
        }
    }
    DB.SaveChanges();
}
First of all, you should turn your HttpPost action into an async method;
more info: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/
What you should then do is create the tasks and add them to a list, then wait for them to complete (if you want/need to) by calling Task.WaitAll():
https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.waitall?view=netframework-4.7.2
This will allow your code to run in parallel on multiple threads, which already optimizes performance quite a bit.
You can also use LINQ to, for example, reduce the size of allsites beforehand by doing something that will roughly look like this:
var sitedataWithCorrectNames = allsites.Where(x => x /* evaluate your condition here */)
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/ef/language-reference/supported-and-unsupported-linq-methods-linq-to-entities
and then start your foreach (var oobdata) loop with foreach (var sitedata in sitedataWithCorrectNames); a sketch of this is shown below.
The same goes for SiteNo.Contains(OOBName):
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/getting-started-with-linq
P.S. Most DB SDKs also provide asynchronous functions, so use those as well.
P.P.S. I didn't have an IDE, so I eyeballed the code, but the links should provide plenty of samples. Reply if you need more help.
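As a rough illustration of that pre-filtering idea (a sketch only; the variable names and the matching condition are lifted from the question's code, not from a tested implementation):
// Sketch: narrow allsites to candidate matches per OOB row instead of checking every site
foreach (var oobdata in oob_data)
{
    var candidateSites = allsites.Where(s =>
        s.SiteName.IndexOf(' ') > 0 &&
        s.SiteName.Substring(0, s.SiteName.IndexOf(' ')) == oobdata.SiteNo &&
        s.SiteName.Contains(oobdata.SiteName));

    foreach (var sitedata in candidateSites)
    {
        sitedata.CabinOOB = oobdata.CabinOOB;
        sitedata.TowerOOB = oobdata.TowerOOB;
        sitedata.ManagedOOB = oobdata.ManageOOB;
        sitedata.IssueDescription = oobdata.Description;
        sitedata.TargetResolutionDate = oobdata.TargetResolutionDate;
    }
}
DB.SaveChanges();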

How to combine a distinct query with projection with C# MongoDB driver

I'm trying to combine a projection and a distinct with the MongoDB driver, but I'm not getting anywhere.
I have:
var coll = db.GetCollection<Vat>(CommonConstants.VatCodeCollection);
// I'd like to combine these in one statement:
var result = coll.Distinct<DateTime>("ValidSince", filter).ToList();
var projection = Builders<Vat>.Projection.Expression(x => new VatPeriod { ValidSince = x.ValidSince });
In the end I'd like to get a List<VatPeriod> as the result of one statement. Of course, I could do something like
var coll = db.GetCollection<Vat>(CommonConstants.VatCodeCollection);
List<VatPeriod> vatPeriods = null;
try
{
    var result = coll.Distinct<DateTime>("ValidSince", filter).ToList();
    if (result.Count > 0)
    {
        vatPeriods = new List<VatPeriod>(result.Count);
        foreach (var dateTime in result)
        {
            vatPeriods.Add(new VatPeriod() { ValidSince = dateTime });
        }
    }
    return vatPeriods;
}
catch .....
in my repository class, but I would prefer to do everything on the Mongo server. Any idea if and how this is possible?
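For what it's worth, one way to keep the distinct on the server is an aggregation that groups on ValidSince and projects the key. This is only a sketch, assuming Vat exposes a ValidSince property and that the driver's LINQ translator accepts this grouping shape:
// Sketch: server-side "distinct" via $group, projected straight into VatPeriod
var vatPeriods = coll.Aggregate()
    .Match(filter)                                       // same filter as above
    .Group(x => x.ValidSince,                            // distinct key
           g => new VatPeriod { ValidSince = g.Key })    // projection into the result type
    .ToList();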

C# MongoDB Driver OutOfMemoryException

I am trying to read data from a remote MongoDB instance from a C# console application but keep getting an OutOfMemoryException. The collection that I am trying to read from has about 500,000 records. Does anyone see any issue with the code below?
var mongoCred = MongoCredential.CreateMongoCRCredential("xdb", "x", "x");
var mongoClientSettings = new MongoClientSettings
{
    Credentials = new[] { mongoCred },
    Server = new MongoServerAddress("x-x.mongolab.com", 12345),
};
var mongoClient = new MongoClient(mongoClientSettings);
var mongoDb = mongoClient.GetDatabase("xdb");
var mongoCol = mongoDb.GetCollection<BsonDocument>("Persons");
var list = await mongoCol.Find(new BsonDocument()).ToListAsync();
This is a simple workaround: you can page your results using .Limit(int?) and .Skip(int?). In totNum you have to store the number of documents in your collection, using
coll.Count(new BsonDocument()) /*use the same filter you will apply in the next Find()*/
and then
for (int _i = 0; _i < totNum / 1000 + 1; _i++)
{
    var result = coll.Find(new BsonDocument()).Limit(1000).Skip(_i * 1000).ToList();
    foreach (var item in result)
    {
        /*Write your document in CSV file*/
    }
}
I hope this can help...
P.S.
I used 1000 in .Skip() and .Limit(), but obviously you can use whatever value you want :-)
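Another option worth mentioning (not from the original answer, just a sketch) is to stream the collection with a cursor instead of materializing everything with ToListAsync, so that only one batch is held in memory at a time:
// Sketch: iterate the collection batch by batch instead of loading it all at once
using (var cursor = await mongoCol.Find(new BsonDocument()).ToCursorAsync())
{
    while (await cursor.MoveNextAsync())
    {
        foreach (var doc in cursor.Current)
        {
            // process one document at a time, e.g. write it out to the CSV file
        }
    }
}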
