DotNetRDF: Graph or CompressingTurtleWriter does not release memory - c#

I'm using the dotNetRDF framework and C# to export graphs in Turtle format for patients, creating one Turtle file per patient. After about 400 patients the program stalls due to memory issues. Each Turtle file is between 2 and 150 MB. The program occupies about 4 GB of memory after 100 patients and 19 GB after 500 patients, as shown in Task Manager.
I have a function in an export class that reads the data from an MS SQL server, builds the graph, and at the end uses CompressingTurtleWriter to write the graph to a Turtle file.
private int ExportPatient(string SubjectPseudoId)
{
    Graph exportGraph = new Graph();
    AddNamespaces(exportGraph);

    // for each type of predicate
    {
        // read data from SQL (SqlConnection, SqlCommand and reader are wrapped in using(){} statements)
        // for each datareader
        {
            // save values in subjectvalue, predicatevalue, objectvalue strings
            switch (objecttype)
            {
                case "string":
                    exportGraph.Assert(new Triple(
                        exportGraph.CreateUriNode(prefixRessource + EncodeIRI(dataprovidervalue + "-" + semanticDefinition.ClassName + "-" + subjectvalue)),
                        exportGraph.CreateUriNode(semanticDefinition.AttributePrefixId + ":" + semanticDefinition.AttributeName),
                        exportGraph.CreateLiteralNode(objectvalue, new Uri(XmlSpecsHelper.XmlSchemaDataTypeString))));
                    break;
                case "double":
                    exportGraph.Assert(new Triple(
                        exportGraph.CreateUriNode(prefixRessource + EncodeIRI(dataprovidervalue + "-" + semanticDefinition.ClassName + "-" + subjectvalue)),
                        exportGraph.CreateUriNode(semanticDefinition.AttributePrefixId + ":" + semanticDefinition.AttributeName),
                        exportGraph.CreateLiteralNode(objectvalue, new Uri(XmlSpecsHelper.XmlSchemaDataTypeDouble))));
                    break;
                case "datetime":
                    exportGraph.Assert(new Triple(
                        exportGraph.CreateUriNode(prefixRessource + EncodeIRI(dataprovidervalue + "-" + semanticDefinition.ClassName + "-" + subjectvalue)),
                        exportGraph.CreateUriNode(semanticDefinition.AttributePrefixId + ":" + semanticDefinition.AttributeName),
                        exportGraph.CreateLiteralNode(objectvalue, new Uri(XmlSpecsHelper.XmlSchemaDataTypeDateTime))));
                    break;
                case "uri":
                    exportGraph.Assert(new Triple(
                        exportGraph.CreateUriNode(prefixRessource + EncodeIRI(dataprovidervalue + "-" + semanticDefinition.ClassName + "-" + subjectvalue)),
                        exportGraph.CreateUriNode(semanticDefinition.AttributePrefixId + ":" + semanticDefinition.AttributeName),
                        exportGraph.CreateUriNode(prefixRessource + EncodeIRI(dataprovidervalue + "-" + semanticDefinition.Range + "-" + objectvalue))));
                    break;
                default:
                    log.Warn("undefined objecttype=" + objecttype, process, runConfig.Project);
                    break;
            } // switch
        } // for each datareader
    } // for each predicate

    // all the triples are added to the graph, write it to the turtle file now
    CompressingTurtleWriter turtlewriter = new CompressingTurtleWriter(5, TurtleSyntax.W3C);
    turtlewriter.PrettyPrintMode = true;
    turtlewriter.Save(exportGraph, CreateFileName(SubjectPseudoId));

    // dispose of the graph class
    exportGraph.Dispose();
}
// return control to the calling function to process the next patient
// take the next SubjectPseudoId and call the function again until array is processed.
What I've tried so far is to Dispose or Finalize the CompressingTurtleWriter, but neither method exists, even though https://www.dotnetrdf.org/api/html/T_VDS_RDF_Writing_CompressingTurtleWriter.htm#! suggests that CompressingTurtleWriter has a protected Finalize() method.
I do call Dispose() on the Graph before exiting the function.
I tried both .NET 5.0 and .NET Core 3.1, but the behaviour is the same.
I also tried running this function as a Task, but that didn't change the memory issue.
I ran the VS Diagnostic Tools and created a snapshot after exportGraph.Dispose(); after extract 15 it shows:
Object Type                                      Count        Size (Bytes)    Inclusive Size (Bytes)
VDS.Common.Tries.SparseCharacterTrieNode<Uri>    5'823'385    326'109'560     1'899'037'768
and after extract 25:
Object Type                                      Count        Size (Bytes)    Inclusive Size (Bytes)
VDS.Common.Tries.SparseCharacterTrieNode<Uri>    11'882'772   665'435'232     1'540'054'160
In Task Manager the program uses 1'646'964 K after 25 extracts, versus about 250'000 K at the start of the program.
The total size of the 25 Extract files is about 302 MB.
I can't see any issue in my code, and I wonder why there are so many VDS.Common.Tries.SparseCharacterTrieNode<Uri> objects still on the heap.
Has anybody had a similar experience, or an idea how to solve this?

I think the problem is that dotNetRDF is caching all of the URIs that are created during the creation of each graph and that cache is a global cache. I would suggest setting VDS.RDF.Options.InternUris to false before starting processing - this is a global setting so it only needs to be done once at the start of your program.
You can also reduce memory usage of each individual graph by opting for just simple indexing (set VDS.RDF.Options.FullTripleIndexing to false), or by using the NonIndexedGraph instead of the default Graph implementation (this is assuming all you are doing is generating and then serializing the graphs). There are some tips on reducing memory usage here.
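To make that concrete, here is a minimal sketch of those settings, assuming a dotNetRDF 2.x release where the global VDS.RDF.Options class and NonIndexedGraph are available; the output file name is just a placeholder:
// Minimal sketch of the suggestions above (dotNetRDF 2.x assumed).
using System;
using VDS.RDF;
using VDS.RDF.Parsing;
using VDS.RDF.Writing;

// Once, at program startup: stop dotNetRDF from interning every created URI in its global cache.
VDS.RDF.Options.InternUris = false;

// Optional: keep only the simple triple index to reduce per-graph memory.
VDS.RDF.Options.FullTripleIndexing = false;

// If a graph is only built and then serialized (never queried), skip indexing entirely.
IGraph exportGraph = new NonIndexedGraph();
// ... AddNamespaces(exportGraph) and the Assert(...) calls as in the question ...

var turtleWriter = new CompressingTurtleWriter(5, TurtleSyntax.W3C) { PrettyPrintMode = true };
turtleWriter.Save(exportGraph, "patient-export.ttl");  // placeholder file name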

Related

Adding users to ZKTeco access door using Push SDK

I'm trying to add users to the Door Access Control Device: "inBio 260"
I'm told I need to use the Push/Pull SDK to do that.
public bool AddUser(User u)
{
    return axCZKEM1.SSR_SetDeviceData(machineNumber, "user", u + "\r\n", "");
}

class User
{
    ...
    public override string ToString()
    {
        return
            "CardNo=" + ID + "\t" +
            "Pin=" + Pin + "\t" +
            "Name=" + Name + "\t" +
            "Password=" + Password + "\t" +
            "StartTime=" + StartTime + "\t" +
            "EndTime=" + EndTime;
    }
}

public bool AddFingerprint(Fingerprint p)
{
    return
        IsPinValid(p.Pin) &&
        p.Template != null &&
        p.Template.Length > 100 &&
        axCZKEM1.SSR_SetDeviceData(machineNumber, "templatev10", p + "\r\n", "");
}

class Fingerprint
{
    ...
    public override string ToString()
    {
        int size = Convert.FromBase64String(Template).Length;
        return
            "Size=" + size +
            "\tPin=" + Pin +
            "\tFingerID=" + FingerID +
            "\tValid=1\tTemplate=" + Template +
            "\tEndTag=" + EndTag;
    }
}
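For context, a hypothetical call sequence against the classes above might look like the sketch below; the wrapper instance name ("panel") and the helper methods are assumptions, only AddUser, AddFingerprint and the SSR_SetDeviceData calls come from the snippet:
// Hypothetical usage sketch: "panel" is whatever object holds axCZKEM1, machineNumber,
// AddUser and AddFingerprint; BuildUser/CaptureTemplate are placeholder helpers.
User user = BuildUser();                 // hypothetical helper that fills ID, Pin, Name, ...
Fingerprint fp = CaptureTemplate(user);  // hypothetical helper that fills Pin, FingerID, Template

bool userOk = panel.AddUser(user);               // pushes the "user" record via SSR_SetDeviceData
bool fpOk = userOk && panel.AddFingerprint(fp);  // pushes the "templatev10" record

if (!fpOk)
{
    // SSR_SetDeviceData returned false, or the template failed the Pin/length checks in AddFingerprint
}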
I use "ZKAccess 3.5" to check and I find the users I added and everything seems fine.
But suddenly the machine will report 0 valid fingerprints. And the doors won't open.
Calling AddFingerprint to restore the lost fingerprint returns a false "true", i.e. nothing was added and the machine still has 0 fingerprints left.
Note: ZKAccess is limited to 2000 users, I added 2600+ users.
Update: ZKAccess has 2654 users in its database; clicking "sync to device" only restores the 900 users that were added using ZKAccess itself (foul play suspected).
Update: I confused the Push and Pull SDKs; they are not the same. The Pull SDK is free, the Push SDK is private to ZKTeco.
ZKAccess3.5 deleted all data because the limit of the free version was exceeded.
EDIT: Just for anyone looking for an answer, the Push SDK is a private SDK used only by ZKTeco. The Pull SDK, however, is free but has no documentation. I wrote a wrapper in C#: code

Trying to get my SSIS to continue running if a file doesn't exist

So, I am running SSIS (through VS) and I have two segments that hang me up when my clients don't send in the exact files every day. I have a task that deletes old files, and then renames the current files to the filename with _OLD at the end of it.
The issue is: if the incoming files aren't exactly the same set each day, a step crashes and the entire package fails.
An example:
A client sends in on Monday files: Names, Addresses, Grades, Schools
The same client, on Tuesday sends in: Names, Addresses, Schools
Since the Grades file doesn't exist, the rename to Grades_OLD can't happen and the SSIS package fails.
The scripts are:
del Names_OLD.csv
bye
This will then go to the Rename Script:
ren Names.csv Names_OLD.csv
bye
and it then goes on to Addresses to do the same thing. It is super frustrating that the whole thing fails when a single file doesn't exist the next day, and there doesn't seem to be a need for it.
We have two scripts that generate the archive data to process:
public void Main()
{
    Dts.Variables["ARCHIVEFILE"].Value =
        Path.GetFileNameWithoutExtension(Dts.Variables["FTPFILE"].Value.ToString()) +
        "_OLD" +
        Path.GetExtension(Dts.Variables["FTPFILE"].Value.ToString());
    Dts.TaskResult = (int)ScriptResults.Success;
}
and
public void Main()
{
    /* PSFTP_DEL_script.txt
       del %1
       bye

       PSFTP_REN_script.txt
       ren %1 %2
       bye
    */
    var lineOut = String.Empty;
    var File1 = Dts.Variables["User::FTPWORKINGDIR"].Value.ToString() + "\\SSIS_PSFTP_DEL_script.txt";
    var File2 = Dts.Variables["User::FTPWORKINGDIR"].Value.ToString() + "\\SSIS_PSFTP_REN_script.txt";

    lineOut = "del " + Dts.Variables["User::ARCHIVEFILE"].Value.ToString() + Environment.NewLine + "bye";
    System.IO.File.WriteAllText(File1, lineOut);

    lineOut = "ren " + Dts.Variables["User::FTPFILE"].Value.ToString() + " " + Dts.Variables["User::ARCHIVEFILE"].Value.ToString() + Environment.NewLine + "bye";
    System.IO.File.WriteAllText(File2, lineOut);

    Dts.TaskResult = (int)ScriptResults.Success;
}
Researching it doesn't really give anything helpful, and kind of just leads me back to where I am right now.
Try using a Foreach Loop on the files, one for each file that can be processed, and put all the processing of that file inside it. Do not put any precedence constraints between the Foreach Loops.
This will process the files that are there and not fail when the others aren't.
The Foreach Loop essentially works as a check that the file exists.
This assumes you do not need all the files to process them properly.
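To illustrate the idea outside the SSIS designer, a rough Script Task sketch of "only touch files that are actually there" could look like this; the Foreach File enumerator does the same thing declaratively, and the *.csv mask and local-folder assumption are mine:
// Rough sketch of the "foreach file that exists" idea in a Script Task (needs System.IO);
// the SSIS Foreach File enumerator does the same thing declaratively, so files that
// weren't sent simply never produce an iteration and nothing fails.
// "FTPWORKINGDIR" is the variable from the question; the "*.csv" mask is an assumption.
string workingDir = Dts.Variables["User::FTPWORKINGDIR"].Value.ToString();

foreach (string current in Directory.EnumerateFiles(workingDir, "*.csv"))
{
    string name = Path.GetFileName(current);
    if (name.EndsWith("_OLD.csv", StringComparison.OrdinalIgnoreCase))
        continue;   // skip the archive copies themselves

    string archived = Path.GetFileNameWithoutExtension(name) + "_OLD" + Path.GetExtension(name);
    // ... delete the old archive and rename "name" to "archived" for this file only,
    //     e.g. by writing the del/ren PSFTP scripts shown in the question ...
}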
Why not check whether the file exists before writing the script:
if (System.IO.File.Exists(Dts.Variables["User::ARCHIVEFILE"].Value.ToString()))
{
    lineOut = "del " + Dts.Variables["User::ARCHIVEFILE"].Value.ToString() + Environment.NewLine + "bye";
    System.IO.File.WriteAllText(File1, lineOut);
}
if (System.IO.File.Exists(Dts.Variables["User::FTPFILE"].Value.ToString()))
{
    lineOut = "ren " + Dts.Variables["User::FTPFILE"].Value.ToString() + " " + Dts.Variables["User::ARCHIVEFILE"].Value.ToString() + Environment.NewLine + "bye";
    System.IO.File.WriteAllText(File2, lineOut);
}

XML: DeepEquals fails on downloaded vs saved downloaded documents?

I'm using this to compare the documents
if (XNode.DeepEquals(cachedDocument, document))
Let's do a little science here. I download my XML documents from an API, and I'm comparing the two documents to make sure the latest one hasn't changed, i.e. that there have been no changes since I last cached the API XML file.
XDocument document = null;

if (useCachedDocuments && File.Exists(postCacheDirectory + "/photos/page " + (i + 1) + ".xml"))
{
    document = XDocument.Parse(postCacheDirectory + "/photos/page " + (i + 1) + ".xml");
}
else
{
    document = XDocument.Load(GetApiLink(pageAddress, i * 50, true));
}

if (i == 1 && File.Exists(postCacheDirectory + "/photos/page 1.xml"))
{
    var cachedDocument = XDocument.Load(postCacheDirectory + "/photos/page 1.xml");

    if (XNode.DeepEquals(cachedDocument, document))
    {
        Logger.Warn("We can start to use cached documents now, wayyyy faster :D");
        useCachedDocuments = true;
    }
    else
    {
        Logger.Warn("Sorry, no cache avalible here...");
    }
}
The cached document is exactly the same as the one I cached; I literally download it and save it. I know for certain there have been no changes, yet DeepEquals fails?

Parse log files for relevant data from multiple lines

C#, Winforms:
I have a log file I need to parse. This file contains transaction requests from a program, but the program writes each transaction across multiple lines.
I need to get the ID# and whether the request was processed or denied for whatever reason. The problem is that these requests are spread over multiple lines. My only saving grace is that they share the same time stamp from the logger. The (##) is not usable since it is a temporary placeholder, so (19) may repeat multiple times throughout the log.
I was thinking of scanning for a PR_Request and substringing the ID# and the time stamp, but I don't know how to make a StreamReader move down through the next 4 lines and write them out as one single line in a file.
Examples:
06/10/16 08:09:33.031 (1) PR_Request: IID=caa23b14,
06/10/16 08:09:33.031 (1) PR_Mon: IID=caa23b14,
06/10/16 08:09:33.031 (1) RESUME|BEGIN
06/10/16 08:09:33.031 (1) RESUME_TRIG|SC-TI
06/10/16 08:19:04.384 (19) PR_Request: IID=90dg01b,
06/10/16 08:19:04.384 (19) PR_Mon: IID=90dg01b,
06/10/16 08:19:04.384 (19) RESUME|DENIED: Access not granted.
I need output to be in a single line for a file. That way, I can just parse it with another program and feed the data into a database.
06/10/16 08:09:33.031 PR_Request: IID=caa23b14 | RESUME | BEGIN | RESUME_TRIG | SC-TI
06/10/16 08:19:04.384 PR_Request: IID=90dg01b | RESUME | DENIED: Access not granted.
EDIT:
Okay, I think I have base code here. It works, kind of. It takes such a long time because I had to open another stream reader when it finds a match for PR_Request and scan the file again for the same fullstamp (date + process number). It then looks for RESUME|BEGIN or RESUME|DENIED and writes out whether the request succeeded or failed.
Is there any way to speed this up, perhaps by taking the line where the StreamReader originally found the PR_Request, starting the second scan from there, reading maybe 5 more lines, and then stopping? That would speed up the program considerably.
string inputfolder = inputloctxt.Text;
string outputfolder = saveloctxt.Text;
string outputfile = @"ParsedFile.txt";

try
{
    string[] readfromdir = Directory.GetFiles(outputfolder);
    foreach (string readnow in readfromdir)
    {
        using (StreamReader fileread = new StreamReader(readnow))
        {
            string fileisreading;
            while ((fileisreading = fileread.ReadLine()) != null)
            {
                if (fileisreading.Contains("PR_Request"))
                {
                    string resumed = null;
                    string fullstamp = fileisreading.Substring(1, 26);
                    string datestamp = fileisreading.Substring(1, 21);
                    string requesttype = fileisreading.Substring(27, 22);
                    string iidnum = fileisreading.Substring(53, 8);

                    // second pass over the same file, looking for the matching RESUME line
                    using (StreamReader grabnext01 = new StreamReader(readnow))
                    {
                        string grabnow01;
                        while ((grabnow01 = grabnext01.ReadLine()) != null)
                        {
                            if (grabnow01.Contains(fullstamp))
                            {
                                if (grabnow01.Contains("RESUME|BEGIN"))
                                {
                                    resumed = "TRUE";
                                    break;
                                }
                                else if (grabnow01.Contains("RESUME|DENIED"))
                                {
                                    resumed = "FALSE";
                                    break;
                                }
                            }
                        }
                    }

                    File.AppendAllText(outputfolder + outputfile,
                        datestamp + " " + requesttype + " " + iidnum + " " + resumed + Environment.NewLine);
                    resumed = null;
                }
            }
        }
    }
}
This sounds like you need to use regular expressions. There is a namespace, System.Text.RegularExpressions, you can use, and you can reference the capture groups that I made for you in the example below.
Use these sites for reference:
https://regex101.com/
https://msdn.microsoft.com/en-us/library/bs2twtah(v=vs.110).aspx
I started off the Regex for you, it is not pretty but it should get the job done.
(?:\d{2}\/\d{2}\/\d{2}\s\d{2}\:\d{2}\:\d{2}\.\d{3}\s\(\d+\)\s)(PR_Request: IID=[^,\n]+)(?:\,\n\d{2}\/\d{2}\/\d{2}\s\d{2}\:\d{2}\:\d{2}\.\d{3}\s\(\d+\)\sPR_Mon: IID=[^,\n]*\,\n\d{2}\/\d{2}\/\d{2}\s\d{2}\:\d{2}\:\d{2}\.\d{3}\s\(\d+\)\s)((RESUME|BEGIN|\||DENIED: Access not granted.)*)(?:\n\d{2}\/\d{2}\/\d{2}\s\d{2}\:\d{2}\:\d{2}\.\d{3}\s\(\d+\)\s)*((RESUME_TRIG|SC\-TI|\|)*)
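For example, applying that pattern with Regex.Matches could look like the sketch below; the log path, the line-ending normalization, and the way the groups are written out are assumptions, and the pattern itself is copied verbatim from above:
using System;
using System.IO;
using System.Text.RegularExpressions;

// Minimal sketch: read the whole log, normalize line endings so the \n in the pattern
// matches, then emit one summary line per PR_Request block (group numbers follow from
// the pattern above: 1 = PR_Request/IID, 2 = RESUME outcome, 4 = RESUME_TRIG part).
string pattern = @"(?:\d{2}\/\d{2}\/\d{2}\s\d{2}\:\d{2}\:\d{2}\.\d{3}\s\(\d+\)\s)(PR_Request: IID=[^,\n]+)(?:\,\n\d{2}\/\d{2}\/\d{2}\s\d{2}\:\d{2}\:\d{2}\.\d{3}\s\(\d+\)\sPR_Mon: IID=[^,\n]*\,\n\d{2}\/\d{2}\/\d{2}\s\d{2}\:\d{2}\:\d{2}\.\d{3}\s\(\d+\)\s)((RESUME|BEGIN|\||DENIED: Access not granted.)*)(?:\n\d{2}\/\d{2}\/\d{2}\s\d{2}\:\d{2}\:\d{2}\.\d{3}\s\(\d+\)\s)*((RESUME_TRIG|SC\-TI|\|)*)";

string logText = File.ReadAllText(@"C:\logs\sample.log").Replace("\r\n", "\n");  // hypothetical path

foreach (Match m in Regex.Matches(logText, pattern))
{
    // Write to the console here; append to your output file instead as needed.
    Console.WriteLine(m.Groups[1].Value + " | " + m.Groups[2].Value + " | " + m.Groups[4].Value);
}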

Opening a PDF with search parameters just opens the PDF

I was asked to add some search functionality to an existing system for the collection of PDFs that we have. I know about searching PDFs and opening them with search parameters, and in a test application I wrote it works like a dream. When trying to carry it over to our existing application, the PDF opens but without the search terms or Acrobat Reader's advanced find popping up. Any help would be greatly appreciated!
Here is a snippet of the cs code :
case "PDF":
string searchTerms = SearchWordsTB.Text;
searchTerms = searchTerms.Replace(',', ' ');
launchStr = "OpenPDF('" + e.Row.Cells[9].Text.Replace("\\", "/") + "','" + HttpUtility.UrlEncode(e.Row.Cells[2].Text) + "','" + e.Row.Cells[0].Text + "','" + searchTerms + "')";
break;
We are creating the list of documents on the fly, and PDF is one of the options. Assuming I am understanding this correctly, a DataGrid is created with all these clickable rows that execute a JavaScript function when clicked. The JavaScript function OpenPDF is shown below:
function OpenPDF(url, filename, ID, searchTerms) {
    if (searchTerms.length > 0) {
        window.open('FileViewer.aspx?name=' + filename + '&ID=' + ID + '&url=' + url + '#search="' + searchTerms + '"', 'mywindow' + windowCnt, 'width=800,height=600,location=no,resizable=yes');
    }
    else {
        window.open('FileViewer.aspx?name=' + filename + '&ID=' + ID + '&url=' + url, 'mywindow' + windowCnt, 'width=800,height=600,location=no,resizable=yes');
    }
    windowCnt++;
}
From following the debugging in the C# code, I know that I am properly stripping out the commas in the search terms, so that shouldn't be the problem. What currently happens is that the PDF file opens up just fine, but the search terms are not being used. I have tried following the debugger through the JavaScript (which for me has always been spotty at best), but the breakpoint is never hit. It should also probably be noted that the JavaScript function is kept in a separate JavaScript file and is not inline in the aspx page. And yes, we are correctly referencing the JavaScript file. I will be more than happy to update this post with any extra info that is requested. Thanks in advance for any help!
I was able to achieve the desired results by using HttpUtility.UrlEncode on the launch string, as shown below.
launchStr = "OpenFile('" + HttpUtility.UrlEncode(e.Row.Cells[9].Text.Replace("\\", "/") + "#search=\"" + searchTerms + "\"") + "','" + HttpUtility.UrlEncode(e.Row.Cells[2].Text) + "','" + e.Row.Cells[0].Text + "','" + e.Row.Cells[1].Text + "')";
I then used the function to just open the window with the PDF in it. The problem I had was that without the HTTP Encode, the URL was just cutting off the search parameters. I believe this is because the #search="blah" isn't normally recognized as part of a URL and was therefore truncated. If anyone has a better reason, I would love to hear it.
