Reading multiple XML files freezes applicaton

Reading multiple XML files freezes applicaton - c#

I'm trying to write an application for a game called Eve Online, where people mine planets for product. The app is basically training for me, and is supposed to tell people when their operations expire, so they can reset. Some players have 30 characters, with 120 planets and over 200 timed operations.
I have written code to store the characters in an SQLite db, then I'm pulling each characters 2 verification codes and return their equipment. This info is stored in 2 separate XML documents. One with the planets, and another for each planets with the equipment on it. I have tried to solve getting it all in the following way:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Xml;
namespace PITimer
{
class GetPlanets
{
private long KeyID, CharacterID;
private string VCode;
string CharName;
public List<long> planets = new List<long>();
public List<string> planetNames = new List<string>();
public List<AllInfo> allInfo = new List<AllInfo>();
public GetPlanets(long KeyID, long CharacterID, string VCode, string CharName)
{
this.KeyID = KeyID;
this.CharacterID = CharacterID;
this.VCode = VCode;
this.CharName = CharName;
}
public async Task PullPlanets()
{
XmlReader lesern = XmlReader.Create("https://api.eveonline.com/char/PlanetaryColonies.xml.aspx?keyID=" + KeyID + "&vCode=" + VCode + "&characterID=" + CharacterID);
while (lesern.Read())
{
long planet = 000000;
string planetName;
planet = Convert.ToInt64(lesern.GetAttribute("planetID"));
planetName = lesern.GetAttribute("planetName");
if ((planet != 000000) && (planetName != null))
{
planets.Add(planet);
planetNames.Add(planetName);
await GetExpirationTimes(planet, planetName);
}
}
}
public async Task GetExpirationTimes(long planetID, string planetName)
{
string planet = planetID.ToString();
XmlReader lesern = XmlReader.Create("https://api.eveonline.com/char/PlanetaryPins.xml.aspx?keyID=" + KeyID + "&vCode=" + VCode + "&characterID=" + CharacterID + "&planetID=" + planet);
while (lesern.Read())
{
string expTime;
expTime = lesern.GetAttribute("expiryTime");
if ((expTime != null) && (expTime != "0001-01-01 00:00:00"))
{
allInfo.Add(new AllInfo(CharName, planetName, Convert.ToDateTime(expTime)));
}
}
}
public List<long> ReturnPlanets()
{
PullPlanets();
return planets;
}
public List<string> ReturnNames()
{
return planetNames;
}
public List<AllInfo> ReturnAllInfo()
{
return allInfo;
}
}
}
This does work and it returns the data, which I run into a ListView on Xamarin Android. My problem is that it freezes the system when it runs. I am testing with 2 characters, and the UI is sometimes frozen for several seconds. Other times it runs fine. But with 30 characters I am going to run into trouble.
I have tried solving this with async and task in every configuration I have been able to imagine. Its still either slow or not working at all. How do I set this up in a way that will work in an app?
Its also running a lighter check in a background service, which it does, but I fear I will really strain the system.
The planet XML can be seen here: https://api.eveonline.com/char/PlanetaryColonies.xml.aspx?keyID=3060230&vCode=1ft0nDTRaXgVM6r0co9QhJUq3tC5hYErfBFrt7Skilk4181krBiIRVhshH1TzkDP&characterID=94304895
I pull the planet ID and from that run this XML:
https://api.eveonline.com/char//PlanetaryPins.xml.aspx?keyID=3060230&vCode=1ft0nDTRaXgVM6r0co9QhJUq3tC5hYErfBFrt7Skilk4181krBiIRVhshH1TzkDP&characterID=94304895&planetID=40175117
I have used XMLDoc before, but thought reader would be better and faster for this.

Related

Update console text from multiple threads not working

I am executing/processing very big files in multi threaded mode in a console app.
When I don't update/write to the console from threads, for testing the whole process take about 1 minute.
But when I try to update/write to console from threads to show the progress, the process stuck and it never finishes (waited several minutes even hours). And also console text/window does not updated as it should.
Update-1: As requested by few kind responder, i added minimal code that can reproduce the same error/problem
Here is the code from the thread function/method:
using System;
using System.Collections;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
namespace Large_Text_To_Small_Text
{
class Program
{
static string sAppPath;
static ArrayList objThreadList;
private struct ThreadFileInfo
{
public string sBaseDir, sRFile;
public int iCurFile, iTFile;
public bool bIncludesExtension;
}
static void Main(string[] args)
{
string sFileDir;
DateTime dtStart;
Console.Clear();
sAppPath = Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().Location);
sFileDir = #"d:\Test";
dtStart = DateTime.Now;
///process in multi threaded mode
List<string> lFiles;
lFiles = new List<string>();
lFiles.AddRange(Directory.GetFiles(sFileDir, "*.*", SearchOption.AllDirectories));
if (Directory.Exists(sFileDir + "-Processed") == true)
{
Directory.Delete(sFileDir + "-Processed", true);
}
Directory.CreateDirectory(sFileDir + "-Processed");
sPrepareThreading();
for (int iFLoop = 0; iFLoop < lFiles.Count; iFLoop++)
{
//Console.WriteLine(string.Format("{0}/{1}", (iFLoop + 1), lFiles.Count));
sThreadProcessFile(sFileDir + "-Processed", lFiles[iFLoop], (iFLoop + 1), lFiles.Count, Convert.ToBoolean(args[3]));
}
sFinishThreading();
Console.WriteLine(DateTime.Now.Subtract(dtStart).ToString());
Console.ReadKey();
return;
}
private static void sProcSO(object oThreadInfo)
{
var inputLines = new BlockingCollection<string>();
long lACounter, lCCounter;
ThreadFileInfo oProcInfo;
lACounter = 0;
lCCounter = 0;
oProcInfo = (ThreadFileInfo)oThreadInfo;
var readLines = Task.Factory.StartNew(() =>
{
foreach (var line in File.ReadLines(oProcInfo.sRFile))
{
inputLines.Add(line);
lACounter++;
}
inputLines.CompleteAdding();
});
var processLines = Task.Factory.StartNew(() =>
{
Parallel.ForEach(inputLines.GetConsumingEnumerable(), line =>
{
lCCounter++;
/*
some process goes here
*/
/*If i Comment out these lines program get stuck!*/
//Console.SetCursorPosition(0, oProcInfo.iCurFile);
//Console.Write(oProcInfo.iCurFile + " = " + lCCounter.ToString());
});
});
Task.WaitAll(readLines, processLines);
}
private static void sPrepareThreading()
{
objThreadList = new ArrayList();
for (var iTLoop = 0; iTLoop < 5; iTLoop++)
{
objThreadList.Add(null);
}
}
private static void sThreadProcessFile(string sBaseDir, string sRFile, int iCurFile, int iTFile, bool bIncludesExtension)
{
Boolean bMatched;
Thread oCurThread;
ThreadFileInfo oProcInfo;
Salma_RecheckThread:
bMatched = false;
for (int iTLoop = 0; iTLoop < 5; iTLoop++)
{
if (objThreadList[iTLoop] == null || ((System.Threading.Thread)(objThreadList[iTLoop])).IsAlive == false)
{
oProcInfo = new ThreadFileInfo()
{
sBaseDir = sBaseDir,
sRFile = sRFile,
iCurFile = iCurFile,
iTFile = iTFile,
bIncludesExtension = bIncludesExtension
};
oCurThread = new Thread(sProcSO);
oCurThread.IsBackground = true;
oCurThread.Start(oProcInfo);
objThreadList[iTLoop] = oCurThread;
bMatched = true;
break;
}
}
if (bMatched == false)
{
System.Threading.Thread.Sleep(250);
goto Salma_RecheckThread;
}
}
private static void sFinishThreading()
{
Boolean bRunning;
Salma_RecheckThread:
bRunning = false;
for (int iTLoop = 0; iTLoop < 5; iTLoop++)
{
if (objThreadList[iTLoop] != null && ((System.Threading.Thread)(objThreadList[iTLoop])).IsAlive == true)
{
bRunning = true;
}
}
if (bRunning == true)
{
System.Threading.Thread.Sleep(250);
goto Salma_RecheckThread;
}
}
}
}
And here is the screenshot, if I try to update console window:
You see? Nor the line number (oProcInfo.iCurFile) or the whole line is correct!
It should be like this:
1 = xxxxx
2 = xxxxx
3 = xxxxx
4 = xxxxx
5 = xxxxx
Update-1: To test just change the sFileDir to any folder that has some big text file or if you like you can download some big text files from following link:
https://wetransfer.com/downloads/8aecfe05bb44e35582fc338f623ad43b20210602005845/bcdbb5
Am I missing any function/method to update console text from threads?

I can't reproduce it. In my tests the process always runs to completion, without getting stuck. The output is all over the place though, because the two lines below are not synchronized:
Console.SetCursorPosition(0, oProcInfo.iCurFile);
Console.Write(oProcInfo.iCurFile + " = " + lCCounter.ToString());
Each thread of the many threads involved in the computation invokes these two statements concurrently with the other threads. This makes it possible for one thread to preempt another, and move the cursor before the first thread has the chance to write in the console. To solve this problem you must add proper synchronization, and the easiest way to do it is to use the lock statement:
class Program
{
static object _locker = new object();
And in the sProcSO method:
lock (_locker)
{
Console.SetCursorPosition(0, oProcInfo.iCurFile);
Console.Write(oProcInfo.iCurFile + " = " + lCCounter.ToString());
}
If you want to know more about thread synchronization, I recommend this online resource: Threading in C# - Part 2: Basic Synchronization
If you would like to hear my opinion about the code in the question, and you don't mind receiving criticism, my opinion is that honestly the code is so much riddled with problems that the best course of action would be to throw it away and restart from scratch. Use of archaic data structures (ArrayList???), liberal use of casting from object to specific types, liberal use of the goto statement, use of hungarian notation in public type members, all make the code difficult to follow, and easy for bugs to creep in. I found particularly problematic that each file is processed concurrently with all other files using a dedicated thread, and then each dedicated thread uses a ThreadPool thread (Task.Factory.StartNew) to starts a parallel loop (Parallel.ForEach) with unconfigured MaxDegreeOfParallelism. This setup ensures that the ThreadPool will be saturated so badly, that there is no hope that the availability of threads will ever match the demand. Most probably it will also result to a highly inefficient use of the storage device, especially if the hardware is a classic hard disk.

Your freezing problem may not be C# or code related
on the top left of your console window, on the icon .. right click
select Properties
remove the option of Quick Edit Mode and Insert Mode
you can google that feature, but essentially manifests in the problem you describe above
The formatting problem on the other hand does seem to be, here you need to create a class that serializes writes to the console window from a singe thread. a consumer/producer pattern would work (you could use a BlockingCollection to implement this quite easily)

Is this the most efficient way of skipping random changesets when getting latest from TFS?

Is this the most efficient way of skipping random changesets when getting latest from TFS?
I have done a LOT of research into this subject and have yet to run across a solution. All comments / suggestions are welcome. Even if that suggestion is to use a completely different solution (that works).
My first attempt I filtered the changesets and then looped through them issuing a workspace.get(). This was incredibly slow, and did not get the right results. It ended up taking over 45 minutes for one folder where my final solution ended up taking 3:30 minutes for the same folder. Whereas a normal get process on the same folder took around 50 seconds each time.
This code is test code right now and is only intended to get this working and as such is missing basic things like exception handling and other best practices. It has passed all the tests I have thrown at it so far, but it is a bit slower than the normal get however I do not see a way to make it faster.
Here is what I ended up with:
You will need references to:
assemblyref://Microsoft.TeamFoundation.Client&
assemblyref://Microsoft.TeamFoundation.Common&
assemblyref://Microsoft.TeamFoundation.VersionControl.Client&
assemblyref://Microsoft.TeamFoundation.VersionControl.Common&
assemblyref://Microsoft.VisualStudio.Services.Common
The code:
using Microsoft.TeamFoundation.Client;
using Microsoft.TeamFoundation.Framework.Common;
using Microsoft.TeamFoundation.VersionControl.Client;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
class Program
{
static void Main(string[] args)
{
//these would be the changesets to ignore
var ignoreChangeSets = new List<int>()
{
1,10,50,900 // change these to ids you want to ignore. These are just random example numbers
};
// Replace with your setup
var tfsServer = #"http://server_name:8080/TFS/";
var serverPath = #"$/TFS_PATH_TO_FOLDER/";
// Connect to server
var tfs = TfsTeamProjectCollectionFactory.GetTeamProjectCollection(new Uri(tfsServer));
tfs.Connect(ConnectOptions.None);
var vcs = tfs.GetService<VersionControlServer>();
//get both sets so we can do a comparison of the final changes
var folderName = "Foo";
var sourceDir = $#"{Path.GetTempPath()}\{folderName}\";
var targetDir = $#"{Path.GetTempPath()}\{folderName}-ChangeSets\";
//download the entire source
DownloadSource(vcs, serverPath, sourceDir);
var changeSets = GetChangeSets(vcs, serverPath);
//technically this query could be anything. As long as it filters the changesets out.....
//you could filter by user, date, info in the changesets, anything really. up to you.
var filteredChangeSets = from cs in changeSets
where !ignoreChangeSets.Contains(cs.ChangesetId)
select cs;
if (changeSets.Count() == filteredChangeSets.Count())
{
// we did not filter anything so do a normal pull as it is faster
//download the entire source
DownloadSource(vcs, serverPath, targetDir);
}
else
{
GetChangeSetsLatest(vcs, serverPath, filteredChangeSets, targetDir);
}
}
private static void RecreateDir(string dir)
{
if (Directory.Exists(dir))
{
Directory.Delete(dir, true);
}
Directory.CreateDirectory(dir);
}
private static GetStatus DownloadSource(VersionControlServer vcs, string serverPath, string dir)
{
string wsName = "TempWorkSpace";
Workspace ws = null;
try
{
ws = vcs.GetWorkspace(wsName, Environment.UserName);
}
catch (WorkspaceNotFoundException)
{
ws = vcs.CreateWorkspace(wsName, Environment.UserName);
}
RecreateDir(dir);
ws.Map(serverPath, dir);
var getResponse = ws.Get(VersionSpec.Latest, GetOptions.GetAll | GetOptions.Overwrite);
vcs.DeleteWorkspace(wsName, Environment.UserName);
return getResponse;
}
private static IEnumerable<Changeset> GetChangeSets(VersionControlServer vcs, string serverPath)
{
VersionSpec versionFrom = null; // VersionSpec.ParseSingleSpec("C529", null);
VersionSpec versionTo = VersionSpec.Latest;
// Get Changesets
var changesets = vcs.QueryHistory(
serverPath,
VersionSpec.Latest,
0,
RecursionType.Full,
null,
versionFrom,
versionTo,
Int32.MaxValue,
true,
false
).Cast<Changeset>();
return changesets;
}
private static void GetChangeSetsLatest(VersionControlServer vcs, string serverPath, IEnumerable<Changeset> changesets, string dir)
{
//we are going to hold the latest item (file) in this dictionary, so we can do all our downloads at the end. The key will be the TFS server file path
var items = new Dictionary<string, Item>();
RecreateDir(dir);
//we need the changesets ordered by changesetid.
var changesetsOrdered = changesets.OrderBy(c => c.ChangesetId);
//DO NOT PARALLEL HERE. We need these changesets in EXACT order
foreach (var changeset in changesetsOrdered)
{
foreach (var change in changeset?.Changes.Where(i => i.Item.ItemType == ItemType.File))
{
var itemPath = change.Item.ServerItem.Replace(serverPath, dir).Replace("/", "\\");
if (change.ChangeType.HasFlag(ChangeType.Edit) && change.ChangeType.HasFlag(ChangeType.SourceRename))
{
if (change.Item.DeletionId == 0)
{ items.AddOrUpdate(change.Item.ServerItem, change.Item); }
else
{ items.TryRemove(change.Item.ServerItem); }
}
else if (change.ChangeType.HasFlag(ChangeType.Delete) && change.ChangeType.HasFlag(ChangeType.SourceRename))
{
var previousChange = GetPreviousServerChange(vcs, change.Item);
if (previousChange != null) { items.TryRemove(previousChange?.Item.ServerItem); }
if (change.Item.DeletionId == 0)
{ items.AddOrUpdate(change.Item.ServerItem, change.Item); }
else
{ items.TryRemove(change.Item.ServerItem); }
}
else if (change.ChangeType.HasFlag(ChangeType.Rollback) && change.ChangeType.HasFlag(ChangeType.Delete))
{
items.TryRemove(change.Item.ServerItem);
}
else if (change.ChangeType.HasFlag(ChangeType.Rollback))
{
var item = GetPreviousServerChange(vcs, change.Item)?.Item;
if (item != null) { items.AddOrUpdate(item.ServerItem, item); }
}
else if (change.ChangeType.HasFlag(ChangeType.Add) || change.ChangeType.HasFlag(ChangeType.Edit) || change.ChangeType.HasFlag(ChangeType.Rename))
{
if (change.Item.DeletionId == 0) { items.AddOrUpdate(change.Item.ServerItem, change.Item); }
}
else if (change.ChangeType.HasFlag(ChangeType.Delete))
{
items.TryRemove(change.Item.ServerItem);
}
else
{
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine($"Unknown change types: {change.ChangeType.ToString()}");
Console.ResetColor();
}
}
}
//HUGE penalty for switching to parallel, stick to single file at a time, one test went from 3:30 to 11:05. File system does not appreciate threading. :|
//Parallel.ForEach(items, (i) =>
foreach (var item in items)
{
var itemPath = item.Key.Replace(serverPath, dir).Replace("/", "\\");
item.Value.DownloadFile(itemPath);
Console.WriteLine(item.Value.ChangesetId + " - " + itemPath);
};
}
//really not sure this is the right way to do this. works quite well, but it begs the question that surely there must be an easier way?
private static Change GetPreviousServerChange(VersionControlServer vcs, Item item)
{
//get the changesets and reverse their order, so we can take the next one after it
var changesets = GetChangeSets(vcs, item.ServerItem).OrderByDescending(cs => cs.ChangesetId);
//skip until we find our changeset, then take the following changeset
var previousChangeSet = changesets.SkipWhile(c => c.ChangesetId != item.ChangesetId).Skip(1).FirstOrDefault();
//return the Change that matches the itemid (file id)
return previousChangeSet?.Changes.FirstOrDefault(c => c.Item.ItemId == item.ItemId);
}
}
static class Extensions
{
public static void AddOrUpdate<TKey, TValue>(this IDictionary<TKey, TValue> dictionary, TKey key, TValue value)
{
if (dictionary.ContainsKey(key))
{
dictionary[key] = value;
}
else
{
dictionary.Add(key, value);
}
}
public static void TryRemove<TKey, TValue>(this IDictionary<TKey, TValue> dictionary, TKey key)
{
if (dictionary.ContainsKey(key))
{
dictionary.Remove(key);
}
}
}
EDIT: I am being forced into this because of existing business procedures. Multiple teams or devs can work on the same database at the same time. The changes per team or dev are cataloged under an RFC #. Each RFC # can have its own release schedule and release at any various point in time. I will use Red Gate SQL Compare to compare the folder with everything (as the source) to the folder minus the RFC change-sets (as the target) to generate a change script for that RFC.
Then there are these rules:
An RFC can get parked for an indeterminate period of time. For example I have seen RFC's parked in Staging for over a year. Other RFC's will pass by this RFC on their way to production.
Individual RFC's can be withdrawn from a production push at the last minute.
The chance of me changing these existing procedures is nil. So I had to figure out a way to work around them. This was that way. I would prefer to follow a normal release schedule of pushing all changes out every release that then flow all the way to production. Unfortunately that is not the case here.

Seems like you need to have a Release branch and a Dev branch. Every check-in is then done in the Dev branch when the work is done. Whenever an RFC is approved for release you do a merge of that particular changeset from Dev to Release followed by whatever steps you need for your release.
Depending on the complexity of your development and releases you might need more branches of each, but I would suggest to aim for as few branches as possible, because it tends to get complicated very fast.

Neo4j locks CSV file even after importing

EDIT It seems that locks (mostly/only?) stay locked when a transactional error occurs. I have to restart the database for it to work again, but it is not actively processing anything (no CPU/RAM/HDD activity).
Environment
I have an ASP.NET application that uses the Neo4jClient NuGet package to talk to a Neo4j database. I have N SimpleNode objects that need to be inserted where N can be anything from 100 to 50.000. There are other objects for the M edges that need to be inserted where M can be from 100 to 1.000.000.
Code
Inserting using normal inserts is too slow with 8.000 nodes taking about 80 seconds with the following code:
Client.Cypher
.Unwind(sublist, "node")
.Merge("(n:Node { label: node.label })")
.OnCreate()
.Set("n = node")
.ExecuteWithoutResults();
Therefore I used the import CSV function with the following code:
using (var sw = new StreamWriter(File.OpenWrite("temp.csv")))
{
sw.Write(SimpleNodeModel.Header + "\n");
foreach (var simpleNodeModel in nodes)
{
sw.Write(simpleNodeModel.ToCSVWithoutID() + "\n");
}
}
var f = new FileInfo("temp.csv");
Client.Cypher
.LoadCsv(new Uri("file://" + f.FullName), "csvNode", true)
.Merge("(n:Node {label:csvNode.label, source:csvNode.source})")
.ExecuteWithoutResults();
While still slow, it is an improvement.
Problem
The problem is that the CSV files are locked by the neo4j client (not C# or any of my own code it seems). I would like to overwrite the temp .CSV files so the disk does not fill up, or delete them after use. This is now impossible as the process locks them and I cannot use them. This also means that running this code twice crashes the program, as it cannot write to file the second time.
The nodes are inserted and do appear normally, so it is not the case that it is still working on them. After some unknown and widely varying amount of time, files do seem to unlock.
Question
How can I stop the neo4j client from locking the files long after use? Why does it lock them for so long? Another question: is there a better way of doing this in C#? I am aware of the java importer but I would like my tool to stay in the asp.net environment. It must be possible to insert 8.000 simple nodes within 2 seconds in C#?
SimpleNode class
public class SimpleNodeModel
{
public long id { get; set; }
public string label { get; set; }
public string source { get; set; } = "";
public override string ToString()
{
return $"label: {label}, source: {source}, id: {id}";
}
public SimpleNodeModel(string label, string source)
{
this.label = label;
this.source = source;
}
public SimpleNodeModel() { }
public static string Header => "label,source";
public string ToCSVWithoutID()
{
return $"{label},{source}";
}
}

What is the best way to work with multiple files in multithread in C#?

I am creating a Windows Form application, where I select a folder that contains multiple *.txt files. Their length may vary from few thousand lines (kB) to up to 50 milion lines (1GB). Every line of the code has three informations. Date in long, location id in int and value in float all separated by semicolon (;). I need to calculate min and max value in all those files and tell in which file it is, and then the most frequent value.
I already have these files verified and stored in an arraylist. I am opening a thread to read the files one by one and I read the data by line. It works fine, but when there are 1GB files, I run out of memory. I tried to store the values in dictionary, where key would be the date and the value would be an object that contains all the info loaded from the line alongside with the filename. I see I cannot use a dictionary, because at about 6M values, I ran out of memory. So I should probably do it in multithread. I though I could run two threads, one that reads the file and puts the info in some kind of container and the other that reads from it and makes calculations and then deletes the values from the container. But I don't know which container could do such thing. Moreover I need to calculate the most frequent value, so they need to be stored somewhere which leads me back to some kind of dictionary, but I already know I will run out of memory. I don't have much experience with threads either, so I don't know what is possible. Here is my code so far:
GUI:
namespace STI {
public partial class GUI : Form {
private String path = null;
public static ArrayList txtFiles;
public GUI() {
InitializeComponent();
_GUI1 = this;
}
//I run it in thread. I thought I would run the second
//one here that would work with the values inputed in some container
private void buttonRun_Click(object sender, EventArgs e) {
ThreadDataProcessing processing = new ThreadDataProcessing();
Thread t_process = new Thread(processing.runProcessing);
t_process.Start();
//ThreadDataCalculating calculating = new ThreadDataCalculating();
//Thread t_calc = new Thread(calculating.runCalculation());
//t_calc.Start();
}
}
}
ThreadProcessing.cs
namespace STI.thread_package {
class ThreadDataProcessing {
public static Dictionary<long, object> finalMap = new Dictionary<long, object>();
public void runProcessing() {
foreach (FileInfo file in GUI.txtFiles) {
using (FileStream fs = File.Open(file.FullName.ToString(), FileMode.Open))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader sr = new StreamReader(bs)) {
String line;
String[] splitted;
try {
while ((line = sr.ReadLine()) != null) {
splitted = line.Split(';');
if (splitted.Length == 3) {
long date = long.Parse(splitted[0]);
int location = int.Parse(splitted[1]);
float value = float.Parse(splitted[2], CultureInfo.InvariantCulture);
Entry entry = new Entry(date, location, value, file.Name);
if (!finalMap.ContainsKey(entry.getDate())) {
finalMap.Add(entry.getDate(), entry);
}
}
}
GUI._GUI1.update("File \"" + file.Name + "\" completed\n");
}
catch (FormatException ex) {
GUI._GUI1.update("Wrong file format.");
}
catch (OutOfMemoryException) {
GUI._GUI1.update("Out of memory");
}
}
}
}
}
}
and the object in which I put the values from lines:
Entry.cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace STI.entities_package {
class Entry {
private long date;
private int location;
private float value;
private String fileName;
private int count;
public Entry(long date, int location, float value, String fileName) {
this.date = date;
this.location = location;
this.value = value;
this.fileName = fileName;
this.count = 1;
}
public long getDate() {
return date;
}
public int getLocation() {
return location;
}
public String getFileName() {
return fileName;
}
}
}

I don't think that multithreading is going to help you here - it could help you separate the IO-bound tasks from the CPU-bound tasks, but your CPU-bound tasks are so trivial that I don't think they warrant their own thread. All multithreading is going to do is unnecessarily increase the problem complexity.
Calculating the min/max in constant memory is trivial: just maintain a minFile and maxFile variable that gets updated when the current file's value is less-than minFile or greater-than maxFile. Finding the most frequent value is going to require more memory, but with only a few million files you ought to have enough RAM to store a Dictionary<float, int> that maintains the frequency of each value, after which you iterate through the map to determine which value had the highest frequency. If for some reason you don't have enough RAM (make sure that your files are being closed and garbage collected if you're running out of memory, because a Dictionary<float, int> with a few million entries ought to fit in less than a gigabyte of RAM) then you can make multiple passes over the files: on the first pass store the values in a Dictionary<interval, int> where you've split up the interval between MIN_FLOAT and MAX_FLOAT into a few thousand sub-intervals, then on the next pass you can ignore all values that didn't fit into the interval with the highest frequency thus shrinking the dictionary's size. However, the Dictionary<float, int> ought to fit into memory, so unless you start processing billions of files instead of millions of files you probably won't need a multi-pass procedure.

Unique EventId generation

I'm using the Windows Event Log to record some events. Events within the Windows Event Log can be assigned a handful of properties. One of which, is an EventID.
Now I want to use the EventId to try and group related errors. I could just pick a number for each call to the logging method I do, but that seems a little tedious.
I want the system to do this automatically. It would choose an eventId that is "unique" to the position in the code where the logging event occurred. Now, there's only 65536 unique event IDs, so there are likely to be collisions but they should be rare enough to make the EventId a useful way to group errors.
One strategy would be to take the hashcode of the stacktrace but that would mean that the first and second calls in the following code would have generate the same event ID.
public void TestLog()
{
LogSomething("Moo");
// Do some stuff and then a 100 lines later..
LogSomething("Moo");
}
I thought of walking up the call stack using the StackFrame class which has a GetFileLineNumber method. The problem with this strategy is that it will only work when built with debug symbols on. I need it to work in production code too.
Does anyone have any ideas?

Here is some code you can use to generate an EventID with the properties I describe in my question:
public static int GenerateEventId()
{
StackTrace trace = new StackTrace();
StringBuilder builder = new StringBuilder();
builder.Append(Environment.StackTrace);
foreach (StackFrame frame in trace.GetFrames())
{
builder.Append(frame.GetILOffset());
builder.Append(",");
}
return builder.ToString().GetHashCode() & 0xFFFF;
}
The frame.GetILOffset() method call gives the position within that particular frame at the time of execution.
I concatenate these offsets with the entire stacktrace to give a unique string for the current position within the program.
Finally, since there are only 65536 unique event IDs I logical AND the hashcode against 0xFFFF to extract least significant 16-bits. This value then becomes the EventId.

The IL offset number is available without debug symbols. Combined with the stack information and hashed, I think that would do the trick.
Here's an article that, in part, covers retrieving the IL offset (for the purpose of logging it for an offline match to PDB files--different problem but I think it'll show you what you need):
http://timstall.dotnetdevelopersjournal.com/getting_file_and_line_numbers_without_deploying_the_pdb_file.htm

Create a hash using the ILOffset of the last but one stack frame instead of the line number (i.e. the stack frame of your TestLog method above).

*Important: This post focuses at solving the root cause of what it appears your problem is instead of providing a solution you specifically asked for. I realize this post is old, but felt it important to contribute. *
My team had a similar issue, and we changed the way we managed our logging which has reduced production support and bug patching times significantly. Pragmatically this works in most enterprise apps my team works on:
Prefix log messages with the "class name"."function name".
For true errors, output the captured Exception to the event logger.
Focus on having clear messages as part of the peer code review as opposed to event id's.
Use a unique event id for each function, just go top to bottom and key them.
when it becomes impractical to code each function a different event ID, each class should just just have a unique one (collisions be damned).
Utilize Event categories to reduce event id reliance when filtering the log
Of course it matters how big your apps are and how sensitive the data is. Most of ours are around 10k to 500k lines of code with minimally sensitive information. It may feel oversimplified, but from a KISS standpoint it pragmatically works.
That being said, using an abstract Event Log class to simplify the process makes it easy to utilize, although cleanup my be unpleasant. For Example:
MyClass.cs (using the wrapper)
class MyClass
{
// hardcoded, but should be from configuration vars
private string AppName = "MyApp";
private string AppVersion = "1.0.0.0";
private string ClassName = "MyClass";
private string LogName = "MyApp Log";
EventLogAdapter oEventLogAdapter;
EventLogEntryType oEventLogEntryType;
public MyClass(){
this.oEventLogAdapter = new EventLogAdapter(
this.AppName
, this.LogName
, this.AppName
, this.AppVersion
, this.ClassName
);
}
private bool MyFunction() {
bool result = false;
this.oEventLogAdapter.SetMethodInformation("MyFunction", 100);
try {
// do stuff
this.oEventLogAdapter.WriteEntry("Something important found out...", EventLogEntryType.Information);
} catch (Exception oException) {
this.oEventLogAdapter.WriteEntry("Error: " + oException.ToString(), EventLogEntryType.Error);
}
return result;
}
}
EventLogAdapter.cs
class EventLogAdapter
{
//vars
private string _EventProgram = "";
private string _EventSource = "";
private string _ProgramName = "";
private string _ProgramVersion = "";
private string _EventClass = "";
private string _EventMethod = "";
private int _EventCode = 1;
private bool _Initialized = false;
private System.Diagnostics.EventLog oEventLog = new EventLog();
// methods
public EventLogAdapter() { }
public EventLogAdapter(
string EventProgram
, string EventSource
, string ProgramName
, string ProgramVersion
, string EventClass
) {
this.SetEventProgram(EventProgram);
this.SetEventSource(EventSource);
this.SetProgramName(ProgramName);
this.SetProgramVersion(ProgramVersion);
this.SetEventClass(EventClass);
this.InitializeEventLog();
}
public void InitializeEventLog() {
try {
if(
!String.IsNullOrEmpty(this._EventSource)
&& !String.IsNullOrEmpty(this._EventProgram)
){
if (!System.Diagnostics.EventLog.SourceExists(this._EventSource)) {
System.Diagnostics.EventLog.CreateEventSource(
this._EventSource
, this._EventProgram
);
}
this.oEventLog.Source = this._EventSource;
this.oEventLog.Log = this._EventProgram;
this._Initialized = true;
}
} catch { }
}
public void WriteEntry(string Message, System.Diagnostics.EventLogEntryType EventEntryType) {
try {
string _message =
"[" + this._ProgramName + " " + this._ProgramVersion + "]"
+ "." + this._EventClass + "." + this._EventMethod + "():\n"
+ Message;
this.oEventLog.WriteEntry(
Message
, EventEntryType
, this._EventCode
);
} catch { }
}
public void SetMethodInformation(
string EventMethod
,int EventCode
) {
this.SetEventMethod(EventMethod);
this.SetEventCode(EventCode);
}
public string GetEventProgram() { return this._EventProgram; }
public string GetEventSource() { return this._EventSource; }
public string GetProgramName() { return this._ProgramName; }
public string GetProgramVersion() { return this._ProgramVersion; }
public string GetEventClass() { return this._EventClass; }
public string GetEventMethod() { return this._EventMethod; }
public int GetEventCode() { return this._EventCode; }
public void SetEventProgram(string EventProgram) { this._EventProgram = EventProgram; }
public void SetEventSource(string EventSource) { this._EventSource = EventSource; }
public void SetProgramName(string ProgramName) { this._ProgramName = ProgramName; }
public void SetProgramVersion(string ProgramVersion) { this._ProgramVersion = ProgramVersion; }
public void SetEventClass(string EventClass) { this._EventClass = EventClass; }
public void SetEventMethod(string EventMethod) { this._EventMethod = EventMethod; }
public void SetEventCode(int EventCode) { this._EventCode = EventCode; }
}

Thanks for the idea of hashing the call stack, I was going to ask that very same question of how to pick an eventId.
I recommend putting a static variable in LogSomething that increments each time it is called.

Now I want to use the EventId to try
and group related errors.
You have filters in event viewer so why (Go to find ? You have 65536 unique event IDs too.
Or rather use log4net or something ??
just my ideas....

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Reading multiple XML files freezes applicaton - c#

Related

Update console text from multiple threads not working

Is this the most efficient way of skipping random changesets when getting latest from TFS?

Neo4j locks CSV file even after importing

What is the best way to work with multiple files in multithread in C#?

Unique EventId generation

Categories

Resources