How can I manage which items are in the ThreadPool? - c#

I have a Windows service that runs a method when the service's main Timer elapses (OnElapse).
The OnElapse method gets a list of .xml files to process.
Each XML file is queued to a ThreadPool.
I want to make sure I don't queue two XML files with the same name at the same time.
How can I manage which items are in the ThreadPool? I basically want to do this:
if xmlfilename not in threadpool
insert in threadpool

This is pretty tricky because you need to track what is in the ThreadPool yourself, and that requires some form of synchronization. Here's a quick-and-dirty example of one way to do it:
class XmlManager {
    private object m_lock = new object();
    private HashSet<string> m_inPool = new HashSet<string>();

    private void Run(object state) {
        string name = (string)state;
        try {
            FunctionThatActuallyProcessesFiles(name);
        } finally {
            lock (m_lock) { m_inPool.Remove(name); }
        }
    }

    public void MaybeRun(string xmlName) {
        lock (m_lock) {
            if (!m_inPool.Add(xmlName)) {
                return;
            }
        }
        ThreadPool.QueueUserWorkItem(Run, xmlName);
    }
}
This is not a foolproof solution; there is at least one race condition in the code. An item could be in the middle of being removed from the set while you're trying to add it back in, in which case it won't actually get queued again. But if you are only concerned with each file being processed one at a time, this doesn't matter.
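On .NET 4 or later, a lock-free variant of the same idea is possible with ConcurrentDictionary used as a concurrent set. This is a sketch, not the original poster's code; the `process` delegate stands in for whatever actually handles the file:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

class XmlManagerConcurrent
{
    // The byte value is unused; the dictionary acts as a concurrent set of names.
    private readonly ConcurrentDictionary<string, byte> _inPool =
        new ConcurrentDictionary<string, byte>();

    // Returns false if a file with the same name is already queued or running.
    public bool MaybeRun(string xmlName, Action<string> process)
    {
        // TryAdd is atomic: only one caller wins for a given name.
        if (!_inPool.TryAdd(xmlName, 0))
            return false;

        ThreadPool.QueueUserWorkItem(_ =>
        {
            try { process(xmlName); }
            finally
            {
                byte ignored;
                _inPool.TryRemove(xmlName, out ignored);
            }
        });
        return true;
    }
}
```

This has the same benign race as the locked version: a name being removed concurrently can cause a TryAdd to fail, so the file is simply picked up on the next timer tick.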

Something like this should do it (use a HashSet instead of a Dictionary if you are using .Net 3.5 or higher):
using System;
using System.Collections.Generic;
using System.Threading;

namespace Something
{
    class ProcessFilesClass
    {
        private object m_Lock = new object();
        private Dictionary<string, object> m_WorkingItems =
            new Dictionary<string, object>();
        private Timer m_Timer;

        public ProcessFilesClass()
        {
            m_Timer = new Timer(OnElapsed, null, 0, 10000);
        }

        public void OnElapsed(object context)
        {
            List<string> xmlList = new List<string>();
            //Process xml files into xmlList
            foreach (string xmlFile in xmlList)
            {
                lock (m_Lock)
                {
                    if (!m_WorkingItems.ContainsKey(xmlFile))
                    {
                        m_WorkingItems.Add(xmlFile, null);
                        ThreadPool.QueueUserWorkItem(DoWork, xmlFile);
                    }
                }
            }
        }

        public void DoWork(object xmlFile)
        {
            //process xmlFile
            lock (m_Lock)
            {
                m_WorkingItems.Remove(xmlFile.ToString());
            }
        }
    }
}

In OnElapse, can't you rename the XML file so it has a unique name when it goes into the thread pool?

Couldn't you make a Dictionary<ThreadPool, List<String>> and check that before you do the insertion? Something like this:
Dictionary<ThreadPool, List<String>> poolXmlFiles = new Dictionary<ThreadPool, List<String>>();
if (poolXmlFiles.ContainsKey(threadPool) && !poolXmlFiles[threadPool].Contains(xmlFileName))
{
    poolXmlFiles[threadPool].Add(xmlFileName);
    //Add the xmlFile to the ThreadPool
}
Sorry if there are syntax errors, I'm coding in VB these days.

Related

Lazy load a list that is obtained from a using statement

I am using the CsvHelper library to read from a CSV file, but that's not what this is about.
Please refer to the code below:
public class Reader
{
    public IEnumerable<CSVModel> Read(string file)
    {
        using var reader = new StreamReader(@"C:\Users\z0042d8s\Desktop\GST invoice\RISK All RISKs_RM - Copy.CSV");
        using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);
        IEnumerable<CSVModel> records = csv.GetRecords<CSVModel>();
        return records;
    }
}
csv.GetRecords in the above method uses yield return, so it returns each CSV row as soon as it is read rather than waiting until the entire file has been read (it streams data from the CSV).
I have a consumer class which as the name suggests consumes data returned by the Read method.
class Consumer
{
    public void Consume(IEnumerable<CSVModel> data)
    {
        foreach (var item in data)
        {
            //Do whatever you want with the data. I am gonna log it to the console
            Console.WriteLine(item);
        }
    }
}
And below is the caller
public static void Main()
{
    var data = new Reader().Read("file.csv");
    new Consumer().Consume(data);
}
Hope I didn't lose you.
The problem I am facing is below
Because the data variable above is an IEnumerable, it is lazily loaded (in other words, the CSV file isn't read until the sequence is iterated over). By the time I call Consume(), which iterates data and so forces the read inside Read(), the reader and csv objects in the using statements have already been disposed, throwing an ObjectDisposedException.
Also, I don't want to move the reader and csv objects out of the using blocks, because they need to be disposed to prevent resource leaks.
The exception message is below
System.ObjectDisposedException: 'GetRecords<T>() returns an IEnumerable<T>
that yields records. This means that the method isn't actually called until
you try and access the values. e.g. .ToList() Did you create CsvReader inside
a using block and are now trying to access the records outside of that using
block?
And I know I can use a greedy operator (.ToList()). But I want the lazy loading to work.
Please suggest if there are any ways out.
Thanks in advance.
You may pass an action as a parameter to the reader. It changes the approach a bit, though:
public class Reader
{
    public void Read(string file, Action<CSVModel> action)
    {
        using var reader = new StreamReader(@"C:\Users\z0042d8s\Desktop\GST invoice\RISK All RISKs_RM - Copy.CSV");
        using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);
        IEnumerable<CSVModel> records = csv.GetRecords<CSVModel>();
        foreach (var record in records)
        {
            action(record);
        }
    }
}
class Consumer
{
    public void Consume(CSVModel data)
    {
        //Do whatever you want with the data. I am gonna log it to the console
        Console.WriteLine(data);
    }
}
public static void Main()
{
    var consumer = new Consumer();
    new Reader().Read("file.csv", consumer.Consume); // Pass the action here
}
Alternatively, you can make the whole Reader class disposable:
public class Reader : IDisposable
{
    private readonly StreamReader _reader;
    private readonly CsvReader _csv;
    private bool _disposed;

    public Reader(string file)
    {
        _reader = new StreamReader(file);
        _csv = new CsvReader(_reader, CultureInfo.InvariantCulture);
    }

    public IEnumerable<CSVModel> Read()
    {
        return _csv.GetRecords<CSVModel>();
    }

    public void Dispose() => Dispose(true);

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed)
        {
            return;
        }
        if (disposing)
        {
            _csv.Dispose();
            _reader.Dispose();
        }
        _disposed = true;
    }
}
class Consumer
{
    public void Consume(IEnumerable<CSVModel> data)
    {
        foreach (var item in data)
        {
            //Do whatever you want with the data. I am gonna log it to the console
            Console.WriteLine(item);
        }
    }
}
public static void Main()
{
    using var myReader = new Reader("c:\\path.csv");
    new Consumer().Consume(myReader.Read());
}
You can lazily enumerate items from the GetRecords IEnumerable and yield records to the consumer like this:
public class Reader
{
    public IEnumerable<CSVModel> Read(string file)
    {
        using var reader = new StreamReader(@"C:\Users\z0042d8s\Desktop\GST invoice\RISK All RISKs_RM - Copy.CSV");
        using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);
        foreach (var csvRecord in csv.GetRecords<CSVModel>())
        {
            yield return csvRecord;
        }
    }
}
This way you guarantee that records are enumerated before the underlying readers get disposed, and you don't need to load all the data up front.
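The reason this works can be shown with a self-contained miniature that doesn't involve CsvHelper at all: a using statement inside an iterator method is only disposed when enumeration finishes, not when the method returns.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class IteratorDisposalDemo
{
    // Yields lines lazily; the StreamReader in the using statement stays
    // open until the caller finishes (or abandons) the enumeration.
    public static IEnumerable<string> ReadLines(string path)
    {
        using var reader = new StreamReader(path);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line; // control returns to the caller here
        }
        // reader is disposed only once the enumerator completes
    }

    static void Main()
    {
        File.WriteAllText("demo.txt", "a\nb\nc");
        var lines = ReadLines("demo.txt"); // nothing is read yet (deferred)
        foreach (var l in lines)
        {
            Console.WriteLine(l); // the file is open during this loop
        }
        // the StreamReader has been disposed by this point
    }
}
```

The non-iterator version of Read fails precisely because it lacks a yield: the method body runs to completion (including disposal) before returning the enumerable.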

Fast way to find key-value pair in C#

Sounds pretty trivial, but I really despair:
How do I import and match a value to a given key in an elegant, fast way?
It's just telephone area codes: finding the matching prefix for a given zone (no multi-user). I have the data as CSV, but SQLite would be fine too. An SQLite connector? Or a Dictionary? I don't know ...
Thank you in advance!
Nico
Simple as that. It's a console application.
It's not pretty with the global creation and initialization, but I didn't manage to share the dictionary across multiple source files using only a public class (of type Dictionary, returning it).
Main .cs file:
static class GlobalVar
{
    public static Dictionary<string, string> areaCodesDict = new Dictionary<string, string>();
}
Second .cs file:
public class AreaCodes
{
    public static void ParseCsv()
    {
        var path = ConfigurationManager.AppSettings["areaCodesCsv"];
        using (var strReader = new StreamReader(path))
        {
            while (!strReader.EndOfStream)
            {
                var line = strReader.ReadLine();
                if (line == null) { continue; }
                var csv = line.Split(Convert.ToChar(ConfigurationManager.AppSettings["areaCodesCsvDelim"]));
                // areaCodesDict.Add(key, value)
                GlobalVar.areaCodesDict.Add(csv[0], csv[1]);
            }
        }
    }
}
Example usage in Main.cs file again:
if (regexMatch.Success)
{
    foreach (KeyValuePair<string, string> pair in GlobalVar.areaCodesDict)
    {
        if (destNumber.StartsWith(pair.Value))
        {
            destNumber = destNumber.Replace(pair.Value, pair.Key);
        }
    }
}
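Since Dictionary lookups are O(1) by exact key, an alternative to scanning every pair is to probe successively shorter prefixes of the dialled number. This is a sketch under one assumption: the dictionary here maps prefix to zone, the reverse of the zone-to-prefix layout used in the code above, and the sample entries are made up.

```csharp
using System;
using System.Collections.Generic;

class PrefixLookup
{
    // Assumed layout: prefix -> zone (reverse of the dictionary above).
    static readonly Dictionary<string, string> ZoneByPrefix =
        new Dictionary<string, string>
        {
            { "030", "Berlin" },
            { "089", "Munich" }
        };

    // Probe the longest prefix first, shortening one character at a time.
    // Cost is O(length of number) lookups instead of O(number of entries).
    public static string FindZone(string number)
    {
        for (int len = number.Length; len > 0; len--)
        {
            string zone;
            if (ZoneByPrefix.TryGetValue(number.Substring(0, len), out zone))
                return zone;
        }
        return null; // no prefix matched
    }

    static void Main()
    {
        Console.WriteLine(FindZone("08912345")); // longest match wins: Munich
    }
}
```

Probing longest-first also gives correct results when one code is a prefix of another (e.g. "03" and "030"), which the linear scan above only handles by accident of iteration order.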

Resource (.resx) data is not saved

I can't work out what the problem is. Please check my code fragments. Each time I add resource data, it clears the previous data and writes only the new records to the .resx.
For example, Applications.resx has a "MyApp1" key with the value "MyApp1Path". If I then add a "MyApp2" key with the value "MyApp2Path", I notice that {"MyApp1", "MyApp1Path"} no longer exists.
//Adding Application in Applications List
ResourceHelper.AddResource("Applications", _appName, _appPath);
Here is ResourceHelper class:
public class ResourceHelper
{
    public static void AddResource(string resxFileName, string name, string value)
    {
        using (var resx = new ResXResourceWriter(String.Format(@".\Resources\{0}.resx", resxFileName)))
        {
            resx.AddResource(name, value);
        }
    }
}
Yes, this is expected: ResXResourceWriter writes a fresh file containing only the nodes you add; it doesn't append to an existing one.
However, you could just read the existing nodes out and add them again:
public static void AddResource(string resxFileName, string name, object value)
{
    var fileName = $@".\Resources\{resxFileName}.resx";
    // Read the existing entries first, before the writer touches the file
    var existing = new List<KeyValuePair<string, object>>();
    if (File.Exists(fileName))
    {
        using (var reader = new ResXResourceReader(fileName))
        {
            foreach (DictionaryEntry entry in reader)
            {
                existing.Add(new KeyValuePair<string, object>(entry.Key.ToString(), entry.Value));
            }
        }
    }
    using (var writer = new ResXResourceWriter(fileName))
    {
        foreach (var entry in existing)
        {
            writer.AddResource(entry.Key, entry.Value);
        }
        writer.AddResource(name, value);
    }
}
Disclaimer, untested and probably needs error checking

Does the TryTake in this background print ensure efficient writing to log?

I want to print text to a file, but I want to ensure the main thread is not held up by writing to disk.
I have created the following scheme using a BlockingCollection.
I have an endless while loop containing a 60-second TryTake.
Can anyone tell me if they see any problem with the efficiency of this approach? I believe it will write to disk as soon as new text is added to the collection, and otherwise wait for up to 60 seconds, so it is not spinning whilst waiting for input.
Previously, I used a non-BlockingCollection approach with two queues that I switched every few seconds: one queue received new text whilst I wrote the other to disk. That meant some delay before writing to disk, and I could lose the last data in the event of a crash.
public class OutputHandler
{
    public BlockingCollection<string> O = new BlockingCollection<string>();
    public string fileName;

    public OutputHandler(string _fileName)
    {
        fileName = _fileName;
    }

    public OutputHandler(string _fileName, bool forceBackgroundPrint = false)
    {
        fileName = _fileName;
        if (forceBackgroundPrint)
        {
            Task.Factory.StartNew(() =>
            {
                BackgroundPrint();
            });
        }
    }

    private void BackgroundPrint()
    {
        using (var stream = new StreamWriter(fileName, true))
        {
            string txt;
            while (true)
            {
                if (O.TryTake(out txt, 60000))
                {
                    stream.WriteLine(txt);
                    stream.Flush();
                }
            }
        }
    }

    public void WriteLine(string txt)
    {
        O.Add(DateTime.UtcNow.ToString("yyyyMMddTHHmmss.fff ") + txt);
    }
}
Use:
OutputHandler OUT = new OutputHandler("C:/test.txt",true);
OUT.WriteLine("Whatever");
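As an aside on the TryTake timeout: BlockingCollection also offers GetConsumingEnumerable, which blocks without spinning and, unlike the endless while loop, ends cleanly once CompleteAdding is called. A self-contained sketch of that shutdown pattern, writing to the console instead of a file:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ConsumerLoopDemo
{
    static void Main()
    {
        var queue = new BlockingCollection<string>();

        // Consumer: blocks efficiently while the queue is empty and exits
        // once CompleteAdding has been called and the queue has drained,
        // so no timeout or while(true) is needed.
        var pump = Task.Run(() =>
        {
            foreach (var line in queue.GetConsumingEnumerable())
            {
                Console.WriteLine(line);
            }
        });

        queue.Add("first");
        queue.Add("second");
        queue.CompleteAdding(); // signal shutdown
        pump.Wait();            // consumer finishes after draining the queue
    }
}
```

With the TryTake-forever loop in the question, the background task can never terminate, which keeps the service from shutting down cleanly; CompleteAdding gives you that exit path for free.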

Serialize large amount of objects in C# on the fly rather than all at once?

I have created a couple of classes meant to represent a relational data structure (parent-child structures). Below is an example of the XML representation so far, giving you an idea of what I mean:
<BillingFile>
  <Account>
    <acctnum>122344231414</acctnum>
    <adjustments>34.44</adjustments>
    <Charges>
      <lineitem>
        <chargetype>PENALTY</chargetype>
        <amount>40.50</amount>
        <ratecode>E101</ratecode>
      </lineitem>
      <lineitem>
        <chargetype>LATE CHARGE</chargetype>
        <amount>445.35</amount>
        <ratecode>D101</ratecode>
      </lineitem>
    </Charges>
  </Account>
</BillingFile>
What I'm doing with my application is parsing through a large text file which could have upwards of 50,000+ accounts in it. Each time an account is read, I will create an "Account" object that has the parent objects, etc. The end goal is to be able to create an XML file containing all this account info that is serialized from the objects created.
The problem I see with this is that if I store all these objects in memory, it could cause performance problems when running against those 50k+ record files.
What I'm wondering is, is there a way to sequentially serialize an object in C#, rather than all at once?
I've done some googling and it seems that the built in serialization methods of .NET are a one and done kind of deal. Is there a better way I can do this?
I'd rather avoid having to do any intermediate steps like storing the data in a database, since it's easier to modify code than it is to mess with a bunch of tables and JOIN statements.
Thoughts?
XmlSerializer.Deserialize takes an XmlReader parameter. You could place the XmlReader just at the <Account> tag, and call the XmlSerializer there.
public IEnumerable<Account> ReadAccounts(TextReader source)
{
    var ser = new XmlSerializer(typeof(Account));
    using (var reader = XmlReader.Create(source))
    {
        if (!reader.IsStartElement("BillingFile"))
        {
            yield break;
        }
        reader.Read();
        while (reader.MoveToContent() == XmlNodeType.Element)
        {
            yield return (Account)ser.Deserialize(reader);
        }
    }
}
Similarly for serialization
public void WriteAccounts(IEnumerable<Account> data, TextWriter target)
{
    // Use XmlSerializerNamespaces to suppress xmlns:xsi and xmlns:xsd
    var namespaces = new XmlSerializerNamespaces();
    namespaces.Add("", "");
    var ser = new XmlSerializer(typeof(Account));
    using (var writer = XmlWriter.Create(target))
    {
        writer.WriteStartElement("BillingFile");
        foreach (var acct in data)
        {
            ser.Serialize(writer, acct, namespaces);
            writer.Flush();
        }
        writer.WriteEndElement();
    }
}
You could also create a BillingFile class that implements IXmlSerializable, and put this functionality there.
Or if you prefer a push-based model instead:
public class AccountWriter : IDisposable
{
    private XmlWriter _writer;
    private XmlSerializer _ser;
    private XmlSerializerNamespaces _namespaces;
    private bool _wroteHeader = false;
    private bool _disposed = false;

    public bool IsDisposed { get { return _disposed; } }

    public AccountWriter(TextWriter target)
    {
        _namespaces = new XmlSerializerNamespaces();
        _namespaces.Add("", "");
        _ser = new XmlSerializer(typeof(Account));
        _writer = XmlWriter.Create(target);
    }

    public void Write(Account acct)
    {
        if (_disposed) throw new ObjectDisposedException("AccountWriter");
        if (!_wroteHeader)
        {
            _writer.WriteStartElement("BillingFile");
            _wroteHeader = true;
        }
        _ser.Serialize(_writer, acct, _namespaces);
    }

    public void Flush()
    {
        if (_disposed) throw new ObjectDisposedException("AccountWriter");
        _writer.Flush();
    }

    public void Dispose()
    {
        if (!_disposed)
        {
            if (_wroteHeader)
            {
                _writer.WriteEndElement();
            }
            _writer.Dispose();
            _disposed = true;
        }
    }
}
using (var writer = new AccountWriter(Console.Out))
{
    foreach (var acct in accounts)
    {
        writer.Write(acct);
    }
}
The problem I see with this, is that if I store all these objects in memory it will cause a performance issue as it runs in those 50k+ record files.
Test that first. 50k * 1kB is still only 50 MB.
Don't solve problems you don't have.
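A quick way to test that claim before redesigning anything is to measure retained memory around building the list. A rough sketch; the Account type here is a simplified stand-in for the real one, and the numbers it prints are only estimates (GC.GetTotalMemory is approximate):

```csharp
using System;
using System.Collections.Generic;

class MemoryEstimate
{
    // Simplified stand-in for the real Account class.
    class Account
    {
        public string AcctNum;
        public decimal Adjustments;
        public List<string> Charges = new List<string>();
    }

    static void Main()
    {
        long before = GC.GetTotalMemory(forceFullCollection: true);

        var accounts = new List<Account>();
        for (int i = 0; i < 50000; i++)
        {
            accounts.Add(new Account
            {
                AcctNum = i.ToString("D12"),
                Adjustments = 34.44m,
                Charges = { "PENALTY", "LATE CHARGE" }
            });
        }

        long after = GC.GetTotalMemory(forceFullCollection: true);
        Console.WriteLine($"~{(after - before) / (1024 * 1024)} MB retained");
        GC.KeepAlive(accounts); // keep the list alive through the measurement
    }
}
```

If the printed figure is in the tens of megabytes, holding everything in memory and serializing in one pass is fine; only if it is far larger is the streaming approach above worth the extra code.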
You can create your own Account objects that take an XElement and read the data from that node, for example:
public class Account
{
    XElement self;

    public Account(XElement account)
    {
        if (null == account)
            self = new XElement("Account");
        else
            self = account;
    }

    public int Number
    {
        get { return self.Get("acctnum", 0); }
        set { self.Set("acctnum", value, false); }
    }

    public Charges Charges { get { return new Charges(self.GetElement("Charges")); } }
}
I'm using these extensions to read the information; they handle empty nodes and default values as above, 0 being the default int value for the Number getter. GetElement() creates a new Charges node if one doesn't exist.
You will need to create your enumerable Charges class & LineItem classes, but you only create what you need as needed.
You can populate an account with an XPath lookup like:
Account account = new Account(
    root.XPathSelectElement("Account[acctnum='" + someAccount + "']"));
XPathSelectElement comes from using System.Xml.XPath.