Quickest Array Initialization? - c#

In an application of mine, I need a large constant (actually static readonly) array of objects. The array is initialized in the type's static constructor.
The array contains more than a thousand items, and when the type is first used, my program experiences a serious slowdown. I would like to know if there is a way to initialise a large array quickly in C#.
public static class XSampa {
public class XSampaPair : IComparable<XSampaPair> {
public XSampaPair GetReversed() {
return new XSampaPair(Target, Key);
}
public string Key { get; private set; }
public string Target { get; private set; }
internal XSampaPair(string key, string target) {
Key = key;
Target = target;
}
public int CompareTo(XSampaPair other) {
if (other == null)
throw new ArgumentNullException("other",
"Cannot compare with Null.");
if (Key == null)
throw new NullReferenceException("Key is null!");
if (other.Key == null)
throw new NullReferenceException("Key is null!");
if (Key.Length == other.Key.Length)
return string.Compare(Key, other.Key,
StringComparison.InvariantCulture);
return other.Key.Length - Key.Length;
}
}
private static readonly XSampaPair[] pairs, reversedPairs;
public static string ParseXSampaToIpa(this string xsampa) {
// Parsing code here...
}
public static string ParseIpaToXSampa(this string ipa) {
// reverse code here...
}
static XSampa() {
pairs = new [] {
new XSampaPair("a", "\u0061"),
new XSampaPair("b", "\u0062"),
new XSampaPair("b_<", "\u0253"),
new XSampaPair("c", "\u0063"),
// And many more pairs initialized here...
};
var temp = pairs.Select(x => x.GetReversed());
reversedPairs = temp.ToArray();
Array.Sort(pairs);
Array.Sort(reversedPairs);
}
}
PS: I use the array to convert X-SAMPA phonetic transcription to a Unicode string with the corresponding IPA characters.

You can serialize a completely initialized object into a binary file, add that file as a resource, and load it into your array on startup. If your constructors are CPU-intensive, you might get an improvement. Since your code appears to perform some sort of parsing, the chances of getting a decent improvement there are fairly high.

You could use an IEnumerable<yourobj> which would let you lazily yield return the enumerable only as needed.
The problem with this is you won't be able to index into it like you could using the array.
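As a rough illustration of the lazy approach, here is a minimal sketch using an iterator method. The names LazyPairsDemo, Pairs, Make and Created are made up for this example, and KeyValuePair stands in for the question's XSampaPair; the counter is only there to make the laziness visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyPairsDemo
{
    // Counts how many pairs were actually constructed (illustration only).
    public static int Created;

    // Iterator method: the body runs only as far as the caller enumerates.
    public static IEnumerable<KeyValuePair<string, string>> Pairs()
    {
        yield return Make("a", "\u0061");
        yield return Make("b", "\u0062");
        yield return Make("b_<", "\u0253");
        yield return Make("c", "\u0063");
    }

    private static KeyValuePair<string, string> Make(string key, string target)
    {
        Created++;
        return new KeyValuePair<string, string>(key, target);
    }

    public static void Main()
    {
        // First() stops enumerating at the first match, so the later
        // pairs are never constructed.
        var hit = Pairs().First(p => p.Key == "b");
        Console.WriteLine($"found '{hit.Key}' after creating {Created} of 4 pairs");
    }
}
```

Note that every new enumeration starts again from the top, so this only pays off if most lookups stop early or the sequence is rarely walked in full.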

Related

C# manipulating data parsed from CSV

I'm creating a program to generate schematics based off of user input. This has to be done dynamically/by hand due to the sheer volume of different possibilities (6.8M, growing exponentially). Right now I'm working on importing some data via CSV.
Example data:
Type,TIN_pos,TIN_ID,Desc
Elect, 0, X, Manual Regulator
Elect, 0, A, Electronic Regulator
Import code:
List<TIN_Fields> values = File.ReadAllLines("C:\\Users\\User\\Desktop\\Visual Basic\\CSV_Test_1.csv")
.Skip(1)
.Select(v => TIN_Fields.FromCsv(v))
.ToList();
public class TIN_Fields
{
public string Type;
public int TIN_pos;
public string TIN_ID;
public string Desc;
public static TIN_Fields FromCsv(string csvLine)
{
string[] values = csvLine.Split(',');
TIN_Fields _Fields = new TIN_Fields();
_Fields.Type = Convert.ToString(values[0]);
_Fields.TIN_pos = Convert.ToInt16(values[1]);
_Fields.TIN_ID = Convert.ToString(values[2]);
_Fields.Desc = Convert.ToString(values[3]);
return _Fields;
}
}
Once that data is imported, I need to do two things with it:
Display the raw CSV data in a ListView table, just so users can see if anything in the list needs updating.
Be able to compare the items in the list to various characters in a 10-digit hexadecimal code, and spit out some results.
First and foremost, I need to run through the list that was created with the above code and make sure that:
TIN_pos value = 0
because that is the character position of the input box.
Then, with the remaining options, look for the character represented in the input in the TIN_ID field.
Once found, it should then output the Desc field.
Everywhere I have looked says to use foreach, but that requires the array name, which is the part that is confusing me. I've tried filling in basically all of the variables in the FromCSV Method and usually get an error that the class doesn't have a definition.
To hopefully clear up confusion with my explanation, here is the code I created that does the same thing, but with the CSV data hard-coded into it, using switch cases and if statements.
public partial class Form1 : Form
{
public string Model_Chassis;
public string Model_Test_Type;
public int ChannelNumberVar => Convert.ToInt32(TextBox_TIN[2]);
public string Tester_Type_Selector;
public string TextBox_TIN
{
get { return TIN_Entry_TextBox.Text; }
set { TIN_Entry_TextBox.Text = value; }
}
public string Model_Data_D
{
get { return Model_Data.Text; }
set { Model_Data.Text = value; }
}
public Form1()
{
InitializeComponent();
}
//Method grabs TIN Box data and decodes it to model information.
public void Model_Select()
{
//Picks Model Chassis
switch (char.ToUpper(TextBox_TIN[0]))
{
case 'H':
{
Model_Chassis = Coding.Model1.description;
}
break;
default:
{
Model_Data_D = "INVALID TIN";
}
break;
}
//Picks Test Type
switch (char.ToUpper(TextBox_TIN[3]))
{
case '0':
{
Model_Test_Type = Test_Types.TT_PD.TT_tt;
}
break;
case '1':
{
Model_Test_Type = Test_Types.TT_PV.TT_tt;
}
break;
default:
{
Model_Test_Type = "";
}
break;
}
//Puts chassis and Test Type together
if (Model_Data_D.Equals("INVALID TIN"))
{
;
}
else if (char.ToUpper(TextBox_TIN[2]).Equals(Coding.Num_Chan_1_2.tin_id))
{
Model_Data_D = $"{Model_Chassis}-{Model_Test_Type}";
}
else
{
Model_Data_D = $"{Model_Chassis}-{TextBox_TIN[2]}{Model_Test_Type}";
}
}
public class Coding
{
public char tin_id;
public string description;
public Coding(char TIN_ID, string Desc)
{
tin_id = TIN_ID;
description = Desc;
}
public static Coding Model1 = new Coding('H', "Model1");
public static Coding Num_Chan_1_2 = new Coding('X', "Single Channel");
public static Coding Elect_Reg_F_1 = new Coding('X', "Manual Regulator");
}
}
INPUT:
HXX0X
OUTPUT
Model1-PD
Thanks in advance for the help!
You're asking quite a few questions, and providing a lot of extra details in here, but for this:
"First and foremost, i need to run through the list that was created with the above code, make sure that:
TIN_pos value = 0
because that is the character position of the input box."
(seeing as you say you need to do this 'first and foremost').
In your FromCsv method, check the value as you create the record, and throw an error if it is invalid. Like this:
public static TIN_Fields FromCsv(string csvLine)
{
string[] values = csvLine.Split(',');
TIN_Fields _Fields = new TIN_Fields();
_Fields.Type = Convert.ToString(values[0]);
_Fields.TIN_pos = Convert.ToInt16(values[1]);
if(_Fields.TIN_pos != 0){
throw new Exception("TIN_pos must be 0");
}
_Fields.TIN_ID = Convert.ToString(values[2]);
_Fields.Desc = Convert.ToString(values[3]);
return _Fields;
}
Assuming you've read in your CSV correctly, which it seems you have, then selecting the appropriate TIN from the list is a simple LINQ statement. The following code assumes that TIN IDs are unique and only a single character in length.
static void Main(string[] args)
{
string testCsv = @"C:\Users\User\Desktop\Visual Basic\CSV_Test_1.csv";
List<TIN_Fields> values = File.ReadAllLines(testCsv)
.Skip(1)
.Select(v => TIN_Fields.FromCsv(v))
.ToList();
// Simulates input received from form
string input = "HXX0X";
TIN_Fields selectedTIN = values.First(x => x.TIN_ID == Convert.ToString(input[0]));
// Insert the description as needed in your output.
string output = $"{ selectedTIN.Desc }-";
}
Hopefully that answers another part of the problem. The Convert.ToString() is required because the output of input[0] is a char.
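One thing to watch for: the sample rows in the question have a space after each comma, so a plain Split(',') leaves TIN_ID as " X" rather than "X", and the comparison above would not match. Below is a minimal sketch that trims each field and filters on TIN_pos instead of throwing. TinField, TinLookupDemo and Describe are names invented for this example, and the rows are inlined rather than read from the file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class TinField
{
    public string Type;
    public int TinPos;
    public string TinId;
    public string Desc;

    public static TinField FromCsv(string csvLine)
    {
        // Trim every field so that "Elect, 0, X" yields "0" and "X".
        string[] values = csvLine.Split(',').Select(v => v.Trim()).ToArray();
        return new TinField
        {
            Type = values[0],
            TinPos = int.Parse(values[1]),
            TinId = values[2],
            Desc = values[3]
        };
    }
}

public static class TinLookupDemo
{
    // Returns the description for the character at 'position' in 'input',
    // or null when no row matches.
    public static string Describe(List<TinField> rows, string input, int position)
    {
        if (position >= input.Length) return null;
        string wanted = input[position].ToString();
        TinField match = rows.FirstOrDefault(r =>
            r.TinPos == position &&
            string.Equals(r.TinId, wanted, StringComparison.OrdinalIgnoreCase));
        return match?.Desc;
    }

    public static void Main()
    {
        var lines = new[]
        {
            "Type,TIN_pos,TIN_ID,Desc",
            "Elect, 0, X, Manual Regulator",
            "Elect, 0, A, Electronic Regulator"
        };
        List<TinField> rows = lines.Skip(1).Select(TinField.FromCsv).ToList();
        Console.WriteLine(Describe(rows, "AXX0X", 0));
    }
}
```

Using FirstOrDefault rather than First means an unknown character gives null instead of an exception, which is usually easier to turn into an "INVALID TIN" message.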

Optimize cache with multiple keys in c# - remove duplication of objects

I have a project in ASP.NET Core. This project has an ICacheService as below:
public interface ICacheService
{
T Get<T>(string key);
T Get<T>(string key, Func<T> getdata);
Task<T> Get<T>(string key, Func<Task<T>> getdata);
void AddOrUpdate(string key, object value);
}
The implementation is simply based on ConcurrentDictionary<string, object>, so it's not that complicated: it just stores and retrieves data from this dictionary. In one of my services I have a method as below:
public async Task<List<LanguageInfoModel>> GetLanguagesAsync(string frontendId, string languageId, string accessId)
{
async Task<List<LanguageInfoModel>> GetLanguageInfoModel()
{
var data = await _commonServiceProxy.GetLanguages(frontendId, languageId, accessId);
return data;
}
_scheduler.ScheduleAsync($"{CacheKeys.Jobs.LanguagesJob}_{frontendId}_{languageId}_{accessId}", async () =>
{
_cacheService.AddOrUpdate($"{CacheKeys.Languages}_{frontendId}_{languageId}_{accessId}", await GetLanguageInfoModel());
return JobStatus.Success;
}, TimeSpan.FromMinutes(5.0));
return await _cacheService.Get($"{CacheKeys.Languages}_{frontendId}_{languageId}_{accessId}", async () => await GetLanguageInfoModel());
}
The problem is that I have three params in this method that I use as a cache key. This works fine, but the number of combinations of the three params is pretty high, so there will be many duplicate objects in the cache. I was thinking of creating a cache without duplication, like below:
The idea is a cache with a list as a key, so that I can store more than one key for one object. When I get new elements, I will check for each of them whether it is already in the cache; if it is, I will only add a key to its key list, otherwise I will insert a new element into the cache. The problem here is that testing whether an object is in the cache is expensive. I think it will consume a lot of resources and would need serialization into a specific form to make the comparison possible, which again makes the comparison resource-intensive.
The cache might look something like this: CustomDictionary<List<string>, object>
Does anybody know a good approach to solving this issue, so that objects are not duplicated in the cache?
EDIT 1:
My main concern is when I retrieve List<MyModel> from my webservices because they might have 80% of the objects with the same data which will drastically increase the size in memory. But this would be relevant for simple cases as well.
Let's suppose I have something like this:
MyClass o1 = new MyClass();
_cache.Set("key1", o1);
_cache.Set("key2", o1);
In this case, when trying to add the same object twice, I would like not to duplicate it but to have key2 somehow point to the same object as key1. If this is achieved, invalidating them will be a problem, but I expect to have something like this:
_cache.Invalidate("key2");
This will check if there is another key pointing to same object. If so, it will only remove the key otherwise destroy the object itself.
Maybe we could reformulate this problem to two separate issues ...
executing the call for each combination and
storing n times the identical result, wasting tons of memory
For 1 I don't have any idea how we could prevent it, as we do not know prior to execution if we will fetch a duplicate in this setup. We would need more information that is based on when these values vary, which may or may not be possible.
For 2 one solution would be to override hashcode so it is based on the actual returned values. A good solution would be generic and walk through the object tree (which probably can be expensive). Would like to know if there are any pre-made solutions for this actually.
This answer is specifically for returning List<TItem>s, rather than just individual TItems, and it avoids duplication of any TItem as well as any List<T>. It uses arrays, because you're trying to save memory, and arrays will use less than a List.
Note that for this (and any solution really) to work, you MUST override Equals and GetHashCode on TItem, so that it knows what a duplicate item is. (Unless the data provider is returning the same object each time, which is unlikely.) If you don't have control of TItem, but you can yourself determine whether two TItems are equal, you can use an IEqualityComparer to do this, but the below solution would need to be modified very slightly in order to do that.
View the solution with a basic test at:
https://dotnetfiddle.net/pKHLQP
public class DuplicateFreeCache<TKey, TItem> where TItem : class
{
private ConcurrentDictionary<TKey, int> Primary { get; } = new ConcurrentDictionary<TKey, int>();
private List<TItem> ItemList { get; } = new List<TItem>();
private List<TItem[]> ListList { get; } = new List<TItem[]>();
private Dictionary<TItem, int> ItemDict { get; } = new Dictionary<TItem, int>();
private Dictionary<IntArray, int> ListDict { get; } = new Dictionary<IntArray, int>();
public IReadOnlyList<TItem> GetOrAdd(TKey key, Func<TKey, IEnumerable<TItem>> getFunc)
{
int index = Primary.GetOrAdd(key, k =>
{
var rawList = getFunc(k);
lock (Primary)
{
int[] itemListByIndex = rawList.Select(item =>
{
if (!ItemDict.TryGetValue(item, out int itemIndex))
{
itemIndex = ItemList.Count;
ItemList.Add(item);
ItemDict[item] = itemIndex;
}
return itemIndex;
}).ToArray();
var intArray = new IntArray(itemListByIndex);
if (!ListDict.TryGetValue(intArray, out int listIndex))
{
lock (ListList)
{
listIndex = ListList.Count;
ListList.Add(itemListByIndex.Select(ii => ItemList[ii]).ToArray());
}
ListDict[intArray] = listIndex;
}
return listIndex;
}
});
lock (ListList)
{
return ListList[index];
}
}
public override string ToString()
{
StringBuilder sb = new StringBuilder();
sb.AppendLine($"A cache with:");
sb.AppendLine($"{ItemList.Count} unique Items;");
sb.AppendLine($"{ListList.Count} unique lists of Items;");
sb.AppendLine($"{Primary.Count} primary dictionary items;");
sb.AppendLine($"{ItemDict.Count} item dictionary items;");
sb.AppendLine($"{ListDict.Count} list dictionary items;");
return sb.ToString();
}
//We have this to make Dictionary lookups on int[] find identical arrays.
//One could also just make an IEqualityComparer, but I felt like doing it this way.
public class IntArray
{
private readonly int _hashCode;
public int[] Array { get; }
public IntArray(int[] arr)
{
Array = arr;
unchecked
{
_hashCode = 0;
for (int i = 0; i < arr.Length; i++)
_hashCode = (_hashCode * 397) ^ arr[i];
}
}
protected bool Equals(IntArray other)
{
return Array.SequenceEqual(other.Array);
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != this.GetType()) return false;
return Equals((IntArray)obj);
}
public override int GetHashCode() => _hashCode;
}
}
It occurred to me that a ReaderWriterLockSlim would be better than the lock(ListList), if the lock is causing performance to lag, but it's very slightly more complicated.
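For reference, a minimal sketch of what guarding a list with ReaderWriterLockSlim could look like. GuardedList is a made-up stand-in for the ListList handling above, not a drop-in change to DuplicateFreeCache: many readers may hold the read lock at once, while a writer gets exclusive access.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class GuardedList<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    // Appends an item under the exclusive write lock and returns its index.
    public int Add(T item)
    {
        _lock.EnterWriteLock();
        try
        {
            _items.Add(item);
            return _items.Count - 1;
        }
        finally
        {
            _lock.ExitWriteLock();
        }
    }

    // Reads an item under the shared read lock; readers do not block each other.
    public T Get(int index)
    {
        _lock.EnterReadLock();
        try
        {
            return _items[index];
        }
        finally
        {
            _lock.ExitReadLock();
        }
    }
}

public static class GuardedListDemo
{
    public static void Main()
    {
        var list = new GuardedList<string>();
        int index = list.Add("first");
        Console.WriteLine($"{index}: {list.Get(index)}");
    }
}
```

Whether this actually beats a plain lock depends on the read/write ratio and contention, so it is worth measuring before switching.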
Similar to @MineR, this solution is performing a 'double caching' operation: it caches the keyed lists (lookups) as well as the individual objects, performing an automatic deduplication.
It is a fairly simple solution using two ConcurrentDictionaries - one acting as a HashSet and one as a keyed lookup. This allows most of the threading concerns to be handled by the framework.
You can also pass in and share the hash set between multiple CachedLookups, allowing lookups with different keys.
Note that object equality or an IEqualityComparer are required to make any such solution function.
Class:
public class CachedLookup<T, TKey>
{
private readonly ConcurrentDictionary<T, T> _hashSet;
private readonly ConcurrentDictionary<TKey, List<T>> _lookup = new ConcurrentDictionary<TKey, List<T>>();
public CachedLookup(ConcurrentDictionary<T, T> hashSet)
{
_hashSet = hashSet;
}
public CachedLookup(IEqualityComparer<T> equalityComparer = default)
{
_hashSet = equalityComparer is null ? new ConcurrentDictionary<T, T>() : new ConcurrentDictionary<T, T>(equalityComparer);
}
public List<T> Get(TKey key) => _lookup.ContainsKey(key) ? _lookup[key] : null;
public List<T> Get(TKey key, Func<TKey, List<T>> getData)
{
if (_lookup.ContainsKey(key))
return _lookup[key];
var result = DedupeAndCache(getData(key));
_lookup.TryAdd(key, result);
return result;
}
public async ValueTask<List<T>> GetAsync(TKey key, Func<TKey, Task<List<T>>> getData)
{
if (_lookup.ContainsKey(key))
return _lookup[key];
var result = DedupeAndCache(await getData(key));
_lookup.TryAdd(key, result);
return result;
}
public void Add(T value) => _hashSet.TryAdd(value, value);
public List<T> AddOrUpdate(TKey key, List<T> data)
{
var deduped = DedupeAndCache(data);
_lookup.AddOrUpdate(key, deduped, (k,l)=>deduped);
return deduped;
}
private List<T> DedupeAndCache(IEnumerable<T> input) => input.Select(v => _hashSet.GetOrAdd(v,v)).ToList();
}
Example Usage:
public class ExampleUsage
{
private readonly CachedLookup<LanguageInfoModel, (string frontendId, string languageId, string accessId)> _lookup
= new CachedLookup<LanguageInfoModel, (string frontendId, string languageId, string accessId)>(new LanguageInfoModelComparer());
public ValueTask<List<LanguageInfoModel>> GetLanguagesAsync(string frontendId, string languageId, string accessId)
{
return _lookup.GetAsync((frontendId, languageId, accessId), GetLanguagesFromDB);
}
private async Task<List<LanguageInfoModel>> GetLanguagesFromDB((string frontendId, string languageId, string accessId) key) => throw new NotImplementedException();
}
public class LanguageInfoModel
{
public string FrontendId { get; set; }
public string LanguageId { get; set; }
public string AccessId { get; set; }
public string SomeOtherUniqueValue { get; set; }
}
public class LanguageInfoModelComparer : IEqualityComparer<LanguageInfoModel>
{
public bool Equals(LanguageInfoModel x, LanguageInfoModel y)
{
return (x?.FrontendId, x?.AccessId, x?.LanguageId, x?.SomeOtherUniqueValue)
.Equals((y?.FrontendId, y?.AccessId, y?.LanguageId, y?.SomeOtherUniqueValue));
}
public int GetHashCode(LanguageInfoModel obj) =>
(obj.FrontendId, obj.LanguageId, obj.AccessId, obj.SomeOtherUniqueValue).GetHashCode();
}
Notes:
The CachedLookup class is generic on both the value and key. The example use of ValueTuple makes it easy to have compound keys. I have also used ValueTuples to simplify the equality comparisons.
This usage of ValueTask fits nicely with its intended purpose, returning the cached list synchronously.
If you have access to the lower level data access layer, one optimization would be to move the deduplication to happen before the objects are instantiated (based on property value equality). This would reduce the allocations and load on the GC.
If you have control over your complete solution, then you can do something like this.
First identify every type of object that can be stored in the cache.
All such objects implement a common interface.
public interface ICacheable
{
string ObjectId(); // Implementations compute the object's identity. You can use the hash code, but you have to combine it with some other value too.
}
Now, when you store an object in the cache, you keep two maps.
One cache maps ObjectId to key.
The other maps ObjectId to object.
The overall idea is that when you get an object, you look in the first cache to see whether the key you want is already registered against its ObjectId. If it is, no further action is needed; otherwise you have to create a new ObjectId-to-key entry in the first cache.
If the object is not present at all, you have to create an entry in both caches.
Note: you will have to deal with a performance issue, because your keys form some kind of list, which makes searching a problem.
It sounds to me as though you need to implement some sort of index. Assuming that your model is fairly large, which is why you want to save memory, you could do this with two concurrent dictionaries.
The first would be a ConcurrentDictionary<string, int> (or whatever unique id applies to your model object) and would contain your key values. Each key will obviously be different, as per all your combinations, but you are only duplicating the int unique id for all of your objects, not the entire object.
The second dictionary would be a ConcurrentDictionary<int, object> or ConcurrentDictionary<int, T> and would contain your unique large objects indexed via their unique id.
When building the cache you would need to populate both dictionaries; the exact method would depend upon how you are doing it at the moment.
To retrieve an object you would build the key as you do at the moment, retrieve the unique id from the first dictionary, and then use that to locate the actual object in the second dictionary.
It is also possible to invalidate one key without invalidating the main object if another key is also using it, although it does require you to iterate over the index dictionary to check whether any other key is pointing to the same object.
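A minimal sketch of this two-dictionary index, under the assumption that each object comes with an int unique id. IndexedCache, Set, TryGet, Invalidate and ObjectCount are names invented for the example; the two dictionaries are not updated atomically together, so real code would need extra care under heavy concurrency.

```csharp
using System;
using System.Collections.Concurrent;

public class IndexedCache<T>
{
    // cache key -> unique id of the object it points to
    private readonly ConcurrentDictionary<string, int> _index =
        new ConcurrentDictionary<string, int>();
    // unique id -> the single stored instance
    private readonly ConcurrentDictionary<int, T> _objects =
        new ConcurrentDictionary<int, T>();

    public int ObjectCount => _objects.Count;

    public void Set(string key, int id, T value)
    {
        _objects[id] = value;
        _index[key] = id;
    }

    public bool TryGet(string key, out T value)
    {
        value = default(T);
        return _index.TryGetValue(key, out int id)
            && _objects.TryGetValue(id, out value);
    }

    // Removes the key; drops the object only when no other key refers to it.
    public void Invalidate(string key)
    {
        if (!_index.TryRemove(key, out int id)) return;
        // Values is a snapshot, so this is a linear scan over the index.
        if (!_index.Values.Contains(id))
            _objects.TryRemove(id, out _);
    }
}

public static class IndexedCacheDemo
{
    public static void Main()
    {
        var cache = new IndexedCache<string>();
        cache.Set("key1", 42, "large object");
        cache.Set("key2", 42, "large object");
        cache.Invalidate("key2");
        Console.WriteLine(cache.ObjectCount); // object still referenced by key1
        cache.Invalidate("key1");
        Console.WriteLine(cache.ObjectCount); // last reference gone
    }
}
```

If invalidation is frequent, keeping a reference count per id would avoid the scan over the index.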
I don't think this is a classic caching problem, where one key maps to one and only one piece of data. That is not your case: you are really trying to use a local in-memory data repository as a cache.
You are trying to create mappings between keys and objects that are loaded from a remote service. One key can map to many objects, and one object can be mapped by many keys, so the relationship is n <======> n
I have created a sample model as follows.
Key, KeyMyModel and MyModel are classes used by the caching handler.
RemoteModel is the class that you get from the remote service.
With these models, you are able to meet the requirements. This uses the entity Id to identify an object, so no hashing is needed to detect duplicates. It is very basic: I have only implemented the Set method. Invalidating a key is very similar. You must also write code that ensures thread safety.
public class MyModel
{
public RemoteModel RemoteModel { get; set; }
public List<KeyMyModel> KeyMyModels { get; set; }
}
public class RemoteModel
{
public string Id { get; set; } // Identity property this get from remote service
public string DummyProperty { get; set; } // Some properties returned by remote service
}
public class KeyMyModel
{
public string Key { get; set; }
public string MyModelId { get; set; }
}
public class Key
{
public string KeyStr { get; set; }
public List<KeyMyModel> KeyMyModels { get; set; }
}
public interface ICacheService
{
List<RemoteModel> Get(string key);
List<RemoteModel> Get(string key, Func<List<RemoteModel>> getdata);
Task<List<RemoteModel>> Get(string key, Func<Task<List<RemoteModel>>> getdata);
void AddOrUpdate(string key, object value);
}
public class CacheService : ICacheService
{
public List<MyModel> MyModels { get; private set; }
public List<Key> Keys { get; private set; }
public List<KeyMyModel> KeyMyModels { get; private set; }
public CacheService()
{
MyModels = new List<MyModel>();
Keys = new List<Key>();
KeyMyModels = new List<KeyMyModel>();
}
public List<RemoteModel> Get(string key)
{
return MyModels.Where(s => s.KeyMyModels.Any(t => t.Key == key)).Select(s => s.RemoteModel).ToList();
}
public List<RemoteModel> Get(string key, Func<List<RemoteModel>> getdata)
{
var remoteData = getdata();
Set(key, remoteData);
return MyModels.Where(s => s.KeyMyModels.Any(t => t.Key == key)).Select(t => t.RemoteModel).ToList();
}
public Task<List<RemoteModel>> Get(string key, Func<Task<List<RemoteModel>>> getdata)
{
throw new NotImplementedException();
}
public void AddOrUpdate(string key, object value)
{
throw new NotImplementedException();
}
public void Invalidate(string key)
{
}
public void Set(string key, List<RemoteModel> data)
{
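var Key = Keys.FirstOrDefault(s => s.KeyStr == key);
if (Key == null)
{
// register the key once, with its link list initialised
Key = new Key() { KeyStr = key, KeyMyModels = new List<KeyMyModel>() };
Keys.Add(Key);
}
foreach (var remoteModel in data)
{
var exist = MyModels.FirstOrDefault(s => s.RemoteModel.Id == remoteModel.Id);
if (exist == null)
{
// add data to the cache
var myModel = new MyModel()
{
RemoteModel = remoteModel,
KeyMyModels = new List<KeyMyModel>()
};
var keyMyModel = new KeyMyModel()
{
Key = key,
MyModelId = remoteModel.Id
};
myModel.KeyMyModels.Add(keyMyModel);
Key.KeyMyModels.Add(keyMyModel);
// track the new link and the new model in the service-level lists
KeyMyModels.Add(keyMyModel);
MyModels.Add(myModel);
}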
else
{
exist.RemoteModel = remoteModel;
var existKeyMyModel =
KeyMyModels.FirstOrDefault(s => s.Key == key && s.MyModelId == exist.RemoteModel.Id);
if (existKeyMyModel == null)
{
existKeyMyModel = new KeyMyModel()
{
Key = key,
MyModelId = exist.RemoteModel.Id
};
Key.KeyMyModels.Add(existKeyMyModel);
exist.KeyMyModels.Add(existKeyMyModel);
KeyMyModels.Add(existKeyMyModel);
}
}
}
// Remove MyModels if need
var remoteIds = data.Select(s => s.Id);
var currentIds = KeyMyModels.Where(s => s.Key == key).Select(s => s.MyModelId);
var removingIds = currentIds.Except(remoteIds);
var removingKeyMyModels = KeyMyModels.Where(s => s.Key == key && removingIds.Any(i => i == s.MyModelId)).ToList();
removingKeyMyModels.ForEach(s =>
{
KeyMyModels.Remove(s);
Key.KeyMyModels.Remove(s);
});
}
}
class CacheConsumer
{
private readonly CacheService _cacheService = new CacheService();
public List<RemoteModel> GetMyModels(string frontendId, string languageId, string accessId)
{
var key = $"{frontendId}_{languageId}_{accessId}";
return _cacheService.Get(key, () =>
{
// call to remote service here
return new List<RemoteModel>();
});
}
}

Trying to write a more robust TemplateEngine with replacements defined externally

I am in the process of writing a TemplateEngine. The intention is that it will read in an HTML file containing placeholders, something along the lines of <p>Dear |*name|* you have won 1st prize in |*competition|* thanks for your entry on |*date|*</p>, where anything between a pair of |* markers needs to be replaced by data from a key-value pair dictionary.
I currently initialise a dictionary manually like this:
mergeData.Add("name", dataRow["UserName"].ToString());
mergeData.Add("competition", dataRow["CompName"].ToString());
mergeData.Add("date", dataRow["EntryDate"].ToString());
templateEngine.Initialise(mergeData);
templateEngine.Run();
As you can see, it makes use of a few magic strings. What I would like to do is make it a bit more extensible, so that the placeholders and replacements could be defined externally. At the moment I am explicitly specifying in code which columns to use for the data. I think that maybe a DataTable would work for Initialise(), but I am not sure how to approach it. Any ideas or suggestions would be most welcome.
public class TemplateEngine
{
private string _template;
private bool _initialised = false;
private Dictionary<string, string> _mergeData;
private TemplateEngine(string templateString)
{
_template = templateString;
}
public void Initialise(Dictionary<string, string> mergeData)
{
if (mergeData == null)
{
_initialised = false;
throw new ArgumentException("Must specify key value pairs to perform merge correctly");
}
_mergeData = mergeData;
_initialised = true;
}
public string Run()
{
if (_initialised == false)
{
throw new Exception("Cannot run engine as mergeData dictionary is not initialised");
}
foreach (var kvp in _mergeData)
{
_template = _template.Replace("|*" + kvp.Key + "|*", kvp.Value);
}
return _template;
}
public static TemplateEngine FromFile(string filePath)
{
if (filePath == string.Empty)
{
throw new ArgumentException("FilePath not specified cannot instantiate TemplateEngine");
}
string html = System.IO.File.ReadAllText(filePath);
var templateEngine = new TemplateEngine(html);
return templateEngine;
}
}

Is there a good C# design pattern for parsing strings that when split have different amounts of data?

I am dealing with values delimited by commas sent to me as a string. The strings come in many different structures (meaning different data types in different locations of the string as well as varying amounts of data). So while one string might be represented as:
- common data,identifier,int,string,string,string.
Another might be represented as:
- common data,identifier,int,int,string,string,string.
Design goals:
Common parse method
Common validation (i.e. int.TryParse() returns true)
Readily able to add different structures
Is there a good design pattern, or combination of design patterns, that allows me to parse the values, check them, and return an object only if the right amount of values were pulled in and those values were the expected data types?
Note: I am dealing with more than 30 different string structures.
If all the lines start with common data, identifier, and then are followed by a variable but expected (i.e. known based on the identifier) set of values, then a table approach could work well. To continue your example, say you have two different types:
common data,identifier,int,string,string,string.
common data,identifier,int,int,string,string,string.
You can build a class that defines what you're looking for:
class ItemDesc
{
public string Ident { get; private set; }
public string Fields { get; private set; }
public ItemDesc(string id, string flds)
{
Ident = id;
Fields = flds;
}
}
The Fields property is just a string that contains one-character type descriptions for the variable data. That is, "isss" would be interpreted as int,string,string,string.
You can then build a Dictionary<string, ItemDesc> that you can use to look these up:
Dictionary<string, ItemDesc> ItemLookup = new Dictionary<string, ItemDesc>
{
{ "ItemType1", new ItemDesc("ItemType1", "isss") },
{ "ItemType2", new ItemDesc("ItemType2", "iisss") },
};
Now when you read a line, use string.Split() to split it into fields. Get the identifier, look it up in the dictionary to get the item description, and then parse the rest of the fields. Something like:
string line = GetLine();
var fields = line.Split(',');
// somehow get the identifier
string id = GetIdentifier();
ItemDesc desc;
if (!ItemLookup.TryGetValue(id, out desc))
{
// unrecognized identifier
}
else
{
int fieldNo = 3; // or whatever field is after the identifier
foreach (var c in desc.Fields)
{
switch (c)
{
case 'i' :
// try to parse an int and save it.
break;
case 's' :
// save the string
break;
default:
// error, unknown field type
break;
}
++fieldNo;
}
}
// at this point if no errors occurred, then you have a collection
// of parsed fields that you saved. You can now create your object.
More details would be needed, since the answer could change entirely depending on your problem domain. That said, the following seem to be the first set of patterns to consider, ordered by suitability.
Interpreter
Strategy
Builder
Just split them using string.Split(), and then int.Parse() or int.TryParse() each int value in the resulting array as needed.
var myStrings = sourceString.Split(',');
int myint1 = int.Parse(myStrings[0]);
There are several ways of dealing with this. Here's a simple one (outputting just an object array):
class Template
{
// map identifiers to templates
static Dictionary<string, string> templates = new Dictionary<string, string>
{
{ "type1", "isss" },
{ "type2", "iisss" },
};
static bool ParseItem(string input, char type, out object output)
{
output = null;
switch (type)
{
case 'i':
int i;
bool valid = int.TryParse(input, out i);
output = i;
return valid;
case 's':
output = input;
return true;
}
return false;
}
public static object[] ParseString(string input)
{
string[] items = input.Split(',');
// make sure we have enough items
if (items.Length < 2)
return null;
object[] output = new object[items.Length - 2];
string identifier = items[1];
string template;
// make sure a valid identifier was specified
if (!templates.TryGetValue(identifier, out template))
return null;
// make sure we have the right amount of data
if (template.Length != output.Length)
return null;
// parse each item
for (int i = 0; i < template.Length; i++)
if (!ParseItem(items[i + 2], template[i], out output[i]))
return null;
return output;
}
}
If you're interested in returning actual objects instead of just object arrays, you can put metadata into the class definitions of the objects you're returning. Then when you get the object type you look for the metadata to figure out where to find its value in the input array. Here's a quick example:
namespace Parser
{
// create metadata attribute
class CsvPositionAttribute : Attribute
{
public int Position { get; set; }
public CsvPositionAttribute(int position)
{
Position = position;
}
}
// define some classes that use our metadata
public class type1
{
[CsvPosition(0)]
public int int1;
[CsvPosition(1)]
public string str1;
[CsvPosition(2)]
public string str2;
[CsvPosition(3)]
public string str3;
}
public class type2
{
[CsvPosition(0)]
public int int1;
[CsvPosition(1)]
public int int2;
[CsvPosition(2)]
public string str1;
[CsvPosition(3)]
public string str2;
[CsvPosition(4)]
public string str3;
}
public class CsvParser
{
public static object ParseString(string input)
{
string[] items = input.Split(',');
// make sure we have enough items
if (items.Length < 2)
return null;
string identifier = items[1];
// assume that our identifiers refer to a type in our namespace
Type type = Type.GetType("Parser." + identifier, false);
if (type == null)
return null;
object output = Activator.CreateInstance(type);
// iterate over fields in the type -- you may want to use properties
foreach (var field in type.GetFields())
{
// find the members that have our position attribute
foreach (CsvPositionAttribute attr in
field.GetCustomAttributes(typeof(CsvPositionAttribute),
false))
{
// fail if the input has no item at this position
if (attr.Position + 2 >= items.Length)
return null;
// otherwise convert the item to the type of the field;
// ChangeType may throw exceptions on failure,
// so catch them and return an error
try
{
field.SetValue(output,
Convert.ChangeType(items[attr.Position + 2],
field.FieldType));
}
catch { return null; }
}
}
return output;
}
}
}
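The comment in the code above suggests using properties rather than fields. Here is a minimal sketch of that variant, under some assumptions: CsvColumnAttribute, SampleRecord and PropertyCsvParser are hypothetical names invented for this illustration, and the caller passes the target type as a generic argument instead of resolving it from the identifier.

```csharp
using System;
using System.Reflection;

// Same idea as CsvPositionAttribute, but restricted to properties.
[AttributeUsage(AttributeTargets.Property)]
class CsvColumnAttribute : Attribute
{
    public int Position { get; private set; }
    public CsvColumnAttribute(int position) { Position = position; }
}

// Hypothetical record type, used only for this illustration.
class SampleRecord
{
    [CsvColumn(0)] public int Count { get; set; }
    [CsvColumn(1)] public string Name { get; set; }
}

static class PropertyCsvParser
{
    // Creates a T and fills its attributed properties from items,
    // where data starts at the given offset. Returns null on failure.
    public static T Parse<T>(string[] items, int offset) where T : class, new()
    {
        var output = new T();
        foreach (PropertyInfo prop in typeof(T).GetProperties())
        {
            foreach (CsvColumnAttribute attr in
                prop.GetCustomAttributes(typeof(CsvColumnAttribute), false))
            {
                int index = attr.Position + offset;
                if (index >= items.Length)
                    return null;
                try
                {
                    // ChangeType throws if the text cannot be converted
                    prop.SetValue(output,
                        Convert.ChangeType(items[index], prop.PropertyType), null);
                }
                catch (Exception)
                {
                    return null;
                }
            }
        }
        return output;
    }
}
```

With the two-item header used above (data starting at index 2), a call would look like PropertyCsvParser.Parse<SampleRecord>(input.Split(','), 2).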

Best practice for WP7 serializable class [is that the way it's done]

I am trying to get this basic class right; it's supposed to do the following:
Have a list of strings, where a new string is created each second.
Have a current string, which represents the current string (in this case the last one created)
Use an observable collection for data binding support
The code for the class looks like this, the whole project can be found at this link:
http://www.filesavr.com/XXRM3TJ9LSW6FEC
Is there any way to make this nicer, or is it "as good as it gets"?
Thanks,
Chris
PS: I know this is not a real question, but I will base a lot of classes on this design, so I want to be sure not to duplicate mistakes. I thought about creating my own observable collection which supports "current" and serialization, but I struggle a little bit with the generic attribute. Would you create one, or use the approach I used in the example below?
[DataContract]
public class SerializerTest : INotifyPropertyChanged
{
private DispatcherTimer _dT;
private List<string> _strings;
public static string Key { get{return typeof (SerializerTest).FullName;} }
[DataMember]
public List<string> Strings
{
get
{
return _strings;
}
set
{
_strings = value;
StringsObservable = new ObservableCollection<string>();
foreach (var s in _strings) StringsObservable.Add(s);
}
}
[DataMember]
public int CurrentStringIndex { get; set; }
public ObservableCollection<string> StringsObservable { get; set; }
public string CurrentString
{
get
{
if (Strings == null) return null;
if (Strings.Count <= CurrentStringIndex) return null;
return Strings[CurrentStringIndex];
}
}
public SerializerTest()
{
Strings = new List<string>();
StringsObservable = new ObservableCollection<string>();
InteralInit();
}
[OnDeserialized]
public void Init(StreamingContext c)
{
InteralInit();
}
private void InteralInit()
{
_dT = new DispatcherTimer();
_dT.Tick += (a, b) => AddString();
_dT.Interval = new TimeSpan(0, 0, 0, 2);
_dT.Start();
}
public void AddString()
{
Strings.Add(DateTime.Now.ToLongTimeString() + ":" + DateTime.Now.Millisecond);
StringsObservable.Add(Strings.Last());
CurrentStringIndex = Strings.Count - 1;
if (PropertyChanged != null) PropertyChanged(this, new PropertyChangedEventArgs(""));
}
public event PropertyChangedEventHandler PropertyChanged;
}
Binary serialization has proven to be much faster than the DataContractSerializer, so you may want to consider that option instead. Kevin Marshall has a great post on this: http://blogs.claritycon.com/kevinmarshall/2010/11/03/wp7-serialization-comparison/
You might find our articles on serializing to binary on Windows Phone 7 useful:
http://verysoftware.co.uk/blog/serializing-to-binary-on-wp7-part1/
http://verysoftware.co.uk/blog/serializing-to-binary-on-wp7-part2/
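To give a rough idea of what the binary approach suggested above can look like for the class in the question, here is a minimal sketch using BinaryWriter and BinaryReader. It is not taken from the linked articles; StringListBinarySerializer and its methods are hypothetical names, and it assumes the list contains no null strings (BinaryWriter.Write(string) rejects null).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class StringListBinarySerializer
{
    // Writes the item count, then each string, then the current index.
    public static void Write(Stream stream, List<string> strings, int currentIndex)
    {
        var writer = new BinaryWriter(stream);
        writer.Write(strings.Count);
        foreach (string s in strings)
            writer.Write(s);
        writer.Write(currentIndex);
        writer.Flush(); // the stream is left open for the caller to close
    }

    // Reads the data back in exactly the order it was written.
    public static List<string> Read(Stream stream, out int currentIndex)
    {
        var reader = new BinaryReader(stream);
        int count = reader.ReadInt32();
        var strings = new List<string>(count);
        for (int i = 0; i < count; i++)
            strings.Add(reader.ReadString());
        currentIndex = reader.ReadInt32();
        return strings;
    }
}
```

The format is entirely hand-rolled, so the read order must mirror the write order, and any change to the layout needs some form of versioning if old data has to remain readable.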
