Using C# Dictionary to parse log file - c#

I am trying to parse a rather long log file and create a better, more manageable listing of issues.
I am able to read and parse out the individual log entries line by line, but I need to display only unique entries, because some errors occur more often than others and are always recorded with identical text.
What I was going to try to do was create a Dictionary object to hold each unique entry and as I work through the log file, search the Dictionary object to see if the same values are already in there.
Here is a crude sample of the code I have (a work in progress, I hope I have all syntax right) that does not work. For some reason this script never sees any distinct entries (if statement never passes):
string[] rowdta = new string[4];
Dictionary<string[], int> dict = new Dictionary<string[], int>();
int ctr = -1;
if (linectr == 1)
{
    ctr++;
    dict.Add(rowdta, ctr);
}
else
{
    foreach (KeyValuePair<string[], int> pair in dict)
    {
        if ((pair.Key[1] != rowdta[1]) || (pair.Key[2] != rowdta[2]) || (pair.Key[3] != rowdta[3]))
        {
            ctr++;
            dict.Add(rowdta, ctr);
        }
    }
}
Some sample data:
First line
rowdta[0]="ErrorType";
rowdta[1]="Undefined offset: 0";
rowdta[2]="/url/routesDisplay2.svc.php";
rowdta[3]="Line Number 5";
2nd line
rowdta[0]="ErrorType";
rowdta[1]="Undefined offset: 0";
rowdta[2]="/url/routesDisplay2.svc.php";
rowdta[3]="Line Number 5";
3rd line
rowdta[0]="ErrorType";
rowdta[1]="Undefined variable: fvmsg";
rowdta[2]="/url/processes.svc.php";
rowdta[3]="Line Number 787";
So, with this, the Dictionary will have 2 items in it, first line and 3rd line.
I have also tried the following, which likewise does not find any variations in the log file text:
if (!dict.ContainsKey(rowdta)) {}
Can someone please help me get this syntax right? I am just a newbie at C# but this should be relatively straightforward. As always, I am thinking that this should be enough information to get the conversation started. If you want/need more detail, please let me know.

Either create a wrapper for your string array that implements IEquatable<T>:
using System;
using System.Linq;
public class LogFileEntry : IEquatable<LogFileEntry>
{
    private readonly string[] _rows;
    public LogFileEntry(string[] rows)
    {
        // Copy the array so that later changes to the caller's buffer
        // (e.g. a reused rowdta) cannot alter a key already in a dictionary.
        _rows = (string[])rows.Clone();
    }
    public override int GetHashCode()
    {
        // Combine every field (null-safe) so that entries which are
        // Equal always produce the same hash code.
        unchecked
        {
            int hash = 17;
            foreach (string row in _rows)
                hash = hash * 31 + (row == null ? 0 : row.GetHashCode());
            return hash;
        }
    }
    #region Implementation of IEquatable<LogFileEntry>
    public override bool Equals(object obj)
    {
        return Equals(obj as LogFileEntry);
    }
    public bool Equals(LogFileEntry other)
    {
        if (other == null)
            return false;
        return _rows.SequenceEqual(other._rows);
    }
    #endregion
}
Then use that in your dictionary:
var d = new Dictionary<LogFileEntry, int>();
var entry = new LogFileEntry(rows);
if( d.ContainsKey(entry) )
{
d[entry] ++;
}
else
{
d[entry] = 1;
}
Or create a custom comparer similar to the one proposed by @dasblinkenlight and use it as follows:
public class LogFileEntry
{
}
public class LogFileEntryComparer : IEqualityComparer<LogFileEntry>{ ... }
var d = new Dictionary<LogFileEntry, int>(new LogFileEntryComparer());
var entry = new LogFileEntry(rows);
if( d.ContainsKey(entry) )
{
d[entry] ++;
}
else
{
d[entry] = 1;
}

The reason you see the problem is that arrays compare by reference, so an array of strings cannot be used as a value-based key in a dictionary without supplying a custom IEqualityComparer<string[]> or writing a wrapper around it.
EDIT: Here is a quick and dirty implementation of a custom comparer:
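private class ArrayEq<T> : IEqualityComparer<T[]> {
    public bool Equals(T[] x, T[] y) {
        return x.SequenceEqual(y);
    }
    public int GetHashCode(T[] obj) {
        // Enumerable.Sum uses checked arithmetic and can throw
        // OverflowException, so combine the hashes in an unchecked loop.
        unchecked {
            int hash = 17;
            foreach (T o in obj)
                hash = hash * 31 + (o == null ? 0 : o.GetHashCode());
            return hash;
        }
    }
}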
Here is how you can use it:
var dd = new Dictionary<string[], int>(new ArrayEq<string>());
dd[new[] { "a", "b" }] = 0;
dd[new[] { "a", "b" }]++;
dd[new[] { "a", "b" }]++;
Console.WriteLine(dd[new[] { "a", "b" }]);

The problem is that array equality is reference equality. In other words, it does not depend on the values stored in the array, it depends only on the identity of the array.
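A small sketch of the reference-equality point, using strings from the question's sample data. The class and method names are made up for illustration; the behavior shown is that of the standard array and Dictionary types.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ArrayEqualityDemo
{
    // Two distinct arrays with identical contents: Equals compares references.
    public static bool DefaultEquals()
    {
        string[] a = { "ErrorType", "Undefined offset: 0" };
        string[] b = { "ErrorType", "Undefined offset: 0" };
        return a.Equals(b);        // false: different array instances
    }

    // SequenceEqual compares element by element instead.
    public static bool ContentEquals()
    {
        string[] a = { "ErrorType", "Undefined offset: 0" };
        string[] b = { "ErrorType", "Undefined offset: 0" };
        return a.SequenceEqual(b); // true: same elements in the same order
    }

    // The default comparer for string[] keys is also reference-based,
    // which is why ContainsKey misses an equal-looking copy.
    public static bool DictionaryFindsCopy()
    {
        var dict = new Dictionary<string[], int>();
        dict.Add(new[] { "a", "b" }, 0);
        return dict.ContainsKey(new[] { "a", "b" }); // false
    }
}
```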
Some solutions:
use a Tuple to hold the row data
use an anonymous type to hold the row data
create a custom type to hold the row data, and, if it is a class, override Equals and GetHashCode.
create a custom implementation of IEqualityComparer to compare the arrays according to their values, and pass that to the dictionary when you create it.
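A minimal sketch of the first option (a Tuple key), assuming, as in the question, that fields 1 to 3 (message, file, line) identify an entry. TupleKeyDemo and CountEntries are illustrative names, not part of any library. Tuple<T1, T2, T3> overrides Equals and GetHashCode to compare its components by value, so equal strings land in the same dictionary slot.

```csharp
using System;
using System.Collections.Generic;

public static class TupleKeyDemo
{
    // Counts how often each distinct log entry occurs.
    public static Dictionary<Tuple<string, string, string>, int> CountEntries(IEnumerable<string[]> rows)
    {
        var counts = new Dictionary<Tuple<string, string, string>, int>();
        foreach (string[] row in rows)
        {
            // Assumption: fields 1..3 identify an entry, as in the question.
            var key = Tuple.Create(row[1], row[2], row[3]);
            int seen;
            counts.TryGetValue(key, out seen); // seen stays 0 for a new key
            counts[key] = seen + 1;
        }
        return counts;
    }
}
```

With the three sample lines from the question, the dictionary ends up with two keys, and the "Undefined offset: 0" entry has a count of 2.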

Related

C# has no SortedList<T>?

I'm trying to solve a problem in which it would be useful to have a data structure like
var list = new SortedList<int>();
list.Add(3); // list = { 3 }
list.Add(1); // list = { 1, 3 }
list.Add(2); // list = { 1, 2, 3 }
int median = list[list.Length / 2];
i.e.
O(n) insertion
O(1) lookup by index
but I can't see that such a thing exists. I see that there's a somewhat confusing SortedList<T,U> and then a non-generic SortedList class, but neither of those is what I'm looking for.
The sorted list in the .NET Framework is an associative list (that is, it stores key/value pairs). You can use a regular List<T> with its binary search functionality, which works as long as you keep the list sorted at all times. You can encapsulate it in an extension method:
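using System.Collections.Generic;
static class SortedListExtensions {
    public static void SortedAdd<T>(this List<T> list, T value) {
        int insertIndex = list.BinarySearch(value);
        // BinarySearch returns the bitwise complement of the insertion
        // point when the value is not found.
        if (insertIndex < 0) {
            insertIndex = ~insertIndex;
        }
        list.Insert(insertIndex, value);
    }
    // Added bonus: a faster Contains method (O(log n) instead of O(n))
    public static bool SortedContains<T>(this List<T> list, T value) {
        return list.BinarySearch(value) >= 0;
    }
}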
List<int> values = new List<int>();
values.SortedAdd(3);
values.SortedAdd(1);
values.SortedAdd(2);
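To connect this back to the median from the question, here is a self-contained sketch. It repeats the sorted-insert idea in compact form so the block compiles on its own; SortedAddExtensions, MedianDemo and MedianOf are illustrative names. Insertion is O(n) because List<T>.Insert shifts elements, and lookup by index is O(1).

```csharp
using System.Collections.Generic;

static class SortedAddExtensions
{
    // BinarySearch returns ~insertionPoint when the value is absent.
    public static void SortedAdd<T>(this List<T> list, T value)
    {
        int index = list.BinarySearch(value);
        list.Insert(index < 0 ? ~index : index, value);
    }
}

public static class MedianDemo
{
    // Keeps the list sorted on every insert, then reads the middle element.
    public static int MedianOf(IEnumerable<int> input)
    {
        var list = new List<int>();
        foreach (int v in input)
            list.SortedAdd(v);
        return list[list.Count / 2];
    }
}
```

For the inputs 3, 1, 2 the list becomes { 1, 2, 3 } and MedianOf returns 2, matching the example in the question.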

Compare values of objects in two lists

I have two ObservableCollection<Model>; each Model has 10 properties, and there are approximately 30 objects in both collections at the beginning. They basically work like this: initially the same objects are saved in both OCs, where one is the original and the other is where changes happen. I need the first one only to compare values and see whether changes have been made. So far I have come up with
list1.SequenceEqual(list2);
but this only works if I add a new object; it does not recognize changes in the actual properties. Is there a fast way to do this, or do I need to foreach over every object and compare individual properties one by one? There may be more than 30 objects to compare. Thanks.
Is there a fast way this could be done or I need to do foreach for every object and compare individual properties one by one?
If by "fast" you mean "performant" then comparing property-by property is probably the fastest way. If by "fast" you mean "less code to write" then you could use reflection to loop through the properties and compare the values of each item.
Note that you'll probably spend more time researching, writing, and debugging the reflection algorithm than you would just hand-coding the property comparisons.
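For comparison, a minimal sketch of the reflection approach, assuming that "equal" means every public, readable, non-indexed instance property holds equal values according to object.Equals. ReflectionCompare, PropertiesEqual and SampleModel are hypothetical names; SampleModel stands in for the question's Model. It is shorter to write but slower than hand-coded comparisons.

```csharp
using System.Linq;
using System.Reflection;

public static class ReflectionCompare
{
    // True when all public readable instance properties of x and y are equal.
    public static bool PropertiesEqual<T>(T x, T y) where T : class
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .All(p => object.Equals(p.GetValue(x, null), p.GetValue(y, null)));
    }
}

// Hypothetical stand-in for the question's Model class.
public class SampleModel
{
    public string Name { get; set; }
    public int Count { get; set; }
}
```

Usage: ReflectionCompare.PropertiesEqual(original, edited) returns false as soon as any property differs.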
A simple way to use the built-in LINQ methods would be to define an IEqualityComparer<Model> that defines equality of two Model objects:
class ModelEqualityComparer : IEqualityComparer<Model>
{
    public bool Equals(Model m1, Model m2)
    {
        if (ReferenceEquals(m1, m2))
            return true;
        if (m1 == null || m2 == null)
            return false;
        return m1.Prop1 == m2.Prop1
            && m1.Prop2 == m2.Prop2
            && m1.Prop3 == m2.Prop3; // ... plus the remaining properties
    }
    public int GetHashCode(Model m)
    {
        // unchecked: let the multiplication wrap around instead of throwing
        unchecked
        {
            int hCode = m.Prop1.GetHashCode();
            hCode = hCode * 23 + m.Prop2.GetHashCode();
            hCode = hCode * 23 + m.Prop3.GetHashCode();
            // ... plus the remaining properties
            return hCode;
        }
    }
}
I think you can compare them by defining a custom IEqualityComparer<T> and using the overload of Enumerable.SequenceEqual that supports a custom comparer: Enumerable.SequenceEqual<TSource> Method (IEnumerable<TSource>, IEnumerable<TSource>, IEqualityComparer<TSource>). More info about it here: http://msdn.microsoft.com/it-it/library/bb342073(v=vs.110).aspx
I'll post a usage example from that page here in case it goes missing.
Here is how to define an IEqualityComparer<T>:
public class Product
{
public string Name { get; set; }
public int Code { get; set; }
}
// Custom comparer for the Product class
class ProductComparer : IEqualityComparer<Product>
{
// Products are equal if their names and product numbers are equal.
public bool Equals(Product x, Product y)
{
//Check whether the compared objects reference the same data.
if (Object.ReferenceEquals(x, y)) return true;
//Check whether any of the compared objects is null.
if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
return false;
//Check whether the products' properties are equal.
return x.Code == y.Code && x.Name == y.Name;
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public int GetHashCode(Product product)
{
//Check whether the object is null
if (Object.ReferenceEquals(product, null)) return 0;
//Get hash code for the Name field if it is not null.
int hashProductName = product.Name == null ? 0 : product.Name.GetHashCode();
//Get hash code for the Code field.
int hashProductCode = product.Code.GetHashCode();
//Calculate the hash code for the product.
return hashProductName ^ hashProductCode;
}
}
Here's how to use it:
Product[] storeA = { new Product { Name = "apple", Code = 9 },
new Product { Name = "orange", Code = 4 } };
Product[] storeB = { new Product { Name = "apple", Code = 9 },
new Product { Name = "orange", Code = 4 } };
bool equalAB = storeA.SequenceEqual(storeB, new ProductComparer());
Console.WriteLine("Equal? " + equalAB);
/*
This code produces the following output:
Equal? True
*/
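SequenceEqual only answers "did anything change?". If you also want to know which items changed, a sketch like the following walks both lists with the same kind of comparer. ChangeDetector and ChangedIndices are hypothetical names; the sketch assumes both collections hold corresponding objects in the same order (ObservableCollection<T> implements IList<T>, so it can be passed directly) and reports any extra trailing items as changed.

```csharp
using System;
using System.Collections.Generic;

public static class ChangeDetector
{
    // Returns the positions where edited differs from original.
    public static List<int> ChangedIndices<T>(IList<T> original, IList<T> edited, IEqualityComparer<T> comparer)
    {
        var changed = new List<int>();
        int common = Math.Min(original.Count, edited.Count);
        for (int i = 0; i < common; i++)
        {
            if (!comparer.Equals(original[i], edited[i]))
                changed.Add(i);
        }
        // Items present in only one list count as changes.
        for (int i = common; i < Math.Max(original.Count, edited.Count); i++)
            changed.Add(i);
        return changed;
    }
}
```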

Only Add Unique Item To List

I'm adding remote devices to a list as they announce themselves across the network. I only want to add the device to the list if it hasn't previously been added.
The announcements come in through an async socket listener, so the code to add a device can run on multiple threads. I'm not sure what I'm doing wrong, but no matter what I try I end up with duplicates. Here is what I currently have:
lock (_remoteDevicesLock)
{
RemoteDevice rDevice = (from d in _remoteDevices
where d.UUID.Trim().Equals(notifyMessage.UUID.Trim(), StringComparison.OrdinalIgnoreCase)
select d).FirstOrDefault();
if (rDevice != null)
{
//Update Device.....
}
else
{
//Create A New Remote Device
rDevice = new RemoteDevice(notifyMessage.UUID);
_remoteDevices.Add(rDevice);
}
}
If your requirements are to have no duplicates, you should be using a HashSet.
HashSet.Add will return false when the item already exists (if that even matters to you).
You can use the HashSet<T> constructor that takes an IEqualityComparer<T> (which @pstrjds links to below) to define equality, or you'll need to implement the equality methods in RemoteDevice (GetHashCode and Equals).
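A minimal sketch of the comparer route, mirroring the question's matching rule (trimmed UUID, case-insensitive). RemoteDevice here is a hypothetical minimal stand-in, and RemoteDeviceUuidComparer and DeviceRegistry are illustrative names; UUIDs are assumed to be non-null. The lock is still needed because HashSet<T> is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal stand-in for the question's RemoteDevice.
public class RemoteDevice
{
    public RemoteDevice(string uuid) { UUID = uuid; }
    public string UUID { get; private set; }
}

// Devices are equal when their trimmed UUIDs match, ignoring case.
public class RemoteDeviceUuidComparer : IEqualityComparer<RemoteDevice>
{
    public bool Equals(RemoteDevice x, RemoteDevice y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return string.Equals(x.UUID.Trim(), y.UUID.Trim(), StringComparison.OrdinalIgnoreCase);
    }

    // Must agree with Equals, so hash the same normalized form.
    public int GetHashCode(RemoteDevice obj)
    {
        return StringComparer.OrdinalIgnoreCase.GetHashCode(obj.UUID.Trim());
    }
}

public class DeviceRegistry
{
    private readonly object _remoteDevicesLock = new object();
    private readonly HashSet<RemoteDevice> _remoteDevices =
        new HashSet<RemoteDevice>(new RemoteDeviceUuidComparer());

    // Returns true only for a device that was not registered yet.
    public bool TryAdd(string uuid)
    {
        lock (_remoteDevicesLock)
        {
            return _remoteDevices.Add(new RemoteDevice(uuid));
        }
    }

    public int Count
    {
        get { lock (_remoteDevicesLock) { return _remoteDevices.Count; } }
    }
}
```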
// HashSet accepts only unique values
HashSet<int> uniqueList = new HashSet<int>();
var a = uniqueList.Add(1); // true
var b = uniqueList.Add(2); // true
var c = uniqueList.Add(3); // true
var d = uniqueList.Add(2); // false: not added again, and no exception is thrown
// Dictionary accepts only unique keys; values can be repeated
Dictionary<int, string> dict = new Dictionary<int, string>();
dict.Add(1, "Happy");
dict.Add(2, "Smile");
dict.Add(3, "Happy");
dict.Add(2, "Sad"); // throws ArgumentException: "An item with the same key has already been added." The app crashes unless this is caught
// The same rule applies with the key and value types reversed
Dictionary<string, int> dictRev = new Dictionary<string, int>();
dictRev.Add("Happy", 1);
dictRev.Add("Smile", 2);
dictRev.Add("Happy", 3); // throws ArgumentException: duplicate key "Happy"
dictRev.Add("Sad", 2);
As the accepted answer says, a HashSet doesn't have an order. If order is important, you can continue to use a List and check whether it already contains the item before you add it.
if (!_remoteDevices.Contains(rDevice))
    _remoteDevices.Add(rDevice);
Performing List.Contains() on a custom class/object requires implementing IEquatable<T> on the custom class or overriding Equals. It's a good idea to implement GetHashCode in the class as well. This is per the documentation at https://msdn.microsoft.com/en-us/library/ms224763.aspx
using System;
public class RemoteDevice : IEquatable<RemoteDevice>
{
    private readonly int id;
    public RemoteDevice(int uuid)
    {
        id = uuid;
    }
    public int GetId
    {
        get { return id; }
    }
    // ...
    public bool Equals(RemoteDevice other)
    {
        return other != null && this.GetId == other.GetId;
    }
    public override bool Equals(object obj)
    {
        return Equals(obj as RemoteDevice);
    }
    public override int GetHashCode()
    {
        return id;
    }
}
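A self-contained sketch of the ordered alternative: List<T>.Contains uses EqualityComparer<T>.Default, which calls the IEquatable<T>.Equals implementation, so a value-equal device is detected while the list keeps insertion order. Device and OrderedUnique are hypothetical names. Each check is O(n), and in the multithreaded case it must still run inside the lock.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal device type keyed by an integer id.
public class Device : IEquatable<Device>
{
    public Device(int id) { Id = id; }
    public int Id { get; private set; }

    public bool Equals(Device other) { return other != null && Id == other.Id; }
    public override bool Equals(object obj) { return Equals(obj as Device); }
    public override int GetHashCode() { return Id; }
}

public static class OrderedUnique
{
    // Adds the device only if no equal device is present yet.
    public static bool AddIfMissing(List<Device> devices, Device device)
    {
        if (devices.Contains(device)) return false;
        devices.Add(device);
        return true;
    }
}
```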

Maintaining proper order of a collection of ordinals

I have a simple domain object:
class FavoriteFood
{
public string Name;
public int Ordinal;
}
I want to have a collection of this domain object that maintains the correct ordinal. For example, given 4 favorite foods:
Name: Banana, Ordinal: 1
Name: Orange, Ordinal: 2
Name: Pear, Ordinal: 3
Name: Watermelon, Ordinal: 4
If I change Pear's ordinal to 4 it should shift Watermelon's ordinal down to 3.
If I add a new favorite food (Strawberry) with ordinal 3 it should shift Pear up to 4 and Watermelon up to 5.
If I change Pear's ordinal to 2 it should shift Orange up to 3.
If I change Watermelon's ordinal to 1, Banana would bump up to 2, Orange would bump up to 3, and Pear would bump up to 4.
What's the best way to accomplish this?
UPDATE: The name property of the domain object is dynamic and based on user input. The object has to have this Ordinal property because a user can change the order in which their favorite foods are displayed. This ordinal value is saved in a database and when populating the structure I cannot guarantee the items are added in order of their ordinals.
The trouble I am running into is when the underlying domain object is changed, there isn't a good way of updating the rest of the items in the list. For example:
var favoriteFoods = new List<FavoriteFood>();
var banana = new FavoriteFood { Name = "Banana", Ordinal = 1};
favoriteFoods.Add(banana);
favoriteFoods.Add(new FavoriteFood { Name = "Orange", Ordinal = 2 });
banana.Ordinal = 2;
// at this point both Banana and Orange have the same ordinal in the list. How can we make sure that Orange's ordinal gets updated too?
So far I have tried the following, which works:
class FavoriteFood : INotifyPropertyChanging
{
public string Name;
public int Ordinal
{
get { return this.ordinal; }
set
{
var oldValue = this.ordinal;
if (oldValue != value && this.PropertyChanging != null)
{
this.PropertyChanging(new FavoriteFoodChangingObject { NewOrdinal = value, OldOrdinal = oldValue }, new PropertyChangingEventArgs("Ordinal"));
}
this.ordinal = value;
}
}
internal struct FavoriteFoodChangingObject
{
internal int NewOrdinal;
internal int OldOrdinal;
}
// THIS IS A TEMPORARY WORKAROUND
internal int ordinal;
public event PropertyChangingEventHandler PropertyChanging;
}
public class FavoriteFoodCollection : IEnumerable<FavoriteFood>
{
private class FavoriteFoodOrdinalComparer : IComparer<FavoriteFood>
{
public int Compare(FavoriteFood x, FavoriteFood y)
{
return x.Ordinal.CompareTo(y.Ordinal);
}
}
private readonly SortedSet<FavoriteFood> underlyingList = new SortedSet<FavoriteFood>(new FavoriteFoodOrdinalComparer());
public IEnumerator<FavoriteFood> GetEnumerator()
{
return this.underlyingList.GetEnumerator();
}
public void AddRange(IEnumerable<FavoriteFood> items)
{
foreach (var i in items)
{
this.underlyingList.Add(i);
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
private void UpdateOrdinalsDueToRemoving(FavoriteFood item)
{
foreach (var i in this.underlyingList.Where(x => x.Ordinal > item.Ordinal))
{
i.ordinal--;
}
}
public void Remove(FavoriteFood item)
{
this.underlyingList.Remove(item);
this.UpdateOrdinalsDueToRemoving(item);
}
public void Add(FavoriteFood item)
{
this.UpdateOrdinalsDueToAdding(item);
this.underlyingList.Add(item);
item.PropertyChanging += this.item_PropertyChanging;
}
private void item_PropertyChanging(object sender, PropertyChangingEventArgs e)
{
if (e.PropertyName.Equals("Ordinal"))
{
var ordinalsChanging = (FavoriteFood.FavoriteFoodChangingObject)sender;
this.UpdateOrdinalsDueToEditing(ordinalsChanging.NewOrdinal, ordinalsChanging.OldOrdinal);
}
}
private void UpdateOrdinalsDueToEditing(int newOrdinal, int oldOrdinal)
{
if (newOrdinal > oldOrdinal)
{
foreach (var i in this.underlyingList.Where(x => x.Ordinal <= newOrdinal && x.Ordinal > oldOrdinal))
{
//i.Ordinal = i.Ordinal - 1;
i.ordinal--;
}
}
else if (newOrdinal < oldOrdinal)
{
foreach (var i in this.underlyingList.Where(x => x.Ordinal >= newOrdinal && x.Ordinal < oldOrdinal))
{
//i.Ordinal = i.Ordinal + 1;
i.ordinal++;
}
}
}
private void UpdateOrdinalsDueToAdding(FavoriteFood item)
{
foreach (var i in this.underlyingList.Where(x => x.Ordinal >= item.Ordinal))
{
i.ordinal++;
}
}
}
This works all right, but using the internal ordinal field is a strange workaround. It's needed so that the PropertyChanging event isn't raised infinitely.
Just use a List<string>:
List<string> foods = new List<string> { "Banana", "Orange", "Pear" };
int ordinalOfOrange = foods.IndexOf("Orange");
It's not a good idea to 'store' that ordinal if it has to change the way you describe.
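A sketch of the list-position idea applied to the moves described in the question, assuming 1-based ordinals for display. OrdinalListDemo, Move and OrdinalOf are illustrative names. RemoveAt followed by Insert shifts every other item automatically, so no ordinal has to be stored or renumbered.

```csharp
using System.Collections.Generic;

public static class OrdinalListDemo
{
    // Moves food to the given 1-based ordinal; the others shift on their own.
    public static void Move(List<string> foods, string food, int newOrdinal)
    {
        int current = foods.IndexOf(food);
        if (current < 0) throw new KeyNotFoundException(food);
        foods.RemoveAt(current);
        foods.Insert(newOrdinal - 1, food);
    }

    // Ordinal derived from position (1-based); returns 0 if food is absent.
    public static int OrdinalOf(List<string> foods, string food)
    {
        return foods.IndexOf(food) + 1;
    }
}
```

Starting from Banana, Orange, Pear, Watermelon, moving Pear to 4 leaves Watermelon at 3, and then moving Watermelon to 1 gives Watermelon, Banana, Orange, Pear, as in the question's examples.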
Sounds like you want a SortedList. Add each item using its Ordinal as the key.
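A minimal sketch of the SortedList<TKey, TValue> suggestion for the loading step from the UPDATE, where rows arrive from the database in no particular order. SortedListDemo and InOrdinalOrder are illustrative names. Keys stay sorted, and Add throws ArgumentException on a duplicate ordinal; note that this does not renumber other items when an ordinal changes, so the shifting logic would still be yours to write.

```csharp
using System.Collections.Generic;

public static class SortedListDemo
{
    // Accepts (ordinal, name) pairs in any order and returns names by ordinal.
    public static List<string> InOrdinalOrder(IEnumerable<KeyValuePair<int, string>> rows)
    {
        var byOrdinal = new SortedList<int, string>();
        foreach (var row in rows)
            byOrdinal.Add(row.Key, row.Value);
        return new List<string>(byOrdinal.Values);
    }
}
```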
I'd do something like the following:
public class FavoriteFoods
{
StringComparer comparer ;
List<string> list ;
public FavoriteFoods()
{
this.list = new List<string>() ;
this.comparer = StringComparer.InvariantCultureIgnoreCase ;
return ;
}
public void Add( string food , int rank )
{
if ( this.list.Contains(food,comparer ) ) throw new ArgumentException("food") ;
this.list.Insert(rank,food) ;
return ;
}
public void Remove( string food )
{
this.list.Remove( food ) ;
return ;
}
public void ChangeRank( string food , int newRank )
{
int currentRank = this.list.IndexOf(food) ;
if ( currentRank < 0 ) throw new ArgumentOutOfRangeException("food") ;
if ( newRank < 0 ) throw new ArgumentOutOfRangeException("newRank") ;
if ( newRank >= this.list.Count ) throw new ArgumentOutOfRangeException("newRank") ;
if ( newRank != currentRank )
{
this.Remove(food) ;
this.Add( food , newRank ) ;
}
return ;
}
public int GetRank( string food )
{
int rank = this.list.IndexOf(food) ;
if ( rank < 0 ) throw new ArgumentOutOfRangeException("food");
return rank ;
}
public IEnumerable<string> InRankOrder()
{
foreach ( string food in this.list )
{
yield return food ;
}
}
}
Let me restate your problem.
You have a collection of strings. You have a collection of ordinals.
You want to be able to quickly look up the ordinal of a string. And the string of an ordinal. You'd also like to be able to insert a string with a given ordinal. And change the ordinal of a string.
There are two ways to go. The first, simple approach is to store a collection of the strings in order, with knowledge of their ordinal. You can scan the list in time O(n). You can also look up, insert, move, and delete in time O(n) each. If you don't actually care about performance, then I would strongly suggest going this way.
If you do care about performance, then you'll need to build a custom data structure. The simplest idea is to have two trees. One tree stores the strings in alphabetical order, and tells you where in the other tree the string is. The other tree stores the strings in order of the ordinals, and stores counts of how much stuff is in various subtrees.
Now here are your basic operations.
Insert. Insert in the second tree at the correct position (if you choose to move anything else in the process, updating those things in the first tree), then insert the string in the first tree.
Lookup by string. Search the first tree, find where it is in the second tree, walk back in the second tree to find its ordinal.
Lookup by ordinal. Search the second tree, find the string.
Delete. Delete from both trees.
Move ordinal. Remove from the second tree in the old position. Insert into the second tree in the new position. Update all appropriate nodes in the first tree.
For the simple version you can just use trees. If you want to get fancy, you can look up B-Trees, Red-Black trees and other types of self-balancing trees, then pick one of those.
If you program this correctly you can guarantee that all operations take time O(log(n)). However there will be a lot of constant overhead, and for small collections the effort to be clever may be a loss relative to the simple approach.

Removing duplicates from a list with "priority"

Given a collection of records like this:
string ID1;
string ID2;
string Data1;
string Data2;
// :
string DataN;
Initially Data1..N are null and can pretty much be ignored for this question. ID1 and ID2 both uniquely identify the record. All records will have an ID2; some will also have an ID1. Given an ID2, there is a (time-consuming) method to get its corresponding ID1. Given an ID1, there is a (time-consuming) method to get Data1..N for the record. Our ultimate goal is to fill in Data1..N for all records as quickly as possible.
Our immediate goal is to eliminate (as quickly as possible) all duplicates in the list, keeping the one with more information.
For example, if Rec1 == {ID1="ABC", ID2="XYZ"} and Rec2 == {ID1=null, ID2="XYZ"}, then these are duplicates, but we must specifically remove Rec2 and keep Rec1.
That last requirement rules out the standard ways of removing duplicates (e.g. HashSet), as they consider both sides of the "duplicate" to be interchangeable.
How about you split your original list into three: ones with all data, ones with ID1, and ones with just ID2.
Then do:
var unique = allData.Concat(id1Data.Except(allData))
.Concat(id2Data.Except(id1Data).Except(allData));
having defined equality just on the basis of ID2.
I suspect there are more efficient ways of expressing that, but the fundamental idea is sound as far as I can tell. Splitting the initial list into three is simply a matter of using GroupBy (and then calling ToList on each group to avoid repeated queries).
EDIT: Potentially nicer idea: split the data up as before, then do:
var result = new HashSet<...>(allData);
result.UnionWith(id1Data);
result.UnionWith(id2Data);
I believe that UnionWith keeps the existing elements rather than overwriting them with new but equal ones. On the other hand, that's not explicitly specified. It would be nice for it to be well-defined...
(Again, either make your type implement equality based on ID2, or create the hash set using an equality comparer which does so.)
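If you'd rather not rely on the unspecified UnionWith behavior, a sketch that uses only HashSet<T>.Add: when an equal element is already present, Add returns false and leaves the existing element in place. Records with an ID1 are added first (OrderBy is a stable sort), so the more complete record wins. Record, ById2Comparer, Dedup and KeepMostComplete are hypothetical names, and "more information" is reduced here to "has an ID1", since Data1..N start out null.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical record shape with only the fields the dedup needs.
public class Record
{
    public string ID1;
    public string ID2;
}

// Equality based on ID2 alone.
public class ById2Comparer : IEqualityComparer<Record>
{
    public bool Equals(Record x, Record y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.ID2 == y.ID2;
    }

    public int GetHashCode(Record obj)
    {
        return obj.ID2 == null ? 0 : obj.ID2.GetHashCode();
    }
}

public static class Dedup
{
    public static List<Record> KeepMostComplete(IEnumerable<Record> records)
    {
        var seen = new HashSet<Record>(new ById2Comparer());
        var result = new List<Record>();
        // Records with an ID1 sort first, so they claim their ID2 first.
        foreach (var r in records.OrderBy(rec => rec.ID1 == null ? 1 : 0))
        {
            if (seen.Add(r))
                result.Add(r);
        }
        return result;
    }
}
```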
This may smell quite a bit, but I think a LINQ-distinct will still work for you if you ensure the two compared objects come out to be the same. The following comparer would do this:
private class Comp : IEqualityComparer<Item>
{
public bool Equals(Item x, Item y)
{
var equalityOfB = x.ID2 == y.ID2;
if (x.ID1 == y.ID1 && equalityOfB)
return true;
if (x.ID1 == null && equalityOfB)
{
x.ID1 = y.ID1;
return true;
}
if (y.ID1 == null && equalityOfB)
{
y.ID1 = x.ID1;
return true;
}
return false;
}
public int GetHashCode(Item obj)
{
return obj.ID2.GetHashCode();
}
}
Then you could use it on a list as such...
var l = new[] {
new Item { ID1 = "a", ID2 = "b" },
new Item { ID1 = null, ID2 = "b" } };
var l2 = l.Distinct(new Comp()).ToArray();
I had a similar issue a couple of months ago.
Try something like this...
public static List<T> RemoveDuplicateSections<T>(List<T> sections) where T : INamedObject
{
    // Tracks names already seen; the int value is unused.
    Dictionary<string, int> uniqueStore = new Dictionary<string, int>();
    List<T> finalList = new List<T>();
    foreach (T currValue in sections)
    {
        if (!uniqueStore.ContainsKey(currValue.Name))
        {
            uniqueStore.Add(currValue.Name, 0);
            finalList.Add(currValue);
        }
    }
    return finalList;
}
records.GroupBy(r => r, new RecordByIDsEqualityComparer())
.Select(g => g.OrderByDescending(r => r, new RecordByFullnessComparer()).First())
or, if you want to merge the records, use Aggregate instead of OrderByDescending/First.
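A concrete sketch of the GroupBy approach, assuming duplicates are identified by ID2 and "fullness" simply means having an ID1. A bool key stands in for the RecordByFullnessComparer above; Rec, GroupByDedup and KeepFullest are hypothetical names. OrderByDescending puts true (ID1 present) ahead of false, and GroupBy keeps groups in order of first appearance.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical record shape; only the ID fields matter here.
public class Rec
{
    public string ID1;
    public string ID2;
}

public static class GroupByDedup
{
    // One record per ID2, preferring the one whose ID1 is filled in.
    public static List<Rec> KeepFullest(IEnumerable<Rec> records)
    {
        return records
            .GroupBy(r => r.ID2)
            .Select(g => g.OrderByDescending(r => r.ID1 != null).First())
            .ToList();
    }
}
```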
