getting unstructured CSV-madness into order: does this make sense? - c#

I have logfiles in CSV format.
The vendor of the app they come from wants to stick with it.
My goal is to build a log reader that does some highlighting and filtering.
The CSV looks ugly as hell, because it is not at all tidy, even if you put some work into it in Excel. A sample of that structure is attached.
Now, I've been pondering a solution for a while and came up with a rather complex one. But I don't like it. It feels too "special".
But let's look at what we have:
- The columns and their order are defined by a customizable view in the app.
So sometimes e.g. the date column is first, but it could also be last, or even missing. The output is not in the correct order. Sometimes the Date column contains the Details text. Sometimes a MAC. It's madness.
Edit: Here is how it can look when it's raw data:
http://s000.tinyupload.com/?file_id=79649596476923658435
So what I came up with is this:
- Read line 1 of your CSV; then you know the columns of that file.
- Read a config file that defines "detection rules" for columns with set names. So for instance you have a filter for a column of MAC addresses, one for IP addresses, and so on. The columns themselves are predefined, so this is at least possible.
So you read line 2 of the CSV and split that string into an array, loop through it and check for each string whether it matches one of your filters.
Then, if you have a match, you loop through the array you created with the column titles (from line 1 of the CSV).
That loop increments a counter.
Then you have another array with the same size as the array containing the titles. You put the found value in there (the counter is the index you use in the new array).
This all probably sounds hard to understand, so I wrote a bit of code to illustrate what I mean.
Am I missing something, or is this the way to go with what I have?
Public Class Form1
    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        Dim Line1csv() As String = {"Date", "MAC"}
        Dim csvdata As String() = System.IO.File.ReadAllLines("tst.csv")
        For Each csvline In csvdata
            ' let's split the line into potential cell values
            Dim Csvlinestr() As String = csvline.Split(","c)
            ' now let's loop through what we created and find out what it is
            For Each csvfielstr In Csvlinestr
                ' ok, let's ask what this field is
                Dim Csvlinetype As String = AskmeWhatitis(csvfielstr)
                ' good, let's say it told us that the result is title2
                ' so where do we put it? let's loop through Line1csv to find the index
                Dim found As Boolean = False
                Dim Indx As Integer = 0
                For Each l1csvstr In Line1csv
                    If l1csvstr = Csvlinetype Then
                        found = True
                        Exit For ' stop here so Indx keeps the position of the match
                    End If
                    Indx += 1
                Next
                If found = True Then
                    ' here we'd put it into our "sorted" array, which has the same size as the Line1csv array
                    ' the index used is the Indx variable from above
                End If
            Next
        Next
    End Sub
    Public Function AskmeWhatitis(ByVal instr As String) As String
        ' this function has a config file that tells it what to search for, how, and what a match means: is it a MAC, a date, or whatever
        Return "truth"
    End Function
End Class
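For comparison, here is a compact sketch of the same idea, written in Python only to keep it short. The rule names and regular expressions are assumptions standing in for the config file described above, and the date rule assumes a yyyy-MM-dd format that the real logs may not use.

```python
import re

# Hypothetical detection rules: column name -> pattern. A real tool would
# load these from the config file described in the question.
RULES = {
    "MAC": re.compile(r"^(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}$"),
    "IP": re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$"),
    "Date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),  # assumed date format
}


def classify(field):
    """Return the name of the first rule that matches, or None."""
    value = field.strip()
    for name, pattern in RULES.items():
        if pattern.match(value):
            return name
    return None


def realign(header, fields):
    """Place each recognized field under the header column of its type."""
    row = [""] * len(header)
    for field in fields:
        name = classify(field)
        if name in header:
            row[header.index(name)] = field.strip()
    return row


header = ["Date", "MAC"]
print(realign(header, ["00:1A:2B:3C:4D:5E", "2019-03-01"]))
```

Fields that match no rule, or whose type is not in the header, are silently dropped here; a real reader would probably want to collect them in a catch-all column. Free-text columns such as Details cannot be detected by pattern and would need to be assigned by elimination.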

Related

How can I use C# to quickly write an array of "transactions" to Excel?

I have an array of "transactions". My array looks like:
1, 12/15/18, 125.00, "phone bill"
2, 01/16/19, 37.25, "supplies"
3, 02/28/19, -50.00, "refund"
Write Array to Excel Range has good sample code that uses the line
[Excel]range.Value = arr
This works great if all my data types are the same. But they are not.
In Visual Basic (VB), it is easy to create an array of variants, and use the VB command
Worksheet.Range(UpperLeftCell).Resize(NumOfRangeRows, NumOfRangeColumns).Value = myArray
But I cannot find an equivalent in C#. Also I don't see how it is possible to create an array of variants in C#.
The best solution I can come up with is to split my array of transactions into 4 arrays: 1 for an integer, 1 for the date, 1 for the amount, and 1 for the description.
I could easily write each value individually to an Excel cell, one at a time, but this is actually quite slow (I have thousands of transactions).
I am hoping someone can suggest a better way.
I have tried the code
Excel.Range range = arrayOfTrans;
but I get
System.NotSupportedException: 'Operation is not supported. (0x80131515)'

How to compare an array loaded from file with another array loaded from another file c#

I have to write a C# Forms program that loads a file which looks something like this:
100ACTGGCTTACACTAATCAAG
101TTAAGGCACAGAAGTTTCCA
102ATGGTATAAACCAGAAGTCT
...
120GCATCAGTACGTACCCGTAC
That is 20 lines, each made up of a number (ID) and 20 letters (DNA); the other file looks like this:
TGCAACGTGTACTATGGACC
In a few words, this is a game in which a murder has been committed and there are 20 people; I have to load and split the letters, compare them, and in the end find the best match.
I have no idea how to do that. I don't know how to load the letters into an array, how to split them, or how to compare them.
What you want to do here is use something like the Levenshtein distance between the strings.
In simple terms, it counts how many single-letter edits (insertions, deletions or substitutions) you need for one string to become equal to another. In the context of DNA or proteins, this can be interpreted as the number of mutations between two individuals or samples. A shorter distance therefore indicates a closer relationship between the two.
The algorithm can be fairly heavy computationally, but will give you a good answer. It's also quite fun and enlightening to implement. You can find a couple of ways of implementing it in the Wikipedia article.
If you find it challenging to understand how it works, I recommend you set up an example grid by hand, with one short string horizontally along the top and one vertically along the left side, and go through the calculations manually, just to understand the concept properly (it can be confusing at first, but is really not that difficult).
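As an illustration of the grid described above, here is a minimal sketch of the standard dynamic-programming formulation. It is in Python only for brevity and ports directly to C#. It keeps just two rows of the grid in memory at a time.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions."""
    # previous[j] is the distance between the first i-1 letters of a
    # and the first j letters of b (one row of the hand-drawn grid).
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

For the game in the question, you would compute the distance between the evidence string and each suspect's DNA and pick the smallest one.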
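This is a simple match function. It might not have the complexity your game requires. This solution does not require an explicit split of the strings in order to get an array of DNA "letters". The DNA is compared in place.
Compare each "suspect" entry to the "evidence" one.
// Needs: using System.Collections.Generic; using System.IO; using System.Linq;
int idLength = 3; // number of ID characters in front of each suspect's DNA
// The file names below are placeholders; use your own paths.
string evidence = File.ReadAllText("evidence.txt").Trim();
List<string> suspects = File.ReadAllLines("suspects.txt").ToList();
List<double> matchScores = new List<double>();
foreach (string suspect in suspects)
{
    int count = 0;
    // walk the evidence and compare it with the suspect's DNA, which starts after the ID
    for (int i = 0; i < evidence.Length && i + idLength < suspect.Length; i++)
    {
        if (suspect[i + idLength] == evidence[i]) count++;
    }
    // 100.0 forces floating-point division, so the percentage is not truncated
    matchScores.Add(count * 100.0 / evidence.Length);
}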
The matchScores list now contains all the individual match scores. I did not save the maximum match score in a separate variable, as there can be several "suspects" with the same score. To find out which suspect has the best match, just iterate over the matchScores list. The index of the best match is the index of the suspect in the suspects list.
Optimization notes:
- you could check each "suspect" string to see where (i.e. at what index) the DNA sequence starts, as it could be variable;
- a dictionary could be used here instead of two lists, with the "suspect" string as key and the match score as value.
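The dictionary variant from the notes above could look roughly like this. It is a sketch in Python for brevity; it assumes a fixed 3-character ID and uses the ID rather than the whole line as the key.

```python
ID_LENGTH = 3  # assumed: every suspect line starts with a 3-character ID


def match_scores(suspect_lines, evidence):
    """Map each suspect ID to the percentage of positions matching the evidence."""
    scores = {}
    for line in suspect_lines:
        suspect_id, dna = line[:ID_LENGTH], line[ID_LENGTH:]
        # zip stops at the shorter string, so no index can run out of range
        hits = sum(1 for s, e in zip(dna, evidence) if s == e)
        scores[suspect_id] = 100.0 * hits / len(evidence)
    return scores


suspects = ["100ACTGGCTTACACTAATCAAG", "101TTAAGGCACAGAAGTTTCCA"]
evidence = "TGCAACGTGTACTATGGACC"
scores = match_scores(suspects, evidence)
best = max(scores.values())
# Several suspects can share the best score, so collect all of them.
print([sid for sid, score in scores.items() if score == best])
```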

How to generate a File

I want to generate a .txt file when the user clicks a button, but I am not sure how to do it. I would like it to use this format:
00000012 <-- at the very start of the text file
2011 11 29 <-- Year 2011, Month 11, Day 29 (the year month and day based on my PC)
0010 or 0054 <-- one of these 2 numbers, picked at random
123456 <-- Time, Hour 12, Minutes 34, Seconds 56.
so when I open the text file it should be something like this:
00000012201111290054123456
I am new to C#. I was able to accomplish this in Visual Basic with this:
Public Class Form1
    'declare new random object
    Dim r As New Random
    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        'declare string + set initial value
        Dim fileString As String = "00000012"
        'add formatted date
        fileString &= Now.ToString("yyyyMMdd")
        'add random 0010 or 0054
        fileString &= New String() {"0010", "0054"}(r.Next(0, 2))
        'add random 6 digit number
        fileString &= r.Next(0, 1000000).ToString("000000")
        'write file
        IO.File.WriteAllText("C:\Program Files\Code\maincode\maincode.txt", fileString)
    End Sub
End Class
I have decided to replace the random 6-digit number at the end with the time. How do I do this in C#?
The easiest way to solve your problem would be to use String.Format, which works pretty much the same way in VB.NET as it does in C#.
Before getting to that, though, it's worth addressing some issues with your VB.NET code; fixing these will make your C# code a lot better as well:
- Your Random instance, r, is declared at the form (module) scope, but you only use it inside your Form1_Load procedure: it's a good idea to always keep your variable scope as limited as possible, in other words keep your declarations "as close to" the usage as you can;
- While on the subject of System.Random: do keep in mind that this actually creates a quite predictable (time-based) series of numbers, so it shouldn't be used for anything security-related (that's what System.Security.Cryptography.RNGCryptoServiceProvider is for). Your usage of Random isn't very demanding anyway, and I'm not sure of your actual use case, but this is always something to keep in mind;
- You use comments that almost literally describe what your code does, which is rather pointless: only add comments to give additional insight into why your code does the things it does, not how;
- Strings in .NET are immutable, which means that each time you use the & operator, you create a new string, leaving the old one to linger around until garbage collection kicks in. That gets really expensive after a while: instead, learn about Text.StringBuilder and use it whenever needed;
- You create a file under C:\Program Files. Apart from the fact that that directory may have a different name on different versions of Windows (not to mention Linux, in case you're running on Mono), users don't have permission to write there on any non-legacy (i.e. post-XP) versions of Windows. So, if you do this, your program will crash when distributed: if you learn to use the proper file locations early on, that will save you lots of trouble.
Anyway, on to your question: whenever you want to create a string with lots of parameters and/or formatting, String.Format and its custom format strings come in extremely handy. For example, your original VB.NET code can be rewritten as follows:
Sub Form1_Load(s As Object, e As EventArgs)
    Using sw As New IO.StreamWriter(IO.Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData),
            "maincode.txt"))
        sw.WriteLine("00000012{0:yyyyMMdd}{1}{0:HHmmss}", Now,
                     If((New Random).Next(0, 2) = 0, "0010", "0054"))
    End Using
End Sub
When executed, this should create C:\Users\YourUserName\AppData\Roaming\maincode.txt (or somewhere else if you're on XP or certain localized versions of Windows: check the value of Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData) in the debugger to find out).
Translating this to C# is trivial: if you use one of the many VB.NET to C# translators available online, it should give you something like this:
public void Form1_Load(object s, EventArgs e)
{
    // DateTime.Now is used here; translators tend to emit the VB-only DateAndTime.Now,
    // which needs a reference to Microsoft.VisualBasic
    using (System.IO.StreamWriter sw = new System.IO.StreamWriter(System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData), "maincode.txt")))
    {
        sw.WriteLine("00000012{0:yyyyMMdd}{1}{0:HHmmss}", DateTime.Now, (new Random()).Next(0, 2) == 0 ? "0010" : "0054");
    }
}
Basically, it added some semicolons and curly braces for you, included fully-qualified namespaces and rewrote the ternary condition: the .NET Framework bits are identical.
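If you want to double-check the field layout independently of .NET, the same record can be assembled in a few lines. This is a sketch in Python; build_record is a made-up helper name, and the strftime codes %Y%m%d and %H%M%S play the role of the yyyyMMdd and HHmmss format strings.

```python
import random
from datetime import datetime


def build_record(now, rng=random):
    """Fixed prefix + yyyyMMdd + one of two codes + HHmmss, 26 characters in total."""
    code = rng.choice(["0010", "0054"])
    return "00000012" + now.strftime("%Y%m%d") + code + now.strftime("%H%M%S")


print(build_record(datetime.now()))
```

Passing the timestamp in as a parameter, rather than reading the clock inside the function, makes the layout easy to check against the sample from the question.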
Hope this helps, and good luck with your programming efforts!
Start with creating text files with C#, then go through this table or this one and rewrite your code in C# to build the information as described in your question.

C# / VisualStudio: Sorting attributes for consistency - any hints?

I have a bit of a silly issue:
I have a large number of unit tests which all have method attributes like this:
[TestMethod]
[Owner("me")]
[Description("It tests something.")]
[TestProperty(TC.Name, "Some Test")]
[TestProperty(TC.Requirement, "req203")]
[TestProperty(TC.Reviewer, "someguy")]
[TestProperty(TC.Environment, "MSTest")]
[TestProperty(TC.CreationDate, "24.01.2012")]
[TestProperty(TC.InternalTcId, "{9221A494-2B31-479D-ADE6-D4773C2A9B08}")]
public void TestSomething()
{ ... }
(If you're wondering: these attributes are used for automated testing and requirement coverage stuff.. )
Now, unfortunately these attributes are in a different order on most test methods, which makes them a bit messy to review and such. So I'm looking for a way to order them.
Do you know of any way other than rearranging them manually?
(I thought about writing a VS plugin or so.) I'm just wondering whether I'm really the first person with that wish.
Open up the Macro Explorer - and paste this code into a module (It's straight from my own little collection of macros):
Sub Sort()
    Dim selection As EnvDTE.TextSelection = DTE.ActiveDocument.Selection
    ' OrElse short-circuits, so selection.Text is never read when selection is Nothing
    If selection Is Nothing OrElse String.IsNullOrWhiteSpace(selection.Text) Then
        Exit Sub
    End If
    Dim lines As String() = selection.Text.Split(vbCrLf.ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
    If lines.Length <= 1 Then Exit Sub
    lines = lines.OrderBy(Function(s As String) s, StringComparer.CurrentCulture).ToArray()
    DTE.UndoContext.Open("Sort Lines")
    selection.Insert(String.Join(vbCrLf, lines))
    selection.SmartFormat()
    DTE.UndoContext.Close()
    DTE.StatusBar.Text = "Sort Lines complete"
    selection.SmartFormat()
End Sub
(I just edited it, as the Try/End Try wasn't really right, so I've taken it out.)
Now you can bind a shortcut to this macro in VS. It uses a LINQ OrderBy with the current culture's string comparer to sort the lines of the currently selected block of text. It should therefore group the attributes together accordingly.
If you need something that is context-sensitive (e.g. the same attribute being called with different numbers of parameters), then you'll need to do considerably more work.
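To give an idea of what a slightly more context-aware ordering could look like, here is a sketch (in Python for brevity) that sorts by attribute name first and by the rest of the line second. The PREFERRED list is an assumption; adjust it to whatever order your team wants.

```python
import re

# Assumed preferred order of attribute names; anything not listed goes last.
PREFERRED = ["TestMethod", "Owner", "Description", "TestProperty"]


def sort_key(line):
    """Order by attribute name first, then by the full text of the line."""
    match = re.match(r"\s*\[(\w+)", line)
    name = match.group(1) if match else ""
    rank = PREFERRED.index(name) if name in PREFERRED else len(PREFERRED)
    return (rank, line.strip())


def sort_attributes(lines):
    return sorted(lines, key=sort_key)


block = [
    '[TestProperty(TC.Reviewer, "someguy")]',
    '[Owner("me")]',
    '[TestProperty(TC.Name, "Some Test")]',
    '[TestMethod]',
]
print("\n".join(sort_attributes(block)))
```

The same two-part key can be expressed in the macro with OrderBy followed by ThenBy.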
You are the first person with that wish :)
I would arrange them manually, but if you are looking for a more solid solution, I would implement a property int Index { get; set; } in the TestPropertyAttribute class and use it to set the order in which I want them processed. That way, you can control the order in which the attributes are read in the reflection code that reads them.
This is how NHibernate does it.
[TestProperty(TC.Name, "Some Test", 0)]
[TestProperty(TC.Requirement, "req203", 1)]

How can i get the between cell addresses

I have a function which accepts a fromRange and a toRange for Excel cells. Basically, I want to read the values cell by cell from that range.
Suppose I pass E2 and E9: I want to read in a loop something like Range(E2).Value, Range(E3).Value and so on until E9.
How can I get the cell addresses in between? Please help.
Option Explicit
Private Sub calculateRangeOneByOne()
    Dim rangeIterator As Range
    Dim rangeToIterate As Range
    Dim sum As Double
    Set rangeToIterate = Range("A8", "E8")
    sum = 0#
    ' each rangeIterator is a single cell of the range
    For Each rangeIterator In rangeToIterate
        sum = sum + rangeIterator.Value
    Next
End Sub
You usually do not want to iterate over ranges cell by cell. There are tons of functions that work on whole ranges, so this example is definitely a poor one; you'd be better off using e.g. Sum here. It is just meant to give you an idea: a range is a collection, and you can iterate over it with For Each. You can also use For with index access, but For Each is at least a bit less "pain".
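Inside Excel, For Each already hands you every cell, so you never need to build the addresses yourself. If you do need the address strings outside the Excel object model, they can be generated with a little arithmetic. This is a sketch in Python; the helper names are made up for illustration.

```python
import re


def column_number(letters):
    """Convert column letters to a 1-based number (A -> 1, Z -> 26, AA -> 27)."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def column_letters(number):
    """Inverse of column_number."""
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def cells_between(first, last):
    """List every address in the rectangle spanned by two A1-style addresses."""
    c1, r1 = re.fullmatch(r"([A-Za-z]+)(\d+)", first).groups()
    c2, r2 = re.fullmatch(r"([A-Za-z]+)(\d+)", last).groups()
    cols = sorted((column_number(c1), column_number(c2)))
    rows = sorted((int(r1), int(r2)))
    return [f"{column_letters(c)}{r}"
            for r in range(rows[0], rows[1] + 1)
            for c in range(cols[0], cols[1] + 1)]


print(cells_between("E2", "E9"))
# ['E2', 'E3', 'E4', 'E5', 'E6', 'E7', 'E8', 'E9']
```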
