Using split() method without text qualifier - c#

I'm trying to read field values from a text file using a StreamReader.
To read each value, I'm using the Split() method. My separator is a colon ':' and my text format looks like:
Title: Mytitle
Manager: Him
Thema: Free
.....
Main Idea: best idea ever
.....
My problem is that when I try to get the first field, which is the title, I use:
string title = text.Split(':')[1];
I get title = MyTitle Manager
instead of just title = MyTitle.
Any suggestions would be nice.
My text looks like this:
My mail : ........................text............
Manager mail : ..................text.............
Entity :.......................text................
Project Title :...............text.................
Principal idea :...................................
Scope of the idea : .........text...................
........................text...........................
Description and detail :................text.......
..................text.....
Cost estimation :..........
........................text...........................
........................text...........................
........................text...........................
Advantage for us :.................................
.......................................................
Direct Manager IM :................................

Updated per your post
//I would create a class to use if you haven't
//Just cleaner and easier to read
public class Entry
{
public string MyMail { get; set; }
public string ManagerMail { get; set; }
public string Entity { get; set; }
public string ProjectTitle { get; set; }
// ......etc
}
//in case your format location ever changes only change the index value here
public enum EntryLocation
{
MyMail = 0,
ManagerMail = 1,
Entity = 2,
ProjectTitle = 3
}
//return the entry
private Entry ReadEntry()
{
string s =
string.Format("My mail: test@test.com{0}Manager mail: test2@test2.com{0}Entity: test entity{0}Project Title: test project title", Environment.NewLine);
//in case you change your delimiter only need to change it once here
char delimiter = ':';
//your entry contains newline so lets split on that first
string[] split = s.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
//populate the entry
Entry entry = new Entry()
{
//use the enum makes it cleaner to read what value you are pulling
MyMail = split[(int)EntryLocation.MyMail].Split(delimiter)[1].Trim(),
ManagerMail = split[(int)EntryLocation.ManagerMail].Split(delimiter)[1].Trim(),
Entity = split[(int)EntryLocation.Entity].Split(delimiter)[1].Trim(),
ProjectTitle = split[(int)EntryLocation.ProjectTitle].Split(delimiter)[1].Trim()
};
return entry;
}

That is because split returns strings delimited by the sign you've specified. In your case:
Title
Mytitle Manager
Him
1. You can change your data format to get the value you need, for example:
Title: Mytitle:Manager: Him
Then every second element will be a value:
text.Split(':')[1] == " Mytitle";
text.Split(':')[3] == " Him";
2. Or you can call text.Split(' ', ':') to get the same list of name-value pairs without changing the format.
3. Also, if each entry is on its own line in the file, like:
Title: Mytitle
Manager: Him
and your content is read into a single string, then you can also do:
text.Split(new string[] {Environment.NewLine, ":"}, StringSplitOptions.None);
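The line-by-line idea in option 3 can be taken one step further by splitting each line on its first colon only, so that a value containing ':' is not cut apart. This is a minimal sketch, not the asker's code: the class and method names (KeyValueParser, Parse) are made up for illustration, and it assumes every field fits on one line, so continuation lines without a colon are simply skipped.

```csharp
using System;
using System.Collections.Generic;

public static class KeyValueParser
{
    // Splits the text into lines, then splits each line on its FIRST colon only.
    public static Dictionary<string, string> Parse(string text)
    {
        var fields = new Dictionary<string, string>();
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            // The count argument (2) limits the result to at most two pieces.
            string[] parts = line.Split(new[] { ':' }, 2);
            if (parts.Length == 2)
            {
                fields[parts[0].Trim()] = parts[1].Trim();
            }
        }
        return fields;
    }

    public static void Main()
    {
        var fields = Parse("Title: Mytitle\nManager: Him\nThema: Free");
        Console.WriteLine(fields["Title"]);   // Mytitle
        Console.WriteLine(fields["Manager"]); // Him
    }
}
```

Looking fields up by name also avoids depending on the order of the lines in the file.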


C# reading a string from text file, splitting and putting into tabstring

I got stuck writing a simple program which writes some data to a text file and reads it back from that file later.
I have a function that writes lines to a txt file; each line contains Name, Surname, and Idnumber.
And below I have a function that reads the data from that file.
I want to separate Name, Surname and Idnumber, so the code below seems to be correct, but during debugging I got the message "An unhandled exception of type 'System.NullReferenceException' occurred" for this line:
string[] tabstring = myString.Split(' ', ' ');
I created the tabstring array, which contains 3 elements, one for each word in the line, i.e. tabstring[0]=Name and so on.
The while loop is to do it for each line in the text file. But something is wrong.
public void ReadFromFile()
{
FileStream fsListOfObjects = new FileStream("C:\\Users\\Dom\\Desktop\\ListOfObjects.txt",
FileMode.Open);
StreamReader srListOfObjects = new StreamReader(fsListOfObjects);
while (srListOfObjects.ReadLine() != null)
{
string myString = srListOfObjects.ReadLine();
Console.WriteLine(myString);
string[] tabstring = myString.Split(' ', ' ');
Name = tabstring[0];
Surname = tabstring[1];
Id= long.Parse(tabstring[2]);
ClassName obj = new ClassName(Name, Surname, Id);
myList.Add(obj);
}
srListOfObjects.Close();
Console.ReadLine();
}
And here is what the text file looks like:
Ann Brown 1233456789
Bruce Willis 098987875
Bill Gates 789678678
and so on...
I would appreciate your comments on the described problem.
while (srListOfObjects.ReadLine().. reads a line but doesn't save it into a variable. string myString= (srListOfObjects.ReadLine()) reads another line.
Use while (!srListOfObjects.EndOfStream) to check for the end of the stream: StreamReader.EndOfStream Property.
Also, it is a good idea to check that the correct number of parts of the string were obtained by the Split - it guards against things like lines with only whitespace.
Things like StreamReaders need to have .Dispose() called on them to clear up "unmanaged resources". An easy way to do that, which works even if an exception is thrown, is the using statement.
If you make the ReadFromFile method into a function instead of a void then you can avoid (no pun) using a global variable for the data. Global variables are not necessarily a problem, but it's usually good to avoid them.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
namespace ConsoleApp1
{
public class ClassName
{
public string Name { get; set; }
public string Surname { get; set; }
public long Id { get; set; }
}
class Program
{
public static List<ClassName> ReadFromFile(string fileName)
{
var result = new List<ClassName>();
using (var sr = new StreamReader(fileName))
{
while (!sr.EndOfStream)
{
string line = sr.ReadLine();
var parts = line.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
if (parts.Count() == 3)
{
result.Add(new ClassName
{
Name = parts[0],
Surname = parts[1],
Id = long.Parse(parts[2])
});
}
}
}
return result;
}
static void Main(string[] args)
{
string myFile = @"C:\temp\namesList.txt";
var theList = ReadFromFile(myFile);
foreach(var c in theList)
{
Console.WriteLine($"{c.Id} - {c.Surname}, {c.Name}");
}
Console.ReadLine();
}
}
}
outputs:
1233456789 - Brown, Ann
98987875 - Willis, Bruce
789678678 - Gates, Bill
Your problem is here:
while (srListOfObjects.ReadLine() != null)
{
string myString = srListOfObjects.ReadLine();
You are entering the loop on the condition that srListOfObjects.ReadLine() returns something other than null, but then you immediately read a new line from srListOfObjects and store the returned reference in myString. This has two obvious problems:
The second call to ReadLine can return null and you are not checking whether it does. The error you are getting is due to this.
You are losing information. You are ignoring the line you read when checking the while condition. Until your program crashes or runs to the end (depending on whether the input file has an even or odd number of lines), you will process only half of the data.
Update:
You should only read one line per iteration. One way to do it is declaring and initializing myString before entering the loop and updating it on every iteration:
var myString = srListOfObjects.ReadLine();
while (myString != null)
{
//do your stuff
myString = srListOfObjects.ReadLine();
}
https://learn.microsoft.com/en-us/dotnet/api/system.io.streamreader.readline?view=netcore-3.1
ReadLine() - Reads a line of characters from the current stream and returns the data as a string.
In your code you do a null check, but then call ReadLine again. Once the end of the file is reached, that second call returns null, and calling Split on it fails with the NullReferenceException.
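The "one read per iteration" rule is often written by assigning inside the loop condition. The sketch below shows that idiom under a few assumptions: the names LineReaderDemo and ReadRecords are hypothetical, and a StringReader stands in for the file so the example runs without touching the disk (a StreamReader can be passed in the same way, since both derive from TextReader).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineReaderDemo
{
    // Reads every line exactly once; the value tested for null is the value used.
    public static List<string[]> ReadRecords(TextReader reader)
    {
        var records = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length == 3) // skip blank or malformed lines
            {
                records.Add(parts);
            }
        }
        return records;
    }

    public static void Main()
    {
        string sample = "Ann Brown 1233456789\nBruce Willis 098987875\nBill Gates 789678678";
        using (var reader = new StringReader(sample))
        {
            foreach (string[] r in ReadRecords(reader))
            {
                Console.WriteLine(r[1] + ", " + r[0] + " (" + long.Parse(r[2]) + ")");
            }
        }
    }
}
```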

Parsing a string using command Line parser

I downloaded this package https://github.com/commandlineparser/commandline
and I wanted to perform parsing for strings like
string str = "file:xxxx\\xxxx\\xxxxx.sh val:-a nsdd m";
so
file = xxxx\\xxxx\\xxxxx.sh
val = -a nsdd m
I wanted to know if anyone had a library in mind or has used the specified library to obtain the parameters specified in the string.
I am having a hard time understanding the example on how to parse that string and obtain the file parameter and val parameter. I know I could do string manipulation, but I would rather use an existing, tested, durable solution for this.
I've used this library and it's a solid choice.
Here's a very basic sample using some of what you posted, see code comments for clarification.
class Program
{
static void Main(string[] args)
{
// args is a space-separated array, so you should use an array for your test
// args are identified with the `-` so you should set args like `-f somefilenamehere`
// args specified are -f and -v
string[] arguments = new[] {"-f file:xxxx\\xxxx\\xxxxx.sh", "-v nsdd" };
string file = string.Empty;
string value = string.Empty;
// you would pull your args off the options, if they are successfully parsed
// and map them to your applications properties/settings
Parser.Default.ParseArguments<Options>(arguments)
.WithParsed<Options>(o =>
{
file = o.InputFile; // map InputFile arg to file property
value = o.Value; // map Value arg to value property
});
Console.WriteLine($"file = {file}");
Console.WriteLine($"value = {value}");
Console.ReadLine();
// output:
// file = file:xxxx\xxxx\xxxxx.sh
// value = nsdd
}
}
// the options class is used to define your arg tokens and map them to the Options property
class Options
{
[Option('f', "file", Required = true, HelpText = "Input files to be processed.")]
public string InputFile { get; set; }
[Option('v', "value", Required = true, HelpText = "Value to be used")]
public string Value { get; set; }
}

Parsing log file, ambiguous delimiter

I have to parse a log file and am not sure how best to extract the different pieces of each line. The problem I am facing is that the original developer used ':' to delimit tokens, which was a poor choice since the line contains a timestamp which itself contains ':'!
A sample line looks something like this:
transaction_date_time:[systemid]:sending_system:receiving_system:data_length:data:[ws_name]
2019-05-08 15:03:13:494|2019-05-08 15:03:13:398:[192.168.1.2]:ABC:DEF:67:cd71f7d9a546ec2b32b,AACN90012001000012,OPNG:[WebService.SomeName.WebServiceModule::WebServiceName]
I have no problem reading the log file and accessing each line, but I'm not sure how to get the pieces parsed.
Since the input string cannot be split cleanly, because the delimiter character is also part of the content, a simple regular expression can be used instead.
Simple but probably fast enough, even with the default settings.
The different parts of the input string can be separated with these capturing groups:
string pattern = @"^(.*?)\|(.*?):\[(.*?)\]:(.*?):(.*?):(\d+):(.*?):\[(.*)\]$";
This will give you 8 groups + 1 (Group[0]) which contains the whole string.
Using the Regex class, simply pass a string to parse (named line, here) and the regex (named pattern) to the Match() method, using default settings:
var result = Regex.Match(line, pattern);
The Groups.Value property returns the result of each capturing group. For example, the two dates:
var dateEnd = DateTime.ParseExact(result.Groups[1].Value, "yyyy-MM-dd HH:mm:ss:fff", CultureInfo.InvariantCulture);
var dateStart = DateTime.ParseExact(result.Groups[2].Value, "yyyy-MM-dd HH:mm:ss:fff", CultureInfo.InvariantCulture);
The IpAddress is extracted with: \[(.*?)\].
You could give a name to this grouping, so it's more clear what the value refers to. Simply add a string, prefixed with ? and enclosed in <> or single quotes ' to name the grouping:
...\[(?<IpAddress>.*?)\]...
Note, however, that naming a group will modify the Regex.Groups indexing: the un-named groups will be inserted first, the named groups after. So, naming only the IpAddress group will cause it to become the last item, Groups[8]. Of course you can name all the groups and the indexing will be preserved.
var hostAddress = IPAddress.Parse(result.Groups["IpAddress"].Value);
This pattern should allow a mid-range machine to parse 130,000~150,000 strings per second.
You'll have to test it to find the perfect pattern. For example, the first match (corresponding to the first date): (.*?)\|, is much faster if non-greedy (using the *? lazy quantifier). The opposite holds for the last match: \[(.*)\]. The pattern used by jdweng is even faster than the one used here.
See Regex101 for a detailed description on the use and meaning of each token.
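Putting the pieces of this answer together, here is a sketch that names every group and parses both timestamps. The class name LogLineParser and the group names are assumptions chosen for illustration; the "end first, start second" ordering follows the variable names used above. Note the timestamp format: HH is the 24-hour hour and fff is milliseconds.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class LogLineParser
{
    // Same structure as the pattern above, with every group named.
    private static readonly Regex LinePattern = new Regex(
        @"^(?<end>.*?)\|(?<start>.*?):\[(?<ip>.*?)\]:(?<sender>.*?):(?<receiver>.*?)" +
        @":(?<length>\d+):(?<data>.*?):\[(?<ws>.*)\]$");

    private const string TimestampFormat = "yyyy-MM-dd HH:mm:ss:fff";

    public static Match Parse(string line)
    {
        return LinePattern.Match(line);
    }

    public static DateTime ParseTimestamp(string value)
    {
        return DateTime.ParseExact(value, TimestampFormat, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        string line = "2019-05-08 15:03:13:494|2019-05-08 15:03:13:398:[192.168.1.2]:ABC:DEF:67:" +
                      "cd71f7d9a546ec2b32b,AACN90012001000012,OPNG:" +
                      "[WebService.SomeName.WebServiceModule::WebServiceName]";
        Match m = Parse(line);
        if (m.Success)
        {
            Console.WriteLine(ParseTimestamp(m.Groups["start"].Value).ToString("o"));
            Console.WriteLine(m.Groups["ip"].Value);
            Console.WriteLine(m.Groups["ws"].Value);
        }
    }
}
```

Checking m.Success before reading the groups guards against header rows or malformed lines.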
Using Regex I was able to parse everything. It looks like the data came from Excel, because the fraction of seconds has a colon instead of a period. C# does not like the colon, so I had to replace the colon with a period. I also parsed from right to left to get around the colon issues.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;
namespace ConsoleApplication3
{
class Program1
{
const string FILENAME = @"c:\temp\test.txt";
static void Main(string[] args)
{
string line = "";
int rowCount = 0;
StreamReader reader = new StreamReader(FILENAME);
string pattern = @"^(?'time'.*):\[(?'systemid'[^\]]+)\]:(?'sending'[^:]+):(?'receiving'[^:]+):(?'length'[^:]+):(?'data'[^:]+):\[(?'ws_name'[^\]]+)\]";
while ((line = reader.ReadLine()) != null)
{
line = line.Trim();
if (line.Length > 0)
{
if (++rowCount != 1) //skip header row
{
Log_Data newRow = new Log_Data();
Log_Data.logData.Add(newRow);
Match match = Regex.Match(line, pattern, RegexOptions.RightToLeft);
newRow.ws_name = match.Groups["ws_name"].Value;
newRow.data = match.Groups["data"].Value;
newRow.length = int.Parse(match.Groups["length"].Value);
newRow.receiving_system = match.Groups["receiving"].Value;
newRow.sending_system = match.Groups["sending"].Value;
newRow.systemid = match.Groups["systemid"].Value;
//end date is first, then start date is second
string[] date = match.Groups["time"].Value.Split(new char[] {'|'}).ToArray();
string replacePattern = @"(?'leader'.+):(?'trailer'\d+)";
string stringDate = Regex.Replace(date[1], replacePattern, "${leader}.${trailer}", RegexOptions.RightToLeft);
newRow.startDate = DateTime.Parse(stringDate);
stringDate = Regex.Replace(date[0], replacePattern, "${leader}.${trailer}", RegexOptions.RightToLeft);
newRow.endDate = DateTime.Parse(stringDate );
}
}
}
}
}
public class Log_Data
{
public static List<Log_Data> logData = new List<Log_Data>();
public DateTime startDate { get; set; } //transaction_date_time:[systemid]:sending_system:receiving_system:data_length:data:[ws_name]
public DateTime endDate { get; set; }
public string systemid { get; set; }
public string sending_system { get; set; }
public string receiving_system { get; set; }
public int length { get; set; }
public string data { get; set; }
public string ws_name { get; set; }
}
}

How to append newline before every occurrence of time stamp in string property?

I have a list of records, each with a string property called Actions. Within each Actions string there are multiple text entries, each starting with a timestamp, like below:
05/10/2016 15:23:42- UTC--test
05/10/2016 16:07:04- UTC--test
05/10/2016 16:33:54- UTC--test
06/10/2016 08:24:52- UTC--test
What I'd like to do is insert a newline \n character before each timestamp in the string property.
So I looped through each record in the list, then tried to modify each string property by adding a newline to each timestamp. But I'm not sure how to get the timestamp value in the string to perform the replace:
//Not sure how to find the instance of timestamp in the string
foreach (var record in escList)
{
record.Actions = record.Actions.Replace("timestamp_text_string","\n" + "timestamp_text_value");
}
I was thinking of using a regex to match every string matching a timestamp pattern, but not sure if the regex works in this context:
string pattern = @"\[[0-9]:[0-9]{1,2}:[0-9]{1,2}\]"; //timestamp pattern
record.Actions = record.Actions.Replace(pattern,"\n" + pattern);
How can you append a newline before every occurrence of time stamp in string property?
The desired result is that for every entry in the string property, i.e, 05/10/2016 15:23:42- UTC--test there would be a new line added before that portion of the string. Giving the following output:
05/10/2016 15:23:42- UTC--test
05/10/2016 16:07:04- UTC--test
05/10/2016 16:33:54- UTC--test
06/10/2016 08:24:52- UTC--test
Use Split:
List<string> result=new List<string>();
foreach (var record in escList)
{
result.Add(record.Actions.Replace(record.Actions.Split(' ')[1], "\n" + record.Actions.Split(' ')[1]));
}
Not sure If I understood your desired result correctly, but I think performance wise you would be interested in using a StringBuilder instead of a List. Here's a sample I made:
class Program
{
static void Main(string[] args)
{
string action1 = "05/10/2016 15:23:42- UTC--test";
string action2 = "05/10/2016 16:07:04- UTC--test";
string action3 = "05/10/2016 16:33:54- UTC--test";
string action4 = "06/10/2016 08:24:52- UTC--test";
List<string> sample_actions = new List<string>() { action1, action2, action3, action4 };
Record rec = new Record();
foreach (string sample_action in sample_actions)
{
rec.Actions.AppendLine(sample_action).AppendLine();
}
}
}
class Record
{
public StringBuilder Actions { get; set; }
public Record()
{
Actions = new StringBuilder();
}
}
Edited to match your needs
Assuming actions has at least one element:
var spacedActions = actions.Take(1).Concat(actions.Skip(1).Select(a => $"\n{a}"));
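The regex idea from the question can also be made to work: string.Replace treats the pattern as literal text, whereas Regex.Replace actually matches it. The following is a sketch only. The class name TimestampBreaker is hypothetical, and the pattern assumes timestamps always look like 05/10/2016 15:23:42 (two-digit day and month, four-digit year, 24-hour time).

```csharp
using System;
using System.Text.RegularExpressions;

public static class TimestampBreaker
{
    // Matches timestamps shaped like 05/10/2016 15:23:42.
    private static readonly Regex Timestamp =
        new Regex(@"\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}");

    // Puts "\n" in front of every timestamp except one at the very start of the string.
    public static string InsertNewlines(string actions)
    {
        return Timestamp.Replace(actions, m => m.Index == 0 ? m.Value : "\n" + m.Value);
    }

    public static void Main()
    {
        string actions = "05/10/2016 15:23:42- UTC--test 05/10/2016 16:07:04- UTC--test";
        Console.WriteLine(InsertNewlines(actions));
    }
}
```

In the loop from the question this would be used as record.Actions = TimestampBreaker.InsertNewlines(record.Actions);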

Set String.Format at runtime

I have an XML File that I want to allow the end user to set the format of a string.
ex:
<Viewdata>
<Format>{0} - {1}</Format>
<Parm>Name(property of obj being formatted)</Parm>
<Parm>Phone</Parm>
</Viewdata>
So at runtime I would somehow convert that to a String.Format("{0} - {1}", usr.Name, usr.Phone);
Is this even possible?
Of course. Format strings are just that, strings.
string fmt = "{0} - {1}"; // get this from your XML somehow
string name = "Chris";
string phone = "1234567";
string name_with_phone = String.Format(fmt, name, phone);
Just be careful with it, because your end user might be able to disrupt the program with a malformed format string. Do not forget to catch FormatException.
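As a sketch of that precaution, the wrapper below catches FormatException and returns a fallback string when the user-supplied format is malformed or refers to an argument index that was not supplied. The names SafeFormatter and FormatOrFallback are made up for this example.

```csharp
using System;

public static class SafeFormatter
{
    // Returns the formatted string, or the fallback if the format string is invalid.
    public static string FormatOrFallback(string format, string fallback, params object[] args)
    {
        try
        {
            return string.Format(format, args);
        }
        catch (FormatException)
        {
            // Thrown for malformed braces or an index beyond the supplied arguments.
            return fallback;
        }
    }

    public static void Main()
    {
        Console.WriteLine(FormatOrFallback("{0} - {1}", "(bad format)", "Chris", "1234567"));
        Console.WriteLine(FormatOrFallback("{0} - {2}", "(bad format)", "Chris", "1234567"));
    }
}
```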
I agree with the other posters who say you probably shouldn't be doing this but that doesn't mean we can't have fun with this interesting question. So first of all, this solution is half-baked/rough but it's a good start if someone wanted to build it out.
I wrote it in LinqPad which I love so Dump() can be replaced with console writelines.
void Main()
{
XElement root = XElement.Parse(
@"<Viewdata>
<Format>{0} | {1}</Format>
<Parm>Name</Parm>
<Parm>Phone</Parm>
</Viewdata>");
var formatter = root.Descendants("Format").FirstOrDefault().Value;
var parms = root.Descendants("Parm").Select(x => x.Value).ToArray();
Person person = new Person { Name = "Jack", Phone = "(123)456-7890" };
string formatted = MagicFormatter<Person>(person, formatter, parms);
formatted.Dump();
/// OUTPUT ///
/// Jack | (123)456-7890
}
public string MagicFormatter<T>(T theobj, string formatter, params string[] propertyNames)
{
for (var index = 0; index < propertyNames.Length; index++)
{
PropertyInfo property = typeof(T).GetProperty(propertyNames[index]);
propertyNames[index] = (string)property.GetValue(theobj);
}
return string.Format(formatter, propertyNames);
}
public class Person
{
public string Name { get; set; }
public string Phone { get; set; }
}
XElement root = XElement.Parse (
@"<Viewdata>
<Format>{0} - {1}</Format>
<Parm>damith</Parm>
<Parm>071444444</Parm>
</Viewdata>");
var format =root.Descendants("Format").FirstOrDefault().Value;
var result = string.Format(format, root.Descendants("Parm")
.Select(x=>x.Value).ToArray());
What about specifying your format string with parameter names:
<Viewdata>
<Format>{Name} - {Phone}</Format>
</Viewdata>
Then with something like this:
http://www.codeproject.com/Articles/622309/Extended-string-Format
you can do the work.
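For illustration, named placeholders can also be filled in with a few lines of Regex.Replace. This sketch is not the linked article's implementation: NamedFormatter is a hypothetical name, values come from a plain dictionary rather than from object properties, and format specifiers such as {Name:X} are not supported.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedFormatter
{
    // Replaces each {Key} with values[Key]; unknown keys are left untouched.
    public static string Format(string template, IDictionary<string, string> values)
    {
        return Regex.Replace(template, @"\{(\w+)\}", m =>
        {
            string value;
            return values.TryGetValue(m.Groups[1].Value, out value) ? value : m.Value;
        });
    }

    public static void Main()
    {
        var values = new Dictionary<string, string>
        {
            { "Name", "Jack" },
            { "Phone", "(123)456-7890" }
        };
        Console.WriteLine(Format("{Name} - {Phone}", values));
    }
}
```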
The short answer is yes, but how difficult it will be depends on the variety of your formatting options.
If some format strings accept 5 parameters and others accept only 3, then you need to take that into account.
I'd go with parsing the XML for the params and storing them in an array of objects to pass to the String.Format function.
You can use System.Linq.Dynamic and make entire format command editable:
class Person
{
public string Name;
public string Phone;
public Person(string n, string p)
{
Name = n;
Phone = p;
}
}
static void TestDynamicLinq()
{
foreach (var x in new Person[] { new Person("Joe", "123") }.AsQueryable().Select("string.Format(\"{0} - {1}\", it.Name, it.Phone)"))
Console.WriteLine(x);
}
