Parse Text File Into Dictionary

Parse Text File Into Dictionary - c#

I have a text file that has several hundred configuration values. The general format of the configuration data is "Label:Value". Using C# .net, I would like to read these configurations, and use the Values in other portions of the code. My first thought is that I would use a string search to look for the Labels then parse out the values following the labels and add them to a dictionary, but this seems rather tedious considering the number of labels/values that I would have to search for. I am interested to hear some thoughts on a possible architecture to perform this task. I have included a small section of a sample text file that contains some of the labels and values (below). A couple of notes: The Values are not always numeric (as seen in the AUX Serial Number); For whatever reason, the text files were formatted using spaces (\s) rather than tabs (\t). Thanks in advance for any time you spend thinking about this.
Sample Text:
AUX Serial Number: 445P000023 AUX Hardware Rev: 1
Barometric Pressure Slope: -1.452153E-02
Barometric Pressure Intercept: 9.524336E+02

This is a nice little brain tickler. I think this code might be able to point you in the right direction. Keep in mind, this fills a Dictionary<string, string>, so there are no conversions of values into ints or the like. Also, please excuse the mess (and the poor naming conventions). It was a quick write-up based on my train of thought.
Dictionary<string, string> allTheThings = new Dictionary<string, string>();
public void ReadIt()
{
// Open the file into a streamreader
using (System.IO.StreamReader sr = new System.IO.StreamReader("text_path_here.txt"))
{
while (!sr.EndOfStream) // Keep reading until we get to the end
{
string splitMe = sr.ReadLine();
string[] bananaSplits = splitMe.Split(new char[] { ':' }); //Split at the colons
if (bananaSplits.Length < 2) // If we get less than 2 results, discard them
continue;
else if (bananaSplits.Length == 2) // Easy part. If there are 2 results, add them to the dictionary
allTheThings.Add(bananaSplits[0].Trim(), bananaSplits[1].Trim());
else if (bananaSplits.Length > 2)
SplitItGood(splitMe, allTheThings); // Hard part. If there are more than 2 results, use the method below.
}
}
}
public void SplitItGood(string stringInput, Dictionary<string, string> dictInput)
{
StringBuilder sb = new StringBuilder();
List<string> fish = new List<string>(); // This list will hold the keys and values as we find them
bool hasFirstValue = false;
foreach (char c in stringInput) // Iterate through each character in the input
{
if (c != ':') // Keep building the string until we reach a colon
sb.Append(c);
else if (c == ':' && !hasFirstValue)
{
fish.Add(sb.ToString().Trim());
sb.Clear();
hasFirstValue = true;
}
else if (c == ':' && hasFirstValue)
{
// Below, the StringBuilder currently has something like this:
// " 235235 Some Text Here"
// We trim the leading whitespace, then split at the first sign of a double space
string[] bananaSplit = sb.ToString()
.Trim()
.Split(new string[] { " " },
StringSplitOptions.RemoveEmptyEntries);
// Add both results to the list
fish.Add(bananaSplit[0].Trim());
fish.Add(bananaSplit[1].Trim());
sb.Clear();
}
}
fish.Add(sb.ToString().Trim()); // Add the last result to the list
for (int i = 0; i < fish.Count; i += 2)
{
// This for loop assumes that the amount of keys and values added together
// is an even number. If it comes out odd, then one of the lines on the input
// text file wasn't parsed correctly or wasn't generated correctly.
dictInput.Add(fish[i], fish[i + 1]);
}
}

So the only general approach that I can think of, given the format that you're limited to, is to first find the first colon on the line and take everything before it as the label. Skip all whilespace characters until you get to the first non-whitespace character. Take all non-whitespace characters as the value of the label. If there is a colon after the end of that value take everything after the end of the previous value to the colon as the next value and repeat. You'll also probably need to trim whitespace around the labels.
You might be able to capture that meaning with a regex, but it wouldn't likely be a pretty one if you could; I'd avoid it for something this complex unless you're entire development team is very proficient with them.

I would try something like this:
While string contains triple space, replace it with double space.
Replace all ": " and ": " (: with double space) with ":".
Replace all " " (double space) with '\n' (new line).
If line don't contain ':' than skip the line. Else, use string.Split(':'). This way you receive arrays of 2 strings (key and value). Some of them may contain empty characters at the beginning or at the end.
Use string.Trim() to get rid of those empty characters.
Add received key and value to Dictionary.
I am not sure if it solves all your cases but it's a general clue how I would try to do it.
If it works you could think about performance (use StringBuilder instead of string wherever it is possible etc.).

This is probably the dirtiest function I´ve ever written, but it works.
StreamReader reader = new StreamReader("c:/yourFile.txt");
Dictionary<string, string> yourDic = new Dictionary<string, string>();
StreamReader reader = new StreamReader("c:/yourFile.txt");
Dictionary<string, string> yourDic = new Dictionary<string, string>();
while (reader.Peek() >= 0)
{
string line = reader.ReadLine();
string[] data = line.Split(':');
if (line != String.Empty)
{
for (int i = 0; i < data.Length - 1; i++)
{
if (i != 0)
{
bool isPair;
if (i % 2 == 0)
{
isPair = true;
}
else
{
isPair = false;
}
if (isPair)
{
string keyOdd = data[i].Trim();
try { keyOdd = keyOdd.Substring(keyOdd.IndexOf(' ')).TrimStart(); }
catch { }
string valueOdd = data[i + 1].TrimStart();
try { valueOdd = valueOdd.Remove(valueOdd.IndexOf(' ')); } catch{}
yourDic.Add(keyOdd, valueOdd);
}
else
{
string keyPair = data[i].TrimStart();
keyPair = keyPair.Substring(keyPair.IndexOf(' ')).Trim();
string valuePair = data[i + 1].TrimStart();
try { valuePair = valuePair.Remove(valuePair.IndexOf(' ')); } catch { }
yourDic.Add(keyPair, valuePair);
}
}
else
{
string key = data[i].Trim();
string value = data[i + 1].TrimStart();
try { value = value.Remove(value.IndexOf(' ')); } catch{}
yourDic.Add(key, value);
}
}
}
}
How does it works?, well splitting the line you can know what you can get in every position of the array, so I just play with the even and odd values.
You will understand me when you debug this function :D. It fills the Dictionary that you need.

I have another idea. Does values contain spaces? If not you could do like this:
Ignore white spaces until you read some other char (first char of key).
Read string until ':' occures.
Trim key that you get.
Ignore white spaces until you read some other char (first char of value).
Read until you get empty char.
Trim value that you get.
If it is the end than stop. Else, go back to step 1.
Good luck.

Maybe something like this would work, be careful with the ':' character
StreamReader reader = new StreamReader("c:/yourFile.txt");
Dictionary<string, string> yourDic = new Dictionary<string, string>();
while (reader.Peek() >= 0)
{
string line = reader.ReadLine();
yourDic.Add(line.Split(':')[0], line.Split(':')[1]);
}
Anyway, I recommend to organize that file in some way that you´ll always know in what format it comes.

Related

How to contact whole text from file into the string avoiding empty lines beetwen strings

How to get whole text from document contacted into the string. I'm trying to split text by dot: string[] words = s.Split('.'); I want take this text from text document. But if my text document contains empty lines between strings, for example:
pat said, “i’ll keep this ring.”
she displayed the silver and jade wedding ring which, in another time track,
she and joe had picked out; this
much of the alternate world she had elected to retain. he wondered what - if any - legal basis she had kept in addition. none, he hoped; wisely, however, he said nothing. better not even to ask.
result looks like this:
1. pat said ill keep this ring
2. she displayed the silver and jade wedding ring which in another time track
3. she and joe had picked out this
4. much of the alternate world she had elected to retain
5. he wondered what if any legal basis she had kept in addition
6. none he hoped wisely however he said nothing
7. better not even to ask
but desired correct output should be like this:
1. pat said ill keep this ring
2. she displayed the silver and jade wedding ring which in another time track she and joe had picked out this much of the alternate world she had elected to retain
3. he wondered what if any legal basis she had kept in addition
4. none he hoped wisely however he said nothing
5. better not even to ask
So to do this first I need to process text file content to get whole text as single string, like this:
pat said, “i’ll keep this ring.” she displayed the silver and jade wedding ring which, in another time track, she and joe had picked out; this much of the alternate world she had elected to retain. he wondered what - if any - legal basis she had kept in addition. none, he hoped; wisely, however, he said nothing. better not even to ask.
I can't to do this same way as it would be with list content for example: string concat = String.Join(" ", text.ToArray());,
I'm not sure how to contact text into string from text document

I think this is what you want:
var fileLocation = #"c:\\myfile.txt";
var stringFromFile = File.ReadAllText(fileLocation);
//replace Environment.NewLine with any new line character your file uses
var withoutNewLines = stringFromFile.Replace(Environment.NewLine, "");
//modify to remove any unwanted character
var withoutUglyCharacters = Regex.Replace(withoutNewLines, "[“’”,;-]", "");
var withoutTwoSpaces = withoutUglyCharacters.Replace(" ", " ");
var result = withoutTwoSpaces.Split('.').Where(i => i != "").Select(i => i.TrimStart()).ToList();
So first you read all text from your file, then you remove all unwanted characters and then split by . and return non empty items

Have you tried replacing double new-lines before splitting using a period?
static string[] GetSentences(string filePath) {
if (!File.Exists(filePath))
throw new FileNotFoundException($"Could not find file { filePath }!");
var lines = string.Join("", File.ReadLines(filePath).Where(line => !string.IsNullOrEmpty(line) && !string.IsNullOrWhiteSpace(line)));
var sentences = Regex.Split(lines, #"\.[\s]{1,}?");
return sentences;
}
I haven't tested this, but it should work.
Explanation:
if (!File.Exists(filePath))
throw new FileNotFoundException($"Could not find file { filePath }!");
Throws an exception if the file could not be found. It is advisory you surround the method call with a try/catch.
var lines = string.Join("", File.ReadLines(filePath).Where(line => !string.IsNullOrEmpty(line) && !string.IsNullOrWhiteSpace(line)));
Creates a string, and ignores any lines which are purely whitespace or empty.
var sentences = Regex.Split(lines, #".[\s]{1,}?");
Creates a string array, where the string is split at every period and whitespace following the period.
E.g:
The string "I came. I saw. I conquered" would become
I came
I saw
I conquered
Update:
Here's the method as a one-liner, if that's your style?
static string[] SplitSentences(string filePath) => File.Exists(filePath) ? Regex.Split(string.Join("", File.ReadLines(filePath).Where(line => !string.IsNullOrEmpty(line) && !string.IsNullOrWhiteSpace(line))), #"") : null;

I would suggest you to iterate through all characters and just check if they are in range of 'a' >= char <= 'z' or if char == ' '. If it matches the condition then add it to the newly created string else check if it is '.' character and if it is then end your line and add another one :
List<string> lines = new List<string>();
string line = string.Empty;
foreach(char c in str)
{
if((char.ToLower(c) >= 'a' && char.ToLower(c) <= 'z') || c == 0x20)
line += c;
else if(c == '.')
{
lines.Add(line.Trim());
line = string.Empty;
}
}
Working online example
Or if you prefer "one-liner"s :
IEnumerable<string> lines = new string(str.Select(c => (char)(((char.ToLower(c) >= 'a' && char.ToLower(c) <= 'z') || c == 0x20) ? c : c == '.' ? '\n' : '\0')).ToArray()).Split('\n').Select(s => s.Trim());

I may be wrong about this. I would think that you may not want to alter the string if you are splitting it. Example, there are double/single quote(s) (“) in part of the string. Removing them may not be desired which brings up the possibly of a question, reading a text file that contains single/double quotes (as your example data text shows) like below:
var stringFromFile = File.ReadAllText(fileLocation);
will not display those characters properly in a text box or the console because the default encoding using the ReadAllText method is UTF8. Example the single/double quotes will display (replacement characters) as diamonds in a text box on a form and will be displayed as a question mark (?) when displayed to the console. To keep the single/double quotes and have them display properly you can get the encoding for the OS’s current ANSI encoding by adding a parameter to the ReadAllText method like below:
string stringFromFile = File.ReadAllText(fileLocation, ASCIIEncoding.Default);
Below is code using a simple split method to .split the string on periods (.) Hope this helps.
private void button1_Click(object sender, EventArgs e) {
string fileLocation = #"C:\YourPath\YourFile.txt";
string stringFromFile = File.ReadAllText(fileLocation, ASCIIEncoding.Default);
string bigString = stringFromFile.Replace(Environment.NewLine, "");
string[] result = bigString.Split('.');
int count = 1;
foreach (string s in result) {
if (s != "") {
textBox1.Text += count + ". " + s.Trim() + Environment.NewLine;
Console.WriteLine(count + ". " + s.Trim());
count++;
}
else {
// period at the end of the string
}
}
}

remove first line from string builder / string c#

Okay so I'm trying to make a 'console' like text box within a form, however once you reach the bottom, instaid of being able to scroll up, it will just delete the top line, Im having some difficulties.
So far, when it gets to bottom it deletes the top line, however only once, it just carries on as normal. Here is my function:
StringBuilder sr = new StringBuilder();
public void writeLine(string input)
{
string firstline = "";
int numLines = Convert.ToString(sr).Split('\n').Length;
if (numLines > 15) //Max Lines
{
sr.Remove(0, Convert.ToString(sr).Split('\n').FirstOrDefault().Length);
}
sr.Append(input + "\r\n");
consoleTxtBox.Text = Convert.ToString(sr) + numLines;
}
Would be great if someone could fix this, thanks
Lucas

First, what's wrong with your solution: the reason it does not work is that it removes the content of the line, but it ignores the \n at the end. Adding 1 should fix that:
sr.Remove(0, Convert.ToString(sr).Split('\n').FirstOrDefault().Length+1);
// ^
// |
// This will take care of the trailing '\n' after the first line ---+
Now to doing it a simpler way: all you need to do is finding the first \n, and taking substring after it, like this:
string RemoveFirstLine(string s) {
return s.Substring(s.IndexOf(Environment.NewLine)+1);
}
Note that this code does not crash even when there are no newline characters in the string, i.e. when IndexOf returns -1 (in which case nothing is removed).

You can use the Lines property from the TextBox. This will get all the lines in the TextBox, as an array, then create a new array that doesn't include the first element (Skip(1)). It assigns this new array back to the textbox.
string[] lines = textBox.Lines;
textBox.Lines = lines.Skip(1).ToArray();

A simple alternative: you could split the string by Environment.NewLine and return all but the first:
public static string RemoveFirstLine(string input)
{
var lines = input.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
return string.Join(Environment.NewLine, lines.Skip(1));
}
Demo

you can remove this line
var lines = lines.Remove(0, lines.ToString().IndexOf(Environment.NewLine));

Most solutions does not seem to take into account the fact that Enviroment.NewLine can consist of multiple characters (len > 1).
public void RemoveFirstStringFromStringBuilder()
{
var lines = new StringBuilder();
lines.AppendLine("abc");
var firstLine = lines.ToString().IndexOf(Environment.NewLine, StringComparison.Ordinal);
if (firstLine >= 0)
lines.Remove(0, firstLine + Environment.NewLine.Length);
Console.WriteLine(lines.Length);
Console.WriteLine(lines.ToString());
}
Prints out: 0 and ""

What worked for me is:
var strBuilder = new StringBuilder();
strBuilder.AppendLine("ABC");
strBuilder.AppendLine("54");
strBuilder.AppendLine("04");
strBuilder.Remove(0, strBuilder.ToString().IndexOf(Environment.NewLine) + 2);
Console.WriteLine(strBuilder);
Solution with +1 didn't work for me, probably because of EOF in this context being interpreted as 2 chars (\r\n)

Check string for invalid characters? Smartest way?

I would like to check some string for invalid characters. With invalid characters I mean characters that should not be there. What characters are these? This is different, but I think thats not that importan, important is how should I do that and what is the easiest and best way (performance) to do that?
Let say I just want strings that contains 'A-Z', 'empty', '.', '$', '0-9'
So if i have a string like "HELLO STaCKOVERFLOW" => invalid, because of the 'a'.
Ok now how to do that? I could make a List<char> and put every char in it that is not allowed and check the string with this list. Maybe not a good idea, because there a lot of chars then. But I could make a list that contains all of the allowed chars right? And then? For every char in the string I have to compare the List<char>? Any smart code for this? And another question: if I would add A-Z to the List<char> I have to add 25 chars manually, but these chars are as I know 65-90 in the ASCII Table, can I add them easier? Any suggestions? Thank you

You can use a regular expression for this:
Regex r = new Regex("[^A-Z0-9.$ ]$");
if (r.IsMatch(SomeString)) {
// validation failed
}
To create a list of characters from A-Z or 0-9 you would use a simple loop:
for (char c = 'A'; c <= 'Z'; c++) {
// c or c.ToString() depending on what you need
}
But you don't need that with the Regex - pretty much every regex engine understands the range syntax (A-Z).

I have only just written such a function, and an extended version to restrict the first and last characters when needed. The original function merely checks whether or not the string consists of valid characters only, the extended function adds two integers for the numbers of valid characters at the beginning of the list to be skipped when checking the first and last characters, in practice it simply calls the original function 3 times, in the example below it ensures that the string begins with a letter and doesn't end with an underscore.
StrChr(String, "_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"));
StrChrEx(String, "_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ", 11, 1));
BOOL __cdecl StrChr(CHAR* str, CHAR* chars)
{
for (int s = 0; str[s] != 0; s++)
{
int c = 0;
while (true)
{
if (chars[c] == 0)
{
return false;
}
else if (str[s] == chars[c])
{
break;
}
else
{
c++;
}
}
}
return true;
}
BOOL __cdecl StrChrEx(CHAR* str, CHAR* chars, UINT excl_first, UINT excl_last)
{
char first[2] = {str[0], 0};
char last[2] = {str[strlen(str) - 1], 0};
if (!StrChr(str, chars))
{
return false;
}
if (excl_first != 0)
{
if (!StrChr(first, chars + excl_first))
{
return false;
}
}
if (excl_last != 0)
{
if (!StrChr(last, chars + excl_last))
{
return false;
}
}
return true;
}

If you are using c#, you do this easily using List and contains. You can do this with single characters (in a string) or a multicharacter string just the same
var pn = "The String To ChecK";
var badStrings = new List<string>()
{
" ","\t","\n","\r"
};
foreach(var badString in badStrings)
{
if(pn.Contains(badString))
{
//Do something
}
}

If you're not super good with regular expressions, then there is another way to go about this in C#. Here is a block of code I wrote to test a string variable named notifName:
var alphabet = "a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z";
var numbers = "0,1,2,3,4,5,6,7,8,9";
var specialChars = " ,(,),_,[,],!,*,-,.,+,-";
var validChars = (alphabet + "," + alphabet.ToUpper() + "," + numbers + "," + specialChars).Split(',');
for (int i = 0; i < notifName.Length; i++)
{
if (Array.IndexOf(validChars, notifName[i].ToString()) < 0) {
errorFound = $"Invalid character '{notifName[i]}' found in notification name.";
break;
}
}
You can change the characters added to the array as needed. The Array IndexOf method is the key to the whole thing. Of course if you want commas to be valid, then you would need to choose a different split character.

Not enough reps to comment directly, but I recommend the Regex approach. One small caveat: you probably need to anchor both ends of the input string, and you will want at least one character to match. So (with thanks to ThiefMaster), here's my regex to validate user input for a simple arithmetical calculator (plus, minus, multiply, divide):
Regex r = new Regex(#"^[0-9\.\-\+\*\/ ]+$");

I'd go with a regex, but still need to add my 2 cents here, because all the proposed non-regex solutions are O(MN) in the worst case (string is valid) which I find repulsive for religious reasons.
Even more so when LINQ offers a simpler and more efficient solution than nesting loops:
var isInvalid = "The String To Test".Intersect("ALL_INVALID_CHARS").Any();

What is the best way to parse this string in C#?

I have a string that I am reading from another system. It's basically a long string that represents a list of key value pairs that are separated by a space in between. It looks like this:
key:value[space]key:value[space]key:value[space]
So I wrote this code to parse it:
string myString = ReadinString();
string[] tokens = myString.split(' ');
foreach (string token in tokens) {
string key = token.split(':')[0];
string value = token.split(':')[1];
. . . .
}
The issue now is that some of the values have spaces in them so my "simplistic" split at the top no longer works. I wanted to see how I could still parse out the list of key value pairs (given space as a separator character) now that I know there also could be spaces in the value field as split doesn't seem like it's going to be able to work anymore.
NOTE: I now confirmed that KEYs will NOT have spaces in them so I only have to worry about the values. Apologies for the confusion.

Use this regular expression:
\w+:[\w\s]+(?![\w+:])
I tested it on
test:testvalue test2:test value test3:testvalue3
It returns three matches:
test:testvalue
test2:test value
test3:testvalue3
You can change \w to any character set that can occur in your input.
Code for testing this:
var regex = new Regex(#"\w+:[\w\s]+(?![\w+:])");
var test = "test:testvalue test2:test value test3:testvalue3";
foreach (Match match in regex.Matches(test))
{
var key = match.Value.Split(':')[0];
var value = match.Value.Split(':')[1];
Console.WriteLine("{0}:{1}", key, value);
}
Console.ReadLine();
As Wonko the Sane pointed out, this regular expression will fail on values with :. If you predict such situation, use \w+:[\w: ]+?(?![\w+:]) as the regular expression. This will still fail when a colon in value is preceded by space though... I'll think about solution to this.

This cannot work without changing your split from a space to something else such as a "|".
Consider this:
Alfred Bester:Alfred Bester Alfred:Alfred Bester
Is this Key "Alfred Bester" & value Alfred" or Key "Alfred" & value "Bester Alfred"?

string input = "foo:Foobarius Maximus Tiberius Kirk bar:Barforama zap:Zip Brannigan";
foreach (Match match in Regex.Matches(input, #"(\w+):([^:]+)(?![\w+:])"))
{
Console.WriteLine("{0} = {1}",
match.Groups[1].Value,
match.Groups[2].Value
);
}
Gives you:
foo = Foobarius Maximus Tiberius Kirk
bar = Barforama
zap = Zip Brannigan

You could try to Url encode the content between the space (The keys and the values not the : symbol) but this would require that you have control over the Input Method.
Or you could simply use another format (Like XML or JSON), but again you will need control over the Input Format.
If you can't control the input format you could always use a Regular expression and that searches for single spaces where a word plus : follows.
Update (Thanks Jon Grant)
It appears that you can have spaces in the key and the value. If this is the case you will need to seriously rethink your strategy as even Regex won't help.

string input = "key1:value key2:value key3:value";
Dictionary<string, string> dic = input.Split(' ').Select(x => x.Split(':')).ToDictionary(x => x[0], x => x[1]);
The first will produce an array:
"key:value", "key:value"
Then an array of arrays:
{ "key", "value" }, { "key", "value" }
And then a dictionary:
"key" => "value", "key" => "value"
Note, that Dictionary<K,V> doesn't allow duplicated keys, it will raise an exception in such a case. If such a scenario is possible, use ToLookup().

Using a regular expression can solve your problem:
private void DoSplit(string str)
{
str += str.Trim() + " ";
string patterns = #"\w+:([\w+\s*])+[^!\w+:]";
var r = new System.Text.RegularExpressions.Regex(patterns);
var ms = r.Matches(str);
foreach (System.Text.RegularExpressions.Match item in ms)
{
string[] s = item.Value.Split(new char[] { ':' });
//Do something
}
}

This code will do it (given the rules below). It parses the keys and values and returns them in a Dictonary<string, string> data structure. I have added some code at the end that assumes given your example that the last value of the entire string/stream will be appended with a [space]:
private Dictionary<string, string> ParseKeyValues(string input)
{
Dictionary<string, string> items = new Dictionary<string, string>();
string[] parts = input.Split(':');
string key = parts[0];
string value;
int currentIndex = 1;
while (currentIndex < parts.Length-1)
{
int indexOfLastSpace=parts[currentIndex].LastIndexOf(' ');
value = parts[currentIndex].Substring(0, indexOfLastSpace);
items.Add(key, value);
key = parts[currentIndex].Substring(indexOfLastSpace + 1);
currentIndex++;
}
value = parts[parts.Length - 1].Substring(0,parts[parts.Length - 1].Length-1);
items.Add(key, parts[parts.Length-1]);
return items;
}
Note: this algorithm assumes the following rules:
No spaces in the values
No colons in the keys
No colons in the values

Without any Regex nor string concat, and as an enumerable (it supposes keys don't have spaces, but values can):
public static IEnumerable<KeyValuePair<string, string>> Split(string text)
{
if (text == null)
yield break;
int keyStart = 0;
int keyEnd = -1;
int lastSpace = -1;
for(int i = 0; i < text.Length; i++)
{
if (text[i] == ' ')
{
lastSpace = i;
continue;
}
if (text[i] == ':')
{
if (lastSpace >= 0)
{
yield return new KeyValuePair<string, string>(text.Substring(keyStart, keyEnd - keyStart), text.Substring(keyEnd + 1, lastSpace - keyEnd - 1));
keyStart = lastSpace + 1;
}
keyEnd = i;
continue;
}
}
if (keyEnd >= 0)
yield return new KeyValuePair<string, string>(text.Substring(keyStart, keyEnd - keyStart), text.Substring(keyEnd + 1));
}

I guess you could take your method and expand upon it slightly to deal with this stuff...
Kind of pseudocode:
List<string> parsedTokens = new List<String>();
string[] tokens = myString.split(' ');
for(int i = 0; i < tokens.Length; i++)
{
// We need to deal with the special case of the last item,
// or if the following item does not contain a colon.
if(i == tokens.Length - 1 || tokens[i+1].IndexOf(':' > -1)
{
parsedTokens.Add(tokens[i]);
}
else
{
// This bit needs to be refined to deal with values with multiple spaces...
parsedTokens.Add(tokens[i] + " " + tokens[i+1]);
}
}
Another approach would be to split on the colon... That way, your first array item would be the name of the first key, second item would be the value of the first key and then name of the second key (can use LastIndexOf to split it out), and so on. This would obviously get very messy if the values can include colons, or the keys can contain spaces, but in that case you'd be pretty much out of luck...

Get parameters out of text file

I have a C# asp.net page that has to get username/password info from a text file.
Could someone please tell me how.
The text file looks as follows: (it is actually a lot larger, I just got a few lines)
DATASOURCEFILE=D:\folder\folder
var1= etc
var2= more
var3 = misc
var4 = stuff
USERID = user1
PASSWORD = pwd1
all I need is the UserID and password out of that file.
Thank you for your help,
Steve

This would work:
var dic = File.ReadAllLines("test.txt")
.Select(l => l.Split(new[] { '=' }))
.ToDictionary( s => s[0].Trim(), s => s[1].Trim());
dic is a dictionary, so you easily extract your values, i.e.:
string myUser = dic["USERID"];
string myPassword = dic["PASSWORD"];

Open the file, split on the newline, split again on the = for each item and then add it to a dictionary.
string contents = String.Empty;
using (FileStream fs = File.Open("path", FileMode.OpenRead))
using (StreamReader reader = new StreamReader(fs))
{
contents = reader.ReadToEnd();
}
if (contents.Length > 0)
{
string[] lines = contents.Split(new char[] { '\n' });
Dictionary<string, string> mysettings = new Dictionary<string, string>();
foreach (string line in lines)
{
string[] keyAndValue = line.Split(new char[] { '=' });
mysettings.Add(keyAndValue[0].Trim(), keyAndValue[1].Trim());
}
string test = mysettings["USERID"]; // example of getting userid
}

You can use Regular expressions to extract each variable. You can read one line at a time, or the entire file into one string. If the latter, you just look for a newline in the expression.
Regards,
Morten

Dictionary is not needed.
Old-fashioned parsing can do more, with less executable code, the same amount of compiled data, and less processing:
public string MyPath1;
public string MyPath2;
...
public void ReadConfig(string sConfigFile)
{
MyPath1 = MyPath2 = ""; // Clear the external values (in case the file does not set every parameter).
using (StreamReader sr = new StreamReader(sConfigFile)) // Open the file for reading (and auto-close).
{
while (!sr.EndOfStream)
{
string sLine = sr.ReadLine().Trim(); // Read the next line. Trim leading and trailing whitespace.
// Treat lines with NO "=" as comments (ignore; no syntax checking).
// Treat lines with "=" as the first character as comments too.
// Treat lines with "=" as the 2nd character or after as parameter lines.
// Side-benefit: Values containing "=" are processed correctly.
int i = sLine.IndexOf("="); // Find the first "=" in the line.
if (i <= 0) // IF the first "=" in the line is the first character (or not present),
continue; // the line is not a parameter line. Ignore it. (Iterate the while.)
string sParameter = sLine.Remove(i).TrimEnd(); // All before the "=" is the parameter name. Trim whitespace.
string sValue = sLine.Substring(i + 1).TrimStart(); // All after the "=" is the value. Trim whitespace.
// Extra characters before a parameter name are usually intended to comment it out. Here, we keep them (with or without whitespace between). That makes an unrecognized parameter name, which is ignored (acts as a comment, as intended).
// Extra characters after a value are usually intended as comments. Here, we trim them only if whitespace separates. (Parsing contiguous comments is too complex: need delimiter(s) and then a way to escape delimiters (when needed) within values.) Side-drawback: Values cannot contain " ".
i = sValue.IndexOfAny(new char[] {' ', '\t'}); // Find the first " " or tab in the value.
if (i > 1) // IF the first " " or tab is the second character or after,
sValue = sValue.Remove(i); // All before the " " or tab is the parameter. (Discard the rest.)
// IF a desired parameter is specified, collect it:
// (Could detect here if any parameter is set more than once.)
if (sParameter == "MyPathOne")
MyPath1 = sValue;
else if (sParameter == "MyPathTwo")
MyPath2 = sValue;
// (Could detect here if an invalid parameter name is specified.)
// (Could exit the loop here if every parameter has been set.)
} // end while
// (Could detect here if the config file set neither parameter or only one parameter.)
} // end using
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.