Splitting an element of an array - c#

In my C# program (I'm new to C# so I hope that I'm doing things correctly), I'm trying to read in all of the lines from a text file, which will look something along the lines of this, but with more entries (these are fictional people so don't worry about privacy):
Logan Babbleton ID #: 0000011 108 Crest Circle Mr. Logan M. Babbleton
Pittsburgh PA 15668 SSN: XXX-XX-XXXX
Current Program(s): Bachelor of Science in Cybersecurity
Mr. Carter J. Bairn ID #: 0000012 21340 North Drive Mr. Carter Joseph Bairn
Pittsburgh PA 15668 SSN: XXX-XX-XXXX
Current Program(s): Bachelor of Science in Computer Science
I have these lines read into an array, concentrationArray and want to find the lines that contain the word "Current", split them at the "(s): " in "Program(s): " and print the words that follow. I've done this earlier in my program, but splitting at an ID instead, like this:
nameLine = nameIDLine.Split(new string[] { "ID" }, StringSplitOptions.None)[1];
However, whenever I attempt to do this, I get an error that my index is out of the bounds of my split array (not my concentrationArray). Here's what I currently have:
for (int i = 0; i < concentrationArray.Length; i++)
{
if (concentrationArray[i].Contains("Current"))
{
lstTest.Items.Add(concentrationArray[i].Split(new string[] { "(s): " }, StringSplitOptions.None)[1]);
}
}
Where I'm confused is that if I change the index to 0 instead of 1, it will print everything out perfectly, but it will print out the first half, instead of the second half, which is what I want. What am I doing wrong? Any feedback is greatly appreciated since I'm fairly new at C# and would love to learn what I can. Thanks!
Edit - The only thing that I could think of was that maybe sometimes there wasn't anything after the string that I used to separate each element, but when I checked my text file, I found that was not the case and there is always something following the string used to separate.

You should check the result of Split before trying to read index 1.
If a line doesn't contain "(s): ", your code will throw the exception you describe:
for (int i = 0; i < concentrationArray.Length; i++)
{
if (concentrationArray[i].Contains("Current"))
{
string[] result = concentrationArray[i].Split(new string[] { "(s): " }, StringSplitOptions.None);
if(result.Length > 1)
lstTest.Items.Add(result[1]);
else
Console.WriteLine($"Line {i} has no (s): followed by a space");
}
}
To complete the answer: if you always use index 0 there is no error, because when the separator is not present in the input string, the output is an array with a single element containing the whole, unsplit string.
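A minimal sketch of that behaviour, assuming a hypothetical helper named TextAfter (the class name SplitDemo is also made up for illustration). It limits Split to two parts so that only the first separator counts, and returns null instead of throwing when the separator is missing:

```csharp
using System;

public static class SplitDemo
{
    // Hypothetical helper: returns the text after the first occurrence of
    // separator, or null when the separator does not occur in the line.
    public static string TextAfter(string line, string separator)
    {
        // With no separator present, Split returns a one-element array
        // holding the whole line, so index 1 would not exist.
        string[] parts = line.Split(new[] { separator }, 2, StringSplitOptions.None);
        return parts.Length > 1 ? parts[1] : null;
    }
}
```

Calling SplitDemo.TextAfter(concentrationArray[i], "(s): ") and checking for null before adding to lstTest is one way to avoid the out-of-bounds index.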

If the line always starts with
Current Program(s):
then why not simply replace that prefix with an empty string, like this:
concentrationArray[i].Replace("Current Program(s): ", "")

It is perhaps a little easier to understand and more reusable if you separate the concerns. It will also be easier to test. An example might be...
var allLines = File.ReadLines(@"C:\your\file\path\data.txt");
var currentPrograms = ExtractCurrentPrograms(allLines);
if (currentPrograms.Any())
{
lstTest.Items.AddRange(currentPrograms);
}
...
private static IEnumerable<string> ExtractCurrentPrograms(IEnumerable<string> lines)
{
const string targetPhrase = "Current Program(s):";
foreach (var line in lines.Where(l => !string.IsNullOrWhiteSpace(l)))
{
var index = line.IndexOf(targetPhrase);
if (index >= 0)
{
var programIndex = index + targetPhrase.Length;
var text = line.Substring(programIndex).Trim();
if (!string.IsNullOrWhiteSpace(text))
{
yield return text;
}
}
}
}

Here is a slightly different approach:
List<string> test = new List<string>();
string pattern = "Current Program(s):";
string[] allLines = File.ReadAllLines(@"C:\Users\xyz\Source\demo.txt");
foreach (var line in allLines)
{
if (line.Contains(pattern))
{
test.Add(line.Substring(line.IndexOf(pattern) + pattern.Length));
}
}
or
string pattern = "Current Program(s):";
lstTest.Items.AddRange(File.ReadLines(@"C:\Users\ODuritsyn\Source\demo.xml")
.Where(line => line.Contains(pattern))
.Select(line => line.Substring(line.IndexOf(pattern) + pattern.Length)));

Related

Loop iteration of an array string

I'm currently trying to solve a Title Capitalization problem. I have a method that takes in a sentence, splits it into words, compare the words with a check list of words.
Based on this check list, I lowercase the words if they are in the list. Uppercase any words not in the list. The first and last words are always capitalized.
Here is my method:
public string TitleCase(string title)
{
LinkedList<string> wordsList = new LinkedList<string>();
string[] listToCheck = { "a", "the", "to", "in", "with", "and", "but", "or" };
string[] words = title.Split(null);
var last = words.Length - 1;
var firstWord = CapitalizeWord(words[0]);
var lastWord = CapitalizeWord(words[last]);
wordsList.AddFirst(firstWord);
for (var i = 1; i <= last - 1; i++)
{
foreach (var s in listToCheck)
{
if (words[i].Equals(s))
{
wordsList.AddLast(LowercaseWord(words[i]));
}
else
{
wordsList.AddLast(CapitalizeWord(words[i]));
}
}
}
wordsList.AddLast(lastWord);
var sentence = string.Join(" ", wordsList);
return sentence;
}
Running this with the example and expecting the result:
var result = TitleCase("i love solving problems and it is fun");
Assert.AreEqual("I Love Solving Problems and It Is Fun", result);
I get instead:
"I Love Love Love Love Love Love Love Love Solving Solving Solving Solving Solving Solving Solving Solving Problems Problems Problems Problems Problems Problems Problems Problems And And And And And and And And It It It It It It It It Is Is Is Is Is Is Is Is Fun"
If you look closely, one "and" is lowercased. Any tips on how to solve this?
You're doing some extra looping when you go through each of the words to check, and you're not exiting the loop as soon as you find a match (so you're adding the word on each check). To fix this issue in your specific code, you would do something like:
for (var i = 1; i <= last - 1; i++)
{
bool foundMatch = false;
foreach (var s in listToCheck)
{
if (words[i].Equals(s))
{
foundMatch = true;
break;
}
}
if (foundMatch)
{
wordsList.AddLast(LowercaseWord(words[i]));
}
else
{
wordsList.AddLast(CapitalizeWord(words[i]));
}
}
However there is a much easier way, which other answers have provided. But I wanted to point out a couple of other things:
You are creating an unnecessary LinkedList. You already have a list of the words you can manipulate in the words array, so you'll save some memory by just using that.
I think there is a bug in your code (and in some of the answers) where if someone passes in a string with a capital A word in the middle, it will not be converted to lowercase because the Equals method (or in the case of other answers, the Contains method) does a case-sensitive comparison by default. So you might want to pass a case-insensitive comparer to that method.
You don't need to do separate checks for the first and last word. You can just have a single if statement with these checks in the body of your loop.
So, here's what I would do:
public static string TitleCase(string title)
{
var listToCheck = new[]{ "a", "the", "to", "in", "with", "and", "but", "or" };
var words = title.Split(null);
// Loop through all words in the array
for (int i = 0; i < words.Length; i++)
{
// If we're on the first or last index, or if
// the word is not in our list, Capitalize it
if (i == 0 || i == (words.Length - 1) ||
!listToCheck.Contains(words[i], StringComparer.OrdinalIgnoreCase))
{
words[i] = CapitalizeWord(words[i]);
}
else
{
words[i] = LowercaseWord(words[i]);
}
}
return string.Join(" ", words);
}
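The question never shows CapitalizeWord or LowercaseWord, so the following is only a self-contained sketch of the same single-loop idea. The two helpers and the class name TitleCaser are assumptions made for illustration, not the asker's actual code:

```csharp
using System;
using System.Linq;

public static class TitleCaser
{
    private static readonly string[] SmallWords =
        { "a", "the", "to", "in", "with", "and", "but", "or" };

    // Hypothetical helper: upper-case the first letter, lower-case the rest.
    private static string CapitalizeWord(string word) =>
        word.Length == 0
            ? word
            : char.ToUpperInvariant(word[0]) + word.Substring(1).ToLowerInvariant();

    // Hypothetical helper: lower-case the whole word.
    private static string LowercaseWord(string word) => word.ToLowerInvariant();

    public static string TitleCase(string title)
    {
        var words = title.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < words.Length; i++)
        {
            bool isEdge = i == 0 || i == words.Length - 1;
            // Case-insensitive lookup, so "AND" in the middle is still lowercased.
            bool isSmall = SmallWords.Contains(words[i], StringComparer.OrdinalIgnoreCase);
            words[i] = isEdge || !isSmall ? CapitalizeWord(words[i]) : LowercaseWord(words[i]);
        }
        return string.Join(" ", words);
    }
}
```

With these helpers, the sentence from the question produces "I Love Solving Problems and It Is Fun".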
You have a loop within a loop which messes things up, simplify the code to have just one loop:
for (var i = 1; i <= last - 1; i++)
{
// No inner loop
// Use the .Contains() method to see if it's a key word
if (listToCheck.Contains(words[i]))
{
wordsList.AddLast(LowercaseWord(words[i]));
}
else
{
wordsList.AddLast(CapitalizeWord(words[i]));
}
}
Output:
I Love Solving Problems and It Is Fun
The problem is in the foreach loop, you are doing eight checks (the length of the listToCheck array) for each word - and adding the word to the list each time. I'd also recommend using a Linq query, so it should look like this:
for (var i = 1; i <= last - 1; i++) {
if(listToCheck.Contains(words[i]))
wordsList.AddLast(LowercaseWord(words[i]));
else
wordsList.AddLast(CapitalizeWord(words[i]));
}
Also, the reason the sixth 'and' is lowercased is that "and" is the sixth word in the listToCheck array. On the sixth pass of the foreach loop it passes the test and is written in lowercase; all the other passes fail, so it is capitalized.
As mentioned in the other answers the loop within the loop doesn't exit.
Just a suggestion: with Linq you could combine the check for the first and last word (by index) with the listToCheck lookup:
public string TitleCase(string title)
{
string[] listToCheck = { "a", "the", "to", "in", "with", "and", "but", "or" };
string[] words = title.Split(null);
var last = words.Length - 1;
return string.Join(" ", words.Select(w=>w.ToLower()).Select(((w,i) => i == 0 || i == last || !listToCheck.Contains(w) ? CapitalizeWord(w) : w)));
}
Note, in this solution the first Select makes sure all words are in lowercase, so the lookup in listToCheck can be done without special comparisons. Because the words are already in lowercase, that doesn't have to be done any more if the word doesn't have to be capitalized.

C# reading multiple lines into a single variable

I have looked but failed to find a tutorial that would address my question. Perhaps I didn't word my searches correctly. In any event, I am here.
I have taken over a handyman company and he had about 150 customers. He had a program that he bought that produces records for his customers. I have the records, but he wouldn't sell me the program as it is commercial and he's afraid of going to prison for selling something like that... whatever... They are written in a text file. The format appears to be:
string name
string last_job_description
int job_codes
string job_address
string comments
The text file looks like this
*Henderson*
*Cleaned gutters, fed the lawn, added mulch to the front flower beds,
cut the back yard, edged the side walk, pressure washed the driveway*
*04 34 32 1 18 99 32 22 43 72 11 18*
*123 Anywhere ave*
*Ms.always pays cash no tip. Mr. gives you a check but always tips*
Alright.. My question is in C# I want to write a program to edit these records, add new customers and delete some I may lose, moves, or dies... But the 2nd entry is broken over two lines, sometimes 3 and 4 lines. They all start and end with *. So, how do I read the 2 to 4 lines and get them into the last_job_description string variable? I can write the class, I can read lines, I can trim away the asterisks. I can find nothing on reading multiple lines into a single variable.
Let's do it right!
First define the customer model:
public class Customer
{
public string Name { get; set; }
public string LastJobDescription { get; set; }
public List<int> JobCodes { get; set; }
public string JobAddress { get; set; }
public string Comments { get; set; }
}
Then, we need a collection of customers:
var customers = new List<Customer>();
Fill the collection with data from the file:
string text = File.ReadAllText("customers.txt");
string pattern = @"(?<= ^ \*) .+? (?= \* \r? $)";
var options = RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled
| RegexOptions.Singleline | RegexOptions.Multiline;
var matches = Regex.Matches(text, pattern, options);
for (int i = 0; i < matches.Count; i += 5)
{
var customer = new Customer
{
Name = matches[i].Value,
LastJobDescription = matches[i + 1].Value,
JobCodes = matches[i + 2].Value.Split().Select(s => int.Parse(s)).ToList(),
JobAddress = matches[i + 3].Value,
Comments = matches[i + 4].Value
};
customers.Add(customer);
}
I'm using a regular expression that allows the * character to appear in the middle of a line.
Now we can comfortably work with this collection.
Examples of usage.
Remove the first customer:
customers.RemoveAt(0);
Add a comment to the latest client:
customers.Last().Comments += " Very generous.";
Find the first record for a client by the name of Henderson and add the code of the job performed:
customers.Find(c => c.Name == "Henderson").JobCodes.Add(42);
Add new customer:
var customer = new Customer
{
Name = "Chuck Norris",
LastJobDescription= "Saved the world.",
JobCodes = new List<int>() { 1 },
JobAddress = "UN",
Comments = "Nice guy!"
};
customers.Add(customer);
And so on.
To save data to a file, use the following:
var sb = new StringBuilder();
foreach (var customer in customers)
{
sb.Append('*').Append(customer.Name).Append('*').AppendLine();
sb.Append('*').Append(customer.LastJobDescription).Append('*').AppendLine();
sb.Append('*').Append(string.Join(" ", customer.JobCodes)).Append('*').AppendLine();
sb.Append('*').Append(customer.JobAddress).Append('*').AppendLine();
sb.Append('*').Append(customer.Comments).Append('*').AppendLine();
}
File.WriteAllText("customers.txt", sb.ToString());
You probably need a graphical user interface. If so, I suggest you ask a new question in which you specify what you use: WinForms, WPF, a web application or something else.
If you want to read all the lines in a file into a single variable, you need to do this:
var txt = File.ReadAllText(.. your file location here ..);
or you can do
var lines = File.ReadAllLines(.. your file location here ..);
and then you can iterate through each line and remove the blanks or leave them as they are.
But based on your question, the first option is what you're really after.
When you read last_job_description, read the line after it as well. If that line starts with *, it is job_codes; otherwise append it to the previously read line. Do this for every line until you find job_codes.
using (var fileReader = new StreamReader("someFile"))
{
// read the previous data here
var last_job_description = fileReader.ReadLine();
var nextLine = fileReader.ReadLine();
// Keep appending until the next field (which starts with *) or the end of the file
while (nextLine != null && !nextLine.StartsWith("*"))
{
last_job_description += nextLine;
nextLine = fileReader.ReadLine();
}
//you have job_codes in nextline, so parse it now and read next data
}
Or you can use the fact that each piece of data both starts and ends with *, and create a function that reads each "set" and returns it as a single-line string, no matter how many lines it actually spanned. Assume the reader you pass to this function is the same StreamReader as in the code above:
static string readSingleValue(StreamReader fileReader){
var result = "";
do
{
result += fileReader.ReadLine().Trim();
} while (result.EndsWith("*") == false);
return result;
}
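A self-contained sketch of the same "read until the closing asterisk" idea, written against TextReader so it works with both StreamReader and StringReader. The names StarFieldReader and ReadFields are made up for illustration, and the sketch assumes that only the last line of a field ends with an asterisk:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class StarFieldReader
{
    // Yields one string per *...* field. Fields that span several lines
    // are joined with a single space, and the surrounding asterisks are removed.
    public static IEnumerable<string> ReadFields(TextReader reader)
    {
        var current = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line.Length == 0) continue;              // skip blank lines
            if (current.Length > 0) current.Append(' '); // continuation line
            current.Append(line);

            // The field is complete once the accumulated text ends with '*'.
            if (current.Length > 1 && current[current.Length - 1] == '*')
            {
                yield return current.ToString().Trim('*');
                current.Clear();
            }
        }
    }
}
```

Every five consecutive fields then make up one customer record: name, last_job_description, job_codes, job_address and comments.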
I think your problem is a little complex, in that it isn't one specific problem. It involves at least three different programming aspects that you need to know or learn to reach your goal. The steps you need to take are described below.
Unfortunately, you need to treat your current file as one huge string.
Then you can process this string and separate it into its different parts.
The next move is to define a neat, reliable and robust XML file that can hold your data.
Fill your XML file with the string you have processed.
Once you have an XML file, you can simply update it.
There is no built-in method that will parse your file exactly the way you want, so you have to get your hands dirty.
Here's an example of a working code for your specific case. I assumed you wanted your job_codes in an array since they're multiple integers separated by spaces.
Each time we encounter a line starting with '*', we increase a counter that'll tell us to work with the next of your properties (name, last job description, etc...).
Each line is appended to the current property.
And obviously, we remove the stars from the beginning and ending of every line.
string name = String.Empty;
string last_job_description = String.Empty;
int[] job_codes = null;
string job_address = String.Empty;
string comments = String.Empty;
int numberOfPropertiesRead = 0;
var lines = File.ReadAllLines("C:/yourfile.txt");
for (int i = 0; i < lines.Count(); i++)
{
var line = lines[i];
bool newProp = line.StartsWith("*");
bool endOfProp = line.EndsWith("*");
if (newProp)
{
numberOfPropertiesRead++;
line = line.Substring(1);
}
if (endOfProp)
line = line.Substring(0, line.Length - 1);
switch (numberOfPropertiesRead)
{
case 1: name += line; break;
case 2: last_job_description += line; break;
case 3:
job_codes = line.Split(' ').Select(el => Int32.Parse(el)).ToArray();
break;
case 4: job_address += line; break;
case 5: comments += line; break;
default:
throw new ArgumentException("Wow, that's too many properties dude.");
}
}
Console.WriteLine("name: " + name);
Console.WriteLine("last_job_description: " + last_job_description);
foreach (int job_code in job_codes)
Console.Write(job_code + " ");
Console.WriteLine();
Console.WriteLine("job_address: " + job_address);
Console.WriteLine("comments: " + comments);
Console.ReadLine();

C# -> Repeaters and Displaying them with Loops through Arrays

My question is part curiosity and part help, so bear with me.
My previous question had to do with passing text files as an argument to a function, which I managed to figure out with help, so thank you to all who helped previously.
So, consider this code bit:
protected bool FindWordInFile(StreamReader wordlist, string word_to_find)
{
// Read the first line.
string line = wordlist.ReadLine();
while (line != null)
{
if(line.Contains(word_to_find))
{
return true;
}
// Read the next line.
line = wordlist.ReadLine();
}
return false;
}
What happens with this particular function if you call in it the following way:
temp_sentence_string = post_words[i]; //Takes the first string in the array FROM the array and binds it to a temporary string variable
WordCount.Text = WordCount.Text + " ||| " + temp_sentence_string;
word_count = temp_sentence_string.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
for (int word_pos = 0; word_pos < word_count.Length; word_pos++)
{
bool WhatEver = FindWordInFile(goodwords_string, word_count[word_pos]);
if (WhatEver == true)
{
WordTest.Text = WordTest.Text + "{" + WhatEver + "} ";
}
WordTest.Text = WordTest.Text + "{" + WhatEver + "}";
}
AND:
The string passed is "good times are good" and the text file has the word "good" in it. The result is this:
good{True}times{False}are{False}good{False}
Pretty strange. It looks like what happened is that:
1. The sentence "good times are good" got put into an array, split by the detection of a space. This happened correctly.
2. The first array element, "good" was compared against the text file and returned True. So that worked.
3. It then went to the next word "times", compared it, came up False.
4. Went to the next word "are", compared it, came up False.
5. THEN it got to the final word, "good", BUT it evaluated to False. This should NOT have happened.
So, my question is - what happened? It looks like the function of FindWordInFile was perhaps not coded right on my end, and somehow it kept returning False even though the word "good" was in the text file.
Second Part: Repeaters in ASP.NET and C#
So I have a repeater object bound to an array that is INSIDE a for loop. This particular algorithm takes an array of sentences and then breaks them down into a temp array of words. The temp array of words is bound to the Repeater.
But what happens is, let's say I have two sentences to do stuff to...
And so it's inside a loop. It does the stuff to the first array of words, and then does it to the second array of words, but what happens in the displaying the contents of the array, it only shows the contents of the LAST array that was generated and populated. Even though it's in the for loop, my expectation was that it would show all the word arrays, one after the other. But it only shows the last one. So if there's 5 sentences to break up, it only shows the 5th sentence that was populated by words.
Any ideas why?
for (int i = 0; i < num_sentences; i++) //num_sentences is an int that counted the number of elements in the array of sentences that was populated before. It populates the array by splitting based on punctuation.
{
temp_sentence_string = post_words[i]; //Takes the first string in the array FROM the sentence array and binds it to a temporary string variable
WordCount.Text = WordCount.Text + " ||| " + temp_sentence_string;
word_count = temp_sentence_string.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries); //create a word count array with one word as a singular element in the array
//We have the Word Count Array now. We can go through it with a loop, right?
for (int j = 0; j < word_count.Length; j++)
{
Boolean FoundIt = File
.ReadLines(@"c:\wordfilelib\goodwords.txt") // <- Your file name
.Any(line => line.Contains(word_count[j]));
WordTest.Text = WordTest.Text + FoundIt + "(" + word_count[j] + ")";
}
Repeater2.DataSource = word_count;
Repeater2.DataBind();
}
First Part
You are passing a StreamReader into the Find function. A StreamReader only moves forward: each call continues reading where the previous call stopped, so it must be reset before it can be used again. Test with the following sentence and you will see the result.
"good good good good"
you will get
good{True}good{False}good{False}good{False}
I would suggest reading the file into an array only once and then doing your processing over the array.
using System;
using System.Linq;
using System.Collections.Generic;
public class WordFinder
{
private static bool FindWordInLines(string word_to_find, string[] lines)
{
foreach(var line in lines)
{
if(line.Contains(word_to_find))
return true;
}
return false;
}
public string SearchForWordsInFile(string path, string sentence)
{
// https://msdn.microsoft.com/en-us/library/s2tte0y1(v=vs.110).aspx
var lines = System.IO.File.ReadAllLines(path);
var words = sentence.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
var result = string.Empty;
foreach(var word in words)
{
var found = FindWordInLines(word, lines);
// {{ in string.Format outputs {
// }} in string.Format outputs }
// {0} says use first parameter's ToString() method
result += string.Format("{{{0}}}", found);
}
return result;
}
}
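If the StreamReader itself has to be reused rather than replaced by an array, it can be rewound between searches. The sketch below assumes the underlying stream supports seeking (a file or a MemoryStream does). The names ReaderRewindDemo, ContainsWord and Rewind are made up for illustration:

```csharp
using System;
using System.IO;
using System.Text;

public static class ReaderRewindDemo
{
    // Same forward-only search as FindWordInFile in the question.
    public static bool ContainsWord(StreamReader reader, string word)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Contains(word)) return true;
        }
        return false;
    }

    // Moves the reader back to the start of a seekable stream.
    public static void Rewind(StreamReader reader)
    {
        reader.DiscardBufferedData();                 // drop text already buffered
        reader.BaseStream.Seek(0, SeekOrigin.Begin);  // go back to the first byte
    }
}
```

Without the call to Rewind, a second search for a word on the first line returns false, because the reader has already moved past that line.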
Second Part:
If you bind it in the for loop like that it will only bind to the last result. If you accumulate the results in the outer loop you can pass the accumulated results to the repeater and bind outside the loop.
I created a sample loop class below that has two loops. The "resultList" is the variable that accumulates the results.
using System.Collections.Generic;
public class LoopExample
{
public void RunLoopExample()
{
var outerList = new string[]{"the", "quick", "brown", "fox"};
var innerList = new string[]{"jumps", "over", "the", "lazy", "dog"};
// define the resultList variable outside the outer loop
var resultList = new List<string>();
for(int outerIndex = 0; outerIndex < outerList.Length; outerIndex ++)
{
var outerValue = outerList[outerIndex];
for(int innerIndex = 0; innerIndex < innerList.Length; innerIndex++)
{
var innerValue = innerList[innerIndex];
resultList.Add(string.Format("{0}->{1}; ", outerValue, innerValue));
}
}
// use the resultList variable outside the outer loop
foreach(var result in resultList )
{
Console.WriteLine(result);
}
}
}
In your example, you would set the dataSource to the resultList
Repeater2.DataSource = resultList;
Repeater2.DataBind();

How to remove a duplicate set of characters in a string

For example a string contains the following (the string is variable):
http://www.google.comhttp://www.google.com
What would be the most efficient way of removing the duplicate url here - e.g. output would be:
http://www.google.com
I assume that input contains only urls.
string input = "http://www.google.comhttp://www.google.com";
// this will get you distinct URLs but without "http://" at the beginning
IEnumerable<string> distinctAddresses = input
.Split(new[] {"http://"}, StringSplitOptions.RemoveEmptyEntries)
.Distinct();
StringBuilder output = new StringBuilder();
foreach (string distinctAddress in distinctAddresses)
{
// when building the output, insert "http://" before each address so
// that it resembles the original
output.Append("http://");
output.Append(distinctAddress);
}
Console.WriteLine(output);
Efficiency has various definitions: code size, total execution time, CPU usage, space usage, time to write the code, etc. If you want to be "efficient", you should know which one of these you're trying for.
I'd do something like this:
string url = "http://www.google.comhttp://www.google.com";
if (url.Length % 2 == 0)
{
string secondHalf = url.Substring(url.Length / 2);
if (url.StartsWith(secondHalf))
{
url = secondHalf;
}
}
Depending on the kinds of duplicates you need to remove, this may or may not work for you.
Collect the strings into a list and use Distinct. If your string contains http addresses, you can extract them with the regex http:.+?(?=(http:|$)) and RegexOptions.Singleline:
var distinctList = list.Distinct(StringComparer.CurrentCultureIgnoreCase).ToList();
Given that you don't know the length of the string, whether anything is duplicated, or what is duplicated:
string yourPrimaryString = "http://www.google.comhttp://www.google.com";
// Shortest run of characters that counts as a duplicate
const int minLength = 4;
for (int i = 0; i + minLength <= yourPrimaryString.Length; i++)
{
// Try the longest substring starting at i first
for (int j = yourPrimaryString.Length - i; j >= minLength; j--)
{
string search = yourPrimaryString.Substring(i, j);
// Look for a second occurrence that starts after the first one ends
int duplicateStart = yourPrimaryString.IndexOf(search, i + j, StringComparison.Ordinal);
if (duplicateStart != -1)
{
yourPrimaryString = yourPrimaryString.Remove(duplicateStart, j);
break;
}
}
}
This iterates over every substring, from each starting character to the last, and searches for a second occurrence of it. Without a minimum length there is a problem: in ABCDA it searches for A, finds the second A and removes it. So you need to specify how long the duplication has to be if you want it to be variable, but maybe my code helps you.

c# how to put several groups of words from textfile into arrays

I have a text file containing words like these:
PEOPLE
John
0218753458
ENTERPRISE
stock
30%
HOME
Indiana
West Virginia
PEOPLE
Vahn
031245678
ENTERPRISE
Inc
50%
HOME
melbourne
Australia
I want to split this file into strings, one for each PEOPLE, ENTERPRISE and HOME group. For example, the output would be
part[0]
PEOPLE
John
0218753458
part[1]
ENTERPRISE
stock
30%
part[2]
HOME
Indiana
West Virginia
and so on.
I plan to use
EDIT #1 (thanks @Slade)
string[] part = s.Split(new string[] { "PEOPLE","ENTERPRISE","HOME" }, StringSplitOptions.None);
I can't change the structure.
Is there any way to keep the header? Or is there a better way to do this?
Don't use the || operator, that's for conditional/logical OR expressions. Instead, when filling elements of an array like you are doing, use a comma, like so:
string[] part = s.Split(new string[] { "PEOPLE", "ENTERPRISE", "HOME" }, StringSplitOptions.None);
However, unless you are always going to have these headings, it is not a good way of trying to split your text file. Instead, you need to define some structure to your file. For example, if you are always going to have headers in FULL CAPS, then you may want to start by splitting your text file into lines, then looping through each element and group the elements each time you hit a line containing only characters in FULL CAPS.
Personally, if possible, I would change the text file structure so you can flag headers with some symbol before or after: e.g. :THIS IS A HEADER. That way, you can split into lines then just look for the : symbol at the start of a line.
EDIT
For a sample approach on how to go about parsing this with the FULL CAPS headers, see my code example on PasteBin.
Note: The line ...
string[] lines = File.ReadAllLines(@"Sample.txt");
... could be replaced with ...
string textFromFile = File.ReadAllText(@"Sample.txt");
string[] lines = textFromFile.Split(new string[1] { System.Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
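The header-based grouping described in this answer (start a new group each time a FULL CAPS line appears) might look roughly like the sketch below. It is not the PasteBin code referred to above. The names HeaderGrouping, IsHeader and GroupByHeader are made up for illustration, and a header is assumed to be any line that contains at least one letter and no lowercase letters:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HeaderGrouping
{
    // Assumption: a header has at least one letter and all of its letters are upper case,
    // so "PEOPLE" is a header but "30%" and "0218753458" are not.
    public static bool IsHeader(string line) =>
        line.Any(c => char.IsLetter(c)) &&
        line.Where(c => char.IsLetter(c)).All(c => char.IsUpper(c));

    // Returns one list per group; each list starts with its header line.
    public static List<List<string>> GroupByHeader(IEnumerable<string> lines)
    {
        var groups = new List<List<string>>();
        foreach (var raw in lines)
        {
            var line = raw.Trim();
            if (line.Length == 0) continue;   // ignore blank lines
            // Start a new group at every header (or at the very first line).
            if (IsHeader(line) || groups.Count == 0)
                groups.Add(new List<string>());
            groups[groups.Count - 1].Add(line);
        }
        return groups;
    }
}
```

Passing the result of File.ReadAllLines to GroupByHeader keeps each header together with the lines that follow it, which is what the question asks for with part[0], part[1] and so on.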
Using regex and because you want to keep the split strings in the results:
string[] tmp = Regex.Split(originalString, @"(PEOPLE|ENTERPRISE|HOME)");
List<string> result = new List<string>();
for(var i = 1; i < tmp.Count() - 1; i += 2) {
result.Add(tmp[i] + tmp[i+1]);
}
This gives you the result you want.
The reason why I'm concatenating the tmp array is because as of .NET 2.0, the Regex.Split will return the split strings as part of the array. I also start the indexing at 1 because we want our concatenation to happen late
s.Split(new string[] {"PEOPLE", "ENTERPRISE", ... }, StringSplitOptions.RemoveEmptyEntries);
And if you want to keep the headers themselves, it may be preferable to split your string multiple times, once per header, and add each header back by hand. For example, split your string by PEOPLE and add the PEOPLE header to each chunk; then split each chunk by HOME and add the HOME header by hand, and so on.
I'm going to give an answer that doesn't exactly match up with what you've asked for, so if you're dead set on having the output you've defined in your question then please disregard. Otherwise, I hope this is useful;
var peopleList = new List<string>();
var enterpriseList = new List<string>();
var homeList = new List<string>();
List<string> workingList = null;
using (var reader = new StreamReader("input.txt"))
{
string line = reader.ReadLine();
while (line != null)
{
switch (line)
{
case "PEOPLE": { workingList = peopleList; } break;
case "ENTERPRISE": { workingList = enterpriseList; } break;
case "HOME": { workingList = homeList; } break;
default: { workingList.Add(line); } break;
}
line = reader.ReadLine();
}
}
Based on your sample input, this will populate three lists as follows;
peopleList = { "John", "0218753458", "Vahn", "031245678" }
enterpriseList = { "stock", "30%", "Inc", "50%" }
homeList = { "Indiana", "West Virginia", "melbourne", "Australia" }
