Divide a Text Document into Sections Using C#

I'm parsing a text file that has a semi-known repeating structure: a heading (1 line), a sub-heading (1 or 2 lines), and a content area (an arbitrary number of lines).
The format for each item in the document is shown below:
=========================
Head Text 1
=========================
SubHead Text1
SubHead Text2
=========================
Content Text Line 1
Content Text Line 2
...
Content Text Line 8
=========================
Head Text 2
=========================
SubHead Text1
SubHead Text2
=========================
Content Text Line 1
Content Text Line 2
...
Content Text Line 6
I would like each section to be a separate object with three parts, something like:
section1.head
section1.subHead
section1.content
section2.head
section2.subHead
section2.content
The only way I can think of accomplishing this involves a lot of if and while statements. Is there an efficient way of accomplishing this?
I originally tried writing some code in JScript, but it didn't work very well: I kept skipping some dividers and would get an error at the end of the file. I'm reading an RTF file, and C# provides an easy way of converting RTF to plain text. The JScript attempt:
page = new Array();
fso = new ActiveXObject("Scripting.FileSystemObject");
f = fso.GetFile("test.rtf");
is = f.OpenAsTextStream( forReading, -2 );
var count = 0;
while( !is.AtEndOfStream ){
    page[count] = is.ReadLine();
    count++;
}
is.Close();
WScript.Echo( page[0].text);
var item = [];
var section = 0;
var i = 0, k = 0;
while (i < page.length) {
    item[k] = {};
    if (!page[i].indexOf("=====")) {
        i++;
        item[k].head = page[i];
        i+=2;
        while(page[i].indexOf("=====")) { // WScript.Echo( "index = " + i + " "+ page[i] +"\n" + "Next index = " + (i+1) + " "+ page[i+1] +"\n" );
            item[k].subHead += page[i];
            i++;
        }
        k++;
    }
    i++;
}

If you want to cut down on the ifs, you could implement the state pattern: each line is handed to the current state, which decides what to do with it and which state comes next.
http://en.wikipedia.org/wiki/State_pattern
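A minimal sketch of that idea in C#, assuming a divider is any line starting with "=====" and that content lines never start that way. It uses an enum and a switch rather than one class per state, which is a lighter variant of the pattern. The names Section, SectionParser and Parse are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container for one parsed item.
public sealed class Section
{
    public string Head = "";
    public readonly List<string> SubHead = new List<string>();
    public readonly List<string> Content = new List<string>();
}

public static class SectionParser
{
    // The part of an item that the next non-divider line belongs to.
    private enum State { Head, SubHead, Content }

    public static List<Section> Parse(IEnumerable<string> lines)
    {
        var sections = new List<Section>();
        Section current = null;
        var state = State.Content; // before the first divider, lines are ignored

        foreach (var line in lines)
        {
            // Assumption: a divider is any line that starts with "=====".
            if (line.StartsWith("=====", StringComparison.Ordinal))
            {
                switch (state)
                {
                    case State.Content:   // a divider after content opens a new item
                        current = new Section();
                        sections.Add(current);
                        state = State.Head;
                        break;
                    case State.Head:
                        state = State.SubHead;
                        break;
                    case State.SubHead:
                        state = State.Content;
                        break;
                }
                continue;
            }

            switch (state)
            {
                case State.Head:
                    current.Head = line;
                    break;
                case State.SubHead:
                    current.SubHead.Add(line);
                    break;
                case State.Content:
                    if (current != null) current.Content.Add(line);
                    break;
            }
        }
        return sections;
    }
}
```

Calling SectionParser.Parse(File.ReadAllLines(path)) on the plain-text version of the file would then give one Section per item, with the sub-heading and content kept as lists of lines.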

Related

Calculating richtextbox lines count asynchronously

I wrote code that counts the number of lines and the text length of a RichTextBox's content. With small chunks of text it works perfectly, but with large chunks (more than 100k) the response becomes very slow whenever I press "Enter" or "Backspace" in the RichTextBox. For example: https://i.imgur.com/QO2UrAw.gifv
My question: what is a better way to run this code asynchronously?
Archive with the test project https://gofile.io/?c=LpF409
private void StatusPanelTextInfo()
{
    int currentColumn = 0;
    int currentLine = 0;
    int linesCount = 0;
    if (statusStrip1.Visible)
    {
        currentColumn = 1 + richTextBox1.SelectionStart - richTextBox1.GetFirstCharIndexOfCurrentLine();
        RichTextBox rtb = new RichTextBox
        {
            WordWrap = false,
            Text = richTextBox1.Text
        };
        currentLine = 1 + rtb.GetLineFromCharIndex(richTextBox1.SelectionStart);
        linesCount = richTextBox1.Lines.Count();
        if (linesCount == 0)
        {
            linesCount = 1;
        }
    }
    toolStripStatusLabel1.Text = "Length: " + richTextBox1.TextLength;
    toolStripStatusLabel2.Text = "Lines: " + linesCount;
    toolStripStatusLabel3.Text = "Ln: " + currentLine;
    toolStripStatusLabel4.Text = "Col: " + currentColumn;
}
I downloaded your code, and I can't understand why you create a new RichTextBox every time the StatusPanelTextInfo method is called:
RichTextBox rtb = new RichTextBox
{
    WordWrap = false,
    Text = richTextBox1.Text
};
This is the reason for the lag in your program. Each time you change or select text, you create a new RichTextBox object and copy a large amount of text to its Text property. You should remove this code, and then it works fast enough. Just replace rtb in your calculation of currentLine with richTextBox1.
Next time, please include your code in the question instead of making people download it from an external link. Your whole form class was about 60 lines. With proper selection you could have given us all the info we needed in 20 lines.
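If the second RichTextBox was there to get the logical line number regardless of word wrap, one alternative is to compute line and column from the text itself. This is only a sketch, assuming the caret offset (SelectionStart) indexes into the same string as Text and that lines in it are separated by '\n'. CaretInfo and Locate are hypothetical names.

```csharp
using System;

public static class CaretInfo
{
    // Returns the 1-based line and column of a caret offset within text
    // whose lines are separated by '\n'.
    public static (int Line, int Column) Locate(string text, int caret)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (caret < 0 || caret > text.Length) throw new ArgumentOutOfRangeException(nameof(caret));

        int line = 1;
        int lineStart = 0; // index of the first character of the caret's line
        for (int i = 0; i < caret; i++)
        {
            if (text[i] == '\n')
            {
                line++;
                lineStart = i + 1;
            }
        }
        return (line, caret - lineStart + 1);
    }
}
```

In the handler this would be called as CaretInfo.Locate(richTextBox1.Text, richTextBox1.SelectionStart). It still scans the text up to the caret on every call, but it avoids building a second control and copying the whole text into it.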

C# - Reading a file to a list and splitting on a delimiter

I have a text file that I need to pull individual values from. An example of this is:
Name: John Doe
Key Length: 3
a90nm84ang9834n
90v84jgseidfrlg
f39048s9ipu4sdd
Random: true
And I would need my output to be something like:
Visitor: John Doe
Key Value: a90nm84ang9834n90v84jgseidfrlgf39048s9ipu4sdd
Right now, I am reading the file into a list and calling on the values individually, but this doesn't allow me to rename the first value of the string (e.g. Name -> Visitor).
My real question is after the file is read into the list, is it possible to further split each of those lines off of a delimiter and reference 1 portion of the pair?
Edit - Here's a sample of the code I'm using, but it doesn't do what I am trying to do:
string path = @"C:\temp\foo.txt";
List<string> lines = File.ReadAllLines(path).ToList();
Console.WriteLine("Filename: " + path);
Console.WriteLine("Length: " + lines[1]); //This outputs "Length: Key Length: 3"
Assuming your data is all formatted the same...how about something like this:
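private static void ParseDataFile(string dataFile)
{
    var lines = File.ReadAllLines(dataFile);
    for (var i = 0; i < lines.Length; i++)
    {
        if (lines[i].Contains("Name"))
        {
            // Drop the "Name: " prefix (6 characters) and print under the new label.
            Console.WriteLine($"Visitor: {lines[i].Remove(0, 6)}");
            // The next line is "Key Length: N"; drop the 12-character prefix.
            var keyLineCount = Convert.ToInt32(lines[++i].Remove(0, 12));
            string key = string.Empty;
            // Concatenate the N key lines that follow.
            for (var j = 0; j < keyLineCount; j++)
            {
                key += lines[++i];
            }
            // Skip the line after the key ("Random: ..." in the sample).
            i++;
            Console.WriteLine($"Key Value: {key}");
        }
    }
}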
To answer your specific question: Yes, it is possible to split strings on various characters at different times:
string s = "1234567890";
string[] parts1 = s.Split('5'); // 2 parts "1234" and "67890"
string[] parts2 = parts1[1].Split('7','9'); // 3 parts "6", "8" and "0"
etc.
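Applied to the "Label: value" lines in the question, a sketch could split each line on the first colon only and swap the label through a lookup table. LabeledLine, TrySplit and the Renames table are illustrative names, and the only rename included is the Name -> Visitor one from the question.

```csharp
using System;
using System.Collections.Generic;

public static class LabeledLine
{
    // Hypothetical rename table; extend it as needed.
    private static readonly Dictionary<string, string> Renames = new Dictionary<string, string>
    {
        { "Name", "Visitor" },
    };

    // Splits "Label: value" on the first ':' only, so colons inside the value survive.
    // Returns false for lines without a colon (such as the raw key lines).
    public static bool TrySplit(string line, out string label, out string value)
    {
        var parts = line.Split(new[] { ':' }, 2);
        if (parts.Length != 2)
        {
            label = null;
            value = null;
            return false;
        }

        label = parts[0].Trim();
        value = parts[1].Trim();
        if (Renames.TryGetValue(label, out var renamed))
        {
            label = renamed;
        }
        return true;
    }
}
```

Lines for which TrySplit returns false can then be treated as key material and concatenated.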

How to remove lines one by one in Richtextbox C#

I use this code to delete lines one by one in a RichTextBox, but it still leaves an empty line (whitespace) behind.
var text = "";//Holds the text of current line being looped.
var startindex = 0;//The position where selection starts.
var endindex = 0;//The length of selection.
for (int i = 0; i < richtextbox1.Lines.Length; i++)//Loops through each line of text in RichTextBox
{
    text = richtextbox1.Lines[i]; //Stores current line of text.
    startindex = richtextbox1.GetFirstCharIndexFromLine(i);
    endindex = text.Length;
    richtextbox1.Select(startindex, endindex);
    MessageBox.Show(richtextbox1.SelectedText);
    richtextbox1.SelectedText = "";
}
How do I delete lines one by one without empty lines (whitespace)?
// Gets the number of newline characters in your rich text box
var numberOfNewLines = richTextBox1.Text.Count(r => r == '\n');
for (var i = 0; i < numberOfNewLines; i++)
{
    // Finds the first occurrence of the newline character
    var newlineCharacterIndex = richTextBox1.Text.IndexOf('\n') + 1;
    // Replaces the rich textbox text with everything but the above line
    richTextBox1.Text = richTextBox1.Text.Substring(newlineCharacterIndex);
    MessageBox.Show("OK!");
}
// Removes the final line.
richTextBox1.Text = string.Empty;
I think you were on the right track, but the way you were doing it was just removing the contents of the line and not the line itself.

How to add the values to the List<long[]> from the text file?

I have a text file; this is a small part of it, showing its format:
DANNY VIDEO HISTOGRAM DATA
FORMAT VERSION:1.00
SOURCE: <MVI_2483.AVI_Automatic>
DATA:
Frame 000000: 5977,40775,174395,305855,265805
Frame 000001: 5432,21333,456789,123456,111111
Every Frame line has 256 numbers; the example above shows only 5 per line (5977,40775,174395,305855,265805).
In Form1 I have a list: List<long[]> Histograms
I need to read the text file (named Histograms.txt in this case) and add each line/frame, with its 256 numbers, back to the list.
In the end, Histograms[0] should hold 256 values: 5977 at [0], 40775 at [1], 174395 at [2], 305855 at [3], 265805 at [4], and so on.
Histograms[1] should again hold 256 numbers,
then Histograms[2], and so on.
The list should end up with 3803 entries, each containing 256 numbers.
This is the code that writes the text file when the list is populated. When I run the program again the list is empty, so I need to read the text file and load it back into the list.
private void WriteHistograms() // For automatic mode only for now
{
    HistogramsFile = new StreamWriter(_outputDir + "\\" + averagesListTextFileDirectory + "\\" + "Histograms.txt", false, Encoding.ASCII);
    HistogramsFile.WriteLine("DANNY VIDEO HISTOGRAM DATA\r\nFORMAT VERSION:1.00\r\nSOURCE: " + "<" + averagesListTextFile + ">" + "\r\nDATA: ");
    for (int i = 0; i < Histograms.Count; i++)
    {
        HistogramsFile.Write("Frame " + i.ToString("D6") + ": ");
        for (int x = 0; x < Histograms[i].Length; x++ )
        {
            HistogramsFile.Write(Histograms[i][x] + ",");
        }
        HistogramsFile.WriteLine("!");
    }
    HistogramsFile.WriteLine("DATA");
    HistogramsFile.Close();
}
Now I have another function, LoadHistograms(), which needs to read the text file and add the numbers back to the list.
I have added a photo showing the list as it is when written to the text file, which is how it should look after being read back.
The following code should give you a fairly good idea of how to do it.
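string line;
List<long[]> list = new List<long[]>();
using (StreamReader file = new StreamReader(@"..\..\Histograms.txt"))
{
    // Skip the header lines up to and including "DATA:".
    do { line = file.ReadLine(); } while (line != null && !line.Trim().Equals("DATA:"));
    while ((line = file.ReadLine()) != null)
    {
        // A frame line looks like "Frame 000000: 5977,40775,...".
        var split = line.Split(new char[] { ':' });
        if (split.Length == 2)
        {
            long[] valArray = new long[256];
            var valArrayStr = split[1].Split(new char[] { ',' });
            // Stop at 256 values so a longer line cannot overrun the array.
            for (int i = 0; i < valArrayStr.Length && i < valArray.Length; i++)
            {
                long result;
                // Non-numeric tokens (such as the trailing "!") are skipped.
                if (long.TryParse(valArrayStr[i].Trim(), out result))
                    valArray[i] = result;
            }
            // Only frame lines are added; the closing "DATA" line is ignored.
            list.Add(valArray);
        }
    }
}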
Happy Coding...

How to loop through all text files in a directory C#

This piece of code takes each row from 1.txt and splits it into columns. I now have a directory of 200+ files whose names end in .txt, and I want to open them one at a time and run the process below on each. What is the easiest way to loop through all the files without changing my code too much?
Current snippet of code:
string _nextLine;
string[] _columns;
char[] delimiters;
delimiters = "|".ToCharArray();
_nextLine = _reader.ReadLine();
string[] lines = File.ReadAllLines("C:\\P\\DataSource2_W\\TextFiles\\Batch1\\1.txt");
//Start at index 2 - and keep looping until index Length - 2
for (int i = 3; i < lines.Length - 2; i++)
{
    _columns = lines[i].Split('|');
    // Check if number of cols is 3
    if (_columns.Length == 146)
    {
        JazzORBuffer.AddRow();
        JazzORBuffer.Server = _columns[0];
        JazzORBuffer.Country = _columns[1];
        JazzORBuffer.QuoteNumber = _columns[2];
        JazzORBuffer.DocumentName =_columns[3];
        JazzORBuffer.CompanyNameSoldTo=_columns[4];
    }
    else
    {
        // Debug or messagebox the line that fails
        MessageBox.Show("Cols:" + _columns.Length.ToString() + " Line: " + lines[i]);
        return;
    }
}
You can simply use Directory.EnumerateFiles() to iterate over the files in the specified directory.
So you can put your code inside a foreach loop, like this:
foreach (var file in
    Directory.EnumerateFiles(@"C:\P\DataSource2_W\TextFiles\Batch1", "*.txt"))
{
    //your code
}
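A slightly fuller sketch of how the existing row loop might sit inside that foreach. BatchReader and ReadAll are made-up names, and the column count and the number of lines skipped at the top and bottom are passed in as parameters rather than hard-coded, since the right values depend on your files. Rows with an unexpected column count are simply left out here; the original code stops and shows a message instead.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BatchReader
{
    // Hypothetical helper: for every *.txt file in the directory, returns the
    // rows (split on '|') that have the expected number of columns.
    public static Dictionary<string, List<string[]>> ReadAll(
        string directory, int expectedColumns, int skipTop, int skipBottom)
    {
        var result = new Dictionary<string, List<string[]>>();
        foreach (var file in Directory.EnumerateFiles(directory, "*.txt"))
        {
            var lines = File.ReadAllLines(file);
            var rows = new List<string[]>();

            // Same loop shape as in the question, but driven by the current file.
            for (int i = skipTop; i < lines.Length - skipBottom; i++)
            {
                var columns = lines[i].Split('|');
                if (columns.Length == expectedColumns)
                {
                    rows.Add(columns);
                }
            }
            result[file] = rows;
        }
        return result;
    }
}
```

The key change from the single-file version is that File.ReadAllLines receives the loop variable instead of the fixed path to 1.txt.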
