Splitting each line of a multiline text at a specific character position - C#

I have a text file with this multi line structure:
12345 beautiful text in line01
95469 other text in line02
16987 nice text in line03
(etc...)
and want this:
12345
beautiful text in line01
95469
other text in line02
16987
nice text in line03
So, for every line, I need a line break at position 5, before the text.
I tried inserting \n with string.Remove().Insert(), but it works only for the first line.
How can I do this?
EDIT
Code added by request.
input.txt contains the multiline text.
StreamReader myReader = new StreamReader("input.txt");
string myString00 = myReader.ReadLine();
string myStringFinal = myString00;
myStringFinal = myStringFinal.Remove(5, 1).Insert(5, "\n");
myReader.Close();
FileStream myFs = new FileStream("output.txt", FileMode.Create);
// First, save the standard output.
TextWriter tmp = Console.Out;
StreamWriter mySw = new StreamWriter(myFs);
Console.SetOut(mySw);
Console.WriteLine(myStringFinal);
Console.SetOut(tmp);
Console.WriteLine(myStringFinal);
mySw.Close();
Console.ReadLine();

Here is something you can try with a Regex:
var subject = @"12345 beautiful text in line01
95469 other text in line02
16987 nice text in line03";
var expected = Regex.Replace(subject, @"(\d{5})\s?", "$1\r\n");
Basically, this finds 5 digits optionally followed by a space and, when it finds them, replaces them with the digits and a line break. And you're done.
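The pattern above matches five digits anywhere in the text, and `\s` can also match a line break. A minimal sketch, assuming the number always starts the line and is followed by spaces or tabs, anchors the match with `^` and `RegexOptions.Multiline`. `LineSplitter` and `SplitAfterNumber` are hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineSplitter
{
    // Hypothetical helper: moves the text after a leading 5-digit number onto its own line.
    // With RegexOptions.Multiline, ^ matches at the start of every line, not just the string.
    // [ \t]+ consumes the separator but never a line break.
    public static string SplitAfterNumber(string text)
    {
        return Regex.Replace(text, @"^(\d{5})[ \t]+", "$1" + Environment.NewLine, RegexOptions.Multiline);
    }
}
```

For example, `LineSplitter.SplitAfterNumber(File.ReadAllText("input.txt"))` would return the whole converted text. A line that starts with six digits is left unchanged, because `[ \t]+` requires white space right after the fifth digit.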

This will work only if the number is exactly 5 characters.
string input = @"12345 beautiful text in line01
95469 other text in line02
16987 nice text in line03";
var lines = input.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
var formattedLines = lines
.Select(x => new
{
Number = int.Parse(x.Substring(0, 5)),
Data = x.Substring(5).TrimStart()
})
.ToList();
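formattedLines will be a collection of your lines, with Number and Data holding the info from each line. (Note that int.Parse drops leading zeros, so 01234 would come back as 1234; keep Number as a string if that matters.)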
var firstLinesData = formattedLines[0].Data;
So now, to make your output format:
StringBuilder builder = new StringBuilder();
foreach (var item in formattedLines)
{
builder.AppendLine(item.Number.ToString());
builder.AppendLine(item.Data);
}
string output = builder.ToString();

Loop over each line. Use Substring (http://msdn.microsoft.com/en-us/library/aka44szs(v=vs.110).aspx) to get the first 5 characters as a string. Use a StringBuilder: append the first part, then grab the rest of the line and append it to the StringBuilder as well.
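A sketch of that loop, assuming the number is always exactly the first five characters (as in the question). `FixedWidthSplitter` and its methods are hypothetical names, and lines of five characters or fewer are passed through unchanged:

```csharp
using System;
using System.IO;
using System.Text;

public static class FixedWidthSplitter
{
    // Splits each line after the first `width` characters and writes both parts
    // on separate lines. Lines no longer than `width` are copied through unchanged.
    public static string Split(string[] lines, int width = 5)
    {
        var builder = new StringBuilder();
        foreach (var line in lines)
        {
            if (line.Length <= width)
            {
                builder.AppendLine(line);
                continue;
            }
            builder.AppendLine(line.Substring(0, width));          // the number
            builder.AppendLine(line.Substring(width).TrimStart()); // the rest of the text
        }
        return builder.ToString();
    }

    // File version using the question's input.txt / output.txt names.
    public static void ConvertFile(string inputPath, string outputPath)
    {
        File.WriteAllText(outputPath, Split(File.ReadAllLines(inputPath)));
    }
}
```

Calling `FixedWidthSplitter.ConvertFile("input.txt", "output.txt")` would replace the single-line `ReadLine()` in the question, which is why only the first line was converted there.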

Related

Got wrong data while split file into arrays

I have a file that contains two lines; each line contains 200 fields, and I would like to split it into arrays:
using (StreamReader sr = File.OpenText(pathSensorsCalc))
{
string s = String.Empty;
while ((s = sr.ReadLine()) == null) { };
String line1 = sr.ReadToEnd();
String line2 = sr.ReadToEnd();
CalcValue[0] = new String[200];
CalcValue[1] = new String[200];
CalcValue[0] = line1.Split(' ');
CalcValue[1] = line2.Split(' ');
}
After the code above, CalcValue[1] is empty and CalcValue[0] contains the data of the second line (instead of the first one). Any ideas?
When using sr.ReadToEnd(), you are reading to the end of your input stream. That means that after the first call to String line1 = sr.ReadToEnd(), your stream is already at its end, so the second call returns an empty string. Replace your ReadToEnd() calls with ReadLine() calls. That should work.
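Note also that the empty while loop in the question already consumes the first line: ReadLine() returns it (not null), so the loop exits and that line is discarded, which is why CalcValue[0] ends up with the second line. A minimal sketch of the ReadLine() approach, assuming CalcValue is a `string[][]` and fields are separated by single spaces; `SensorFile.ReadTwoLines` is a hypothetical helper:

```csharp
using System;
using System.IO;

public static class SensorFile
{
    // Reads exactly two lines and splits each one on single spaces.
    // Throws if the reader runs out of lines before two have been read.
    public static string[][] ReadTwoLines(TextReader reader)
    {
        var calcValue = new string[2][];
        for (int i = 0; i < 2; i++)
        {
            string line = reader.ReadLine();
            if (line == null)
                throw new InvalidDataException("Expected two lines, found " + i + ".");
            calcValue[i] = line.Split(' ');
        }
        return calcValue;
    }
}
```

It could be called as `CalcValue = SensorFile.ReadTwoLines(sr);` inside the original `using (StreamReader sr = File.OpenText(pathSensorsCalc))` block. Taking a TextReader rather than a path keeps it easy to try out with a StringReader.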
In Windows, a line break is represented by \r\n. So if you read the whole file into one string, you should not split it into lines by spaces (" ").
Instead, use another overload of the Split method: Split(char[], StringSplitOptions). The first argument is the characters you want to split by and the second is the options. Why do you need the options? Because when you split by two consecutive characters (\r followed by \n), you get an empty element between them, and StringSplitOptions.RemoveEmptyEntries drops it.
So now it is easy to understand what this code does and why:
line1.Split(new[] {'\r', '\n'}, StringSplitOptions.RemoveEmptyEntries);
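A sketch that combines both steps, assuming the whole file is read into one string with File.ReadAllText (or ReadToEnd()): split it into lines on '\r' and '\n', then split each line into fields on spaces. `SensorText.SplitLinesAndFields` is a hypothetical name. Be aware that RemoveEmptyEntries also drops genuinely blank lines and empty fields, which would shift field indexes if a missing value were ever written as two adjacent spaces:

```csharp
using System;

public static class SensorText
{
    // Splits the whole file contents into lines (dropping the empty entries that
    // appear between '\r' and '\n'), then splits each line into fields on spaces.
    public static string[][] SplitLinesAndFields(string contents)
    {
        string[] lines = contents.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        var result = new string[lines.Length][];
        for (int i = 0; i < lines.Length; i++)
        {
            result[i] = lines[i].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        }
        return result;
    }
}
```

Usage would be along the lines of `CalcValue = SensorText.SplitLinesAndFields(File.ReadAllText(pathSensorsCalc));`.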

Place every sentence from a text file into an array but detect headers/titles

I need to get each sentence from a text document/string into an array.
The issue is how to handle headers, titles and other sections of text which are not part of a sentence and don't end in a full stop (". "), so they can't be detected that way.
Being unable to detect these will result in them being stuck onto the front of the following sentence (if I use ". " to distinguish sentences), which I can't have happen.
Initially I was going to use:
contentRefined = content.Replace(" \n", ". ");
I thought this would remove all of the empty lines and newlines, as well as place full stops at the ends of headers so they would be detected and treated as sentences. It would result in ". . " in places, but I could Replace those with nothing.
But it didn't work: it simply left the empty lines in place and just put a ". " at the start of each empty line, as well as a ". " at the start of every paragraph.
I have now tried:
contentRefined = Regex.Replace(content, @"^\s+$[\r\n]*", "", RegexOptions.Multiline);
This fully removes the empty lines, but doesn't get me closer to adding a full stop to the ends of the headers.
I need to place the sentences and headers/titles in an array, and I'm not sure whether there is a way to do this without splitting the string by something such as ". ".
Edit: Full current code showing how I get the text from the file:
public void sentenceSplit()
{
content = File.ReadAllText(@"I:\Project\TLDR\Test Text.txt");
contentRefined = Regex.Replace(content, @"^\s+$[\r\n]*", "", RegexOptions.Multiline);
//contentRefined = content.Replace("\n", ". ");
}
I'm making an assumption that 'Header' and 'Title' are on their own lines and do not end in a period.
If that's the case, then this may work for you:
var filePath = @"C:\Temp\temp.txt";
var sentences = new List<string>();
using (TextReader reader = new StreamReader(filePath))
{
while (reader.Peek() >= 0)
{
var line = reader.ReadLine();
if (line.Trim().EndsWith("."))
{
line.Split(new[] {'.'}, StringSplitOptions.RemoveEmptyEntries)
.ToList()
.ForEach(l => sentences.Add(l.Trim() + "."));
}
}
}
// Output sentences to console
sentences.ForEach(Console.WriteLine);
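The code above keeps only lines that end in a period, so headers are dropped rather than stored. If the headers should go into the same array as separate entries, a minimal sketch under the same assumption (a header is a non-blank line that does not end in a period) could look like this; `SentenceSplitter.SplitWithHeaders` is a hypothetical name. It splits on ". " instead of '.' so that a number like $5.00 stays in one piece:

```csharp
using System;
using System.Collections.Generic;

public static class SentenceSplitter
{
    // Returns headers and sentences in document order.
    // Assumption: a header is a non-blank line that does not end in a period.
    public static List<string> SplitWithHeaders(IEnumerable<string> lines)
    {
        var items = new List<string>();
        foreach (var raw in lines)
        {
            var line = raw.Trim();
            if (line.Length == 0) continue;   // skip blank lines

            if (!line.EndsWith("."))
            {
                items.Add(line);              // header/title: keep the whole line
                continue;
            }

            // Paragraph line: split on ". " and put the period back on each sentence.
            foreach (var part in line.Split(new[] { ". " }, StringSplitOptions.RemoveEmptyEntries))
            {
                var sentence = part.Trim();
                items.Add(sentence.EndsWith(".") ? sentence : sentence + ".");
            }
        }
        return items;
    }
}
```

`SentenceSplitter.SplitWithHeaders(File.ReadLines(@"I:\Project\TLDR\Test Text.txt")).ToArray()` would then give a single array with headers and sentences in their original order.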
UPDATE
Another approach using the File.ReadAllLines() method, and displaying the sentences in a RichTextBox:
private void Form1_Load(object sender, EventArgs e)
{
var filePath = @"C:\Temp\temp.txt";
var sentences = File.ReadAllLines(filePath)
// Only select lines that end in a period
.Where(l => l.Trim().EndsWith("."))
// Split each line into sentences (one line may have many sentences)
.SelectMany(s => s.Split(new[] {'.'}, StringSplitOptions.RemoveEmptyEntries))
// Trim any whitespace off the ends of the sentence and add a period to the end
.Select(s => s.Trim() + ".")
// And finally cast it to a List (or you could do 'ToArray()')
.ToList();
// To show each sentence in the list on its own line in the rtb:
richTextBox1.Text = string.Join("\n", sentences);
// Or to show them all, one after another:
richTextBox1.Text = string.Join(" ", sentences);
}
UPDATE
Now that I think I understand what you're asking, here's what I would do. First, I would create some classes to manage all this stuff. If you break the document down into parts, you get something like:
HEADER
Paragraph sentence one. Paragraph sentence two. Paragraph sentence three with a number, like in this quote: "$5.00 doesn't go as far as it used to".
Header Over an Empty Section
Header over multiple paragraphs
Paragraph sentence one. Paragraph sentence two. Paragraph sentence three with a number, like in this quote: "$5.00 doesn't go as far as it used to".
Paragraph sentence one. Paragraph sentence two. Paragraph sentence three with a number, like in this quote: "$5.00 doesn't go as far as it used to".
Paragraph sentence one. Paragraph sentence two. Paragraph sentence three with a number, like in this quote: "$5.00 doesn't go as far as it used to".
So I would create the following classes. First, one to represent a 'Section'. This is defined by a Header and zero or more paragraphs:
private class Section
{
public string Header { get; set; }
public List<Paragraph> Paragraphs { get; set; }
public Section()
{
Paragraphs = new List<Paragraph>();
}
}
Then I would define a Paragraph, which contains one or more sentences:
private class Paragraph
{
public List<string> Sentences { get; set; }
public Paragraph()
{
Sentences = new List<string>();
}
}
Now I can populate a List of Sections to represent the document:
var filePath = @"C:\Temp\temp.txt";
var sections = new List<Section>();
var currentSection = new Section();
var currentParagraph = new Paragraph();
using (TextReader reader = new StreamReader(filePath))
{
while (reader.Peek() >= 0)
{
var line = reader.ReadLine().Trim();
// Ignore blank lines
if (string.IsNullOrWhiteSpace(line)) continue;
if (line.EndsWith("."))
{
// This line is a paragraph, so add all the sentences
// it contains to the current paragraph
line.Split(new[] {". "}, StringSplitOptions.RemoveEmptyEntries)
.Select(l => l.Trim().EndsWith(".") ? l.Trim() : l.Trim() + ".")
.ToList()
.ForEach(l => currentParagraph.Sentences.Add(l));
// Now add this paragraph to the current section
currentSection.Paragraphs.Add(currentParagraph);
// And set it to a new paragraph for the next loop
currentParagraph = new Paragraph();
}
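else
{
// This line is a header, so we're starting a new section.
// Add the current section to our list (skipping the empty
// section that exists before the first header) and create
// a new one, setting this line as the header.
if (!string.IsNullOrEmpty(currentSection.Header) || currentSection.Paragraphs.Any())
{
sections.Add(currentSection);
}
currentSection = new Section {Header = line};
}
}
// Finally, if the current section contains any data, add it to the list.
// Header is null when the document has no headers, so don't call .Length on it.
if (!string.IsNullOrEmpty(currentSection.Header) || currentSection.Paragraphs.Any())
{
sections.Add(currentSection);
}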
}
Now we have the whole document in a list of sections, and we know the order, the headers, the paragraphs, and the sentences they contain. As an example of how you can analyze it, here's a way to write it back out to a RichTextBox:
// We can build the document section by section
var documentText = new StringBuilder();
foreach (var section in sections)
{
// Here we can display headers and paragraphs in a custom way.
// For example, we can separate all sections with a blank line:
documentText.AppendLine();
// If there is a header, we can underline it
if (!string.IsNullOrWhiteSpace(section.Header))
{
documentText.AppendLine(section.Header);
documentText.AppendLine(new string('-', section.Header.Length));
}
// We can mark each paragraph with an arrow (--> )
foreach (var paragraph in section.Paragraphs)
{
documentText.Append("--> ");
// And write out each sentence, separated by a space
documentText.AppendLine(string.Join(" ", paragraph.Sentences));
}
}
// To make the underline approach above look
// half-way decent, we need a fixed-width font
richTextBox1.Font = new Font(FontFamily.GenericMonospace, 9);
// Now set the RichTextBox Text equal to the StringBuilder Text
richTextBox1.Text = documentText.ToString();

Deleting text from text file

I want to know how I can delete a certain amount of text from each line of a file.
I can't think of a way of accomplishing such a task.
878 57 2
882 63 1
887 62 1
1001 71 0
1041 79 1
1046 73 2
This is what the text file looks like, but I only want the numbers on the very left. I can't delete the 2 columns on the right manually because there are over 16,000 lines of this.
The numbers on the left also vary in length, so I can't read them by a fixed length.
I'm also not sure what character the numbers are separated by; it may be a tab.
Anyone have any ideas on what I could try?
If you wish to take a look at the text file, here: http://pastebin.com/xyaCsc6W
var query = File.ReadLines("input.txt")
.Where(x => char.IsDigit(x.FirstOrDefault()))
.Select(x => string.Join("", x.TakeWhile(char.IsDigit)));
File.WriteAllLines("output.txt", query);
string line;
using (var sr = new StreamReader(@"E:\test1.txt"))
{
using (var sw = new StreamWriter(@"E:\test1.tmp"))
{
while (!sr.EndOfStream)
{
line = sr.ReadLine();
line = Regex.Match(line, @"([\d]*)").Groups[1].Value;
sw.WriteLine(line);
}
}
}
File.Replace(@"E:\test1.tmp", @"E:\test1.txt", null);
You could do:
var col =
from s in File.ReadAllLines(input_file_name)
select s.Split(new[] { ' ', '\t' })[0];
Note: the separator array contains both a space and a tab character, so this works whichever one the file uses.
StringBuilder sb = new StringBuilder();
//read file.txt line by line
using (StreamReader sr = new StreamReader("file.txt"))
{
String line;
// Read lines from the file until the end of
// the file is reached.
while ((line = sr.ReadLine()) != null)
{
//for each line, find the first space or tab and cut
//the data from the beginning of the line up to it
int end = line.IndexOfAny(new[] { ' ', '\t' });
string str = end >= 0 ? line.Substring(0, end) : line;
//Append each modified line to the StringBuilder
sb.AppendLine(str);
}
}
//Write the modified data to a new file
//(File.WriteAllText creates the file, so no separate File.Create is needed)
File.WriteAllText("newfile.txt",sb.ToString());
//Replace with new file
File.Replace("newfile.txt", "file.txt", null);
You could also do this, which will give you a list containing only the left column (numeric or alphanumeric values) of the text file:
var results = File.ReadAllLines("filename.txt")
.Select(line => line.Split('\t').First())
.ToList();
It looks like the text file is delimited by tabs.
To save the list of results back into a text file add the following in addition:
File.WriteAllLines("results.txt", results.ToArray());
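Since the question is unsure whether the columns are separated by tabs or spaces, here is a sketch that accepts either: passing a null separator array to Split makes it split on any white-space character, and RemoveEmptyEntries ignores runs of separators. `FirstColumn` is a hypothetical class name. The output path must differ from the input path, because File.ReadLines reads lazily while the output file is being written:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FirstColumn
{
    // Returns the first white-space-separated field of every non-blank line.
    // A null separator array tells Split to treat any white space (spaces, tabs) as a delimiter.
    public static List<string> Extract(IEnumerable<string> lines)
    {
        return lines
            .Select(line => line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            .Where(fields => fields.Length > 0)   // blank lines produce no fields
            .Select(fields => fields[0])
            .ToList();
    }

    // Streams the input file and writes the first column to a different output file.
    public static void ExtractFile(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath, Extract(File.ReadLines(inputPath)));
    }
}
```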

split a string from a text file into another list

Hi, I know the title might sound a little confusing, but I'm reading in a text file with many lines of data.
Example
12345 Test
34567 Test2
I read in the text 1 line at a time and add it to a list:
using (StreamReader reader = new StreamReader("Test.txt"))
{
string line;
while ((line = reader.ReadLine()) != null)
{
list.Add(line);
}
}
How do I then separate the 12345 from the Test, so I can pull only the first column of data if I need it? Something like list(1).pars[1] would be 12345 and list(2).pars[2] would be Test2.
I know this sounds foggy, but I hope someone out there understands.
Maybe something like this:
string test="12345 Test";
var ls= test.Split(' ');
This will get you an array of strings. You can get the parts with ls[0] and ls[1].
If you just want the 12345, then ls[0] is the one to choose.
If you're ok with having a list of string[]'s you can simply do this:
var list = new List<string[]>();
using (StreamReader reader = new StreamReader("Test.txt"))
{
string line;
while ((line = reader.ReadLine()) != null)
{
list.Add(line.Split(' '));
}
}
string firstWord = list[0][0]; //12345
string secondWord = list[0][1]; //Test
When you have a string of text you can use the Split() method to split it into many parts. If you're sure every word (separated by one or more spaces) is a column, you can simply write:
string[] columns = line.Split(' ');
There are several overloads of that method. You can specify whether blank fields are skipped (with plain Split(' '), for example, a line of 2 words separated by two spaces gives you an empty columns[1]). If you're sure about the number of columns you can fix that limit too (so any text after the last column will be treated as a single field).
In your case (add only the first column to the list) you may write:
if (String.IsNullOrWhiteSpace(line))
continue;
string[] columns = line.TrimStart().Split(new char[] { ' ' }, 2);
list.Add(columns[0]);
The first check skips empty lines and lines made up only of white space. TrimStart() removes white space from the beginning of the line (if any). Because of TrimStart(), the first column can't be empty, so you do not even need StringSplitOptions.RemoveEmptyEntries or an additional if (columns.Length > 1) check. Finally, if the file is small enough, you can read it into memory with a single call to File.ReadAllLines() and simplify everything with a little LINQ:
list.AddRange(
File.ReadAllLines("test.txt")
.Where(x => !String.IsNullOrWhiteSpace(x))
.Select(x => x.TrimStart().Split(new char[] { ' ' }, 2)[0]));
Note that with the first parameter you can specify more than one valid separator.
When you have multiple spaces between the columns, you can split on a regular expression instead:
Regex r = new Regex(" +");
string [] splitString = r.Split(stringWithMultipleSpaces);
var splitted = System.IO.File.ReadAllLines("Test.txt")
.Select(line => line.Split(' ')).ToArray();
var list1 = splitted.Select(split_line => split_line[0]).ToArray();
var list2 = splitted.Select(split_line => split_line[1]).ToArray();
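For the list(1).pars[1] style of access in the question, one option is to parse each line into a small object instead of a raw string[]. A sketch, assuming each line is a number followed by free text; `Row` and `RowParser` are hypothetical names. Splitting with a count of 2 keeps any spaces inside the text part, and a line with only a number gets an empty Text instead of throwing the way split_line[1] would:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record for one line of Test.txt: the leading number and the rest of the line.
public sealed class Row
{
    public string Id { get; }
    public string Text { get; }
    public Row(string id, string text) { Id = id; Text = text; }
}

public static class RowParser
{
    // Splits each non-blank line at its first space into Id and Text.
    public static List<Row> Parse(IEnumerable<string> lines)
    {
        var rows = new List<Row>();
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;
            // Trim first so parts[0] is never empty; count = 2 keeps the remainder together.
            string[] parts = line.Trim().Split(new[] { ' ' }, 2);
            rows.Add(new Row(parts[0], parts.Length > 1 ? parts[1].Trim() : string.Empty));
        }
        return rows;
    }
}
```

With the sample file, `var rows = RowParser.Parse(File.ReadLines("Test.txt"));` would give `rows[0].Id == "12345"` and `rows[1].Text == "Test2"`.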

How do I save a multi-line textbox as one line to a text file?

Ok guys, I am making a function to save a file. I have come across a problem in that when I save the data from multi-line text boxes it saves x amount of lines as x amount of lines in the text file.
So for example if the user entered:
line one
line two
line three
it would show as:
line one
line two
line three
whereas I want it to display as:
line one \n line two \n line three \n
The code I have is:
savefile.InitialDirectory = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
savefile.Title = "Save your file";
savefile.FileName = "";
savefile.Filter = "ChemFile (*.cd)|*.cd|All Files|*.*";
if (savefile.ShowDialog() != DialogResult.Cancel)
{
// save the text file information
for (int i = 0; i < noofcrit; i++)
{
cdfile[i] = crittextbs[i].Text;
}
}
// Compile the file
SaveFile = savefile.FileName;
System.IO.File.WriteAllLines(SaveFile, cdfile);
Any ideas how I can save multiline text files as one line? Thanks.
Replace the newline characters with @" \n " or " \\n " (the @ makes C# ignore escape characters):
string s = yourTextBox.Text.Replace(Environment.NewLine, @" \n ");
I think you may need to do something like this. I'm not actually sure what the best way is to show escape characters. Also, I would use a StreamWriter.
string myData = txtMyTextBox.Text.Replace("\r"," \\r ").Replace("\n"," \\n ");
using(System.IO.StreamWriter sw = new System.IO.StreamWriter(filePath))
{
sw.Write(myData);
}
If you get the multi-line strings in a string array, you could just join them into a single line:
string[] multiline = new []{"multi","line","text"};
string singleLine = string.Join(@"\n",multiline);
If it's all a single string (here multilineText), a simple Replace would do the trick:
string singleLine = multilineText.Replace("\r",string.Empty).Replace("\n",@"\n");
It's all one line really ;)
Depending on the platform (Win32 here), multi-line text boxes will store the text as:
Line\r\n
Line\r\n
Line\r\n
So you just need to replace \r\n with a literal \n or whatever replacement you want.
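Replacing line breaks with a literal \n makes loading ambiguous if the user actually typed a backslash followed by n. A sketch of a reversible encoding, assuming you control both saving and loading: backslashes are escaped first, every kind of line break (\r\n, \r, \n) becomes the two characters \n, and Decode reverses it. `SingleLineText` is a hypothetical name:

```csharp
using System;
using System.Text;

public static class SingleLineText
{
    // Encodes multi-line text as one line: '\' becomes "\\" and every
    // line break (\r\n, \r or \n) becomes the two characters '\' 'n'.
    public static string Encode(string text)
    {
        return text
            .Replace("\\", "\\\\")
            .Replace("\r\n", "\n")
            .Replace("\r", "\n")
            .Replace("\n", "\\n");
    }

    // Reverses Encode: "\n" becomes newLine and "\\" becomes a single backslash.
    public static string Decode(string encoded, string newLine)
    {
        var sb = new StringBuilder(encoded.Length);
        for (int i = 0; i < encoded.Length; i++)
        {
            char c = encoded[i];
            if (c == '\\' && i + 1 < encoded.Length)
            {
                char next = encoded[i + 1];
                if (next == 'n') { sb.Append(newLine); i++; continue; }
                if (next == '\\') { sb.Append('\\'); i++; continue; }
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

In the question's loop this would be `cdfile[i] = SingleLineText.Encode(crittextbs[i].Text);`, and when reading the file back, `SingleLineText.Decode(line, Environment.NewLine)` restores the text box contents.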
