C# get text from file between two hashes

In my C# program (at this point) I have two fields in my form. One is a word list using a listbox; the other is a textbox. I have been able to successfully load a large word list into the listbox from a text file. I can also display the selected item in the listbox into the textbox this way:
private void wordList_SelectedIndexChanged(object sender, EventArgs e)
{
string word = wordList.Text;
concordanceDisplay.Text = word;
}
I have another local file I need to get at to display some of its contents in the textbox. In this file each headword (as in a dictionary) is preceded by a #. So, I would like to take the variable 'word' and search in this local file to put the entries into the textbox, like so:
#headword1
entry is here...
...
...
#headword2
entry is here...
...
...
#headword3
entry is here...
...
...
You get the format of the text file. I just need to search for the correct headword with # before that word, and copy all info from there until the next hash in the file, and place it in the text box.
Obviously, I am a newbie, so be gentle. Thanks much.
P.S. I used StreamReader to get at the word list and display it in the listbox like so:
StreamReader sr = new StreamReader("C:\\...\\list-final.txt");
string line;
while ((line = sr.ReadLine()) != null)
{
MyList.Add(line);
}
wordList.DataSource = MyList;

var sectionLines = File.ReadAllLines(fileName) // shortcut to read all lines from file
.SkipWhile(l => l != "#headword2") // skip everything before the heading you want
.Skip(1) // skip the heading itself
.TakeWhile(l => !l.StartsWith("#")) // grab stuff until the next heading or the end
.ToList(); // optional convert to list

string getSection(string sectionName)
{
StreamReader sr = new StreamReader(@"C:\Path\To\file.txt");
string line;
var MyList = new List<string>();
bool inCorrectSection = false;
while ((line = sr.ReadLine()) != null)
{
if (line.StartsWith("#"))
{
if (inCorrectSection)
break;
else
inCorrectSection = Regex.IsMatch(line, @"^#" + Regex.Escape(sectionName) + @"($| -)");
}
else if (inCorrectSection)
MyList.Add(line);
}
return string.Join(Environment.NewLine, MyList);
}
// in another method
textBox.Text = getSection("headword1");
Here are a few alternate ways to check if the section matches, in rough order of how accurate they are in detecting the right section name:
// if the separator after the section name is always " -", this is the best way I've thought of; with Regex.Escape it works regardless of what's in the sectionName
inCorrectSection = Regex.IsMatch(line, @"^#" + Regex.Escape(sectionName) + @"($| -)");
// as long as the section name can't contain # or spaces, this will work
inCorrectSection = line.Split('#', ' ')[1] == sectionName;
// as long as only alphanumeric characters can ever make up the section name, this is good
inCorrectSection = Regex.IsMatch(line, @"^#" + sectionName + @"\b");
// the problem with this is that if you are searching for "head", it will find "headOther" and think it's a match
inCorrectSection = line.StartsWith("#" + sectionName);
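If the heading lines are exactly "#" followed by the headword (an assumption; the " -" separator variant above would need a different comparison), the two answers can be combined into one small helper. This is only a sketch: the names SectionReader, GetSection and GetSectionFromFile are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SectionReader
{
    // Returns the lines between "#<headword>" and the next line starting with '#'.
    // Assumption: a heading line is exactly '#' followed by the headword.
    // Returns an empty string when the headword is not found.
    public static string GetSection(IEnumerable<string> lines, string headword)
    {
        string heading = "#" + headword;
        var body = lines
            // Exact comparison, so "head" does not match "#headOther".
            .SkipWhile(l => !string.Equals(l.TrimEnd(), heading, StringComparison.Ordinal))
            .Skip(1) // skip the heading itself
            .TakeWhile(l => !l.StartsWith("#", StringComparison.Ordinal));
        return string.Join(Environment.NewLine, body);
    }

    // File.ReadLines streams the file, so reading stops at the next heading.
    public static string GetSectionFromFile(string path, string headword)
    {
        return GetSection(File.ReadLines(path), headword);
    }
}
```

In the form from the question this might be called as concordanceDisplay.Text = SectionReader.GetSectionFromFile(path, wordList.Text); where path is whatever file holds the entries.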

Related

How would I access a txt file and split the links

I have a program that grabs links off a website and puts them into a text file, but the links aren't separated onto their own lines, and I need to do that without splitting them manually. Here is the code that grabs the links from the website, writes them to a text file, and then reads that file back:
private void linkLabel1_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
{
var client = new WebClient();
string text = client.DownloadString("https://currentlinks.com");
File.WriteAllText("C:/ProgramData/oof.txt", text);
string searchKeyword = "https://foobar.to/showthread.php";
string fileName = "C:/ProgramData/oof.txt";
string[] textLines = File.ReadAllLines(fileName);
List<string> results = new List<string>();
foreach (string line in textLines)
{
if (line.Contains(searchKeyword))
{
results.Add(line);
}
var sb = new StringBuilder();
foreach (var item in results)
{
sb.Append(item);
}
textBox1.Text = sb.ToString();
var parsed = textBox1;
TextWriter tw = new StreamWriter("C:/ProgramData/parsed.txt");
// write lines of text to the file
tw.WriteLine(parsed);
// close the stream
tw.Close();
}
}
You are getting all the links (URLs) in one single string. There is no straightforward way to get the URLs individually without making some assumptions.
From the sample data you shared, I assume that the URLs in the string follow a simple format without anything fancy in them: each one starts with http, and no URL contains another http inside it.
With those assumptions, I suggest the following code.
// Sample data as shared by the OP
string data = "https://forum.to/showthread.php?tid=22305https://forum.to/showthread.php?tid=22405https://forum.to/showthread.php?tid=22318";
// Split the string on "http".
var items = data.Split(new [] {"http"},StringSplitOptions.RemoveEmptyEntries).ToList();
// At this point none of the strings in the items collection start with "http".
// They look like this:
// s://forum.to/showthread.php?tid=22305
// s://forum.to/showthread.php?tid=22405
// s://forum.to/showthread.php?tid=22318
// So we need to add "http" back at the start of each item:
items = items.Select(i => "http" + i).ToList();
// After this they look like this:
// https://forum.to/showthread.php?tid=22305
// https://forum.to/showthread.php?tid=22405
// https://forum.to/showthread.php?tid=22318
// Now we join the items into a single string with a newline between them,
// so that each URL sits on its own line.
var text = String.Join("\r\n", items);
// Then write the text to the file.
File.WriteAllText("C:/ProgramData/oof.txt", text);
This should help you resolve your issue.
.Split way
Could you use yourString.Split("https://");?
Example:
//This simple example assumes that all links are https (not http)
string contents = "https://www.example.com/dogs/poodles/poodle1.htmlhttps://www.example.com/dogs/poodles/poodle2.html";
const string Prefix = "https://";
var linksWithoutPrefix = contents.Split(Prefix, StringSplitOptions.RemoveEmptyEntries);
//using System.Linq
var linksWithPrefix = linksWithoutPrefix.Select(l => Prefix + l);
foreach (var match in linksWithPrefix)
{
Console.WriteLine(match);
}
Regex way
Another option is to use a regular expression.
I could not find or write the right regex, though; the attempt below over-matches:
string contents = "http://www.example.com/dogs/poodles/poodle1.htmlhttp://www.example.com/dogs/poodles/poodle2.html";
//From https://regexr.com/
var rgx = new Regex(@"(?<Protocol>\w+):\/\/(?<Domain>[\w@][\w.:@]+)\/?[\w\.?=%&=\-@/$,]*");
var matches = rgx.Matches(contents);
foreach(var match in matches )
{
Console.WriteLine(match);
}
//This finds 'http://www.example.com/dogs/poodles/poodle1.htmlhttp' (note the "htmlhttp" at the end)
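One pattern that does seem to handle the concatenated sample is a lazy match that stops at the next scheme, at whitespace, or at the end of the string. This is a sketch, not a general URL parser: it assumes no URL contains "http://" or "https://" inside itself, and the names LinkSplitter and SplitLinks are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LinkSplitter
{
    // "https?://" starts a link; "\S+?" grows lazily until the lookahead sees
    // the start of the next link, whitespace, or the end of the input.
    private static readonly Regex UrlPattern =
        new Regex(@"https?://\S+?(?=https?://|\s|$)");

    public static List<string> SplitLinks(string contents)
    {
        return UrlPattern.Matches(contents)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

The result can then be written one link per line with File.WriteAllLines.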

Remove the first word in a string continuously and keep the last word [Xamarin Forms] C#

I have a function that will take a string and remove its first word and always keep the last word.
The string gets returned from my function SFSpeechRecognitionResult result.
With my current code it works when the code runs once, the first word gets deleted from the string and only the last word is left. But when the function runs again then the newly added words just keep stacking up in the result.BestTranscription.FormattedString string and the first word does not get removed.
This is my function:
RecognitionTask = SpeechRecognizer.GetRecognitionTask
(
LiveSpeechRequest,
(SFSpeechRecognitionResult result, NSError err) =>
{
if (result.BestTranscription.FormattedString.Contains(" "))
{
//and this is where I try to remove the first word and keep the last
string[] values = result.BestTranscription.FormattedString.Split(' ');
var words = values.Skip(1).ToList();
StringBuilder sb = new StringBuilder();
foreach (var word in words)
{
sb.Append(word + " ");
}
string newresult = sb.ToString();
System.Diagnostics.Debug.WriteLine(newresult);
}
else
{
//if the string only has one word then I will run this normally
thetextresult = result.BestTranscription.FormattedString.ToLower();
System.Diagnostics.Debug.WriteLine(thetextresult);
}
}
);
I would suggest just taking the last element after splitting:
string last_word = result.BestTranscription.FormattedString.Split(' ').Last();
This always gives you the last word.
Make sure that result.BestTranscription.FormattedString != null before splitting; otherwise you get an exception.
There may also be an option to clear the string of words after the first one has been processed, so that you always get only the word that was recorded last. You could try resetting it at the end like this:
result.BestTranscription.FormattedString = "";
Basically your code would look something like this:
if (result.BestTranscription.FormattedString != null &&
result.BestTranscription.FormattedString.Contains(" "))
{
//and this is where I try to remove the first word and keep the last
string lastWord = result.BestTranscription.FormattedString.Split(' ').Last();
string newresult = lastWord;
System.Diagnostics.Debug.WriteLine(newresult);
}
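A plain Split(' ').Last() returns an empty string when the text ends with a space, so a small helper that drops empty entries may be safer. This is a sketch; the names TranscriptionText and LastWord are made up for illustration and are not part of the speech API.

```csharp
using System;
using System.Linq;

public static class TranscriptionText
{
    // Returns the last space-separated word, or "" for null or blank input.
    public static string LastWord(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
            return string.Empty;

        // RemoveEmptyEntries ignores leading, trailing and doubled spaces.
        return text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Last();
    }
}
```

It would be called as TranscriptionText.LastWord(result.BestTranscription.FormattedString) inside the recognition callback.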

Place every sentence from a text file into an array but detect headers/titles

I need to get each sentence from a text document/string into an array.
The issue is how to handle headers, titles, and other sections of text that are not part of a sentence and don't end in a full stop (". ") that I could detect.
If I can't detect these, they will be stuck onto the front of the following sentence (if I use ". " to distinguish sentences), which I can't have happen.
Initially I was going to use:
contentRefined = content.Replace(" \n", ". ");
I thought this would remove all of the empty lines and newlines, and place full stops on the ends of headers so they could be detected and treated as sentences. It would produce ". . " in places, but I could Replace those with nothing.
But it didn't work: it simply left the empty lines in place and put a ". " at the start of each empty line, as well as at the start of every paragraph.
I have now tried:
contentRefined = Regex.Replace(content, @"^\s+$[\r\n]*", "", RegexOptions.Multiline);
This fully removes the empty lines, but doesn't get me closer to adding a full stop to the ends of the headers.
I need to place the sentences and headers/titles in an array. I'm not sure if there is a way to do this without splitting the string on something such as ". ".
Edit: Full current code showing how I get the text from the file
public void sentenceSplit()
{
content = File.ReadAllText(@"I:\Project\TLDR\Test Text.txt");
contentRefined = Regex.Replace(content, @"^\s+$[\r\n]*", "", RegexOptions.Multiline);
//contentRefined = content.Replace("\n", ". ");
}
I'm making an assumption that 'Header' and 'Title' are on their own line and do not end in a period.
If that's the case, then this may work for you:
var filePath = @"C:\Temp\temp.txt";
var sentences = new List<string>();
using (TextReader reader = new StreamReader(filePath))
{
while (reader.Peek() >= 0)
{
var line = reader.ReadLine();
if (line.Trim().EndsWith("."))
{
line.Split(new[] {'.'}, StringSplitOptions.RemoveEmptyEntries)
.ToList()
.ForEach(l => sentences.Add(l.Trim() + "."));
}
}
}
// Output sentences to console
sentences.ForEach(Console.WriteLine);
UPDATE
Another approach using the File.ReadAllLines() method, and displaying the sentences in a RichTextBox:
private void Form1_Load(object sender, EventArgs e)
{
var filePath = @"C:\Temp\temp.txt";
var sentences = File.ReadAllLines(filePath)
// Only select lines that end in a period
.Where(l => l.Trim().EndsWith("."))
// Split each line into sentences (one line may have many sentences)
.SelectMany(s => s.Split(new[] {'.'}, StringSplitOptions.RemoveEmptyEntries))
// Trim any whitespace off the ends of the sentence and add a period to the end
.Select(s => s.Trim() + ".")
// And finally convert it to a List (or you could do 'ToArray()')
.ToList();
// To show each sentence in the list on its own line in the rtb:
richTextBox1.Text = string.Join("\n", sentences);
// Or to show them all, one after another:
richTextBox1.Text = string.Join(" ", sentences);
}
UPDATE
Now that I think I understand what you're asking, here's what I would do. First, I would create some classes to manage all this stuff. If you break the document down into parts, you get something like:
HEADER
Paragraph sentence one. Paragraph sentence two. Paragraph
sentence three with a number, like in this quote: "$5.00 doesn't go as
far as it used to".
Header Over an Empty Section
Header over multiple paragraphs
Paragraph sentence one. Paragraph
sentence two. Paragraph sentence three with a number, like in this
quote: "$5.00 doesn't go as far as it used to".
Paragraph sentence one. Paragraph sentence two. Paragraph sentence
three with a number, like in this quote: "$5.00 doesn't go as far as
it used to".
Paragraph sentence one. Paragraph sentence two. Paragraph sentence
three with a number, like in this quote: "$5.00 doesn't go as far as
it used to".
So I would create the following classes. First, one to represent a 'Section'. This is defined by a Header and zero to many paragraphs:
private class Section
{
public string Header { get; set; }
public List<Paragraph> Paragraphs { get; set; }
public Section()
{
Paragraphs = new List<Paragraph>();
}
}
Then I would define a Paragraph, which contains one or more sentences:
private class Paragraph
{
public List<string> Sentences { get; set; }
public Paragraph()
{
Sentences = new List<string>();
}
}
Now I can populate a List of Sections to represent the document:
var filePath = @"C:\Temp\temp.txt";
var sections = new List<Section>();
var currentSection = new Section();
var currentParagraph = new Paragraph();
using (TextReader reader = new StreamReader(filePath))
{
while (reader.Peek() >= 0)
{
var line = reader.ReadLine().Trim();
// Ignore blank lines
if (string.IsNullOrWhiteSpace(line)) continue;
if (line.EndsWith("."))
{
// This line is a paragraph, so add all the sentences
// it contains to the current paragraph
line.Split(new[] {". "}, StringSplitOptions.RemoveEmptyEntries)
.Select(l => l.Trim().EndsWith(".") ? l.Trim() : l.Trim() + ".")
.ToList()
.ForEach(l => currentParagraph.Sentences.Add(l));
// Now add this paragraph to the current section
currentSection.Paragraphs.Add(currentParagraph);
// And set it to a new paragraph for the next loop
currentParagraph = new Paragraph();
}
else if (line.Length > 0)
{
// This line is a header, so we're starting a new section.
// Add the current section to our list and create
// a new one, setting this line as the header.
sections.Add(currentSection);
currentSection = new Section {Header = line};
}
}
// Finally, if the current section contains any data, add it to the list
if (!string.IsNullOrEmpty(currentSection.Header) || currentSection.Paragraphs.Any())
{
sections.Add(currentSection);
}
}
Now we have the whole document in a list of sections, and we know the order, the headers, the paragraphs, and the sentences they contain. As an example of how you can analyze it, here's a way to write it back out to a RichTextBox:
// We can build the document section by section
var documentText = new StringBuilder();
foreach (var section in sections)
{
// Here we can display headers and paragraphs in a custom way.
// For example, we can separate all sections with a blank line:
documentText.AppendLine();
// If there is a header, we can underline it
if (!string.IsNullOrWhiteSpace(section.Header))
{
documentText.AppendLine(section.Header);
documentText.AppendLine(new string('-', section.Header.Length));
}
// We can mark each paragraph with an arrow (--> )
foreach (var paragraph in section.Paragraphs)
{
documentText.Append("--> ");
// And write out each sentence, separated by a space
documentText.AppendLine(string.Join(" ", paragraph.Sentences));
}
}
// To make the underline approach above look
// half-way decent, we need a fixed-width font
richTextBox1.Font = new Font(FontFamily.GenericMonospace, 9);
// Now set the RichTextBox Text equal to the StringBuilder Text
richTextBox1.Text = documentText.ToString();

Replace a word from a specific line in a text file

I'm working on a little test program to experiment with text files and storing some data in them, and I've stumbled across a problem when trying to replace a value in a specific line.
This is how my text file is formatted:
user1, 1500, 1
user2, 1700, 17
.. and so on.
This is the code I am using at the moment to read the file line by line :
string line;
StreamReader sr = new StreamReader(path);
while ((line = sr.ReadLine()) != null)
{
string[] infos = line.Split(',');
if (infos[0] == username) //the username is received as a parameter (not shown)
//This is where I'd like to change the value
}
Basically, my objective is to update the number of points (the second value in the text line, infos[1]) only if the username matches. I tried using the following code (edited to match my data):
string text = File.ReadAllText("test.txt");
text = text.Replace("some text", "new value");
File.WriteAllText("test.txt", text);
The problem with this is that it will replace every corresponding value in the text file, and not just the one of the correct line (specified by the matching username). I know how to change the value of infos[1] (ex: 1500 for user1) but I don't know how to rewrite it to the file after that.
I've searched online and on StackOverflow, but I couldn't find anything for this specific problem where the value is only to be modified if it's on the proper line - not anywhere in the text.
I've run out of ideas on how to do this and would really appreciate some suggestions.
Thank you very much for your help.
Try this:
var path = #"c:\temp\test.txt";
var originalLines = File.ReadAllLines(path);
var updatedLines = new List<string>();
foreach (var line in originalLines)
{
string[] infos = line.Split(',');
if (infos[0] == "user2")
{
// update value
infos[1] = (int.Parse(infos[1]) + 1).ToString();
}
updatedLines.Add(string.Join(",", infos));
}
File.WriteAllLines(path, updatedLines);
Use ReadLines and LINQ:
var line = File.ReadLines("path")
.FirstOrDefault(x => x.StartsWith(username));
if (line != null)
{
var parts = line.Split(',');
parts[1] = "1500"; // new number
line = string.Join(",", parts);
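// ReadAllLines (not the lazy ReadLines) so the file is closed before it is rewritten.
// Note that the updated line ends up at the end of the file.
File.WriteAllLines("path", File.ReadAllLines("path")
.Where(x => !x.StartsWith(username)).Concat(new[] {line}));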
}
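If the order of the lines matters, or if one username can be a prefix of another (user1 and user10), a variant that compares the whole first field and rewrites the line in place may fit better. This is a sketch under the assumption that every line has the "name, points, other" layout from the question; the names PointsFile, UpdatePoints and UpdatePointsInFile are made up for illustration.

```csharp
using System;
using System.IO;
using System.Linq;

public static class PointsFile
{
    // Returns a copy of the lines with the points field replaced for one user.
    // All other lines are returned unchanged and in their original order.
    public static string[] UpdatePoints(string[] lines, string username, int newPoints)
    {
        return lines.Select(line =>
        {
            string[] parts = line.Split(',');
            // Compare the whole first field so "user1" does not match "user10".
            if (parts.Length < 2 || parts[0].Trim() != username)
                return line;

            parts[1] = " " + newPoints; // keep the ", " spacing of the file
            return string.Join(",", parts);
        }).ToArray();
    }

    public static void UpdatePointsInFile(string path, string username, int newPoints)
    {
        // ReadAllLines loads and closes the file before it is rewritten.
        string[] lines = File.ReadAllLines(path);
        File.WriteAllLines(path, UpdatePoints(lines, username, newPoints));
    }
}
```

Keeping the pure UpdatePoints separate from the file access makes the line logic easy to check without touching the disk.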

Get parameters out of text file

I have a C# asp.net page that has to get username/password info from a text file.
Could someone please tell me how.
The text file looks as follows: (it is actually a lot larger, I just got a few lines)
DATASOURCEFILE=D:\folder\folder
var1= etc
var2= more
var3 = misc
var4 = stuff
USERID = user1
PASSWORD = pwd1
All I need is the UserID and password out of that file.
Thank you for your help,
Steve
This would work:
var dic = File.ReadAllLines("test.txt")
.Select(l => l.Split(new[] { '=' }))
.ToDictionary( s => s[0].Trim(), s => s[1].Trim());
dic is a dictionary, so you easily extract your values, i.e.:
string myUser = dic["USERID"];
string myPassword = dic["PASSWORD"];
Open the file, split on the newline, split again on the = for each item and then add it to a dictionary.
string contents = String.Empty;
using (FileStream fs = File.Open("path", FileMode.Open))
using (StreamReader reader = new StreamReader(fs))
{
contents = reader.ReadToEnd();
}
if (contents.Length > 0)
{
string[] lines = contents.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
Dictionary<string, string> mysettings = new Dictionary<string, string>();
foreach (string line in lines)
{
string[] keyAndValue = line.Split(new char[] { '=' });
mysettings.Add(keyAndValue[0].Trim(), keyAndValue[1].Trim());
}
string test = mysettings["USERID"]; // example of getting userid
}
You can use Regular expressions to extract each variable. You can read one line at a time, or the entire file into one string. If the latter, you just look for a newline in the expression.
Regards,
Morten
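A minimal sketch of the regular-expression approach, reading the whole file into one string and matching a single "KEY = value" line. It assumes keys are case-sensitive and that each setting sits on its own line; the names ConfigReader and GetValue are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConfigReader
{
    // Returns the value of the first "KEY = value" line, or null if the key is absent.
    public static string GetValue(string contents, string key)
    {
        // Multiline makes ^ and $ match at line boundaries; \r? tolerates
        // Windows line endings. Regex.Escape keeps the key literal.
        string pattern = @"^[ \t]*" + Regex.Escape(key) + @"[ \t]*=[ \t]*(.*?)[ \t]*\r?$";
        Match m = Regex.Match(contents, pattern, RegexOptions.Multiline);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

With the file from the question this would be used as ConfigReader.GetValue(File.ReadAllText("test.txt"), "USERID").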
Dictionary is not needed.
Old-fashioned parsing can do more, with less executable code, the same amount of compiled data, and less processing:
public string MyPath1;
public string MyPath2;
...
public void ReadConfig(string sConfigFile)
{
MyPath1 = MyPath2 = ""; // Clear the external values (in case the file does not set every parameter).
using (StreamReader sr = new StreamReader(sConfigFile)) // Open the file for reading (and auto-close).
{
while (!sr.EndOfStream)
{
string sLine = sr.ReadLine().Trim(); // Read the next line. Trim leading and trailing whitespace.
// Treat lines with NO "=" as comments (ignore; no syntax checking).
// Treat lines with "=" as the first character as comments too.
// Treat lines with "=" as the 2nd character or after as parameter lines.
// Side-benefit: Values containing "=" are processed correctly.
int i = sLine.IndexOf("="); // Find the first "=" in the line.
if (i <= 0) // IF the first "=" in the line is the first character (or not present),
continue; // the line is not a parameter line. Ignore it. (Iterate the while.)
string sParameter = sLine.Remove(i).TrimEnd(); // All before the "=" is the parameter name. Trim whitespace.
string sValue = sLine.Substring(i + 1).TrimStart(); // All after the "=" is the value. Trim whitespace.
// Extra characters before a parameter name are usually intended to comment it out. Here, we keep them (with or without whitespace between). That makes an unrecognized parameter name, which is ignored (acts as a comment, as intended).
// Extra characters after a value are usually intended as comments. Here, we trim them only if whitespace separates. (Parsing contiguous comments is too complex: need delimiter(s) and then a way to escape delimiters (when needed) within values.) Side-drawback: Values cannot contain " ".
i = sValue.IndexOfAny(new char[] {' ', '\t'}); // Find the first " " or tab in the value.
if (i > 0) // IF the value contains a " " or tab (never at index 0, since the value was trimmed),
sValue = sValue.Remove(i); // All before the " " or tab is the value. (Discard the rest.)
// IF a desired parameter is specified, collect it:
// (Could detect here if any parameter is set more than once.)
if (sParameter == "MyPathOne")
MyPath1 = sValue;
else if (sParameter == "MyPathTwo")
MyPath2 = sValue;
// (Could detect here if an invalid parameter name is specified.)
// (Could exit the loop here if every parameter has been set.)
} // end while
// (Could detect here if the config file set neither parameter or only one parameter.)
} // end using
}
