How to implement if before while - c#

My code goes like this:
public void button1_Click(object sender, EventArgs e)
{
StreamReader tx = null;
if (textBox1.Text != "")
{
tx = new StreamReader(textBox1.Text);
}
else
{
tx = new StreamReader("new.txt");
}
string line;
while ((line = tx.ReadLine()) != null)
{
string url = (line);
string sourceCode = Worker.getSourceCode(url);
MatchCollection m1 = Regex.Matches(sourceCode, @"title may-blank "" href=""(.+?)""", RegexOptions.Singleline);
MatchCollection m2 = Regex.Matches(sourceCode, @"(?<=tabindex=\""1\"" \>| tabindex=\""1\"" rel=""nofollow"" \>)(.+?) (?=<\/a>)", RegexOptions.Singleline);
List<string> adresy = new List<string>();
List<string> nazwy = new List<string>();
int counter = 0;
foreach (Match m in m1)
{
string adres = m.Groups[1].Value;
adresy.Add(adres);
counter++;
label1.Text = counter.ToString();
}
int counter2 = 0;
foreach (Match m in m2)
{
string nazwa = m.Groups[1].Value;
nazwy.Add(nazwa);
counter2++;
label2.Text = counter2.ToString();
}
listBox1.DataSource = adresy;
listBox2.DataSource = nazwy;
}
}
I am using Regex to scrape text from web pages. I want to scrape a single URL if one is entered in textBox1, but if textBox1 is empty, I want to scrape all the URLs listed in the new.txt file.
So I have to implement an "if", but I don't really know how. It should go like this:
if textBox1 is not empty
then read the single URL from it
if it is empty, then read from new.txt
do stuff like scraping...
But as you can see in my code above, it doesn't work properly. It works, but only when I read from new.txt. When I enter a URL in textBox1 and try to scrape it, my app crashes. I assume it crashes because I shouldn't have used a StreamReader to read from the text box. Do you have any ideas?

If you want to write your code like this, then you can use a StringReader:
TextReader tx = null;
if (textBox1.Text != "")
{
tx = new StringReader(textBox1.Text);
}
else
{
tx = new StreamReader("new.txt");
}
Make sure to wrap your code in a try/finally block and call tx.Dispose() in the finally.
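A using statement is a common alternative to an explicit try/finally, because it calls Dispose() for you. The following is only a minimal sketch under some assumptions: the class UrlSource and its methods Open and ReadUrls are hypothetical names that do not appear in the question, and the empty check uses string.IsNullOrWhiteSpace rather than the original comparison with "".

```csharp
using System.Collections.Generic;
using System.IO;

public static class UrlSource
{
    // Hypothetical helper: returns a reader over the text box content if it
    // is non-empty, otherwise a reader over the fallback file.
    public static TextReader Open(string textBoxContent, string fallbackPath)
    {
        if (!string.IsNullOrWhiteSpace(textBoxContent))
            return new StringReader(textBoxContent);
        return new StreamReader(fallbackPath);
    }

    // Reads every non-blank line. The using statement disposes the reader
    // even if an exception is thrown inside the loop.
    public static List<string> ReadUrls(string textBoxContent, string fallbackPath)
    {
        var urls = new List<string>();
        using (TextReader tx = Open(textBoxContent, fallbackPath))
        {
            string line;
            while ((line = tx.ReadLine()) != null)
            {
                if (line.Trim().Length > 0)
                    urls.Add(line.Trim());
            }
        }
        return urls;
    }
}
```

In the click handler this could be called as UrlSource.ReadUrls(textBox1.Text, "new.txt"), with the scraping loop then running over the returned list.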

Related

extracting a substring within a multiline string

I have a text file containing the following lines:
<TestInfo."Content">
{
<Label> "Content"
<Visible> "true"
"This is the text I want to get"
}
<TestInfo."Content2">
{
<Label> "Content2"
<Visible> "true"
"I don't want e.g. this"
}
I want to extract This is the text I want to get.
I tried e.g. the following:
string tmp = File.ReadAllText(textfile);
string result = Regex.Match(tmp, @"<Label> ""Content"" \n\s+ <Visible> ""true"" \n\s+ ""(.+?)""", RegexOptions.Singleline).Groups[1].Value;
However, in this case I get only the first word.
So, my output is: This
And I have no idea why...
I would appreciate any help. Thanks!
If you want the entire line after the line that starts with <Visible>, you'd better read the file line by line instead of using File.ReadAllText and a regular expression:
string result = null; // stays null if no <Visible> line is found
using (StreamReader sr = new StreamReader(textfile))
{
while (sr.Peek() >= 0)
{
string line = sr.ReadLine();
if (line.TrimStart().StartsWith("<Visible>")) // TrimStart tolerates indented lines
{
result = sr.ReadLine();
break;
}
}
}
Try this:
var tmp = File.ReadAllText("TextFile1.txt");
var result = Regex.Match(tmp, "This is the text I want to get", RegexOptions.Multiline);
if (result.Success) // Groups.Count is always at least 1, even when nothing matched
for (int i = 0; i < result.Groups.Count; i++)
Console.WriteLine(result.Groups[i].Value);
else
Console.WriteLine("string not found.");
Regards,
//jafc
You could change your regex this way:
var result = Regex.Match(tmp, @"<Visible> ""true""\s*""([\S ]+)""", RegexOptions.Singleline).Groups[1].Value;
If you want to get all the matches, not only the first one, you could use Regex.Matches
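As a rough illustration of the Regex.Matches variant, here is a minimal sketch that reuses the pattern from this answer. The class name VisibleText and the method ExtractAll are hypothetical, chosen only for the example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VisibleText
{
    // Returns the quoted text that follows each <Visible> "true" line.
    // The pattern contains no '.', so RegexOptions.Singleline has no effect
    // on it and is left out here.
    public static List<string> ExtractAll(string content)
    {
        var results = new List<string>();
        MatchCollection matches = Regex.Matches(
            content, @"<Visible> ""true""\s*""([\S ]+)""");
        foreach (Match m in matches)
            results.Add(m.Groups[1].Value);
        return results;
    }
}
```

Calling VisibleText.ExtractAll(File.ReadAllText(textfile)) on the sample file should yield one entry per block, so the caller still has to pick the entry it wants.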
Thanks a lot for your input! This helped me to find a final solution:
First, I extracted only a small part containing the string I want to extract to avoid ambiguities:
string[] tmp = File.ReadAllLines(textfile);
List<string> Content = new List<string>();
bool dumpA = false;
Regex regBEGIN = new Regex(@"<TestInfo\.""Content"">");
Regex regEND = new Regex(@"<TestInfo\.""Content2"">");
foreach (string line in tmp)
{
if (dumpA)
Content.Add(line.Trim());
if (regBEGIN.IsMatch(line))
dumpA = true;
if (regEND.IsMatch(line)) break;
}
Then I can extract the (now only once existing) line starting with '"':
string result = "";
foreach (string line in Content)
{
if (line.StartsWith("\""))
{
result = line;
result = result.Replace("\"", "");
result = result.Trim();
}
}

How to traverse multiple Log/Text Files of approx 200 MB Each using C#? and Apply Regex

I have to develop a utility that accepts the path of a folder containing multiple log/text files of around 200 MB each, and then traverses all the files to pick four elements from the lines where they exist.
I have tried multiple solutions. All of them work perfectly fine for smaller files, but when I load a bigger file the Windows Form just hangs or it shows an "OutOfMemory Exception". Please help.
Solution 1:
string textFile;
string re1 = "((?:2|1)\\d{3}(?:-|\\/)(?:(?:0[1-9])|(?:1[0-2]))(?:-|\\/)(?:(?:0[1-9])|(?:[1-2][0-9])|(?:3[0-1]))(?:T|\\s)(?:(?:[0-1][0-9])|(?:2[0-3])):(?:[0-5][0-9]):(?:[0-5][0-9]))";
FolderBrowserDialog fbd = new FolderBrowserDialog();
DialogResult result = fbd.ShowDialog();
if (!string.IsNullOrWhiteSpace(fbd.SelectedPath))
{
string[] files = Directory.GetFiles(fbd.SelectedPath);
System.Windows.Forms.MessageBox.Show("Files found: " + files.Length.ToString(), "Message");
foreach (string fileName in files)
{
textFile = File.ReadAllText(fileName);
MatchCollection mc = Regex.Matches(textFile, re1);
foreach (Match m in mc)
{
string a = m.ToString();
Path.Text += a; //Temporary, Just to check the output
Path.Text += Environment.NewLine;
}
}
}
Solution 2:
string re1 = "((?:2|1)\\d{3}(?:-|\\/)(?:(?:0[1-9])|(?:1[0-2]))(?:-|\\/)(?:(?:0[1-9])|(?:[1-2][0-9])|(?:3[0-1]))(?:T|\\s)(?:(?:[0-1][0-9])|(?:2[0-3])):(?:[0-5][0-9]):(?:[0-5][0-9]))";
FolderBrowserDialog fbd = new FolderBrowserDialog();
DialogResult result = fbd.ShowDialog();
foreach (string file in System.IO.Directory.GetFiles(fbd.SelectedPath))
{
const Int32 BufferSize = 512;
using (var fileStream = File.OpenRead(file))
using (var streamReader = new StreamReader(fileStream, Encoding.UTF8, true, BufferSize))
{
String line;
while ((line = streamReader.ReadLine()) != null)
{
MatchCollection mc = Regex.Matches(line, re1);
foreach (Match m in mc)
{
string a = m.ToString();
Path.Text += a; //Temporary, Just to check the output
Path.Text += Environment.NewLine;
}
}
}
}
Solution 3:
string re1 = "((?:2|1)\\d{3}(?:-|\\/)(?:(?:0[1-9])|(?:1[0-2]))(?:-|\\/)(?:(?:0[1-9])|(?:[1-2][0-9])|(?:3[0-1]))(?:T|\\s)(?:(?:[0-1][0-9])|(?:2[0-3])):(?:[0-5][0-9]):(?:[0-5][0-9]))";
FolderBrowserDialog fbd = new FolderBrowserDialog();
DialogResult result = fbd.ShowDialog();
using (StreamReader r = new StreamReader(file))
{
try
{
string line = String.Empty;
while (!r.EndOfStream)
{
line = r.ReadLine();
MatchCollection mc = Regex.Matches(line, re1);
foreach (Match m in mc)
{
string a = m.ToString();
Path.Text += a; //Temporary, Just to check the output
Path.Text += Environment.NewLine;
}
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
A few things should be taken care of:
Don't append to a string with Path.Text += .... I am assuming that is just test code that will be thrown out.
You can just use a simple File.ReadLines call, with no practical difference in file-reading speed for your case.
You should compile your Regex.
You can try to simplify your regex.
You can add simple string-based pre-checks before doing regex matches.
Below is sample code that implements the above guidelines:
string re1 = "((?:2|1)\\d{3}(?:-|\\/)(?:(?:0[1-9])|(?:1[0-2]))(?:-|\\/)(?:(?:0[1-9])|(?:[1-2][0-9])|(?:3[0-1]))(?:T|\\s)(?:(?:[0-1][0-9])|(?:2[0-3])):(?:[0-5][0-9]):(?:[0-5][0-9]))";
var buf = new List<string>();
var re2 = new Regex(re1, RegexOptions.Compiled);
FolderBrowserDialog fbd = new FolderBrowserDialog();
DialogResult result = fbd.ShowDialog();
foreach (string file in System.IO.Directory.GetFiles(fbd.SelectedPath)) {
foreach (var line in File.ReadLines(file)) {
int indx; // position of the first '-' in the line
if ((indx = line.IndexOf('-')) == -1 || line.IndexOf(':', indx + 1) == -1)
continue;
MatchCollection mc = re2.Matches(line);
foreach (Match m in mc) {
string a = m.ToString();
buf.Add(a + Environment.NewLine); //Temporary, Just to check the output
}
}
}
Your "Path" debug output may be creating a ton of garbage strings. Change it to use a StringBuilder instead of += concatenation to see whether that is the cause of your memory issue.
Have you looked at MS Log Parser 2.2 for an alternate approach?
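The StringBuilder suggestion could look roughly like the sketch below. The class LogScanner and the method CollectMatches are hypothetical names used only for illustration; the idea is to build the output once and assign it to the text box a single time.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class LogScanner
{
    // Collects every regex match from the given lines into one string,
    // using a StringBuilder instead of repeated += concatenation.
    public static string CollectMatches(IEnumerable<string> lines, Regex pattern)
    {
        var sb = new StringBuilder();
        foreach (string line in lines)
        {
            foreach (Match m in pattern.Matches(line))
                sb.AppendLine(m.Value);
        }
        return sb.ToString();
    }
}
```

With the names from the question, a call might be Path.Text = LogScanner.CollectMatches(File.ReadLines(fileName), new Regex(re1, RegexOptions.Compiled)); File.ReadLines enumerates the file lazily, so the whole file is not held in memory at once.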

C# IO Stream not working like expected

I have this piece of code here:
private void button1_Click(object sender, EventArgs e)
{
StreamReader sr = new StreamReader("TextFile1.txt");
while ((line = sr.ReadLine()) != null)
{
if (line == textBox1.Text)
{
line = sr.ReadLine();
if (line == textBox2.Text)
{
MessageBox.Show("Logged in! Welcome " + textBox1.Text);
new Form2().Show();
this.Hide();
LoginSucces = true;
}
}
}
sr.Close();
if (LoginSucces == false) MessageBox.Show("Login Failed :(");
}
And it reads from this text:
AverageJavaGuy
Password
Chezzy
Password
The problem is that it doesn't work!
When I type in:
textBox1 = Chezzy.
textBox2 = Password.
it doesn't work...
It only works for AverageJavaGuy.
Does anyone know how to fix this?
Dictionary<string, string> userPass_dict = new Dictionary<string, string>(); // add this at class level
using (StreamReader sr = new StreamReader("TextFile1.txt"))
{
string line = "";
string line2 = "";
while (!sr.EndOfStream)
{
line = sr.ReadLine();
line2 = sr.ReadLine();
userPass_dict.Add(line, line2);
}
}
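Once the name/password pairs are in a dictionary as above, the login check becomes a single lookup. A minimal sketch, assuming hypothetical names (LoginStore, Load, IsValid) and assuming that trimming whitespace from each line is acceptable:

```csharp
using System.Collections.Generic;
using System.IO;

public static class LoginStore
{
    // Loads alternating user name / password lines into a dictionary.
    public static Dictionary<string, string> Load(TextReader reader)
    {
        var users = new Dictionary<string, string>();
        string user;
        while ((user = reader.ReadLine()) != null)
        {
            string password = reader.ReadLine();
            if (password == null)
                break; // trailing user name without a password line
            users[user.Trim()] = password.Trim();
        }
        return users;
    }

    // True only if the user exists and the stored password matches exactly.
    public static bool IsValid(Dictionary<string, string> users, string user, string password)
    {
        string stored;
        return users.TryGetValue(user, out stored) && stored == password;
    }
}
```

In the click handler, the check could then be LoginStore.IsValid(users, textBox1.Text, textBox2.Text), where users was loaded from a StreamReader over "TextFile1.txt".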
So it works for the first login/password pair, but not the second?
Have you checked your text file?
Isn't the problem related to the dot beside "Chezzy" (in your post)?
Try adding Console.WriteLine calls or use the debugger to see what happens in the loop.
Also, I think the inner sr.ReadLine() in the loop may cause unexpected "shifting" depending on the text file's content, so use it carefully...
Move line = sr.ReadLine(); outside the if statement.

Changing a part of text line in a file on mouse doubleclick by comparing it with a value

I have a file, message.txt, that contains raw data. My application reads the file, parses it, and displays the data in a ListView. The raw data contains the status REC UNREAD, meaning the record is unread. So when a message is read for the first time it is UNREAD, and I display such messages in bold. After I read it (using the double-click event), REC UNREAD should be changed to REC READ. This is what I have tried; it is not working, though:
private void lvwMessages_MouseDoubleClick_1(object sender, MouseEventArgs e)
{
try
{
ListViewItem item = lvwMessages.SelectedItems[0];
if(item.Font.Bold)
{
lvwMessages.SelectedItems[0].Font = new Font(lvwMessages.Font, FontStyle.Regular);
string tfile = File.ReadAllText("C:\\message.txt");
string m1 = lvwMessages.SelectedItems[0].SubItems[1].Text;
string m2 = lvwMessages.SelectedItems[0].SubItems[2].Text;
//No idea how to go forward from here
This is a sample line in my text file:
+CMGL: 2,"REC UNREAD","+919030665834","","2012/08/10 17:04:15+22"
sample message
In simple words, I should be able to search for the line containing m1 and m2 (as in the code) and replace REC UNREAD with REC READ.
This should solve your problem:
ListViewItem item = lvwMessages.SelectedItems[0];
if(item.Font.Bold)
{
lvwMessages.SelectedItems[0].Font = new Font(lvwMessages.Font, FontStyle.Regular);
string tfile = File.ReadAllText("C:\\message.txt");
string m1 = lvwMessages.SelectedItems[0].SubItems[1].Text;
string m2 = lvwMessages.SelectedItems[0].SubItems[2].Text;
string line = string.Empty;
string nfile = tfile; // keep the original content if no line matches
using (StreamReader sr = new StreamReader("C:\\message.txt"))
{
while ((line = sr.ReadLine()) != null)
{
if (line.Contains(m2))
{
string pline = line;
string result = line.Replace("REC UNREAD", "REC READ");
nfile= tfile.Replace(pline, result);
}
}
sr.Close();
}
StreamWriter sw = new StreamWriter("C:\\message.txt");
{
sw.Write(nfile);
}
sw.Close();
}
You can try this code, based on IndexOf and Replace:
string line = string.Empty;
using (StreamReader sr = new StreamReader("C:\\message.txt"))
{
while ((line = sr.ReadLine()) != null)
{
if (line.IndexOf(m1) >= 0 &&
line.IndexOf(m2) >= 0)
{
var result = line.Replace("REC UNREAD", "REC READ");
}
}
}

C# Find if a word is in a document

I am looking for a way to check if the "foo" word is present in a text file using C#.
I could use a regular expression, but I'm not sure that will work if the word is split across two lines. I have the same issue with a StreamReader that enumerates over the lines.
Any comments?
What's wrong with a simple search?
If the file is not large, and memory is not a problem, simply read the entire file into a string (ReadToEnd() method) and use string Contains().
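That simple search could be sketched as follows. The class SimpleSearch and the method ContainsText are hypothetical names, and note that a plain Contains does not find a word that is split across a line break.

```csharp
using System.IO;

public static class SimpleSearch
{
    // Reads everything the reader offers and does a plain substring search.
    public static bool ContainsText(TextReader reader, string text)
    {
        string content = reader.ReadToEnd();
        return content.Contains(text);
    }
}
```

For a file, the reader would be a StreamReader wrapped in a using statement, for example using (var sr = new StreamReader("file.txt")) { bool found = SimpleSearch.ContainsText(sr, "foo"); }.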
Here you go. We look at each line as we read the file, keep track of the last-word/first-word combination across line breaks, and check whether it matches your pattern.
string pattern = "foo";
string input = null;
string lastword = string.Empty;
string firstword = string.Empty;
bool result = false;
FileStream FS = new FileStream("File name and path", FileMode.Open, FileAccess.Read, FileShare.Read);
StreamReader SR = new StreamReader(FS);
while ((input = SR.ReadLine()) != null)
{
firstword = input.Substring(0, input.IndexOf(" "));
if(lastword.Trim() != string.Empty) { firstword = lastword.Trim() + firstword.Trim(); }
Regex RegPattern = new Regex(pattern);
Match Match1 = RegPattern.Match(input);
string value1 = Match1.ToString();
if (pattern.Trim() == firstword.Trim() || value1 != string.Empty) { result = true; }
lastword = input.Trim().Substring(input.Trim().LastIndexOf(" "));
}
Here is a quick example using LINQ:
static void Main(string[] args)
{
{ //LINQ version
bool hasFoo = "file.txt".AsLines()
.Any(l => l.Contains("foo"));
}
{ // No LINQ or Extension Methods needed
bool hasFoo = false;
foreach (var line in Tools.AsLines("file.txt"))
if (line.Contains("foo"))
{
hasFoo = true;
break;
}
}
}
}
public static class Tools
{
public static IEnumerable<string> AsLines(this string filename)
{
using (var reader = new StreamReader(filename))
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
while (line.EndsWith("-") && !reader.EndOfStream)
line = line.Substring(0, line.Length - 1)
+ reader.ReadLine();
yield return line;
}
}
}
What about if the line contains football? Or fool? If you are going to go down the regular expression route you need to look for word boundaries.
Regex r = new Regex(@"\bfoo\b"); // verbatim string: without @, "\b" is a backspace character
Also ensure you are taking into consideration case insensitivity if you need to.
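Both points (word boundaries and case insensitivity) can be combined in one call. A minimal sketch, with the hypothetical names WordSearch and ContainsWholeWord:

```csharp
using System.Text.RegularExpressions;

public static class WordSearch
{
    // True if "word" occurs as a whole word in "text", ignoring case.
    public static bool ContainsWholeWord(string text, string word)
    {
        // Regex.Escape keeps characters such as '.' or '+' in the word literal.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase);
    }
}
```

With this, "football" and "fool" do not count as hits for "foo", while "Foo" does.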
You don't need regular expressions in a case this simple. Simply loop over the lines and check if it contains foo.
using (StreamReader sr = new StreamReader(File.Open("filename", FileMode.Open, FileAccess.Read)))
{
string line = null;
while (!sr.EndOfStream) {
line = sr.ReadLine();
if (line.Contains("foo"))
{
// foo was found in the file
}
}
}
You could construct a regex which allows for newlines to be placed between every character.
private static bool IsSubstring(string input, string substring)
{
string[] letters = new string[substring.Length];
for (int i = 0; i < substring.Length; i += 1)
{
letters[i] = substring[i].ToString();
}
string regex = @"\b" + string.Join(@"(\r?\n?)", letters) + @"\b";
return Regex.IsMatch(input, regex, RegexOptions.ExplicitCapture);
}
