Reading numbers from a text file that includes letters and numbers


I have a log file (a .txt file) and I want to read only specific lines from it. These lines contain a specific set of words followed by a number.
For example, the lines that I want to read look like this:
10:03 Total query took 238.9 mili
10:08 Total query took 659.8 mili
How do I write code that takes only the time each query execution took (the number before "mili") and adds the values up?
I've got the part that reads only the lines containing "Total query took" from the text file, but I'm stuck from here.

Read the text from the file. StreamReader can do this. Look at the example in the link.
Get the number from the text. You can use Substring and IndexOf, as well as Convert.ToDouble. If you want to be fancy you can even use Regular Expressions, but this is overkill.
Add the numbers.
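The three steps above can be sketched roughly as follows. This is only a minimal sketch: the class and method names (QueryLog, SumQueryMilliseconds) are made up for illustration, it assumes every relevant line contains "Total query took" followed by a number and then "mili", and it uses double.TryParse instead of Convert.ToDouble so that a malformed line is skipped rather than throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class QueryLog
{
    // Hypothetical helper: sums the number found between
    // "Total query took" and "mili" on every line that has both.
    public static double SumQueryMilliseconds(IEnumerable<string> lines)
    {
        const string prefix = "Total query took";
        const string suffix = "mili";
        double total = 0;

        foreach (string line in lines)
        {
            // Locate the marker text; skip lines that do not contain it.
            int start = line.IndexOf(prefix, StringComparison.Ordinal);
            if (start < 0) continue;
            start += prefix.Length;

            // Locate "mili" after the marker.
            int end = line.IndexOf(suffix, start, StringComparison.Ordinal);
            if (end < 0) continue;

            // Everything in between should be the number, e.g. " 238.9 ".
            string number = line.Substring(start, end - start).Trim();

            double value;
            if (double.TryParse(number, NumberStyles.Float, CultureInfo.InvariantCulture, out value))
            {
                total += value;
            }
        }

        return total;
    }
}
```

It could be called as QueryLog.SumQueryMilliseconds(File.ReadLines(path)), where path is whatever log file you are reading (File lives in System.IO).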

// Requires: using System.Globalization; using System.IO; using System.Text.RegularExpressions;
public decimal ReadMilliseconds()
{
    var lines = File.ReadLines(@"\path\to\file");
    decimal totalMilliseconds = 0;
    foreach (string line in lines)
    {
        // Capture the number (with an optional fraction) that precedes "mili".
        var match = Regex.Match(line, @"(?<ms>\d+(?:\.\d+)?)\s*mili");
        if (!match.Success) continue;
        decimal value = decimal.Parse(match.Groups["ms"].Value, new CultureInfo("en-US"));
        totalMilliseconds += value;
    }
    return totalMilliseconds;
}

This should be easy with a regular expression. I assume the line is already in a string.
string str_line = "your line from file";
// Capture the number that comes right before "mili".
Regex regex = new Regex(@"(\d+(?:\.\d+)?)\s*mili");
Match match = regex.Match(str_line);
if (match.Success)
{
    // The number is in match.Groups[1].Value
}

Related

How can I find and replace text in a larger file (150MB-250MB) with regular expressions in C#?

I am working with files that range between 150MB and 250MB, and I need to append a form feed (\f) character to each match found in a match collection. Currently, my regular expression for each match is this:
Regex myreg = new Regex("ABC: DEF11-1111(.*?)MORE DATA(.*?)EVEN MORE DATA(.*?)\f", RegexOptions.Singleline);
and I'd like to modify each match in the file (and then overwrite the file) to become something that could be later found with a shorter regular expression:
Regex myreg = new Regex("ABC: DEF11-1111(.*?)\f\f", RegexOptions.Singleline);
Put another way, I want to simply append a form feed character (\f) to each match that is found in my file and save it.
I see a ton of examples on stack overflow for replacing text, but not so much for larger files. Typical examples of what to do would include:
Using StreamReader to store the entire file in a string, then doing a find and replace in that string.
Using MatchCollection in combination with File.ReadAllText().
Reading the file line by line and looking for matches there.
The problem with the first two is that they just eat up a ton of memory, and I worry about the program being able to handle all of that. The problem with the third option is that my regular expression spans many rows, and thus will not be found in a single line. I see other posts out there as well, but they cover replacing specific strings of text rather than working with regular expressions.
What would be a good approach for me to append a form feed character to each match found in a file, and then save that file?
Edit:
Per some suggestions, I tried playing around with StreamReader.ReadLine(). Specifically, I would read a line, see if it matched my expression, and then based on that result I would write to a file. If it matched the expression, I would write to the file. If it didn't match the expression, I would just append it to a string until it did match the expression. Like this:
Regex myreg = new Regex("ABC: DEF11-1111(.*?)MORE DATA(.*?)EVEN MORE DATA(.*?)\f", RegexOptions.Singleline);
//For storing/comparing our match.
string line, buildingmatch, match, whatremains;
buildingmatch = "";
match = "";
whatremains = "";
//For keeping track of trailing bits after our match.
int matchlength = 0;
using (StreamWriter sw = new StreamWriter(destFile))
using (StreamReader sr = new StreamReader(srcFile))
{
    //While we are still reading lines in the file...
    while ((line = sr.ReadLine()) != null)
    {
        //Keep adding lines to buildingmatch until we can match the regular expression.
        buildingmatch = buildingmatch + line + "\r\n";
        if (myreg.IsMatch(buildingmatch))
        {
            match = myreg.Match(buildingmatch).Value;
            matchlength = match.Length;
            //Make sure we are not at the end of the file.
            if (matchlength < buildingmatch.Length)
            {
                whatremains = buildingmatch.Substring(matchlength, buildingmatch.Length - matchlength);
            }
            sw.Write(match + "\f\f");
            buildingmatch = whatremains;
            whatremains = "";
        }
    }
}
The problem is that this took about 55 minutes to run a roughly 150MB file. There HAS to be a better way to do this...

If you can load the whole string data into a single string variable, there is no need to first match and then append text to matches in a loop. You can use a single Regex.Replace operation:
string text = File.ReadAllText(srcFile);
using (StreamWriter sw = new StreamWriter(destfile, false, Encoding.UTF8, 5242880))
{
    sw.Write(myregex.Replace(text, "$&\f\f"));
}
Details:
string text = File.ReadAllText(srcFile); - reads the srcFile file to the text variable (match would be confusing)
myregex.Replace(text, "$&\f\f") - replaces all occurrences of myregex matches with themselves ($& is a backreference to the whole match value) while appending two \f chars right after each match.
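To see what $& does on a small input, here is a minimal sketch. The class and method names (FormFeedAppender, AppendFormFeeds) are hypothetical, and the pattern in the usage note is a shortened stand-in for the real one.

```csharp
using System.Text.RegularExpressions;

public static class FormFeedAppender
{
    // Replaces every match with itself ($& is the whole match)
    // followed by two form feed characters. Text outside the
    // matches is left untouched.
    public static string AppendFormFeeds(string text, Regex pattern)
    {
        return pattern.Replace(text, "$&\f\f");
    }
}
```

For example, with new Regex("A(.*?)\f", RegexOptions.Singleline) each section that starts with "A" and ends at the next form feed comes back with two extra form feeds after it, and anything between sections is kept as it was.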

I was able to find a solution that works in a reasonable time; it can process my entire 150MB file in under 5 minutes.
First, as mentioned in the comments, it's a waste to compare the string to the Regex after every iteration. Rather, I started with this:
string match = File.ReadAllText(srcFile);
MatchCollection mymatches = myregex.Matches(match);
Strings can hold up to 2GB of data, so while not ideal, I figured roughly 150MB worth wouldn't hurt to be stored in a string. Then, as opposed to checking a match every x amount of lines read in from the file, I can check the file for matches all at once!
Next, I used this:
StringBuilder matchsb = new StringBuilder(134217728);
foreach (Match m in mymatches)
{
matchsb.Append(m.Value + "\f\f");
}
Since I already know (roughly) the size of my file, I can go ahead and initialize my stringbuilder. Not to mention, it's a lot more efficient to use string builder if you are doing multiple operations on a string (which I was). From there, it's just a matter of appending the form feed to each of my matches.
Finally, the part that cost the most in performance:
using (StreamWriter sw = new StreamWriter(destfile, false, Encoding.UTF8, 5242880))
{
    sw.Write(matchsb.ToString());
}
The way that you initialize StreamWriter is critical. Normally, you just declare it as:
StreamWriter sw = new StreamWriter(destfile);
This is fine for most use cases, but the problem becomes apparent when you are dealing with larger files. When declared like this, you are writing to the file with a default buffer of 4KB. For a smaller file, this is fine. But for 150MB files? This will end up taking a long time. So I corrected the issue by changing the buffer to approximately 5MB.
I found this resource really helped me to understand how to write to files more efficiently: https://www.jeremyshanks.com/fastest-way-to-write-text-files-to-disk-in-c/
Hopefully this will help the next person along as well.

Importing .csv file in to listview

I'm trying to load a .csv file into a listview:
ofDialog.Filter = @"CSV Files|*.csv";
ofDialog.Title = @"Select your backlink file...";
ofDialog.FileName = "backlinks.csv";
// is cancel pressed?
if (ofDialog.ShowDialog() == DialogResult.Cancel)
    return;
try
{
    string filename = ofDialog.FileName;
    var lines = File.ReadAllLines(filename);
    foreach (string line in lines)
    {
        var parts = line.Split(' ');
        ListViewItem lvi = new ListViewItem(parts[0]);
        lvi.SubItems.Add(parts[1]);
        listViewMain.Items.Add(lvi);
    }
    // update count
    Helpers.returnMessage(lines.Length + " rows imported.");
}
catch (Exception ex)
{
    Helpers.returnMessage(ex.Message);
}
The csv contents looks like:
URL Rating Domain Rating IP From Referring Page URL Referring Page Title Internal Links Count External Links Count Link URL TextPre Link Anchor TextPost Size Type NoFollow Site-wide Image Encoding Alt First Seen Previous Visited Last Check Original
24 89 91.198.174.192 http://en.wikipedia.org/wiki/Humbug_(sweet) "Humbug (sweet) - Wikipedia, the free encyclopedia" 118 16 http://www.bestbritishsweets.co.uk/user/products/large/everton.jpg http://www.bestbritishsweets.co.uk/user/products/large/everton.jpg 12163 href True False False utf8 2013-09-08T15:14:50Z 2015-03-11T01:48:40Z 2015-03-11T01:48:40Z True
There is no "," delimiter like in regular .csv files, and there are different amounts of space between some fields. I'm stuck on the best way to split each section and add it to the listview. I have a mental block lol
any help would be appreciated :)
cheers guys
Graham

For opening the CSV file, I would first check whether it is actually a tab-separated file. If it is, you can use \t as the delimiter and read the file much as you already do.
Failing this, you could use a (very long and complicated) regex string to match the different "columns" as different parts. The regex string would look something like:
\s+([0-9]*)\s+([0-9]*)\s+([0-9]*.[0-9]*.[0-9]*.[0-9]*)\s+([a-zA-Z:\/._\(\)]*)\s+(\"[a-zA-Z0-9 \-\(\),]*\")\s+([0-9]*)\s+([0-9]*)\s+([a-zA-Z:\/._\(\)]*)\s+([a-zA-Z:\/._\(\)]*)\s+([0-9]*)\s+([a-zA-Z]*)\s+(True|False)\s+(True|False)\s+(True|False)\s+([a-z0-9]*)\s+([0-9\-T:Z]*)\s+([0-9\-T:Z]*)\s+([0-9\-T:Z]*)\s+(True|False)
This would return each column as a different group, which you can access as detailed below:
var regex = new Regex(regexString);
foreach (var line in lines)
{
    var match = regex.Match(line);
    // Groups[0] is the whole match; the captured columns start at index 1.
    var urlRating = match.Groups[1].Value;
    var domainRating = match.Groups[2].Value;
    var ip = match.Groups[3].Value;
    // ...
}
You can see more about the regex string I have created (and possibly simplify it/extend it for the additional lines) here: https://regex101.com/r/oN4tW3/1
For more on C# regex look here: https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex(v=vs.110).aspx
Edit: I would avoid the regex method if the file is tab separated, as the regex is more complex and fragile.
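If the file does turn out to be tab separated, the split could look roughly like the sketch below. It assumes one record per line with a single tab between fields; the names BacklinkParser and ParseTabSeparated are made up for illustration, and the header row is not skipped.

```csharp
using System;
using System.Collections.Generic;

public static class BacklinkParser
{
    // Splits each non-empty line on tab characters and strips
    // surrounding whitespace and double quotes from every field.
    public static List<string[]> ParseTabSeparated(IEnumerable<string> lines)
    {
        var rows = new List<string[]>();
        foreach (string line in lines)
        {
            if (line.Trim().Length == 0) continue; // skip blank lines

            string[] parts = line.Split('\t');
            for (int i = 0; i < parts.Length; i++)
            {
                parts[i] = parts[i].Trim().Trim('"');
            }
            rows.Add(parts);
        }
        return rows;
    }
}
```

Each string[] in the result can then be turned into a ListViewItem: the first element as the item text and the remaining elements as SubItems.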

Find and replace file lines

I have a text file with over 12,000 lines. In that file I need to replace certain lines.
Some lines begin with a ;, some have random words, some start with space. However, I am only concerned with the two types of lines I describe below.
I have a line like
SET avariable:0 ;Comments
and I need to replace it to look like
set aDIFFvariable:0 :Integer // comments
The only case requirement is that the I in the word Integer needs to be capitalized.
I also have
String aSTRING(7) ;Comment
that needs to look like
STRING aSTRING(7) :array [0..7] of AnsiChar; // Comments
I need to keep all the spacing the same.
Here is what I have so far
static void Main(string[] args)
{
    string text = File.ReadAllText("C:\\old.txt");
    text = text.Replace("old text", "new text");
    File.WriteAllText("C:\\new.txt", text);
}
I think I need to use REGEX, which I have tried to make for my first example:
\s\s[set]\s*{4}.*[:0]\s*[;].* <-- I now know this is invalid - please advise
I need help with properly setting up my program to find and replace those lines. Should I read one line at a time and if it matches then do something? I am confused really as to where to start.
BRIEF pseudo code of what I want to do
//open file
//step through file
//if line == [regex] then add/replace as needed
//else, go to next line
//if EOF, close file

Taking a stab at this separately because each line is so radically different that capturing both in the same expression will be a nightmare.
To match your first example and replace it:
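String input = "SET avariable:0 ;Comments";
// IgnoreCase is needed so that (set) also matches "SET".
if (Regex.IsMatch(input, @"\s?(set)\s*(\w+):?(\d)\s+;?(.*)?", RegexOptions.IgnoreCase))
{
    input = Regex.Replace(input, @"\s?(set)\s*(\w+):?(\d)\s+;?(.*)?", "$1 $2:$3 :Integer // $4", RegexOptions.IgnoreCase);
}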
Give that a shot (Play with it here: http://regex101.com/r/zY7hV2)
To match your second example and replace it:
String input = "String aSTRING(7) ;Comments";
// IgnoreCase is needed so that (string) also matches "String".
if (Regex.IsMatch(input, @"\s?(string)\s*(\w+)\((\d)\)\s*;(.*)", RegexOptions.IgnoreCase))
{
    input = Regex.Replace(input, @"\s?(string)\s*(\w+)\((\d)\)\s*;(.*)", "$1 $2($3) :array [0..$3] of AnsiChar; // $4", RegexOptions.IgnoreCase);
}
And play around with this one here: http://regex101.com/r/jO5wP5
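To tie this back to the pseudo code in the question (step through the file, rewrite the lines that match, keep the rest), here is one possible sketch. The patterns are my own anchored variants rather than the exact ones above: they assume the keyword is the first thing on the line after optional whitespace, and they keep that leading whitespace. The names LineRewriter, TransformLine and RewriteFile are hypothetical.

```csharp
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class LineRewriter
{
    // "SET name:0 ;comment" style lines (assumed shape).
    private static readonly Regex SetLine = new Regex(
        @"^(\s*)(set)\s+(\w+):(\d+)\s*;(.*)$", RegexOptions.IgnoreCase);

    // "String name(7) ;comment" style lines (assumed shape).
    private static readonly Regex StringLine = new Regex(
        @"^(\s*)(string)\s+(\w+)\((\d+)\)\s*;(.*)$", RegexOptions.IgnoreCase);

    // Returns the rewritten line, or the original line if neither pattern matches.
    public static string TransformLine(string line)
    {
        if (SetLine.IsMatch(line))
        {
            return SetLine.Replace(line, "$1$2 $3:$4 :Integer // $5");
        }
        if (StringLine.IsMatch(line))
        {
            return StringLine.Replace(line, "$1$2 $3($4) :array [0..$4] of AnsiChar; // $5");
        }
        return line;
    }

    // Reads sourcePath line by line and writes the result to destPath.
    // The two paths must differ because the source is read lazily.
    public static void RewriteFile(string sourcePath, string destPath)
    {
        File.WriteAllLines(destPath, File.ReadLines(sourcePath).Select(TransformLine));
    }
}
```

Lines that start with ; or contain anything else pass through unchanged, so a 12,000 line file only has the two kinds of lines touched. Renaming the variable (avariable to aDIFFvariable) is not handled here, since the rule for that is not stated.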

read specific websourcecode in c#

When I press a button the following happens:
HttpWebRequest request = (HttpWebRequest)WebRequest
    .Create("http://oldschool.runescape.com/slu");
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader sr = new StreamReader(response.GetResponseStream());
richTextBox1.Text = sr.ReadToEnd();
sr.Close();
In short the data gets transferred to my textbox (this works perfectly)
Now if I choose world 78 (for example, from a combobox; the world number corresponds to the last digits of that line) I want to get the value 968; if I choose world 14, I want to get the value 973.
This is an example of the printed data
e(378,true,0,"oldschool78",968,"United States","US","Old School 78");
e(314,true,0,"oldschool14",973,"United States","US","Old School 14");
What can I use to read this?

So there are two problems here, the first is selecting the right line, then getting the number out.
First you want a way of getting each of the lines into a list, e.g. using something like this:
List<String> lines = new List<String>();
string line = sr.ReadLine();
while (line != null)
{
    lines.Add(line);
    line = sr.ReadLine(); // read the next line
}
Then you need to find the relevant line and get the token out of it.
Probably the simplest way is, for each line, to split the string on ',', '\"', '(' and ')' (using String.Split). That is, we basically get the parameters.
E.g.
foreach (string lineInFile in lines)
{
    // split the string into tokens
    string[] tokens = lineInFile.Split(',', '\"', '(', ')');
    // based on the sample strings and how we've split this,
    // we take the entry at index 15
    string endParameter = tokens[15]; // endParameter = "Old School 14"
...
We now use a regular expression to extract the number. The pattern we will use is \d+, i.e. one or more digits.
    Regex numberFinder = new Regex("\\d+");
    Match numberMatch = numberFinder.Match(endParameter);
    // we assume that there is a match, because if there isn't the string isn't
    // correct, you should do some error handling here
    string matchedNumber = numberMatch.Value;
    int value = Int32.Parse(matchedNumber); // we convert the string into the number
    if (value == desiredValue)
...
We check whether the value matches the one we are looking for (e.g. 14). If it does, we now need to get the number you wanted.
We've already split the parameters, and the number we want is the 8th item (i.e. index 7 in string[] tokens). Since, at least in your example, this is just a lone number, we can simply parse it to get the int.
    {
        return Int32.Parse(tokens[7]);
    }
}
Again, here we are assuming that the string is in the format you showed, and you should add error handling here too.
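Put together in one place, the steps above might look like the sketch below. It assumes every line has the same shape as the two sample lines in the question; the names WorldList and FindValueForWorld are made up, and lines that do not fit are skipped instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WorldList
{
    // Returns the number at token index 7 for the line whose last
    // parameter (token index 15, e.g. "Old School 78") contains the
    // desired world number, or null if no such line is found.
    public static int? FindValueForWorld(IEnumerable<string> lines, int desiredWorld)
    {
        var numberFinder = new Regex(@"\d+");

        foreach (string line in lines)
        {
            string[] tokens = line.Split(',', '"', '(', ')');
            if (tokens.Length < 16) continue; // not in the expected format

            Match numberMatch = numberFinder.Match(tokens[15]);
            int world;
            int value;
            if (numberMatch.Success
                && int.TryParse(numberMatch.Value, out world)
                && world == desiredWorld
                && int.TryParse(tokens[7], out value))
            {
                return value;
            }
        }

        return null;
    }
}
```

The desiredWorld argument would come from the combobox selection, and a null result means no matching line was found.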

how to find indexof substring in a text file

I have converted an asp.net c# project to framework 3.5 using VS 2008. Purpose of app is to parse a text file containing many rows of like information then inserting the data into a database.
I didn't write original app but developer used substring() to fetch individual fields because they always begin at the same position.
My question is:
What is best way to find the index of substring in text file without having to manually count the position? Does someone have preferred method they use to find position of characters in a text file?

I would say IndexOf() / IndexOfAny() together with Substring(). Alternatively, regular expressions. If the file has an XML-like structure, this.

If the files are delimited eg with commas you can use string.Split
If data is: string[] text = { "1, apple", "2, orange", "3, lemon" };
// Fruit is assumed to be a small class with id and name members (not shown).
private void button1_Click(object sender, EventArgs e)
{
    string[] lines = this.textBoxIn.Lines;
    List<Fruit> fields = new List<Fruit>();
    foreach (string s in lines)
    {
        char[] delim = {','};
        string[] fruitData = s.Split(delim);
        Fruit f = new Fruit();
        int tmpid = 0;
        Int32.TryParse(fruitData[0], out tmpid);
        f.id = tmpid;
        f.name = fruitData[1];
        fields.Add(f);
    }
    this.textBoxOut.Clear();
    string text = string.Empty;
    foreach (Fruit item in fields)
    {
        text += item.ToString() + " \n";
    }
    this.textBoxOut.Text = text;
}

The text file I'm reading does not contain delimiters: sometimes there are spaces between fields and sometimes they run together. In either case, every line is formatted the same. When I asked the question I was looking at the file in Notepad.
Question was: how do you find the position in a file so that the position (a number) can be specified as the startIndex of my Substring call?
Answer: I've found that opening the text file in Notepad++ displays the column number and line number of wherever the cursor is in the file, which makes this job easier.
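One thing to watch when carrying a column number from an editor into code: Substring takes a zero-based index, while editors usually count columns from 1 (that is my assumption about how the number is read off). A small wrapper, hypothetical names and all, can do the conversion and also cope with lines that are shorter than expected:

```csharp
using System;

public static class FixedWidth
{
    // startColumn is 1-based (as read from the editor, by assumption);
    // length is the field width in characters.
    public static string Field(string line, int startColumn, int length)
    {
        if (startColumn < 1) throw new ArgumentOutOfRangeException("startColumn");
        if (length < 0) throw new ArgumentOutOfRangeException("length");

        int startIndex = startColumn - 1;          // convert to zero-based
        if (startIndex >= line.Length) return string.Empty;

        // Do not read past the end of a short line.
        int available = Math.Min(length, line.Length - startIndex);
        return line.Substring(startIndex, available).Trim();
    }
}
```

With a made-up record such as "0001SMITH     JOHN", Field(line, 5, 10) would pick out the ten characters starting at column 5 and trim the padding.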

You can use IndexOf() to find the start position, then use Length to compute the second Substring parameter:
substr = str.Substring(str.IndexOf("."), str.Length - str.IndexOf("."));
