I have a log file and I need to find some parameters.
For example:
11:26:42 In [INF] File opened
11:27:48 In [INF] some operations
I want to find line number 2, the one with the extra space.
So I tried to find it like this:
string pattern = @"\[INF\]";
foreach (String inf in lines)
{
if (Regex.IsMatch(inf, pattern))
{
//Console.WriteLine(inf);
using (System.IO.StreamWriter file = new System.IO.StreamWriter(outputPath, true,Encoding.ASCII))
{
file.WriteLine(inf);
}
}
}
But how do I find the INF category with the extra white space?
I'm doing it in C#, but the language doesn't matter.
Thanks.
An easy way to find double (or more) spaces is
@"\s{2,}"
This will match the spaces only.
string pattern = @"\s\s\[INF\]\s\s";
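To make the two patterns above concrete, here is a minimal sketch that keeps only the [INF] lines with two or more whitespace characters on either side of the tag. The class and method names are made up for illustration, and it assumes the "extra space" sits directly next to the tag, since the original spacing of the sample log did not survive posting.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class InfFilterSketch
{
    // Matches "[INF]" with at least two whitespace characters before or after it.
    private static readonly Regex ExtraSpaceInf =
        new Regex(@"\s{2,}\[INF\]|\[INF\]\s{2,}");

    // Returns only the lines whose [INF] tag has extra whitespace next to it.
    public static List<string> FindInfWithExtraSpace(IEnumerable<string> lines)
    {
        return lines.Where(line => ExtraSpaceInf.IsMatch(line)).ToList();
    }
}
```

The result can then be written out once with a single StreamWriter, rather than reopening the output file for every matching line.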
I am working with files that range between 150MB and 250MB, and I need to append a form feed (\f) character to each match found in a match collection. Currently, my regular expression for each match is this:
Regex myreg = new Regex("ABC: DEF11-1111(.*?)MORE DATA(.*?)EVEN MORE DATA(.*?)\f", RegexOptions.Singleline);
and I'd like to modify each match in the file (and then overwrite the file) to become something that could be later found with a shorter regular expression:
Regex myreg = new Regex("ABC: DEF11-1111(.*?)\f\f", RegexOptions.Singleline);
Put another way, I want to simply append a form feed character (\f) to each match that is found in my file and save it.
I see a ton of examples on Stack Overflow for replacing text, but not so much for larger files. Typical examples of what to do would include:
Using StreamReader to store the entire file in a string, then doing a find and replace in that string.
Using MatchCollection in combination with File.ReadAllText().
Reading the file line by line and looking for matches there.
The problem with the first two is that they eat up a ton of memory, and I worry about the program being able to handle all of that. The problem with the third option is that my regular expression spans many lines, and thus will not be found in a single line. I see other posts out there as well, but they cover replacing specific strings of text rather than working with regular expressions.
What would be a good approach for me to append a form feed character to each match found in a file, and then save that file?
Edit:
Per some suggestions, I tried playing around with StreamReader.ReadLine(). Specifically, I would read a line, see if it matched my expression, and then based on that result I would write to a file. If it matched the expression, I would write to the file. If it didn't match the expression, I would just append it to a string until it did match the expression. Like this:
Regex myreg = new Regex("ABC: DEF11-1111(.*?)MORE DATA(.*?)EVEN MORE DATA(.*?)\f", RegexOptions.Singleline);
//For storing/comparing our match.
string line, buildingmatch, match, whatremains;
buildingmatch = "";
match = "";
whatremains = "";
//For keep track of trailing bits after our match.
int matchlength = 0;
using (StreamWriter sw = new StreamWriter(destFile))
using (StreamReader sr = new StreamReader(srcFile))
{
//While we are still reading lines in the file...
while ((line = sr.ReadLine()) != null)
{
//Keep adding lines to buildingmatch until we can match the regular expression.
buildingmatch = buildingmatch + line + "\r\n";
if (myreg.IsMatch(buildingmatch))
{
match = myreg.Match(buildingmatch).Value;
matchlength = match.Length;
//Make sure we are not at the end of the file.
if (matchlength < buildingmatch.Length)
{
whatremains = buildingmatch.Substring(matchlength, buildingmatch.Length - matchlength);
}
sw.Write(match + "\f\f");
buildingmatch = whatremains;
whatremains = "";
}
}
}
The problem is that this took about 55 minutes to run a roughly 150MB file. There HAS to be a better way to do this...
If you can load the whole string data into a single string variable, there is no need to first match and then append text to matches in a loop. You can use a single Regex.Replace operation:
string text = File.ReadAllText(srcFile);
using (StreamWriter sw = new StreamWriter(destfile, false, Encoding.UTF8, 5242880))
{
sw.Write(myregex.Replace(text, "$&\f\f"));
}
Details:
string text = File.ReadAllText(srcFile); - reads the srcFile file to the text variable (match would be confusing)
myregex.Replace(text, "$&\f\f") - replaces all occurrences of myregex matches with themselves ($& is a backreference to the whole match value) while appending two \f chars right after each match.
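As a small illustration of the `$&` substitution described above, here is a sketch that runs on an in-memory string instead of a file. The pattern is a shortened, hypothetical stand-in for the one in the question, and the class and method names are invented for the example.

```csharp
using System.Text.RegularExpressions;

public static class AppendFormFeedSketch
{
    // Hypothetical, shortened pattern: a record starts at "ABC:" and
    // runs (across line breaks) up to and including the first form feed.
    private static readonly Regex Record =
        new Regex("ABC:(.*?)\f", RegexOptions.Singleline);

    // "$&" stands for the whole match, so every record is kept as-is
    // and two extra form feed characters are appended right after it.
    public static string AppendFormFeeds(string text)
    {
        return Record.Replace(text, "$&\f\f");
    }
}
```

Because each match already ends in one \f, every record in the output ends in three form feeds, which the shorter `(.*?)\f\f` pattern from the question would still find.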
I was able to find a solution that works in a reasonable time; it can process my entire 150MB file in under 5 minutes.
First, as mentioned in the comments, it's a waste to compare the string to the Regex after every iteration. Rather, I started with this:
string match = File.ReadAllText(srcFile);
MatchCollection mymatches = myregex.Matches(match);
Strings can hold up to 2GB of data, so while not ideal, I figured roughly 150MB worth wouldn't hurt to be stored in a string. Then, as opposed to checking a match every x amount of lines read in from the file, I can check the file for matches all at once!
Next, I used this:
StringBuilder matchsb = new StringBuilder(134217728);
foreach (Match m in mymatches)
{
matchsb.Append(m.Value + "\f\f");
}
Since I already know (roughly) the size of my file, I can go ahead and initialize my StringBuilder. Not to mention, it's a lot more efficient to use a StringBuilder if you are doing multiple operations on a string (which I was). From there, it's just a matter of appending the form feed to each of my matches.
Finally, the part that cost the most on performance:
using (StreamWriter sw = new StreamWriter(destfile, false, Encoding.UTF8, 5242880))
{
sw.Write(matchsb.ToString());
}
The way that you initialize StreamWriter is critical. Normally, you just declare it as:
StreamWriter sw = new StreamWriter(destfile);
This is fine for most use cases, but the problem becomes apparent when you are dealing with larger files. When declared like this, you are writing to the file with a default buffer of 4KB. For a smaller file, this is fine. But for 150MB files? This will end up taking a long time. So I corrected the issue by changing the buffer to approximately 5MB.
I found this resource really helped me to understand how to write to files more efficiently: https://www.jeremyshanks.com/fastest-way-to-write-text-files-to-disk-in-c/
Hopefully this will help the next person along as well.
I'm trying to load a .csv file into a listview:
ofDialog.Filter = @"CSV Files|*.csv";
ofDialog.Title = @"Select your backlink file...";
ofDialog.FileName = "backlinks.csv";
// is cancel pressed?
if (ofDialog.ShowDialog() == DialogResult.Cancel)
return;
try
{
string filename = ofDialog.FileName;
var lines = File.ReadAllLines(filename);
foreach (string line in lines)
{
var parts = line.Split(' ');
ListViewItem lvi = new ListViewItem(parts[0]);
lvi.SubItems.Add(parts[1]);
listViewMain.Items.Add(lvi);
}
// update count
Helpers.returnMessage(File.ReadAllLines(ofDialog.FileName).Count() + " rows imported.");
}
catch (Exception ex)
{
Helpers.returnMessage(ex.Message);
}
The CSV contents look like:
URL Rating Domain Rating IP From Referring Page URL Referring Page Title Internal Links Count External Links Count Link URL TextPre Link Anchor TextPost Size Type NoFollow Site-wide Image Encoding Alt First Seen Previous Visited Last Check Original
24 89 91.198.174.192 http://en.wikipedia.org/wiki/Humbug_(sweet) "Humbug (sweet) - Wikipedia, the free encyclopedia" 118 16 http://www.bestbritishsweets.co.uk/user/products/large/everton.jpg http://www.bestbritishsweets.co.uk/user/products/large/everton.jpg 12163 href True False False utf8 2013-09-08T15:14:50Z 2015-03-11T01:48:40Z 2015-03-11T01:48:40Z True
There is no "," delimiter like in regular .csv files, and there are different amounts of whitespace between some fields. I'm stuck on the best way to split each section and add it to the listview; I have a mental block lol
any help would be appreciated :)
cheers guys
Graham
For opening the CSV file, I would first check it is not a tab separated file, where you can use \t as the delimiter to read the file in a similar method as you are.
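If the file does turn out to be tab separated, the split could look roughly like the sketch below. It assumes the first two columns are the ones shown in the listview, as in the question's code; the class and method names are hypothetical.

```csharp
public static class TabSplitSketch
{
    // Splits one line on tab characters and returns the first two columns,
    // or null when the line has fewer than two columns (e.g. a blank line).
    public static string[] FirstTwoColumns(string line)
    {
        string[] parts = line.Split('\t');
        if (parts.Length < 2)
            return null;
        return new[] { parts[0].Trim(), parts[1].Trim() };
    }
}
```

Inside the question's foreach loop, the two returned values would take the place of parts[0] and parts[1], and lines that return null (and probably the header line) would be skipped.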
Failing this you could use a (very long and complicated) regex string to match the different "columns" as different parts. The regex string would look something like:
\s+([0-9]*)\s+([0-9]*)\s+([0-9]*.[0-9]*.[0-9]*.[0-9]*)\s+([a-zA-Z:\/._\(\)]*)\s+(\"[a-zA-Z0-9 \-\(\),]*\")\s+([0-9]*)\s+([0-9]*)\s+([a-zA-Z:\/._\(\)]*)\s+([a-zA-Z:\/._\(\)]*)\s+([0-9]*)\s+([a-zA-Z]*)\s+(True|False)\s+(True|False)\s+(True|False)\s+([a-z0-9]*)\s+([0-9\-T:Z]*)\s+([0-9\-T:Z]*)\s+([0-9\-T:Z]*)\s+(True|False)
This would return each column as a different group, which you can access as detailed below:
var regex = new Regex(regexString);
foreach(var line in lines)
{
var match = regex.Match(line);
// Groups[0] is the whole match; the captured columns start at index 1.
var urlRating = match.Groups[1].Value;
var domainRating = match.Groups[2].Value;
var ip = match.Groups[3].Value;
// ...
}
You can see more about the regex string I have created (and possibly simplify it/extend it for the additional lines) here: https://regex101.com/r/oN4tW3/1
For more on C# regex look here: https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex(v=vs.110).aspx
Edit: I would avoid the regex method if the file is tab separated, as the regex is more complex and fragile.
I created a little console program that will search text files and return all string lines that matches a variable entered by a user. One issue I ran into is, say I want to look up "1234" which represents a location code, but there is also a phone number that has "555-1234" in the string line, I get that one back too. I am thinking if I input the delimiter (ex: ",") with the variable (",1234,") then maybe I can ensure search is accurate. Am I on the right track, or is there a better way? This is where I am at so far:
string[] file = File.ReadAllLines(sPath);
foreach (string s in file)
{
using (StreamWriter sw = File.AppendText(rPath))
{
if (sFound = Regex.IsMatch(s, string.Format(@"\b{0}\b",
Regex.Escape(searchVariable))))
{
sw.WriteLine(s);
}
}
}
I'd say you are on the right track.
I'd suggest changing the regular expression so that it uses a negative lookbehind to match "searchVariable" only when it is not preceded by "-", so "1234" in "555-1234" wouldn't be matched, but ",1234" (for instance) would.
You will only need to use "Regex.Escape()" if you want to include special regular expression characters in your search, which from your question you don't want to do.
You could change the code to something like this (it's late so I haven't tested this!):
var lines= File.ReadAllLines(sPath);
var regex = new Regex(String.Format(@"(?<!-){0}\b", searchVariable));
if (lines.Any())
{
using (var streamWriter = File.AppendText(rPath))
{
foreach (var line in lines)
{
if (regex.IsMatch(line))
{
streamWriter.WriteLine(line);
}
}
}
}
A great website for testing these (often tricky!) regular expressions is Regex Hero.
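For reference, here is a slightly stricter variation on the lookbehind idea, sketched under the assumption that a location code must not touch a hyphen, letter, digit or underscore on either side. This is not the exact pattern from the answer above, and the class and method names are invented for illustration.

```csharp
using System.Text.RegularExpressions;

public static class CodeSearchSketch
{
    // True when 'code' appears as its own token: not preceded and not
    // followed by a hyphen or a word character (letter, digit, underscore).
    public static bool ContainsCode(string line, string code)
    {
        string pattern = @"(?<![-\w])" + Regex.Escape(code) + @"(?![-\w])";
        return Regex.IsMatch(line, pattern);
    }
}
```

Regex.Escape is kept here so that a search term containing characters such as "." or "+" is still treated literally; for purely numeric codes it changes nothing.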
Use Linq to CSV and make your life easier. Just go to Nuget and search Linq to CSV.
What I want to do is rename all files in a particular folder, such that if a filename contains any digits, they are removed.
say, if a filename is
someFileName.someExtension it remains the same, but if a file is like this,
03 - Rocketman Elton John it should be renamed to Rocketman Elton John (I did the part to remove the -), another example, if the filename is 15-Trey Songz - Unfortunate (Prod. by Noah 40 Shebib) it should be renamed to Trey Songz Unfortunate (Prod. by Noah Shebib) (again I can remove -). The user is asked to select the folder like this
private void txtFolder_MouseDown(object sender, MouseEventArgs e)
{
FolderBrowserDialog fd = new FolderBrowserDialog();
fd.RootFolder = Environment.SpecialFolder.Desktop;
fd.ShowNewFolderButton = true;
if (fd.ShowDialog() == DialogResult.OK)
{
txtFolder.Text = fd.SelectedPath;
}
}
The renaming starts like this
private void btnGo_Click(object sender, EventArgs e)
{
StartRenaming(txtFolder.Text);
}
and
private void StartRenaming(string FolderName)
{
string[] files = Directory.GetFiles(FolderName);
foreach (string file in files)
RenameFile(file);
}
Now in RenameFile, I need the function, the regular expression, that will remove any number(s) in the file name.
It is implemented as
private void RenameFile(string FileName)
{
string fileName = Path.GetFileNameWithoutExtension(FileName);
/* here goes the function that will find numbers in the filename using a regular expression and replace them */
}
so what I can do is, I can use something like
1 var matches = Regex.Matches(fileName, @"\d+");
2
3 if (matches.Count == 0)
4 return;
5
6 // start the loop
7 foreach(var match in matches)
8 {
9 fileName = fileName.Replace(match, ""); /* or fileName.Replace(match.ToString(), ""), whatever be the case */
10 }
11 File.Move(FileName, Path.Combine(Path.GetDirectoryName(FileName), fileName));
12 return;
But I don't think that's the right way to do it? Is there any better option to do this? or is this the best (and only option) to do this? Also, is there anything like IN in String.Replace? Say in sql I can use IN in a select command and specify a bunch of where conditions, but is there something like this with String.Replace so that I don't have to run the loop I ran from line 7 to 10? Are there any other better options?
PS: about that regex, I posted a question, Regular Expression for numbers? (apparently I wasn't clear enough), and from that I got my regex. If you think some other regex would do better, please tell me; also, if you need any other information, please let me know...
You can try Regex.Replace to remove digits, ie:
Regex.Replace(fileName, @"\d", "");
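A minimal sketch of how that call might be wrapped, assuming you also want to collapse the doubled spaces that removing a number can leave behind. The class and method names are hypothetical, and the hyphen is deliberately left alone because the question says that part is already handled.

```csharp
using System.Text.RegularExpressions;

public static class DigitStripSketch
{
    // Takes a file name without its extension, removes every digit,
    // then collapses runs of whitespace and trims the ends.
    public static string StripDigits(string fileNameWithoutExtension)
    {
        string noDigits = Regex.Replace(fileNameWithoutExtension, @"\d", "");
        return Regex.Replace(noDigits, @"\s{2,}", " ").Trim();
    }
}
```

Passing only the name part (for example from Path.GetFileNameWithoutExtension) and re-attaching Path.GetExtension afterwards avoids stripping digits from extensions such as .mp3.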
In the off chance that you are merely looking to simply rename the files and you thought that creating your own program would be the best way - I would recommend PFrank as a standalone tool (especially if you understand regex already)
If you do desire this and if you do take my suggestion (and since it's not the simplest and clearest interface), you would use \d+(\s?-)? for the match expression (in the first column in PFrank), which should match any number of digits, optionally followed by a hyphen and an additional optional whitespace character between the two. You would then have no replacement expression (zero-length string or an empty second column in PFrank). Finally, select the folder containing the files you want renamed and click the scan button; in the dialog that pops up, confirm your results and click the rename button. Sorry if I wasted anyone's time!
For replacing you should look into Regex.Replace, which can replace all occurrences at once.
Otherwise the code looks OK (with the exception of the strange fileName.Replace("match", ""), which uses a constant string...)
How about this ?
private void StartRenaming(string FolderName)
{
string[] files = Directory.GetFiles(FolderName);
string[] applicableFiles = (from string s in files
where Regex.IsMatch(Path.GetFileNameWithoutExtension(s), @"(\d+)|(-+)", RegexOptions.None)
select s).ToArray<string>();
foreach (string file in applicableFiles)
RenameFile(file);
}
private void RenameFile(string file)
{
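// Clean only the name part, so the directory and the extension are left untouched.
string newFileName = Regex.Replace(Path.GetFileNameWithoutExtension(file), @"(\d+)|(-+)", "") + Path.GetExtension(file);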
File.Move(file, Path.Combine(Path.GetDirectoryName(file), newFileName));
}
StartRenaming method will now limit the number of files to be processed based on Regex match. If the file contains a digit or - then it will be processed, thus optimizing the complete process.
RenameFile replaces digits and - in a string and gives you a newFileName
I am not quite sure about the correctness of File.Move(file, Path.Combine(Path.GetDirectoryName(file), newFileName)); though, but I guess your problem was to avoid the foreach loop, and I think I have provided an appropriate solution.
Please note that I was not able to completely test this, so let me know whether it works for you and if it doesn't I will be happy to help you further.
EDIT : Forgot to mention that file.Replace(@"(\d+)|(-+)", "") will remove digits as well as - from the file string.
EDIT : Corrected file.Replace to Regex.Replace
I prefer to use brackets to select the before and after and then use the $n method to rebuild the string how you want it to be.
"03 - Rocketman Elton John" -Replace '^([^-]*) - ([^-]*)', '$1 $2'
I don't have much experience with regexes and I wanted to rectify that. I decided to build an application that takes a directory name and scans all files, which all have an increasing serial number but differ subtly in their filenames (example: episode01.mp4, episode_02.mp4, episod03.mp4, episode04.rmvb etc.).
The application should scan the directory, find the number in each file name and rename the file along with the extension to a common format (episode01.mp4, episode02.mp4, episode03.mp4, episode04.rmvb etc.).
I have the following code:
Dictionary<string, string> renameDictionary = new Dictionary<string,string>();
DirectoryInfo dInfo = new DirectoryInfo(path);
string newFormat = "Episode{0}.{1}";
Regex regex = new Regex(@".*?(?<no>\d+).*?\.(?<ext>.*)"); // look for a number (before the dot) and the extension
foreach (var file in dInfo.GetFiles())
{
string fileName = file.Name;
var match = regex.Match(fileName);
if (match != null)
{
GroupCollection gc = match.Groups;
//Console.WriteLine("Number : {0}, Extension : {2} found in {1}.", gc["no"], fileName,gc["ext"]);
renameDictionary[fileName] = string.Format(newFormat, gc["no"], gc["ext"]);
}
}
foreach (var renamePair in renameDictionary)
{
Console.WriteLine("{0} will be renamed to {1}.", renamePair.Key, renamePair.Value);
//stuff for renaming here
}
One problem in this code is that it also includes files which don't have numbers in the renameDictionary. It would also be helpful if you could point out any other gotchas that I should be careful about.
PS: I am assuming that the filenames will only contain numbers corresponding to serial (nothing like cam7_0001.jpg)
The simplest solution is probably to use Path.GetFileNameWithoutExtension to get the file name, and then the regex \d+$ to get the number at its end (or Path.GetExtension and \d+ to get the number anywhere).
You can also achieve this in a single replace:
Regex.Replace(fileName, @".*?(\d+).*(\.[^.]+)$", "Episode$1$2")
This regex is a bit better, in that it forces the extension not to contain dots.
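On the problem of files without numbers ending up in the dictionary: Regex.Match never returns null, so the check that matters is match.Success. Here is a rough sketch built on the pattern above; the class and method names are hypothetical, and it assumes the first run of digits in the name is the serial number.

```csharp
using System.Text.RegularExpressions;

public static class EpisodeRenameSketch
{
    // Lazy prefix, first run of digits, anything else, then a final
    // extension that contains no further dots.
    private static readonly Regex EpisodePattern =
        new Regex(@"^.*?(\d+).*(\.[^.]+)$");

    // Returns the normalized name, or null when the file name has no
    // number before its extension and should therefore be skipped.
    public static string NormalizedName(string fileName)
    {
        Match match = EpisodePattern.Match(fileName);
        if (!match.Success)
            return null;
        return "Episode" + match.Groups[1].Value + match.Groups[2].Value;
    }
}
```

In the question's loop, a null result would simply mean the file is not added to renameDictionary. A digit that only appears in the extension (as in notes.mp4) does not count as a serial number here, because the pattern requires a dot somewhere after the digits.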