C# comparing two files regex problem - c#

what I'm trying to do is open a huge list of files (about 40k records, and match them on a line in a file that contains 2 millions records. And if my line from file A matches a line in file B write out that line.
File A contains a bunch of files without extensions and file B contains full file paths including extensions.
i'm using this but i cant get it to go...
string alphaFilePath = (#"C:\Documents and Settings\g\Desktop\Arrp\Find\natst_ready.txt");
List<string> alphaFileContent = new List<string>();
using (FileStream fs = new FileStream(alphaFilePath, FileMode.Open))
using (StreamReader rdr = new StreamReader(fs))
{
while (!rdr.EndOfStream)
{
alphaFileContent.Add(rdr.ReadLine());
}
}
string betaFilePath = #"C:\Documents and Settings\g\Desktop\Arryup\Find\eble.txt";
StringBuilder sb = new StringBuilder();
using (FileStream fs = new FileStream(betaFilePath, FileMode.Open))
using (StreamReader rdr = new StreamReader(fs))
{
while (!rdr.EndOfStream)
{
string betaFileLine = rdr.ReadLine();
string matchup = Regex.Match(alphaFileContent, #"(\\)(\\)(\\)(\\)(\\)(\\)(\\)(\\)(.*)(\.)").Groups[9].Value;
if (alphaFileContent.Equals(matchup))
{
File.AppendAllText(#"C:\array_tech.txt", betaFileLine);
}
}
}
This doesnt work because the alphafilecontent is a single line only and i'm having a hard time figuring out how to get my regex to work on the file that contains all the file paths (Betafilepath)
here is a sample of the beta file path.
C:\arres_i\Grn\Ora\SEC\DBZ_EX1\Nes\001\DZO-EX00001.txt
Here is the line i'm trying to compare from my alpha
DZO-EX00001

Use System.IO.Path.GetFileNameWithoutExtension instead of a regular expression.
static void Compare(string alpha, string beta)
{
HashSet<string> alphaContent = new HashSet<string>();
StreamReader reader = new StreamReader(alpha);
while (!reader.EndOfStream)
alphaContent.Add(reader.ReadLine());
reader.Close();
reader = new StreamReader(beta);
while (!reader.EndOfStream)
{
string fullpath = reader.ReadLine();
string filename = Path.GetFileNameWithoutExtension(fullpath);
if (alphaContent.Contains(filename))
{
File.AppendAllText(#"C:\array_tech.txt", fullpath);
}
}
reader.Close();
}

So, you read in all the lines in beta, and have the whole thing stored in a string, beta.
Next, you read a line from alpha and have DZO-EX00001 stored in a string, alpha.
var pattern = #"^.*" + alpha + ".*$";
var match = Regex.Match(beta, pattern, RegexOptions.Multiline);
if(match.Success)
{
string filepath = match.Value;
// do stuff
}
You need to load all of the lines from beta first. Then you can check each line from alpha against it.
You must specify RegexOptions.Multiline to check against all the lines in beta (so that the ^ and $ will match at each line instead of at the beginning and end of the whole string.
The pattern can be expanded if you need to be more specific; as is, it just gets the first line that contains the filename.

Related

Read & write a single line from a file without overwrite [duplicate]

I have two text files, Source.txt and Target.txt. The source will never be modified and contain N lines of text. So, I want to delete a specific line of text in Target.txt, and replace by an specific line of text from Source.txt, I know what number of line I need, actually is the line number 2, both files.
I haven something like this:
string line = string.Empty;
int line_number = 1;
int line_to_edit = 2;
using StreamReader reader = new StreamReader(#"C:\target.xml");
using StreamWriter writer = new StreamWriter(#"C:\target.xml");
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
writer.WriteLine(line);
line_number++;
}
But when I open the Writer, the target file get erased, it writes the lines, but, when opened, the target file only contains the copied lines, the rest get lost.
What can I do?
the easiest way is :
static void lineChanger(string newText, string fileName, int line_to_edit)
{
string[] arrLine = File.ReadAllLines(fileName);
arrLine[line_to_edit - 1] = newText;
File.WriteAllLines(fileName, arrLine);
}
usage :
lineChanger("new content for this line" , "sample.text" , 34);
You can't rewrite a line without rewriting the entire file (unless the lines happen to be the same length). If your files are small then reading the entire target file into memory and then writing it out again might make sense. You can do that like this:
using System;
using System.IO;
class Program
{
static void Main(string[] args)
{
int line_to_edit = 2; // Warning: 1-based indexing!
string sourceFile = "source.txt";
string destinationFile = "target.txt";
// Read the appropriate line from the file.
string lineToWrite = null;
using (StreamReader reader = new StreamReader(sourceFile))
{
for (int i = 1; i <= line_to_edit; ++i)
lineToWrite = reader.ReadLine();
}
if (lineToWrite == null)
throw new InvalidDataException("Line does not exist in " + sourceFile);
// Read the old file.
string[] lines = File.ReadAllLines(destinationFile);
// Write the new file over the old file.
using (StreamWriter writer = new StreamWriter(destinationFile))
{
for (int currentLine = 1; currentLine <= lines.Length; ++currentLine)
{
if (currentLine == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
writer.WriteLine(lines[currentLine - 1]);
}
}
}
}
}
If your files are large it would be better to create a new file so that you can read streaming from one file while you write to the other. This means that you don't need to have the whole file in memory at once. You can do that like this:
using System;
using System.IO;
class Program
{
static void Main(string[] args)
{
int line_to_edit = 2;
string sourceFile = "source.txt";
string destinationFile = "target.txt";
string tempFile = "target2.txt";
// Read the appropriate line from the file.
string lineToWrite = null;
using (StreamReader reader = new StreamReader(sourceFile))
{
for (int i = 1; i <= line_to_edit; ++i)
lineToWrite = reader.ReadLine();
}
if (lineToWrite == null)
throw new InvalidDataException("Line does not exist in " + sourceFile);
// Read from the target file and write to a new file.
int line_number = 1;
string line = null;
using (StreamReader reader = new StreamReader(destinationFile))
using (StreamWriter writer = new StreamWriter(tempFile))
{
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
writer.WriteLine(line);
}
line_number++;
}
}
// TODO: Delete the old file and replace it with the new file here.
}
}
You can afterwards move the file once you are sure that the write operation has succeeded (no excecption was thrown and the writer is closed).
Note that in both cases it is a bit confusing that you are using 1-based indexing for your line numbers. It might make more sense in your code to use 0-based indexing. You can have 1-based index in your user interface to your program if you wish, but convert it to a 0-indexed before sending it further.
Also, a disadvantage of directly overwriting the old file with the new file is that if it fails halfway through then you might permanently lose whatever data wasn't written. By writing to a third file first you only delete the original data after you are sure that you have another (corrected) copy of it, so you can recover the data if the computer crashes halfway through.
A final remark: I noticed that your files had an xml extension. You might want to consider if it makes more sense for you to use an XML parser to modify the contents of the files instead of replacing specific lines.
When you create a StreamWriter it always create a file from scratch, you will have to create a third file and copy from target and replace what you need, and then replace the old one.
But as I can see what you need is XML manipulation, you might want to use XmlDocument and modify your file using Xpath.
You need to Open the output file for write access rather than using a new StreamReader, which always overwrites the output file.
StreamWriter stm = null;
fi = new FileInfo(#"C:\target.xml");
if (fi.Exists)
stm = fi.OpenWrite();
Of course, you will still have to seek to the correct line in the output file, which will be hard since you can't read from it, so unless you already KNOW the byte offset to seek to, you probably really want read/write access.
FileStream stm = fi.Open(FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
with this stream, you can read until you get to the point where you want to make changes, then write. Keep in mind that you are writing bytes, not lines, so to overwrite a line you will need to write the same number of characters as the line you want to change.
I guess the below should work (instead of the writer part from your example). I'm unfortunately with no build environment so It's from memory but I hope it helps
using (var fs = File.Open(filePath, FileMode.Open, FileAccess.ReadWrite)))
{
var destinationReader = StreamReader(fs);
var writer = StreamWriter(fs);
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
destinationReader .ReadLine();
}
line_number++;
}
}
The solution works fine. But I need to change single-line text when the same text is in multiple places. For this, need to define a trackText to start finding after that text and finally change oldText with newText.
private int FindLineNumber(string fileName, string trackText, string oldText, string newText)
{
int lineNumber = 0;
string[] textLine = System.IO.File.ReadAllLines(fileName);
for (int i = 0; i< textLine.Length;i++)
{
if (textLine[i].Contains(trackText)) //start finding matching text after.
traced = true;
if (traced)
if (textLine[i].Contains(oldText)) // Match text
{
textLine[i] = newText; // replace text with new one.
traced = false;
System.IO.File.WriteAllLines(fileName, textLine);
lineNumber = i;
break; //go out from loop
}
}
return lineNumber
}

How can I read data from specific location in .asc file using c#

I have .asc file which has 1000's of rows. Each column in row is of the fixed length and separate by one space. I want to read email id column which is started from 296 position and ended at 326 position in a row.
Is there any way to read such data from .asc file?
This might do the trick for you. I am just reading the email ids in the file whatever extension file it might me, may it be txt or asc. Also it doesnt matter if the email address is locating at some other place instead of 296 or 326.
public void ExtractAllEmails()
{
string datafrmAsc = File.ReadAllText(YourASCFile); //read File
Regex emailRegex = new Regex(#"\w+([-+.]\w+)*#\w+([-.]\w+)*\.\w+([-.]\w+)*", RegexOptions.IgnoreCase);
MatchCollection emailMatches = emailRegex.Matches(datafrmAsc);
StringBuilder sb = new StringBuilder();
foreach (Match emailMatch in emailMatches)
{
sb.AppendLine(emailMatch.Value);
}
File.WriteAllText(SomeTxtFile, sb.ToString());
}
Assuming that is large text file, you can do something like this:
List<string> emailsList = new List<string>();
int startIndex = 295;
int endIndex = 325;
using (FileStream stream = File.Open("c:\\test.asc", FileMode.Open))
using (StreamReader sr = new StreamReader(stream))
{
string line = string.Empty;
while ((line = sr.ReadLine()) != null)
{
emailsList.Add(line.Substring(startIndex, endIndex - startIndex).Trim());
}
}

C# unique file naming with String from Streamreader

ok i have solved my problem of finding a unique word within the file that is then used as the newly created .txt file name.
for example: current.txt files have 200 lines of words/data per file but one of the words is unique("92222225") with every current.txt file.
so the newly created output files from streamwriter becomes 92222225.txt, 933333334.txt and so on.
the whole time i though what i needed was within streamreader or streamwriter.
but what i need to add to the two was "Regex.Match".
here is the code i figured out to use for pulling strings out of a .txt file to use as a name for the output files. also to add other words to the new output file.
string mydocpath = Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + #"\reporting";
foreach (string txtName in Directory.EnumerateFiles(mydocpath, "*.txt"))
{
StringBuilder sb = new StringBuilder();
StreamReader sr = new StreamReader(txtName);
string content = sr.ReadToEnd();
sb.AppendLine(txtName.ToString());
sb.AppendLine("= = = = = =");
sb.Append(content);
if (content.Contains("helloworld"))
{
sb.AppendLine();
sb.AppendLine("byeworld");
}
sb.AppendLine();
sb.AppendLine();
//string fileName = content.Contains("helloworld").ToString();
string FindMatch = content;
Match match = Regex.Match(FindMatch, #"9(([A-Za-z0-9\-])\d+)");
if (match.Success)
{
//this is what adds unique word as 922225.txt file name.
string capture = match.Groups[1].Captures[0].Value;
using (StreamWriter outfile = new StreamWriter(mydocpath + #"\" + capture + ".txt"))
{
outfile.Write(sb.ToString());
}
}
}
i updated this whole post so if anyone else may need this.
never even used Regex.Match before, nor knew about it or maybe i forgot about it.
If you want to name your output file use:
using (StreamWriter outfile = new StreamWriter(mydocpath + #"\901232lOi.txt"))

Replace a line in text file without creating another file [duplicate]

I have two text files, Source.txt and Target.txt. The source will never be modified and contain N lines of text. So, I want to delete a specific line of text in Target.txt, and replace by an specific line of text from Source.txt, I know what number of line I need, actually is the line number 2, both files.
I haven something like this:
string line = string.Empty;
int line_number = 1;
int line_to_edit = 2;
using StreamReader reader = new StreamReader(#"C:\target.xml");
using StreamWriter writer = new StreamWriter(#"C:\target.xml");
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
writer.WriteLine(line);
line_number++;
}
But when I open the Writer, the target file get erased, it writes the lines, but, when opened, the target file only contains the copied lines, the rest get lost.
What can I do?
the easiest way is :
static void lineChanger(string newText, string fileName, int line_to_edit)
{
string[] arrLine = File.ReadAllLines(fileName);
arrLine[line_to_edit - 1] = newText;
File.WriteAllLines(fileName, arrLine);
}
usage :
lineChanger("new content for this line" , "sample.text" , 34);
You can't rewrite a line without rewriting the entire file (unless the lines happen to be the same length). If your files are small then reading the entire target file into memory and then writing it out again might make sense. You can do that like this:
using System;
using System.IO;
class Program
{
static void Main(string[] args)
{
int line_to_edit = 2; // Warning: 1-based indexing!
string sourceFile = "source.txt";
string destinationFile = "target.txt";
// Read the appropriate line from the file.
string lineToWrite = null;
using (StreamReader reader = new StreamReader(sourceFile))
{
for (int i = 1; i <= line_to_edit; ++i)
lineToWrite = reader.ReadLine();
}
if (lineToWrite == null)
throw new InvalidDataException("Line does not exist in " + sourceFile);
// Read the old file.
string[] lines = File.ReadAllLines(destinationFile);
// Write the new file over the old file.
using (StreamWriter writer = new StreamWriter(destinationFile))
{
for (int currentLine = 1; currentLine <= lines.Length; ++currentLine)
{
if (currentLine == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
writer.WriteLine(lines[currentLine - 1]);
}
}
}
}
}
If your files are large it would be better to create a new file so that you can read streaming from one file while you write to the other. This means that you don't need to have the whole file in memory at once. You can do that like this:
using System;
using System.IO;
class Program
{
static void Main(string[] args)
{
int line_to_edit = 2;
string sourceFile = "source.txt";
string destinationFile = "target.txt";
string tempFile = "target2.txt";
// Read the appropriate line from the file.
string lineToWrite = null;
using (StreamReader reader = new StreamReader(sourceFile))
{
for (int i = 1; i <= line_to_edit; ++i)
lineToWrite = reader.ReadLine();
}
if (lineToWrite == null)
throw new InvalidDataException("Line does not exist in " + sourceFile);
// Read from the target file and write to a new file.
int line_number = 1;
string line = null;
using (StreamReader reader = new StreamReader(destinationFile))
using (StreamWriter writer = new StreamWriter(tempFile))
{
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
writer.WriteLine(line);
}
line_number++;
}
}
// TODO: Delete the old file and replace it with the new file here.
}
}
You can afterwards move the file once you are sure that the write operation has succeeded (no excecption was thrown and the writer is closed).
Note that in both cases it is a bit confusing that you are using 1-based indexing for your line numbers. It might make more sense in your code to use 0-based indexing. You can have 1-based index in your user interface to your program if you wish, but convert it to a 0-indexed before sending it further.
Also, a disadvantage of directly overwriting the old file with the new file is that if it fails halfway through then you might permanently lose whatever data wasn't written. By writing to a third file first you only delete the original data after you are sure that you have another (corrected) copy of it, so you can recover the data if the computer crashes halfway through.
A final remark: I noticed that your files had an xml extension. You might want to consider if it makes more sense for you to use an XML parser to modify the contents of the files instead of replacing specific lines.
When you create a StreamWriter it always create a file from scratch, you will have to create a third file and copy from target and replace what you need, and then replace the old one.
But as I can see what you need is XML manipulation, you might want to use XmlDocument and modify your file using Xpath.
You need to Open the output file for write access rather than using a new StreamReader, which always overwrites the output file.
StreamWriter stm = null;
fi = new FileInfo(#"C:\target.xml");
if (fi.Exists)
stm = fi.OpenWrite();
Of course, you will still have to seek to the correct line in the output file, which will be hard since you can't read from it, so unless you already KNOW the byte offset to seek to, you probably really want read/write access.
FileStream stm = fi.Open(FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
with this stream, you can read until you get to the point where you want to make changes, then write. Keep in mind that you are writing bytes, not lines, so to overwrite a line you will need to write the same number of characters as the line you want to change.
I guess the below should work (instead of the writer part from your example). I'm unfortunately with no build environment so It's from memory but I hope it helps
using (var fs = File.Open(filePath, FileMode.Open, FileAccess.ReadWrite)))
{
var destinationReader = StreamReader(fs);
var writer = StreamWriter(fs);
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
destinationReader .ReadLine();
}
line_number++;
}
}
The solution works fine. But I need to change single-line text when the same text is in multiple places. For this, need to define a trackText to start finding after that text and finally change oldText with newText.
private int FindLineNumber(string fileName, string trackText, string oldText, string newText)
{
int lineNumber = 0;
string[] textLine = System.IO.File.ReadAllLines(fileName);
for (int i = 0; i< textLine.Length;i++)
{
if (textLine[i].Contains(trackText)) //start finding matching text after.
traced = true;
if (traced)
if (textLine[i].Contains(oldText)) // Match text
{
textLine[i] = newText; // replace text with new one.
traced = false;
System.IO.File.WriteAllLines(fileName, textLine);
lineNumber = i;
break; //go out from loop
}
}
return lineNumber
}

Fastest way to find strings in a file

I have a log file that is not more than 10KB (File size can go up to 2 MB max) and I want to find if atleast one group of these strings occurs in the files. These strings will be on different lines like,
ACTION:.......
INPUT:...........
RESULT:..........
I need to know atleast if one group of above exists in the file. And I have do this about 100 times for a test (each time log is different, so I have reload and read the log), so I am looking for fastest and bets way to do this.
I looked up in the forums for finding the fastest way, but I dont think my file is too big for those silutions.
Thansk for looking.
I would read it line by line and check the conditions. Once you have seen a group you can quit. This way you don't need to read the whole file into memory. Like this:
public bool ContainsGroup(string file)
{
using (var reader = new StreamReader(file))
{
var hasAction = false;
var hasInput = false;
var hasResult = false;
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
if (!hasAction)
{
if (line.StartsWith("ACTION:"))
hasAction = true;
}
else if (!hasInput)
{
if (line.StartsWith("INPUT:"))
hasInput = true;
}
else if (!hasResult)
{
if (line.StartsWith("RESULT:"))
hasResult = true;
}
if (hasAction && hasInput && hasResult)
return true;
}
return false;
}
}
This code checks if there is a line starting with ACTION then one with INPUT and then one with RESULT. If the order of those is not important then you can omit the if () else if () checks. In case the line does not start with the strings replace StartsWith with Contains.
Here's one possible way to do it:
StreamReader sr;
string fileContents;
string[] logFiles = Directory.GetFiles(#"C:\Logs");
foreach (string file in logFiles)
{
using (StreamReader sr = new StreamReader(file))
{
fileContents = sr.ReadAllText();
if (fileContents.Contains("ACTION:") || fileContents.Contains("INPUT:") || fileContents.Contains("RESULT:"))
{
// Do what you need to here
}
}
}
You may need to do some variation based on your exact implementation needs - for example, what if the word spans two lines, does the line need to start with the word, etc.
Added
Alternate line-by-line check:
StreamReader sr;
string[] lines;
string[] logFiles = Directory.GetFiles(#"C:\Logs");
foreach (string file in logFiles)
{
using (StreamReader sr = new StreamReader(file)
{
lines = sr.ReadAllLines();
foreach (string line in lines)
{
if (line.Contains("ACTION:") || line.Contains("INPUT:") || line.Contains("RESULT:"))
{
// Do what you need to here
}
}
}
}
Take a look at How to Read Text From a File. You might also want to take a look at the String.Contains() method.
Basically you will loop through all the files. For each file read line-by-line and see if any of the lines contains 1 of your special "Sections".
You don't have much of a choice with text files when it comes to efficiency. The easiest way would definitely be to loop through each line of data. When you grab a line in a string, split it on the spaces. Then match those words to your words until you find a match. Then do whatever you need.
I don't know how to do it in c# but in vb it would be something like...
Dim yourString as string
Dim words as string()
Do While objReader.Peek() <> -1
yourString = objReader.ReadLine()
words = yourString.split(" ")
For Each word in words()
If Myword = word Then
do stuff
End If
Next
Loop
Hope that helps
This code sample searches for strings in a large text file. The words are contained in a HashSet. It writes the found lines in a temp file.
if (File.Exists(#"temp.txt")) File.Delete(#"temp.txt");
String line;
String oldLine = "";
using (var fs = File.OpenRead(largeFileName))
using (var sr = new StreamReader(fs, Encoding.UTF8, true))
{
HashSet<String> hash = new HashSet<String>();
hash.Add("house");
using (var sw = new StreamWriter(#"temp.txt"))
{
while ((line = sr.ReadLine()) != null)
{
foreach (String str in hash)
{
if (oldLine.Contains(str))
{
sw.WriteLine(oldLine);
// write the next line as well (optional)
sw.WriteLine(line + "\r\n");
}
}
oldLine = line;
}
}
}

Categories