How to obtain a regex-matched string's file path? - c#

I have successfully used a regex, reading with a StreamReader, to match multiple strings in a folder of .txt files, but I also need to get the file path of each matched string.
How can I obtain the file paths of the matched strings?
static void abnormalitiescheck()
{
    int count = 0;
    Regex regex = new Regex(@"(#####)");
    DirectoryInfo di = new DirectoryInfo(txtpath);
    Console.WriteLine("No" + "\t" + "Name and location of file" + "\t" + "||" + " " + "Abnormal Text Detected");
    Console.WriteLine("=" + "\t" + "=========================" + "\t" + "||" + " " + "=======================");
    foreach (string files in Directory.GetFiles(txtpath, "*.txt"))
    {
        using (StreamReader reader = new StreamReader(files))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                Match match = regex.Match(line);
                if (match.Success)
                {
                    count++;
                    Console.WriteLine(count + "\t\t\t\t\t" + match.Value + "\n");
                }
            }
        }
    }
}
If possible, I would also like to output the file path of each matched string.
For example:
C:/..../email_4.txt
C:/..../email_7.txt
C:/..../email_8.txt
C:/..../email_9.txt

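As you already have the DirectoryInfo, you can use its FullName property.
You also have the file name in the variable files. To get the name and location of the file, you can use Path.Combine. (Directory.GetFiles already returns each file name prefixed with txtpath, so if txtpath is an absolute path, printing files directly also works.)
Your updated code could look like this: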
Console.WriteLine(count + "\t" + Path.Combine(di.FullName , Path.GetFileName(files)) + "\t" + match.Value + "\n");
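Putting it together, here is a minimal sketch of the whole method. It assumes the folder is passed in as a parameter (in the question txtpath is a field) and keeps the question's ##### pattern only as a placeholder; AbnormalityScanner and Scan are hypothetical names. It calls Path.GetFullPath on each path returned by Directory.GetFiles, which should give the same absolute path as combining di.FullName with the file name, and it returns the hits as well as printing them:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class AbnormalityScanner
{
    // Scans every *.txt file directly inside `folder` and reports the file path of each match.
    // "#####" is only a placeholder for whatever abnormal text you are looking for.
    public static List<(string FilePath, string Text)> Scan(string folder, string pattern = "#####")
    {
        var regex = new Regex(pattern);
        var hits = new List<(string FilePath, string Text)>();
        int count = 0;

        Console.WriteLine("No\tName and location of file\t|| Abnormal Text Detected");

        foreach (string file in Directory.GetFiles(folder, "*.txt"))
        {
            // GetFiles returns folder + file name; GetFullPath makes it absolute
            // even when `folder` is a relative path.
            string fullPath = Path.GetFullPath(file);

            // File.ReadLines streams the file one line at a time, like the StreamReader loop.
            foreach (string line in File.ReadLines(file))
            {
                Match match = regex.Match(line);
                if (match.Success)
                {
                    count++;
                    hits.Add((fullPath, match.Value));
                    Console.WriteLine($"{count}\t{fullPath}\t|| {match.Value}");
                }
            }
        }
        return hits;
    }
}
```

For example, AbnormalityScanner.Scan(txtpath) prints one line per match, with the full path of the file it came from.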

I'm guessing that you might just want to match some .txt file paths. If that's the case, let's start with a simple expression that collects everything from the start of each line up to .txt, with .txt (dot escaped) as the right boundary:
^(.+?)(\.txt)$
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        // Note: in Multiline mode $ matches only before \n, so if this source file uses \r\n
        // line endings, the verbatim string contains \r and only the last line will match.
        string pattern = @"^(.+?)(\.txt)$";
        string input = @"C:/..../email_4.txt
C:/..../email_7.txt
C:/..../email_8.txt
C:/..../email_9.txt";
        RegexOptions options = RegexOptions.Multiline;
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}

Related

How to add a string of text into another txt file

I have this txt file that contains this text:
MSH^~|\&^R3POCQUERYS^050~BCMABU.MED.VA.GOV~DNS^R3POCQUERYR^^201711081317040500^^RQC~I06^50279320^D^2.5^^^AL^NE^USA
QRD^20171108131704-0500^R^I^WQRY^^^^SSN~%ABC123^9A-MED~WA0034^^^T
but I only want the values that come after SSN~% and after MED~.
I want to be able to read from the line that starts with QRD and then grab ANY value after SSN~% and MED~; the values can be anything, I'm just using ABC123 and WA0034 as examples.
Form1.cs
private void Parse(string filename)
{
    string line;
    var str = File.ReadAllText(filename);
    System.IO.StreamReader file = new System.IO.StreamReader(filename);
    targetRichTextBox = richTextBox1;
    WriteTextSafelyInRichTextBox(str);
    while ((line = file.ReadLine()) != null)
    {
        if (line.Contains("QRD"))
        {
            //Enter code here
        }
    }
    char[] delimiterChars = { '^' };
    string[] words = str.Split(delimiterChars);
    var createText = (RetrunTemplate.Get().Replace(words[24], "VHIC-").Replace(words[25], "9A-MED~WA0034"));
    var outputFilename = outputDir + "\\OutboundMessage - " + DateTime.UtcNow.ToString("yyyy-MM-dd HH-mm-ss-ff", CultureInfo.InvariantCulture) + ".txt";
    File.WriteAllText(outputFilename, createText);
    targetRichTextBox = richTextBox2;
    WriteTextSafelyInRichTextBox(createText);
    file.Close();
    File.Delete(filename);
    MessageBox.Show("You have successfully created an outbound message");
}
RetrunTemplate
class RetrunTemplate
{
    public static string Get()
    {
        string retrunTemplate = @"MSH^~|\&^R3POCSEND^442~CHEY209.FO-BAYPINES.MED.VA.GOV~DNS^R3POCRCV^^20171108131710-0400^^RCL~I06^442157252912^D^2.5^^^AL^NE^USA" + Environment.NewLine +
            "PID^^^4420041228V165312~~~USVHA&&0363~NI~VA FACILITY ID&442&L~~20171108|666393848~~~" + Environment.NewLine +
            @"USSSA&&0363~SS~VA FACILITY ID&442&L|""~~~USDOD&&0363~TIN~VA FACILITY ID&442&L" + Environment.NewLine +
            @"""~~~USDOD&&0363~FIN~VA FACILITY ID&442&L|7209344~~~USVHA&&0363~PI~VA FACILITY ID&442&L" + Environment.NewLine +
            @"^VHIC-ABC123~~~USVHA&&0363~PI~VA FACILITY ID&742V1&L^ZEIGLER~PG~EIGHT~~~~L" + Environment.NewLine +
            @"|""~~~~~~N^^19220304^M^^^9234234~""~SAN FRANCISCO~CA~94114~USA~P~""~075|~~SAN JOSE~CO~~""~N^^""^^^^^^^^^^^^^^^^^^" + Environment.NewLine +
            @"PV1^^^9A-MED" + Environment.NewLine + "HH1^WA0034";
        return retrunTemplate;
    }
}
Suppose you read the file line by line. You can validate each line against the following Regex, and extract what you want.
var text = "QRD^20171108131704-0500^R^I^WQRY^^^^SSN~%ABC123^9A-MED~WA0034^^^T";
var rgx = new Regex(@"QRD.+SSN~%(.+)MED~(.+)");
var match = rgx.Match(text);
if (match.Success)
{
    Console.WriteLine(match.Groups[1].Value);
    Console.WriteLine(match.Groups[2].Value);
}
match.Groups[1] contains ABC123^9A-, and match.Groups[2] contains WA0034^^^T. You can now do what you want with that text.
Regex Breakdown
@"QRD.+SSN~%(.+)MED~(.+)"
QRD - Starts with the string QRD
.+ - Followed by one or more characters
SSN~% - Followed by SSN~%
(.+) - Grab (to Groups[1]) one or more characters between SSN~% and MED~
MED~ - Followed by MED~
(.+) - Grab everything else in the line to Groups[2]
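The (.+) groups above are greedy, which is why the captures include ^9A- and ^^^T. If you only want ABC123 and WA0034, a minimal sketch could stop each capture at the next ^ and read the file line by line, as the question describes. QrdReader, ParseLine and ParseFile are hypothetical names, and the sketch assumes the values themselves never contain ^:

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class QrdReader
{
    // ^QRD\^   : the line must start with the QRD segment
    // [^^]+    : one or more characters that are not '^', so each value stops
    //            at the next field separator instead of running to the end
    private static readonly Regex QrdPattern =
        new Regex(@"^QRD\^.*?SSN~%([^^]+).*?MED~([^^]+)");

    // Returns the values after SSN~% and MED~, or null if the line doesn't match.
    public static (string Ssn, string Med)? ParseLine(string line)
    {
        Match m = QrdPattern.Match(line);
        if (!m.Success)
            return null;
        return (m.Groups[1].Value, m.Groups[2].Value);
    }

    // Reads the file line by line and parses the first QRD line found.
    public static (string Ssn, string Med)? ParseFile(string filename)
    {
        foreach (string line in File.ReadLines(filename))
        {
            var values = ParseLine(line);
            if (values.HasValue)
                return values;
        }
        return null;
    }
}
```

With the sample QRD line from the question, ParseLine returns ("ABC123", "WA0034"); the MSH line returns null because it does not start with QRD.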
Here's my effort.
var input = @"MSH^~|\&^R3POCQUERYS^050~BCMABU.MED.VA.GOV~DNS^R3POCQUERYR^^201711081317040500^^RQC~I06^50279320^D^2.5^^^AL^NE^USA
QRD^20171108131704-0500^R^I^WQRY^^^^SSN~%ABC123^9A-MED~WA0034^^^T";
var pattern = @"SSN\~\%([A-Z0-9]+).*MED\~([A-Z0-9]+)";
// Cast<Match>() (System.Linq) is needed on .NET Framework, where MatchCollection is not IEnumerable<Match>
var matches = Regex.Matches(input, pattern, RegexOptions.Multiline)
    .Cast<Match>()
    .Select(m => new { SSN = m.Groups[1].Value, MED = m.Groups[2].Value });
foreach (var m in matches)
{
    Console.WriteLine($"SSN = {m.SSN}, MED = {m.MED}");
}
Output
SSN = ABC123, MED = WA0034
With QRD matching
var input = @"MSH^~|\&^R3POCQUERYS^050~BCMABU.MED.VA.GOV~DNS^R3POCQUERYR^^201711081317040500^^RQC~I06^50279320^D^2.5^^^AL^NE^USA
QRD^20171108131704-0500^R^I^WQRY^^^^SSN~%ABC123^9A-MED~WA0034^^^T";
var pattern = @"SSN\~\%([A-Z0-9]+).*MED\~([A-Z0-9]+)";
var matches = input
    .Split()   // no arguments: splits on whitespace, including line breaks
    .Where(l => l.StartsWith("QRD"))
    .Select(l => Regex.Matches(l, pattern).Cast<Match>().Select(m => new { SSN = m.Groups[1].Value, MED = m.Groups[2].Value }));
foreach (var groups in matches)
{
    foreach (var g in groups)
    {
        Console.WriteLine($"SSN = {g.SSN}, MED = {g.MED}");
    }
}
Output
SSN = ABC123, MED = WA0034

C# - Get All Words Between Chars

I have a string:
string mystring = "hello(hi,mo,wo,ka)";
I need to get all the arguments inside the parentheses,
like this:
hi*mo*wo*ka
I tried this:
string res = "";
string mystring = "hello(hi,mo,wo,ka)";
mystring.Replace("hello", "");
string[] tokens = mystring.Split(',');
string[] tokenz = mystring.Split(')');
foreach (string s in tokens)
{
    res += "*" + " " + s + " ";
}
foreach (string z in tokenz)
{
    res += "*" + " " + z + " ";
}
return res;
But that returns all the words before ",".
I need to return the text between "(" and ",", between "," and ",", and between "," and ")".
You can use the regex \(([^)]+)\) (written "\\(([^)]+)\\)" in a normal C# string) to capture the text inside the parentheses, then use Replace to turn each , into *:
string res = "hello(hi,mo,wo,ka)";
var regex = Regex.Match(res, "\\(([^)]+)\\)");
var result = regex.Groups[1].Value.Replace(',','*');
Result
hi*mo*wo*ka
Or this way:
Regex rgx = new Regex(@"\((.*)\)");
var result = rgx.Match("hello(hi,mo,wo,ka)");
var args = result.Groups[1].Value.Replace(',', '*'); // "hi*mo*wo*ka"
The Split method has an overload that lets you pass several delimiters:
string mystring = "hello(hi,mo,wo,ka)";
var tokens = mystring.Replace("hello", "").Split(new[] { "(", ",", ")" }, StringSplitOptions.RemoveEmptyEntries);
var result = string.Join("*", tokens); // "hi*mo*wo*ka"
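For comparison, a minimal sketch without regex or Replace("hello", ...): take the text between the first ( and the last ), split it on commas and join the pieces with *. ArgumentExtractor and JoinArguments are hypothetical names, and the sketch assumes a single argument list with no nested parentheses:

```csharp
using System.Linq;

public static class ArgumentExtractor
{
    // Returns the comma-separated arguments between the first '(' and the last ')',
    // joined with `separator`, e.g. "hello(hi,mo,wo,ka)" -> "hi*mo*wo*ka".
    public static string JoinArguments(string call, string separator = "*")
    {
        int open = call.IndexOf('(');
        int close = call.LastIndexOf(')');
        if (open < 0 || close < open)
            return string.Empty; // no argument list found

        string inside = call.Substring(open + 1, close - open - 1);

        // Trim each argument and drop empty ones, so "a, b" and "a,b" give the same result.
        var args = inside.Split(',')
                         .Select(a => a.Trim())
                         .Where(a => a.Length > 0);

        return string.Join(separator, args);
    }
}
```

Unlike the Replace("hello", "") approach, this does not depend on the text before the parenthesis.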

How to increment a variable based on a column parsed from a list of files in a directory

I'm trying to create an import file based on a list of files in a directory. The problem is that I've now been asked to increment the value of one of the output columns based on a specific part of the file name.
In the code below, LinkName is hard-coded to Full Image_0. I actually want it to be more like LinkName = "Full Image_" + intXYZ.ToString();
where intXYZ is a variable that starts at 0 and goes up by 1 for each file with the same PartNum.
Here is the relevant code:
ImageName = Directory.GetFiles(@"\\192.168.0.144\iApps_Final_Images\ProductImages\", "*.*", SearchOption.AllDirectories);
ImageItem = Directory.GetFiles(@"\\192.168.0.144\iApps_Final_Images\ProductImages\", "*.*", SearchOption.AllDirectories).Select(file => Path.GetFileNameWithoutExtension(file)).ToArray();
// Set path for output file and open
FilePath = @"\\vhome\public\p21\Images.txt";
var writer = new StreamWriter(FilePath);
// Go through each file found (not named thumbs) and output row of data needed for inv_mast_links import
foreach (var item in ImageName)
{
    SetNum = SetNum + 1;
    LinkPath = item.ToString();
    PartNum = ImageItem[SetNum - 1].ToString().Split('_').Last();
    LinkName = "Full Image_0";
    var line = SetNum + delimiter + PartNum + delimiter + LinkName + delimiter + LinkPath + delimiter + "Item Maintenance";
    if (PartNum != "Thumbs")
    {
        writer.WriteLine(line);
    }
}
// Close the output file
writer.Close();
Example: if the files in the directory are TVImage_567.jpg, FrontView_888.jpg and BackView_888.jpg,
then, since two of the images are for the same PartNum (888), the desired three LinkName outputs would be: Full Image_0, Full Image_0, Full Image_1.
If I understood your question correctly, this is what you want:
Dictionary<string, int> counts = new Dictionary<string, int>();
foreach (var item in ImageName)
{
    SetNum = SetNum + 1;
    LinkPath = item.ToString();
    PartNum = ImageItem[SetNum - 1].ToString().Split('_').Last();
    if (counts.ContainsKey(PartNum))
    {
        counts[PartNum]++;
    }
    else
    {
        counts.Add(PartNum, 0);
    }
    LinkName = "Full Image_" + counts[PartNum];
    var line = SetNum + delimiter + PartNum + delimiter + LinkName + delimiter + LinkPath + delimiter + "Item Maintenance";
    if (PartNum != "Thumbs")
    {
        writer.WriteLine(line);
    }
}
It is not quite clear what SetNum does, so I just left it there. The idea is simply to keep a counter for each PartNum in the dictionary: the first file with a given PartNum gets 0, and every further file with the same PartNum gets the next number.
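A minimal, self-contained sketch of that per-PartNum counter, checked against the file names in the question's example. LinkNamer and BuildLinkNames are hypothetical names; the sketch assumes, like the question's Split('_').Last(), that the part number is the text after the last _ in the file name without its extension:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LinkNamer
{
    // For each file, returns "Full Image_n", where n counts earlier files with the same PartNum.
    public static List<string> BuildLinkNames(IEnumerable<string> files)
    {
        var counts = new Dictionary<string, int>();
        var linkNames = new List<string>();

        foreach (string file in files)
        {
            string partNum = Path.GetFileNameWithoutExtension(file).Split('_').Last();

            // First time we see this PartNum -> 0, otherwise the previous value + 1.
            counts[partNum] = counts.TryGetValue(partNum, out int seen) ? seen + 1 : 0;
            linkNames.Add("Full Image_" + counts[partNum]);
        }
        return linkNames;
    }
}
```

For TVImage_567.jpg, FrontView_888.jpg and BackView_888.jpg this yields Full Image_0, Full Image_0, Full Image_1, as the question asks.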
How about this? You don't need SetNum:
string[] ImageNames = Directory.GetFiles(@"Source-Directory\", "*.*", SearchOption.AllDirectories);
string[] ImageItem = ImageNames.Select(file => Path.GetFileNameWithoutExtension(file)).ToArray();
string FilePath = @"Destination.txt";
using (var writer = new StreamWriter(FilePath))
{
    // Visit every file; ImageItem[i] is the name (without extension) of ImageNames[i]
    for (int i = 0; i < ImageNames.Length; i++)
    {
        string LinkPath = ImageNames[i];
        string PartNum = ImageItem[i].Split('_').Last();
        string LinkName = "Full Image_" + i;
        var line = (i + 1) + delimiter + PartNum + delimiter + LinkName + delimiter + LinkPath + delimiter + "Item Maintenance";
        if (PartNum != "Thumbs")
        {
            writer.WriteLine(line);
        }
    }
}

Find number pattern in string and remove it

I need to remove a pattern from a string. I think a regex could do the job, but I'm having trouble solving this.
The pattern is always at the end of the string.
string fileName = "File (123)";
string pattern = " (0)";
string cleanName = PatternRemover(fileName, pattern);
//Should result in: cleanName == "File"
Edit:
Ok, here is the code that I'm using now after your answers:
public static string GetNextFilePath2(string fullPath, ref uint id, string idFormat)
{
    string dir = Path.GetDirectoryName(fullPath);
    string ext = Path.GetExtension(fullPath);
    string fileNameNoExt = Path.GetFileNameWithoutExtension(fullPath);
    if (ext.Length > 0 && ext[0] != '.')
        ext = "." + ext;
    string baseName = Regex.Replace(fileNameNoExt, @"\s\(\d+\)", "");
    string fileName = baseName + " (" + id.ToString(idFormat) + ")" + ext;
    string path = Path.Combine(dir, fileName);
    while (File.Exists(path))
    {
        id++;
        fileName = baseName + " (" + id.ToString(idFormat) + ")" + ext;
        path = Path.Combine(dir, fileName);
    }
    return path;
}
It works, but:
It always starts counting from id; I think it would be better to start from the number already in the file name.
I was hoping to use something like "(0)" as a method parameter that would indicate the pattern to be removed and would also parameterize the "(". I'm doing it "manually" now on this line: string fileName = baseName + " (" + id.ToString(idFormat) + ")" + ext;
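You can do it without a regex, like this:
string newFileName = new String(fileName
    .Where(r => !char.IsDigit(r)
        && r != '('
        && r != ')'
        && r != ' ').ToArray());
For "File (123).jpg" this gives you File.jpg. Note that it removes every digit, parenthesis and space anywhere in the name, not only the trailing " (123)".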
If you only want the file name without the extension, you can use:
string fileNameWithoutPath = Path.GetFileNameWithoutExtension(newFileName);
// it would give you `File`
Using a regex:
var subject = "File (123).jpg";
var fileNameWithExtension = Regex.Replace(subject, @"\s*\(\d+\)", "");
var fileNameWithoutPath = Path.GetFileNameWithoutExtension(fileNameWithExtension);
And thanks to @habib; I would not have thought of using Path.GetFileNameWithoutExtension to strip the extension.
You could use:
\s\(\d+\)\.jpg
assuming you do actually want the extension removed and the extension is always ".jpg". Otherwise:
\s\(\d+\)
This looks for a set of digits in parentheses preceded by a whitespace character.
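Since the question says the pattern is always at the end of the name, anchoring the regex with $ leaves a " (n)" in the middle of a name, and digits that belong to the name itself, untouched. A minimal sketch that strips only a trailing counter and keeps any extension; FileNameCleaner and StripTrailingCounter are hypothetical names, and the sketch works on a bare file name, not a full path:

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class FileNameCleaner
{
    // Removes a trailing " (n)" counter from a bare file name, keeping the extension:
    // "File (123)" -> "File", "File (123).jpg" -> "File.jpg", "File2 (3).jpg" -> "File2.jpg".
    public static string StripTrailingCounter(string fileName)
    {
        string ext = Path.GetExtension(fileName);                 // "" when there is no extension
        string stem = Path.GetFileNameWithoutExtension(fileName); // also drops any directory part

        // \s*  optional whitespace, \(\d+\) digits in parentheses, $ only at the end of the stem
        string cleaned = Regex.Replace(stem, @"\s*\(\d+\)$", "");
        return cleaned + ext;
    }
}
```

Because it relies on Path.GetExtension, everything after the last . counts as the extension, so a name like "v1.2 (3)" with no real extension is left unchanged.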

Problem with Existing File Name & Creating a Unique File Name

I have this code:
public void FileCleanup(List<string> paths)
{
    string regPattern = (@"[~#&!%+{}]+");
    string replacement = "";
    string replacement_unique = "_";
    Regex regExPattern = new Regex(regPattern);
    List<string> existingNames = new List<string>();
    StreamWriter errors = new StreamWriter(@"C:\Documents and Settings\jane.doe\Desktop\SharePointTesting\Errors.txt");
    StreamWriter resultsofRename = new StreamWriter(@"C:\Documents and Settings\jane.doe\Desktop\SharePointTesting\Results of File Rename.txt");
    foreach (string files2 in paths)
        try
        {
            string filenameOnly = Path.GetFileName(files2);
            string pathOnly = Path.GetDirectoryName(files2);
            string sanitizedFileName = regExPattern.Replace(filenameOnly, replacement);
            string sanitized = Path.Combine(pathOnly, sanitizedFileName);
            if (!System.IO.File.Exists(sanitized))
            {
                existingNames.Add(sanitized);
                try
                {
                    foreach (string names in existingNames)
                    {
                        string filename = Path.GetFileName(names);
                        string filepath = Path.GetDirectoryName(names);
                        string cleanName = regExPattern.Replace(filename, replacement_unique);
                        string scrubbed = Path.Combine(filepath, cleanName);
                        System.IO.File.Move(names, scrubbed);
                        //resultsofRename.Write("Path: " + pathOnly + " / " + "Old File Name: " + filenameOnly + "New File Name: " + sanitized + "\r\n" + "\r\n");
                        resultsofRename = File.AppendText("Path: " + filepath + " / " + "Old File Name: " + filename + "New File Name: " + scrubbed + "\r\n" + "\r\n");
                    }
                }
                catch (Exception e)
                {
                    errors.Write(e);
                }
            }
            else
            {
                System.IO.File.Move(files2, sanitized);
                resultsofRename.Write("Path: " + pathOnly + " / " + "Old File Name: " + filenameOnly + "New File Name: " + sanitized + "\r\n" + "\r\n");
            }
        }
        catch (Exception e)
        {
            //write to streamwriter
        }
}
}
}
What I'm trying to do here is rename "dirty" file names by removing invalid characters (defined in the regex), that is, replacing them with "". However, I noticed that if this produces duplicate file names, the app does not rename them: if I have ##test.txt and ~~test.txt in the same folder, both would be renamed to test.txt. So I created another foreach loop that replaces each invalid character with "_" instead of removing it.
The problem is that whenever I run this, nothing happens: none of the files are renamed!
Can someone tell me if my code is incorrect and how to fix it?
Also, does anybody know how I could replace the invalid character in the second foreach loop with a different character every time? That way, if there are several files such as %Test.txt, ~Test.txt and #test.txt (all of which would be renamed to test.txt), each could get a unique name with a different character.
However, would you know how to replace the invalid char with a different unique character every time so that each filename remains unique?
This is one way:
char[] uniques = ",'%".ToCharArray(); // whatever chars you want
foreach (string file in files)
{
    foreach (char c in uniques)
    {
        string replaced = regexPattern.Replace(file, c.ToString());
        if (File.Exists(replaced)) continue;
        // create (or rename to) the file here, then stop trying other characters
        break;
    }
}
You may of course want to refactor this into its own method. Note also that the number of files that differ only in these special characters is limited to the number of characters in your uniques array, so if you have many such files, it might be wise to use a different method, such as appending a digit to the end of the file name.
How would I append a digit to the end of the file name (with a different number every time)?
A slightly modified version of Josh's suggestion would work: keep track of the sanitized file names, mapped to the number of times each name has been generated after the replacement:
var filesCount = new Dictionary<string, int>();
string replaceSpecialCharsWith = "_"; // or "", whatever
foreach (string file in files)
{
    string sanitizedPath = regexPattern.Replace(file, replaceSpecialCharsWith);
    if (filesCount.ContainsKey(sanitizedPath))
    {
        filesCount[sanitizedPath]++;
    }
    else
    {
        filesCount.Add(sanitizedPath, 0);
    }
    // Insert the counter before the extension, except for the first occurrence
    string newFileName = String.Format("{0}{1}{2}",
        Path.GetFileNameWithoutExtension(sanitizedPath),
        filesCount[sanitizedPath] != 0
            ? filesCount[sanitizedPath].ToString()
            : "",
        Path.GetExtension(sanitizedPath));
    string newFilePath = Path.Combine(Path.GetDirectoryName(sanitizedPath),
        newFileName);
    // create file...
}
Just a suggestion:
after removing or replacing the special characters, append a timestamp to the file name. As long as no two files are renamed within the same timestamp resolution, this gives each file a unique name.
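A minimal sketch of the timestamp idea, assuming the timestamp goes just before the extension and uses UTC with millisecond precision. TimestampNamer and AddTimestamp are hypothetical names. Since two renames in the same millisecond would still collide, this version also checks File.Exists and adds a counter, which goes beyond the original suggestion:

```csharp
using System;
using System.Globalization;
using System.IO;

public static class TimestampNamer
{
    // Inserts a UTC timestamp before the extension:
    // "test.txt" at 2024-01-02 03:04:05.678 UTC -> "test_20240102T030405678.txt".
    public static string AddTimestamp(string path, DateTime utcNow)
    {
        string dir = Path.GetDirectoryName(path) ?? string.Empty;
        string stem = Path.GetFileNameWithoutExtension(path);
        string ext = Path.GetExtension(path);
        string stamp = utcNow.ToString("yyyyMMdd'T'HHmmssfff", CultureInfo.InvariantCulture);

        string candidate = Path.Combine(dir, $"{stem}_{stamp}{ext}");

        // Two renames in the same millisecond would produce the same name,
        // so fall back to a counter while the candidate already exists on disk.
        for (int n = 1; File.Exists(candidate); n++)
            candidate = Path.Combine(dir, $"{stem}_{stamp}_{n}{ext}");

        return candidate;
    }
}
```

In the rename loop you would call it as TimestampNamer.AddTimestamp(sanitized, DateTime.UtcNow); passing the time in as a parameter keeps the method easy to test.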
How about maintaining a dictionary of all renamed files, checking each file against it, and, if the name already exists, adding a number to the end of it?
Following up on @Josh Smeaton's answer, here's some sample code that uses a dictionary to keep track of the file names:
using System;
using System.Collections.Generic;
class Program
{
    private static readonly Dictionary<string, int> _fileNames = new Dictionary<string, int>();
    static void Main(string[] args)
    {
        var fileName = GetUniqueFileName("filename.txt");
        Console.WriteLine(fileName);
        fileName = GetUniqueFileName("someotherfilename.txt");
        Console.WriteLine(fileName);
        fileName = GetUniqueFileName("filename.txt");
        Console.WriteLine(fileName);
        fileName = GetUniqueFileName("adifferentfilename.txt");
        Console.WriteLine(fileName);
        fileName = GetUniqueFileName("filename.txt");
        Console.WriteLine(fileName);
        fileName = GetUniqueFileName("adifferentfilename.txt");
        Console.WriteLine(fileName);
        Console.ReadLine();
    }
    private static string GetUniqueFileName(string fileName)
    {
        // If not already in the dictionary add it, otherwise increment the counter
        if (!_fileNames.ContainsKey(fileName))
            _fileNames.Add(fileName, 0);
        else
            _fileNames[fileName] += 1;
        // Now return the new name, prefixed with the counter if required (0 means it's just been added).
        // Checking for 0 explicitly avoids Replace("0", ...) turning a count of 10 into "1".
        int count = _fileNames[fileName];
        return count == 0 ? fileName : count + fileName;
    }
}
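The sample above puts the counter in front of the name (for example 1filename.txt). If you would rather add the number at the end of the name, before the extension, as the dictionary suggestion describes, a minimal variant could look like this. UniqueNameGenerator is a hypothetical name, and the case-insensitive comparison is an assumption based on Windows file names. Like the original, it only tracks names it has handed out, so a generated filename1.txt could still clash with an existing file of that name:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class UniqueNameGenerator
{
    // Case-insensitive because Windows treats "Test.txt" and "test.txt" as the same file.
    private readonly Dictionary<string, int> _counts =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    // First call returns the name unchanged; later calls with the same name return
    // "filename1.txt", "filename2.txt", ... (counter inserted before the extension).
    public string GetUniqueFileName(string fileName)
    {
        _counts.TryGetValue(fileName, out int count); // count is 0 if the name is new
        _counts[fileName] = count + 1;

        if (count == 0)
            return fileName;

        string dir = Path.GetDirectoryName(fileName) ?? string.Empty;
        string stem = Path.GetFileNameWithoutExtension(fileName);
        string ext = Path.GetExtension(fileName);
        return Path.Combine(dir, stem + count + ext);
    }
}
```

Calling it three times with "filename.txt" returns filename.txt, filename1.txt and filename2.txt.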
