Extract basename and extension from file name - file.txt.1

One of our tools maintains files in a format like file.txt.1.
The trailing 1 is always a numeric value that is incremented on each save, so the file system holds multiple versions of the file.
Now, in another C# application, I would like to process these file names and split each one into its base name and extension.
I used Path.GetFileNameWithoutExtension() and Path.GetExtension(). In this case they return file.txt and .1 respectively.
This forces me to call both methods a second time to get the real file name without extension and the real file extension.
Is there any simple/smart way to get the base name and extension?

I don't think that there is a much SIMPLER way than your first idea.
Nevertheless, using a regular expression for this has the advantage that you have better control over filtering out the file with the correct syntax. I just wrote a small sample:
...
using System.IO;
using System.Text.RegularExpressions;
...
string path = @"C:\temp";
Regex numExtRegex = new Regex(@"^(.*)\.(\d+)$");
foreach (string file in Directory.GetFiles(path))
{
    Match match = numExtRegex.Match(file);
    if (match.Success)
    {
        string originalFile = match.Groups[1].Value;
        string numericExtension = match.Groups[2].Value;
        string originalFileNameWithoutExtension = Path.GetFileNameWithoutExtension(originalFile);
        string extension = Path.GetExtension(originalFile);
        Console.WriteLine("File: {0}, numeric extension: {1}, file name w/o ext: {2}, ext: {3}",
            originalFile, numericExtension, originalFileNameWithoutExtension, extension);
    }
}
The regular expression looks for something.digits
Using this way of filtering, you can be sure that you don't handle e.g. a readme.txt which someone placed into the directory...
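The same idea can be wrapped in a small reusable helper. This is only a sketch: the names VersionedName and TrySplit are made up for illustration, and it assumes the numeric suffix always fits into an int.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class VersionedName
{
    // "something.digits": group 1 is the original name, group 2 the numeric suffix.
    private static readonly Regex NumericSuffix = new Regex(@"^(.*)\.(\d+)$");

    // Hypothetical helper: splits "file.txt.1" into "file", ".txt" and 1.
    // Returns false for names without a numeric suffix (e.g. "readme.txt").
    public static bool TrySplit(string fileName, out string baseName,
                                out string extension, out int version)
    {
        baseName = null;
        extension = null;
        version = 0;

        Match m = NumericSuffix.Match(fileName);
        if (!m.Success || !int.TryParse(m.Groups[2].Value, out version))
        {
            return false;
        }

        string original = m.Groups[1].Value;
        baseName = Path.GetFileNameWithoutExtension(original);
        extension = Path.GetExtension(original);
        return true;
    }
}
```

Calling VersionedName.TrySplit("file.txt.1", out b, out e, out v) should fill in "file", ".txt" and 1, while "readme.txt" is rejected because it has no numeric suffix.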

Related

Changing file name

Consider the following code snippet
public static string AppendDateTimeToFileName(this string fileName)
{
    return string.Concat(
        Path.GetFileNameWithoutExtension(fileName),
        DateTime.Now.ToString("yyyyMMddHHmmssfff"),
        Path.GetExtension(fileName));
}
This basically puts a date time stamp on any file that is being uploaded by the users. Now this works great if the file name is something like
MyFile.png
AnotherFile.png
Now I'm trying to change this method so if the file name is something like
MyFile - Copy(1).png
AnotherFile - Copy(1).png
I want the file name to become
MyFile-Copy-120170303131815555.png
AnotherFile-Copy-120170303131815555.png
Is there an easy solution for this with regex or similar, or do I have to rewrite the method and check each of those values one by one?

return string.Concat(
    Regex.Replace(Path.GetFileNameWithoutExtension(fileName), @" - Copy\s*\(\d+\)", "-Copy-", RegexOptions.IgnoreCase),
    DateTime.Now.ToString("yyyyMMddHHmmssfff"),
    Path.GetExtension(fileName));
This matches any number of digits and is a global replace.
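Note that the replacement above drops the copy number, whereas the expected output in the question (MyFile-Copy-1 followed by the time stamp) keeps it. A possible variation that captures the digits and puts them back is sketched below. The class name UploadNaming and the extra now parameter are assumptions added so the result is reproducible; they are not part of the original method.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text.RegularExpressions;

public static class UploadNaming
{
    // Hypothetical variant of AppendDateTimeToFileName that keeps the copy number.
    public static string AppendTimestamp(string fileName, DateTime now)
    {
        // " - Copy(1)" becomes "-Copy-1"; $1 re-inserts the captured digits.
        string stem = Regex.Replace(
            Path.GetFileNameWithoutExtension(fileName),
            @"\s*-\s*Copy\s*\((\d+)\)",
            "-Copy-$1",
            RegexOptions.IgnoreCase);

        return string.Concat(
            stem,
            now.ToString("yyyyMMddHHmmssfff", CultureInfo.InvariantCulture),
            Path.GetExtension(fileName));
    }
}
```

Names without a " - Copy(n)" part pass through the Regex.Replace call unchanged, so MyFile.png still just gets the time stamp appended.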

Find and Delete Characters in String

My program reads registry key values and combines those values with the installation path. I also read the installation path from the registry.
i.e. String dllInstpath = installPath + rKey which equals to:
C:\Program Files (x86)\NSi\AutoStore Workflow 6\HpOXPdCaptureRes.dll
I then use FileVersionInfo on the string above to get the file information of HpOXPdCaptureRes.dll from its install path and write all the values to a notepad.
My problem is the TRUE dll name does not have 'Res' in the file name. The registry only has the file name with 'Res' in the file name. What I need to do is read from a text file and find all 'Res' and remove them from the line of text within the notepad file.
So the output should look like this:
Current:
HpOXPdCaptureRes.dll
New:
HpOXPdCapture.dll
I have read online and I see the best way to do this is to use ReadAllLines and WriteAllLines. However I am not sure how to implement the find and replace. I have seen a lot of examples on how to remove spaces, invalid characters, etc., but I haven't been able to find an example for what I need.
Summary:
Read text file
Find Res in all lines of text and remove
Retain current text file, i.e. remove Res and close file
Any help is greatly appreciated.
Thank you!

You can use File.ReadAllLines and File.WriteAllLines.
Example:
Read all the lines and replace the value you want on each line, then write the lines again
File.WriteAllLines("textFilePath",File.ReadAllLines("textFilePath").Select(line => line.Replace("Res.dll", ".dll")));

Just open the file and read all lines using 'File.ReadAllLines'. Then use Replace to remove the Res:
string[] lines = File.ReadAllLines("yourFileName");
var output = lines.Select(x => x.Replace("Res.dll", ".dll")).ToArray();
To later save them back you can use File.WriteAllLines:
File.WriteAllLines("yourFileName", output);

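Read everything from the file, remove all occurrences of 'Res' and write the result back to the file. Note that String.Replace is case-sensitive, so the search text must be "Res" rather than "res", and that this removes "Res" everywhere in the file, not only in front of ".dll":
String filename = "fileName";
StreamReader sr = new StreamReader(filename);
String content = sr.ReadToEnd();
sr.Close();
StreamWriter sw = new StreamWriter(filename);
sw.Write(content.Replace("Res", ""));
sw.Close();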

If the string you are replacing is guaranteed to be unique in the string - "Res.dll" at the end of the string for instance - then you can use the Replace method of the String class to do the replacement:
List<string> lines = File.ReadAllLines(sourceFile).ToList();
lines = lines.Select(l => l.Replace("Res.dll", ".dll")).ToList();
Or if case sensitivity is an issue:
lines = lines.Select(l => l.EndsWith("res.dll", StringComparison.OrdinalIgnoreCase) ? l.Substring(0, l.Length - 7) + ".dll" : l).ToList();
For more complex cases you might need to use a regular expression to identify the section of the string to replace. Or you might want to split the string into path and filename, modify the filename and join it back together.
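As a rough illustration of the regular expression route, the sketch below removes "Res" only when it sits directly in front of ".dll" at the very end of a line, ignoring case. The class and method names (ResSuffix, Strip, StripAll) are hypothetical, and it assumes the lines carry no trailing whitespace.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class ResSuffix
{
    // "Res" followed by ".dll" at the end of the line; the lookahead keeps ".dll" in place.
    private static readonly Regex Pattern =
        new Regex(@"Res(?=\.dll$)", RegexOptions.IgnoreCase);

    // Removes the "Res" suffix from a single line, if present.
    public static string Strip(string line)
    {
        return Pattern.Replace(line, "");
    }

    // Applies Strip to every line, e.g. the result of File.ReadAllLines.
    public static string[] StripAll(string[] lines)
    {
        return lines.Select(l => Strip(l)).ToArray();
    }
}
```

The output of StripAll can be passed straight to File.WriteAllLines to overwrite the original file.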

How to match two paths pointing to the same file?

I have two lists containing paths to a directory of music files and I want to determine which of these files are stored on both lists and which are only stored on one. The problem lies in that the format of the paths differ between the two lists.
Format example:
List1: file://localhost//FILE/Musik/30%20Seconds%20To%20Mars.mp3
List2: \\FILE\Musik\30 Seconds To Mars.mp3
How do I go about comparing these two file paths and matching them to the same source?

The answer depends on your notion of "same file". If you merely want to check if the file is equal, but not the very same file, you could simply generate a hash over the file's content and compare that. If the hashes are equal (please use a strong hash, like SHA-256), you can be confident that the files are also. Likewise you could of course also compare the files byte by byte.
If you really want to figure that the two files are actually the same file, i.e. just addressed via different means (like file-URL or UNC path), you have a little more work to do.
First you need to find out the true file system path for each of the addresses. For example, you need to find the file system path behind the UNC path and/or file URL (which typically is the path in the URL itself). In the case of UNC paths that are shares on a remote computer, you might not even be able to do so.
Also, even if you have the local path figured out somehow, you also need to deal with different redirection mechanisms for local paths (on Windows junctions/reparse points/links; on UNIX symbolic or hard links). For example, you could have a share using file system link as source, while the file URL uses the true source path. So to the casual observer they still look like different files.
Having all that said, the "algorithm" would be something like this:
Figure out the source path for the URLs, UNC paths/shares, etc. you have
Figure out the local source path from those paths (considering links/junctions, subst.exe, etc.)
Normalize those paths, if necessary (i.e. a/b/../c is actually a/c)
Compare the resulting paths.
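A minimal sketch of the string-level steps (strip the URL form, normalize, compare) might look like the following. It deliberately does not resolve shares, links, junctions or subst drives, which need file system access. The names PathMatcher, ToComparablePath and SamePath are made up for this example, and it assumes file URLs always start with "file://localhost" as in the question and that paths are compared case-insensitively, as on Windows.

```csharp
using System;
using System.Collections.Generic;

public static class PathMatcher
{
    // Hypothetical helper: brings a file URL or UNC path into one backslash form.
    public static string ToComparablePath(string raw)
    {
        const string prefix = "file://localhost";
        string path = raw;

        // Only URLs are percent-encoded; a plain UNC path may contain a literal '%'.
        if (path.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
        {
            path = Uri.UnescapeDataString(path.Substring(prefix.Length));
        }

        path = path.Replace('/', '\\');
        bool isUnc = path.StartsWith(@"\\", StringComparison.Ordinal);

        // Collapse "." and ".." segments, so a\b\..\c becomes a\c.
        var segments = new List<string>();
        foreach (string part in path.Split(new[] { '\\' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (part == ".") continue;
            if (part == "..")
            {
                if (segments.Count > 0) segments.RemoveAt(segments.Count - 1);
                continue;
            }
            segments.Add(part);
        }

        return (isUnc ? @"\\" : "") + string.Join(@"\", segments);
    }

    // Compares two addresses after normalization (assumption: case-insensitive).
    public static bool SamePath(string a, string b)
    {
        return string.Equals(ToComparablePath(a), ToComparablePath(b),
                             StringComparison.OrdinalIgnoreCase);
    }
}
```

With the two example entries from the question, SamePath should report a match, since both normalize to \\FILE\Musik\30 Seconds To Mars.mp3.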

I think the best way to do it is by temporarily converting one of the paths to the other one's format. I would suggest you change the first to match the second.
string List1 = "file://localhost//FILE/Musik/30%20Seconds%20To%20Mars.mp3";
string List2 = @"\\FILE\Musik\30 Seconds To Mars.mp3";
I would recommend you use the Replace() method, applying each step to the result of the previous one.
Get rid of "file://localhost":
var tempStr = List1.Replace("file://localhost", "");
Change all '%20' into spaces:
tempStr = tempStr.Replace("%20", " ");
Change all '/' into '\':
tempStr = tempStr.Replace("/", "\\");
Voilà! Two strings in matching format!

Use Python: you can easily compare the two files like this
>>> import filecmp
>>> filecmp.cmp('file1.txt', 'file1.txt')
True
>>> filecmp.cmp('file1.txt', 'file2.txt')
False
To open files given in the file:// syntax, use urllib (urllib.request in Python 3):
>>> import urllib.request
>>> file1 = urllib.request.urlopen('file://localhost/tmp/test')
For normal file paths, use the standard open:
>>> file2 = open('/pathtofile', 'r')

I agree completely with Christian, you should re-think structure of the lists, but the below should get you going.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication5
{
    class Program
    {
        public static List<string> SanitiseList(List<string> list)
        {
            List<string> sanitisedList = new List<string>();
            foreach (string filename in list)
            {
                String sanitisedFilename = String.Empty;
                if (!String.IsNullOrEmpty(filename))
                {
                    sanitisedFilename = filename;
                    // get rid of the encoding
                    sanitisedFilename = Uri.UnescapeDataString(sanitisedFilename);
                    // first of all change all back-slashes to forward slashes
                    sanitisedFilename = sanitisedFilename.Replace(@"\", @"/");
                    // if we have two forward slashes at the beginning assume it's localhost
                    if (sanitisedFilename.StartsWith("//", StringComparison.Ordinal))
                    {
                        // remove these first double slashes and stick in localhost
                        sanitisedFilename = sanitisedFilename.TrimStart('/');
                        sanitisedFilename = "//localhost" + "/" + sanitisedFilename;
                    }
                    // remove file
                    sanitisedFilename = sanitisedFilename.Replace(@"file://", "//");
                    // remove double back-slashes
                    sanitisedFilename = sanitisedFilename.Replace("\\", @"\");
                    // remove double forward-slashes (but not the first two)
                    if (sanitisedFilename.Length > 2)
                    {
                        sanitisedFilename = sanitisedFilename.Substring(0, 2) + sanitisedFilename.Substring(2, sanitisedFilename.Length - 2).Replace("//", @"/");
                    }
                }
                if (!String.IsNullOrEmpty(sanitisedFilename))
                {
                    sanitisedList.Add(sanitisedFilename);
                }
            }
            return sanitisedList;
        }
        static void Main(string[] args)
        {
            List<string> listA = new List<string>();
            List<string> listB = new List<string>();
            listA.Add("file://localhost//FILE/Musik/BritneySpears.mp3");
            listA.Add("file://localhost//FILE/Musik/30%20Seconds%20To%20Mars.mp3");
            listB.Add("file://localhost//FILE/Musik/120%20Seconds%20To%20Mars.mp3");
            listB.Add(@"\\FILE\Musik\30 Seconds To Mars.mp3");
            listB.Add(@"\\FILE\Musik\5 Seconds To Mars.mp3");
            listA = SanitiseList(listA);
            listB = SanitiseList(listB);
            // entries present in only one of the two lists
            List<string> missingFromA = listB.Except(listA).ToList();
            List<string> missingFromB = listA.Except(listB).ToList();
        }
    }
}

Extract File extensions using regular expression in C#

I wanna write a regular expression that can extract file types from a string.
the string is like:
Text Files
(.prn;.txt;.rtf;.csv;.wq1)|.prn;.txt;.rtf;.csv;.wq1|PDF
Files (.pdf)|.pdf|Excel Files
(.xls;.xlsx;.xlsm;.xlsb;.xlam;.xltx;.xltm;.xlw)
result e.g.
.prn

What you have is the file dialog filter format.
The extensions already appear twice (first appearance is unreliable) and when you try to handle this with a RegEx directly you'll have to think about
Text.Files (.prn;.txt;.rtf;.csv;.wq1)|.prn;.txt;.rtf;.csv;.wq1|
etc.
It looks safer to follow the known structure:
string filter = "Text Files (.prn;.txt;.rtf;.csv;.wq1)|.prn;.txt;.rtf;.csv;.wq1|PDF Files (.pdf)|.pdf|Excel Files (.xls;.xlsx;.xlsm;.xlsb;.xlam;.xltx;.xltm;.xlw)";
string[] filterParts = filter.Split('|');
// go through the odd sections
for (int i = 1; i < filterParts.Length; i += 2)
{
    // approx, you may want some validation here first
    string filterPart = filterParts[i];
    string[] fileTypes = filterPart.Split(';');
    // add to collection
}
This (only) requires that the filter string has the correct syntax.
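Filled in, the loop above could collect the extensions into a list roughly as follows. The names FilterParser and GetExtensions are hypothetical. The sketch also strips a leading "*" from each pattern, on the assumption that a real dialog filter uses patterns such as "*.txt" even though the sample string shows ".txt".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterParser
{
    // Hypothetical helper: returns the distinct extensions found in the
    // pattern sections (odd indices) of a "description|pattern|..." filter.
    public static List<string> GetExtensions(string filter)
    {
        var result = new List<string>();
        string[] parts = filter.Split('|');

        for (int i = 1; i < parts.Length; i += 2)
        {
            foreach (string pattern in parts[i].Split(';'))
            {
                // Assumption: patterns may be written as "*.ext" or ".ext".
                string ext = pattern.Trim().TrimStart('*');
                if (ext.Length > 0 && !result.Contains(ext, StringComparer.OrdinalIgnoreCase))
                {
                    result.Add(ext);
                }
            }
        }
        return result;
    }
}
```

With the exact string from the question, the Excel extensions are not returned, because that string ends with a description that has no pattern section after it.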

Regex extensionRegex = new Regex(@"\.\w+");
foreach (Match m in extensionRegex.Matches(text))
{
    Console.WriteLine(m.Value);
}

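If that string format you have there is fairly fixed, then the following should work (the | is excluded as well, so that a match stops at a section boundary instead of running into the next description):
\.[^.;)|]+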

finding a number in filename using regex

I don't have much experience with regexes and I wanted to rectify that. I decided to build an application that takes a directory name and scans all files, which all have an increasing serial number but differ subtly in their filenames (for example: episode01.mp4, episode_02.mp4, episod03.mp4, episode04.rmvb etc.).
The application should scan the directory, find the number in each file name and rename the file along with the extension to a common format (episode01.mp4, episode02.mp4, episode03.mp4, episode04.rmvb etc.).
I have the following code:
Dictionary<string, string> renameDictionary = new Dictionary<string, string>();
DirectoryInfo dInfo = new DirectoryInfo(path);
string newFormat = "Episode{0}.{1}";
Regex regex = new Regex(@".*?(?<no>\d+).*?\.(?<ext>.*)"); // look for a number (before the dot) and the extension
foreach (var file in dInfo.GetFiles())
{
    string fileName = file.Name;
    var match = regex.Match(fileName);
    if (match != null)
    {
        GroupCollection gc = match.Groups;
        //Console.WriteLine("Number : {0}, Extension : {2} found in {1}.", gc["no"], fileName, gc["ext"]);
        renameDictionary[fileName] = string.Format(newFormat, gc["no"], gc["ext"]);
    }
}
foreach (var renamePair in renameDictionary)
{
    Console.WriteLine("{0} will be renamed to {1}.", renamePair.Key, renamePair.Value);
    //stuff for renaming here
}
One problem in this code is that it also includes files which don't have numbers in the renameDictionary. It would also be helpful if you could point out any other gotchas that I should be careful about.
PS: I am assuming that the filenames will only contain numbers corresponding to serial (nothing like cam7_0001.jpg)

The simplest solution is probably to use Path.GetFileNameWithoutExtension to get the file name, and then the regex \d+$ to get the number at its end (or Path.GetExtension and \d+ to get the number anywhere).
You can also achieve this in a single replace:
Regex.Replace(fileName, @".*?(\d+).*(\.[^.]+)$", "Episode$1$2")
This regex is a bit better, in that it forces the extension not to contain dots.
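Regarding the files without numbers that end up in the dictionary: Regex.Match never returns null, so checking match != null is always true, and the Success property is what tells you whether the pattern matched. A small sketch combining that check with the pattern above follows. The names EpisodeRenamer and TryGetNewName are made up for the example.

```csharp
using System.Text.RegularExpressions;

public static class EpisodeRenamer
{
    // Group 1: first run of digits; group 2: the extension including its dot.
    private static readonly Regex Pattern = new Regex(@"^.*?(\d+).*(\.[^.]+)$");

    // Hypothetical helper: returns the new name, or null when the file name
    // contains no number and should therefore be skipped.
    public static string TryGetNewName(string fileName)
    {
        Match match = Pattern.Match(fileName);
        if (!match.Success)
        {
            return null;
        }
        return "Episode" + match.Groups[1].Value + match.Groups[2].Value;
    }
}
```

In the loop from the question, a file would then only be added to renameDictionary when TryGetNewName returns a non-null value.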
