How to find folder below given section of a path? - c#

Given a path and a certain section, how can I find the name of the folder immediately below that section?
This is hard to explain, so let me give some examples. Suppose I am looking for the name of the folder below 'Dev/Branches'. Below are example inputs, each followed by the expected result:
C:\Code\Dev\Branches\Latest\bin\abc.dll -> Latest
C:\Dev\Branches\5.1 -> 5.1
D:\My Documents\Branches\7.0\Source\Tests\test.cs -> 7.0
I am using C#
Edit: I suppose I could use the regex /Dev/Branches/(.*?)/ capturing the first group, but is there a neater solution without regex? That regex would fail on the second case, anyway.

// starting path
string path = @"C:\Code\Dev\Branches\Latest\bin\abc.dll";
// search path
string search = @"Dev\Branches";
// find the index of the search criteria
int idx = path.IndexOf(search);
// determine whether to exit or not
if (idx == -1 || idx + search.Length >= path.Length) return;
// get the substring AFTER the search criteria, split it and take the first item
string found = path.Substring(idx + search.Length).Split("\\".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).First();
Console.WriteLine(found);
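One caveat with IndexOf is that it matches anywhere in the string, so a path such as C:\MyDev\Branches\x would also be treated as containing Dev\Branches. Below is a minimal sketch of a segment-by-segment comparison that avoids this. The BranchFinder class and FolderBelow method are hypothetical names, and the case-insensitive comparison is an assumption about what is wanted on Windows.

```csharp
using System;

public static class BranchFinder
{
    // Returns the folder directly below `section`, or null if there is none.
    // Whole segments are compared, so "MyDev\Branches" does not match "Dev\Branches".
    public static string FolderBelow(string path, string section)
    {
        char[] separators = { '\\', '/' };
        string[] parts = path.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        string[] wanted = section.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        if (wanted.Length == 0) return null;

        // Slide the section over the path; stop while at least one segment remains after it.
        for (int start = 0; start + wanted.Length < parts.Length; start++)
        {
            bool match = true;
            for (int i = 0; i < wanted.Length && match; i++)
            {
                // Assumption: Windows-style, case-insensitive segment comparison.
                match = string.Equals(parts[start + i], wanted[i], StringComparison.OrdinalIgnoreCase);
            }
            if (match) return parts[start + wanted.Length];
        }
        return null;
    }
}
```

Called as BranchFinder.FolderBelow(path, search), this should give Latest for the starting path above and 5.1 for C:\Dev\Branches\5.1.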

Here's the code that will do exactly what you expect:
public static string GetSubdirectoryFromPath(string path, string parentDirectory, bool ignoreCase = true)
{
// 1. Standardize the path separators.
string safePath = path.Replace("/", @"\");
string safeParentDirectory = parentDirectory.Replace("/", @"\").TrimEnd('\\');
// 2. Prepare parentDirectory to use in Regex.
string directory = Regex.Escape(safeParentDirectory);
// 3. Find the immediate subdirectory to parentDirectory.
Regex match = new Regex(@"(?:|.+)" + directory + @"\\([^\\]+)(?:|.+)", ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None);
// 4. Return the match. If not found, it returns null.
string subDirectory = match.Match(safePath).Groups[1].ToString();
return subDirectory == "" ? null : subDirectory;
}
A test code:
void Test()
{
string path1 = @"C:\Code\Dev\Branches\Latest\bin\abc.dll";
string path2 = @"C:\Dev\Branches\5.1";
string path3 = @"D:\My Documents\Branches\7.0\Source\test.cs";
Console.WriteLine("Matches:");
Console.WriteLine(GetSubdirectoryFromPath(path1, "dev/branches/") ?? "Not found");
Console.WriteLine(GetSubdirectoryFromPath(path1, @"Dev\Branches") ?? "Not found");
Console.WriteLine(GetSubdirectoryFromPath(path3, "D:/My Documents/Branches") ?? "Not found");
// Incorrect parent directory.
Console.WriteLine(GetSubdirectoryFromPath(path2, "My Documents") ?? "Not found");
// Case sensitive checks.
Console.WriteLine(GetSubdirectoryFromPath(path3, @"My Documents\Branches", false) ?? "Not found");
Console.WriteLine(GetSubdirectoryFromPath(path3, @"my Documents\Branches", false) ?? "Not found");
// Output:
//
// Matches:
// Latest
// Latest
// 7.0
// Not found
// 7.0
// Not found
}

Break it down into smaller steps and you can solve this yourself:
(Optional, depending on further requirements): Get the directory name (file is irrelevant): Path.GetDirectoryName(string)
Get its parent directory Directory.GetParent(string).
This comes down to:
var directory = Path.GetDirectoryName(input);
var parentDirectory = Directory.GetParent(directory);
The supplied C:\Dev\Branches\5.1 -> 5.1 does not conform to your specification; that is the directory name of the input path itself. This will output Branches.

new Regex(@"\\?" + PathToMatchEscaped + @"\\(\w+)\\?").Match()...

I went with this
public static string GetBranchName(string path, string prefix)
{
string folder = Path.GetDirectoryName(path);
// Walk up the path until it ends with Dev\Branches
while (!String.IsNullOrEmpty(folder) && folder.Contains(prefix))
{
string parent = Path.GetDirectoryName(folder);
if (parent != null && parent.EndsWith(prefix))
return Path.GetFileName(folder);
folder = parent;
}
return null;
}

Related

Sanitizing a file path in C# without compromising the drive letter

I need to process some file paths in C# that potentially contain illegal characters, for example:
C:\path\something\output_at_13:26:43.txt
in that path, the :s in the timestamp make the filename invalid, and I want to replace them with another safe character.
I've searched for solutions here on SO, but they seem to be all based around something like:
path = string.Join("_", path.Split(Path.GetInvalidFileNameChars()));
or similar solutions. These solutions however are not good, because they screw up the drive letter, and I obtain an output of:
C_\path\something\output_at_13_26_43.txt
I tried using Path.GetInvalidPathChars() but it still doesn't work, because it doesn't include the : in the illegal characters, so it doesn't replace the ones in the filename.
So, after figuring that out, I tried doing this:
string dir = Path.GetDirectoryName(path);
string file = Path.GetFileName(path);
file = string.Join(replacement, file.Split(Path.GetInvalidFileNameChars()));
dir = string.Join(replacement, dir.Split(Path.GetInvalidPathChars()));
path = Path.Combine(dir, file);
but this is not good either, because the :s in the filename seem to interfere with the Path.GetFileName() logic, and it only returns the last piece after the last :, so I'm losing pieces of the path.
How do I do this "properly" without hacky solutions?
You can write a simple sanitizer that iterates each character and knows when to expect the colon as a drive separator. This one will catch any combination of letter A-Z followed directly by a ":". It will also detect path separators and not escape them. It will not detect whitespace at the beginning of the input string, so in case your input data might come with them, you will have to trim it first or modify the sanitizer accordingly:
enum ParserState {
PossibleDriveLetter,
PossibleDriveLetterSeparator,
Path
}
static string SanitizeFileName(string input) {
StringBuilder output = new StringBuilder(input.Length);
ParserState state = ParserState.PossibleDriveLetter;
foreach(char current in input) {
if (((current >= 'a') && (current <= 'z')) || ((current >= 'A') && (current <= 'Z'))) {
output.Append(current);
if (state == ParserState.PossibleDriveLetter) {
state = ParserState.PossibleDriveLetterSeparator;
}
else {
state = ParserState.Path;
}
}
else if ((current == Path.DirectorySeparatorChar) ||
(current == Path.AltDirectorySeparatorChar) ||
((current == ':') && (state == ParserState.PossibleDriveLetterSeparator)) ||
!Path.GetInvalidFileNameChars().Contains(current)) {
output.Append(current);
state = ParserState.Path;
}
else {
output.Append('_');
state = ParserState.Path;
}
}
return output.ToString();
}
You definitely should make sure that you only receive valid filenames.
If you can't, and you're certain your directory names will be valid, you could split the path at the last backslash (assuming Windows) and reassemble the string:
public static string SanitizePath(string path)
{
// Split just after the last backslash so the separator stays with the directory part;
// otherwise it would be replaced too, since '\' is itself an invalid file name character.
var lastBackslash = path.LastIndexOf('\\');
var dir = path.Substring(0, lastBackslash + 1);
var file = path.Substring(lastBackslash + 1);
foreach (var invalid in Path.GetInvalidFileNameChars())
{
file = file.Replace(invalid, '_');
}
return dir + file;
}
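If the directory names cannot be trusted either, the same idea can be applied to every segment while leaving a leading drive specifier alone. The following is only a sketch under stated assumptions: the PathSanitizer class, its Sanitize method, and the hard-coded list of invalid characters are illustrative choices (the list is spelled out so the result does not depend on what Path.GetInvalidFileNameChars() returns on the current platform), and only backslash is treated as a separator.

```csharp
using System;
using System.Linq;

public static class PathSanitizer
{
    // Assumption: characters Windows rejects in names, besides control characters and separators.
    private static readonly char[] InvalidNameChars = { '<', '>', ':', '"', '|', '?', '*' };

    public static string Sanitize(string path, char replacement = '_')
    {
        // Keep a leading drive specifier (a letter followed by ':') untouched.
        string prefix = "";
        string rest = path;
        if (path.Length >= 2 && char.IsLetter(path[0]) && path[1] == ':')
        {
            prefix = path.Substring(0, 2);
            rest = path.Substring(2);
        }

        // Clean each segment separately so the backslashes themselves survive.
        string[] cleaned = rest.Split('\\')
            .Select(segment => new string(segment
                .Select(c => InvalidNameChars.Contains(c) || char.IsControl(c) ? replacement : c)
                .ToArray()))
            .ToArray();

        return prefix + string.Join("\\", cleaned);
    }
}
```

With the path from the question, PathSanitizer.Sanitize(@"C:\path\something\output_at_13:26:43.txt") should keep C: and replace only the colons in the timestamp.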

Match string (most characters) using C#

I have a folder structure (the - represent a folder sometimes, folders within folders where they are indented)
I'm given a string value of "D130202" to match the correct folder, I'm using C#'s System.IO.Directory.GetDirectories(@"c:\", "", SearchOption.TopDirectoryOnly);
I don't know what to put into the empty string for the search pattern.
Before, I was searching through all the folders with SearchOption.AllDirectories until I matched "D130202" but it was taking a long time going through every folder within all the other folders because there are thousands of folders.
I would like to search from D as soon as that value is matched, the program goes into the other folder, finds D13, matches that value, goes into the D1302 folder and so on without unnecessarily searching through all the other folders.
But I cannot think how I would do this.
Any help would be much appreciated.
You have to search the TopDirectoryOnly recursively:
public string SearchNestedDirectory(string path, string name)
{
if (string.IsNullOrEmpty(name))
throw new ArgumentException("name");
return SearchNestedDirectoryImpl(path, name);
}
private string SearchNestedDirectoryImpl(string path, string name, int depth = 1)
{
if (depth > name.Length)
return null;
var result = Directory.GetDirectories(path, name.Substring(0, depth)).FirstOrDefault();
if (result == null)
return SearchNestedDirectoryImpl(path, name, depth + 1);
if (result != null && Regex.Replace(result, @".+\\", "") == name)
return result;
return SearchNestedDirectoryImpl(result, name, depth + 1);
}
Usage:
SearchNestedDirectory(@"c:\", "D130202");
Returns: the path, or null if the path cannot be found.
EDIT: fixed an issue that occurs when subfolder length is increased by more than 1
I would utilize Directory.Exists(path)
Build the path from D130202 as (with C:\ as root): C:\D\D13\D1302\D130202

Is it a drive path or another? Check with Regex

I would like to check whether it is a drive path or a "pol" path.
For this I have already written some code; unfortunately, it always returns true.
The regex \W?\w{1}:{1}[/]{1} may be incorrect. How do I do it right? The path names can always be different and do not have to match the pol path.
Thank you in advance.
public bool isPolPath(string path)
{
bool isPolPath= true;
// Pol-Path: /Buy/Toy/Special/Clue
// drive-Path: Q:\Buy/Special/Clue
Regex myRegex = new Regex(@"\W?\w{1}:{1}[/]{1}", RegexOptions.IgnoreCase);
Match matchSuccess = myRegex.Match(path);
if (matchSuccess.Success)
isPolPath= false;
return isPolPath;
}
You don't need regexes to achieve this. Use System.IO.Path.GetPathRoot. It returns X:\ (where X is the actual drive letter) if the given path contains drive letter and an empty string or slash otherwise.
new List<string> {
@"/Buy/Toy/Special/Clue",
@"q:\Buy/Special/Clue",
@"Buy",
@"/",
@"\",
@"q:",
@"q:/",
@"q:\",
//@"", // This throws an exception saying path is illegal
}.ForEach(
p => Console.WriteLine(Path.GetPathRoot(p))
);
/* This code outputs:
\
q:\
\
\
q:
q:\
q:\
*/
Therefore your check may look like this:
isPolPath = Path.GetPathRoot(path).Length < 2;
If you wish to make your code more foolproof and protect from exception when an empty string is passed, you need to decide if an empty (or null) string is a pol-path or drive path. Depending on the decision the check would be either
isPolPath = string.IsNullOrEmpty(path) || Path.GetPathRoot(path).Length < 2;
or
if (string.IsNullOrEmpty(path))
isPolPath = false;
else
isPolPath = Path.GetPathRoot(path).Length < 2;

Remove part of the full directory name?

I have a list of filenames with full paths, from which I need to remove the filename and part of the file path based on a filter list I have.
Path.GetDirectoryName(file)
Does part of the job but I was wondering if there is a simple way to filter the paths using .Net 2.0 to remove part of it.
For example:
If I have the path + filename equal to C:\my documents\my folder\my other folder\filename.exe and all I need is what comes after my folder\, then I need to extract only my other folder from it.
UPDATE:
The filter list is a text box with folder names separated by commas, so I only have partial names in it; in the above example, the filter would be my folder.
Current Solution based on Rob's code:
string relativeFolder = null;
string file = @"C:\foo\bar\magic\bar.txt";
string folder = Path.GetDirectoryName(file);
string[] paths = folder.Split(Path.DirectorySeparatorChar);
string[] filterArray = iFilter.Text.Split(',');
foreach (string filter in filterArray)
{
int startAfter = Array.IndexOf(paths, filter) + 1;
if (startAfter > 0)
{
relativeFolder = string.Join(Path.DirectorySeparatorChar.ToString(), paths, startAfter, paths.Length - startAfter);
break;
}
}
How about something like this:
private static string GetRightPartOfPath(string path, string startAfterPart)
{
// use the correct separator for the environment
var pathParts = path.Split(Path.DirectorySeparatorChar);
// this assumes a case sensitive check. If you don't want this, you may want to loop through the pathParts looking
// for your "startAfterPath" with a StringComparison.OrdinalIgnoreCase check instead
int startAfter = Array.IndexOf(pathParts, startAfterPart);
if (startAfter == -1)
{
// path not found
return null;
}
// try and work out if last part was a directory - if not, drop the last part as we don't want the filename
var lastPartWasDirectory = pathParts[pathParts.Length - 1].EndsWith(Path.DirectorySeparatorChar.ToString());
return string.Join(
Path.DirectorySeparatorChar.ToString(),
pathParts, startAfter,
pathParts.Length - startAfter - (lastPartWasDirectory?0:1));
}
This method attempts to work out if the last part is a filename and drops it if it is.
Calling it with
GetRightPartOfPath(@"C:\my documents\my folder\my other folder\filename.exe", "my folder");
returns
my folder\my other folder
Calling it with
GetRightPartOfPath(@"C:\my documents\my folder\my other folder\", "my folder");
returns the same.
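The comment in the method mentions a case-insensitive lookup as an alternative to Array.IndexOf. One possible sketch uses Array.FindIndex with StringComparison.OrdinalIgnoreCase; the PathParts class and IndexOfPart method are hypothetical names introduced only for illustration.

```csharp
using System;

public static class PathParts
{
    // Index of the first path segment equal to `part`, ignoring case; -1 if absent.
    public static int IndexOfPart(string[] pathParts, string part)
    {
        return Array.FindIndex(
            pathParts,
            p => string.Equals(p, part, StringComparison.OrdinalIgnoreCase));
    }
}
```

The returned index can be used in place of startAfter, with the same -1 check for a part that is not found.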
You could use this method to split the path by the "\" sign (or "/" in Unix environments). After this you get an array of strings back and you can pick what you need.
public static String[] SplitPath(string path)
{
String[] pathSeparators = new String[]
{
Path.DirectorySeparatorChar.ToString()
};
return path.Split(pathSeparators, StringSplitOptions.RemoveEmptyEntries);
}

Why does Path.Combine not properly concatenate filenames that start with Path.DirectorySeparatorChar?

From the Immediate Window in Visual Studio:
> Path.Combine(@"C:\x", "y")
"C:\\x\\y"
> Path.Combine(@"C:\x", @"\y")
"\\y"
It seems that they should both be the same.
The old FileSystemObject.BuildPath() didn't work this way...
This is kind of a philosophical question (which perhaps only Microsoft can truly answer), since it's doing exactly what the documentation says.
System.IO.Path.Combine
"If path2 contains an absolute path, this method returns path2."
Here's the actual Combine method from the .NET source. You can see that it calls CombineNoChecks, which then calls IsPathRooted on path2 and returns that path if so:
public static String Combine(String path1, String path2) {
if (path1==null || path2==null)
throw new ArgumentNullException((path1==null) ? "path1" : "path2");
Contract.EndContractBlock();
CheckInvalidPathChars(path1);
CheckInvalidPathChars(path2);
return CombineNoChecks(path1, path2);
}
internal static string CombineNoChecks(string path1, string path2)
{
if (path2.Length == 0)
return path1;
if (path1.Length == 0)
return path2;
if (IsPathRooted(path2))
return path2;
char ch = path1[path1.Length - 1];
if (ch != DirectorySeparatorChar && ch != AltDirectorySeparatorChar &&
ch != VolumeSeparatorChar)
return path1 + DirectorySeparatorCharAsString + path2;
return path1 + path2;
}
I don't know what the rationale is. I guess the solution is to strip off (or Trim) DirectorySeparatorChar from the beginning of the second path; maybe write your own Combine method that does that and then calls Path.Combine().
I wanted to solve this problem:
string sample1 = "configuration/config.xml";
string sample2 = "/configuration/config.xml";
string sample3 = "\\configuration/config.xml";
string dir1 = "c:\\temp";
string dir2 = "c:\\temp\\";
string dir3 = "c:\\temp/";
string path1 = PathCombine(dir1, sample1);
string path2 = PathCombine(dir1, sample2);
string path3 = PathCombine(dir1, sample3);
string path4 = PathCombine(dir2, sample1);
string path5 = PathCombine(dir2, sample2);
string path6 = PathCombine(dir2, sample3);
string path7 = PathCombine(dir3, sample1);
string path8 = PathCombine(dir3, sample2);
string path9 = PathCombine(dir3, sample3);
Of course, all paths 1-9 should contain an equivalent string in the end. Here is the PathCombine method I came up with:
private string PathCombine(string path1, string path2)
{
if (Path.IsPathRooted(path2))
{
// Trim both separator kinds in one pass so mixed leading separators are removed too.
path2 = path2.TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
}
return Path.Combine(path1, path2);
}
I also think that it is quite annoying that this string handling has to be done manually, and I'd be interested in the reason behind this.
This is the disassembled code from .NET Reflector for Path.Combine method. Check IsPathRooted function. If the second path is rooted (starts with a DirectorySeparatorChar), return second path as it is.
public static string Combine(string path1, string path2)
{
if ((path1 == null) || (path2 == null))
{
throw new ArgumentNullException((path1 == null) ? "path1" : "path2");
}
CheckInvalidPathChars(path1);
CheckInvalidPathChars(path2);
if (path2.Length == 0)
{
return path1;
}
if (path1.Length == 0)
{
return path2;
}
if (IsPathRooted(path2))
{
return path2;
}
char ch = path1[path1.Length - 1];
if (((ch != DirectorySeparatorChar) &&
(ch != AltDirectorySeparatorChar)) &&
(ch != VolumeSeparatorChar))
{
return (path1 + DirectorySeparatorChar + path2);
}
return (path1 + path2);
}
public static bool IsPathRooted(string path)
{
if (path != null)
{
CheckInvalidPathChars(path);
int length = path.Length;
if (
(
(length >= 1) &&
(
(path[0] == DirectorySeparatorChar) ||
(path[0] == AltDirectorySeparatorChar)
)
)
||
((length >= 2) &&
(path[1] == VolumeSeparatorChar))
)
{
return true;
}
}
return false;
}
In my opinion this is a bug. The problem is that there are two different types of "absolute" paths. The path "d:\mydir\myfile.txt" is absolute, the path "\mydir\myfile.txt" is also considered to be "absolute" even though it is missing the drive letter. The correct behavior, in my opinion, would be to prepend the drive letter from the first path when the second path starts with the directory separator (and is not a UNC path). I would recommend writing your own helper wrapper function which has the behavior you desire if you need it.
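A rough sketch of such a wrapper is shown below. It is an illustration of the behavior described above, not how Path.Combine works: the DriveAwareCombine class is a hypothetical name, only Windows-style paths are considered, and the drive and UNC checks are done by hand so the result does not depend on the platform the code runs on.

```csharp
using System;

public static class DriveAwareCombine
{
    private static bool IsSeparator(char c) => c == '\\' || c == '/';

    // Hypothetical helper: joins two Windows-style paths, treating a path2 that
    // starts with a single separator as relative to the drive of path1.
    public static string Combine(string path1, string path2)
    {
        bool startsWithSeparator = path2.Length >= 1 && IsSeparator(path2[0]);
        bool isUnc = path2.Length >= 2 && IsSeparator(path2[0]) && IsSeparator(path2[1]);
        bool path2HasDrive = path2.Length >= 2 && path2[1] == ':';
        bool path1HasDrive = path1.Length >= 2 && path1[1] == ':';

        // UNC path or path with its own drive letter: return it unchanged.
        if (isUnc || path2HasDrive) return path2;

        // Drive-rooted second path: borrow the drive letter from path1 if it has one.
        if (startsWithSeparator)
            return path1HasDrive ? path1.Substring(0, 2) + path2 : path2;

        // Plain relative path: join with exactly one separator.
        if (path1.Length == 0) return path2;
        if (path2.Length == 0) return path1;
        return IsSeparator(path1[path1.Length - 1]) ? path1 + path2 : path1 + "\\" + path2;
    }
}
```

Under these assumptions, DriveAwareCombine.Combine(@"C:\x", @"\y") gives C:\y, while a plain relative second path is appended as usual.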
Following Christian Graus' advice in his "Things I Hate about Microsoft" blog titled "Path.Combine is essentially useless.", here is my solution:
public static class Pathy
{
public static string Combine(string path1, string path2)
{
if (path1 == null) return path2;
else if (path2 == null) return path1;
else return path1.Trim().TrimEnd(System.IO.Path.DirectorySeparatorChar)
+ System.IO.Path.DirectorySeparatorChar
+ path2.Trim().TrimStart(System.IO.Path.DirectorySeparatorChar);
}
public static string Combine(string path1, string path2, string path3)
{
return Combine(Combine(path1, path2), path3);
}
}
Some advise that the namespaces should collide, ... I went with Pathy, as a slight, and to avoid namespace collision with System.IO.Path.
Edit: Added null parameter checks
From MSDN:
If one of the specified paths is a zero-length string, this method returns the other path. If path2 contains an absolute path, this method returns path2.
In your example, path2 is absolute.
This code should do the trick:
string strFinalPath = string.Empty;
string normalizedFirstPath = Path1.TrimEnd(new char[] { '\\' });
string normalizedSecondPath = Path2.TrimStart(new char[] { '\\' });
strFinalPath = Path.Combine(normalizedFirstPath, normalizedSecondPath);
return strFinalPath;
Reason:
Your second path is considered an absolute path, and the Combine method will only return the last path if the last path is an absolute path.
Solution:
Just remove the leading slash / from your second Path (/SecondPath to SecondPath), and it would work as expected.
Not knowing the actual details, my guess is that it makes an attempt to join like you might join relative URIs. For example:
urljoin('/some/abs/path', '../other') = '/some/other'
This means that when you join a path with a preceding slash, you are actually joining one base to another, in which case the second gets precedence.
This actually makes sense, in some way, considering how (relative) paths are treated usually:
string GetFullPath(string path)
{
string baseDir = @"C:\Users\Foo.Bar";
return Path.Combine(baseDir, path);
}
// Get full path for RELATIVE file path
GetFullPath("file.txt"); // = C:\Users\Foo.Bar\file.txt
// Get full path for ROOTED file path
GetFullPath(@"C:\Temp\file.txt"); // = C:\Temp\file.txt
The real question is: Why are paths, which start with "\", considered "rooted"? This was new to me too, but it works that way on Windows:
new FileInfo(@"\windows"); // FullName = C:\Windows, Exists = True
new FileInfo("windows"); // FullName = C:\Users\Foo.Bar\Windows, Exists = False
I used aggregate function to force paths combine as below:
public class MyPath
{
public static string ForceCombine(params string[] paths)
{
return paths.Aggregate((x, y) => Path.Combine(x, y.TrimStart('\\')));
}
}
If you want to combine both paths without losing any path you can use this:
?Path.Combine(@"C:\test", @"\test".Substring(0, 1) == @"\" ? @"\test".Substring(1, @"\test".Length - 1) : @"\test");
Or with variables:
string Path1 = @"C:\Test";
string Path2 = @"\test";
string FullPath = Path.Combine(Path1, Path2.StartsWith(@"\") ? Path2.Substring(1, Path2.Length - 1) : Path2);
Both cases return "C:\test\test".
First, I evaluate if Path2 starts with \ and if it is true, return Path2 without the first character. Otherwise, return the full Path2.
Remove the starting slash ('\') in the second parameter (path2) of Path.Combine.
These two methods should save you from accidentally joining two strings that both have the delimiter in them.
public static string Combine(string x, string y, char delimiter) {
return $"{ x.TrimEnd(delimiter) }{ delimiter }{ y.TrimStart(delimiter) }";
}
public static string Combine(string[] xs, char delimiter) {
if (xs.Length < 1) return string.Empty;
if (xs.Length == 1) return xs[0];
var x = Combine(xs[0], xs[1], delimiter);
if (xs.Length == 2) return x;
var ys = new List<string>();
ys.Add(x);
ys.AddRange(xs.Skip(2).ToList());
return Combine(ys.ToArray(), delimiter);
}
This \ means "the root directory of the current drive". In your example it means the "test" folder in the current drive's root directory. So, this can be equal to "c:\test".
As mentioned by Ryan, it's doing exactly what the documentation says.
From DOS times, current disk, and current path are distinguished.
\ is the root path, but for the CURRENT DISK.
For every "disk" there is a separate "current path".
If you change the disk using cd D: you do not change the current path to D:\, but to: "D:\whatever\was\the\last\path\accessed\on\this\disk"...
So, in Windows, a literal @"\x" means: "CURRENTDISK:\x".
Hence Path.Combine(@"C:\x", @"\y") has as second parameter a root path, not a relative, though not in a known disk...
And since it is not known which might be the «current disk», Path.Combine returns "\\y".
>cd C:
>cd \mydironC\apath
>cd D:
>cd \mydironD\bpath
>cd C:
>cd
>C:\mydironC\apath
