Does my code prevent directory traversal or is it overkill? - c#

I want to make sure this is enough to prevent directory traversal and also any suggestions or tips would be appreciated. The directory "/wwwroot/Posts/" is the only directory which is allowed.
[HttpGet("/[controller]/[action]/{name}")]
public IActionResult Post(string name)
{
if(string.IsNullOrEmpty(name))
{
return View("Post", new BlogPostViewModel(true)); //error page
}
char[] InvalidFilenameChars = Path.GetInvalidFileNameChars();
if (name.IndexOfAny(InvalidFilenameChars) >= 0)
{
return View("Post", new BlogPostViewModel(true));
}
DirectoryInfo dir = new DirectoryInfo(Path.Combine(Directory.GetCurrentDirectory(), "wwwroot/Posts"));
var userpath = Path.GetFullPath(Path.Combine(Directory.GetCurrentDirectory(), "wwwroot/Posts", name));
if (Path.GetDirectoryName(userpath) != dir.FullName)
{
return View("Post", new BlogPostViewModel(true));
}
var temp = Path.Combine(dir.FullName, name + ".html");
if (!System.IO.File.Exists(temp))
{
return View("Post", new BlogPostViewModel(true));
}
BlogPostViewModel model = new BlogPostViewModel(Directory.GetCurrentDirectory(), name);
return View("Post", model);
}

Probably, but I wouldn't consider it bulletproof. Let's break this down:
First you are black-listing known invalid characters:
char[] InvalidFilenameChars = Path.GetInvalidFileNameChars();
if (name.IndexOfAny(InvalidFilenameChars) >= 0)
{
return View("Post", new BlogPostViewModel(true));
}
This is a good first step, but blacklisting input is rarely enough. It will prevent certain control characters, but the documentation does not explicitly state that directory separators (e.g. / and \) are included. The documentation states:
The array returned from this method is not guaranteed to contain the
complete set of characters that are invalid in file and directory
names. The full set of invalid characters can vary by file system.
Next, you attempt to make sure that after Path.Combine you have the expected parent folder for your file:
DirectoryInfo dir = new DirectoryInfo(Path.Combine(Directory.GetCurrentDirectory(), "wwwroot/Posts"));
var userpath = Path.GetFullPath(Path.Combine(Directory.GetCurrentDirectory(), "wwwroot/Posts", name));
if (Path.GetDirectoryName(userpath) != dir.FullName)
{
return View("Post", new BlogPostViewModel(true));
}
In theory, if the attacker passed in ../foo (and perhaps that gets past the blacklisting attempt above if / isn't in the list of invalid characters), then Path.GetFullPath(Path.Combine(...)) should resolve the path to /somerootpath/wwwroot/foo. Path.GetDirectoryName would then return /somerootpath/wwwroot, which does not match, and the request is rejected. However, suppose the combined path came back unresolved as /somerootpath/wwwroot/Posts/../foo. In that case the parent-folder check could see /somerootpath/wwwroot/Posts, which is a match, and processing would proceed. This seems unlikely, but the documentation says GetInvalidFileNameChars() is not exhaustive, so there may be control characters that get past it and trick the path handling into something along these lines.
Your approach will probably work. However, if it is at all possible, I would strongly recommend you whitelist the expected input rather than attempt to blacklist all possible invalid inputs. For example, if you can be certain that all valid filenames will be made up of letters, numbers, and underscores, build a regular expression that asserts that and check it before continuing. Testing for ^[A-Za-z0-9_]+$ would assert that, and since such names cannot contain dots or directory separators, traversal sequences are ruled out.
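As a minimal sketch of the whitelist check described above: the class and method names are hypothetical, and it assumes post names consist only of ASCII letters, digits, and underscores. It uses \z instead of $ as the end anchor, which is a slightly stricter variant of the pattern in the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PostNameValidator
{
    // Whitelist: one or more ASCII letters, digits, or underscores.
    // \z anchors at the very end of the input (unlike $, which also
    // matches just before a trailing newline).
    private static readonly Regex AllowedName =
        new Regex(@"^[A-Za-z0-9_]+\z", RegexOptions.CultureInvariant);

    // Returns true only for non-empty names made of whitelisted characters.
    public static bool IsValidPostName(string name)
    {
        return !string.IsNullOrEmpty(name) && AllowedName.IsMatch(name);
    }
}
```

In the action from the question, a call such as PostNameValidator.IsValidPostName(name) could stand in for the GetInvalidFileNameChars check. Inputs like ../foo or foo.html are rejected because '.' and '/' are not in the allowed set.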

Related

How to get all files ending with the extension "_<fileNum>of<totalFileNum>" and sometimes without? [duplicate]

A user specifies a file name that can be either in the form "<name>_<fileNum>of<fileNumTotal>" or simply "<name>". I need to somehow extract the "<name>" part from the full file name.
Basically, I am looking for a solution to the method "ExtractName()" in the following example:
string fileName = "example_File"; // This var is specified by user
string extractedName = ExtractName(fileName); // Must return "example_File"
fileName = "example_File2_1of5";
extractedName = ExtractName(fileName); // Must return "example_File2"
fileName = "examp_File_3of15";
extractedName = ExtractName(fileName); // Must return "examp_File"
fileName = "example_12of15";
extractedName = ExtractName(fileName); // Must return "example"
Edit: Here's what I've tried so far:
string ExtractName(string fullName)
{
return fullName.Substring(0, fullName.LastIndexOf('_'));
}
But this clearly does not work for the case where the full name is just "<name>".
Thanks
This would be easier to parse using Regex, because you don't know how many digits either number will have.
var inputs = new[]
{
"example_File",
"example_File2_1of5",
"examp_File_3of15",
"example_12of15"
};
var pattern = new Regex(@"^(.+)(_\d+of\d+)$");
foreach (var input in inputs)
{
var match = pattern.Match(input);
if (!match.Success)
{
// file doesn't end with "#of#", so use the whole input
Console.WriteLine(input);
}
else
{
// it does end with "#of#", so use the first capture group
Console.WriteLine(match.Groups[1].Value);
}
}
This code returns:
example_File
example_File2
examp_File
example
The Regex pattern has three parts:
^ and $ are anchors to ensure you capture the entire string, not just a subset of characters.
(.+) - match everything, be as greedy as possible.
(_\d+of\d+) - match "_#of#", where "#" can be any number of consecutive digits.
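The loop above could be wrapped into the ExtractName() method the question asks for. This is a sketch under the same assumptions as the answer (the suffix, when present, is always "_<digits>of<digits>" at the very end); the class name FileNameParser is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameParser
{
    // Same pattern as in the answer: group 1 is the name,
    // group 2 is the optional "_#of#" suffix at the end.
    private static readonly Regex SuffixPattern =
        new Regex(@"^(.+)(_\d+of\d+)$");

    // Returns the name without the "_#of#" suffix, or the whole
    // input when no such suffix is present.
    public static string ExtractName(string fileName)
    {
        Match match = SuffixPattern.Match(fileName);
        return match.Success ? match.Groups[1].Value : fileName;
    }
}
```

Because (.+) is greedy but must leave room for the suffix group, a name that itself contains underscores or digits (such as "example_File2_1of5") still loses only the final "_1of5".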

Turn A Full Path Into A Path With Environment Variables

I want to turn a full path into an environment variable path using c#
Is this even possible?
i.e.
C:\Users\Username\Documents\Text.txt -> %USERPROFILE%\Documents\Text.txt
C:\Windows\System32\cmd.exe -> %WINDIR%\System32\cmd.exe
C:\Program Files\Program\Program.exe -> %PROGRAMFILES%\Program\Program.exe
It is possible by going over all environment variables and checking which variable's value is contained in the string, then replacing that part of the string with the corresponding variable name surrounded by %.
First naive attempt:
string Tokenify(string path)
{
foreach (DictionaryEntry e in Environment.GetEnvironmentVariables())
{
int index = path.IndexOf(e.Value.ToString());
if (index > -1)
{
//we need to make sure we're not already inside a tokenized part.
int numDelimiters = path.Take(index).Count(c => c == '%');
if (numDelimiters % 2 == 0)
{
path = path.Replace(e.Value.ToString(), $"%{e.Key.ToString()}%");
}
}
}
return path;
}
The code currently makes a faulty assumption that the environment variable's value appears only once in the path. This needs to be corrected, but let's put that aside for now.
Also note that not all environment variables represent directories. For example, if I run this method on the string "6", I get "%PROCESSOR_LEVEL%". This can be remedied by checking for Directory.Exists() on the environment variable value before using it. This will probably also invalidate the need for checking whether we are already in a tokenized part of the string.
You may also want to sort the environment variables by length so as to always use the most specific one. Otherwise you can end up with:
%HOMEDRIVE%%HOMEPATH%\AppData\Local\Folder
instead of:
%LOCALAPPDATA%\Folder
Updated code that prefers the longest variable:
string Tokenify(string path)
{
//first find all the environment variables that represent paths.
var validEnvVars = new List<KeyValuePair<string, string>>();
foreach (DictionaryEntry e in Environment.GetEnvironmentVariables())
{
string envPath = e.Value.ToString();
if (System.IO.Directory.Exists(envPath))
{
//this would be the place to add any other filters.
validEnvVars.Add(new KeyValuePair<string, string>(e.Key.ToString(), envPath));
}
}
//sort them by length so we always get the most specific one.
//if you are dealing with a large number of strings then orderedVars can be generated just once and cached.
var orderedVars = validEnvVars.OrderByDescending(kv => kv.Value.Length);
foreach (var kv in orderedVars)
{
//using regex just for case insensitivity. Otherwise just use string.Replace.
path = Regex.Replace(path, Regex.Escape(kv.Value), $"%{kv.Key}%", RegexOptions.IgnoreCase);
}
return path;
}
You may still want to add checks to avoid double-tokenizing parts of the string, but that is much less likely to be an issue in this version.
Also you might want to filter out some variables like drive roots, e.g. (%HOMEDRIVE%) or by any other criteria.

Is it a drive path or another? Check with Regex

I would like to check whether it is a drive path or a "pol" path.
For this I have already written a small piece of code; unfortunately, it always returns true.
The regex \W?\w{1}:{1}[/]{1} may be incorrect. How do I do it right? The path names can vary and do not have to match the pol path.
Thank you in advance.
public bool isPolPath(string path)
{
bool isPolPath= true;
// Pol-Path: /Buy/Toy/Special/Clue
// drive-Path: Q:\Buy/Special/Clue
Regex myRegex = new Regex(@"\W?\w{1}:{1}[/]{1}", RegexOptions.IgnoreCase);
Match matchSuccess = myRegex.Match(path);
if (matchSuccess.Success)
isPolPath= false;
return isPolPath;
}
You don't need regexes to achieve this. Use System.IO.Path.GetPathRoot. It returns X:\ (where X is the actual drive letter) if the given path contains drive letter and an empty string or slash otherwise.
new List<string> {
@"/Buy/Toy/Special/Clue",
@"q:\Buy/Special/Clue",
@"Buy",
@"/",
@"\",
@"q:",
@"q:/",
@"q:\",
//@"", // This throws an exception saying path is illegal
}.ForEach(
p => Console.WriteLine(Path.GetPathRoot(p))
);
/* This code outputs:
\
q:\
(empty line for "Buy")
\
\
q:
q:\
q:\
*/
Therefore your check may look like this:
isPolPath = Path.GetPathRoot(path).Length < 2;
If you wish to make your code more foolproof and protect from exception when an empty string is passed, you need to decide if an empty (or null) string is a pol-path or drive path. Depending on the decision the check would be either
isPolPath = string.IsNullOrEmpty(path) || Path.GetPathRoot(path).Length < 2;
or
if (string.IsNullOrEmpty(path))
isPolPath = false;
else
isPolPath = Path.GetPathRoot(path).Length < 2;

Find unbounded file paths in string

I have these error messages generated by a closed source third party software from which I need to extract file paths.
The said file paths are :
not bounded (i.e. not surrounded by quotation marks, parentheses, brackets, etc)
rooted (i.e. start with <letter>:\ such as C:\)
not guaranteed to have a file extension
representing files (only files, not directories) that are guaranteed to exist on the computer running the extraction code.
made of any valid characters, including spaces, making them hard to spot (e.g. C:\This\is a\path \but what is an existing file path here)
To be noted, there can be 0 or more file paths per message.
How can these file paths be found in the error messages?
I've suggested an answer below, but I have a feeling that there is a better way to go about this.
For each match, look forward for the next '\' character. So you might get "c:\mydir\". Check to see if that directory exists. Then find the next \, giving "c:\mydir\subdir". Check for that path. Eventually you'll find a path that doesn't exist, or you'll get to the start of the next match.
At that point, you know what directory to look in. Then just call Directory.GetFiles and match the longest filename that matches the substring starting at the last path you found.
That should minimize backtracking.
Here's how this could be done:
static void FindFilenamesInMessage(string message) {
// Find all the "letter colon backslash", indicating filenames.
var matches = Regex.Matches(message, @"\w:\\", RegexOptions.Compiled);
// Go backwards. Useful if you need to replace stuff in the message
foreach (var idx in matches.Cast<Match>().Select(m => m.Index).Reverse()) {
int length = 3;
var potentialPath = message.Substring(idx, length);
var lastGoodPath = potentialPath;
// Eat "\" until we get an invalid path
while (Directory.Exists(potentialPath)) {
lastGoodPath = potentialPath;
while (idx+length < message.Length && message[idx+length] != '\\')
length++;
length++; // Include the trailing backslash
if (idx + length >= message.Length)
length = (message.Length - idx) - 1;
potentialPath = message.Substring(idx, length);
}
potentialPath = message.Substring(idx);
// Iterate over the files in directory we found until we get a match
foreach (var file in Directory.EnumerateFiles(lastGoodPath)
.OrderByDescending(s => s.Length)) {
if (!potentialPath.StartsWith(file))
continue;
// 'file' contains a valid file name
break;
}
}
}
This is how I would do it.
I don't think substringing the message over and over is a good idea however.
static void FindFilenamesInMessage(string message)
{
// Find all the "letter colon backslash", indicating filenames.
var matches = Regex.Matches(message, @"\w:\\", RegexOptions.Compiled);
int length = message.Length;
foreach (var index in matches.Cast<Match>().Select(m => m.Index).Reverse())
{
length = length - index;
while (length > 0)
{
var subString = message.Substring(index, length);
if (File.Exists(subString))
{
// subString contains a valid file name
///////////////////////
// Payload goes here
//////////////////////
length = index;
break;
}
length--;
}
}
}

Filtering file names: getting *.abc without *.abcd, or *.abcde, and so on

Directory.GetFiles(LocalFilePath, searchPattern);
MSDN Notes:
When using the asterisk wildcard character in a searchPattern, such as "*.txt", the matching behavior when the extension is exactly three characters long is different than when the extension is more or less than three characters long. A searchPattern with a file extension of exactly three characters returns files having an extension of three or more characters, where the first three characters match the file extension specified in the searchPattern. A searchPattern with a file extension of one, two, or more than three characters returns only files having extensions of exactly that length that match the file extension specified in the searchPattern. When using the question mark wildcard character, this method returns only files that match the specified file extension. For example, given two files, "file1.txt" and "file1.txtother", in a directory, a search pattern of "file?.txt" returns just the first file, while a search pattern of "file*.txt" returns both files.
The following list shows the behavior of different lengths for the searchPattern parameter:
*.abc returns files having an extension of .abc, .abcd, .abcde, .abcdef, and so on.
*.abcd returns only files having an extension of .abcd.
*.abcde returns only files having an extension of .abcde.
*.abcdef returns only files having an extension of .abcdef.
With the searchPattern parameter set to *.abc, how can I return files having an extension of .abc, not .abcd, .abcde and so on?
Maybe this function will work:
private bool StriktMatch(string fileExtension, string searchPattern)
{
bool isStriktMatch = false;
string extension = searchPattern.Substring(searchPattern.LastIndexOf('.'));
if (String.IsNullOrEmpty(extension))
{
isStriktMatch = true;
}
else if (extension.IndexOfAny(new char[] { '*', '?' }) != -1)
{
isStriktMatch = true;
}
else if (String.Compare(fileExtension, extension, true) == 0)
{
isStriktMatch = true;
}
else
{
isStriktMatch = false;
}
return isStriktMatch;
}
Test Program:
class Program
{
static void Main(string[] args)
{
string[] fileNames = Directory.GetFiles("C:\\document", "*.abc");
ArrayList al = new ArrayList();
for (int i = 0; i < fileNames.Length; i++)
{
FileInfo file = new FileInfo(fileNames[i]);
if (StriktMatch(file.Extension, "*.abc"))
{
al.Add(fileNames[i]);
}
}
fileNames = (String[])al.ToArray(typeof(String));
foreach (string s in fileNames)
{
Console.WriteLine(s);
}
Console.Read();
}
}
Does anybody have a better solution?
The answer is that you must do post filtering. GetFiles alone cannot do it. Here's an example that will post process your results. With this you can use a search pattern with GetFiles or not - it will work either way.
List<string> fileNames = new List<string>();
// populate all filenames here with a Directory.GetFiles or whatever
string srcDir = "from"; // set this
string destDir = "to"; // set this too
// this filters the names in the list to just those that end with ".doc"
foreach (var f in fileNames.Where(name => name.EndsWith(".doc", StringComparison.OrdinalIgnoreCase)))
{
try
{
File.Copy(Path.Combine(srcDir, f), Path.Combine(destDir, f));
}
catch { ... }
}
Not a bug, perverse but well-documented behavior. *.doc matches *.docx based on 8.3 fallback lookup.
You will have to manually post-filter the results for ending in doc.
use linq....
string strSomePath = "c:\\SomeFolder";
string strSomePattern = "*.abc";
string[] filez = Directory.GetFiles(strSomePath, strSomePattern);
var filtrd = from f in filez
where f.EndsWith( strSomePattern.TrimStart('*'), StringComparison.OrdinalIgnoreCase )
select f;
foreach (string strSomeFileName in filtrd)
{
Console.WriteLine( strSomeFileName );
}
This won't help in the short term, but voting on the MS Connect post for this issue may get things changed in the future.
http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=95415
Since for "*.abc" GetFiles will return extensions of 3 or more, anything with a length of 3 after the "." is an exact match, and anything longer is not.
string[] fileList = Directory.GetFiles(path, "*.abc");
foreach (string file in fileList)
{
FileInfo fInfo = new FileInfo(file);
if (fInfo.Extension.Length == 4) // "." is counted in the length
{
// exact extension match - process the file...
}
}
Not sure of the performance of the above - while it uses simple length comparisons rather than string manipulations, new FileInfo() is called each time around the loop.
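One way to sidestep the per-file FileInfo allocation mentioned above is to compare extensions on the path strings directly. The following is a sketch only: the class and method names are hypothetical, and it assumes a case-insensitive comparison is wanted (as is usual on Windows file systems).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ExactExtensionFilter
{
    // Keeps only the paths whose extension equals 'extension'
    // (for example ".abc", including the dot), ignoring case.
    // Path.GetExtension works on the string alone, so no FileInfo
    // object is created for each file.
    public static IEnumerable<string> WithExactExtension(
        IEnumerable<string> paths, string extension)
    {
        return paths.Where(p => string.Equals(
            Path.GetExtension(p), extension, StringComparison.OrdinalIgnoreCase));
    }
}
```

It could be used as a post-filter on the GetFiles result, for instance ExactExtensionFilter.WithExactExtension(Directory.GetFiles(path, "*.abc"), ".abc"), so that .abcd and longer extensions are dropped.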
