Drilling into a drive and changing folder names & document names - c#

I have a drive on my computer that contains folders, some of which hold many levels of nested folders as well as files.
I need to migrate these documents into SharePoint, but a lot of the folders and files have invalid characters in their names (e.g. " / \ &) that prevent the migration.
Is there any way to write something in C# that removes these invalid characters from all folder and file names?
Please help!

Yes. One way to do this is to walk the directory structure recursively and, for each folder or file name, check whether it is valid. If it is not, generate a valid name and upload to SharePoint under that name.
You can write a Regex that matches all disallowed characters and replace each match with an allowed character, such as an underscore. If names need to be unique and you are worried that this approach might create duplicates, store every name that has been used (i.e. uploaded to SharePoint) in something like a HashSet, and check it before using a generated name. If the name already exists, you can add a prefix or suffix, flag the item for human intervention, or do something else, depending on your requirements.
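A minimal sketch of the Regex-plus-HashSet idea described above. The class name, the method names, and the exact set of disallowed characters are assumptions made for illustration; check the character list against the rules of your SharePoint version.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class NameSanitizer
{
    // Assumed set of disallowed characters: ~ " # % & * : < > ? / \ { | }
    private static readonly Regex Invalid = new Regex(@"[~""#%&*:<>?/\\{|}]");

    // Replaces every disallowed character with an underscore.
    public static string Sanitize(string name) => Invalid.Replace(name, "_");

    // Returns a sanitized name that is not yet in 'used' and records it there.
    // On a collision, _1, _2, ... is inserted before the extension.
    public static string MakeUnique(string name, HashSet<string> used)
    {
        string clean = Sanitize(name);
        string stem = Path.GetFileNameWithoutExtension(clean);
        string extension = Path.GetExtension(clean);
        string candidate = clean;
        for (int i = 1; !used.Add(candidate); i++)
        {
            candidate = stem + "_" + i + extension;
        }
        return candidate;
    }
}
```

Creating the set as new HashSet<string>(StringComparer.OrdinalIgnoreCase) makes the duplicate check case-insensitive, which is usually what you want for file names.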

Related

How to store local directory path in database?

I'm working on a REST API project. I have to get a list of directory paths, let the user choose one, and save it in a database. I created a string variable and wanted to assign the selected path to it, but when testing from Postman I can't assign a full path (e.g. C:\dev\data) to the string variable (I receive a bad string format error). So I would like to know the best way to store a path in the database: should I store it without C:\, and if so, how do I get the directory path without C:\?
The path "C:\dev\data" gives errors because a backslash starts an escape sequence in a string literal, whether in C# source or in a JSON request body. If you need to send the whole path, write each backslash as a double backslash:
"C:\\dev\\data\\name_of_file"
Once parsed, the string contains single backslashes again, and you can store it in the database.
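To illustrate that the doubled backslashes exist only in the serialized form, here is a small sketch. It assumes the request body is JSON (the question does not say) and uses System.Text.Json, which needs .NET Core 3.0 or later; the class and method names are made up for the example.

```csharp
using System;
using System.Text.Json;

public static class PathJsonDemo
{
    // Serializes the path as a JSON string literal; each backslash is escaped.
    public static string ToJson(string path) => JsonSerializer.Serialize(path);

    // Parses a JSON string literal back into the plain path.
    public static string FromJson(string json) => JsonSerializer.Deserialize<string>(json);
}
```

Sending the JSON text "C:\\dev\\data" from Postman should therefore arrive in the controller as the plain string C:\dev\data, which is the value to store.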
It's best, however, to keep the root folder (in your case C:\dev\data) in a configuration file and store only the file name in the DB. To fetch the file from code, you read the root folder from the configuration file and combine it with the file name.
Hope this helps.
As far as I'm aware, there is no limitation on which characters you can save in a string column of a database, so the mistake must be somewhere else along the way.
First make sure that you can pass the chosen path to the back end (I assume it is a web app you are working on). Place a breakpoint in the controller action and check whether you receive the string at all.
Then I'd suggest moving down the happy path step by step, making sure each step works as expected.
This way you'll easily locate your error.
Note: there is a high chance of unescaped characters in the string.
Make sure characters are escaped where applicable.

How do I programmatically (c#) validate drive letters for international cultures such as Chinese?

Background:
I am trying to validate a path that may not actually exist, so I can't validate it with the Directory.Exists() method. Also, the code may be run on a different machine. I just want to know what's valid and what's invalid. I researched this and found that it is a lot more complicated than I originally imagined. It would have been a lot simpler if Microsoft had provided a method for this.
I soon realised that methods such as Path.GetInvalidPathChars() and Path.GetInvalidFileNameChars() have very limited value. If you validate using GetInvalidPathChars(), it allows characters such as "*" and "?", which are not actually valid in a directory name or file name. GetInvalidFileNameChars() is a better option, but it rejects the colon (':'), which is valid when it comes immediately after the drive letter. Besides, there are a few more rules: a folder name cannot consist entirely of dots ('.'), and a directory cannot be named after a reserved word such as LPT1 to LPT9 or COM1 to COM9. The complete list of rules is documented here: https://msdn.microsoft.com/en-us/library/aa365247.aspx?f=255&MSPPError=-2147217396
So, to validate the path, I am splitting it into 2 parts:
the root part, obtained with the Path.GetPathRoot() method and validated separately
the remaining path, which is further split using Path.DirectorySeparatorChar so that each directory name can be validated individually using a complex algorithm. (Once complete, I will post that code later.)
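The asker's algorithm is not shown, so the following is only a rough sketch of what per-component validation could look like under the rules mentioned above. The class and method names are hypothetical, the character list is hard-coded instead of taken from Path.GetInvalidFileNameChars() so that the result does not depend on the machine running the code, and the list of reserved names includes CON, PRN, AUX and NUL from the linked naming rules in addition to the COM and LPT names.

```csharp
using System;
using System.Linq;

public static class PathComponentValidator
{
    // Characters Windows rejects in a file or directory name
    // (control characters below 32 are checked separately).
    private const string InvalidChars = "<>:\"/\\|?*";

    // Reserved device names; they stay reserved when followed by an extension.
    private static readonly string[] Reserved =
    {
        "CON", "PRN", "AUX", "NUL",
        "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
        "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
    };

    // Checks a single directory or file name (no separators).
    public static bool IsValidName(string name)
    {
        if (string.IsNullOrEmpty(name)) return false;
        if (name.All(c => c == '.')) return false;
        if (name.Any(c => c < 32 || InvalidChars.IndexOf(c) >= 0)) return false;
        char last = name[name.Length - 1];
        if (last == ' ' || last == '.') return false;
        string stem = name.Split('.')[0];
        return !Reserved.Contains(stem, StringComparer.OrdinalIgnoreCase);
    }

    // Checks everything after the root, given with backslash separators.
    // Empty components produced by repeated backslashes are ignored.
    public static bool IsValidRelativePath(string relativePath) =>
        relativePath
            .Split(new[] { '\\' }, StringSplitOptions.RemoveEmptyEntries)
            .All(IsValidName);
}
```

This covers only the naming rules; path length limits and file-system-specific restrictions are out of scope for the sketch.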
Main Question
I want to validate root path by checking that the drive letter is valid. I know you can only have drives from A: to Z: on English cultures. But how can I validate that for international cultures such as Chinese or Japanese? I couldn't find any documentation for list of valid drive letters for German or Chinese machines. Do those systems allow other Unicode characters as drive letters?
It's simple: A-Z are the only valid drive letters, no matter what the culture is.
But you can of course have a path without a drive letter.
In general you don't need to worry about the culture when dealing with path names. A folder uses Unicode and is accessible to all users regardless of locale.
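Given that only A to Z are valid, the root check can be a plain ASCII comparison with no culture involved. A small sketch (the class and method names are made up, and it handles drive-letter roots only, not UNC roots or paths without a drive):

```csharp
public static class DriveRootValidator
{
    // True for roots such as "C:", "c:\" or "C:/":
    // one ASCII letter, a colon, and an optional separator.
    public static bool IsValidDriveRoot(string root)
    {
        if (root == null || root.Length < 2 || root.Length > 3) return false;
        char c = root[0];
        bool isAsciiLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        if (!isAsciiLetter || root[1] != ':') return false;
        return root.Length == 2 || root[2] == '\\' || root[2] == '/';
    }
}
```

The range comparison is used instead of char.IsLetter on purpose, because char.IsLetter also accepts non-ASCII letters.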
But there are a lot of other very big problems to solve.
I would say this is not possible.
A full explanation of why is too much for a Stack Overflow answer.
But think about this: I might have a network share mapped to a drive letter. That might be a Linux or Mac folder shared with Samba. So you suddenly have to account for all the limitations of those OSes, the exact file system on that machine, and the limitations of the sharing protocol.
Or think of a Windows machine sharing a FAT32 file system. Or even a FAT16 one, which supports neither Unicode nor long file names.
There are many other problems, even for local drives.
But, as I said, too much for this answer.

File.Exist returning true when path has multiple backslashes

I have an application that accepts user input and does stuff to files. The users select a file and it might move it, delete it, rename it, ftp it etc. The application uses a hash table to store recently used files and their paths.
The main problem I am looking into now is that one of the add-ins is saving the path incorrectly, like this: C:\David\\File.txt
The part of the application that deals with file io tries to ensure the file exists prior to doing stuff with a File.Exists(path) call. This call is returning true even for the above example. Can anyone explain why this might be?
The issue I am facing is, beyond one module saving the path incorrectly, certain modules that interact with the file are accepting that incorrect path and working fine while others see it and crash. Although currently I am going to fix this by getting the path saved correctly, I'd like to understand what is going on here.
You have a false premise: that C:\David\\File.txt is an invalid path. Multiple backslashes are accepted fine in Windows. Try notepad C:\David\\File.txt in a command prompt as an experiment; it should work.
For more info, see this other SO q/a that reaffirms this. Any number of backslashes is fine, and this can be used as an "easy" way to combine paths without worrying about the number of backslashes. For example, the user can provide C:\David or C:\David\ and you can add \test.txt without worrying which input the user provided. However, Path.Combine is the proper way to do this in C#.
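Edit: To remove the extra backslashes easily before passing the path to the other program, try splitting the path into the drive and folder names and joining them back together into a path. Like this:
string path = string.Join("\\", pathWithManyBackslashes.Split(new[] { '\\' }, StringSplitOptions.RemoveEmptyEntries));
Split does produce an empty entry wherever the delimiter repeats, but StringSplitOptions.RemoveEmptyEntries drops those entries. For example, C:\David\\File.txt => [C:, David, File.txt]. The parts are rejoined with string.Join rather than Path.Combine because Path.Combine("C:", "David") returns C:David, without a separator after the drive. Note that this also strips the leading \\ of a UNC path.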
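If UNC paths such as \\server\share may show up, a split-and-rejoin approach loses the leading double backslash. One possible alternative, sketched here with a hypothetical helper name, is a regular expression that collapses repeated backslashes everywhere except at the very start of the string:

```csharp
using System.Text.RegularExpressions;

public static class PathCleaner
{
    // Collapses each run of two or more backslashes into one.
    // The lookbehind (?<!^) prevents a match from starting at index 0,
    // so a leading pair (UNC prefix) is left alone.
    public static string CollapseBackslashes(string path) =>
        Regex.Replace(path, @"(?<!^)\\{2,}", @"\");
}
```

For example, CollapseBackslashes(@"C:\David\\File.txt") returns C:\David\File.txt, while \\server\\share\file becomes \\server\share\file.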

Finding specific directories in C#.NET?

We have a fixed directory (e.g. "C:\personal\app\cherry\") in which, while another application is running, a folder is generated whose name contains two pieces of information.
One of them remains constant every time the folder is generated; the other is random. The folder is also removed again at runtime, but that is not really relevant here.
The two pieces of information are separated by a simple dot.
Example: \oskdfo.chips\
Here oskdfo is the randomly generated part, and chips is the constant.
The constant is the only information we have to find this specific directory; since the other part never stays the same, an unusual way of locating the directory is needed.
So I'm looking for a procedure that finds a directory with this name format inside a given path, including all of its subdirectories in the search.
You never said whether the directory is created under your application path or whether you want to search the entire hard drive.
Either way, you should use the Directory.GetDirectories method to search for it. The return value is an array of all directories found in the specified path. By default it searches only the top level of that path.
You can get all folders in the app path by using the following:
var folders = Directory.GetDirectories(AppDomain.CurrentDomain.BaseDirectory);
With LINQ (using System.Linq) you can narrow it down:
var folders = Directory
    .GetDirectories(AppDomain.CurrentDomain.BaseDirectory)
    .Where(folder => folder.Contains("usuall")) // replace "usuall" with the constant part of the name
    .ToList();
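Since the question asks for subdirectories to be included, a sketch using a search pattern together with SearchOption.AllDirectories may be closer to what is needed. The class and method names are invented for the example, and it assumes the constant part always follows the last dot of the folder name, as in oskdfo.chips.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FolderFinder
{
    // Returns every directory under 'root', at any depth,
    // whose name ends with ".<constantPart>".
    public static List<string> FindBySuffix(string root, string constantPart)
    {
        string suffix = "." + constantPart;
        return Directory
            .EnumerateDirectories(root, "*" + suffix, SearchOption.AllDirectories)
            // Re-check the name explicitly rather than relying on wildcard matching alone.
            .Where(dir => Path.GetFileName(dir).EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

A call such as FolderFinder.FindBySuffix(@"C:\personal\app\cherry", "chips") would return the full paths of all matching folders. Be aware that a recursive search can throw UnauthorizedAccessException if it reaches a folder the process is not allowed to read.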

Globally Unique IDs for files in Windows

I'm wondering how to get globally unique IDs for files and folders in Windows (XP, Vista and 7), and also be able to get the full path of the file or folder just by having the ID, something like getFileByGUID. I'm trying to do this in C++, C# and PHP.
The globally unique IDs should stay the same even if the file is moved, so using the full path of the file or folder wouldn't work.
Any help would be much appreciated, thanks!
You may consider using the Distributed Link Tracking Service.
Subject to the caveats mentioned in the page for BY_HANDLE_FILE_INFORMATION, GetFileInformationByHandle might be helpful, depending on what the goal is.
This won't let one retrieve the file's name, though. Due to NTFS hard links there may be more than one path to the same file contents anyway...
You could hash together information about the file, such as its metadata and/or contents. It would be difficult to do this on an entire file system without collisions, but I assume you're not trying to index the whole file system. This wouldn't work if you need files to retain their IDs if they're modified, though.
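As a rough illustration of the hashing idea, the sketch below derives an ID from the file contents only, using SHA-256; the class and method names are made up. Such an ID survives a move or rename, but it changes as soon as the contents change, and two files with identical contents get the same ID, which is exactly the limitation described above.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class FileFingerprint
{
    // Returns the SHA-256 hash of the file's contents as an uppercase hex string.
    public static string ContentHash(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            return BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
        }
    }
}
```

Mapping an ID back to a path would still require your own index (for example a dictionary from hash to last known path), since the hash itself carries no location information.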
