How to match two paths pointing to the same file?

I have two lists containing paths to a directory of music files, and I want to determine which of these files appear on both lists and which appear on only one. The problem is that the format of the paths differs between the two lists.
Format example:
List1: file://localhost//FILE/Musik/30%20Seconds%20To%20Mars.mp3
List2: \\FILE\Musik\30 Seconds To Mars.mp3
How do I go about comparing these two file paths and matching them to the same source?

The answer depends on your notion of "same file". If you merely want to check whether two files have equal content, without being the very same file, you could simply generate a hash over each file's content and compare the hashes. If the hashes are equal (please use a strong hash, like SHA-256), you can be confident that the contents are equal too. Likewise, you could of course also compare the files byte by byte.
If you really want to figure out whether the two paths refer to the very same file, i.e. one file addressed via different means (like a file URL or a UNC path), you have a little more work to do.
First you need to find out the true file system path for each of the addresses. For example, you need to find the file system path behind the UNC path and/or file URL (which typically is the path in the URL itself). In the case of UNC paths that are shares on a remote computer, you might not even be able to do so.
Also, even if you have the local path figured out somehow, you need to deal with different redirection mechanisms for local paths (on Windows junctions/reparse points/links; on UNIX symbolic or hard links). For example, you could have a share using a file system link as its source, while the file URL uses the true source path. So to the casual observer they still look like different files.
Having said all that, the "algorithm" would be something like this:
Figure out the source path for the URLs, UNC paths/shares, etc. you have
Figure out the local source path from those paths (considering links/junctions, subst.exe, etc.)
Normalize those paths, if necessary (e.g. a/b/../c is actually a/c)
Compare the resulting paths.
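The last two steps (normalize, then compare) can be sketched roughly as follows. This is only a minimal illustration, not a complete solution: PathKey, Normalize and SamePath are hypothetical names, the code assumes file URLs always start with "file://localhost", it assumes a case-insensitive file system, and it does not resolve shares, links or junctions (the first two steps).

```csharp
using System;
using System.Collections.Generic;

public static class PathKey
{
    // Hypothetical helper, not part of any library: reduces a file URL or
    // a UNC path to a backslash-separated string that can be compared.
    public static string Normalize(string path)
    {
        const string prefix = "file://localhost";
        string s = path;
        if (s.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
        {
            // Only the URL form is percent-encoded; a UNC path may
            // legitimately contain a literal '%', so leave it alone.
            s = Uri.UnescapeDataString(s.Substring(prefix.Length));
        }
        s = s.Replace('/', '\\');

        // Collapse "." and ".." segments (a\b\..\c becomes a\c).
        var kept = new List<string>();
        foreach (string segment in s.Split('\\'))
        {
            if (segment == ".")
                continue;
            bool canPop = kept.Count > 0
                && kept[kept.Count - 1].Length > 0
                && kept[kept.Count - 1] != "..";
            if (segment == ".." && canPop)
                kept.RemoveAt(kept.Count - 1);
            else
                kept.Add(segment);
        }
        return string.Join("\\", kept);
    }

    // Assumption: the file system ignores case, as Windows shares usually do.
    public static bool SamePath(string a, string b)
    {
        return string.Equals(Normalize(a), Normalize(b),
                             StringComparison.OrdinalIgnoreCase);
    }
}
```

With the two example paths from the question, PathKey.SamePath should report a match, because both reduce to \\FILE\Musik\30 Seconds To Mars.mp3.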

I think the best way to do it is by temporarily converting one of the paths to the other one's format. I would suggest you change the first to match the second.
string List1 = "file://localhost//FILE/Musik/30%20Seconds%20To%20Mars.mp3";
string List2 = @"\\FILE\Musik\30 Seconds To Mars.mp3";
I would recommend you use the Replace() method.
Get rid of "file://localhost":
var tempStr = List1.Replace("file://localhost", "");
Change all '%20' into spaces:
tempStr = tempStr.Replace("%20", " ");
Change all '/' into '\':
tempStr = tempStr.Replace("/", "\\");
Voilà! Two strings in matching format!

Use Python: you can easily compare the contents of two files like this:
>>> import filecmp
>>> filecmp.cmp('file1.txt', 'file1.txt')
True
>>> filecmp.cmp('file1.txt', 'file2.txt')
False
To open files given in file:// syntax, use urllib (urllib.request in Python 3):
>>> import urllib.request
>>> file1 = urllib.request.urlopen('file://localhost/tmp/test')
For normal file paths, use the standard open():
>>> file2 = open('/pathtofile', 'r')

I agree completely with Christian: you should rethink the structure of the lists. Still, the code below should get you going.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication5
{
    class Program
    {
        public static List<string> SanitiseList(List<string> list)
        {
            List<string> sanitisedList = new List<string>();
            foreach (string filename in list)
            {
                String sanitisedFilename = String.Empty;
                if (!String.IsNullOrEmpty(filename))
                {
                    sanitisedFilename = filename;
                    // get rid of the encoding
                    sanitisedFilename = Uri.UnescapeDataString(sanitisedFilename);
                    // first of all change all back-slashes to forward slashes
                    sanitisedFilename = sanitisedFilename.Replace(@"\", @"/");
                    // if we have two forward slashes at the beginning, assume it's localhost
                    if (sanitisedFilename.StartsWith("//"))
                    {
                        // remove these first double slashes and stick in localhost
                        sanitisedFilename = sanitisedFilename.TrimStart('/');
                        sanitisedFilename = "//localhost" + "/" + sanitisedFilename;
                    }
                    // remove the file: scheme
                    sanitisedFilename = sanitisedFilename.Replace(@"file://", "//");
                    // remove double back-slashes
                    sanitisedFilename = sanitisedFilename.Replace(@"\\", @"\");
                    // remove double forward-slashes (but not the first two)
                    sanitisedFilename = sanitisedFilename.Substring(0, 2) + sanitisedFilename.Substring(2, sanitisedFilename.Length - 2).Replace("//", @"/");
                }
                if (!String.IsNullOrEmpty(sanitisedFilename))
                {
                    sanitisedList.Add(sanitisedFilename);
                }
            }
            return sanitisedList;
        }
        static void Main(string[] args)
        {
            List<string> listA = new List<string>();
            List<string> listB = new List<string>();
            listA.Add("file://localhost//FILE/Musik/BritneySpears.mp3");
            listA.Add("file://localhost//FILE/Musik/30%20Seconds%20To%20Mars.mp3");
            listB.Add("file://localhost//FILE/Musik/120%20Seconds%20To%20Mars.mp3");
            listB.Add(@"\\FILE\Musik\30 Seconds To Mars.mp3");
            listB.Add(@"\\FILE\Musik\5 Seconds To Mars.mp3");
            listA = SanitiseList(listA);
            listB = SanitiseList(listB);
            List<string> missingFromA = listB.Except(listA).ToList();
            List<string> missingFromB = listA.Except(listB).ToList();
        }
    }
}

Related

c# - Remove old Files from FileInfo list

I have a File-Info-List of more than 200 log-files from a directory.
Most of the files need to be in the list, but there are a few files that should be ignored.
Here is an example of the File-List:
A300a1_ContentLink.log
A301a20_ContentLink.log
A1_4a0_ContentLink.log
B200a101_ContentLink.log
B200a101_ContentLink_20221208_115905.log
B200a101_ContentLink_20221208_115907.log
B200a101_ContentLink_20221208_120647.log
B201a1_ContentLink.log
B202a0_ContentLink.log
Explanation of the file name:
The first characters refer to a room (e.g. room A300 or A1). A room can have any description, e.g. B200, CXS2 or only CDD. The next characters refer to a device name (e.g. device a1 or device a20). Each device starts with a, followed by 1-3 digits. The last part of each file name is "_ContentLink".
All files with a further suffix, like _20221208_115905, are duplicates of older versions, which are needed in other programs but not in my list.
My problem is that I only need the newest File of each logfile in my File-Info-List.
I initialized a FileInfo[] allFiles that contains all of the files of the directory.
Next I initialized a new FileInfo[] in which I would like to store only the newest version of each file.
My first attempt was to compare the LastWrite time
FileInfo currentFile = allFiles[0];
foreach (FileInfo file in allFiles)
{
    if (file.LastWriteTime > currentFile.LastWriteTime)
    {
        currentFile = file;
    }
}
But I only get back the latest file of the whole folder.
Now I am thinking about using regular expressions instead of .LastWriteTime, to exclude all files that have a suffix after ContentLink.
But I don't know how to do that, or how to remove the outdated files from the list with all files (or transfer only the relevant ones to a new FileInfo[] list).
Thank you in advance for your ideas.

You can use a LINQ query to:
extract the name and time part from each file name
group the files by name and
select the latest (maximum) file by time
Something like:
var regex=new Regex(@"^(.*?)_ContentLink(.*?)\.log$");
var latest=allFiles.Select(f=>{
        var parts=regex.Match(f.Name);
        return new {
            File=f,
            Name=parts.Groups[1].ToString(),
            Date=parts.Groups[2].ToString()
        };
    })
    .GroupBy(f=>f.Name)
    .Select(g=>g.MaxBy(f=>f.Date).File)
    .ToArray();
foreach(var file in latest)
{
    Console.WriteLine(file.Name);
}
This produces
A300a1_ContentLink.log
A301a20_ContentLink.log
A1_4a0_ContentLink.log
B200a101_ContentLink_20221208_120647.log
B201a1_ContentLink.log
B202a0_ContentLink.log
MaxBy was added in .NET 6. Before that you can use the equivalent method from the MoreLINQ library.
The regular expression captures the smallest possible string before _ContentLink in the first group (.*?) and the smallest possible date part in the second group.
You could get a bit fancier and use different regular expressions to capture the name and time part. Combined with local functions, this results in a somewhat cleaner query:
var nameRex=new Regex(@"^(.*?)_ContentLink.*\.log$");
var timeRex=new Regex(@"^.*_ContentLink(.*?)\.log$");
string NamePart(FileInfo f)
{
    return nameRex.Match(f.Name).Groups[1].ToString();
}
string TimePart(FileInfo f)
{
    return timeRex.Match(f.Name).Groups[1].ToString();
}
var latest=allFiles
    .GroupBy(NamePart)
    .Select(g=>g.MaxBy(TimePart))
    .ToArray();
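If MaxBy is not available (before .NET 6) and you would rather not pull in MoreLINQ, the same grouping can probably be done with standard LINQ only. The following is a rough sketch that works on plain file names rather than FileInfo objects; LatestLogs and PickLatest are made-up names, and the ordinal string comparison of the suffix is my assumption about how the timestamps should be ordered.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LatestLogs
{
    // Group 1: room and device prefix. Group 2: optional timestamp suffix.
    private static readonly Regex Pattern =
        new Regex(@"^(.*?)_ContentLink(.*?)\.log$");

    // Hypothetical helper: keeps one file name per prefix, namely the one
    // whose suffix sorts highest. An empty suffix sorts lowest.
    public static List<string> PickLatest(IEnumerable<string> fileNames)
    {
        return fileNames
            .Select(n => new { Name = n, Match = Pattern.Match(n) })
            .Where(x => x.Match.Success)
            .GroupBy(x => x.Match.Groups[1].Value)
            .Select(g => g
                .OrderByDescending(x => x.Match.Groups[2].Value, StringComparer.Ordinal)
                .First()
                .Name)
            .ToList();
    }
}
```

Calling it as LatestLogs.PickLatest(allFiles.Select(f => f.Name)) should pick the same names as the MaxBy query. Sorting each group is slower than a single-pass maximum, but for a few hundred files that is unlikely to matter.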

Process.Start(FilePath) with commas

When searching a directory for files of a specific name driven by the _fileToSearch parameter, I then create a custom list of DrawingFound and store the files path in a string called FileDirectory.
On a button click, OpenDrawing() should then open the file stored in FileDirectory for the user. This works in most cases. However, if the path contains a comma, for example, Explorer defaults to opening the user's Documents directory. How can I handle commas within a file path to achieve the desired outcome?
public partial class DrawingFound
{
    public string DrawingName;
    public string FileType;
    public string FileDirectory;
    public string Revision;
    public void OpenDrawing()
    {
        Process.Start("Explorer.exe", FileDirectory);
    }
}
public void GetDrawings()
{
    string _searchFolder = @"C:\Users\ThisUser\Documents";
    string _fileToSearch = "Example of file, where a comma is used.txt";
    ObservableCollection<DrawingFound> _drawings = new();
    DirectoryInfo dirInfo = new(_searchFolder);
    FileInfo[] files = dirInfo.GetFiles($"*{_fileToSearch}*", SearchOption.AllDirectories);
    foreach (FileInfo file in files)
    {
        if (!_drawings.Any(item => $"{item.DrawingName}{item.FileType}" == file.Name))
        {
            _drawings.Add(new DrawingFound
            {
                DrawingName = Path.GetFileNameWithoutExtension(file.Name),
                FileType = file.Extension,
                FileDirectory = file.FullName,
                Revision = "- Ignore -"
            });
        }
    }
}

Depending on your OS, you may need to use "escaping".
For example, to store the string one "two" three in a literal delimited with quotation marks, you need to escape the quotation marks. Depending on the language and environment, the "escape character" can be e.g. a \
as in this example:
foo = "one \"two\" three"
I hope this helps; otherwise, please be more specific about your language, OS, etc.

Thank you to everyone for your assistance with this matter. I managed to fix the issue and the operation now works as expected. @George Rey, following your example I added the escape characters to achieve this:
Process.Start("explorer.exe", $"\"{FileDirectory}\"");

After your edits, the problem is clearer:
I guess that your OS is Windows.
The problem is not with the comma but rather with the space.
The system treats the characters before the space as the "file path" and the rest as "parameters". This is for historical reasons.
Wrap the entire path in "embedded quotes" so that it is clear to the OS that the entire string is a path. This should prevent it from trying to split command parameters out of that string.
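The quoting can be kept in one place with a small helper, sketched below. ExplorerLauncher and BuildStartInfo are hypothetical names; the only thing the helper does is wrap the path in double quotes before handing it to explorer.exe, and it assumes the path itself contains no quote characters (Windows file names cannot contain them).

```csharp
using System.Diagnostics;

public static class ExplorerLauncher
{
    // Hypothetical helper: builds the start info with the whole path
    // wrapped in double quotes, so spaces do not split it into arguments.
    public static ProcessStartInfo BuildStartInfo(string path)
    {
        return new ProcessStartInfo("explorer.exe", "\"" + path + "\"");
    }
}
```

OpenDrawing() could then call Process.Start(ExplorerLauncher.BuildStartInfo(FileDirectory)), and the generated argument string can be inspected without actually launching Explorer.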

C# Split File Name Beginner Exercise

I have a directory filled with multiple excel files that I would like to rename. The names all have leading integers and a '-'. For example: 0123456-Test_01. I would like to rename all of the files within this directory by removing this prefix. 0123456-Test_01 should just be Test_01. I can rename a hard coded instance of a string, but am having trouble getting the files and renaming all of them.
My code is below. Any help is appreciated, as I am clearly new to C#.
public static void Main()
{
    //Successfully splits hardcoded string
    var temp = "0005689-Test_01".Split('-');
    Console.WriteLine(temp[1]);
    Console.ReadLine();
    //Unsuccessful renaming of all files within directory
    List<string> files = System.IO.Directory.GetFiles(@"C:\Users\acars\Desktop\B", "*").ToList();
    System.IO.File.Move(@"C:\Users\acars\Desktop\B\", @"C:\Users\acars\Desktop\B\".Split('-'));
    foreach (string file in files)
    {
        var temp = files.Split('-');
        return temp[1];
    };
}

There are some errors to fix in your code.
The first one is the wrong usage of the variable files. This is the full list of files, not the single file that you want to split and move. As explained in the comments, you should use the loop variable file instead.
The most important problem is that the File.Move method throws an exception if the destination file exists. After removing the first part of your file name, you cannot be sure that the resulting name is unique in your directory.
So a check for the existence of the file before the Move is mandatory.
Finally, it is better to use Directory.EnumerateFiles, because this method lets you start executing your moving code without first loading all file names into a list in memory. (In a folder full of files this could make a noticeable difference in speed.)
public static void Main()
{
    string workPath = @"C:\Users\acars\Desktop\B";
    foreach (string file in Directory.EnumerateFiles(workPath))
    {
        string[] temp = file.Split('-');
        if(temp.Length > 1)
        {
            string newName = Path.Combine(workPath, temp[1]);
            if(!File.Exists(newName))
                File.Move(file, newName);
        }
    }
}
Pay attention also to the comment below from CodeNotFound. You are using a hard-coded path, so the problem doesn't actually arise here, but if the directory name contains a "-" then you should use something like this to get the last element of the split array:
string newName = Path.Combine(workPath, temp[temp.Length-1]);
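A variant that avoids both dash problems is to split only the file name (not the full path) at the first "-", and only when everything before it is digits. The sketch below shows that idea; PrefixRenamer, StripNumericPrefix and RenameAll are made-up names. Note that it takes a snapshot with Directory.GetFiles instead of EnumerateFiles, on the assumption that a file renamed during the loop should not be visited a second time, which gives up the streaming benefit mentioned above.

```csharp
using System;
using System.IO;

public static class PrefixRenamer
{
    // Hypothetical helper: "0123456-Test_01" becomes "Test_01". Names
    // without an all-digit prefix followed by '-' are returned unchanged.
    public static string StripNumericPrefix(string fileName)
    {
        int dash = fileName.IndexOf('-');
        if (dash <= 0 || dash == fileName.Length - 1)
            return fileName;
        for (int i = 0; i < dash; i++)
        {
            if (!char.IsDigit(fileName[i]))
                return fileName;
        }
        return fileName.Substring(dash + 1);
    }

    // Renames every file in the directory, skipping names that would
    // collide with an existing file.
    public static void RenameAll(string directory)
    {
        foreach (string file in Directory.GetFiles(directory))
        {
            string oldName = Path.GetFileName(file);
            string newName = StripNumericPrefix(oldName);
            string target = Path.Combine(directory, newName);
            if (newName != oldName && !File.Exists(target))
                File.Move(file, target);
        }
    }
}
```

Because only the part after the first dash is kept, a name such as 0123456-Test-01 keeps its second dash, and a dash in the directory name no longer matters.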

Retrieving a Single File and the Behavior of Directory.GetFiles C#

public int RunStageData(string rootDirectory, string dataFolder)
{
    string[] files = new string[] { };
    files = Directory.GetFiles(rootDirectory + dataFolder);
    string[] tableOrder = new string[] { };
    tableOrder = Directory.GetFiles(@"C:\_projects\ExampleProject\src", "TableOrder.txt");
    System.IO.StreamReader tableOrderReader = new System.IO.StreamReader(tableOrder[0]);
    for (int count = 0; count < files.Length; count++)
    {
        string currentTableName = tableOrderReader.ReadLine();
        //files[count] = Directory.GetFiles(@"C:\_projects\ExampleProject\src", currentTableName);
    }
}
Hi everyone, sorry if my code is a bit sloppy. I'm having an issue primarily with the line I have commented out. So basically what I'm trying to do here is to populate a string array of file names based on the ordering of these names in a txt file. So I read the first line from the txt file, then retrieve the name of that file in the directory(assuming it exists) and put it in the first spot of the array, then move on.
For Example if the txt file had these words in the following order:
Dog
Sheep
Cat
I would want the array to have Dog first, then Sheep, then Cat. My issue is that the line that I have commented gives me an error that says "Error 41 Cannot implicitly convert type 'string[]' to 'string'"
I'm guessing the reason for this is that Directory.GetFiles has the possibility of returning multiple files. So, is there another method I could use to achieve the results I'm looking for? Thank you.

I am assuming you want the contents of the file (if you just want the file name and need to check for existence, a different solution will be required).
files[count] = File.ReadAllText(Path.Combine(@"C:\_projects\ExampleProject\src", currentTableName));
And a couple other suggestions:
Don't initialize your variables with bogus data, = new string[] {} can be removed
Don't use count as an indexer, it is confusing (count is a property of the array after all)
Use Path.Combine when joining paths. It is much easier as it handles the \ for you.

From your question:
So basically what I'm trying to do here is to populate a string array
of file names based on the ordering of these names in a txt file. So I
read the first line from the txt file, then retrieve the name of that
file in the directory(assuming it exists) and put it in the first spot
of the array, then move on.
So, your TableOrder.txt already contains the files in the correct order, thus you can do:
string[] files = File.ReadAllLines(@"C:\_projects\ExampleProject\src\TableOrder.txt");

If your array files contains only paths, you can do it like this:
string path = @"C:\_projects\ExampleProject\src\" + currentTableName;
if (File.Exists(path))
{
    files[count] = path;
}
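Putting the pieces together (read the names in order, build each path with Path.Combine, keep only the ones that exist) might look like the sketch below. OrderedFiles and InOrder are hypothetical names, and the existence check is passed in as a delegate purely so the ordering logic can be tried without touching the disk; in real use you would pass File.Exists.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class OrderedFiles
{
    // Hypothetical helper: returns the full paths for the given names,
    // in the order the names were listed, skipping blank lines and
    // names for which the exists check fails.
    public static string[] InOrder(
        IEnumerable<string> names, string directory, Func<string, bool> exists)
    {
        return names
            .Where(n => !string.IsNullOrWhiteSpace(n))
            .Select(n => Path.Combine(directory, n.Trim()))
            .Where(exists)
            .ToArray();
    }
}
```

For the question's setup this would be called roughly as OrderedFiles.InOrder(File.ReadAllLines(tableOrderPath), folder, File.Exists), where tableOrderPath and folder stand for the TableOrder.txt path and the data directory.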

find string using c#?

I am trying to find a string in the string below.
http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx?ID=1779
I am searching with the string http://example.com/TIGS/SIM/Lists. How can I get the words Team Discussion from it?
Sometimes the strings will be
http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx?ID=1779
I need `Team Discussion`
http://example.com/TIGS/ALIF/Lists/Artifical Lift Discussion Forum 2/DispForm.aspx?ID=8
I need `Artifical Lift Discussion Forum 2`

If you're always following that pattern, I recommend @Justin's answer. However, if you want a more robust method, you can always couple the System.Uri and Path.GetDirectoryName methods, then perform a String.Split. Like this example:
String url = @"http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx?ID=1779";
System.Uri uri = new System.Uri(url);
String dir = Path.GetDirectoryName(uri.AbsolutePath);
String[] parts = dir.Split(new[]{ Path.DirectorySeparatorChar });
Console.WriteLine(parts[parts.Length - 1]);
The only major problem, however, is you're going to wind up with a path that's been "encoded" (i.e. your space is now going to be represented by a %20)
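One way around the encoding is to decode the segment again with Uri.UnescapeDataString. The sketch below uses Uri.Segments instead of Path.GetDirectoryName, which is a substitution on my part, so that it does not depend on the platform's directory separator; UrlParts and ParentSegment are made-up names.

```csharp
using System;

public static class UrlParts
{
    // Hypothetical helper: returns the decoded path segment that directly
    // precedes the last one, or an empty string if there is none.
    public static string ParentSegment(string url)
    {
        string[] segments = new Uri(url).Segments;
        if (segments.Length < 2)
            return string.Empty;
        // Segments keep their trailing slash, e.g. "Team%20Discussion/".
        string raw = segments[segments.Length - 2].TrimEnd('/');
        return Uri.UnescapeDataString(raw);
    }
}
```

For the URL in the question, UrlParts.ParentSegment(url) should give back Team Discussion with the space restored.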

This solution will get you the last directory of your URL regardless of how many directories are in your URL.
string[] arr = s.Split('/');
string lastPart = arr[arr.Length - 2];
You could combine this solution into one line, however it would require splitting the string twice, once for the values, the second for the length.

If you wanted to see a regular expression example:
string input = "http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx?ID=1779";
string given = "http://example.com/TIGS/SIM/Lists";
System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(System.Text.RegularExpressions.Regex.Escape(given) + @"\/(.+)\/");
System.Text.RegularExpressions.Match match = regex.Match(input);
Console.WriteLine(match.Groups[1]); // Team Discussion

Here's a simple approach, assuming that your URL always has the same number of slashes before the part you want:
var value = url.Split(new[]{'/'}, StringSplitOptions.RemoveEmptyEntries)[5];

Here is another solution that provides the following advantages:
Does not require the use of regular expressions.
Does not require a certain 'count' of slashes to be present (indexing based on a specific number). I consider this a key benefit because it makes the code less likely to fail if some part of the URL changes. Ultimately it is best to base your parsing logic on the part of the text's structure you consider least likely to change.
This method, however, DOES rely on the following assumptions, which I consider to be the least likely to change:
URL must have "/Lists/" right before target text.
URL must have "/" right after target text.
Basically, I just split the string twice, using text that I expect to be surrounding the area I am interested in.
String urlToSearch = "http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx";
String result = "";
// First, get everything after "/Lists/"
string[] temp1 = urlToSearch.Split(new String[] { "/Lists/" }, StringSplitOptions.RemoveEmptyEntries);
if (temp1.Length > 1)
{
    // Next, get everything before the first "/"
    string[] temp2 = temp1[1].Split(new String[] { "/" }, StringSplitOptions.RemoveEmptyEntries);
    result = temp2[0];
}
Your answer will then be stored in the 'result' variable.
