Sort FileSystemInfo[] by Name - c#

I have probably spent about 500 hours Googling this and reading MSDN documentation and it still refuses to work the way I want.
I can sort by name for files like this:
01.png
02.png
03.png
04.png
I.e. all the file names have the same length.
The second there is a file with a longer name, everything goes to hell.
For example in the sequence:
1.png
2.png
3.png
4.png
5.png
10.png
11.png
It reads:
1.png, 2.png then 10.png, 11.png
I don't want this.
My Code:
DirectoryInfo di = new DirectoryInfo(directoryLoc);
FileSystemInfo[] files = di.GetFileSystemInfos("*." + fileExtension);
Array.Sort<FileSystemInfo>(files, new Comparison<FileSystemInfo>(compareFiles));
foreach (FileInfo fri in files)
{
    fri.MoveTo(directoryLoc + "\\" + prefix + "{" + operationNumber.ToString() + "}" + (i - 1).ToString("D10") +
        "." + fileExtension);
    i--;
    x++;
    progressPB.Value = (x / fileCount) * 100;
}
// compare by file name
int compareFiles(FileSystemInfo a, FileSystemInfo b)
{
    // return a.LastWriteTime.CompareTo(b.LastWriteTime);
    return a.Name.CompareTo(b.Name);
}

It's not a matter of the file length particularly - it's a matter of the names being compared in lexicographic order.
It sounds like in this particular case you want to get the name without the extension, try to parse it as an integer, and compare the two names that way - you could fall back to lexicographic ordering if that fails.
Of course, that won't work if you have "debug1.png,debug2.png,...debug10.png"...you'd need a more sophisticated algorithm in that case.
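A minimal sketch of that idea, assuming the comparison is done on plain file-name strings. FileNameOrder and CompareNames are illustrative names, not anything from the question. One deliberate difference from a plain fallback: purely numeric names are placed before all other names, because mixing numeric and lexicographic comparison without such a rule can give an inconsistent ordering.

```csharp
using System;
using System.IO;

static class FileNameOrder
{
    // Hypothetical helper: compares names numerically when both stems
    // (name without extension) parse as integers. Numeric names sort
    // before non-numeric ones; everything else falls back to an ordinal,
    // case-insensitive string comparison.
    public static int CompareNames(string a, string b)
    {
        bool aIsNum = int.TryParse(Path.GetFileNameWithoutExtension(a), out int numA);
        bool bIsNum = int.TryParse(Path.GetFileNameWithoutExtension(b), out int numB);

        if (aIsNum && bIsNum)
        {
            int byNumber = numA.CompareTo(numB);
            if (byNumber != 0) return byNumber;
        }
        else if (aIsNum != bIsNum)
        {
            // exactly one of the two names is numeric: it goes first
            return aIsNum ? -1 : 1;
        }

        // both non-numeric, or equal numbers such as "01" and "1"
        return string.Compare(a, b, StringComparison.OrdinalIgnoreCase);
    }
}
```

With the question's array it could be used as Array.Sort(files, (a, b) => FileNameOrder.CompareNames(a.Name, b.Name));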

You're comparing the names as strings, even though (I'm assuming) you want them sorted by number.
This is a well-known problem where "10" comes before "9" because the first character in 10 (1) is less than the first character in 9.
If you know that the files will all consist of numbered names, you can modify your custom sort routine to convert the names to integers and sort them appropriately.

Your code is correct and working as expected, just the sort is performed alphabetically, not numerically.
For instance, the strings "1", "10", "2" are in alphabetical order. Instead if you know your filenames are always just a number plus ".png" you can do the sort numerically. For instance, something like this:
int compareFiles(FileSystemInfo a, FileSystemInfo b)
{
    // Given an input 10.png, parses the filename as integer to return 10
    int first = int.Parse(Path.GetFileNameWithoutExtension(a.Name));
    int second = int.Parse(Path.GetFileNameWithoutExtension(b.Name));
    // Performs the comparison on the integer part of the filename
    return first.CompareTo(second);
}

I ran into this same issue, but instead of sorting the list myself, I changed the filename by using 6 digit '0' padded key.
My list now looks like this:
000001.jpg
000002.jpg
000003.jpg
...
000010.jpg
But, if you can't change the filenames, you're going to have to implement your own sorting routine to deal with the alpha sort.

How about a bit of linq and regex to fix the ordering?
var orderedFileSysInfos =
    new DirectoryInfo(directoryloc)
        .GetFileSystemInfos("*." + fileExtension)
        // regex below grabs the first bunch of consecutive digits in file name
        // you might want something different
        .Select(fsi => new { fsi, match = Regex.Match(fsi.Name, @"\d+") })
        // filter away names without digits
        .Where(x => x.match.Success)
        // parse the digits to int
        .Select(x => new { x.fsi, order = int.Parse(x.match.Value) })
        // use this value to perform ordering
        .OrderBy(x => x.order)
        // select original FileSystemInfo
        .Select(x => x.fsi)
        //.ToArray() //maybe?

Related

How to write sorting in more efficient way?

I have a project where I have to write efficient code that runs as fast as possible, but I lack the knowledge to do it.
I have an ASP.NET (MVC) project using Entity Framework, and I also have to use a web service to get detail information.
First I make a request to the web service and it responds with a long string, which I have to parse into a list of strings for further processing.
I parse this string like this:
string resultString;
char[] delimiterChars = { ',', ':', '"', '}', '{' };
List<string> words = resultString.Split(delimiterChars).ToList();
From here I have a list with a lot of rows that hold information and a lot of junk rows.
I decided to clear the junk from this list, so that later methods do not have to work with it or check these rows with ifs and so on:
for (int i = words.Count - 1; i >= 0; i--)
{
    if (words[i] == "" || words[i] == "data" || words[i] == "array") words.RemoveAt(i);
}
After this I got a clean list, but every decimal number (prices, sizes and so on) got split at the comma, so a price of 21,55 now appears in my list as two elements, 21 and 55. I can't just remove , from the separators, because the string I get from the web service mainly separates its fields with ,.
So I decided to glue the decimal numbers back together (before this block the list elements looked like: 1)attrValue 2)21 3)55 and after it like: 1)attrValue 2)21.55):
for (int i = 0; i < words.Count(); i++)
{
    if (words[i] == "attrValue")
    {
        try
        {
            var seconPartInt = Int32.Parse(words[i + 2]);
            words[i + 1] += "." + words[i + 2];
        }
        catch { }
    }
    // strings are immutable, so the result of Replace has to be assigned back
    if (words[i].Contains("\\/")) words[i] = words[i].Replace("\\/", "/");
}
Everything is OK: the list is sorted and the decimals are joined, but the speed dropped by 30%. After some tests with a stopwatch and commenting out blocks of code, it became clear that the code above slows the whole program down too much.
To sum up:
I can't use that slow code, and at the same time I do not know how to make it faster. Maybe the problem is that I convert a string to an int to check whether the next element in the list is the second part of my number.
How could I optimize my code?
The first thing you should do is use this version of Split to avoid getting empty entries (https://msdn.microsoft.com/en-us/library/ms131448(v=vs.110).aspx).
List<string> words = resultString.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries)
.ToList();
Also, if you know that "data" and "array" are in the string and you never want them, replace them with blanks before you split the string.
resultString = resultString.Replace("data", String.Empty)
.Replace("array", String.Empty);
What I don't understand is how the comma can be both a field delimiter and a meaningful character, and how you can possibly know the difference (i.e. whether 25,50 should be a single value or two values).
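On the speed question itself: Int32.Parse throws for every token that is not a number, and throwing and catching exceptions in a loop is comparatively expensive in .NET, so that is a likely (not measured) contributor. A minimal sketch using int.TryParse instead, under two assumptions that are not in the question: both halves of the number must be integers, and the second half is removed from the list after gluing. DecimalGlue and GlueDecimals are illustrative names.

```csharp
using System;
using System.Collections.Generic;

static class DecimalGlue
{
    // Hypothetical helper: turns "attrValue", "21", "55" into
    // "attrValue", "21.55" in place, without relying on exceptions.
    public static void GlueDecimals(List<string> words)
    {
        // i + 2 must be a valid index, so the loop stops two items early
        for (int i = 0; i + 2 < words.Count; i++)
        {
            if (words[i] != "attrValue") continue;

            // Assumption: both parts of a split decimal are plain integers
            if (int.TryParse(words[i + 1], out _) && int.TryParse(words[i + 2], out _))
            {
                words[i + 1] += "." + words[i + 2];
                // Assumption: the fractional part is no longer wanted as its own element
                words.RemoveAt(i + 2);
            }
        }
    }
}
```

Whether this recovers the lost 30% would have to be checked with the same stopwatch test.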

Reading file names, respectively?

I have 1000 files in a folder and I want to read the file names, but when I do, the file names are not sorted.
For example: These are my filenames
1-CustomerApp.txt
2-CustomerApp.txt
3-CustomerApp.txt
...
var adddress = @"data\";
str = "";
DirectoryInfo d = new DirectoryInfo(adddress); // Assuming Test is your Folder
FileInfo[] Files = d.GetFiles("*.txt"); // Getting Text files
foreach (FileInfo file in Files)
{
    str = str + ", " + file.Name;
}
You could use this to keep a "numerical" order
FileInfo[] files = d.GetFiles("*.txt")
.OrderBy(m => m.Name.PadLeft(200, '0')).ToArray();
200 is quite arbitrary, of course.
This will add "as many 0 as needed" so that the file name + n 0 are a string with a length of 200.
To make it a little less arbitrary and "brittle", you could do
var f = d.GetFiles("*.txt");
var maxLength = f.Select(l => l.Name.Length).Max() + 1;
FileInfo[] files = f.OrderBy(m => m.Name.PadLeft(maxLength, '0')).ToArray();
How does it work (bad explanation, someone could do better) and why is it far from perfect :
Ordering of characters in c# puts numeric characters before alpha characters.
So 0 (or 9) will be before a.
But 10a, will be before 2a as 1 comes before 2.
see char after char
1 / 2 => 10a first
But if we change
10a and 2a, with PadLeft, to
010a and 002a, we see, character after character
0 / 0 => equal
1 / 0 => 002a first
This "works" in your case, but really depends on your "file naming" logic.
CAUTION : This solution won't work with other file naming logic
For example, the numeric part is not at the start.
f-2-a and f-10-a
Because
00-f-2-a would still be before 0-f-10-a
or the "non-numeric part" is not of the same length.
1-abc and 2-z
Because
01-abc will come after 0002-z
They are sorted alphabetically. I don't know how you can tell that they are not sorted (what you compare against, or where you have seen a different result); if you are using a file manager, then perhaps it applies its own sorting. In Windows Explorer the result will be the same (try sorting by the Name column).
If you know the template by which the file names are created, you can use a trick to apply your own sorting. In your example, extract the number at the start (up to the minus sign), pad it with zeroes (up to the maximum possible number of digits), and then sort the result.
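A minimal sketch of that trick, assuming names of the form number-rest as in the question. Instead of padding the number with zeroes it parses it to an int and sorts on that, which has the same effect for names of this shape. LeadingNumber, Key and Sort are illustrative names; names without a leading number are placed last, which is an arbitrary choice.

```csharp
using System;
using System.Linq;

static class LeadingNumber
{
    // Hypothetical helper: returns the integer in front of the first '-',
    // or int.MaxValue when the name does not start with such a number.
    public static int Key(string fileName)
    {
        int dash = fileName.IndexOf('-');
        if (dash > 0 && int.TryParse(fileName.Substring(0, dash), out int n))
        {
            return n;
        }
        return int.MaxValue;
    }

    // Sorts by the leading number, then by the name itself to break ties.
    public static string[] Sort(string[] names)
    {
        return names
            .OrderBy(name => Key(name))
            .ThenBy(name => name, StringComparer.Ordinal)
            .ToArray();
    }
}
```

With the question's code this could be applied as LeadingNumber.Sort(Files.Select(f => f.Name).ToArray()).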
Or you can pad with zeroes when files are created:
0001-CustomerApp.txt
0002-CustomerApp.txt
...
9999-CustomerApp.txt
This should give you more or less what you want; it skips any characters until it hits a number, then includes all directly following numerical characters to get the number. That number is then used as a key for sorting. Somewhat messy perhaps, but should work (and be less brittle than some other options, since this searches explicitly for the first "whole" number within the filename):
var filename1 = "10-file.txt";
var filename2 = "2-file.txt";
var filenames = new []{filename1, filename2};
var sorted =
    filenames.Select(fn => new {
            nr = int.Parse(new string(
                fn.SkipWhile(c => !Char.IsNumber(c))
                  .TakeWhile(c => Char.IsNumber(c))
                  .ToArray())),
            name = fn
        })
        .OrderBy(file => file.nr)
        .Select(file => file.name);
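Try alphanumeric sorting and sort the file names with it.
FileInfo[] files = d.GetFiles("*.txt");
// AlphanumComparatorFast is not part of .NET; it has to be added to the project
// and must implement IComparer<string> to be usable with OrderBy.
// OrderBy returns a new sequence, so the result has to be assigned.
files = files.OrderBy(file => file.Name, new AlphanumComparatorFast()).ToArray();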

How to get the files in numeric order from the specified directory in c#?

I have to retrieve a list of file names from a specific directory in numeric order. The file names are a combination of strings and numeric values, but they end with the numeric values.
For example: page_1.png, page_2.png, page_3.png, ..., page_10.png, page_11.png, page_12.png...
My C# code is below:
string filePath="D:\\vs-2010projects\\delete_sample\\delete_sample\\myimages\\";
string[] filePaths = Directory.GetFiles(filePath, "*.png");
It retrieved in the following format:
page_1.png
page_10.png
page_11.png
page_12.png
page_2.png...
I am expecting to retrieve the list ordered like this:
page_1.png
page_2.png
page_3.png
[...]
page_10.png
page_11.png
page_12.png
Ian Griffiths has a natural sort for C#. It makes no assumptions about where the numbers appear, and even correctly sorts filenames with multiple numeric components, such as app-1.0.2, app-1.0.11.
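For illustration only, here is a minimal sketch of what a natural-sort comparer can look like. This is not Ian Griffiths' implementation; NaturalComparer is a made-up name, and the sketch only treats ASCII digits as numbers and compares text chunks case-insensitively.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

sealed class NaturalComparer : IComparer<string>
{
    // Splits a name into alternating runs of digits and non-digits
    private static readonly Regex Chunk = new Regex("[0-9]+|[^0-9]+");

    private static bool IsNumber(string chunk)
    {
        return chunk[0] >= '0' && chunk[0] <= '9';
    }

    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        MatchCollection a = Chunk.Matches(x);
        MatchCollection b = Chunk.Matches(y);
        int shared = Math.Min(a.Count, b.Count);

        for (int i = 0; i < shared; i++)
        {
            string ca = a[i].Value;
            string cb = b[i].Value;
            int result;

            if (IsNumber(ca) && IsNumber(cb))
            {
                // Compare digit runs by value without parsing (no overflow):
                // after dropping leading zeros, the longer run is the larger number
                string ta = ca.TrimStart('0');
                string tb = cb.TrimStart('0');
                result = ta.Length != tb.Length
                    ? ta.Length.CompareTo(tb.Length)
                    : string.CompareOrdinal(ta, tb);
            }
            else
            {
                result = string.Compare(ca, cb, StringComparison.OrdinalIgnoreCase);
            }

            if (result != 0) return result;
        }

        // All shared chunks are equal: the name with fewer chunks goes first
        return a.Count.CompareTo(b.Count);
    }
}
```

With the question's array it could be used as Array.Sort(filePaths, new NaturalComparer());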
You can try the following code, which sorts your file names based on their numeric values. Keep in mind that this logic relies on some conventions, such as the presence of '_'. You are free to modify the code to make it more defensive against names that do not follow the convention.
var vv = new DirectoryInfo(@"C:\Image").GetFileSystemInfos("*.bmp")
    .OrderBy(fs => int.Parse(fs.Name.Split('_')[1].Substring(0, fs.Name.Split('_')[1].Length - fs.Extension.Length)));
First you can extract the number:
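static int ExtractNumber(string text)
{
    Match match = Regex.Match(text, @"_(\d+)\.(png)");
    // Regex.Match never returns null; check Success instead
    if (!match.Success)
    {
        return 0;
    }
    int value;
    // Groups[1] holds the digits; match.Value would be the whole "_10.png"
    if (!int.TryParse(match.Groups[1].Value, out value))
    {
        return 0;
    }
    return value;
}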
Then you could sort your list using:
list.Sort((x, y) => ExtractNumber(x).CompareTo(ExtractNumber(y)));
Maybe this?
string[] filePaths = Directory.GetFiles(filePath, "*.png").OrderBy(n => n).ToArray();
EDIT: As Marcelo pointed out, I believe you can get all the file names, extract their numerical part with a regex, and then sort the file names by that number.
This code would do that:
var dir = @"C:\Pictures";
var sorted = (from fn in Directory.GetFiles(dir)
              // match against the file name only, so digits in the directory path are ignored
              let m = Regex.Match(Path.GetFileName(fn), @"(?<order>\d+)")
              where m.Success
              let n = int.Parse(m.Groups["order"].Value)
              orderby n
              select fn).ToList();
foreach (var fn in sorted) Console.WriteLine(fn);
It also filters out files that do not have a number in their names.
You may want to change the regex pattern to match more specific file name structures.

auto detect tag within a text

Is there any library or algorithm that can auto-detect tags in a text (ignoring the common words of the chosen language)?
Something like this:
string[] keywords = GetKeyword("Your order is num #0123456789");
and keywords[] would contain "order" and "#0123456789" ...?
Does it exist? Or does the user have to select all the tags of every document by hand every time?
foreach (string keyword in keywords) { // where keywords is a List<string>
    if ("Your order is num #0123456789".Contains(keyword)) {
        keywordsPresent.Add(keyword); // where keywordsPresent is a List<string>
    }
}
return keywordsPresent;
The above does not cater for your #0123456789; for that, add some more logic to find the index of the # or something...
Sorry, I misunderstood the question. If you want to look for specific words, the algorithm will depend on your strings. For example, you can use string.Split() to generate an array of words from one string, and then work with that, like this:
string[] words = "Your order is num #0123456789".Split(' ');
string orderNumber = "";
if (words.Contains("order") && words.Any(w => w.StartsWith("#")))
{
    orderNumber = words.Where(w => w.StartsWith("#")).FirstOrDefault();
}
This will first generate an array of words from "Your order is num #0123456789"; then, if it contains the word "order", it will find a word that starts with "#" and select that.
I think a lot of different algorithms can be used. Some of them are simple, others are very complex. I can suggest the following basic approach:
1. Split the text into an array of words.
2. Remove stop words from the array. (Google "stop words list" to get a full list of stop words.)
3. Walk through the array and count the occurrences of each word.
4. Sort the words by their 'weight' (count) in the array.
5. Choose the desired number of tags.
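A minimal sketch of these steps. TagExtractor and GetKeywords are illustrative names, the stop-word list is a tiny placeholder rather than a real list, and the separator set is an assumption about what counts as a word boundary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TagExtractor
{
    // Placeholder stop-word list; a real one would be much longer
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "your", "is", "the", "a", "an", "of", "and", "to", "in" };

    public static List<string> GetKeywords(string text, int maxTags)
    {
        char[] separators = { ' ', '\t', '\r', '\n', ',', '.', ';', ':', '!', '?' };

        return text
            // split the text into words
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            // remove stop words
            .Where(word => !StopWords.Contains(word))
            // count each word, ignoring case
            .GroupBy(word => word, StringComparer.OrdinalIgnoreCase)
            // most frequent words first; ties keep their order of first appearance
            .OrderByDescending(group => group.Count())
            // keep the requested number of tags
            .Take(maxTags)
            .Select(group => group.Key)
            .ToList();
    }
}
```

For the sentence in the question, GetKeywords("Your order is num #0123456789", 3) keeps "order", "num" and "#0123456789", since "num" is not in the placeholder stop-word list.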

Extracting values from a string in C#

I have the following string, from which I would like to retrieve some values:
============================
Control 127232:
map #;-
============================
Control 127235:
map $;NULL
============================
Control 127236:
I want to take only the Control numbers. Is there a way to extract them from the string above into an array like [127232, 127235, 127236]?
One way of achieving this is with regular expressions, which do introduce some complexity but will give the answer you want, with a little LINQ for good measure.
Start with a regular expression to capture, within a group, the data you want:
var regex = new Regex(@"Control\s+(\d+):");
This will look for the literal string "Control" followed by one or more whitespace characters, followed by one or more digits (within a capture group), followed by a literal ":".
Then capture matches from your input using the regular expression defined above:
var matches = regex.Matches(inputString);
Then, using a bit of LINQ, you can turn this into an array:
var arr = matches.OfType<Match>()
    .Select(m => long.Parse(m.Groups[1].Value))
    .ToArray();
Now arr is an array of longs containing just the numbers.
Live example here: http://rextester.com/rundotnet?code=ZCMH97137
Try this (assuming your string is named s and the lines are separated by \n):
List<string> ret = new List<string>();
foreach (string t in s.Split('\n').Where(p => p.StartsWith("Control")))
    ret.Add(t.Replace("Control ", "").Replace(":", ""));
The ret.Add(...) part is not elegant, but it works...
EDITED:
If you want an array, use string[] arr = ret.ToArray();
SYNOPSIS:
I see you're really a newbie, so I'll try to explain:
s.Split('\n') creates a string[] (every line in your string)
the .Where(...) part extracts from the array only the strings starting with Control
the foreach part walks through the returned array, taking one string at a time
t.Replace(..) cuts the unwanted text out
ret.Add(...) finally adds the found items to the list being returned
Off the top of my head try this (it's quick and dirty), assuming the text you want to search is in the variable 'text':
List<string> numbers = System.Text.RegularExpressions.Regex.Split(text, "[^\\d+]").ToList();
numbers.RemoveAll(item => item == "");
The first line splits out all the numbers into separate items in a list, but it also produces lots of empty strings; the second line removes the empty strings, leaving you with a list of the three numbers. If you want to convert that back to an array, just add the following line to the end:
var numberArray = numbers.ToArray();
Yes, there is a way. I can't recall a simple one, but the string has to be parsed to extract these values. The algorithm is as follows:
1. Find the word "Control" in the string, and where it ends
2. Find the group of digits after the word
3. Extract the number with int.Parse or TryParse
4. If the end of the string has not been reached, go back to step one
Implementing this algorithm is fairly simple.
This is the simplest implementation (your string is str):
int i, number, index = 0;
while ((index = str.IndexOf(':', index)) != -1)
{
    i = index - 1;
    while (i >= 0 && char.IsDigit(str[i])) i--;
    if (++i < index)
    {
        number = int.Parse(str.Substring(i, index - i));
        Console.WriteLine("Number: " + number);
    }
    index++;
}
Using LINQ for such a little operation is doubtful.
