Grouping a list by the results of a split - c#

So I have some strings in a list:
Folder1\File.png
Folder1\File2.png
File3.png
File4.png
and I would like to group these on Split('\\')[0], for example:
foreach (var group in files.GroupBy(x => //mysplit))
{
if (group.Count() > 1)
{
// this is a folder and its files are: group
}
else
{
//group is an individual file
}
}
but I'm not sure how to group the files by this split?

I would just group items that Contains() a backslash:
var lst1 = new string[] {"Folder1\\File.png", "Folder1\\File2.png" , "File3.png", "File4.png" };
var grps = lst1.GroupBy(x => x.Contains(@"\"));
foreach (var g in grps)
{
if (g.Key) // we have a path with directory
Console.WriteLine(String.Join("\r\n", g.ToList()));
else // we have an individual file
Console.WriteLine(String.Join("\r\n", g.ToList()));
}

So my solution was:
foreach (var groupedFiles in files.GroupBy(s => s.Split('\\')[0]))
{
if (Path.GetExtension(groupedFiles.Key) == string.Empty)
{
//this is a folder
var folder = groupedFiles.Key;
var folderFiles = groupedFiles.ToList();
}
else
{
//this is a file
var file = groupedFiles.First();
}
}
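A minimal sketch of the same idea, assuming every entry is a relative path that uses `\` as the separator. `PathGrouping` and `GroupByTopLevel` are hypothetical names introduced here for illustration. Entries without a separator are treated as individual files, so the sketch does not rely on the extension check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answers.
public static class PathGrouping
{
    // Maps each top-level folder name to its entries. Entries without a
    // backslash are collected under the empty-string key as loose files.
    public static Dictionary<string, List<string>> GroupByTopLevel(IEnumerable<string> files)
    {
        return files
            .GroupBy(f => f.IndexOf('\\') >= 0 ? f.Split('\\')[0] : string.Empty)
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

Calling `PathGrouping.GroupByTopLevel(files)` on the four strings from the question should give one entry keyed `Folder1` with two files and one entry keyed by the empty string with the two loose files.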

Related

Remove duplicate combination of numbers in C# from csv

I'm trying to remove the duplicate combination from a csv file.
I tried using Distinct but it seems to stay the same.
string path;
string newcsvpath = @"C:\Documents and Settings\MrGrimm\Desktop\clean.csv";
OpenFileDialog openfileDial = new OpenFileDialog();
if (openfileDial.ShowDialog() == DialogResult.OK)
{
path = openfileDial.FileName;
var lines = File.ReadLines(path);
var grouped = lines.GroupBy(line => string.Join(", ", line.Split(',').Distinct())).ToArray();
var unique = grouped.Select(g => g.First());
var buffer = new StringBuilder();
foreach (var name in unique)
{
string value = name;
buffer.AppendLine(value);
}
File.WriteAllText(newcsvpath ,buffer.ToString());
label5.Text = "Complete";
}
For example, I have a combination of
{ 1,1,1,1,1,1,1,1 } { 1,1,1,1,1,1,1,2 }
{ 2,1,1,1,1,1,1,1 } { 1,1,1,2,1,1,1,1 }
The output should be
{ 1,1,1,1,1,1,1,1 }
{ 2,1,1,1,1,1,1,1 }
From your example, it seems that you want to treat each line as a sequence of numbers, and that you consider two lines equal if one sequence is a permutation of the other.
So from reading your file, you have:
var lines = new[]
{
"1,1,1,1,1,1,1,1",
"1,1,1,1,1,1,1,2",
"2,1,1,1,1,1,1,1",
"1,1,1,2,1,1,1,1"
};
Now let's convert it to an array of number sequences:
var linesAsNumberSequences = lines.Select(line => line.Split(',')
.Select(int.Parse)
.ToArray())
.ToArray();
Or better, since we are not interested in permutations, we can sort the numbers in the sequences immediately:
var linesAsSortedNumberSequences = lines.Select(line => line.Split(',')
.Select(int.Parse)
.OrderBy(number => number)
.ToArray())
.ToArray();
When using Distinct on this, we have to pass a comparer that considers two arrays equal if they have the same elements. Let's use the one from this SO question:
var result = linesAsSortedNumberSequences.Distinct(new IEnumerableComparer<int>());
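The `IEnumerableComparer<int>` used above comes from the linked question and is not shown here. A minimal sketch of what such a comparer could look like follows. `ArrayComparer<T>` and `PermutationDedup` are hypothetical names, and the hash combination is just one reasonable choice, not the one from the linked question.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the comparer from the linked question:
// two arrays are equal when they hold the same elements in the same order.
public sealed class ArrayComparer<T> : IEqualityComparer<T[]>
{
    public bool Equals(T[] x, T[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(T[] obj)
    {
        // Combine element hashes; unchecked lets the arithmetic wrap around.
        unchecked
        {
            int hash = 17;
            foreach (T item in obj)
                hash = hash * 31 + (item == null ? 0 : item.GetHashCode());
            return hash;
        }
    }
}

public static class PermutationDedup
{
    // Returns one sorted representative per set of lines that are
    // permutations of each other.
    public static List<int[]> DistinctSorted(IEnumerable<string> lines)
    {
        return lines
            .Select(line => line.Split(',').Select(int.Parse).OrderBy(n => n).ToArray())
            .Distinct(new ArrayComparer<int>())
            .ToList();
    }
}
```

Note that the result holds the sorted sequences rather than the original lines. If the original text of the first line in each set must be kept, grouping the lines by their sorted key and taking the first line of each group would be one way to do it.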
Try it
HashSet<string> record = new HashSet<string>();
foreach (DataRow row in dtCSV.Rows)
{
StringBuilder textEditor= new StringBuilder();
foreach (string col in columns)
{
textEditor.AppendFormat("[{0}={1}]", col, row[col].ToString());
}
if (!record.Add(textEditor.ToString()))
{
// Add returned false: this row duplicates an earlier one
}
}

How to Get Groups of Files from GetFiles()

I have to process files everyday. The files are named like so:
fg1a.mmddyyyy
fg1b.mmddyyyy
fg1c.mmddyyyy
fg2a.mmddyyyy
fg2b.mmddyyyy
fg2c.mmddyyyy
fg2d.mmddyyyy
If the entire file group is there for a particular date, I can process it. If it isn't there, I should not process it. I may have several partial file groups that run over several days. So when I have fg1a.12062017, fg1b.12062017 and fg1c.12062017, I can process that group (fg1) only.
Here is my code so far. It doesn't work because I can't figure out how to get only the full groups to add to the processing file list.
fileList = Directory.GetFiles(@"c:\temp\");
string[] fileGroup1 = { "FG1A", "FG1B", "FG1C" }; // THIS IS A FULL GROUP
string[] fileGroup2 = { "FG2A", "FG2B", "FG2C", "FG2D" };
List<string> fileDates = new List<string>();
List<string> procFileList;
// get a list of file dates
foreach (string fn in fileList)
{
string dateString = fn.Substring(fn.IndexOf('.'), 9);
if (!fileDates.Contains(dateString))
{
fileDates.Add(dateString);
}
}
bool allFiles = true;
foreach (string fg in fileGroup1)
{
foreach (string fd in fileDates)
{
string finder = fg + fd;
bool foundIt = false;
foreach (string fn in fileList)
{
if (fn.ToUpper().Contains(finder))
{
foundIt = true;
}
}
if (!foundIt)
{
allFiles = false;
}
else
{
foreach (string fn in fileList)
{
procFileList.Add(fn);
}
}
}
}
foreach (string fg in fileGroup2)
{
foreach (string fd in fileDates)
{
string finder = fg + fd;
bool foundIt = false;
foreach (string fn in fileList)
{
if (fn.ToUpper().Contains(finder))
{
foundIt = true;
}
}
if (!foundIt)
{
allFiles = false;
}
else
{
foreach (string fn in fileList)
{
procFileList.Add(fn);
}
}
}
}
Any help or advice would be greatly appreciated.
Because it can sometimes get messy dealing with multiple lists, groupings, and parsing file names, I would start by creating a class that represents a FileGroupItem. This class would have a Parse method that takes in a file path, and then has properties that represent the group part and date part of the file name, as well as the full path to the file:
public class FileGroupItem
{
public string DatePart { get; set; }
public string GroupName { get; set; }
public string FilePath { get; set; }
public static FileGroupItem Parse(string filePath)
{
if (string.IsNullOrWhiteSpace(filePath)) return null;
// Split the file name on the '.' character to get the group and date parts
var fileParts = Path.GetFileName(filePath).Split('.');
if (fileParts.Length != 2) return null;
return new FileGroupItem
{
GroupName = fileParts[0],
DatePart = fileParts[1],
FilePath = filePath
};
}
}
Then, in my main code, I would create a list of the file group definitions, and then populate a list of FileGroupItems from the directory we're scanning. After that, we can determine whether any file group definition is complete by comparing its items (in a case-insensitive way) to the actual FileGroupItems we found in the directory (after first grouping the FileGroupItems by their DatePart). If the intersection of these two lists has the same number of items as the file group definition, then it's complete and we can process that group.
Maybe it will make more sense in code:
private static void Main()
{
var scanDirectory = @"f:\public\temp\";
var processedDirectory = @"f:\public\temp2\";
// The lists that define a complete group
var fileGroupDefinitions = new List<List<string>>
{
new List<string> {"FG1A", "FG1B", "FG1C"},
new List<string> {"FG2A", "FG2B", "FG2C", "FG2D"}
};
// Populate a list of FileGroupItems from the files
// in our directory, and group them on the DatePart
var fileGroups = Directory.EnumerateFiles(scanDirectory)
.Select(FileGroupItem.Parse)
.Where(f => f != null) // Parse returns null for names it can't split
.GroupBy(f => f.DatePart);
// Now go through each group and compare the items
// for that date with our file group definitions
foreach (var fileGroup in fileGroups)
{
foreach (var fileGroupDefinition in fileGroupDefinitions)
{
// Get the intersection of the group definition and this file group
var intersection = fileGroup
.Where(f => fileGroupDefinition.Contains(
f.GroupName, StringComparer.OrdinalIgnoreCase))
.ToList();
// If all the items in the definition are there, then process the files
if (intersection.Count == fileGroupDefinition.Count)
{
foreach (var fileGroupItem in intersection)
{
Console.WriteLine($"Processing file: {fileGroupItem.FilePath}");
// Move the file to the processed directory
File.Move(fileGroupItem.FilePath,
Path.Combine(processedDirectory,
Path.GetFileName(fileGroupItem.FilePath)));
}
}
}
}
Console.WriteLine("\nDone!\nPress any key to exit...");
Console.ReadKey();
}
I think you could simplify your algorithm so that each file group is just a prefix plus the number of files to expect, e.g. fg1 is 3 files for a given date.
I think your code to find the distinct dates present is a good idea, though you should use a hash set rather than a list if you occasionally expect a large number of dates. ("Valentine's Day?" - Ed)
Then you just need to work on the other loop that does the checking. An algorithm like this:
//make a new Dictionary<string,int> for the filegroup prefixes and their counts
//eg myDict["fg1"] = 3; myDict["fg2"] = 4;
//list the files in the directory, into an array of fileinfo objects
//see the DirectoryInfo.GetFiles method
//foreach string d in the list of dates
//foreach string fgKey in myDict.Keys - the list of group prefixes
//use a bit of Linq to get all the fileinfos with a
//name starting with the group and ending with the date
var grplist = myfileinfos.Where(fi => fi.Name.StartsWith(fgKey) && fi.Name.EndsWith(d));
//if the grplist.Count == the filegroup count ( myDict[fgKey] )
//then send every file in grplist for processing
//remember that grplist is a collection of fileinfo objects,
//if your processing method takes a string filename, use fileinfo.FullName
Putting your file groupings into one dictionary will make things a lot easier than having them as x separate arrays.
I haven't written all the code for you, but I've sketched the algorithm in comments and put in some of the more awkward bits, like the LINQ, the dictionary declaration and how to fill it. Have a go at fleshing it out with code, and ask any questions in a comment on this post.
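One way the commented sketch above could be fleshed out is shown below. It is only a sketch under stated assumptions: file names look like `fg1a.12062017`, prefix matching is case-insensitive, and it works on plain file name strings rather than `FileInfo` objects so it can be tried without touching the disk. `CompleteGroups.Find` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper illustrating the prefix/count idea.
public static class CompleteGroups
{
    // expectedCounts maps a group prefix (e.g. "fg1") to the number of
    // files that make the group complete for one date.
    public static List<string> Find(IEnumerable<string> filePaths,
                                    IDictionary<string, int> expectedCounts)
    {
        var files = filePaths.ToList();

        // Distinct date suffixes, taken from the extension (".12062017").
        // Files without an extension are ignored.
        var dates = new HashSet<string>(
            files.Select(f => Path.GetExtension(f)).Where(e => e.Length > 0));

        var result = new List<string>();
        foreach (var date in dates)
        {
            foreach (var pair in expectedCounts)
            {
                var group = files
                    .Where(f => Path.GetFileName(f).StartsWith(pair.Key, StringComparison.OrdinalIgnoreCase)
                             && f.EndsWith(date, StringComparison.Ordinal))
                    .ToList();

                // Only a full group is handed on for processing.
                if (group.Count == pair.Value)
                    result.AddRange(group);
            }
        }
        return result;
    }
}
```

A caveat of matching on a bare prefix: `fg1` would also match a hypothetical `fg10a`, so the prefixes need to be chosen so that none is the start of another.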
First, create an array of the groups to make processing easier:
var fileGroups = new[] {
new[] { "FG1A", "FG1B", "FG1C" },
new[] { "FG2A", "FG2B", "FG2C", "FG2D" }
};
Then you can convert the array into a Dictionary to map each name back to its group:
var fileGroupMap = fileGroups.SelectMany(g => g.Select(f => new { key = f, group = g })).ToDictionary(g => g.key, g => g.group);
Then, preprocess the files you get from the directory:
var fileList = from fname in Directory.GetFiles(...)
select new {
fname,
fdate = Path.GetExtension(fname),
ffilename = Path.GetFileNameWithoutExtension(fname).ToUpper()
};
Now you can take your fileList and group by date and group, and then filter to just completed groups:
var procFileList = (from file in fileList
group file by new { file.fdate, fgroup = fileGroupMap[file.ffilename] } into fng
where fng.Key.fgroup.All(f => fng.Select(fn => fn.ffilename).Contains(f))
from fn in fng
select fn.fname).ToList();
Since you didn't preserve the groups, I flattened the groups at the end of the query into just a list of files to be processed. If you needed, you could keep them in groups and process the groups instead.
Note: If a file exists that belongs to no group, you will get an error from the lookup in fileGroupMap. If that is a possibility, you can filter the fileList to just known names as follows:
var fileList = from fname in Directory.GetFiles(...)
let ffilename = Path.GetFileNameWithoutExtension(fname).ToUpper()
where fileGroupMap.Keys.Contains(ffilename)
select new {
fname,
fdate = Path.GetExtension(fname),
ffilename
};
Also note that having a name in multiple groups will cause an error in the creation of fileGroupMap. If that is a possibility, the queries would become more complex and have to be written differently.
Here is a simple class
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string[] filenames = { "fg1a.12012017", "fg1b.12012017", "fg1c.12012017", "fg2a.12012017", "fg2b.12012017", "fg2c.12012017", "fg2d.12012017" };
new SplitFileName(filenames);
List<List<SplitFileName>> results = SplitFileName.GetGroups();
}
}
public class SplitFileName
{
public static List<SplitFileName> names = new List<SplitFileName>();
string filename { get; set; }
string prefix { get; set; }
string letter { get; set; }
DateTime date { get; set; }
public SplitFileName() { }
public SplitFileName(string[] splitNames)
{
foreach(string name in splitNames)
{
SplitFileName splitName = new SplitFileName();
names.Add(splitName);
splitName.filename = name;
string[] splitArray = name.Split(new char[] { '.' });
splitName.date = DateTime.ParseExact(splitArray[1],"MMddyyyy", System.Globalization.CultureInfo.InvariantCulture);
splitName.prefix = splitArray[0].Substring(0, splitArray[0].Length - 1);
splitName.letter = splitArray[0].Substring(splitArray[0].Length - 1,1);
}
}
public static List<List<SplitFileName>> GetGroups()
{
return names.OrderBy(x => x.letter).GroupBy(x => new { date = x.date, prefix = x.prefix })
.Where(x => string.Join(",",x.Select(y => y.letter)) == "a,b,c,d")
.Select(x => x.ToList())
.ToList();
}
}
}
With everyone's help, I solved it too. This is what I'm going with because it's the most maintainable for me but the solutions were so smart!!! Thanks everyone for your help.
private void CheckFiles()
{
var fileGroups = new[] {
new [] { "FG1A", "FG1B", "FG1C", "FG1D" },
new[] { "FG2A", "FG2B", "FG2C", "FG2D", "FG2E" } };
List<string> fileDates = new List<string>();
List<string> pfiles = new List<string>();
// get a list of file dates
foreach (string fn in fileList)
{
string dateString = fn.Substring(fn.IndexOf('.'), 9);
if (!fileDates.Contains(dateString))
{
fileDates.Add(dateString);
}
}
// check if a date has all the files
foreach (string fd in fileDates)
{
// for each file group
foreach (Array masterfg in fileGroups)
{
// reset the expected count per group; a single running total
// would never match pfiles.Count() for the second group
int fgCount = 0;
foreach (string fg in masterfg)
{
// see if all the files are there
string finder = fg + fd;
foreach (string fn in fileList)
{
if (fn.ToUpper().Contains(finder))
{
pfiles.Add(fn);
}
}
fgCount++;
}
if (fgCount == pfiles.Count())
{
foreach (string fn in pfiles)
{
procFileList.Add(fn);
}
pfiles.Clear();
}
else
{
pfiles.Clear();
}
}
}
return;
}

Extract common path from a collection of grouped file paths?

I have a question related to https://en.wikipedia.org/wiki/Longest_common_substring_problem. My source collection contains a list of file paths that don't always share a common path (other than the C:\ drive, sometimes), for example:
Source collection :
C:\Test\Root\Common\Data\a.txt
C:\Test\Root\Common\Data\Home\b.txt
C:\Test\Root\Common\Data\Home\Dev\c.txt
C:\Test2\Random\Data\a.txt
C:\Test2\Random\b.txt
C:\Test2\c.txt
D:\Data\a.txt
Output should be a collection :
C:\Test\Root\Common\Data\
C:\Test2\
D:\Data\
How to find the common path of each "group" of file paths ? I've found many solutions here but it is always with a collection of file paths sharing at least one common directory which is not the case here.
I am still not sure that I understand the problem correctly...
I hope this will work.
public List<string> ExtractCommonPaths(List<string> paths)
{
// Split each path into segments and drop the file name at the end
var separatedInput = paths
.Select(path => path.Split(new [] {":\\", "\\" }, StringSplitOptions.RemoveEmptyEntries))
.Select(path => path.Take(path.Length - 1).ToList());
// Group on drive plus first directory, then extend the common path
// one segment at a time while every path in the group agrees
return separatedInput.GroupBy(path => path[0] + ":\\" + path[1])
.Select(g =>
{
var commonPath = g.Key;
var commonPathLength = 2;
for (;;)
{
var exit = false;
var pathItem = string.Empty;
foreach (var path in g)
{
if (path.Count <= commonPathLength)
{
exit = true;
break;
}
if (pathItem == string.Empty)
pathItem = path[commonPathLength];
else
{
if (pathItem != path[commonPathLength])
{
exit = true;
break;
}
}
}
if (exit)
break;
commonPath += "\\" + pathItem;
commonPathLength++;
}
return commonPath;
})
.ToList();
}
Here is something for the case where you want to look for directories inside some location.
The method receives every directory found under an example location (C:\Files1).
To get only the top-level directories from that list:
public List<DirectoryInfo> ExtractDirectoriesCommonPaths(List<DirectoryInfo> directories, string location)
{
var result = new List<DirectoryInfo>() { };
directories.ForEach(directory =>
{
var otherDirectories = directories.Where(d => d.FullName != directory.FullName);
var anyDirectoryWithThisSamePath = otherDirectories.Where(x => directory.FullName.Contains(x.FullName) && x.FullName.Length < directory.FullName.Length);
if (anyDirectoryWithThisSamePath.Any())
{
result.Add(anyDirectoryWithThisSamePath.FirstOrDefault());
}
});
return result.Where(x => x.FullName != location && x.FullName.Length > location.Length).Distinct().ToList();
}
Input:
C:\Files1\test_announcements
C:\Files1\Parent
C:\Files1\test_announcements\test_announcements_archive
C:\Files1\Parent\child
C:\Files1\Parent\child2
C:\Files1\Parent\child\subchild
C:\Files1\test_announcements
C:\Files1\Parent
Output:
C:\Files1\test_announcements
C:\Files1\Parent

Group files in a directory based on their prefix

I have a folder with pictures:
Folder 1:
Files:
ABC-138923
ABC-3223
ABC-33489
ABC-3111
CBA-238923
CBA-1313
CBA-1313
DAC-38932
DAC-1111
DAC-13893
DAC-23232
DAC-9999
I want to go through this folder and count how many of each picture pre-fix I have.
For example, there are 4 pictures of pre-fix ABC and 3 pictures of pre-fix CBA above.
I'm having a hard time trying to figure out how to loop through this. Can anyone give me a hand?
Not a loop, but clearer and more readable:
string[] fileNames = ...; //some initializing code
var prefixes = fileNames.GroupBy(x => x.Split('-')[0]).
Select(y => new {Prefix = y.Key, Count = y.Count()});
Upd:
To display the count for each prefix:
foreach (var prefix in prefixes)
{
Console.WriteLine("Prefix: {0}, Count: {1}", prefix.Prefix, prefix.Count);
}
Here it is with a 'foreach' loop:
var directoryPath = @".\Folder1\";
var prefixLength = 3;
var accumulator = new Dictionary<string, int>();
foreach (var file in System.IO.Directory.GetFiles(directoryPath)) {
// GetFiles returns full paths, so strip the directory before taking the prefix
var prefix = System.IO.Path.GetFileName(file).Substring(0, prefixLength);
if (!accumulator.ContainsKey(prefix))
{
accumulator.Add(prefix, 0);
}
accumulator[prefix]++;
}
foreach(var prefix in accumulator.Keys) {
Console.WriteLine("{0}: {1}", prefix, accumulator[prefix]);
}
In C#:
using System.IO;
using System.Collections.Generic;
...
DirectoryInfo dir = new DirectoryInfo("C:\\yourfolder");
FileInfo[] files = dir.GetFiles();
List<string> prefix = new List<string>();
List<int> count = new List<int>();
foreach (FileInfo file in files)
{
if (prefix.Count > 0)
{
Boolean AddNew = true;
for (int i = 0; i < prefix.Count; i++)
{
if (file.Name.Substring(0, 3) == prefix[i])
{
count[i]++;
AddNew = false;
}
}
if (AddNew)
{
prefix.Add(file.Name.Substring(0, 3));
count.Add(1);
}
}
else
{
prefix.Add(file.Name.Substring(0, 3));
count.Add(1);
}
}
...
The prefix list is parallel to the count list, so to read the results you can loop through both lists by index. I haven't tested or optimized it, but if you're heading down this route (C#) this could be a start.
The algorithm:
Create a dictionary:
Dictionary<string, int> D;
Loop through the directory using:
foreach (var file in System.IO.Directory.GetFiles(dir))
...
Complete the following 3 steps for each file:
1. Extract the prefix and see if a matching key exists in D. If TRUE, go to step 3.
2. Insert the prefix as a new key in D, with value 0.
3. Increment the key's value by 1.
To display results when the entire directory has been processed:
foreach (KeyValuePair<string, int> pair in D)
Console.WriteLine("{0} prefix has {1} files", pair.Key, pair.Value);
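The steps above can be sketched as follows, assuming the prefix is whatever precedes the first `-` in the file name. `PrefixCounter.Count` is a hypothetical name, and the method takes file names directly so it can be tried without a real directory.

```csharp
using System.Collections.Generic;

// Hypothetical helper illustrating the dictionary-based count.
public static class PrefixCounter
{
    public static Dictionary<string, int> Count(IEnumerable<string> fileNames)
    {
        var counts = new Dictionary<string, int>();
        foreach (var name in fileNames)
        {
            // Everything before the first '-' is the prefix; names without
            // a '-' are counted under their full name.
            int dash = name.IndexOf('-');
            string prefix = dash >= 0 ? name.Substring(0, dash) : name;

            // TryGetValue leaves current at 0 when the key is missing,
            // which covers steps 1 and 2 in a single lookup.
            int current;
            counts.TryGetValue(prefix, out current);
            counts[prefix] = current + 1;
        }
        return counts;
    }
}
```

For a real folder, the names could come from `Directory.GetFiles(dir)` passed through `Path.GetFileName`, and the resulting dictionary can be printed with the `foreach` over `KeyValuePair<string, int>` shown above.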

Exporting directory structure to csv/xl file

My requirement is to enumerate all directories and specific .tif files (that are at the end of the structure). A sample is:
A (path selected from UI) <has>
B<has> and C<has>
D <has> E F G H I J
K L<has>
1.tif 2.tif
In the above directory, A has B and C (named as clients). B has D, E, F (as dated), and D has K and L (family).
So I need your help in retrieving the directory structure in a txt or Excel file as
B D
K 0
L 2 (since there are two tif files)
E
F
Similarly for C and other directories.
Maybe this would do the trick (or give at least a good start point):
public void OutputStructureToFile(string outputFileName, string folder, string searchPattern)
{
using (var file = new StreamWriter(outputFileName))
{
file.Write(GetStructure(new DirectoryInfo(folder), searchPattern));
}
}
public string GetStructure(DirectoryInfo directoryInfo, string searchPattern)
{
return GetStructureRecursive(directoryInfo, searchPattern, 0);
}
private string GetStructureRecursive(DirectoryInfo directoryInfo, string searchPattern, int level)
{
var sb = new StringBuilder();
var indentation = level * 5;
sb.Append(new String(' ', indentation));
sb.AppendLine(directoryInfo.Name);
foreach (var directory in directoryInfo.GetDirectories())
{
sb.Append(GetStructureRecursive(directory, searchPattern, level+1));
}
var groupedByExtension = directoryInfo.GetFiles(searchPattern)
.GroupBy(file => file.Extension)
.Select(group => new { Group = group.Key, Count = group.Count() });
foreach (var entry in groupedByExtension)
{
sb.Append(new String(' ', indentation));
sb.AppendLine(String.Format(" {0,10} {1,3}", entry.Group, entry.Count));
}
return sb.ToString();
}
And if you need it for Excel as a .csv file you should instead use this recursive function
private string GetStructureRecursiveForCsv(DirectoryInfo directoryInfo, string searchPattern, int level)
{
var sb = new StringBuilder();
var indentation = level;
sb.Append(new String(';', indentation));
sb.AppendLine(directoryInfo.Name);
foreach (var directory in directoryInfo.GetDirectories())
{
sb.Append(GetStructureRecursiveForCsv(directory, searchPattern, level+1));
}
var groupedByExtension = directoryInfo.GetFiles(searchPattern)
.GroupBy(file => file.Extension)
.Select(group => new { Group = group.Key, Count = group.Count() });
foreach (var entry in groupedByExtension)
{
sb.Append(new String(';', indentation));
sb.AppendLine(String.Format(";{0};{1}", entry.Group, entry.Count));
}
return sb.ToString();
}
I am not sure that I understand what you want, but here is something to get you started:
private static void ProcessFolder(string folder, string level, string separator, StreamWriter output)
{
var dirs = Directory.GetDirectories(folder);
foreach ( var d in dirs )
{
output.Write(level);
output.WriteLine(d);
ProcessFolder(d, level + separator, separator, output);
}
Console.WriteLine();
var files = Directory.GetFiles(folder);
foreach ( var f in files )
{
output.Write(level);
output.WriteLine(f);
}
}
You will have to customize it to filter TIFF or whatever you want. You can call this function like this and it will generate the file
using ( var output = new StreamWriter(@"C:\test.csv") )
{
ProcessFolder(@"c:\Program files", "", ";", output);
}
Double-click on the generated file and Excel will probably open :)
