I need to get a list of all files on a handheld device whose names fit a certain pattern, such as "ABC.XML"
I adapted code from here (Hernaldo's answer), like so:
public static List<string> GetXMLFiles(string fileType, string dir)
{
string dirName = dir; // call it like so: GetXMLFiles("ABC", "\\"); <= I think the double-whack is what I need for Windows CE device...am I right?
var fileNames = new List<String>();
try
{
foreach (string f in Directory.GetFiles(dirName))
{
if ((f.Contains(fileType)) && (f.Contains(".XML")))
{
fileNames.Add(f);
}
}
foreach (string d in Directory.GetDirectories(dirName))
{
GetXMLFiles(fileType, d);
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
return fileNames;
}
...but each time the method recursively calls itself (in the GetDirectories() loop), I'm passing the same old first arg. Is it possible (in Compact Framework) to do something like this instead:
public static List<string> GetXMLFiles(optional string fileType, string dir)
{
. . .
foreach (string d in Directory.GetDirectories(dirName))
{
GetXMLFiles(dir = d);
}
. . .
?
UPDATE
According to Habib, this should work (new "try" section):
try
{
string filePattern = string.Format("*{0}*.XML", fileType);
foreach (string f in Directory.GetFiles(dirName, filePattern))
{
fileNames.Add(f);
}
foreach (string d in Directory.GetDirectories(dirName))
{
GetXMLFiles(fileType, d);
}
}
Ja?
UPDATE 2
This goes along with my second response to Alan's comment below:
const string EXTENSION = ".XML";
. . .
try
{
foreach (string f in Directory.GetFiles(dirName))
{
string ext = Path.GetExtension(f);
string fileNameOnly = Path.GetFileNameWithoutExtension(f);
if ((ext.Equals(EXTENSION, StringComparison.Ordinal)) && (fileNameOnly.Contains(fileType)))
{
fileNames.Add(f);
}
}
foreach (string d in Directory.GetDirectories(dirName))
{
GetXMLFiles(fileType, d);
}
}
Optional parameters are a C# 4.0 language feature, so you won't be able to use them here. The typical solution for a recursive function is to create two overloads of the method with different parameters.
For example,
void BinarySearch(int[] a)
{
BinarySearch(a, -1, a.Length);
}
void BinarySearch(int[] a, int low, int high)
{
//code to update low and high, then recurse
BinarySearch(a, low, high);
}
In this case, the method which kicks off the recursion has a slightly different signature. You could do the same.
Directory.GetFiles has an overload that takes a search pattern; you can use that:
Directory.GetFiles(dirName, "ABC.XML")
You can also use wildcards like "*.XML" which would return all XML files.
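To tie this back to the method in the question, here is a minimal sketch using the search-pattern overload (the pattern string is illustrative). Note that it also adds the results of each recursive call to the list, which the original code discards:

public static List<string> GetXMLFiles(string fileType, string dir)
{
    var fileNames = new List<string>();
    try
    {
        // e.g. fileType = "ABC" -> pattern "*ABC*.XML"
        string pattern = string.Format("*{0}*.XML", fileType);
        fileNames.AddRange(Directory.GetFiles(dir, pattern));

        foreach (string d in Directory.GetDirectories(dir))
        {
            // keep the results of the recursive call instead of throwing them away
            fileNames.AddRange(GetXMLFiles(fileType, d));
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
    return fileNames;
}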
Related
I have a program that iterates through all files in a directory and its subdirectories. It works smoothly, but there is a minor issue that my brain can't solve.
Whoever finds the simplest way to solve it is a genius :)
Here is the code:
int hello(string locat)
{
string[] files = Directory.GetFiles(locat);
string[] dirs = Directory.GetDirectories(locat);
int cpt = 0;
foreach (var file in files)
{
try
{
textBox1.AppendText(file+"\r\n");
cpt++;
textBox2.AppendText(cpt.ToString()+"\r\n");
}
catch { }
}
foreach (string directory in dirs)
{
try
{
cpt += hello(directory);
}
catch { }
}
return cpt;
}
So the problem is that the output of cpt inside textBox2 follows its own logic, but not the behavior I need.
This is what it looks like:
1
2
3
1
2
1
2
...
And I want it to be 1,2,3,4,5,6,7,8,9,...
I tried EnumerateFiles instead of GetFiles and it worked smoothly too, but I ran into some permission issues, and I'm working on the .NET Framework for this project.
I haven't tried this but you can just make hello take cpt as a parameter.
int hello(string locat, ref int cpt)
{
string[] files = Directory.GetFiles(locat);
string[] dirs = Directory.GetDirectories(locat);
foreach (var file in files)
{
try
{
textBox1.AppendText(file+"\r\n");
cpt++;
textBox2.AppendText(cpt.ToString()+"\r\n");
}
catch { }
}
foreach (string directory in dirs)
{
try
{
hello(directory, ref cpt);
}
catch { }
}
return cpt;
}
Edit:
You need to call it with ref:
int cpt = 0;
hello("C:\\", ref cpt);
Here is the output I get if I run it with the following folder structure:
testfolder/
> folder1/
> a.txt
> b.txt
> c.txt
> folder2/
> a.txt
> b.txt
> c.txt
> folder3/
> a.txt
> b.txt
> c.txt
Output:
D:\testfolder\folder1\a.txt
1
D:\testfolder\folder1\b.txt
2
D:\testfolder\folder1\c.txt
3
D:\testfolder\folder2\a.txt
4
D:\testfolder\folder2\b.txt
5
D:\testfolder\folder2\c.txt
6
D:\testfolder\folder3\a.txt
7
D:\testfolder\folder3\b.txt
8
D:\testfolder\folder3\c.txt
9
A variation that avoids ref
int hello(string locat, int counter = 0)
{
string[] files = Directory.GetFiles(locat);
string[] dirs = Directory.GetDirectories(locat);
foreach (var file in files)
{
try
{
textBox2.AppendText(file + "\r\n");
counter++;
textBox2.AppendText(counter.ToString() + "\r\n");
}
catch { }
}
foreach (string directory in dirs)
{
try
{
counter = hello(directory, counter);
}
catch { }
}
return counter;
}
Your variable cpt is locally scoped, so you get a new variable instance for every recursive call. You can instead use a field (and don't increment it based on the result of your recursive call):
int cpt = 0;
void hello(string locat)
{
string[] files = Directory.GetFiles(locat);
string[] dirs = Directory.GetDirectories(locat);
foreach (var file in files)
{
try
{
textBox1.AppendText(file + "\r\n");
cpt++;
textBox2.AppendText(cpt.ToString() + "\r\n");
}
catch { }
}
foreach (string directory in dirs)
{
try
{
hello(directory);
}
catch { }
}
}
This code is not thread-safe.
I was looking for a way to loop through all the files and folders in a given path, and I stumbled upon this:
get tree structure of a directory with its subfolders and files using C#.net in windows application
I was fascinated by Xiaoy312's post, so I took their code and modified it to serve my intended purpose, which is returning a list of all file paths under a given path:
using System;
using System.Collections.Generic;
using System.IO;
class Whatever
{
static List<string> filePaths = new List<string>();
static void Main()
{
string path = "some folder path";
DirectoryInfo directoryInfo = new DirectoryInfo(path);
IEnumerable<HierarchicalItem> items = SearchDirectory(directoryInfo, 0);
foreach (var item in items) { } // my query is about this line.
PrintList(filePaths);
Console.Read();
}
static void PrintList(List<string> list)
{
foreach(string path in list)
{
Console.WriteLine(path);
}
}
public static IEnumerable<HierarchicalItem> SearchDirectory(DirectoryInfo directory, int deep = 0)
{
yield return new HierarchicalItem(directory.Name, deep);
foreach (DirectoryInfo subdirectory in directory.GetDirectories())
{
foreach (HierarchicalItem item in SearchDirectory(subdirectory, deep + 1))
{
yield return item;
}
}
foreach (var file in directory.GetFiles())
{
filePaths.Add(file.FullName);
yield return new HierarchicalItem(file.Name + file.Extension, deep + 1);
}
}
}
Now I know the general idea of recursion and how the function calls itself, etc. But while testing the code by trial and error, I noticed that it doesn't matter whether that last foreach in the Main method is empty or not; also, when that foreach is removed, filePaths is no longer filled.
My Questions:
Why does that last foreach in the Main method fill the list even though its body is empty? And why does the list stop being filled when it is removed?
Can someone walk me through the steps of the recursion cycle, such as:
SearchDirectory called,
the Empty foreach iterates the first item,
SearchDirectory returns new HierarchicalItem of the path folder.
SearchDirectory loops inside each directory, etc.
I will be grateful for that, especially Question 2.
Thank you very much
IEnumerables are generally lazy – they are only evaluated/produced when they are enumerated/iterated. Without the foreach loop, it is never iterated, therefore never executed.
It is somewhat odd for your IEnumerable generator function to have side-effects that will only be executed when the enumerable is consumed.
Behind the scenes, functions with yield return statements are transformed into state machines which will produce the output on-demand.
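To make that concrete, here is a rough, hand-written approximation of the kind of state machine the compiler generates for a trivial iterator. The real generated class has more states and plumbing; this is only a sketch of the idea:

using System.Collections;
using System.Collections.Generic;

// Sketch: roughly what an iterator containing "yield return 1; yield return 2;"
// is turned into. Each MoveNext() call advances the state and produces
// the next value on demand.
class TwoNumbers : IEnumerable<int>, IEnumerator<int>
{
    private int state = 0;

    public int Current { get; private set; }
    object IEnumerator.Current { get { return Current; } }

    public bool MoveNext()
    {
        switch (state)
        {
            case 0: Current = 1; state = 1; return true; // first yield return
            case 1: Current = 2; state = 2; return true; // second yield return
            default: return false;                       // sequence exhausted
        }
    }

    public IEnumerator<int> GetEnumerator() { return this; }
    IEnumerator IEnumerable.GetEnumerator() { return this; }

    public void Reset() { state = 0; }
    public void Dispose() { }
}

A foreach loop just calls GetEnumerator() once and then MoveNext()/Current repeatedly, which is why nothing in the iterator body runs until something actually enumerates it.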
Here's a simpler example showcasing the lazy behavior:
class Program
{
static void Main()
{
Console.Out.WriteLine("0");
IEnumerable<string> items = Generate("a", "b", "c");
Console.Out.WriteLine("1");
foreach (string item in items) {
Console.Out.WriteLine("for: " + item);
}
Console.Out.WriteLine("2");
foreach (string item in items)
;
Console.Out.WriteLine("3");
}
public static IEnumerable<string> Generate(params string[] args)
{
foreach (string arg in args) {
Console.Out.WriteLine("Generate: " + arg);
yield return arg;
}
}
}
Output of the above program:
0
1
Generate: a
for: a
Generate: b
for: b
Generate: c
for: c
2
Generate: a
Generate: b
Generate: c
3
Furthermore, yield return doesn't have to occur inside a loop; it can be used standalone and multiple times in a single function:
class Program
{
static void Main()
{
Console.Out.WriteLine("0");
IEnumerable<string> items = Generate();
Console.Out.WriteLine("1");
foreach (string item in items) {
Console.Out.WriteLine(item);
}
Console.Out.WriteLine("2");
}
public static IEnumerable<string> Generate()
{
yield return "x";
yield return "y";
yield return "z";
}
}
Output:
0
1
x
y
z
2
And for bonus points, consider the following program:
class Program
{
static void Main()
{
foreach (string item in Generate("a", "b", "c")) {
Console.Out.WriteLine("for: " + item);
}
Generate("42").ToList();
}
public static IEnumerable<string> Generate(params string[] args)
{
foreach (string arg in args) {
Console.Out.WriteLine("Generating: " + arg);
yield return arg;
yield return arg;
Console.Out.WriteLine("Generated: " + arg);
}
}
}
Its output is:
Generating: a
for: a
for: a
Generated: a
Generating: b
for: b
for: b
Generated: b
Generating: c
for: c
for: c
Generated: c
Generating: 42
Generated: 42
Now that we have covered the basics, what your code should probably be doing instead is to get rid of the side effect:
Yield all directories
Iterate those directories and yield their files
Something along the lines of:
static void Main()
{
string path = "some folder path";
DirectoryInfo directoryInfo = new DirectoryInfo(path);
IEnumerable<DirectoryInfo> dirs = SearchDirectory(directoryInfo);
IEnumerable<string> filePaths = GetFiles(dirs);
PrintList(filePaths);
Console.Read();
}
public static IEnumerable<DirectoryInfo> SearchDirectory(DirectoryInfo directory, int deep = 0)
{
yield return directory;
foreach (DirectoryInfo subdirectory in directory.GetDirectories())
{
foreach (DirectoryInfo item in SearchDirectory(subdirectory, deep + 1))
{
yield return item;
}
}
}
public static IEnumerable<string> GetFiles(IEnumerable<DirectoryInfo> dirs) {
foreach (var dir in dirs)
{
foreach (var file in dir.GetFiles())
{
yield return file.FullName;
}
}
}
I have to process files every day. The files are named like so:
fg1a.mmddyyyy
fg1b.mmddyyyy
fg1c.mmddyyyy
fg2a.mmddyyyy
fg2b.mmddyyyy
fg2c.mmddyyyy
fg2d.mmddyyyy
If the entire file group is there for a particular date, I can process it. If it isn't there, I should not process it. I may have several partial file groups that run over several days. So when I have fg1a.12062017, fg1b.12062017 and fg1c.12062017, I can process that group (fg1) only.
Here is my code so far. It doesn't work because I can't figure out how to get only the full groups added to the processing file list.
fileList = Directory.GetFiles(@"c:\temp\");
string[] fileGroup1 = { "FG1A", "FG1B", "FG1C" }; // THIS IS A FULL GROUP
string[] fileGroup2 = { "FG2A", "FG2B", "FG2C", "FG2D" };
List<string> fileDates = new List<string>();
List<string> procFileList;
// get a list of file dates
foreach (string fn in fileList)
{
string dateString = fn.Substring(fn.IndexOf('.'), 9);
if (!fileDates.Contains(dateString))
{
fileDates.Add(dateString);
}
}
bool allFiles = true;
foreach (string fg in fileGroup1)
{
foreach (string fd in fileDates)
{
string finder = fg + fd;
bool foundIt = false;
foreach (string fn in fileList)
{
if (fn.ToUpper().Contains(finder))
{
foundIt = true;
}
}
if (!foundIt)
{
allFiles = false;
}
else
{
foreach (string fn in fileList)
{
procFileList.Add(fn);
}
}
}
}
foreach (string fg in fileGroup2)
{
foreach (string fd in fileDates)
{
string finder = fg + fd;
bool foundIt = false;
foreach (string fn in fileList)
{
if (fn.ToUpper().Contains(finder))
{
foundIt = true;
}
}
if (!foundIt)
{
allFiles = false;
}
else
{
foreach (string fn in fileList)
{
procFileList.Add(fn);
}
}
}
}
Any help or advice would be greatly appreciated.
Because it can sometimes get messy dealing with multiple lists, groupings, and parsing file names, I would start by creating a class that represents a FileGroupItem. This class would have a Parse method that takes in a file path, and then has properties that represent the group part and date part of the file name, as well as the full path to the file:
public class FileGroupItem
{
public string DatePart { get; set; }
public string GroupName { get; set; }
public string FilePath { get; set; }
public static FileGroupItem Parse(string filePath)
{
if (string.IsNullOrWhiteSpace(filePath)) return null;
// Split the file name on the '.' character to get the group and date parts
var fileParts = Path.GetFileName(filePath).Split('.');
if (fileParts.Length != 2) return null;
return new FileGroupItem
{
GroupName = fileParts[0],
DatePart = fileParts[1],
FilePath = filePath
};
}
}
Then, in my main code, I would create a list of the file group definitions, and then populate a list of FileGroupItems from the directory we're scanning. After that, we can determine whether any file group definition is complete by comparing its items (in a case-insensitive way) to the actual FileGroupItems we found in the directory (after first grouping the FileGroupItems by their DatePart). If the intersection of these two lists has the same number of items as the file group definition, then it's complete and we can process that group.
Maybe it will make more sense in code:
private static void Main()
{
var scanDirectory = @"f:\public\temp\";
var processedDirectory = @"f:\public\temp2\";
// The lists that define a complete group
var fileGroupDefinitions = new List<List<string>>
{
new List<string> {"FG1A", "FG1B", "FG1C"},
new List<string> {"FG2A", "FG2B", "FG2C", "FG2D"}
};
// Populate a list of FileGroupItems from the files
// in our directory, and group them on the DatePart
var fileGroups = Directory.EnumerateFiles(scanDirectory)
.Select(FileGroupItem.Parse)
.GroupBy(f => f.DatePart);
// Now go through each group and compare the items
// for that date with our file group definitions
foreach (var fileGroup in fileGroups)
{
foreach (var fileGroupDefinition in fileGroupDefinitions)
{
// Get the intersection of the group definition and this file group
var intersection = fileGroup
.Where(f => fileGroupDefinition.Contains(
f.GroupName, StringComparer.OrdinalIgnoreCase))
.ToList();
// If all the items in the definition are there, then process the files
if (intersection.Count == fileGroupDefinition.Count)
{
foreach (var fileGroupItem in intersection)
{
Console.WriteLine($"Processing file: {fileGroupItem.FilePath}");
// Move the file to the processed directory
File.Move(fileGroupItem.FilePath,
Path.Combine(processedDirectory,
Path.GetFileName(fileGroupItem.FilePath)));
}
}
}
}
Console.WriteLine("\nDone!\nPress any key to exit...");
Console.ReadKey();
}
I think you could simplify your algorithm so you just have file groups as a prefix and a number of files to expect; e.g. fg1 is 3 files for a given date.
I think your code to find the distinct dates present is a good idea, though you should use a hash set rather than a list if you occasionally expect a large number of dates ("Valentine's Day?" - Ed). A quick sketch of that change follows.
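For illustration, a minimal version of the date-collection loop using a HashSet (reusing the Substring call from the question) might look like this:

// Collect the distinct date suffixes; HashSet<T>.Add silently ignores
// duplicates, so no Contains() check is needed.
var fileDates = new HashSet<string>();
foreach (string fn in fileList)
{
    fileDates.Add(fn.Substring(fn.IndexOf('.'), 9));
}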
Then you just need to work on the other loop that does the checking. An algorithm like this
//make a new Dictionary<string,int> for the filegroup prefixes and their counts
//eg myDict["fg1"] = 3; myDict["fg2"] = 4;
//list the files in the directory, into an array of fileinfo objects
//see the DirectoryInfo.GetFiles method
//foreach string d in the list of dates
//foreach string fgKey in myDict.Keys - the list of group prefixes
//use a bit of Linq to get all the fileinfos with a
//name starting with the group prefix and ending with the date
var grplist = myfileinfos.Where(fi => fi.Name.StartsWith(fgKey) && fi.Name.EndsWith(d));
//if the grplist.Count == the filegroup count ( myDict[fgKey] )
//then send every file in grplist for processing
//remember that grplist is a collection of fileinfo objects,
//if your processing method takes a string filename, use fileinfo.Fullname
Putting your file groupings into one dictionary will make things a lot easier than having them as x separate arrays.
I haven't written all the code for you, but I've comment-sketched the algorithm, and I've put in some of the more awkward bits such as the LINQ, the dictionary declaration and how to fill it. Have a go at fleshing it out with code, and ask any questions in a comment on this post.
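For what it's worth, a rough fleshing-out of that comment sketch might look like the following. It assumes using System, System.IO, System.Linq and System.Collections.Generic, treats the date suffix as the file extension (as in the question's file names), and simply prints the files it would process:

// Group prefixes and the number of files that makes each group complete.
var groupCounts = new Dictionary<string, int> { { "fg1", 3 }, { "fg2", 4 } };

// List the files in the directory as FileInfo objects.
FileInfo[] myfileinfos = new DirectoryInfo(@"c:\temp\").GetFiles();

// Distinct date suffixes, e.g. ".12062017" (Extension includes the dot).
var dates = new HashSet<string>(myfileinfos.Select(fi => fi.Extension));

foreach (string d in dates)
{
    foreach (string fgKey in groupCounts.Keys)
    {
        var grplist = myfileinfos.Where(fi =>
            fi.Name.StartsWith(fgKey, StringComparison.OrdinalIgnoreCase) &&
            fi.Name.EndsWith(d, StringComparison.OrdinalIgnoreCase)).ToList();

        // Only send the group for processing if every expected file is present.
        if (grplist.Count == groupCounts[fgKey])
        {
            foreach (FileInfo fi in grplist)
            {
                Console.WriteLine("Processing " + fi.FullName);
            }
        }
    }
}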
First, create an array of the groups to make processing easier:
var fileGroups = new[] {
new[] { "FG1A", "FG1B", "FG1C" },
new[] { "FG2A", "FG2B", "FG2C", "FG2D" }
};
Then you can convert the array into a Dictionary to map each name back to its group:
var fileGroupMap = fileGroups.SelectMany(g => g.Select(f => new { key = f, group = g })).ToDictionary(g => g.key, g => g.group);
Then, preprocess the files you get from the directory:
var fileList = from fname in Directory.GetFiles(...)
select new {
fname,
fdate = Path.GetExtension(fname),
ffilename = Path.GetFileNameWithoutExtension(fname).ToUpper()
};
Now you can take your fileList and group by date and group, and then filter to just completed groups:
var profFileList = (from file in fileList
group file by new { file.fdate, fgroup = fileGroupMap[file.ffilename] } into fng
where fng.Key.fgroup.All(f => fng.Select(fn => fn.ffilename).Contains(f))
from fn in fng
select fn.fname).ToList();
Since you didn't preserve the groups, I flattened the groups at the end of the query into just a list of files to be processed. If you needed, you could keep them in groups and process the groups instead.
Note: If a file exists that belongs to no group, you will get an error from the lookup in fileGroupMap. If that is a possibility, you can filter the fileList to just known names as follows:
var fileList = from fname in GetFiles
let ffilename = Path.GetFileNameWithoutExtension(fname).ToUpper()
where fileGroupMap.Keys.Contains(ffilename)
select new {
fname,
fdate = Path.GetExtension(fname),
ffilename
};
Also note that having a name in multiple groups will cause an error in the creation of fileGroupMap. If that is a possibility, the queries would become more complex and have to be written differently.
Here is a simple class
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string[] filenames = { "fg1a.12012017", "fg1b.12012017", "fg1c.12012017", "fg2a.12012017", "fg2b.12012017", "fg2c.12012017", "fg2d.12012017" };
new SplitFileName(filenames);
List<List<SplitFileName>> results = SplitFileName.GetGroups();
}
}
public class SplitFileName
{
public static List<SplitFileName> names = new List<SplitFileName>();
string filename { get; set; }
string prefix { get; set; }
string letter { get; set; }
DateTime date { get; set; }
public SplitFileName() { }
public SplitFileName(string[] splitNames)
{
foreach(string name in splitNames)
{
SplitFileName splitName = new SplitFileName();
names.Add(splitName);
splitName.filename = name;
string[] splitArray = name.Split(new char[] { '.' });
splitName.date = DateTime.ParseExact(splitArray[1],"MMddyyyy", System.Globalization.CultureInfo.InvariantCulture);
splitName.prefix = splitArray[0].Substring(0, splitArray[0].Length - 1);
splitName.letter = splitArray[0].Substring(splitArray[0].Length - 1,1);
}
}
public static List<List<SplitFileName>> GetGroups()
{
return names.OrderBy(x => x.letter).GroupBy(x => new { date = x.date, prefix = x.prefix })
.Where(x => string.Join(",",x.Select(y => y.letter)) == "a,b,c,d")
.Select(x => x.ToList())
.ToList();
}
}
}
With everyone's help, I solved it too. This is what I'm going with because it's the most maintainable for me but the solutions were so smart!!! Thanks everyone for your help.
private void CheckFiles()
{
var fileGroups = new[] {
new [] { "FG1A", "FG1B", "FG1C", "FG1D" },
new[] { "FG2A", "FG2B", "FG2C", "FG2D", "FG2E" } };
List<string> fileDates = new List<string>();
List<string> pfiles = new List<string>();
// get a list of file dates
foreach (string fn in fileList)
{
string dateString = fn.Substring(fn.IndexOf('.'), 9);
if (!fileDates.Contains(dateString))
{
fileDates.Add(dateString);
}
}
// check if a date has all the files
foreach (string fd in fileDates)
{
int fgCount = 0;
// for each file group
foreach (Array masterfg in fileGroups)
{
foreach (string fg in masterfg)
{
// see if all the files are there
bool foundIt = false;
string finder = fg + fd;
foreach (string fn in fileList)
{
if (fn.ToUpper().Contains(finder))
{
pfiles.Add(fn);
}
}
fgCount++;
}
if (fgCount == pfiles.Count())
{
foreach (string fn in pfiles)
{
procFileList.Add(fn);
}
pfiles.Clear();
}
else
{
pfiles.Clear();
}
}
}
return;
}
I need to be able to get all files from a directory and sub directories, but I would like to give the user the option to choose the depth of sub-directories.
I.e., not just current directory or all directories, but he should be able to choose a depth of 1,2,3,4 directories etc.
I've seen many examples of walking through directory trees and none of them seemed to address this issue. And personally, I get confused with recursion... (which I currently use). I am not sure how I would track the depth during a recursive function.
Any help would be greatly appreciated.
Thanks,
David
Here is my current code (which I found here):
static void FullDirList(DirectoryInfo dir, string searchPattern, string excludeFolders, int maxSz, string depth)
{
try
{
foreach (FileInfo file in dir.GetFiles(searchPattern))
{
if (excludeFolders != "")
if (Regex.IsMatch(file.FullName, excludeFolders, RegexOptions.IgnoreCase)) continue;
myStream.WriteLine(file.FullName);
MasterFileCounter += 1;
if (maxSz > 0 && myStream.BaseStream.Length >= maxSz)
{
myStream.Close();
myStream = new StreamWriter(nextOutPutFile());
}
}
}
catch
{
// make this a separate streamwriter to accept files that failed to be read.
Console.WriteLine("Directory {0} \n could not be accessed!!!!", dir.FullName);
return; // We already got an error trying to access dir, so don't try to access it again
}
MasterFolderCounter += 1;
foreach (DirectoryInfo d in dir.GetDirectories())
{
//folders.Add(d);
// if (MasterFolderCounter > maxFolders)
FullDirList(d, searchPattern, excludeFolders, maxSz, depth);
}
}
Use a maxDepth variable that is decremented on each recursive call; then you can simply return once the desired depth has been reached.
static void FullDirList(DirectoryInfo dir, string searchPattern, string excludeFolders, int maxSz, int maxDepth)
{
if(maxDepth == 0)
{
return;
}
try
{
foreach (FileInfo file in dir.GetFiles(searchPattern))
{
if (excludeFolders != "")
if (Regex.IsMatch(file.FullName, excludeFolders, RegexOptions.IgnoreCase)) continue;
myStream.WriteLine(file.FullName);
MasterFileCounter += 1;
if (maxSz > 0 && myStream.BaseStream.Length >= maxSz)
{
myStream.Close();
myStream = new StreamWriter(nextOutPutFile());
}
}
}
catch
{
// make this a separate streamwriter to accept files that failed to be read.
Console.WriteLine("Directory {0} \n could not be accessed!!!!", dir.FullName);
return; // We already got an error trying to access dir, so don't try to access it again
}
MasterFolderCounter += 1;
foreach (DirectoryInfo d in dir.GetDirectories())
{
//folders.Add(d);
// if (MasterFolderCounter > maxFolders)
FullDirList(d, searchPattern, excludeFolders, maxSz, maxDepth - 1);
}
}
Let's start out by refactoring the code a little bit to make its work a little easier to understand.
So, the key exercise here is to recursively return all of the files that match the patterns required, but only to a certain depth. Let's get those files first.
public static IEnumerable<FileInfo> GetFullDirList(
DirectoryInfo dir, string searchPattern, int depth)
{
foreach (FileInfo file in dir.GetFiles(searchPattern))
{
yield return file;
}
if (depth > 0)
{
foreach (DirectoryInfo d in dir.GetDirectories())
{
foreach (FileInfo f in GetFullDirList(d, searchPattern, depth - 1))
{
yield return f;
}
}
}
}
This just simplifies the job of recursing for your files.
But you'll notice that it doesn't exclude files based on the excludeFolders parameter. Let's tackle that now and start building FullDirList.
The first statement would be
var results =
from fi in GetFullDirList(dir, searchPattern, depth)
where String.IsNullOrEmpty(excludeFolders)
|| !Regex.IsMatch(fi.FullName, excludeFolders, RegexOptions.IgnoreCase)
group fi.FullName by fi.Directory.FullName;
This goes and gets all of the files, restricts them against excludeFolders and then groups all the files by the folders they belong to. We do this so that we can do this next:
var directoriesFound = results.Count();
var filesFound = results.SelectMany(fi => fi).Count();
Now I noticed that you were counting MasterFileCounter & MasterFolderCounter.
You could easily write:
MasterFolderCounter+= results.Count();
MasterFileCounter += results.SelectMany(fi => fi).Count();
Now, to write out these files, it appears you are trying to aggregate the file names into separate output files while keeping each output file under a maximum length (maxSz).
Here's how to do that:
var aggregateByLength =
results
.SelectMany(fi => fi)
.Aggregate(new [] { new StringBuilder() }.ToList(),
(sbs, s) =>
{
var nl = s + Environment.NewLine;
if (sbs.Last().Length + nl.Length > maxSz)
{
sbs.Add(new StringBuilder(nl));
}
else
{
sbs.Last().Append(nl);
}
return sbs;
});
Writing the files now becomes extremely simple:
foreach (var sb in aggregateByLength)
{
File.WriteAllText(nextOutPutFile(), sb.ToString());
}
So, the full thing becomes:
static void FullDirList(
DirectoryInfo dir, string searchPattern, string excludeFolders, int maxSz, int depth)
{
var results =
from fi in GetFullDirList(dir, searchPattern, depth)
where String.IsNullOrEmpty(excludeFolders)
|| !Regex.IsMatch(fi.FullName, excludeFolders, RegexOptions.IgnoreCase)
group fi.FullName by fi.Directory.FullName;
var directoriesFound = results.Count();
var filesFound = results.SelectMany(fi => fi).Count();
var aggregateByLength =
results
.SelectMany(fi => fi)
.Aggregate(new [] { new StringBuilder() }.ToList(),
(sbs, s) =>
{
var nl = s + Environment.NewLine;
if (sbs.Last().Length + nl.Length > maxSz)
{
sbs.Add(new StringBuilder(nl));
}
else
{
sbs.Last().Append(nl);
}
return sbs;
});
foreach (var sb in aggregateByLength)
{
File.WriteAllText(nextOutPutFile(), sb.ToString());
}
}
I need help searching a text file (log file) using C# and displaying the line number and the complete line that contains the search keyword.
This is a slight modification from: http://msdn.microsoft.com/en-us/library/aa287535%28VS.71%29.aspx
int counter = 0;
string line;
// Read the file and display it line by line.
System.IO.StreamReader file = new System.IO.StreamReader("c:\\test.txt");
while ((line = file.ReadLine()) != null)
{
counter++; // count the line first so the reported line numbers start at 1
if (line.Contains("word"))
{
Console.WriteLine(counter.ToString() + ": " + line);
}
}
file.Close();
Bit late to the game on this one, but happened across this post and thought I'd add an alternative answer.
foreach (var match in File.ReadLines(@"c:\LogFile.txt")
.Select((text, index) => new { text, lineNumber = index + 1 })
.Where(x => x.text.Contains("SEARCHWORD")))
{
Console.WriteLine("{0}: {1}", match.lineNumber, match.text);
}
This uses:
File.ReadLines, which eliminates the need for a StreamReader, and it also plays nicely with LINQ's Where clause to return a filtered set of lines from a file.
The overload of Enumerable.Select that returns each element's index, which you can then add 1 to, to get the line number for the matching line.
Sample Input:
just a sample line
another sample line
first matching SEARCHWORD line
not a match
...here's aSEARCHWORDmatch
SEARCHWORD123
asdfasdfasdf
Output:
3: first matching SEARCHWORD line
5: ...here's aSEARCHWORDmatch
6: SEARCHWORD123
To export to Excel you can use the CSV file format, as the Pessimist wrote. If you are uncertain about what to write, try entering some data in MS Excel, click the "Save As" option in the menu, and choose CSV as the file type.
Take care when writing a CSV file, as in some locales the default character for separating values is not the comma. In Brazilian Portuguese, for example, the default is a comma as the decimal separator, a dot as the thousands separator, and a semicolon for separating values. Mind the culture when writing the file.
The other alternative is using horizontal tabs as separators. As an experiment, write a string, press the TAB key, write another string, and paste it into Microsoft Excel; the tab is the default separator in that program.
If you're writing an ad-hoc solution to your specific problem, either alternative can be used without much thought. If you are programming something to be used by other people (or in other environments), mind the culture-specific differences.
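As a minimal illustration of that caveat, the sketch below writes a two-row file using the list separator of the machine's current culture (and number formatting follows that culture as well); the file name and data are just placeholders:

using System;
using System.Globalization;
using System.IO;

// Use the current culture's list separator (',' in en-US, ';' in pt-BR)
// so Excel on that machine splits the columns correctly.
string sep = CultureInfo.CurrentCulture.TextInfo.ListSeparator;
double[] values = { 1.5, 2.25, 3.75 };

using (var writer = new StreamWriter("export.csv"))
{
    writer.WriteLine(string.Join(sep, new[] { "A", "B", "C" }));
    // ToString() without an explicit culture also uses the culture's decimal separator.
    writer.WriteLine(string.Join(sep, Array.ConvertAll(values, v => v.ToString())));
}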
Oh, I've just remembered: you can also write a spreadsheet using XML with nothing but the .NET base libraries. I did that years ago with C# on .NET 2.0.
I had a requirement where I needed to search through a list of directories looking for particular file types, containing a specific search terms but excluding other terms.
For example, let's say you wanted to look through C:\DEV and only find .cs files that have the terms "WriteLine" and "Readline" but not the term "hello".
I decided to write a small C# utility to do just this:
This is how you call it:
class Program
{
//Syntax:
//FileSearch <Directory> EXT <ext1> <ext2> LIKE <TERM1> <TERM2> NOT <TERM3> <TERM4>
//Example:
//Search for all files recursively in C:\Dev with an extension of cs that contain either "WriteLine" or "Readline" but not "hello"
//FileSearch C:\DEV EXT .cs LIKE "WriteLine" "ReadLine" NOT "hello"
static void Main(string[] args)
{
if (args.Length == 0)
{
Console.WriteLine("FileSearch <Directory> EXT <EXT1> LIKE <TERM1> <TERM2> NOT <TERM3> <TERM4>");
return;
}
Search s = new Search(args);
s.DoSearch();
}
}
This is the implementation:
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
class Hit
{
public string File { get; set; }
public int LineNo { get; set; }
public int Pos { get; set; }
public string Line { get; set; }
public string SearchTerm { get; set; }
public void Print()
{
Console.WriteLine(File);
Console.Write("(" + LineNo + "," + Pos + ") ");
Console.WriteLine(Line);
}
}
class Search
{
string rootDir;
List<string> likeTerms;
List<string> notTerms;
List<string> extensions;
List<Hit> hitList = new List<Hit>();
//FileSearch <Directory> EXT .CS LIKE "TERM1" "TERM2" NOT "TERM3" "TERM4"
public Search(string[] args)
{
this.rootDir = args[0];
this.extensions = ParseTerms("EXT", "LIKE", args);
this.likeTerms = ParseTerms("LIKE", "NOT", args);
this.notTerms = ParseTerms("NOT", "", args);
Print();
}
public void Print()
{
Console.WriteLine("Search Dir:" + rootDir);
Console.WriteLine("Extensions:");
foreach (string s in extensions)
Console.WriteLine(s);
Console.WriteLine("Like Terms:");
foreach (string s in likeTerms)
Console.WriteLine(s);
Console.WriteLine("Not Terms:");
foreach (string s in notTerms)
Console.WriteLine(s);
}
private List<string> ParseTerms(string keyword, string stopword, string[] args)
{
List<string> list = new List<string>();
bool collect = false;
foreach (string arg in args)
{
string argu = arg.ToUpper();
if (argu == stopword)
break;
if (argu == keyword)
{
collect = true;
continue;
}
if(collect)
list.Add(arg);
}
return list;
}
private void SearchDir(string dir)
{
foreach (string file in Directory.GetFiles(dir, "*.*"))
{
string extension = Path.GetExtension(file);
if (extension != null && extensions.Contains(extension))
SearchFile(file);
}
foreach (string subdir in Directory.GetDirectories(dir))
SearchDir(subdir);
}
private void SearchFile(string file)
{
using (StreamReader sr = new StreamReader(file))
{
int lineNo = 0;
while (!sr.EndOfStream)
{
int pos = 0;
string term = "";
string line = sr.ReadLine();
lineNo++;
//Look through each likeTerm
foreach(string likeTerm in likeTerms)
{
pos = line.IndexOf(likeTerm, StringComparison.OrdinalIgnoreCase);
if (pos >= 0)
{
term = likeTerm;
break;
}
}
//If found make sure not in the not term
if (pos >= 0)
{
bool notTermFound = false;
//Look through each not Term
foreach (string notTerm in notTerms)
{
if (line.IndexOf(notTerm, StringComparison.OrdinalIgnoreCase) >= 0)
{
notTermFound = true;
break;
}
}
//If not term not found finally add to hitList
if (!notTermFound)
{
Hit hit = new Hit();
hit.File = file;
hit.LineNo = lineNo;
hit.Pos = pos;
hit.Line = line;
hit.SearchTerm = term;
hitList.Add(hit);
}
}
}
}
}
public void DoSearch()
{
SearchDir(rootDir);
foreach (Hit hit in hitList)
{
hit.Print();
}
}
}