Count how many files start with the same first characters in C#

I want to write a function that counts how many files in a selected folder start with the same 10 characters.
For example, if the folder contains files named File1, File2, File3, int count should be 1, because all 3 file names start with the same characters "File". If the folder contains
File1,File2,File3,Docs1,Docs2,pdfs1,pdfs2,pdfs3,pdfs4
count should be 3, because there are 3 unique values of fileName.Substring(0, 4).
I've tried something like this, but it gives the total number of files in the folder.
int count = 0;
foreach (string file in Directory.GetFiles(folderLocation))
{
string fileName = Path.GetFileName(file);
if (fileName.Substring(0, 10) == fileName.Substring(0, 10))
{
count++;
}
}
Any idea how to count this?

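You can try querying the directory with the help of LINQ. Note that this returns the size of the largest group of files sharing the same first n characters, not the number of distinct prefixes: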
using System.IO;
using System.Linq;
...
int n = 10;
int count = Directory
.EnumerateFiles(folderLocation, "*.*")
.Select(file => Path.GetFileNameWithoutExtension(file))
.Select(file => file.Length > n ? file.Substring(0, n) : file)
.GroupBy(name => name, StringComparer.OrdinalIgnoreCase)
.OrderByDescending(group => group.Count())
.FirstOrDefault()
?.Count() ?? 0;

You could keep a list of the prefixes seen so far and check whether each file's prefix is already in that list:
int length = 10; // number of leading characters to compare
int count = 0;
List<string> known = new List<string>();
foreach (string file in Directory.GetFiles(folderLocation))
{
    string fileName = Path.GetFileName(file);
    // Names shorter than the prefix length are compared whole
    string prefix = fileName.Length > length ? fileName.Substring(0, length) : fileName;
    bool inKnown = false;
    foreach (string s in known)
    {
        if (s == prefix)
        {
            inKnown = true;
            break;
        }
    }
    if (!inKnown)
    {
        // First time we see this prefix: count it and remember it
        count++;
        known.Add(prefix);
    }
}
Your question asks about the first ten characters, but your examples use the first 4, so adjust the length variable to the number of characters you want to compare.

@acornTime gave me the idea. His solution didn't work for me as posted, but this did. Thanks for the help!
List<string> list = new List<string>();
foreach (string file in Directory.GetFiles(folderLocation))
{
string fileName = Path.GetFileName(file);
list.Add(fileName.Substring(0, 10));
}
list = list.Distinct().ToList();
//count how many items are in list
int count = list.Count;
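One caveat with the snippet above is that Substring(0, 10) throws for file names shorter than 10 characters. A minimal sketch that avoids this, assuming short names should simply be compared whole and that prefixes should be compared case-insensitively (both are assumptions, as are the class and method names):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PrefixCounter
{
    // Counts distinct prefixes among the given file names.
    // Names shorter than prefixLength are used whole instead of throwing.
    public static int CountDistinctPrefixes(IEnumerable<string> fileNames, int prefixLength)
    {
        // Assumption: prefixes differing only by case count as the same prefix
        var prefixes = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string name in fileNames)
        {
            prefixes.Add(name.Length > prefixLength ? name.Substring(0, prefixLength) : name);
        }
        return prefixes.Count;
    }

    // Convenience wrapper that reads the file names from a folder.
    public static int CountInFolder(string folderLocation, int prefixLength)
    {
        return CountDistinctPrefixes(
            Directory.EnumerateFiles(folderLocation).Select(f => Path.GetFileName(f)),
            prefixLength);
    }
}
```

Calling PrefixCounter.CountInFolder(folderLocation, 10) would then replace the loop, and a HashSet keeps the distinct check cheap for large folders.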

Related

How to iterate through data and create a new text file every n entries

I'm making a list of lines that need to be added to a .txt file (with tab delimitation). The text file needs to have a maximum of 500 entries plus a header.
Right now, I have this code, which is successfully iterating through my list and creating the text file with the header. If the file already exists, it appends the lines in my list without adding the header.
I can't quite figure out how to create a new file, add the header, and continue adding lines once my first file exceeds 500 entries.
Can you help me split the output into 500-line files, each with a header? Thank you.
This is the code I have so far:
var tab = new StringBuilder();
foreach (var line in textlinestoadd)
{
tab.AppendLine(line.ToString());
}
if (!File.Exists(textcsvpath))
{
string textheader = "Vendor\tDate\tInvoice\tPO\tTax\tTotal\tAcount\tType\tJobs\tClass" + Environment.NewLine;
File.WriteAllText(textcsvpath, textheader);
}
File.AppendAllLines(textcsvpath, textlinestoadd);
This seems like a good practice opportunity, so I will leave the code as an exercise!
The basic idea is simple: whenever you have written 500 lines, reset and start writing to a new file.
Here is some high-level pseudocode:
Initialize StringBuilder sb
For each line do
Add line to sb
if line count == 500 then
save to file
reset sb
reset line count
update filename = next file
end if
End For
//writes the last chunk if # of lines is not multiple of 500
if line count is not 0 then
save to file
end if
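For readers who want something concrete, the pseudocode above can be sketched roughly as follows. This is one possible reading, not the answerer's code: the class name, the method names, and the baseName.N.extension naming scheme are all assumptions made for illustration, and the header is not counted toward the per-file limit.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ChunkedWriter
{
    // Writes lines into numbered files, each holding the header plus at most
    // maxLines entries, and returns the paths that were written.
    public static List<string> Write(string directory, string baseName, string extension,
        string header, IEnumerable<string> lines, int maxLines)
    {
        if (maxLines <= 0) throw new ArgumentOutOfRangeException(nameof(maxLines));
        var written = new List<string>();
        var buffer = new List<string>();
        foreach (string line in lines)
        {
            buffer.Add(line);
            if (buffer.Count == maxLines)
            {
                // Chunk is full: save it and start over with the next file
                written.Add(Flush(directory, baseName, extension, header, buffer, written.Count));
            }
        }
        // Write the last chunk if the line count is not a multiple of maxLines
        if (buffer.Count > 0)
        {
            written.Add(Flush(directory, baseName, extension, header, buffer, written.Count));
        }
        return written;
    }

    // Saves the buffered lines under the header, then empties the buffer.
    private static string Flush(string directory, string baseName, string extension,
        string header, List<string> buffer, int index)
    {
        // Hypothetical naming scheme: baseName.0.ext, baseName.1.ext, ...
        string path = Path.Combine(directory, $"{baseName}.{index}.{extension}");
        using (var writer = new StreamWriter(path, append: false))
        {
            writer.WriteLine(header);
            foreach (string line in buffer) writer.WriteLine(line);
        }
        buffer.Clear();
        return path;
    }
}
```

Unlike the code in the question, this sketch always starts fresh files and does not append to an existing one; resuming a partially filled file would need the extra bookkeeping discussed in the other answers.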
I'd try something like this.
var tab = new StringBuilder();
int lineCount = 0;
string textheader = "Vendor\tDate\tInvoice\tPO\tTax\tTotal\tAcount\tType\tJobs\tClass" + Environment.NewLine;
if (File.Exists(textcsvpath)) {
// ReadAllLines opens and closes the file itself; a separate open stream would block the appends below
string[] fileContent = File.ReadAllLines(textcsvpath);
lineCount = fileContent.Length - 1; // assume the first line is the header
}
foreach (var line in textlinestoadd)
{
tab.AppendLine(line.ToString());
lineCount++;
if (lineCount > 0 && lineCount % 500 == 0)
{
if (!File.Exists(textcsvpath))
{
File.WriteAllText(textcsvpath, textheader);
}
File.AppendAllText(textcsvpath, tab.ToString());
tab.Clear();
textcsvpath = "some-new-file-name";
}
}
if (!File.Exists(textcsvpath))
{
File.WriteAllText(textcsvpath, textheader);
}
File.AppendAllText(textcsvpath, tab.ToString());
You'll need to replace the "some-new-file-name" placeholder with logic that generates a new file name each time a file fills up.
I'd do something like this:
const int limit = 500;
int iteration = 0;
string textHeader = "Vendor\tDate\tInvoice\tPO\tTax\tTotal\tAcount\tType\tJobs\tClass" + Environment.NewLine;
while(iteration * limit < textLinesToAdd.Count())
{
string fullPath = Path.Combine(filePath, $"{fileName}.{iteration}.{extension}");
IEnumerable<string> linesToAdd = textLinesToAdd.Skip(iteration++ * limit).Take(limit);
// WriteAllText creates (or overwrites) the file, so no separate File.Create call is needed
File.WriteAllText(fullPath, textHeader);
File.AppendAllLines(fullPath, linesToAdd);
}
Define that filename as foo and the extension as bar, and you'll get a sequence of files called foo.0.bar, foo.1.bar, foo.2.bar and so on.
I'm assuming we want to create a file with the specified name, and then have some integer placed between the name and extension that increments every time a new file is created.
One way to do this would be to have a method that takes in a filePath string, a list of lines to write, a header string, and the maximum number of lines allowed per file. Then it could parse the directory of the file path, looking for a pattern related to the file name.
It would determine what the latest file name should be based on the contents of the directory and the number of lines in the last file that matches our pattern, then would write to that file until it was full, and then continue creating new files until the lines were all written.
Here's a sample class that can do that, where I added some helper methods to get a file's number, increment that number in the name, get the latest file from a directory, and write lines to the file. It also implements IComparer<string> so that we can pass it to OrderByDescending to easily sort the files we're interested in.
public class FileWriterHelper : IComparer<string>
{
public int Compare(string x, string y)
{
// Compare null
if (x == null) return y == null ? 0 : 1;
if (y == null) return -1;
// Compare count of parts split on '.'
var xParts = x.Split('.');
var yParts = y.Split('.');
if (xParts.Length < 3) return yParts.Length < 3 ? 0 : -1;
if (yParts.Length < 3) return 1;
// Compare numeric portion
int xNum, yNum;
if (int.TryParse(xParts[1], out xNum) &&
int.TryParse(yParts[1], out yNum))
{
return xNum.CompareTo(yNum);
}
// Unknown values
return string.Compare(x, y, StringComparison.Ordinal);
}
private static int? GetFileNumber(string fileName)
{
if (string.IsNullOrWhiteSpace(fileName)) return null;
var fileParts = fileName.Split('.');
int fileNum;
if (fileParts.Length < 3 || !int.TryParse(fileParts[1], out fileNum)) return null;
return fileNum;
}
private static string IncrementNumber(string fileName)
{
var number = GetFileNumber(fileName).GetValueOrDefault() + 1;
var fileParts = fileName.Split('.');
return $"{fileParts[0]}.{number}.{fileParts[fileParts.Length - 1]}";
}
private static string GetLatestFile(string filePath, int maxLines)
{
var fileDir = Path.GetDirectoryName(filePath);
var fileName = Path.GetFileNameWithoutExtension(filePath);
var fileExt = Path.GetExtension(filePath);
var latest = Directory.GetFiles(fileDir, $"{fileName}*{fileExt}")
.OrderByDescending(f => f, new FileWriterHelper())
.FirstOrDefault() ?? filePath;
return File.Exists(latest) && File.ReadAllLines(latest).Length >= maxLines
? Path.Combine(fileDir, IncrementNumber(Path.GetFileName(latest)))
: latest;
}
public static void WriteLinesToFile(string filePath, string header,
List<string> lines, int maxFileLines)
{
while ((lines?.Count ?? 0) > 0 && maxFileLines > 0)
{
var latestFile = GetLatestFile(filePath, maxFileLines);
if (!File.Exists(latestFile)) File.CreateText(latestFile).Close();
var lineCount = File.ReadAllLines(latestFile).Length;
if (lineCount == 0 && header != null)
{
File.WriteAllText(latestFile, string.Concat(header, Environment.NewLine));
lineCount = 1;
}
var numLinesToWrite = maxFileLines - lineCount;
File.AppendAllLines(latestFile, lines.Take(numLinesToWrite));
lines = lines.Skip(numLinesToWrite).ToList();
}
}
}
That was a bit of work, but now to use it is really simple:
private static void Main()
{
// Generate 5000 lines to write
var fileLines = Enumerable.Range(0, 5000).Select(i => $"Line number {i}").ToList();
// File path with base file name
var filePath = @"f:\public\temp\temp.csv";
// The header counts toward the 500-line limit, so each file holds 499 data
// lines and this creates 11 files
FileWriterHelper.WriteLinesToFile(filePath,
"HEADER: This should be the first line in each file.", fileLines, 500);
Console.WriteLine("\nDone! Press any key to exit...");
Console.ReadKey();
}
If you run that once, it will create 11 files: the header counts toward the maximum number of lines per file, so each file holds 499 of the 5000 generated lines and the last 10 lines spill into an eleventh file. If you run it again, it first fills up the last, partially filled file and then creates new numbered files for the rest, because it uses the same path and file name pattern and recognizes the files already in that location.
I'm sure it could use some work, but hopefully it's a start!

C# : How to save a zip file every X files

I have a program written in C# which should save a zip file every n records (say, 500).
My idea was to use the modulo operator (%) and write the file whenever the result is zero. That works, but what if I have 520 records? I should write 500 files into the first zip and then the remaining 20 files into a second one.
Here is the code:
using (ZipFile zip = new ZipFile())
{
zip.CompressionLevel = Ionic.Zlib.CompressionLevel.Level8;
zip.CompressionMethod = CompressionMethod.Deflate;
int indexrow = 0;
foreach(DataRow row in in_dt.Rows)
{
zip.AddFile(row["Path"].ToString(),"prova123");
if(indexrow % 500 == 0)
{
using (var myZipFile = new FileStream("c:\\tmp\\partial_"+indexrow.ToString()+".zip", FileMode.Create))
{
zip.Save(myZipFile);
}
indexrow = indexrow++;
}
}
}
}
in_dt is a datatable which contains all the file paths on filesystem.
zip object is an object based on the dotnetzip library.
I'd use LINQ for this problem:
// Define the group size
const int GROUP_SIZE = 500;
// Select a new object type that encapsulates the base item
// and a new property called "Grouping" that will group the
// objects based on their index relative to the group size
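// DataTable.AsEnumerable() comes from System.Data.DataSetExtensions
var groups = in_dt
.AsEnumerable()
.Select(
(item, index) => new {
Item = item,
Index = index,
// Integer division already rounds down, so no Math.Floor is needed
Grouping = index / GROUP_SIZE
}
)
.GroupBy(item => item.Grouping)
;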
// Loop through the groups
foreach (var group in groups) {
// Generate a zip file for each group of files
}
For files 0 through 499, the Grouping property is 0.
For files 500 through 519 (the remaining 20 of 520), the Grouping property is 1.
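To make the grouping concrete, here is a small sketch of the same integer-division idea applied to a plain sequence instead of a DataTable. The Batching class and Split method are hypothetical names, and the actual zip creation is left out because it depends on the zip library in use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Batching
{
    // Splits items into consecutive batches of at most batchSize elements.
    public static List<List<T>> Split<T>(IEnumerable<T> items, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return items
            .Select((item, index) => new { item, index })
            // Integer division: with batchSize 500, indexes 0..499 map to 0, 500..999 to 1
            .GroupBy(x => x.index / batchSize)
            .Select(g => g.Select(x => x.item).ToList())
            .ToList();
    }
}
```

With 520 file paths and a batch size of 500 this yields two batches, of 500 and 20 items, and each batch can then be added to and saved as its own zip file.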
What you probably want to do is something like this:
void ZipFiles(string[] files, int maxFilesInZip)
{
    int parts = files.Length / maxFilesInZip;      // number of full zips
    int remaining = files.Length % maxFilesInZip;  // files left over for the last zip
    for (int i = 0; i < parts; i++)
    {
        // New zip
        for (int u = 0; u < maxFilesInZip; u++)
        {
            // Add files[i * maxFilesInZip + u]
        }
    }
    if (remaining > 0)
    {
        // New zip
        // Add the last 'remaining' files, starting at files[parts * maxFilesInZip]
    }
}
This way, if you call the function with 520 files and a limit of 250, you get two zips of 250 files each and one zip with the remaining 20. Integer division already rounds parts down, so no explicit floor or ceiling is needed.

Split text file every 120,000 Lines?

I have a text file that I need to split every 120,000 lines: once it has been split at the 120,000th line, the rest needs to go into another text file. Any ideas?
You can use Batch from MoreLINQ to group your lines into batches of 120,000 lines, which can then each be handled separately.
foreach(var batch in File.ReadLines(inputFile).Batch(120000))
WriteToFile(batch);
var lines = new List<string>();
int counter = 0,i = 1;
string line;
using (var reader = new StreamReader("filePath"))
{
while ((line = reader.ReadLine()) != null)
{
lines.Add(line);
counter++;
if (counter == 120000)
{
string fileName = String.Format("file{0}.txt",i);
File.WriteAllLines(fileName,lines);
lines.Clear();
counter = 0;
i++;
}
}
}
if (lines.Count > 0) File.WriteAllLines(String.Format("file{0}.txt", i), lines);
Note: each call to File.WriteAllLines needs a different file name, otherwise every batch just overwrites the same file's content. The counter i takes care of that here, producing file1.txt, file2.txt, and so on.
Just another way using Enumerable.GroupBy and "integer division groups":
int batchSize = 120000;
var fileGroups = File.ReadLines(path)
.Select((line, index) => new { line, index })
.GroupBy(x => x.index / batchSize)
.Select((group, index) => new {
Path = Path.Combine(dir, string.Format("FileName_{0}.txt", index + 1)),
Lines = group.Select(x => x.line)
});
foreach (var file in fileGroups)
File.WriteAllLines(file.Path, file.Lines);

Group files in a directory based on their prefix

I have a folder with pictures:
Folder 1:
Files:
ABC-138923
ABC-3223
ABC-33489
ABC-3111
CBA-238923
CBA-1313
CBA-1313
DAC-38932
DAC-1111
DAC-13893
DAC-23232
DAC-9999
I want to go through this folder and count how many pictures I have for each prefix.
For example, there are 4 pictures with prefix ABC and 3 pictures with prefix CBA above.
I'm having a hard time figuring out how to loop through this. Can anyone give me a hand?
Not a loop, but more clear and readable:
string[] fileNames = ...; //some initializing code
var prefixes = fileNames.GroupBy(x => x.Split('-')[0]).
Select(y => new {Prefix = y.Key, Count = y.Count()});
Upd:
To display the count for each prefix:
foreach (var prefix in prefixes)
{
Console.WriteLine("Prefix: {0}, Count: {1}", prefix.Prefix, prefix.Count);
}
Here it is with a 'foreach' loop:
var directoryPath = @".\Folder1\";
var prefixLength = 3;
var accumulator = new Dictionary<string, int>();
foreach (var file in System.IO.Directory.GetFiles(directoryPath)) {
// GetFiles returns full paths, so take just the file name before cutting the prefix
var prefix = System.IO.Path.GetFileName(file).Substring(0, prefixLength);
if (!accumulator.ContainsKey(prefix))
{
accumulator.Add(prefix, 0);
}
accumulator[prefix]++;
}
foreach(var prefix in accumulator.Keys) {
Console.WriteLine("{0}: {1}", prefix, accumulator[prefix]);
}
in C#,
using System.IO;
using System.Collections.Generic;
...
DirectoryInfo dir = new DirectoryInfo("C:\\yourfolder");
FileInfo[] files = dir.GetFiles();
List<string> prefix = new List<string>();
List<int> count = new List<int>();
foreach (FileInfo file in files)
{
if (prefix.Count > 0)
{
Boolean AddNew = true;
for (int i = 0; i < prefix.Count; i++)
{
if (file.Name.Substring(0, 3) == prefix[i])
{
count[i]++;
AddNew = false;
}
}
if (AddNew)
{
prefix.Add(file.Name.Substring(0, 3));
count.Add(1);
}
}
else
{
prefix.Add(file.Name.Substring(0, 3));
count.Add(1);
}
}
...
The prefix list is parallel to the count list: prefix[i] has count[i] files, so you can loop over both by index to read the results. I haven't tested or optimized it, but if you're heading down this route (C#), this could be a start.
The algorithm:
Create a dictionary:
var D = new Dictionary<string, int>();
Loop through the directory using:
foreach (var file in System.IO.Directory.GetFiles(dir))
...
Complete the following 3 steps for each file:
1. Extract the prefix and see if a matching key exists in D. If it does, go to step 3.
2. Insert the prefix as a new key in D, with value 0.
3. Increment the key's value by 1.
To display results when the entire directory has been processed:
foreach (KeyValuePair<string, int> pair in D)
Console.WriteLine("{0} prefix has {1} files", pair.Key, pair.Value);
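Put together, the steps above might look like the sketch below. It assumes the prefix is everything before the first '-' in the file name, as in the question's sample data; the PrefixTally class and CountByPrefix method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixTally
{
    // Counts file names per prefix, where the prefix is the text before the
    // first separator (or the whole name if there is no separator).
    public static Dictionary<string, int> CountByPrefix(
        IEnumerable<string> fileNames, char separator = '-')
    {
        var counts = new Dictionary<string, int>();
        foreach (string name in fileNames)
        {
            int cut = name.IndexOf(separator);
            string prefix = cut >= 0 ? name.Substring(0, cut) : name;
            // Steps 1 and 2: look up the prefix; a missing key leaves current at 0
            counts.TryGetValue(prefix, out int current);
            // Step 3: increment the value for this prefix
            counts[prefix] = current + 1;
        }
        return counts;
    }
}
```

To run it against a folder, the names could come from Directory.GetFiles(dir) passed through Path.GetFileName, and the result can be printed with the foreach loop shown above.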

How to count number of Excel files from a folder using c#?

I need to count the number of Excel files and PDF files in a directory.
I have counted the total number of files in a directory using
System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(@"D:\");
int count = dir.GetFiles().Length;
Any suggestions?
Here's a LINQ solution.
var extensions = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
".xls",
".xlsx",
".pdf",
};
var baseDir = @"D:\";
var count = Directory.EnumerateFiles(baseDir)
.Count(filename =>
extensions.Contains(Path.GetExtension(filename)));
Use SearchPattern in GetFiles method.
dir.GetFiles("*.XLS");
int count = 0;
foreach (string file in Directory.GetFiles(@"D:\"))
{
if (file.EndsWith(".pdf") || file.EndsWith(".xls"))
{
count++;
}
}
String[] excelFiles=Directory.GetFiles("C:\\", "*.xls");
int count = Directory.GetFiles(path).Count(f =>(f.EndsWith(".xls") || f.EndsWith(".xlsx")));
simple
int count = dir.GetFiles("*.txt").Length + dir.GetFiles("*.pdf").Length;
var count = System.IO.Directory.GetFiles(@"D:\")
.Count(p => Path.GetExtension(p) == ".xls");
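Several of the snippets above compare extensions case-sensitively, so a file named REPORT.XLS would be missed by EndsWith(".xls") or == ".xls". A minimal sketch of a case-insensitive count per extension follows; the ExtensionCounter class and CountByExtension method are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ExtensionCounter
{
    // Counts names per extension, ignoring case, for the requested extensions only.
    public static Dictionary<string, int> CountByExtension(
        IEnumerable<string> fileNames, params string[] extensions)
    {
        var wanted = new HashSet<string>(extensions, StringComparer.OrdinalIgnoreCase);
        return fileNames
            .Select(name => Path.GetExtension(name))
            .Where(ext => wanted.Contains(ext))
            .GroupBy(ext => ext, StringComparer.OrdinalIgnoreCase)
            // The result dictionary is also case-insensitive, so ".XLS" finds ".xls"
            .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);
    }
}
```

For a real folder, Directory.EnumerateFiles(baseDir) can be passed as fileNames, with ".xls", ".xlsx", ".pdf" as the extensions.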
