Split audio file into pieces - c#

I'm trying to split an audio file into several pieces.
I have the WAV file as a byte array and would like to split it into a few pieces (3, for example).
Of course, I know that I can't simply do something like the code below. Does anyone have an idea how to do it?
byte[] result = stream.ToArray();
byte[] testing = new byte[44];
for (int ix = 0; ix < testing.Length; ix++)
{
testing[ix] = result[ix];
}
System.IO.File.WriteAllBytes("yourfilepath_" + System.Guid.NewGuid() + ".wav", testing);
I would like to build this solution in C#, but I heard that there is a tool called SoX that can split on silence gaps like this:
sox in.wav out.wav silence 1 0.5 1% 1 5.0 1% : newfile : restart
But every time I run this command, only one file is generated. (The audio file lasts 5 seconds, and each split file should be around 1 second long.)
What is the best way to do this?
Thank you very much!

EDIT
With SOX:
string sox = @"C:\Program Files (x86)\sox-14-4-1\sox.exe";
string inputFile = @"D:\Brothers Vibe - Rainforest.mp3";
string outputDirectory = @"D:\splittest";
string outputPrefix = "split";
int[] segments = { 10, 15, 30 };
IEnumerable<string> enumerable = segments.Select(s => "trim 0 " + s.ToString(CultureInfo.InvariantCulture));
string @join = string.Join(" : newfile : ", enumerable);
string cmdline = string.Format("\"{0}\" \"{1}%1n.wav" + "\" {2}", inputFile,
Path.Combine(outputDirectory, outputPrefix), @join);
var processStartInfo = new ProcessStartInfo(sox, cmdline);
Process start = System.Diagnostics.Process.Start(processStartInfo);
If SOX complains about libmad (for MP3), copy the required DLLs next to sox.exe.
Alternatively, you can use FFMPEG in the same manner:
ffmpeg -ss 0 -t 30 -i "Brothers Vibe - Rainforest.mp3" "Brothers Vibe - Rainforest.wav"
(see the docs for all the details)
You can do that easily with BASS.NET:
For the code below, you pass in:
input file name
desired duration for each segment
output directory
prefix to use for each segment file
The method checks whether the file is long enough for the specified segments; if so, it cuts the file into WAVs with the same sample rate, channel count, and bit depth.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Windows.Forms;
using Un4seen.Bass;
using Un4seen.Bass.Misc;
namespace WindowsFormsApplication2
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
if (!Bass.BASS_Init(-1, 44100, BASSInit.BASS_DEVICE_DEFAULT, IntPtr.Zero))
throw new InvalidOperationException("Couldn't initialize BASS");
string fileName = @"D:\Brothers Vibe - Rainforest.mp3";
var segments = new double[] {30, 15, 20};
string[] splitAudio = SplitAudio(fileName, segments, "output", @"D:\split");
}
private static string[] SplitAudio(string fileName, double[] segments, string prefix, string outputDirectory)
{
if (fileName == null) throw new ArgumentNullException("fileName");
if (segments == null) throw new ArgumentNullException("segments");
if (prefix == null) throw new ArgumentNullException("prefix");
if (outputDirectory == null) throw new ArgumentNullException("outputDirectory");
int i = Bass.BASS_StreamCreateFile(fileName, 0, 0,
BASSFlag.BASS_STREAM_PRESCAN | BASSFlag.BASS_STREAM_DECODE);
if (i == 0)
throw new InvalidOperationException("Couldn't create stream");
double sum = segments.Sum();
long length = Bass.BASS_ChannelGetLength(i);
double seconds = Bass.BASS_ChannelBytes2Seconds(i, length);
if (sum > seconds)
throw new ArgumentOutOfRangeException("segments", "Required segments exceed file duration");
BASS_CHANNELINFO info = Bass.BASS_ChannelGetInfo(i);
if (!Directory.Exists(outputDirectory)) Directory.CreateDirectory(outputDirectory);
int index = 0;
var list = new List<string>();
foreach (double segment in segments)
{
double d = segment;
long seconds2Bytes = Bass.BASS_ChannelSeconds2Bytes(i, d);
var buffer = new byte[seconds2Bytes];
int getData = Bass.BASS_ChannelGetData(i, buffer, buffer.Length);
string name = string.Format("{0}_{1}.wav", prefix, index);
string combine = Path.Combine(outputDirectory, name);
int bitsPerSample = info.Is8bit ? 8 : info.Is32bit ? 32 : 16;
var waveWriter = new WaveWriter(combine, info.chans, info.freq, bitsPerSample, true);
waveWriter.WriteNoConvert(buffer, buffer.Length);
waveWriter.Close();
list.Add(combine);
index++;
}
bool free = Bass.BASS_StreamFree(i);
return list.ToArray();
}
}
}
TODO
The extraction is not optimized. If you are concerned about memory usage, the function should be enhanced to read each segment in smaller parts and write them progressively to the WaveWriter.
Notes
BASS.NET has a nag screen, but you can request a free registration serial on their website.
Install BASS.NET, then make sure to copy bass.dll from the base package next to your EXE. You can also use pretty much any audio format; see their website for format plugins and how to load them (BASS_PluginLoad).
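For the narrower case in the question, where an uncompressed WAV is already in memory as a byte array, the split can also be sketched without any library. This is only a sketch under stated assumptions: the file has the canonical 44-byte PCM header (no extra chunks before "data"), the machine is little-endian, and the pieces are of equal length rather than cut at silence. WavSplitter and Split are hypothetical names, not part of BASS.NET or SoX.

```csharp
using System;
using System.Collections.Generic;

public static class WavSplitter
{
    // Assumption: canonical 44-byte RIFF/WAVE header, uncompressed PCM.
    private const int HeaderSize = 44;

    // Splits a WAV byte array into `pieces` WAV byte arrays of (nearly) equal length.
    public static List<byte[]> Split(byte[] wav, int pieces)
    {
        if (wav == null) throw new ArgumentNullException("wav");
        if (pieces < 1) throw new ArgumentOutOfRangeException("pieces");
        if (wav.Length < HeaderSize) throw new ArgumentException("Too short to be a WAV file", "wav");

        // Block align (bytes per sample frame) is stored at offset 32 of the header.
        int blockAlign = BitConverter.ToUInt16(wav, 32);
        if (blockAlign == 0) throw new ArgumentException("Invalid block align", "wav");

        int frames = (wav.Length - HeaderSize) / blockAlign;
        int framesPerPiece = frames / pieces;
        var result = new List<byte[]>();

        for (int p = 0; p < pieces; p++)
        {
            int startFrame = p * framesPerPiece;
            // The last piece also takes the frames left over by the integer division.
            int frameCount = (p == pieces - 1) ? frames - startFrame : framesPerPiece;
            int byteCount = frameCount * blockAlign;

            var piece = new byte[HeaderSize + byteCount];
            // Copy the original header, then the slice of sample data.
            Buffer.BlockCopy(wav, 0, piece, 0, HeaderSize);
            Buffer.BlockCopy(wav, HeaderSize + startFrame * blockAlign, piece, HeaderSize, byteCount);
            // Patch the two size fields: RIFF chunk size (offset 4) and data size (offset 40).
            Buffer.BlockCopy(BitConverter.GetBytes(36 + byteCount), 0, piece, 4, 4);
            Buffer.BlockCopy(BitConverter.GetBytes(byteCount), 0, piece, 40, 4);
            result.Add(piece);
        }
        return result;
    }
}
```

Each returned array can be written with File.WriteAllBytes, as in the question. Cutting on sample-frame boundaries keeps the channels aligned; copying only the first 44 bytes, as the question's code does, produces a header with no audio.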

multithreading vs async for directory traverse [duplicate]

I've written the following routine to manually traverse through a directory and calculate its size in C#/.NET:
protected static float CalculateFolderSize(string folder)
{
float folderSize = 0.0f;
try
{
//Checks if the path is valid or not
if (!Directory.Exists(folder))
return folderSize;
else
{
try
{
foreach (string file in Directory.GetFiles(folder))
{
if (File.Exists(file))
{
FileInfo finfo = new FileInfo(file);
folderSize += finfo.Length;
}
}
foreach (string dir in Directory.GetDirectories(folder))
folderSize += CalculateFolderSize(dir);
}
catch (NotSupportedException e)
{
Console.WriteLine("Unable to calculate folder size: {0}", e.Message);
}
}
}
catch (UnauthorizedAccessException e)
{
Console.WriteLine("Unable to calculate folder size: {0}", e.Message);
}
return folderSize;
}
I have an application which is running this routine repeatedly for a large number of folders. I'm wondering if there's a more efficient way to calculate the size of a folder with .NET? I didn't see anything specific in the framework. Should I be using P/Invoke and a Win32 API? What's the most efficient way of calculating the size of a folder in .NET?
No, this looks like the recommended way to calculate directory size; the relevant method is included below:
public static long DirSize(DirectoryInfo d)
{
long size = 0;
// Add file sizes.
FileInfo[] fis = d.GetFiles();
foreach (FileInfo fi in fis)
{
size += fi.Length;
}
// Add subdirectory sizes.
DirectoryInfo[] dis = d.GetDirectories();
foreach (DirectoryInfo di in dis)
{
size += DirSize(di);
}
return size;
}
You would call it with the root as:
Console.WriteLine("The size is {0} bytes.", DirSize(new DirectoryInfo(targetFolder)));
...where targetFolder is the folder whose size you want to calculate.
DirectoryInfo dirInfo = new DirectoryInfo(@strDirPath);
long dirSize = await Task.Run(() => dirInfo.EnumerateFiles( "*", SearchOption.AllDirectories).Sum(file => file.Length));
I do not believe there is a Win32 API to calculate the space consumed by a directory, although I stand to be corrected on this. If there were then I would assume Explorer would use it. If you get the Properties of a large directory in Explorer, the time it takes to give you the folder size is proportional to the number of files/sub-directories it contains.
Your routine seems fairly neat and simple. Bear in mind that you are calculating the sum of the file lengths, not the actual space consumed on the disk. Wasted space at the end of clusters, alternate file streams, and so on are ignored.
public static long DirSize(DirectoryInfo dir)
{
return dir.GetFiles().Sum(fi => fi.Length) +
dir.GetDirectories().Sum(di => DirSize(di));
}
The real question is, what do you intend to use the size for?
Your first problem is that there are at least four definitions for "file size":
The "end of file" offset, which is the number of bytes you have to skip to go from the beginning to the end of the file.
In other words, it is the number of bytes logically in the file (from a usage perspective).
The "valid data length", which is equal to the offset of the first byte which is not actually stored.
This is always less than or equal to the "end of file", and is a multiple of the cluster size.
For example, a 1 GB file can have a valid data length of 1 MB. If you ask Windows to read the first 8 MB, it will read the first 1 MB and pretend the rest of the data was there, returning it as zeros.
The "allocated size" of a file. This is always greater than or equal to the "end of file".
This is the number of clusters that the OS has allocated for the file, multiplied by the cluster size.
Unlike the case where the "end of file" is greater than the "valid data length", the excess bytes are not considered to be part of the file's data, so the OS will not fill a buffer with zeros if you try to read in the allocated region beyond the end of the file.
The "compressed size" of a file, which is only valid for compressed (and sparse?) files.
It is equal to the size of a cluster, multiplied by the number of clusters on the volume that are actually allocated to this file.
For non-compressed and non-sparse files, there is no notion of "compressed size"; you would use the "allocated size" instead.
Your second problem is that a "file" like C:\Foo can actually have multiple streams of data.
This name just refers to the default stream. A file might have alternate streams, like C:\Foo:Bar, whose size is not even shown in Explorer!
Your third problem is that a "file" can have multiple names ("hard links").
For example, C:\Windows\notepad.exe and C:\Windows\System32\notepad.exe are two names for the same file. Any name can be used to open any stream of the file.
Your fourth problem is that a "file" (or directory) might in fact not even be a file (or directory):
It might be a soft link (a "symbolic link" or a "reparse point") to some other file (or directory).
That other file might not even be on the same drive. It might even point to something on the network, or it might even be recursive! Should the size be infinity if it's recursive?
Your fifth problem is that there are "filter" drivers that make certain files or directories look like actual files or directories, even though they aren't. For example, Microsoft's WIM image files (which are compressed) can be "mounted" on a folder using a tool called ImageX, and those do not look like reparse points or links. They look just like directories, except that they're not actually directories, and the notion of "size" doesn't really make sense for them.
Your sixth problem is that every file requires metadata.
For example, having 10 names for the same file requires more metadata, which requires space. If the file names are short, having 10 names might be as cheap as having 1 name -- and if they're long, then having multiple names can use more disk space for the metadata. (Same story with multiple streams, etc.)
Do you count these, too?
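Two of the points above can be made concrete with a rough sketch: estimating "allocated size" by rounding each file length up to a whole number of clusters, and refusing to follow reparse points so that links cannot cause cycles or double counting. The names SizeDefinitions, RoundUpToCluster, IsReparsePoint and Measure are hypothetical, the cluster size is an assumption supplied by the caller (4096 is common on NTFS but not guaranteed), and the rounding ignores compression, sparse files, and very small files stored inside file system metadata.

```csharp
using System;
using System.IO;

public static class SizeDefinitions
{
    // Rounds a logical length up to a whole number of clusters.
    // Assumption: clusterSize comes from the caller; it is not queried from the volume.
    public static long RoundUpToCluster(long length, long clusterSize)
    {
        if (length < 0) throw new ArgumentOutOfRangeException("length");
        if (clusterSize <= 0) throw new ArgumentOutOfRangeException("clusterSize");
        return (length + clusterSize - 1) / clusterSize * clusterSize;
    }

    // True when the entry is a reparse point (symbolic link, junction, mount point).
    public static bool IsReparsePoint(FileSystemInfo entry)
    {
        return (entry.Attributes & FileAttributes.ReparsePoint) != 0;
    }

    // Accumulates both the logical size and the cluster-rounded estimate,
    // without descending into directories that are reparse points.
    public static void Measure(DirectoryInfo dir, long clusterSize, ref long logical, ref long rounded)
    {
        foreach (FileInfo file in dir.GetFiles())
        {
            logical += file.Length;
            rounded += RoundUpToCluster(file.Length, clusterSize);
        }
        foreach (DirectoryInfo sub in dir.GetDirectories())
        {
            if (IsReparsePoint(sub)) continue; // do not follow links
            Measure(sub, clusterSize, ref logical, ref rounded);
        }
    }
}
```

Alternate streams, hard links, and metadata overhead are still not accounted for here; the sketch only shows how quickly the "logical" and "on disk" numbers diverge.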
var size = new DirectoryInfo("E:\\").GetDirectorySize();
and here's the code behind this Extension method
public static long GetDirectorySize(this System.IO.DirectoryInfo directoryInfo, bool recursive = true)
{
var startDirectorySize = default(long);
if (directoryInfo == null || !directoryInfo.Exists)
return startDirectorySize; //Return 0 while Directory does not exist.
//Add size of files in the Current Directory to main size.
foreach (var fileInfo in directoryInfo.GetFiles())
System.Threading.Interlocked.Add(ref startDirectorySize, fileInfo.Length);
if (recursive) //Loop over the subdirectories of the current directory and add their sizes.
System.Threading.Tasks.Parallel.ForEach(directoryInfo.GetDirectories(), (subDirectory) =>
System.Threading.Interlocked.Add(ref startDirectorySize, GetDirectorySize(subDirectory, recursive)));
return startDirectorySize; //Return full Size of this Directory.
}
Even faster! Add a COM reference to "Windows Script Host Object..."
public double GetWSHFolderSize(string Fldr)
{
//Reference "Windows Script Host Object Model" on the COM tab.
IWshRuntimeLibrary.FileSystemObject FSO = new IWshRuntimeLibrary.FileSystemObject();
double FldrSize = (double)FSO.GetFolder(Fldr).Size;
Marshal.FinalReleaseComObject(FSO);
return FldrSize;
}
private void button1_Click(object sender, EventArgs e)
{
string folderPath = @"C:\Windows";
Stopwatch sWatch = new Stopwatch();
sWatch.Start();
double sizeOfDir = GetWSHFolderSize(folderPath);
sWatch.Stop();
MessageBox.Show("Directory size in Bytes : " + sizeOfDir + ", Time: " + sWatch.ElapsedMilliseconds.ToString());
}
It appears that the following method performs the task faster than the recursive function:
long size = 0;
DirectoryInfo dir = new DirectoryInfo(folder);
foreach (FileInfo fi in dir.GetFiles("*.*", SearchOption.AllDirectories))
{
size += fi.Length;
}
A simple console application test shows that this loop sums the files faster than the recursive function and gives the same result. You may also want to use LINQ methods (like Sum()) to shorten this code.
This solution works very well.
It collects the files from all the subfolders:
Directory.GetFiles(@"MainFolderPath", "*", SearchOption.AllDirectories).Sum(t => (new FileInfo(t).Length));
An alternative to Trikaldarshi's one-line solution. (It avoids having to construct FileInfo objects by hand.)
long sizeInBytes = new DirectoryInfo("{path}").EnumerateFiles("*", SearchOption.AllDirectories).Sum(fileInfo => fileInfo.Length);
I've been fiddling with VS2008 and LINQ up until recently and this compact and short method works great for me (example is in VB.NET; requires LINQ / .NET FW 3.5+ of course):
Dim size As Int64 = (From strFile In My.Computer.FileSystem.GetFiles(strFolder, _
FileIO.SearchOption.SearchAllSubDirectories) _
Select New System.IO.FileInfo(strFile).Length).Sum()
It's short, it searches subdirectories, and it is simple to understand if you know LINQ syntax. You could even specify wildcards to search for specific files using the third parameter of the .GetFiles function.
I'm not a C# expert but you can add the My namespace on C# this way.
I think this way of obtaining a folder size is not only shorter and more modern than the way described on Hao's link, it basically uses the same loop-of-FileInfo method described there in the end.
This is the best way to calculate the size of a directory. The only other way would still use recursion; it might be a bit easier to use, but it isn't as flexible.
long folderSize = 0;
FileInfo[] files = new DirectoryInfo(folder).GetFiles("*", SearchOption.AllDirectories);
foreach (FileInfo file in files) folderSize += file.Length;
I extended @Hao's answer using the same counting principle but supporting a richer return value, so you get back size, recursive size, directory count, and recursive directory count, N levels deep.
public class DiskSizeUtil
{
/// <summary>
/// Calculate disk space usage under <paramref name="root"/>. If <paramref name="levels"/> is provided,
/// then return subdirectory disk usages as well, up to <paramref name="levels"/> levels deep.
/// If levels is not provided or is 0, return a list with a single element representing the
/// directory specified by <paramref name="root"/>.
/// </summary>
/// <returns></returns>
public static FolderSizeInfo GetDirectorySize(DirectoryInfo root, int levels = 0)
{
var currentDirectory = new FolderSizeInfo();
// Add file sizes.
FileInfo[] fis = root.GetFiles();
currentDirectory.Size = 0;
foreach (FileInfo fi in fis)
{
currentDirectory.Size += fi.Length;
}
// Add subdirectory sizes.
DirectoryInfo[] dis = root.GetDirectories();
currentDirectory.Path = root;
currentDirectory.SizeWithChildren = currentDirectory.Size;
currentDirectory.DirectoryCount = dis.Length;
currentDirectory.DirectoryCountWithChildren = dis.Length;
currentDirectory.FileCount = fis.Length;
currentDirectory.FileCountWithChildren = fis.Length;
if (levels >= 0)
currentDirectory.Children = new List<FolderSizeInfo>();
foreach (DirectoryInfo di in dis)
{
var dd = GetDirectorySize(di, levels - 1);
if (levels >= 0)
currentDirectory.Children.Add(dd);
currentDirectory.SizeWithChildren += dd.SizeWithChildren;
currentDirectory.DirectoryCountWithChildren += dd.DirectoryCountWithChildren;
currentDirectory.FileCountWithChildren += dd.FileCountWithChildren;
}
return currentDirectory;
}
public class FolderSizeInfo
{
public DirectoryInfo Path { get; set; }
public long SizeWithChildren { get; set; }
public long Size { get; set; }
public int DirectoryCount { get; set; }
public int DirectoryCountWithChildren { get; set; }
public int FileCount { get; set; }
public int FileCountWithChildren { get; set; }
public List<FolderSizeInfo> Children { get; set; }
}
}
public static long GetDirSize(string path)
{
try
{
return Directory.EnumerateFiles(path).Sum(x => new FileInfo(x).Length)
+
Directory.EnumerateDirectories(path).Sum(x => GetDirSize(x));
}
catch
{
return 0L;
}
}
As far as the best algorithm goes, you probably have it right. I would recommend that you unravel the recursive function and use a stack of your own (remember that a stack overflow is the end of the world in a .NET 2.0+ app; the exception cannot be caught, IIRC).
The most important thing is that if you are using it in any form of a UI put it on a worker thread that signals the UI thread with updates.
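The suggestion to replace recursion with an explicit stack can be sketched as follows. This is a minimal illustration, not a tuned implementation: IterativeDirSize and GetSize are hypothetical names, and a directory that fails partway through (access denied, or an entry deleted during the scan) is simply abandoned at that point rather than retried.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class IterativeDirSize
{
    // Sums file lengths under root using an explicit stack instead of recursion,
    // so directory depth cannot overflow the call stack.
    public static long GetSize(string root)
    {
        long total = 0;
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string current = pending.Pop();
            try
            {
                foreach (string file in Directory.GetFiles(current))
                    total += new FileInfo(file).Length;

                foreach (string sub in Directory.GetDirectories(current))
                    pending.Push(sub);
            }
            catch (UnauthorizedAccessException)
            {
                // No permission for this directory: skip it and keep going.
            }
            catch (IOException)
            {
                // The directory or a file vanished while scanning: skip it.
            }
        }
        return total;
    }
}
```

Because the method touches no UI state, it can be called from a worker thread, for example via Task.Run, with the result marshalled back to the UI thread afterwards.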
To improve the performance, you could use the Task Parallel Library (TPL).
Here is a good sample: Directory file size calculation - how to make it faster?
I didn't test it, but the author says it is 3 times faster than a non-multithreaded method...
Directory.GetFiles(@"C:\Users\AliBayat","*",SearchOption.AllDirectories)
.Select (d => new FileInfo(d))
.Select (d => new { Directory = d.DirectoryName,FileSize = d.Length} )
.ToLookup (d => d.Directory )
.Select (d => new { Directory = d.Key,TotalSizeInMB =Math.Round(d.Select (x =>x.FileSize)
.Sum () /Math.Pow(1024.0,2),2)})
.OrderByDescending (d => d.TotalSizeInMB).ToList();
Calling GetFiles with SearchOption.AllDirectories returns the full names of all the files in all the subdirectories of the specified directory. The OS represents the size of files in bytes, and you can retrieve a file's size from its Length property. ToLookup groups the files by their containing directory. Because a directory can contain many files, d.Select(x => x.FileSize) returns the collection of file sizes, in bytes, for that directory; Sum() adds them up, and dividing the total by 1024 raised to the power of 2 (then rounding to two decimal places) gives the size of that directory in megabytes.
Update: the filterMask="*.*" does not work with files without extension
A multithreaded example from Microsoft Docs that calculates directory size, which should be faster:
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
public class Example
{
public static void Main()
{
long totalSize = 0;
String[] args = Environment.GetCommandLineArgs();
if (args.Length == 1) {
Console.WriteLine("There are no command line arguments.");
return;
}
if (! Directory.Exists(args[1])) {
Console.WriteLine("The directory does not exist.");
return;
}
String[] files = Directory.GetFiles(args[1]);
Parallel.For(0, files.Length,
index => { FileInfo fi = new FileInfo(files[index]);
long size = fi.Length;
Interlocked.Add(ref totalSize, size);
} );
Console.WriteLine("Directory '{0}':", args[1]);
Console.WriteLine("{0:N0} files, {1:N0} bytes", files.Length, totalSize);
}
}
// The example displays output like the following:
// Directory 'c:\windows\':
// 32 files, 6,587,222 bytes
This example only counts the files in the current folder, so if you want to include all the files recursively, you can change
String[] files = Directory.GetFiles(args[1]);
to
String[] files = Directory.GetFiles(args[1], "*", SearchOption.AllDirectories);
The fastest way that I came up with is using EnumerateFiles with SearchOption.AllDirectories. This method also allows updating the UI while going through the files and counting the size. Long path names don't cause any problems, since no FileInfo or DirectoryInfo is created from the long path name. Even when a file name is long, the FileInfo returned by EnumerateFiles doesn't cause problems as long as the starting directory name is not too long. There is still a problem with UnauthorizedAccessException.
private void DirectoryCountEnumTest(string sourceDirName)
{
// Get the subdirectories for the specified directory.
long dataSize = 0;
long fileCount = 0;
string prevText = richTextBox1.Text;
if (Directory.Exists(sourceDirName))
{
DirectoryInfo dir = new DirectoryInfo(sourceDirName);
foreach (FileInfo file in dir.EnumerateFiles("*", SearchOption.AllDirectories))
{
fileCount++;
try
{
dataSize += file.Length;
richTextBox1.Text = prevText + ("\nCounting size: " + dataSize.ToString());
}
catch (Exception e)
{
richTextBox1.AppendText("\n" + e.Message);
}
}
richTextBox1.AppendText("\n files:" + fileCount.ToString());
}
}
This .NET Core command-line app calculates directory sizes for a given path:
https://github.com/garethrbrown/folder-size
The key method is this one which recursively inspects sub-directories to come up with a total size.
private static long DirectorySize(SortDirection sortDirection, DirectoryInfo directoryInfo, DirectoryData directoryData)
{
long directorySizeBytes = 0;
// Add file sizes for current directory
FileInfo[] fileInfos = directoryInfo.GetFiles();
foreach (FileInfo fileInfo in fileInfos)
{
directorySizeBytes += fileInfo.Length;
}
directoryData.Name = directoryInfo.Name;
directoryData.SizeBytes += directorySizeBytes;
// Recursively add subdirectory sizes
DirectoryInfo[] subDirectories = directoryInfo.GetDirectories();
foreach (DirectoryInfo di in subDirectories)
{
var subDirectoryData = new DirectoryData(sortDirection);
directoryData.DirectoryDatas.Add(subDirectoryData);
directorySizeBytes += DirectorySize(sortDirection, di, subDirectoryData);
}
directoryData.SizeBytes = directorySizeBytes;
return directorySizeBytes;
}
}
In this link https://learn.microsoft.com/en-us/office/vba/language/reference/user-interface-help/size-property-filesystemobject-object there is a description of how to get the folder size directly using Visual Basic, without having to get a list of files and loop over them to add up their lengths.
Sub ShowFolderSize(filespec)
Dim fs, f, s
Set fs = CreateObject("Scripting.FileSystemObject")
Set f = fs.GetFolder(filespec)
s = UCase(f.Name) & " uses " & f.size & " bytes."
MsgBox s, 0, "Folder Size Info"
End Sub
In a C# project, you can also add a reference to Microsoft Scripting and use the FileSystemObject.
Here is a C# routine that uses this method to output the sizes of all the folders in the given path. It recurses up to a specified level, examining subfolders whose size is larger than average, in an attempt to find where storage use problems are caused.
using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;
namespace ShowFolderSizes
{
public class ShowFolderSizesMain
{
double GBFactor = 1024.0 * 1024.0 * 1024.0;
Scripting.FileSystemObject fileSystemObject = new Scripting.FileSystemObject();
public static void Main(string[] args)
{
ShowFolderSizesMain instance = new ShowFolderSizesMain();
instance.Run(args);
}
void Run(string[] args)
{
if (args.Length != 2)
{
Console.WriteLine("Usage: ShowFolderSizes path levels");
return;
}
string path = args[0];
if (!Int32.TryParse(args[1], out int levels))
{
Console.WriteLine($"Can't interpret {args[1]} as an integer.");
return;
}
writeFolderSizes(path, levels);
//Console.WriteLine("Press any key to continue...");
//Console.ReadKey();
}
public void writeFolderSizes(string topPath, int levels)
{
List<string> folderNames;
try
{
folderNames = new List<string>(Directory.GetDirectories(topPath));
}
catch (System.UnauthorizedAccessException e)
{
Console.WriteLine($"Can't access {topPath}");
return;
}
if (folderNames.Count == 0)
{
return;
}
var dic = new Dictionary<string, long>();
double sum = 0.0;
foreach (string folderPath in folderNames)
{
Scripting.Folder folder = fileSystemObject.GetFolder(folderPath);
try
{
dynamic dsize = folder.Size;
long size = Convert.ToInt64(dsize);
dic.Add(folderPath, size);
sum += Convert.ToDouble(size);
}
catch (System.Security.SecurityException e)
{
Console.WriteLine($"Can't access {folderPath}");
dic.Remove(folderPath);
}
}
sum = sum / GBFactor;
double avg = (sum / folderNames.Count);
Console.WriteLine($"{topPath} {sum.ToString("0.000")} GB:");
var sortedResults = (
from KeyValuePair<string, long> kvp in dic
orderby kvp.Value descending
select kvp);
foreach (KeyValuePair<string, long> kvp in sortedResults)
{
double gb = Convert.ToDouble(kvp.Value) / GBFactor;
Console.WriteLine($"{gb.ToString("000.000")} GB {kvp.Key}");
}
Console.WriteLine();
if (levels > 0)
{
long cutoff = Convert.ToInt64(avg * GBFactor);
var foldersToRecurse = (
from KeyValuePair<string, long> kvp in dic
where kvp.Value >= cutoff
orderby kvp.Value descending
select kvp.Key);
int nextLevel = levels - 1;
foreach (string folderPath in foldersToRecurse)
{
writeFolderSizes(folderPath, nextLevel);
}
}
}
}
}
For it to be really useful, it often needs to be run as administrator, since trying to access folders like C:\Program files or C:\Users sends it into the "catch" branches under my normal user account.
I tried to improve the sample (Alexandre Pepin's and hao's answers).
As is
private long GetDirectorySize(string dirPath)
{
if (Directory.Exists(dirPath) == false)
{
return 0;
}
DirectoryInfo dirInfo = new DirectoryInfo(dirPath);
long size = 0;
// Add file sizes.
FileInfo[] fis = dirInfo.GetFiles();
foreach (FileInfo fi in fis)
{
size += fi.Length;
}
// Add subdirectory sizes.
DirectoryInfo[] dis = dirInfo.GetDirectories();
foreach (DirectoryInfo di in dis)
{
size += GetDirectorySize(di.FullName);
}
return size;
}
To be
private long GetDirectorySize2(string dirPath)
{
if (Directory.Exists(dirPath) == false)
{
return 0;
}
DirectoryInfo dirInfo = new DirectoryInfo(dirPath);
long size = 0;
// Add file sizes.
IEnumerable<FileInfo> fis = dirInfo.EnumerateFiles("*.*", SearchOption.AllDirectories);
foreach (FileInfo fi in fis)
{
size += fi.Length;
}
return size;
}
Finally, you can check the result:
// ---------------------------------------------
// size of directory
using System.IO;
string log1Path = @"D:\SampleDirPath1";
string log2Path = @"D:\SampleDirPath2";
string log1DirName = Path.GetDirectoryName(log1Path);
string log2DirName = Path.GetDirectoryName(log2Path);
long log1Size = GetDirectorySize(log1Path);
long log2Size = GetDirectorySize(log2Path);
long log1Size2 = GetDirectorySize2(log1Path);
long log2Size2 = GetDirectorySize2(log2Path);
Console.WriteLine($@"{log1DirName} Size: {SizeSuffix(log1Size)}, {SizeSuffix(log1Size2)}
{log2DirName} Size: {SizeSuffix(log2Size)}, {SizeSuffix(log2Size2)}");
And this is the SizeSuffix function:
private static readonly string[] SizeSuffixes =
{ "bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB" };
/// <summary>
/// Size Display
/// </summary>
/// <param name="value">Value in bytes</param>
/// <param name="decimalPlaces">Number of decimal places</param>
/// <returns></returns>
public static string SizeSuffix(Int64 value, int decimalPlaces = 2)
{
if (decimalPlaces < 0) { throw new ArgumentOutOfRangeException("decimalPlaces"); }
if (value < 0) { return "-" + SizeSuffix(-value, decimalPlaces); } // keep the caller's precision
if (value == 0) { return string.Format("{0:n" + decimalPlaces + "} bytes", 0); }
// mag is 0 for bytes, 1 for KB, 2, for MB, etc.
int mag = (int)Math.Log(value, 1024);
// 1L << (mag * 10) == 2 ^ (10 * mag)
// [i.e. the number of bytes in the unit corresponding to mag]
decimal adjustedSize = (decimal)value / (1L << (mag * 10));
// make adjustment when the value is large enough that
// it would round up to 1000 or more
if (Math.Round(adjustedSize, decimalPlaces) >= 1000)
{
mag += 1;
adjustedSize /= 1024;
}
return string.Format("{0:n" + decimalPlaces + "} {1}",
adjustedSize,
SizeSuffixes[mag]);
}
I know this is not a .NET solution, but here it comes anyway. It may come in handy for people who have Windows 10 and want a faster solution. For example, run this command in your command prompt or after pressing WinKey + R:
bash -c "du -sh /mnt/c/Users/; sleep 5"
The sleep 5 gives you time to see the results before the window closes.
On my computer, the output ends with a total of 85G (85 gigabytes). It is super fast compared to doing it with .NET. If you want to see the size more precisely, remove the h, which stands for human-readable.
So just do something like Process.Start("bash", ... arguments). That is not the exact code, but you get the idea.
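To make the idea a little more concrete, here is a hedged sketch of launching du through bash and reading its output from C#. It assumes that bash (for example via the Windows Subsystem for Linux) is on the PATH and that the path contains no single quotes; DuRunner, GetSize and ParseSize are hypothetical names.

```csharp
using System;
using System.Diagnostics;

public static class DuRunner
{
    // Extracts the size column from one line of du output, e.g. "85G\t/mnt/c/Users/".
    public static string ParseSize(string duLine)
    {
        if (string.IsNullOrWhiteSpace(duLine)) return null;
        string[] parts = duLine.Trim().Split(new[] { '\t', ' ' }, 2);
        return parts[0];
    }

    // Runs `du -sh` on a Linux-style path and returns the reported total, or null.
    // Assumption: linuxPath contains no single quotes, since it is quoted naively.
    public static string GetSize(string linuxPath)
    {
        var psi = new ProcessStartInfo("bash", "-c \"du -sh '" + linuxPath + "'\"")
        {
            RedirectStandardOutput = true, // capture the output instead of showing a window
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (Process process = Process.Start(psi))
        {
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            string[] lines = output.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
            // The summary for the requested path is the last line du prints.
            return lines.Length == 0 ? null : ParseSize(lines[lines.Length - 1]);
        }
    }
}
```

Since the output is captured rather than displayed, the sleep 5 from the command above is not needed here.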

Variable not adding up other variable values

I am trying to read each line in a file and grab specific strings and integers. However, the values of the found integers are not added up at the end, and I'm unsure why. I apologize if this is a simple error.
If a line in the file contains "Event Type: Music", store "Music" in the EventType[] array using MusicTrace. MusicTrace begins at 0 and increments each time the string above is found, so it works its way down the array. The array size is the number of lines in the file, to ensure there is always enough array space.
I have another array for attendance named EventAttendance[], which follows the same steps as above but cuts the first 18 characters from the found line, leaving the remaining number (the line in the file has a fixed length). AttendanceTrace is used in the same manner as MusicTrace above.
I then have a loop over the EventAttendance array, which uses i, starts at 0, and runs until the EventAttendance.Length property is reached. The code adds up the total attendance from each EventAttendance[] index using i.
The code is below:
private void frmActivitiesSummary_Load(object sender, EventArgs e)
{
if (File.Exists(sVenueName.ToString() + ".txt"))
{
using (StreamReader RetrieveEvents = new StreamReader(sVenueName.ToString() + ".txt")) //Create a new file with the name of the username variable
{
string[] ReadLines = File.ReadAllLines(sVenueName + ".txt"); //Read File
int MusicTrace = 0;
int AttendanceTrace = 0;
string[] EventType = new string[ReadLines.Length]; //Store found event types
int[] EventAttendance = new int[ReadLines.Length]; //Store Event Attendance
string line; //Declare String to store line
using (StreamReader file = new StreamReader(sVenueName + ".txt")) //Using StreamReader
{
while (!file.EndOfStream)
{
line = file.ReadToEnd();
//Get All Music Event to Array
if (line.Contains("Event Type: Music"))
{
EventType[MusicTrace] = "Music"; //[0] = Music
if (MusicTrace != 0)
MusicTrace = MusicTrace + 1;
else
MusicTrace = 1;
}
//Get All attendances to Array
if (line.Contains("People Attending:"))
{
line.Remove(0, 18);
int ConvertedLine = Convert.ToInt32(line);
EventAttendance[AttendanceTrace] = ConvertedLine; //[0] = 10
if (AttendanceTrace != 0)
AttendanceTrace = AttendanceTrace + 1;
else
AttendanceTrace = 1;
}
}
}
//for each array index and if array index contains music, add this to total amount of music events
for (int i = 0; i <= EventAttendance.Length; i++)
{
if (EventAttendance[i] > 0)
{
if (iMusicAttendance > 0)
iMusicAttendance = iMusicAttendance + EventAttendance[i];
else
iMusicAttendance = EventAttendance[i];
}
}
}
}
}
The attendance is then shown on the click of a button:
private void btnShow_Click(object sender, EventArgs e)
{
lblMusicOutput.Text = "After analysis, we can see that Music Events have a total attendance of " + iMusicAttendance;
lblArtOutput.Text = "After Analysis, we can see that Events have a total Attenance of " + iArtAttendance;
lblDance.Text = "After Analysis, we can see that Dance Events have a total Attenance of " + iDanceAttendance;
lblTheatreOutput.Text = "After Analysis, we can see that Theatre Events have a total Attenance of " + iTheatreAttendance;
}
There were several useless variables in your code, which I took the liberty of removing. I also changed the arrays to List<T> in order to use LINQ.
You were calling Convert.ToInt32 on the full line, because String.Remove() doesn't change the object it's called on but returns a new string that you have to assign to something: line = line.Remove(0, 18);
Also, you were doing useless checks for the counters:
if (MusicTrace != 0)
MusicTrace = MusicTrace + 1;
else
MusicTrace = 1;
is the same as
MusicTrace++;
which leads us to:
if (!File.Exists(sVenueName.ToString() + ".txt"))
return;
List<String> EventType = new List<string>(); //Store found event types
List<int> EventAttendance = new List<int>(); //Store Event Attendance
using (StreamReader file = new StreamReader(sVenueName + ".txt")) //Using StreamReader
{
while (!file.EndOfStream)
{
var line = file.ReadLine(); //Declare String to store line
//Get All Music Event to Array
if (line.Contains("Event Type: Music"))
{
EventType.Add("Music"); //[0] = Music
}
//Get All attendances to Array
if (line.Contains("People Attending:"))
{
line = line.Remove(0, 18);
EventAttendance.Add(Convert.ToInt32(line)); //[0] = 10
}
}
}
//for each array index and if array index contains music, add this to total amount of music events
iMusicAttendance = EventAttendance.Sum();
Please change :
while (!file.EndOfStream)
{
line = file.ReadToEnd();
to
while (!file.EndOfStream)
{
line = file.ReadLine();
Explanation:
You are reading the entire file at once and then checking your two conditions only once. But you want to read line by line, so you need to use ReadLine.
As for the rest, you declare but never use the StreamReader RetrieveEvents. You can get rid of it.
You can use List<T> to store the information you read. This gives you more flexibility in your code, and the Sum can be calculated without a loop.
EDIT:
I took the liberty of cutting down your program a little. The code below should do exactly what you describe in your post:
string[] allLines = File.ReadAllLines(sVenueName + ".txt");
List<string> EventType = allLines.Where(x => x.Contains("Event Type: Music"))
.Select(x => "Music").ToList();
List<int> EventAttendance = allLines.Where(x => x.Contains("People Attending:"))
.Select(x => Convert.ToInt32(x.Remove(0,18))).ToList();
int iMusicAttendance = EventAttendance.Sum();
EDIT2:
Seeing your file content, it becomes obvious that you only want to sum up the attendance of the music events, but your approach sums up the attendance of all events.
Looking at your file, it seems there is an offset of 3 lines. So I would suggest getting all indices of the Music lines and then grabbing only the numbers that are 3 lines further down:
List<string> allLines = File.ReadAllLines("input.txt").ToList();
List<int> indices = Enumerable.Range(0, allLines.Count)
.Where(index => allLines[index].Contains("Event Type: Music"))
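.Select(index => index + 3).ToList();
// Look the lines up by index; IndexOf would return the first match when two lines have identical text
List<int> EventAttendance = indices.Where(index => index < allLines.Count)
.Select(index => Convert.ToInt32(allLines[index].Remove(0, 18))).ToList();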
int iMusicAttendance = EventAttendance.Sum();
This will get you the sum of only the music attendees ;) Hope it helps.
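As an alternative to the fixed three-line offset, the attendance can be tied to the most recently seen event type. The following is only a sketch: the names AttendanceDemo and SumMusicAttendance are made up for illustration, and it assumes that every event block lists its "Event Type:" line before its "People Attending:" line, which is inferred from the snippets above and not from the actual file.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class AttendanceDemo
{
    // Hypothetical helper: sums the "People Attending:" values that belong to Music events.
    public static int SumMusicAttendance(IEnumerable<string> lines)
    {
        const string typePrefix = "Event Type:";
        const string attendingPrefix = "People Attending:";
        bool inMusicEvent = false; // true while the current event block is a Music event
        int total = 0;

        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.StartsWith(typePrefix, StringComparison.Ordinal))
            {
                // A new event block starts here; remember whether it is a Music event
                inMusicEvent = line.Substring(typePrefix.Length).Trim() == "Music";
            }
            else if (inMusicEvent && line.StartsWith(attendingPrefix, StringComparison.Ordinal))
            {
                string number = line.Substring(attendingPrefix.Length).Trim();
                int value;
                // TryParse skips malformed numbers instead of throwing
                if (int.TryParse(number, NumberStyles.Integer, CultureInfo.InvariantCulture, out value))
                    total += value;
            }
        }
        return total;
    }
}
```

It could be called as AttendanceDemo.SumMusicAttendance(File.ReadLines(sVenueName + ".txt")), which also avoids holding the whole file in memory.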

System out of memory exception large txt file into an array [duplicate]

This question already has answers here:
Read Big TXT File, Out of Memory Exception
(6 answers)
Closed 6 years ago.
The following code works fine with small txt files, but with large txt files it throws an OutOfMemoryException at string[] array = File.ReadAllLines("hash.txt");
The hash.txt file is 500 MB.
I tried a few suggestions from the internet, but I couldn't get them to work.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
namespace Hash_Parser
{
internal class Program
{
private static List<string> users = new List<string>();
private static Dictionary<string, int> hash_original = new Dictionary<string, int>();
private static List<string> hash_found = new List<string>();
private static List<string> pass = new List<string>();
private static string hash_path = "split.txt";
private static void split()
{
Console.WriteLine("Splitting...");
StreamWriter streamWriter = new StreamWriter("user.txt");
StreamWriter streamWriter2 = new StreamWriter("hash.txt");
string[] array = File.ReadAllLines(Program.hash_path);
for (int i = 0; i < array.Length; i++)
{
string text = array[i];
string[] array2 = text.Split(new char[]
{
':'
}, 2);
if (array2.Count<string>() >= 2)
{
streamWriter.WriteLine(array2[0]);
streamWriter2.WriteLine(array2[1]);
}
}
streamWriter.Close();
streamWriter2.Close();
Console.WriteLine("Saved as user.txt and hash.txt");
}
private static void populate()
{
Console.WriteLine("Populating lists...");
Program.users.AddRange(File.ReadAllLines("user.txt"));
Program.pass.AddRange(File.ReadAllLines("pass.txt"));
Program.hash_found.AddRange(File.ReadAllLines("found.txt"));
int num = 0;
string[] array = File.ReadAllLines("hash.txt");
for (int i = 0; i < array.Length; i++)
{
string key = array[i];
Program.hash_original.Add(key, num);
num++;
}
}
private static void seek()
{
StreamWriter streamWriter = new StreamWriter("userpass.txt");
int num = 0;
int num2 = 100;
foreach (string current in Program.hash_found)
{
if (Program.hash_original.ContainsKey(current))
{
streamWriter.WriteLine(Program.users[Program.hash_original[current]] + ":" + Program.pass[num]);
}
num++;
if (num >= num2)
{
Console.Title = string.Concat(new object[]
{
"Processed: ",
num,
" : ",
Program.hash_found.Count
});
num2 += 1000;
}
}
Console.Title = string.Concat(new object[]
{
"Processed: ",
num,
" : ",
Program.hash_found.Count
});
streamWriter.Close();
}
private static void Main(string[] args)
{
Console.WriteLine("Split hash /split");
Console.WriteLine("'split.txt'\n");
Console.WriteLine("Parse hashes /parse");
Console.WriteLine("'user.txt' | 'found.txt' | 'hash.txt' | 'pass.txt'");
string a = Console.ReadLine();
if (a == "/split")
{
Program.split();
}
else
{
if (a == "/parse")
{
Program.populate();
Console.WriteLine("Processing...");
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
Program.seek();
stopwatch.Stop();
Console.WriteLine("Saved as userpass.txt");
Console.WriteLine("Time elapsed: " + stopwatch.Elapsed);
Console.ReadKey();
}
}
}
}
}
Thanks for your help.
Try this code :
foreach (var line in File.ReadLines(_filePath))
{
//Don't put "line" into a list or collection.
//Just make your processing on it.
}
Quoted Text: Just use File.ReadLines which returns an IEnumerable and doesn't load all the lines at once to the memory.
Quote Link : https://stackoverflow.com/a/13416225/3041974
I hope it helps.
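Applied to the populate() method from the question, the idea could look roughly like the sketch below. The names HashIndexDemo and LoadIndex are hypothetical. The dictionary itself still has to fit in memory; the sketch only avoids the extra string[] that File.ReadAllLines builds before the loop starts.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class HashIndexDemo
{
    // Hypothetical replacement for the hash.txt loop in populate():
    // maps each line of the file to its zero-based line number.
    public static Dictionary<string, int> LoadIndex(string path)
    {
        var index = new Dictionary<string, int>();
        int num = 0;
        // File.ReadLines yields one line at a time instead of materializing an array
        foreach (string key in File.ReadLines(path))
        {
            // Add throws on duplicate keys, exactly as in the original code
            index.Add(key, num);
            num++;
        }
        return index;
    }
}
```

If this still runs out of memory, the process limits described in the next answer are the likely cause.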
Please be aware of process limits in .NET
http://www.codeproject.com/Articles/483475/Memory-Limits-in-a-NET-Process
For instance, a 32 bit system cannot have more than 4 GB of physical memory. Needless to say that 2^32 will give you a virtual address space with 4.294.967.296 different entries, and that’s precisely where the 4GB limit comes from. But even having those 4GB available on the system, your application will actually be able to see 2GB only. Why?
Because on 32 bits systems, Windows splits the virtual address space
into two equal parts: one for User Mode applications, and another one
for the Kernel (system applications). This behavior can be overridden
by using the "/3gb" flag in the Windows boot.ini config file. If we do
so, the system will then reserve 3GB for user applications, and 1 GB
for the kernel.
What is the process MEM Usage in Task Manager?
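To find out whether the program is actually running as a 32-bit process, and is therefore subject to the limit quoted above, a quick check such as the following may help. It is a minimal sketch; the class name BitnessDemo is made up, while the Environment properties are available from .NET Framework 4 onwards.

```csharp
using System;

static class BitnessDemo
{
    // Reports whether the OS and the current process are 64-bit.
    public static string Describe()
    {
        return string.Format(
            "64-bit OS: {0}, 64-bit process: {1}, pointer size: {2} bytes",
            Environment.Is64BitOperatingSystem,
            Environment.Is64BitProcess,
            IntPtr.Size); // 4 in a 32-bit process, 8 in a 64-bit process
    }
}
```

Console.WriteLine(BitnessDemo.Describe()) at the start of Main would show which case applies.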

Importing files with streamReader.ReadBlock (buffer)

I needed to import a large number of text files and, after searching for material on my particular problem, I decided to post the solution here. I believe it will help someone else.
My files have 3,000,000 records or more. I tried reading line by line with StreamReader.ReadLine(), but it was impractical. Moreover, the files are too large to load entirely into memory.
The solution was to load the files into memory in blocks (buffers) using streamReader.ReadBlock().
The difficulty was that ReadBlock() reads character by character without regard to line boundaries, so a buffer can end in the middle of a line, and the first line of the next buffer is then incomplete. To correct this, I keep the leftover fragment in a string (resto) and concatenate it with the first line (primeiraLinha) of the next buffer.
Another important detail concerns Split: in most examples the values are first passed through Trim() to eliminate whitespace. Here I do not use it, because it joined the first and second lines of the buffer.
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication2
{
class Program
{
static void Main()
{
const string arquivo = "Arquivo1.txt";
using (var streamReader = new StreamReader(arquivo))
{
int deslocamento = 1000; // block size in characters
char[] buffer = new char[deslocamento];
string resto = ""; // incomplete line left over from the previous block
int lidos; // number of characters actually read
// ReadBlock returns 0 at the end of the file, which ends the loop
while ((lidos = streamReader.ReadBlock(buffer, 0, buffer.Length)) > 0)
{
// Only the first "lidos" characters are valid; the last block is usually shorter
var bufferString = new String(buffer, 0, lidos);
string[] bufferSplit = bufferString.Split(new char[] { '\n' });
foreach (var bs in bufferSplit)
{
if (bs != "")
{
// Prepend the fragment carried over from the previous block, if any
string primeiraLinha = resto + bs;
resto = "";
if (primeiraLinha.Contains('\r'))
{
// '\r' marks a complete line (the file is assumed to use \r\n line endings)
Console.WriteLine(primeiraLinha);
}
else
{
// No '\r' yet: the line continues in the next block
resto = primeiraLinha;
}
}
}
}
// Print a final line that has no trailing line break
if (resto != "")
Console.WriteLine(resto);
}
}
}
}
I had great help from my training colleague, Gabriel Gustaf, in solving this problem.
If anyone has any suggestions to further improve the performance, or to make any comments, feel free.
.NET has a class designed for working with large files: MemoryMappedFile. It's simple and I think it could help you.
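A minimal sketch of that suggestion follows, assuming the goal is simply to walk through the lines of a large text file. The names MappedFileDemo and CountLines are made up for illustration. The view is created with the exact file length, because a view created without a size is rounded up to a page boundary and would expose trailing zero bytes to the reader.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

static class MappedFileDemo
{
    // Counts the lines of a file by reading it through a memory-mapped view.
    // Note: mapping an empty file throws, so the file must contain at least one byte.
    public static long CountLines(string path)
    {
        long length = new FileInfo(path).Length;
        long count = 0;
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        // Limit the view to the real file length to avoid reading page padding
        using (var view = mmf.CreateViewStream(0, length, MemoryMappedFileAccess.Read))
        using (var reader = new StreamReader(view))
        {
            while (reader.ReadLine() != null)
                count++; // replace with the real per-line processing
        }
        return count;
    }
}
```

Whether this is faster than a plain StreamReader for purely sequential reading depends on the file and the machine, so it is worth measuring both.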

reading and writing multiple files at the same time and performing same tasks on them

I am a beginner at programming. I wrote some code in C# to open a single file (that has 4 columns of data) and extract the fourth column into a list. I then did some basic work on the data to extract the mean, minimum and maximum values of the data set. The results were then written to dedicated files for the mean, minimum and maximum values.
Now I want to repeat the same tests for multiple sets of files, each with over 100,000 lines of data. I want the program to read multiple files in the same folder, do the same calculations for each file, and compile all the results for the mean, minimum and maximum values into separate folders, as before.
The code for the single file is as follows:
private void button1_Click_1(object sender, EventArgs e)
{
string text = "";
DialogResult result = openFileDialog1.ShowDialog(); // Show the dialog.
// create a list to insert the data into
List<float> noise = new List<float>();
int count = 0;
float sum = 0;
float mean = 0;
float max = 0;
float min = 100;
TextWriter tw = new StreamWriter("c:/Users/a3708906/Documents/Filereader - 13062012/Filereader/date.txt");
if (result == DialogResult.OK) // Test result.
{
string file = openFileDialog1.FileName;
FileInfo src = new FileInfo(file);
TextReader reader = src.OpenText();
text = reader.ReadLine();
// while the text being read in from reader.Readline() is not null
while (text != null)
{
text = reader.ReadLine();
if (text != null)
{
string[] words = text.Split(',');
noise.Add(Convert.ToSingle(words[3]));
// write text to a file
tw.WriteLine(text);
//foreach (string word in words)
//{
// tw.WriteLine(word);
//}
}
}
}
tw.Close();
TextWriter tw1 = new StreamWriter("c:/Users/a3708906/Documents/Filereader - 13062012/Filereader/noise.txt");
foreach (float ns in noise)
{
tw1.WriteLine(Convert.ToString(ns));
count++;
sum += ns;
mean = sum/count;
float min1 = 0;
if (ns > max)
max = ns;
else if (ns < max)
min1 = ns;
if (min1 < min && min1 >0)
min = min1;
else
min = min;
}
tw1.Close();
TextWriter tw2 = new StreamWriter("c:/Users/a3708906/Documents/Filereader - 13062012/Filereader/summarymeans.txt");
tw2.WriteLine("Mean Noise");
tw2.WriteLine("==========");
tw2.WriteLine("mote_noise 2: {0}", Convert.ToString(mean));
tw2.Close();
TextWriter tw3 = new StreamWriter("c:/Users/a3708906/Documents/Filereader - 13062012/Filereader/summarymaximums.txt");
tw3.WriteLine("Maximum Noise");
tw3.WriteLine("=============");
tw3.WriteLine("mote_noise 2: {0}", Convert.ToString(max));
tw3.Close();
TextWriter tw4 = new StreamWriter("c:/Users/a3708906/Documents/Filereader - 13062012/Filereader/summaryminimums.txt");
tw4.WriteLine("Minimum Noise");
tw4.WriteLine("=============");
tw4.WriteLine("mote_noise 2: {0}", Convert.ToString(min));
tw4.Close();
}
I will be grateful if someone could help to translate this code for working with multiple files. Thank you.
Wrap your logic for processing a single file into a single Action or a void-returning function, then enumerate the files, convert the sequence to a parallel query with AsParallel(), and call ForAll.
For example, if you made an Action or function named DoStuff(string filename) that processes a single file, you can then call it with:
Directory.EnumerateFiles(dialog.SelectedPath).AsParallel().ForAll(DoStuff);
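What such a per-file function might look like is sketched below. Everything here is an assumption made for illustration: the names NoiseStats, Summarize and SummarizeFolder, the "*.txt" filter, the output format, and parsing with the invariant culture (the question uses Convert.ToSingle, which follows the current culture). Skipping the first line mirrors the question's code, which discards it.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;

static class NoiseStats
{
    // Returns (mean, min, max) of the fourth comma-separated column of one file.
    public static Tuple<float, float, float> Summarize(string path)
    {
        int count = 0;
        float sum = 0, min = float.MaxValue, max = float.MinValue;
        foreach (string line in File.ReadLines(path).Skip(1)) // first line is discarded
        {
            string[] words = line.Split(',');
            float value;
            if (words.Length < 4 ||
                !float.TryParse(words[3], NumberStyles.Float, CultureInfo.InvariantCulture, out value))
                continue; // ignore rows without a numeric fourth column
            count++;
            sum += value;
            if (value < min) min = value;
            if (value > max) max = value;
        }
        if (count == 0)
            throw new InvalidDataException("No numeric fourth column found in " + path);
        return Tuple.Create(sum / count, min, max);
    }

    // Processes every *.txt file in a folder in parallel and writes one summary line per file.
    // The output file should live outside the input folder so it is not picked up on a later run.
    public static void SummarizeFolder(string folder, string outputFile)
    {
        var rows = Directory.EnumerateFiles(folder, "*.txt")
            .AsParallel()
            .Select(f => new { File = f, Stats = Summarize(f) })
            .OrderBy(r => r.File) // stable, readable output order
            .Select(r => string.Format(CultureInfo.InvariantCulture,
                "{0}: mean={1}, min={2}, max={3}",
                Path.GetFileName(r.File), r.Stats.Item1, r.Stats.Item2, r.Stats.Item3))
            .ToList();
        File.WriteAllLines(outputFile, rows);
    }
}
```

Returning the results and writing them in one place avoids several threads writing to the same summary file at once, which the ForAll variant would have to guard against.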
Your current code will work if you simply use Directory.GetFiles() properly. The easiest way to do it would be to have three inputs: one to get the directory, a second to get the file extension (if wanted), and a checkbox to ask whether or not you want to search the folders recursively.
Then instead of
string file = openFileDialog1.FileName;
you would instead have something like
//ensure the default fileExtensionDropdown.SelectedValue is "*"
string[] filePaths;
if(chkRecursiveSearch.IsChecked == true)
filePaths = Directory.GetFiles(dlgFolderBrowser.SelectedPath, @"*"+ddlFileExtension.SelectedValue, SearchOption.AllDirectories);
else
filePaths = Directory.GetFiles(dlgFolderBrowser.SelectedPath, @"*"+ddlFileExtension.SelectedValue);
Then you can use:
foreach (string path in filePaths) { /* do things */ }
to handle each file path the way you are right now.
Please note the code I've put here is definitely not as idiomatic and tidy as it could be, but since you said you were a beginner I decided to be a bit more clear. If requested I'll put up a more idiomatic take on things, though if we do that we should probably clean up your initial code a bit as well.
