What is the fastest way to get a media file's duration? - c#

I am working on a program that scans drop folders for files, and registers them to another system that requires a duration for the file. The best solution I've been able to find so far is to use MediaInfo to get the duration from the header, but for some reason it tends to take a few seconds to return a result.
Suppose I have a list of 1,000 file paths, and I want to get the duration for each one, but getting the duration takes 15 seconds. Linear iteration over the list would take just over 4 hours, and even running 8 tasks in parallel would take half an hour. With my tests, this would be the best case scenario.
I've tried using the MediaInfo DLL as well as calling the .exe, and both seemed to have similar processing times.
DLL Code:
MediaInfo MI;
public Form1()
{
InitializeComponent();
MI = new MediaInfo();
}
private void button1_Click(object sender, EventArgs e)
{
MI.Open(textBox1.Text);
MI.Option("Inform", "Video;%Duration%");
label2.Text = MI.Inform();
MI.Close();
}
Executable code:
Process proc = new Process
{
StartInfo = new ProcessStartInfo
{
FileName = "MediaInfo.exe",
Arguments = $"--Output=Video;%Duration% \"{textBox1.Text}\"",
UseShellExecute = false,
RedirectStandardOutput = true,
CreateNoWindow = true
}
};
StringBuilder line = new StringBuilder();
proc.Start();
while (!proc.StandardOutput.EndOfStream)
{
line.Append(proc.StandardOutput.ReadLine());
}
label2.Text = line.ToString();
It should be noted that the files being processed are on a networked drive, but I have tested retrieving the duration of a local file and it was only a few seconds faster.
Note, this program has to run on Windows Server 2003 R2, which means .NET 4.0 only. Most of the files I will be processing are .mov, but I can't restrict it to that.

Here is some better code (prefer the DLL call, since initialization takes time), with options for reducing the scan duration:
MediaInfo MI;
public Form1()
{
InitializeComponent();
MI = new MediaInfo();
MI.Option("ParseSpeed", "0"); // Advanced information (e.g. GOP size, captions detection) not needed, request to scan as fast as possible
MI.Option("ReadByHuman", "0"); // Human readable strings are not needed, no need to spend time on them
}
private void button1_Click(object sender, EventArgs e)
{
MI.Open(textBox1.Text);
label2.Text = MI.Get(Stream_Video, "Duration"); //Note: prefer Stream_General if you want the duration of the program (here, you select the duration of the video stream)
MI.Close();
}
There are several possibilities for improving parsing time depending on your specific needs (i.e. if you don't care about a lot of features), but that is code to add directly to MediaInfo (e.g. for MP4/QuickTime files, getting only the duration could take less than 200 ms if I disable other features). Add a feature request if you need speed.
Jérôme, developer of MediaInfo
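On .NET 4.0 you can also bound the parallelism yourself. Below is a minimal sketch, assuming the MediaInfo interop wrapper used above (the three-argument Get overload from the standard MediaInfoDLL wrapper); the key point is one MediaInfo instance per worker thread rather than per file, since construction is the expensive part:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class DurationScanner
{
    // Scan many files with a bounded number of MediaInfo instances.
    public static IDictionary<string, string> ScanDurations(IEnumerable<string> paths)
    {
        var results = new ConcurrentDictionary<string, string>();
        Parallel.ForEach(
            paths,
            new ParallelOptions { MaxDegreeOfParallelism = 8 },
            () => // thread-local init: one MediaInfo per worker thread
            {
                var mi = new MediaInfo();
                mi.Option("ParseSpeed", "0");
                mi.Option("ReadByHuman", "0");
                return mi;
            },
            (path, state, mi) => // body: reuse the thread's instance
            {
                mi.Open(path);
                results[path] = mi.Get(StreamKind.Video, 0, "Duration");
                mi.Close();
                return mi;
            },
            mi => { }); // thread-local teardown (nothing to release here)
        return results;
    }
}
```

This uses the thread-local-state overload of Parallel.ForEach, which is available in .NET 4.0, so it fits the Server 2003 constraint.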

Related

How to make FileSystemWatcher more reliable

I'm monitoring a folder on a network mapped drive using FileSystemWatcher. I know it can be janky and report repeated events and various other issues, but I'm also running into a problem where it reports a file modification and my app is then unable to open the file (as in, after receiving multiple Changed events, I try to open the file and get a FileNotFoundException).
My current protection:
I keep a dictionary of per-filename timers and a list of things that were already processed:
static private Dictionary<string, Timer> fileChangeTimeout = new Dictionary<string, Timer>();
static private Mutex newFileHandlerLock = new Mutex();
static private List<(DateTime, string)> recentlyHandled = new List<(DateTime, string)>();
Modifications are registered to fire after waitTime (3s) of no changes:
private void ScheduleFileHandling(object sender, FileSystemEventArgs e){
newFileHandlerLock.WaitOne();
if(fileChangeTimeout.TryGetValue(e.FullPath, out var timer))
timer.Change(waitTime, Timeout.Infinite);
else
fileChangeTimeout[e.FullPath] = new Timer(HandleFile, e.FullPath, waitTime, Timeout.Infinite);
newFileHandlerLock.ReleaseMutex();
}
No-more-writes delay timer finally runs out:
private void HandleFile(object state)
{
newFileHandlerLock.WaitOne();
// remove the timer
// return if file in recentlyHandled with time difference < 10s
// add file to recentlyHandled
newFileHandlerLock.ReleaseMutex();
}
This eliminated all the issues I've had with files either being empty or not fully written. But even after all that delay I sometimes run into "file not found" situations which don't make sense. (the file is in the folder when checked manually)
Is there any way I can make this process more reliable, or do I have to resort to just retrying for a few seconds and hoping for the best?
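For the residual failures described above, a bounded retry around the open call is a common last line of defense on network shares; here is a minimal sketch (the attempt count and delay are illustrative, and FileNotFoundException derives from IOException so both are caught):

```csharp
using System.IO;
using System.Threading;

static class FileRetry
{
    // Try to open the file for exclusive read, retrying on transient
    // network-share errors (file not found / sharing violation).
    public static FileStream OpenWithRetry(string path, int attempts = 5, int delayMs = 1000)
    {
        for (int i = 1; ; i++)
        {
            try
            {
                return new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None);
            }
            catch (IOException)
            {
                if (i >= attempts) throw; // give up after the last attempt
                Thread.Sleep(delayMs);
            }
        }
    }
}
```

This does not make the watcher itself more reliable; it just tolerates the window where the share reports the file as missing even though it is visible when checked manually.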

Threading with writing to file system [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have this. It is an application for generating bank account numbers.
static void Main(string[] args)
{
string path = @"G:\BankNumbers";
var bans = BankAcoutNumbers.BANS;
const int MAX_FILES = 80;
const int BANS_PER_FILE = 81818182/80;
int bansCounter = 0;
var part = new List<int>();
var maxNumberOfFiles = 10;
Stopwatch timer = new Stopwatch();
var fileCounter = 0;
if (!Directory.Exists(path))
{
DirectoryInfo di = Directory.CreateDirectory(path);
}
try
{
while (fileCounter <= maxNumberOfFiles)
{
timer.Start();
foreach (var bank in BankAcoutNumbers.BANS)
{
part.Add(bank);
if (++bansCounter >= BANS_PER_FILE)
{
string fileName = string.Format("{0}-{1}", part[0], part[part.Count - 1]);
string outputToFile = "";// Otherwise you don't see the lines in the file, just a single line!!
Console.WriteLine("NR{0}", fileName);
string subString = System.IO.Path.Combine(path, "BankNumbers");//Needed, because otherwise the files will not be stored in the correct folder!!
fileName = subString + fileName;
foreach (var partBan in part)
{
Console.WriteLine(partBan);
outputToFile += partBan + Environment.NewLine;//Writing the lines to the file
}
System.IO.File.WriteAllText(fileName, outputToFile);//Writes to file system.
part.Clear();
bansCounter = 0;
//System.IO.File.WriteAllText(fileName, part.ToString());
if (++fileCounter >= MAX_FILES)
break;
}
}
}
timer.Stop();
Console.WriteLine(timer.Elapsed.Seconds);
}
catch (Exception)
{
throw;
}
System.Console.WriteLine("Press any key to exit.");
System.Console.ReadKey();
}
This generates 81 million bank account records separated across 80 files. Can I speed up the process with threading?
You're talking about speeding up a process whose bottleneck is overwhelmingly likely the file write speed. You can't really effectively parallelize writing to a single disk.
You may see slight increases in speed if you spawn a worker thread responsible just for file IO. In other words, create a buffer and have your main thread dump contents into it while the other thread writes it to disk. It's the classic producer/consumer dynamic. I wouldn't expect serious speed gains, however.
Also keep in mind that writing to the console will slow you down, but you can keep that in the main thread and you'll probably be fine. Just make sure you put a limit on the buffer size and have the producer thread hang back when the buffer is full.
Edit: Also have a look at the link L-Three provided; using a BufferedStream would be an improvement (and probably render a consumer thread unnecessary).
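The BufferedStream variant is straightforward. A sketch of writing one batch of numbers through a buffer (the buffer size and method name are illustrative):

```csharp
using System.Collections.Generic;
using System.IO;

static class BatchWriter
{
    // Write one batch of account numbers through a 1 MB buffer, so the many
    // small WriteLine calls are coalesced into a few large disk writes.
    public static void WriteBatch(string fileName, List<int> batch)
    {
        using (var fs = new FileStream(fileName, FileMode.Create, FileAccess.Write))
        using (var bs = new BufferedStream(fs, 1 << 20))
        using (var writer = new StreamWriter(bs))
        {
            foreach (int ban in batch)
                writer.WriteLine(ban);
        }
    }
}
```

This also avoids the string-concatenation in the original loop (`outputToFile += ...`), which is itself quadratic in the batch size.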
Your process can be divided into two steps:
Generate an account
Save the account in the file
The first step can be done in parallel as there is no dependency between accounts. That is, while creating account number xyz you don't have to rely on data from account xyz - 1 (which may not yet have been created).
The problematic bit is writing the data to the file. You don't want several threads trying to access and write to the same file, and adding locks will likely make your code a nightmare to maintain. The other issue is that it's the writing to the file that slows the whole process down.
At the moment, in your code creating account and writing to the file happens in one process.
What you can try is to separate these processes. So first you create all the accounts and keep them in some collection; here multi-threading can be used safely. Only when all the accounts are created do you save them.
Improving the saving process will take bit more work. You will have to divide all the accounts into 8 separate collections. For each collection you create a separate file. Then you can take first collection, first file, and create a thread that will write the data to the file. The same for second collection and second file. And so on. These 8 processes can run in parallel and you do not have to worry that more than one thread will try to access same file.
Below some pseudo-code to illustrate the idea:
public void CreateAndSaveAccounts()
{
List<Account> accounts = this.CreateAccounts();
// Divide the accounts into separate batches
// Of course the process can (and should) be automated.
List<List<Account>> accountsInSeparateBatches =
new List<List<Account>>
{
accounts.GetRange(0, 10000000), // First batch of 10 million
accounts.GetRange(10000000, 10000000), // Second batch of 10 million
accounts.GetRange(20000000, 10000000) // Third batch of 10 million
// ...
};
// Save accounts in parallel
Parallel.For(0, accountsInSeparateBatches.Count,
i =>
{
string filePath = string.Format(@"C:\file{0}", i);
this.SaveAccounts(accountsInSeparateBatches[i], filePath);
}
);
}
public List<Account> CreateAccounts()
{
// Create accounts here
// and return them as a collection.
// Use parallel processing wherever possible
}
public void SaveAccounts(List<Account> accounts, string filePath)
{
// Save accounts to file
// The method creates a thread to do the work.
}

Can this code have bottleneck or be resource-intensive?

It's code that will execute 4 threads in 15-min intervals. The last time that I ran it, the first 15-minutes were copied fast (20 files in 6 minutes), but the 2nd 15-minutes are much slower. It's something sporadic and I want to make certain that, if there's any bottleneck, it's in a bandwidth limitation with the remote server.
EDIT: I'm monitoring the last run and the 15:00 and :45 copied in under 8 minutes each. The :15 hasn't finished and neither has :30, and both began at least 10 minutes before :45.
Here's my code:
static void Main(string[] args)
{
Timer t0 = new Timer((s) =>
{
Class myClass0 = new Class();
myClass0.DownloadFilesByPeriod(taskRunDateTime, 0, cts0.Token);
Copy0Done.Set();
}, null, TimeSpan.FromMinutes(20), TimeSpan.FromMilliseconds(-1));
Timer t1 = new Timer((s) =>
{
Class myClass1 = new Class();
myClass1.DownloadFilesByPeriod(taskRunDateTime, 1, cts1.Token);
Copy1Done.Set();
}, null, TimeSpan.FromMinutes(35), TimeSpan.FromMilliseconds(-1));
Timer t2 = new Timer((s) =>
{
Class myClass2 = new Class();
myClass2.DownloadFilesByPeriod(taskRunDateTime, 2, cts2.Token);
Copy2Done.Set();
}, null, TimeSpan.FromMinutes(50), TimeSpan.FromMilliseconds(-1));
Timer t3 = new Timer((s) =>
{
Class myClass3 = new Class();
myClass3.DownloadFilesByPeriod(taskRunDateTime, 3, cts3.Token);
Copy3Done.Set();
}, null, TimeSpan.FromMinutes(65), TimeSpan.FromMilliseconds(-1));
}
public struct FilesStruct
{
public string RemoteFilePath;
public string LocalFilePath;
}
private void DownloadFilesByPeriod(DateTime TaskRunDateTime, int Period, CancellationToken token)
{
FilesStruct[] Array = GetAllFiles(TaskRunDateTime, Period);
//Array has 20 files for the specific period.
using (Session session = new Session())
{
// Connect
session.Open(sessionOptions);
TransferOperationResult transferResult;
foreach (FilesStruct u in Array)
{
if (session.FileExists(u.RemoteFilePath)) //File exists remotely
{
if (!File.Exists(u.LocalFilePath)) //File does not exist locally
{
transferResult = session.GetFiles(u.RemoteFilePath, u.LocalFilePath);
transferResult.Check();
foreach (TransferEventArgs transfer in transferResult.Transfers)
{
//Log that File has been transferred
}
}
else
{
using (StreamWriter w = File.AppendText(Logger._LogName))
{
//Log that File exists locally
}
}
}
else
{
using (StreamWriter w = File.AppendText(Logger._LogName))
{
//Log that File does not exist remotely
}
}
if (token.IsCancellationRequested)
{
break;
}
}
}
}
Something is not quite right here. The first thing is, you're setting 4 timers to run in parallel. If you think about it, there is no need. You don't need 4 threads running in parallel all the time; you just need to initiate tasks at specific intervals. So how many timers do you need? ONE.
The second problem is why TimeSpan.FromMilliseconds(-1)? What is the purpose of that? I can't figure out why you put that in there, but I wouldn't.
The third problem, not related to multithreading, but I should point it out anyway, is that you create a new instance of Class each time, which is unnecessary. It would be necessary if, in your class, you needed to set constructors and your logic accessed different methods or fields of the class in some order. In your case, all you want to do is call the method, so you don't need a new instance of the class every time. You just need to make the method you're calling static.
Here is what I would do:
Store the files you need to download in an array / List<>. Can't you see that you're doing the same thing every time? There's no need to write 4 different versions of that code. Store the items in an array, then just change the index in the call!
Set up the timer at perhaps a 5-second interval. When it reaches the 20 min / 35 min / etc. mark, spawn a new thread to do the task. That way a new task can start even if the previous one is not finished.
Wait for all threads to complete (terminate). When they do, check whether they threw exceptions, and handle / log them if necessary.
After everything is done, terminate the program.
For step 2, you have the option to use the new async keyword if you're using .NET 4.5. But it won't make a noticeable difference if you use threads manually.
And why is it so slow...why don't you check your system status using task manager? Is the CPU high and running or is the network throughput occupied by something else or what? You can easily tell the answer yourself from there.
The problem was the sftp client.
The purpose of the console application was to loop through a List<> and download the files. I tried with WinSCP and, even though it did the job, it was very slow. I also tested SharpSSH and it was even slower than WinSCP.
I finally ended up using SSH.NET which, at least in my particular case, was much faster than both WinSCP and SharpSSH. I think the problem with WinSCP is that there was no evident way of disconnecting after I was done. With SSH.NET I could connect/disconnect after every file download, something I couldn't do with WinSCP.

How to set maximum number of external processes the program can start at the same time?

I need to run an external program for every PDF file in specified directory.
The problem is - how to limit the number of external program processes to user-specified value? I run it in the loop, like this:
foreach(string file in Directory.GetFiles(sourcePath))
{
Process p = new Process();
p.StartInfo.FileName = @"path\program.exe";
p.StartInfo.Arguments = previouslySetArguments;
p.Start();
}
Now the problem is that there is sometimes a really huge amount of files and with that code, all processes would be ran at the same time. It really slows the machine down.
Other idea is to put p.WaitForExit(); after the p.Start(); but then it would run only one process at a time, which on the other hand - slows down the whole work :)
What is the easiest way to limit processes number to run the exact amount of them at the same time? I want to let the user decide.
Let's say I want to run maximum 5 processes at once. So:
- first 5 processes starts in more-or-less the same time for first 5 files in the directory
- when one of them (doesn't matter, which) ends work, the next one starts - for the next file
If I were you I would look into the producer-consumer model of queuing. It's intended to do pretty much exactly this, and there are lots of good examples that you can modify to suit your needs.
Here's an example:
C# producer/consumer
And another example:
http://msdn.microsoft.com/en-us/library/hh228601%28v=vs.110%29.aspx
(this last one is for 4.5, but still valid IMO)
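One simple way to cap the number of live processes is a counted semaphore around Start, released from the Exited event; SemaphoreSlim is available from .NET 4.0. A sketch (the directory and executable paths are illustrative):

```csharp
using System.Diagnostics;
using System.IO;
using System.Threading;

class BoundedRunner
{
    static void Main()
    {
        int maxProcesses = 5; // user-specified limit
        var slots = new SemaphoreSlim(maxProcesses, maxProcesses);
        foreach (string file in Directory.GetFiles(@"C:\pdfs")) // path is illustrative
        {
            slots.Wait(); // blocks while maxProcesses are already running
            var p = new Process();
            p.StartInfo.FileName = @"path\program.exe";
            p.StartInfo.Arguments = "\"" + file + "\"";
            p.EnableRaisingEvents = true; // must be set before Start for Exited to fire
            p.Exited += (s, e) =>
            {
                ((Process)s).Dispose();
                slots.Release(); // free a slot so the loop can start the next file
            };
            p.Start();
        }
    }
}
```

The loop itself is the producer and the Exited callbacks act as consumers of slots, so exactly one new process starts each time one finishes.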
OK, so I tried the answers from this question, but - strange thing - I couldn't get them to work. I admit, I was in a hurry; I could have made some mistakes...
For now, the quickest and simplest (and ugliest) method I've found is just a loop, like the code below. It works with a test program that just calls Thread.Sleep() with the given command line argument as milliseconds.
Now, please explain to me why this is not a good solution - I assume it is not the correct way (not only because the code is ugly), even though it works with this test example.
class Program
{
// hardcoded, it's just a test:
static int activeProcesses = 0;
static int finishedProcesses = 0;
static int maxProcesses = 5;
static int runProcesses = 20;
static string file = @"c:\tmp\dummyDelay.exe";
static void Main(string[] args)
{
Random rnd = new Random();
while (activeProcesses + finishedProcesses < runProcesses)
{
if (activeProcesses < maxProcesses)
{
Process p = new Process();
p.EnableRaisingEvents = true;
p.Exited += new EventHandler(pExited);
p.StartInfo.WindowStyle = ProcessWindowStyle.Hidden;
p.StartInfo.FileName = file;
p.StartInfo.Arguments = rnd.Next(2000, 5000).ToString();
p.Start();
Console.WriteLine("Started: {0}", p.Id.ToString());
activeProcesses++;
}
}
}
static void pExited(object sender, EventArgs e)
{
Console.WriteLine("Finished: {0}", ((Process)sender).Id.ToString());
((Process)sender).Dispose();
activeProcesses--;
finishedProcesses++;
}
}

SoundPlayer.PlaySync stopping prematurely

I want to play a wav file synchronously on the gui thread, but my call to PlaySync is returning early (and prematurely stopping playback). The wav file is 2-3 minutes.
Here's what my code looks like:
//in gui code (event handler)
//play first audio file
JE_SP.playSound("example1.wav");
//do a few other statements
doSomethingUnrelated();
//play another audio file
JE_SP.playSound("example2.wav");
//library method written by me, called in gui code, but located in another assembly
public static int playSound(string wavFile, bool synchronous = true,
bool debug = true, string logFile = "", int loadTimeout = FIVE_MINUTES_IN_MS)
{
SoundPlayer sp = new SoundPlayer();
sp.LoadTimeout = loadTimeout;
sp.SoundLocation = wavFile;
sp.Load();
switch (synchronous)
{
case true:
sp.PlaySync();
break;
case false:
sp.Play();
break;
}
if (debug)
{
string writeMe = "JE_SP: \r\n\tSoundLocation = " + sp.SoundLocation
+ "\r\n\t" + "Synchronous = " + synchronous.ToString();
JE_Log.logMessage(writeMe);
}
sp.Dispose();
sp = null;
return 0;
}
Some things I've thought of are the load timeout, and playing the audio on another thread and then manually 'freeze' the gui by forcing the gui thread to wait for the duration of the sound file. I tried lengthening the load timeout, but that did nothing.
I'm not quite sure of the best way to get the duration of a wav file without using third-party code (i.e. code not written by me or Microsoft). I suppose it can be calculated, since I know the file size, and all of the encoding properties (bitrate, sample rate, sample size, etc.) are consistent across all files I intend to play. Can somebody elaborate on how to calculate the duration of a wav file using this info? That is, if nobody has an idea about why PlaySync is returning early.
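For uncompressed PCM the arithmetic is just duration = dataBytes / byteRate, where byteRate = sampleRate x channels x bytesPerSample. A sketch that reads byteRate from the RIFF header instead of hardcoding it (this assumes the canonical 44-byte header; files with extra chunks, e.g. LIST metadata, would need a proper chunk walk):

```csharp
using System;
using System.IO;

static class WavInfo
{
    // Duration of a canonical PCM WAV file (44-byte header assumed).
    public static TimeSpan GetDuration(string path)
    {
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            // Offset 28 in the canonical header holds the byte rate:
            // sampleRate * channels * (bitsPerSample / 8).
            reader.BaseStream.Seek(28, SeekOrigin.Begin);
            int byteRate = reader.ReadInt32();
            long dataBytes = reader.BaseStream.Length - 44; // everything after the header
            return TimeSpan.FromSeconds((double)dataBytes / byteRate);
        }
    }
}
```

For example, a file with 16-bit stereo audio at 44100 Hz has a byte rate of 176400, so a 2-minute file would carry roughly 21 MB of sample data.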
Edits:
Of Note: I encountered a similar problem in VB 6 a while ago, but that was caused by a timeout, which I don't suspect to be a problem here. Shorter (< 1min) files seem to play fine, so I might decide to manually edit the longer files down, then play them separately with multiple calls.
Additional Info: I noticed that the same file stops consistently at the same time. The files were created using Audacity. Would it be possible that PlaySync is expecting a certain encoding of the files that differs from what I had Audacity produce?
Just in case anybody else runs into problems with playing a large wav file synchronously, here is a method I wrote which uses WMP as an alternative:
public static int playSoundWMP(string soundFile, bool synchronous = true)
{
Stopwatch sw = new Stopwatch();
sw.Start();
wmp.URL = soundFile;
wmp.controls.play();
Thread.Yield();
while (wmp.playState == WMPLib.WMPPlayState.wmppsTransitioning)
{
Application.DoEvents();
Thread.Yield();
}
int duration = Convert.ToInt32(wmp.currentMedia.duration * 1000);
double waitTime = wmp.currentMedia.duration;
if (synchronous)
{
Thread.Sleep(duration);
}
long elapsed = sw.ElapsedMilliseconds;
sw.Stop();
sw = null;
return (int)(wmp.currentMedia.duration * 1000);
}
This method uses WMP to play an audio file instead of the SoundPlayer class, so it can play larger wav files more reliably...
