I have started a thread about this on DirectShow.NET's forum (here is the link: http://sourceforge.net/projects/directshownet/forums/forum/460697/topic/5194414/index/page/1), but unfortunately the problem still persists...
I have an application that captures video from a webcam and audio from the microphone and saves them to a file. For some reason the audio and video are never in sync. I tried the following:
1. Started with the ffdshow encoder and changed to AVI Mux - the problem persists: the audio is delayed, and at the end of the video the picture remains frozen while the audio continues.
2. Changed from AVI Mux to WM ASF Writer - the video is frozen at the beginning (2 seconds) and the rest of the video is in sync (but those first two seconds are not usable).
3. Created a SampleGrabber that prints the timestamps for both audio and video - saw that the audio timestamps are 500 ms earlier, but I have no idea what to do with this fact...
4. Tried manually setting the reference clock to one of the capture filters (audio/video), but neither will cast to IReferenceClock.
5. Created a SystemClock and set it as the reference clock - made no difference.
6. Set SyncUsingStreamOffset(true) (IAMGraphStreams) on the graph - the timestamps are much closer now, but the final result is the same.
7. Tried saving the audio and video to two different files and used VirtualDub to see whether they match; they still don't...
Oh, I forgot to mention that I also tried building the graph in GraphEditPlus, but the problem remains. Here's a link to the graph: http://www2.picturepush.com/photo/a/8030745/img/8030745.png
Currently I am testing all my changes on the CapWMV sample from DirectShow.NET's samples.
Please, any advice would be highly appreciated; I am hopeless :/
Thanks,
Eran.
Update:
It seems there's a constant 500 ms gap between the audio and the video. If I use VirtualDub to delay the audio by 500 ms, everything looks fine. How can I set this offset in the graph?
You are seeing latency on the audio stream equal to the size of the capture buffer. That is, you obtain a full buffer that started being captured 0.5 seconds earlier. You need to use smaller buffers and/or apply an offset to the buffers to adjust for the latency.
See:
Minimizing audio capture latency in DirectShow
How to eliminate 1 second delay in DirectShow filter chain? (Using Delphi and DSPACK)
IAMBufferNegotiation is the keyword.
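For example, here is a hedged sketch with DirectShow.NET (DirectShowLib) that suggests ~50 ms buffers on the audio capture pin before it is connected. 'audioCapturePin' and the 44.1 kHz/16-bit/stereo figures are assumptions; read the pin's actual format instead:

    var negotiation = audioCapturePin as IAMBufferNegotiation;
    if (negotiation != null)
    {
        int bytesPerSecond = 44100 * 2 * 2;     // 44.1 kHz, 16-bit, stereo (assumed)
        var props = new AllocatorProperties
        {
            cBuffers = 4,                       // a few small buffers
            cbBuffer = bytesPerSecond / 20,     // ~50 ms of audio per buffer
            cbAlign = -1,                       // -1 = no preference
            cbPrefix = -1
        };
        DsError.ThrowExceptionForHR(negotiation.SuggestAllocatorProperties(props));
    }

Note that this must be done before the pin is connected; once the graph is built, the allocator is already negotiated.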
Just wanted to add the solution for my situation; maybe it will help someone.
I was trying to record video from a webcam together with audio from a microphone. The video is HD (1080p), so I wanted to save an AVI file encoded in MPEG-4; I used ffdshow tryouts (a free MPEG-4 encoder) together with an AVI Mux filter, and the problem was that some (well, most of them :) ) of my videos had sync issues.
What I discovered was that AVI Mux does not handle synchronization; it assumes the data arrives at the appropriate time (documented here: http://msdn.microsoft.com/en-us/library/dd407208(v=vs.85).aspx). So I tried the WM ASF Writer, which does handle synchronization, and it worked fine (the 2-second freeze I mentioned above was just a glitch in VLC Player), but it doesn't work well at high resolutions, and I had trouble using it with custom profiles (the filters won't connect).
I also tried Roman's suggestion, and although the links were very interesting and promising (I really recommend reading them - can't give +1 to a post yet...), it just didn't make any difference :/
My final solution was to give up on MPEG-4 and just use MPEG-2: I switched from AVI Mux to the Microsoft MPEG-2 Encoder, which works great. I should have thought of that a long time ago :)
Hopefully this will help someone else.
Thanks,
Eran.
I had the same problem rendering video from WMV to AVI using the Xvid MPEG-4 decoder.
My final solution, without giving up MPEG-4, was to configure the AVI Mux filter through the IConfigAviMux::SetMasterStream method.
As explained in the Capturing Video to an AVI File article on MSDN:
If you are capturing audio and video from two separate devices, it is a good idea to make the audio stream the master stream. This helps to prevent drift between the two streams, because the AVI Mux filter adjusts the playback rate on the video stream to match the audio stream.
Example code:
IConfigAviMux _filterAVIMuxerCfg = (IConfigAviMux)_filterAVIMuxer;
_filterAVIMuxerCfg.SetMasterStream(0); // stream 0 = the first stream I connected, which is the audio ;)
I'm making a project using C# 2013 and Windows Forms, and this project will use an IP camera to display video for a long time using CGI commands.
I know from the articles I've read that the IP camera returns its streaming video as a continuous multi-part stream, and I found some samples that display the video, like this one: Writing an IP Camera Viewer in C# 5.0.
But I see a lot of code just to extract the single part that represents a single image, display it, and so on.
I also tried taking continuous snapshots from the camera using the following code:
var req = (HttpWebRequest)WebRequest.Create("http://192.168.1.200/snap1080");
using (var res = (HttpWebResponse)req.GetResponse())
using (var strm = res.GetResponseStream())
using (var img = Image.FromStream(strm))
{
    if (image.Image != null) image.Image.Dispose(); // release the previous frame
    image.Image = new Bitmap(img);                  // copy so the stream can be closed safely
}
I repeated this code in a loop that runs for one second and counts the number of snapshots taken; it gives me between 88 and 114 snapshots per second.
IMHO the first example, which displays the video, does a lot of processing to extract the single part of the multi-part response and display it, so it may be as slow as the snapshot method.
So I'm asking for other developers' experience with this issue: do you see any other differences between the two methods of displaying the video? I also want to know the effect of receiving a continuous multi-part stream on memory - is it safe, or will it generate out-of-memory errors?
Thanks in advance
If you are taking more than one JPEG every 1-3 seconds, you are better off capturing the H264 video stream instead; it will take less bandwidth and CPU.
An MJPEG stream is usually 10-20 times bigger than the same stream in H264, so 80+ snapshots per second is a really big amount of data.
As long as you dispose of the image and stream correctly, you should not have memory issues. I have done a similar thing in the past with an IP camera, even converting all the snapshots I took back into a video, using ffmpeg (I think it was).
I get a stream of frames: an initial SPS and PPS H264 data packet, and then packets for the I and P frames.
Using C# .NET, I want to convert these into a series of JPEGs. Has anyone done this?
I have tried the AForge.NET FFmpeg wrapper, but that can only go from an MP4 file to JPEGs.
I also looked at DirectShow.
I can't seem to find any examples that even come close to doing this.
Thanks
In DirectShow, it sounds like you need to introduce a SampleGrabber filter into your filter graph. Insert it after the H264 decoder.
This is a pass-through filter which can receive a callback containing each video frame. You can then choose what to do with each frame, e.g. save it to disk as a JPEG.
Rather than regurgitate MSDN, there is a great page explaining its usage here:
https://msdn.microsoft.com/en-us/library/windows/desktop/dd407288(v=vs.85).aspx
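To give a flavour of the callback side, here is a hedged sketch with DirectShow.NET (DirectShowLib). It assumes you configured the grabber for MediaType.Video / MediaSubType.RGB24 before connecting it, and that you read Width and Height from the connected media type; FrameSaver and its members are illustrative names:

    // Uses System.Drawing and System.Drawing.Imaging for the JPEG encoding.
    class FrameSaver : ISampleGrabberCB
    {
        public int Width, Height;   // fill in from the connected AMMediaType
        private int frameNumber;

        public int SampleCB(double sampleTime, IMediaSample sample) { return 0; }

        public int BufferCB(double sampleTime, IntPtr buffer, int bufferLen)
        {
            // RGB24 DIBs are bottom-up, and the stride must be a multiple
            // of 4 (true for the usual video widths).
            using (var bmp = new Bitmap(Width, Height, Width * 3,
                                        PixelFormat.Format24bppRgb, buffer))
            {
                bmp.RotateFlip(RotateFlipType.RotateNoneFlipY);
                bmp.Save("frame" + (frameNumber++) + ".jpg", ImageFormat.Jpeg);
            }
            return 0;
        }
    }

Register it before running the graph with ISampleGrabber.SetCallback(frameSaver, 1); the 1 selects the BufferCB callback rather than SampleCB.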
Update, taking account of Roman's comments:
SampleGrabber is deprecated, as it's part of DirectShow Editing Services. However, it is quite self-contained; if it were removed from a later version of Windows, it would be straightforward to replace with an alternative filter. I still use it in one of my consumer applications. Roman is correct, though: it has a steep learning curve.
I’m making an audio synthesizer and I’m having issues figuring out what to use for audio playback. I’m using physics and math to calculate the source waveforms and then need to feed that waveform to something which can play it as sound. I need something that can 1) play the waveforms I calculate and 2) play multiple sounds simultaneously (like holding one key down on a piano while pressing other keys). I’ve done a fair bit of research into this and I can’t find something that does both of those things. As far as I know, I have 5 potential options:
1. DirectSound. It can take a waveform (a short[]) as a parameter and play it as sound, and it can play multiple sounds simultaneously. But it won't work with .NET 4.5.
2. System.Media.SoundPlayer. It works with .NET 4.5 and has better quality audio than DirectSound, but it has to play sound from a .wav file and cannot play multiple sounds at once (nor can multiple instances of SoundPlayer). I 'trick' SoundPlayer into working by translating my waveform into .wav format in memory and then sending SoundPlayer a MemoryStream of the in-memory .wav file (see the sketch after this list). Could I potentially achieve control over the playback by altering the stream? I cannot append bytes to the stream (I tried), but I could potentially make the stream an arbitrary size and just rewrite all the bytes in the stream with the next segment of audio data every time the end of the stream is reached.
3. System.Windows.Controls.MediaElement. I have not experimented with this yet, but from MSDN's documentation I don't see a way to send it an in-memory waveform without saving it to disk first and then reading it; I don't think I can send it a stream.
4. System.Windows.Controls.MediaPlayer. I have not experimented with this either, but the documentation says it's meant to be used as a companion to some kind of animation. I could potentially use it without doing any real (user-perceivable) animation to achieve the desired effect.
5. An open source solution. I'm hesitant to use an open source solution, as I find they are typically poorly documented and not very maintainable, but I am open to ideas if there is one out there that is well documented and can do what I need.
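For reference, here is the in-memory .wav trick from option 2 as a minimal sketch: it wraps raw 16-bit PCM samples in a RIFF/WAVE header so SoundPlayer accepts the stream. The helper name is mine, and mono format is an assumption:

    // Uses System.IO, System.Media and System.Text.
    static MemoryStream ToWavStream(short[] samples, int sampleRate)
    {
        var ms = new MemoryStream();
        var w = new BinaryWriter(ms);
        int dataSize = samples.Length * 2;              // 16-bit mono PCM
        w.Write(Encoding.ASCII.GetBytes("RIFF"));
        w.Write(36 + dataSize);                         // RIFF chunk size
        w.Write(Encoding.ASCII.GetBytes("WAVEfmt "));
        w.Write(16);                                    // fmt chunk size
        w.Write((short)1);                              // format tag: PCM
        w.Write((short)1);                              // channels: mono
        w.Write(sampleRate);
        w.Write(sampleRate * 2);                        // bytes per second
        w.Write((short)2);                              // block align
        w.Write((short)16);                             // bits per sample
        w.Write(Encoding.ASCII.GetBytes("data"));
        w.Write(dataSize);
        foreach (short s in samples) w.Write(s);
        ms.Position = 0;                                // rewind for SoundPlayer
        return ms;
    }

    // Usage: new SoundPlayer(ToWavStream(waveform, 44100)).Play();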
Can anyone offer me any guidance on this or how to create flexible audio playback?
http://naudio.codeplex.com, without a doubt. Mark is a regular here on SO, the project is very much alive, and there are good code examples.
It works. We built some great stuff with it.
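For the synthesizer scenario above, here is a minimal sketch of simultaneous playback with NAudio (types from the NAudio.Wave and NAudio.Wave.SampleProviders namespaces; the frequencies, durations and mono format are arbitrary choices):

    // A mixer that keeps the output device open; add one input per held key.
    var mixer = new MixingSampleProvider(WaveFormat.CreateIeeeFloatWaveFormat(44100, 1))
    {
        ReadFully = true    // keep emitting silence so inputs can be added at any time
    };
    var output = new WaveOutEvent();    // dispose when the synth shuts down
    output.Init(mixer);
    output.Play();

    // Two "keys" held at once: 440 Hz and 554 Hz sine tones for two seconds.
    mixer.AddMixerInput(new SignalGenerator(44100, 1)
        { Type = SignalGeneratorType.Sin, Frequency = 440, Gain = 0.25 }
        .Take(TimeSpan.FromSeconds(2)));
    mixer.AddMixerInput(new SignalGenerator(44100, 1)
        { Type = SignalGeneratorType.Sin, Frequency = 554, Gain = 0.25 }
        .Take(TimeSpan.FromSeconds(2)));

For the waveforms you calculate yourself, a custom ISampleProvider (or a BufferedWaveProvider fed from your short[] data) can be added to the same mixer in place of the SignalGenerator.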
I have a form with a live view of the TV signal (from a DVB-T stick). I have the sample project "DTViewer" from http://directshownet.sourceforge.net/about.html.
Now I'm trying to capture the stream to a movie file by clicking a button, but how?
I use C# and DirectShow.NET.
I've searched many sample projects, but they are all made for video inputs, not for a DVB-T stick with a BDA (Broadcast Driver Architecture) interface.
Help!
I don't really know what exactly you mean by a "movie file", but I can tell you how to capture the entire MUX (transport stream):
1. Create a graph with the Microsoft DVBT Network Provider, You_Name_It BDA DVBT Tuner, You_Name_It BDA Digital Capture and MPEG-2 Demultiplexer filters.
2. Once you connect them, enumerate all output pins on the MPEG-2 Demultiplexer and render them.
3. Tune the frequency of your choice (put_TuneRequest). At this point everything is ready to run the graph, but don't run it yet!
4. Enumerate all filters in the graph and disconnect everything except the Microsoft DVBT Network Provider, You_Name_It BDA DVBT Tuner and You_Name_It BDA Digital Capture. Remove the disconnected filters from the graph, except the MPEG-2 Demultiplexer (it has to stay in the graph even though it is not connected).
5. Add a Sample Grabber filter and a Null Renderer filter. Connect the Digital Capture filter to the Sample Grabber, and the Sample Grabber to the Null Renderer.
6. Run the graph. Through the callback in the Sample Grabber filter you will receive the entire MUX.
Of course, there is still some work left to demux the data, but once you do that, you can capture all the TV programs in one MUX at once. The easiest way is to capture it in TS format, because TS is what is broadcast (188-byte packets).
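A hedged sketch of that Sample Grabber / Null Renderer tail with DirectShow.NET (DirectShowLib); 'graph' is the graph built above and 'tsCallback' is your ISampleGrabberCB implementation, both assumed, as is the transport-stream media subtype your capture filter exposes:

    var grabber = (IBaseFilter)new SampleGrabber();
    var grabberCfg = (ISampleGrabber)grabber;

    // Ask for the raw transport stream (188-byte TS packets).
    var mt = new AMMediaType
    {
        majorType = MediaType.Stream,
        subType = MediaSubType.Mpeg2Transport
    };
    DsError.ThrowExceptionForHR(grabberCfg.SetMediaType(mt));
    DsUtils.FreeAMMediaType(mt);
    DsError.ThrowExceptionForHR(grabberCfg.SetCallback(tsCallback, 1)); // 1 = BufferCB

    var nullRenderer = (IBaseFilter)new NullRenderer();
    DsError.ThrowExceptionForHR(graph.AddFilter(grabber, "Sample Grabber"));
    DsError.ThrowExceptionForHR(graph.AddFilter(nullRenderer, "Null Renderer"));
    // Then connect: BDA Digital Capture -> Sample Grabber -> Null Renderer, and run.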
It seems to me that VLC has BDA support (see the BDA.c file reference); maybe you can pick something up from their code?
There is no simple answer to your question. I started one such project myself and found out how little I knew about it, so here is a little something from my research.
First, you'll have to understand that a DVB-T tuner card or stick doesn't give you video frames in the classical sense; the decoding is done on the PC, on the CPU. The external card only provides compressed data, as it fetches it from the air.
Next, the data delivered to you will be in MPEG-2 or MPEG-4 Transport Stream format, which is suitable for streaming or broadcasting, not for saving to a file. VLC is able to play a TS written to a file, but to record a proper video file you'll have to either transcode it or repack it into a Program Stream. Google it a little; you'll find the differences.
More: one frequency on the air carries many channels, and that channel packing is called a 'mux'. So from the BDA tuner/capture filter you'll get ALL the data, and you'll have to demux it manually or let the BDA demuxer do it for you.
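To illustrate the manual route, here is a small sketch of picking one PID out of the raw TS buffers the graph delivers (the method name and parameters are mine):

    // Each TS packet is 188 bytes, starts with sync byte 0x47, and carries
    // a 13-bit packet identifier (PID) in bytes 1-2.
    void OnTsBuffer(byte[] buf, int len, int wantedPid, Stream outFile)
    {
        for (int i = 0; i + 188 <= len; i += 188)
        {
            if (buf[i] != 0x47) continue;   // lost sync; real code would resync
            int pid = ((buf[i + 1] & 0x1F) << 8) | buf[i + 2];
            if (pid == wantedPid)
                outFile.Write(buf, i, 188); // keep the whole packet (TS output)
        }
    }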
Hope that's enough info to get you started; I can post some interesting links when I get to a real keyboard.
I have tons of videos in a database; they can't be accessed directly, but I can play them one by one and record them. Now I want to write a program (probably in C#) that takes a URL and starts Internet Explorer, or the default browser, to open the link. Once the link is opened, the video will play.
My job is then to record the video for "x" seconds along with its audio. I can record the video by taking screenshots very frequently, but what about the audio and its quality? Do I need to put a microphone in a soundproof room next to a speaker to record it, or can I pull the audio directly off the sound card before it goes to the speakers?
Any ideas?
Umair
This is not wise at all. You will have a huge quality loss recording video from the screen and re-encoding it, not to mention the time this will take.
You should find a way to access those videos directly from the database and run them through a converter like ffmpeg.
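If you really have to record playback, at least take the audio digitally rather than through a microphone: Windows (Vista and later) can capture the mix that is sent to the sound card. A minimal sketch, assuming NAudio's WasapiLoopbackCapture (the file name is arbitrary):

    var capture = new WasapiLoopbackCapture();  // records what the speakers receive
    var writer = new WaveFileWriter("capture.wav", capture.WaveFormat);
    capture.DataAvailable += (s, e) => writer.Write(e.Buffer, 0, e.BytesRecorded);
    capture.RecordingStopped += (s, e) => writer.Dispose();
    capture.StartRecording();
    // ... let the video play for the x seconds you need ...
    capture.StopRecording();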
What about using rtmpdump to retrieve the video stream as an .flv file? Of course, you will need to parse the stream information from the respective web pages, but that should be manageable.