I am using Managed Media Aggregation in C# - https://net7mma.codeplex.com/.
I have an RTSP client that receives RTP frames encoded in H.264 (payload type 96).
I want to be able to save the frames into a video file, and also be able to tell when the video starts/ends.
I did some reading and saw that it's a problem to decode H.264 frames one by one, but I didn't really understand why.
Here is the event handler that is raised for each RTP frame that I receive:
void Client_RtpFrameChanged(object sender, Media.Rtp.RtpFrame frame)
{
// Decode
}
Can someone explain why it's a problem to decode H.264 frames one by one?
Is there an open-source library/DLL for this?
Thanks a lot!
There is an included class in the RtspServer project.
The class is RFC6184Media; it contains methods for packetization and depacketization and handles all defined NAL unit types.
After you call Depacketize there is a Buffer which contains the raw bitstream payload; you will have to prepend a start code consisting of 0x000001 and then the data contained in the raw bitstream.
There are several examples in the Discussion area for the project.
After that you can feed the stream to a decoder, and only then can the frames be displayed, usually after conversion from YUV to RGB according to the subsampling used when encoding.
I can see about adding a small demo with a few static packets corresponding to a frame, to show how to achieve the desired result.
In the future if you make a discussion on the project page I will probably get to it much quicker.
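To make the start-code step above concrete, here is a rough sketch of appending each depacketized NAL unit to an Annex B elementary stream file. The RFC6184Media/Depacketize/Buffer names come from the description above, so check the RtspServer project source for the exact namespaces and signatures in your version; treat the helper below as illustrative only.

using System.IO;

static class AnnexBWriter
{
    static readonly byte[] StartCode = { 0x00, 0x00, 0x01 };

    // 'depacketizedNal' is assumed to hold the raw NAL unit bytes read from the
    // Buffer that Depacketize produces; the resulting .h264 file can be fed to a
    // decoder (or a tool such as ffmpeg) for playback or muxing into a container.
    public static void AppendNalUnit(Stream output, byte[] depacketizedNal)
    {
        output.Write(StartCode, 0, StartCode.Length); // Annex B start code
        output.Write(depacketizedNal, 0, depacketizedNal.Length);
    }
}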
Related
I am writing an app (Windows Phone 8.1 Store App) that allows the user to connect to an IP camera. I am using the FFmpeg Interop library for FFmpeg, which allows me to play e.g. RTSP streams in a MediaElement. I now need a way to somehow extract a single frame from the stream or from the MediaElement.
I have tested another application which allows connecting to IP cameras - IP Centcom - and as far as I know its snapshots only work for MJPEG streams (they were not working for RTSP). Because of that I believe it is impossible, or at the very least very hard, to export a frame from a MediaElement.
So I have a different question: if anyone has ever used FFmpeg Interop, could you help/explain how I could modify/extend FFmpegInteropMSS to add a method called 'GetThumbnailForStream' that would work similarly to 'GetMediaStreamSource' but would return a single frame (bitmap or JPG) instead of a MediaStreamSource?
Any help would be appreciated.
EDIT:
I have found something:
in MediaSampleProvider, in the method WriteAVPacketToStream (around line 123), there is the line
auto aBuffer = ref new Platform::Array<uint8_t>(avPacket->data, avPacket->size);
and I believe that this is the place that stores the single-frame data that needs to be converted into a bitmap - now, since I do not know C++ very well, I have a question: how can I convert it into a form that I could return via a public method?
When returning:
Platform::Array<uint8_t>^
I get
'FFmpegInterop::MediaSampleProvider' : a non-value type cannot have any public data members
EDIT2:
OK, I am doing the appropriate projection to byte according to this Microsoft information; now I need to check whether this is the correct data.
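On the consuming (C#) side, and only as a hedged sketch, the bytes handed back by a hypothetical GetThumbnailForStream could be displayed like this, assuming they hold one complete encoded image (e.g. a JPEG); the method and names below are placeholders, not part of FFmpegInteropMSS:

using System.Runtime.InteropServices.WindowsRuntime; // for AsBuffer()
using System.Threading.Tasks;
using Windows.Storage.Streams;
using Windows.UI.Xaml.Media.Imaging;

public static class ThumbnailHelper
{
    // frameBytes is assumed to be a complete encoded image returned from the C++ side.
    public static async Task<BitmapImage> ToBitmapAsync(byte[] frameBytes)
    {
        var stream = new InMemoryRandomAccessStream();
        await stream.WriteAsync(frameBytes.AsBuffer()); // copy the managed bytes into a WinRT stream
        stream.Seek(0);                                 // rewind before handing it to the decoder
        var bitmap = new BitmapImage();
        await bitmap.SetSourceAsync(stream);            // decodes the JPEG/PNG into a displayable bitmap
        return bitmap;
    }
}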
I have a one-on-one connection between a server and a client. The server is streaming real-time audio/video data.
My question may sound weird, but should I use multiple ports/sockets or only one? Is it faster to use multiple ports, or does a single one offer better performance? Should I have a port only for messages, one for video and one for audio, or is it simpler to package the whole thing over a single port?
One of my current problems is that I need to first send the size of the current frame, as the size - in bytes - may change from one frame to the next. I'm fairly new to networking, but I haven't found any mechanism that would automatically detect the correct range for a specific object being transmitted. For example, if I send a 2934-byte packet, do I really need to tell the receiver the size of that packet?
I first tried to package the frames as fast as they were coming in, but I found that the receiving end would sometimes not get the appropriate number of bytes. Most of the time, it would read faster than I send, getting only a partial frame. What's the best way to get exactly the right number of bytes as quickly as possible?
Or am I looking too low, and is there a higher-level class/framework used to handle object transmission?
I think it is better to use a framed-message mechanism over a single connection and send the data in an interleaved fashion. This may work faster than a multiple-port mechanism.
e.g.:
class Data {
    DataType,   // Audio/Video
    Size,       // size of the data buffer
    DataBuffer  // data, depends on the type
}
'DataType' and 'Size' are always of constant size. On the client side, take the 'DataType' and 'Size' first and then read the specified number of bytes of the corresponding data (audio/video), as in the sketch below.
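A minimal sketch of the sending side of such a framing scheme, using only the standard library; the PacketType values and the 1-byte/2-byte header layout are illustrative assumptions, not part of any particular protocol:

using System;
using System.IO;

enum PacketType : byte { Audio = 1, Video = 2, Control = 3 }

static class PacketWriter
{
    // Writes one packet framed as: 1 byte type, 2 bytes length, then the payload.
    public static void WritePacket(Stream stream, PacketType type, byte[] payload)
    {
        if (payload.Length > ushort.MaxValue)
            throw new ArgumentException("Payload too large for a 2-byte length field.");

        var writer = new BinaryWriter(stream);
        writer.Write((byte)type);              // 1 byte  - packet type
        writer.Write((ushort)payload.Length);  // 2 bytes - data length (little-endian)
        writer.Write(payload);                 // raw data
        writer.Flush();
    }
}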
Just making something up off the top of my head. Shove "packets" like this down the wire:
1 byte - packet type (audio or video)
2 bytes - data length
(whatever else you need)
|
| (raw data)
|
So whenever you get one of these packets on the other end, you know exactly how much data to read, and where the next packet begins.
[430 byte audio L packet]
[430 byte audio R packet]
[1000 byte video packet]
[20 byte control packet]
[2000 byte video packet]
...
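Whichever exact layout you pick, the receiving side has to read exact byte counts, which is why partial frames show up when you call Read only once. A sketch of reading one such packet, assuming the 1-byte type / 2-byte length header above:

using System;
using System.IO;

static class PacketReader
{
    // Reads one packet framed as: 1 byte type, 2 bytes length, then 'length' bytes of data.
    public static byte[] ReadPacket(Stream stream, out byte packetType)
    {
        byte[] header = ReadExactly(stream, 3);
        packetType = header[0];
        int length = BitConverter.ToUInt16(header, 1); // must match the sender's byte order
        return ReadExactly(stream, length);
    }

    // Stream.Read may return fewer bytes than requested, so loop until all have arrived.
    static byte[] ReadExactly(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0) throw new EndOfStreamException("Connection closed mid-packet.");
            offset += read;
        }
        return buffer;
    }
}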
But why re-invent the wheel? There are protocols to do these things already.
I am an audio noob
I am looking to embed audio in an HTML page by passing the data as a string, such as
<audio src="data:audio/wav;base64,AA....." />
Doing that works, but I need to raise the volume. I tried working with NAudio, but it seems to do some conversion and the audio will no longer play. This is the code I use to raise the volume:
public string ConvertToString(Stream audioStream)
{
    audioStream.Seek(0, SeekOrigin.Begin);
    byte[] bytes = new byte[audioStream.Length];
    audioStream.Read(bytes, 0, (int)audioStream.Length);
    audioStream.Seek(0, SeekOrigin.Begin);
    return Convert.ToBase64String(bytes);
}

var fReader = new WaveFileReader(strm);
var chan32 = new WaveChannel32(fReader, 50.0F, 0F);
var outputString = "data:audio/wav;base64," + ConvertToString(chan32);
But when I put outputString into an audio tag, it fails to play. What kind of transformation does NAudio do, and how can I get it to give me the audio stream in such a way that I can serialize it and the browser will be able to play it?
Or, another suggestion: if NAudio is too heavyweight for something as simple as raising the volume, what's a better option for me?
I'm no expert in embedding WAV files in web pages (and to be honest it doesn't seem like a good idea - WAV is one of the most inefficient ways of delivering sound to a web page), but I'd expect that the entire WAV file, including headers, needs to be encoded; you are just writing out the sample data. So with NAudio you'd need to use a WaveFileWriter writing into a MemoryStream or a temporary file to create a volume-adjusted WAV file that can be written to your page.
There are two additional problems with your code. One is that you have gone to 32-bit floating point, making the WAV file even more inefficient (doubling the size of the original file). You need to use the Wave32To16Stream to go back to 16 bit before creating the WAV file. The second is that you are multiplying each sample by 50. This will almost certainly horribly distort the signal. Clipping can very easily occur when amplifying a WAV file, and it depends on how much headroom there is in the original recording. Often dynamic range compression is a better option than simply increasing the volume.
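A rough sketch of that approach, assuming the input is a 16-bit PCM WAV; the class names (WaveChannel32, Wave32To16Stream, WaveFileWriter) are from older NAudio releases, and in newer versions the write method is called Write rather than WriteData, so adjust to your version:

using System;
using System.IO;
using NAudio.Wave;

public static class VolumeHelper
{
    public static string CreateVolumeAdjustedDataUri(Stream wavInput, float volume)
    {
        using (var reader = new WaveFileReader(wavInput))
        using (var chan32 = new WaveChannel32(reader, volume, 0f)) // apply the gain in 32-bit float
        using (var to16 = new Wave32To16Stream(chan32))            // back to 16-bit PCM
        {
            var memory = new MemoryStream();
            using (var writer = new WaveFileWriter(memory, to16.WaveFormat))
            {
                byte[] buffer = new byte[to16.WaveFormat.AverageBytesPerSecond];
                int read;
                while ((read = to16.Read(buffer, 0, buffer.Length)) > 0)
                    writer.WriteData(buffer, 0, read);
            } // disposing the writer fixes up the RIFF header sizes and closes 'memory'

            // MemoryStream.ToArray() still works after the stream has been closed.
            return "data:audio/wav;base64," + Convert.ToBase64String(memory.ToArray());
        }
    }
}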
I have recorded audio to a CaptureBuffer, but I can't figure out how to save it into a WAV file. I have tried this (http://www.tek-tips.com/faqs.cfm?fid=4782), but it didn't work, or I didn't use it properly. Does anybody know how to solve this? Sample code would be very much appreciated.
A WAV file is a RIFF file consisting of two main "chunks". The first is a format chunk, describing the format of the audio. This will be what sample rate you recorded at (e.g. 44.1kHz), what bit depth (e.g. 16 bits) and how many channels (mono or stereo). WAV also supports compressed formats but it is unlikely you recorded the audio compressed, so your record buffer will contain PCM audio.
Then there is the data chunk. This is the part of the WAV file that contains the actual audio data in your capture buffer. This must be in the format described in the format chunk.
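To make that chunk layout concrete, here is a minimal hand-rolled sketch for 16-bit PCM; it assumes the capture data is already plain PCM bytes, and in practice the WaveFileWriter class described below does all of this for you:

using System.IO;
using System.Text;

static class SimpleWavWriter
{
    public static void Write(string path, byte[] pcmData, int sampleRate, short channels)
    {
        short bitsPerSample = 16;
        short blockAlign = (short)(channels * bitsPerSample / 8);
        int byteRate = sampleRate * blockAlign;

        using (var writer = new BinaryWriter(File.Create(path)))
        {
            // RIFF header
            writer.Write(Encoding.ASCII.GetBytes("RIFF"));
            writer.Write(36 + pcmData.Length);            // size of everything that follows
            writer.Write(Encoding.ASCII.GetBytes("WAVE"));

            // format chunk: sample rate, bit depth, channel count
            writer.Write(Encoding.ASCII.GetBytes("fmt "));
            writer.Write(16);                             // PCM format chunk is 16 bytes
            writer.Write((short)1);                       // 1 = uncompressed PCM
            writer.Write(channels);
            writer.Write(sampleRate);
            writer.Write(byteRate);
            writer.Write(blockAlign);
            writer.Write(bitsPerSample);

            // data chunk: the captured samples themselves
            writer.Write(Encoding.ASCII.GetBytes("data"));
            writer.Write(pcmData.Length);
            writer.Write(pcmData);
        }
    }
}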
As part of the NAudio project I created a WaveFileWriter class to simplify creating WAV files. You pass in a WaveFormat that describes the format of your captured audio. Then you can simply write the raw captured data in.
Here's some simple example code for how you might use WaveFileWriter:
WaveFormat format = new WaveFormat(16000, 16, 1); // mono 16-bit audio captured at 16kHz
using (var writer = new WaveFileWriter("out.wav", format))
{
    writer.WriteData(captureBuffer, 0, captureBuffer.Length); // captureBuffer is the raw captured byte[]
}
I have a byte array containing an MP3 stream.
Is it correct to assume that this stream would have to be further decoded if I want to be able to convert it to a WAV?
In its current byte state, is it possible to do basic things such as getting/setting the position (time-wise)?
Yeah, MP3 files are very different from WAV files. WAV files contain raw audio data in the form of samples from beginning to end to paint the waveform of the output, the same way a bitmap file contains raw data about pixels from left to right, top to bottom. You can think of a WAV file as a bitmap picture of sound waves -- but rather than pixel colors, it stores audio intensities, typically 44,100 of them per second, for two channels if it's stereo, and 2 bytes per channel.
(Knowing this you can actually calculate the file size of a WAV file -- to store 1 minute of audio, you'd need 60 seconds * 44100 samples * 2 channels * 2 bytes = 10.09MB.)
MP3 files contain a mathematically transformed version of this image that discards audio humans can't hear. It works similarly to the way JPEG compresses images.
Just as video cards ultimately need bitmaps to work with, sound cards ultimately need WAV data to work with -- so yes, you need a decoder.
At the beginning of MP3 files is a block of data called an ID3 tag, which contains a bunch of basic information about the file -- artist name, track length, album name, stuff like that. You can use something like C# ID3 to read/write ID3 tags in C#.
As for the audio itself, I'm not sure there are MP3 decoders written entirely in C#. Technically there's no reason it can't be done (it should be fine performance-wise too), but the standard is pretty loose and the math is intense, so people tend to just use things like FFmpeg to decode. Some ideas in this Google search.
If you don't need to do any special processing and you just want to play the audio, you can use the WPF/Silverlight MediaElement.
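For example, in WPF, with a MediaElement named mediaElement declared in the XAML (the file path below is just a placeholder):

mediaElement.LoadedBehavior = MediaState.Manual;       // let code control playback
mediaElement.Source = new Uri(@"C:\music\track.mp3");  // placeholder path to the MP3
mediaElement.Play();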
You can probably get some hints out of Josh Smith's Podder app.
NAudio is an open source .NET library that can read MP3 files.
To convert MP3 to WAV, use code similar to the following:
Stream inputStream = ...;
Stream outputStream = ...;
using (WaveStream waveStream = WaveFormatConversionStream.CreatePcmStream(new Mp3FileReader(inputStream)))
using (WaveFileWriter waveFileWriter = new WaveFileWriter(outputStream, waveStream.WaveFormat))
{
    byte[] bytes = new byte[waveStream.Length];
    waveStream.Read(bytes, 0, (int)waveStream.Length); // Length is a long, so cast for Read
    waveFileWriter.WriteData(bytes, 0, bytes.Length);
    waveFileWriter.Flush();
}
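For example, the elided streams above could simply be file streams (the paths here are placeholders):

using (Stream inputStream = File.OpenRead("input.mp3"))   // placeholder path
using (Stream outputStream = File.Create("output.wav"))   // placeholder path
{
    // ... run the conversion shown above ...
}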
As per Rei Miyasaka's answer, there is an MP3 decoder written in C#. Open source, too. Check out Mp3Sharp.
You can use http://sourceforge.net/projects/mpg123net/ to decode your MP3 into byte[] and then use the decoded PCM however you like.