I need some help with an algorithm. I'm using an artificial neural network to read an electrocardiogram and trying to recognize some disturbances in the waves. That's OK, and I have the neural network and I can test it no problem.
What I'd like to do is let the user open an electrocardiogram (import a JPEG) and have the program find the waves and convert them into the arrays that will feed my ANN, and that is where the problem lies. I wrote some code that reads the image and converts it into a binary image, but I can't find a good way for the program to locate the waves, since their exact position can vary from hospital to hospital. I need some suggestions on which approaches to use.
If you've got the wave values in a list, you can use a Fourier transform or FFT (fast Fourier transform) to determine the frequency content within any particular time window. Disturbances typically create additional high-frequency content (i.e., sharp, steep waves) that you should be able to use to spot irregularities.
You'd have to assume a certain minimal contrast between the "signal" (the waves) and the background of the image. An edge-finding algorithm might be useful in that case. You could isolate the wave from the background and plot the wave.
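As a rough illustration of isolating the wave and turning it into values, here is a minimal sketch. It assumes the binary image is already available as a list of rows in which 1 marks a dark (trace) pixel and 0 marks background. The function name `extract_trace` is hypothetical, and taking the mean row per column is just one simple choice.

```python
def extract_trace(binary_image):
    """Return one amplitude per column from a binary image (list of rows of 0/1).

    Columns without any trace pixel yield None so that gaps stay visible.
    """
    height = len(binary_image)
    width = len(binary_image[0]) if height else 0
    trace = []
    for x in range(width):
        rows = [y for y in range(height) if binary_image[y][x]]
        if rows:
            mean_row = sum(rows) / len(rows)
            # Image rows grow downward, so flip to make "up" positive.
            trace.append((height - 1) - mean_row)
        else:
            trace.append(None)
    return trace


image = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
]
print(extract_trace(image))  # [0.0, 1.0, 1.5, None]
```

Grid lines and printed labels on a real ECG sheet would need to be removed before a step like this, which the sketch does not attempt.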
This post by Rick Barraza deals with vector fields in Silverlight. You might be able to adapt the concept to your particular problem.
I am using a MagicLeap headset and the MLCamera API to capture raw video. The output is YUV_420_888, which I am assuming is YUV420P. The API returns yBuffer, uBuffer and vBuffer separately. I am having trouble combining these channels in C# without Bitmap, since I am using Unity and therefore Mono. What I am trying to do is combine these channels and send the result to my remote Python server, which processes the captured image. To process the image, the server needs a full image. I tried using just the Y plane to create a grayscale image, but the server couldn't process it, so I need to combine all 3 channels on the client and then compress the result, preferably to JPEG since that reduces the size drastically. I process the images at 420x420 although the camera output is 1920x1080. I have been trying different methods for the last week and a half but haven't found anything decent. There are a couple of methods, especially for Android, but I don't want to convert to NV21 if I don't have to. I have also seen one for ARCore, but I can't use it since I am using MagicLeap.
PS: Latency and processing time are very important, so if there is a way to convert YCbCr to JPEG directly without converting it to RGB, I think it would suit my case better, but I don't know if it's possible. In general I think I lack some basic knowledge, which prevents me from going further.
Any help is greatly appreciated!
I've tried something similar in the past and beat my head against the YUV420 stuff for weeks, but couldn't solve it. In the end, I bought the OpenCV for Unity library. It has custom parts just for the MagicLeap, including reading frames from the camera at reduced resolution for a speed-up.
I'm not sure, however, whether it managed real time. Maybe at the reduced resolution, yes.
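For reference, the arithmetic for combining the three planes is the same in any language. Below is a minimal sketch in Python, assuming tightly packed planes (pixel stride 1, no row padding), chroma subsampled 2x2, and full-range BT.601 coefficients as used by JPEG. None of these assumptions is confirmed for the MLCamera buffers, and the function name `yuv420p_to_rgb` is hypothetical.

```python
def clamp(value):
    """Round to the nearest integer and limit to the 0..255 byte range."""
    return max(0, min(255, int(round(value))))


def yuv420p_to_rgb(y_plane, u_plane, v_plane, width, height):
    """Convert planar YUV 4:2:0 buffers into packed RGB24 bytes."""
    rgb = bytearray(width * height * 3)
    chroma_width = (width + 1) // 2
    for row in range(height):
        for col in range(width):
            y = y_plane[row * width + col]
            # Each chroma sample is shared by a 2x2 block of luma samples.
            c = (row // 2) * chroma_width + (col // 2)
            u = u_plane[c] - 128
            v = v_plane[c] - 128
            i = (row * width + col) * 3
            rgb[i] = clamp(y + 1.402 * v)
            rgb[i + 1] = clamp(y - 0.344136 * u - 0.714136 * v)
            rgb[i + 2] = clamp(y + 1.772 * u)
    return bytes(rgb)


# A 2x2 mid-grey frame: four luma samples, one U sample, one V sample.
grey = yuv420p_to_rgb(bytes([128] * 4), bytes([128]), bytes([128]), 2, 2)
print(list(grey[:3]))  # [128, 128, 128]
```

If the real buffers carry row or pixel strides, the index calculations above would have to use those strides instead of `width` and `chroma_width`.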
I'm a novice in OpenCVSharp3 and I've been looking at some examples of image matching using this library.
The key to my question is that I don't know what modifications the code from this question needs in order to compare two images that are almost 100% identical, where one of them is rotated (unlimited rotation) and sometimes slightly displaced from the source (by a few pixels).
The method from that question basically checks whether one image is inside the other, but my project only needs to compare 5 images of the same size, two of which are the same with slight differences.
Is such an algorithm valid?
EDIT:
Here is an example of 5 images to detect the same:
It can be valid, but:
if you want unlimited rotation, you have to compare your reference image with an infinite number of rotated versions of the other image.
if the other image is displaced from the source, you will have to generate all the possible displaced images.
If you combine these two techniques, you will have a lot of combinations.
So yes, it can be done by generating all the possible variants of one image and comparing them with your reference image.
It's not very robust, though. What will happen if you try it on images displaced by a larger number of pixels? If a color adjustment has been made to an image? If one is in grayscale?
I recommend using machine learning for this problem.
I would proceed like this:
make a dataset of images
for each image, apply data augmentation (generate every rotation, displacement and noise variation you can).
Use a CNN and train it to recognize each variation of an image as the same image.
You're done, you have an algorithm that does the job :)
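The augmentation step above can be sketched as follows. This is only an illustration in plain Python, assuming an image stored as a list of rows of pixel values. The helper name `rotate_and_shift` is hypothetical, it uses nearest-neighbour sampling, and pixels that fall outside the source are filled with a constant.

```python
import math


def rotate_and_shift(image, angle_degrees, dx=0, dy=0, fill=0):
    """Rotate an image about its centre, then shift it by (dx, dy) pixels."""
    height = len(image)
    width = len(image[0])
    cx, cy = (width - 1) / 2, (height - 1) / 2
    cos_a = math.cos(math.radians(angle_degrees))
    sin_a = math.sin(math.radians(angle_degrees))
    result = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Map every output pixel back to its source (inverse transform).
            ox, oy = x - dx - cx, y - dy - cy
            sx = round(cos_a * ox + sin_a * oy + cx)
            sy = round(-sin_a * ox + cos_a * oy + cy)
            if 0 <= sx < width and 0 <= sy < height:
                result[y][x] = image[sy][sx]
    return result


source = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# One variant per (angle, shift) pair; the step sizes here are arbitrary.
variants = [
    rotate_and_shift(source, angle, dx, dy)
    for angle in range(0, 360, 90)
    for dx in (-1, 0, 1)
    for dy in (-1, 0, 1)
]
print(len(variants))  # 36
```

A real training set would use finer angle steps and add noise, and an image library would do the resampling faster and with better interpolation.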
Here is an implementation of TensorFlow for C#:
https://github.com/migueldeicaza/TensorFlowSharp
For a simple implementation of an MNIST CNN in Python, see here
Here is a video that explains how CNNs work (look at the feature detection and pooling operations; it can help you)
When the iPhone records a video it puts the data from the camera directly onto the disk. What tells the player how to reorient the video is the Transform Matrix. It's a mathematical structure that is used to change the position of the pixels in X,Y space.
On the iPhone and on the Macintosh I can ask the video what its transform is, and I get back a CGAffineTransform with a, b, c, d, tx and ty. Apple describes the Transform Matrix here
With this information I can determine what the Video layout is supposed to be and determine if it is expecting to be rotated before display.
I can get this information with ease in the OS X and iOS environments. I am trying to find a way to get the same matrix information on Windows, preferably in C#; however, if I must use C++ then so be it. ActiveX solutions are entirely undesirable, and I am hoping that the QuickTime SDK for Windows is of some use. Otherwise, what the heck did Apple write it for?
If anyone knows how to obtain the Transform Matrix from a video, or any place to start, please point me in the right direction.
It appears that the CGAffineTransform is something that will need to be pulled right out of the file itself. I used the QuickTime File Format Specification PDF to gain an understanding of the file and where to get the matrix.
Here is a link to the page with the Matrix data on it
Quicktime File Format Specification Matrix Info
As you can see from this clip, the matrix is in the movie header atom, designated by the 'mvhd' type.
It is 36 bytes long and starts 36 bytes after the 'mvhd' atom type name.
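A minimal sketch of pulling those 36 bytes out, written in Python for brevity. It assumes a version 0 movie header, finds the atom with a naive byte search for `mvhd` (which could in principle match unrelated data), and decodes the values as 16.16 fixed point with the third column as 2.30 fixed point, as the QuickTime specification describes. The function name `read_mvhd_matrix` is hypothetical.

```python
import struct


def read_mvhd_matrix(data):
    """Return the 3x3 matrix of the first version-0 'mvhd' atom, or None."""
    pos = data.find(b"mvhd")
    if pos < 0:
        return None
    # 4 bytes of type name, then 36 bytes of other header fields.
    start = pos + 4 + 36
    if len(data) < start + 36:
        return None
    if data[pos + 4] != 0:
        return None  # other header versions may lay the fields out differently
    raw = struct.unpack(">9i", data[start:start + 36])
    rows = []
    for r in range(3):
        a, b, u = raw[3 * r:3 * r + 3]
        # First two columns are 16.16 fixed point, the third is 2.30.
        rows.append((a / 65536.0, b / 65536.0, u / float(1 << 30)))
    return rows


# Synthetic header carrying a 90-degree rotation matrix, for illustration only.
fake = (b"\x00\x00\x00\x6c" + b"mvhd" + bytes(36)
        + struct.pack(">9i", 0, 0x10000, 0, -0x10000, 0, 0, 0, 0, 0x40000000))
print(read_mvhd_matrix(fake))
```

With a real file you would read the bytes from disk (or walk the atom tree properly) instead of building a synthetic header.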
Given the file format specification for frames and tracks, the matrix can change throughout playback of the video. In my experience, however, this is not exercised in the videos output by the iPhone.
I imagine that the matrix will need to be grabbed for each frame sometime in the near future, and perhaps this is something that FFmpeg or other video format applications can work into their frame grabbers and video translators. But since I currently do not have a version of ffmpeg that uses this matrix information, I will be creating a simple movie header grabber that pulls out the matrix and lets me adjust my ffmpeg command-line parameters accordingly, so that I can transform my video appropriately.
If I come up with a better idea I will try to update this post with that knowledge.
A side note on the journey to this answer
For all of those who downvoted this question because you did not know what I was talking about: a request to clarify would have been quite sufficient. Coming in and downvoting because you don't understand or don't know the answer is neither constructive nor fair. This is a very specific question and this answer will assist more than just myself.
I believe in helping to spread all kinds of knowledge. To those of you who think it's helpful to downvote because you don't have any clue how to help: I hope this gives you a better understanding of the issues that people are looking to solve. Just because you don't know what the problem is does not mean you should turn your nose up at it, and it certainly does not mean that you should discount those looking for answers that you cannot provide.
I am glad that I have an answer to my question, and I will definitely be open to any further criticism of the answer that I have given. Perhaps this answer will spark more questions about this issue and I will be able to learn from and contribute to future discussions about it.
Thank you StackOverflow for restoring my question so I could answer it appropriately.
I am comfortable with several programming languages (stronger in C#, C, Java than the others) so please feel free to suggest whichever would provide me with a way to read in a (preferably uncompressed) video file and look at the color of each pixel in a frame, for every frame. So what I mean is, say in a 1 pixel display of a trivially small video that runs for 5 frames, are there standard library classes or ways I can access the 5 colors that one pixel will show during the video?
Having never worked with video properly I am not too clued up on the data structure a video file would use to represent the color information, or how one would manipulate this!
Many thanks
For processing uncompressed video data (as it might come off a camera) you get an array of pixel data per-frame; you probably want to read up about pixel formats and how frames are defined within the array, which will depend entirely on what is producing the video. The YUV444, YUV422 and YUV420 formats are quite common; they're expressed in the YUV colour space but you can readily convert between them and RGB (or indeed HSV) if that's what you want to do.
Compressed video formats are a nightmare unto themselves, but you can decompress them into a raw format with ffmpeg or a similar tool. (Be careful - uncompressed video quickly produces vast quantities of data!) Indeed, I would use ffmpeg's libraries to manipulate video, but they're written in C (C++?) for speed - I don't know whether they're available to Java or C#.
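To make the "array of pixel data per frame" idea concrete, here is a minimal sketch in Python. It assumes the video has already been decoded to packed RGB24 frames stored back to back, for example with something along the lines of `ffmpeg -i input.mp4 -f rawvideo -pix_fmt rgb24 frames.raw` (the file names are placeholders). The function name `pixel_over_time` is hypothetical.

```python
def pixel_over_time(raw, width, height, x, y):
    """Return the (R, G, B) colour of pixel (x, y) in every frame of raw RGB24 data."""
    frame_size = width * height * 3          # 3 bytes per pixel
    frame_count = len(raw) // frame_size
    offset = (y * width + x) * 3             # position of the pixel within a frame
    colours = []
    for frame in range(frame_count):
        base = frame * frame_size + offset
        colours.append(tuple(raw[base:base + 3]))
    return colours


# The 1-pixel, 5-frame video from the question: red, green, blue, black, white.
tiny_video = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255, 0, 0, 0, 255, 255, 255])
print(pixel_over_time(tiny_video, 1, 1, 0, 0))
```

For a real file you would read the bytes from disk and pass the frame width and height that the video was decoded at.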
I am trying to achieve the following:
Using Skype, call my mailbox (works)
Enter password and tell the mailbox that I want to record a new welcome message (works)
Now, my mailbox tells me to record the new welcome message after the beep
I want to wait for the beep and then play the new message (doesn't work)
How I tried to achieve the last point:
Create a spectrogram using FFT and sliding windows (works)
Create a "finger print" for the beep
Search for that fingerprint in the audio that comes from skype
The problem I am facing is the following:
The result of the FFTs on the audio from skype and the reference beep are not the same in a digital sense, i.e. they are similar, but not the same, although the beep was extracted from an audio file with a recording of the skype audio. The following picture shows the spectrogram of the beep from the Skype audio on the left side and the spectrogram of the reference beep on the right side. As you can see, they are very similar, but not the same...
uploaded a picture http://img27.imageshack.us/img27/6717/spectrogram.png
I don't know how to continue from here. Should I average it, i.e. divide it into columns and rows and compare the averages of those cells as described here? I am not sure this is the best way, because the author already states that it doesn't work very well with short audio samples, and the beep is less than a second in length...
Any hints on how to proceed?
You should determine the peak frequency and duration (and possibly a minimum power at that frequency over that duration, RMS being the simplest measure).
This should be easy enough to measure. To make things even more clever (but probably completely unnecessary for this simple matching task), you could assert the absence of other peaks during the window of the beep.
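One way to measure "enough power at one frequency for long enough" is sketched below, using the Goertzel algorithm (a swapped-in technique that evaluates a single DFT bin, not something the answer above prescribes). It assumes the beep frequency is known in advance. The window size, threshold and minimum run length are arbitrary illustrative values, and the function names are hypothetical.

```python
import math


def tone_level(samples, sample_rate, frequency):
    """Goertzel estimate of the squared amplitude of `frequency` in `samples`."""
    n = len(samples)
    k = round(n * frequency / sample_rate)  # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2
    return power / (n * n / 4.0)  # about 1.0 for a full-scale sine on the bin


def find_beep(samples, sample_rate, frequency,
              window=400, min_windows=3, threshold=0.25):
    """Return the sample index where the tone starts, or None if not found."""
    run = 0
    for start in range(0, len(samples) - window + 1, window):
        level = tone_level(samples[start:start + window], sample_rate, frequency)
        run = run + 1 if level >= threshold else 0
        if run == min_windows:
            return start - (min_windows - 1) * window
    return None


# Synthetic test signal: 0.2 s of silence followed by 0.3 s of a 1 kHz tone.
rate = 8000
signal = [0.0] * 1600 + [math.sin(2 * math.pi * 1000 * t / rate) for t in range(2400)]
print(find_beep(signal, rate, 1000))  # 1600
```

Checking for the absence of other peaks would mean calling `tone_level` for a few other frequencies in the same window and requiring them to stay below the threshold.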
Update
To compare a complete audio fragment, you'll want to use a convolution algorithm. I suggest using a ready-made library implementation instead of rolling your own.
The most common fast convolution algorithms use fast Fourier transform (FFT) algorithms via the circular convolution theorem. Specifically, the circular convolution of two finite-length sequences is found by taking an FFT of each sequence, multiplying pointwise, and then performing an inverse FFT. Convolutions of the type defined above are then efficiently implemented using that technique in conjunction with zero-extension and/or discarding portions of the output. Other fast convolution algorithms, such as the Schönhage–Strassen algorithm, use fast Fourier transforms in other rings.
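To illustrate the quoted recipe (FFT of each zero-extended sequence, pointwise multiplication, inverse FFT), here is a small self-contained sketch in Python. It is meant for understanding only, not as a replacement for a library. The recursive radix-2 `fft` and the name `fft_convolve` are illustrative, and inputs are assumed to be non-empty.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(a, b):
    """Linear convolution of two real sequences via zero-padded FFTs."""
    size = len(a) + len(b) - 1
    n = 1
    while n < size:  # pad to a power of two so circular wrap-around cannot occur
        n *= 2
    fa = fft(list(a) + [0.0] * (n - len(a)))
    fb = fft(list(b) + [0.0] * (n - len(b)))
    product = [x * y for x, y in zip(fa, fb)]
    result = fft(product, inverse=True)
    # The unnormalised inverse needs a 1/n factor; drop the padding.
    return [value.real / n for value in result[:size]]


print([round(v, 6) for v in fft_convolve([1, 2, 3], [0, 1, 0.5])])
```

To search for a reference sound inside a longer recording, one common approach is to convolve the recording with the time-reversed reference (a cross-correlation) and look for the largest output value.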
Wikipedia lists http://freeverb3.sourceforge.net as an open source candidate
Edit: Added link to API tutorial page: http://freeverb3.sourceforge.net/tutorial_lib.shtml
Additional resources:
http://en.wikipedia.org/wiki/Finite_impulse_response
http://dspguru.com/dsp/faqs/fir
Existing packages with relevant tools on Debian:
brutefir - a software convolution engine
jconvolver - Convolution reverb Engine for JACK
libzita-convolver2 - C++ library implementing a real-time convolution matrix
teem-apps - Tools to process and visualize scientific data and images - command line tools
teem-doc - Tools to process and visualize scientific data and images - documentation
libteem1 - Tools to process and visualize scientific data and images - runtime
yorick-yeti - utility plugin for the Yorick language
First I'd smooth it a bit in the frequency direction so that small variations in frequency become less relevant. Then simply take each frequency and subtract the two amplitudes. Square the differences and add them up. Perhaps normalize the signals first so that differences in total amplitude don't matter. Then compare the resulting difference to a threshold.
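Those steps can be sketched as follows, assuming each spectrum is a plain list of amplitudes with one value per frequency bin. The smoothing radius and the threshold of 0.05 are arbitrary placeholder values that would need tuning on real recordings, and the function names are hypothetical.

```python
import math


def smooth(spectrum, radius=1):
    """Moving average along the frequency axis."""
    n = len(spectrum)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(spectrum[lo:hi]) / (hi - lo))
    return out


def normalise(spectrum):
    """Scale to unit length so overall loudness does not matter."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    return [v / norm for v in spectrum] if norm else list(spectrum)


def spectral_distance(a, b, radius=1):
    """Sum of squared differences between smoothed, normalised spectra."""
    sa = normalise(smooth(a, radius))
    sb = normalise(smooth(b, radius))
    return sum((x - y) ** 2 for x, y in zip(sa, sb))


def matches(a, b, threshold=0.05, radius=1):
    return spectral_distance(a, b, radius) < threshold


reference = [0, 0, 10, 0, 0]
quieter = [0, 0, 5, 0, 0]      # same shape, half the amplitude
different = [10, 0, 0, 0, 0]   # peak at another frequency
print(matches(reference, quieter), matches(reference, different))  # True False
```

For a spectrogram, the same distance could be computed for each time column and summed or averaged.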