Getting Matrix (CGAffineTransform) Info from iPhone Movies in C#/C++

When the iPhone records a video, it writes the data from the camera directly to disk. What tells the player how to reorient the video is the transform matrix. It's a mathematical structure that is used to change the position of the pixels in X,Y space.
On the iPhone and on the Macintosh I can ask the video what its transform is, and I get back a CGAffineTransform with a, b, c, d, tx and ty. Apple describes the Transform Matrix here
With this information I can determine what the Video layout is supposed to be and determine if it is expecting to be rotated before display.
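To illustrate what "determine if it is expecting to be rotated" can mean in practice, here is a minimal Python sketch. It assumes the transform is a pure rotation (a = cos t, b = sin t, c = -sin t, d = cos t) with no skew or non-uniform scale; the function name rotation_degrees is made up for this example and is not part of any Apple API.

```python
import math

def rotation_degrees(a, b, c, d, tolerance=1e-6):
    """Return the rotation (0-359 degrees) encoded by a, b, c, d.

    Assumption: the matrix is a pure rotation, so a == d and b == -c.
    Anything else is rejected instead of being guessed at.
    """
    if abs(a - d) > tolerance or abs(b + c) > tolerance:
        raise ValueError("matrix is not a pure rotation")
    angle = math.degrees(math.atan2(b, a))
    return round(angle) % 360

# Identity transform (no rotation) and a quarter-turn transform.
print(rotation_degrees(1, 0, 0, 1))
print(rotation_degrees(0, 1, -1, 0))
```

The tx and ty values only shift the rotated picture back into view, so they are not needed to recover the angle.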
I can get this information with ease in the OS X and iOS environments. I am trying to determine a way to get the same matrix information on Windows, preferably in C#; however, if I must use C++ then so be it. ActiveX solutions are entirely undesirable, and I am hoping that the QuickTime SDK for Windows has some use. Otherwise what the heck did Apple write it for?
If anyone knows how to obtain the Transform Matrix from a video, or any place to start, please point me in the right direction.

It appears that the CGAffineTransform is something that will need to be pulled right out of the file itself. I used the QuickTime File Format Specification PDF to gain an understanding of the file and where to get the matrix.
Here is a link to the page with the Matrix data on it
Quicktime File Format Specification Matrix Info
As you can see from this clip, the matrix is in the movie header atom, identified by the 'mvhd' type.
It is 36 bytes long and starts 36 bytes after the 'mvhd' atom type name.
According to the file format specification for frames and tracks, the matrix can change throughout the playback of the video. But in my experience this is not exercised in the videos that are output by the iPhone.
I imagine that the matrix will need to be grabbed from each frame sometime in the near future, and perhaps this is something that FFmpeg or other video format applications can work into their frame grabbers and video translators. But since I currently do not have a version of ffmpeg that uses this matrix information, I will be creating a simple movie header grabber that pulls out the matrix and lets me adjust my ffmpeg command line parameters accordingly, so that I can transform my video appropriately.
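A rough sketch of such a movie header grabber, in Python for brevity (the same byte offsets apply in C# or C++). It assumes a version 0 'mvhd' atom inside a top-level 'moov' atom, with the nine matrix values stored big-endian in the order a, b, u, c, d, v, tx, ty, w, where u, v and w are 2.30 fixed point and the rest are 16.16 fixed point. The helper names iter_atoms and read_mvhd_matrix are made up for this example.

```python
import struct

def iter_atoms(data, start, end):
    """Yield (type, body_start, atom_end) for each atom in data[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, kind = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # 64-bit extended size follows the type field
            if pos + 16 > end:
                break
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:  # atom runs to the end of its container
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed or truncated atom: stop scanning
        yield kind, pos + header, pos + size
        pos += size

def read_mvhd_matrix(data):
    """Return the 'mvhd' matrix as a dict of floats, or None if absent."""
    for kind, body, atom_end in iter_atoms(data, 0, len(data)):
        if kind != b"moov":
            continue
        for sub_kind, sub_body, _ in iter_atoms(data, body, atom_end):
            if sub_kind != b"mvhd":
                continue
            if data[sub_body] != 0:
                raise ValueError("only version 0 'mvhd' atoms are handled")
            # The matrix starts 36 bytes after the atom type name.
            a, b, u, c, d, v, tx, ty, w = struct.unpack_from(
                ">9i", data, sub_body + 36)
            f16, f30 = 65536.0, float(1 << 30)
            return {"a": a / f16, "b": b / f16, "u": u / f30,
                    "c": c / f16, "d": d / f16, "v": v / f30,
                    "tx": tx / f16, "ty": ty / f16, "w": w / f30}
    return None
```

Usage would be something like read_mvhd_matrix(open("movie.mov", "rb").read()), where movie.mov is a placeholder name. Reading the whole file into memory is only reasonable for small clips; a real tool would seek from atom to atom instead.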
If I come up with a better idea I will try to update this post with that knowledge.
A side note on the journey to this answer
For all of those who downvoted this question because you did not know what I was talking about: a request to clarify would have been quite sufficient. Coming in and downvoting because you don't understand or don't know the answer is neither constructive nor fair. This is a very specific question and this answer will assist more than just myself.
I believe in helping to spread all kinds of knowledge. To those of you who think it's helpful to downvote because you just don't have any clue how to help: I hope this gives you a better understanding of the issues that people are looking to solve. Just because you don't know what the problem is does not mean you should turn your nose up at it, and it certainly does not mean that you should discount those looking for answers that you cannot provide.
I am glad that I have an answer to my question and will definitely be open to any further criticism of the answer that I have given. Perhaps this answer will spark more questions about this issue and I will be able to learn from and contribute to future discussions about it.
Thank you StackOverflow for restoring my question so I could answer it appropriately.

Related

Converting YUV420P to JPEG on Unity without Shaders

I am using a MagicLeap headset and the MLCamera API to capture raw video. The output is YUV_420_888, which I am assuming is YUV420P. The API returns yBuffer, uBuffer and vBuffer separately. I am having trouble combining these channels in C# without Bitmap, since I am using Unity and therefore Mono. What I am trying to do is combine these channels and send the result to my remote Python server, which processes the captured image. To process the image, the server needs a full image. I have tried using just the Y plane to create a grayscale image, but the server couldn't process it, so I need to combine all 3 channels on the client and then compress the result, preferably to JPEG since the size decreases drastically. I am processing the images at 420x420, although the camera output is 1920x1080. I have been trying different methods for the last week and a half but couldn't find anything decent. There are a couple of methods specifically for Android, but I don't want to convert to NV21 if I don't have to. I have also seen one with ARCore, but I can't use that one either since I am using MagicLeap.
PS: Latency and processing time are super important, so if there is a way to convert YCbCr to JPEG directly without converting to RGB, I think it would suit my case better, but I don't know if it's possible. In general I think I lack some basic knowledge, and that prevents me from going further.
Any help is greatly appreciated!
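For reference, combining the three planes comes down to pairing every luma sample with the chroma sample of its 2x2 block. Below is a minimal pure-Python sketch, for example for the server side. It makes several assumptions that may not hold for the real MLCamera buffers: tightly packed planes (no row or pixel stride padding), full-range values, and the JFIF (BT.601) conversion constants. The function names are made up for this example.

```python
def clamp(value):
    """Round to the nearest integer and clip to the 0-255 byte range."""
    return max(0, min(255, int(round(value))))

def yuv420p_to_rgb(y, u, v, width, height):
    """Convert planar YUV 4:2:0 (separate Y, U, V planes) to packed RGB.

    Assumptions: planes are tightly packed, values are full range, and
    the JFIF (BT.601) YCbCr-to-RGB equations apply.
    """
    rgb = bytearray(width * height * 3)
    chroma_width = (width + 1) // 2  # one chroma sample per 2x2 block
    for row in range(height):
        for col in range(width):
            luma = y[row * width + col]
            chroma_index = (row // 2) * chroma_width + col // 2
            cb = u[chroma_index] - 128
            cr = v[chroma_index] - 128
            out = (row * width + col) * 3
            rgb[out] = clamp(luma + 1.402 * cr)
            rgb[out + 1] = clamp(luma - 0.344136 * cb - 0.714136 * cr)
            rgb[out + 2] = clamp(luma + 1.772 * cb)
    return bytes(rgb)
```

A per-pixel Python loop like this is far too slow for real-time use; it is only meant to show the plane layout and the arithmetic.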

I tried something similar in the past and was beating my head against the YUV420 stuff for weeks, but couldn't solve it. In the end, I bought this library: OpenCV for Unity. It has custom parts just for the MagicLeap, including reading frames from the camera at reduced resolution for speed.
I'm not sure, however, whether it managed real time. Maybe at the reduced resolution, yes.

How to handle large Images?

I want to create my own Google Map like this:
My problem is that I can't load and edit my large images.
My Images:
PNG / JPG
700 MiB
300000px x 300000px
My attempts:
ImageMagick
.NET C# / BitmapImages ...
C++ / OpenCV
general image classes in Java and Python
Which language or library can I use to edit these big images?
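A quick back-of-the-envelope calculation shows why loading such an image whole tends to fail. The sketch below assumes 3 bytes per pixel (8-bit RGB, no alpha) once decoded, and 256x256 tiles for the map; both are assumptions, not values from the question.

```python
import math

width = height = 300_000
bytes_per_pixel = 3  # assumption: 8-bit RGB, no alpha channel

# Memory needed to hold the fully decoded image at once.
uncompressed_bytes = width * height * bytes_per_pixel
uncompressed_gib = uncompressed_bytes / 2**30

# Number of tiles at the deepest zoom level of a tiled map.
tile_size = 256  # assumption: common map tile size
tiles_per_side = math.ceil(width / tile_size)
tiles_at_full_resolution = tiles_per_side ** 2

print(f"{uncompressed_gib:.1f} GiB uncompressed")
print(f"{tiles_at_full_resolution} tiles at full resolution")
```

So the 700 MiB file expands to a few hundred GiB in memory, which is why any library that decodes the whole image up front will struggle.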

I help maintain libvips, an image processing library designed to work with very large images. It's free and works on Linux, Mac and Windows. You can use it from the command-line, C#, C/C++, Python, Ruby and others.
You can make your google maps tiles from the command-line like this:
vips dzsave hugefile.tif myoutputdir --layout google
Or from Python (for example) like this:
import pyvips
image = pyvips.Image.new_from_file("somehugefile.tif", access="sequential")
image.dzsave("filename/of/pyramid", layout="google")
And it'll scan your huge tiff image and generate all the tiles. It's fast, it needs little memory and it'll work on images of any size. I regularly make 200,000 x 200,000 deepzoom images from microscope slides using my small laptop.
There's a chapter in the libvips docs introducing dzsave and explaining how to use it.

This is not a full answer, but I need a little more space than a comment can give.
Take a look at the large image support section on the ImageMagick site, or at the discussion board.
This answer mentions the VIPS package which might be helpful.
You might also consider posting in photography stackexchange, or even blender stackexchange - for example I saw this answer which mentions writing individual image tiles - also here, although that question is about rendering. Blender is not specifically for image processing and editing, but it's pretty amazing and flexible and has a very active and supportive community. You can use python within Blender as well.
You could also think of asking in gis stackexchange.
When you post in the other stackexchanges, take a look around first and make sure you write your question so that it does not look too off-topic for that site.
Good luck - it seems tiling is everywhere!

parse jpeg binary file

I have been exploring the internet for two days and still can't find a good starting point for this. I want to write C# code that takes a .jpeg binary file, decodes it and displays the image. Everywhere I looked there are lots of explanations of the JPEG algorithm, but I still can't find a good explanation of how to parse and decode the file. For example, how can I know at which byte the Huffman DC table starts and at which byte it ends?
I would appreciate it if someone could link me to an explanation of how to parse a binary JPEG file.
Thank you, and sorry for my English.

Trust me, it isn't something you can do. I wouldn't touch the thing with a pole several meters long...
http://ijg.org/
This is the site of the IJG:
IJG is an informal group that writes and distributes a widely used free library for JPEG image compression. The first version was released on 7-Oct-1991.
There you can find the source code for libjpeg.
If you just want to take a look, at http://elm-chan.org/fsw/tjpgd/00index.html there is the source of TJpgDec:
TJpgDec is a generic JPEG image decompressor module that highly optimized for small embedded systems.
It is even:
Platform independent. Written in ANSI-C.
Being tiny, it will probably be easy to reimplement in C# :-)
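On the specific question of where a Huffman table starts and ends: a JPEG file is a sequence of marker segments, each one being 0xFF, a marker byte, and a 2-byte big-endian length that includes the length field itself. Huffman tables live in DHT segments (marker 0xC4). Below is a minimal Python sketch that walks the segments up to the start of scan (0xDA) and reports each table. It is not a decoder, and the function names are made up for this example.

```python
import struct

def list_segments(data):
    """Return (marker, payload_start, payload_size) up to and including SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker, not a JPEG file")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before the real marker
            pos += 1
            continue
        # The length counts its own two bytes but not the marker bytes.
        length = struct.unpack_from(">H", data, pos + 2)[0]
        segments.append((marker, pos + 4, length - 2))
        if marker == 0xDA:  # start of scan: entropy-coded data follows
            break
        pos += 2 + length
    return segments

def huffman_tables(data):
    """Return (class, table_id, offset, size) for every Huffman table."""
    tables = []
    for marker, start, size in list_segments(data):
        if marker != 0xC4:  # DHT = Define Huffman Table
            continue
        pos, end = start, start + size
        while pos < end:  # one DHT segment may hold several tables
            table_class = "DC" if data[pos] >> 4 == 0 else "AC"
            table_id = data[pos] & 0x0F
            # 16 bytes give the number of codes of each bit length 1..16.
            symbol_count = sum(data[pos + 1:pos + 17])
            table_size = 17 + symbol_count
            tables.append((table_class, table_id, pos, table_size))
            pos += table_size
    return tables
```

Each returned tuple gives the byte offset where a table begins and how many bytes it occupies, so the table ends at offset + size.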

Image straightening algorithm

I am looking for a way to auto-straighten my images, and I was wondering if anyone has come across any algorithms to do this. I realize that the ability to do this depends on the content of the image, but any known algorithms would be a start.
I am looking to eventually implement this in C# or PHP, however, I am mainly after the algorithm right now.
Is this possible with OpenCV? ImageMagick? Others?
Many thanks,
Brett

Here is my idea:
edge detection (Sobel, Prewitt, Canny, ...)
hough transformation (horizontal lines +/- 10 degrees)
straighten the image according to the longest/strongest line
This is obviously not going to work for every type of image. This is just meant to fuel the discussion.
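The Hough step of the idea above could be sketched like this in Python. It assumes edge detection has already produced a list of (x, y) edge points, and it only searches near-horizontal lines within +/- 10 degrees. The function name estimate_skew and the 0.5 degree step are choices made for this example.

```python
import math
from collections import Counter

def estimate_skew(points, max_angle=10.0, step=0.5):
    """Return the angle (degrees) of the strongest near-horizontal line.

    points: iterable of (x, y) edge coordinates from an edge detector.
    Each candidate angle gets a histogram of line offsets; the angle
    whose fullest bin holds the most points wins.
    """
    points = list(points)
    if not points:
        return 0.0
    best_angle, best_votes = 0.0, -1
    steps = int(round(max_angle / step))
    for i in range(-steps, steps + 1):
        angle = i * step
        theta = math.radians(angle)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        # Points on one line at this angle share the same offset (rho).
        votes = Counter(round(y * cos_t - x * sin_t) for x, y in points)
        top = max(votes.values())
        if top > best_votes:
            best_angle, best_votes = angle, top
    return best_angle
```

Rotating the image by the opposite of the returned angle would then straighten it; the sign depends on the image coordinate convention, so that part needs checking against a real image.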

Most OCR programs straighten the scanned image prior to running recognition. You can probably find good code in the many open-source OCR programs, such as Tesseract.

Of course this does depend on what type of images you want to straighten, but there seems to be some resources available for automatic straightening of text scans.
One post I found mentioned 3 programs that could do auto-straightening:
TechSoft's PixEdit 7.0.11
Mystik Media's AutoImager 3.03
Spicer's Imagenation 7.50
If manual straightening is acceptable, there are many tutorials out there for how to straighten them manually using Photoshop; just google "image straightening"

ImageMagick has the -deskew option. This will simply rotate the image to be straight.
Most commercial OCR engines like ABBYY FineReader and Nuance OmniPage do this automatically.
The Leptonica research library has a command line tool called skewtest which will rotate the image.
I have not found a library which can take an image which has been distorted in any other way (like pin cushion or if it has been moved during a scanning operation, or removing the warp at the edge of a book). I am looking for a library or tool that can do this, but cannot find one.
Patrick.

Finding a wave graphic inside an image

I need some help with an algorithm. I'm using an artificial neural network to read an electrocardiogram and trying to recognize some disturbances in the waves. That's OK, and I have the neural network and I can test it no problem.
What I'd like to do is give the user a function to open an electrocardiogram (import a JPEG) and have the program find the waves and convert them into the arrays that will feed my ANN, but that's where the problem is. I wrote some code that reads the image and transforms it into a binary image, but I can't find a nice way for the program to locate the waves, since their exact position can vary from hospital to hospital. I need some suggestions for approaches I could use.

If you've got the wave values in a list, you can use a Fourier transform or FFT (fast Fourier transform) to determine the frequency content at any particular time value. Disturbances typically create additional high-frequency content (i.e., sharp, steep waves) that you should be able to use to spot irregularities.
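As a small illustration of looking at frequency content, here is a plain discrete Fourier transform in Python. It is the slow textbook version rather than an FFT, which is fine for short windows. The function names are made up for this example.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return magnitudes of DFT bins 0..n/2 for a list of real samples."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n)
                for t, s in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]

def dominant_bin(samples):
    """Return the index of the strongest non-DC frequency bin."""
    magnitudes = dft_magnitudes(samples)
    return max(range(1, len(magnitudes)), key=magnitudes.__getitem__)

# A test tone with 5 cycles across a 64-sample window.
tone = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
print(dominant_bin(tone))
```

Running this over a sliding window and watching for energy in unusually high bins is one way to flag sharp, steep sections of the wave.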

You'd have to assume a certain minimal contrast between the "signal" (the waves) and the background of the image. An edge-finding algorithm might be useful in that case. You could isolate the wave from the background and plot the wave.
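Once the wave is isolated in a binary image, turning it into an array can be as simple as scanning column by column. Below is a minimal Python sketch, assuming the image is a list of rows in which 1 marks a wave pixel and 0 marks background; the function name trace_from_binary is made up for this example.

```python
def trace_from_binary(image):
    """Convert a binary image into one wave value per column.

    image: list of equal-length rows, 1 = wave pixel, 0 = background.
    Returns the mean row index of the wave pixels in each column,
    or None for columns that contain no wave pixel.
    """
    if not image:
        return []
    trace = []
    for col in range(len(image[0])):
        rows = [r for r, line in enumerate(image) if line[col]]
        trace.append(sum(rows) / len(rows) if rows else None)
    return trace
```

Row indices grow downward in most image formats, so the values would need flipping (and probably scaling) before being fed to the network. Grid lines and printed labels would also have to be removed first, or they will be picked up as wave pixels.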
This post by Rick Barraza deals with vector fields in Silverlight. You might be able to adapt the concept to your particular problem.
